### Added

- Added `GeminiTTSService`, which uses Google Gemini to generate TTS output. The Gemini model can be prompted to insert styled speech to control the TTS output.

- Added Exotel support to Pipecat's development runner. You can now connect using the runner with `uv run bot.py -t exotel` and an ngrok connection to HTTP port 7860.

- Added `enable_direct_mode` argument to `FrameProcessor`. Direct mode is for processors that require very little I/O or compute, that is, processors that can perform their task almost immediately. These types of processors don't need any of the internal tasks and queues usually created by frame processors, which means overall application performance might be slightly improved. Use with care.

- Added TTFB metrics for `HeyGenVideoService` and `TavusVideoService`.

- Added `endpoint_id` parameter to `AzureSTTService`. (Custom EndpointId)
### Changed

- `WatchdogPriorityQueue` now requires inserted items to always be tuples, and the tuple size needs to be specified with the `tuple_size` argument when constructing the queue.

- Updated Moondream to revision `2025-01-09`.

- Updated `PlayHTHttpTTSService` to no longer use the `pyht` client, removing compatibility issues with other packages. You can now use the PlayHT HTTP service with other services, like `GoogleLLMService`.

- Updated `pyproject.toml` to once again pin `numba` to `>=0.61.2` in order to resolve package versioning issues.

- Updated `STTMuteFilter` to include `VADUserStartedSpeakingFrame` and `VADUserStoppedSpeakingFrame` in the list of frames to filter when filtering is on.
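As background for the `WatchdogPriorityQueue` tuple requirement above: Python priority queues compare items lexicographically, so tuples whose first element is the priority sort naturally. The snippet below is a stdlib sketch of that ordering, not the Pipecat API (which additionally takes the `tuple_size` constructor argument, likely so its internal sentinel items can be made comparable with user items):

```python
import queue

# Tuples compare element by element, so a (priority, payload) tuple
# gives a natural ordering: the lowest first element comes out first.
q = queue.PriorityQueue()
q.put((2, "normal frame"))
q.put((0, "system frame"))
q.put((1, "important frame"))

items = [q.get() for _ in range(q.qsize())]
print(items)
# [(0, 'system frame'), (1, 'important frame'), (2, 'normal frame')]
```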
### Performance

- Improved the latency of `HeyGenVideoService`.

- Improved the performance of some frame processors by using the new frame processor direct mode. In direct mode, a frame processor processes frames right away, avoiding the need for internal queues and tasks. This is useful for some simple processors. For example, processors that wrap other processors (e.g. `Pipeline`, `ParallelPipeline`) add one processor before and one after the wrapped processors (internally, you will see them as sources and sinks). These sources and sinks don't do any special processing; they basically just forward frames. For these simple processors we now enable the new direct mode, which avoids creating any internal tasks (and queues) and therefore improves performance.
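The queued vs. direct distinction described above can be sketched outside of Pipecat. The class names and structure below are purely illustrative, not Pipecat internals: the queued variant needs a per-processor task and queue, while the direct variant simply calls the next stage before returning.

```python
import asyncio

class QueuedForwarder:
    """Queued mode: frames pass through an internal queue drained by a task."""
    def __init__(self, downstream):
        self._downstream = downstream
        self._queue = asyncio.Queue()
        self._task = asyncio.create_task(self._drain())

    async def _drain(self):
        while True:
            frame = await self._queue.get()
            self._downstream(frame)
            self._queue.task_done()

    async def process(self, frame):
        await self._queue.put(frame)  # delivered later by the drain task

class DirectForwarder:
    """Direct mode: no internal task or queue; forward the frame immediately."""
    def __init__(self, downstream):
        self._downstream = downstream

    async def process(self, frame):
        self._downstream(frame)

async def main():
    received = []
    queued = QueuedForwarder(received.append)
    await queued.process("frame-1")
    await queued._queue.join()       # wait for the drain task to deliver it
    direct = DirectForwarder(received.append)
    await direct.process("frame-2")  # delivered before process() returns
    queued._task.cancel()
    return received

print(asyncio.run(main()))  # ['frame-1', 'frame-2']
```

For a pure pass-through stage, the direct variant does the same work with one fewer task, one fewer queue, and no extra scheduling hop per frame.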
### Fixed

- Fixed an issue with `BaseWhisperSTTService` where the language was specified as an enum and not a string.

- Fixed an issue where `SmallWebRTCTransport` ended before TTS finished.

- Fixed an issue in `OpenAIRealtimeBetaLLMService` where specifying a `text` modality didn't result in text being output by the model.

- Added SSML reserved character escaping to `AzureBaseTTSService` to properly handle special characters in text sent to Azure TTS. This fixes an issue where characters like `&`, `<`, `>`, `"`, and `'` in LLM-generated text would cause TTS failures.

- Fixed a `WatchdogPriorityQueue` issue that could cause an exception when comparing watchdog cancel sentinel items with other items in the queue.

- Fixed an issue that would cause system frames to not be processed with higher priority than other frames. This could cause slower interruption times.

- Fixed an issue where retrying after a websocket connection error would result in an error.
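The SSML escaping described above can be illustrated with the standard library. This is a generic sketch of escaping XML/SSML reserved characters, with a hypothetical helper name, not the actual `AzureBaseTTSService` implementation:

```python
from xml.sax.saxutils import escape

def escape_ssml_text(text: str) -> str:
    # Hypothetical helper: escape &, <, > (escape's defaults) plus the two
    # quote characters, since SSML is XML and all five are reserved.
    return escape(text, {'"': "&quot;", "'": "&apos;"})

print(escape_ssml_text('Q&A: "5 < 10" isn\'t false'))
# Q&amp;A: &quot;5 &lt; 10&quot; isn&apos;t false
```

Without this escaping, an LLM emitting a bare `&` or `<` would produce malformed SSML, which the TTS backend rejects.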
### Other

- Added foundational example `19b-openai-realtime-beta-text.py`, showing how to use `OpenAIRealtimeBetaLLMService` to output text to a TTS service.

- Added vision support to the release evals so we can run the 12 series of foundational examples.

- Added foundational example `15a-switch-languages.py` to the release evals. It is able to detect whether the language was switched properly.

- Updated foundational examples to show how to enclose complex logic (e.g. `ParallelPipeline`) in a single processor so the main pipeline becomes simpler.

- Added `07n-interruptible-gemini.py`, demonstrating how to use `GeminiTTSService`.