### Added
- Added support for Sarvam Speech-to-Text service (`SarvamSTTService`) with streaming WebSocket support for `saarika` (STT) and `saaras` (STT-translate) models.
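  A minimal construction sketch; the import path and the `api_key`/`model` parameter names are assumptions modeled on other Pipecat STT services, not confirmed by this entry:

  ```python
  from pipecat.services.sarvam.stt import SarvamSTTService  # import path assumed

  stt = SarvamSTTService(
      api_key="...",    # Sarvam API key (parameter name assumed)
      model="saarika",  # or a "saaras" model for STT-translate (value assumed)
  )
  ```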
- Added support for passing in a `ToolsSchema` in lieu of a list of provider-specific dicts when initializing `OpenAIRealtimeLLMService` or when updating it using `LLMUpdateSettingsFrame`.
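  For illustration, a sketch of building a `ToolsSchema` and applying it via a settings update; the `"tools"` settings key and the constructor placement are assumptions based on this entry:

  ```python
  from pipecat.adapters.schemas.function_schema import FunctionSchema
  from pipecat.adapters.schemas.tools_schema import ToolsSchema
  from pipecat.frames.frames import LLMUpdateSettingsFrame

  weather = FunctionSchema(
      name="get_weather",
      description="Get the current weather for a location.",
      properties={"location": {"type": "string"}},
      required=["location"],
  )
  tools = ToolsSchema(standard_tools=[weather])

  # Pass `tools` at init time (instead of provider-specific dicts), or
  # update later; the "tools" settings key is an assumption:
  await task.queue_frame(LLMUpdateSettingsFrame(settings={"tools": tools}))
  ```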
- Added `TransportParams.audio_out_silence_secs`, which specifies how many seconds of silence to output when an `EndFrame` reaches the output transport. This can help ensure that all audio data is fully delivered to clients.
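  A small configuration sketch (the `1.0` value is just illustrative):

  ```python
  from pipecat.transports.base_transport import TransportParams

  params = TransportParams(
      audio_out_enabled=True,
      # Emit one second of trailing silence when an EndFrame reaches the
      # output transport, so buffered audio is fully flushed to clients.
      audio_out_silence_secs=1.0,
  )
  ```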
- Added new `FrameProcessor.broadcast_frame()` method. This will push two instances of a given frame class, one upstream and the other downstream.

  ```python
  await self.broadcast_frame(UserSpeakingFrame)
  ```
- Added `MetricsLogObserver` for logging performance metrics from `MetricsFrame` instances. Supports filtering via the `include_metrics` parameter to control which metric types are logged (TTFB, processing time, LLM token usage, TTS usage, smart turn metrics).
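  A usage sketch; the observer's import path and the exact values accepted by `include_metrics` are assumptions (the metric data classes shown do exist in `pipecat.metrics.metrics`):

  ```python
  from pipecat.metrics.metrics import LLMUsageMetricsData, TTFBMetricsData
  from pipecat.observers.loggers.metrics_log_observer import MetricsLogObserver  # path assumed
  from pipecat.pipeline.task import PipelineTask

  observer = MetricsLogObserver(
      include_metrics=[TTFBMetricsData, LLMUsageMetricsData],  # accepted values assumed
  )
  task = PipelineTask(pipeline, observers=[observer])  # `pipeline` built elsewhere
  ```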
- Added `pronunciation_dictionary_locators` to `ElevenLabsTTSService` and `ElevenLabsHttpTTSService`.
- Added support for loading external observers. You can now register custom pipeline observers by setting the `PIPECAT_OBSERVER_FILES` environment variable. This variable should contain a colon-separated list of Python files (e.g. `export PIPECAT_OBSERVER_FILES="observer1.py:observer2.py:..."`). Each file must define a function with the following signature:

  ```python
  async def create_observers(task: PipelineTask) -> Iterable[BaseObserver]: ...
  ```
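  For example, `observer1.py` might look like this sketch (the `DebugLogObserver` choice is just illustrative):

  ```python
  # observer1.py
  from typing import Iterable

  from pipecat.observers.base_observer import BaseObserver
  from pipecat.observers.loggers.debug_log_observer import DebugLogObserver  # example observer
  from pipecat.pipeline.task import PipelineTask


  async def create_observers(task: PipelineTask) -> Iterable[BaseObserver]:
      return [DebugLogObserver()]
  ```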
- Added support for new sonic-3 languages in `CartesiaTTSService` and `CartesiaHttpTTSService`.
- `EndFrame` and `EndTaskFrame` have an optional `reason` field to indicate why the pipeline is being ended.
- `CancelFrame` and `CancelTaskFrame` have an optional `reason` field to indicate why the pipeline is being canceled. This can also be specified when you cancel a task with `PipelineTask.cancel(reason="cancellation reason")`.
- Added `include_prob_metrics` parameter to Whisper STT services to enable access to probability metrics from transcription results.
- Added utility functions `extract_whisper_probability()`, `extract_openai_gpt4o_probability()`, and `extract_deepgram_probability()` to extract probability metrics from `TranscriptionFrame` objects for Whisper-based, OpenAI GPT-4o-transcribe, and Deepgram STT services, respectively.
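  A sketch of the two pieces together; the module that exposes the `extract_*` helpers isn't named in this entry, so that call is shown as a comment:

  ```python
  from pipecat.services.whisper.stt import WhisperSTTService

  # Enable probability metrics on transcription results (model value illustrative).
  stt = WhisperSTTService(model="base", include_prob_metrics=True)

  # Later, given a TranscriptionFrame `frame` produced by this service:
  # probability = extract_whisper_probability(frame)
  ```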
- Added `LLMSwitcher.register_direct_function()`. It works much like `LLMSwitcher.register_function()` in that it's a shorthand for registering a function on all LLMs in the switcher, except this new method takes a direct function (a `FunctionSchema`-less function).
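  A sketch of a direct function, whose schema is derived from its signature and docstring rather than from a `FunctionSchema` (verify the `FunctionCallParams` import path against your Pipecat version):

  ```python
  from pipecat.services.llm_service import FunctionCallParams

  async def get_time(params: FunctionCallParams, timezone: str):
      """Get the current time in a timezone.

      Args:
          timezone: An IANA timezone name, e.g. "Europe/Madrid".
      """
      await params.result_callback({"timezone": timezone, "time": "..."})

  # Registers the direct function on every LLM managed by the switcher.
  llm_switcher.register_direct_function(get_time)
  ```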
- Added `MCPClient.get_tools_schema()` and `MCPClient.register_tools_schema()` as a two-step alternative to `MCPClient.register_tools()`, to allow users to pass MCP tools to, say, `GeminiLiveLLMService` (as well as other speech-to-speech services) in the constructor.
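  A two-step sketch; the exact signatures (including what `register_tools_schema()` takes) are assumptions based on this entry:

  ```python
  # Step 1: fetch the MCP tools as a ToolsSchema.
  tools = await mcp_client.get_tools_schema()

  # Step 2: pass the tools to a speech-to-speech service at construction
  # time, then wire up the MCP handlers (argument shapes assumed).
  llm = GeminiLiveLLMService(api_key="...", tools=tools)
  await mcp_client.register_tools_schema(llm, tools)
  ```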
- Added support for passing in an `LLMSwitcher` to `MCPClient.register_tools()` (as well as the new `MCPClient.register_tools_schema()`).
- Added `cpu_count` parameter to `LocalSmartTurnAnalyzerV3`. This is set to `1` by default for more predictable performance on low-CPU systems.
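  A configuration sketch (import path assumed):

  ```python
  from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3

  # Default is cpu_count=1; raise it on machines with spare cores.
  turn_analyzer = LocalSmartTurnAnalyzerV3(cpu_count=2)
  ```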
### Changed
- Improved `concatenate_aggregated_text()` to handle one-word outputs from OpenAI Realtime and Gemini Live. Text fragments are now correctly concatenated without spaces when these patterns are detected.
- `STTMuteFilter` no longer sends `STTMuteFrame` to the STT service. The filter now blocks frames locally without instructing the STT service to stop processing audio. This prevents inactivity-related errors (such as 409 errors from Google STT) while maintaining the same muting behavior at the application level. Important: the `STTMuteFilter` should be placed after the STT service itself, as shown in the sketch below.
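  A placement sketch; the surrounding processors (`transport`, `stt`, `llm`, `tts`, `context_aggregator`) stand in for your own pipeline:

  ```python
  from pipecat.pipeline.pipeline import Pipeline
  from pipecat.processors.filters.stt_mute_filter import (
      STTMuteConfig,
      STTMuteFilter,
      STTMuteStrategy,
  )

  stt_mute_filter = STTMuteFilter(
      config=STTMuteConfig(strategies={STTMuteStrategy.FIRST_SPEECH})
  )

  pipeline = Pipeline([
      transport.input(),
      stt,              # STT service first...
      stt_mute_filter,  # ...then the filter, so frames are blocked locally
      context_aggregator.user(),
      llm,
      tts,
      transport.output(),
  ])
  ```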
- Improved `GoogleSTTService` error handling to properly catch gRPC `Aborted` exceptions (corresponding to 409 errors) caused by stream inactivity. These exceptions are now logged at DEBUG level instead of ERROR level, since they indicate expected behavior when no audio is sent for 10+ seconds (e.g., during long silences or when audio input is blocked). The service automatically reconnects when this occurs.
- Bumped the `fastapi` dependency's upper bound to `<0.122.0`.
- Updated the default model for `GoogleVertexLLMService` to `gemini-2.5-flash`.
- Updated the `GoogleVertexLLMService` to use the `GoogleLLMService` as a base class instead of the `OpenAILLMService`.
- Updated STT and TTS services to pass through unverified language codes with a warning instead of returning `None`. This allows developers to use newly supported languages before Pipecat's service classes are updated, while still providing guidance on verified languages.
### Removed
- Removed `needs_mcp_alternate_schema()` from `LLMService`. The mechanism that relied on it went away.
### Fixed
- Restored backwards compatibility for vision/image features (broken in 0.0.92) when using non-universal context and assistant aggregators.
- Fixed `DeepgramSTTService._disconnect()` to properly await the `is_connected()` method call, which is an async coroutine in the Deepgram SDK.
- Fixed an issue where the `SmallWebRTCRequest` dataclass in the runner would scrub arbitrary request data from the client due to camelCase typing. This fixes data passthrough for JS clients where `APIRequest` is used.
- Fixed a bug in `GeminiLiveLLMService` where in some circumstances it wouldn't respond after a tool call.
- Fixed `GeminiLiveLLMService` session resumption after a connection timeout.
- `GeminiLiveLLMService` now properly supports context-provided system instruction and tools.
- Fixed `GoogleLLMService` token counting to avoid double-counting tokens when Gemini sends usage metadata across multiple streaming chunks.