Added
-
Added
SarvamLLMServicewith support forsarvam-30b,sarvam-30b-16k,sarvam-105bandsarvam-105b-32k.
(PR #3978) -
Added
on_turn_context_created(context_id)hook toTTSService. Override this to perform provider-specific setup (e.g. eagerly opening a server-side context) before text starts flowing. Called each time a new turn context ID is created.
(PR #4013) -
Added
XAIHttpTTSServicefor text-to-speech using xAI's HTTP TTS API.
(PR #4031) -
Added support for "developer" role messages in conversation context across all LLM adapters. For non-OpenAI services (Anthropic, Google, AWS Bedrock), "developer" messages are converted to "user" messages (use
system_instructionto set the system instruction). For OpenAI services, "developer" messages pass through in conversation history. For the Responses API, they are kept as "developer" role (matching the existing "system" → "developer" conversion).
(PR #4089) -
Added
SmallestTTSService, a WebSocket-based TTS service integration with Smallest AI's Waves API. Supports the Lightning v2 and v3.1 models with configurable voice, language, speed, consistency, similarity, and enhancement settings.
(PR #4092) -
Added warnings in turn stop strategies when
VADParams.stop_secsdiffers from the recommended default (0.2s) or whenstop_secs >= STT p99 latency, which collapses the STT wait timeout to 0s and may cause delayed turn detection. The warnings guide developers to re-run the stt-benchmark with their VAD settings.
(PR #4115) -
Added
domainparameter toAssemblyAISTTSettingsfor specialized recognition modes such as Medical Mode (domain="medical-v1").
(PR #4117) -
Added
NovitaLLMServicefor using Novita AI's LLM models via their OpenAI-compatible API.
(PR #4119) -
Added
cleanup()method toVADAnalyzerandVADControllerso VAD analyzer resources are properly released when no longer needed. CustomVADAnalyzersubclasses can overridecleanup()to free any held resources.
(PR #4120) -
Added
on_end_of_turnevent handler toAssemblyAISTTService. This fires after the final transcript is pushed, providing a reliable hook for end-of-turn logic that doesn't race withTranscriptionFrame. Works in both Pipecat and AssemblyAI turn detection modes.
(PR #4128) -
Added
DeepgramFluxSageMakerSTTServicefor running Deepgram Flux speech-to-text on AWS SageMaker endpoints. Use withExternalUserTurnStrategiesto take advantage of Flux's turn detection.
(PR #4143) -
Added
Mem0MemoryService.get_memories()convenience method for retrieving all stored memories outside the pipeline (e.g. to build a personalized greeting at connection time). This avoids the need to manually handle client type branching, filter construction, and async wrapping.
(PR #4156)
Changed
-
Added context prewarming path for
InworldTTSServiceto improve first audio latency.
(PR #4013) -
Added
KrispVivaVadAnalyzerfor Voice Activity Detection using the Krisp VIVA SDK (requireskrisp_audio).
(PR #4022) -
Modified
InworldTTSServiceto close context at end of turn instead of relying on idle timeout. (PR #4028) -
Added Gemini 3 support to the Gemini Live service.
(PR #4078) -
TTSService: the defaultstop_frame_timeout_s(idle time before an automaticTTSStoppedFrameis pushed whenpush_stop_frames=True) has changed from2.0to3.0seconds.
(PR #4084) -
⚠️
GeminiLLMAdapternow only treatsmessages[0]as the initial system message, matching all other adapters. Previously it searched for the first "system" message anywhere in the conversation history. A "system" message appearing later in the list will now be converted to "user" instead of being extracted as the system instruction.(PR #4089)
-
Fixed
InworldTtsServiceto fallback to full text when TTS timestamps are not received.
(PR #4113) -
⚠️ Realtime services (Gemini Live, OpenAI Realtime, Grok Realtime, Nova Sonic) now prefer
system_instructionfrom service settings over an initial system message in the LLM context, matching the behavior of non-realtime services. Previously, context-provided system instructions took precedence. A warning is now logged when both are set.
(PR #4130) -
Bumped
nvidia-riva-clientminimum version to>=2.25.1.
(PR #4136) -
Upgraded
protobuffrom 5.x to 6.x (>=6.31.1,<7).
(PR #4136) -
Unrecognized language strings (e.g. Deepgram's
"multi") no longer produce a warning at startup. The log message has been downgraded to debug level since these are valid service-specific values that are passed through correctly.
(PR #4137) -
GrokLLMServiceandGrokRealtimeLLMServicenow live in thepipecat.services.xaimodule alongsideXAIHttpTTSService, since all three use the same xAI API. Update imports frompipecat.services.grok.*topipecat.services.xai.*(e.g.from pipecat.services.xai.llm import GrokLLMService).
(PR #4142) -
⚠️ Bumped
mem0aidependency from~=0.1.94to>=1.0.8,<2. Users of themem0extra will need to update their mem0ai package.
(PR #4156)
Deprecated
pipecat.services.grok.llm,pipecat.services.grok.realtime.llm, and
pipecat.services.grok.realtime.eventsare deprecated. The old import paths
still work but emit aDeprecationWarning; usepipecat.services.xai.llm,
pipecat.services.xai.realtime.llm, and
pipecat.services.xai.realtime.eventsinstead.
(PR #4142)
Removed
-
⚠️
TTSService.add_word_timestamps()no longer supports the"Reset"and"TTSStoppedFrame"sentinel strings. If you have a custom TTS service that calledawait self.add_word_timestamps([("Reset", 0)])orawait self.add_word_timestamps([("TTSStoppedFrame", 0), ("Reset", 0)], ctx_id), replace them withawait self.append_to_audio_context(ctx_id, TTSStoppedFrame(context_id=ctx_id))and let_handle_audio_contextmanage the word-timestamp reset automatically.
(PR #4145) -
Removed
SambaNovaSTTService. SambaNova no longer offers speech-to-text audio models. Use another STT provider instead.
(PR #4154)
Fixed
-
Fixed Gemini Live (
GoogleGeminiLiveLLMService) not honoringsettings.system_instruction. The system instruction was being read from a deprecated constructor parameter instead of the settings object, causing it to be silently ignored.
(PR #4089) -
Fixed
AWSBedrockLLMAdaptersending an empty message list to the API when the only message in context was a system message. The lone system message is now converted to "user" role instead of being extracted, matching the existing Anthropic adapter behavior.
(PR #4089) -
Fixed Gemini Live pipeline hanging indefinitely when an
EndFramewas deferred while waiting for the bot to finish responding andturn_completenever arrived. As a possible root-cause fix,turn_completemessages are now handled even if they lackusage_metadata. As a fallback, the deferredEndFramenow has a 30-second safety timeout.
(PR #4125) -
Fixed ElevenLabs WebSocket disconnections (1008 "Maximum simultaneous contexts exceeded") caused by rapid user interruptions. When interruptions arrived before any TTS text was generated, phantom contexts were created on the ElevenLabs server that were never closed, eventually exceeding the 5-context limit.
(PR #4126) -
Fixed the final sentence being dropped from the conversation context when using RTVI text input with non-word-timestamp TTS services. The
LLMFullResponseEndFramewas racing ahead of the lastTTSTextFrame, causing theLLMAssistantAggregatorto finalize the context before the final sentence arrived.
(PR #4127) -
Fixed audio crackling and popping in recordings when both user and bot are speaking.
AudioBufferProcessorno longer injects silence into a track's buffer while that track is actively producing audio, preventing mid-utterance interruptions in the recorded output.
(PR #4135) -
Fixed websocket TTS word timestamps so interrupted contexts cannot leak stale words or backward PTS values into later turns.
(PR #4145) -
Fixed a race condition in
InterruptibleTTSServicewhere, ifrun_ttshad been invoked butBotStartedSpeakingFramehad not yet been received, a user interruption could allow stale audio to leak through.
(PR #4145) -
Fixed Gemini Live local VAD mode (
GeminiVADParams(disabled=True)with external VAD) not working. The bot now correctly detects user speech and signals turn boundaries to the Gemini API.
(PR #4146) -
Fixed Gemini Live message handling to process all
server_contentfields independently. Gemini 3.x can bundle multiple fields (e.g.model_turnandoutput_transcription) on the same message, but the previouselifchain only processed the first match, silently dropping the rest.
(PR #4147) -
Fixed
ServiceSwitcherwithServiceSwitcherStrategyFailoverincorrectly triggering failover whenErrorFrames from other pipeline stages (e.g. TTS) propagated upstream through the switcher. Previously, any non-fatal error passing through would be misattributed to the active service and trigger an unwanted service switch. Now only errors originating from the switcher's own managed services trigger failover.
(PR #4149) -
Fixed
LiveKitOutputTransportnot clearing thertc.AudioSourceinternal buffer on interruption, causing the bot to continue speaking for several seconds after being interrupted.
(PR #4151) -
Fixed a crash in OpenAI LLM processing when the provider returns
chunk.choices[0].delta.audio = None, which caused'NoneType' object has no attribute 'get'errors during audio transcript handling.
(PR #4152) -
Fixed error floods in
DeepgramSTTServicewhen the WebSocket connection drops. With Deepgram SDK 6.x,send_media()raises exceptions on a dead connection instead of silently failing, causing every queued audio frame to log an error. Nowsend_media()failures are caught gracefully — a single warning is logged and audio frames are skipped until the existing reconnection logic restores the connection.
(PR #4153) -
Mem0MemoryServiceno longer blocks the event loop during memory storage and retrieval. All Mem0 API calls now run in a background thread, and message storage is fire-and-forget so it doesn't delay downstream processing.
(PR #4156) -
Fixed
Mem0MemoryServicefailing to store messages when the context contained system or developer role messages. The Mem0 API only accepts user and assistant roles, so other roles are now filtered out before storing.
(PR #4156) -
Added missing
on_dtmf_eventcallback toLemonSliceTransportClient.setup()DailyCallbacksconstruction, fixing aValidationErrorat pipeline setup time.
(PR #4161) -
Fixed an issue in
InworldTTSServicewhere, in cases of fast interruption, we would continue receiving audio from the previous context.
(PR #4167) -
Fixed a word timestamp interleaving issue in
InworldTTSServicewhen processing multiple sentences.
(PR #4167) -
Fixed duplicate
TTSStoppedFramebeing pushed in TTS services usingpush_stop_frames=True. When the stop-frame timeout fired, a secondTTSStoppedFramecould be pushed after the normal one at context completion.
(PR #4172) -
⚠️ Fixed
DeepgramSTTServicecompatibility with deepgram-sdk 6.1.0. The SDK now requires explicit message objects forsend_keep_alive(),send_close_stream(), andsend_finalize(). The minimum deepgram-sdk version is now 6.1.0.
(PR #4174) -
Fixed RTVI events not being delivered to clients when using WebSocket transports.
ProtobufFrameSerializernow setsignore_rtvi_messages=Falseby default.
(PR #4176) -
Fixed a timing issue where turn detection timer tasks (idle controller, speech timeout, turn analyzer, and turn completion) could miss their first tick because the newly created asyncio task was not yet scheduled when the caller continued.
(PR #4183) -
Fixed
FastAPIWebsocketTransportintermittently hanging on shutdown when the remote side (e.g. Twilio) disconnects while audio is being sent. A race condition between the send and receive paths could cause theon_client_disconnectedcallback to be skipped, leaving the pipeline waiting for a disconnect signal that never came.
(PR #4186)
Performance
RimeTTSServicenow handles Rime'sdoneWebSocket message to complete audio contexts immediately, eliminating the 3-second idle timeout that previously added latency at the end of each utterance.
(PR #4172)