Added
- Added concurrent audio context support: CartesiaTTSService can now synthesize the next sentence while the previous one is still playing, by setting pause_frame_processing=False and routing each sentence through its own audio context queue.
  (PR #3804)
- Added custom video track support to Daily transport. Use video_out_destinations in DailyParams to publish multiple video tracks simultaneously, mirroring the existing audio_out_destinations feature.
  (PR #3831)
- Added ServiceSwitcherStrategyFailover that automatically switches to the next service when the active service reports a non-fatal error. Recovery policies can be implemented via the on_service_switched event handler.
  (PR #3861)
- Added optional timeout_secs parameter to register_function() and register_direct_function() for per-tool function call timeout control, overriding the global function_call_timeout_secs default.
  (PR #3915)
- Added cloud-audio-only recording option to Daily transport's enable_recording property.
  (PR #3916)
- Wired up system_instruction in BaseOpenAILLMService, AnthropicLLMService, and AWSBedrockLLMService so it works as a default system prompt, matching the behavior of the Google services. This enables sharing a single LLMContext across multiple LLM services, where each service provides its own system instruction independently.

      llm = OpenAILLMService(
          api_key=os.getenv("OPENAI_API_KEY"),
          system_instruction="You are a helpful assistant.",
      )

      context = LLMContext()

      @transport.event_handler("on_client_connected")
      async def on_client_connected(transport, client):
          context.add_message({"role": "user", "content": "Please introduce yourself."})
          await task.queue_frames([LLMRunFrame()])

  (PR #3918)
- Added vad_threshold parameter to AssemblyAIConnectionParams for configuring voice activity detection sensitivity in U3 Pro. Aligning this with external VAD thresholds (e.g., Silero VAD) prevents the "dead zone" where AssemblyAI transcribes speech that VAD hasn't detected yet.
  (PR #3927)
- Added push_empty_transcripts parameter to BaseWhisperSTTService and OpenAISTTService to allow empty transcripts to be pushed downstream as TranscriptionFrame instead of discarding them (the default behavior). This is intended for situations where VAD fires even though the user did not speak. In these cases, it is useful to know that nothing was transcribed so that the agent can resume speaking, instead of waiting longer for a transcription.
  (PR #3930)
- LLM services (BaseOpenAILLMService, AnthropicLLMService, AWSBedrockLLMService) now log a warning when both system_instruction and a system message in the context are set. The constructor's system_instruction takes precedence.
  (PR #3932)
- Runtime settings updates (via STTUpdateSettingsFrame) now work for AWS Transcribe, Azure, Cartesia, Deepgram, ElevenLabs Realtime, Gradium, and Soniox STT services. Previously, changing settings at runtime only stored the new values without reconnecting.
  (PR #3946)
- Exposed on_summary_applied event on LLMAssistantAggregator, allowing users to listen for context summarization events without accessing private members.
  (PR #3947)
- Deepgram Flux STT settings (keyterm, eot_threshold, eager_eot_threshold, eot_timeout_ms) can now be updated mid-stream via STTUpdateSettingsFrame without triggering a reconnect. The new values are sent to Deepgram as a Configure WebSocket message on the existing connection.
  (PR #3953)
- Added system_instruction parameter to run_inference across all LLM services, allowing callers to override the system prompt for one-shot inference calls. Used by _generate_summary to pass the summarization prompt cleanly.
  (PR #3968)
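The per-tool timeout entry above (timeout_secs on register_function(), PR #3915) describes a per-tool value overriding a global default. A minimal sketch of that precedence rule follows, using only the standard library. ToolRegistry, effective_timeout, GLOBAL_TIMEOUT_SECS, and slow_lookup are hypothetical names invented for illustration; this is not pipecat's implementation, and how pipecat reports a timed-out call is not covered here.

```python
import asyncio

# Hypothetical stand-in for the global function_call_timeout_secs default.
GLOBAL_TIMEOUT_SECS = 0.5


class ToolRegistry:
    """Toy registry illustrating per-tool timeout overrides (not pipecat code)."""

    def __init__(self, default_timeout_secs=GLOBAL_TIMEOUT_SECS):
        self._default = default_timeout_secs
        self._tools = {}

    def register_function(self, name, handler, timeout_secs=None):
        # timeout_secs=None means "fall back to the global default".
        self._tools[name] = (handler, timeout_secs)

    def effective_timeout(self, name):
        _, timeout_secs = self._tools[name]
        return self._default if timeout_secs is None else timeout_secs

    async def call(self, name, *args):
        handler, _ = self._tools[name]
        try:
            return await asyncio.wait_for(
                handler(*args), timeout=self.effective_timeout(name)
            )
        except asyncio.TimeoutError:
            # Assumption: a timeout is reported as a plain string result.
            return f"{name} timed out"


async def slow_lookup(query):
    # Simulates a tool that takes 0.2 seconds to answer.
    await asyncio.sleep(0.2)
    return f"result for {query}"


async def demo():
    registry = ToolRegistry()
    registry.register_function("default_tool", slow_lookup)
    registry.register_function("strict_tool", slow_lookup, timeout_secs=0.05)
    return [
        await registry.call("default_tool", "weather"),
        await registry.call("strict_tool", "weather"),
    ]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the same handler succeeds under the 0.5-second default but times out when registered with a stricter 0.05-second per-tool limit.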
Changed
- Audio context management (previously in AudioContextTTSService) is now built into TTSService. All WebSocket providers (cartesia, elevenlabs, asyncai, inworld, rime, gradium, resembleai) now inherit from WebsocketTTSService directly. Word-timestamp baseline is set automatically on the first audio chunk of each context instead of requiring each provider to call start_word_timestamps() in their receive loop.
  (PR #3804)
- Daily transport now uses CustomVideoSource/CustomVideoTrack instead of VirtualCameraDevice for the default camera output, mirroring how audio already works with CustomAudioSource/CustomAudioTrack.
  (PR #3831)
- ⚠️ Updated DeepgramSTTService to use deepgram-sdk v6. The LiveOptions class was removed from the SDK and is now provided by pipecat directly; import it from pipecat.services.deepgram.stt instead of deepgram.
  (PR #3848)
- ServiceSwitcherStrategy base class now provides a handle_error() hook for subclasses to implement error-based switching. ServiceSwitcher defaults to ServiceSwitcherStrategyManual and strategy_type is now optional.
  (PR #3861)
- Support for Voice Focus 2.0 models.
  - Updated aic-sdk to ~=2.1.0 to support Voice Focus 2.0 models.
  - Cleaned unused ParameterFixedError exception handling in AICFilter parameter setup.
  (PR #3889)
- max_context_tokens and max_unsummarized_messages in LLMAutoContextSummarizationConfig (and deprecated LLMContextSummarizationConfig) can now be set to None independently to disable that summarization threshold. At least one must remain set.
  (PR #3914)
- ⚠️ Removed formatted_finals and word_finalization_max_wait_time from AssemblyAIConnectionParams as these were v2 API parameters not supported in v3. Clarified that format_turns only applies to Universal-Streaming models; U3 Pro has automatic formatting built-in.
  (PR #3927)
- Changed DeepgramTTSService to send a Clear message on interruption instead of disconnecting and reconnecting the WebSocket, allowing the connection to persist throughout the session.
  (PR #3958)
- Re-added enhancement_level support to AICFilter with runtime FilterEnableFrame control, applying ProcessorParameter.Bypass and ProcessorParameter.EnhancementLevel together.
  (PR #3961)
- Updated daily-python dependency from ~=0.23.0 to ~=0.24.0.
  (PR #3970)
- Updated FishAudioTTSService default model from s1 to s2-pro, matching Fish Audio's latest recommended model for improved quality and speed.
  (PR #3973)
- AzureSTTService region parameter is now optional when private_endpoint is provided. A ValueError is raised if neither is given, and a warning is logged if both are provided (private_endpoint takes priority).
  (PR #3974)
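The AzureSTTService entry above (PR #3974) describes a three-way rule for region and private_endpoint. The sketch below restates that rule as a standalone function so the cases are easy to see. resolve_speech_config_kwargs, the logger name, the error text, and the returned dictionary keys are all hypothetical; the real service may structure this differently.

```python
import logging

logger = logging.getLogger("azure_stt_sketch")


def resolve_speech_config_kwargs(region=None, private_endpoint=None):
    """Pick which connection argument to use (hypothetical helper, not pipecat code)."""
    if not region and not private_endpoint:
        # Neither value given: nothing to connect to.
        raise ValueError("Either region or private_endpoint must be provided")
    if private_endpoint:
        if region:
            # Both given: warn and let the private endpoint win.
            logger.warning(
                "Both region and private_endpoint were provided; using private_endpoint"
            )
        return {"endpoint": private_endpoint}
    return {"region": region}


if __name__ == "__main__":
    print(resolve_speech_config_kwargs(region="eastus"))
    print(resolve_speech_config_kwargs(private_endpoint="wss://example.invalid/stt"))
```

Returning exactly one of the two keys mirrors the constraint that region and endpoint are not passed together.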
Deprecated
- Deprecated AudioContextTTSService and AudioContextWordTTSService. Subclass WebsocketTTSService directly instead; audio context management is now part of the base TTSService.
  (PR #3804)
- Deprecated WordTTSService, WebsocketWordTTSService, and InterruptibleWordTTSService. Word timestamp logic is now always active in TTSService and no longer needs to be opted into via a subclass.
  (PR #3804)
- Deprecated pipecat.services.google.llm_vertex, pipecat.services.google.llm_openai, and pipecat.services.google.gemini_live.llm_vertex modules. Use pipecat.services.google.vertex.llm, pipecat.services.google.openai.llm, and pipecat.services.google.gemini_live.vertex.llm instead. The old import paths still work but will emit a DeprecationWarning.
  (PR #3980)
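For codebases with many imports of the deprecated Google modules, the old-to-new mapping from the entry above (PR #3980) can be applied mechanically. The following is a small sketch of such a rewrite. migrate_import_line is a hypothetical helper, and SomeService in the usage line is a placeholder rather than a real class name.

```python
import re

# Old-to-new module paths, copied from the deprecation entry above.
DEPRECATED_MODULES = {
    "pipecat.services.google.llm_vertex": "pipecat.services.google.vertex.llm",
    "pipecat.services.google.llm_openai": "pipecat.services.google.openai.llm",
    "pipecat.services.google.gemini_live.llm_vertex": (
        "pipecat.services.google.gemini_live.vertex.llm"
    ),
}


def migrate_import_line(line):
    """Rewrite deprecated module paths in one line of source (hypothetical helper)."""
    for old, new in DEPRECATED_MODULES.items():
        # Match the whole dotted path only, not a prefix of a longer one.
        pattern = r"(?<![\w.])" + re.escape(old) + r"(?![\w.])"
        line = re.sub(pattern, new, line)
    return line


if __name__ == "__main__":
    # SomeService is a placeholder name used only for this example.
    print(migrate_import_line("from pipecat.services.google.llm_vertex import SomeService"))
```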
Removed
- ⚠️ Removed supports_word_timestamps parameter from TTSService.__init__(). Word timestamp logic is now always active. Remove this argument from any custom subclass super().__init__() calls.
  (PR #3804)
Fixed
- Fixed DeepgramSTTService keepalive ping timeout disconnections. The deepgram-sdk v6 removed automatic keepalive; pipecat now sends explicit KeepAlive messages every 5 seconds, within the recommended 3–5 second interval before Deepgram's 10-second inactivity timeout.
  (PR #3848)
- Fixed BufferError: Existing exports of data: object cannot be re-sized in AICFilter caused by holding a memoryview on the mutable audio buffer across async yield points.
  (PR #3889)
- Fixed TTS context not being appended to the assistant message history when using TTSSpeakFrame with append_to_context=True with some TTS providers.
  (PR #3936)
- Fixed context summarization leaving orphaned tool responses in the kept context when tool calls were moved to the summarized portion.
  (PR #3937)
- Fixed turn completion state not resetting at end of LLM responses. LLMFullResponseEndFrame is pushed (not received) by the LLM service, so the mixin now handles it in push_frame instead of process_frame.
  (PR #3956)
- Fixed turn completion instructions being injected as a context system message instead of using system_instruction. This caused warning spam when system_instruction was also set and didn't persist across full context updates.
  (PR #3957)
- Fixed TTSService audio context queue getting blocked when append_to_audio_context() was called with a None context ID, which prevented subsequent audio from being delivered.
  (PR #3958)
- Fixed on_call_state_updated event handler in LiveKit transport receiving incorrect number of arguments due to redundant self passed to _call_event_handler.
  (PR #3959)
- Fixed OpenAI Realtime, OpenAI Realtime Beta, and Grok realtime services treating conversation_already_has_active_response as a fatal error. These services now log it as a non-fatal debug event when a response is already in progress.
  (PR #3960)
- Fixed SmallWebRTCConnection silently discarding messages sent before the data channel is open by queuing them and flushing once the channel is ready. A bounded queue (MAX_MESSAGE_QUEUE_SIZE = 50) prevents unbounded memory growth, and a 10-second timeout after connection clears the queue and falls back to discard mode if the data channel never opens.
  (PR #3962)
- Fixed AzureSTTService failing to initialize when private_endpoint is provided. The Azure Speech SDK's SpeechConfig does not accept both region and endpoint simultaneously, so they are now passed conditionally.
  (PR #3967)
- Fixed GoogleLLMService ignoring the system_instruction set via constructor or GoogleLLMSettings when a system message was also present in the context. The settings value now correctly takes priority, and a warning is logged when both are set.
  (PR #3976)
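The SmallWebRTCConnection fix above (PR #3962) combines three behaviors: queue until the channel opens, cap the queue, and give up after a timeout. A rough standalone model of that pattern is sketched below. PendingMessageBuffer and its method names are hypothetical, the timeout is modeled as an explicit method call rather than a real 10-second timer, and dropping new messages when the queue is full is an assumption (the entry does not say which messages are dropped).

```python
from collections import deque

MAX_MESSAGE_QUEUE_SIZE = 50  # value taken from the changelog entry


class PendingMessageBuffer:
    """Toy model of queue-then-flush sending (not the SmallWebRTCConnection code)."""

    def __init__(self, send_fn, max_size=MAX_MESSAGE_QUEUE_SIZE):
        self._send_fn = send_fn
        self._max_size = max_size
        self._pending = deque()
        self._channel_open = False
        self._discard_mode = False

    def send(self, message):
        """Send now if open, otherwise queue; returns False if the message was dropped."""
        if self._channel_open:
            self._send_fn(message)
        elif self._discard_mode or len(self._pending) >= self._max_size:
            # Assumption: messages that do not fit are dropped, not older ones evicted.
            return False
        else:
            self._pending.append(message)
        return True

    def on_channel_open(self):
        # Flush everything queued so far, in the original order.
        self._channel_open = True
        self._discard_mode = False
        while self._pending:
            self._send_fn(self._pending.popleft())

    def on_open_timeout(self):
        # Stand-in for the timer firing before the channel has opened.
        if not self._channel_open:
            self._pending.clear()
            self._discard_mode = True


if __name__ == "__main__":
    sent = []
    buffer = PendingMessageBuffer(sent.append)
    buffer.send("hello")
    buffer.on_channel_open()
    print(sent)
```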
Other
- Updated foundational examples to use system_instruction on LLM services instead of adding system messages to LLMContext.
  (PR #3918)
- Updated AssemblyAI turn detection example to use keyterms_prompt list format instead of prompt string for improved clarity.
  (PR #3929)
- Updated foundational examples and eval scripts to use "user" role instead of "system" when adding messages to LLMContext, since system prompts should be set via system_instruction on the LLM service.
  (PR #3931)