Added
- Added a `session_id` field to `RunnerArguments` so bots can log or trace a per-session identifier in local development the same way they can in Pipecat Cloud. The development runner now mints a UUID at every construction site, and paths that already returned a `sessionId` to the caller (Daily `/start`, dial-in webhook) share that same UUID with the runner args instead of generating two. The SmallWebRTC `/api/offer` endpoint also accepts an optional `session_id` query parameter so the `/sessions/{session_id}/...` proxy can thread it through. (PR #4385)
- Added a `max_buffer_delay_ms` constructor argument to `CartesiaTTSService` for controlling Cartesia's server-side text buffering. When unset, Pipecat picks a sensible default based on `text_aggregation_mode`: `0` in `SENTENCE` mode (custom buffering, which avoids stacking client-side aggregation on top of Cartesia's default 3000 ms server buffer) and unset in `TOKEN` mode (Cartesia's managed buffering applies). Pass an explicit value (0-5000 ms) to override. (PR #4390)
- Added a `mip_opt_out` constructor argument to `DeepgramTTSService` and `DeepgramHttpTTSService` so callers can opt out of the Deepgram Model Improvement Program. When set, the value is forwarded to Deepgram as a query parameter on the speak request. Defaults to `None`, which preserves the existing behavior. See https://dpgr.am/deepgram-mip for pricing implications before enabling. (PR #4400)
- Added an opt-in `add_tool_change_messages` flag to the LLM aggregators (set via `LLMContextAggregatorPair(..., add_tool_change_messages=True)`) that appends a developer-role message to the context whenever `LLMSetToolsFrame` changes the set of advertised standard tools. Helps the LLM stay coherent across mid-conversation tool changes, mitigating several flavors of tool-call-related hallucination: calling tools that have been removed, avoiding tools that have been re-added, and hallucinating output (made-up answers or tool-call-shaped non-tool-calls) when tools are unavailable. (PR #4404)
- Added `deferred(strategy)` and `DeferredUserTurnStopStrategy` in `pipecat.turns.user_stop`. Wraps a stop strategy so it fires only the inference-triggered event and suppresses `on_user_turn_stopped`, leaving finalization to another strategy in the chain, such as `LLMTurnCompletionUserTurnStopStrategy`. (PR #4405)
- Added `ExternalUserTurnCompletionStopStrategy` in `pipecat.turns.user_stop`, a generic stop strategy that finalizes the user turn whenever a `UserTurnInferenceCompletedFrame` arrives, regardless of which component produced it. `LLMTurnCompletionUserTurnStopStrategy` now extends this base; future producers (Flux, custom end-of-turn classifiers, etc.) can use the base directly or subclass it to add producer-specific setup. (PR #4405)
- Added `on_user_turn_inference_triggered`, a new event on the user turn controller, processor, aggregator, and stop strategies that fires when a strategy has enough signal to start LLM inference. By default it fires together with `on_user_turn_stopped`; a gating strategy can fire only the inference-triggered event and defer finalization to a peer. (PR #4405)
- Added `FilterIncompleteUserTurnStrategies` in `pipecat.turns.user_turn_strategies`, a `UserTurnStrategies` specialization that wraps the detector chain with `deferred(...)` and appends `LLMTurnCompletionUserTurnStopStrategy` as the finalizer. Common case: `user_turn_strategies=FilterIncompleteUserTurnStrategies()`. Pass `config=UserTurnCompletionConfig(...)` to customize timeouts and prompts. (PR #4405)
- Added `LLMTurnCompletionUserTurnStopStrategy` in `pipecat.turns.user_stop`. When installed, the strategy gates `on_user_turn_stopped` on a `UserTurnInferenceCompletedFrame` (a new fieldless system frame emitted by any component that can judge turn completeness, e.g. the `UserTurnCompletionLLMServiceMixin` on ✓). A `finalization_timeout` provides a safety net if no completion frame ever arrives. (PR #4405)
- Added first-class RTVI support for the UI Agent Protocol:
  - Adds `ui-event`, `ui-snapshot`, and `ui-cancel-task` client-to-server messages, plus `ui-command` and `ui-task` server-to-client messages, with paired `*Data`/`*Message` pydantic models.
  - Adds built-in command payload models for `Toast`, `Navigate`, `ScrollTo`, `Highlight`, `Focus`, `Click`, `SetInputValue`, and `SelectText`; matching default handlers live in `@pipecat-ai/client-react`.
  - Adds `RTVIProcessor.on_ui_message` for inbound `ui-event`, `ui-snapshot`, and `ui-cancel-task` messages.
  - Adds five UI pipeline frames, mirroring the `client-message` frame-and-event pattern: downstream code pushes `RTVIUICommandFrame`/`RTVIUITaskFrame` for the observer to wrap into outbound `UICommandMessage`/`UITaskMessage` envelopes, while the processor pushes inbound `RTVIUIEventFrame`, `RTVIUISnapshotFrame`, and `RTVIUICancelTaskFrame` alongside `on_ui_message`.
  - Bumps the RTVI `PROTOCOL_VERSION` from `1.2.0` to `1.3.0`.

  (PR #4407)
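The `max_buffer_delay_ms` defaulting for `CartesiaTTSService` described earlier in this section can be sketched with a small helper. The function name and enum below are illustrative assumptions; only the defaults (`0` in `SENTENCE` mode, unset in `TOKEN` mode, explicit 0-5000 ms override) come from the entry above:

```python
from enum import Enum
from typing import Optional


class TextAggregationMode(Enum):
    SENTENCE = "sentence"
    TOKEN = "token"


def resolve_max_buffer_delay_ms(
    explicit: Optional[int], mode: TextAggregationMode
) -> Optional[int]:
    """Pick the buffer delay sent to Cartesia (None = let the server decide)."""
    if explicit is not None:
        if not 0 <= explicit <= 5000:
            raise ValueError("max_buffer_delay_ms must be within 0-5000 ms")
        return explicit
    if mode is TextAggregationMode.SENTENCE:
        # Sentences are already aggregated client-side, so disable Cartesia's
        # default ~3000 ms server buffer to avoid double buffering.
        return 0
    # TOKEN mode: leave unset so Cartesia's managed buffering applies.
    return None
```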
- AWS Transcribe STT, Polly TTS, Bedrock LLM, and the Bedrock AgentCore processor now resolve credentials via the standard boto3 provider chain (EC2 instance profiles, EKS pod roles / IRSA, ECS task roles, SSO, `~/.aws/credentials`) when explicit credentials and `AWS_*` environment variables are absent. Services running with IAM roles no longer need to export static credentials. (PR #4416)
- Added `keyterms` support to ElevenLabs STT services so Scribe V2 callers can bias recognition toward supplied key terms, in both file-based and realtime transcription. (PR #4426)
- Added a `watchdog_min_timeout` parameter to `DeepgramFluxSTT` and `DeepgramFluxSageMakerSTT` (default `0.5` seconds) to control the minimum silence duration before the watchdog sends a silence packet to prevent dangling turns. The actual threshold is `max(chunk_duration * 2, watchdog_min_timeout)`, so it also adapts automatically to the audio chunk size in use. (PR #4430)
- Added `cancel_on_interruption=False` support for `GeminiLiveLLMService` on models that support Gemini's NON_BLOCKING tool mechanism (currently Gemini 2.x); the conversation now continues while the tool runs. On models that don't yet support NON_BLOCKING (Gemini 3.x), the service surfaces a one-time warning explaining the limitation. (Note: an intermittent 1008 error can occasionally fire on Gemini 2.5 during long-running tool calls; we auto-reconnect.) (PR #4448)
- Added `NvidiaSageMakerWebsocketSTTService` for streaming speech recognition using NVIDIA Nemotron ASR via an AWS SageMaker bidirectional-stream endpoint. Produces `InterimTranscriptionFrame` and `TranscriptionFrame` frames, is VAD-aware, and automatically reconnects on error. (PR #4464)
- Added NVIDIA Magpie TTS services via AWS SageMaker: `NvidiaSageMakerHTTPTTSService` (single HTTP invocation, streams raw PCM back) and `NvidiaSageMakerWebsocketTTSService` (persistent HTTP/2 bidi-stream with full interruption support via `InterruptibleTTSService`). (PR #4464)
- Added support for `reasoning` configuration on `OpenAIRealtimeLLMService`, for use with reasoning-capable Realtime models such as `gpt-realtime-2`. (PR #4470)
- Inworld TTS updates:
  - Added a `delivery_mode` setting (`STABLE`/`BALANCED`/`CREATIVE`) to `InworldTTSService` and `InworldHttpTTSService`, enabling the stability-vs-creativity tradeoff in `inworld-tts-2`.
  - Added language support to `InworldTTSService` and `InworldHttpTTSService`. The `language` setting is now forwarded to the API, and a new `language_to_inworld_language()` helper normalizes Pipecat `Language` enums to Inworld's BCP-47 locale tags.

  (PR #4473)
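The adaptive Deepgram Flux watchdog threshold described above reduces to a one-liner; the function name is a hypothetical stand-in for the internal computation:

```python
def watchdog_silence_threshold(
    chunk_duration: float, watchdog_min_timeout: float = 0.5
) -> float:
    """Seconds of silence before the watchdog injects a silence packet.

    Twice the audio chunk duration, but never less than the configured
    minimum, so large low-frequency chunks don't trigger false silences.
    """
    return max(chunk_duration * 2, watchdog_min_timeout)
```

With 20 ms chunks the default 0.5 s minimum wins; with 500 ms chunks the threshold grows to 1.0 s automatically.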
Changed
- Updated the default `SonioxTTSService` model from `tts-rt-v1-preview` to the generally available `tts-rt-v1`. (PR #4386)
- Default `cartesia_version` for `CartesiaTTSService` bumped from `2025-04-16` to `2026-03-01`, matching `CartesiaHttpTTSService` and unlocking the `use_normalized_timestamps` and `max_buffer_delay_ms` fields. (PR #4390)
- ⚠️ `CartesiaTTSService` now sends `use_normalized_timestamps: true` instead of the deprecated `use_original_timestamps` field. Word timestamps now reflect what was actually spoken (post text-normalization and pronunciation-dictionary substitution), matching the convention Pipecat uses for ElevenLabs. This is a behavior change for `sonic-3` users, who were previously receiving timestamps tied to the input transcript. (PR #4390)
- Broadened `tool_resources` to `app_resources` for easy access not just in tool handlers but in other places like custom `FrameProcessor`s. Three changes: a rename (`tool_resources` → `app_resources`), a new `app_resources` property on `PipelineTask`, and a new `pipeline_task` property on `FrameProcessor`. Tool handlers now read `params.app_resources`; custom processors read `self.pipeline_task.app_resources`. The previous `tool_resources` aliases (on `PipelineTask`, `FunctionCallParams`, and `FrameProcessorSetup`) keep working but are deprecated as of 1.2.0 and emit `DeprecationWarning`s. (PR #4395)
- Lowered the per-message log in `SmallWebRTCInputTransport._handle_app_message` from `debug` to `trace`. App messages can be high-frequency and were noisy at debug level; set the loguru level to `TRACE` to see them again. (PR #4397)
- Changed the default model for `GrokRealtimeLLMService` to `grok-voice-think-fast-1.0`, xAI's recommended Voice Agent model. The previous default of `grok-voice-fast-1.0` has been deprecated by xAI and is being removed. (PR #4401)
- Changed the default Inworld TTS model from `inworld-tts-1.5-max` to `inworld-tts-2` (Realtime TTS-2) across `InworldHttpTTSService`, `InworldTTSService`, and the `InworldRealtimeLLMService` cascade. Existing users can pin the prior model explicitly via the `model`/`tts_model` argument; both `inworld-tts-1.5-max` and `inworld-tts-1.5-mini` remain valid model IDs. (PR #4422)
- Changed the default model for `GrokLLMService` from `grok-3` to `grok-4.20-non-reasoning`. xAI is retiring `grok-3` on May 15, 2026. (PR #4429)
- `DeepgramFluxSTT` watchdog silence threshold is now dynamic: `max(chunk_duration * 2, watchdog_min_timeout)` instead of a fixed 500 ms. This prevents false silence injections when large audio chunks are sent at lower frequency. (PR #4430)
- `ElevenLabsTTSService` now sends `close_context` to the server as soon as the turn is complete (on `on_turn_context_completed`) rather than waiting until all audio has finished playing back. The `isFinal` message from ElevenLabs is now used to signal `TTSStoppedFrame` and clean up the audio context, improving turn transition timing. (PR #4433)
- Updated `InworldHttpTTSService` and `InworldTTSService` to use PCM audio encoding by default, which returns audio bytes without headers. (PR #4446)
- Moved `create_task`, `cancel_task`, the `task_manager` property, and `setup(task_manager)` up from `FrameProcessor` to `BaseObject`. Custom `BaseObject` subclasses (turn strategies, controllers, etc.) now inherit these methods directly instead of reimplementing the task manager wiring. Owners propagate the task manager to their child `BaseObject`s via `await child.setup(task_manager)`. (PR #4449)
- Changed the default OpenAI Realtime input audio transcription model from `gpt-4o-transcribe` to `gpt-realtime-whisper` for both `OpenAIRealtimeSTTService` and `OpenAIRealtimeLLMService`. The new model does not accept the `prompt` parameter; if a prompt is supplied alongside `gpt-realtime-whisper`, it is dropped automatically and a warning is logged. To keep using prompt hints, explicitly pin `model="gpt-4o-transcribe"` (or `"gpt-4o-mini-transcribe"`). (PR #4450)
- Updated the default model for `CartesiaTTSService` and `CartesiaHttpTTSService` from `sonic-3` to `sonic-3.5`. (PR #4462)
- Changed the default model for `OpenAIRealtimeLLMService` from `gpt-realtime-1.5` to `gpt-realtime-2`. (PR #4472)
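The prompt-dropping behavior for `gpt-realtime-whisper` described above can be sketched as follows; the helper name and config shape are illustrative assumptions, not Pipecat's actual implementation:

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def build_transcription_config(model: str, prompt: Optional[str] = None) -> dict:
    """Assemble input-audio-transcription settings, dropping unsupported prompts."""
    config = {"model": model}
    if prompt is not None:
        if model == "gpt-realtime-whisper":
            # This model rejects the prompt parameter; drop it and warn.
            logger.warning("prompt is not supported by %s; ignoring it", model)
        else:
            config["prompt"] = prompt
    return config
```

Pinning `model="gpt-4o-transcribe"` keeps the prompt in the config, matching the migration advice above.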
Deprecated
- Deprecated `LLMUserAggregatorParams.filter_incomplete_user_turns`. Use `user_turn_strategies=FilterIncompleteUserTurnStrategies()` (or add `LLMTurnCompletionUserTurnStopStrategy` to a custom `user_turn_strategies.stop`) instead. Setting the legacy flag still works for one release: the aggregator emits a `DeprecationWarning` and rewires the strategies as if you had passed `FilterIncompleteUserTurnStrategies` directly. (PR #4405)
- Deprecated `ResampyResampler` in favor of `SOXRAudioResampler` (or the `create_file_resampler()`/`create_stream_resampler()` factories). Instantiating `ResampyResampler` now emits a `DeprecationWarning`. The class will be removed in Pipecat 2.0 along with the default `resampy` and `numba` dependencies. (PR #4428)
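Instantiation-time deprecation, as described for `ResampyResampler`, is typically implemented by warning in `__init__`; the class below is an illustrative stand-in, not Pipecat's actual resampler:

```python
import warnings


class ResampyResamplerLike:
    """Stand-in for a class whose construction is deprecated."""

    def __init__(self) -> None:
        # Emitted once per instantiation; callers migrate to the SOXR-based
        # resampler or the factory functions before the class is removed.
        warnings.warn(
            "ResampyResampler is deprecated and will be removed in 2.0; "
            "use SOXRAudioResampler or create_stream_resampler() instead",
            DeprecationWarning,
            stacklevel=2,
        )
```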
Fixed
- Fixed `CartesiaTTSService` surfacing `flush_done` messages from Cartesia as `ErrorFrame`s. The latest API emits a `flush_done` per transcript when server-side buffering is disabled; Pipecat now consumes them silently since each turn already has its own `context_id`. (PR #4390)
- Fixed Cartesia tag helpers (`SPELL`, `EMOTION_TAG`, `PAUSE_TAG`, `VOLUME_TAG`, `SPEED_TAG`) raising `TypeError` when called on an instance (e.g. `tts.SPELL("hi")`). They're now `@staticmethod` and callable from both the class and an instance. (PR #4390)
- Fixed `CartesiaHttpTTSService` pushing two `ErrorFrame`s on a non-200 response: one with the API's error text and a second, less informative "Unknown error" frame from the outer exception handler. It now pushes a single frame that includes the HTTP status code and returns cleanly. (PR #4390)
- Fixed an issue where `LocalSmartTurnAnalyzerV3` was imported unconditionally for user turn stop strategies. It is now only imported when `default_user_turn_stop_strategies()` is called. This improves startup time and removes the `transformers` "PyTorch/TensorFlow/Flax not found" warning when the default stop strategies are not used. (PR #4393)
- Fixed `GrokRealtimeLLMService` ignoring the configured model. The model was stored in `Settings` but never sent to xAI, so every session silently fell back to xAI's server-side default. The model is now passed via the `?model=` query parameter on the WebSocket URL, as xAI's Voice Agent API requires. (PR #4401)
- Fixed `on_user_turn_stopped` firing prematurely when `filter_incomplete_user_turns` was enabled. The event now fires only after the LLM confirms the user turn is complete (✓); previously the smart-turn detector's tentative stop was bubbling up before the LLM had a chance to veto it, causing observers, transcript appenders, and UI indicators to receive an early, and sometimes duplicated, signal. (PR #4405)
- Fixed `TTSSpeakFrame(append_to_context=True)` greetings sometimes splitting across two assistant messages in the LLM context and not surfacing in `on_assistant_turn_stopped`. The `LLMAssistantPushAggregationFrame` emitted at the end of a TTS context now carries a PTS just past the last word so it can't overtake clock-queued `TTSTextFrame`s in the transport's output, and `LLMAssistantAggregator` now triggers `on_assistant_turn_started`/`on_assistant_turn_stopped` when it receives the frame outside an LLM response cycle (restoring v0.0.104 behavior for greeting transcripts). (PR #4414)
- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` producing merged words (e.g. `bookLook`) when using Flash models. Flash often splits sentences mid-stream into alignment chunks that begin with a real inter-word space, but the previous fix unconditionally stripped that space from every chunk. Leading spaces are now stripped only on the first alignment chunk of an utterance, so subsequent chunks correctly flush partial words across boundaries. (PR #4415)
- Fixed AWS Polly TTS, Bedrock LLM, and the Bedrock AgentCore processor erroring out when only one of `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` was set in the environment. The half-populated kwargs are no longer forwarded to aioboto3; partial env-var configurations now fall through to the boto3 credential chain like fully-unset configurations do. (PR #4416)
- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` writing romanized/normalized text to the LLM context. With non-Latin input (e.g., Chinese), the assistant transcript was getting populated with pinyin (`Ni Hao !` instead of `你好!`), which then degraded subsequent LLM turns. The services now consume `alignment` by default and only switch to `normalizedAlignment`/`normalized_alignment` when `pronunciation_dictionary_locators` is configured (where `alignment` has overlapping restarts that produce duplicated/garbled words, per #4316). Both fields are read with preferred-with-fallback semantics since each is nullable per the API schema. (PR #4424)
- Fixed a deadlock in `TTSService` that could permanently stall pipeline processing when all three conditions occurred together: `pause_frame_processing=True`, an interruption arrived before any TTS audio was played, and an `UninterruptibleFrame` (e.g. `TTSUpdateSettingsFrame`, `FunctionCallResultFrame`) was in the processing queue at that moment. The process task would block on `__process_event.wait()` indefinitely because `BotStoppedSpeakingFrame` never arrives (no audio was played) and the interruption handler did not resume processing. Affects services using `pause_frame_processing=True` such as ElevenLabs, Rime, AsyncAI, Gradium, and ResembleAI. (PR #4431)
- Fixed interruptions being delayed when a slow non-uninterruptible frame was processing and an uninterruptible frame was waiting in the queue. The bot would stall until the slow frame finished instead of cancelling it immediately on interruption. (PR #4434)
- Fixed `TTSService` dropping uninterruptible frames (e.g. `FunctionCallResultFrame`) from its internal serialization queue when an interruption occurs. Previously, the queue was recreated on every interruption, silently discarding any queued frames. The queue is now reset instead of recreated, preserving uninterruptible frames so they are always delivered downstream. (PR #4435)
- Fixed a race condition in the Daily transport that caused `AttributeError: 'NoneType' object has no attribute 'send_app_message'` when tearing down a pipeline. Both `DailyInputTransport` and `DailyOutputTransport` share the same `DailyTransportClient` and both call `cleanup()`, which was releasing the underlying `CallClient` on the first call, leaving the second caller with a `None` client. (PR #4440)
- Restored `cancel_on_interruption=False` support for `AWSNovaSonicLLMService` and `OpenAIRealtimeLLMService`. These services previously honored the flag by simply not cancelling in-flight function calls on interruption; the introduction of the new async-tool mechanism (which threads started/intermediate/final messages through the LLM context) broke that path because the realtime services didn't know how to interpret those messages. Note that new-style streamed intermediate results (`FunctionCallResultProperties(is_final=False)`) are not supported on these realtime services. Similar fixes for other impacted realtime services are forthcoming. (PR #4441)
- Fixed two misspelled Gemini TTS voice names in `GeminiTTSService.AVAILABLE_VOICES`. (PR #4443)
- Extended the `cancel_on_interruption=False` regression fix to `GrokRealtimeLLMService`, `AzureRealtimeLLMService`, and `UltravoxRealtimeLLMService`. Grok and Azure use the same approach as in #4441 (each service detects async-tool messages in the LLM context and routes the final result to its formal tool-result channel; Azure inherits transitively from `OpenAIRealtimeLLMService`). Ultravox needed a different approach because its API freezes the conversation between `client_tool_invocation` and the matching `client_tool_result`: for async-registered functions it now ships a placeholder `client_tool_result` immediately when the function is invoked (to unfreeze the conversation), then injects the real result as user-side text once the tool finishes. Streamed intermediate results (`FunctionCallResultProperties(is_final=False)`) are still not supported on any of these realtime services. `GeminiLiveLLMService` and `InworldRealtimeLLMService` are excluded for now: Gemini Live's async-tool path needs deeper investigation, and Inworld tool calling needs to be sorted out first. (PR #4447)
- Fixed `OpenAIRealtimeLLMService` handling of multi-output-item responses (observed with `gpt-realtime-2`). A single response can now contain more than one audio item, and the first item's `audio.done` may arrive after the second item's deltas have started. Deltas still arrive strictly in playback order, so we continue to forward them as received (matching OpenAI's reference implementation). The fix removes spurious warnings, ensures truncation always targets the latest audio item, and emits a single bracketing `TTSStartedFrame`/`TTSStoppedFrame` pair per assistant turn (the `TTSStoppedFrame` is now pushed on `response.done`). (PR #4465)
- Fixed missing `output` attribute on LLM OpenTelemetry spans when the LLM call is interrupted mid-stream. (PR #4467)
- Fixed incorrect `metrics.ttfb` on STT OpenTelemetry spans, and parented them to the current turn span. (PR #4467)
- Fixed incorrect `metrics.ttfb` on TTS OpenTelemetry spans for streaming services. (PR #4467)
- Extended the `cancel_on_interruption=False` regression fix to `InworldRealtimeLLMService`. Uses the same approach as in #4441 (the service detects async-tool messages in the LLM context and routes the final result to its formal tool-result channel). Note: as of this writing, Inworld Realtime doesn't appear to handle the resulting delayed tool result reliably; the routing is best-effort and the service surfaces a one-time warning when async-tool messages are seen. Streamed intermediate results (`FunctionCallResultProperties(is_final=False)`) are still not supported on this realtime service. (Inworld was excluded from #4447 pending resolution of an unrelated tool-calling issue, which turned out to be an account-level matter.) (PR #4474)
- Fixed Cartesia TTS Korean word timestamps to use normal spacing rules, preserving word boundaries and per-word timestamp alignment during downstream aggregation. (PR #4475)
- Fixed Cartesia TTS Chinese and Japanese timestamp grouping to preserve provider text spacing, avoiding artificial spaces when timestamp groups are reassembled downstream. (PR #4475)
- Fixed `SonioxSTTService` final transcription frames missing detected language metadata when Soniox returns token-level language annotations. (PR #4482)
- Fixed Soniox final transcription language detection to use the most common recognized token language, avoiding mislabeling an utterance when the last token is tagged with a different language. (PR #4495)
- Fixed dropped audio in streaming TTS services whose wire protocol doesn't echo `context_id` back on incoming audio (Sarvam, Smallest, Soniox, Inworld, and others). Previously, audio that arrived between contexts or at the very start of a turn was tagged with `context_id=None` and silently dropped with an "unable to append audio to context: no context ID provided" debug log. `TTSService.get_active_audio_context_id()` now falls back to the synthesis-side `_turn_context_id` when the playback cursor isn't set yet. (PR #4497)
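The ElevenLabs alignment-chunk fix above can be illustrated with a toy rejoin function (hypothetical name, simplified data shape): stripping the leading space only on the first chunk of an utterance preserves the real inter-word spaces that Flash models place at later chunk boundaries:

```python
def merge_alignment_chunks(chunks: list) -> str:
    """Rejoin streamed alignment chunks into the spoken text.

    Only the first chunk's leading space is stripped; stripping every
    chunk (the old behavior) merged words across chunk boundaries.
    """
    out = []
    for i, chunk in enumerate(chunks):
        out.append(chunk.lstrip(" ") if i == 0 else chunk)
    return "".join(out)
```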
Security
- Fixed a path traversal issue in the development runner's `/files/{filename:path}` download endpoint. Previously, when the runner was started with `--folder`, a request like `/files/..%2F..%2Fetc%2Fpasswd` could escape the configured folder because `%2F`-encoded separators bypassed Starlette's path normalization. The endpoint now resolves the joined path and rejects any filename that escapes the allowed base with a 403, and also returns 404 (instead of an implicit `null` 200) when `--folder` is unset. (PR #4417)
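The path-containment check described above can be sketched with `pathlib`; the function name and error mapping (`PermissionError` standing in for the HTTP 403 response) are illustrative assumptions:

```python
from pathlib import Path


def resolve_download_path(base_folder: str, filename: str) -> Path:
    """Resolve a requested file inside base_folder, rejecting escapes.

    Resolving the joined path collapses '..' segments, including ones that
    arrived as %2F-encoded separators and survived URL normalization.
    """
    base = Path(base_folder).resolve()
    candidate = (base / filename).resolve()
    if not candidate.is_relative_to(base):
        # Maps to an HTTP 403 in the actual endpoint.
        raise PermissionError(f"{filename!r} escapes {base}")
    return candidate
```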