Added
-
Added
ResembleAITTSServicefor text-to-speech using Resemble AI's streaming WebSocket API with word-level timestamps and jitter buffering for smooth audio playback.
(PR #3134) -
Added
UserBotLatencyObserverfor tracking user-to-bot response latency. When tracing is enabled, latency measurements are automatically recorded asturn.user_bot_latency_secondsattributes on OpenTelemetry turn spans.
(PR #3355) -
Added
append_to_contextparameter toTTSSpeakFramefor conditional LLM context addition.- Allows fine-grained control over whether text should be added to conversation context
- Defaults to
Trueto maintain backward compatibility
(PR #3584)
-
Added TTS context tracking system with
context_idfield to trace audio generation through the pipeline.TTSAudioRawFrame,TTSStartedFrame,TTSStoppedFramenow includecontext_idAggregatedTextFrameandTTSTextFramenow includecontext_id- Enables tracking which TTS request generated specific audio chunks
(PR #3584)
-
Added support for Inworld TTS Websocket Auto Mode for improved latency
(PR #3593) -
Added new frames for context summarization:
LLMContextSummaryRequestFrameandLLMContextSummaryResultFrame.
(PR #3621) -
Added context summarization feature to automatically compress conversation history when conversation length limits (by token or message count) are reached, enabling efficient long-running conversations.
- Configure via
enable_context_summarization=TrueinLLMAssistantAggregatorParams - Customize behavior with
LLMContextSummarizationConfig(max tokens, thresholds, etc.) - Automatically preserves incomplete function call sequences during summarization
- See new examples:
examples/foundational/54-context-summarization-openai.pyand
examples/foundational/54a-context-summarization-google.py
(PR #3621)
- Configure via
-
Added RTVI function call lifecycle events (
llm-function-call-started,llm-function-call-in-progress,llm-function-call-stopped) with configurable security levels viaRTVIObserverParams.function_call_report_level. Supports per-function control over what information is exposed (DISABLED,NONE,NAME, orFULL).
(PR #3630) -
Added
RequestMetadataFrameand metadata handling forServiceSwitcherto ensure STT services correctly emitSTTMetadataFramewhen switching between services. Only the active service's metadata is propagated downstream, switching services triggers the newly active service to re-emit its metadata, and proper frame ordering is maintained at startup.
(PR #3637) -
Added
STTMetadataFrameto broadcast STT service latency information at pipeline start.- STT services broadcast P99 time-to-final-segment (
ttfs_p99_latency) to downstream processors - Turn stop strategies automatically configure their STT timeout from this metadata
- Developers can override
ttfs_p99_latencyvia constructor argument for custom deployments - Added measured P99 values for STT providers.
- See stt-benchmark to measure latency for your configuration
(PR #3637)
- STT services broadcast P99 time-to-final-segment (
-
Added support for
is_sandboxparameter inLiveAvatarNewSessionRequestto enable sandbox mode for HeyGen LiveAvatar sessions.
(PR #3653) -
Added support for
video_settingsparameter inLiveAvatarNewSessionRequestto configure video encoding (H264/VP8) and quality levels.
(PR #3653) -
Added
OpenAIRealtimeSTTServicefor real-time streaming speech-to-text using OpenAI's Realtime API WebSocket transcription sessions. Supports local VAD and server-side VAD modes, noise reduction, and automatic reconnection.
(PR #3656) -
Added
bulbul:v3-betaTTS model support for Sarvam AI with temperature control and 25 new speaker voices.
(PR #3671) -
Added
saaras:v3STT model support for Sarvam AI with newmodeparameter (transcribe, translate, verbatim, translit, codemix) and prompt support.
(PR #3671) -
Added new OpenAI TTS voice options
marinandcedar.
(PR #3682) -
Added
UserMuteStartedFrameandUserMuteStoppedFramesystem frames, and correspondinguser-mute-started/user-mute-stoppedRTVI messages, so clients can observe when mute strategies activate or deactivate.
(PR #3687)
Changed
-
Updated all 30+ TTS service implementations to support context tracking with
context_id.- Services now generate and propagate context IDs through TTS frames
- Enables end-to-end tracing of TTS requests through the pipeline
(PR #3584)
-
⚠️
TTSService.run_tts()now requires acontext_idparameter for context tracking.- Custom TTS service implementations must update their
run_tts()signature - Before:
async def run_tts(self, text: str) -> AsyncGenerator[Frame, None]: - After:
async def run_tts(self, text: str, context_id: str) -> AsyncGenerator[Frame, None]:
(PR #3584)
- Custom TTS service implementations must update their
-
Simplified context aggregators to use
frame.append_to_contextflag instead of tracking internal state.- Cleaner logic in
LLMResponseAggregatorandLLMResponseUniversalAggregator - More consistent behavior across aggregator implementations
(PR #3584)
- Cleaner logic in
-
Updated timestamps to be cumulative within an agent turn, using flushCompleted message as an indication of when timestamps from the server are reset to 0
(PR #3593) -
Changed
KokoroTTSServiceto usekokoro-onnxinstead ofkokoroas the underlying TTS engine.
(PR #3612) -
Improved user turn stop timing in
TranscriptionUserTurnStopStrategyandTurnAnalyzerUserTurnStopStrategy.- Timeout now starts on
VADUserStoppedSpeakingFramefor tighter, more predictable timing - Added support for finalized transcripts (
TranscriptionFrame.finalized=True) to trigger earlier - Added fallback timeout for edge cases where transcripts arrive without VAD events
- Removed
InterimTranscriptionFramehandling (no longer affects timing)
(PR #3637)
- Timeout now starts on
-
Improved the accuracy of the
UserBotLatencyObserverandUserBotLatencyLogObserverby measuring from the time when the user actually starts speaking.
(PR #3637) -
⚠️ Renamed
timeoutparameter touser_speech_timeoutinTranscriptionUserTurnStopStrategy.
(PR #3637) -
Updated the
VADUserStartedSpeakingFrameto includestart_secsandtimestampandVADUserStoppedSpeakingFrameto includestop_secsandtimestamp, removing the need to separately handle theSpeechControlParamsFramefor VADParams values.
(PR #3637) -
⚠️ Renamed
TranscriptionUserTurnStopStrategytoSpeechTimeoutUserTurnStopStrategy. The old name is deprecated and will be removed in a future release.
(PR #3637) -
AssemblyAISTTServicenow automatically configures optimal settings for manual turn detection whenvad_force_turn_endpoint=True. This setsend_of_turn_confidence_threshold=1.0andmax_turn_silence=2000by default, which disables model-based turn detection and reduces latency by relying on external VAD for turn endpoints. Warnings are logged if conflicting settings are detected.
(PR #3644) -
Upgraded the
pipecat-ai-small-webrtc-prebuiltpackage to v2.1.0.
(PR #3652) -
Changed default session mode from "CUSTOM" to "LITE" in HeyGen LiveAvatar integration, with VP8 as the default video encoding.
(PR #3653) -
⚠️ The default
VADParamsstop_secsdefault is changing from0.8seconds to0.2seconds. This change both simplifies the developer experience and improves the performance of STT services. With a shorterstop_secsvalue, STT services using a local VAD can finalize sooner, resulting in faster transcription.SpeechTimeoutUserTurnStopStrategy: control how long to wait for additional user speech usinguser_speech_timeout(default: 0.6 sec).TurnAnalyzerUserTurnStopStrategy: the turn analyzer automatically adjusts the user wait time based on the audio input.
(PR #3659)
-
Moved interruption wait event from per-processor instance state to
InterruptionFrameitself. AddedInterruptionFrame.complete()to signal when the interruption has fully traversed the pipeline. Custom processors that block or consume anInterruptionFramebefore it reaches the pipeline sink must callframe.complete()to avoid stallingpush_interruption_task_frame_and_wait(). A warning is logged if completion does not happen within 2 seconds.
(PR #3660) -
Update the default model to
scribe_v2forElevenLabsSTTService.
(PR #3664) -
Changed the
DeepgramSTTServicedefault setting forsmart_formattoFalse, as agents don't need smart formatting. Disabling this setting provides a small performance improvement, as well.
(PR #3666) -
Changed
FunctionCallCancelFrameto broadcast in both directions for consistency with other function call frames.
(PR #3672) -
Changed default user turn stop strategy from
TranscriptionUserTurnStopStrategytoTurnAnalyzerUserTurnStopStrategywithLocalSmartTurnAnalyzerV3.
(PR #3689) -
Renamed
RequestMetadataFrametoServiceSwitcherRequestMetadataFrameand added aservicefield to target a specific service. The frame is now pushed downstream by services after handling instead of being silently consumed.
(PR #3692) -
Update
SonioxSTTServiceto setvad_force_turn_endpointtoTrue. This setting disabled the turn detection logic available natively in Soniox. Instead, Soniox relies on a local VAD to finalize the transcript. This configuration meaningfully reduces the time to final segment for Soniox. With this setting enabled, Soniox outputs a transcript in ~250ms (median). Pipecat enables smart-turn detection by default using theLocalSmartTurnAnalyzerV3. To use the native turn detection logic in Soniox, just setvad_force_turn_endpointtoFalse.
(PR #3697) -
Update
SonioxSTTServicedefault model tostt-rt-v4.
(PR #3697) -
Updated the default model to
async_flash_v1.0and base URL tohttps://api.async.comforAsyncAITTSService.
(PR #3701)
Deprecated
-
Deprecated
UserBotLatencyLogObserver. UseUserBotLatencyObserverdirectly with itson_latency_measuredevent handler instead.
(PR #3355) -
Deprecated
RTVILLMFunctionCallMessage,RTVILLMFunctionCallMessageData, andRTVIProcessor.handle_function_call(). Use the newllm-function-call-in-progressevent sent automatically byRTVIObserverinstead.
(PR #3630)
Removed
- ⚠️ Removed
timeoutparameter fromTurnAnalyzerUserTurnStopStrategy. The timeout is now managed internally based on STT latency.
(PR #3637)
Fixed
-
Fixed pipeline freeze when
InterruptionFramediscardsEndFrameorStopFrameby making terminal frames uninterruptible.
(PR #3542) -
Fixed OpenAI LLM stream not being closed on cancellation/exception, which could leak sockets.
(PR #3589) -
Fixed
PipelineTaskadding duplicateRTVIProcessorandRTVIObserverwhen they were already provided in the pipeline or observers list. They are now detected and skipped, with appropriate warnings and errors logged for mismatched configurations.
(PR #3610) -
Fixed function call timeout task not being cancelled when the handler completes without calling
result_callbackor is cancelled externally, which causedRuntimeWarning: coroutine was never awaited.
(PR #3616) -
Fixed sentence splitting for Japanese, Chinese, Korean, and other non-Latin languages in TTS pipeline. NLTK's sentence tokenizer does not support CJK languages, causing text to accumulate until flush instead of being split at sentence boundaries. Added fallback detection for unambiguous non-Latin sentence-ending punctuation (e.g.,
。,?,!).
(PR #3617) -
Fixed
PipelineTaskto also callset_bot_ready()when an externalRTVIProcessoris provided.
(PR #3623) -
Fixed
VADControllernot broadcastingSpeechControlParamsFrameon startup, which prevented STT services from receiving VAD params needed for TTFB measurement.
(PR #3628) -
Fixed
StopAsyncIterationexceptions inparse_telephony_websocket()when WebSocket connections close before sending expected messages.
(PR #3629) -
Fixed WebSocket transport error when broadcasting
InputTransportMessageFrameby correctly instantiating the frame with its message parameter.
(PR #3635) -
Fixed orphan OpenTelemetry spans during flow initialization and transitions in tracing.
(PR #3649) -
Fixed
SambaNovaLLMServiceandGoogleLLMOpenAIBetaServicestreams not being closed on cancellation/exception, which could leak sockets.
(PR #3663) -
Fixed an issue in
InworldTTSServicewhere punctuation was pronounced. Now, theInworldTTSServiceensures proper spacing between sentences, resolving pronunciation issues.
(PR #3667) -
Fixed
ParallelPipelineallowing frames pushed by internal processors to escape during lifecycle frame (StartFrame/EndFrame/CancelFrame) synchronization. These frames are now buffered and flushed after all branches complete.
(PR #3668) -
Fixed issues in Sarvam STT and TTS services: missing event handler registration for VAD signals,
Optional[bool]type annotations, WebSocket state cleanup on API errors, and TTS disconnect/reconnection state management.
(PR #3671) -
Fixed
RTVIObserversending duplicate client messages for frames that are broadcast in both directions (e.g.UserStartedSpeakingFrame,FunctionCallResultFrame).
(PR #3672) -
Fixed WebSocket STT services (ElevenLabs, Cartesia, Gladia, Soniox) disconnecting due to idle timeout when no audio is being sent (e.g. when inactive behind a
ServiceSwitcher).WebsocketSTTServicenow provides opt-in silence-based keepalive viakeepalive_timeoutandkeepalive_intervalparameters.
(PR #3675)