Added
-
Added on_user_turn_message_added event handler on LLMUserAggregator, with a new UserTurnMessageAddedMessage arg type. It fires when the user aggregator writes a message to the LLM context, carrying the finalized turn text. In cascade mode it coincides with on_user_turn_stopped; in realtime mode (when realtime_service_mode=True on the aggregator pair) it's the canonical way to subscribe to "context just updated, here's the user text" (since the on_user_turn_stopped event fires before the message is finalized, with UserTurnStoppedMessage.content=None). Note that there's been no change to on_assistant_turn_stopped.
(PR #4533) -
Added RealtimeServiceMetadataFrame, broadcast at pipeline start by realtime LLM services (OpenAI Realtime, Azure Realtime, Inworld, Grok/xAI Realtime, Gemini Live, AWS Nova Sonic, Ultravox). This frame can be used by other processors in the pipeline to configure themselves accordingly. Today, it only advertises two things: that a realtime service is present in the pipeline (indicated by the fact that the frame is sent at all), and emits_user_turn_frames, which says whether the realtime service can emit its own UserStartedSpeakingFrame and UserStoppedSpeakingFrame frames (suggesting local VAD/turn detection may not be needed in the pipeline).
(PR #4533) -
Added to our examples "locally-driven-turns" variants for:
- OpenAI Realtime (realtime-openai-locally-driven-turns.py)
- Grok Realtime (realtime-grok-locally-driven-turns.py)
- Inworld Realtime (realtime-inworld-locally-driven-turns.py)
These join realtime-gemini-live-locally-driven-turns.py in showing how to configure each realtime service so that its turn-taking is dictated by local turn detection (e.g. VAD + smart turn analyzer).
(PR #4533)
-
Added a startup WARNING log on realtime LLM services that don't emit UserStartedSpeakingFrame/UserStoppedSpeakingFrame (Gemini Live, AWS Nova Sonic, Ultravox). The log is meant to draw attention to a couple of things:
- That other processors in the pipeline (e.g. RTVI) may expect turn frames, and that the developer can enable local VAD/turn detection to supply them, and, relatedly
- That when using local turn detection, local turns may NOT perfectly align with the "ground truth" of server-decided turns, so they should be thought of as APPROXIMATE (unless local turn detection is driving the realtime service's turns, in which case there's no separate server-decided ground truth)
(The warning also serves as a little nudge to the realtime service providers: providing a "ground truth" signal of when the provider thinks the user has started or stopped speaking is very helpful to app developers!)
(PR #4533) -
Added a realtime_service_mode: bool kwarg on LLMContextAggregatorPair, for opting into a set of behaviors tailored for use with realtime (speech-to-speech) services. Setting realtime_service_mode=True does three things:
1. Decouples context writes from the UserStoppedSpeakingFrame signal. Instead, the assistant response start triggers the user message writes. This ensures that context is written properly even when the realtime service provides no turn frames and local turn detection (i.e. local VAD) is disabled. This mechanism also enables the next point.
2. Lets UserStoppedSpeakingFrame fire without waiting for transcripts. When local turn detection is configured to drive realtime service conversations, UserStoppedSpeakingFrame is the signal that triggers assistant responses. By letting this frame fire earlier, we reduce latency.
3. Replaces the default turn strategies with ExternalUserTurnStartStrategy and ExternalUserTurnStopStrategy when the realtime service advertises that it emits its own turn frames. Various realtime services (OpenAI Realtime, Azure, Grok, Inworld) emit their own turn frames; in that case the External strategies fire on_user_turn_started/on_user_turn_stopped from the server-emitted UserStartedSpeakingFrame/UserStoppedSpeakingFrame. For realtime services that don't emit those frames, either because they never do (Gemini Live, Nova Sonic, Ultravox) or because server-side turn detection has been disabled at runtime (e.g. OpenAI Realtime with turn_detection=False, in locally-driven-turns setups), the defaults stay in place so locally-driven turn detection (e.g. local VAD) can fire the events. Passing custom user_turn_strategies opts out of the swap.
Note that when realtime_service_mode=True, you should listen for the new on_user_turn_message_added event to get the newly-added user message rather than on_user_turn_stopped, which no longer carries it.
(PR #4533) -
Added private_endpoint parameter to AzureTTSService and AzureHttpTTSService for connecting via Private Link or custom domain endpoints, matching existing AzureSTTService support.
(PR #4549) -
Added will_be_spoken field to AggregatedTextFrame. Set to True by the TTS service just before synthesis, allowing downstream processors and observers to know whether TTS will speak a given text segment before audio begins.
(PR #4559) -
Added AggregatedTextProgressFrame, a new frame emitted alongside each TTSTextFrame during word-timestamp playback. It carries accumulated_text (text already spoken) and remaining_text (text not yet spoken) for the active segment, enabling downstream consumers such as the RTVI observer to do word-level highlighting without coupling to internal sequencer state.
(PR #4559) -
Added AICQuailVADAnalyzer (pipecat.audio.vad.aic_quail_vad), a noise-robust Voice Activity Detection analyzer powered by the standalone Quail VAD 2.0 model from the ai-coustics SDK (aic-sdk~=2.3.0). It owns its own Processor and works independently of AICFilter, so it can sit before or after enhancement in the pipeline. Defaults to the published quail-vad-2.0-xxs-16khz model; supply model_id/model_path to override.
(PR #4588) -
Added continuous_partials and interruption_delay connection parameters to the AssemblyAI streaming STT service (u3-rt-pro only). continuous_partials defaults to True so voice agents receive interim transcripts at a steady cadence during long turns; interruption_delay (0–1000 ms) overrides how soon the first partial is emitted. Both are exposed via AssemblyAISTTService.Settings and are omitted for non-u3-rt-pro models.
(PR #4593) -
Added a user_audio_preroll_secs parameter to GeminiLiveLLMService controlling how much "pre-roll" audio is replayed (sent to Gemini Live) when the user turn start is confirmed, in locally-driven-turns mode (server-side VAD disabled). Defaults to None, auto-sizing the pre-roll duration from the upstream VAD's start_secs (which assumes VAD drives turn starts); set it explicitly when using a non-VAD turn-start strategy.
(PR #4597) -
Added a user_audio_preroll_secs parameter to OpenAIRealtimeLLMService controlling how much "pre-roll" audio is replayed (re-appended to the input audio buffer) when the user turn start is confirmed, in locally-driven-turns mode (server-side turn detection disabled). Defaults to None, auto-sizing the pre-roll duration from the upstream VAD's start_secs (which assumes VAD drives turn starts); set it explicitly when using a non-VAD turn-start strategy.
(PR #4599) -
Added word-level timestamp support to SmallestTTSService. Enabled by default via the word_timestamps constructor argument, it emits per-word TTSTextFrame frames aligned to audio playback so downstream consumers (captions, lip-sync, RTVI) receive word timing. Timestamps from each TTS request are offset onto the turn's continuous playback timeline, so multi-sentence turns stay correctly ordered. Available on Smallest's word-timestamp-capable voices; other voices simply emit no word events, so leaving it on is safe. Pass word_timestamps=False to fall back to whole-text frames.
(PR #4612) -
Added a profanity setting to AzureSTTService (via settings=AzureSTTService.Settings(profanity=...)) controlling how Azure handles profanity in transcripts. Accepts "raw" (no masking), "masked" (Azure default, replaces profane words with ****), or "removed" (drops profane words). Defaults to None (keeps the Azure SDK default of "masked"). Use "raw" for non-English deployments where Azure's profanity list over-eagerly masks ordinary words. The setting is runtime-updatable and triggers a reconnect when changed.
(PR #4620) -
WhatsApp connection_callback now receives the full call metadata (WhatsAppConnectCall) as a second argument, available in bot code via runner_args.body. This gives bots access to the caller's phone number, call ID, direction, and timestamp without any extra API calls.
(PR #4622) -
Added the pipecat create project-scaffolding CLI to pipecat-ai, available via the optional cli extra. Install it with uv tool install "pipecat-ai[cli]" (add --with pipecatcloud to enable pipecat cloud), then run pipecat create to scaffold a new bot project. The CLI dependencies are optional, so they are not pulled into a plain pip install pipecat-ai.
(PR #4631) -
pipecat create takes an optional target directory: pass a path (for example pipecat create .) to scaffold directly into that directory (the same convention as npm create vite@latest .), or omit it to nest the project under a <project-name>/ subfolder. The project name defaults to the target directory's basename, and --name overrides it.
(PR #4631) -
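The target-directory rule for pipecat create can be sketched in a few lines. This is only an illustration of the behavior described in the entry, not the CLI's actual code: the function name resolve_scaffold and its arguments are hypothetical, and requiring a name when no target is given is an assumption (the real CLI may prompt for one instead).

```python
import posixpath
from typing import Optional, Tuple


def resolve_scaffold(
    cwd: str, target: Optional[str] = None, name: Optional[str] = None
) -> Tuple[str, str]:
    """Return (project_name, directory) for a scaffold request.

    Hypothetical helper: it models the changelog's description only.
    """
    if target is not None:
        # A target path means "scaffold directly into that directory".
        directory = posixpath.normpath(posixpath.join(cwd, target))
        # The name defaults to the directory's basename unless overridden.
        return (name or posixpath.basename(directory)), directory
    if not name:
        # Assumption: without a target, a name must come from somewhere.
        raise ValueError("a project name is required when no target is given")
    # No target: nest the project under a <project-name>/ subfolder.
    return name, posixpath.join(cwd, name)


if __name__ == "__main__":
    print(resolve_scaffold("/work/my-bot", "."))
    print(resolve_scaffold("/work", None, "demo"))
```

POSIX-style paths are used here so that the sketch behaves the same on every platform.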
Added websocket to the development runner's -t/--transport choices, so you can now run python bot.py -t websocket to restrict the server to the plain WebSocket transport (served at /ws-client). The startup banner prints a websocket-specific message with the prebuilt UI and ws(s)://host:port/ws-client endpoint.
(PR #4636) -
Direct functions advertised in an LLMContext are now registered automatically, with no separate registration call. List a direct function in LLMContext(tools=[...]), or push an LLMSetToolsFrame to change tools mid-session, and its handler is registered. The advertised tool set is the single source of truth: dropping a direct function unregisters its handler too. Also applies across LLMSwitcher member LLMs.
(PR #4654) -
LLMContext(tools=...) and LLMSetToolsFrame now accept a plain list of direct functions and/or FunctionSchema objects, not just a ToolsSchema.
(PR #4654) -
Added an optional @tool_options(cancel_on_interruption=..., timeout_secs=...) decorator for overriding a direct function's call options; defaults apply otherwise.
(PR #4654) -
Added PipelineFlushFrame, a control frame for draining the pipeline. Push it downstream and the pipeline worker bounces it back upstream so it round-trips through every processor, then sets its event. Await that event to know all in-flight frames queued ahead of the probe have been processed (e.g. to let the pipeline settle after an interruption before injecting a new frame). It's an UninterruptibleFrame, so the probe survives an InterruptionFrame and still completes its round-trip.
(PR #4655) -
Added pipecat.evals, a behavioral eval framework for Pipecat bots, usable both as a library and from the CLI. A YAML scenario describes a scripted conversation and the semantic events expected back from the bot (transcriptions, LLM/TTS responses, function calls) with optional latency budgets and natural-language criteria judged by an LLM, in text or audio mode (audio synthesizes the user's speech and transcribes the bot's actual audio). In code, EvalScenario.load() parses a scenario and EvalSession.from_scenario(...).run() runs it against a bot, returning a structured EvalResult (with EvalManifest.load() and EvalSuite.run() for the multi-bot path). The new pipecat eval run (against an already-running bot) and pipecat eval suite (a manifest mapping bots to the scenarios they run) commands wrap the same library and are also reachable as python -m pipecat.evals. Bots opt in by exposing the -t eval transport.
(PR #4655) -
Added a bot-interrupted RTVI server message, emitted when the bot's in-flight output is cut off (a VAD-detected user barge-in or a programmatic interrupt), so clients can drop whatever the bot was mid-saying.
(PR #4655) -
Added an opt-in --eval flag to pipecat create (and an Enable evals? wizard prompt, off by default) that makes the generated bot eval-ready without any manual edit:
- an "eval" entry in the bot's transport_params, so the bot is runnable with -t eval. The entry mirrors the bot's audio/video settings and is inert unless the bot is run with -t eval.
- runnable starter scenarios in server/evals/ that pass against the freshly scaffolded bot and double as schema references to copy when adding more: starter_text.yaml (text mode, the fast inner loop; cascade bots only) and starter_audio.yaml (the full audio round trip, the only mode for realtime speech-to-speech bots).
- the dependencies to run them from the project's own environment: the cli extra (the pipecat eval command) plus kokoro and moonshine (the harness's local speech stack), so audio-mode evals run with no extra setup and no API keys.
(PR #4664)
-
Added filter_repeated_sequences parameter to MarkdownTextFilter.InputParams to allow disabling repeated sequence removal.
(PR #4674) -
Added support for Belgian German in transcription languages.
(PR #4682) -
Added MoonshineSTTService, a local speech-to-text service backed by Moonshine. It runs a small, fast ASR model on the CPU via ONNX Runtime, so it needs no GPU and no API key (the model downloads once on first use and is cached). Install with pip install "pipecat-ai[moonshine]" and choose the model via MoonshineSTTService.Settings(model=...) (a Model enum member or string): Model.TINY, Model.BASE, or a streaming model run in batch (Model.TINY_STREAMING, Model.SMALL_STREAMING (default), Model.MEDIUM_STREAMING). See examples/voice/voice-moonshine.py.
(PR #4683) -
New features for the Vonage WebRTC transport
- Captions support
- Individual audio stream subscription support
- Updated to Vonage Video Connector library v1.0.2
(PR #4686)
-
FunctionSchema now accepts an optional handler. When set, the LLM service registers it automatically wherever the schema is advertised in an LLMContext (or via an LLMSetToolsFrame), so no separate register_function call is needed. This extends the existing auto-registration of direct functions to FunctionSchema-based tools: the advertised tool set stays the single source of truth, so dropping a handler-carrying schema unregisters its handler too. A FunctionSchema without a handler stays advertise-only. Decorate the handler with @tool_options to override its default call options (cancel_on_interruption, timeout_secs), the same decorator direct functions use.
(PR #4709) -
Added pipecat init, which makes a project agent-ready by writing a Pipecat coding-agent guide (AGENTS.md plus a CLAUDE.md that imports it) and developer guidance (GETTING_STARTED.md: MCP setup, how to write a good first prompt with a copyable example, what to expect from the session) into the project, so an AI coding assistant picks up Pipecat conventions automatically and then scaffolds the app with pipecat create. Run pipecat init (prompts for a directory), pipecat init my-bot, or pipecat init .; re-running refreshes AGENTS.md while preserving an existing CLAUDE.md (pass --force to overwrite it). The written AGENTS.md ends with a provenance footer naming the pipecat-ai version that wrote it, so a stale guide is detectable and refreshable.
(PR #4710) -
Added context carryover support to AssemblyAISTTService for Universal-3 Pro streaming (u3-rt-pro). A new agent_context setting seeds the agent's most recent reply at connect time, and AssemblyAISTTService.update_agent_context() updates it mid-session via an UpdateConfiguration message (no reconnect). Giving the model the agent's last reply improves transcription of the user's next turn: short answers, spelled-out entities, and similar-sounding words. A previous_context_n_turns setting controls how many prior entries are carried forward (set to 0 to disable carryover entirely). U3 Pro features are recognized for the whole u3-rt-pro family, including the u3-rt-pro-beta-1 variant. -
Added universal-3-5-pro as a supported AssemblyAISTTService model. It is recognized as part of the Universal-3 Pro family, so every u3-rt-pro feature (built-in turn detection, prompting, continuous partials, interruption_delay, context carryover, and voice focus) applies to it as well. -
Added voice_focus and voice_focus_threshold settings to AssemblyAISTTService (Universal-3 Pro models). Set voice_focus to "near-field" or "far-field" to isolate the primary voice and suppress background noise; voice_focus_threshold (0.0–1.0) tunes how aggressively background audio is suppressed.
(PR #4712) -
Added the "Add a WebRTC transport for local testing?" option to the Daily PSTN and Twilio + Daily SIP scenarios in
pipecat init, so the generated bots can also be run locally with the SmallWebRTC or Daily client.
(PR #4715) -
Realtime and speech-to-speech LLM services that take tools at construction now accept a plain list of standard tools (direct functions and/or FunctionSchema objects), not just a ToolsSchema, matching LLMContext(tools=...). Applies to GeminiLiveLLMService/GeminiLiveVertexLLMService (tools=), AWSNovaSonicLLMService (tools=), UltravoxRealtimeLLMService (one_shot_selected_tools=), and session_properties.tools on the OpenAIRealtimeLLMService/AzureRealtimeLLMService/GrokRealtimeLLMService/InworldRealtimeLLMService.
(PR #4758) -
Added STTService.process_assistant_turn(text) hook that subclasses can override to feed the completed bot reply to a provider-side context carryover API. The base implementation is a no-op; STTService now handles LLMContextAssistantTurnFrame and calls this method automatically.
(PR #4759) -
Added LLMContextAssistantTurnFrame, broadcast by LLMAssistantAggregator when a bot turn completes, carrying the aggregated reply text and start timestamp.
(PR #4759) -
Added endpoint_sensitivity to SonioxSTTService.Settings, a float in [-1.0, 1.0] that controls how aggressively Soniox emits speech endpoints. Higher values finalize turns sooner; lower values delay them. Introduced in the Soniox v5 model; earlier models reject it.
(PR #4772) -
DailyTransport can now publish a screenAudio output track, mirroring screenVideo. Add "screenAudio" to DailyParams.audio_out_destinations (and optionally configure it via custom_audio_track_params["screenAudio"]), then write audio frames with transport_destination="screenAudio". Requires daily-python>=0.29.0.
(PR #4775) -
Added an evals extra that bundles the pipecat eval command (the cli extra) with the harness's default local, no-API-key models: Kokoro (user-turn TTS) and Moonshine (bot-speech transcription). Install pipecat-ai[evals] so uv run pipecat eval run works out of the box. Scaffolded projects (pipecat init) that enable evals now depend on pipecat-ai[evals].
(PR #4776) -
RTVIObserver can now emit raw VAD user speaking events (vad-user-started-speaking/vad-user-stopped-speaking), driven directly by the VAD signal and independent of turn finalization (unlike user-started-speaking/user-stopped-speaking, which a turn strategy may gate or defer). Enable with RTVIObserverParams(vad_user_speaking_enabled=True) (off by default), or at runtime via RTVIConfigureObserverFrame.
(PR #4785)
Changed
-
Migrated all realtime LLM service examples (OpenAI Realtime, Azure Realtime, Inworld, Grok/xAI Realtime, Gemini Live, Gemini Live Vertex, AWS Nova Sonic, Ultravox) to use LLMContextAggregatorPair(..., realtime_service_mode=True). Where examples previously wired SileroVADAnalyzer into LLMUserAggregatorParams as a workaround for missing turn frames, the local VAD has been removed; LLMContextAggregatorPair's realtime_service_mode makes this safe in terms of context-writing. Transcript-logging user-side event handlers have moved from on_user_turn_stopped to the new on_user_turn_message_added event, which carries the finalized message text (the turn-stopped event fires before the message is finalized in realtime service mode). Examples for services without server-side user-turn frames (Gemini Live, AWS Nova Sonic, Ultravox) include a comment block explaining how to add local VAD if needed. Each base example now also subscribes to on_user_turn_stopped, which is active for services that emit server-side user-turn frames (OpenAI Realtime, Azure Realtime, Grok, Inworld) and commented-out for those that don't (with the same opt-in path as the local-VAD block).
(PR #4533) -
UserTurnStoppedMessage.content is now typed str | None. In realtime mode (realtime_service_mode=True on LLMContextAggregatorPair) the user message isn't finalized at turn-stop time, so content is None; subscribers wanting the finalized text should use the new on_user_turn_message_added event. Behavior in cascade (STT -> LLM -> TTS) pipelines is unchanged.
(PR #4533) -
SpeechTimeoutUserTurnStopStrategy, TurnAnalyzerUserTurnStopStrategy, and ExternalUserTurnStopStrategy now accept a wait_for_transcript: bool = True kwarg. When set to False, the strategy signals end-of-turn as soon as its requirements are met, without waiting for transcripts. This is useful when you intend to configure local turn detection to drive realtime service conversations, where waiting for transcripts adds unnecessary latency. LLMContextAggregatorPair sets this for you when realtime_service_mode=True.
(PR #4533) -
Updated Smallest AI TTS plugin for Waves v4.0.0 API:
- New WebSocket endpoint /waves/v1/tts/live (previously /waves/v1/{model}/get_speech/stream)
- Model is now sent in each message payload instead of the URL, eliminating reconnection on model change
- Updated model names: lightning_v3.1 and lightning_v3.1_pro (underscore convention)
- Added output_format setting supporting pcm, mp3, wav, ulaw, alaw
- Default model changed to lightning_v3.1_pro (with meher as its default voice)
- Breaking: SmallestTTSModel.LIGHTNING_V2 removed; consistency, similarity, enhancement settings removed
(PR #4535)
-
⚠️ RTVI protocol version bumped to 2.0.0. The bot-output message now includes will_be_spoken, spoken_status ("new"/"in-progress"/"completed"), spoken_progress (accumulated/remaining text), and segment_id fields. Clients on any 1.x protocol are still served with the legacy format; all other pre-2.x clients are rejected.
(PR #4559) -
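The version gating described in this entry can be pictured with a small sketch. It is a hypothetical helper, not the RTVI implementation: the function name and the returned labels are invented for illustration, and the entry does not say how versions above 2.x are treated, so the sketch refuses to guess.

```python
def select_bot_output_format(client_version: str) -> str:
    """Map a client's protocol version string to how bot-output is served.

    Hypothetical helper modelling the changelog text only.
    """
    major = int(client_version.split(".", 1)[0])
    if major == 2:
        # 2.x clients get the new fields (will_be_spoken, spoken_status, ...).
        return "v2"
    if major == 1:
        # Any 1.x client is still served the legacy format.
        return "legacy"
    if major < 1:
        # All other pre-2.x clients are rejected.
        return "rejected"
    # Not covered by the changelog entry; deliberately left undecided.
    raise ValueError(f"behavior for protocol {client_version} is not specified")


if __name__ == "__main__":
    for version in ("0.3.0", "1.2.0", "2.0.0"):
        print(version, "->", select_bot_output_format(version))
```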
bot_output_transforms now supports a 4-parameter progress-aware signature: (text, agg_type, accumulated_text, remaining_text) -> BotOutputTransformResult. When called for a progress event, accumulated_text and remaining_text are populated and the transform must return a BotOutputTransformResult with those fields set, enabling word-level transforms on the client side.
(PR #4559) -
Updated aic-sdk dependency to ~=2.3.0. The AIC_SDK_LICENSE environment variable replaces the previous AIC_LICENSE_KEY so the variable matches the SDK's canonical name; users must update their .env files.
(PR #4588) -
Aligned the deprecation docstrings in LLMUserAggregatorParams with the project's documented convention by removing redundant inline [DEPRECATED] tags, keeping only the .. deprecated:: Sphinx directive.
(PR #4592) -
AzureSTTService now marks final transcripts as finalized. Azure's RecognizedSpeech event is by definition the final recognition for an utterance, so the emitted TranscriptionFrame carries finalized=True. This lets downstream user-turn stop strategies (e.g. SpeechTimeoutUserTurnStopStrategy) take their finalized fast-path instead of waiting for VAD events that may never arrive on short replies.
(PR #4620) -
⚠️ The mem0 extra now requires mem0ai>=2,<3. Mem0MemoryService was updated for the mem0 2.0.0 breaking changes: entity IDs (user_id/agent_id/run_id) are now passed via filters= to the local client (top-level kwargs raise ValueError in mem0 2.x), and the removed version/output_format parameters are no longer sent to the cloud client. Note that mem0 2.0.0 also flips the rerank default from True to False and makes add() async server-side (stored memories are queryable once processed).
(PR #4626) -
GradiumSTTService now defaults delay_in_frames to 12 (960ms) instead of leaving it unset (which used the server default of 10/800ms). The higher default allows more context for improved transcription accuracy. Set delay_in_frames explicitly to 7-8 for faster responses.
(PR #4632) -
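The frame counts in this entry translate to milliseconds as follows. The 80 ms frame duration is not stated outright; it is inferred from the two pairs the entry gives (12 frames = 960 ms, 10 frames = 800 ms), and the helper name delay_ms is made up for this sketch.

```python
# Frame duration inferred from the entry above: 12 frames = 960 ms.
FRAME_MS = 960 // 12  # 80 ms per frame (also consistent with 10 -> 800 ms)


def delay_ms(delay_in_frames: int) -> int:
    """Convert a delay_in_frames value into milliseconds of added delay."""
    if delay_in_frames < 0:
        raise ValueError("delay_in_frames must be non-negative")
    return delay_in_frames * FRAME_MS


if __name__ == "__main__":
    # Compare the suggested faster settings with the old and new defaults.
    for frames in (7, 8, 10, 12):
        print(f"delay_in_frames={frames:>2} -> {delay_ms(frames)} ms")
```

Under that assumption, the suggested 7-8 range corresponds to roughly 560-640 ms, against 960 ms for the new default.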
GradiumSTTService has an updated ttfs_p99_latency value of 0.62 seconds.
(PR #4632) -
Bumped pipecat-ai-prebuilt to 1.0.2 in the runner extra, updating the prebuilt client UI served by the development runner.
(PR #4634) -
⚠️ Changed the default of TTSSpeakFrame.append_to_context from None to True. The old None behavior was situation-dependent and hard to reason about: the spoken text always reached the assistant aggregator's buffer, but whether it was committed to the LLM context depended on what surrounded the frame. It was committed when the frame was inside an assistant response or immediately followed by one, but silently discarded when it was standalone and followed by a user turn (the interruption cleared the buffer before anything flushed it). True is a predictable default: programmatically-spoken text is recorded in the context unless you opt out with append_to_context=False. BusTTSSpeakMessage.append_to_context now defaults to True to match.
(PR #4642) -
Switched the aws extra from aioboto3 to aiobotocore. Pipecat only uses the low-level client API, and aiobotocore is the async library that aioboto3 wraps, so depending on it directly drops an unnecessary wrapper layer. AWS service initialization now uses aiobotocore.session.get_session() and session.create_client(...); public APIs and credential resolution are unchanged.
(PR #4643) -
websockets is now a core dependency of pipecat-ai instead of the websockets-base optional extra. The websockets-base extra has been removed; service extras that used to pull it in (Cartesia, Deepgram, ElevenLabs, OpenAI, Google, and others) still work unchanged, and websockets is now always installed. If you previously installed pipecat-ai[websockets-base] directly, just drop the extra since pip install pipecat-ai now includes it.
(PR #4658) -
Renamed the @tool decorator's timeout argument to timeout_secs, matching register_function(). timeout still works as a deprecated alias and will be removed in a future version.
(PR #4671) -
WhisperSTTService's Model and MLXModel are now StrEnum, so a member is the string itself (e.g. Model.TINY == "tiny"). Passing a Model/MLXModel member or a plain string both keep working.
(PR #4684) -
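What "a member is the string itself" means can be shown with a stand-in enum. The Model class below is not the library's class: it only reuses the TINY = "tiny" pair quoted in the entry, the BASE member is illustrative, and the fallback for interpreters older than Python 3.11 is an addition for portability.

```python
try:
    from enum import StrEnum  # available from Python 3.11
except ImportError:  # older interpreters: a str-mixin Enum compares the same way
    from enum import Enum

    class StrEnum(str, Enum):
        """Minimal fallback; equality with plain strings still holds."""


class Model(StrEnum):
    """Stand-in for a StrEnum-based model enum (not the library class)."""

    TINY = "tiny"
    BASE = "base"  # illustrative member


def normalize(model) -> str:
    """Accept either a Model member or a plain string, as the entry describes."""
    return Model(model).value


if __name__ == "__main__":
    print(Model.TINY == "tiny")
    print(normalize("tiny"), normalize(Model.TINY))
```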
Bumped the daily extra's daily-python dependency to >=0.29.1,<1.
(PR #4685) -
LLMWorker now enables the worker's automatic RTVI support when it is not bridged (bridged=None), so a standalone LLMWorker driving its own transport gets the RTVIProcessor/RTVIObserver pair like any PipelineWorker. Bridged child workers keep RTVI disabled, since the transport worker owns the client-facing RTVI machinery.
(PR #4690) -
Removed the asyncio.sleep(0) workarounds that let a just-created timer task start before a possible immediate cancellation. TaskManager.create_task() now cleans up never-started coroutines centrally, so the yields served no purpose.
(PR #4692) -
Worker frames (e.g. EndWorkerFrame) should now be pushed downstream with a plain push_frame(frame), so frames queued ahead of them are flushed before the worker acts on them. Pushing them upstream still works.
(PR #4705) -
register_function now reads a handler's call options (cancel_on_interruption, timeout_secs) from its @tool_options decorator when they aren't passed explicitly, matching how direct functions resolve them (explicit argument > @tool_options > default). Previously the decorator was ignored on this path.
(PR #4709) -
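The precedence order named in this entry (explicit argument > @tool_options > default) can be sketched as a small resolver. Everything here is a stand-in: the tool_options decorator below is a toy reimplementation for illustration, resolve_call_options is a hypothetical name, and the DEFAULTS values are placeholders, not the library's real defaults.

```python
from typing import Any, Callable, Dict

# Placeholder defaults for illustration only; the real defaults may differ.
DEFAULTS: Dict[str, Any] = {"cancel_on_interruption": True, "timeout_secs": None}


def tool_options(**options: Any) -> Callable[[Callable], Callable]:
    """Toy stand-in for a decorator that attaches call options to a handler."""

    def decorate(fn: Callable) -> Callable:
        fn._tool_options = dict(options)
        return fn

    return decorate


def resolve_call_options(handler: Callable, **explicit: Any) -> Dict[str, Any]:
    """Resolve each option as: explicit argument > decorator > default."""
    decorated = getattr(handler, "_tool_options", {})
    resolved = {}
    for key, default in DEFAULTS.items():
        if key in explicit:
            resolved[key] = explicit[key]
        elif key in decorated:
            resolved[key] = decorated[key]
        else:
            resolved[key] = default
    return resolved


@tool_options(cancel_on_interruption=False, timeout_secs=30)
def get_weather(params):
    return "sunny"


def plain_handler(params):
    return "ok"


if __name__ == "__main__":
    print(resolve_call_options(get_weather))
    print(resolve_call_options(get_weather, timeout_secs=5))
    print(resolve_call_options(plain_handler))
```

Keying on whether an argument was passed at all, instead of on whether it is None, lets an explicit None still override a decorator value in this sketch.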
BaseLLMAdapter.from_standard_tools now raises UserWarning instead of DeprecationWarning when built-in tools can't be injected because the supplied tools aren't a ToolsSchema; it advises about the tools format and is not a deprecation.
(PR #4726) -
Deprecated classes and functions are now marked with the PEP 702 @deprecated decorator, so type checkers and IDEs (pyright/Pylance reportDeprecated, mypy's deprecated error code) flag and strike through deprecated usages statically. Several deprecated classes that previously emitted no runtime warning now raise DeprecationWarning when used, and deprecation messages now state a concrete removal version (e.g. 2.0.0) instead of "a future release".
(PR #4726) -
pipecat create now infers --bot-type from the chosen transports in non-interactive mode, so the flag is optional: a bot is telephony when any transport is a telephony transport (twilio, telnyx, plivo, exotel, daily_pstn, twilio_daily_sip) and web otherwise. Pass --bot-type explicitly to override (it's still validated and cross-checked against the transports); the interactive wizard is unchanged.
(PR #4735) -
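The inference rule in this entry is simple enough to sketch directly. The function name infer_bot_type is hypothetical, and the non-telephony transport names used in the demo ("daily", "smallwebrtc") are illustrative inputs; only the telephony list comes from the entry.

```python
from typing import Iterable

# Telephony transports as listed in the changelog entry.
TELEPHONY_TRANSPORTS = frozenset(
    {"twilio", "telnyx", "plivo", "exotel", "daily_pstn", "twilio_daily_sip"}
)


def infer_bot_type(transports: Iterable[str]) -> str:
    """Return "telephony" if any transport is a telephony one, else "web"."""
    if any(t in TELEPHONY_TRANSPORTS for t in transports):
        return "telephony"
    return "web"


if __name__ == "__main__":
    print(infer_bot_type(["daily"]))
    print(infer_bot_type(["smallwebrtc", "twilio"]))
```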
Realtime LLM services now auto-register the handlers bundled on the tools passed at construction time, so a separate register_function() call is no longer needed, matching how context-advertised tools (a direct function, or a FunctionSchema with its handler set) already register. Applies to GeminiLiveLLMService/GeminiLiveVertexLLMService (tools=), UltravoxRealtimeLLMService (one_shot_selected_tools=), AWSNovaSonicLLMService (tools=), and OpenAIRealtimeLLMService/AzureRealtimeLLMService/GrokRealtimeLLMService/InworldRealtimeLLMService (session_properties.tools).
(PR #4758) -
Updated SonioxSTTService default model from stt-rt-v4 to stt-rt-v5.
(PR #4772) -
The Kokoro TTS model cache moved to ~/.cache/pipecat/kokoro-onnx (previously ~/.cache/kokoro-onnx), so Pipecat's cached files live under a single namespaced directory.
(PR #4776)
Deprecated
-
Deprecated the 2-parameter bot_output_transforms signature (text, agg_type) -> str. Transforms using it will still work but emit a DeprecationWarning at registration time. Update to the 4-parameter signature (text, agg_type, accumulated_text, remaining_text) -> BotOutputTransformResult to support word-level progress transforms.
(PR #4559) -
⚠️ Deprecated AICVADAnalyzer (pipecat.audio.vad.aic_vad) and AICFilter.create_vad_analyzer(). Both are tied to AICFilter's model-internal VAD path. Use AICQuailVADAnalyzer instead; the standalone Quail VAD 2.0 model is the noise-robust VAD differentiator going forward. Both surfaces will be removed in Pipecat 1.6.0 (breaking change shipped in a minor release, per maintainer guidance for plugins).
(PR #4588) -
The single-argument connection_callback(connection) signature for WhatsAppClient.handle_webhook_request is deprecated. Update callbacks to accept (connection, call: WhatsAppConnectCall) to receive call metadata alongside the WebRTC connection. The old signature still works but emits a DeprecationWarning.
(PR #4622) -
Deprecated Mem0MemoryService.InputParams.api_version. It is no longer used: mem0 2.0.0 removed the api_version/output_format parameters from the client. Setting it now emits a DeprecationWarning.
(PR #4626) -
Deprecated passing append_to_context=None to TTSSpeakFrame (and BusTTSSpeakMessage). None is no longer a supported value: it is coerced to True with a warning and will be unsupported in a future release. Pass True or False explicitly. See the corresponding "Changed" entry for the full rationale behind the new True default.
(PR #4642) -
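The coercion described in this entry can be modelled with a small stand-in dataclass. SpeakRequest is not the library's frame class, and the choice of DeprecationWarning as the warning category is an assumption (the entry only says "a warning").

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeakRequest:
    """Stand-in for a speak frame with an append_to_context flag."""

    text: str
    append_to_context: Optional[bool] = True  # new default: record in context

    def __post_init__(self) -> None:
        if self.append_to_context is None:
            # Assumed warning category; None is coerced to the new default.
            warnings.warn(
                "append_to_context=None is deprecated; pass True or False.",
                DeprecationWarning,
            )
            self.append_to_context = True


if __name__ == "__main__":
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        legacy = SpeakRequest("Hello!", append_to_context=None)
    print(legacy.append_to_context, len(caught))
    print(SpeakRequest("One moment.", append_to_context=False).append_to_context)
```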
Deprecated LLMService.register_direct_function()/unregister_direct_function() and LLMSwitcher.register_direct_function(). Advertise direct functions in LLMContext(tools=[...]) or via an LLMSetToolsFrame instead; handlers are registered and unregistered automatically. These will be removed in a future version.
(PR #4671) -
⚠️ Deprecated TaskFrame, TaskSystemFrame, EndTaskFrame, StopTaskFrame, CancelTaskFrame and InterruptionTaskFrame. Use WorkerFrame, WorkerSystemFrame, EndWorkerFrame, StopWorkerFrame, CancelWorkerFrame and InterruptionWorkerFrame instead, matching the PipelineWorker naming. The old names remain as isinstance-compatible aliases that emit a DeprecationWarning on construction.
(PR #4705) -
Renamed WebsocketServerTransport to SingleClientWebsocketServerTransport to make it explicit that the server handles a single client at a time. The supporting WebsocketServerParams, WebsocketServerCallbacks, WebsocketServerInputTransport, and WebsocketServerOutputTransport classes were renamed with the same SingleClient prefix. The old names remain as deprecated aliases and will be removed in 2.0.0.
(PR #4774)
Fixed
-
Fixed output image resizing for generated images when video output dimensions differ from the source image size by consistently using Pillow pixel modes instead of encoded formats.
(PR #4483) -
Fixed a benign ERROR log line emitted by UltravoxRealtimeLLMService during client-driven teardown. Adds an exception catch which guards the disconnecting case.
(PR #4519) -
Fixed
InworldRealtimeLLMService not supporting manual-mode turn detection (session_properties.audio.input.turn_detection=None). Previously, _handle_user_stopped_speaking and _handle_interruption assumed Inworld's server-side VAD handled commit/cancel/response.create automatically, so they were no-ops on the client side. In manual mode the server does not do this, so local-VAD-driven turns stalled: the bot never responded after the user stopped speaking, and interruptions didn't cancel the in-flight response. The service now sends an explicit InputAudioBufferCommitEvent + ResponseCreateEvent on user-stopped-speaking and InputAudioBufferClearEvent + ResponseCancelEvent on interruption, gated on a new _is_manual_turn_detection() check (mirroring the pattern in OpenAIRealtimeLLMService).
(PR #4533) -
InworldRealtimeLLMService and GrokRealtimeLLMService no longer broadcast UserStartedSpeakingFrame/UserStoppedSpeakingFrame when configured for manual (locally-driven) turn detection. Both services' server-side speech-started/stopped events fire in manual mode too, but in that setup turn frames are expected to come from local turn detection (e.g. a vad_analyzer in LLMUserAggregatorParams). Without the gate, the services were broadcasting alongside the locally-emitted frames, producing duplicate on_user_turn_* events. OpenAI Realtime was already correct here: its server doesn't fire speech events in manual mode at all.
(PR #4533) -
Fixed Ultravox Realtime not surfacing server-side interruption. The server sends a
playback_clear_buffer message when the user interrupts the bot mid-speech, instructing clients to drop buffered output audio; this was previously unhandled, so BaseOutputTransport kept playing the buffered audio and the bot kept talking past the interruption. Ultravox now broadcasts InterruptionFrame on playback_clear_buffer. This was previously masked by enabling local VAD on the user aggregator, which generated UserStartedSpeakingFrame and triggered the aggregator-side interruption path; the fix makes the behavior correct without local VAD as a workaround.
(PR #4533) -
Fixed
GrokRealtimeLLMService stalling the conversation when Grok returns an error in response to a response.cancel event sent while no response is active on the server. This happens routinely in manual-turn-detection mode: when the user starts speaking after the bot has finished, Pipecat broadcasts an InterruptionFrame and the service sends ResponseCancelEvent, which Grok rejects with "Cancellation failed: no active response found". The existing error-suppression list only matched OpenAI's response_cancel_not_active/conversation_already_has_active_response error codes, but Grok uses different codes for the same conditions, so the error fell through to the fatal-error path and exited the WebSocket receive loop, preventing any further server events from being processed. The suppression now also matches on the error message substring ("no active response", "already has an active response"), so these benign races get logged at debug and the receive loop keeps running.
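
The combined code-or-substring check described above can be sketched roughly as follows. This is an illustration only: the function name is_benign_realtime_error and the case-insensitive comparison are assumptions, while the two codes and two substrings are the ones quoted in this entry.

```python
# Error codes and message substrings quoted in the changelog entry above.
BENIGN_ERROR_CODES = {
    "response_cancel_not_active",
    "conversation_already_has_active_response",
}
BENIGN_MESSAGE_SUBSTRINGS = (
    "no active response",
    "already has an active response",
)


def is_benign_realtime_error(code, message):
    """Hypothetical helper: True if the error is a harmless cancel/create race."""
    if code in BENIGN_ERROR_CODES:
        return True
    # Assumption: compare case-insensitively, since providers word errors differently.
    text = (message or "").lower()
    return any(fragment in text for fragment in BENIGN_MESSAGE_SUBSTRINGS)
```

A receive loop could log such errors at debug level and continue, reserving the fatal path for everything else.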
(PR #4533) -
Fixed AWS Nova Sonic not surfacing server-side interruption. When the user interrupted the bot mid-response, the
INTERRUPTED stop reason was acknowledged internally but no InterruptionFrame was emitted, so BaseOutputTransport kept draining its audio buffer and the bot kept talking past the interruption. Nova Sonic now broadcasts InterruptionFrame on both INTERRUPTED paths (text-stage and audio-stage). This was previously masked by enabling local VAD on the user aggregator, which generated UserStartedSpeakingFrame and triggered the aggregator-side interruption path; the fix makes the behavior correct without local VAD as a workaround.
(PR #4533) -
Fixed pipeline shutdown hanging on LiveKit when the remote peer disconnected mid-stream. The trailing
audio_out_end_silence_secs write is now bounded by a timeout.
(PR #4578) -
Fixed the start of user speech being clipped from transcripts when
GeminiLiveLLMService is configured for locally-driven turns (server-side VAD disabled). Any audio sent to Gemini Live before activity_start (which is sent once the user turn start is confirmed) appeared to be discarded; the service now replays a short audio "pre-roll" to Gemini Live right after activity_start, so the speech onset is preserved.
(PR #4597) -
Fixed the start of user speech being clipped from transcripts when
OpenAIRealtimeLLMService is configured for locally-driven turns (server-side turn detection disabled). The speech onset already sent to OpenAI was discarded when the service cleared its input audio buffer on barge-in (which it does once the user turn start is confirmed); the service now re-appends a short audio "pre-roll" to the input audio buffer right after the clear, so the speech onset is preserved.
(PR #4599) -
422 validation errors now log the full error details and raw request body for all transports (WhatsApp, WebRTC, telephony, etc.), making malformed payloads easier to debug. Previously this logging only applied to WhatsApp routes.
(PR #4622) -
Fixed
InworldTTSService logging a spurious "no websocket connected, will try to reconnect" warning and firing a redundant second reconnect when the initial connection attempt failed. The service now returns an ErrorFrame immediately if the websocket is unavailable after _connect(), matching the behaviour of ElevenLabsTTSService.
(PR #4635) -
Fixed
SarvamTTSService (WebSocket) emitting BotStoppedSpeakingFrame late. The service never produced a TTSStoppedFrame on synthesis completion, so end-of-turn was detected only by the stop_frame_timeout_s idle timer, causing BotStoppedSpeakingFrame to lag the actual end of audio by up to that timeout (especially for short utterances or a raised stop_frame_timeout_s). The service now requests Sarvam's completion event (send_completion_event) and emits TTSStoppedFrame as soon as the final event arrives, so the bot-stopped-speaking event tracks the end of audio. The idle timeout remains as a fallback.
(PR #4639) -
Fixed a spurious
RuntimeWarning: coroutine '...' was never awaited emitted by TaskManager.create_task() when a task is cancelled before its coroutine starts running. The wrapper now closes the un-started coroutine on cancellation, so the warning no longer fires. This surfaced, for example, when combining TurnAnalyzerUserTurnStopStrategy with another stop strategy that force-completes the turn (cancelling the analyzer's timeout task before it ran), and when a function call is cancelled by a user-turn-start interruption race (the LLMService._run_function_call warning, #4339). A local await asyncio.sleep(0) workaround in _run_function_call that existed only to dodge this warning has been removed now that it is handled centrally. The turn/cancellation behavior was already correct; only the noisy warning is removed.
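
The general technique (closing a wrapped coroutine that never got to run) can be sketched with plain asyncio. This is an illustrative sketch, not the TaskManager code: create_task_closing_unstarted and demo are hypothetical names.

```python
import asyncio
import inspect


def create_task_closing_unstarted(coro):
    """Hypothetical wrapper: run `coro` in a task, and close it explicitly
    if the task finishes (e.g. is cancelled) before `coro` ever started."""
    started = False

    async def runner():
        nonlocal started
        started = True
        return await coro

    def close_if_never_started(_task):
        if not started:
            # Closing an un-started coroutine marks it as finished, so Python
            # does not report it as "never awaited" when it is garbage collected.
            coro.close()

    task = asyncio.create_task(runner())
    task.add_done_callback(close_if_never_started)
    return task


async def demo():
    async def work():
        await asyncio.sleep(1)

    coro = work()
    task = create_task_closing_unstarted(coro)
    task.cancel()  # cancelled before the event loop has run the task at all
    try:
        await task
    except asyncio.CancelledError:
        pass
    return inspect.getcoroutinestate(coro)
```

In this sketch, asyncio.run(demo()) reports the inner coroutine as closed rather than merely created, which is the state that avoids the warning.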
(PR #4644) -
Fixed
LiveKitTransport leaking audio/video stream readers when a track is unsubscribed: the owned rtc.AudioStream/rtc.VideoStream and its producer task are now closed and cancelled on unsubscribe (and on a re-subscribe for the same participant), so a client republishing its mic (e.g. mute/unmute or text↔voice toggles) no longer accumulates concurrent producers that interleave audio into the shared queue and silence downstream STT.
(PR #4650) -
Fixed
TTSService emitting a second LLMFullResponseEndFrame (with a new id) at the end of an audio context when push_text_frames is False, which caused RTVIObserver to send a duplicate bot-llm-stopped message per LLM response. The original end frame received in process_frame is now held per context_id and re-pushed, preserving its id.
(PR #4653) -
Fixed a frame-ordering race in bridged workers: frames received from the
WorkerBus were pushed directly into the pipeline from the bus edge, so they could interleave with (or overtake) frames the worker had queued itself via queue_frame()/queue_frames(). A bus inbound frame (e.g. an LLMContextFrame from a concurrent user input) could reach the LLM in the middle of a multi-frame update such as a flow's set_node (LLMMessagesUpdateFrame + LLMSetToolsFrame), generating against the previous node's context. Bus inbound frames are now serialized through the worker's frame queue, so both paths share one FIFO.
(PR #4656) -
Fixed interruption handling for standalone
TTSSpeakFrame(append_to_context=True) utterances (those not part of an LLM response). Previously, when the user interrupted such an utterance:
- on_assistant_turn_stopped didn't fire
- partially-spoken text wasn't recorded to the context (for TTS services that support word timestamps)

The problem was that these utterances have no LLMFullResponseStartFrame to open the assistant turn, so there was no open turn for the interruption to stop. The assistant aggregator now uses a new TTSStartedFrame.append_to_context to open the turn when the utterance begins.

As a result of this fix, on_assistant_turn_started timing is improved for standalone TTSSpeakFrame utterances: the event now fires at the start rather than at the end.
(PR #4665) -
Fixed
OpenAIResponsesHttpLLMService raising 'NoneType' object has no attribute 'cached_tokens' on every turn when used with a custom base_url pointing at a third-party Responses API server that omits the OpenAI-specific input_tokens_details/output_tokens_details sub-objects. Token usage parsing now tolerates any field the server omits and falls back to 0 in each case, matching the WebSocket OpenAIResponsesLLMService variant. The SDK's lenient streaming decoder leaves omitted fields as None, whether that is a top-level count (input_tokens/output_tokens/total_tokens), a missing detail sub-object, or a missing field inside one.
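
A tolerant parser of this kind might look like the sketch below. It is not the service's actual code: parse_usage and _count are hypothetical helpers, and SimpleNamespace stands in for the SDK's usage object. Only field names mentioned in this entry are used.

```python
from types import SimpleNamespace


def _count(obj, name):
    """Return obj.<name>, treating a missing object, attribute, or None as 0."""
    return getattr(obj, name, None) or 0


def parse_usage(usage):
    """Hypothetical helper: read token counts without assuming any field exists."""
    input_details = getattr(usage, "input_tokens_details", None)
    return {
        "input_tokens": _count(usage, "input_tokens"),
        "output_tokens": _count(usage, "output_tokens"),
        "total_tokens": _count(usage, "total_tokens"),
        # input_details may itself be None on third-party servers.
        "cached_tokens": _count(input_details, "cached_tokens"),
    }


# A usage object as a third-party server might produce it: sparse, with None gaps.
third_party_usage = SimpleNamespace(
    input_tokens=12,
    output_tokens=None,
    total_tokens=None,
    input_tokens_details=None,
)
```

Because getattr with a default also works on None, the same helper covers a missing top-level count, a missing detail sub-object, and a missing field inside one.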
(PR #4667) -
Fixed importing
pipecat.services.whisper.stt failing on non-macOS platforms when the mlx-whisper extra happened to be installed: mlx_whisper is now only imported on macOS (it's Apple-Silicon only, and elsewhere the package can be installed but unloadable, e.g. a missing libmlx.so). WhisperSTTServiceMLX still imports it lazily when actually used.
(PR #4684) -
Fixed
SambaNovaLLMService failing every completion with its default model: SambaNova Cloud removed Llama-4-Maverick-17B-128E-Instruct, so the default is now Meta-Llama-3.3-70B-Instruct.
(PR #4687) -
Fixed
NebiusLLMService function calls never executing with its default model: Nebius streams openai/gpt-oss-120b tool calls with a broken final fragment (index=1 on a single call), so the default is now Qwen/Qwen3-30B-A3B-Instruct-2507, which streams correctly.
(PR #4688) -
Fixed a worker-handoff race where
activate_worker(deactivate_self=True) left both workers briefly active: the caller's active flag only flipped when its own deactivate message came back over the bus, so the handoff target could activate first and both workers re-broadcast each other's frames (duplicate tool round-trips in the LLM context). The caller now deactivates synchronously before publishing the activate message.
(PR #4691) -
Fixed
AzureTTSService producing no audio when running in a pipeline without an output transport (e.g. headless/offline setups). Audio chunks arrive from the Speech SDK on native threads, and the cross-thread queue hand-off didn't wake an otherwise-idle event loop; the service now marshals those puts onto the loop, so audio is delivered regardless of loop activity.
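
Marshalling queue puts onto the event loop is usually done with loop.call_soon_threadsafe, which also wakes an idle loop. The sketch below shows that general asyncio pattern under stated assumptions; it is not the AzureTTSService code, and collect_from_thread is a hypothetical name, with a plain thread standing in for the SDK's native callback thread.

```python
import asyncio
import threading


async def collect_from_thread(chunks):
    """Hypothetical demo: receive chunks produced on a non-asyncio thread."""
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()

    def producer():
        # asyncio.Queue is not thread-safe, so the put is scheduled on the
        # loop's own thread; call_soon_threadsafe also wakes an idle loop.
        for chunk in chunks:
            loop.call_soon_threadsafe(queue.put_nowait, chunk)
        loop.call_soon_threadsafe(queue.put_nowait, None)  # end-of-stream marker

    thread = threading.Thread(target=producer)
    thread.start()

    received = []
    while (chunk := await queue.get()) is not None:
        received.append(chunk)
    thread.join()
    return received
```

Callbacks scheduled this way run in the order they were submitted, so chunk order is preserved.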
(PR #4703) -
Fixed a regression from #4654 where unregistering a tool's handler on its own (via
unregister_function/unregister_direct_function, without changing the advertised tool set) was silently undone, because the next LLMContextFrame re-registered the handler from the still-advertised tool (a "zombie"). An explicitly unregistered handler now stays unregistered while its tool remains advertised (so calls hit the missing-handler recovery and the model learns to stop), and is restored only by registering it again, or by re-advertising the tool (removing it from the advertised set, then adding it back).
(PR #4709) -
Fixed
LLMSwitcher.register_direct_function overriding a direct function's @tool_options call options. Its cancel_on_interruption defaulted to True and was forwarded to each member LLM as an explicit value, so a handler's @tool_options(cancel_on_interruption=False) setting was ignored. It now defaults to None and follows the same explicit-arg > @tool_options > default fallback as LLMService.register_direct_function (the fallback was added there in #4654 but not mirrored on the switcher).
(PR #4709) -
Fixed an audible 200-300 ms gap in audio mixer output (e.g.
SoundfileMixer background sound) on every interruption. The output transport now keeps the audio task running and drains the queue instead of cancelling and recreating it when a mixer is active.
(PR #4714) -
Fixed
pipecat init silently ignoring the "Enable evals?" option for the Daily PSTN Dial-out and Twilio + Daily SIP scenarios. Generated bots for these scenarios can now be driven with pipecat eval (-t eval): they fall back to create_transport() when the request body carries no room_url, while the production flow (room and call settings arriving from server.py) is unchanged. In local runs the dial-out and Twilio call-forwarding machinery is skipped and the bot stays silent until spoken to.
(PR #4715) -
Fixed
FastAPIWebsocketTransport stalling pipeline shutdown for ~10s when the client's WebSocket is half-closed (e.g. a telephony call already torn down on the provider's side, leaving the media-streams socket open at the TCP layer but unresponsive). FastAPIWebsocketClient.disconnect() awaited websocket.close() unbounded, so it blocked on the ASGI server's close-handshake timeout, delaying EndFrame propagation and the whole pipeline teardown. The close handshake is now bounded by a new FastAPIWebsocketParams.ws_close_timeout (default 0.5s): the close is still initiated, but disconnect() waits at most that long for the peer's acknowledgment before letting shutdown proceed. Increase ws_close_timeout for high-latency peers that need longer to complete a graceful close.
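
Bounding an awaited close with a timeout is commonly written with asyncio.wait_for. The sketch below illustrates the idea with a fake slow close; close_with_timeout and demo are hypothetical names, and this is not the transport's actual implementation. Note that wait_for cancels the pending close coroutine on timeout, which may differ from how the transport treats the socket afterwards.

```python
import asyncio
import time


async def close_with_timeout(close, timeout):
    """Hypothetical helper: start `close()` and wait at most `timeout` seconds.

    Returns True if the close completed in time, False if the wait timed out.
    """
    try:
        await asyncio.wait_for(close(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        return False


async def demo():
    async def slow_close():
        # Stands in for a close handshake that an unresponsive peer never answers.
        await asyncio.sleep(10)

    started = time.monotonic()
    finished = await close_with_timeout(slow_close, timeout=0.05)
    return finished, time.monotonic() - started
```

With a short timeout, shutdown proceeds almost immediately even though the simulated peer would otherwise hold the close open for ten seconds.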
(PR #4723) -
Fixed
NvidiaLLMService reasoning streams so interrupted or early-cancelled responses clean up correctly and do not leak buffered thought content or leave the wrapped stream open.
(PR #4743) -
Fixed an issue in
AggregatedFrameSequencer where delayed word-timestamps from an interrupted (cleared) TTS context could be emitted as passthrough TTSTextFrames with append_to_context=True, interleaving stale words into the next turn's transcript (observed with Inworld TTS in ASYNC mode). Words for an unknown or cleared context are now dropped instead of corrupting the active turn.
(PR #4751) -
Fixed
RimeTTSService.SPELL() and RimeTTSService.PAUSE_TAG() helpers, which are now static methods. Previously they were defined as instance methods without a self parameter, so calling them on a service instance bound the instance to the first argument and produced incorrect output.
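
The underlying Python behavior can be reproduced with two toy classes. BrokenHelpers, FixedHelpers, the spell name, and the "[spell:...]" output format are all placeholders for illustration; they are not the Rime helpers or their real output.

```python
class BrokenHelpers:
    # Declared without `self`, but still an ordinary method: when called on an
    # instance, Python passes the instance itself as the first argument (`text`).
    def spell(text):
        return f"[spell:{text}]"


class FixedHelpers:
    # A static method receives exactly the arguments the caller passes,
    # whether it is called on the class or on an instance.
    @staticmethod
    def spell(text):
        return f"[spell:{text}]"


fixed = FixedHelpers()
on_instance = fixed.spell("abc")
on_class = FixedHelpers.spell("abc")
```

With the broken variant, BrokenHelpers().spell("abc") raises TypeError because two positional arguments reach a one-parameter function, and BrokenHelpers().spell() formats the instance itself instead of any text.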
(PR #4755) -
Fixed
SingleClientWebsocketServerTransport (formerly WebsocketServerTransport) so that a new client connection no longer disconnects the client that is already connected. While a client is connected, new connection attempts are now rejected with a warning. The active connection's reference is cleared when the client disconnects or the connection fails, so a new client can connect afterwards.
(PR #4774) -
Fixed
DeepgramSageMakerSTTService raising TypeError on construction. Its default settings still passed vad_events, which was removed from DeepgramSTTService.Settings, so the service (and the voice-deepgram-sagemaker example) crashed on instantiation.
(PR #4786)
Security
-
Added optional HMAC token authentication for WebSocket connections in the development runner. Set
PIPECAT_WEBSOCKET_AUTH=token (or pass --ws-auth token) to require clients to call POST /start and obtain a short-lived signed session token before connecting. Tokens are one-time use and expire after 5 minutes.
- Clients can supply the token via Authorization: Bearer <token> header, ?token=<token> query parameter, or URL path segment (/ws/<token>, /ws-client/<token>); the path form is recommended for telephony providers like Twilio.
- Both the telephony WebSocket (/ws) and plain WebSocket (/ws-client) endpoints are protected. Connections with invalid, expired, or replayed tokens are rejected with WebSocket close code 4003.
(PR #4660)
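
For readers unfamiliar with the pattern, a signed, expiring, one-time token can be built from the standard library alone. The sketch below is illustrative only: the SessionTokens class, the nonce.expiry.signature token layout, and the in-memory replay set are assumptions, not the runner's actual token format or storage.

```python
import hashlib
import hmac
import secrets
import time


class SessionTokens:
    """Hypothetical issuer/verifier for short-lived, single-use HMAC tokens."""

    def __init__(self, secret: bytes, ttl_seconds: float = 300.0):
        self._secret = secret
        self._ttl = ttl_seconds  # 300 s matches the 5-minute expiry above
        self._used = set()  # nonces already redeemed (grows unbounded here)

    def _sign(self, payload: str) -> str:
        return hmac.new(self._secret, payload.encode(), hashlib.sha256).hexdigest()

    def issue(self, now=None) -> str:
        now = time.time() if now is None else now
        payload = f"{secrets.token_hex(8)}.{int(now + self._ttl)}"
        return f"{payload}.{self._sign(payload)}"

    def verify(self, token: str, now=None) -> bool:
        now = time.time() if now is None else now
        payload, _, signature = token.rpartition(".")
        expected = self._sign(payload)
        # Constant-time comparison avoids leaking how much of the signature matched.
        if not payload or not hmac.compare_digest(signature.encode(), expected.encode()):
            return False
        nonce, _, expires = payload.partition(".")
        if now > int(expires) or nonce in self._used:
            return False  # expired or replayed
        self._used.add(nonce)
        return True
```

A server following this shape would issue a token from its start endpoint and call verify once when the WebSocket connects, rejecting the connection if it returns False.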
-
Added origin restriction support to
WebsocketServerTransport, FastAPIWebsocketTransport, and the development runner to mitigate Cross-Site WebSocket Hijacking (CSWSH). When allowed_origins is configured, connections with a missing or disallowed Origin header are rejected before the WebSocket handshake completes.
- WebsocketServerParams and FastAPIWebsocketParams gain an allowed_origins: list[str] field. FastAPIWebsocketTransport raises ValueError at construction time if the origin is not allowed.
- The runner gains a --allowed-origins CLI flag and a PIPECAT_ALLOWED_ORIGINS environment variable (comma-separated). Both also control the transport params default, so a single env var covers all WebSocket transports uniformly.
- Default is empty (allow all), so there is no behaviour change for existing deployments.
(PR #4704)
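
The allow-list semantics described above (empty means allow all; otherwise a missing or unlisted Origin is rejected) can be sketched as below. Both function names are hypothetical, and exact string matching of origins is an assumption; the real transports may normalize or match differently.

```python
def parse_allowed_origins(value):
    """Hypothetical helper: split a comma-separated setting into a clean list."""
    return [item.strip() for item in value.split(",") if item.strip()]


def is_origin_allowed(origin, allowed_origins):
    """Hypothetical check applied before accepting a WebSocket handshake."""
    if not allowed_origins:
        return True  # empty allow-list: no restriction, existing behaviour
    # Assumption: exact string match against the request's Origin header.
    return origin is not None and origin in allowed_origins


allowed = parse_allowed_origins("https://app.example.com, https://admin.example.com")
```

Browsers always send an Origin header on WebSocket handshakes, so rejecting a missing header mainly affects non-browser clients once an allow-list is configured.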
Other
- Pinned
vite to ^8.0.16 (from ^8) in the pipecat create client templates and the UI-worker examples.
(PR #4767)