-
Pipecat pipelines are multi-agent compatible by default. The new multi-agent framework (`pipecat.workers`) turns every `PipelineWorker` (previously `PipelineTask`) into a peer on a shared bus that passes typed messages, dispatches `@job` work, and coordinates with siblings, while existing single-pipeline code keeps running untouched. `examples/multi-worker/` ships ready-to-run patterns: LLM handoff, parallel debate, sidecar code assistants and hardware controllers, distributed deployments over Redis or PGMQ, point-to-point WebSocket proxies, and UI workers driving a web client over RTVI.
(PR #4493) -
Added `UIWorker` (`pipecat.workers.ui`): an LLM worker that observes and drives a client web UI over the RTVI UI channel, for voice agents that act on what the user is looking at. It reads the page's accessibility snapshots, routes client UI events to `@ui_event` handlers, drives the page with UI commands (`scroll_to`, `highlight`, `select_text`, `click`, `set_input_value`), and answers screen-grounded questions. `PipelineWorker` connects it to the client automatically when RTVI is enabled, with no extra wiring.
  - A voice agent delegates a turn via the built-in `respond` job; the worker returns an answer for the voice LLM to speak, or speaks it verbatim through the agent's TTS with `respond_to_job(answer, tts_speak=True)`.
  - `ReplyToolMixin` provides a ready-made `reply` tool (a spoken answer plus the standard UI actions).
  - `ui_job_group(...)` fans work out to peer workers, surfaced to the client as cancellable progress cards.
  - `UI_STATE_PROMPT_GUIDE` is drop-in system-prompt text that teaches the LLM the `<ui_state>` wire format.
(PR #4540)
-
Added `VonageVideoConnectorTransport`, a new transport integration for real-time Vonage WebRTC sessions using the Vonage Video Connector library.
(PR #4052) -
Added `InceptionLLMService` for Inception's Mercury 2 diffusion reasoning model, with support for `reasoning_effort` and `realtime` settings.
(PR #4423) -
Added plain WebSocket transport support to the development runner. Bots can now accept connections from non-telephony WebSocket clients (e.g., browser apps using protobuf framing) via the `/ws-client` endpoint alongside other transports.
(PR #4442) -
Added `GET /status` endpoint to the development runner that reports which transports the running instance accepts (all by default, or the single transport passed via `-t`).
(PR #4442) -
Added support for the Rime `coda` TTS model to `RimeTTSService` and `RimeHttpTTSService`. The `temperature`, `top_p`, and `repetition_penalty` settings are not used by `coda`. Also added a `timeScaleFactor` setting (for the `arcana` and `coda` models) to both services: values above 1.0 slow down audio playback; values below 1.0 speed it up.
(PR #4511) -
Added `max_endpoint_delay_ms` to `SonioxSTTService.Settings`, controlling the maximum delay (500-3000 ms) before endpoint detection finalizes a turn.
(PR #4521) -
Added `LLMService.append_system_instruction(...)`: append durable text to a service's system instruction so it's included on every inference and survives context resets.
(PR #4540) -
Added `CartesiaTurnsSTTService` for streaming speech-to-text against the Cartesia Streaming ASR v2 (Ink-2) turn-based WebSocket endpoint (`/stt/turns/websocket`). The server drives turn boundaries via `turn.start` / `turn.update` / `turn.end` messages, which the service translates into `UserStartedSpeakingFrame`, finalized `TranscriptionFrame`, and `UserStoppedSpeakingFrame`. Eager end-of-turn predictions and turn resumes (`turn.eager_end` and `turn.resume`) are surfaced via the `on_turn_eager_end` and `on_turn_resume` event handlers.
(PR #4552) -
Added the `STTService.supports_ttfs` property, which subclasses can override to return `False` when TTFS doesn't apply to their architecture (e.g. turn-based STTs where the server defines turn boundaries). When `False`, `STTMetadataFrame` is broadcast with `ttfs_p99_latency=0.0` and the "ttfs_p99_latency not set" warning is suppressed.
(PR #4585)
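The `supports_ttfs` entry above describes an override pattern that is easy to picture with a small stand-in. The sketch below is not Pipecat's implementation: `STTServiceSketch`, `TurnBasedSTTSketch`, `MetadataSketch`, and `build_metadata` are hypothetical names invented for illustration, and only the property name, the `0.0` value, and the warning text come from the entry.

```python
import warnings
from dataclasses import dataclass


@dataclass
class MetadataSketch:
    """Hypothetical stand-in for STTMetadataFrame (one field only)."""

    ttfs_p99_latency: float


class STTServiceSketch:
    """Hypothetical stand-in for an STT base class, not Pipecat's code."""

    DEFAULT_TTFS = 1.0  # conservative fallback, in seconds

    def __init__(self, ttfs_p99_latency=None):
        self._ttfs = ttfs_p99_latency

    @property
    def supports_ttfs(self) -> bool:
        # Default: the service has a measurable speech-end -> transcript gap.
        return True

    def build_metadata(self) -> MetadataSketch:
        if not self.supports_ttfs:
            # TTFS does not apply: report 0.0 and stay silent.
            return MetadataSketch(0.0)
        if self._ttfs is None:
            warnings.warn("ttfs_p99_latency not set, using default 1.0s")
            return MetadataSketch(self.DEFAULT_TTFS)
        return MetadataSketch(self._ttfs)


class TurnBasedSTTSketch(STTServiceSketch):
    """A turn-based service where the server defines turn boundaries."""

    @property
    def supports_ttfs(self) -> bool:
        return False
```

With this shape, a turn-based subclass opts out by overriding one property, and the base class keeps the fallback warning for services that simply forgot to configure a latency.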
Changed
-
⚠️ The development runner now supports all transports (WebRTC, Daily, telephony, plain WebSocket) simultaneously from a single server. The
`/start` endpoint accepts a `"transport"` field to select the transport per-request; omitting `-t` at startup enables all transports instead of defaulting to WebRTC. The Daily browser-redirect route moved from `GET /` to `GET /daily`.
(PR #4442) -
Changed the default model for `RimeTTSService` and `RimeHttpTTSService` from `arcana` to `coda`. Code that relied on the implicit default should set `model="arcana"` explicitly to preserve previous behavior.
(PR #4511) -
OpenRouter LLM service now defaults to `openai/gpt-4.1`.
(PR #4513) -
OpenRouter LLM requests now convert `developer` messages to `user` messages by default for broader model compatibility. Override this by subclassing `OpenRouterLLMService` or setting `llm.supports_developer_role = True` for models that support the `developer` role.
(PR #4513) -
`SonioxSTTService` now applies settings updates (e.g. via `STTUpdateSettingsFrame`) using a graceful reconnect instead of a hard disconnect/reconnect, preserving the service's reconnect retry behavior.
(PR #4521) -
Updated the default p99 TTFS latency values for Smallest AI, Mistral, and XAI STT so turn stop timing uses measured values instead of the conservative fallback.
(PR #4522) -
Updated the development runner startup banner to show the prebuilt client URL once and list enabled or disabled transports with install hints.
(PR #4524) -
Services and transports with missing optional dependencies now raise `ImportError` instead of a bare `Exception` when their module is imported without the required extra installed. The original `ModuleNotFoundError` is preserved as `__cause__`, so code that wraps these imports can now use `except ImportError:` cleanly instead of `except Exception:`.
(PR #4525) -
Bumped `pipecat-ai-prebuilt` to 1.0.1 in the `runner` extra, updating the prebuilt client UI served by the development runner.
(PR #4531) -
Replaced the `transformers.WhisperFeatureExtractor` dependency in `LocalSmartTurnAnalyzerV3` with a vendored numpy-only implementation, reducing peak RSS at import from ~566 MB to ~60 MB and cold-start time from ~5.0 s to ~0.3 s. Behavior is numerically equivalent (matches the reference numpy code path within 1e-5 absolute tolerance; ONNX model output is bit-identical on representative inputs).
  - Smart Turn v3 no longer imports `transformers` at module load.
  - Prepares the ground for making `transformers` an optional dependency in a future release.
  - The vendored STFT is vectorized via `numpy.lib.stride_tricks.sliding_window_view` + batched `np.fft.rfft`, cutting `_power_spectrogram` runtime by ~55% (~4.0 ms → ~1.8 ms per call on a typical 8-second segment at 16 kHz) while preserving the same parity tolerances against the reference implementation.
(PR #4536)
-
⚠️ Renamed the RTVI UI Worker Protocol's vocabulary from the
`pipecat-subagents` `task`/`agent` terms to Pipecat's native `job`/`worker`. This spans the wire messages (`ui-task` → `ui-job-group`, `ui-cancel-task` → `ui-cancel-job-group`), their envelope `kind`s and fields (`task_id` → `job_id`, `agents`/`agent_name` → `workers`/`worker_name`), the paired Python models/frames (`UITask*` → `UIJobGroup*`, `RTVIUITask*Frame` → `RTVIUIJobGroup*Frame`), and the `@pipecat-ai/client-js` / `client-react` APIs (`RTVIEvent.UITask` → `UIJobGroup`, `cancelUITask` → `cancelUIJobGroup`, `useUITasks` → `useUIJobGroups`, `UITasksProvider` → `UIJobGroupsProvider`). These primitives shipped in 1.2.0 but were never documented, so no real consumers are affected.
(PR #4540) -
`transformers` is no longer a base dependency, so `pip install pipecat-ai` no longer pulls it in. This follows Smart Turn v3 dropping its `transformers` import; the only remaining users (the deprecated `LocalSmartTurnAnalyzerV2`/CoreML analyzers and the Moondream service) already require the `local-smart-turn` and `moondream` extras, which continue to install `transformers`.
(PR #4546) -
Widened the `deepgram` extra to `deepgram-sdk>=6.1.1,<8` so installations can resolve to either deepgram-sdk 6.x or 7.x. `DeepgramSTTService` now handles the `agent_rest` keyword argument that deepgram-sdk 7.2.0 added to `DeepgramClientEnvironment`, so custom `base_url` configuration keeps working on both 6.x and 7.x.
(PR #4565) -
Dropped the upper bound on the `websockets-base` extra (`websockets>=13.1`) so downstream deployments can resolve to websockets 16.x and beyond. Pipecat's `websockets` usage relies only on the modern `websockets.asyncio` API plus a handful of public symbols, all of which are retained in 16.x.
(PR #4565) -
Changed the default voice for `GradiumTTSService` to `_6Aslh2DxfmnRLmP`.
(PR #4569) -
`InworldRealtimeLLMService` now defaults the STT model to `inworld/inworld-stt-1`.
(PR #4573)
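The optional-dependency change in this section (PR #4525) follows a standard Python pattern: re-raise as `ImportError` and chain the original error with `from`. The sketch below shows that pattern using only the standard library. `load_optional` and its message text are hypothetical and are not taken from Pipecat's source.

```python
import importlib


def load_optional(module_name: str, extra: str):
    """Import an optional module, or raise ImportError with install advice.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as e:
        # "from e" stores the original error on __cause__, so callers can
        # still inspect which module was actually missing.
        raise ImportError(
            f"Missing optional dependency '{module_name}'; "
            f"install the '{extra}' extra to use this feature."
        ) from e


def try_load(module_name: str, extra: str):
    """Caller side: a narrow except clause instead of `except Exception:`."""
    try:
        return load_optional(module_name, extra)
    except ImportError as err:
        return err
```

Calling `try_load` with a module that is not installed returns the `ImportError`, whose `__cause__` is the original `ModuleNotFoundError`.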
Deprecated
-
`FrameProcessor.pipeline_task` is deprecated; read `FrameProcessor.pipeline_worker` instead. The old name still works but emits a `DeprecationWarning` and will be removed in a future release.
(PR #4493) -
Passing a worker to `WorkerRunner.run()` is deprecated. Register the worker with `WorkerRunner.add_workers()` before calling `run()` instead. The `worker` argument still works but emits a `DeprecationWarning` and will be removed in a future release.
(PR #4493) -
`PipelineTask`, `PipelineTaskParams`, and the `pipecat.pipeline.task` module have been renamed to `PipelineWorker`, `WorkerParams`, and `pipecat.pipeline.worker`. The old names still resolve (the module re-exports the new symbols) but constructing `PipelineTask`/`PipelineTaskParams` emits a `DeprecationWarning`; they will be removed in a future release.
(PR #4493) -
`PipelineRunner` has been renamed to `WorkerRunner` and moved to `pipecat.workers.runner`, since the runner now runs workers (of which `PipelineWorker` is one kind), not just pipelines. Import `WorkerRunner` from `pipecat.workers.runner`. The old `pipecat.pipeline.runner` module still re-exports both names, and `PipelineRunner` still works as a subclass alias, but it emits a `DeprecationWarning` and will be removed in a future release.
(PR #4589)
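The renames in this section keep old names working as aliases that warn on use. A minimal sketch of the "subclass alias" idea follows, assuming nothing about Pipecat's internals. `WorkerRunnerSketch` and `PipelineRunnerSketch` are hypothetical stand-ins for the real classes.

```python
import warnings


class WorkerRunnerSketch:
    """Hypothetical stand-in for the new class name."""

    def __init__(self, name: str = "runner"):
        self.name = name


class PipelineRunnerSketch(WorkerRunnerSketch):
    """Hypothetical stand-in for the deprecated alias.

    It subclasses the new class, so isinstance checks against the new
    name keep passing, and it warns once per construction.
    """

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "PipelineRunnerSketch is deprecated; use WorkerRunnerSketch.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        super().__init__(*args, **kwargs)
```

To surface such warnings while migrating, Python's `-W error::DeprecationWarning` flag turns them into exceptions during a test run.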
Removed
- Removed the unsupported Georgian (`Language.KA`) language mapping from `SonioxSTTService`.
(PR #4521)
Fixed
-
Fixed Azure TTS last word being missed by observers and RTVI UI. The completion signal was racing with word timestamp processing, causing the final word's `TTSTextFrame` to arrive after `TTSStoppedFrame`. Completion is now routed through the word boundary queue to ensure all words are processed before signaling stream end.
(PR #4306) -
Fixed skipped TTS frames (e.g. code blocks filtered via `skip_aggregator_types`) being emitted to the assistant context immediately instead of waiting for preceding spoken frames to finish. They now hold their position in the frame sequence and are flushed only after all earlier spoken sentences are complete, keeping context ordering correct.
(PR #4380) -
Fixed Cartesia word timestamps leaking SSML tag text (e.g. `<spell>`, `<emotion>`, `<break>`) into word entries. Tags are now stripped before processing, so word-to-text attribution remains accurate when SSML markup is present in the TTS input.
(PR #4380) -
Fixed `BaseOutputTransport` reordering frames that share the same presentation timestamp. Frames with equal PTS values are now emitted in insertion order, preventing subtle audio/text sequencing bugs when multiple frames arrive at the same time.
(PR #4380) -
Fixed `TTSTextFrame` entries losing their original text structure when word timestamps are enabled. Each `TTSTextFrame` now carries a `raw_text` field containing the corresponding span of the original LLM-produced text (including pattern delimiters such as `<card>4111 1111 1111 1111</card>`), so the assistant context receives properly-tagged content rather than the cleaned words returned by the TTS provider. Also handles words that straddle two sentence boundaries by splitting them and attributing each part to its correct source frame.
(PR #4380) -
Fixed `PipelineTask.cancel()` hanging when cancellation is requested before the initial `StartFrame` reaches the pipeline sink.
(PR #4455) -
Fixed `SmallWebRTCClient.read_audio_frame` and `read_video_frame` busy-looping on `MediaStreamError`. When a track raises `MediaStreamError`, the generator now clears the track reference (`_audio_input_track` / `_video_input_track` / `_screen_video_track`) so the loop parks on the existing `is None` gate instead of re-entering `recv()` at ~100 Hz on a permanently-failed track. Renegotiation still resumes seamlessly: when `_handle_client_connected` reassigns a fresh track, the loop picks up frames from the new track.
(PR #4491) -
Fixed `ElevenLabsSTTService` crashing when `language` was passed as `None`. When `language` is not set, the service now lets ElevenLabs auto-detect the audio language.
(PR #4507) -
Fixed `NvidiaSTTService` so unexpected gRPC stream drops reconnect cleanly using the active audio iterator, while service shutdown and cancellation still close that iterator and stop the streaming worker without leaving it stuck waiting for more audio.
(PR #4512) -
Fixed websocket STT connection setup failures so services clear stale websocket state and emit non-fatal error frames, allowing `ServiceSwitcher` failover to keep agents running.
(PR #4514) -
Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` inserting unwanted spaces between words when synthesizing Chinese or Japanese. Word timestamps for these languages already include their own spacing, so they are now forwarded with `includes_inter_frame_spaces=True` to avoid double-spacing in transcripts and context.
(PR #4517) -
Fixed the development runner so missing optional transport dependencies disable only their related routes instead of failing startup in all-transport mode.
(PR #4524) -
Fixed a race in `ElevenLabsTTSService` where the periodic keepalive could be sent for a new turn's context before that context's `voice_settings` initialization message, causing ElevenLabs to close the WebSocket with a 1008 policy violation (`voice_settings field must be provided in the first message ...`). The keepalive now only targets a context once its context-init has been sent.
(PR #4527) -
Switched `BaseSmartTurn` from `time.time()` to `time.monotonic()` for its three internal interval-math sites (audio-buffer timestamps, speech-start tracking, and the pre-speech buffer-trim loop). Wall-clock time can step forward or backward when NTP adjusts the system clock, which would silently corrupt the buffer trim (prune everything / prune nothing) and the speech-window extraction. A monotonic clock never steps backward, and it matches the existing `time.perf_counter()` usage already in place for inference-latency metrics.
(PR #4542) -
Fixed `SOXRAudioResampler` and `SOXRStreamAudioResampler` ignoring the configured quality setting. Both resamplers were hardcoded to `VHQ`, which meant `RNNoiseFilter`'s `resampler_quality` argument (defaulting to `QQ` for low-latency real-time use) had no effect. The resamplers now honor the configured quality, with `VHQ` retained as the default.
(PR #4551) -
Fixed `GeminiLiveLLMService` (and `GeminiVertexLiveLLMService`) crashing with `'ContextWindowCompressionParams' object has no attribute 'get'` when `context_window_compression` was supplied through the `settings` API (e.g. `settings=GeminiLiveLLMService.Settings(context_window_compression=ContextWindowCompressionParams(...))`). The setting is now handled whether it arrives as a `ContextWindowCompressionParams` object or as a dict.
(PR #4563) -
Fixed `AudioBufferProcessor` concatenating utterances separated by a silent gap. When no user audio arrives for more than 200 ms, silence proportional to the wall-clock gap is now inserted into the user buffer; the same fix is applied symmetrically to the bot buffer, so two bot utterances spoken seconds apart (e.g. progressive hold messages played while a slow function call runs) remain temporally separated in the recorded audio.
(PR #4567) -
`InworldRealtimeLLMService` no longer logs `WARNING`s for unrecognized realtime server events (e.g. `response.output_text.done`); they are now logged at `DEBUG`.
(PR #4573) -
Fixed a spurious `ttfs_p99_latency not set, using default 1.0s` warning emitted by turn-based STT services (`CartesiaTurnsSTTService`, `DeepgramFluxSTTService`) at pipeline start. These services have no meaningful "speech end → final transcript" interval to measure, because the server defines turn boundaries directly.
(PR #4585)
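The `BaseOutputTransport` fix above (PR #4380) concerns frames with equal presentation timestamps keeping insertion order. One common way to get that behavior from a priority queue is to add a monotonically increasing sequence number as a tie-breaker. The sketch below illustrates that general technique only; `TimedQueueSketch` is a hypothetical class, and nothing here claims to be how Pipecat implements the fix.

```python
import heapq
import itertools


class TimedQueueSketch:
    """Orders items by PTS, then by insertion order for equal PTS."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: 0, 1, 2, ...

    def put(self, pts: int, frame) -> None:
        # Tuples compare left to right, so the sequence number decides
        # between equal PTS values and the frame itself is never compared.
        heapq.heappush(self._heap, (pts, next(self._seq), frame))

    def drain(self) -> list:
        """Pop every queued frame in presentation order."""
        out = []
        while self._heap:
            _pts, _seq, frame = heapq.heappop(self._heap)
            out.append(frame)
        return out
```

Without the sequence number, a heap makes no stability guarantee for equal keys, and comparing two frame objects directly could also raise `TypeError` if they do not define an ordering.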
Performance
-
`BaseSmartTurn` now stores raw `int16` PCM views in its audio buffer and defers the `float32` conversion to the once-per-turn segment extraction, eliminating ~50 per-frame numpy allocations per second per analyzer. Output is bit-identical to the previous per-frame conversion path because `int16 → float32 / 32768.0` distributes over concatenation; subclasses (`LocalSmartTurnAnalyzerV3`, `LocalCoreMLSmartTurnAnalyzer`, `HttpSmartTurnAnalyzer`) all receive the same float32 `audio_array` they did before. Also removes a spurious `np.frombuffer(audio_int16, dtype=np.int16)` re-wrap that was a no-op view-of-a-view of already-int16 data.
(PR #4542) -
Reduced the `soxr` resampling quality preset in `LocalSmartTurnAnalyzerV3` from `VHQ` (~26-tap polyphase) to `HQ` (~16-tap), cutting resample CPU time by 30–50% on non-16 kHz audio sources (~3–10 ms saved per turn at 24/48 kHz). Pipelines already delivering 16 kHz audio are unaffected: the existing `actual_rate == _MODEL_SAMPLE_RATE` fast path skips resampling entirely. The two quality presets differ in filter length, not cutoff or interpolation semantics; on a Whisper-style log-mel feature representation the audible difference sits well below the mel filterbank's quantization noise floor, so model predictions are unchanged on representative inputs.
(PR #4542)
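The `BaseSmartTurn` entry above relies on the fact that an element-wise conversion gives the same result whether it runs per frame or once on the concatenated buffer. The sketch below checks that property with the standard library's `array` module as a stand-in for numpy. The function names are hypothetical, and Python floats here are 64-bit rather than `float32`; the division by 32768.0 is exact in both, since every `int16` value over a power of two is representable.

```python
from array import array

SCALE = 32768.0  # maps the int16 range onto [-1.0, 1.0)


def to_float(pcm: bytes) -> list:
    """Decode native-endian int16 PCM bytes and scale each sample."""
    samples = array("h")
    samples.frombytes(pcm)
    return [s / SCALE for s in samples]


def convert_per_frame(frames: list) -> list:
    """Old shape: convert every incoming frame, then join the results."""
    out = []
    for frame in frames:
        out.extend(to_float(frame))
    return out


def convert_deferred(frames: list) -> list:
    """New shape: keep raw bytes, convert once over the joined buffer."""
    return to_float(b"".join(frames))
```

Because the scaling touches each sample independently, both functions return identical lists for any split of the same audio into frames; the deferred form simply performs one conversion per turn instead of one per frame.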