### Added

- Added `MistralSTTService` for real-time speech-to-text using Mistral's Voxtral Realtime API (`voxtral-mini-transcribe-realtime-2602`). Supports streaming transcription with interim results, automatic language detection, and a VAD-driven utterance lifecycle. (PR #4253)
- Added a `buttons` field to `OutputDTMFFrame` and `OutputDTMFUrgentFrame` for sending multi-key DTMF sequences as a `list[KeypadEntry]`. Use `OutputDTMFFrame.from_string("123#")` (or the equivalent on `OutputDTMFUrgentFrame`) to build one from a dial string, and `to_string()` to convert back. (PR #4313)
- Added `DailyTransport.send_dtmf()` to expose the Daily call client's DTMF sending capability, enabling applications to send tones during a call (e.g. IVR navigation). (PR #4313)
- Added `DailyOutputDTMFFrame` and `DailyOutputDTMFUrgentFrame` frames. In addition to the inherited `buttons`, they accept `session_id`, `digit_duration_ms`, and `method`, which are forwarded to Daily's `send_dtmf` as `sessionId`, `digitDurationMs`, and `method`. (PR #4313)
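The dial-string round trip described above is easy to picture. A minimal, self-contained sketch of the conversion — the `KeypadEntry` values and helper names here are illustrative stand-ins, not Pipecat's actual implementation:

```python
from enum import Enum


class KeypadEntry(Enum):
    """Illustrative stand-in for a DTMF keypad entry enum."""

    ONE = "1"
    TWO = "2"
    THREE = "3"
    STAR = "*"
    POUND = "#"


def buttons_from_string(dial_string: str) -> list[KeypadEntry]:
    # Map each character of a dial string ("123#") to a keypad entry,
    # mirroring what OutputDTMFFrame.from_string() would produce.
    return [KeypadEntry(ch) for ch in dial_string]


def buttons_to_string(buttons: list[KeypadEntry]) -> str:
    # Inverse mapping: join entry values back into a dial string,
    # mirroring to_string().
    return "".join(entry.value for entry in buttons)
```

The two helpers round-trip: `buttons_to_string(buttons_from_string("123#")) == "123#"`.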
- Added incremental `pyright` type checking. A `pyrightconfig.json` at the repo root uses `typeCheckingMode: "basic"` with an explicit `include` list of modules that pass cleanly (`clocks`, `metrics`, `transcriptions`, `frames`, `observers`, `extensions`, `turns`, `pipeline`, `runner`). Remaining modules will be added in subsequent PRs. CI enforces the checked set via `uv run pyright` in the format workflow. (PR #4324)
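A config along these lines matches the description above; the exact module paths are assumptions (the entry only names the modules), so treat this as a sketch rather than the repo's actual file:

```json
{
  "typeCheckingMode": "basic",
  "include": [
    "src/pipecat/clocks",
    "src/pipecat/metrics",
    "src/pipecat/transcriptions",
    "src/pipecat/frames",
    "src/pipecat/observers",
    "src/pipecat/extensions",
    "src/pipecat/turns",
    "src/pipecat/pipeline",
    "src/pipecat/runner"
  ]
}
```

Adding a module to `include` once it passes cleanly is all that is needed to bring it under CI enforcement.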
- Added multilingual support to `DeepgramFluxSTTService` via a new `language_hints: list[Language]` setting. Works with Deepgram's new `flux-general-multi` model to bias transcription across English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch. Omit the hints to use auto-detection, or pass a subset to bias toward expected languages. Hints can be updated mid-stream via `STTUpdateSettingsFrame` (sent as a Deepgram `Configure` control message, no reconnect) to support detect-then-lock flows. (PR #4326)
- Added fine-grained server-side VAD tuning options to `SarvamSTTService.Settings` for the `saaras:v3` model, including speech thresholds, frame-count controls, pre-speech padding, interruption sensitivity, and initial-frame skipping. (PR #4334)
- Added `XAISTTService` for real-time speech-to-text using xAI's voice STT WebSocket API (`wss://api.x.ai/v1/stt`). Streams raw audio (PCM, µ-law, or A-law) and emits interim and final transcription frames driven by the server's `is_final`/`speech_final` flags. Settings expose `interim_results`, `endpointing`, `language`, `multichannel`, `channels`, and `diarize`. Requires the `xai` optional extra (`pip install "pipecat-ai[xai]"`). (PR #4340)
- Added `XAITTSService` for streaming text-to-speech using xAI's WebSocket TTS endpoint (`wss://api.x.ai/v1/tts`). Streams `text.delta` chunks up and base64 `audio.delta` chunks down on the same connection, so audio begins flowing before the full utterance finishes synthesizing; complements the batch-HTTP `XAIHttpTTSService`. Defaults to raw PCM output so `TTSAudioRawFrame` needs no decoding. The `xai` optional extra now pulls in `pipecat-ai[websockets-base]`. (PR #4341)
- Added `SonioxTTSService`, a real-time WebSocket TTS service that streams text in and audio out over a persistent connection. Install with `pip install "pipecat-ai[soniox]"`. (PR #4360)
- Added support for Daily's built-in `screenVideo` destination in `DailyTransport`. When `"screenVideo"` is included in the `video_out_destinations` transport parameter, a dedicated screen video track is created at join time and frames with `transport_destination="screenVideo"` are routed to it.

  ```python
  params = DailyParams(
      video_out_enabled=True,
      video_out_is_live=True,
      video_out_width=1280,
      video_out_height=720,
      video_out_destinations=["screenVideo"],
  )
  ...
  frame = OutputImageRawFrame(...)
  frame.transport_destination = "screenVideo"
  ```

  (PR #4370)
- Added `camera_out_send_settings` to `DailyParams`. This dict is passed verbatim to the Daily client's camera publishing settings, allowing applications to fully control encoding, codec, bitrate, and framerate.

  ```python
  params = DailyParams(
      camera_out_send_settings={
          "maxQuality": "high",
          "encodings": {
              "high": {"maxBitrate": 2_000_000, "maxFramerate": 30}
          },
      },
  )
  ```

  (PR #4370)
- Added `tool_resources` to `PipelineTask` and `FunctionCallParams`. Pass an application-defined object (DB handles, clients, state, etc.) to `PipelineTask(..., tool_resources=...)` and access it from any tool handler via `params.tool_resources`. Passed by reference; the caller retains their handle and can read mutations after the task finishes. Resolves #4256. (PR #4371)
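The pass-by-reference detail is the key property: the task never copies the object, so handler-side mutations remain visible to the caller afterward. A self-contained sketch of that contract — the `ToolResources` bag and the `FakeParams`/handler shapes are illustrative, not Pipecat's types:

```python
from dataclasses import dataclass, field


@dataclass
class ToolResources:
    """Illustrative application-owned resources shared with tool handlers."""

    cache: dict[str, str] = field(default_factory=dict)


@dataclass
class FakeParams:
    """Stand-in for the params object a tool handler receives."""

    tool_resources: ToolResources


def lookup_order(params: FakeParams, order_id: str) -> str:
    # The handler mutates the shared resources object by reference.
    params.tool_resources.cache[order_id] = "shipped"
    return params.tool_resources.cache[order_id]


resources = ToolResources()
lookup_order(FakeParams(tool_resources=resources), "A-17")
# The caller still holds the same object, so the mutation is visible here.
```

In real code the equivalent would be `PipelineTask(..., tool_resources=resources)` with the handler reading `params.tool_resources`, and `resources` inspected after the task finishes.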
### Changed

- Updated NVIDIA STT services to align with Nemotron Speech defaults and configuration: `api_key` is now optional for local deployments, additional recognition settings are available (including alternatives, word offsets, and diarization), and the streaming/segmented docs now reflect the Nemotron Speech APIs. NVIDIA streaming STT now sets `TranscriptionFrame.finalized=True` when the provider marks a result as final, and preserves `language` on both `TranscriptionFrame` and `InterimTranscriptionFrame`. (PR #4269)
- Updated `NvidiaLLMService` to emit model reasoning as `LLMThought*Frame`s (from both `reasoning_content` and `<think>...</think>` output), avoid mixing reasoning text into normal assistant content, and allow keyless local NIM endpoints while warning when the cloud endpoint is used without an API key. (PR #4270)
- STT services now reconnect safely when settings change: reconnection is deferred until the current user turn ends (i.e., until `UserStoppedSpeakingFrame` is received) rather than interrupting an active speech session. Audio frames received while the reconnect is in progress are buffered and replayed once the new connection is ready. `CartesiaSTTService` and `DeepgramSTTService` both use this new behavior. (PR #4311)
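The defer-and-replay behavior can be pictured as a small state machine: while a reconnect is pending, incoming audio goes into a buffer, and once the new connection is up, the buffer is flushed in order before live audio resumes. A self-contained sketch — toy class, not Pipecat's implementation:

```python
class ReconnectingSender:
    """Buffers audio while a reconnect is pending and replays it once connected."""

    def __init__(self) -> None:
        self.connected = True
        self.pending: list[bytes] = []
        self.sent: list[bytes] = []

    def request_reconnect(self) -> None:
        # Settings changed mid-turn: mark the connection as down and start
        # buffering instead of interrupting the active speech session.
        self.connected = False

    def send_audio(self, chunk: bytes) -> None:
        if self.connected:
            self.sent.append(chunk)
        else:
            self.pending.append(chunk)  # held for later replay

    def on_connected(self) -> None:
        # New connection ready: replay buffered audio in arrival order.
        self.connected = True
        self.sent.extend(self.pending)
        self.pending.clear()


s = ReconnectingSender()
s.send_audio(b"a")
s.request_reconnect()
s.send_audio(b"b")  # buffered, not dropped
s.on_connected()    # b"b" is replayed
```

The point of the pattern is that no audio is lost across the settings change, at the cost of a short replay burst when the new socket opens.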
- Reduced debug log noise for LLM services. The system instruction is now logged once when composed (e.g. when turn completion is enabled) instead of on every LLM call. Per-call logs now show only the conversation messages, consistent across the Google, Anthropic, AWS, and OpenAI services. (PR #4314)
- `LiveKitRunnerArguments.token` is now a required `str` (previously `str | None` with a default of `None`). LiveKit requires a token to join a room, so the type now reflects reality. This only affects custom runners that construct `LiveKitRunnerArguments` directly; code consuming the argument from the standard runner is unaffected. (PR #4324)
- `TranscriptionFrame.language` and `InterimTranscriptionFrame.language` emitted by `DeepgramFluxSTTService` now reflect the language Deepgram detected for each turn (read from the `languages` field on Flux's `TurnInfo` event). On `flux-general-multi` this gives per-turn accuracy for downstream consumers (e.g. TTS voice selection). `flux-general-en` continues to emit `Language.EN`. (PR #4326)
- Added an `includes_inter_frame_spaces` parameter to `TTSService.add_word_timestamps` and `_add_word_timestamps` (default `None`). When `True`, downstream consumers will not inject additional spaces between tokens; `None` leaves each frame's own default unchanged. `InworldTTSService` now passes `includes_inter_frame_spaces=True` when reporting word timestamps, since Inworld tokens already include inter-word spacing. (PR #4330)
- `SarvamSTTService` now uses `saaras:v3` as its default model instead of `saarika:v2.5`. Applications that relied on the previous default should set `settings=SarvamSTTService.Settings(model="saarika:v2.5")` explicitly. (PR #4334)
- `SpeechTimeoutUserTurnStopStrategy` now waits only `user_speech_timeout` when a transcript arrives without a VAD stop event, rather than `max(ttfs_p99_latency, user_speech_timeout)`. If you had `ttfs_p99_latency > user_speech_timeout`, turn detection in that path is slightly faster than before. (PR #4337)
- If you use an STT service that emits finalized transcripts (Speechmatics, Soniox, Deepgram Flux, AssemblyAI) with `SpeechTimeoutUserTurnStopStrategy`, user turns now end as soon as `user_speech_timeout` elapses after VAD stop. Previously the strategy also waited for the STT P99 latency (`ttfs_p99_latency`) even when the transcript was already marked final. `user_speech_timeout` is still honored as a floor; STT finalization never shortens it. (PR #4337)
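The change in the two entries above reduces to which wait applies once a finalized transcript is in hand. A simplified sketch of old vs. new, treating the finalized-transcript path only (illustrative, not Pipecat's code):

```python
def old_wait(ttfs_p99_latency: float, user_speech_timeout: float, transcript_final: bool) -> float:
    # Previously: always padded by the STT P99 latency, even when the
    # transcript was already marked final.
    return max(ttfs_p99_latency, user_speech_timeout)


def new_wait(ttfs_p99_latency: float, user_speech_timeout: float, transcript_final: bool) -> float:
    # Now: a finalized transcript means only the user_speech_timeout
    # floor applies; non-final transcripts keep the latency padding.
    if transcript_final:
        return user_speech_timeout
    return max(ttfs_p99_latency, user_speech_timeout)
```

With `ttfs_p99_latency=2.0` and `user_speech_timeout=0.8`, a finalized turn now ends 1.2 s sooner than before.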
- ⚠️ `PlivoFrameSerializer` and `TelnyxFrameSerializer` now raise `ValueError` at construction when `auto_hang_up=True` (the default) but required credentials are missing, matching `TwilioFrameSerializer`. Previously they constructed successfully and the hangup failed silently at call end, leaving phantom billable sessions on the provider. If you relied on the old silent behavior, pass `auto_hang_up=False` explicitly or provide the credentials. The specific fields checked are `call_id`/`auth_id`/`auth_token` for Plivo and `call_control_id`/`api_key` for Telnyx. (PR #4349)
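The fail-fast pattern described above moves the failure from call end (silent) to construction time (loud). A minimal sketch of the check for the Plivo field set — toy class, not the actual serializer:

```python
class PlivoLikeSerializer:
    """Illustrative fail-fast credential check (not Pipecat's class)."""

    def __init__(self, call_id=None, auth_id=None, auth_token=None, auto_hang_up=True):
        if auto_hang_up and not (call_id and auth_id and auth_token):
            # Raise at construction instead of failing silently at call end.
            raise ValueError(
                "auto_hang_up=True requires call_id, auth_id, and auth_token; "
                "pass auto_hang_up=False to opt out of automatic hangup"
            )
        self.auto_hang_up = auto_hang_up
```

Constructing with `auto_hang_up=False` or with all credentials succeeds; anything else fails immediately, which is what prevents the phantom billable sessions.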
- `ToolsSchema(standard_tools=...)` now accepts any `Sequence[FunctionSchema | DirectFunction]` rather than requiring an exact `list` of the union. Callers can pass a narrower `list[FunctionSchema]` (or any other `Sequence`) without the type checker complaining about list invariance. (PR #4352)
- Updated the `aic-sdk` dependency to `~=2.2.0`. The `AIC_LICENSE_KEY` environment variable replaces the previous `AICOUSTICS_LICENSE_KEY`. (PR #4362)
- Loosened the `protobuf` dependency to `>=5.29.6,<7`, so projects pinned to protobuf 5.x can install `pipecat-ai` again. The previous `>=6.31.1,<7` pin (introduced in 1.0.8 alongside the `nvidia-riva-client 2.25.1` upgrade) silently blocked any environment whose dependency graph already constrained protobuf to the 5.x line. The bundled `frames_pb2.py` is now compiled with protoc 5.x so it imports cleanly on both 5.x and 6.x runtimes. Installing the `nvidia` extra still pulls protobuf 6.x: `nvidia-riva-client 2.25.1` ships gencode that requires a 6.x runtime, so `pipecat-ai[nvidia]` now declares `protobuf>=6.31.1,<7` explicitly to cover an upstream packaging gap (nvidia-riva/python-clients#172). (PR #4372)
- Daily rooms created by the development runner (`pipecat.runner.run`) now expire after 4 hours with `eject_at_room_exp=True`, mirroring Pipecat Cloud's max session limit. Previously, runner-created rooms inherited a 2-hour expiration on the default code paths and had no expiration at all when callers posted partial `dailyRoomProperties` (e.g. `{"start_video_off": true}`) to `/start`, causing rooms to accumulate indefinitely. Explicit `exp` and `eject_at_room_exp` values in `dailyRoomProperties` are still respected. (PR #4374)
- Updated the `daily-python` dependency to `~=0.28.0`. (PR #4379)
### Deprecated

- Deprecated `TransportParams.video_out_bitrate` for the Daily transport. Use `DailyParams.camera_out_send_settings` instead to configure camera publishing encodings (bitrate, framerate, codec, etc.). (PR #4370)
### Fixed

- Fixed missing tool handlers so unregistered tool calls fail with a normal final tool result instead of leaving tool-call state hanging. (PR #4301)
- Fixed `pipecat-ai[tavus]` not installing the required `daily-python` dependency. Installing the `tavus` extra now correctly pulls in `pipecat-ai[daily]`. (PR #4304)
- Fixed audio loss and potential errors when STT settings were updated mid-speech. Previously, `CartesiaSTTService` and `DeepgramSTTService` would immediately disconnect and reconnect when settings changed, dropping any in-flight audio. Reconnection is now deferred until the user stops speaking, and audio arriving during the reconnect window is buffered and replayed. (PR #4311)
- Fixed the `SmallestTTSService` WebSocket endpoint URL to match the Smallest AI v4.0.0 API (`wss://waves-api.smallest.ai` → `wss://api.smallest.ai`) and restored keepalive using a silent space message instead of the unsupported flush command. (PR #4320)
- Fixed whitespace handling in TTS token streaming mode. Inter-token whitespace (e.g., spaces between words) is now preserved for correct prosody, while leading whitespace before the first non-whitespace token is still stripped to avoid issues with TTS models that are sensitive to leading spaces. (PR #4323)
- Fixed `SentryMetrics` silently dropping `MetricsFrame`s from `stop_ttfb_metrics` and `stop_processing_metrics`. `SentryMetrics` called the base `FrameProcessorMetrics` implementation but discarded its return value, so `FrameProcessor` never pushed the `MetricsFrame` downstream. This prevented observers (e.g. `UserBotLatencyObserver`, `MetricsLogObserver`) from seeing TTFB and processing metrics for any service using `metrics=SentryMetrics()`. The metrics were still calculated and Sentry transactions still completed; only the downstream frame push was affected. (PR #4325)
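This bug class is easy to reproduce: a subclass wraps a base method for a side effect but forgets to return the base result, so the caller sees `None`. A minimal illustration with toy classes (not Pipecat's actual hierarchy):

```python
class BaseMetrics:
    def stop_ttfb_metrics(self):
        # The base implementation returns the frame the caller must push downstream.
        return {"type": "MetricsFrame", "ttfb": 0.42}

    def record_to_sentry(self, frame):
        pass  # stand-in for completing a Sentry transaction


class BuggyMetrics(BaseMetrics):
    def stop_ttfb_metrics(self):
        frame = super().stop_ttfb_metrics()
        self.record_to_sentry(frame)  # side effect still happens
        # Bug: no `return frame`, so the caller receives None and never
        # pushes the MetricsFrame downstream.


class FixedMetrics(BaseMetrics):
    def stop_ttfb_metrics(self):
        frame = super().stop_ttfb_metrics()
        self.record_to_sentry(frame)
        return frame  # fix: propagate the frame to the caller
```

Since the side effect still runs in the buggy version, the failure is invisible in Sentry and only shows up as missing downstream metrics, which is exactly what made it silent.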
- Fixed `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` emitting word timestamps and `TTSTextFrame` content that matched the input text instead of the spoken audio when a pronunciation dictionary (`pronunciation_dictionary_locators`) or text normalization rewrote the input. Both services now consume ElevenLabs' normalized alignment, so downstream consumers (captions, transcripts, context aggregation) reflect what the listener actually hears. (PR #4344)
- Fixed a crash in `DeepgramSTTService` when an `STTUpdateSettingsFrame` arrived before the WebSocket handshake completed (for example, when pushing an update upstream on `StartFrame`). The settings-triggered reconnect cancelled the in-flight connection task before its keepalive task was created, causing an `UnboundLocalError: cannot access local variable 'keepalive_task'` in the handler's `finally` block. (PR #4347)
- Fixed direct-function registration crashing for functions without a docstring. `DirectFunctionWrapper` passed `inspect.getdoc()`'s result to `docstring_parser.parse()`, which raises when the docstring is `None`. Functions now register cleanly whether or not they have a docstring; an empty docstring produces empty description and parameter metadata as expected. (PR #4352)
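The underlying hazard is that `inspect.getdoc()` returns `None` for undocumented functions, so anything feeding it straight into a parser needs a guard. A stdlib-only sketch of the safe pattern (the `extract_doc` helper is illustrative; the real code hands the result to `docstring_parser.parse()`):

```python
import inspect


def extract_doc(fn) -> str:
    # inspect.getdoc() returns None when fn has no docstring; coerce to ""
    # before handing it to any docstring parser that rejects None.
    return inspect.getdoc(fn) or ""


def documented(x: int) -> int:
    """Double x."""
    return 2 * x


def undocumented(x: int) -> int:
    return 2 * x
```

With the guard, an undocumented function yields an empty string, which parses to empty description and parameter metadata instead of raising.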
- Fixed `AssemblyAISTTService`, `CartesiaSTTService`, `GradiumSTTService`, and `SonioxSTTService` crashing the pipeline on transient WebSocket send failures. Each `run_stt` sent audio directly without catching errors, so a single network hiccup mid-stream raised an uncaught exception through `process_frame`. The guards now log a warning and let the connection-state check on the next call handle recovery, matching the pattern used by Deepgram, xAI, Azure, and other push-based STTs. (PR #4352)
- Fixed Gemini Live losing conversation history in the (rare) case of a WebSocket reconnect before any session resumption handle is received. When the session reconnects (e.g. on a system instruction change), conversation history is now re-seeded into the new session before it is marked ready for input. (PR #4355)
- Fixed the SmallWebRTC data channel silently stalling on networks with a 1280-byte MTU (IPv6, Tailscale overlays, many consumer VPNs). aiortc's default SCTP chunk size of 1200 bytes produces ~1305-byte UDP datagrams after headers, which the kernel rejects with `EMSGSIZE`; aiortc has no path-MTU discovery, so it retransmits forever at the same oversized size. The chunk size is now clamped to 1100 bytes (~1205-byte datagrams, ~75 bytes of slack). Override with `PIPECAT_SCTP_MAX_CHUNK_SIZE` if your path MTU requires a different value. (PR #4358)
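The arithmetic behind the clamp: the entry implies roughly 105 bytes of DTLS/SCTP/UDP framing on top of each chunk (1200 → ~1305, 1100 → ~1205). A sketch of the sizing check and the env-var override; the overhead constant is inferred from those numbers, not taken from aiortc:

```python
import os

HEADER_OVERHEAD = 105  # inferred: 1200-byte chunks -> ~1305-byte datagrams


def datagram_size(chunk_size: int) -> int:
    # Approximate on-the-wire UDP datagram size for one SCTP chunk.
    return chunk_size + HEADER_OVERHEAD


def effective_chunk_size(default: int = 1100) -> int:
    # PIPECAT_SCTP_MAX_CHUNK_SIZE overrides the clamped default when the
    # path MTU calls for a different value.
    return int(os.environ.get("PIPECAT_SCTP_MAX_CHUNK_SIZE", default))


# The old 1200-byte default overflows a 1280-byte path MTU and draws
# EMSGSIZE; the 1100-byte clamp fits with ~75 bytes of slack.
assert datagram_size(1200) > 1280
assert datagram_size(1100) <= 1280
```

Setting `PIPECAT_SCTP_MAX_CHUNK_SIZE=900` in the environment would, for example, accommodate paths with an even smaller MTU.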