### Added

- Added two new input parameters to `RimeTTSService`: `pause_between_brackets` and `phonemize_between_brackets`.
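A sketch of how the new parameters are wired in. The surrounding constructor arguments and the import path are illustrative; check the `RimeTTSService` reference for your Pipecat version:

```python
from pipecat.services.rime.tts import RimeTTSService  # import path may vary by version

tts = RimeTTSService(
    api_key="...",
    voice_id="...",
    # New parameters (defaults are an assumption; see the Rime docs):
    pause_between_brackets=True,      # interpret bracketed markup as pauses
    phonemize_between_brackets=True,  # interpret bracketed text as phonemes
)
```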
- Added support for cross-platform local smart turn detection. You can use `LocalSmartTurnAnalyzer` for on-device inference using Torch.
- `BaseOutputTransport` now allows multiple destinations if the transport implementation supports it (e.g. Daily's custom tracks). With multiple destinations, it is possible to send different audio or video tracks simultaneously with a single transport. To do that, set the new `Frame.transport_destination` field to your desired destination (e.g. a custom track name), tell the transport you want a new destination with `TransportParams.audio_out_destinations` or `TransportParams.video_out_destinations`, and the transport should take care of the rest.
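A minimal sketch of the wiring described above, assuming Daily custom tracks (track names and surrounding setup are illustrative):

```python
# Declare the extra destinations up front so the transport can create them.
params = DailyParams(
    audio_out_enabled=True,
    audio_out_destinations=["spanish", "french"],  # custom track names
)

# Later, stamp each outgoing frame with the destination it belongs to;
# frames without a destination go to the default track.
frame = TTSAudioRawFrame(audio=pcm_bytes, sample_rate=24000, num_channels=1)
frame.transport_destination = "spanish"
```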
- Similar to the new `Frame.transport_destination`, there's a new `Frame.transport_source` field which is set by the `BaseInputTransport` if the incoming data comes from a non-default source (e.g. custom tracks).
- `TTSService` has a new `transport_destination` constructor parameter, which is used to update the `Frame.transport_destination` field of each generated `TTSAudioRawFrame`. This allows sending multiple bots' audio to multiple destinations in the same pipeline.
- Added `DailyTransportParams.camera_out_enabled` and `DailyTransportParams.microphone_out_enabled`, which allow you to enable or disable the main output camera or microphone tracks. This is useful if you only want to use custom tracks and not send the main tracks. Note that you still need `audio_out_enabled=True` or `video_out_enabled=True`.
- Added `DailyTransport.capture_participant_audio()`, which allows you to capture an audio source (e.g. "microphone", "screenAudio", or a custom track name) from a remote participant.
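A hedged usage sketch (the event handler name is standard Daily transport usage; the exact `capture_participant_audio()` signature may differ by version):

```python
@transport.event_handler("on_participant_joined")
async def on_participant_joined(transport, participant):
    # Capture the remote participant's screen-share audio; "screenAudio"
    # can be swapped for "microphone" or a custom track name.
    await transport.capture_participant_audio(participant["id"], "screenAudio")
```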
- Added `DailyTransport.update_publishing()`, which allows you to update the call's video and audio publishing settings (e.g. audio and video quality).
- Added `RTVIObserverParams`, which allows you to configure which RTVI messages are sent to the clients.
- Added a `context_window_compression` `InputParam` to `GeminiMultimodalLiveLLMService`, which allows you to enable a sliding context window for the session as well as set the token limit of the sliding window.
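A configuration sketch; the nested class and field names below (`InputParams`, `ContextWindowCompressionParams`, `enabled`, `trigger_tokens`) are assumptions — verify them against the service reference:

```python
llm = GeminiMultimodalLiveLLMService(
    api_key="...",
    params=InputParams(
        context_window_compression=ContextWindowCompressionParams(
            enabled=True,          # turn on the sliding context window
            trigger_tokens=16000,  # token limit of the sliding window
        )
    ),
)
```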
- Updated `SmallWebRTCConnection` to support `ice_servers` with credentials.
- Added `VADUserStartedSpeakingFrame` and `VADUserStoppedSpeakingFrame`, indicating when the VAD detected the user starting and stopping speaking. These events are helpful when using smart turn detection, as the time the user stops speaking can differ from when their turn ends (signaled by `UserStoppedSpeakingFrame`).
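To illustrate why the two stop events can diverge, here is a self-contained sketch using hypothetical stand-ins for the real frames (the actual classes live in Pipecat and carry more fields):

```python
import time
from dataclasses import dataclass, field

# Hypothetical stand-ins for the real Pipecat frames, just to show the timing gap.
@dataclass
class VADUserStoppedSpeakingFrame:
    timestamp: float = field(default_factory=time.time)

@dataclass
class UserStoppedSpeakingFrame:
    timestamp: float = field(default_factory=time.time)

def turn_tail_ms(vad_stop: VADUserStoppedSpeakingFrame,
                 turn_stop: UserStoppedSpeakingFrame) -> float:
    """Milliseconds between the VAD detecting silence and the smart turn
    model deciding the user's turn actually ended."""
    return (turn_stop.timestamp - vad_stop.timestamp) * 1000.0

# The VAD hears silence at t=10.00s; smart turn ends the turn at t=10.35s.
print(turn_tail_ms(VADUserStoppedSpeakingFrame(timestamp=10.00),
                   UserStoppedSpeakingFrame(timestamp=10.35)))  # ≈ 350 ms
```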
- Added `TranslationFrame`, a new frame type that contains a translated transcription.
- Added `TransportParams.audio_in_passthrough`. If set (the default), incoming audio will be pushed downstream.
- Added `MCPClient`, a way to connect to MCP servers and use their tools.
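A rough sketch of connecting to a stdio MCP server and exposing its tools to the LLM. The import paths and the `register_tools()` call are assumptions; consult the `MCPClient` reference:

```python
from mcp import StdioServerParameters  # from the MCP SDK
from pipecat.services.mcp_service import MCPClient  # import path may vary

mcp = MCPClient(
    server_params=StdioServerParameters(
        command="npx",
        args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
    )
)
# Register the server's tools with the LLM service so the bot can call them.
tools = await mcp.register_tools(llm)
```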
- Added Mem0 OSS support: along with the existing Mem0 cloud support, the OSS version is now also available.
### Changed

- `TransportParams.audio_out_mixer` now supports a single mixer as well as a dictionary providing a mixer per destination. For example:

  ```python
  audio_out_mixer={
      "track-1": SoundfileMixer(...),
      "track-2": SoundfileMixer(...),
      "track-N": SoundfileMixer(...),
  },
  ```
- The `STTMuteFilter` now mutes `InterimTranscriptionFrame` and `TranscriptionFrame`, which allows the `STTMuteFilter` to be used in conjunction with transports that generate transcripts, e.g. `DailyTransport`.
- Function calls now receive a single `FunctionCallParams` parameter instead of `(function_name, tool_call_id, args, llm, context, result_callback)`, which is now deprecated.
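A self-contained migration sketch. The `FunctionCallParams` dataclass below is a hypothetical stand-in mirroring the fields named above; the real class ships with Pipecat and may carry more:

```python
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, Optional

# Hypothetical stand-in for Pipecat's FunctionCallParams.
@dataclass
class FunctionCallParams:
    function_name: str
    tool_call_id: str
    arguments: Dict[str, Any]
    llm: Any = None
    context: Any = None
    result_callback: Optional[Callable[[Any], Awaitable[None]]] = None

# Old style (deprecated): six positional parameters.
async def get_weather_old(function_name, tool_call_id, args, llm, context, result_callback):
    await result_callback({"conditions": "sunny"})

# New style: everything arrives on a single params object.
async def get_weather(params: FunctionCallParams):
    city = params.arguments.get("city", "unknown")
    await params.result_callback({"city": city, "conditions": "sunny"})
```

The handler body stays the same; only the signature changes, which keeps future additions to the call context backward compatible.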
- Changed the user aggregator timeout for late transcriptions from 1.0s to 0.5s (`LLMUserAggregatorParams.aggregation_timeout`). Sometimes, STT services might give us more than one transcription, and some can arrive after the user has stopped speaking. We still want to aggregate these additional transcriptions with the first one, since they are part of the same user turn. This is what this timeout helps with.
- Short utterances not detected by the VAD while the bot is speaking are now ignored. This significantly reduces the number of bot interruptions, providing a more natural conversation experience.
- Updated `GladiaSTTService` to output a `TranslationFrame` when specifying a `translation` and `translation_config`.
- STT services now pass through audio frames by default. This allows you to add audio recording without having to debug your pipeline when it doesn't work the first time.
- Input transports now always push audio downstream, unless disabled with `TransportParams.audio_in_passthrough`. After many Pipecat releases, we realized this is the common use case. There are cases where the input transport already provides STT and you don't want recordings either, in which case there's no need to push audio to the rest of the pipeline, but this is not very common.
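To opt out (for example, when the transport already runs STT and no recordings are needed), a minimal sketch:

```python
params = TransportParams(
    audio_in_enabled=True,
    audio_in_passthrough=False,  # keep raw audio out of the rest of the pipeline
)
```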
- Added `RivaSegmentedSTTService`, which allows Riva offline/batch models, such as "canary-1b-asr", to be used in Pipecat.
### Deprecated

- Function calls with the parameters `(function_name, tool_call_id, args, llm, context, result_callback)` are deprecated; use a single `FunctionCallParams` parameter instead.
- `TransportParams.camera_*` parameters are now deprecated, use `TransportParams.video_*` instead.
- `TransportParams.vad_enabled` parameter is now deprecated, use `TransportParams.audio_in_enabled` and `TransportParams.vad_analyzer` instead.
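A migration sketch, assuming the Silero VAD analyzer shipped with Pipecat:

```python
from pipecat.audio.vad.silero import SileroVADAnalyzer

# Before (deprecated):
#   params = TransportParams(vad_enabled=True)
# After:
params = TransportParams(
    audio_in_enabled=True,
    vad_analyzer=SileroVADAnalyzer(),
)
```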
- `TransportParams.vad_audio_passthrough` parameter is now deprecated, use `TransportParams.audio_in_passthrough` instead.
- `ParakeetSTTService` is now deprecated, use `RivaSTTService` instead, which uses the model "parakeet-ctc-1.1b-asr" by default.
- `FastPitchTTSService` is now deprecated, use `RivaTTSService` instead, which uses the model "magpie-tts-multilingual" by default.
### Fixed

- Fixed an issue with `SimliVideoService` where the bot was continuously outputting audio, which prevented the `BotStoppedSpeakingFrame` from being emitted.
- Fixed an issue where `OpenAIRealtimeBetaLLMService` would add two assistant messages to the context.
- Fixed an issue with `GeminiMultimodalLiveLLMService` where the context contained tokens instead of words.
- Fixed an issue with HTTP Smart Turn handling where the service returns a 500 error. Previously, this would cause an unhandled exception. Now, a 500 error is treated as an incomplete response.
- Fixed a TTS services issue that could cause assistant output not to be aggregated to the context when also using `TTSSpeakFrame`s.
- Fixed an issue where `SmartTurnMetricsData` was reporting 0ms for inference and processing time when using the `FalSmartTurnAnalyzer`.
### Other

- Added `examples/daily-custom-tracks` to show how to send and receive Daily custom tracks.
- Added `examples/daily-multi-translation` to showcase how to send multiple simultaneous translations with the same transport.
- Added 04 foundational examples for client/server transports. Also, renamed `29-livekit-audio-chat.py` to `04b-transports-livekit.py`.
- Added foundational example `13c-gladia-translation.py`, showing how to use `TranscriptionFrame` and `TranslationFrame`.