### Added

- Added media resolution control to `GeminiMultimodalLiveLLMService` with the `GeminiMediaResolution` enum, allowing configuration of token usage for image processing (`LOW`: 64 tokens, `MEDIUM`: 256 tokens, `HIGH`: zoomed reframing with 256 tokens).
- Added Gemini's Voice Activity Detection (VAD) configuration to `GeminiMultimodalLiveLLMService` with `GeminiVADParams`, allowing fine control over speech detection sensitivity and timing, including:
  - Start sensitivity (how quickly speech is detected)
  - End sensitivity (how quickly turns end after pauses)
  - Prefix padding (milliseconds of audio to keep before speech is detected)
  - Silence duration (milliseconds of silence required to end a turn)
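As a rough illustration of what these new Gemini configuration knobs represent, here is a minimal sketch re-created from the changelog text alone. The field names (`start_sensitivity`, `prefix_padding_ms`, etc.) and the commented-out constructor call are assumptions for illustration, not Pipecat's actual API.

```python
# Illustrative sketch only: the real GeminiMediaResolution and GeminiVADParams
# live in Pipecat; the field names below are inferred from the changelog
# descriptions and may differ from the actual API.
from dataclasses import dataclass
from enum import Enum


class GeminiMediaResolution(Enum):
    """Controls token usage per processed image (values per the changelog)."""
    LOW = "LOW"        # 64 tokens per image
    MEDIUM = "MEDIUM"  # 256 tokens per image
    HIGH = "HIGH"      # zoomed reframing, 256 tokens per image


@dataclass
class GeminiVADParams:
    """Speech-detection tuning knobs described in the changelog."""
    start_sensitivity: str = "HIGH"   # how quickly speech is detected
    end_sensitivity: str = "HIGH"     # how quickly turns end after pauses
    prefix_padding_ms: int = 300      # audio kept before detected speech
    silence_duration_ms: int = 500    # silence required to end a turn


# A service configured this way might look roughly like (hypothetical call):
# llm = GeminiMultimodalLiveLLMService(
#     api_key=...,
#     media_resolution=GeminiMediaResolution.MEDIUM,
#     vad=GeminiVADParams(silence_duration_ms=800),
# )
vad = GeminiVADParams(silence_duration_ms=800)
print(vad.silence_duration_ms)  # → 800
```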
- Added comprehensive language support to `GeminiMultimodalLiveLLMService`, supporting over 30 languages via the `language` parameter, with proper mapping between Pipecat's `Language` enum and Gemini's language codes.
- Added support in `SmallWebRTCTransport` to detect when remote tracks are muted.
- Added support for image capture from a video stream to the `SmallWebRTCTransport`.
- Added a new iOS client option to the `SmallWebRTCTransport` video-transform example.
- Added new processors `ProducerProcessor` and `ConsumerProcessor`. The producer processor processes frames from the pipeline and decides whether the consumers should consume them or not. If so, the same frame received by the producer is sent to the consumers. There can be multiple consumers per producer. These processors are useful for pushing frames from one part of a pipeline to a different one (e.g. when using `ParallelPipeline`).
- Improvements for the `SmallWebRTCTransport`:
  - Wait until the pipeline is ready before triggering the `connected` event.
  - Queue messages if the data channel is not ready.
  - Update the aiortc dependency to fix an issue where the 'video/rtx' MIME type was incorrectly handled as a codec retransmission.
  - Avoid initial video delays.
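To illustrate the language-mapping idea mentioned above, here is a hedged sketch: a tiny `Language` enum and lookup table standing in for Pipecat's actual `Language` enum and its internal mapping. Only a handful of the 30+ supported languages are shown, and `language_to_gemini_code` is a hypothetical helper, not a Pipecat function.

```python
# Sketch of the Language-enum-to-Gemini-code mapping idea. The enum members,
# the table, and the helper below are illustrative, not Pipecat's internals.
from enum import Enum


class Language(Enum):
    EN_US = "en-US"
    ES = "es"
    FR = "fr"
    DE = "de"


# Gemini expects BCP-47-style language codes, so a mapping table translates
# enum members to the codes Gemini accepts.
GEMINI_LANGUAGE_CODES = {
    Language.EN_US: "en-US",
    Language.ES: "es-ES",
    Language.FR: "fr-FR",
    Language.DE: "de-DE",
}


def language_to_gemini_code(language: Language) -> str:
    """Return the Gemini language code for a Language member."""
    return GEMINI_LANGUAGE_CODES[language]


print(language_to_gemini_code(Language.FR))  # → fr-FR
```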
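The producer/consumer pattern described above can be sketched with plain `asyncio` queues. The `Producer` class below is an illustrative stand-in, not Pipecat's `ProducerProcessor` API: a predicate decides which frames get copied to every attached consumer queue, while all frames continue down the producer's own path.

```python
# Minimal sketch of the producer/consumer idea using a simplified frame model;
# Pipecat's real ProducerProcessor/ConsumerProcessor are pipeline processors
# with a different API, so treat the names here as illustrative.
import asyncio


class Producer:
    """Filters frames and forwards accepted ones to all attached consumers."""

    def __init__(self, should_consume):
        self.should_consume = should_consume  # predicate deciding what to share
        self.queues = []                      # one queue per attached consumer

    def new_consumer_queue(self):
        queue = asyncio.Queue()
        self.queues.append(queue)
        return queue

    async def process(self, frame):
        # Every frame continues through the producer's own pipeline; frames
        # passing the predicate are also copied to all consumer queues.
        if self.should_consume(frame):
            for queue in self.queues:
                await queue.put(frame)
        return frame


async def main():
    producer = Producer(should_consume=lambda f: f["type"] == "text")
    consumer_queue = producer.new_consumer_queue()

    await producer.process({"type": "audio", "data": b"\x00"})  # filtered out
    await producer.process({"type": "text", "data": "hello"})   # shared

    frame = await consumer_queue.get()
    print(frame["data"])  # → hello


asyncio.run(main())
```

This mirrors the use case the changelog describes: one branch of a `ParallelPipeline` produces frames, and another branch consumes a filtered copy of them.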
### Changed

- In `GeminiMultimodalLiveLLMService`, removed the `transcribe_model_audio` parameter in favor of Gemini Live's native output transcription support. Text transcriptions are now produced directly by the model; no configuration is required.
- Updated `GeminiMultimodalLiveLLMService`'s default `model` to `models/gemini-2.0-flash-live-001` and `base_url` to the `v1beta` websocket URL.
### Fixed

- Updated `daily-python` to 0.17.0 to fix an issue that prevented running on older platforms.
- Fixed an issue where `CartesiaTTSService`'s spell feature would result in the spelled word appearing in the context as "F,O,O,B,A,R" instead of "FOOBAR".
- Fixed an issue in the Azure TTS services where the language was being set incorrectly.
- Fixed `SmallWebRTCTransport` to support dynamic values for `TransportParams.audio_out_10ms_chunks`. Previously, it only worked with 20ms chunks.
- Fixed an issue with `GeminiMultimodalLiveLLMService` where the assistant context messages had no spaces between words.
- Fixed an issue where `LLMAssistantContextAggregator` would prevent a `BotStoppedSpeakingFrame` from moving through the pipeline.
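For context on the `audio_out_10ms_chunks` fix, the arithmetic behind 10ms PCM chunk sizes can be sketched as follows. The `audio_write_size` helper is hypothetical, not Pipecat's code; it just shows why the number of 10ms chunks changes the byte size of each audio write.

```python
# Sketch of the 10ms-chunk arithmetic: audio_out_10ms_chunks counts how many
# 10ms chunks are written at a time, so the byte size of each write depends on
# sample rate, channel count, and sample width. Illustrative helper only.

def audio_write_size(sample_rate: int, num_channels: int,
                     bytes_per_sample: int, audio_out_10ms_chunks: int) -> int:
    """Bytes per write for N chunks of 10 ms of raw PCM audio."""
    samples_per_10ms = sample_rate // 100  # 10 ms = 1/100 of a second
    return samples_per_10ms * num_channels * bytes_per_sample * audio_out_10ms_chunks


# 16 kHz mono 16-bit audio, written 2 chunks (20 ms) at a time:
print(audio_write_size(16000, 1, 2, 2))  # → 640
```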