### Added

- Added media resolution control to `GeminiMultimodalLiveLLMService` with the
  `GeminiMediaResolution` enum, allowing configuration of token usage for
  image processing (`LOW`: 64 tokens, `MEDIUM`: 256 tokens, `HIGH`: zoomed
  reframing with 256 tokens).
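To make the resolution trade-off concrete, the per-image token costs above can be turned into a rough budget estimate. The helper below is a hypothetical sketch (not part of the Pipecat API); only the token counts come from the release notes:

```python
# Hypothetical helper illustrating the per-image token cost of each
# GeminiMediaResolution setting (LOW: 64, MEDIUM: 256, HIGH: 256 with
# zoomed reframing). Not part of the Pipecat API.
TOKENS_PER_IMAGE = {"LOW": 64, "MEDIUM": 256, "HIGH": 256}

def estimate_image_tokens(resolution: str, num_images: int) -> int:
    """Estimate total vision tokens for a batch of images."""
    return TOKENS_PER_IMAGE[resolution] * num_images

# E.g., 30 seconds of a 1 fps video stream:
print(estimate_image_tokens("LOW", 30))     # 1920
print(estimate_image_tokens("MEDIUM", 30))  # 7680
```

At one frame per second, the difference between `LOW` and `MEDIUM` is roughly 4x the vision token spend.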
- Added Gemini's Voice Activity Detection (VAD) configuration to
  `GeminiMultimodalLiveLLMService` with `GeminiVADParams`, allowing fine
  control over speech detection sensitivity and timing, including:
  - Start sensitivity (how quickly speech is detected)
  - End sensitivity (how quickly turns end after pauses)
  - Prefix padding (milliseconds of audio to keep before speech is detected)
  - Silence duration (milliseconds of silence required to end a turn)
- Added comprehensive language support to `GeminiMultimodalLiveLLMService`,
  supporting over 30 languages via the `language` parameter, with proper
  mapping between Pipecat's `Language` enum and Gemini's language codes.
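The mapping works roughly like the sketch below: enum values are translated to the region-qualified codes Gemini expects. The enum members and mapping table here are illustrative stand-ins, not Pipecat's actual tables:

```python
from enum import Enum

# Illustrative stand-in for Pipecat's Language enum (hypothetical subset).
class Language(Enum):
    EN = "en"
    ES = "es"
    FR = "fr"

# Hypothetical mapping from enum members to Gemini language codes.
GEMINI_LANGUAGE_CODES = {
    Language.EN: "en-US",
    Language.ES: "es-ES",
    Language.FR: "fr-FR",
}

def to_gemini_language(language: Language) -> str:
    """Translate a Language enum member to Gemini's language code."""
    return GEMINI_LANGUAGE_CODES[language]

print(to_gemini_language(Language.FR))  # fr-FR
```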
- Added support in `SmallWebRTCTransport` to detect when remote tracks are
  muted.
- Added support for image capture from a video stream to the
  `SmallWebRTCTransport`.
- Added a new iOS client option to the `SmallWebRTCTransport` video-transform
  example.
- Added new processors `ProducerProcessor` and `ConsumerProcessor`. The
  producer processor processes frames from the pipeline and decides whether
  the consumers should consume each frame. If so, the same frame that is
  received by the producer is sent to the consumer. There can be multiple
  consumers per producer. These processors are useful for pushing frames from
  one part of a pipeline to a different one (e.g., when using
  `ParallelPipeline`).
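The producer/consumer pattern can be sketched with plain `asyncio` queues: a producer applies a predicate to each frame and, on a match, fans the frame out to every registered consumer. This is a simplified stand-in for the new processors, not their actual implementation:

```python
import asyncio

# Simplified stand-in for ProducerProcessor/ConsumerProcessor: the producer
# inspects each frame and fans matching frames out to all registered
# consumer queues (one queue per consumer).
class Producer:
    def __init__(self, should_share):
        self.should_share = should_share  # predicate deciding what to share
        self.consumers: list[asyncio.Queue] = []

    def add_consumer(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self.consumers.append(queue)
        return queue

    async def process(self, frame):
        if self.should_share(frame):
            for queue in self.consumers:
                await queue.put(frame)  # same frame goes to every consumer

async def main():
    producer = Producer(should_share=lambda f: f.startswith("audio:"))
    consumer_a = producer.add_consumer()
    consumer_b = producer.add_consumer()

    for frame in ["audio:hello", "text:ignored", "audio:world"]:
        await producer.process(frame)

    print(consumer_a.qsize(), consumer_b.qsize())  # 2 2
    print(await consumer_a.get())  # audio:hello

asyncio.run(main())
```

In a real pipeline the consumers would sit in a different branch (e.g., inside a `ParallelPipeline`) and re-emit the frames they receive there.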
- Improvements for the `SmallWebRTCTransport`:
  - Wait until the pipeline is ready before triggering the `connected` event.
  - Queue messages if the data channel is not ready.
  - Update the aiortc dependency to fix an issue where the 'video/rtx' MIME
    type was incorrectly handled as a codec retransmission.
  - Avoid initial video delays.
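Queueing messages until the data channel is ready follows a common pattern: buffer outgoing sends while the channel is still connecting, then flush in order once it opens. A minimal sketch of that pattern (a hypothetical stand-in, not the transport's actual code):

```python
# Minimal sketch of "queue messages until the data channel is ready".
# BufferedChannel is a hypothetical stand-in for a WebRTC data channel,
# not SmallWebRTCTransport internals.
class BufferedChannel:
    def __init__(self):
        self.ready = False
        self._pending: list[str] = []
        self.sent: list[str] = []

    def send(self, message: str) -> None:
        if self.ready:
            self.sent.append(message)
        else:
            self._pending.append(message)  # buffer until the channel opens

    def on_open(self) -> None:
        self.ready = True
        for message in self._pending:  # flush buffered messages in order
            self.sent.append(message)
        self._pending.clear()

channel = BufferedChannel()
channel.send("hello")  # queued: channel not ready yet
channel.on_open()      # flushes the queue
channel.send("world")  # sent immediately
print(channel.sent)    # ['hello', 'world']
```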
### Changed

- In `GeminiMultimodalLiveLLMService`, removed the `transcribe_model_audio`
  parameter in favor of Gemini Live's native output transcription support.
  Text transcriptions are now produced directly by the model; no configuration
  is required.
- Updated `GeminiMultimodalLiveLLMService`'s default `model` to
  `models/gemini-2.0-flash-live-001` and `base_url` to the `v1beta` websocket
  URL.
### Fixed

- Updated `daily-python` to 0.17.0 to fix an issue that prevented running on
  older platforms.
- Fixed an issue where `CartesiaTTSService`'s spell feature would result in
  the spelled word in the context appearing as "F,O,O,B,A,R" instead of
  "FOOBAR".
- Fixed an issue in the Azure TTS services where the language was being set
  incorrectly.
- Fixed `SmallWebRTCTransport` to support dynamic values for
  `TransportParams.audio_out_10ms_chunks`. Previously, it only worked with
  20ms chunks.
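For context on `audio_out_10ms_chunks`: the parameter counts 10 ms chunks, so the byte size of each audio write depends on the sample rate and channel count. A back-of-the-envelope calculation, assuming 16-bit PCM (illustrative only, not transport internals):

```python
# Back-of-the-envelope chunk sizing for audio_out_10ms_chunks, assuming
# 16-bit (2-byte) PCM samples. Illustrative only; not transport internals.
def chunk_bytes(num_10ms_chunks: int, sample_rate: int, channels: int) -> int:
    samples_per_10ms = sample_rate // 100  # 10 ms worth of samples
    bytes_per_sample = 2 * channels        # 16-bit PCM
    return num_10ms_chunks * samples_per_10ms * bytes_per_sample

# Two 10 ms chunks (= 20 ms) of 16 kHz mono audio:
print(chunk_bytes(2, 16000, 1))  # 640
```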
- Fixed an issue with `GeminiMultimodalLiveLLMService` where the assistant
  context messages had no space between words.
- Fixed an issue where `LLMAssistantContextAggregator` would prevent a
  `BotStoppedSpeakingFrame` from moving through the pipeline.