pipecat-ai/pipecat v0.0.63

Added

  • Added media resolution control to GeminiMultimodalLiveLLMService with
    GeminiMediaResolution enum, allowing configuration of token usage for
    image processing (LOW: 64 tokens, MEDIUM: 256 tokens, HIGH: zoomed reframing
    with 256 tokens).
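
A minimal configuration sketch; the import path and the media_resolution
field on InputParams are assumptions based on this entry:

```python
from pipecat.services.gemini_multimodal_live.gemini import (
    GeminiMediaResolution,
    GeminiMultimodalLiveLLMService,
    InputParams,
)

# LOW caps image processing at 64 tokens per frame; MEDIUM uses 256,
# and HIGH adds zoomed reframing at 256 tokens.
llm = GeminiMultimodalLiveLLMService(
    api_key="GOOGLE_API_KEY",  # placeholder
    params=InputParams(media_resolution=GeminiMediaResolution.LOW),
)
```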

  • Added Gemini's Voice Activity Detection (VAD) configuration to
    GeminiMultimodalLiveLLMService with GeminiVADParams, allowing fine
    control over speech detection sensitivity and timing, including:

    • Start sensitivity (how quickly speech is detected)
    • End sensitivity (how quickly turns end after pauses)
    • Prefix padding (milliseconds of audio to keep before speech is detected)
    • Silence duration (milliseconds of silence required to end a turn)
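
A hedged sketch of the VAD configuration; the exact GeminiVADParams field
names and value types are assumptions inferred from the options listed
above:

```python
from pipecat.services.gemini_multimodal_live.gemini import (
    GeminiMultimodalLiveLLMService,
    GeminiVADParams,
    InputParams,
)

# Field names mirror the options above; the sensitivity values shown as
# strings may be enums in the actual API.
vad = GeminiVADParams(
    start_sensitivity="HIGH",  # how quickly speech is detected
    end_sensitivity="LOW",     # how quickly turns end after pauses
    prefix_padding_ms=300,     # audio kept before detected speech
    silence_duration_ms=1000,  # silence required to end a turn
)

llm = GeminiMultimodalLiveLLMService(
    api_key="GOOGLE_API_KEY",  # placeholder
    params=InputParams(vad=vad),
)
```
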
  • Added comprehensive language support to GeminiMultimodalLiveLLMService,
    supporting over 30 languages via the language parameter, with proper
    mapping between Pipecat's Language enum and Gemini's language codes.
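
A minimal usage sketch; Pipecat's Language enum lives in
pipecat.transcriptions.language, while the language field on InputParams
is an assumption based on this entry:

```python
from pipecat.services.gemini_multimodal_live.gemini import (
    GeminiMultimodalLiveLLMService,
    InputParams,
)
from pipecat.transcriptions.language import Language

# Pipecat maps its Language enum to Gemini's language codes internally.
llm = GeminiMultimodalLiveLLMService(
    api_key="GOOGLE_API_KEY",  # placeholder
    params=InputParams(language=Language.ES),  # Spanish
)
```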

  • Added support in SmallWebRTCTransport to detect when remote tracks are
    muted.

  • Added support for image capture from a video stream to the
    SmallWebRTCTransport.

  • Added a new iOS client option to the SmallWebRTCTransport
    video-transform example.

  • Added new processors ProducerProcessor and ConsumerProcessor. The
    producer processes frames from the pipeline and decides whether its
    consumers should receive them; if so, the same frame received by the
    producer is sent to each consumer. A single producer can feed multiple
    consumers. These processors are useful for pushing frames from one
    part of a pipeline to another (e.g. when using ParallelPipeline), as
    sketched below.
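
A sketch of pushing frames between ParallelPipeline branches with a
producer/consumer pair; the module paths and the async filter signature
are assumptions consistent with Pipecat's processor conventions, and
stt, llm, tts, and recorder stand in for processors defined elsewhere:

```python
from pipecat.frames.frames import Frame, TTSAudioRawFrame
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.consumer_processor import ConsumerProcessor
from pipecat.processors.producer_processor import ProducerProcessor

async def is_tts_audio(frame: Frame) -> bool:
    # Decide which frames the consumers should receive.
    return isinstance(frame, TTSAudioRawFrame)

producer = ProducerProcessor(filter=is_tts_audio)
consumer = ConsumerProcessor(producer=producer)

# The producer copies each matching frame to every registered consumer,
# here pushing TTS audio from one branch into the other.
pipeline = Pipeline([
    ParallelPipeline(
        [stt, llm, tts, producer],  # branch that generates TTS audio
        [consumer, recorder],       # branch that receives the copies
    ),
])
```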

  • Improvements for the SmallWebRTCTransport:

    • Wait until the pipeline is ready before triggering the connected event.
    • Queue messages if the data channel is not ready.
    • Update the aiortc dependency to fix an issue where the 'video/rtx' MIME
      type was incorrectly handled as a codec retransmission.
    • Avoid initial video delays.

Changed

  • In GeminiMultimodalLiveLLMService, removed the transcribe_model_audio
    parameter in favor of Gemini Live's native output transcription support.
    Text transcriptions are now produced directly by the model; no
    configuration is required.

  • Updated GeminiMultimodalLiveLLMService's default model to
    models/gemini-2.0-flash-live-001 and base_url to the v1beta websocket
    URL.
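
Applications that depend on the previous default can pin a model
explicitly; a sketch:

```python
llm = GeminiMultimodalLiveLLMService(
    api_key="GOOGLE_API_KEY",  # placeholder
    model="models/gemini-2.0-flash-live-001",  # the new default, pinned
)
```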

Fixed

  • Updated daily-python to 0.17.0 to fix an issue that prevented Pipecat
    from running on older platforms.

  • Fixed an issue where CartesiaTTSService's spell feature would result in
    the spelled word in the context appearing as "F,O,O,B,A,R" instead of
    "FOOBAR".

  • Fixed an issue in the Azure TTS services where the language was being set
    incorrectly.

  • Fixed SmallWebRTCTransport to support dynamic values for
    TransportParams.audio_out_10ms_chunks. Previously, it only worked with 20ms
    chunks.
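
A configuration sketch: the value is a count of 10ms chunks per audio
write, so 4 below means 40ms. This assumes the TransportParams shape
from pipecat.transports.base_transport:

```python
from pipecat.transports.base_transport import TransportParams

params = TransportParams(
    audio_out_enabled=True,
    audio_out_10ms_chunks=4,  # 40ms per write; other values now work too
)
```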

  • Fixed an issue with GeminiMultimodalLiveLLMService where the assistant
    context messages had no space between words.

  • Fixed an issue where LLMAssistantContextAggregator would prevent a
    BotStoppedSpeakingFrame from moving through the pipeline.
