pypi pipecat-ai 0.0.63
v0.0.63


Added

  • Added media resolution control to GeminiMultimodalLiveLLMService with GeminiMediaResolution enum, allowing configuration of token usage for image processing (LOW: 64 tokens, MEDIUM: 256 tokens, HIGH: zoomed reframing with 256 tokens).
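
The resolution/token tradeoff described above can be sketched in plain Python. This is an illustrative model only, not pipecat's actual implementation; the real GeminiMediaResolution enum and its values may differ.

```python
from enum import Enum

# Illustrative sketch of the media-resolution/token tradeoff; the real
# pipecat GeminiMediaResolution enum may use different member values.
class GeminiMediaResolution(Enum):
    LOW = "low"        # 64 tokens per image
    MEDIUM = "medium"  # 256 tokens per image
    HIGH = "high"      # zoomed reframing, 256 tokens per image

TOKENS_PER_IMAGE = {
    GeminiMediaResolution.LOW: 64,
    GeminiMediaResolution.MEDIUM: 256,
    GeminiMediaResolution.HIGH: 256,
}

def image_token_budget(resolution: GeminiMediaResolution, num_images: int) -> int:
    """Estimate token usage for a batch of images at the given resolution."""
    return TOKENS_PER_IMAGE[resolution] * num_images

print(image_token_budget(GeminiMediaResolution.LOW, 10))  # 640
```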

  • Added Gemini's Voice Activity Detection (VAD) configuration to GeminiMultimodalLiveLLMService with GeminiVADParams, allowing fine control over speech detection sensitivity and timing, including:

    • Start sensitivity (how quickly speech is detected)
    • End sensitivity (how quickly turns end after pauses)
    • Prefix padding (milliseconds of audio to keep before speech is detected)
    • Silence duration (milliseconds of silence required to end a turn)
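
The four knobs above can be sketched as a dataclass. Field names and default values here are assumptions for illustration; pipecat's actual GeminiVADParams may differ.

```python
from dataclasses import dataclass

# Illustrative sketch of the VAD tuning knobs listed above; field names
# and defaults are assumed, not pipecat's real GeminiVADParams.
@dataclass
class GeminiVADParams:
    start_sensitivity: str = "HIGH"   # how quickly speech is detected
    end_sensitivity: str = "HIGH"     # how quickly turns end after pauses
    prefix_padding_ms: int = 300      # audio kept before detected speech
    silence_duration_ms: int = 1000   # silence required to end a turn

# A more patient configuration: slower to end turns, keeps more lead-in audio.
patient = GeminiVADParams(end_sensitivity="LOW",
                          prefix_padding_ms=500,
                          silence_duration_ms=1500)
print(patient.silence_duration_ms)
```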
  • Added comprehensive language support to GeminiMultimodalLiveLLMService, supporting over 30 languages via the language parameter, with proper mapping between Pipecat's Language enum and Gemini's language codes.

  • Added support in SmallWebRTCTransport to detect when remote tracks are muted.

  • Added support for image capture from a video stream to the SmallWebRTCTransport.

  • Added a new iOS client option to the SmallWebRTCTransport video-transform example.

  • Added new processors ProducerProcessor and ConsumerProcessor. The producer processes frames from the pipeline and decides whether consumers should receive each one; if so, the same frame received by the producer is sent to the consumers. There can be multiple consumers per producer. These processors are useful for pushing frames from one part of a pipeline to another (e.g. when using ParallelPipeline).

  • Improvements for the SmallWebRTCTransport:

    • Wait until the pipeline is ready before triggering the connected event.
    • Queue messages if the data channel is not ready.
    • Update the aiortc dependency to fix an issue where the 'video/rtx' MIME type was incorrectly handled as a codec retransmission.
    • Avoid initial video delays.

Changed

  • In GeminiMultimodalLiveLLMService, removed the transcribe_model_audio parameter in favor of Gemini Live's native output transcription support. Text transcriptions are now produced directly by the model; no configuration is required.

  • Updated GeminiMultimodalLiveLLMService’s default model to models/gemini-2.0-flash-live-001 and base_url to the v1beta websocket URL.

Fixed

  • Updated daily-python to 0.17.0 to fix an issue that was preventing it from running on older platforms.

  • Fixed an issue where CartesiaTTSService's spell feature would result in the spelled word in the context appearing as "F,O,O,B,A,R" instead of "FOOBAR".

  • Fixed an issue in the Azure TTS services where the language was being set incorrectly.

  • Fixed SmallWebRTCTransport to support dynamic values for TransportParams.audio_out_10ms_chunks. Previously, it only worked with 20ms chunks.
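
For context on what a dynamic chunk count implies, here is the standard arithmetic for 16-bit PCM audio (a generic sketch; the helper name is illustrative, not a pipecat API):

```python
def chunk_bytes(sample_rate_hz: int, channels: int, num_10ms_chunks: int) -> int:
    """Bytes of 16-bit PCM audio written per chunk of `num_10ms_chunks` x 10 ms."""
    bytes_per_10ms = sample_rate_hz // 100 * channels * 2  # samples/10ms * 2 bytes
    return bytes_per_10ms * num_10ms_chunks

print(chunk_bytes(16000, 1, 2))  # 640 bytes for a 20 ms chunk at 16 kHz mono
```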

  • Fixed an issue with GeminiMultimodalLiveLLMService where the assistant context messages had no space between words.

  • Fixed an issue where LLMAssistantContextAggregator would prevent a BotStoppedSpeakingFrame from moving through the pipeline.
