pipecat-ai/pipecat v0.0.63

Added

  • Added media resolution control to GeminiMultimodalLiveLLMService with
    GeminiMediaResolution enum, allowing configuration of token usage for
    image processing (LOW: 64 tokens, MEDIUM: 256 tokens, HIGH: zoomed reframing
    with 256 tokens).
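
A minimal configuration sketch; the import path and the media_resolution
field on InputParams are assumptions based on this entry:

```python
from pipecat.services.gemini_multimodal_live.gemini import (
    GeminiMediaResolution,
    GeminiMultimodalLiveLLMService,
    InputParams,
)

# LOW caps image processing at 64 tokens per frame; MEDIUM uses 256,
# and HIGH adds zoomed reframing at 256 tokens.
llm = GeminiMultimodalLiveLLMService(
    api_key="GOOGLE_API_KEY",  # placeholder
    params=InputParams(media_resolution=GeminiMediaResolution.LOW),
)
```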

  • Added Gemini's Voice Activity Detection (VAD) configuration to
    GeminiMultimodalLiveLLMService with GeminiVADParams, allowing fine
    control over speech detection sensitivity and timing, including:

    • Start sensitivity (how quickly speech is detected)
    • End sensitivity (how quickly turns end after pauses)
    • Prefix padding (milliseconds of audio to keep before speech is detected)
    • Silence duration (milliseconds of silence required to end a turn)
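
A hedged sketch of the VAD configuration; the exact GeminiVADParams field
names and value types are assumptions inferred from the options listed
above:

```python
from pipecat.services.gemini_multimodal_live.gemini import (
    GeminiMultimodalLiveLLMService,
    GeminiVADParams,
    InputParams,
)

# Field names mirror the options above; the sensitivity values shown as
# strings may be enums in the actual API.
vad = GeminiVADParams(
    start_sensitivity="HIGH",  # how quickly speech is detected
    end_sensitivity="LOW",     # how quickly turns end after pauses
    prefix_padding_ms=300,     # audio kept before detected speech
    silence_duration_ms=1000,  # silence required to end a turn
)

llm = GeminiMultimodalLiveLLMService(
    api_key="GOOGLE_API_KEY",  # placeholder
    params=InputParams(vad=vad),
)
```
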
  • Added comprehensive language support to GeminiMultimodalLiveLLMService,
    supporting over 30 languages via the language parameter, with proper
    mapping between Pipecat's Language enum and Gemini's language codes.
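
A minimal usage sketch; Pipecat's Language enum lives in
pipecat.transcriptions.language, while the language field on InputParams
is an assumption based on this entry:

```python
from pipecat.services.gemini_multimodal_live.gemini import (
    GeminiMultimodalLiveLLMService,
    InputParams,
)
from pipecat.transcriptions.language import Language

# Pipecat maps its Language enum to Gemini's language codes internally.
llm = GeminiMultimodalLiveLLMService(
    api_key="GOOGLE_API_KEY",  # placeholder
    params=InputParams(language=Language.ES),  # Spanish
)
```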

  • Added support in SmallWebRTCTransport to detect when remote tracks are
    muted.

  • Added support for image capture from a video stream to the
    SmallWebRTCTransport.

  • Added a new iOS client option to the SmallWebRTCTransport
    video-transform example.

  • Added new processors ProducerProcessor and ConsumerProcessor. The
    producer processes frames from the pipeline and decides whether its
    consumers should receive them; if so, the same frame received by the
    producer is sent to each consumer. A single producer can feed multiple
    consumers. These processors are useful for pushing frames from one
    part of a pipeline to another (e.g. when using ParallelPipeline), as
    sketched below.
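
A sketch of pushing frames between ParallelPipeline branches with a
producer/consumer pair; the module paths and the async filter signature
are assumptions consistent with Pipecat's processor conventions, and
stt, llm, tts, and recorder stand in for processors defined elsewhere:

```python
from pipecat.frames.frames import Frame, TTSAudioRawFrame
from pipecat.pipeline.parallel_pipeline import ParallelPipeline
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.consumer_processor import ConsumerProcessor
from pipecat.processors.producer_processor import ProducerProcessor

async def is_tts_audio(frame: Frame) -> bool:
    # Decide which frames the consumers should receive.
    return isinstance(frame, TTSAudioRawFrame)

producer = ProducerProcessor(filter=is_tts_audio)
consumer = ConsumerProcessor(producer=producer)

# The producer copies each matching frame to every registered consumer,
# here pushing TTS audio from one branch into the other.
pipeline = Pipeline([
    ParallelPipeline(
        [stt, llm, tts, producer],  # branch that generates TTS audio
        [consumer, recorder],       # branch that receives the copies
    ),
])
```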

  • Improvements for the SmallWebRTCTransport:

    • Wait until the pipeline is ready before triggering the connected event.
    • Queue messages if the data channel is not ready.
    • Update the aiortc dependency to fix an issue where the 'video/rtx' MIME
      type was incorrectly handled as a codec retransmission.
    • Avoid initial video delays.

Changed

  • In GeminiMultimodalLiveLLMService, removed the transcribe_model_audio
    parameter in favor of Gemini Live's native output transcription support.
    Text transcriptions are now produced directly by the model; no
    configuration is required.

  • Updated GeminiMultimodalLiveLLMService's default model to
    models/gemini-2.0-flash-live-001 and base_url to the v1beta websocket
    URL.
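
Applications that depend on the previous default can pin a model
explicitly; a sketch:

```python
llm = GeminiMultimodalLiveLLMService(
    api_key="GOOGLE_API_KEY",  # placeholder
    model="models/gemini-2.0-flash-live-001",  # the new default, pinned
)
```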

Fixed

  • Updated daily-python to 0.17.0 to fix an issue that prevented Pipecat
    from running on older platforms.

  • Fixed an issue where CartesiaTTSService's spell feature would result in
    the spelled word in the context appearing as "F,O,O,B,A,R" instead of
    "FOOBAR".

  • Fixed an issue in the Azure TTS services where the language was being set
    incorrectly.

  • Fixed SmallWebRTCTransport to support dynamic values for
    TransportParams.audio_out_10ms_chunks. Previously, it only worked with 20ms
    chunks.
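
A configuration sketch: the value is a count of 10ms chunks per audio
write, so 4 below means 40ms. This assumes the TransportParams shape
from pipecat.transports.base_transport:

```python
from pipecat.transports.base_transport import TransportParams

params = TransportParams(
    audio_out_enabled=True,
    audio_out_10ms_chunks=4,  # 40ms per write; other values now work too
)
```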

  • Fixed an issue with GeminiMultimodalLiveLLMService where the assistant
    context messages had no space between words.

  • Fixed an issue where LLMAssistantContextAggregator would prevent a
    BotStoppedSpeakingFrame from moving through the pipeline.
