github pipecat-ai/pipecat v0.0.105


Added

  • Added concurrent audio context support: CartesiaTTSService can now synthesize the next sentence while the previous one is still playing, by setting pause_frame_processing=False and routing each sentence through its own audio context queue.
    (PR #3804)

  • Added custom video track support to Daily transport. Use video_out_destinations in DailyParams to publish multiple video tracks simultaneously, mirroring the existing audio_out_destinations feature.
    (PR #3831)

  • Added ServiceSwitcherStrategyFailover that automatically switches to the next service when the active service reports a non-fatal error. Recovery policies can be implemented via the on_service_switched event handler.
    (PR #3861)
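
The failover idea can be illustrated with a minimal stand-in. `FailoverStrategy` below is a made-up sketch of the mechanism described above, not pipecat's `ServiceSwitcherStrategyFailover` implementation:

```python
# Minimal failover sketch: advance to the next service when the active one
# reports a non-fatal error, wrapping around the service list.
class FailoverStrategy:
    def __init__(self, services):
        if not services:
            raise ValueError("need at least one service")
        self._services = services
        self._index = 0

    @property
    def active_service(self):
        return self._services[self._index]

    def handle_error(self, error, fatal=False):
        """Switch to the next service on a non-fatal error; re-raise fatal ones."""
        if fatal:
            raise error
        self._index = (self._index + 1) % len(self._services)
        return self.active_service


strategy = FailoverStrategy(["primary-llm", "backup-llm"])
strategy.handle_error(RuntimeError("rate limited"))
print(strategy.active_service)  # → backup-llm
```

In the real feature, a recovery policy (e.g. switching back to the primary after a cooldown) would hang off the `on_service_switched` event rather than living inside the strategy.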

  • Added optional timeout_secs parameter to register_function() and register_direct_function() for per-tool function call timeout control, overriding the global function_call_timeout_secs default.
    (PR #3915)
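
The override semantics can be sketched with plain `asyncio`. The parameter names `timeout_secs` and `function_call_timeout_secs` come from the note above; the wrapper itself is illustrative, not pipecat's code:

```python
import asyncio

GLOBAL_TIMEOUT_SECS = 30.0  # stand-in for the global function_call_timeout_secs default

async def call_with_timeout(fn, timeout_secs=None):
    """Run a tool coroutine, letting a per-call timeout override the global one."""
    effective = timeout_secs if timeout_secs is not None else GLOBAL_TIMEOUT_SECS
    return await asyncio.wait_for(fn(), timeout=effective)

async def slow_tool():
    await asyncio.sleep(0.2)
    return "done"

async def main():
    try:
        # The per-call 0.05s overrides the 30s global and trips first.
        await call_with_timeout(slow_tool, timeout_secs=0.05)
    except asyncio.TimeoutError:
        print("tool call timed out")

asyncio.run(main())
```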

  • Added cloud-audio-only recording option to Daily transport's enable_recording property.
    (PR #3916)

  • Wired up system_instruction in BaseOpenAILLMService, AnthropicLLMService, and AWSBedrockLLMService so it works as a default system prompt, matching the behavior of the Google services. This enables sharing a single LLMContext across multiple LLM services, where each service provides its own system instruction independently.

    # Pipeline, transport, and task setup omitted; see the foundational examples.
    llm = OpenAILLMService(
        api_key=os.getenv("OPENAI_API_KEY"),
        system_instruction="You are a helpful assistant.",
    )
    
    context = LLMContext()
    
    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        context.add_message({"role": "user", "content": "Please introduce yourself."})
        await task.queue_frames([LLMRunFrame()])

    (PR #3918)

  • Added vad_threshold parameter to AssemblyAIConnectionParams for configuring voice activity detection sensitivity in U3 Pro. Aligning this with external VAD thresholds (e.g., Silero VAD) prevents the "dead zone" where AssemblyAI transcribes speech that VAD hasn't detected yet.
    (PR #3927)

  • Added push_empty_transcripts parameter to BaseWhisperSTTService and OpenAISTTService to allow empty transcripts to be pushed downstream as TranscriptionFrame instead of discarding them (the default behavior). This is intended for situations where VAD fires even though the user did not speak. In these cases, it is useful to know that nothing was transcribed so that the agent can resume speaking, instead of waiting longer for a transcription.
    (PR #3930)

  • LLM services (BaseOpenAILLMService, AnthropicLLMService, AWSBedrockLLMService) now log a warning when both system_instruction and a system message in the context are set. The constructor's system_instruction takes precedence.
    (PR #3932)

  • Runtime settings updates (via STTUpdateSettingsFrame) now work for AWS Transcribe, Azure, Cartesia, Deepgram, ElevenLabs Realtime, Gradium, and Soniox STT services. Previously, changing settings at runtime only stored the new values without reconnecting.
    (PR #3946)

  • Exposed on_summary_applied event on LLMAssistantAggregator, allowing users to listen for context summarization events without accessing private members.
    (PR #3947)

  • Deepgram Flux STT settings (keyterm, eot_threshold, eager_eot_threshold, eot_timeout_ms) can now be updated mid-stream via STTUpdateSettingsFrame without triggering a reconnect. The new values are sent to Deepgram as a Configure WebSocket message on the existing connection.
    (PR #3953)
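
A rough sketch of building such an update message follows. The `{"type": "Configure", ...}` payload shape is an assumption based on this note; consult Deepgram's Flux documentation for the exact schema:

```python
import json

# Settings the note says can change mid-stream; anything else needs a reconnect.
UPDATABLE = {"keyterm", "eot_threshold", "eager_eot_threshold", "eot_timeout_ms"}

def build_configure_message(settings):
    """Serialize only the mid-stream-updatable settings as a Configure message.

    The message shape here is an assumption inferred from the release note,
    not a verified copy of Deepgram's wire format.
    """
    changed = {k: v for k, v in settings.items() if k in UPDATABLE and v is not None}
    return json.dumps({"type": "Configure", **changed})

# "model" is filtered out: changing it would require a new connection.
msg = build_configure_message({"eot_threshold": 0.7, "model": "flux"})
```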

  • Added system_instruction parameter to run_inference across all LLM services, allowing callers to override the system prompt for one-shot inference calls. Used by _generate_summary to pass the summarization prompt cleanly.
    (PR #3968)

Changed

  • Audio context management (previously in AudioContextTTSService) is now built into TTSService. All WebSocket providers (cartesia, elevenlabs, asyncai, inworld, rime, gradium, resembleai) now inherit from WebsocketTTSService directly. Word-timestamp baseline is set automatically on the first audio chunk of each context instead of requiring each provider to call start_word_timestamps() in their receive loop.
    (PR #3804)

  • Daily transport now uses CustomVideoSource/CustomVideoTrack instead of VirtualCameraDevice for the default camera output, mirroring how audio already works with CustomAudioSource/CustomAudioTrack.
    (PR #3831)

  • ⚠️ Updated DeepgramSTTService to use deepgram-sdk v6. The LiveOptions class was removed from the SDK and is now provided by pipecat directly; import it from pipecat.services.deepgram.stt instead of deepgram.
    (PR #3848)

  • ServiceSwitcherStrategy base class now provides a handle_error() hook for subclasses to implement error-based switching. ServiceSwitcher defaults to ServiceSwitcherStrategyManual and strategy_type is now optional.
    (PR #3861)

  • Added support for Voice Focus 2.0 models:

    • Updated aic-sdk to ~=2.1.0 to support Voice Focus 2.0 models.
    • Cleaned up unused ParameterFixedError exception handling in AICFilter
      parameter setup.
      (PR #3889)

  • max_context_tokens and max_unsummarized_messages in LLMAutoContextSummarizationConfig (and deprecated LLMContextSummarizationConfig) can now be set to None independently to disable that summarization threshold. At least one must remain set.
    (PR #3914)
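
The "at least one threshold must remain set" rule can be sketched as a small validated config. The class below is an illustrative stand-in, not pipecat's `LLMAutoContextSummarizationConfig`:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SummarizationConfig:
    """Sketch of the validation rule: either threshold may be None, not both."""
    max_context_tokens: Optional[int] = 8000
    max_unsummarized_messages: Optional[int] = 20

    def __post_init__(self):
        if self.max_context_tokens is None and self.max_unsummarized_messages is None:
            raise ValueError(
                "At least one of max_context_tokens or "
                "max_unsummarized_messages must be set."
            )

# Disabling a single threshold is allowed...
SummarizationConfig(max_context_tokens=None)
# ...but disabling both is rejected at construction time.
```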

  • ⚠️ Removed formatted_finals and word_finalization_max_wait_time from AssemblyAIConnectionParams as these were v2 API parameters not supported in v3. Clarified that format_turns only applies to Universal-Streaming models; U3 Pro has automatic formatting built-in.
    (PR #3927)

  • Changed DeepgramTTSService to send a Clear message on interruption instead of disconnecting and reconnecting the WebSocket, allowing the connection to persist throughout the session.
    (PR #3958)

  • Re-added enhancement_level support to AICFilter with runtime FilterEnableFrame control, applying ProcessorParameter.Bypass and ProcessorParameter.EnhancementLevel together.
    (PR #3961)

  • Updated daily-python dependency from ~=0.23.0 to ~=0.24.0.
    (PR #3970)

  • Updated FishAudioTTSService default model from s1 to s2-pro, matching Fish Audio's latest recommended model for improved quality and speed.
    (PR #3973)

  • AzureSTTService region parameter is now optional when private_endpoint is provided. A ValueError is raised if neither is given, and a warning is logged if both are provided (private_endpoint takes priority).
    (PR #3974)
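
The precedence rule reads roughly like the following sketch (an illustration of the described behavior, not the service's actual code):

```python
import warnings

def resolve_speech_endpoint(region=None, private_endpoint=None):
    """Pick exactly one of region / private_endpoint, per the rule above."""
    if region is None and private_endpoint is None:
        raise ValueError("Provide either region or private_endpoint.")
    if region is not None and private_endpoint is not None:
        warnings.warn("Both region and private_endpoint given; using private_endpoint.")
    # The Azure Speech SDK's SpeechConfig does not accept both at once,
    # so the winner is passed conditionally.
    if private_endpoint is not None:
        return ("endpoint", private_endpoint)
    return ("region", region)
```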

Deprecated

  • Deprecated AudioContextTTSService and AudioContextWordTTSService. Subclass WebsocketTTSService directly instead; audio context management is now part of the base TTSService.

    • Deprecated WordTTSService, WebsocketWordTTSService, and InterruptibleWordTTSService. Word timestamp logic is now always active in TTSService and no longer needs to be opted into via a subclass.
      (PR #3804)

  • Deprecated pipecat.services.google.llm_vertex, pipecat.services.google.llm_openai, and pipecat.services.google.gemini_live.llm_vertex modules. Use pipecat.services.google.vertex.llm, pipecat.services.google.openai.llm, and pipecat.services.google.gemini_live.vertex.llm instead. The old import paths still work but will emit a DeprecationWarning.
    (PR #3980)

Removed

  • ⚠️ Removed supports_word_timestamps parameter from TTSService.__init__(). Word timestamp logic is now always active. Remove this argument from any custom subclass super().__init__() calls.
    (PR #3804)

Fixed

  • Fixed DeepgramSTTService keepalive ping timeout disconnections. The deepgram-sdk v6 removed automatic keepalive; pipecat now sends explicit KeepAlive messages every 5 seconds, within the recommended 3–5 second interval before Deepgram's 10-second inactivity timeout.
    (PR #3848)
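
The `{"type": "KeepAlive"}` text message is Deepgram's documented keepalive; the loop below is a simplified, stdlib-only sketch of the pattern (the demo's `fake_send` and short interval are for illustration only):

```python
import asyncio
import json

async def keepalive_loop(send, interval_secs=5.0, stop=None):
    """Send KeepAlive periodically so the socket never idles past
    Deepgram's 10-second inactivity timeout. `send` is any awaitable
    text-send callable (e.g. a websocket's send method)."""
    stop = stop or asyncio.Event()
    while not stop.is_set():
        await send(json.dumps({"type": "KeepAlive"}))
        try:
            # Wake early if stopped; otherwise wait out the interval.
            await asyncio.wait_for(stop.wait(), timeout=interval_secs)
        except asyncio.TimeoutError:
            pass

async def demo():
    sent = []
    stop = asyncio.Event()
    async def fake_send(message):
        sent.append(message)
    task = asyncio.create_task(keepalive_loop(fake_send, interval_secs=0.01, stop=stop))
    await asyncio.sleep(0.05)
    stop.set()
    await task
    return sent

messages = asyncio.run(demo())
```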

  • Fixed BufferError: Existing exports of data: object cannot be re-sized in AICFilter caused by holding a memoryview on the mutable audio buffer across async yield points.
    (PR #3889)
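
The failure mode is a standard CPython buffer-protocol rule: a `bytearray` cannot be resized while a `memoryview` export on it is alive. A minimal reproduction (with a `bytes` copy standing in for the fix) looks like this:

```python
buf = bytearray(8)  # stand-in for the mutable audio buffer

# Holding a memoryview "exports" the buffer: resizing now fails. In the
# async filter, this happened when a view was kept across an await point
# while new audio arrived and grew the buffer.
view = memoryview(buf)
try:
    buf.extend(b"\x00" * 8)  # simulates the buffer growing during a yield
except BufferError as e:
    print("resize blocked:", e)
view.release()

# The fix: take an immutable copy instead of keeping a live view.
chunk = bytes(buf[:4])
buf.extend(b"\x00" * 8)  # now safe to resize
```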

  • Fixed TTS context not being appended to the assistant message history when using TTSSpeakFrame with append_to_context=True with some TTS providers.
    (PR #3936)

  • Fixed context summarization leaving orphaned tool responses in the kept context when tool calls were moved to the summarized portion.
    (PR #3937)

  • Fixed turn completion state not resetting at end of LLM responses. LLMFullResponseEndFrame is pushed (not received) by the LLM service, so the mixin now handles it in push_frame instead of process_frame.
    (PR #3956)

  • Fixed turn completion instructions being injected as a context system message instead of using system_instruction. This caused warning spam when system_instruction was also set and didn't persist across full context updates.
    (PR #3957)

  • Fixed TTSService audio context queue getting blocked when append_to_audio_context() was called with a None context ID, which prevented subsequent audio from being delivered.
    (PR #3958)

  • Fixed on_call_state_updated event handler in LiveKit transport receiving incorrect number of arguments due to redundant self passed to _call_event_handler.
    (PR #3959)

  • Fixed OpenAI Realtime, OpenAI Realtime Beta, and Grok realtime services treating conversation_already_has_active_response as a fatal error. These services now log it as a non-fatal debug event when a response is already in progress.
    (PR #3960)

  • Fixed SmallWebRTCConnection silently discarding messages sent before the data channel is open by queuing them and flushing once the channel is ready. A bounded queue (MAX_MESSAGE_QUEUE_SIZE = 50) prevents unbounded memory growth, and a 10-second timeout after connection clears the queue and falls back to discard mode if the data channel never opens.
    (PR #3962)
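
The queue-then-flush pattern can be sketched as below. This is an illustration of the mechanism only: the class name is invented, the bounded deque here drops the oldest message when full (the real implementation may behave differently), and the 10-second timeout / discard-mode fallback is omitted:

```python
from collections import deque

MAX_MESSAGE_QUEUE_SIZE = 50  # bound from the release note

class ChannelBuffer:
    """Buffer messages sent before the data channel opens, then flush in order."""
    def __init__(self):
        self._open = False
        # Bounded queue: when full, the oldest pending message is dropped.
        self._pending = deque(maxlen=MAX_MESSAGE_QUEUE_SIZE)
        self.sent = []  # stand-in for the actual channel send

    def send(self, message):
        if self._open:
            self.sent.append(message)
        else:
            self._pending.append(message)

    def on_open(self):
        """Called when the data channel opens: flush everything queued so far."""
        self._open = True
        while self._pending:
            self.sent.append(self._pending.popleft())
```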

  • Fixed AzureSTTService failing to initialize when private_endpoint is provided. The Azure Speech SDK's SpeechConfig does not accept both region and endpoint simultaneously, so they are now passed conditionally.
    (PR #3967)

  • Fixed GoogleLLMService ignoring the system_instruction set via constructor or GoogleLLMSettings when a system message was also present in the context. The settings value now correctly takes priority, and a warning is logged when both are set.
    (PR #3976)

Other

  • Updated foundational examples to use system_instruction on LLM services instead of adding system messages to LLMContext.
    (PR #3918)

  • Updated AssemblyAI turn detection example to use keyterms_prompt list format instead of prompt string for improved clarity.
    (PR #3929)

  • Updated foundational examples and eval scripts to use "user" role instead of "system" when adding messages to LLMContext, since system prompts should be set via system_instruction on the LLM service.
    (PR #3931)
