github pipecat-ai/pipecat v0.0.98

one day ago

Added

  • Added RimeNonJsonTTSService which supports non-JSON streaming mode. This new class supports websocket streaming for the Arcana model.
    (PR #3085)

  • Added additional functionality related to "thinking", for Google and Anthropic LLMs.

    1. New typed parameters for Google and Anthropic LLMs that control the models' thinking behavior (like how much thinking to do, and whether to output thoughts or thought summaries):
      • AnthropicLLMService.ThinkingConfig
      • GoogleLLMService.ThinkingConfig
    2. New frames for representing thoughts output by LLMs:
      • LLMThoughtStartFrame
      • LLMThoughtTextFrame
      • LLMThoughtEndFrame
    3. A generic mechanism for recording LLM thoughts to context, used specifically to support Anthropic, whose thought signatures are expected to appear alongside the text of the thoughts within assistant context messages. See:
      • LLMThoughtEndFrame.signature
      • LLMAssistantAggregator handling of the above field
      • AnthropicLLMAdapter handling of "thought" context messages
    4. Google-specific logic for inserting thought signatures into the context, to help maintain thinking continuity in a chain of LLM calls. See:
      • GoogleLLMService sending LLMMessagesAppendFrames to add LLM-specific "thought_signature" messages to context
      • GeminiLLMAdapter handling of "thought_signature" messages
    5. An expansion of TranscriptProcessor to process LLM thoughts in addition to user and assistant utterances. See:
      • TranscriptProcessor(process_thoughts=True) (defaults to False)
      • ThoughtTranscriptionMessage, which is now also emitted with the "on_transcript_update" event
        (PR #3175)
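
    As a rough illustration of how the three thought frames relate, the sketch below groups a start/text/end run into one record that keeps the signature next to the thought text. The frame classes here are local stand-ins, not imports from Pipecat. The text field, the collect_thoughts helper and the record layout are assumptions made for the example; only the frame names and LLMThoughtEndFrame.signature come from the notes above.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Stand-in frame types; the real Pipecat frames may carry different fields.
@dataclass
class LLMThoughtStartFrame:
    pass


@dataclass
class LLMThoughtTextFrame:
    text: str  # assumed field name


@dataclass
class LLMThoughtEndFrame:
    signature: Optional[Any] = None


def collect_thoughts(frames):
    """Group start/text/end frame runs into one record per thought (hypothetical helper)."""
    thoughts = []
    parts = None  # None means we are not currently inside a thought
    for frame in frames:
        if isinstance(frame, LLMThoughtStartFrame):
            parts = []
        elif isinstance(frame, LLMThoughtTextFrame) and parts is not None:
            parts.append(frame.text)
        elif isinstance(frame, LLMThoughtEndFrame) and parts is not None:
            # Keep the signature alongside the full thought text.
            thoughts.append({"text": "".join(parts), "signature": frame.signature})
            parts = None
    return thoughts


frames = [
    LLMThoughtStartFrame(),
    LLMThoughtTextFrame("The user wants "),
    LLMThoughtTextFrame("a short answer."),
    LLMThoughtEndFrame(signature="sig-123"),
]
thoughts = collect_thoughts(frames)
print(thoughts)
```

    In Pipecat itself this bookkeeping is handled by LLMAssistantAggregator and the LLM adapters, so application code would not normally need a helper like this.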
  • Data and control frames can now be marked as non-interruptible by using the UninterruptibleFrame mixin. Frames marked as UninterruptibleFrame will not be interrupted during processing, and any queued frames of this type will be retained in the internal queues. This is useful when you need ordered frames (data or control) that should not be discarded or cancelled due to interruptions.
    (PR #3189)
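
    The retention behavior can be pictured with a small self-contained model. This is a conceptual sketch only, not Pipecat's queue implementation: the UninterruptibleFrame class below is a local marker defined for the example, and AudioChunk, ToolResult and flush_on_interruption are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


class UninterruptibleFrame:
    """Local marker mixin: frames inheriting from it survive an interruption."""


@dataclass
class AudioChunk:
    data: bytes


@dataclass
class ToolResult(UninterruptibleFrame):
    result: str


def flush_on_interruption(queue):
    """Drop interruptible frames but keep marked ones in their original order."""
    kept = [frame for frame in queue if isinstance(frame, UninterruptibleFrame)]
    queue.clear()
    queue.extend(kept)


queue = deque([AudioChunk(b"a"), ToolResult("42"), AudioChunk(b"b"), ToolResult("done")])
flush_on_interruption(queue)
print(list(queue))
```

    Only the two marked frames remain after the flush, and their relative order is preserved.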

  • Added on_conversation_detected event to VoicemailDetector.
    (PR #3207)

  • Added x-goog-api-client header with Pipecat's version to all Google services' requests.
    (PR #3208)

  • Added support for the HeyGen LiveAvatar API (see https://www.liveavatar.com/).
    (PR #3210)

  • Added to AWSNovaSonicLLMService functionality related to the new (and now default) Nova 2 Sonic model ("amazon.nova-2-sonic-v1:0"):

    • Added the endpointing_sensitivity parameter to control how quickly the model decides the user has stopped speaking.
    • Made the assistant-response-trigger hack a no-op. It's only needed for the older Nova Sonic model.
      (PR #3212)
  • Ultravox Realtime is now a supported speech-to-speech service.

    • Added UltravoxRealtimeLLMService for the integration.
    • Added 49-ultravox-realtime.py example (with tool calling).
      (PR #3227)
  • Added Daily PSTN dial-in support to the development runner with --dialin flag. This includes:

    • /daily-dialin-webhook endpoint that handles incoming Daily PSTN webhooks
    • Automatic Daily room creation with SIP configuration
    • DialinSettings and DailyDialinRequest types in pipecat.runner.types for type-safe dial-in data
    • The runner now mimics Pipecat Cloud's dial-in webhook handling for local development
      (PR #3235)
  • Added Gladia session ID to logs for GladiaSTTService.
    (PR #3236)

  • Added InworldHttpTTSService which uses Inworld's HTTP-based TTS service in either streaming or non-streaming mode. Note: This class was previously named InworldTTSService.
    (PR #3239)

  • Added language_hints_strict parameter to SonioxSTTService to strictly enforce language hints. This ensures that transcription occurs in the specified language.
    (PR #3245)

  • Added Pipecat library version info to the about field in the bot-ready RTVI message.
    (PR #3248)

  • Added VisionFullResponseStartFrame, VisionFullResponseEndFrame and VisionTextFrame. These are used by vision services, similar to the equivalent frames for LLM services.
    (PR #3252)

Changed

  • FunctionCallInProgressFrame and FunctionCallResultFrame have changed from system frames to a control frame and a data frame, respectively, and are now both marked as UninterruptibleFrame.
    (PR #3189)

  • UserBotLatencyLogObserver now uses VADUserStartedSpeakingFrame and VADUserStoppedSpeakingFrame to determine latency from user stopped speaking to bot started speaking.
    (PR #3206)

  • Updated HeyGenVideoService and HeyGenTransport to support both HeyGen APIs (Interactive Avatar and Live Avatar).

    Using them is as simple as specifying the service_type when creating the HeyGenVideoService and the HeyGenTransport:

    heyGen = HeyGenVideoService(
        api_key=os.getenv("HEYGEN_LIVE_AVATAR_API_KEY"),
        service_type=ServiceType.LIVE_AVATAR,
        session=session,
    )

    (PR #3210)

  • Made "amazon.nova-2-sonic-v1:0" the new default model for AWSNovaSonicLLMService.
    (PR #3212)

  • Updated the run_inference methods in the LLM service classes (AnthropicLLMService, AWSBedrockLLMService, GoogleLLMService, and OpenAILLMService and its base classes) to use the provided LLM configuration parameters.
    (PR #3214)

  • Updated default models for:

    • GeminiLiveLLMService to gemini-2.5-flash-native-audio-preview-12-2025.
    • GeminiLiveVertexLLMService to gemini-live-2.5-flash-native-audio.
      (PR #3228)
  • Changed the reason field in EndFrame, CancelFrame, EndTaskFrame, and CancelTaskFrame from str to Any to indicate that it can hold values other than strings.
    (PR #3231)

  • Updated websocket STT services to use the WebsocketSTTService base class. This base class manages the websocket connection and handles reconnects.

    Updated services:

    • AssemblyAISTTService
    • AWSTranscribeSTTService
    • GladiaSTTService
    • SonioxSTTService
      (PR #3236)
  • Changed Inworld's TTS service implementations:

    • Previously, the HTTP implementation was named InworldTTSService. That has been moved to InworldHttpTTSService. This service now supports word-timestamp alignment data in both streaming and non-streaming modes.
    • Updated the InworldTTSService class to use Inworld's Websocket API. This class now has support for word-timestamp alignment data and tracks contexts for each user turn.
      (PR #3239)
  • ⚠️ Breaking change: WordTTSService.start_word_timestamps() and WordTTSService.reset_word_timestamps() are now async.
    (PR #3240)
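
    For code that calls these methods, the migration amounts to adding await. The sketch below uses a hypothetical stand-in class (FakeWordTTSService) rather than the real WordTTSService; only the fact that both methods are now coroutines is taken from the note above.

```python
import asyncio


class FakeWordTTSService:
    """Hypothetical stand-in; only the async-ness mirrors the release note."""

    def __init__(self):
        self.started = False

    async def start_word_timestamps(self):
        self.started = True

    async def reset_word_timestamps(self):
        self.started = False


async def main():
    tts = FakeWordTTSService()

    # Calling without await only creates a coroutine object; the body never runs.
    pending = tts.start_word_timestamps()
    state_before = tts.started
    pending.close()  # silence the "never awaited" warning in this demo

    # Correct usage after the change.
    await tts.start_word_timestamps()
    state_after = tts.started
    await tts.reset_word_timestamps()
    return state_before, state_after, tts.started


result = asyncio.run(main())
print(result)
```

    Subclasses that override either method would likewise need to declare the override with async def.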

  • Updated the current RTVI version to 1.1.0 to reflect recent additions and deprecations.

    • New RTVI Messages: send-text and bot-output
    • Deprecated Messages: append-to-context and bot-transcription
      (PR #3248)
  • MoondreamService now pushes VisionFullResponseStartFrame, VisionFullResponseEndFrame and VisionTextFrame.
    (PR #3252)

Deprecated

  • FalSmartTurnAnalyzer and LocalSmartTurnAnalyzer are deprecated and will be removed in a future version. Use LocalSmartTurnAnalyzerV3 instead.
    (PR #3219)

Removed

  • Removed the deprecated VLLM-based open source Ultravox STT service.
    (PR #3227)

Fixed

  • Fixed a bug in AWSNovaSonicLLMService where we would mishandle cancelled tool calls in the context, resulting in errors.
    (PR #3212)

  • Improved support for conversation history with Gemini 2.5 Flash Image (model "gemini-2.5-flash-image"). Prior to this fix, the model had no memory of previous images it had generated, so it could not iterate on them.
    (PR #3224)

  • Fixed conversations with Gemini 3 Pro Image (model "gemini-3-pro-image-preview"). Prior to this fix, the conversation could not progress after the model generated an image.
    (PR #3224)

  • Fixed an issue where ElevenLabsHttpTTSService was not updating voice settings when receiving a TTSUpdateSettingsFrame.
    (PR #3226)

  • Fixed the return type for SmallWebRTCRequestHandler.handle_web_request() function.
    (PR #3230)

  • Fixed a bug in LLM context audio content handling.
    (PR #3234)

  • In GladiaSTTService, reset the _bytes_sent counter on connecting the websocket. This avoids unnecessary audio buffer trimming.
    (PR #3236)

  • Fixed a TTS service word-timestamp issue that could cause generated TTSTextFrame instances to have an incorrect pts (pts = -1).
    (PR #3240)

  • Fixed an issue in SimpleTextAggregator where spaces were not being stripped before returning the aggregation. This resulted in an extra space for TTS services that don't support word-timestamp alignment data.
    (PR #3247)
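
    The effect of stripping before returning can be seen in a toy aggregator. This is a hypothetical sketch, not the SimpleTextAggregator source: the class name, the aggregate method and the sentence-end rule are all assumptions made for the example.

```python
class SketchTextAggregator:
    """Hypothetical sentence aggregator, not Pipecat's implementation."""

    def __init__(self):
        self._buffer = ""

    def aggregate(self, text):
        """Append text; return a finished sentence, or None if still incomplete."""
        self._buffer += text
        if self._buffer.rstrip().endswith((".", "!", "?")):
            sentence = self._buffer.strip()  # strip surrounding spaces before returning
            self._buffer = ""
            return sentence
        return None


agg = SketchTextAggregator()
outputs = [agg.aggregate(token) for token in [" Hello", " there.", " How", " are you? "]]
print(outputs)
```

    Without the strip call, the first returned sentence would start with the leading space carried by the " Hello" token.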
