Added
-
Added `RimeNonJsonTTSService`, which supports non-JSON streaming mode. This new class supports websocket streaming for the Arcana model.
(PR #3085) -
Added additional functionality related to "thinking" for Google and Anthropic LLMs.
- New typed parameters for Google and Anthropic LLMs that control the models' thinking behavior (like how much thinking to do, and whether to output thoughts or thought summaries):
  - `AnthropicLLMService.ThinkingConfig`
  - `GoogleLLMService.ThinkingConfig`
- New frames for representing thoughts output by LLMs:
  - `LLMThoughtStartFrame`
  - `LLMThoughtTextFrame`
  - `LLMThoughtEndFrame`
- A generic mechanism for recording LLM thoughts to context, used specifically to support Anthropic, whose thought signatures are expected to appear alongside the text of the thoughts within assistant context messages. See:
  - `LLMThoughtEndFrame.signature`
  - `LLMAssistantAggregator` handling of the above field
  - `AnthropicLLMAdapter` handling of `"thought"` context messages
- Google-specific logic for inserting thought signatures into the context, to help maintain thinking continuity in a chain of LLM calls. See:
  - `GoogleLLMService` sending `LLMMessagesAppendFrame`s to add LLM-specific `"thought_signature"` messages to context
  - `GeminiLLMAdapter` handling of `"thought_signature"` messages
- An expansion of `TranscriptProcessor` to process LLM thoughts in addition to user and assistant utterances. See:
  - `TranscriptProcessor(process_thoughts=True)` (defaults to `False`)
  - `ThoughtTranscriptionMessage`, which is now also emitted with the `"on_transcript_update"` event
(PR #3175)
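To illustrate how a start/text/end frame sequence can be folded into a single thought message that keeps its signature, here is a minimal, self-contained sketch. The classes `ThoughtStart`, `ThoughtText`, `ThoughtEnd` and `ThoughtMessage` and the function `collect_thoughts` are hypothetical stand-ins written for this example; they are not the Pipecat classes, whose fields and aggregation logic may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


# Hypothetical stand-ins for the thought frames named in this entry.
# They only model the shape of the data, not the real Pipecat classes.
@dataclass
class ThoughtStart:
    pass


@dataclass
class ThoughtText:
    text: str


@dataclass
class ThoughtEnd:
    signature: Optional[str] = None


@dataclass
class ThoughtMessage:
    content: str
    signature: Optional[str] = None


def collect_thoughts(frames: Iterable[object]) -> List[ThoughtMessage]:
    """Fold start/text/end frame runs into one message per thought."""
    messages: List[ThoughtMessage] = []
    buffer: Optional[List[str]] = None  # None means no thought is in progress
    for frame in frames:
        if isinstance(frame, ThoughtStart):
            buffer = []
        elif isinstance(frame, ThoughtText) and buffer is not None:
            buffer.append(frame.text)
        elif isinstance(frame, ThoughtEnd) and buffer is not None:
            # The signature travels on the end frame and is stored with the text.
            messages.append(ThoughtMessage("".join(buffer), frame.signature))
            buffer = None
    return messages
```

Text frames that arrive outside a start/end pair are ignored in this sketch; a real aggregator might choose to log or reject them instead.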
-
Data and control frames can now be marked as non-interruptible by using the `UninterruptibleFrame` mixin. Frames marked as `UninterruptibleFrame` will not be interrupted during processing, and any queued frames of this type will be retained in the internal queues. This is useful when you need ordered frames (data or control) that should not be discarded or cancelled due to interruptions.
(PR #3189) -
Added `on_conversation_detected` event to `VoicemailDetector`.
(PR #3207) -
Added `x-goog-api-client` header with Pipecat's version to all Google services' requests.
(PR #3208) -
Added support for the HeyGen LiveAvatar API (see https://www.liveavatar.com/).
(PR #3210) -
Added functionality to `AWSNovaSonicLLMService` related to the new (and now default) Nova 2 Sonic model (`"amazon.nova-2-sonic-v1:0"`):
- Added the `endpointing_sensitivity` parameter to control how quickly the model decides the user has stopped speaking.
- Made the assistant-response-trigger hack a no-op. It's only needed for the older Nova Sonic model.
(PR #3212)
-
Ultravox Realtime is now a supported speech-to-speech service.
- Added `UltravoxRealtimeLLMService` for the integration.
- Added `49-ultravox-realtime.py` example (with tool calling).
(PR #3227)
-
Added Daily PSTN dial-in support to the development runner with the `--dialin` flag. This includes:
- `/daily-dialin-webhook` endpoint that handles incoming Daily PSTN webhooks
- Automatic Daily room creation with SIP configuration
- `DialinSettings` and `DailyDialinRequest` types in `pipecat.runner.types` for type-safe dial-in data
- The runner now mimics Pipecat Cloud's dial-in webhook handling for local development
(PR #3235)
-
Added the Gladia session id to logs for `GladiaSTTService`.
(PR #3236) -
Added `InworldHttpTTSService`, which uses Inworld's HTTP-based TTS service in either streaming or non-streaming mode. Note: This class was previously named `InworldTTSService`.
(PR #3239) -
Added `language_hints_strict` parameter to `SonioxSTTService` to strictly enforce language hints. This ensures that transcription occurs in the specified language.
(PR #3245) -
Added Pipecat library version info to the `about` field in the `bot-ready` RTVI message.
(PR #3248) -
Added `VisionFullResponseStartFrame`, `VisionFullResponseEndFrame` and `VisionTextFrame`. These are used by vision services, similar to the corresponding frames in LLM services.
(PR #3252)
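One of the entries in this list introduces the `UninterruptibleFrame` mixin, whose queued frames are retained when an interruption clears the queues. The idea can be sketched with a marker mixin and a filter, as below. `Uninterruptible`, `AudioChunk`, `ToolResult` and `drop_interruptible` are hypothetical names made up for this example, and the real queue handling inside Pipecat is likely more involved.

```python
from collections import deque
from dataclasses import dataclass


class Uninterruptible:
    """Marker mixin (a hypothetical stand-in for UninterruptibleFrame)."""


@dataclass
class AudioChunk:
    # An ordinary frame: safe to discard when the user interrupts.
    data: bytes


@dataclass
class ToolResult(Uninterruptible):
    # A frame that must survive interruptions and keep its position.
    value: str


def drop_interruptible(queue: deque) -> deque:
    """Return a new queue holding only the frames that must survive, in order."""
    return deque(frame for frame in queue if isinstance(frame, Uninterruptible))
```

Using a mixin rather than a flag lets any data or control frame opt in through inheritance, and an `isinstance` check is enough to decide what to keep.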
Changed
-
`FunctionCallInProgressFrame` and `FunctionCallResultFrame` have changed from system frames to a control frame and a data frame, respectively, and are now both marked as `UninterruptibleFrame`.
(PR #3189) -
`UserBotLatencyLogObserver` now uses `VADUserStartedSpeakingFrame` and `VADUserStoppedSpeakingFrame` to measure the latency between the user stopping speaking and the bot starting to speak.
(PR #3206) -
Updated `HeyGenVideoService` and `HeyGenTransport` to support both HeyGen APIs (Interactive Avatar and Live Avatar).

Using them is as simple as specifying the `service_type` when creating the `HeyGenVideoService` and the `HeyGenTransport`:

```python
heyGen = HeyGenVideoService(
    api_key=os.getenv("HEYGEN_LIVE_AVATAR_API_KEY"),
    service_type=ServiceType.LIVE_AVATAR,
    session=session,
)
```
(PR #3210)
-
Made `"amazon.nova-2-sonic-v1:0"` the new default model for `AWSNovaSonicLLMService`.
(PR #3212) -
Updated the `run_inference` methods in the LLM service classes (`AnthropicLLMService`, `AWSBedrockLLMService`, `GoogleLLMService`, and `OpenAILLMService` and its base classes) to use the provided LLM configuration parameters.
(PR #3214) -
Updated default models for:
- `GeminiLiveLLMService` to `gemini-2.5-flash-native-audio-preview-12-2025`.
- `GeminiLiveVertexLLMService` to `gemini-live-2.5-flash-native-audio`.
(PR #3228)
-
Changed the `reason` field in `EndFrame`, `CancelFrame`, `EndTaskFrame`, and `CancelTaskFrame` from `str` to `Any` to indicate that it can hold values other than strings.
(PR #3231) -
Updated websocket STT services to use the `WebsocketSTTService` base class. This base class manages the websocket connection and handles reconnects.

Updated services:
- `AssemblyAISTTService`
- `AWSTranscribeSTTService`
- `GladiaSTTService`
- `SonioxSTTService`
(PR #3236)
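For readers unfamiliar with what "handles reconnects" usually means for a websocket client, the following is a generic retry-with-backoff sketch. It is not the `WebsocketSTTService` implementation: `connect_with_retry`, its parameters, and the choice of exponential backoff are all assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable


async def connect_with_retry(
    connect: Callable[[], Awaitable[object]],
    max_attempts: int = 3,
    base_delay: float = 0.01,
) -> object:
    """Call `connect` until it succeeds, doubling the delay after each failure."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return await connect()
        except ConnectionError:
            if attempt == max_attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            await asyncio.sleep(delay)
            delay *= 2
```

Centralizing this loop in a base class means each service only has to supply its own `connect` coroutine, which is the kind of duplication a shared base class is meant to remove.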
-
Changed Inworld's TTS service implementations:
- Previously, the HTTP implementation was named `InworldTTSService`. That has been moved to `InworldHttpTTSService`. This service now supports word-timestamp alignment data in both streaming and non-streaming modes.
- Updated the `InworldTTSService` class to use Inworld's Websocket API. This class now has support for word-timestamp alignment data and tracks contexts for each user turn.
(PR #3239)
-
⚠️ Breaking change: `WordTTSService.start_word_timestamps()` and `WordTTSService.reset_word_timestamps()` are now async.
(PR #3240) -
Updated the current RTVI version to 1.1.0 to reflect recent additions and deprecations.
- New RTVI Messages: `send-text` and `bot-output`
- Deprecated Messages: `append-to-context` and `bot-transcription`
(PR #3248)
-
`MoondreamService` now pushes `VisionFullResponseStartFrame`, `VisionFullResponseEndFrame` and `VisionTextFrame`.
(PR #3252)
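The breaking change in this list, where `start_word_timestamps()` and `reset_word_timestamps()` became async, means existing call sites need an `await`. The sketch below shows the pattern on a toy class. `SketchWordTTS` and its `run_tts` method are hypothetical and do not mirror the real `WordTTSService` internals; only the two method names come from the changelog.

```python
import asyncio


class SketchWordTTS:
    """Hypothetical stand-in for a WordTTSService subclass."""

    def __init__(self) -> None:
        self.started = False

    async def start_word_timestamps(self) -> None:
        self.started = True

    async def reset_word_timestamps(self) -> None:
        self.started = False

    async def run_tts(self, text: str) -> str:
        # Calling these without `await` would only create coroutine objects
        # that never run, so the timestamp state would not change.
        await self.start_word_timestamps()
        audio = f"<audio:{text}>"
        await self.reset_word_timestamps()
        return audio
```

Python emits a "coroutine was never awaited" `RuntimeWarning` for forgotten awaits, which is a practical way to find call sites that still need updating.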
Deprecated
- `FalSmartTurnAnalyzer` and `LocalSmartTurnAnalyzer` are deprecated and will be removed in a future version. Use `LocalSmartTurnAnalyzerV3` instead.
(PR #3219)
Removed
- Removed the deprecated VLLM-based open source Ultravox STT service.
(PR #3227)
Fixed
-
Fixed a bug in `AWSNovaSonicLLMService` where cancelled tool calls in the context were mishandled, resulting in errors.
(PR #3212) -
Improved support for conversation history with Gemini 2.5 Flash Image (model "gemini-2.5-flash-image"). Prior to this fix, the model had no memory of previous images it had generated, so it could not iterate on them.
(PR #3224) -
Added support for conversations with Gemini 3 Pro Image (model "gemini-3-pro-image-preview"). Prior to this fix, the conversation could not progress after the model generated an image.
(PR #3224) -
Fixed an issue where `ElevenLabsHttpTTSService` was not updating voice settings when receiving a `TTSUpdateSettingsFrame`.
(PR #3226) -
Fixed the return type for the `SmallWebRTCRequestHandler.handle_web_request()` function.
(PR #3230) -
Fixed a bug in LLM context audio content handling.
(PR #3234) -
In `GladiaSTTService`, the `_bytes_sent` counter is now reset when the websocket connects. This avoids unnecessary audio buffer trimming.
(PR #3236) -
Fixed a TTS service word-timestamp issue that could cause generated `TTSTextFrame` instances to have an incorrect pts (pts = -1).
(PR #3240) -
Fixed an issue in `SimpleTextAggregator` where spaces were not being stripped before returning the aggregation. This resulted in an extra space for TTS services that don't support word-timestamp alignment data.
(PR #3247)
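The last fix in this list concerns stripping whitespace before an aggregated sentence is returned. A minimal sketch of that behavior follows. `SketchSentenceAggregator` is a hypothetical class written for this example, and its end-of-sentence rule (a trailing `.`, `!` or `?`) is an assumption, not the logic used by `SimpleTextAggregator`.

```python
from typing import Optional


class SketchSentenceAggregator:
    """Buffers text fragments and returns a sentence once one is complete."""

    def __init__(self) -> None:
        self._buffer = ""

    def aggregate(self, text: str) -> Optional[str]:
        self._buffer += text
        # Assumed rule: a sentence ends at '.', '!' or '?', ignoring trailing spaces.
        if self._buffer.rstrip().endswith((".", "!", "?")):
            # Strip surrounding whitespace so the TTS input has no stray spaces.
            sentence = self._buffer.strip()
            self._buffer = ""
            return sentence
        return None
```

Without the `strip()` call, a token stream such as `" Hello"`, `" world. "` would produce a sentence with a leading and trailing space, which is the kind of artifact the fix removes.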