Added
- Added new Gradium services, GradiumSTTService and GradiumTTSService, for speech-to-text and text-to-speech functionality using Gradium's API.
- Additions for AsyncAITTSService and AsyncAIHttpTTSService:
  - Added new languages: pt, nl, ar, ru, ro, ja, he, hy, tr, hi, zh.
  - Updated the default model to asyncflow_multilingual_v1.0 for improved accuracy and broader language coverage.
- Added optional tool and tool output filters for MCP services.
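As a rough illustration of what a tool filter and a tool output filter might do, here is a minimal sketch. The changelog does not give the actual parameter names or callback signatures used by the MCP services, so everything below (the Tool record, make_tool_filter, and truncate_output) is hypothetical and only shows the general idea of narrowing the tool list and post-processing tool results.

```python
from typing import Callable, Dict, List

# Hypothetical tool record; the real MCP tool schema is richer than this.
Tool = Dict[str, str]


def make_tool_filter(allowed_names: List[str]) -> Callable[[List[Tool]], List[Tool]]:
    """Build a filter that keeps only the tools whose names are allowed."""
    allowed = set(allowed_names)

    def tool_filter(tools: List[Tool]) -> List[Tool]:
        return [tool for tool in tools if tool["name"] in allowed]

    return tool_filter


def truncate_output(output: str, max_chars: int = 200) -> str:
    """Example output filter: cap the amount of text handed back to the LLM."""
    if len(output) <= max_chars:
        return output
    return output[:max_chars] + "..."


all_tools = [
    {"name": "get_weather", "description": "Look up the weather"},
    {"name": "delete_files", "description": "Remove files from disk"},
]
keep_weather = make_tool_filter(["get_weather"])
VISIBLE_TOOLS = keep_weather(all_tools)
```

Consult the MCP service documentation for how filters are actually registered; this sketch only models the filtering logic itself.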
Changed
- Updated Deepgram logging to include Deepgram request IDs for improved debugging.
- Text Aggregation Improvements:
  - Breaking Change: BaseTextAggregator.aggregate() now returns AsyncIterator[Aggregation] instead of Optional[Aggregation]. This enables the aggregator to return multiple results based on the provided text.
  - Refactored text aggregators to use inheritance: SkipTagsAggregator and PatternPairAggregator now inherit from SimpleTextAggregator, reusing the base class's sentence detection logic.
- Improved interruption handling to prevent bots from repeating themselves. Responses from LLM services that return multiple sentences at once (e.g., GoogleLLMService) are now split into individual sentences before being sent to TTS. This ensures interruptions occur at sentence boundaries, so the bot does not repeat content after being interrupted during a long response.
- Updated AICFilter to use Quail STT as the default model (AICModelType.QUAIL_STT). Quail STT is optimized for human-to-machine interaction (e.g., voice agents, speech-to-text) and operates at a native sample rate of 16 kHz with fixed enhancement parameters.
- If an unexpected exception is caught, or if FrameProcessor.push_error() is called with an exception, the file name and line number where the exception occurred are now logged.
- Updated Smart Turn model weights to v3.1.
- Smart Turn analyzer now uses the full context of the turn rather than just the audio since VAD last triggered.
- Updated CartesiaSTTService to return the full transcription result in the TranscriptionFrame and InterimTranscriptionFrame. This provides access to word timestamp data.
- Added tracking headers (X-Hume-Client-Name and X-Hume-Client-Version) to all requests made by HumeTTSService to the Hume API for better usage tracking and analytics.
  - Added stop() and cancel() cleanup methods to HumeTTSService to properly close the HTTP client and prevent resource leaks.
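The breaking change to BaseTextAggregator.aggregate() listed above means callers now iterate over results instead of handling a single optional value. The sketch below shows that calling pattern with a toy aggregator. Aggregation and ToySentenceAggregator here are hypothetical stand-ins written for this example, not pipecat's classes, and the sentence rule is deliberately naive.

```python
import asyncio
import re
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class Aggregation:
    """Hypothetical stand-in for the library's Aggregation result type."""

    text: str


class ToySentenceAggregator:
    """Illustrative aggregator; not pipecat's SimpleTextAggregator."""

    def __init__(self) -> None:
        self._buffer = ""

    async def aggregate(self, text: str) -> AsyncIterator[Aggregation]:
        # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
        self._buffer += text
        while True:
            match = re.search(r"[.!?]\s+", self._buffer)
            if match is None:
                break
            sentence = self._buffer[: match.end()].strip()
            self._buffer = self._buffer[match.end():]
            yield Aggregation(text=sentence)


async def collect(chunks: List[str]) -> List[str]:
    aggregator = ToySentenceAggregator()
    sentences = []
    for chunk in chunks:
        # One chunk may complete zero, one, or several sentences,
        # so the results are consumed with "async for".
        async for aggregation in aggregator.aggregate(chunk):
            sentences.append(aggregation.text)
    return sentences


SENTENCES = asyncio.run(collect(["Hello there. How are", " you? I am fine. Bye"]))
```

With an iterator, a single chunk containing several sentences produces several results, which is what allows multi-sentence LLM output to be split before it reaches TTS. The trailing "Bye" stays buffered because no sentence boundary has been seen yet.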
Deprecated
- NVIDIA Services name changes (all functionality is unchanged):
  - NimLLMService is now deprecated, use NvidiaLLMService instead.
  - RivaSTTService is now deprecated, use NvidiaSTTService instead.
  - RivaTTSService is now deprecated, use NvidiaTTSService instead.
  - Use uv pip install pipecat-ai[nvidia] instead of uv pip install pipecat-ai[riva]
- The noise_gate_enable parameter in AICFilter is deprecated and no longer has any effect. Noise gating is now handled automatically by the AIC VAD system. Use AICFilter.create_vad_analyzer() for VAD functionality instead.
- Package pipecat.sync is deprecated, use pipecat.utils.sync instead.
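For readers unfamiliar with how renamed classes usually stay usable during a deprecation period, the following is a generic sketch of a deprecated alias that warns and then defers to the new class. It is not pipecat's implementation: both classes below are hypothetical stand-ins defined locally, and the model argument is an assumption made for the example.

```python
import warnings


class NvidiaLLMService:
    """Hypothetical stand-in for the new class name."""

    def __init__(self, model: str = "example-model") -> None:
        self.model = model


class NimLLMService(NvidiaLLMService):
    """Deprecated alias: behaves identically but emits a warning."""

    def __init__(self, *args, **kwargs) -> None:
        warnings.warn(
            "NimLLMService is deprecated, use NvidiaLLMService instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        super().__init__(*args, **kwargs)


# Record warnings so the deprecation can be observed programmatically.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = NimLLMService(model="example-model")

WARNING_CATEGORIES = [w.category for w in caught]
```

Running a test suite with DeprecationWarning turned into an error (for example, python -W error::DeprecationWarning) is one way to find remaining uses of the old names before they are removed.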
Fixed
- Fixed a bug in PatternPairAggregator where pattern handlers could be called multiple times for KEEP or AGGREGATE patterns.
- Fixed sentence aggregation to correctly handle ambiguous punctuation in streaming text, such as currency ("$29.95") and abbreviations ("Mr. Smith").
- Fixed an issue in AWSTranscribeSTTService where the region arg was always set to us-east-1 when an AWS_REGION environment variable was provided.
- Fixed an issue in SarvamTTSService where the last sentence was not being spoken. Now, audio is flushed when the TTS service receives the LLMFullResponseEndFrame or EndFrame.
- Fixed an issue in DeepgramTTSService where a TTSStoppedFrame was incorrectly pushed after a function call. This caused an issue with the voice-ui-kit's conversational panel rendering of the LLM output after a function call.
- Fixed an issue where LLMTextFrame.skip_tts was being overwritten by LLM services.
- Fixed an issue that caused WebsocketService instances to attempt reconnection during shutdown.
- Fixed an issue in ElevenLabsTTSService where character usage metrics were only reported on the first TTS generation per turn.
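The ambiguous-punctuation fix above concerns cases where a period does not end a sentence, or where streaming text stops right at a period. The following is a minimal sketch of one way to handle this, not the library's actual algorithm: the abbreviation list and the function name split_confirmed_sentences are assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Illustrative, not exhaustive.
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "st"}


def split_confirmed_sentences(buffer: str) -> Tuple[List[str], str]:
    """Return (complete sentences, remaining text that needs more input)."""
    sentences = []
    start = 0
    # Only punctuation followed by whitespace counts as a boundary. A period
    # inside "$29.95" is followed by a digit, and a period at the very end of
    # the buffer stays pending because the next chunk may continue it.
    for match in re.finditer(r"[.!?](?=\s)", buffer):
        end = match.end()
        words = buffer[start:end - 1].split()
        last_word = words[-1].lower() if words else ""
        if match.group() == "." and last_word in ABBREVIATIONS:
            continue  # "Mr. Smith": the period belongs to the abbreviation
        sentences.append(buffer[start:end].strip())
        start = end
    return sentences, buffer[start:].lstrip()


DONE, PENDING = split_confirmed_sentences(
    "It costs $29.95 today. Ask Mr. Smith about it. Tha"
)
# A chunk that ends mid-price is held back until more text arrives.
EARLY_DONE, EARLY_PENDING = split_confirmed_sentences("The price is $29.")
```

The key design choice is to defer the decision when the buffer ends in punctuation: emitting "The price is $29." immediately would split the price as soon as "95" arrives in the next chunk.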