pipecat-ai/pipecat v0.0.100 on GitHub

Added

Added Hathora service to support Hathora-hosted TTS and STT models (only non-streaming)
(PR #3169)
Added CambTTSService, using Camb.ai's TTS integration with MARS models (mars-flash, mars-pro, mars-instruct) for high-quality text-to-speech synthesis.
(PR #3349)
Added the additional_headers param to WebsocketClientParams, allowing WebsocketClientTransport to send custom headers on connect, for cases such as authentication.
(PR #3461)
Added UserIdleController for detecting user idle state, integrated into LLMUserAggregator and UserTurnProcessor via optional user_idle_timeout parameter. Emits on_user_turn_idle event for application-level handling. Deprecated UserIdleProcessor in favor of the new compositional approach.
(PR #3482)
Added on_user_mute_started and on_user_mute_stopped event handlers to LLMUserAggregator for tracking user mute state changes.
(PR #3490)

Changed

Enhanced interruption handling in AsyncAITTSService by supporting multi-context WebSocket sessions for more robust context management.
(PR #3287)
Throttled UserSpeakingFrame to broadcast at most once every 200ms instead of on every audio chunk, reducing frame processing overhead during user speech.
(PR #3483)
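The throttling idea can be sketched roughly as follows. This is an illustration only, not Pipecat's implementation: the Throttle class, its allow() method, and count_broadcasts() are hypothetical names, and the clock is injected so the behavior is deterministic.

```python
import time


class Throttle:
    """Hypothetical helper: allow an action at most once per interval."""

    def __init__(self, interval_ms=200, clock_ms=None):
        self._interval_ms = interval_ms
        # Injectable clock (milliseconds) so the behavior can be tested.
        self._clock_ms = clock_ms or (lambda: time.monotonic() * 1000)
        self._last_ms = None  # time of the last allowed action

    def allow(self):
        now = self._clock_ms()
        if self._last_ms is None or now - self._last_ms >= self._interval_ms:
            self._last_ms = now
            return True
        return False


def count_broadcasts(chunk_times_ms, interval_ms=200):
    """Count the chunks that would trigger a broadcast under the throttle."""
    now = {"ms": 0}
    throttle = Throttle(interval_ms, clock_ms=lambda: now["ms"])
    sent = 0
    for t in chunk_times_ms:
        now["ms"] = t
        if throttle.allow():
            sent += 1
    return sent
```

For example, assuming 20 ms audio chunks over one second, count_broadcasts([i * 20 for i in range(50)]) allows 5 broadcasts instead of 50.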

Deprecated

Deprecated pipecat.turns.mute (introduced in Pipecat 0.0.99) in favor of pipecat.turns.user_mute, for consistency with other package names.
(PR #3479)

Fixed

Corrected TTFB metric calculation in AsyncAIHttpTTSService.
(PR #3287)
Fixed an issue where the "bot-llm-text" RTVI event would not fire for realtime (speech-to-speech) services:
- AWSNovaSonicLLMService
- GeminiLiveLLMService
- OpenAIRealtimeLLMService
- GrokRealtimeLLMService
The issue was that these services weren't pushing LLMTextFrames. Now they do.
(PR #3446)
Fixed an issue where on_user_turn_stop_timeout could fire while a user is talking when using ExternalUserTurnStrategies.
(PR #3454)
Fixed an issue where user turn start strategies were not being reset after a user turn started, causing incorrect strategy behavior.
(PR #3455)
Fixed MinWordsUserTurnStartStrategy to not aggregate transcriptions, preventing incorrect turn starts when words are spoken with pauses between them.
(PR #3462)
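A rough sketch of the difference, under one reading of this entry: each transcription is checked against the minimum word count on its own, rather than words being accumulated across transcriptions. Both functions below are hypothetical and are not taken from Pipecat's code; the "aggregated" variant is an assumption about the earlier behavior.

```python
def starts_turn(transcription, min_words):
    """Evaluate a single transcription on its own word count."""
    return len(transcription.split()) >= min_words


def starts_turn_aggregated(transcriptions, min_words):
    """Assumed earlier behavior: count words across several transcriptions."""
    total = 0
    for text in transcriptions:
        total += len(text.split())
        if total >= min_words:
            return True
    return False
```

With min_words set to 3, the separate fragments "uh", "okay", "so" would trigger a turn start under aggregation, but none of them does when evaluated individually.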
Fixed an issue where Grok Realtime would error out when running with SmallWebRTC transport.
(PR #3480)
Fixed a Mem0MemoryService issue where passing async_mode: true was causing an error. See https://docs.mem0.ai/platform/features/async-mode-default-change.
(PR #3484)
Fixed AWSNovaSonicLLMService.reset_conversation(), which would previously error out. Now it successfully reconnects and "rehydrates" from the context object.
(PR #3486)
Fixed AzureTTSService transcript formatting issues:
- Punctuation now appears without extra spaces (e.g., "Hello!" instead of "Hello !")
- CJK languages (Chinese, Japanese, Korean) no longer have unwanted spaces between characters
(PR #3489)
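The two formatting rules above can be illustrated with a small token-joining sketch. It is not the AzureTTSService code: is_cjk() and join_tokens() are hypothetical helpers, the punctuation set and Unicode ranges are a simplified assumption, and treating every boundary between two CJK characters as spaceless is a deliberate simplification.

```python
# Punctuation that should attach to the previous token without a space.
ATTACHING_PUNCT = set(".,!?;:")


def is_cjk(ch):
    """Rough check for common CJK blocks (simplified, not exhaustive)."""
    code = ord(ch)
    return (
        0x4E00 <= code <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= code <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= code <= 0xD7AF   # Hangul syllables
        or 0x3000 <= code <= 0x303F   # CJK symbols and punctuation
        or 0xFF00 <= code <= 0xFFEF   # Full-width forms
    )


def join_tokens(tokens):
    """Join word tokens, skipping the space before punctuation and inside CJK runs."""
    out = ""
    for tok in tokens:
        if not tok:
            continue
        attaches = tok[0] in ATTACHING_PUNCT
        cjk_boundary = bool(out) and is_cjk(out[-1]) and is_cjk(tok[0])
        if out and not (attaches or cjk_boundary):
            out += " "
        out += tok
    return out
```

For instance, join_tokens(["Hello", "!"]) yields "Hello!" rather than "Hello !".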
Fixed an issue where UninterruptibleFrame frames would not be preserved in some cases.
(PR #3494)
Fixed a memory leak in LiveKitTransport when video_in_enabled is False.
(PR #3499)
Fixed an issue in AIService where unhandled exceptions in start(), stop(), or cancel() implementations would prevent process_frame() from continuing, so StartFrame, EndFrame, or CancelFrame was never pushed downstream and the pipeline did not start or stop properly.
(PR #3503)
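The general pattern behind this kind of fix can be sketched as below. This is a simplified stand-in, not the actual AIService code: SketchService is hypothetical, plain strings stand in for frame classes, and recording errors in a list is an assumption about how a failure might be reported.

```python
import asyncio


class SketchService:
    """Hypothetical service whose start() hook fails."""

    def __init__(self):
        self.pushed = []  # frames forwarded downstream
        self.errors = []  # errors caught from lifecycle hooks

    async def start(self, frame):
        raise RuntimeError("client init failed")

    async def push_frame(self, frame):
        self.pushed.append(frame)

    async def process_frame(self, frame):
        if frame == "StartFrame":
            try:
                await self.start(frame)
            except Exception as exc:
                # Record the failure instead of aborting process_frame().
                self.errors.append(str(exc))
        # The frame is still pushed downstream even if the hook failed.
        await self.push_frame(frame)


service = SketchService()
asyncio.run(service.process_frame("StartFrame"))
```

Here the failing start() hook is recorded as an error, while "StartFrame" still reaches the downstream list.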
Moved NVIDIATTSService and NVIDIASTTService client initialization from constructor to start() for better error handling.
(PR #3504)
Optimized NVIDIATTSService to process incoming audio frames immediately.
(PR #3509)
Optimized NVIDIASTTService by removing unnecessary queue and task.
(PR #3509)
Fixed a CambTTSService issue where the client was initialized in the constructor, which prevented proper Pipeline error handling.
(PR #3511)