Added
- Added Hathora service to support Hathora-hosted TTS and STT models (only non-streaming)
  (PR #3169)
- Added CambTTSService, using Camb.ai's TTS integration with MARS models (mars-flash, mars-pro, mars-instruct) for high-quality text-to-speech synthesis.
  (PR #3349)
- Added the additional_headers param to WebsocketClientParams, allowing WebsocketClientTransport to send custom headers on connect, for cases such as authentication.
  (PR #3461)
- Added UserIdleController for detecting user idle state, integrated into LLMUserAggregator and UserTurnProcessor via optional user_idle_timeout parameter. Emits on_user_turn_idle event for application-level handling. Deprecated UserIdleProcessor in favor of the new compositional approach.
  (PR #3482)
- Added on_user_mute_started and on_user_mute_stopped event handlers to LLMUserAggregator for tracking user mute state changes.
  (PR #3490)
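
As a rough illustration of the idle-detection idea behind UserIdleController, here is a minimal, self-contained sketch of a resettable idle timer. It does not use Pipecat: the IdleTimer class, its activity() and cancel() methods, and the on_idle callback are hypothetical names chosen for this example, and the real controller may work differently.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class IdleTimer:
    """Hypothetical idle detector; not Pipecat's UserIdleController."""

    def __init__(self, timeout: float, on_idle: Callable[[], Awaitable[None]]):
        self._timeout = timeout
        self._on_idle = on_idle
        self._task: Optional[asyncio.Task] = None

    def activity(self) -> None:
        """Restart the countdown whenever user activity is observed."""
        self.cancel()
        self._task = asyncio.ensure_future(self._wait())

    def cancel(self) -> None:
        """Stop the countdown without firing the callback."""
        if self._task is not None:
            self._task.cancel()
            self._task = None

    async def _wait(self) -> None:
        # Fires only if no activity() call cancels this task first.
        await asyncio.sleep(self._timeout)
        await self._on_idle()


async def demo() -> list:
    events = []

    async def on_idle() -> None:
        events.append("idle")

    timer = IdleTimer(0.2, on_idle)
    timer.activity()
    await asyncio.sleep(0.05)
    timer.activity()          # user did something: countdown restarts
    await asyncio.sleep(0.4)  # long enough for the timeout to elapse once
    timer.cancel()
    return events
```

Running `asyncio.run(demo())` records a single idle event, because the second activity() call cancels the first countdown before it expires.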
Changed
- Enhanced interruption handling in AsyncAITTSService by supporting multi-context WebSocket sessions for more robust context management.
  (PR #3287)
- Throttled UserSpeakingFrame to broadcast at most every 200ms instead of on every audio chunk, reducing frame processing overhead during user speech.
  (PR #3483)
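
The throttling described for UserSpeakingFrame can be pictured with a small time-based gate. The sketch below is an assumption about the general technique, not Pipecat's code: Throttle, ready(), and count_broadcasts() are hypothetical names, and timestamps are passed in explicitly so the behavior is deterministic.

```python
from typing import Iterable, Optional


class Throttle:
    """Hypothetical gate that allows an action at most once per interval."""

    def __init__(self, interval: float):
        self._interval = interval
        self._last: Optional[float] = None

    def ready(self, now: float) -> bool:
        """Return True if enough time has passed since the last allowed action."""
        if self._last is None or now - self._last >= self._interval:
            self._last = now
            return True
        return False


def count_broadcasts(chunk_times: Iterable[float], interval: float) -> int:
    """Count how many audio chunks would trigger a broadcast.

    chunk_times and interval must use the same unit (milliseconds below).
    """
    throttle = Throttle(interval)
    return sum(1 for t in chunk_times if throttle.ready(t))
```

For example, with one audio chunk every 20 ms over one second, `count_broadcasts([i * 20 for i in range(50)], interval=200)` yields 5 broadcasts instead of 50.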
Deprecated
- For consistency with other package names, deprecated pipecat.turns.mute (introduced in Pipecat 0.0.99) in favor of pipecat.turns.user_mute.
  (PR #3479)
Fixed
- Corrected TTFB metric calculation in AsyncAIHttpTTSService.
  (PR #3287)
- Fixed an issue where the "bot-llm-text" RTVI event would not fire for realtime (speech-to-speech) services:
  - AWSNovaSonicLLMService
  - GeminiLiveLLMService
  - OpenAIRealtimeLLMService
  - GrokRealtimeLLMService

  These services weren't pushing LLMTextFrames. Now they do.
  (PR #3446)
- Fixed an issue where on_user_turn_stop_timeout could fire while a user is talking when using ExternalUserTurnStrategies.
  (PR #3454)
- Fixed an issue where user turn start strategies were not being reset after a user turn started, causing incorrect strategy behavior.
  (PR #3455)
- Fixed MinWordsUserTurnStartStrategy to not aggregate transcriptions, preventing incorrect turn starts when words are spoken with pauses between them.
  (PR #3462)
- Fixed an issue where Grok Realtime would error out when running with SmallWebRTC transport.
  (PR #3480)
- Fixed a Mem0MemoryService issue where passing async_mode: true caused an error. See https://docs.mem0.ai/platform/features/async-mode-default-change.
  (PR #3484)
- Fixed AWSNovaSonicLLMService.reset_conversation(), which would previously error out. Now it successfully reconnects and "rehydrates" from the context object.
  (PR #3486)
- Fixed AzureTTSService transcript formatting issues:
  - Punctuation now appears without extra spaces (e.g., "Hello!" instead of "Hello !")
  - CJK languages (Chinese, Japanese, Korean) no longer have unwanted spaces between characters
  (PR #3489)
- Fixed an issue where UninterruptibleFrame frames would not be preserved in some cases.
  (PR #3494)
- Fixed a memory leak in LiveKitTransport when video_in_enabled is False.
  (PR #3499)
- Fixed an issue in AIService where unhandled exceptions in start(), stop(), or cancel() implementations would prevent process_frame() from continuing and therefore StartFrame, EndFrame, or CancelFrame from being pushed downstream, causing the pipeline to not start or stop properly.
  (PR #3503)
- Moved NVIDIATTSService and NVIDIASTTService client initialization from constructor to start() for better error handling.
  (PR #3504)
- Optimized NVIDIATTSService to process incoming audio frames immediately.
  (PR #3509)
- Optimized NVIDIASTTService by removing unnecessary queue and task.
  (PR #3509)
- Fixed a CambTTSService issue where the client was initialized in the constructor, which prevented proper Pipeline error handling.
  (PR #3511)
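
Several of the fixes above concern lifecycle error handling: creating clients in start() rather than the constructor, and making sure a failing start() does not stop lifecycle frames from flowing. The sketch below illustrates that pattern in isolation. It is not Pipecat code: SketchService, its downstream and errors lists, and the string frames are hypothetical stand-ins for a real processor, its next link, and real frame objects.

```python
import asyncio


class SketchService:
    """Hypothetical processor; not Pipecat's AIService."""

    def __init__(self):
        self.client = None    # created in start(), not in the constructor
        self.downstream = []  # frames forwarded to the next processor
        self.errors = []      # failures reported instead of raised

    async def start(self):
        # Simulate client setup failing (for example, bad credentials).
        raise RuntimeError("client setup failed")

    async def process_frame(self, frame: str):
        if frame == "StartFrame":
            try:
                await self.start()
            except Exception as exc:
                # Record the failure but keep going so the frame still flows.
                self.errors.append(str(exc))
        # The frame is pushed downstream whether or not start() succeeded.
        self.downstream.append(frame)


async def run() -> SketchService:
    service = SketchService()
    await service.process_frame("StartFrame")
    return service
```

After `asyncio.run(run())`, the returned service has the error recorded and the StartFrame still forwarded, so later processors in the chain can start normally.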