pipecat-ai/pipecat v0.0.53 on GitHub

Added

Added ElevenLabsHttpTTSService and the 07d-interruptible-elevenlabs-http.py foundational example.
Introduced pipeline frame observers. Observers can view all the frames that go through the pipeline without the need to inject processors in the pipeline. This can be useful, for example, to implement frame loggers or debuggers among other things. The example examples/foundational/30-observer.py shows how to add an observer to a pipeline for debugging.
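The observer idea can be illustrated with a small, library-independent sketch. Everything here (Frame, LoggingObserver, ToyPipeline, on_frame) is a hypothetical stand-in and not pipecat's actual observer API; see examples/foundational/30-observer.py for the real usage. The point is only that observers are notified at every hop without being part of the processor chain.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """Hypothetical stand-in for a pipeline frame."""
    name: str


class LoggingObserver:
    """Hypothetical observer that records every frame transfer it is shown."""

    def __init__(self) -> None:
        self.seen: List[str] = []

    def on_frame(self, source: str, destination: str, frame: Frame) -> None:
        self.seen.append(f"{source} -> {destination}: {frame.name}")


class ToyPipeline:
    """Toy pipeline that notifies observers at each hop between processors.

    The observers are kept in a separate list: they are not processors
    and never modify or forward frames themselves.
    """

    def __init__(self, processors: List[str], observers: List[LoggingObserver]) -> None:
        self.processors = processors
        self.observers = observers

    def push(self, frame: Frame) -> None:
        # Walk each (source, destination) pair of adjacent processors.
        for src, dst in zip(self.processors, self.processors[1:]):
            for observer in self.observers:
                observer.on_frame(src, dst, frame)


def demo_observer() -> List[str]:
    observer = LoggingObserver()
    pipeline = ToyPipeline(["input", "llm", "tts", "output"], [observer])
    pipeline.push(Frame("TextFrame"))
    return observer.seen
```

Calling demo_observer() returns one log entry per hop, which is the kind of output a frame logger or debugger would build on.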
Introduced heartbeat frames. The pipeline task can now push periodic heartbeats down the pipeline when enable_heartbeats=True. Heartbeats are system frames that are expected to make it all the way to the end of the pipeline. When a heartbeat frame is received, the traversal time (i.e. the time it took to go through the whole pipeline) is displayed (with TRACE logging). If no heartbeat is received, a warning is shown instead. The example examples/foundational/31-heartbeats.py shows how to enable heartbeats and forces warnings to be displayed.
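A rough sketch of the heartbeat bookkeeping, using explicit timestamps so it stays deterministic. HeartbeatMonitor, timeout_secs and the message strings are hypothetical and do not reflect pipecat's internals; only the behaviour (report traversal time on arrival, warn when heartbeats stop arriving) comes from the note above.

```python
from typing import Optional


class HeartbeatMonitor:
    """Hypothetical monitor sitting at the end of a pipeline."""

    def __init__(self, timeout_secs: float) -> None:
        # Assumed threshold after which a missing heartbeat is reported.
        self.timeout_secs = timeout_secs
        self.last_arrival: Optional[float] = None

    def on_heartbeat(self, sent_at: float, received_at: float) -> str:
        """Record the arrival and report how long the frame took to traverse."""
        self.last_arrival = received_at
        traversal = received_at - sent_at
        return f"TRACE: heartbeat traversal time {traversal:.3f}s"

    def check(self, now: float, started_at: float) -> Optional[str]:
        """Return a warning if no heartbeat arrived within the timeout."""
        reference = self.last_arrival if self.last_arrival is not None else started_at
        if now - reference > self.timeout_secs:
            return f"WARNING: no heartbeat received in the last {self.timeout_secs:.1f}s"
        return None
```

A processor that blocks or drops system frames would show up here as a warning from check(), which is the failure mode heartbeats are meant to surface.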
Added LLMTextFrame and TTSTextFrame which should be pushed by LLM and TTS services respectively instead of TextFrames.
Added OpenRouter for OpenRouter integration with an OpenAI-compatible interface. Added foundational example 14m-function-calling-openrouter.py.
Added a new WebsocketService based class for TTS services, containing base functions and retry logic.
Added DeepSeekLLMService for DeepSeek integration with an OpenAI-compatible interface. Added foundational example 14l-function-calling-deepseek.py.
Added FunctionCallResultProperties dataclass to provide a structured way to control function call behavior, including:
- run_llm: Controls whether to trigger LLM completion
- on_context_updated: Optional callback triggered after context update
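As an illustration of how the two properties might interact, here is a self-contained sketch. ResultProperties and apply_function_result are hypothetical stand-ins written for this example, not the pipecat classes, and the default of running the LLM when run_llm is unset is an assumption.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class ResultProperties:
    """Hypothetical stand-in mirroring the two fields listed above."""
    run_llm: Optional[bool] = None
    on_context_updated: Optional[Callable[[], Awaitable[None]]] = None


async def apply_function_result(
    context: List[dict],
    result: dict,
    properties: Optional[ResultProperties] = None,
) -> bool:
    """Store a function call result and decide whether to run the LLM next."""
    context.append({"role": "tool", "content": result})
    # Fire the optional callback once the context has been updated.
    if properties and properties.on_context_updated:
        await properties.on_context_updated()
    # An explicit run_llm value wins; otherwise assume a completion is wanted.
    if properties and properties.run_llm is not None:
        return properties.run_llm
    return True


def demo_properties():
    events: List[str] = []

    async def on_updated() -> None:
        events.append("context updated")

    context: List[dict] = []
    props = ResultProperties(run_llm=False, on_context_updated=on_updated)
    should_run = asyncio.run(apply_function_result(context, {"temp": 21}, props))
    return should_run, events, len(context)
```

In this sketch, run_llm=False lets a function handler record its result without immediately triggering another completion, while the callback still observes the context update.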
Added a new foundational example 07e-interruptible-playht-http.py for easy testing of PlayHTHttpTTSService.
Added support for Google TTS Journey voices in GoogleTTSService.
Added 29-livekit-audio-chat.py as a new foundational example for LiveKitTransportLayer.
Added enable_prejoin_ui, max_participants and start_video_off params to DailyRoomProperties.
Added session_timeout to FastAPIWebsocketTransport and WebsocketServerTransport for configuring session timeouts (in seconds). Triggers on_session_timeout for custom timeout handling.
See examples/websocket-server/bot.py.
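The timeout-then-callback pattern can be sketched with plain asyncio. run_session and demo_session are hypothetical helpers for illustration only; the transports' real wiring is shown in examples/websocket-server/bot.py. Only the names session_timeout and on_session_timeout are taken from the note above.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple


async def run_session(
    session_timeout: float,
    work: Awaitable[None],
    on_session_timeout: Callable[[], Awaitable[None]],
) -> str:
    """Run a session, invoking the handler if it outlives session_timeout (seconds)."""
    try:
        await asyncio.wait_for(work, timeout=session_timeout)
        return "finished"
    except asyncio.TimeoutError:
        # The session ran too long: hand control to the custom handler.
        await on_session_timeout()
        return "timed out"


def demo_session(work_secs: float, session_timeout: float) -> Tuple[str, List[str]]:
    events: List[str] = []

    async def handler() -> None:
        events.append("on_session_timeout")

    async def main() -> str:
        # asyncio.sleep stands in for the lifetime of a client connection.
        return await run_session(session_timeout, asyncio.sleep(work_secs), handler)

    return asyncio.run(main()), events
```

A handler like this is where an application might send a goodbye message or close the connection.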
Added the new modalities option and helper function to set Gemini output modalities.
Added examples/foundational/26d-gemini-multimodal-live-text.py, which uses Gemini with the TEXT modality and a separate TTS provider for speech synthesis.

Changed

Modified UserIdleProcessor to start monitoring only after first conversation activity (UserStartedSpeakingFrame or BotStartedSpeakingFrame) instead of immediately.
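The changed start condition can be shown with a simplified model. IdleMonitor is a hypothetical class, not the UserIdleProcessor implementation, and it treats both frame types purely as activity markers, which is a simplification.

```python
from typing import Optional

# Frame names taken from the note above; treated here as plain strings.
ACTIVITY_FRAMES = ("UserStartedSpeakingFrame", "BotStartedSpeakingFrame")


class IdleMonitor:
    """Hypothetical idle tracker that stays dormant until the first activity."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        # None means monitoring has not started yet.
        self.last_activity: Optional[float] = None

    def on_frame(self, frame_name: str, now: float) -> None:
        if frame_name in ACTIVITY_FRAMES:
            self.last_activity = now

    def is_idle(self, now: float) -> bool:
        # Before any conversation activity, never report idleness.
        if self.last_activity is None:
            return False
        return now - self.last_activity >= self.timeout
```

With this behaviour, a bot that takes a while to connect or greet the user does not trigger an idle callback before the conversation has started.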
Modified OpenAIAssistantContextAggregator to support controlled completions and to emit context update callbacks via FunctionCallResultProperties.
Added aws_session_token to the PollyTTSService.
Changed the default model for PlayHTHttpTTSService to Play3.0-mini-http.
api_key, aws_access_key_id and region are no longer required parameters for the PollyTTSService (AWSTTSService).
Added a session_timeout example to examples/websocket-server/bot.py showing how to handle the session timeout event.
Changed InputParams in src/pipecat/services/gemini_multimodal_live/gemini.py to support different modalities.
Changed DeepgramSTTService to send a finalize event whenever VAD detects UserStoppedSpeakingFrame. This speeds up transcriptions and clears the Deepgram audio buffer.

Fixed

Fixed an issue where DeepgramSTTService was not generating metrics when using the pipeline's VAD.
Fixed UserIdleProcessor not properly propagating EndFrames through the pipeline.
Fixed an issue where websocket based TTS services could incorrectly terminate their connection due to a retry counter not resetting.
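The class of bug described here can be reproduced with a minimal retry policy. RetryPolicy and its method names are hypothetical and are not the WebsocketService code; the sketch only shows why the counter has to be reset after a successful reconnect.

```python
class RetryPolicy:
    """Hypothetical retry bookkeeping for a reconnecting websocket client."""

    def __init__(self, max_retries: int) -> None:
        self.max_retries = max_retries
        self.attempts = 0

    def on_failure(self) -> bool:
        """Count a failed attempt; return True if another retry is allowed."""
        self.attempts += 1
        return self.attempts <= self.max_retries

    def on_success(self) -> None:
        """Reset the counter so earlier failures do not count against later ones."""
        self.attempts = 0
```

Without the reset in on_success, a long-lived connection that recovers from occasional drops would eventually exhaust its retries and be terminated even though every individual reconnect succeeded.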
Fixed a PipelineTask issue that would cause a dangling task after stopping the pipeline with an EndFrame.
Fixed an import issue for PlayHTHttpTTSService.
Fixed an issue where languages couldn't be used with the PlayHTHttpTTSService.
Fixed an issue where OpenAIRealtimeBetaLLMService audio chunks were hitting an error when truncating audio content.
Fixed an issue where setting the voice and model for RimeHttpTTSService wasn't working.
Fixed an issue where IdleFrameProcessor and UserIdleProcessor were getting initialized before the start of the pipeline.