pipecat-ai/pipecat v0.0.95 on GitHub

Added

Added ai-coustics integrated VAD (AICVADAnalyzer) with AICFilter factory and example wiring; leverages the enhancement model for robust detection with no ONNX dependency or added processing complexity.
Added a watchdog to DeepgramFluxSTTService to prevent dangling tasks in case the user was speaking and we stop receiving audio.
Introduced a minimum confidence parameter in DeepgramFluxSTTService to avoid generating transcriptions below a defined threshold.
Added ElevenLabsRealtimeSTTService which implements the Realtime STT service from ElevenLabs.
Added word-level timestamps support to Hume TTS service

Changed

⚠️ Breaking change: LLMContext.create_image_message(), LLMContext.create_audio_message(), LLMContext.add_image_frame_message() and LLMContext.add_audio_frames_message() are now async methods. This fixes an issue where the asyncio event loop would be blocked while encoding audio or images.
ConsumerProcessor now queues frames from the producer internally instead of pushing them directly. This allows us to subclass consumer processors and manipulate frames before they are pushed.
BaseTextFilter only require subclasses to implement the filter() method.
Extracted the logic for retrying connections, and create a new send_with_retry method inside WebSocketService.
Refactored DeepgramFluxSTTService to automatically reconnect if sending a message fails.
Updated all STT and TTS services to use consistent error handling pattern with push_error() method for better pipeline error event integration.
Added support for maybe_capture_participant_camera() and maybe_capture_participant_screen() for SmallWebRTCTransport in the runner utils.
Added Hindi support for Rime TTS services.
Updated GeminiTTSService to use Google Cloud Text-to-Speech streaming API instead of the deprecated Gemini API. Now uses credentials / credentials_path for authentication. The api_key parameter is deprecated. Also, added support for prompt parameter for style instructions and expressive markup tags. Significantly improved latency with streaming synthesis.
Updated language mappings for the Google and Gemini TTS services to match official documentation.

Deprecated

The api_key parameter in GeminiTTSService is deprecated. Use credentials or credentials_path instead for Google Cloud authentication.

Fixed

Fixed a SimliVideoService connection issue.
Fixed an issue in the Runner where, when using SmallWebRTCTransport, the request_data was not being passed to the SmallWebRTCRunnerArguments body.
Fixed subtle issue of assistant context messages ending up with double spaces between words or sentences.
Fixed an issue where NeuphonicTTSService wasn't pushing TTSTextFrames, meaning assistant messages weren't being written to context.
Fixed an issue with OpenTelemetry where tracing wasn't correctly displaying LLM completions and tools when using the universal LLMContext.
Fixed issue where DeepgramFluxSTTService failed to connect if passing a keyterm or tag containing a space.
Prevented HeyGenVideoService from automatically disconnecting after 5 minutes.