github pipecat-ai/pipecat v0.0.95

13 hours ago

Added

  • Added ai-coustics integrated VAD (AICVADAnalyzer) with AICFilter factory and example wiring; leverages the enhancement model for robust detection with no ONNX dependency or added processing complexity.

  • Added a watchdog to DeepgramFluxSTTService to prevent dangling tasks in case the user was speaking and we stop receiving audio.

  • Introduced a minimum confidence parameter in DeepgramFluxSTTService to avoid generating transcriptions below a defined threshold.

  • Added ElevenLabsRealtimeSTTService which implements the Realtime STT service from ElevenLabs.

  • Added word-level timestamps support to Hume TTS service

Changed

  • ⚠️ Breaking change: LLMContext.create_image_message(), LLMContext.create_audio_message(), LLMContext.add_image_frame_message() and LLMContext.add_audio_frames_message() are now async methods. This fixes an issue where the asyncio event loop would be blocked while encoding audio or images.

  • ConsumerProcessor now queues frames from the producer internally instead of pushing them directly. This allows us to subclass consumer processors and manipulate frames before they are pushed.

  • BaseTextFilter only require subclasses to implement the filter() method.

  • Extracted the logic for retrying connections, and create a new send_with_retry method inside WebSocketService.

  • Refactored DeepgramFluxSTTService to automatically reconnect if sending a message fails.

  • Updated all STT and TTS services to use consistent error handling pattern with push_error() method for better pipeline error event integration.

  • Added support for maybe_capture_participant_camera() and maybe_capture_participant_screen() for SmallWebRTCTransport in the runner utils.

  • Added Hindi support for Rime TTS services.

  • Updated GeminiTTSService to use Google Cloud Text-to-Speech streaming API instead of the deprecated Gemini API. Now uses credentials / credentials_path for authentication. The api_key parameter is deprecated. Also, added support for prompt parameter for style instructions and expressive markup tags. Significantly improved latency with streaming synthesis.

  • Updated language mappings for the Google and Gemini TTS services to match official documentation.

Deprecated

  • The api_key parameter in GeminiTTSService is deprecated. Use credentials or credentials_path instead for Google Cloud authentication.

Fixed

  • Fixed a SimliVideoService connection issue.

  • Fixed an issue in the Runner where, when using SmallWebRTCTransport, the request_data was not being passed to the SmallWebRTCRunnerArguments body.

  • Fixed subtle issue of assistant context messages ending up with double spaces between words or sentences.

  • Fixed an issue where NeuphonicTTSService wasn't pushing TTSTextFrames, meaning assistant messages weren't being written to context.

  • Fixed an issue with OpenTelemetry where tracing wasn't correctly displaying LLM completions and tools when using the universal LLMContext.

  • Fixed issue where DeepgramFluxSTTService failed to connect if passing a keyterm or tag containing a space.

  • Prevented HeyGenVideoService from automatically disconnecting after 5 minutes.

Don't miss a new pipecat release

NewReleases is sending notifications on new releases.