Added
-
Added ai-coustics integrated VAD (
AICVADAnalyzer) withAICFilterfactory and example wiring; leverages the enhancement model for robust detection with no ONNX dependency or added processing complexity. -
Added a watchdog to
DeepgramFluxSTTServiceto prevent dangling tasks in case the user was speaking and we stop receiving audio. -
Introduced a minimum confidence parameter in
DeepgramFluxSTTServiceto avoid generating transcriptions below a defined threshold. -
Added
ElevenLabsRealtimeSTTServicewhich implements the Realtime STT service from ElevenLabs. -
Added word-level timestamps support to Hume TTS service
Changed
-
⚠️ Breaking change:
LLMContext.create_image_message(),LLMContext.create_audio_message(),LLMContext.add_image_frame_message()andLLMContext.add_audio_frames_message()are now async methods. This fixes an issue where the asyncio event loop would be blocked while encoding audio or images. -
ConsumerProcessornow queues frames from the producer internally instead of pushing them directly. This allows us to subclass consumer processors and manipulate frames before they are pushed. -
BaseTextFilteronly require subclasses to implement thefilter()method. -
Extracted the logic for retrying connections, and create a new
send_with_retrymethod insideWebSocketService. -
Refactored
DeepgramFluxSTTServiceto automatically reconnect if sending a message fails. -
Updated all STT and TTS services to use consistent error handling pattern with
push_error()method for better pipeline error event integration. -
Added support for
maybe_capture_participant_camera()andmaybe_capture_participant_screen()forSmallWebRTCTransportin the runner utils. -
Added Hindi support for Rime TTS services.
-
Updated
GeminiTTSServiceto use Google Cloud Text-to-Speech streaming API instead of the deprecated Gemini API. Now usescredentials/credentials_pathfor authentication. Theapi_keyparameter is deprecated. Also, added support forpromptparameter for style instructions and expressive markup tags. Significantly improved latency with streaming synthesis. -
Updated language mappings for the Google and Gemini TTS services to match official documentation.
Deprecated
- The
api_keyparameter inGeminiTTSServiceis deprecated. Usecredentialsorcredentials_pathinstead for Google Cloud authentication.
Fixed
-
Fixed a
SimliVideoServiceconnection issue. -
Fixed an issue in the
Runnerwhere, when usingSmallWebRTCTransport, therequest_datawas not being passed to theSmallWebRTCRunnerArgumentsbody. -
Fixed subtle issue of assistant context messages ending up with double spaces between words or sentences.
-
Fixed an issue where
NeuphonicTTSServicewasn't pushingTTSTextFrames, meaning assistant messages weren't being written to context. -
Fixed an issue with OpenTelemetry where tracing wasn't correctly displaying LLM completions and tools when using the universal
LLMContext. -
Fixed issue where
DeepgramFluxSTTServicefailed to connect if passing akeytermortagcontaining a space. -
Prevented
HeyGenVideoServicefrom automatically disconnecting after 5 minutes.