Added
- Added new AudioContextWordTTSService. This is a TTS base class for TTS services that handle multiple separate audio requests.
- Added new frames EmulateUserStartedSpeakingFrame and EmulateUserStoppedSpeakingFrame, which can be used to emulate VAD behavior when VAD is not present or is not triggered.
- Added a new audio_in_stream_on_start field to TransportParams.
- Added a new method start_audio_in_streaming in the BaseInputTransport.
  - This method should be used to start receiving the input audio in case the field audio_in_stream_on_start is set to false.
- Added support for the RTVIProcessor to handle buffered audio in base64 format, converting it into InputAudioRawFrame for transport.
- Added support for the RTVIProcessor to trigger start_audio_in_streaming only after the client-ready message.
- Added new MUTE_UNTIL_FIRST_BOT_COMPLETE strategy to STTMuteStrategy. This strategy starts muted and remains muted until the first bot speech completes, ensuring the bot's first response cannot be interrupted. This complements the existing FIRST_SPEECH strategy, which only mutes during the first detected bot speech.
- Added support for Google Cloud Speech-to-Text V2 through GoogleSTTService.
- Added RimeTTSService, a new WordTTSService. Updated the foundational example 07q-interruptible-rime.py to use RimeTTSService.
- Added support for Groq's Whisper API through the new GroqSTTService and OpenAI's Whisper API through the new OpenAISTTService. Introduced a new base class BaseWhisperSTTService to handle common Whisper API functionality.
- Added PerplexityLLMService for Perplexity NIM API integration, with an OpenAI-compatible interface. Also added foundational example 14n-function-calling-perplexity.py.
- Added DailyTransport.update_remote_participants(). This allows you to update remote participants' settings, like their permissions or which of their devices are enabled. Requires that the local participant have the participant admin permission.
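The interplay between audio_in_stream_on_start, start_audio_in_streaming and the client-ready message can be pictured with a small, self-contained model. This is a sketch under stated assumptions, not Pipecat's implementation: InputTransportSketch, RTVISketch and AudioFrame are hypothetical stand-ins (AudioFrame plays the role of InputAudioRawFrame), the dict message shapes are invented for illustration, and dropping audio that arrives before streaming starts is an assumed simplification.

```python
import base64
from dataclasses import dataclass


@dataclass
class AudioFrame:
    # Hypothetical stand-in for InputAudioRawFrame.
    audio: bytes
    sample_rate: int
    num_channels: int


class InputTransportSketch:
    """Hypothetical input transport honoring an audio_in_stream_on_start-style flag."""

    def __init__(self, audio_in_stream_on_start: bool = True) -> None:
        self._streaming = audio_in_stream_on_start
        self.received: list[AudioFrame] = []

    def start_audio_in_streaming(self) -> None:
        self._streaming = True

    def push_audio(self, frame: AudioFrame) -> None:
        # Assumption: audio arriving before streaming has started is simply dropped.
        if self._streaming:
            self.received.append(frame)


class RTVISketch:
    """Hypothetical RTVI-like message handler."""

    def __init__(self, transport: InputTransportSketch) -> None:
        self._transport = transport

    def handle_message(self, message: dict) -> None:
        if message["type"] == "client-ready":
            # Only start receiving input audio once the client reports it is ready.
            self._transport.start_audio_in_streaming()
        elif message["type"] == "audio":
            # Assumed message shape: base64-encoded audio plus format metadata.
            data = message["data"]
            self._transport.push_audio(
                AudioFrame(
                    audio=base64.b64decode(data["audio"]),
                    sample_rate=data["sample_rate"],
                    num_channels=data["num_channels"],
                )
            )


transport = InputTransportSketch(audio_in_stream_on_start=False)
rtvi = RTVISketch(transport)
chunk = {
    "type": "audio",
    "data": {"audio": base64.b64encode(b"\x00\x01").decode(), "sample_rate": 16000, "num_channels": 1},
}
rtvi.handle_message(chunk)  # dropped: streaming has not started yet
rtvi.handle_message({"type": "client-ready"})
rtvi.handle_message(chunk)  # delivered
print(len(transport.received))  # 1
```

Keeping the "start streaming" decision in the message handler, rather than in the transport, mirrors the idea that the transport only needs a switch while the protocol layer decides when to flip it.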
Changed
- We no longer consider a colon (:) to be the end of a sentence.
- Updated DailyTransport to respect the audio_in_stream_on_start field, ensuring it only starts receiving audio input when that field is enabled.
- Updated FastAPIWebsocketOutputTransport to send TransportMessageFrame and TransportMessageUrgentFrame to the serializer.
- Updated WebsocketServerOutputTransport to send TransportMessageFrame and TransportMessageUrgentFrame to the serializer.
- Enhanced STTMuteConfig to validate strategy combinations, preventing MUTE_UNTIL_FIRST_BOT_COMPLETE and FIRST_SPEECH from being used together, as they handle the first bot speech differently.
- Updated foundational example 07n-interruptible-google.py to use all Google services.
- RimeHttpTTSService now uses the mistv2 model by default.
- Improved error handling in AzureTTSService to properly detect and log synthesis cancellation errors.
- Enhanced WhisperSTTService with full language support and improved model documentation.
- Updated foundational example 14f-function-calling-groq.py to use GroqSTTService for transcription.
- Updated GroqLLMService to use llama-3.3-70b-versatile as the default model.
- RTVIObserver no longer handles LLMSearchResponseFrame frames. For now, to handle those frames you need to create a GoogleRTVIObserver instead.
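The difference between the two first-speech strategies, and why they are rejected in combination, can be modelled as a tiny state machine. This is a hedged sketch, not the library's code: MuteStrategy, MuteConfig and MuteState are hypothetical stand-ins for STTMuteStrategy and STTMuteConfig reduced to the two strategies discussed above, and raising ValueError on the conflicting combination is an assumption about how the validation surfaces.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class MuteStrategy(Enum):
    # Hypothetical stand-in for STTMuteStrategy, reduced to the two members discussed.
    FIRST_SPEECH = auto()
    MUTE_UNTIL_FIRST_BOT_COMPLETE = auto()


@dataclass
class MuteConfig:
    # Hypothetical stand-in for STTMuteConfig.
    strategies: set[MuteStrategy] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Both strategies decide how the first bot speech is handled, so reject the combination.
        conflicting = {MuteStrategy.FIRST_SPEECH, MuteStrategy.MUTE_UNTIL_FIRST_BOT_COMPLETE}
        if conflicting <= self.strategies:
            raise ValueError(
                "FIRST_SPEECH and MUTE_UNTIL_FIRST_BOT_COMPLETE cannot be used together"
            )


class MuteState:
    """Tracks bot speech and reports whether user transcription should be muted."""

    def __init__(self, config: MuteConfig) -> None:
        self.config = config
        self.bot_speaking = False
        self.first_bot_speech_done = False

    @property
    def muted(self) -> bool:
        strategies = self.config.strategies
        if self.first_bot_speech_done:
            # Neither of these two strategies mutes once the first bot speech has completed.
            return False
        if MuteStrategy.MUTE_UNTIL_FIRST_BOT_COMPLETE in strategies:
            # Muted from the start, even before the bot begins speaking.
            return True
        # FIRST_SPEECH only mutes while the first bot speech is in progress.
        return MuteStrategy.FIRST_SPEECH in strategies and self.bot_speaking

    def bot_started_speaking(self) -> None:
        self.bot_speaking = True

    def bot_stopped_speaking(self) -> None:
        self.bot_speaking = False
        self.first_bot_speech_done = True


state = MuteState(MuteConfig({MuteStrategy.MUTE_UNTIL_FIRST_BOT_COMPLETE}))
print(state.muted)  # True: muted before the bot has said anything
state.bot_started_speaking()
state.bot_stopped_speaking()
print(state.muted)  # False: the first bot speech has completed
```

With FIRST_SPEECH alone, the same sequence reports unmuted before the bot starts, muted during its first utterance, and unmuted afterwards, which is why the two strategies give conflicting answers for the window before the bot speaks.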
Deprecated
- The STTMuteFilter constructor's stt_service parameter is now deprecated and will be removed in a future version. The filter now manages mute state internally instead of querying the STT service.
- RTVI.observer() is now deprecated; instantiate an RTVIObserver directly instead.
- All RTVI frame processors (e.g. RTVISpeakingProcessor, RTVIBotLLMProcessor) are now deprecated; instantiate an RTVIObserver instead.
Fixed
- Fixed a FalImageGenService issue that was causing the event loop to be blocked while loading the downloaded image.
- Fixed a CartesiaTTSService issue that could cause audio to overlap in some cases.
- Fixed a websocket-based service issue (e.g. CartesiaTTSService) that prevented reconnection after the server disconnected cleanly, causing an infinite loop instead.
- Fixed a BaseOutputTransport issue that was causing upstream frames not to be pushed upstream.
- Fixed multiple issues where user transcriptions were not being handled properly. It was possible for short utterances not to trigger VAD, which would cause user transcriptions to be ignored. It was also possible for one or more transcriptions to be generated after VAD, in which case they would also be ignored.
- Fixed an issue that was causing BotStoppedSpeakingFrame to be generated too late. This could cause STTMuteFilter to unblock later than desired.
- Fixed an issue that was causing AudioBufferProcessor to not record synchronized audio.
- Fixed an RTVI issue that was causing bot-tts-text messages to be sent before being processed by the output transport.
- Fixed an issue [#1192] in 11labs where we were trying to reconnect/disconnect the websocket connection even when the connection was already closed.
- Fixed an issue where the has_regular_messages condition was always true in GoogleLLMContext due to Part having function_call and function_response with None values.
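A common way to keep an asyncio event loop responsive during blocking work, such as the image loading mentioned in the FalImageGenService fix, is to move that work onto a worker thread. The following is a generic sketch of that technique using asyncio.to_thread (Python 3.9+); it is not necessarily how FalImageGenService was fixed, and load_image_blocking is a hypothetical stand-in that simulates blocking work with time.sleep.

```python
import asyncio
import time


def load_image_blocking(data: bytes) -> bytes:
    # Hypothetical stand-in for blocking work such as decoding a downloaded image.
    time.sleep(0.2)
    return data


async def ticker(stop: asyncio.Event) -> int:
    # Counts how often the event loop gets to run this coroutine.
    ticks = 0
    while not stop.is_set():
        ticks += 1
        await asyncio.sleep(0.01)
    return ticks


async def main() -> tuple[bytes, int]:
    stop = asyncio.Event()
    tick_task = asyncio.create_task(ticker(stop))
    # Offload the blocking call to a thread so the loop keeps running the ticker.
    image = await asyncio.to_thread(load_image_blocking, b"png-bytes")
    stop.set()
    ticks = await tick_task
    return image, ticks


image, ticks = asyncio.run(main())
print(f"ticks while loading: {ticks}")
```

Calling load_image_blocking(...) directly inside main() instead would leave the ticker with no chance to run until the sleep finished, which is the kind of stall the fix describes.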
Other
- Added new instant-voice example. This example showcases how to enable instant voice communication as soon as a user connects.
- Added new local-input-select-stt example. This example allows you to play with local audio inputs by selecting them through a nice text interface.