🦃 Happy Thanksgiving! 🦃
Added
- Enhanced error handling across the framework:
  - Added an on_error callback to FrameProcessor for centralized error handling.
  - Renamed push_error(error: ErrorFrame) to push_error_frame(error: ErrorFrame) for clarity.
  - Added a new push_error method for simplified error reporting:

        async def push_error(error_msg: str, exception: Optional[Exception] = None, fatal: bool = False)

  - Standardized error logging by replacing logger.exception calls with logger.error throughout the codebase.
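The error-reporting flow above can be modeled with a small, self-contained sketch. ToyProcessor and this ErrorFrame dataclass are hypothetical stand-ins, not Pipecat's classes. Only the push_error / push_error_frame names and the push_error parameters come from this entry. Passing on_error through the constructor is an assumption made for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


@dataclass
class ErrorFrame:
    # Hypothetical stand-in for an error frame; field names are assumptions.
    error: str
    exception: Optional[Exception] = None
    fatal: bool = False


ErrorHandler = Callable[[ErrorFrame], Awaitable[None]]


class ToyProcessor:
    """Toy model of centralized error handling (not Pipecat's implementation)."""

    def __init__(self, on_error: Optional[ErrorHandler] = None) -> None:
        self._on_error = on_error  # assumption: callback supplied at construction
        self.pushed: List[ErrorFrame] = []  # stands in for pushing frames upstream

    async def push_error_frame(self, error: ErrorFrame) -> None:
        # Every error passes through the single on_error hook before being pushed.
        if self._on_error is not None:
            await self._on_error(error)
        self.pushed.append(error)

    async def push_error(
        self, error_msg: str, exception: Optional[Exception] = None, fatal: bool = False
    ) -> None:
        # Convenience wrapper: callers pass plain values instead of building a frame.
        await self.push_error_frame(ErrorFrame(error_msg, exception, fatal))


async def main() -> ToyProcessor:
    seen: List[str] = []

    async def log_error(frame: ErrorFrame) -> None:
        seen.append(f"{frame.error} (fatal={frame.fatal})")

    proc = ToyProcessor(on_error=log_error)
    try:
        raise ValueError("unsupported sample rate")
    except ValueError as exc:
        await proc.push_error("STT setup failed", exception=exc)
    print(seen)  # ['STT setup failed (fatal=False)']
    return proc


proc = asyncio.run(main())
```

The design point is that push_error is only sugar. In this sketch, every error, however it is reported, funnels through push_error_frame and therefore through the one on_error hook.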
- Added cache_read_input_tokens, cache_creation_input_tokens, and reasoning_tokens to OTel spans for LLM calls.
- Added a LiveKitRESTHelper utility class for managing LiveKit rooms via the REST API.
- Added DeepgramSageMakerSTTService, which connects to a SageMaker-hosted Deepgram STT model. Added the 07c-interruptible-deepgram-sagemaker.py foundational example.
- Added SageMakerBidiClient to connect to SageMaker-hosted, BiDi-compatible services.
- Added support for include_timestamps and enable_logging in ElevenLabsRealtimeSTTService. When include_timestamps is enabled, timestamp data is included in the TranscriptionFrame's result parameter.
- Added optional speaking rate control to InworldTTSService.
- Introduced a new AggregatedTextFrame type that carries text along with an aggregated_by field describing what kind of text it contains. TTSTextFrame now inherits from AggregatedTextFrame. Because of this inheritance, an observer can watch for AggregatedTextFrames to accumulate the perceived output, and can tell whether a given piece of text was spoken by checking whether that frame is also a TTSTextFrame.

  With this frame, the LLM token stream can be transformed into custom, composable chunks, allowing aggregation to happen outside the TTS service. This makes it possible to listen for or handle those aggregations. It also sets the stage for composing a best-effort, more digestible version of the perceived LLM output, whether or not that output is processed by a TTS service, or whether a TTS service exists at all.
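The inheritance trick above can be sketched with stand-in dataclasses. The frame classes and PerceivedOutputObserver below are hypothetical simplifications, not Pipecat's types. The one idea taken from this entry is that a single isinstance check on TTSTextFrame separates spoken from unspoken aggregations.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextFrame:
    text: str


@dataclass
class AggregatedTextFrame(TextFrame):
    aggregated_by: str = "sentence"  # e.g. "sentence", "word", or a custom label


@dataclass
class TTSTextFrame(AggregatedTextFrame):
    """Text the TTS service actually spoke."""


class PerceivedOutputObserver:
    """Hypothetical observer that rebuilds what the bot said or showed."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, bool]] = []  # (text, spoken?)

    def on_frame(self, frame: TextFrame) -> None:
        if not isinstance(frame, AggregatedTextFrame):
            return  # raw token-level text is ignored
        # The subclass check is what distinguishes spoken from unspoken output.
        self.items.append((frame.text, isinstance(frame, TTSTextFrame)))

    def perceived_text(self) -> str:
        # Aggregations are whitespace-stripped, so join them with single spaces.
        return " ".join(text for text, _ in self.items)


obs = PerceivedOutputObserver()
for frame in [
    TextFrame("Your"),
    TTSTextFrame("Your code is ready.", "sentence"),
    AggregatedTextFrame("ABC-123", "order_id"),
]:
    obs.on_frame(frame)

print(obs.perceived_text())  # Your code is ready. ABC-123
print(obs.items)  # [('Your code is ready.', True), ('ABC-123', False)]
```

This sketch ignores the word-timestamp case, where a preliminary sentence-level AggregatedTextFrame may be pushed alongside word-level TTSTextFrames. A real observer would need to avoid counting that text twice.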
- Introduced LLMTextProcessor, a new processor for customizing how LLMTextFrames are aggregated. Its purpose is to turn LLMTextFrames into AggregatedTextFrames. By default, a TTSService still aggregates LLMTextFrames by sentence for its own consumption. However, if you want to override how LLM text is aggregated, you should no longer override the TTS service's internal text_aggregator. Instead, insert this processor between your LLM and TTS in the pipeline.
- New bot-output RTVI message to represent what the bot actually "says".
  - The RTVIObserver now emits bot-output messages based on the new AggregatedTextFrames. bot-tts-text and bot-llm-text are still supported and generated, but bot-transcription is now deprecated in favor of this new, more thorough message.
  - The new RTVIBotOutputMessage includes the fields:
    - spoken: a boolean indicating whether the text was spoken by TTS.
    - aggregated_by: a string describing how the text was aggregated ("sentence", "word", "my custom aggregation").
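A bot-output payload might be assembled roughly as follows. The spoken and aggregated_by fields come from this entry. The text field, the "rtvi-ai" label, and the type/data envelope are assumptions about the RTVI message shape. build_bot_output is a hypothetical helper, not part of RTVIObserver.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BotOutputData:
    text: str  # assumption: the message carries the text itself
    spoken: bool  # True if the text was spoken by TTS
    aggregated_by: str  # e.g. "sentence", "word", "my custom aggregation"


def build_bot_output(text: str, spoken: bool, aggregated_by: str) -> str:
    """Serialize a bot-output message (assumed envelope, for illustration only)."""
    message = {
        "label": "rtvi-ai",
        "type": "bot-output",
        "data": asdict(BotOutputData(text, spoken, aggregated_by)),
    }
    return json.dumps(message)


print(build_bot_output("Your code is ready.", True, "sentence"))
```

A client could then render spoken and unspoken text differently, for example showing an order ID on screen that was never voiced.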
- Introduced new fields on RTVIObserver to support the new bot-output messaging:
  - bot_output_enabled: defaults to True. Set to False to disable bot-output messages.
  - skip_aggregator_types: defaults to None. Set to a list of strings matching aggregation types that should not be included in bot-output messages (e.g., credit_card).
- Introduced new methods, add_text_transformer() and remove_text_transformer(), on RTVIObserver for registering (and later removing) callbacks for specific aggregation types, or for all aggregations using *. These callbacks can modify the text before it is sent as a bot-output or bot-tts-text message, for example to obscure a credit card number or to insert extra detail the client might want but the context doesn't need.
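The interplay of skip_aggregator_types and per-type or wildcard transformers can be sketched as below. ToyBotOutputFilter is hypothetical. Its method names mirror this entry, but the callback signature (text, aggregation_type), the default "*" argument, and the ordering (type-specific transformers before wildcard ones) are assumptions, not RTVIObserver's actual API.

```python
import re
from typing import Callable, Dict, List, Optional

# (text, aggregation_type) -> transformed text; the signature is an assumption.
Transformer = Callable[[str, str], str]


class ToyBotOutputFilter:
    """Hypothetical model of skip_aggregator_types plus text transformers."""

    def __init__(self, skip_aggregator_types: Optional[List[str]] = None) -> None:
        self._skip = set(skip_aggregator_types or [])
        self._transformers: Dict[str, List[Transformer]] = {}

    def add_text_transformer(self, fn: Transformer, aggregation_type: str = "*") -> None:
        self._transformers.setdefault(aggregation_type, []).append(fn)

    def remove_text_transformer(self, fn: Transformer, aggregation_type: str = "*") -> None:
        registered = self._transformers.get(aggregation_type, [])
        if fn in registered:
            registered.remove(fn)

    def process(self, text: str, aggregation_type: str) -> Optional[str]:
        """Return the text to send to the client, or None to send nothing."""
        if aggregation_type in self._skip:
            return None
        # Type-specific transformers run first, then wildcard ("*") ones;
        # dict.fromkeys dedups the keys so "*" is never applied twice.
        for key in dict.fromkeys((aggregation_type, "*")):
            for fn in self._transformers.get(key, []):
                text = fn(text, aggregation_type)
        return text


def mask_card(text: str, _type: str) -> str:
    # Replace every digit that still has at least four digits after it.
    return re.sub(r"\d(?=\d{4})", "*", text)


f = ToyBotOutputFilter(skip_aggregator_types=["internal_note"])
f.add_text_transformer(mask_card, "credit_card")
print(f.process("4111111111111111", "credit_card"))  # ************1111
print(f.process("check fraud score", "internal_note"))  # None
```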
- In MiniMaxHttpTTSService:
  - Added support for the speech-2.6-hd and speech-2.6-turbo models.
  - Added languages: Afrikaans, Bulgarian, Catalan, Danish, Persian, Filipino, Hebrew, Croatian, Hungarian, Malay, Norwegian, Nynorsk, Slovak, Slovenian, Swedish, and Tamil.
  - Added new emotions: calm and fluent.
- Added enable_logging to SimliVideoService input parameters. It's disabled by default.
Changed
- Updated the FishAudioTTSService default model to s1.
- Updated DeepgramTTSService to use Deepgram's TTS WebSocket API, which improves TTFB latency. ⚠️ This is a potential breaking change that only affects you if you're self-hosting DeepgramTTSService.
- Updated daily-python to 0.22.0.
- BaseTextAggregator changes: modified the BaseTextAggregator type so that metadata can be associated with aggregated text. Currently, that metadata is just a type, so that the aggregation can be classified or described. Changes made to support this:
  - ⚠️ IMPORTANT: Aggregators are now expected to strip leading and trailing whitespace characters before returning their aggregation from aggregate() or .text. This gives all aggregators a consistent contract, so downstream consumers know how to stitch aggregations back together.
  - Introduced a new Aggregation dataclass to represent both the aggregated text and a string identifying the type of aggregation (e.g., "sentence", "word", "my custom aggregation").
  - ⚠️ Breaking change: BaseTextAggregator.text now returns an Aggregation (instead of str).

    Before:

        aggregated_text = myAggregator.text

    Now:

        aggregated_text = myAggregator.text.text

  - ⚠️ Breaking change: BaseTextAggregator.aggregate() now returns Optional[Aggregation] (instead of Optional[str]).

    Before:

        aggregation = myAggregator.aggregate(text)
        print(f"successfully aggregated text: {aggregation}")

    Now:

        aggregation = myAggregator.aggregate(text)
        if aggregation:
            print(f"successfully aggregated text: {aggregation.text}")

  - SimpleTextAggregator, SkipTagsAggregator, and PatternPairAggregator were updated to produce and consume Aggregation objects.
  - All uses of the above aggregators have been updated accordingly.
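Here is a toy aggregator that follows the new contract: aggregate() returns Optional[Aggregation], .text returns an Aggregation, and results are whitespace-stripped. ToySentenceAggregator is hypothetical. Its sentence detection (split at the first ., !, or ?) is deliberately naive and is not meant to reflect how Pipecat's built-in aggregators find sentence boundaries.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Aggregation:
    text: str
    type: str  # e.g. "sentence", "word", "my custom aggregation"


class ToySentenceAggregator:
    """Streaming sentence aggregator following the stripped-output contract."""

    _END = re.compile(r"[.!?]")

    def __init__(self) -> None:
        self._buffer = ""

    @property
    def text(self) -> Aggregation:
        # Pending (not yet returned) text, stripped like any other aggregation.
        return Aggregation(self._buffer.strip(), "sentence")

    def aggregate(self, text: str) -> Optional[Aggregation]:
        self._buffer += text
        match = self._END.search(self._buffer)
        if match is None:
            return None
        # Emit up to the first terminator; the rest stays buffered.
        sentence = self._buffer[: match.end()]
        self._buffer = self._buffer[match.end():]
        return Aggregation(sentence.strip(), "sentence")


agg = ToySentenceAggregator()
sentences: List[str] = []
for token in ["Hello", " there.", " How are", " you?", " Bye"]:
    aggregation = agg.aggregate(token)
    if aggregation:
        sentences.append(aggregation.text)

print(" ".join(sentences))  # Hello there. How are you?
print(agg.text.text)  # Bye
```

Because every aggregation in this sketch is stripped, the consumer chooses the separator (here a single space) when stitching results back together, instead of inheriting whatever whitespace the token stream happened to carry.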
- Augmented the PatternPairAggregator so that matched patterns can be treated as their own aggregation, taking advantage of the new Aggregation type. To that end:
  - Introduced a new, preferred add_pattern method that supports treating a match as a separate aggregation returned from aggregate(). It replaces the now-deprecated add_pattern_pair method; you provide a MatchAction in place of the remove_match field.
    - MatchAction enum: REMOVE, KEEP, AGGREGATE, allowing customization of how a match is handled.
      - REMOVE: The text, along with its delimiters, is removed from the streaming text. Sentence aggregation continues as if this text did not exist.
      - KEEP: The delimiters are removed, but the content between them is kept. Sentence aggregation continues with the inner text included.
      - AGGREGATE: The delimiters are removed, and the content between them is treated as a separate aggregation. Any text before the start of the pattern is returned early, whether or not a complete sentence was found. The pattern's content is then returned, and sentence aggregation resumes after the closing delimiter. The content between the delimiters is not aggregated by sentence; it is returned as a single block of text.
  - PatternMatch now extends Aggregation and provides richer info to handlers.
    - ⚠️ Breaking change: The PatternMatch type passed to handlers registered via on_pattern_match now subclasses the new Aggregation type. This means content has been replaced with text, and pattern_id has been replaced with type:

          async def on_match_tag(match: PatternMatch):
              pattern = match.type  # instead of match.pattern_id
              text = match.text  # instead of match.content
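The three MatchAction behaviors can be compared on a complete string with a non-streaming sketch. apply_pattern, this MatchAction enum, and the Aggregation dataclass are hypothetical stand-ins, not PatternPairAggregator's API. To keep the example short, non-pattern text is returned as one chunk labeled "sentence" rather than split sentence by sentence.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import List


class MatchAction(Enum):
    REMOVE = "remove"
    KEEP = "keep"
    AGGREGATE = "aggregate"


@dataclass
class Aggregation:
    text: str
    type: str


def _clean(s: str) -> str:
    # Collapse runs of whitespace left behind when delimiters are removed.
    return " ".join(s.split())


def apply_pattern(
    text: str, start: str, end: str, pattern_type: str, action: MatchAction
) -> List[Aggregation]:
    """Non-streaming sketch: split complete text around start...end matches."""
    regex = re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)
    out: List[Aggregation] = []
    pending = ""  # ordinary text that would be aggregated by sentence
    pos = 0
    for m in regex.finditer(text):
        pending += text[pos : m.start()]
        inner = m.group(1)
        if action is MatchAction.KEEP:
            pending += inner  # delimiters dropped, content kept inline
        elif action is MatchAction.AGGREGATE:
            if _clean(pending):
                out.append(Aggregation(_clean(pending), "sentence"))  # flushed early
            pending = ""
            out.append(Aggregation(_clean(inner), pattern_type))  # one block
        # MatchAction.REMOVE: delimiters and content are both dropped
        pos = m.end()
    pending += text[pos:]
    if _clean(pending):
        out.append(Aggregation(_clean(pending), "sentence"))
    return out


msg = "Call me at <spell>555</spell> today."
for action in MatchAction:
    print(action.name, apply_pattern(msg, "<spell>", "</spell>", "spell", action))
```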
- TextFrame now includes the field append_to_context, which controls whether the encompassing text should be added to the LLM context (by the LLM assistant aggregator). It defaults to True.
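Here is a minimal sketch of how an assistant-side aggregator could honor the flag. TextFrame is a stand-in dataclass and assistant_context_text is a hypothetical helper. Pipecat's LLM assistant aggregator does considerably more than join strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextFrame:
    text: str
    append_to_context: bool = True  # the default described above


def assistant_context_text(frames: List[TextFrame]) -> str:
    """Join only the frames that should be added to the LLM context."""
    return " ".join(f.text for f in frames if f.append_to_context)


frames = [
    TextFrame("Your order is confirmed."),
    TextFrame("[shown to user only]", append_to_context=False),
    TextFrame("Anything else?"),
]
print(assistant_context_text(frames))  # Your order is confirmed. Anything else?
```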
- TTSService base class updates:
  - TTS services now accept a new skip_aggregator_types argument to avoid speaking certain aggregation types (now determined and returned by the aggregator).
  - Introduced the ability to transform text just in time, before it is sent to the TTS service. You register these callbacks through a new init field, text_transforms, or a new method, add_text_transformer(). This makes it possible to introduce TTS-specific tags for spelling or emotion, or to change the pronunciation of something on the fly. remove_text_transformer() has also been added to remove a registered transform callback.
  - TTS services push an AggregatedTextFrame in addition to TTSTextFrames in two cases: when an aggregation occurs that should not be spoken, or when the TTS service supports word-by-word timestamping. In the latter case, the TTSService preliminarily generates an AggregatedTextFrame, aggregated by sentence, so that the full sentence content is available as early as possible.
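The decision a TTS service makes per aggregation (skip it and surface a plain AggregatedTextFrame, or transform it and speak it as a TTSTextFrame) can be sketched as follows. ToyTTS is hypothetical. The (type, callable) shape for text_transforms is an assumption, and the <spell> tag is a placeholder rather than any particular provider's syntax. The choice that the frame keeps the untransformed text is also specific to this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class AggregatedTextFrame:
    text: str
    aggregated_by: str


@dataclass
class TTSTextFrame(AggregatedTextFrame):
    """Text that was sent to the synthesizer."""


# (aggregation type or "*", transform) pairs; this shape is an assumption.
TextTransform = Tuple[str, Callable[[str], str]]


class ToyTTS:
    """Hypothetical model of skip_aggregator_types and text_transforms."""

    def __init__(
        self,
        skip_aggregator_types: Optional[Iterable[str]] = None,
        text_transforms: Optional[Iterable[TextTransform]] = None,
    ) -> None:
        self._skip = set(skip_aggregator_types or [])
        self._transforms: List[TextTransform] = list(text_transforms or [])
        self.sent_to_tts: List[str] = []  # what the synthesizer would receive

    def handle(self, text: str, aggregation_type: str) -> AggregatedTextFrame:
        if aggregation_type in self._skip:
            # Not spoken: surfaced downstream as a plain AggregatedTextFrame.
            return AggregatedTextFrame(text, aggregation_type)
        spoken = text
        for target, transform in self._transforms:
            if target in ("*", aggregation_type):
                spoken = transform(spoken)  # just-in-time, TTS-only change
        self.sent_to_tts.append(spoken)
        # In this sketch the frame keeps the untransformed text.
        return TTSTextFrame(text, aggregation_type)


tts = ToyTTS(
    skip_aggregator_types=["credit_card"],
    text_transforms=[("spell", lambda t: f"<spell>{t}</spell>")],  # placeholder tag
)
frames = [
    tts.handle("AB12", "spell"),
    tts.handle("4111111111111111", "credit_card"),
    tts.handle("Thanks for waiting.", "sentence"),
]
print([type(f).__name__ for f in frames])  # ['TTSTextFrame', 'AggregatedTextFrame', 'TTSTextFrame']
print(tts.sent_to_tts)  # ['<spell>AB12</spell>', 'Thanks for waiting.']
```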
- Updated CartesiaTTSService:
  - Modified its use of a custom default text_aggregator to avoid deprecation warnings and to steer users toward text transformers or the LLMTextProcessor.
  - Added convenience methods for Cartesia's SSML tags: spell, emotion, pauses, volume, and speed.
- Updated RimeTTSService:
  - Modified its use of a custom default text_aggregator to avoid deprecation warnings and to steer users toward text transformers or the LLMTextProcessor.
  - Added convenience methods for Rime's customization options: spell, pauses, pronunciations, and inline speed control.
Deprecated
- The TTS constructor field text_aggregator is deprecated in favor of the new LLMTextProcessor. TTS services still have an internal aggregator to support default behavior, but if you want to override the aggregation behavior, use the new processor.
- The RTVI bot-transcription event is deprecated in favor of the new bot-output message, which is the canonical representation of bot output (spoken or not). The code still emits a transcription message for backwards compatibility during the transition.
- Deprecated add_pattern_pair in PatternPairAggregator, which takes pattern_id and remove_match fields, in favor of the new add_pattern method, which takes a type and an action.
- The english_normalization input parameter for MiniMaxHttpTTSService is deprecated; use text_normalization instead.
Fixed
- Fixed an issue in AWSBedrockLLMService where the aws_region argument was always set to us-east-1.
- Fixed an issue with DeepgramFluxSTTService where it sometimes failed to reconnect.
- Fixed an issue in ElevenLabsRealtimeSTTService where dynamic language updates were not working.
- Fixed an issue in ElevenLabsRealtimeSTTService where setting the sample rate caused transcription to fail.
- Fixed the InworldTTSService audio config payload to use the camelCase keys expected by the Inworld API.