### Added

- Added a new `LLMRunFrame` to trigger an LLM response:

  ```python
  await task.queue_frames([LLMRunFrame()])
  ```

  This replaces `OpenAILLMContextFrame`, which you'd previously typically use like this:

  ```python
  await task.queue_frames([context_aggregator.user().get_context_frame()])
  ```

  Use this way of kicking off your conversation when you've already initialized your context and are simply instructing the bot when to go:

  ```python
  context = OpenAILLMContext(messages, tools)
  context_aggregator = llm.create_context_aggregator(context)

  # ...

  @transport.event_handler("on_client_connected")
  async def on_client_connected(transport, client):
      # Kick off the conversation.
      await task.queue_frames([LLMRunFrame()])
  ```

  Note that if you want to add new messages when kicking off the conversation, you can use `LLMMessagesAppendFrame` with `run_llm=True` instead:

  ```python
  @transport.event_handler("on_client_connected")
  async def on_client_connected(transport, client):
      # Kick off the conversation.
      await task.queue_frames([LLMMessagesAppendFrame(new_messages, run_llm=True)])
  ```

  In the rare case that you don't have a context aggregator in your pipeline, you may continue using a context frame.

- Added support for switching between audio+text and text-only modes within the same pipeline. This is done by pushing `LLMConfigureOutputFrame(skip_tts=True)` to enter text-only mode, and pushing it with `skip_tts=False` to return to audio+text. The LLM will still generate tokens and add them to the context, but they will not be sent to TTS.
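
  A minimal sketch of toggling modes at runtime; the import path for `LLMConfigureOutputFrame` is an assumption:

  ```python
  from pipecat.frames.frames import LLMConfigureOutputFrame  # import path assumed

  # Enter text-only mode: the LLM keeps adding tokens to the context,
  # but its output is not sent to TTS.
  await task.queue_frames([LLMConfigureOutputFrame(skip_tts=True)])

  # ... later, return to audio+text.
  await task.queue_frames([LLMConfigureOutputFrame(skip_tts=False)])
  ```
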
- Added a `skip_tts` field to `TextFrame`. This lets a text frame bypass TTS while still being included in the LLM context. Useful for cases like structured text that isn't meant to be spoken but should still contribute to context.
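
  A minimal sketch; whether `skip_tts` can be passed to the constructor is an assumption, so it's set as an attribute here:

  ```python
  from pipecat.frames.frames import TextFrame

  # Structured text that should reach the LLM context but never be spoken.
  frame = TextFrame(text='{"order_status": "confirmed"}')
  frame.skip_tts = True  # bypass TTS; the frame still contributes to context
  await task.queue_frames([frame])
  ```
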
- Added a `cancel_timeout_secs` argument to `PipelineTask`, which defines how long the pipeline has to complete cancellation. When `PipelineTask.cancel()` is called, a `CancelFrame` is pushed through the pipeline and must reach the end. If it does not reach the end within the specified time, a warning is shown and the wait is aborted.
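
  A minimal sketch; the 5-second value is illustrative, not a documented default:

  ```python
  # Allow up to 5 seconds for the CancelFrame to reach the end of the pipeline.
  task = PipelineTask(
      pipeline,
      params=PipelineParams(allow_interruptions=True),
      cancel_timeout_secs=5.0,
  )

  # ...

  # If cancellation doesn't complete within cancel_timeout_secs,
  # a warning is shown and the wait is aborted.
  await task.cancel()
  ```
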
- Added a new "universal" (LLM-agnostic) `LLMContext` and accompanying `LLMContextAggregatorPair`, which will eventually replace `OpenAILLMContext` (and the other under-the-hood contexts) and the other context aggregators. The new universal `LLMContext` machinery allows a single context to be shared between different LLMs, enabling runtime LLM switching and scenarios like failover.

  From the developer's point of view, switching to the new universal context machinery will usually be a matter of going from this:

  ```python
  context = OpenAILLMContext(messages, tools)
  context_aggregator = llm.create_context_aggregator(context)
  ```

  To this:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  To start, the universal `LLMContext` is supported with the following LLM services:

  - `OpenAILLMService`
  - `GoogleLLMService`

- Added a new `LLMSwitcher` class to enable runtime LLM switching, built atop a new generic `ServiceSwitcher`.

  Switchers take a switching strategy. The first available strategy is `ServiceSwitcherStrategyManual`.

  To switch LLMs at runtime, the LLMs must share one instance of the new universal `LLMContext` (see the above bullet).

  ```python
  # Instantiate your LLM services
  llm_openai = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
  llm_google = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"))

  # Instantiate a switcher
  # (ServiceSwitcherStrategyManual defaults to OpenAI, as it's first in the list)
  llm_switcher = LLMSwitcher(
      llms=[llm_openai, llm_google],
      strategy_type=ServiceSwitcherStrategyManual,
  )

  # Create your pipeline
  pipeline = Pipeline(
      [
          transport.input(),
          stt,
          context_aggregator.user(),
          llm_switcher,
          tts,
          transport.output(),
          context_aggregator.assistant(),
      ]
  )

  task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))

  # ...

  # Whenever it's appropriate, switch LLMs!
  await task.queue_frames([ManuallySwitchServiceFrame(service=llm_google)])
  ```

- Added an `LLMService.run_inference()` method to LLM services to enable direct, out-of-band (i.e., out-of-pipeline) inference.
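
  A minimal sketch; the exact signature and return type of `run_inference()` are assumptions:

  ```python
  # Run a one-off inference outside the pipeline, e.g. to summarize the
  # conversation so far without pushing frames through it.
  context = LLMContext([{"role": "user", "content": "Summarize this conversation."}])
  result = await llm.run_inference(context)  # assumed: takes a context, returns text
  print(result)
  ```
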
### Changed

- Updated `daily-python` to 0.19.8.
- `PipelineTask` now waits for `StartFrame` to reach the end of the pipeline before pushing any other frames.
- Updated `CartesiaTTSService` and `CartesiaHttpTTSService` to align with Cartesia's changes to the `speed` parameter. It now takes only an enum of `slow`, `normal`, or `fast`.
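
  A hedged sketch; whether `speed` is passed via `InputParams` is an assumption:

  ```python
  # Pass one of the allowed values: "slow", "normal", or "fast".
  tts = CartesiaTTSService(
      api_key=os.getenv("CARTESIA_API_KEY"),
      voice_id="...",  # placeholder voice ID
      params=CartesiaTTSService.InputParams(speed="normal"),  # location of `speed` assumed
  )
  ```
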
- Added support to `AWSBedrockLLMService` for setting authentication credentials through environment variables.
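
  A minimal sketch, assuming the service honors the standard AWS SDK environment variables (exactly which variables it reads is an assumption):

  ```python
  # With AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION set in the
  # environment, no explicit credentials need to be passed:
  llm = AWSBedrockLLMService(model="...")  # model ID elided
  ```
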
- Updated `SarvamTTSService` to use WebSocket streaming for real-time audio generation with multiple Indian languages, with HTTP support still available via `SarvamHttpTTSService`.

### Fixed

- Fixed an RTVI issue that was causing frames to be pushed before the pipeline was properly initialized.
- Fixed some `get_messages_for_logging()` implementations that were returning a JSON string instead of a list.
- Fixed a `DailyTransport` issue that prevented DTMF tones from being sent.
- Fixed a missing import in `SentryMetrics`.
- Fixed `AWSPollyTTSService` to support the AWS credential provider chain (IAM roles, IRSA, instance profiles) instead of requiring explicit environment variables.
- Fixed a `CartesiaTTSService` issue that was causing the application to hang after Cartesia's 5-minute timeout expired.
- Fixed an issue preventing `SpeechmaticsSTTService` from transcribing audio.