pipecat-ai/pipecat v0.0.82

Added

  • Added a new LLMRunFrame to trigger an LLM response:

    await task.queue_frames([LLMRunFrame()])

    This replaces OpenAILLMContextFrame, which you would typically have used like this:

    await task.queue_frames([context_aggregator.user().get_context_frame()])

    Kick off your conversation this way when you've already initialized your context and simply want to tell the bot when to go:

    context = OpenAILLMContext(messages, tools)
    context_aggregator = llm.create_context_aggregator(context)
    
    # ...
    
    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        # Kick off the conversation.
        await task.queue_frames([LLMRunFrame()])

    Note that if you want to add new messages when kicking off the conversation, you could use LLMMessagesAppendFrame with run_llm=True instead:

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        # Kick off the conversation.
        await task.queue_frames([LLMMessagesAppendFrame(new_messages, run_llm=True)])

    In the rare case that you don't have a context aggregator in your pipeline, you may continue using a context frame.

  • Added support for switching between audio+text and text-only modes within the same pipeline. Push LLMConfigureOutputFrame(skip_tts=True) to enter text-only mode, and push it again with skip_tts=False to return to audio+text. The LLM will still generate tokens and add them to the context, but they will not be sent to TTS.
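
    A minimal sketch of toggling modes at runtime (re-enabling audio via skip_tts=False is inferred from the field name; treat it as an assumption):

    # Enter text-only mode: the LLM keeps generating tokens and adding them
    # to the context, but nothing is forwarded to TTS.
    await task.queue_frames([LLMConfigureOutputFrame(skip_tts=True)])

    # ... later, return to audio+text mode.
    await task.queue_frames([LLMConfigureOutputFrame(skip_tts=False)])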

  • Added skip_tts field to TextFrame. This lets a text frame bypass TTS while still being included in the LLM context. Useful for cases like structured text that isn’t meant to be spoken but should still contribute to context.
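
    A sketch of constructing such a frame (assuming skip_tts can be set at construction time):

    from pipecat.frames.frames import TextFrame

    # This frame contributes to the LLM context but bypasses TTS entirely.
    frame = TextFrame("status: order confirmed", skip_tts=True)
    await task.queue_frames([frame])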

  • Added a cancel_timeout_secs argument to PipelineTask which defines how long the pipeline has to complete cancellation. When PipelineTask.cancel() is called, a CancelFrame is pushed through the pipeline and must reach the end. If it does not reach the end within the specified time, a warning is shown and the wait is aborted.
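
    A sketch of configuring the timeout (the 2-second value is arbitrary):

    task = PipelineTask(
        pipeline,
        params=PipelineParams(allow_interruptions=True),
        cancel_timeout_secs=2.0,
    )

    # cancel() pushes a CancelFrame; if it hasn't reached the end of the
    # pipeline within 2 seconds, a warning is logged and the wait is aborted.
    await task.cancel()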

  • Added a new "universal" (LLM-agnostic) LLMContext and accompanying LLMContextAggregatorPair, which will eventually replace OpenAILLMContext (and the other under-the-hood contexts) and the existing context aggregators. The new universal LLMContext machinery allows a single context to be shared between different LLMs, enabling runtime LLM switching and scenarios like failover.

    From the developer's point of view, switching to the new universal context machinery will usually be a matter of going from this:

    context = OpenAILLMContext(messages, tools)
    context_aggregator = llm.create_context_aggregator(context)

    To this:

    context = LLMContext(messages, tools)
    context_aggregator = LLMContextAggregatorPair(context)

    To start, the universal LLMContext is supported with the following LLM services:

    • OpenAILLMService
    • GoogleLLMService

  • Added a new LLMSwitcher class to enable runtime LLM switching, built atop a new generic ServiceSwitcher.

    Switchers take a switching strategy. The first available strategy is ServiceSwitcherStrategyManual.

    To switch LLMs at runtime, the LLMs must be sharing one instance of the new universal LLMContext (see above bullet).

    # Instantiate your LLM services
    llm_openai = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
    llm_google = GoogleLLMService(api_key=os.getenv("GOOGLE_API_KEY"))
    
    # Instantiate a switcher
    # (ServiceSwitcherStrategyManual defaults to OpenAI, as it's first in the list)
    llm_switcher = LLMSwitcher(
        llms=[llm_openai, llm_google], strategy_type=ServiceSwitcherStrategyManual
    )
    
    # Create your pipeline
    pipeline = Pipeline(
        [
            transport.input(),
            stt,
            context_aggregator.user(),
            llm_switcher,
            tts,
            transport.output(),
            context_aggregator.assistant(),
        ]
    )
    task = PipelineTask(pipeline, params=PipelineParams(allow_interruptions=True))
    
    # ...
    # Whenever appropriate, switch LLMs!
    await task.queue_frames([ManuallySwitchServiceFrame(service=llm_google)])

  • Added an LLMService.run_inference() method to LLM services to enable direct, out-of-band (i.e. out-of-pipeline) inference.
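
    For example, a hedged sketch of an out-of-band summarization call, assuming run_inference() accepts a context and returns the generated text (check the method's actual signature):

    # One-off inference that never touches the pipeline, e.g. to summarize
    # the conversation so far.
    summary_context = LLMContext(
        [{"role": "user", "content": "Summarize the conversation so far."}]
    )
    summary = await llm_openai.run_inference(summary_context)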

Changed

  • Updated daily-python to 0.19.8.

  • PipelineTask now waits for StartFrame to reach the end of the pipeline before pushing any other frames.

  • Updated CartesiaTTSService and CartesiaHttpTTSService to align with Cartesia's changes to the speed parameter, which now accepts only the enum values slow, normal, or fast.
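
    For example, a sketch assuming the service's existing InputParams plumbing (treat the exact parameter location as an assumption):

    tts = CartesiaTTSService(
        api_key=os.getenv("CARTESIA_API_KEY"),
        voice_id="...",  # your Cartesia voice ID
        params=CartesiaTTSService.InputParams(speed="fast"),  # "slow" | "normal" | "fast"
    )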

  • Added support to AWSBedrockLLMService for setting authentication credentials through environment variables.
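
    A sketch assuming the standard AWS SDK environment variable names are honored (an assumption; check the service docs for the exact set), with an illustrative model ID:

    # export AWS_ACCESS_KEY_ID=...
    # export AWS_SECRET_ACCESS_KEY=...
    # export AWS_REGION=us-east-1
    # With credentials in the environment, none need to be passed explicitly:
    llm = AWSBedrockLLMService(model="anthropic.claude-3-5-sonnet-20240620-v1:0")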

  • Updated SarvamTTSService to use WebSocket streaming for real-time audio generation with multiple Indian languages, with HTTP support still available via SarvamHttpTTSService.

Fixed

  • Fixed an RTVI issue that was causing frames to be pushed before the pipeline was properly initialized.

  • Fixed some get_messages_for_logging() implementations that were returning a JSON string instead of a list.

  • Fixed a DailyTransport issue that prevented DTMF tones from being sent.

  • Fixed a missing import in SentryMetrics.

  • Fixed AWSPollyTTSService to support AWS credential provider chain (IAM roles, IRSA, instance profiles) instead of requiring explicit environment variables.

  • Fixed a CartesiaTTSService issue that was causing the application to hang after Cartesia's 5-minute timeout elapsed.

  • Fixed an issue preventing SpeechmaticsSTTService from transcribing audio.
