### Added

- Added `GeminiTTSService`, which uses Google Gemini to generate TTS output. The Gemini model can be prompted to insert styled speech to control the TTS output.

- Added Exotel support to Pipecat's development runner. You can now connect using the runner with `uv run bot.py -t exotel` and an ngrok connection to HTTP port 7860.

- Added `enable_direct_mode` argument to `FrameProcessor`. Direct mode is for processors that require very little I/O or compute, that is, processors that can perform their task almost immediately. These types of processors don't need any of the internal tasks and queues usually created by frame processors, which means overall application performance might be slightly improved. Use with care.

- Added TTFB metrics for `HeyGenVideoService` and `TavusVideoService`.

- Added `endpoint_id` parameter to `AzureSTTService`. (Custom EndpointId)
### Changed

- `WatchdogPriorityQueue` now requires inserted items to always be tuples, and the tuple size needs to be specified with the `tuple_size` argument when constructing the queue.

- Updated Moondream to revision `2025-01-09`.

- Updated `PlayHTHttpTTSService` to no longer use the `pyht` client, removing compatibility issues with other packages. You can now use the PlayHT HTTP service with other services, like `GoogleLLMService`.

- Updated `pyproject.toml` to once again pin `numba` to `>=0.61.2` in order to resolve package versioning issues.

- Updated `STTMuteFilter` to include `VADUserStartedSpeakingFrame` and `VADUserStoppedSpeakingFrame` in the list of frames to filter when filtering is on.
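As background for the `WatchdogPriorityQueue` tuple requirement above: Python priority queues compare items lexicographically, so tuples whose first element is the priority sort naturally. The snippet below is a stdlib sketch of that ordering, not the Pipecat API (which additionally takes the `tuple_size` constructor argument, likely so its internal sentinel items can be made comparable with user items):

```python
import queue

# Tuples compare element by element, so a (priority, payload) tuple
# gives a natural ordering: the lowest first element comes out first.
q = queue.PriorityQueue()
q.put((2, "normal frame"))
q.put((0, "system frame"))
q.put((1, "important frame"))

items = [q.get() for _ in range(q.qsize())]
print(items)
# [(0, 'system frame'), (1, 'important frame'), (2, 'normal frame')]
```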
### Performance

- Improved the latency of `HeyGenVideoService`.

- Improved the performance of some frame processors by using the new frame processor direct mode. In direct mode, a frame processor processes frames right away, avoiding the need for internal queues and tasks. This is useful for some simple processors. For example, processors that wrap other processors (e.g. `Pipeline`, `ParallelPipeline`) add one processor before and one after the wrapped processors (internally, you will see them as sources and sinks). These sources and sinks don't do any special processing; they basically just forward frames. For these simple processors we now enable the new direct mode, which avoids creating any internal tasks (and queues) and therefore improves performance.
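The queued vs. direct distinction described above can be sketched outside of Pipecat. The class names and structure below are purely illustrative, not Pipecat internals: the queued variant needs a per-processor task and queue, while the direct variant simply calls the next stage before returning.

```python
import asyncio

class QueuedForwarder:
    """Queued mode: frames pass through an internal queue drained by a task."""
    def __init__(self, downstream):
        self._downstream = downstream
        self._queue = asyncio.Queue()
        self._task = asyncio.create_task(self._drain())

    async def _drain(self):
        while True:
            frame = await self._queue.get()
            self._downstream(frame)
            self._queue.task_done()

    async def process(self, frame):
        await self._queue.put(frame)  # delivered later by the drain task

class DirectForwarder:
    """Direct mode: no internal task or queue; forward the frame immediately."""
    def __init__(self, downstream):
        self._downstream = downstream

    async def process(self, frame):
        self._downstream(frame)

async def main():
    received = []
    queued = QueuedForwarder(received.append)
    await queued.process("frame-1")
    await queued._queue.join()       # wait for the drain task to deliver it
    direct = DirectForwarder(received.append)
    await direct.process("frame-2")  # delivered before process() returns
    queued._task.cancel()
    return received

print(asyncio.run(main()))  # ['frame-1', 'frame-2']
```

For a pure pass-through stage, the direct variant does the same work with one fewer task, one fewer queue, and no extra scheduling hop per frame.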
### Fixed

- Fixed an issue with `BaseWhisperSTTService` where the language was specified as an enum and not a string.

- Fixed an issue where `SmallWebRTCTransport` ended before TTS finished.

- Fixed an issue in `OpenAIRealtimeBetaLLMService` where specifying a `text` modality didn't result in text being output by the model.

- Added SSML reserved character escaping to `AzureBaseTTSService` to properly handle special characters in text sent to Azure TTS. This fixes an issue where characters like `&`, `<`, `>`, `"`, and `'` in LLM-generated text would cause TTS failures.

- Fixed a `WatchdogPriorityQueue` issue that could cause an exception when comparing watchdog cancel sentinel items with other items in the queue.

- Fixed an issue that would cause system frames to not be processed with higher priority than other frames. This could cause slower interruption times.

- Fixed an issue where retrying after a websocket connection error would result in an error.
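The SSML escaping described above can be illustrated with the standard library. This is a generic sketch of escaping XML/SSML reserved characters, with a hypothetical helper name, not the actual `AzureBaseTTSService` implementation:

```python
from xml.sax.saxutils import escape

def escape_ssml_text(text: str) -> str:
    # Hypothetical helper: escape &, <, > (escape's defaults) plus the two
    # quote characters, since SSML is XML and all five are reserved.
    return escape(text, {'"': "&quot;", "'": "&apos;"})

print(escape_ssml_text('Q&A: "5 < 10" isn\'t false'))
# Q&amp;A: &quot;5 &lt; 10&quot; isn&apos;t false
```

Without this escaping, an LLM emitting a bare `&` or `<` would produce malformed SSML, which the TTS backend rejects.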
### Other

- Added foundational example `19b-openai-realtime-beta-text.py`, showing how to use `OpenAIRealtimeBetaLLMService` to output text to a TTS service.

- Added vision support to the release evals so we can run the 12 series of foundational examples.

- Added foundational example `15a-switch-languages.py` to the release evals. It is able to detect whether the language was switched properly.

- Updated foundational examples to show how to enclose complex logic (e.g. `ParallelPipeline`) in a single processor so the main pipeline becomes simpler.

- Added `07n-interruptible-gemini.py`, demonstrating how to use `GeminiTTSService`.