## Added
- Constructor arguments for `GoogleLLMService` to directly set `tools` and `tool_config`.
- Smart turn detection example (`22d-natural-conversation-gemini-audio.py`) that leverages Gemini 2.0 capabilities (see https://x.com/kwindla/status/1870974144831275410).
- Added `DailyTransport.send_dtmf()` to send dial-out DTMF tones.
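As a rough illustration of the kind of input `send_dtmf()` expects, here is a small validation helper. The helper itself and the commented payload shape are our assumptions for this sketch, not part of the Pipecat API:

```python
# Hypothetical helper (not part of Pipecat): validate a DTMF tone string
# before handing it to DailyTransport.send_dtmf().
VALID_DTMF = set("0123456789*#ABCD")


def is_valid_dtmf(tones: str) -> bool:
    """Return True if the string is non-empty and every character is a DTMF tone."""
    return bool(tones) and all(c in VALID_DTMF for c in tones)


# Usage inside a bot (sketch; the exact send_dtmf() payload is an assumption):
#   if is_valid_dtmf("1234#"):
#       await transport.send_dtmf({"sessionId": session_id, "tones": "1234#"})
```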
- Added `DailyTransport.sip_call_transfer()` to forward SIP and PSTN calls to another address or number. For example, transfer a SIP call to a different SIP address or a PSTN call to a different PSTN phone number.
- Added `DailyTransport.sip_refer()` to transfer incoming SIP/PSTN calls from outside Daily to another SIP/PSTN address.
- Added an `auto_mode` input parameter to `ElevenLabsTTSService`. `auto_mode` is set to `True` by default. Enabling this setting disables the chunk schedule and all buffers, which reduces latency.
- Added `KoalaFilter`, which implements on-device noise reduction using Koala Noise Suppression (see https://picovoice.ai/platform/koala/).
- Added `CerebrasLLMService` for Cerebras integration with an OpenAI-compatible interface. Added foundational example `14k-function-calling-cerebras.py`.
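For context, "OpenAI-compatible" means the service accepts the standard OpenAI-style chat and tool schemas. A hedged sketch of such a function-calling tool definition follows; the `get_weather` tool and the `make_tool` helper are made up for illustration, not taken from `14k-function-calling-cerebras.py`:

```python
# Illustrative helper: build an OpenAI-style function-calling tool definition,
# the schema accepted by OpenAI-compatible services such as CerebrasLLMService.
def make_tool(name: str, description: str, parameters: dict) -> dict:
    """Wrap a JSON-schema parameter spec in the OpenAI tool envelope."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }


weather_tool = make_tool(
    "get_weather",  # hypothetical tool name
    "Get the current weather for a city.",
    {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)
```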
- Pipecat now supports Python 3.13. We had a dependency on the `audioop` package, which was deprecated and is now removed in Python 3.13. We now use `audioop-lts` (https://github.com/AbstractUmbra/audioop) to provide the same functionality.
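If your own project has the same issue, a conditional dependency with a standard PEP 508 environment marker keeps older interpreters on the stdlib module. A sketch (the marker syntax is standard; whether this matches Pipecat's own `pyproject.toml` exactly is an assumption):

```toml
# pyproject.toml (sketch): install the backport only where stdlib audioop is gone
[project]
dependencies = [
    "audioop-lts; python_version >= '3.13'",
]
```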
- Added timestamped conversation transcript support:
  - New `TranscriptProcessor` factory provides access to user and assistant transcript processors. `UserTranscriptProcessor` processes user speech with timestamps from transcription. `AssistantTranscriptProcessor` processes assistant responses with LLM context timestamps.
  - Messages are emitted with ISO 8601 timestamps indicating when they were spoken.
  - Supports all LLM formats (OpenAI, Anthropic, Google) via the standard message format.
  - New examples: `28a-transcription-processor-openai.py`, `28b-transcription-processor-anthropic.py`, and `28c-transcription-processor-gemini.py`.
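For reference, the ISO 8601 timestamps attached to messages look like the output of Python's standard `datetime` formatting. A minimal sketch; the message dict below is illustrative only, not the exact structure the processors emit:

```python
from datetime import datetime, timezone


def make_transcript_message(role: str, content: str) -> dict:
    """Attach an ISO 8601 UTC timestamp to a transcript message (illustrative shape)."""
    return {
        "role": role,
        "content": content,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


msg = make_transcript_message("user", "Hi there")
# msg["timestamp"] looks like "2024-12-23T18:30:00.123456+00:00"
```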
- Added support for more languages to ElevenLabs (Arabic, Croatian, Filipino, Tamil) and PlayHT (Afrikaans, Albanian, Amharic, Arabic, Bengali, Croatian, Galician, Hebrew, Mandarin, Serbian, Tagalog, Urdu, Xhosa).
## Changed
- `PlayHTTTSService` uses the new v4 websocket API, which also fixes an issue where text input to the TTS didn't return audio.
- The default model for `ElevenLabsTTSService` is now `eleven_flash_v2_5`.
- `OpenAIRealtimeBetaLLMService` now takes a `model` parameter in the constructor.
- Updated the default model for `OpenAIRealtimeBetaLLMService`.
- Room expiration (`exp`) in `DailyRoomProperties` is now optional (`None`) by default instead of automatically setting a 5-minute expiration time. You must explicitly set an expiration time if desired.
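If you relied on the old default, you now need to compute `exp` yourself. A minimal sketch of recreating the previous 5-minute expiry as a Unix timestamp; the helper is ours, and the commented `DailyRoomProperties(exp=...)` usage is assumed from the entry above:

```python
import time


def five_minute_expiry(now: float) -> int:
    """Unix timestamp (epoch seconds) 5 minutes after `now`,
    matching the removed default expiration."""
    return int(now + 5 * 60)


# Usage sketch (constructor keyword assumed):
#   DailyRoomProperties(exp=five_minute_expiry(time.time()))
exp = five_minute_expiry(1_000_000)  # → 1000300
```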
## Deprecated
- `AWSTTSService` is now deprecated; use `PollyTTSService` instead.
## Fixed
- Fixed token counting in `GoogleLLMService`. Tokens were summed incorrectly (double-counted in many cases).
- Fixed an issue that could cause the bot to stop talking if there was a user interruption before getting any audio from the TTS service.
- Fixed an issue that would cause `ParallelPipeline` to handle `EndFrame` incorrectly, causing the main pipeline to not terminate or to terminate too early.
- Fixed an audio stuttering issue in `FastPitchTTSService`.
- Fixed a `BaseOutputTransport` issue that was causing non-audio frames to be processed before the previous audio frames were played. This allows, for example, sending a frame A after a `TTSSpeakFrame`, and frame A will only be pushed downstream after the audio generated from the `TTSSpeakFrame` has been spoken.
- Fixed a `DeepgramSTTService` issue that was causing the language to be passed as an object instead of a string, resulting in connection failures.