Added
- Added `InputTextRawFrame` frame type to handle user text input with Gemini Multimodal Live.
- Added `HeyGenVideoService`, an integration for HeyGen Interactive Avatar. It is a video service that handles audio streaming and requests HeyGen to generate avatar video responses (see https://www.heygen.com/).
- Added the ability to switch voices to `RimeTTSService`.
- Added unified development runner for building voice AI bots across multiple transports:
  - `pipecat.runner.run`: FastAPI-based development server with automatic bot discovery
  - `pipecat.runner.types`: Runner session argument types (`DailyRunnerArguments`, `SmallWebRTCRunnerArguments`, `WebSocketRunnerArguments`)
  - `pipecat.runner.utils.create_transport()`: Factory function for creating transports from session arguments
  - `pipecat.runner.daily` and `pipecat.runner.livekit`: Configuration utilities for Daily and LiveKit setups
  - Support for all transport types: Daily, WebRTC, Twilio, Telnyx, Plivo
  - Automatic telephony provider detection and serializer configuration
  - ESP32 WebRTC compatibility with SDP munging
  - Environment detection (`ENV=local`) for conditional features
- Added Async.ai TTS integration (https://async.ai/):
  - `AsyncAITTSService`: WebSocket-based streaming TTS with interruption support
  - `AsyncAIHttpTTSService`: HTTP-based streaming TTS service
  - Example scripts:
    - `examples/foundational/07ac-interruptible-asyncai.py` (WebSocket demo)
    - `examples/foundational/07ac-interruptible-asyncai-http.py` (HTTP demo)
- Added `transcription_bucket` params support to the `DailyRESTHelper`.
- Added a new TTS service, `InworldTTSService`. This service provides low-latency, high-quality speech generation using Inworld's streaming API.
- Added a new field `handle_sigterm` to `PipelineRunner`. It defaults to `False` and controls whether the runner handles SIGTERM signals. The `handle_sigint` field still defaults to `True`, but now it handles only SIGINT signals.
- Added foundational example `14u-function-calling-ollama.py` for Ollama function calling.
- Added `LocalSmartTurnAnalyzerV2`, which supports local on-device inference with the new `smart-turn-v2` turn detection model.
- Added `set_log_level` to `DailyTransport`, allowing setting the logging level for Daily's internal logging system.
- Added `on_transcription_stopped` and `on_transcription_error` to Daily callbacks.
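
The split between `handle_sigint` and `handle_sigterm` can be pictured with a small standard-library sketch. It does not use Pipecat and is not how `PipelineRunner` is implemented: `signals_to_handle` and `install_handlers` are hypothetical helpers, and only the two flag names and their defaults are taken from the entry above.

```python
import asyncio
import signal
from typing import Callable, List


def signals_to_handle(
    handle_sigint: bool = True, handle_sigterm: bool = False
) -> List[signal.Signals]:
    """Map the two flags to the signals a runner would listen for.

    The defaults mirror the changelog entry: SIGINT on, SIGTERM off.
    """
    selected: List[signal.Signals] = []
    if handle_sigint:
        selected.append(signal.SIGINT)
    if handle_sigterm:
        selected.append(signal.SIGTERM)
    return selected


def install_handlers(
    loop: asyncio.AbstractEventLoop,
    on_signal: Callable[[], None],
    **flags: bool,
) -> None:
    """Register on_signal for each selected signal (Unix event loops only)."""
    for sig in signals_to_handle(**flags):
        loop.add_signal_handler(sig, on_signal)
```

With the defaults, only SIGINT is selected. A process that should also shut down cleanly on SIGTERM (for example, when stopped by a container orchestrator) would opt in with `handle_sigterm=True`.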
Changed
- Changed the default `url` for `NeuphonicTTSService` to `wss://api.neuphonic.com` as it provides better global performance. You can set the URL to other URLs, such as the previous default: `wss://eu-west-1.api.neuphonic.com`.
- Updated `daily-python` to 0.19.5.
- `STTMuteFilter` now pushes the `STTMuteFrame` upstream and downstream, to allow for more flexible `STTMuteFilter` placement.
- Play delayed messages from `ElevenLabsTTSService` if they still belong to the current context.
- Dependency compatibility improvements: Relaxed version constraints for core dependencies to support broader version ranges while maintaining stability:
  - `aiohttp`, `Markdown`, `nltk`, `numpy`, `Pillow`, `pydantic`, `openai`, `numba`: Now support up to the next major version (e.g. `numpy>=1.26.4,<3`)
  - `pyht`: Relaxed to `>=0.1.6` to resolve `grpcio` conflicts with `nvidia-riva-client`
  - `fastapi`: Updated to support versions `>=0.115.6,<0.117.0`
  - `torch`/`torchaudio`: Changed from exact pinning (`==2.5.0`) to compatible range (`~=2.5.0`)
  - `aws_sdk_bedrock_runtime`: Added Python 3.12+ constraint via environment marker
  - `numba`: Reduced minimum version to `0.60.0` for better compatibility
- Changed `NeuphonicHttpTTSService` to use a POST-based request instead of the `pyneuphonic` package. This removes a package requirement, allowing Neuphonic to work with more services.
- Updated `ElevenLabsTTSService` to handle the case where `allow_interruptions=False`. Now, when interruptions are disabled, the same context ID will be used throughout the conversation.
- Updated the `deepgram` optional dependency to 4.7.0, which downgrades the `tasks cancelled error` to a debug log. This keeps the error from appearing in Pipecat logs upon leaving.
- Upgraded the `websockets` implementation to the new asyncio implementation. Along with this change, we're updating support for versions >=13.1.0 and <15.0.0. All services have been updated to use the asyncio implementation.
- Updated `MiniMaxHttpTTSService` with a `base_url` arg where you can specify the Global endpoint (default) or Mainland China.
- Replaced regex-based sentence detection in `match_endofsentence` with NLTK's punkt_tab tokenizer for more reliable sentence boundary detection.
- Changed the `livekit` optional dependency for `tenacity` to `tenacity>=8.2.3,<10.0.0` in order to support the `google-genai` package.
- For `LmntTTSService`, changed the default `model` to `blizzard`, LMNT's recommended model.
- Updated `SpeechmaticsSTTService`:
  - Added support for additional diarization options.
  - Added foundational example `07a-interruptible-speechmatics-vad.py`, which uses VAD detection provided by `SpeechmaticsSTTService`.
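
To make the relaxed dependency ranges in this section concrete, here is a rough standard-library sketch of what a bounded range such as `>=1.26.4,<3` and a compatible release such as `~=2.5.0` admit. It is a simplification that only understands plain dotted numeric releases (no pre-releases, epochs, or local versions); `parse`, `in_range`, and `compatible` are hypothetical helpers, and real tooling such as pip applies the full version specifier rules.

```python
from typing import Tuple


def parse(version: str, width: int = 3) -> Tuple[int, ...]:
    """Parse '2.5' or '2.5.1' into a tuple of ints, zero-padded to `width`."""
    parts = tuple(int(part) for part in version.split("."))
    return parts + (0,) * (width - len(parts))


def in_range(version: str, minimum: str, upper: str) -> bool:
    """True if minimum <= version < upper, e.g. 'numpy>=1.26.4,<3'."""
    return parse(minimum) <= parse(version) < parse(upper)


def compatible(version: str, base: str) -> bool:
    """Compatible release check: '~=2.5.0' behaves like '>=2.5.0, ==2.5.*'."""
    # Drop the last component of the base to get the prefix that must match.
    prefix = tuple(int(part) for part in base.split("."))[:-1]
    candidate = parse(version)
    return candidate >= parse(base) and candidate[: len(prefix)] == prefix
```

Under these assumptions, `in_range("2.1.0", "1.26.4", "3")` accepts a NumPy 2.x release while rejecting 3.0.0, and `compatible("2.5.1", "2.5.0")` accepts a patch release of torch while rejecting 2.6.0.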
Fixed
- Fixed an `LLMUserResponseAggregator` issue where interruptions were not being handled properly.
- Fixed `PiperTTSService` to work with newer Piper GPL.
- Fixed a race condition in `FastAPIWebsocketClient` that occurred when attempting to send a message while the client was disconnecting.
- Fixed an issue in `GoogleLLMService` where interruptions did not work when an interruption strategy was used.
- Fixed an issue in the `TranscriptProcessor` where newline characters could cause the transcript output to be corrupted (e.g. missing all spaces).
- Fixed an issue in `AudioBufferProcessor` when using `SmallWebRTCTransport` where, if the microphone was muted, track timing was not respected.
- Fixed an error that occurs when pushing an `LLMMessagesFrame`. Only some LLM services, like Grok, are impacted by this issue. The fix is to remove the optional `name` property that was being added to the message.
- Fixed an issue in `AudioBufferProcessor` that caused garbled audio when `enable_turn_audio` was enabled and audio resampling was required.
- Fixed a dependency issue for uv users where an `llvmlite` version required Python 3.9.
- Fixed an issue in `MiniMaxHttpTTSService` where the `pitch` param was the incorrect type.
- Fixed an issue with OpenTelemetry tracing where the `enable_tracing` flag did not disable the internal tracing decorator functions.
- Fixed an issue in `OLLamaLLMService` where kwargs were not passed correctly to the parent class.
- Fixed an issue in `ElevenLabsTTSService` where word boundaries were calculated incorrectly for the word/timestamp pairs.
- Fixed an issue where, in some edge cases, the `EmulateUserStartedSpeakingFrame` could be created even if we didn't have a transcription.
- Fixed an issue in `GoogleLLMContext` where it would inject the `system_message` as a "user" message in cases where it was not meant to; it was only meant to do that when there were no "regular" (non-function-call) messages in the context, to ensure that inference would run properly.
- Fixed an issue in `LiveKitTransport` where the `on_audio_track_subscribed` event was never emitted.
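
As an illustration of the `LLMMessagesFrame` fix described in this section, the sketch below drops an optional `name` key from chat messages before they would be sent to a service that rejects it. This is not Pipecat's actual patch: `strip_name` is a hypothetical helper, and the message shape (dictionaries with `role` and `content` keys) is an assumption.

```python
from typing import Any, Dict, List


def strip_name(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return shallow copies of the messages without the optional 'name' key.

    The input list and its dictionaries are left unmodified.
    """
    return [
        {key: value for key, value in message.items() if key != "name"}
        for message in messages
    ]
```

Returning copies rather than mutating in place keeps the caller's context intact, which matters if the same message list is reused elsewhere.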
Other
- Added new quickstart demos:
  - `examples/quickstart`: voice AI bot quickstart
  - `examples/client-server-web`: client/server starter example
  - `examples/phone-bot-twilio`: Twilio starter example
- Removed most of the examples from the pipecat repo. Examples can now be found in: https://github.com/pipecat-ai/pipecat-examples.