Added
-
Added
InputTextRawFrame
frame type to handle user text input with Gemini Multimodal Live. -
Added
HeyGenVideoService
. This is an integration for HeyGen Interactive Avatar. A video service that handles audio streaming and requests HeyGen to generate avatar video responses. (see https://www.heygen.com/) -
Added the ability to switch voices to
RimeTTSService
. -
Added unified development runner for building voice AI bots across multiple transports
pipecat.runner.run
– FastAPI-based development server with automatic bot discoverypipecat.runner.types
– Runner session argument types (DailyRunnerArguments
,SmallWebRTCRunnerArguments
,WebSocketRunnerArguments
)pipecat.runner.utils.create_transport()
– Factory function for creating transports from session argumentspipecat.runner.daily
andpipecat.runner.livekit
– Configuration utilities for Daily and LiveKit setups- Support for all transport types: Daily, WebRTC, Twilio, Telnyx, Plivo
- Automatic telephony provider detection and serializer configuration
- ESP32 WebRTC compatibility with SDP munging
- Environment detection (
ENV=local
) for conditional features
-
Added Async.ai TTS integration (https://async.ai/)
AsyncAITTSService
– WebSocket-based streaming TTS with interruption supportAsyncAIHttpTTSService
– HTTP-based streaming TTS service- Example scripts:
examples/foundational/07ac-interruptible-asyncai.py
(WebSocket demo)examples/foundational/07ac-interruptible-asyncai-http.py
(HTTP demo)
-
Added
transcription_bucket
params support to theDailyRESTHelper
. -
Added a new TTS service,
InworldTTSService
. This service provides low-latency, high-quality speech generation using Inworld's streaming API. -
Added a new field
handle_sigterm
toPipelineRunner
. It defaults toFalse
. This field handles SIGTERM signals. Thehandle_sigint
field still defaults toTrue
, but now it handles only SIGINT signals. -
Added foundational example
14u-function-calling-ollama.py
for Ollama function calling. -
Added
LocalSmartTurnAnalyzerV2
, which supports local on-device inference with the newsmart-turn-v2
turn detection model. -
Added
set_log_level
toDailyTransport
, allowing setting the logging level for Daily's internal logging system. -
Added
on_transcription_stopped
andon_transcription_error
to Daily callbacks.
Changed
-
Changed the default
url
forNeuphonicTTSService
towss://api.neuphonic.com
as it provides better global performance. You can set the URL to other URLs, such as the previous default:wss://eu-west-1.api.neuphonic.com
. -
Update
daily-python
to 0.19.5. -
STTMuteFilter
now pushes theSTTMuteFrame
upstream and downstream, to allow for more flexibleSTTMuteFilter
placement. -
Play delayed messages from
ElevenLabsTTSService
if they still belong to the current context. -
Dependency compatibility improvements: Relaxed version constraints for core dependencies to support broader version ranges while maintaining stability:
aiohttp
,Markdown
,nltk
,numpy
,Pillow
,pydantic
,openai
,numba
: Now support up to the next major version (e.g.numpy>=1.26.4,<3
)pyht
: Relaxed to>=0.1.6
to resolvegrpcio
conflicts withnvidia-riva-client
fastapi
: Updated to support versions>=0.115.6,<0.117.0
torch
/torchaudio
: Changed from exact pinning (==2.5.0
) to compatible range (~=2.5.0
)aws_sdk_bedrock_runtime
: Added Python 3.12+ constraint via environment markernumba
: Reduced minimum version to0.60.0
for better compatibility
-
Changed
NeuphonicHttpTTSService
to use a POST based request instead of thepyneuphonic
package. This removes a package requirement, allowing Neuphonic to work with more services. -
Updated
ElevenLabsTTSService
to handle the case whereallow_interruptions=False
. Now, when interruptions are disabled, the same context ID will be used throughout the conversation. -
Updated the
deepgram
optional dependency to 4.7.0, which downgrades thetasks cancelled error
to a debug log. This removes the log from appearing in Pipecat logs upon leaving. -
Upgraded the
websockets
implementation to the new asyncio implementation. Along with this change, we're updating support for versions >=13.1.0 and <15.0.0. All services have been update to use the asyncio implementation. -
Updated
MiniMaxHttpTTSService
with abase_url
arg where you can specify the Global endpoint (default) or Mainland China. -
Replaced regex-based sentence detection in
match_endofsentence
with NLTK's punkt_tab tokenizer for more reliable sentence boundary detection. -
Changed the
livekit
optional dependency fortenacity
totenacity>=8.2.3,<10.0.0
in order to support thegoogle-genai
package. -
For
LmntTTSService
, changed the defaultmodel
toblizzard
, LMNT's recommended model. -
Updated
SpeechmaticsSTTService
:- Added support for additional diarization options.
- Added foundational example
07a-interruptible-speechmatics-vad.py
, which
uses VAD detection provided bySpeechmaticsSTTService
.
Fixed
-
Fixed a
LLMUserResponseAggregator
issue where interruptions were not being handled properly. -
Fixed
PiperTTSService
to work with newer Piper GPL. -
Fixed a race condition in
FastAPIWebsocketClient
that occurred when attempting to send a message while the client was disconnecting. -
Fixed an issue in
GoogleLLMService
where interruptions did not work when an interruption strategy was used. -
Fixed an issue in the
TranscriptProcessor
where newline characters could cause the transcript output to be corrupted (e.g. missing all spaces). -
Fixed an issue in
AudioBufferProcessor
when usingSmallWebRTCTransport
where, if the microphone was muted, track timing was not respected. -
Fixed an error that occurs when pushing an
LLMMessagesFrame
. Only some LLM services, like Grok, are impacted by this issue. The fix is to remove the optionalname
property that was being added to the message. -
Fixed an issue in
AudioBufferProcessor
that caused garbled audio whenenable_turn_audio
was enabled and audio resampling was required. -
Fixed a dependency issue for uv users where an
llvmlite
version required python 3.9. -
Fixed an issue in
MiniMaxHttpTTSService
where thepitch
param was the incorrect type. -
Fixed an issue with OpenTelemetry tracing where the
enable_tracing
flag did not disable the internal tracing decorator functions. -
Fixed an issue in
OLLamaLLMService
where kwargs were not passed correctly to the parent class. -
Fixed an issue in
ElevenLabsTTSService
where the word/timestamp pairs were calculating word boundaries incorrectly. -
Fixed an issue where, in some edge cases, the
EmulateUserStartedSpeakingFrame
could be created even if we didn't have a transcription. -
Fixed an issue in
GoogleLLMContext
where it would inject thesystem_message
as a "user" message into cases where it was not meant to; it was only meant to do that when there were no "regular" (non-function-call) messages in the context, to ensure that inference would run properly. -
Fixed an issue in
LiveKitTransport
where theon_audio_track_subscribed
was never emitted.
Other
-
Added new quickstart demos:
- examples/quickstart: voice AI bot quickstart
- examples/client-server-web: client/server starter example
- examples/phone-bot-twilio: twilio starter example
-
Removed most of the examples from the pipecat repo. Examples can now be found in: https://github.com/pipecat-ai/pipecat-examples.