### Added

- When registering a function call it is now possible to indicate whether you want the function call to be cancelled if there's a user interruption via `cancel_on_interruption` (defaults to `False`). This is now possible because function calls are executed concurrently.
- Added support for detecting idle pipelines. By default, if no activity has been detected for 5 minutes, the `PipelineTask` will be automatically cancelled. It is possible to override this behavior by passing `cancel_on_idle_timeout=False`. It is also possible to change the default timeout with `idle_timeout_secs`, or the frames that prevent the pipeline from being considered idle with `idle_timeout_frames`. Finally, an `on_idle_timeout` event handler will be triggered if the idle timeout is reached (whether the pipeline task is cancelled or not).
- Added `FalSTTService`, which provides STT for Fal's Wizper API.
- Added a `reconnect_on_error` parameter to websocket-based TTS services, as well as an `on_connection_error` event handler. `reconnect_on_error` indicates whether the TTS service should reconnect on error. `on_connection_error` will always get called if there's any error, no matter the value of `reconnect_on_error`. This allows, for example, falling back to a different TTS provider if something goes wrong with the current one.
- Added new `SkipTagsAggregator` that extends `BaseTextAggregator` to aggregate text and skip end-of-sentence matching if the aggregated text is between start/end tags.
- Added new `PatternPairAggregator` that extends `BaseTextAggregator` to identify content between matching pattern pairs in streamed text. This allows for detection and processing of structured content, like XML-style tags, that may span multiple text chunks or sentence boundaries.
- Added new `BaseTextAggregator`. Text aggregators are used by the TTS service to aggregate LLM tokens and decide when the aggregated text should be pushed to the TTS service. They also allow for the text to be manipulated while it's being aggregated. A text aggregator can be passed via `text_aggregator` to the TTS service.
- Added new `sample_rate` constructor parameter to `TavusVideoService` to allow changing the output sample rate.
- Added new `NeuphonicTTSService` (see https://neuphonic.com).
- Added new `UltravoxSTTService` (see https://github.com/fixie-ai/ultravox).
- Added `on_frame_reached_upstream` and `on_frame_reached_downstream` event handlers to `PipelineTask`. Those events will be called when a frame reaches the beginning or end of the pipeline, respectively. Note that by default the event handlers will not be called unless a filter is set with `PipelineTask.set_reached_upstream_filter()` or `PipelineTask.set_reached_downstream_filter()`.
- Added support for Chirp voices in `GoogleTTSService`.
- Added a `flush_audio()` method to `FishTTSService` and `LmntTTSService`.
- Added a `set_language` convenience method for `GoogleSTTService`, allowing you to set a single language. This is in addition to the `set_languages` method, which allows you to set a list of languages.
- Added `on_user_turn_audio_data` and `on_bot_turn_audio_data` to `AudioBufferProcessor`. This gives the ability to grab the audio of only that turn, for both the user and the bot.
- Added new base class `BaseObject`, which is now the base class of `FrameProcessor`, `PipelineRunner`, `PipelineTask` and `BaseTransport`. The new `BaseObject` adds support for event handlers.
- Added support for a unified format for specifying function calling across all LLM services.

```python
weather_function = FunctionSchema(
    name="get_current_weather",
    description="Get the current weather",
    properties={
        "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA",
        },
        "format": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"],
            "description": "The temperature unit to use. Infer this from the user's location.",
        },
    },
    required=["location"],
)
tools = ToolsSchema(standard_tools=[weather_function])
```
- Added `speech_threshold` parameter to `GladiaSTTService`.
- Allow passing user (`user_kwargs`) and assistant (`assistant_kwargs`) context aggregator parameters when using `create_context_aggregator()`. The values are passed as a mapping that will then be converted to keyword arguments.
- Added `speed` as an `InputParam` for both `ElevenLabsTTSService` and `ElevenLabsHttpTTSService`.
- Added new `LLMFullResponseAggregator` to aggregate full LLM completions. At every completion the `on_completion` event handler is triggered.
- Added a new frame, `RTVIServerMessageFrame`, and RTVI message `RTVIServerMessage`, which provide a generic mechanism for sending custom messages from server to client. The `RTVIServerMessageFrame` is processed by the `RTVIObserver` and will be delivered to the client's `onServerMessage` callback or `ServerMessage` event.
- Added `GoogleLLMOpenAIBetaService` for Google LLM integration with an OpenAI-compatible interface. Added foundational example `14o-function-calling-gemini-openai-format.py`.
- Added `AzureRealtimeBetaLLMService` to support Azure's OpenAI Realtime API. Added foundational example `19a-azure-realtime-beta.py`.
- Introduced `GoogleVertexLLMService`, a new class for integrating with Vertex AI Gemini models. Added foundational example `14p-function-calling-gemini-vertex-ai.py`.
- Added support in `OpenAIRealtimeBetaLLMService` for a slate of new features:

  - The `'gpt-4o-transcribe'` input audio transcription model, along with new `language` and `prompt` options specific to that model.
  - The `input_audio_noise_reduction` session property.

    ```python
    session_properties = SessionProperties(
        # ...
        input_audio_noise_reduction=InputAudioNoiseReduction(
            type="near_field"  # also supported: "far_field"
        )
        # ...
    )
    ```

  - The `'semantic_vad'` `turn_detection` session property value, a more sophisticated model for detecting when the user has stopped speaking.
  - `on_conversation_item_created` and `on_conversation_item_updated` events.

    ```python
    @llm.event_handler("on_conversation_item_created")
    async def on_conversation_item_created(llm, item_id, item):
        ...


    @llm.event_handler("on_conversation_item_updated")
    async def on_conversation_item_updated(llm, item_id, item):
        # `item` may not always be available here
        ...
    ```

  - The `retrieve_conversation_item(item_id)` method for introspecting a conversation item on the server.

    ```python
    item = await llm.retrieve_conversation_item(item_id)
    ```
### Changed

- Updated `OpenAISTTService` to use `gpt-4o-transcribe` as the default transcription model.
- Updated `OpenAITTSService` to use `gpt-4o-mini-tts` as the default TTS model.
- Function calls are now executed in tasks. This means that the pipeline will not be blocked while the function call is being executed.
- ⚠️ `PipelineTask` will now be automatically cancelled if no bot activity is happening in the pipeline. There are a few settings to configure this behavior; see the `PipelineTask` documentation for more details.
- All event handlers are now executed in separate tasks in order to prevent blocking the pipeline. Previously, if an event handler took some time to execute, the pipeline would be blocked waiting for the event handler to complete.
- Updated `TranscriptProcessor` to support text output from `OpenAIRealtimeBetaLLMService`.
- `OpenAIRealtimeBetaLLMService` and `GeminiMultimodalLiveLLMService` now push a `TTSTextFrame`.
- Updated the default model for `CartesiaTTSService` and `CartesiaHttpTTSService` to `sonic-2`.
### Deprecated

- Passing a `start_callback` to `LLMService.register_function()` is now deprecated; simply move the code from the start callback into the function call handler.
- The `TTSService` parameter `text_filter` is now deprecated; use `text_filters` instead, which is now a list. This allows passing multiple filters that will be executed in order.
### Removed

- Removed deprecated `audio.resample_audio()`; use `create_default_resampler()` instead.
- Removed deprecated `stt_service` parameter from `STTMuteFilter`.
- Removed deprecated RTVI processors; use an `RTVIObserver` instead.
- Removed deprecated `AWSTTSService`; use `PollyTTSService` instead.
- Removed deprecated field `tier` from `DailyTranscriptionSettings`; use `model` instead.
- Removed deprecated `pipecat.vad` package; use `pipecat.audio.vad` instead.
### Fixed

- Fixed an assistant aggregator issue that could cause assistant text to be split into multiple chunks during function calls.
- Fixed an assistant aggregator issue that was causing assistant text to not be added to the context during function calls. This could lead to duplications.
- Fixed a `SegmentedSTTService` issue that was causing audio to be sent prematurely to the STT service. Instead of analyzing the volume in this service, we now rely on VAD events, which use both VAD and volume.
- Fixed a `GeminiMultimodalLiveLLMService` issue that was causing messages to be duplicated in the context when pushing `LLMMessagesAppendFrame` frames.
- Fixed an issue with `SegmentedSTTService`-based services (e.g. `GroqSTTService`) that was not allowing audio to pass through downstream.
- Fixed a `CartesiaTTSService` and `RimeTTSService` issue that would treat text between spelling-out tags as end of sentence.
- Fixed a `match_endofsentence` issue that would result in floating point numbers being considered an end of sentence.
- Fixed a `match_endofsentence` issue that would result in emails being considered an end of sentence.
- Fixed an issue where the RTVI message `disconnect-bot` was pushing an `EndFrame`, resulting in the pipeline not shutting down. It now pushes an `EndTaskFrame` upstream to shut down the pipeline.
- Fixed an issue with the `GoogleSTTService` where stream timeouts during periods of inactivity were causing connection failures. The service now properly detects timeout errors and handles reconnection gracefully, ensuring continuous operation even after periods of silence or when using an `STTMuteFilter`.
- Fixed an issue in `RimeTTSService` where the last line of text sent didn't result in audio output being generated.
- Fixed `OpenAIRealtimeBetaLLMService` by adding proper handling for:
  - The `conversation.item.input_audio_transcription.delta` server message, which was added server-side at some point and not handled client-side.
  - Errors reported by the `response.done` server message.
### Other

- Added foundational example `07w-interruptible-fal.py`, showing `FalSTTService`.
- Added a new Ultravox example, `examples/foundational/07u-interruptible-ultravox.py`.
- Added new Neuphonic examples, `examples/foundational/07v-interruptible-neuphonic.py` and `examples/foundational/07v-interruptible-neuphonic-http.py`.
- Added a new example, `examples/foundational/36-user-email-gathering.py`, to show how to gather user emails. The example uses Cartesia's `<spell></spell>` tags and Rime's `spell()` function to spell out the emails for confirmation.
- Updated the `34-audio-recording.py` example to include an STT processor.
- Added foundational example `35-voice-switching.py` showing how to use the new `PatternPairAggregator`. This example shows how to encode information for the LLM to instruct TTS voice changes, but this can be used to encode any information into the LLM response that you want to parse and use in other parts of your application.
- Added a Pipecat Cloud deployment example to the `examples` directory.
- Removed foundational examples 28b and 28c, as the `TranscriptProcessor` no longer has an LLM dependency. Renamed foundational example 28a to `28-transcript-processor.py`.