Added
- When registering a function call, it is now possible to indicate whether the function call should be cancelled on a user interruption via `cancel_on_interruption` (defaults to `False`). This is now possible because function calls are executed concurrently.
- Added support for detecting idle pipelines. By default, if no activity has been detected for 5 minutes, the `PipelineTask` will be automatically cancelled. It is possible to override this behavior by passing `cancel_on_idle_timeout=False`. It is also possible to change the default timeout with `idle_timeout_secs`, or the frames that prevent the pipeline from being idle with `idle_timeout_frames`. Finally, an `on_idle_timeout` event handler will be triggered if the idle timeout is reached (whether the pipeline task is cancelled or not).
- Added `FalSTTService`, which provides STT for Fal's Wizper API.
- Added a `reconnect_on_error` parameter to websocket-based TTS services, as well as an `on_connection_error` event handler. `reconnect_on_error` indicates whether the TTS service should reconnect on error. `on_connection_error` is always called when an error occurs, regardless of the value of `reconnect_on_error`. This makes it possible, for example, to fall back to a different TTS provider if something goes wrong with the current one.
- Added new `SkipTagsAggregator` that extends `BaseTextAggregator` to aggregate text and skip end-of-sentence matching while the aggregated text is between start/end tags.
- Added new `PatternPairAggregator` that extends `BaseTextAggregator` to identify content between matching pattern pairs in streamed text. This allows for detection and processing of structured content like XML-style tags that may span multiple text chunks or sentence boundaries.
- Added new `BaseTextAggregator`. Text aggregators are used by the TTS service to aggregate LLM tokens and decide when the aggregated text should be pushed to the TTS service. They also allow the text to be manipulated while it's being aggregated. A text aggregator can be passed to the TTS service via `text_aggregator`.
- Added new `sample_rate` constructor parameter to `TavusVideoService` to allow changing the output sample rate.
- Added new `NeuphonicTTSService` (see https://neuphonic.com).
- Added new `UltravoxSTTService` (see https://github.com/fixie-ai/ultravox).
- Added `on_frame_reached_upstream` and `on_frame_reached_downstream` event handlers to `PipelineTask`. These events are called when a frame reaches the beginning or the end of the pipeline, respectively. Note that by default, the event handlers will not be called unless a filter is set with `PipelineTask.set_reached_upstream_filter()` or `PipelineTask.set_reached_downstream_filter()`.
- Added support for Chirp voices in `GoogleTTSService`.
- Added a `flush_audio()` method to `FishTTSService` and `LmntTTSService`.
- Added a `set_language` convenience method for `GoogleSTTService`, allowing you to set a single language. This is in addition to the `set_languages` method, which allows you to set a list of languages.
- Added `on_user_turn_audio_data` and `on_bot_turn_audio_data` to `AudioBufferProcessor`. These make it possible to grab the audio of just a single turn, for both the user and the bot.
- Added new base class `BaseObject`, which is now the base class of `FrameProcessor`, `PipelineRunner`, `PipelineTask` and `BaseTransport`. The new `BaseObject` adds support for event handlers.
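- Added support for a unified format for specifying function calling across all LLM services.

  ```python
  from pipecat.adapters.schemas.function_schema import FunctionSchema
  from pipecat.adapters.schemas.tools_schema import ToolsSchema

  # Describe the function once, independently of the LLM provider.
  weather_function = FunctionSchema(
      name="get_current_weather",
      description="Get the current weather",
      properties={
          "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA",
          },
          "format": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"],
              "description": "The temperature unit to use. Infer this from the user's location.",
          },
      },
      required=["location"],
  )
  # Group the function schemas into a single tools definition.
  tools = ToolsSchema(standard_tools=[weather_function])
  ```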
- Added `speech_threshold` parameter to `GladiaSTTService`.
- Allow passing user (`user_kwargs`) and assistant (`assistant_kwargs`) context aggregator parameters when using `create_context_aggregator()`. The values are passed as a mapping that is then converted to keyword arguments.
- Added `speed` as an `InputParam` for both `ElevenLabsTTSService` and `ElevenLabsHttpTTSService`.
- Added new `LLMFullResponseAggregator` to aggregate full LLM completions. At every completion, the `on_completion` event handler is triggered.
- Added a new frame, `RTVIServerMessageFrame`, and RTVI message `RTVIServerMessage`, which provide a generic mechanism for sending custom messages from server to client. The `RTVIServerMessageFrame` is processed by the `RTVIObserver` and will be delivered to the client's `onServerMessage` callback or `ServerMessage` event.
- Added `GoogleLLMOpenAIBetaService` for Google LLM integration with an OpenAI-compatible interface. Added foundational example `14o-function-calling-gemini-openai-format.py`.
- Added `AzureRealtimeBetaLLMService` to support Azure's OpenAI Realtime API. Added foundational example `19a-azure-realtime-beta.py`.
- Introduced `GoogleVertexLLMService`, a new class for integrating with Vertex AI Gemini models. Added foundational example `14p-function-calling-gemini-vertex-ai.py`.
- Added support in `OpenAIRealtimeBetaLLMService` for a slate of new features:
  - The `'gpt-4o-transcribe'` input audio transcription model, along with new `language` and `prompt` options specific to that model.
  - The `input_audio_noise_reduction` session property.

    ```python
    session_properties = SessionProperties(
        # ...
        input_audio_noise_reduction=InputAudioNoiseReduction(
            type="near_field"  # also supported: "far_field"
        ),
        # ...
    )
    ```
  - The `'semantic_vad'` value for the `turn_detection` session property, a more sophisticated model for detecting when the user has stopped speaking.
  - The `on_conversation_item_created` and `on_conversation_item_updated` events.

    ```python
    @llm.event_handler("on_conversation_item_created")
    async def on_conversation_item_created(llm, item_id, item):
        ...

    @llm.event_handler("on_conversation_item_updated")
    async def on_conversation_item_updated(llm, item_id, item):
        # `item` may not always be available here
        ...
    ```
  - The `retrieve_conversation_item(item_id)` method for introspecting a conversation item on the server.

    ```python
    item = await llm.retrieve_conversation_item(item_id)
    ```
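The `PatternPairAggregator` entry above describes finding content between matching pattern pairs in streamed text, even when a tag straddles two chunks. The following is a minimal sketch of that buffering idea in plain Python, assuming tokens arrive as arbitrary string chunks. `PairMatcher` and `feed()` are hypothetical names invented for illustration; this is not Pipecat's `PatternPairAggregator` implementation or API.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PairMatcher:
    """Hypothetical sketch of pattern-pair aggregation over streamed text.

    Not Pipecat's PatternPairAggregator: it only shows the buffering idea.
    """

    start: str
    end: str
    buffer: str = ""
    matches: list[str] = field(default_factory=list)

    def feed(self, chunk: str) -> str:
        """Add a streamed chunk and return the text that is safe to pass on."""
        self.buffer += chunk
        pair = re.compile(re.escape(self.start) + "(.*?)" + re.escape(self.end), re.DOTALL)
        # Record the content of every complete pair, then drop the pairs.
        self.matches.extend(m.group(1) for m in pair.finditer(self.buffer))
        self.buffer = pair.sub("", self.buffer)
        # An unclosed start pattern: hold it (and what follows) for later chunks.
        open_at = self.buffer.find(self.start)
        if open_at != -1:
            ready, self.buffer = self.buffer[:open_at], self.buffer[open_at:]
            return ready
        # A start pattern may be cut mid-chunk, e.g. "<voi": hold that suffix back.
        keep = 0
        for n in range(min(len(self.start) - 1, len(self.buffer)), 0, -1):
            if self.start.startswith(self.buffer[-n:]):
                keep = n
                break
        cut = len(self.buffer) - keep
        ready, self.buffer = self.buffer[:cut], self.buffer[cut:]
        return ready


matcher = PairMatcher("<voice>", "</voice>")
first = matcher.feed("Hello <voi")  # "Hello " is emitted, "<voi" is held back
second = matcher.feed("ce>narrator</voice> Welcome!")  # " Welcome!" is emitted
```

Holding back a partial start pattern is what lets a tag split across two LLM tokens still be recognized, while text outside any pair flows through unchanged.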
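The unified function-calling entry above describes a function once and lets each LLM service translate it. A minimal sketch of that adapter idea follows, assuming two well-known provider formats: the OpenAI Chat Completions `tools` entry and the Anthropic Messages API `tools` entry. `ToolSpec`, `to_openai()` and `to_anthropic()` are hypothetical names for illustration only; they are not Pipecat's `FunctionSchema`, `ToolsSchema` or its adapter classes.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolSpec:
    """Hypothetical provider-neutral function description (not Pipecat's FunctionSchema)."""

    name: str
    description: str
    properties: dict[str, Any]
    required: list[str] = field(default_factory=list)

    def json_schema(self) -> dict[str, Any]:
        # Both providers below describe function arguments with a JSON Schema object.
        return {"type": "object", "properties": self.properties, "required": self.required}


def to_openai(spec: ToolSpec) -> dict[str, Any]:
    # Shape of one entry in the OpenAI Chat Completions "tools" list.
    return {
        "type": "function",
        "function": {
            "name": spec.name,
            "description": spec.description,
            "parameters": spec.json_schema(),
        },
    }


def to_anthropic(spec: ToolSpec) -> dict[str, Any]:
    # Shape of one entry in the Anthropic Messages API "tools" list.
    return {
        "name": spec.name,
        "description": spec.description,
        "input_schema": spec.json_schema(),
    }


weather = ToolSpec(
    name="get_current_weather",
    description="Get the current weather",
    properties={"location": {"type": "string", "description": "The city and state"}},
    required=["location"],
)
openai_tool = to_openai(weather)
anthropic_tool = to_anthropic(weather)
```

The design choice is to keep a single source of truth and push provider differences into small conversion functions, so adding a provider means adding one adapter rather than rewriting every tool definition.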
Changed
- Updated `OpenAISTTService` to use `gpt-4o-transcribe` as the default transcription model.
- Updated `OpenAITTSService` to use `gpt-4o-mini-tts` as the default TTS model.
- Function calls are now executed in tasks. This means that the pipeline will not be blocked while a function call is being executed.
- ⚠️ `PipelineTask` will now be automatically cancelled if no bot activity is happening in the pipeline. There are a few settings to configure this behavior; see the `PipelineTask` documentation for more details.
- All event handlers are now executed in separate tasks so that they don't block the pipeline. Since event handlers can take some time to execute, running them inline could block the pipeline until they completed.
- Updated `TranscriptProcessor` to support text output from `OpenAIRealtimeBetaLLMService`.
- `OpenAIRealtimeBetaLLMService` and `GeminiMultimodalLiveLLMService` now push a `TTSTextFrame`.
- Updated the default model for `CartesiaTTSService` and `CartesiaHttpTTSService` to `sonic-2`.
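The idle-pipeline behavior described in this release (cancel after `idle_timeout_secs` without activity, always call `on_idle_timeout`, cancel only if `cancel_on_idle_timeout` is set) can be sketched as a small asyncio watchdog. This is a minimal sketch under stated assumptions, not how `PipelineTask` is implemented: `watch_idle()` is a hypothetical helper, an `asyncio.Event` stands in for frames matching `idle_timeout_frames`, and returning from the loop stands in for cancelling the task. Only the parameter names are borrowed from the entries above.

```python
import asyncio
from typing import Awaitable, Callable


async def watch_idle(
    activity: asyncio.Event,
    idle_timeout_secs: float,
    on_idle_timeout: Callable[[], Awaitable[None]],
    cancel_on_idle_timeout: bool = True,
) -> None:
    """Hypothetical idle watchdog: fire `on_idle_timeout` when `activity` stays unset."""
    while True:
        try:
            await asyncio.wait_for(activity.wait(), timeout=idle_timeout_secs)
            activity.clear()  # activity seen: start a fresh timeout window
        except asyncio.TimeoutError:
            await on_idle_timeout()  # called whether or not we stop afterwards
            if cancel_on_idle_timeout:
                return  # stand-in for cancelling the pipeline task


async def main() -> list[str]:
    events: list[str] = []
    activity = asyncio.Event()

    async def on_idle_timeout() -> None:
        events.append("idle")

    watcher = asyncio.create_task(watch_idle(activity, 0.2, on_idle_timeout))
    for _ in range(3):
        await asyncio.sleep(0.02)  # "frames" arrive well within the timeout
        activity.set()
        events.append("frame")
    await watcher  # no further activity, so the watchdog fires after ~0.2 s
    return events


events = asyncio.run(main())
```

Running the watchdog as its own task mirrors the release's general move toward tasks: the check never sits in the frame path, so it cannot block the pipeline it is watching.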
Deprecated
- Passing a `start_callback` to `LLMService.register_function()` is now deprecated; simply move the code from the start callback into the function call.
- `TTSService` parameter `text_filter` is now deprecated; use `text_filters` instead, which is now a list. This allows passing multiple filters that will be executed in order.
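The `text_filters` deprecation note says multiple filters now run in order. A minimal sketch of that chaining, assuming each filter simply maps a string to a string: plain functions stand in for Pipecat's filter objects here, and `apply_text_filters()`, `strip_bold()` and `collapse_whitespace()` are hypothetical names, not Pipecat APIs.

```python
import re
from typing import Callable, Sequence

# Hypothetical: plain str -> str functions stand in for Pipecat's filter objects.
TextFilter = Callable[[str], str]


def apply_text_filters(text: str, text_filters: Sequence[TextFilter]) -> str:
    """Feed the text through each filter in list order."""
    for text_filter in text_filters:
        text = text_filter(text)
    return text


def strip_bold(text: str) -> str:
    # Remove Markdown bold markers so they are not read aloud.
    return re.sub(r"\*\*(.+?)\*\*", r"\1", text)


def collapse_whitespace(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip()


spoken = apply_text_filters("Say  **hello**   to the user", [strip_bold, collapse_whitespace])
```

Because each filter sees the previous filter's output, list order matters whenever two filters touch the same text.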
Removed
- Removed deprecated `audio.resample_audio()`; use `create_default_resampler()` instead.
- Removed deprecated `stt_service` parameter from `STTMuteFilter`.
- Removed deprecated RTVI processors; use an `RTVIObserver` instead.
- Removed deprecated `AWSTTSService`; use `PollyTTSService` instead.
- Removed deprecated field `tier` from `DailyTranscriptionSettings`; use `model` instead.
- Removed deprecated `pipecat.vad` package; use `pipecat.audio.vad` instead.
Fixed
- Fixed an assistant aggregator issue that could cause assistant text to be split into multiple chunks during function calls.
- Fixed an assistant aggregator issue that prevented assistant text from being added to the context during function calls, which could lead to duplicated content.
- Fixed a `SegmentedSTTService` issue that was causing audio to be sent prematurely to the STT service. Instead of analyzing the volume in this service, we now rely on VAD events, which use both VAD and volume.
- Fixed a `GeminiMultimodalLiveLLMService` issue that was causing messages to be duplicated in the context when pushing `LLMMessagesAppendFrame` frames.
- Fixed an issue with `SegmentedSTTService`-based services (e.g. `GroqSTTService`) that was not allowing audio to pass through downstream.
- Fixed a `CartesiaTTSService` and `RimeTTSService` issue that would detect an end of sentence in text between spell-out tags.
- Fixed a `match_endofsentence` issue that caused floating-point numbers to be considered an end of sentence.
- Fixed a `match_endofsentence` issue that caused email addresses to be considered an end of sentence.
- Fixed an issue where the RTVI message `disconnect-bot` was pushing an `EndFrame`, resulting in the pipeline not shutting down. It now pushes an `EndTaskFrame` upstream to shut down the pipeline.
- Fixed an issue with the `GoogleSTTService` where stream timeouts during periods of inactivity were causing connection failures. The service now properly detects timeout errors and handles reconnection gracefully, ensuring continuous operation even after periods of silence or when using an `STTMuteFilter`.
- Fixed an issue in `RimeTTSService` where the last line of text sent didn't result in any audio output being generated.
- Fixed `OpenAIRealtimeBetaLLMService` by adding proper handling for:
  - The `conversation.item.input_audio_transcription.delta` server message, which was added server-side at some point and not handled client-side.
  - Errors reported by the `response.done` server message.
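The two `match_endofsentence` fixes above concern periods that do not end a sentence, as in `3.14` or `jane.doe@example.com`. A minimal sketch of one simple rule that avoids both cases follows, assuming text is matched incrementally as LLM tokens stream in: punctuation counts as a sentence end only when whitespace follows it. `match_end_of_sentence()` is a hypothetical function, and this rule is an illustration, not Pipecat's actual `match_endofsentence` logic.

```python
import re

# Hypothetical rule: ".", "!" or "?" ends a sentence only when whitespace follows.
# The "." in "3.14" or "jane.doe@example.com" is followed by another character,
# and a trailing "." at the end of a streamed chunk is left undecided for now.
SENTENCE_END = re.compile(r"[.!?](?=\s)")


def match_end_of_sentence(text: str) -> int:
    """Return the index just past the first sentence end, or 0 if there is none yet."""
    match = SENTENCE_END.search(text)
    return match.end() if match else 0


reply = "Write to jane.doe@example.com. Pi is 3.14 roughly."
first_sentence = reply[: match_end_of_sentence(reply)]  # "Write to jane.doe@example.com."
```

Waiting for the character after the period is what keeps a chunk ending in "Pi is 3." from being flushed before "14" arrives; the cost is that the final sentence of a response needs an explicit flush. Abbreviations such as "Dr. Smith" would still be split by this rule and need extra handling.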
Other
- Added foundational example `07w-interruptible-fal.py`, showing `FalSTTService`.
- Added a new Ultravox example `examples/foundational/07u-interruptible-ultravox.py`.
- Added new Neuphonic examples `examples/foundational/07v-interruptible-neuphonic.py` and `examples/foundational/07v-interruptible-neuphonic-http.py`.
- Added a new example `examples/foundational/36-user-email-gathering.py` to show how to gather user emails. The example uses Cartesia's `<spell></spell>` tags and Rime's `spell()` function to spell out the emails for confirmation.
- Updated the `34-audio-recording.py` example to include an STT processor.
- Added foundational example `35-voice-switching.py` showing how to use the new `PatternPairAggregator`. The example shows how the LLM can encode instructions for TTS voice changes in its response, but the same approach can encode any information in the LLM response that you want to parse and use in other parts of your application.
- Added a Pipecat Cloud deployment example to the `examples` directory.
- Removed foundational examples 28b and 28c, as the `TranscriptProcessor` no longer has an LLM dependency. Renamed foundational example 28a to `28-transcript-processor.py`.