Added
-
Additions for
AICFilterandAICVADAnalyzer:- Added model downloading support to
AICFilterwithmodel_idandmodel_download_dirparameters. - Added
model_pathparameter toAICFilterfor loading local.aicmodelfiles. - Added unit tests for
AICFilterandAICVADAnalyzer.
(PR #3408)
- Added model downloading support to
-
Added handling for
server_content.interruptedsignal in the Gemini Live service for faster interruption response in the case where there isn't already turn tracking in the pipeline, e.g. local VAD + context aggregators. When there is already turn tracking in the pipeline, the additional interruption does no harm.
(PR #3429) -
Added new
GenesysFrameSerializerfor the Genesys AudioHook WebSocket protocol, enabling bidirectional audio streaming between Pipecat pipelines and Genesys Cloud contact center.
(PR #3500) -
Added
reached_upstream_typesandreached_downstream_typesread-only properties toPipelineTaskfor inspecting current frame filters.
(PR #3510) -
Added
add_reached_upstream_filter()andadd_reached_downstream_filter()methods toPipelineTaskfor appending frame types.
(PR #3510) -
Added
UserTurnCompletionLLMServiceMixinfor LLM services to detect and filter incomplete user turns. When enabled viafilter_incomplete_user_turnsinLLMUserAggregatorParams, the LLM outputs a turn completion marker at the start of each response: ✓ (complete), ○ (incomplete short), or ◐ (incomplete long). Incomplete turns are suppressed, and configurable timeouts automatically re-prompt the user.
(PR #3518) -
Added
FrameProcessor.broadcast_frame_instance(frame)method to broadcast a frame instance by extracting its fields and creating new instances for each direction.
(PR #3519) -
PipelineTasknow automatically addsRTVIProcessorand registersRTVIObserverwhenenable_rtvi=True(default), simplifying pipeline setup.
(PR #3519) -
Added
RTVIProcessor.create_rtvi_observer()factory method for creating RTVI observers.
(PR #3519) -
Added
video_out_codecparameter toTransportParamsallowing configuration of the preferred video codec (e.g.,"VP8","H264","H265") for video output inDailyTransport.
(PR #3520) -
Added
locationparameter to Google TTS services (GoogleHttpTTSService,GoogleTTSService,GeminiTTSService) for regional endpoint support.
(PR #3523) -
Added new
PIPECAT_SMART_TURN_LOG_DATAenvironment variable, which causes Smart Turn input data to be saved to disk
(PR #3525) -
Added
result_callbackparameter toUserImageRequestFrameto support deferred function call results.
(PR #3571) -
Added
function_call_timeout_secsparameter toLLMServiceto configure timeout for deferred function calls (defaults to 10.0 seconds).
(PR #3571) -
Added
vad_analyzerparameter toLLMUserAggregatorParams. VAD analysis is now handled inside theLLMUserAggregatorrather than in the transport, keeping voice activity detection closer to where it is consumed. Thevad_analyzeronBaseInputTransportis now deprecated.context_aggregator = LLMContextAggregatorPair( context, user_params=LLMUserAggregatorParams( vad_analyzer=SileroVADAnalyzer(), ), )
(PR #3583)
-
Added
VADProcessorfor detecting speech in audio streams within a pipeline. PushesVADUserStartedSpeakingFrame,VADUserStoppedSpeakingFrame, andUserSpeakingFramedownstream based on VAD state changes.
(PR #3583) -
Added
VADControllerfor managing voice activity detection state and emitting speech events independently of transport or pipeline processors.
(PR #3583) -
Added local
PiperTTSServicefor offline text-to-speech using Piper voice models. The existing HTTP-based service has been renamed toPiperHttpTTSService.
(PR #3585) -
main()inpipecat.runner.runnow accepts an optionalargparse.ArgumentParser, allowing bots to define custom CLI arguments accessible viarunner_args.cli_args.
(PR #3590) -
Added
KokoroTTSServicefor local text-to-speech synthesis using the Kokoro-82M model.
(PR #3595)
Changed
-
Updated
AICFilterandAICVADAnalyzerto use aic-sdk ~= 2.0.1.
(PR #3408) -
Improved the STT TTFB (Time To First Byte) measurement, reporting the delay between when the user stops speaking and when the final transcription is received. Note: Unlike traditional TTFB which measures from a discrete request, STT services receive continuous audio input—so we measure from speech end to final transcript, which captures the latency that matters for voice AI applications. In support of this change, added
finalizedfield toTranscriptionFrameto indicate when a transcript is the final result for an utterance.
(PR #3495) -
SarvamSTTServicenow defaultsvad_signalsandhigh_vad_sensitivitytoNone(omitted from connection parameters), improving latency by ~300ms compared to the previous defaults.
(PR #3495) -
Changed frame filter storage from tuples to sets in
PipelineTask.
(PR #3510) -
Changed default Inworld TTS model from
inworld-tts-1toinworld-tts-1.5-max.
(PR #3531) -
FrameSerializernow subclasses fromBaseObjectto enable event support.
(PR #3560) -
Added support for TTFS in
SpeechmaticsSTTServiceand set the default mode toEXTERNALto support Pipecat-controlled VAD.- Changed dependency to
speechmatics-voice[smart]>=0.2.8
(PR #3562)
- Changed dependency to
-
⚠️ Changed function call handling to use timeout-based completion instead of immediate callback execution.
- Function calls that defer their results (e.g.,
UserImageRequestFrame) now use a timeout mechanism - The
result_callbackis invoked automatically when the deferred operation completes or after timeout - This change affects examples using
UserImageRequestFrame- theresult_callbackshould now be passed to the frame instead of being called immediately
(PR #3571)
- Function calls that defer their results (e.g.,
-
Pipecat runner now uses
DAILY_ROOM_URLinstead ofDAILY_SAMPLE_ROOM_URL.
(PR #3582) -
Updates to
GradiumSTTService:- Now flushes pending transcriptions when VAD detects the user stopped speaking, improving response latency.
GradiumSTTServicenow supportsInputParamsfor configuringlanguageanddelay_in_framessettings.
(PR #3587)
Deprecated
- ⚠️ Deprecated
vad_analyzerparameter onBaseInputTransport. Passvad_analyzertoLLMUserAggregatorParamsinstead or useVADProcessorin the pipeline.
(PR #3583)
Removed
- Removed deprecated
AICFilterparameters:enhancement_level,voice_gain,noise_gate_enable.
(PR #3408)
Fixed
-
Fixed an issue where if you were using
OpenRouterLLMServicewith a Gemini model, it wouldn't handle multiple"system"messages as expected (and as we do inGoogleLLMService), which is to convert subsequent ones into"user"messages. Instead, the latest"system"message would overwrite the previous ones.
(PR #3406) -
Transports now properly broadcast
InputTransportMessageFrameframes both upstream and downstream instead of only pushing downstream.
(PR #3519) -
Fixed
FrameProcessor.broadcast_frame()to deep copy kwargs, preventing shared mutable references between the downstream and upstream frame instances.
(PR #3519) -
Fixed OpenAI LLM services to emit
ErrorFrameon completion timeout, enabling proper error handling and LLMSwitcher failover.
(PR #3529) -
Fixed a logging issue where non-ASCII characters (e.g., Japanese, Chinese, etc.) were being unnecessarily escaped to Unicode sequences when function call occurred.
(PR #3536) -
Fixed how audio tracks are synchronized inside the
AudioBufferProcessorto fix timing issues where silence and audio were misaligned between user and bot buffers.
(PR #3541) -
Fixed race condition in
OpenAIRealtimeBetaLLMServicethat could cause an error when truncating the conversation.
(PR #3567) -
Fixed an infinite loop in
WebsocketServicethat blocked the event loop when a remote server closed the connection gracefully.
(PR #3574) -
Fixed
LLMUserAggregatorandLLMAssistantAggregatornot emitting pending transcripts viaon_user_turn_stoppedandon_assistant_turn_stoppedevents when the conversation ends (EndFrame) or is cancelled (CancelFrame).
(PR #3575) -
Added missing
LiveKitRunnerArgumentsandLiveKitTransportsupport in runner utilities to enable LiveKit transport configuration.
(PR #3580) -
Fixed race condition in
OpenAIRealtimeLLMServicethat could cause an error when truncating the conversation.
(PR #3581) -
Fixed
PiperHttpTTSService(olfPiperTTSService) to resample audio output based on the model's sample rate parsed from the WAV header.
(PR #3585) -
Fixed
UserTurnControllerto reset user turn timeout when interim transcriptions are received.
(PR #3594) -
Fixed an issue in the
IVRNavigatorwhere theTextFrames pushed had incorrect spacing. Now, the internalIVRProcessorpushesAggregatedTextFrames when in conversation mode. This allows for controlling spacing of the outputted, aggregated text.
(PR #3604) -
Fixed
GeminiLiveLLMServicetranscription timeout handler not being scheduled by yielding to the event loop after task creation.
(PR #3605)