## 🎃 The Haunted Edition 👻
### Added
- Added a new `DeepgramHttpTTSService`, which delivers a meaningful reduction in latency when compared to the `DeepgramTTSService`.
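
  A minimal construction sketch; the import path and the `api_key`/`voice` parameter names are assumptions based on the existing `DeepgramTTSService` and may differ:

  ```python
  import os

  from pipecat.services.deepgram.tts import DeepgramHttpTTSService  # import path assumed

  tts = DeepgramHttpTTSService(
      api_key=os.getenv("DEEPGRAM_API_KEY"),
      voice="aura-2-andromeda-en",  # any Deepgram Aura voice
  )
  ```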
- Added support for the `speaking_rate` input parameter in `GoogleHttpTTSService`.
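
  A sketch of how the new parameter might be passed, assuming it is a field on the service's `InputParams`; the constructor arguments shown are illustrative:

  ```python
  from pipecat.services.google.tts import GoogleHttpTTSService

  tts = GoogleHttpTTSService(
      credentials_path="service-account.json",
      voice_id="en-US-Chirp3-HD-Charon",
      # speaking_rate living on InputParams is an assumption; check the service docs.
      params=GoogleHttpTTSService.InputParams(speaking_rate=1.15),
  )
  ```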
- Added `enable_speaker_diarization` and `enable_language_identification` to `SonioxSTTService`.
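
  A hedged sketch; the `SonioxInputParams` name and the placement of the new flags are assumptions:

  ```python
  import os

  from pipecat.services.soniox.stt import SonioxInputParams, SonioxSTTService  # names assumed

  stt = SonioxSTTService(
      api_key=os.getenv("SONIOX_API_KEY"),
      params=SonioxInputParams(
          enable_speaker_diarization=True,
          enable_language_identification=True,
      ),
  )
  ```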
- Added `SpeechmaticsTTSService`, which uses Speechmatics' TTS API. Updated examples 07a* to use the new TTS service.
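
  A minimal construction sketch; the import path and constructor parameters are assumptions:

  ```python
  import os

  from pipecat.services.speechmatics.tts import SpeechmaticsTTSService  # import path assumed

  tts = SpeechmaticsTTSService(api_key=os.getenv("SPEECHMATICS_API_KEY"))
  ```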
- Added support for including images or audio in LLM context messages using `LLMContext.create_image_message()` or `LLMContext.create_image_url_message()` (not all LLMs support URLs) and `LLMContext.create_audio_message()`. For example, when creating an `LLMMessagesAppendFrame`:

  ```python
  message = LLMContext.create_image_message(image=..., size=...)
  await self.push_frame(LLMMessagesAppendFrame(messages=[message], run_llm=True))
  ```
- New event handlers for the `DeepgramFluxSTTService`: `on_start_of_turn`, `on_turn_resumed`, `on_end_of_turn`, `on_eager_end_of_turn`, `on_update`.
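
  Handlers are registered with the usual Pipecat event handler decorator; the import path, constructor arguments, and per-event callback arguments are assumptions:

  ```python
  import os

  from pipecat.services.deepgram.flux.stt import DeepgramFluxSTTService  # import path assumed

  stt = DeepgramFluxSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

  # Register one of the new handlers; extra callback arguments vary per event.
  @stt.event_handler("on_start_of_turn")
  async def on_start_of_turn(stt, *args):
      print("Start of turn detected")
  ```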
- Added `generation_config` parameter support to `CartesiaTTSService` and `CartesiaHttpTTSService` for Cartesia Sonic-3 models. Includes a new `GenerationConfig` class with `volume` (0.5-2.0), `speed` (0.6-1.5), and `emotion` (60+ options) parameters for fine-grained speech generation control.
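
  A sketch of how this might look; the import location of `GenerationConfig`, whether `generation_config` is set via `InputParams` (shown here), and the `emotion` value format are assumptions:

  ```python
  import os

  from pipecat.services.cartesia.tts import CartesiaTTSService, GenerationConfig  # import location assumed

  tts = CartesiaTTSService(
      api_key=os.getenv("CARTESIA_API_KEY"),
      voice_id="your-voice-id",
      model="sonic-3",
      params=CartesiaTTSService.InputParams(
          generation_config=GenerationConfig(
              volume=1.2,  # 0.5-2.0
              speed=0.9,  # 0.6-1.5
              emotion="calm",  # one of 60+ options
          )
      ),
  )
  ```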
- Expanded support for the universal `LLMContext` to `OpenAIRealtimeLLMService`. As a reminder, the context-setup pattern when using `LLMContext` is:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (Note that even though `OpenAIRealtimeLLMService` now supports the universal `LLMContext`, it is not meant to be swapped out for another LLM service at runtime with `LLMSwitcher`.)

  Note: `TranscriptionFrame`s and `InterimTranscriptionFrame`s now go upstream from `OpenAIRealtimeLLMService`, so if you're using `TranscriptProcessor`, say, you'll want to adjust accordingly:

  ```python
  pipeline = Pipeline(
      [
          transport.input(),
          context_aggregator.user(),
          # BEFORE
          # llm,
          # transcript.user(),
          # AFTER
          transcript.user(),
          llm,
          transport.output(),
          transcript.assistant(),
          context_aggregator.assistant(),
      ]
  )
  ```

  Also worth noting: whether or not you use the new context-setup pattern with `OpenAIRealtimeLLMService`, some types have changed under the hood:

  ```python
  ## BEFORE:

  # Context aggregator type
  context_aggregator: OpenAIContextAggregatorPair
  # Context frame type
  frame: OpenAILLMContextFrame
  # Context type
  context: OpenAIRealtimeLLMContext  # or context: OpenAILLMContext

  ## AFTER:

  # Context aggregator type
  context_aggregator: LLMContextAggregatorPair
  # Context frame type
  frame: LLMContextFrame
  # Context type
  context: LLMContext
  ```

  Also note that `RealtimeMessagesUpdateFrame` and `RealtimeFunctionCallResultFrame` have been deprecated, since they're no longer used by `OpenAIRealtimeLLMService`. OpenAI Realtime now works more like other LLM services in Pipecat, relying on updates to its context, pushed by context aggregators, to update its internal state. Listen for `LLMContextFrame`s for context updates.

  Finally, `LLMTextFrame`s are no longer pushed from `OpenAIRealtimeLLMService` when it's configured with `output_modalities=['audio']`. If you need to process its output, listen for `TTSTextFrame`s instead.
- Expanded support for the universal `LLMContext` to `GeminiLiveLLMService`. As a reminder, the context-setup pattern when using `LLMContext` is:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (Note that even though `GeminiLiveLLMService` now supports the universal `LLMContext`, it is not meant to be swapped out for another LLM service at runtime with `LLMSwitcher`.)

  Worth noting: whether or not you use the new context-setup pattern with `GeminiLiveLLMService`, some types have changed under the hood:

  ```python
  ## BEFORE:

  # Context aggregator type
  context_aggregator: GeminiLiveContextAggregatorPair
  # Context frame type
  frame: OpenAILLMContextFrame
  # Context type
  context: GeminiLiveLLMContext  # or context: OpenAILLMContext

  ## AFTER:

  # Context aggregator type
  context_aggregator: LLMContextAggregatorPair
  # Context frame type
  frame: LLMContextFrame
  # Context type
  context: LLMContext
  ```

  Also note that `LLMTextFrame`s are no longer pushed from `GeminiLiveLLMService` when it's configured with `modalities=GeminiModalities.AUDIO`. If you need to process its output, listen for `TTSTextFrame`s instead.
### Changed
- The development runner's `/start` endpoint now supports passing `dailyRoomProperties` and `dailyMeetingTokenProperties` in the request body when `createDailyRoom` is true. Properties are validated against the `DailyRoomProperties` and `DailyMeetingTokenProperties` types respectively and passed to Daily's room and token creation APIs.
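
  For illustration, a request to a locally running dev runner; the host/port and the specific properties shown are assumptions:

  ```python
  import requests

  response = requests.post(
      "http://localhost:7860/start",
      json={
          "createDailyRoom": True,
          "dailyRoomProperties": {"enable_chat": True},
          "dailyMeetingTokenProperties": {"is_owner": True},
      },
  )
  print(response.json())
  ```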
- `UserImageRawFrame` has new fields `append_to_context` and `text`. The `append_to_context` field indicates whether this image and text should be added to the LLM context (by the LLM assistant aggregator). The `text` field, if set, might also guide the LLM or the vision service on how to analyze the image.
- `UserImageRequestFrame` has new fields `append_to_context` and `text`. Both fields will be used to set the same fields on the captured `UserImageRawFrame`.
- `UserImageRequestFrame` no longer requires a function call name and ID; see the sketch below.
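
  A minimal sketch of requesting a user image with the new fields, from inside a frame processor; the `user_id` value and `text` are illustrative:

  ```python
  from pipecat.frames.frames import UserImageRequestFrame

  # Function call name/ID are no longer required. The text and append_to_context
  # values are carried over to the captured UserImageRawFrame.
  frame = UserImageRequestFrame(
      user_id="user-123",
      text="Describe what the camera sees.",
      append_to_context=True,
  )
  await self.push_frame(frame)
  ```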
- Updated `MoondreamService` to process `UserImageRawFrame`.
- `VisionService` expects a `UserImageRawFrame` in order to analyze images.
- `DailyTransport` triggers an `on_error` event if transcription can't be started or stopped.
- `DailyTransport` updates: `start_dialout()` now returns two values: `session_id` and `error`. `start_recording()` now returns two values: `stream_id` and `error`.
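
  A sketch of the new return shape; the dial-out `settings` payload shown is illustrative:

  ```python
  from loguru import logger

  # Both calls now return a (value, error) pair.
  session_id, error = await transport.start_dialout(settings={"phoneNumber": "+15551234567"})
  if error:
      logger.error(f"Dial-out failed: {error}")

  stream_id, error = await transport.start_recording()
  if error:
      logger.error(f"Recording failed: {error}")
  ```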
- Updated `daily-python` to 0.21.0.
- `SimliVideoService` now accepts `api_key` and `face_id` parameters directly, with optional `params` for `max_session_length` and `max_idle_time` configuration, aligning with other Pipecat service patterns.
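
  A minimal sketch of the new constructor; the import path, the `InputParams` name, and the units for the session limits are assumptions:

  ```python
  import os

  from pipecat.services.simli.video import SimliVideoService  # import path assumed

  simli = SimliVideoService(
      api_key=os.getenv("SIMLI_API_KEY"),
      face_id=os.getenv("SIMLI_FACE_ID"),
      params=SimliVideoService.InputParams(
          max_session_length=600,  # assumed seconds
          max_idle_time=30,
      ),
  )
  ```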
- Updated the default model to `sonic-3` for `CartesiaTTSService` and `CartesiaHttpTTSService`.
- `FunctionFilter` now has a `filter_system_frames` arg, which controls whether or not `SystemFrame`s are filtered.
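
  A hedged sketch; the filter function shown is illustrative and the default value of `filter_system_frames` is an assumption:

  ```python
  from pipecat.frames.frames import Frame, TextFrame
  from pipecat.processors.filters.function_filter import FunctionFilter

  async def only_text(frame: Frame) -> bool:
      return isinstance(frame, TextFrame)

  # When filter_system_frames is True, SystemFrames are also run through the filter.
  text_filter = FunctionFilter(filter=only_text, filter_system_frames=True)
  ```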
- Upgraded `aws_sdk_bedrock_runtime` to v0.1.1 to resolve potential CPU issues when running `AWSNovaSonicLLMService`.
### Deprecated
- The `expect_stripped_words` parameter of `LLMAssistantAggregatorParams` is ignored when used with the newer `LLMAssistantAggregator`, which now handles word spacing automatically.
- `LLMService.request_image_frame()` is deprecated; push a `UserImageRequestFrame` instead.
- `UserResponseAggregator` is deprecated and will be removed in a future version.
- The `send_transcription_frames` argument to `OpenAIRealtimeLLMService` is deprecated. Transcription frames are now always sent. They go upstream, to be handled by the user context aggregator. See the "Added" section for details.
- Types in `pipecat.services.openai.realtime.context` and `pipecat.services.openai.realtime.frames` are deprecated, as they're no longer used by `OpenAIRealtimeLLMService`. See the "Added" section for details.
- The `SimliVideoService` `simli_config` parameter is deprecated. Use the `api_key` and `face_id` parameters instead.
### Removed
- Removed `enable_non_final_tokens` and `max_non_final_tokens_duration_ms` from `SonioxSTTService`.
- Removed the `aiohttp_session` arg from `SarvamTTSService` as it's no longer used.
### Fixed
- Fixed a `PipelineTask` issue that was causing an idle timeout for frames that were being generated but not reaching the end of the pipeline. Since the exact point where frames are discarded is unknown, we now monitor pipeline frames using an observer. If the observer detects that frames are being generated, it prevents the pipeline from being considered idle.
- Fixed an issue in `HumeTTSService` where only Octave 2 was being used, which does not support the `description` field. Now, if a description is provided, the service switches to Octave 1.
- Fixed an issue where `DailyTransport` would time out prematurely on join and on leave.
- Fixed an issue in the runner where starting a `DailyTransport` room via `/start` didn't support using the `DAILY_SAMPLE_ROOM_URL` env var.
- Fixed an issue in `ServiceSwitcher` where using `STTService`s would result in all STT services producing `TranscriptionFrame`s.
### Other
- Updated all vision 12-series foundational examples to load images from a file.
- Added 14-series video examples for different services. These new examples request an image from the user camera through a function call.