Added
- `SentryMetrics` has been added to report frame processor metrics to Sentry. This is now possible because `FrameProcessorMetrics` can now be passed to `FrameProcessor`.
- Added Google TTS service and a corresponding foundational example, `07n-interruptible-google.py`.
- Added AWS Polly TTS support and `07m-interruptible-aws.py` as an example.
- Added InputParams to Azure TTS service.
- Added `LivekitTransport` (audio-only for now).
- RTVI 0.2.0 is now supported.
- All `FrameProcessors` can now register event handlers.

      tts = SomeTTSService(...)

      @tts.event_handler("on_connected")
      async def on_connected(processor):
          ...
- Added `AsyncGeneratorProcessor`. This processor can be used together with a `FrameSerializer` as an async generator. It provides a `generator()` function that returns an `AsyncGenerator` that yields serialized frames.
- Added `EndTaskFrame` and `CancelTaskFrame`. These new frames are meant to be pushed upstream to tell the pipeline task to stop gracefully or immediately, respectively.
- Added configurable LLM parameters (e.g., temperature, top_p, max_tokens, seed) for OpenAI, Anthropic, and Together AI services, along with corresponding setter functions.
- Added `sample_rate` as a constructor parameter for TTS services.
- Pipecat has a pipeline-based architecture. The pipeline consists of frame processors linked to each other. The elements traveling across the pipeline are called frames. For deterministic behavior, the frames traveling through the pipeline should always be ordered, except system frames, which are out-of-band frames. To achieve that, each frame processor should only output frames from a single task. In this version, every frame processor has its own task to push frames. That is, when `push_frame()` is called, the given frame is put into an internal queue (with the exception of system frames) and a frame processor task pushes it out.
- Added pipeline clocks. A pipeline clock is used by the output transport to know when a frame needs to be presented. For that, all frames now have an optional `pts` field (presentation timestamp). There's currently just one clock implementation, `SystemClock`, and the `pts` field is currently only used for `TextFrame`s (audio and image frames will be next).
- A clock can now be passed to `PipelineTask` (defaults to `SystemClock`). This clock will be passed to each frame processor via the `StartFrame`.
- Added `CartesiaHttpTTSService`.
- `DailyTransport` now supports setting the audio bitrate to improve audio quality through the `DailyParams.audio_out_bitrate` parameter. The new default is 96kbps.
- `DailyTransport` now uses the number of audio output channels (1 or 2) to set mono or stereo audio when needed.
- Interruptions support has been added to `TwilioFrameSerializer` when using `FastAPIWebsocketTransport`.
- Added new `LmntTTSService` text-to-speech service (see https://www.lmnt.com/).
- Added `TTSModelUpdateFrame`, `TTSLanguageUpdateFrame`, `STTModelUpdateFrame`, and `STTLanguageUpdateFrame` frames to allow you to switch models, languages, and voices in TTS and STT services.
- Added new `transcriptions.Language` enum.
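The per-processor push task described in the list above can be pictured with a small standalone sketch. It does not use Pipecat's classes: `Frame`, `SystemFrame`, and `SketchProcessor` are hypothetical stand-ins, and the only behavior taken from the notes is that `push_frame()` queues ordinary frames for a single task while system frames skip the queue.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Frame:
    name: str


@dataclass
class SystemFrame(Frame):
    """Out-of-band frame: bypasses the ordered queue."""


class SketchProcessor:
    """Hypothetical stand-in for a frame processor (not Pipecat's class)."""

    def __init__(self):
        self.delivered = []  # frame names in the order they left the processor
        self._queue = asyncio.Queue()
        self._task = None

    def start(self):
        # A single task per processor is the only writer of ordered frames.
        self._task = asyncio.create_task(self._push_loop())

    async def push_frame(self, frame):
        if isinstance(frame, SystemFrame):
            # System frames skip the queue and go out immediately.
            self.delivered.append(frame.name)
        else:
            await self._queue.put(frame)

    async def _push_loop(self):
        while True:
            frame = await self._queue.get()
            self.delivered.append(frame.name)
            self._queue.task_done()

    async def stop(self):
        await self._queue.join()  # drain queued frames before stopping
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass


async def demo():
    processor = SketchProcessor()
    processor.start()
    await processor.push_frame(Frame("a"))
    await processor.push_frame(Frame("b"))
    await processor.push_frame(SystemFrame("sys"))
    await processor.stop()
    return processor.delivered
```

Running `asyncio.run(demo())` in this sketch delivers the system frame ahead of the queued frames, while `a` still precedes `b`, which is the ordering guarantee the single push task is meant to provide.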
Changed
- Context frames are now pushed downstream from assistant context aggregators.
- Removed Silero VAD torch dependency.
- Merged the individual update settings frame classes into a single `ServiceUpdateSettingsFrame` class.
- We now distinguish between input and output audio and image frames. We introduce `InputAudioRawFrame`, `OutputAudioRawFrame`, `InputImageRawFrame` and `OutputImageRawFrame` (and other subclasses of those). The input frames usually come from an input transport and are meant to be processed inside the pipeline to generate new frames. However, the input frames will not be sent through an output transport. The output frames can also be processed by any frame processor in the pipeline, and they are allowed to be sent by the output transport.
- `ParallelTask` has been renamed to `SyncParallelPipeline`. A `SyncParallelPipeline` is a frame processor that contains a list of different pipelines to be executed concurrently. The difference between a `SyncParallelPipeline` and a `ParallelPipeline` is that, given an input frame, the `SyncParallelPipeline` will wait for all the internal pipelines to complete. This is achieved by making sure the last processor in each of the pipelines is synchronous (e.g. an HTTP-based service that waits for the response).
- `StartFrame` is a system frame again, to make sure it's processed immediately by all processors. `EndFrame` stays a control frame since it needs to be ordered, allowing the frames still in the pipeline to be processed.
- Updated `MoondreamService` revision to `2024-08-26`.
- `CartesiaTTSService` and `ElevenLabsTTSService` now add presentation timestamps to their text output. This allows the output transport to push the text frames downstream at almost the same time the words are spoken. We say "almost" because the audio frames don't have presentation timestamps yet, but they should be played at roughly the same time.
- `DailyTransport.on_joined` event now returns the full session data instead of just the participant.
- `CartesiaTTSService` is now a subclass of `TTSService`.
- `DeepgramSTTService` is now a subclass of `STTService`.
- `WhisperSTTService` is now a subclass of `SegmentedSTTService`. A `SegmentedSTTService` is an `STTService` where the provided audio is given in a big chunk (i.e. from when the user starts speaking until the user stops speaking) instead of a continuous stream.
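The "wait for all the internal pipelines" behavior described for `SyncParallelPipeline` can be illustrated with plain `asyncio`. This is only a sketch of the idea, not the class itself: `run_pipeline` and `sync_parallel` are hypothetical helpers, and each internal pipeline is reduced to a coroutine that returns once its (synchronous) last step has a result.

```python
import asyncio


async def run_pipeline(name, delay, frame):
    # Stand-in for an internal pipeline whose last processor is synchronous:
    # it returns only once its result for this frame is ready.
    await asyncio.sleep(delay)
    return f"{name}({frame})"


async def sync_parallel(frame, pipelines):
    # Run every internal pipeline concurrently and wait for all of them
    # before anything is emitted for this input frame.
    return await asyncio.gather(
        *(run_pipeline(name, delay, frame) for name, delay in pipelines)
    )


async def demo():
    # The second pipeline finishes first, but results keep the pipeline order.
    return await sync_parallel("hello", [("tts", 0.02), ("image", 0.01)])
```

`asyncio.gather` returns results in the order the coroutines were passed in, so the output for one input frame stays grouped and ordered even when the pipelines finish at different times.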
Fixed
- Fixed OpenAI multiple function calls.
- Fixed a Cartesia TTS issue that would cause audio to be truncated in some cases.
- Fixed a `BaseOutputTransport` issue that would stop audio and video rendering tasks (after receiving an `EndFrame`) before the internal queue was emptied, causing the pipeline to finish prematurely.
- `StartFrame` should be the first frame every processor receives, to avoid situations where other frames arrive before initialization has happened (initialization happens on `StartFrame`), resulting in undesired behavior.
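To see why the ordering of `StartFrame` matters, consider a minimal sketch of a processor whose state is only set up when the start frame arrives. The classes below are hypothetical stand-ins defined for this example (including the `sample_rate` field on the start frame); they are not Pipecat's frame or processor types.

```python
from dataclasses import dataclass


@dataclass
class StartFrame:
    sample_rate: int  # hypothetical setup value carried by the start frame


@dataclass
class TextFrame:
    text: str


class SketchProcessor:
    """Hypothetical processor whose state is only initialized on StartFrame."""

    def __init__(self):
        self.sample_rate = None
        self.seen = []

    def process_frame(self, frame):
        if isinstance(frame, StartFrame):
            # Initialization happens here, so this frame must arrive first.
            self.sample_rate = frame.sample_rate
            return
        if self.sample_rate is None:
            raise RuntimeError("received a frame before StartFrame")
        self.seen.append((frame.text, self.sample_rate))
```

In this sketch, a `TextFrame` that arrives before the `StartFrame` fails loudly, whereas the same frame is handled normally once the start frame has been processed.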
Performance
- `obj_id()` and `obj_count()` now use `itertools.count`, avoiding the need for `threading.Lock`.
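A lock-free counter along these lines might look like the sketch below. The function names come from the note above, but the signatures and the per-class counting are assumptions made for illustration. The sketch also assumes CPython, where `next()` on an `itertools.count` is a single C-level call.

```python
import itertools
from collections import defaultdict

# One global counter for ids, plus one counter per class name (assumed design).
_ids = itertools.count()
_counts = defaultdict(itertools.count)


def obj_id():
    # Returns 0, 1, 2, ... without taking an explicit lock.
    return next(_ids)


def obj_count(obj):
    # Returns how many objects of this class have been counted so far.
    return next(_counts[obj.__class__.__name__])
```

Each call simply advances an iterator, so no `threading.Lock` needs to be acquired and released around the increment.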
Other
- Pipecat now uses Ruff as its formatter (https://github.com/astral-sh/ruff).