Added
- Added `TransportParams.audio_out_10ms_chunks` parameter to control the amount of audio sent by the output transport. It defaults to 4, so 40ms audio chunks are sent (see the sketch after this list).
- Added `QwenLLMService` for Qwen integration with an OpenAI-compatible interface. Added foundational example `14q-function-calling-qwen.py` (usage sketch after this list).
- Added `Mem0MemoryService`. Mem0 is a self-improving memory layer for LLM applications. Learn more at: https://mem0.ai/.
- Added `WhisperSTTServiceMLX` for Whisper transcription on Apple Silicon. See example in `examples/foundational/13e-whisper-mlx.py` and the sketch after this list. Latency of a completed transcription using Whisper large-v3-turbo on an M4 MacBook is ~500ms.
- `GladiaSTTService` now has comprehensive support for the latest API config options, including model, language detection, preprocessing, custom vocabulary, custom spelling, translation, and message filtering options.
- Added `SmallWebRTCTransport`, a new P2P WebRTC transport.
  - Created two examples in `p2p-webrtc`:
    - video-transform: Demonstrates sending and receiving audio/video with `SmallWebRTCTransport` using TypeScript. Includes video frame processing with OpenCV.
    - voice-agent: A minimal example of creating a voice agent with `SmallWebRTCTransport`.
- Added support to `ProtobufFrameSerializer` to send messages from `TransportMessageFrame` and `TransportMessageUrgentFrame` (round-trip sketch after this list).
- Added support for a new TTS service, `PiperTTSService` (see https://github.com/rhasspy/piper/ and the sketch after this list).
- It is now possible to tell whether `UserStartedSpeakingFrame` or `UserStoppedSpeakingFrame` have been generated because of emulation frames (detection sketch after this list).
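
A minimal sketch of the new chunking parameter. The import path and `audio_out_enabled` are the usual Pipecat transport setup; only `audio_out_10ms_chunks` is new in this release:

```python
from pipecat.transports.base_transport import TransportParams

# The value counts 10ms chunks per write: 2 -> 20ms of audio per chunk,
# 4 -> 40ms (the default).
params = TransportParams(
    audio_out_enabled=True,
    audio_out_10ms_chunks=2,  # send 20ms chunks instead of the 40ms default
)
```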
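A construction sketch for `QwenLLMService`, assuming the new package layout (`pipecat.services.qwen.llm`). The model name and environment variable are illustrative assumptions, not prescribed values:

```python
import os

from pipecat.services.qwen.llm import QwenLLMService

# OpenAI-compatible interface, so the familiar api_key/model arguments
# apply. The model name below is an assumption for illustration.
llm = QwenLLMService(
    api_key=os.getenv("QWEN_API_KEY"),
    model="qwen2.5-72b-instruct",
)
```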
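A sketch for the MLX Whisper service, assuming it lives in `pipecat.services.whisper.stt` under the new layout; the model identifier is an assumption, and `13e-whisper-mlx.py` is the canonical reference:

```python
from pipecat.services.whisper.stt import WhisperSTTServiceMLX

# Runs Whisper locally on Apple Silicon via MLX. The model identifier is
# illustrative; see examples/foundational/13e-whisper-mlx.py.
stt = WhisperSTTServiceMLX(model="mlx-community/whisper-large-v3-turbo")
```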
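A round-trip sketch for the new serializer support, assuming the async `serialize`/`deserialize` interface and that no extra setup is needed for a standalone round trip:

```python
import asyncio

from pipecat.frames.frames import TransportMessageFrame
from pipecat.serializers.protobuf import ProtobufFrameSerializer


async def main():
    serializer = ProtobufFrameSerializer()
    # Transport messages can now round-trip through the serializer like
    # audio and text frames do.
    frame = TransportMessageFrame(message={"type": "custom-event", "value": 42})
    data = await serializer.serialize(frame)
    restored = await serializer.deserialize(data)
    print(type(restored).__name__, restored.message)


asyncio.run(main())
```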
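A construction sketch for `PiperTTSService`, assuming an HTTP-based service pointed at an already running Piper server; the URL and constructor arguments are illustrative:

```python
import aiohttp

from pipecat.services.piper.tts import PiperTTSService


async def make_tts() -> PiperTTSService:
    # Assumes a Piper HTTP server is already running at this address.
    session = aiohttp.ClientSession()
    return PiperTTSService(base_url="http://localhost:5000", aiohttp_session=session)
```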
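A processor sketch for detecting emulated speaking frames. The `FrameProcessor` plumbing is the standard one; the `emulated` attribute name is an assumption based on this entry:

```python
from pipecat.frames.frames import Frame, UserStartedSpeakingFrame, UserStoppedSpeakingFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class SpeakingLogger(FrameProcessor):
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, (UserStartedSpeakingFrame, UserStoppedSpeakingFrame)):
            # `emulated` (assumed attribute name) distinguishes frames
            # produced by emulation from frames produced by real VAD events.
            print(f"{type(frame).__name__} emulated={frame.emulated}")
        await self.push_frame(frame, direction)
```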
Changed
- `FunctionCallResultFrame`s are now system frames. This prevents function call results from being discarded during interruptions.
- Pipecat services have been reorganized into packages (see the import sketch after this list). Each package can have one or more of the following modules, depending on the services implemented (new module names might be added in the future):
  - image: for image generation services
  - llm: for LLM services
  - memory: for memory services
  - stt: for Speech-To-Text services
  - tts: for Text-To-Speech services
  - video: for video generation services
  - vision: for video recognition services
- Base classes for AI services have been reorganized into modules. They can now be found in `pipecat.services.[ai_service,image_service,llm_service,stt_service,vision_service]`.
- `GladiaSTTService` now uses the `solaria-1` model by default. Other params use Gladia's default values. Added support for more language codes.
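
The migration is mechanical. A sketch of the pattern: the OpenAI pair is the example given in Deprecated below, while the Cartesia and base-class lines extrapolate the same `pipecat.services.[service].[module]` scheme:

```python
# Old imports (deprecated, still work with a warning):
# from pipecat.services.openai import OpenAILLMService
# from pipecat.services.ai_services import LLMService

# New package layout:
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.llm_service import LLMService
```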
Deprecated
- All Pipecat service imports have been deprecated; a warning will be shown when using an old import. The new imports follow `pipecat.services.[service].[image,llm,memory,stt,tts,video,vision]`. For example, `from pipecat.services.openai.llm import OpenAILLMService`.
- Importing AI services base classes from `pipecat.services.ai_services` is now deprecated; use one of `pipecat.services.[ai_service,image_service,llm_service,stt_service,vision_service]` instead.
- Deprecated the `language` parameter in `GladiaSTTService.InputParams` in favor of `language_config`, which better aligns with Gladia's API (migration sketch after this list).
- Deprecated using `GladiaSTTService.InputParams` directly. Use the new `GladiaInputParams` class instead.
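
A migration sketch for the Gladia params, assuming the new classes live in a `pipecat.services.gladia.config` module and that `LanguageConfig` mirrors Gladia's `language_config` API object; the field names and values shown are assumptions:

```python
from pipecat.services.gladia.config import GladiaInputParams, LanguageConfig
from pipecat.services.gladia.stt import GladiaSTTService

# Before (deprecated):
#   GladiaSTTService.InputParams(language=...)
# After:
stt = GladiaSTTService(
    api_key="YOUR_GLADIA_API_KEY",
    params=GladiaInputParams(
        language_config=LanguageConfig(
            languages=["en", "es"],  # candidate languages
            code_switching=True,     # allow switching mid-stream
        ),
    ),
)
```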
Fixed
- Fixed a `FastAPIWebsocketTransport` and `WebsocketClientTransport` issue that would cause the transport to be closed prematurely, preventing internally queued audio from being sent. The same issue could also cause an infinite loop when using an output mixer and sending an `EndFrame`, preventing the bot from finishing.
- Fixed an issue that could cause a `TranscriptionUpdateFrame` pushed during an interruption to be discarded.
- Fixed an issue that would cause `SegmentedSTTService`-based services (e.g. `OpenAISTTService`) to try to transcribe non-spoken audio, causing invalid transcriptions.
- Fixed an issue where `GoogleTTSService` was emitting two `TTSStoppedFrame`s.
Performance
- Output transports now send 40ms audio chunks instead of 20ms. This should improve performance.
- `BotSpeakingFrame`s are now sent every 200ms. If the output transport audio chunks are longer than 200ms, a `BotSpeakingFrame` is sent with every audio chunk.
Other
- Added foundational example `37-mem0.py` demonstrating how to use the `Mem0MemoryService` (construction sketch after this list).
- Added foundational example `13e-whisper-mlx.py` demonstrating how to use the `WhisperSTTServiceMLX`.
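
A construction sketch for `Mem0MemoryService`, assuming the `pipecat.services.mem0.memory` module from the new layout; the constructor arguments are illustrative assumptions, and `37-mem0.py` is the canonical reference for wiring it into a pipeline:

```python
from pipecat.services.mem0.memory import Mem0MemoryService

# Stores and retrieves per-user memories around LLM calls; the argument
# names here are assumptions for illustration.
memory = Mem0MemoryService(
    api_key="YOUR_MEM0_API_KEY",
    user_id="user-123",
)
```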