## Added

- Added `TransportParams.audio_out_10ms_chunks` parameter to allow controlling the amount of audio being sent by the output transport. It defaults to 4, so 40ms audio chunks are sent.
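To make the chunking arithmetic concrete, here is a small sketch; the helper function and the sample-rate values are illustrative, not part of Pipecat's API:

```python
# Sketch of the 10ms-chunk arithmetic behind `audio_out_10ms_chunks`
# (illustrative helper, not Pipecat code; assumes 16-bit linear PCM).
def chunk_size_bytes(sample_rate: int, num_channels: int,
                     audio_out_10ms_chunks: int) -> int:
    bytes_per_sample = 2                   # 16-bit PCM
    samples_per_10ms = sample_rate // 100  # 10 ms worth of samples
    return samples_per_10ms * bytes_per_sample * num_channels * audio_out_10ms_chunks

# 2 x 10ms chunks of 16kHz mono audio -> 20ms -> 640 bytes per write
print(chunk_size_bytes(16000, 1, 2))  # 640
# 4 x 10ms chunks -> 40ms -> 1280 bytes per write
print(chunk_size_bytes(16000, 1, 4))  # 1280
```

Larger chunks mean fewer, bigger writes to the transport, which is the trade-off behind the Performance note below.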
- Added `QwenLLMService` for Qwen integration with an OpenAI-compatible interface. Added foundational example `14q-function-calling-qwen.py`.
- Added `Mem0MemoryService`. Mem0 is a self-improving memory layer for LLM applications. Learn more at: https://mem0.ai/.
- Added `WhisperSTTServiceMLX` for Whisper transcription on Apple Silicon. See example in `examples/foundational/13e-whisper-mlx.py`. Latency of a completed transcription using Whisper large-v3-turbo on an M4 MacBook is ~500ms.
- `GladiaSTTService` now has comprehensive support for the latest API config options, including model, language detection, preprocessing, custom vocabulary, custom spelling, translation, and message filtering options.
- Added `SmallWebRTCTransport`, a new P2P WebRTC transport.
  - Created two examples in `p2p-webrtc`:
    - video-transform: Demonstrates sending and receiving audio/video with `SmallWebRTCTransport` using TypeScript. Includes video frame processing with OpenCV.
    - voice-agent: A minimal example of creating a voice agent with `SmallWebRTCTransport`.
- Added support to `ProtobufFrameSerializer` to send the messages from `TransportMessageFrame` and `TransportMessageUrgentFrame`.
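To illustrate the round trip such a serializer enables, here is a minimal sketch using JSON in place of Pipecat's actual protobuf schema; the envelope field names are hypothetical:

```python
import json

# Minimal sketch of putting a transport message on the wire and back.
# JSON stands in for Pipecat's protobuf schema; field names are hypothetical.
def serialize(message: dict, urgent: bool = False) -> bytes:
    envelope = {"frame": "transport-message", "urgent": urgent, "message": message}
    return json.dumps(envelope).encode("utf-8")

def deserialize(data: bytes) -> dict:
    return json.loads(data.decode("utf-8"))

wire = serialize({"event": "user-joined"}, urgent=True)
restored = deserialize(wire)
print(restored["urgent"], restored["message"]["event"])  # True user-joined
```

The urgent variant exists because some transport messages (e.g. interruptions) should bypass normal queueing.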
- Added support for a new TTS service, `PiperTTSService` (see https://github.com/rhasspy/piper/).
- It is now possible to tell whether `UserStartedSpeakingFrame` or `UserStoppedSpeakingFrame` have been generated because of emulation frames.
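A sketch of how such a distinction might surface on a frame; the `emulated` flag below is an illustrative assumption, not necessarily Pipecat's actual field name:

```python
from dataclasses import dataclass

# Illustrative frame carrying a flag that marks emulated speaking events.
# The field name `emulated` is an assumption, not confirmed Pipecat API.
@dataclass
class UserStartedSpeakingFrame:
    emulated: bool = False

real = UserStartedSpeakingFrame()                  # produced by real VAD activity
synthetic = UserStartedSpeakingFrame(emulated=True)  # produced by an emulation frame
print(real.emulated, synthetic.emulated)  # False True
```

Downstream processors can use such a flag to, for example, skip interruption handling for emulated events.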
## Changed

- `FunctionCallResultFrame`s are now system frames. This is to prevent function call results from being discarded during interruptions.
- Pipecat services have been reorganized into packages. Each package can have one or more of the following modules (in the future, new module names might be needed) depending on the services implemented:
  - image: for image generation services
  - llm: for LLM services
  - memory: for memory services
  - stt: for Speech-To-Text services
  - tts: for Text-To-Speech services
  - video: for video generation services
  - vision: for video recognition services
- Base classes for AI services have been reorganized into modules. They can now be found in `pipecat.services.[ai_service,image_service,llm_service,stt_service,vision_service]`.
- `GladiaSTTService` now uses the `solaria-1` model by default. Other params use Gladia's default values. Added support for more language codes.
## Deprecated

- All Pipecat services imports have been deprecated and a warning will be shown when using an old import. The new import should be `pipecat.services.[service].[image,llm,memory,stt,tts,video,vision]`. For example, `from pipecat.services.openai.llm import OpenAILLMService`.
- Importing AI services base classes from `pipecat.services.ai_services` is now deprecated; use one of `pipecat.services.[ai_service,image_service,llm_service,stt_service,vision_service]` instead.
- Deprecated the `language` parameter in `GladiaSTTService.InputParams` in favor of `language_config`, which better aligns with Gladia's API.
- Deprecated using `GladiaSTTService.InputParams` directly. Use the new `GladiaInputParams` class instead.
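A minimal sketch of the deprecation pattern described in these two items, migrating a flat `language` parameter into a grouped config object; this is illustrative code, not Pipecat's actual implementation:

```python
import warnings
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative sketch (not Pipecat's actual code) of deprecating `language`
# in favor of a grouped `language_config` object.
@dataclass
class LanguageConfig:
    languages: List[str] = field(default_factory=list)

@dataclass
class GladiaInputParams:
    language: Optional[str] = None                      # deprecated
    language_config: Optional[LanguageConfig] = None    # preferred

    def __post_init__(self):
        if self.language is not None:
            warnings.warn(
                "`language` is deprecated; use `language_config` instead.",
                DeprecationWarning,
            )
            # Migrate the old value so existing callers keep working.
            if self.language_config is None:
                self.language_config = LanguageConfig(languages=[self.language])

params = GladiaInputParams(language="en")
print(params.language_config.languages)  # ['en']
```

Keeping the old parameter alive with a warning gives callers a release cycle to migrate before removal.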
## Fixed

- Fixed a `FastAPIWebsocketTransport` and `WebsocketClientTransport` issue that would cause the transport to be closed prematurely, preventing the internally queued audio from being sent. The same issue could also cause an infinite loop when using an output mixer and sending an `EndFrame`, preventing the bot from finishing.
- Fixed an issue that could cause a `TranscriptionUpdateFrame` pushed because of an interruption to be discarded.
- Fixed an issue that would cause `SegmentedSTTService`-based services (e.g. `OpenAISTTService`) to try to transcribe non-spoken audio, causing invalid transcriptions.
- Fixed an issue where `GoogleTTSService` was emitting two `TTSStoppedFrame`s.
## Performance

- Output transports now send 40ms audio chunks instead of 20ms. This should improve performance.
- `BotSpeakingFrame`s are now sent every 200ms. If the output transport audio chunks are longer than 200ms, they will be sent at every audio chunk.
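The cadence rule above reduces to a simple maximum; a sketch of the arithmetic (the helper name is illustrative, not Pipecat API):

```python
# Sketch of the BotSpeakingFrame cadence described above: one frame every
# 200ms, but never more often than once per output audio chunk.
def bot_speaking_interval_ms(chunk_ms: int, target_ms: int = 200) -> int:
    return max(chunk_ms, target_ms)

print(bot_speaking_interval_ms(40))   # 200: one frame every five 40ms chunks
print(bot_speaking_interval_ms(300))  # 300: one frame per chunk
```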
## Other

- Added foundational example `37-mem0.py` demonstrating how to use the `Mem0MemoryService`.
- Added foundational example `13e-whisper-mlx.py` demonstrating how to use the `WhisperSTTServiceMLX`.