pipecat-ai/pipecat v0.0.62

Added

  • Added TransportParams.audio_out_10ms_chunks parameter to allow controlling the amount of audio sent by the output transport. It defaults to 4, so 40ms audio chunks are sent.
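
    A minimal sketch of overriding the chunk size, assuming an output transport configured elsewhere (only the audio_out_10ms_chunks parameter comes from this release):

    ```python
    from pipecat.transports.base_transport import TransportParams

    # Send 20ms chunks (2 x 10ms) instead of the 40ms default.
    params = TransportParams(
        audio_out_enabled=True,
        audio_out_10ms_chunks=2,
    )
    ```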

  • Added QwenLLMService for Qwen integration with an OpenAI-compatible interface. Added foundational example 14q-function-calling-qwen.py.
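
    A minimal sketch, assuming the new-style import path described under Changed and the usual OpenAI-compatible constructor arguments (the model name and API key handling are illustrative assumptions):

    ```python
    import os

    from pipecat.services.qwen.llm import QwenLLMService

    # Model name is an assumption; see 14q-function-calling-qwen.py
    # for the canonical setup.
    llm = QwenLLMService(
        api_key=os.getenv("QWEN_API_KEY"),
        model="qwen2.5-72b-instruct",
    )
    ```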

  • Added Mem0MemoryService. Mem0 is a self-improving memory layer for LLM applications. Learn more at: https://mem0.ai/.
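
    A minimal sketch, assuming Mem0 is configured with an API key and a per-user memory scope (both constructor arguments are assumptions; 37-mem0.py, noted under Other, is the reference):

    ```python
    import os

    from pipecat.services.mem0.memory import Mem0MemoryService

    # api_key and user_id are assumed constructor arguments.
    memory = Mem0MemoryService(
        api_key=os.getenv("MEM0_API_KEY"),
        user_id="user-123",
    )
    ```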

  • Added WhisperSTTServiceMLX for Whisper transcription on Apple Silicon. See example in examples/foundational/13e-whisper-mlx.py. Latency of completed transcription using Whisper large-v3-turbo on an M4 MacBook is ~500ms.
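
    A minimal sketch, assuming the model variant is selectable at construction time (the import path follows the new layout described under Changed; the model argument form is an assumption, with large-v3-turbo being the variant benchmarked above):

    ```python
    from pipecat.services.whisper.stt import WhisperSTTServiceMLX

    # Runs Whisper locally on Apple Silicon via MLX.
    stt = WhisperSTTServiceMLX(model="large-v3-turbo")
    ```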

  • GladiaSTTService now has comprehensive support for the latest API config options, including model, language detection, preprocessing, custom vocabulary, custom spelling, translation, and message filtering options.

  • Added SmallWebRTCTransport, a new P2P WebRTC transport (a connection sketch follows the examples below).

    • Created two examples in p2p-webrtc:
      • video-transform: Demonstrates sending and receiving audio/video with SmallWebRTCTransport using TypeScript. Includes video frame processing with OpenCV.
      • voice-agent: A minimal example of creating a voice agent with SmallWebRTCTransport.
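
    A minimal connection sketch, assuming the transport wraps a peer connection object produced by your signaling layer (the import path and constructor arguments are assumptions; the p2p-webrtc examples are the reference):

    ```python
    from pipecat.transports.base_transport import TransportParams
    from pipecat.transports.network.small_webrtc import SmallWebRTCTransport

    # webrtc_connection is assumed to come from your signaling code
    # (see the p2p-webrtc examples for the full offer/answer flow).
    transport = SmallWebRTCTransport(
        webrtc_connection=webrtc_connection,
        params=TransportParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
        ),
    )
    ```
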
  • Added support in ProtobufFrameSerializer for serializing TransportMessageFrame and TransportMessageUrgentFrame messages.
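
    A minimal sketch, assuming the serializer exposes an awaitable serialize() in this release (the message payload is illustrative):

    ```python
    import asyncio

    from pipecat.frames.frames import TransportMessageFrame
    from pipecat.serializers.protobuf import ProtobufFrameSerializer


    async def main():
        serializer = ProtobufFrameSerializer()
        # Transport messages can now round-trip through the serializer.
        data = await serializer.serialize(TransportMessageFrame(message={"type": "greeting"}))
        print(len(data))


    asyncio.run(main())
    ```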

  • Added support for a new TTS service, PiperTTSService.
    (see https://github.com/rhasspy/piper/)
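
    A minimal sketch, assuming Piper is reached over HTTP via a locally running server (the base_url and aiohttp_session arguments are assumptions):

    ```python
    import aiohttp

    from pipecat.services.piper.tts import PiperTTSService

    async def make_tts() -> PiperTTSService:
        # Assumes a Piper HTTP server is already running locally.
        session = aiohttp.ClientSession()
        return PiperTTSService(base_url="http://localhost:5000", aiohttp_session=session)
    ```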

  • It is now possible to tell whether a UserStartedSpeakingFrame or UserStoppedSpeakingFrame was generated because of emulation frames.
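
    A minimal sketch, assuming the marker is exposed as an emulated attribute on the frames (the attribute name is an assumption):

    ```python
    from pipecat.frames.frames import UserStartedSpeakingFrame

    def on_frame(frame):
        # "emulated" is the assumed attribute that distinguishes
        # emulation-driven speaking events from real VAD events.
        if isinstance(frame, UserStartedSpeakingFrame) and getattr(frame, "emulated", False):
            print("User-started-speaking event came from emulation")
    ```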

Changed

  • FunctionCallResultFrame is now a system frame. This prevents function call results from being discarded during interruptions.

  • Pipecat services have been reorganized into packages. Each package can have one or more of the following modules, depending on the services implemented (new module names might be added in the future):

    • image: for image generation services
    • llm: for LLM services
    • memory: for memory services
    • stt: for Speech-To-Text services
    • tts: for Text-To-Speech services
    • video: for video generation services
    • vision: for video recognition services
  • Base classes for AI services have been reorganized into modules. They can now be found in pipecat.services.[ai_service,image_service,llm_service,stt_service,vision_service].
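
    For example, under the new layout (the OpenAI import is the one named under Deprecated below; the second line follows the same pattern and is an assumption):

    ```python
    # New package layout: pipecat.services.<service>.<module>
    from pipecat.services.openai.llm import OpenAILLMService
    from pipecat.services.elevenlabs.tts import ElevenLabsTTSService  # analogous, assumed
    ```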

  • GladiaSTTService now uses the solaria-1 model by default. Other params use Gladia's default values. Added support for more language codes.

Deprecated

  • All old Pipecat service imports have been deprecated; a warning will be shown when an old import is used. The new import form is pipecat.services.[service].[image,llm,memory,stt,tts,video,vision]. For example, from pipecat.services.openai.llm import OpenAILLMService.

  • Importing AI service base classes from pipecat.services.ai_services is now deprecated; use one of pipecat.services.[ai_service,image_service,llm_service,stt_service,vision_service] instead.
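
    A minimal migration sketch for the base classes (STTService is used as an example; substitute the base class you actually use):

    ```python
    # Deprecated (emits a warning):
    # from pipecat.services.ai_services import STTService

    # New location:
    from pipecat.services.stt_service import STTService
    ```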

  • Deprecated the language parameter in GladiaSTTService.InputParams in favor of language_config, which better aligns with Gladia's API.

  • Deprecated using GladiaSTTService.InputParams directly. Use the new GladiaInputParams class instead.
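
    A minimal sketch, assuming the config classes live in the Gladia package and that language_config nests the detection options (the import paths and LanguageConfig fields are assumptions):

    ```python
    from pipecat.services.gladia.config import GladiaInputParams, LanguageConfig
    from pipecat.services.gladia.stt import GladiaSTTService

    # language_config replaces the deprecated language parameter.
    stt = GladiaSTTService(
        api_key="YOUR_GLADIA_API_KEY",
        params=GladiaInputParams(
            language_config=LanguageConfig(languages=["en"], code_switching=True),
        ),
    )
    ```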

Fixed

  • Fixed a FastAPIWebsocketTransport and WebsocketClientTransport issue that would cause the transport to be closed prematurely, preventing internally queued audio from being sent. The same issue could also cause an infinite loop when using an output mixer and sending an EndFrame, preventing the bot from finishing.

  • Fixed an issue that could cause a TranscriptionUpdateFrame pushed because of an interruption to be discarded.

  • Fixed an issue that would cause SegmentedSTTService-based services (e.g. OpenAISTTService) to try to transcribe non-spoken audio, resulting in invalid transcriptions.

  • Fixed an issue where GoogleTTSService was emitting two TTSStoppedFrames.

Performance

  • Output transports now send 40ms audio chunks by default instead of 20ms; sending fewer, larger chunks should improve performance.

  • BotSpeakingFrames are now sent every 200ms. If the output transport's audio chunks are longer than 200ms, a BotSpeakingFrame is sent with every audio chunk.

Other

  • Added foundational example 37-mem0.py demonstrating how to use the Mem0MemoryService.

  • Added foundational example 13e-whisper-mlx.py demonstrating how to use the WhisperSTTServiceMLX.
