github pipecat-ai/pipecat v0.0.77

latest releases: v0.0.84, v0.0.83, v0.0.82...
one month ago

Added

  • Added InputTextRawFrame frame type to handle user text input with Gemini Multimodal Live.

  • Added HeyGenVideoService. This is an integration for HeyGen Interactive Avatar. A video service that handles audio streaming and requests HeyGen to generate avatar video responses. (see https://www.heygen.com/)

  • Added the ability to switch voices to RimeTTSService.

  • Added unified development runner for building voice AI bots across multiple transports

    • pipecat.runner.run – FastAPI-based development server with automatic bot discovery
    • pipecat.runner.types – Runner session argument types (DailyRunnerArguments, SmallWebRTCRunnerArguments, WebSocketRunnerArguments)
    • pipecat.runner.utils.create_transport() – Factory function for creating transports from session arguments
    • pipecat.runner.daily and pipecat.runner.livekit – Configuration utilities for Daily and LiveKit setups
    • Support for all transport types: Daily, WebRTC, Twilio, Telnyx, Plivo
    • Automatic telephony provider detection and serializer configuration
    • ESP32 WebRTC compatibility with SDP munging
    • Environment detection (ENV=local) for conditional features
  • Added Async.ai TTS integration (https://async.ai/)

    • AsyncAITTSService – WebSocket-based streaming TTS with interruption support
    • AsyncAIHttpTTSService – HTTP-based streaming TTS service
    • Example scripts:
      • examples/foundational/07ac-interruptible-asyncai.py (WebSocket demo)
      • examples/foundational/07ac-interruptible-asyncai-http.py (HTTP demo)
  • Added transcription_bucket params support to the DailyRESTHelper.

  • Added a new TTS service, InworldTTSService. This service provides low-latency, high-quality speech generation using Inworld's streaming API.

  • Added a new field handle_sigterm to PipelineRunner. It defaults to False. This field handles SIGTERM signals. The handle_sigint field still defaults to True, but now it handles only SIGINT signals.

  • Added foundational example 14u-function-calling-ollama.py for Ollama function calling.

  • Added LocalSmartTurnAnalyzerV2, which supports local on-device inference with the new smart-turn-v2 turn detection model.

  • Added set_log_level to DailyTransport, allowing setting the logging level for Daily's internal logging system.

  • Added on_transcription_stopped and on_transcription_error to Daily callbacks.

Changed

  • Changed the default url for NeuphonicTTSService to wss://api.neuphonic.com as it provides better global performance. You can set the URL to other URLs, such as the previous default: wss://eu-west-1.api.neuphonic.com.

  • Update daily-python to 0.19.5.

  • STTMuteFilter now pushes the STTMuteFrame upstream and downstream, to allow for more flexible STTMuteFilter placement.

  • Play delayed messages from ElevenLabsTTSService if they still belong to the current context.

  • Dependency compatibility improvements: Relaxed version constraints for core dependencies to support broader version ranges while maintaining stability:

    • aiohttp, Markdown, nltk, numpy, Pillow, pydantic, openai, numba: Now support up to the next major version (e.g. numpy>=1.26.4,<3)
    • pyht: Relaxed to >=0.1.6 to resolve grpcio conflicts with nvidia-riva-client
    • fastapi: Updated to support versions >=0.115.6,<0.117.0
    • torch/torchaudio: Changed from exact pinning (==2.5.0) to compatible range (~=2.5.0)
    • aws_sdk_bedrock_runtime: Added Python 3.12+ constraint via environment marker
    • numba: Reduced minimum version to 0.60.0 for better compatibility
  • Changed NeuphonicHttpTTSService to use a POST based request instead of the pyneuphonic package. This removes a package requirement, allowing Neuphonic to work with more services.

  • Updated ElevenLabsTTSService to handle the case where allow_interruptions=False. Now, when interruptions are disabled, the same context ID will be used throughout the conversation.

  • Updated the deepgram optional dependency to 4.7.0, which downgrades the tasks cancelled error to a debug log. This removes the log from appearing in Pipecat logs upon leaving.

  • Upgraded the websockets implementation to the new asyncio implementation. Along with this change, we're updating support for versions >=13.1.0 and <15.0.0. All services have been update to use the asyncio implementation.

  • Updated MiniMaxHttpTTSService with a base_url arg where you can specify the Global endpoint (default) or Mainland China.

  • Replaced regex-based sentence detection in match_endofsentence with NLTK's punkt_tab tokenizer for more reliable sentence boundary detection.

  • Changed the livekit optional dependency for tenacity to tenacity>=8.2.3,<10.0.0 in order to support the google-genai package.

  • For LmntTTSService, changed the default model to blizzard, LMNT's recommended model.

  • Updated SpeechmaticsSTTService:

    • Added support for additional diarization options.
    • Added foundational example 07a-interruptible-speechmatics-vad.py, which
      uses VAD detection provided by SpeechmaticsSTTService.

Fixed

  • Fixed a LLMUserResponseAggregator issue where interruptions were not being handled properly.

  • Fixed PiperTTSService to work with newer Piper GPL.

  • Fixed a race condition in FastAPIWebsocketClient that occurred when attempting to send a message while the client was disconnecting.

  • Fixed an issue in GoogleLLMService where interruptions did not work when an interruption strategy was used.

  • Fixed an issue in the TranscriptProcessor where newline characters could cause the transcript output to be corrupted (e.g. missing all spaces).

  • Fixed an issue in AudioBufferProcessor when using SmallWebRTCTransport where, if the microphone was muted, track timing was not respected.

  • Fixed an error that occurs when pushing an LLMMessagesFrame. Only some LLM services, like Grok, are impacted by this issue. The fix is to remove the optional name property that was being added to the message.

  • Fixed an issue in AudioBufferProcessor that caused garbled audio when enable_turn_audio was enabled and audio resampling was required.

  • Fixed a dependency issue for uv users where an llvmlite version required python 3.9.

  • Fixed an issue in MiniMaxHttpTTSService where the pitch param was the incorrect type.

  • Fixed an issue with OpenTelemetry tracing where the enable_tracing flag did not disable the internal tracing decorator functions.

  • Fixed an issue in OLLamaLLMService where kwargs were not passed correctly to the parent class.

  • Fixed an issue in ElevenLabsTTSService where the word/timestamp pairs were calculating word boundaries incorrectly.

  • Fixed an issue where, in some edge cases, the EmulateUserStartedSpeakingFrame could be created even if we didn't have a transcription.

  • Fixed an issue in GoogleLLMContext where it would inject the system_message as a "user" message into cases where it was not meant to; it was only meant to do that when there were no "regular" (non-function-call) messages in the context, to ensure that inference would run properly.

  • Fixed an issue in LiveKitTransport where the on_audio_track_subscribed was never emitted.

Other

  • Added new quickstart demos:

    • examples/quickstart: voice AI bot quickstart
    • examples/client-server-web: client/server starter example
    • examples/phone-bot-twilio: twilio starter example
  • Removed most of the examples from the pipecat repo. Examples can now be found in: https://github.com/pipecat-ai/pipecat-examples.

Don't miss a new pipecat release

NewReleases is sending notifications on new releases.