github jamiepine/voicebox v0.1.10

one month ago

Faster generation on Apple Silicon

Massive speed gains, from around 20s per generation to 2-3s!

Added native MLX backend support for Apple Silicon, making TTS generation and STT transcription significantly faster on M-series Macs.

Note: this update broke transcriptions on Apple Silicon only. A patch is in progress and will follow in 0.1.11.

Features

  • MLX Backend: New backend implementation optimized for Apple Silicon using the MLX framework
  • Dynamic Backend Selection: Automatically detects the platform and selects MLX (Apple Silicon macOS) or PyTorch (all other platforms)
  • Improved Performance: Leverages Apple's unified memory architecture for faster model inference
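
As a rough illustration of dynamic backend selection, the sketch below picks a backend name from the host platform. It is not the project's actual code: the function name `detect_backend`, its parameters, and the fallback to PyTorch when the `mlx` package is not importable are all assumptions made for this example.

```python
import importlib.util
import platform


def detect_backend(system=None, machine=None, mlx_available=None):
    """Return "mlx" on Apple Silicon macOS when MLX is importable, else "pytorch".

    All three parameters are hypothetical overrides for testing; when left as
    None, the values are read from the running interpreter.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    if mlx_available is None:
        # find_spec returns None when the top-level package is not installed.
        mlx_available = importlib.util.find_spec("mlx") is not None

    # A native Python build on Apple Silicon reports Darwin / arm64.
    is_apple_silicon = system == "Darwin" and machine == "arm64"
    return "mlx" if is_apple_silicon and mlx_available else "pytorch"


if __name__ == "__main__":
    print(detect_backend())
```

Falling back to PyTorch when MLX is missing keeps the same code path usable on Intel Macs, Linux, and Windows. Note that a Python interpreter running under Rosetta reports `x86_64`, so this check would select PyTorch there.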

Backend Changes

  • Refactored TTS and STT logic into modular backend implementations (mlx_backend.py, pytorch_backend.py)
  • Added platform detection system to handle backend selection at runtime
  • Updated model loading and caching to support both backend types
  • Enhanced health check endpoints to report active backend type
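
The modular layout described above could look roughly like the following sketch: a shared interface, one class per backend, a cached loader, and a health payload that reports the active backend. Every name here (`TTSBackend`, `MLXBackend`, `PyTorchBackend`, `load_backend`, `health_payload`) and the payload shape are hypothetical; the real contents of mlx_backend.py and pytorch_backend.py are not shown in these notes, and the `generate` bodies are placeholders rather than real inference.

```python
from typing import Dict, Protocol


class TTSBackend(Protocol):
    """Hypothetical interface that both backends would satisfy."""

    name: str

    def generate(self, text: str) -> bytes: ...


class MLXBackend:
    name = "mlx"

    def generate(self, text: str) -> bytes:
        return b""  # placeholder: real code would run MLX inference


class PyTorchBackend:
    name = "pytorch"

    def generate(self, text: str) -> bytes:
        return b""  # placeholder: real code would run PyTorch inference


# Registry mapping a backend name to its implementation class.
BACKENDS = {"mlx": MLXBackend, "pytorch": PyTorchBackend}
_loaded: Dict[str, TTSBackend] = {}


def load_backend(name: str) -> TTSBackend:
    """Instantiate a backend once and reuse it on later calls."""
    if name not in BACKENDS:
        raise ValueError(f"unknown backend: {name!r}")
    if name not in _loaded:
        _loaded[name] = BACKENDS[name]()
    return _loaded[name]


def health_payload(backend: TTSBackend) -> dict:
    """Build a health-check response that reports the active backend."""
    return {"status": "ok", "backend": backend.name}
```

Keeping the loader keyed by backend name means the rest of the application only needs the name chosen at startup, and the health endpoint can report it without knowing which framework is underneath.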

Build & Release

  • Updated build process to include MLX-specific dependencies for macOS builds
  • Modified release workflow to handle platform-specific backend bundling
  • Added requirements-mlx.txt for MLX dependencies

Documentation

  • Updated setup and building guides with MLX-specific instructions
  • Added troubleshooting guidance for MLX-related issues
  • Enhanced architecture documentation to explain backend selection
