Faster generation on Apple Silicon
Massive speed gains, from around 20s per generation to 2-3s!
Added native MLX backend support for Apple Silicon, providing significantly faster TTS and STT generation on M-series macOS machines.
Note: this update broke transcriptions on Apple Silicon only; the patch is in the oven as we speak, and 0.1.11 will follow shortly.
Features
- MLX Backend: New backend implementation optimized for Apple Silicon using MLX framework
- Dynamic Backend Selection: Automatically detects platform and selects between MLX (macOS) and PyTorch (other platforms)
- Improved Performance: Leverages Apple's unified memory architecture for faster model inference
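The dynamic backend selection described above can be sketched roughly like this (a minimal sketch; the select_backend helper and its return labels are illustrative, not the project's actual API):

```python
import platform

def select_backend() -> str:
    """Pick the inference backend for the host platform.

    Hypothetical helper for illustration: MLX only runs on Apple Silicon
    (arm64 macOS), so everything else falls back to PyTorch.
    """
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mlx"
    return "pytorch"
```

The check happens once at startup, so the rest of the code can stay backend-agnostic.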
Backend Changes
- Refactored TTS and STT logic into modular backend implementations (mlx_backend.py, pytorch_backend.py)
- Added platform detection system to handle backend selection at runtime
- Updated model loading and caching to support both backend types
- Enhanced health check endpoints to report active backend type
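One way the modular split and the backend-aware health check could fit together (a sketch under assumptions: the Backend interface, its method names, and the health payload shape are all illustrative, not taken from mlx_backend.py or pytorch_backend.py):

```python
from abc import ABC, abstractmethod

class Backend(ABC):
    """Illustrative common interface both backend modules could implement."""

    name: str  # reported by the health check

    @abstractmethod
    def synthesize(self, text: str) -> bytes:
        """Run TTS and return raw audio bytes."""

    @abstractmethod
    def transcribe(self, audio: bytes) -> str:
        """Run STT and return the transcript."""

class PyTorchBackend(Backend):
    name = "pytorch"

    def synthesize(self, text: str) -> bytes:
        return b""  # placeholder: real model inference goes here

    def transcribe(self, audio: bytes) -> str:
        return ""   # placeholder

def health_check(backend: Backend) -> dict:
    # The health endpoint now reports which backend is active.
    return {"status": "ok", "backend": backend.name}
```

With a shared interface like this, the platform check only decides which concrete class to instantiate; callers never branch on platform again.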
Build & Release
- Updated build process to include MLX-specific dependencies for macOS builds
- Modified release workflow to handle platform-specific backend bundling
- Added requirements-mlx.txt for MLX dependencies
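For local setups, the platform-specific install could look something like this (a sketch: requirements-mlx.txt comes from the notes above, but the uname check and the install command are assumptions, not the project's documented setup steps):

```shell
# Pull in the MLX extras only on Apple Silicon macOS;
# other platforms stick with the PyTorch dependencies.
if [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ]; then
    MLX_REQS="requirements-mlx.txt"   # then: pip install -r "$MLX_REQS"
else
    MLX_REQS=""                       # PyTorch path: no MLX extras needed
fi
echo "MLX extras: ${MLX_REQS:-none}"
```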
Documentation
- Updated setup and building guides with MLX-specific instructions
- Added troubleshooting guidance for MLX-related issues
- Enhanced architecture documentation to explain backend selection