Voxtral backend & benchmarks
New: Voxtral backend
- voxtral-mlx: Native MLX backend for Apple Silicon. Runs at 0.18-0.32x real-time, handles 100+ languages with automatic language detection. No extra dependencies needed on macOS.
- voxtral (HF): HuggingFace transformers backend for Linux/GPU. Requires pip install transformers torch.
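
To check up front whether the packages needed by the voxtral (HF) backend are present, a small probe like the one below can help. This is only a sketch: missing_modules is a hypothetical helper written for this note, not part of the project, and it looks the modules up without importing them.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located.

    Hypothetical helper for illustration; it only looks modules up,
    it does not import them.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # transformers and torch are the two packages named in the note above.
    missing = missing_modules(["transformers", "torch"])
    if missing:
        print("Missing packages: " + ", ".join(missing))
        print("Install with: pip install " + " ".join(missing))
    else:
        print("transformers and torch are both available")
```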
Benchmarks
New offline benchmark harness (test_backend_offline.py --benchmark) that runs all installed backends and computes WER, RTF, and timestamp accuracy against ground truth transcripts. Results exportable as JSON.
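
For readers unfamiliar with WER, here is a minimal sketch of the usual definition: word-level edit distance divided by the number of reference words. It is not the harness's actual code. The function name, the lowercase-and-split normalization, and the handling of an empty reference are all assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Illustrative only: normalization here is just lowercasing and
    whitespace splitting, which is an assumption of this sketch.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        # Convention chosen for this sketch: an empty reference scores
        # 0.0 only when the hypothesis is empty as well.
        return 0.0 if not hyp else 1.0

    # prev[j] is the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.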
Full benchmark report in BENCHMARK.md with tables, charts, and recommendations for every backend/policy/model combination.
Bug fixes
- Fixed silence double-counting in the audio processor
- Fixed median calculation for even-length lists in timestamp accuracy
- Fixed RTF inflation in metrics collector (was using wall-clock time instead of ASR processing time)
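
The last two fixes concern small calculations that are easy to get wrong. The sketch below shows the intended behaviour under stated assumptions; it is not the project's code, and the names median and real_time_factor are hypothetical.

```python
def median(values):
    """Median that averages the two middle items for even-length input."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even length: there is no single middle element, so take the mean
    # of the two central values.
    return (ordered[mid - 1] + ordered[mid]) / 2


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent in ASR processing / duration of the audio.

    Passing elapsed wall-clock time instead (which may include idle
    waits between chunks) would overstate the result.
    """
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds
```

As an example, 2 seconds of ASR processing for 10 seconds of audio gives an RTF of 0.2. If the run took 12 seconds of wall-clock time because of waiting between chunks, dividing that by the audio duration would report 1.2 instead.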