Dependency and Compatibility Changes
- Removed Triton <3 requirement
- Tested compatibility with Python 3.14 and 3.15
Performance Improvements
- Simulstreaming backend now defaults to MLX-Whisper (if available) or Faster-Whisper (if available) encoders, paired with Whisper cross-attention and decoder using an AlignAtt policy, for increased speed. Can be disabled using
--disable-fast-encoder - Encoders are loaded once and shared in Simulstreaming, reducing vRAM usage
- Only the decoder of Whisper is loaded when using a different encoder, reducing vRAM usage
Frontend Enhancements
- Added a microphone picker
- Loads the UI as a single inline HTML file (instead of separate CSS, JS, SVGs and HTML files) for simplified deployment
Bug Fixes and Improvements
- Resolved warmup error when no connection is provided or when the language is set to auto
- Added pip timeout and retries in Dockerfile when installing Torch/TorchVision/TorchAudio
- Fixed issue where an exception is raised when language is set to 'auto' and task is set to 'translation'
- Enabled auto-detection of language for warmup if not specified