What's New in v0.15.21
Two New State-of-the-Art ASR Engines
This release adds two new WebGPU-based speech recognition engines. They are now the most accurate and the fastest local ASR options available in Sokuji.
Cohere Transcribe (2B): Ranked #1 on the Open ASR Leaderboard at the time of this release, with a 5.42% average word error rate (WER). Supports 14 languages, including English, Chinese, Japanese, Korean, and major European languages. Streams output in real time with token-level partial results. (~1.5 GB download)
Voxtral Mini 4B Realtime: Mistral's streaming speech recognition model, with hybrid endpoint detection that combines voice activity detection (VAD) for speech boundaries with punctuation-based sentence splitting to lower translation latency. Supports 13 languages. Automatically selects the optimal quantization based on GPU capabilities. (~2.5 GB download)
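To illustrate the punctuation-based half of the hybrid endpoint detection, here is a minimal TypeScript sketch. It is not Sokuji's actual implementation: the function name `splitCompletedSentences`, the `SplitResult` shape, and the set of terminator characters are all assumptions made for this example. The idea is that complete sentences can be handed to translation as soon as a terminator appears in the streaming transcript, without waiting for VAD to report a pause.

```typescript
// Hypothetical helper, not taken from the Sokuji codebase.
// Assumed terminator set: common Latin and CJK sentence-ending punctuation.
const SENTENCE_END = /[.!?。！？]/;

interface SplitResult {
  sentences: string[]; // complete sentences, ready to be translated
  rest: string;        // trailing partial sentence, kept for the next call
}

function splitCompletedSentences(buffer: string): SplitResult {
  const sentences: string[] = [];
  let start = 0;
  for (let i = 0; i < buffer.length; i++) {
    if (!SENTENCE_END.test(buffer.charAt(i))) continue;
    // Absorb runs of terminators such as "?!" or "..." into one sentence.
    while (i + 1 < buffer.length && SENTENCE_END.test(buffer.charAt(i + 1))) i++;
    const sentence = buffer.slice(start, i + 1).trim();
    if (sentence.length > 0) sentences.push(sentence);
    start = i + 1;
  }
  return { sentences, rest: buffer.slice(start).trimStart() };
}

// Example: two finished sentences are emitted, the unfinished tail is kept.
const example = splitCompletedSentences("Hello there. How are you? I am");
console.log(example.sentences); // ["Hello there.", "How are you?"]
console.log(example.rest);      // "I am"
```

This simple version also splits on periods inside numbers or abbreviations (for example "3.14"), so a production splitter would need extra rules. The remaining `rest` text would presumably be flushed when VAD signals the end of speech.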
Both engines require WebGPU support (Chrome/Edge 113+).
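For readers who want to check WebGPU availability themselves, the sketch below shows one way to probe it and to pick a quantization from the adapter's features. It relies on the standard `navigator.gpu.requestAdapter()` call and the standard `shader-f16` feature name. Everything else is an assumption for illustration: the `"fp16"` and `"q4"` labels, the function names, and the selection rule itself, since these notes do not describe how Sokuji actually chooses.

```typescript
// Minimal structural types so the sketch compiles without @webgpu/types.
interface FeatureSetLike { has(name: string): boolean }
interface AdapterLike { features: FeatureSetLike }
interface GpuLike { requestAdapter(): Promise<AdapterLike | null> }

// Hypothetical quantization labels, not Sokuji's real model variants.
type Quantization = "fp16" | "q4";

// Assumed rule: prefer half-precision weights when the GPU supports
// 16-bit floats in shaders, otherwise fall back to a smaller format.
function pickQuantization(features: FeatureSetLike): Quantization {
  return features.has("shader-f16") ? "fp16" : "q4";
}

// Returns null when WebGPU is unavailable (no API or no usable adapter).
// In a browser, pass `navigator.gpu` as the argument.
async function probeWebGpu(gpu: GpuLike | undefined): Promise<Quantization | null> {
  if (!gpu) return null;
  const adapter = await gpu.requestAdapter();
  if (!adapter) return null;
  return pickQuantization(adapter.features);
}
```

A `null` result from `probeWebGpu` would mean neither engine can run in that browser, so a caller could hide both options or fall back to another ASR provider.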
Other Changes
- Fix: Log resolved model IDs instead of 'auto' in analytics
- Refactor: Improved type safety in session config analytics