A new local transcription option: Voxtral Mini 3B
Sokuji now offers Voxtral Mini 3B 2507 as a local ASR model — a smaller, faster sibling to the existing Voxtral Mini 4B Realtime that runs entirely on your GPU.
Why pick this one over 4B Realtime?
- More accurate when your source language is known. This is Voxtral 3B's headline feature: it accepts a language hint (e.g. "this is German") and uses it to lock onto the right language during transcription. The 4B Realtime model, by contrast, has to auto-detect the language for every sentence, and that detection can drift, especially between typologically close languages.
- Smaller and faster to load. Around 2.7 GB on GPUs with shader-f16 support, around 3.0 GB elsewhere. Lower VRAM use, quicker startup.
- 8 supported languages: English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian. For other source languages, the existing 4B Realtime model stays the right pick — it covers a wider language set.
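The size difference above comes down to whether the GPU exposes half-precision shader support. A minimal sketch of how an app might probe for it via the standard WebGPU API and choose a weight variant accordingly; the function names and variant labels here are illustrative assumptions, not Sokuji's actual internals:

```typescript
// Illustrative only: decide between weight variants based on the WebGPU
// "shader-f16" adapter feature. Labels and names are made up for this sketch.
type WeightVariant = "fp16" | "fp32";

// Pure decision step: fp16 weights (~2.7 GB) when the GPU supports
// half-precision shaders, fp32 (~3.0 GB) otherwise.
function pickWeightVariant(hasShaderF16: boolean): WeightVariant {
  return hasShaderF16 ? "fp16" : "fp32";
}

// Browser-side detection via the standard WebGPU API; resolves to false
// when WebGPU is unavailable (e.g. outside Chrome/Edge).
async function detectShaderF16(): Promise<boolean> {
  const gpu = (globalThis as any).navigator?.gpu;
  if (!gpu) return false;
  const adapter = await gpu.requestAdapter();
  return adapter?.features?.has("shader-f16") ?? false;
}
```

In use, something like `detectShaderF16().then(pickWeightVariant)` would select the download before fetching weights.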
The model appears in Local Inference → Model Management when your source language is one of the supported eight and you are in a WebGPU-capable browser (Chrome or Edge). Download it once, then pick it in the ASR selector alongside 4B Realtime.
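The gating described here boils down to two conditions. A small sketch of that predicate, assuming ISO-style language codes and a hypothetical function name (neither is taken from Sokuji's code):

```typescript
// Illustrative availability check: Voxtral Mini 3B shows up only when the
// source language is one of the eight supported ones AND WebGPU is usable.
// The code list and function name are assumptions for this sketch.
const VOXTRAL_3B_LANGS = new Set([
  "en", "es", "fr", "pt", "hi", "de", "nl", "it",
]);

function voxtral3bAvailable(sourceLang: string, webgpuSupported: boolean): boolean {
  return webgpuSupported && VOXTRAL_3B_LANGS.has(sourceLang);
}
```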
Finer VAD silence-duration control
The Min Silence Duration slider in Local Inference settings now allows finer adjustment: the minimum is 0.05 s with 0.05 s steps, down from 0.1 s previously. Useful if you want the model to commit to a transcription sooner (or wait a little longer) when there's a brief pause.
Full Changelog: v0.22.0...v0.23.0