github kizuna-ai-lab/sokuji v0.23.0

A new local transcription option: Voxtral Mini 3B

Sokuji now offers Voxtral Mini 3B 2507 as a local ASR model — a smaller, faster sibling to the existing Voxtral Mini 4B Realtime that runs entirely on your GPU.

Why pick this one over 4B Realtime?

  • More accurate when your source language is known. This is Voxtral 3B's headline feature: it accepts a language hint (e.g. "this is German") and uses it to lock onto the right language during transcription. The 4B Realtime model, by contrast, has to auto-detect the language on every sentence, and that detection sometimes wanders, especially between typologically close languages.
  • Smaller and faster to load. Around 2.7 GB on GPUs with shader-f16 support, around 3.0 GB elsewhere. Lower VRAM use, quicker startup.
  • 8 supported languages: English, Spanish, French, Portuguese, Hindi, German, Dutch, Italian. For other source languages, the existing 4B Realtime model stays the right pick — it covers a wider language set.
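The split described above amounts to a simple selection rule. A sketch of it in TypeScript (the model IDs and helper name are illustrative, not Sokuji's actual identifiers; the language set comes straight from this release):

```typescript
// Languages supported by the Voxtral Mini 3B local ASR model (per this release).
const VOXTRAL_3B_LANGUAGES = new Set([
  "en", "es", "fr", "pt", "hi", "de", "nl", "it",
]);

// Hypothetical helper: pick 3B when the known source language is supported,
// so its language hint can be used; otherwise fall back to 4B Realtime,
// which covers a wider language set via auto-detection.
function pickLocalAsrModel(sourceLang: string): string {
  return VOXTRAL_3B_LANGUAGES.has(sourceLang)
    ? "voxtral-mini-3b-2507"
    : "voxtral-mini-4b-realtime";
}
```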

The model appears in Local Inference → Model Management when your source language is one of the supported eight, and on a WebGPU-capable browser (Chrome or Edge). Download it once, then pick it in the ASR selector alongside 4B Realtime.
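The ~2.7 GB vs ~3.0 GB difference hinges on WebGPU's optional shader-f16 feature. A minimal sketch of how a client might probe for it and choose a weight variant — `navigator.gpu.requestAdapter()` and the `"shader-f16"` feature name are real WebGPU API, but the variant-selection helper and the precision/size pairing are assumptions for illustration, not Sokuji's actual code:

```typescript
type WeightVariant = { precision: "f16" | "f32"; approxSizeGB: number };

// Assumed mapping: f16 weights when the GPU supports shader-f16
// (~2.7 GB per the release notes), full-precision otherwise (~3.0 GB).
function pickWeightVariant(hasShaderF16: boolean): WeightVariant {
  return hasShaderF16
    ? { precision: "f16", approxSizeGB: 2.7 }
    : { precision: "f32", approxSizeGB: 3.0 };
}

// Browser-only probe; resolves false outside a WebGPU-capable browser.
async function detectShaderF16(): Promise<boolean> {
  if (typeof navigator === "undefined" || !("gpu" in navigator)) return false;
  const adapter = await (navigator as any).gpu.requestAdapter();
  return adapter !== null && adapter.features.has("shader-f16");
}
```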

Finer VAD silence-duration control

The Min Silence Duration slider in Local Inference settings now allows finer adjustments — minimum 0.05 s with 0.05 s steps, down from 0.1 s previously. Useful if you want the model to commit to a transcription faster (or wait a little longer) when there's a brief pause.
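To see what the slider controls, here is a minimal sketch of how a min-silence threshold typically gates segment commits in a VAD loop — assumed logic for illustration, not Sokuji's implementation:

```typescript
// Each audio frame arrives with a speech/no-speech flag; a segment is
// committed once accumulated silence reaches minSilenceSec.
class SilenceGate {
  private silenceSec = 0;
  private inSegment = false;

  constructor(private minSilenceSec: number, private frameSec: number) {}

  // Returns true when the current segment should be committed.
  push(isSpeech: boolean): boolean {
    if (isSpeech) {
      this.inSegment = true;
      this.silenceSec = 0;
      return false;
    }
    if (!this.inSegment) return false;
    this.silenceSec += this.frameSec;
    if (this.silenceSec >= this.minSilenceSec) {
      this.inSegment = false;
      this.silenceSec = 0;
      return true;
    }
    return false;
  }
}
```

With the new 0.05 s minimum and, say, 0.025 s frames, two silent frames after speech are enough to commit; at the old 0.1 s floor it would take four, so transcriptions surface noticeably sooner during brief pauses.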

Full Changelog: v0.22.0...v0.23.0
