kizuna-ai-lab/sokuji v0.15.27 on GitHub

What's New

Cohere Transcribe VAD Fix

Cohere Transcribe was incorrectly classified as a streaming model, which meant VAD (Voice Activity Detection) settings had no effect when using it. Speech threshold, minimum silence duration, and minimum speech duration sliders were hidden and their values ignored.

Cohere Transcribe now correctly runs through the offline ASR pipeline with full VAD configuration support
Token-level partial results (text appearing word-by-word during inference) continue to work as before

VAD Settings for Voxtral

Voxtral Mini 4B also uses VAD for speech detection, but its VAD settings were previously hardcoded and the settings sliders were hidden. VAD settings are now visible and configurable when Voxtral is selected.

Unified VAD Configuration

All browser-based ASR models that use Voice Activity Detection (Whisper, Cohere Transcribe, Granite Speech, Voxtral) now share a consistent configuration interface. Previously, some models read user settings while others used hardcoded defaults — now all respect the same VAD sliders in settings.

Install: Chrome Web Store · Website