Highlights
🎙️ Supertonic 3: recommended local TTS, 31 languages
A new browser-native text-to-speech engine from Supertone (HYBE). One ~398 MB download covers English, Korean, Japanese, Arabic, German, French, Spanish, Portuguese, Russian, Ukrainian, Vietnamese, Polish, Czech, Dutch, Italian, Turkish, Hindi, and 14 more languages. It runs entirely on your machine via WebGPU (Chrome / Edge), with automatic fallback to WebAssembly on browsers without WebGPU (Firefox, Safari).
Set as the recommended TTS for the languages it covers. Ships with 10 preset voices (five male, five female), selectable from the new voice picker in Settings → Local Inference.
Note: Supertonic 3 does not cover Chinese (zh) or Thai (th). The existing Matcha-zh-en and other per-language models remain available for those.
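The WebGPU-with-WebAssembly-fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not Sokuji's actual worker code: `hasWebGPU` and `pickBackends` are hypothetical helper names, and the only outside assumption is that browsers with WebGPU support expose `navigator.gpu`.

```typescript
// Backend names as used for onnxruntime-web execution providers.
type Backend = "webgpu" | "wasm";

// Feature-detect WebGPU. The environment is passed in (defaulting to
// globalThis) so the check can be exercised outside a browser.
function hasWebGPU(env: unknown = globalThis): boolean {
  const nav = (env as { navigator?: { gpu?: unknown } }).navigator;
  return nav?.gpu != null;
}

// Hypothetical helper: prefer WebGPU when present, always keep
// WebAssembly as the fallback.
function pickBackends(webgpuAvailable: boolean): Backend[] {
  return webgpuAvailable ? ["webgpu", "wasm"] : ["wasm"];
}
```

With onnxruntime-web, a list like this could be passed as the `executionProviders` session option when creating an `InferenceSession`; whether the shipped worker does exactly that is an assumption here.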
🗣️ Bring Your Own Voice: Voice Library
You can now import custom voice profiles created with Supertone's Voice Builder (their paid hosted service for cloning voices from a short recording). Drop the resulting voice_style.json into Sokuji's voice library and it appears alongside the presets in the dropdown.
- Drag-and-drop or file picker import
- Rename and delete imported voices
- Voices persist in browser storage across sessions
- Validation rejects malformed files at import time so the engine never sees broken inputs
Voice changes apply on the next session start; there is no in-session hot-swap, by design.
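As a rough illustration of the import-time validation mentioned in the list above, the sketch below rejects a file before anything reaches the engine. The schema is an assumption made purely for the example: these notes do not describe the real voice_style.json layout, so the `style` field and the `validateVoiceFile` name are hypothetical.

```typescript
// Outcome of an import attempt: a usable payload or a rejection reason.
type ImportResult =
  | { ok: true; style: number[] }
  | { ok: false; error: string };

// HYPOTHETICAL schema: a top-level object holding a non-empty array of
// finite numbers under `style`. The real file format may differ.
function validateVoiceFile(text: string): ImportResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    return { ok: false, error: "not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, error: "top level must be an object" };
  }
  const style = (parsed as Record<string, unknown>)["style"];
  if (!Array.isArray(style) || style.length === 0) {
    return { ok: false, error: "missing or empty `style` array" };
  }
  if (!style.every((v) => typeof v === "number" && Number.isFinite(v))) {
    return { ok: false, error: "`style` must contain only finite numbers" };
  }
  return { ok: true, style: style as number[] };
}
```

Returning a result value instead of throwing lets an import dialog show the rejection reason inline, while only `ok: true` entries are ever stored in the library.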
🎧 Granite Speech 4.1 2B: recommended WebGPU ASR/AST
IBM's new compact speech model joins the local ASR lineup as the recommended WebGPU engine for automatic speech translation (AST): listen in one language and transcribe directly in another, without a separate translation pass. For the languages it supports, end-to-end latency is lower than with the Whisper → translation pipeline.
What's improved
- Lighter download for users on Edge/Chrome: the new Supertonic worker uses raw onnxruntime-web (~399 KB worker chunk) instead of the heavier Transformers.js stack, making it about 1 MB lighter than the comparable Whisper / Voxtral workers.
- Voice picker is locked during an active session to prevent confusion about mid-session voice changes that wouldn't actually take effect.
- Cleaner error handling: a deleted imported voice no longer surfaces as a red session-error banner; the engine falls back to the default voice and logs a one-line warning.
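The fallback for a deleted imported voice, described in the last item above, could look something like this sketch. All names here (`VoiceProfile`, `resolveVoice`, the `warn` parameter) are hypothetical and chosen for illustration; the actual engine code is not shown in these notes.

```typescript
// Hypothetical minimal shape of a voice entry in the library.
interface VoiceProfile {
  id: string;
  name: string;
}

// Look up the requested voice. If it no longer exists (for example, it was
// deleted from the library), log a single warning and return the default
// voice instead of raising a session error.
function resolveVoice(
  requestedId: string,
  library: Map<string, VoiceProfile>,
  defaultVoice: VoiceProfile,
  warn: (message: string) => void = (m) => console.warn(m),
): VoiceProfile {
  const found = library.get(requestedId);
  if (found !== undefined) {
    return found;
  }
  warn(`Voice "${requestedId}" not found; using default "${defaultVoice.name}".`);
  return defaultVoice;
}
```

Injecting `warn` keeps the function easy to test and lets the caller route the message to whatever logger the app already uses.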
Installation
| Platform | Asset |
|---|---|
| macOS (Apple Silicon) | Sokuji-0.28.0-arm64.pkg |
| macOS (Intel) | Sokuji-0.28.0-x64.pkg |
| Windows | Sokuji-0.28.0.Setup.exe |
| Linux (.deb, x64) | sokuji_0.28.0_amd64.deb |
| Linux (.deb, arm64) | sokuji_0.28.0_arm64.deb |
| Linux (AppImage, x64) | Sokuji-0.28.0-x86_64.AppImage |
| Linux (AppImage, arm64) | Sokuji-0.28.0-arm64.AppImage |
| Browser extension (Chrome/Edge) | sokuji-extension-0.28.0.zip |
Existing installations on macOS / Windows auto-update on next launch.
Acknowledgments
Supertonic 3 model © 2026 Supertone Inc., licensed under OpenRAIL-M. Use is subject to that license's responsible-use restrictions; see the LICENSE for the full list of prohibited uses (impersonation without consent, harassment, deceptive use, etc.).
Granite Speech 4.1 2B © 2026 IBM Research, licensed under Apache 2.0.
Full change log: v0.27.2...v0.28.0