kizuna-ai-lab/sokuji v0.28.0 on GitHub

Highlights

🎙️ Supertonic 3 — recommended local TTS, 31 languages

A new browser-native text-to-speech engine from Supertone (HYBE) — one ~398 MB download covers English, Korean, Japanese, Arabic, German, French, Spanish, Portuguese, Russian, Ukrainian, Vietnamese, Polish, Czech, Dutch, Italian, Turkish, Hindi, and 14 more languages. Runs entirely on your machine via WebGPU (Chrome / Edge), with automatic fallback to WebAssembly on browsers without WebGPU (Firefox, Safari).
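The WebGPU-first, WebAssembly-fallback behaviour described above can be sketched as a small feature check. This is an illustration, not sokuji's actual code: `pickExecutionProviders` and `NavigatorLike` are hypothetical names, and the only assumption about the browser is that `navigator.gpu` is present when WebGPU is available.

```typescript
// Hypothetical helper, not taken from the sokuji source.
type Backend = "webgpu" | "wasm";

// Minimal structural view of `navigator`: `gpu` is only defined
// on browsers that expose WebGPU.
interface NavigatorLike {
  gpu?: unknown;
}

// Returns backends in order of preference. WebAssembly is always
// listed last so there is something to fall back to.
function pickExecutionProviders(nav: NavigatorLike | undefined): Backend[] {
  const hasWebGPU =
    nav !== undefined && nav.gpu !== undefined && nav.gpu !== null;
  return hasWebGPU ? ["webgpu", "wasm"] : ["wasm"];
}
```

In a browser the function would be called with the global `navigator`, and the resulting list could be passed as the `executionProviders` session option of onnxruntime-web, assuming that is how the worker creates its session.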

Supertonic 3 is now the recommended TTS for the languages it covers. It ships with 10 preset voices (five male, five female), selectable from the new voice picker in Settings → Local Inference.

Note: Supertonic 3 does not cover Chinese (zh) or Thai (th). The existing Matcha-zh-en and other per-language models remain available for those.

🗣️ Bring Your Own Voice — Voice Library

You can now import custom voice profiles created with Supertone's Voice Builder (their paid hosted service for cloning voices from a short recording). Drop the resulting voice_style.json into sokuji's voice library and it appears alongside the presets in the dropdown.

Drag-and-drop or file picker import
Rename and delete imported voices
Voices persist in browser storage across sessions
Validation rejects malformed files at import time so the engine never sees broken inputs
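Import-time validation of this kind might look roughly like the sketch below. The real `voice_style.json` schema is not described in these notes, so the `VoiceProfile` shape here (a `name` plus a numeric `style` array) is purely hypothetical, as is the `validateVoiceProfile` function; only the overall pattern of parsing, checking, and rejecting before the engine sees the data is the point.

```typescript
// Hypothetical profile shape. The actual voice_style.json fields
// are an assumption made for illustration only.
interface VoiceProfile {
  name: string;
  style: number[];
}

type ValidationResult =
  | { ok: true; profile: VoiceProfile }
  | { ok: false; error: string };

// Parses the file text and rejects anything that does not match
// the (assumed) shape, returning a reason instead of throwing.
function validateVoiceProfile(text: string): ValidationResult {
  let raw: unknown;
  try {
    raw = JSON.parse(text);
  } catch {
    return { ok: false, error: "not valid JSON" };
  }
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return { ok: false, error: "top level must be an object" };
  }
  const obj = raw as Record<string, unknown>;
  const name = obj["name"];
  const style = obj["style"];
  if (typeof name !== "string" || name.trim() === "") {
    return { ok: false, error: "missing name" };
  }
  if (!Array.isArray(style) || style.length === 0) {
    return { ok: false, error: "style must be a non-empty array" };
  }
  if (!style.every((v) => typeof v === "number" && Number.isFinite(v))) {
    return { ok: false, error: "style must contain only finite numbers" };
  }
  return { ok: true, profile: { name, style: style as number[] } };
}
```

Returning a tagged result rather than throwing lets the import UI show the rejection reason next to the dropped file.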

Voice changes apply on the next session start — there's no in-session hot-swap by design.

🎧 Granite Speech 4.1 2B — recommended WebGPU ASR/AST

IBM's new compact speech model joins the local ASR lineup as the recommended WebGPU engine for automatic speech translation (AST): it listens in one language and transcribes directly in another, without a separate translation pass. For the languages it supports, end-to-end latency is lower than with the Whisper → translation pipeline.

What's improved

Lighter download size for users on Edge/Chrome: the new Supertonic worker uses raw onnxruntime-web (~399 KB worker chunk) instead of the heavier Transformers.js stack — about 1 MB lighter than the comparable Whisper / Voxtral workers.
Voice picker is locked during an active session to prevent confusion about mid-session voice changes that wouldn't actually take effect.
Cleaner error handling: a deleted imported voice no longer surfaces as a red session-error banner; the engine falls back to the default voice and logs a one-line warning.
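The fallback behaviour in the last item can be illustrated with a short lookup routine. This is a minimal sketch under assumptions: `Voice`, `resolveVoice`, and the injectable `warn` callback are hypothetical names, not the engine's real interface.

```typescript
// Hypothetical types and names, for illustration only.
interface Voice {
  id: string;
  label: string;
}

// Looks up the selected voice. If it is missing (for example, it was
// deleted from the library), logs a single warning and returns the
// default voice instead of raising a session error.
function resolveVoice(
  selectedId: string,
  available: Map<string, Voice>,
  fallback: Voice,
  warn: (msg: string) => void = (msg) => console.warn(msg),
): Voice {
  const voice = available.get(selectedId);
  if (voice !== undefined) {
    return voice;
  }
  warn(`voice "${selectedId}" not found; falling back to "${fallback.id}"`);
  return fallback;
}
```

Passing the logger as a parameter keeps the function easy to test and makes the "one warning, no error banner" contract explicit.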

Installation

Platform	Asset
macOS (Apple Silicon)	`Sokuji-0.28.0-arm64.pkg`
macOS (Intel)	`Sokuji-0.28.0-x64.pkg`
Windows	`Sokuji-0.28.0.Setup.exe`
Linux (.deb, x64)	`sokuji_0.28.0_amd64.deb`
Linux (.deb, arm64)	`sokuji_0.28.0_arm64.deb`
Linux (AppImage, x64)	`Sokuji-0.28.0-x86_64.AppImage`
Linux (AppImage, arm64)	`Sokuji-0.28.0-arm64.AppImage`
Browser extension (Chrome/Edge)	`sokuji-extension-0.28.0.zip`

Existing installations on macOS / Windows auto-update on next launch.

Acknowledgments

Supertonic 3 model © 2026 Supertone Inc., licensed under OpenRAIL-M. Use is subject to that license's responsible-use restrictions — see the LICENSE for the full list of prohibited uses (impersonation without consent, harassment, deceptive use, etc.).

Full change log: v0.27.2…v0.28.0