github kizuna-ai-lab/sokuji v0.28.0

3 hours ago

Highlights

🎙️ Supertonic 3 — recommended local TTS, 31 languages

A new browser-native text-to-speech engine from Supertone (HYBE) — one ~398 MB download covers English, Korean, Japanese, Arabic, German, French, Spanish, Portuguese, Russian, Ukrainian, Vietnamese, Polish, Czech, Dutch, Italian, Turkish, Hindi, and 14 more languages. Runs entirely on your machine via WebGPU (Chrome / Edge), with automatic fallback to WebAssembly on browsers without WebGPU (Firefox, Safari).
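
The WebGPU-first, WebAssembly-fallback behaviour described above can be sketched roughly as follows. This is an illustration only, not sokuji's actual code: `pickExecutionProviders` and `NavigatorLike` are hypothetical names, and the check assumes that the presence of `navigator.gpu` is a good enough signal for WebGPU support.

```typescript
// Execution-provider names as accepted by onnxruntime-web's session options.
type ExecutionProvider = "webgpu" | "wasm";

// Minimal view of the part of `navigator` this sketch inspects. Typed loosely
// so the example compiles without WebGPU type definitions installed.
interface NavigatorLike {
  gpu?: unknown;
}

// Hypothetical helper (an assumption, not taken from sokuji's source):
// prefer WebGPU when the browser exposes `navigator.gpu`, and always keep
// WebAssembly in the list as the fallback.
function pickExecutionProviders(
  nav: NavigatorLike | undefined
): ExecutionProvider[] {
  const hasWebGPU =
    nav !== undefined && nav.gpu !== undefined && nav.gpu !== null;
  return hasWebGPU ? ["webgpu", "wasm"] : ["wasm"];
}
```

In a browser, the returned list could be passed as the `executionProviders` session option to onnxruntime-web's `InferenceSession.create`. A stricter check might also await `navigator.gpu.requestAdapter()`, since that call can resolve to `null` even when `navigator.gpu` exists.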

Supertonic 3 is now the recommended TTS engine for the languages it covers. It ships with 10 preset voices (five male, five female), selectable from the new voice picker in Settings → Local Inference.

Note: Supertonic 3 does not cover Chinese (zh) or Thai (th). The existing Matcha-zh-en and other per-language models remain available for those.

🗣️ Bring Your Own Voice — Voice Library

You can now import custom voice profiles created with Supertone's Voice Builder (their paid hosted service for cloning voices from a short recording). Drop the resulting voice_style.json into sokuji's voice library and it appears alongside the presets in the dropdown.

  • Drag-and-drop or file picker import
  • Rename and delete imported voices
  • Voices persist in browser storage across sessions
  • Validation rejects malformed files at import time so the engine never sees broken inputs
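
As an illustration of import-time validation, here is a minimal sketch. The real voice_style.json schema is not described in these notes, so everything about the layout is an assumption: the sketch supposes each required field is a tensor-like object with `dims` and `data` arrays, and the caller supplies the (hypothetical) field names to check.

```typescript
// ASSUMED tensor layout; the real voice_style.json schema may differ.
interface StyleTensor {
  dims: number[];
  data: number[];
}

type ValidationResult =
  | { ok: true; value: Record<string, StyleTensor> }
  | { ok: false; error: string };

// True when `x` has positive integer dims, finite numeric data, and a data
// length equal to the product of the dims.
function isStyleTensor(x: unknown): x is StyleTensor {
  if (typeof x !== "object" || x === null) return false;
  const t = x as { dims?: unknown; data?: unknown };
  if (!Array.isArray(t.dims) || !Array.isArray(t.data)) return false;
  if (t.dims.length === 0) return false;
  if (!t.dims.every((d) => Number.isInteger(d) && d > 0)) return false;
  if (!t.data.every((v) => typeof v === "number" && Number.isFinite(v))) {
    return false;
  }
  const expected = (t.dims as number[]).reduce((a, b) => a * b, 1);
  return expected === t.data.length;
}

// Hypothetical validator: parses the file text and checks each required key.
// `requiredKeys` is supplied by the caller because the real field names are
// not given in the release notes.
function validateVoiceStyle(
  text: string,
  requiredKeys: string[]
): ValidationResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    return { ok: false, error: "not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, error: "top level must be a JSON object" };
  }
  const obj = parsed as Record<string, unknown>;
  const value: Record<string, StyleTensor> = {};
  for (const key of requiredKeys) {
    const field = obj[key];
    if (!isStyleTensor(field)) {
      return { ok: false, error: `field "${key}" is missing or malformed` };
    }
    value[key] = field;
  }
  return { ok: true, value };
}
```

Returning a result object rather than throwing lets the import UI show the error message next to the rejected file, so a malformed profile never enters the library.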

Voice changes apply on the next session start — there's no in-session hot-swap by design.

🎧 Granite Speech 4.1 2B — recommended WebGPU ASR/AST

IBM's new compact speech model joins the local ASR lineup as the recommended WebGPU engine for automatic speech translation (AST): it listens in one language and transcribes directly into another, without a separate translation pass. For the languages it supports, end-to-end latency is lower than with the Whisper → translation pipeline.


What's improved

  • Lighter download size for users on Edge/Chrome: the new Supertonic worker uses raw onnxruntime-web (~399 KB worker chunk) instead of the heavier Transformers.js stack — about 1 MB lighter than the comparable Whisper / Voxtral workers.
  • Voice picker is locked during an active session to prevent confusion about mid-session voice changes that wouldn't actually take effect.
  • Cleaner error handling: a deleted imported voice no longer surfaces as a red session-error banner; the engine falls back to the default voice and logs a one-line warning.
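
The fallback for a deleted imported voice could look something like the sketch below. It is a guess at the shape of the logic, not sokuji's implementation: `Voice`, `resolveVoice`, and the warning text are all hypothetical.

```typescript
// Hypothetical voice record; the real one likely carries more fields.
interface Voice {
  id: string;
  name: string;
}

interface ResolvedVoice {
  voice: Voice;
  warning?: string; // set only when the fallback was used
}

// Returns the selected voice if it still exists in the library; otherwise
// returns the default voice together with a one-line warning to log.
function resolveVoice(
  selectedId: string,
  available: Voice[],
  defaultVoice: Voice
): ResolvedVoice {
  const found = available.find((v) => v.id === selectedId);
  if (found !== undefined) {
    return { voice: found };
  }
  return {
    voice: defaultVoice,
    warning: `voice "${selectedId}" not found; using default "${defaultVoice.name}"`,
  };
}
```

A caller would pass `warning` to its logger when present and start the session with `voice` either way, so a missing voice never surfaces as a session error.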

Installation

Platform                         Asset
macOS (Apple Silicon)            Sokuji-0.28.0-arm64.pkg
macOS (Intel)                    Sokuji-0.28.0-x64.pkg
Windows                          Sokuji-0.28.0.Setup.exe
Linux (.deb, x64)                sokuji_0.28.0_amd64.deb
Linux (.deb, arm64)              sokuji_0.28.0_arm64.deb
Linux (AppImage, x64)            Sokuji-0.28.0-x86_64.AppImage
Linux (AppImage, arm64)          Sokuji-0.28.0-arm64.AppImage
Browser extension (Chrome/Edge)  sokuji-extension-0.28.0.zip

Existing installations on macOS / Windows auto-update on next launch.

Acknowledgments

Supertonic 3 model © 2026 Supertone Inc., licensed under OpenRAIL-M. Use is subject to that license's responsible-use restrictions — see the LICENSE for the full list of prohibited uses (impersonation without consent, harassment, deceptive use, etc.).

Granite Speech 4.1 2B © 2026 IBM Research, licensed under Apache 2.0.


Full change log: v0.27.2 to v0.28.0
