kizuna-ai-lab/sokuji v0.17.1 on GitHub

What's New

New Feature — Microsoft Edge TTS (Free, high-quality online TTS)

Edge TTS is now available in the renamed Free provider (formerly Local (Offline)). It offers 400+ neural voices across 100+ languages, streamed live from Bing's TTS service, completely free and with no API key required. (#195)
The voice picker in settings lets you choose from all available voices for your target language (Ava, Keita, Nanami, Denise, and many more). Whenever you switch target languages, the first available voice for the new language is selected automatically.
Platform support:
- Desktop (Electron): audio is proxied through the main process with the required headers, so Edge TTS works out of the box.
- Browser extension: uses Chrome's declarativeNetRequest to inject the required header on the WebSocket handshake — works on Chrome/Edge 116+.
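As a rough illustration of the browser-extension approach, the sketch below builds a declarativeNetRequest rule that sets one request header on WebSocket connections. The header name, header value, host, rule id, and the `buildHeaderRule` helper are all placeholders invented for this example; the release notes do not say which header or endpoint Sokuji actually uses.

```typescript
// Subset of a declarativeNetRequest rule needed to set a request header.
interface HeaderRule {
  id: number;
  priority: number;
  action: {
    type: "modifyHeaders";
    requestHeaders: { header: string; operation: "set"; value: string }[];
  };
  condition: { urlFilter: string; resourceTypes: string[] };
}

// Hypothetical helper: builds a rule that sets `header` to `value`
// on WebSocket requests whose URL matches `urlFilter`.
function buildHeaderRule(
  id: number,
  urlFilter: string,
  header: string,
  value: string
): HeaderRule {
  return {
    id,
    priority: 1,
    action: {
      type: "modifyHeaders",
      requestHeaders: [{ header, operation: "set", value }],
    },
    // "websocket" restricts the rule to WebSocket handshakes.
    condition: { urlFilter, resourceTypes: ["websocket"] },
  };
}

// Placeholder values (assumptions, not taken from the Sokuji source).
const exampleRule = buildHeaderRule(
  1,
  "||tts.example.invalid/",
  "X-Example-Header",
  "example-value"
);

// Register the rule only when running inside an extension context.
// Modifying headers also needs the declarativeNetRequest permission and
// host permissions for the target URL in the manifest.
const dnrChrome = (globalThis as any).chrome;
if (dnrChrome?.declarativeNetRequest) {
  dnrChrome.declarativeNetRequest.updateDynamicRules({
    removeRuleIds: [exampleRule.id],
    addRules: [exampleRule],
  });
}
```

Keeping rule construction in a pure function makes the rule easy to inspect or unit-test outside the browser, while the guarded `updateDynamicRules` call is the only part that touches the extension API.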
Streaming MP3 decoding via mpg123-decoder WASM in a Web Worker keeps playback latency low — audio starts within a few hundred milliseconds of generation.

Bug Fixes

Tab audio capture on Meet/Teams/Zoom now works again (v0.17.1 hotfix) — v0.17.0 shipped with a regression where starting a participant session on supported video-conferencing sites failed with "Extension has not been invoked for the current page". The side panel was being opened via an implicit action-click path that bypassed Chrome's activeTab permission grant. The extension now opens the side panel explicitly from a chrome.action.onClicked handler, preserving the user gesture so tab audio capture can proceed. (#196)
Extension auto-opens the side panel on supported video-conferencing sites (#193) — click the Sokuji icon on Google Meet / Teams / Zoom / Slack and the side panel opens immediately, ready to start a session. On other sites the popup is shown as before. Switching away from a supported tab closes the side panel automatically.
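The two side-panel changes above could be sketched roughly as follows: a host check decides whether a tab is a supported conferencing site, and the side panel is opened directly inside the `chrome.action.onClicked` listener so the call happens during the user gesture. The host list and the `isSupportedSite` helper are assumptions made for illustration; the actual matching logic in Sokuji may differ.

```typescript
// Hypothetical host list; the real extension may match different hosts.
const SUPPORTED_HOSTS = [
  "meet.google.com",
  "teams.microsoft.com",
  "zoom.us",
  "app.slack.com",
];

// Returns true when the URL's hostname is a supported host or a subdomain of one.
function isSupportedSite(url: string | undefined): boolean {
  if (!url) return false;
  try {
    const host = new URL(url).hostname;
    return SUPPORTED_HOSTS.some((h) => host === h || host.endsWith("." + h));
  } catch {
    // Not a parseable URL.
    return false;
  }
}

// Register the click handler only when running inside an extension context.
const panelChrome = (globalThis as any).chrome;
if (panelChrome?.action?.onClicked && panelChrome?.sidePanel) {
  panelChrome.action.onClicked.addListener(
    (tab: { id?: number; url?: string }) => {
      if (tab.id !== undefined && isSupportedSite(tab.url)) {
        // Opened synchronously in the click handler, i.e. in response
        // to the user's action on the toolbar icon.
        panelChrome.sidePanel.open({ tabId: tab.id });
      }
    }
  );
}
```

Note that `chrome.action.onClicked` only fires when the action has no popup set, so an extension that shows a popup on other sites would need to toggle the popup per tab; that part is left out of this sketch.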
Multiple Edge TTS stability fixes rolled into the v0.17.0/v0.17.1 window:
- Voice auto-selection now runs on any screen when the target language changes, so the voice displayed in the UI always matches the voice actually used for synthesis.
- Audio corruption on certain MP3 chunks (caused by buffer-view aliasing across IPC) has been fixed — audio is consistently clear.
- The TTS pipeline no longer hangs if the decoder fails to reset between sentences, and stale IPC events from a prior generation can no longer resolve the current one.
- MP3 decoder reset is properly awaited before streaming the next sentence, preventing stuttering/duplication.
- Edge TTS voice and speed are now logged on every local.tts.start event for easier debugging.
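Two of the fixes above follow general patterns that can be shown in isolation. The sketch below is not Sokuji's code: `copyChunk` and `GenerationGuard` are hypothetical names. The first part shows how a typed-array view can alias a larger buffer, and how copying the view before handing it off avoids that. The second part shows a counter-based guard that lets a handler ignore events from an earlier generation.

```typescript
// Hypothetical helper: copy exactly the bytes the view covers into a
// fresh buffer, so the result shares no memory with the original.
function copyChunk(view: Uint8Array): Uint8Array {
  return view.slice();
}

// A pooled buffer and a chunk that is only a window into it.
const pool = new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8]);
const chunk = pool.subarray(2, 5); // covers the bytes 3, 4, 5

// Hazard: wrapping chunk.buffer ignores byteOffset and byteLength,
// so the whole 8-byte pool is exposed instead of the 3-byte chunk.
const aliased = new Uint8Array(chunk.buffer);

// Safe: an independent 3-byte copy.
const safe = copyChunk(chunk);

// Hypothetical guard: each generation gets an id, and only events
// carrying the latest id are treated as current.
class GenerationGuard {
  private current = 0;

  // Start a new generation and return its id.
  begin(): number {
    this.current += 1;
    return this.current;
  }

  // True only for the most recently started generation.
  isCurrent(id: number): boolean {
    return id === this.current;
  }
}

const guard = new GenerationGuard();
const firstGen = guard.begin();
const secondGen = guard.begin();
// An event tagged with firstGen arriving now would be dropped.
const staleAccepted = guard.isCurrent(firstGen);
const freshAccepted = guard.isCurrent(secondGen);
```

In a handler, the guard would typically be checked first, for example returning early when `guard.isCurrent(eventGenerationId)` is false, so a late event cannot resolve the wrong request.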

Under the Hood

Provider display name: Local (Offline) → Free across all 31 locales.
Streaming karaoke metadata is now published mid-stream for Edge TTS, matching the non-streaming path's timing behavior.
The bundle now includes mpg123-decoder for WASM-based streaming MP3 decoding.

Compatibility

Chrome / Edge 116+ (extension)
Desktop app: Windows / macOS / Linux