github kizuna-ai-lab/sokuji v0.17.1

19 hours ago

What's New

New Feature — Microsoft Edge TTS (Free, high-quality online TTS)

  • Edge TTS is now available in the renamed Free provider (formerly Local/Offline): 400+ neural voices across 100+ languages, streamed live from Bing's TTS service. It is completely free and requires no API key. (#195)

  • The voice picker in settings lets you choose from all available voices for your target language (Ava, Keita, Nanami, Denise, and many more). Whenever you switch target languages, the first available voice for the new language is selected automatically.

  • Platform support:

    • Desktop (Electron): audio is proxied through the main process with the required headers, so Edge TTS works out of the box.
    • Browser extension: uses Chrome's declarativeNetRequest to inject the required header on the WebSocket handshake — works on Chrome/Edge 116+.
  • Streaming MP3 decoding via mpg123-decoder WASM in a Web Worker keeps playback latency low — audio starts within a few hundred milliseconds of generation.
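
For readers curious about the browser-extension path, the header injection can be sketched as a declarativeNetRequest rule scoped to WebSocket requests. This is a minimal illustration, not Sokuji's actual rule: the helper name `buildWebSocketHeaderRule`, the host `tts.example.com`, and the header name and value are all placeholders, since the notes do not say which header is required.

```typescript
// Shape of a declarativeNetRequest rule, reduced to the fields used here.
interface HeaderRule {
  id: number;
  priority: number;
  action: {
    type: "modifyHeaders";
    requestHeaders: { header: string; operation: "set"; value: string }[];
  };
  condition: { urlFilter: string; resourceTypes: string[] };
}

// Hypothetical helper: builds a rule that sets one request header on
// WebSocket handshakes to the given host.
function buildWebSocketHeaderRule(
  id: number,
  host: string,
  header: string,
  value: string
): HeaderRule {
  return {
    id,
    priority: 1,
    action: {
      type: "modifyHeaders",
      requestHeaders: [{ header, operation: "set", value }],
    },
    // "||host/" anchors the filter to the domain; "websocket" limits the
    // rule to WebSocket handshake requests.
    condition: { urlFilter: `||${host}/`, resourceTypes: ["websocket"] },
  };
}

// Placeholder host and header, chosen only for illustration.
const rule = buildWebSocketHeaderRule(
  1,
  "tts.example.com",
  "Origin",
  "https://example.com"
);
console.log(JSON.stringify(rule, null, 2));

// Inside an extension service worker, the rule could then be registered with:
// chrome.declarativeNetRequest.updateDynamicRules({
//   removeRuleIds: [rule.id],
//   addRules: [rule],
// });
```

Registering such a rule also needs the matching declarativeNetRequest and host permissions in the extension manifest; the exact set Sokuji declares is not stated in these notes.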

Bug Fixes

  • Tab audio capture on Meet/Teams/Zoom now works again (v0.17.1 hotfix) — v0.17.0 shipped with a regression where starting a participant session on supported video-conferencing sites failed with "Extension has not been invoked for the current page". The side panel was being opened via an implicit action-click path that bypassed Chrome's activeTab permission grant. The extension now opens the side panel explicitly from a chrome.action.onClicked handler, preserving the user gesture so tab audio capture can proceed. (#196)

  • Extension auto-opens the side panel on supported video-conferencing sites (#193) — click the Sokuji icon on Google Meet / Teams / Zoom / Slack and the side panel opens immediately, ready to start a session. On other sites the popup is shown as before. Switching away from a supported tab closes the side panel automatically.

  • Multiple Edge TTS stability fixes rolled into the v0.17.0/v0.17.1 window:

    • Voice auto-selection now runs on any screen when the target language changes, so the voice displayed in the UI always matches the voice actually used for synthesis.
    • Audio corruption on certain MP3 chunks (caused by buffer-view aliasing across IPC) has been fixed — audio is consistently clear.
    • The TTS pipeline no longer hangs if the decoder fails to reset between sentences, and stale IPC events from a prior generation can no longer resolve the current one.
    • MP3 decoder reset is properly awaited before streaming the next sentence, preventing stuttering/duplication.
    • Edge TTS voice and speed are now logged on every local.tts.start event for easier debugging.
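
As background on the buffer-view aliasing fix, the sketch below shows one common way this class of bug arises and how a copy avoids it. It is an assumption-laden illustration rather than Sokuji's actual code: the pool buffer, the chunk offsets, and the helper `copyChunk` are all hypothetical. The idea is that a `Uint8Array` view can share memory with a larger, reused buffer, so a chunk handed across a boundary should own its bytes.

```typescript
// Hypothetical helper: copy exactly the bytes a view covers into a fresh,
// independently owned buffer.
function copyChunk(view: Uint8Array): Uint8Array {
  // slice() on a typed array allocates a new ArrayBuffer of view.length bytes.
  return view.slice();
}

// A stand-in for a buffer that is reused between chunks.
const pool = new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8]);

// subarray() creates a view that shares memory with `pool`.
const aliased = pool.subarray(2, 5);

// copyChunk() produces bytes that no longer depend on `pool`.
const owned = copyChunk(aliased);

// Simulate the pool being overwritten by the next chunk.
pool.fill(0);

console.log(Array.from(aliased)); // [0, 0, 0] (the view changed with the pool)
console.log(Array.from(owned)); // [3, 4, 5] (the copy is unaffected)
console.log(aliased.buffer.byteLength, owned.buffer.byteLength); // 8 3
```

The last line also shows why sending `view.buffer` directly is risky: the underlying buffer can be larger than the view, so the receiver may see bytes that were never part of the chunk.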

Under the hood

  • Provider display name: Local (Offline) → Free across all 31 locales.
  • Streaming karaoke metadata is now published mid-stream for Edge TTS, matching the non-streaming path's timing behavior.
  • Bundle adds mpg123-decoder for WASM-based MP3 streaming decode.

Compatibility

  • Chrome / Edge 116+ (extension)
  • Desktop app: Windows / macOS / Linux
