github debpalash/OmniVoice-Studio v0.3.10
v0.3.10 — OmniVoice Studio

7 hours ago

The listening release — nine fixes in twenty-four hours, almost all driven by your v0.3.9 field reports (several with same-day turnaround). The dubbing pipeline stops lying: Cinematic and Autofit can no longer invent dialogue, the speaker count you set is honored on every path (and auto-cloning stops fabricating voices from guessed labels), and the timeline stops flashing invisible on Windows. Audiobook chapters with pauses render again. And one fix everyone should want: updating can no longer leave you secretly running the old version — a leftover backend from a previous install holding the port is now detected and replaced at launch. Plus: the Dub tab's LLM engine finally runs on the provider you configured in Settings, history timestamps stop reading "20617d ago", and the Engines page can't crash under concurrent load.

Fixed

  • Audiobook/Stories chapters with a [pause] no longer fail to render. Pause spans were built as 1-D silence while every TTS engine returns 2-D audio, so chapter concatenation crashed with "Tensors must have same number of dimensions" and any chapter containing a pause failed on every attempt (reported with a precise trace in #897). Silence now matches the rendered audio's shape at the source, and the chunk concatenator defensively normalizes mixed ranks (including honest mono→stereo broadcast) so no engine can re-trigger this class of bug. (#953)
  • Cinematic and Autofit dubbing can no longer invent dialogue. The refine and slot-fit passes accepted any non-empty LLM reply for Latin-script languages — hallucinated lines, refusals, or the critique itself could ship as the dub. Every reply is now checked against the original line (length window, target script, critique echo — tunable via OMNIVOICE_REFINE_RATIO_MIN/MAX), rejected output falls back to the literal translation with an adapt-diverged/fit-diverged marker, lines too short to honestly fill their slot skip LLM expansion entirely, and both passes pin temperature=0.2 like the Fast path. (#950)
  • Dub timeline boxes can no longer flash invisible during playback. On some Windows GPU/WebView2 driver combos the segment boxes under the video vanished and reappeared while playing (first reported in #373; the earlier fix was incomplete) — the timeline lane still animated a CSS transform every playback tick, keeping the translucent boxes on a composited layer that the driver mis-painted. Boxes are now positioned in pure layout with fully opaque theme-aware fills (pixel-identical colors), removing the glitch class on every platform. (#951)
  • The dub "Speakers" count now actually does something — on every path. The hint only reached pyannote; the common fallbacks silently ignored it (the no-diarization heuristic was hardcoded to alternate two speakers, and the FunASR shortcut never consulted it). The heuristic now cycles the requested count, an explicit count routes through pyannote when available, every path that can only approximate (or must ignore) the setting says so in a visible warning, and the legacy endpoint + a new CLI --speakers flag accept it too. Auto voice-cloning also stops fabricating voices from guessed labels: reference slices under 1.5s are rejected, slices bordering another speaker's turn are avoided, and cloning is skipped with an honest warning when speaker labels came from the gap heuristic instead of real diarization. (#952)
  • Settings → Engines can no longer return a 500 under concurrent load. The lazy TTS/ASR engine registries held a live dictionary iterator open across each engine's is_available() probe while list_backends() ran in a FastAPI threadpool. A second concurrent /engines request that materialized a lazy engine entry (self[key] = cls) therefore mutated the dict mid-iteration and crashed the request with "RuntimeError: dictionary changed size during iteration". Both registries now snapshot their keys before iterating (atomic under the GIL), which makes them immune to a concurrent insert; regression-tested for TTS and ASR. (#940)
  • The Dub tab's LLM translation engine now runs on your configured LLM provider. Picking "LLM (OpenAI-compatible)" silently required three hand-set environment variables even when a provider was already configured and tested in Settings → LLM Providers. It now resolves through a new "Dub translation" LLM skill, which you can route to any provider (remote or local) in Settings → LLM Skills, independently of Cinematic refinement. The TRANSLATE_* env vars remain as a power-user override, every call is bounded by the LLM timeout instead of the SDK's 600-second default, and the Engine dropdown is told whether the engine is actually ready (and via which provider). When nothing is configured, it returns a clear pointer to Settings → LLM Providers instead of a raw 401 per segment. (#944)
  • Timestamps no longer show "20617d ago" in OmniDrive/Projects. The backend stores record times in Unix seconds while some views assumed milliseconds, so generation-history cards rendered as ~1970 ("20617d ago") and sorted last; every relative-time label (OmniDrive, sidebar history, dub projects, batch queue, transcriptions) now goes through one unit-tolerant formatter, and records missing a timestamp show "—" instead of an epoch age.
  • Updating can no longer leave you secretly running the old version. If a backend from a previous version was still holding the port (an orphan that survived an update), the new app "attached" to it because it answered health checks — so every fix in the update appeared to change nothing (the reported "bound port blocked the newer version"). The launcher now compares the running backend's version against the app before attaching: same version attaches as before, a stale one is killed and the bundled backend is started in its place — on macOS, Windows, and Linux. (#947)
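
For readers curious how the Cinematic/Autofit reply check (#950) might look, here is a minimal sketch of two of the named checks: the length window and the critique echo. It is not the project's code. The function name check_reply, the default window of 0.5 to 2.0, and the exact echo comparison are assumptions made for illustration; only the OMNIVOICE_REFINE_RATIO_MIN/MAX variable names and the adapt-diverged marker come from the notes above. The target-script check is omitted.

```python
import os


def ratio_window():
    # Env var names come from the release notes; the defaults are assumptions.
    lo = float(os.environ.get("OMNIVOICE_REFINE_RATIO_MIN", "0.5"))
    hi = float(os.environ.get("OMNIVOICE_REFINE_RATIO_MAX", "2.0"))
    return lo, hi


def check_reply(reference, reply, critique, fallback, window=None):
    """Return (text, marker). marker is None when the reply is accepted,
    or "adapt-diverged" when the literal translation is used instead."""
    lo, hi = window if window is not None else ratio_window()
    reply = reply.strip()
    reference = reference.strip()
    rejected = (fallback, "adapt-diverged")

    if not reply or not reference:
        return rejected

    # Length window: a reply far shorter or longer than the line it is
    # checked against is treated as a hallucination or a refusal.
    ratio = len(reply) / len(reference)
    if ratio < lo or ratio > hi:
        return rejected

    # Critique echo: the model returned its own critique as the line.
    if critique and reply.casefold() == critique.strip().casefold():
        return rejected

    return reply, None
```

A caller would pass the line being checked as reference and the literal translation as fallback, then ship whichever text comes back.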
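
The Engines crash (#940) is an instance of a general Python pitfall, which the sketch below reproduces in a single thread. The two helper functions and the on_probe callback are hypothetical stand-ins: the callback plays the role of a second request inserting a lazy entry while the first request is still iterating.

```python
def probe_live(registry, on_probe):
    """Iterate the dict directly: a live iterator stays open across each probe."""
    seen = []
    for name in registry:
        on_probe(name)
        seen.append(name)
    return seen


def probe_snapshot(registry, on_probe):
    """Copy the keys first: later inserts cannot disturb the loop."""
    seen = []
    for name in list(registry):
        on_probe(name)
        seen.append(name)
    return seen


def make_inserter(registry):
    # Simulates a concurrent request materializing a lazy entry.
    def on_probe(_name):
        registry.setdefault("late-engine", object)
    return on_probe


live = {"engine-a": object}
try:
    probe_live(live, make_inserter(live))
    live_error = None
except RuntimeError as exc:
    live_error = str(exc)

safe = {"engine-a": object}
snapshot_result = probe_snapshot(safe, make_inserter(safe))
```

Here live_error holds the "dictionary changed size during iteration" message, while snapshot_result contains only the keys that existed when the loop started; the late entry is picked up on the next call.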
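
The timestamp fix can be pictured as a single formatter that accepts either unit. The following is a sketch under stated assumptions, not the app's formatter (which lives in the UI): the 1e11 cutoff, the function names, and the label buckets are all illustrative. The cutoff works because 1e11 milliseconds falls in 1973, while 1e11 seconds is thousands of years in the future.

```python
import time

MS_CUTOFF = 1e11          # assumption: anything larger is milliseconds
MISSING_LABEL = "\u2014"  # the dash shown for records without a timestamp


def to_seconds(ts):
    """Normalize a Unix timestamp in seconds or milliseconds to seconds."""
    if ts is None:
        return None
    ts = float(ts)
    return ts / 1000.0 if ts > MS_CUTOFF else ts


def relative_age(ts, now=None):
    """Format a timestamp as a short relative label such as '3d ago'."""
    seconds = to_seconds(ts)
    if seconds is None:
        return MISSING_LABEL
    now = time.time() if now is None else now
    delta = max(0, int(now - seconds))
    if delta < 60:
        return f"{delta}s ago"
    if delta < 3600:
        return f"{delta // 60}m ago"
    if delta < 86400:
        return f"{delta // 3600}h ago"
    return f"{delta // 86400}d ago"
```

With this shape, a record stored as 1700000000 and the same instant stored as 1700000000000 produce the same label, and a missing value never turns into an epoch age.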
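
The launch-time rule from #947 reduces to a three-way decision, sketched here. The Action enum and decide function are hypothetical names, and how the launcher actually reads the running backend's version is not described in these notes; the sketch simply takes it as an input, with None meaning nothing answered on the port.

```python
from enum import Enum


class Action(Enum):
    START = "start the bundled backend"
    ATTACH = "attach to the running backend"
    REPLACE = "kill the stale backend, then start the bundled one"


def decide(app_version, running_version):
    """Pick the launch action from the app's version and the version
    reported by whatever is already holding the port (None if nothing is)."""
    if running_version is None:
        return Action.START
    if running_version == app_version:
        return Action.ATTACH
    # Anything else is treated as stale: answering health checks
    # is no longer enough to be attached to.
    return Action.REPLACE
```

Before this release the equivalent logic would have returned ATTACH for any backend that answered a health check, which is how a leftover v0.3.9 process could keep serving a v0.3.10 app.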

Linux x64 artifacts

775d8efc66728dfab8841d46d8b7f1808d36b162bef46b9107c87778ef40d4b0  OmniVoice Studio_0.3.10_amd64.AppImage
eca4815812389587f6a4b13f79e15f74aaf5b746183b526a38e8ad9dd3bc49c3  OmniVoice Studio_0.3.10_amd64.AppImage.sig

macOS Intel artifacts

0b4504b1ccfee742a85a12a77eff2efb146d12f617b53360777bdaff88e227dd  OmniVoice Studio_0.3.10_x64.dmg
847d9c65aedd2432b7e755d11311580a854bd333719a5d270d6d58a4e6bd3b4c  OmniVoice Studio.app.tar.gz
eb641274d41b18e94bcfb7373e1b0dbcf52eb65f45189f949cd9407ec00682a2  OmniVoice Studio.app.tar.gz.sig

Windows x64 artifacts

2b5a92c44b4c76334634d57ae3682263fd1aeaeadcc76af934e42b6864c1d791 *OmniVoice Studio_0.3.10_x64_en-US.msi
1ca5097928aff568155deeeef80821da83a105fec40d7845024c7055815d8099 *OmniVoice Studio_0.3.10_x64_en-US.msi.sig
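
To check a download against the lists above, something like the following sketch works, assuming the 64-hex-digit values are SHA-256 digests (the notes do not name the algorithm). The helper names are illustrative.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large installers need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's digest with a published value, ignoring case and spaces."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example, with the file in the current directory:
# matches("OmniVoice Studio_0.3.10_amd64.AppImage",
#         "775d8efc66728dfab8841d46d8b7f1808d36b162bef46b9107c87778ef40d4b0")
```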
