debpalash/OmniVoice-Studio v0.3.7 on GitHub

A stabilization release that clears the wave of issues reported on the 0.3.6
line — across voice design, dubbing, transcription, install, and the Linux/web
UI — and lands two more opt-in cloning engines. The throughline is non-English
correctness and cross-platform playback: cloned and designed voices now hold
their language end-to-end, and audio plays inline in Linux/Android browsers,
not just macOS. It also carries the v0.3.6 startup-crash fixes, so anyone still
hitting "Can't reach the local backend" on v0.3.5/v0.3.6 only needs to update.

Added

Two opt-in heavyweight TTS engines: MOSS-TTS-v1.5 (8B) and dots.tts (2B).
Both are zero-shot voice-cloning engines, each running in its own isolated
subprocess venv (they pin a transformers version that conflicts with the
parent's >=5.3 — MOSS ==5.0, dots.tts ==4.57) via the same dedicated-venv
pattern as IndexTTS-2, so they can't disturb the default install or its
lockfile. Point OMNIVOICE_MOSS_TTS_V15_DIR / OMNIVOICE_DOTS_TTS_DIR at a
local clone to enable. CUDA/CPU only — neither claims Apple-Silicon MPS, and
dots.tts is gated off on Windows (upstream is Linux/macOS only). See
docs/engines/moss-tts-v15.md and
docs/engines/dots-tts.md. (#498)
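
As an illustration of the opt-in gate described above, here is a minimal sketch. Only the two environment-variable names and the Windows restriction on dots.tts come from these notes; the function name `resolve_engine_dir` and the exact checks are assumptions, not the project's actual code.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional

# Env var names are from the release notes; everything else is hypothetical.
WINDOWS_BLOCKED = {"OMNIVOICE_DOTS_TTS_DIR"}  # dots.tts upstream is Linux/macOS only


def resolve_engine_dir(
    env_var: str,
    env: Mapping[str, str] = os.environ,
    platform: str = sys.platform,
) -> Optional[Path]:
    """Return the local clone directory if the engine is opted in, else None."""
    if env_var in WINDOWS_BLOCKED and platform.startswith("win"):
        return None  # gated off regardless of what the variable says
    raw = env.get(env_var, "").strip()
    if not raw:
        return None  # variable unset: engine stays disabled
    path = Path(raw).expanduser()
    return path if path.is_dir() else None  # must point at an existing directory


if __name__ == "__main__":
    for name in ("OMNIVOICE_MOSS_TTS_V15_DIR", "OMNIVOICE_DOTS_TTS_DIR"):
        print(name, "->", resolve_engine_dir(name))
```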

Fixed

Non-English voices drifted to English / the wrong language. Three
independent root causes, all in the language path: (1) a voice profile's
stored language was never read back into generation, so a German archetype
that previewed in German generated in English (the preview passed the
language; the user's Generate call didn't); (2) the audiobook/longform synth
hardcoded language=None, letting the engine re-autodetect per chunk so a
non-English clone could flip language mid-render on short/ambiguous lines; and
(3) the duration estimator weighted Unicode combining marks at zero, so
decomposed (NFD) diacritic text — common for Vietnamese — under-allocated
frames and came out rushed. The profile/request language is now threaded
through both the single-shot and longform paths (request wins, profile fills
the gap), and text is NFC-normalized before duration estimation. Each fix has
a fail-before/pass-after regression test. (#533, #505, #502)
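
The precedence rule (request wins, profile fills the gap) and the NFC step can be sketched as follows. The helper names are hypothetical, and the sketch does not reproduce the real duration estimator; it only shows that decomposed Vietnamese text carries combining marks that disappear once the text is NFC-normalized.

```python
import unicodedata
from typing import Optional


def resolve_language(request_lang: Optional[str], profile_lang: Optional[str]) -> Optional[str]:
    """Request wins, profile fills the gap; None leaves detection to the engine."""
    return request_lang or profile_lang or None


def prepare_for_duration(text: str) -> str:
    """Compose base letters and combining marks before estimating duration."""
    return unicodedata.normalize("NFC", text)


def count_combining_marks(text: str) -> int:
    """Count code points with a non-zero canonical combining class."""
    return sum(1 for ch in text if unicodedata.combining(ch) != 0)


if __name__ == "__main__":
    composed = "Ti\u1ebfng Vi\u1ec7t"                      # "Tieng Viet" with diacritics, NFC
    decomposed = unicodedata.normalize("NFD", composed)   # same text, decomposed
    print(len(composed), len(decomposed))
    print(count_combining_marks(decomposed))
    print(prepare_for_duration(decomposed) == composed)
    print(resolve_language(None, "de"))
```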
Audio playback on Linux Firefox/Chrome and Android Chrome. Two separate
root causes both masquerade as "the play button doesn't work" on non-macOS
browsers — and both are invisible when developing on macOS, which is why they
shipped. (1) The backend served .wav / .flac with Python's default
audio/x-wav / audio/x-flac (vendor-experimental, never IANA-registered);
macOS CoreAudio MIME-sniffs leniently and plays anyway, but Linux FFmpeg and
Android ExoPlayer strictly honor the declared type and prompt to download.
Fixed by registering the canonical audio/wav / audio/flac types before
any StaticFiles mount. (2) WaveSurfer's AudioContext is constructed at
component-mount time, i.e. before any user gesture, so on Linux Firefox/Chrome
and Android Chrome it stays suspended, decodeAudioData hangs, the
ready event never fires, and the play button never enables. macOS
Safari/Chrome auto-resume on first interaction. Fixed by patching
window.AudioContext to track every instance and resuming them on the first
pointerdown / keydown / touchstart, plus resuming inline on the play
click itself. The MIME fix has a backend regression test; the unlock path
has a Vitest unit test covering idempotency, post-unlock contexts, and
error isolation. (#510)
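
The MIME half of this fix can be sketched with the standard library alone. It assumes the static-file layer derives Content-Type from Python's `mimetypes` table, which is why the registration has to run before any mount; the function name is illustrative.

```python
import mimetypes

# Canonical types named in the notes, keyed by file extension.
CANONICAL_AUDIO_TYPES = {
    ".wav": "audio/wav",
    ".flac": "audio/flac",
}


def register_canonical_audio_types() -> None:
    """Override the default mapping so lookups return the canonical types."""
    for ext, mime in CANONICAL_AUDIO_TYPES.items():
        mimetypes.add_type(mime, ext)


# Call this once at startup, before any static-file mount is created.
register_canonical_audio_types()

if __name__ == "__main__":
    print(mimetypes.guess_type("clip.wav")[0])   # audio/wav
    print(mimetypes.guess_type("clip.flac")[0])  # audio/flac
```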
Voice Studio "Save design as profile" poisoned the profile with
"[object Object]" and then 400'd every generation ("Unsupported instruct
items found in [object Object]"). The save passed the instruct builder
object to the form instead of its string. Fixed at the source + defended with
a coercion helper; the engine now tolerates the sentinel, and a migration
heals already-saved profiles. (#550, #545, #542, #537, #530, #525)
Profile / persona / consent endpoints 500'd with "no such column:
consent_audio_path" (and the same class of error for kind/vd_states/…) after an
in-place upgrade. The Alembic migration existed but couldn't always apply
(stamped at a removed revision, or alembic not importable) and the failure was
swallowed. The runtime schema now self-heals: on startup it adds any missing
additive column from the canonical schema. (#552, #547)
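
A self-healing pass of this kind might look like the sketch below. It assumes a SQLite database (the error wording suggests one), and the table name, column definitions, and function name are hypothetical stand-ins for the project's canonical schema.

```python
import sqlite3
from typing import Dict, List

# Hypothetical canonical schema: table -> {additive column: DDL fragment}.
CANONICAL_COLUMNS: Dict[str, Dict[str, str]] = {
    "voice_profiles": {
        "consent_audio_path": "TEXT",
        "kind": "TEXT DEFAULT 'clone'",
    },
}


def heal_missing_columns(
    conn: sqlite3.Connection,
    canonical: Dict[str, Dict[str, str]] = CANONICAL_COLUMNS,
) -> List[str]:
    """Add any canonical column the live table lacks; return what was added."""
    added: List[str] = []
    for table, columns in canonical.items():
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
        if not existing:
            continue  # table absent: creating tables is out of scope here
        for name, ddl in columns.items():
            if name not in existing:
                # Identifiers come from the trusted constant above, not user input.
                conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{name}" {ddl}')
                added.append(f"{table}.{name}")
    conn.commit()
    return added


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE voice_profiles (id INTEGER PRIMARY KEY, name TEXT)")
    print(heal_missing_columns(db))  # first run adds both columns
    print(heal_missing_columns(db))  # second run is a no-op
```

Only additive columns are handled, so the pass is idempotent and safe to run on every startup.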
Stories: the global reading-speed slider was ignored by preview and stem
export. The #415 global speed only flowed through the full longform export;
per-segment preview and stem export still resolved a hardcoded track.speed || 1.0, so audio played at 1.0× even with the global set to e.g. 0.70×. A shared
effectiveSpeed(track, global) helper (per-line override → global → engine
default) now drives all three generation paths. (#508)
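
The precedence chain can be sketched in a few lines. The real helper lives in the frontend; this Python version only mirrors the stated order (per-line override, then global, then engine default), and treating non-positive values as unset is an assumption.

```python
from typing import Optional

ENGINE_DEFAULT_SPEED = 1.0  # assumed engine default


def effective_speed(track_speed: Optional[float], global_speed: Optional[float]) -> float:
    """Per-line override, then global setting, then engine default."""
    for candidate in (track_speed, global_speed):
        # Assumption: None or a non-positive value means "not set".
        if candidate is not None and candidate > 0:
            return candidate
    return ENGINE_DEFAULT_SPEED


if __name__ == "__main__":
    print(effective_speed(None, 0.70))   # global applies when the line has no override
    print(effective_speed(1.25, 0.70))   # per-line override wins
    print(effective_speed(None, None))   # falls through to the engine default
```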
Generate / Settings / Clone buttons were missing / unpressable on Linux.
The UI-scale fix round-trips correctly on Chromium, but older WebKitGTK treats
zoom as a layout no-op, leaving a ~23% black band that pushed the bottom CTAs
off-screen. The shell now probes the engine and fills the window when zoom
doesn't lay out. (#523, #524)
Settings tabs with little content rendered as a stunted box in a black
void (reported on Appearance). The page is now a flex column with a
min-height floor — short tabs fill the panel, tall tabs grow and scroll
exactly as before. The Appearance panel's previously hardcoded English
strings ("UI scale", "Color theme", "Font") were also routed through i18n,
per the localization rule. (#507)
The engine "Install" button 500'd with "No virtual environment found."
uv pip install now targets the running interpreter (--python sys.executable) instead of relying on a venv it couldn't auto-discover.
(#529, #527)
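
A sketch of how such an install command might be assembled. `--python` is uv's flag for choosing the target interpreter; the function name and the example package are illustrative only.

```python
import sys
from typing import List, Sequence


def build_install_command(packages: Sequence[str], python: str = sys.executable) -> List[str]:
    """Build a uv command that installs into the given interpreter's environment."""
    if not packages:
        raise ValueError("at least one package is required")
    # Naming the interpreter explicitly avoids relying on venv auto-discovery.
    return ["uv", "pip", "install", "--python", python, *packages]


if __name__ == "__main__":
    # "soundfile" is just an example package name.
    print(build_install_command(["soundfile"]))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` so a failed install surfaces as an exception.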
Transcription failed with "no segments" on GPUs without efficient float16.
Both CTranslate2 ASR backends now fall back float16 → int8 instead of crashing
at model load; a transcribe stream can no longer close without a terminal
error event; and an incomplete transformers install reports an actionable
message instead of "Could not import module 'AutoFeatureExtractor'".
(#551, #549, #516)
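
The float16 → int8 fallback can be sketched independently of any particular ASR library. The loader is passed in as a callable, and the set of exception types caught is an assumption about how an unsupported compute type is reported.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def load_with_fallback(
    load: Callable[[str], T],
    compute_types: Sequence[str] = ("float16", "int8"),
) -> Tuple[T, str]:
    """Try each compute type in order; return the model and the type that worked."""
    last_error: Optional[Exception] = None
    for compute_type in compute_types:
        try:
            return load(compute_type), compute_type
        except (ValueError, RuntimeError) as exc:  # assumed failure modes at load
            last_error = exc
    raise RuntimeError(
        f"no usable compute type among {list(compute_types)}"
    ) from last_error


if __name__ == "__main__":
    def fake_load(compute_type: str) -> str:
        # Stand-in for a real model constructor on a GPU without efficient float16.
        if compute_type == "float16":
            raise ValueError("float16 not efficiently supported on this device")
        return f"model[{compute_type}]"

    print(load_with_fallback(fake_load))
```

With a real backend, the callable would wrap the model constructor, for example a lambda that forwards the string as the constructor's `compute_type` argument.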
Audiobook import 500'd with "'AudiobookPlan' object has no attribute
'chapter_count'" for every format (.txt/.md/.epub/.pdf). (#543)
Windows: generated audio auto-played in a separate, un-closeable black
window. Renders now play in-app through the shared playback manager. (#532)
Cryptic video-download errors now carry actionable hints: an unsupported
link shape ("paste a direct video page, not a share/feed link") vs a transient
network drop ("just retry — the partial download was cleaned up"). (#554, #536)
A relocated, copied, or restored backend venv ("No module named
'encodings'") now self-heals (rebuilds once) instead of failing on every
launch.
The donate goal bar showed fabricated progress ($137.50 / $200, 23
sponsors). It now reflects the real figures ($10 / $200, 1 sponsor) in both the
runtime JSON and the TypeScript fallback. (#513)
The "Can't reach the local backend" startup-crash wave (pkg_resources
#248, scalar_fastapi #307, exit-106 broken venv) was fixed in v0.3.6 — this
release carries those fixes, so updating from v0.3.5 or older resolves them.

Changed

Version is now single-sourced from frontend/package.json. Five
hand-maintained literals drifting is exactly what shipped a 0.3.6 build that
called itself 0.3.5. package.json is canonical (vite already injects it as
__APP_VERSION__), tauri.conf.json reads its bundle version from it
("version": "../package.json"), and the remaining toolchain-required mirrors
(Cargo.toml, pyproject.toml, the frozen-backend fallback) are CI-guarded to
stay in lockstep. (#503)
Updater: the Preview channel actually tracks main again. It was stuck at
0.3.5-41 because its only build trigger was a manual dispatch; a nightly
rebuild now enforces "preview = main" (no-opping on days main didn't move).
Two latent hazards are closed: the preview release is re-asserted as a
prerelease every run (a non-prerelease preview could hijack the Stable
channel's "Latest"), and its manifest can no longer silently drop the
Intel-Mac (darwin-x86_64) target. (#500)

Internal

The frozen desktop backend reported 0.3.5 regardless of its real version.
In a synced env, core.version.APP_VERSION resolves from package metadata
(correct, so CI stayed green), but the PyInstaller-frozen build has no
.dist-info, hit PackageNotFoundError, and fell back to a hardcoded literal.
The spec now bundles omnivoice metadata so the primary path works frozen too,
and the resolution chain is metadata → pyproject → named fallback. This also
fixes About → Version rendering blank in the web/Pinokio build (no Tauri,
backend idle), which now falls back to the build-time version. (#501)

Linux x64 artifacts

487c19c1915e467c35514293a9a8808b17bf1bc35c035cfd7055d72303d6bd8e  OmniVoice Studio_0.3.7_amd64.AppImage
57e04d9f27581eb6ed1f49c739074087641ca4e61b9feb070141fe53e4f0415c  OmniVoice Studio_0.3.7_amd64.AppImage.sig

Windows x64 artifacts

a40154f754719ab780c8b880be444f755d736bfa7a739454c861731817d80eaa *OmniVoice Studio_0.3.7_x64_en-US.msi
8151f634524501616b29aa99a1043d575fd52227d3330a0e3a434975cd3bc451 *OmniVoice Studio_0.3.7_x64_en-US.msi.sig

macOS Intel artifacts

fceb715e0333b51abfd8f6c4e1a1125ec149a09d8ac9c2d840ba941d4fa6d12c  OmniVoice Studio_0.3.7_x64.dmg
6e79c2eae746d2903f37f01e0fc3e6758a77472697b50bddfb316036b9cae08d  OmniVoice Studio.app.tar.gz
92a20548d5fb2ace4347f7d073bcf01b8d620416efb566133a7136d351f2e355  OmniVoice Studio.app.tar.gz.sig
