github debpalash/OmniVoice-Studio v0.3.7
OmniVoice Studio v0.3.7

A stabilization release that clears the wave of issues reported on the 0.3.6
line (across voice design, dubbing, transcription, install, and the Linux/web
UI) and lands two more opt-in cloning engines. The throughline is non-English
correctness and cross-platform playback: cloned and designed voices now hold
their language end-to-end, and audio plays inline in Linux/Android browsers,
not just macOS. It also carries the v0.3.6 startup-crash fixes, so anyone still
hitting "Can't reach the local backend" on v0.3.5/v0.3.6 only needs to update.

Added

  • Two opt-in heavyweight TTS engines: MOSS-TTS-v1.5 (8B) and dots.tts (2B).
    Both are zero-shot voice-cloning engines, each running in its own isolated
    subprocess venv (they pin a transformers version that conflicts with the
    parent's >=5.3 — MOSS ==5.0, dots.tts ==4.57) via the same dedicated-venv
    pattern as IndexTTS-2, so they can't disturb the default install or its
    lockfile. Point OMNIVOICE_MOSS_TTS_V15_DIR / OMNIVOICE_DOTS_TTS_DIR at a
    local clone to enable. CUDA/CPU only — neither claims Apple-Silicon MPS, and
    dots.tts is gated off on Windows (upstream is Linux/macOS only). See
    docs/engines/moss-tts-v15.md and
    docs/engines/dots-tts.md. (#498)
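
A minimal sketch of how the opt-in gating described above could look, assuming an engine counts as enabled only when its environment variable points at an existing directory. The function name enabled_engines and the BLOCKED_PLATFORMS table are hypothetical; only the two variable names and the Windows restriction on dots.tts come from the notes.

```python
import os
import sys
from typing import Mapping

# Environment variable per opt-in engine (names taken from the release notes).
ENGINE_ENV_VARS = {
    "moss-tts-v1.5": "OMNIVOICE_MOSS_TTS_V15_DIR",
    "dots.tts": "OMNIVOICE_DOTS_TTS_DIR",
}

# Assumption: sys.platform prefixes on which an engine is gated off.
BLOCKED_PLATFORMS = {"dots.tts": ("win32",)}


def enabled_engines(env: Mapping[str, str] = os.environ,
                    platform: str = sys.platform) -> list:
    """Return the opt-in engines whose env var points at an existing directory."""
    enabled = []
    for engine, var in ENGINE_ENV_VARS.items():
        path = env.get(var, "")
        if not path or not os.path.isdir(path):
            continue  # not configured: the engine stays off
        if platform.startswith(BLOCKED_PLATFORMS.get(engine, ())):
            continue  # gated off on this platform
        enabled.append(engine)
    return enabled
```

Passing env and platform as parameters keeps the check easy to exercise without touching the real process environment.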

Fixed

  • Non-English voices drifted to English / the wrong language. Three
    independent root causes, all in the language path: (1) a voice profile's
    stored language was never read back into generation, so a German archetype
    that previewed in German generated in English (the preview passed the
    language; the user's Generate call didn't); (2) the audiobook/longform synth
    hardcoded language=None, letting the engine re-autodetect per chunk so a
    non-English clone could flip language mid-render on short/ambiguous lines; and
    (3) the duration estimator weighted Unicode combining marks at zero, so
    decomposed (NFD) diacritic text — common for Vietnamese — under-allocated
    frames and came out rushed. The profile/request language is now threaded
    through both the single-shot and longform paths (request wins, profile fills
    the gap), and text is NFC-normalized before duration estimation. Each fix has
    a fail-before/pass-after regression test. (#533, #505, #502)
  • Audio playback on Linux Firefox/Chrome and Android Chrome. Two separate
    root causes both masquerade as "the play button doesn't work" on non-macOS
    browsers — and both are invisible when developing on macOS, which is why they
    shipped. (1) The backend served .wav / .flac with Python's default
    audio/x-wav / audio/x-flac (vendor-experimental, never IANA-registered);
    macOS CoreAudio MIME-sniffs leniently and plays anyway, but Linux FFmpeg and
    Android ExoPlayer strictly honor the declared type and prompt to download.
    Fixed by registering the canonical audio/wav / audio/flac types before
    any StaticFiles mount. (2) WaveSurfer's AudioContext is constructed at
    component-mount time — i.e. before any user gesture — so on Linux FF/Chrome
    and Android Chrome it stays suspended, decodeAudioData hangs, the
    ready event never fires, and the play button never enables. macOS
    Safari/Chrome auto-resume on first interaction. Fixed by patching
    window.AudioContext to track every instance and resuming them on the first
    pointerdown / keydown / touchstart, plus resuming inline on the play
    click itself. The MIME fix has a backend regression test; the unlock path
    has a Vitest unit test covering idempotency, post-unlock contexts, and
    error isolation. (#510)
  • Voice Studio "Save design as profile" poisoned the profile with
    "[object Object]" and then 400'd every generation ("Unsupported instruct
    items found in [object Object]"). The save passed the instruct builder
    object to the form instead of its string. Fixed at the source and defended
    with a coercion helper; the engine now tolerates the sentinel, and a
    migration heals already-saved profiles. (#550, #545, #542, #537, #530, #525)
  • Profile / persona / consent endpoints 500'd with "no such column:
    consent_audio_path" (and the same class for kind/vd_states/…) after an
    in-place upgrade. The alembic migration existed but couldn't always apply
    (stamped at a removed revision, or alembic not importable) and the failure was
    swallowed. The runtime schema now self-heals: on startup it ADDs any missing
    additive column from the canonical schema. (#552, #547)
  • Stories: the global reading-speed slider was ignored by preview and stem
    export. The #415 global speed only flowed through the full longform export;
    per-segment preview and stem export still resolved a hardcoded
    track.speed || 1.0, so audio played at 1.0× even with the global set to e.g.
    0.70×. A shared effectiveSpeed(track, global) helper (per-line override →
    global → engine default) now drives all three generation paths. (#508)
  • Generate / Settings / Clone buttons were missing / unpressable on Linux.
    The UI-scale fix round-trips correctly on Chromium, but older WebKitGTK treats
    zoom as a layout no-op, leaving a ~23% black band that pushed the bottom CTAs
    off-screen. The shell now probes the engine and fills the window when zoom
    doesn't lay out. (#523, #524)
  • Settings tabs with little content rendered as a stunted box in a black
    void (reported on Appearance). The page is now a flex column with a
    min-height floor: short tabs fill the panel, tall tabs grow and scroll
    exactly as before. The Appearance panel's previously hardcoded English
    strings ("UI scale", "Color theme", "Font") were also routed through i18n,
    per the localization rule. (#507)
  • The engine "Install" button 500'd with "No virtual environment found."
    uv pip install now targets the running interpreter (--python
    sys.executable) instead of relying on a venv it couldn't auto-discover.
    (#529, #527)
  • Transcription failed with "no segments" on GPUs without efficient float16.
    Both CTranslate2 ASR backends now fall back float16 → int8 instead of crashing
    at model load; a transcribe stream can no longer close without a terminal
    error event; and an incomplete transformers install reports an actionable
    message instead of "Could not import module 'AutoFeatureExtractor'".
    (#551, #549, #516)
  • Audiobook import 500'd with "'AudiobookPlan' object has no attribute
    'chapter_count'" for every format (.txt/.md/.epub/.pdf). (#543)
  • Windows: generated audio auto-played in a separate, un-closeable black
    window. Renders now play in-app through the shared playback manager. (#532)
  • Cryptic video-download errors now carry actionable hints: an unsupported
    link shape ("paste a direct video page, not a share/feed link") vs a transient
    network drop ("just retry — the partial download was cleaned up"). (#554, #536)
  • A relocated, copied, or restored backend venv ("No module named
    'encodings'") now self-heals (rebuilds once) instead of failing on every
    launch.
  • The donate goal bar showed fabricated progress ($137.50 / $200, 23
    sponsors). It now reflects the real figures ($10 / $200, 1 sponsor) in both the
    runtime JSON and the TypeScript fallback. (#513)
  • The "Can't reach the local backend" startup-crash wave (pkg_resources
    #248, scalar_fastapi #307, exit-106 broken venv) was fixed in v0.3.6 — this
    release carries those fixes, so updating from v0.3.5/older resolves them.
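
For the non-English language fix in the list above, the precedence rule (request wins, profile fills the gap) and the normalization step can be sketched as follows. This is an illustration under assumptions, not the project's code: resolve_language and prepare_text are hypothetical names, and the real duration estimator is not reproduced here.

```python
import unicodedata
from typing import Optional


def resolve_language(request_lang: Optional[str],
                     profile_lang: Optional[str]) -> Optional[str]:
    """Request wins; the profile fills the gap; None leaves autodetect on."""
    return request_lang or profile_lang or None


def prepare_text(text: str) -> str:
    """NFC-normalize so composed and decomposed input reach the estimator alike."""
    return unicodedata.normalize("NFC", text)


# Decomposed (NFD) Vietnamese has more code points than its composed form,
# because each diacritic becomes a separate combining mark.
composed = unicodedata.normalize("NFC", "Tiếng Việt")
decomposed = unicodedata.normalize("NFD", composed)
print(len(composed), len(decomposed), prepare_text(decomposed) == composed)
```

Resolving the language once and passing the same value to both the single-shot and longform paths is what keeps a per-chunk autodetect from flipping language mid-render.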
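
The MIME half of the playback fix can be illustrated with Python's standard mimetypes module. This is a sketch that assumes the static file responses derive their Content-Type from that module, which is why the registration has to run before any static mount is created; guessed_audio_type is a hypothetical helper used only to show the effect.

```python
import mimetypes
from typing import Optional

# Register the types named in the notes so they override any legacy
# audio/x-wav or audio/x-flac mapping. Run this at import time, before the
# application mounts its static directories.
mimetypes.add_type("audio/wav", ".wav")
mimetypes.add_type("audio/flac", ".flac")


def guessed_audio_type(filename: str) -> Optional[str]:
    """Return the Content-Type that mimetypes now reports for a file name."""
    return mimetypes.guess_type(filename)[0]
```

A regression test along these lines only needs to assert the guessed type for a .wav and a .flac file name.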
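
The self-healing schema behaviour can be sketched as below, assuming a SQLite database (the "no such column" message matches SQLite's wording, but the notes do not name the database) and a canonical schema expressed as a plain dictionary. CANONICAL_SCHEMA, its column types, and heal_schema are hypothetical.

```python
import sqlite3

# Assumption: canonical schema as table -> {column name: SQL type}.
CANONICAL_SCHEMA = {
    "profiles": {
        "id": "INTEGER",
        "name": "TEXT",
        "consent_audio_path": "TEXT",
        "kind": "TEXT",
    },
}


def heal_schema(conn: sqlite3.Connection, schema=CANONICAL_SCHEMA) -> list:
    """ADD any column the canonical schema has but the live table lacks."""
    added = []
    for table, columns in schema.items():
        # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
        existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        if not existing:
            continue  # table is absent; creating tables is out of scope here
        for name, sql_type in columns.items():
            if name not in existing:
                conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
                added.append(f"{table}.{name}")
    conn.commit()
    return added
```

Because it only ever adds columns, the routine is idempotent: a second run on a healed database adds nothing.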
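
The reading-speed precedence from the Stories fix (per-line override → global → engine default) is small enough to show directly. This is a Python rendering of the rule rather than the project's effectiveSpeed helper, whose signature takes a track object; the 1.0 engine default is an assumption taken from the old hardcoded fallback.

```python
from typing import Optional

ENGINE_DEFAULT_SPEED = 1.0  # assumption: mirrors the old hardcoded fallback


def effective_speed(track_speed: Optional[float],
                    global_speed: Optional[float]) -> float:
    """Per-line override wins, then the global slider, then the engine default."""
    if track_speed is not None:
        return track_speed
    if global_speed is not None:
        return global_speed
    return ENGINE_DEFAULT_SPEED
```

For example, effective_speed(None, 0.70) yields 0.70, where the old track.speed || 1.0 expression skipped the global value and played at 1.0×.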

Changed

  • Version is now single-sourced from frontend/package.json. Five
    hand-maintained literals drifting is exactly what shipped a 0.3.6 build that
    called itself 0.3.5. package.json is canonical (vite already injects it as
    __APP_VERSION__), tauri.conf.json reads its bundle version from it
    ("version": "../package.json"), and the remaining toolchain-required mirrors
    (Cargo.toml, pyproject.toml, the frozen-backend fallback) are CI-guarded to
    stay in lockstep. (#503)
  • Updater: the Preview channel actually tracks main again. It was stuck at
    0.3.5-41 because its only build trigger was a manual dispatch; a nightly
    rebuild now enforces "preview = main" (no-opping on days main didn't move).
    Two latent hazards are closed: the preview release is re-asserted as a
    prerelease every run (a non-prerelease preview could hijack the Stable
    channel's "Latest"), and its manifest can no longer silently drop the
    Intel-Mac (darwin-x86_64) target. (#500)

Internal

  • The frozen desktop backend reported 0.3.5 regardless of its real version.
    In a synced env, core.version.APP_VERSION resolves from package metadata
    (correct, so CI stayed green), but the PyInstaller-frozen build has no
    .dist-info, hit PackageNotFoundError, and fell back to a hardcoded literal.
    The spec now bundles omnivoice metadata so the primary path works frozen too,
    and the resolution chain is metadata → pyproject → named fallback. This also
    fixes About → Version rendering blank in the web/Pinokio build (no Tauri,
    backend idle), which now falls back to the build-time version. (#501)
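
The resolution chain metadata → pyproject → named fallback can be sketched with the standard library. This is an illustration under assumptions: resolve_version and FALLBACK_VERSION are hypothetical names, and the regex is a deliberately naive stand-in for a real TOML parser.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path

FALLBACK_VERSION = "0.0.0+unknown"  # hypothetical named fallback


def resolve_version(dist_name: str, pyproject: Path) -> str:
    """Resolve metadata -> pyproject -> named fallback, in that order."""
    try:
        # Primary path: installed package metadata (.dist-info).
        return version(dist_name)
    except PackageNotFoundError:
        pass  # e.g. a frozen build that ships no metadata
    try:
        text = pyproject.read_text(encoding="utf-8")
    except OSError:
        return FALLBACK_VERSION  # no pyproject.toml available either
    # Naive match on the first top-of-line `version = "..."` entry.
    match = re.search(r'^version\s*=\s*"([^"]+)"', text, re.MULTILINE)
    return match.group(1) if match else FALLBACK_VERSION
```

Bundling the package metadata into the frozen build, as the notes describe, means the first branch succeeds and the later branches act purely as safety nets.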

Linux x64 artifacts

487c19c1915e467c35514293a9a8808b17bf1bc35c035cfd7055d72303d6bd8e  OmniVoice Studio_0.3.7_amd64.AppImage
57e04d9f27581eb6ed1f49c739074087641ca4e61b9feb070141fe53e4f0415c  OmniVoice Studio_0.3.7_amd64.AppImage.sig

Windows x64 artifacts

a40154f754719ab780c8b880be444f755d736bfa7a739454c861731817d80eaa *OmniVoice Studio_0.3.7_x64_en-US.msi
8151f634524501616b29aa99a1043d575fd52227d3330a0e3a434975cd3bc451 *OmniVoice Studio_0.3.7_x64_en-US.msi.sig

macOS Intel artifacts

fceb715e0333b51abfd8f6c4e1a1125ec149a09d8ac9c2d840ba941d4fa6d12c  OmniVoice Studio_0.3.7_x64.dmg
6e79c2eae746d2903f37f01e0fc3e6758a77472697b50bddfb316036b9cae08d  OmniVoice Studio.app.tar.gz
92a20548d5fb2ace4347f7d073bcf01b8d620416efb566133a7136d351f2e355  OmniVoice Studio.app.tar.gz.sig
