github jamiepine/voicebox v0.3.0

This release rewrites the backend into a modular architecture, overhauls the settings UI into routed sub-pages, fixes audio player freezing, migrates documentation to Fumadocs, and ships a batch of bug fixes targeting the most-reported issues from the tracker.

The backend's roughly 3,000-line monolithic main.py has been decomposed into domain routers, a services layer, and a proper database package. A style guide and ruff configuration now enforce consistency. On the frontend, settings have been split into dedicated routed pages with server logs, a changelog viewer, and an about page. The audio player no longer freezes mid-playback, and model loading status is now visible in the UI. Seven user-reported issues have been fixed, including server crashes during sample uploads, stale generation lists, cryptic error messages, and missing CUDA support for RTX 50-series GPUs.

Settings Overhaul (#294)

  • Split settings into routed sub-tabs: General, Generation, GPU, Logs, Changelog, About
  • Added live server log viewer with auto-scroll
  • Added in-app changelog page that parses CHANGELOG.md at build time
  • Added About page with version info, license, and generation folder quick-open
  • Extracted reusable SettingRow component for consistent setting layouts

Audio Player Fix (#293)

  • Fixed audio player freezing during playback
  • Improved playback UX with better state management and listener cleanup
  • Fixed restart race condition during regeneration
  • Added stable keys for audio element re-rendering
  • Improved accessibility across player controls

Backend Refactor (#285)

  • Extracted all routes from main.py into 13 domain routers under backend/routes/; main.py dropped from ~3,100 lines to ~10
  • Moved CRUD and service modules into backend/services/, platform detection into backend/utils/
  • Split monolithic database.py into a database/ package with separate models, session, migrations, and seed modules
  • Added backend/STYLE_GUIDE.md and pyproject.toml with ruff linting config
  • Removed dead code: unused _get_cuda_dll_excludes, stale studio.py, example_usage.py, old Makefile
  • Deduplicated shared logic across TTS backends into backends/base.py
  • Improved startup logging with version, platform, data directory, and database stats
  • Fixed startup database session leak: sessions now roll back and close in a finally block
  • Isolated shutdown unload calls so one backend failure doesn't block the others
  • Handled null duration in story_items migration
  • Rejected model migration when the target is a subdirectory of the source cache
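
Two of the fixes in this list follow common patterns that are easy to show in isolation: rejecting a migration target that sits inside the source cache, and isolating shutdown unload calls so one failure does not stop the rest. The following is a minimal sketch, not the project's actual code. The names is_inside and unload_all are hypothetical, and treating "target equals source" as a rejection is an assumption of this sketch.

```python
import logging
from pathlib import Path
from typing import Callable, Iterable

logger = logging.getLogger(__name__)


def is_inside(target: Path, source: Path) -> bool:
    """Return True if target is source itself or lies anywhere beneath it.

    Hypothetical helper: a migration would be rejected when this is True,
    because moving a cache into its own subtree cannot complete safely.
    """
    target, source = target.resolve(), source.resolve()
    return target == source or source in target.parents


def unload_all(unloaders: Iterable[Callable[[], None]]) -> int:
    """Call every unloader and return how many of them failed.

    Each call has its own try/except, so an exception in one backend
    is logged and the remaining backends are still unloaded.
    """
    failures = 0
    for unload in unloaders:
        try:
            unload()
        except Exception:
            logger.exception("backend unload failed")
            failures += 1
    return failures
```

Comparing resolved paths through Path.parents avoids the false positive a plain string-prefix check would give for sibling directories such as models and models2.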

Documentation Rewrite (#288)

  • Migrated docs site from Mintlify to Fumadocs (Next.js-based)
  • Rewrote introduction and root page with content from README
  • Added "Edit on GitHub" links and last-updated timestamps on all pages
  • Generated OpenAPI spec and auto-generated API reference pages
  • Removed stale planning docs (CUDA_BACKEND_SWAP, EXTERNAL_PROVIDERS, MLX_AUDIO, TTS_PROVIDER_ARCHITECTURE, etc.)
  • Sidebar groups now expand by default; root redirects to /docs
  • Added OG image metadata and /og preview page

UI & Frontend

  • Added model loading status indicator and effects preset dropdown (3187344)
  • Fixed take-label race condition during regeneration
  • Added accessible focus styling to select component
  • Softened select focus indicator opacity
  • Addressed 4 critical and 12 major issues from CodeRabbit review

Bug Fixes (#295)

  • Fixed sample uploads crashing the server — audio decoding now runs in a thread pool instead of blocking the async event loop (#278)
  • Fixed generation list not updating when a generation completes — switched to refetchQueries for reliable cache busting, added SSE error fallback, and page reset on completion (#231)
  • Fixed error toasts showing [object Object] instead of the actual error message (#290)
  • Added Whisper model selection (base, small, medium, large, turbo) and expanded language support to the /transcribe endpoint (#233)
  • Upgraded CUDA backend build from cu121 to cu126 for RTX 50-series (Blackwell) GPU support (#289)
  • Handled client disconnects in SSE and streaming endpoints to suppress [Errno 32] Broken Pipe errors (#248)
  • Fixed Docker build failure from pip hash mismatch on Qwen3-TTS dependencies (#286)
  • Added 50 MB upload size limit with chunked reads to prevent unbounded memory allocation on sample uploads
  • Eliminated redundant double audio decode in sample processing pipeline
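
The upload fixes above combine two ideas: cap the upload size while reading in chunks, and keep blocking work off the async event loop. The following is a rough sketch of that combination using only the standard library. It is not the project's implementation; read_limited, decode_audio, handle_upload, and the 1 MB chunk size are all assumptions made for illustration, and decode_audio is a stand-in for a real decoder.

```python
import asyncio
import io
from typing import BinaryIO

MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # the 50 MB limit named in the notes
CHUNK_SIZE = 1024 * 1024             # assumed chunk size, not from the source


class UploadTooLarge(Exception):
    """Raised when an upload exceeds the configured limit."""


def read_limited(stream: BinaryIO, limit: int = MAX_UPLOAD_BYTES,
                 chunk_size: int = CHUNK_SIZE) -> bytes:
    """Read the stream chunk by chunk, failing as soon as the limit is passed."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > limit:
            # Stop early so memory use is bounded by the limit plus one chunk.
            raise UploadTooLarge(f"upload exceeds {limit} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


def decode_audio(data: bytes) -> int:
    """Placeholder for a blocking decoder; it only returns the byte count."""
    return len(data)


async def handle_upload(stream: BinaryIO, limit: int = MAX_UPLOAD_BYTES) -> int:
    """Read with a size cap, then decode, both in the default thread pool."""
    data = await asyncio.to_thread(read_limited, stream, limit)
    return await asyncio.to_thread(decode_audio, data)
```

asyncio.to_thread requires Python 3.9 or later. While the worker thread reads and decodes, the event loop stays free to serve other requests.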

Platform Fixes

  • Replaced netstat with TcpStream + PowerShell for Windows port detection (#277)
  • Fixed Docker frontend build and cleaned up Docker docs
  • Fixed macOS download links to use .dmg instead of .app.tar.gz
  • Added dynamic download redirect routes to landing site

Release Tooling

  • Added draft-release-notes and release-bump agent skills
  • Wired CI release workflow to extract notes from CHANGELOG.md for GitHub Releases
  • Backfilled changelog with all historical releases
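
Extracting one release's notes from CHANGELOG.md can be sketched as below. This is an illustration only: the function name extract_release_notes is hypothetical, and the heading layout it expects ("## [0.3.0] - date" or "## v0.3.0") is an assumption, since the notes do not describe the changelog's format or how the CI workflow parses it.

```python
import re


def extract_release_notes(changelog: str, version: str) -> str:
    """Return the body of the section whose level-2 heading names `version`.

    Assumes each release starts with a heading such as
    "## [0.3.0] - <date>" or "## v0.3.0" and runs until the next "## " heading.
    """
    heading = re.compile(
        r"^##\s+\[?v?" + re.escape(version) + r"\]?(?=\s|$).*$",
        re.MULTILINE,
    )
    match = heading.search(changelog)
    if match is None:
        raise ValueError(f"version {version} not found in changelog")
    rest = changelog[match.end():]
    # "### Added" style subsections stay in the body; only "## " ends it.
    next_heading = re.search(r"^##\s", rest, re.MULTILINE)
    body = rest[: next_heading.start()] if next_heading else rest
    return body.strip()
```

Raising on a missing version, rather than returning an empty string, would let a release workflow fail loudly instead of publishing a release with blank notes.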
