github jamiepine/voicebox v0.1.0

Voicebox v0.1.0

The first public release of Voicebox — an open-source voice synthesis studio powered by Qwen3-TTS.


Download

Platform                Status
macOS (Apple Silicon)   Available
macOS (Intel)           Available
Windows (x64)           Available
Linux                   Coming soon*

*Linux builds are delayed due to GitHub Actions CI issues. We're working on it and will release Linux support in v0.1.1.


What's in this release

Voice Cloning with Qwen3-TTS

Clone any voice from just a few seconds of audio using Alibaba's Qwen3-TTS model.

  • Automatic model download — Models download from HuggingFace on first use
  • Multiple model sizes — Support for 1.7B and 0.6B parameter models
  • Voice prompt caching — Regenerate instantly without reprocessing audio
  • Multi-language — English and Chinese support
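
To illustrate the idea behind voice prompt caching, here is a minimal sketch, not Voicebox's actual implementation. It assumes the expensive step (turning reference audio and its transcript into a voice prompt) can be keyed by a hash of its inputs. The class name `VoicePromptCache` and the `build_prompt` callable are hypothetical stand-ins.

```python
import hashlib
from typing import Callable, Dict


class VoicePromptCache:
    """Hypothetical in-memory cache keyed by a hash of reference audio and text."""

    def __init__(self, build_prompt: Callable[[bytes, str], object]) -> None:
        # build_prompt stands in for the expensive audio-processing step.
        self._build_prompt = build_prompt
        self._cache: Dict[str, object] = {}
        self.misses = 0  # counts how often the expensive step actually ran

    @staticmethod
    def _key(audio: bytes, reference_text: str) -> str:
        # Hash both inputs; the separator byte keeps the two fields distinct.
        digest = hashlib.sha256()
        digest.update(audio)
        digest.update(b"\x00")
        digest.update(reference_text.encode("utf-8"))
        return digest.hexdigest()

    def get(self, audio: bytes, reference_text: str) -> object:
        key = self._key(audio, reference_text)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._build_prompt(audio, reference_text)
        return self._cache[key]
```

With a cache like this, regenerating speech from the same samples reuses the stored prompt, and only new or changed audio triggers reprocessing.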

Voice Profile Management

  • Create profiles from audio files or record directly in the app
  • Multiple samples per profile — Combine samples for higher quality cloning
  • Import/Export — Share profiles or back them up
  • Automatic transcription — Whisper extracts reference text from samples

Speech Generation

  • Simple text-to-speech — Select a profile, type text, generate
  • Seed control — Reproducible generations with optional seed input
  • Long-form support — Generate up to 5,000 characters at once
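
The seed and length behavior listed above can be sketched as follows. This is an illustration under stated assumptions, not the Voicebox API: `GenerationRequest`, `MAX_CHARS`, and `sample_noise` are hypothetical names, and Python's `random.Random` stands in for the model's sampling step.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

MAX_CHARS = 5000  # per-request limit stated in the release notes


@dataclass
class GenerationRequest:
    """Hypothetical request shape: a profile, the text, and an optional seed."""

    profile_id: str
    text: str
    seed: Optional[int] = None

    def validate(self) -> None:
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if len(self.text) > MAX_CHARS:
            raise ValueError(f"text exceeds {MAX_CHARS} characters")


def sample_noise(request: GenerationRequest, n: int = 4) -> List[float]:
    """Stand-in for model sampling: the same seed yields the same draws."""
    request.validate()
    # With seed=None, random.Random seeds itself from a nondeterministic source.
    rng = random.Random(request.seed)
    return [rng.random() for _ in range(n)]
```

Two requests with the same text and the same seed produce identical draws here, which is the property that makes a seeded generation reproducible. Leaving the seed unset gives a fresh result each time.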

Generation History

  • Full history — Every generation is saved with metadata
  • Search — Find past generations by text content
  • Inline playback — Listen without leaving the app
  • Download — Export audio files to your system
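
Since the release lists SQLite as its database, a history store with text search might look roughly like the sketch below. The table name, columns, and function names are assumptions made for illustration. The real schema is not described in these notes.

```python
import sqlite3
from typing import List, Optional, Tuple


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical generations table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS generations (
               id INTEGER PRIMARY KEY,
               profile TEXT NOT NULL,
               text TEXT NOT NULL,
               seed INTEGER,
               created_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def save_generation(
    conn: sqlite3.Connection, profile: str, text: str, seed: Optional[int] = None
) -> int:
    """Store one generation with its metadata and return its row id."""
    cur = conn.execute(
        "INSERT INTO generations (profile, text, seed) VALUES (?, ?, ?)",
        (profile, text, seed),
    )
    conn.commit()
    return cur.lastrowid


def search_history(conn: sqlite3.Connection, query: str) -> List[Tuple[int, str, str]]:
    """Find generations whose text contains the query as a literal substring."""
    # Escape LIKE wildcards so user input such as "100%" is matched literally.
    escaped = query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT id, profile, text FROM generations "
        "WHERE text LIKE ? ESCAPE '\\' ORDER BY id",
        (f"%{escaped}%",),
    )
    return rows.fetchall()
```

SQLite's `LIKE` is case-insensitive for ASCII characters by default, so a search for "hello" also matches "Hello world". A larger history could use SQLite's FTS5 extension instead of a substring scan.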

Flexible Deployment

  • Local mode — Backend runs alongside the desktop app
  • Remote mode — Connect to a GPU server on your network
  • One-click server — Turn any machine into a Voicebox server

Desktop Experience

  • Native performance — Built with Tauri (Rust), not Electron
  • Cross-platform — Same experience on macOS and Windows
  • Bundled backend — No Python installation required

Tech Stack

  • Desktop: Tauri v2 (Rust)
  • Frontend: React, TypeScript, Tailwind CSS
  • Backend: FastAPI (Python)
  • Voice Model: Qwen3-TTS
  • Transcription: Whisper
  • Database: SQLite

Known Issues

  • First launch is slow — Model downloads (2-7GB) on first use
  • Apple Silicon performance — Generation takes ~10s per paragraph on M1/M2 chips; CUDA is significantly faster
  • Linux not available — CI pipeline issues; coming in v0.1.1

What's Next

We're already working on the next release. Here's a preview:

  • Linux support — Top priority
  • Real-time synthesis — Stream audio as it generates
  • Voice effects — Pitch shift, reverb, and more
  • Timeline editor — Word-level precision audio editing
  • Conversation mode — Multi-speaker dialogue generation
  • More models — XTTS, Bark, and other open-source voice models

Feedback

Found a bug? Have a feature request? Open an issue on GitHub or reach out at voicebox.sh.


Thank you for trying Voicebox!

P.S. This was originally released yesterday. Note to self: don't let Claude manage GitHub tags with bypass permissions turned on.
