jamiepine/voicebox v0.1.0
on GitHub

one month ago

Voicebox v0.1.0

The first public release of Voicebox — an open-source voice synthesis studio powered by Qwen3-TTS.

Download

Platform	Status
macOS (Apple Silicon)	Available
macOS (Intel)	Available
Windows (x64)	Available
Linux	Coming soon*

*Linux builds are delayed due to GitHub Actions CI issues. We're working on it and will release Linux support in v0.1.1.

What's in this release

Voice Cloning with Qwen3-TTS

Clone any voice from just a few seconds of audio using Alibaba's Qwen3-TTS model.

Automatic model download — Models download from HuggingFace on first use
Multiple model sizes — Support for 1.7B and 0.6B parameter models
Voice prompt caching — Regenerate instantly without reprocessing audio
Multi-language — English and Chinese support
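
As a rough illustration of how voice prompt caching could work, the sketch below keys a cache on a hash of the reference audio plus its transcript, so the expensive prompt-building step runs once per unique sample. This is not taken from the Voicebox source: `VoicePromptCache` and `build_prompt` are hypothetical names, and the real implementation may cache differently.

```python
import hashlib
from typing import Callable, Dict


class VoicePromptCache:
    """In-memory cache keyed by a hash of the reference audio and its transcript.

    Hypothetical sketch; not the Voicebox implementation.
    """

    def __init__(self, build_prompt: Callable[[bytes, str], object]) -> None:
        # build_prompt stands in for the expensive step that turns
        # reference audio into a reusable voice prompt (an assumption).
        self._build_prompt = build_prompt
        self._entries: Dict[str, object] = {}
        self.misses = 0

    @staticmethod
    def key_for(audio: bytes, reference_text: str) -> str:
        # Hash audio and transcript together; the NUL byte separates the two parts.
        digest = hashlib.sha256()
        digest.update(audio)
        digest.update(b"\x00")
        digest.update(reference_text.encode("utf-8"))
        return digest.hexdigest()

    def get(self, audio: bytes, reference_text: str) -> object:
        key = self.key_for(audio, reference_text)
        if key not in self._entries:
            # Cache miss: build the prompt once and remember it.
            self.misses += 1
            self._entries[key] = self._build_prompt(audio, reference_text)
        return self._entries[key]
```

With a cache like this, regenerating with the same samples skips the audio processing step, and changing either the audio or the transcript produces a new cache entry.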

Voice Profile Management

Create profiles from audio files or record directly in the app
Multiple samples per profile — Combine samples for higher quality cloning
Import/Export — Share profiles or back them up
Automatic transcription — Whisper extracts reference text from samples

Speech Generation

Simple text-to-speech — Select a profile, type text, generate
Seed control — Reproducible generations with optional seed input
Long-form support — Generate up to 5,000 characters at once
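
The seed and length behaviour described above can be sketched as follows. This is a minimal illustration under assumptions, not the Voicebox backend: `validate_text`, `resolve_seed`, and `fake_generate` are hypothetical helpers, and a seeded `random.Random` stands in for the model's sampling. Only the 5,000-character limit comes from these notes.

```python
import random
from typing import List, Optional, Tuple

MAX_CHARS = 5000  # limit stated in the release notes


def validate_text(text: str) -> str:
    """Reject empty input and input over the documented character limit."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("text must not be empty")
    if len(cleaned) > MAX_CHARS:
        raise ValueError(f"text is {len(cleaned)} characters; limit is {MAX_CHARS}")
    return cleaned


def resolve_seed(seed: Optional[int] = None) -> int:
    """Return the caller's seed, or draw one so it can be stored and replayed."""
    if seed is None:
        return random.SystemRandom().randrange(2**32)
    return seed


def fake_generate(text: str, seed: Optional[int] = None) -> Tuple[int, List[float]]:
    """Stand-in for the model: the sampled values depend only on the seed."""
    validate_text(text)
    used_seed = resolve_seed(seed)
    rng = random.Random(used_seed)
    samples = [rng.random() for _ in range(4)]
    return used_seed, samples
```

Returning the seed that was actually used, even when none was supplied, lets a generation be reproduced later from its saved metadata.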

Generation History

Full history — Every generation is saved with metadata
Search — Find past generations by text content
Inline playback — Listen without leaving the app
Download — Export audio files to your system
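
Since the tech stack lists SQLite as the database, a history search by text content might look roughly like the sketch below. The `generations` table, its columns, and the helper functions are assumptions made for illustration; the actual Voicebox schema is not described in these notes.

```python
import sqlite3
from typing import List, Optional, Tuple


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) history table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS generations ("
        " id INTEGER PRIMARY KEY,"
        " profile TEXT NOT NULL,"
        " text TEXT NOT NULL,"
        " seed INTEGER,"
        " created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def save_generation(conn: sqlite3.Connection, profile: str, text: str,
                    seed: Optional[int] = None) -> int:
    """Store one generation with its metadata and return its row id."""
    cur = conn.execute(
        "INSERT INTO generations (profile, text, seed) VALUES (?, ?, ?)",
        (profile, text, seed),
    )
    conn.commit()
    return cur.lastrowid


def search_history(conn: sqlite3.Connection, query: str) -> List[Tuple[int, str, str]]:
    """Find generations whose text contains the query, newest first."""
    # Escape LIKE wildcards so user input is matched literally.
    escaped = query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT id, profile, text FROM generations"
        " WHERE text LIKE ? ESCAPE '\\' ORDER BY id DESC",
        (f"%{escaped}%",),
    )
    return rows.fetchall()
```

SQLite's `LIKE` is case-insensitive for ASCII characters by default, so a search for "hello" also matches "Hello world".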

Flexible Deployment

Local mode — Backend runs alongside the desktop app
Remote mode — Connect to a GPU server on your network
One-click server — Turn any machine into a Voicebox server

Desktop Experience

Native performance — Built with Tauri (Rust), not Electron
Cross-platform — Same experience on macOS and Windows
Bundled backend — No Python installation required

Tech Stack

Desktop: Tauri v2 (Rust)
Frontend: React, TypeScript, Tailwind CSS
Backend: FastAPI (Python)
Voice Model: Qwen3-TTS
Transcription: Whisper
Database: SQLite

Known Issues

First launch is slow — Model downloads (2-7GB) on first use
Apple Silicon performance — Generation takes ~10s per paragraph on M1/M2 chips; CUDA is significantly faster
Linux not available — CI pipeline issues; coming in v0.1.1

What's Next

We're already working on the next release. Here's a preview:

Linux support — Top priority
Real-time synthesis — Stream audio as it generates
Voice effects — Pitch shift, reverb, and more
Timeline editor — Word-level precision audio editing
Conversation mode — Multi-speaker dialogue generation
More models — XTTS, Bark, and other open-source voice models

Feedback

Found a bug? Have a feature request? Open an issue on GitHub or reach out at voicebox.sh.

Thank you for trying Voicebox!

P.S. This was originally released yesterday. Note to self: don't let Claude manage GitHub tags with bypass permissions turned on.
