Chatterbox TTS Server v2.0.0
The complete Chatterbox family on every major GPU stack, behind one OpenAI-compatible API and a modern Web UI.
v1.0.0 shipped an English-only, CUDA 12.1 / 12.8 server. v2.0.0 closes the gap to a full release: all three Chatterbox models (Original, Multilingual, Turbo) hot-swappable from the UI, every consumer GPU family supported (NVIDIA cu121 / cu128 / cu130, AMD ROCm, AMD Strix Halo, Apple MPS, CPU), Portable Mode on Windows so the entire app folder can be copied to a USB stick, plus a long list of install, security, and quality fixes from community PRs and reports.
Headline changes
Complete Chatterbox lineup, hot-swappable
- Original Chatterbox (0.5B, English, emotion exaggeration, LLaMA backbone).
- Chatterbox Multilingual (0.5B, 23 languages, voice cloning, emotion control).
- Chatterbox Turbo (350M, 1-step diffusion decoder, paralinguistic tags like [laugh], [cough], [chuckle]).
- Engine selector dropdown at the top of the Web UI; the backend swaps without restart or config change. Same /tts and /v1/audio/speech surface for all three.
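Because all three engines share one API surface, client code does not need to change when the engine is swapped. A minimal sketch of an OpenAI-style request follows. The base URL, the model string, the voice file name, and the helper name build_speech_request are illustrative assumptions rather than values taken from these notes; the field names follow the general OpenAI speech request shape.

```python
import json
import urllib.request

# Assumed base URL; point this at wherever your server is listening.
BASE_URL = "http://localhost:8004"


def build_speech_request(text, voice="example_voice.wav", base_url=BASE_URL):
    """Build (but do not send) a POST to the OpenAI-compatible speech route."""
    payload = {
        "model": "chatterbox",      # placeholder model name (assumption)
        "input": text,              # text to synthesize
        "voice": voice,             # hypothetical voice file name
        "response_format": "wav",
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_speech_request("Hello from Chatterbox.")
    # Sending the request needs a running server, for example:
    # with urllib.request.urlopen(req) as resp:
    #     open("out.wav", "wb").write(resp.read())
    print(req.get_method(), req.full_url)
```

The same request should work regardless of which engine is selected in the UI, since only the backend changes.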
Portable Mode (Windows)
- The launcher now offers a fully self-contained installation: Python 3.10 runtime, venv, models, and dependencies all live inside the project folder.
- Copy the folder to a USB drive, zip it, or move it anywhere; no Python install is needed on the target machine. Double-click start.bat.
- Selected by default during first-time setup, or pass --portable / --no-portable to skip the prompt.
Every GPU stack covered
- NVIDIA CUDA 12.1 (default, RTX 30/40 series).
- NVIDIA CUDA 12.8 (docker-compose-cu128.yml, RTX 5060 Ti / 5070 / 5070 Ti / 5080 / 5090, sm_120).
- NVIDIA CUDA 13.0 (docker-compose-cu130.yml, sm_121, DGX Spark / GB10), via @osos in #135.
- AMD ROCm 6.1 with PyTorch 2.5.1, two-step install so pip cannot replace ROCm wheels with CPU-only versions.
- AMD Strix Halo (docker-compose-strixhalo.yml, ROCm 7.2, HSA_OVERRIDE), via @0xrushi in #125.
- Apple Silicon (MPS) with Turbo float64 fix.
- CPU on a small python:3.10-slim Docker image (no more 4 GB CUDA base layer for CPU users).
New features
- Streaming /tts endpoint. Opt-in stream: true parameter returns a StreamingResponse that flushes WAV bytes per chunk with 20 ms crossfades. Default behavior unchanged. Useful for long-form / audiobook workloads where time-to-first-byte matters. Via @D34DC3N73R in #140.
- Voice conditioning cache. Repeated requests against the same reference voice skip re-encoding. Real latency win for batch and OpenAI-endpoint workflows. Cleared automatically on reload_model(). Via @0xrushi in #125.
- Opt-in BF16 inference. TTS_BF16=on (or =auto) converts T3 to bfloat16 and runs generate() under autocast. Default is off to preserve existing behavior on upgrade. Roughly 40% throughput improvement on bf16-capable cards according to the contributor.
- HTTPS / SSL support. Optional ssl_certfile and ssl_keyfile in config.yaml.
- /api/unload endpoint. Releases GPU memory without restarting the server. Via @CameronSima in #103.
- /v1/audio/voices endpoint. OpenAI-compatible voice listing with optional model parameter. Via @sjoelund in #127.
- Dynamic language selector. UI populates the language dropdown from SUPPORTED_LANGUAGES exposed by the multilingual engine. Via @htrex in #117.
- Paralinguistic tag presets. New ui/presets.yaml entries demonstrating [laugh], [cough], [chuckle] for Turbo.
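To illustrate what a 20 ms crossfade between audio chunks means, here is a minimal sketch in plain Python. It is not the server's implementation: the function names, the linear fade shape, and the 24000 Hz sample rate are assumptions chosen for the example.

```python
def crossfade(prev_tail, next_head):
    """Linearly blend two equal-length lists of float samples."""
    n = len(prev_tail)
    if n != len(next_head):
        raise ValueError("overlap regions must have the same length")
    blended = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of the incoming chunk, rising towards 1
        blended.append((1.0 - w) * prev_tail[i] + w * next_head[i])
    return blended


def join_chunks(chunks, sample_rate=24000, fade_ms=20):
    """Concatenate chunks, overlapping each boundary by fade_ms milliseconds."""
    overlap = sample_rate * fade_ms // 1000  # 480 samples at 24 kHz
    if not chunks:
        return []
    result = list(chunks[0])
    for chunk in chunks[1:]:
        # Never overlap more samples than either side actually has.
        n = min(overlap, len(result), len(chunk))
        if n > 0:
            result[-n:] = crossfade(result[-n:], chunk[:n])
            result.extend(chunk[n:])
        else:
            result.extend(chunk)
    return result


if __name__ == "__main__":
    loud = [1.0] * 1000
    quiet = [0.0] * 1000
    joined = join_chunks([loud, quiet])
    print(len(joined))  # two 1000-sample chunks minus one 480-sample overlap
```

In a streaming setting, a server would presumably hold back the last overlap-sized tail of each chunk until the next chunk arrives, so that the blended region can be flushed in order.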
Security
- CWE-22 path traversal fixed in /tts and /v1/audio/speech. Voice file parameters now go through utils.safe_resolve_within() and reject ../ and absolute paths with HTTP 400. Default-deny posture. Via @sebastiondev in #147.
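The general technique behind this kind of fix can be sketched with pathlib. This is an illustration of default-deny path resolution, not the project's utils.safe_resolve_within(); the function name resolve_within, its signature, and the example directory are assumptions.

```python
from pathlib import Path


def resolve_within(base_dir, user_path):
    """Return the resolved path if it stays inside base_dir, else raise ValueError."""
    base = Path(base_dir).resolve()
    candidate = Path(user_path)
    # Reject absolute paths outright instead of trying to sanitize them.
    if candidate.is_absolute():
        raise ValueError("absolute paths are not allowed")
    # Resolve after joining so that any ".." segments are collapsed first.
    resolved = (base / candidate).resolve()
    if not resolved.is_relative_to(base):  # requires Python 3.9+
        raise ValueError("path escapes the base directory")
    return resolved


if __name__ == "__main__":
    print(resolve_within("/tmp/voices", "speaker.wav"))
    try:
        resolve_within("/tmp/voices", "../secret.wav")
    except ValueError as exc:
        print("rejected:", exc)
```

A route handler would then map the ValueError to an HTTP 400 response rather than letting the request reach the file system.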
Install fixes (the long list)
- Chatterbox installed with --no-deps across every install path. Eliminates ONNX source builds, torch downgrade conflicts, and CMake errors that affected most users on first install.
- All chatterbox dependencies (conformer, diffusers, transformers, s3tokenizer, etc.) now listed and pinned explicitly in each requirements file.
- onnx==1.16.0 pinned everywhere so pip uses pre-built wheels.
- Apple Silicon Turbo crash fixed (Cannot convert a MPS Tensor to float64). Forces float32 in s3tokenizer and voice_encoder. Patch baked into the chatterbox-v2 fork and also applied as a post-install patch in start.py for users on other chatterbox versions. Reported by @jonas3245 in #93.
- ROCm requirements switched to PyTorch's official ROCm 6.1 wheel index. Two-step install (requirements-rocm-init.txt first) prevents pip from replacing ROCm torch with CPU-only wheels.
- Python 3.10 is now hard-required, not just recommended. It is the only version with pre-built wheels for the full dependency tree. Portable Mode handles this automatically on Windows.
- config.yaml default device changed from cuda to auto so the engine picks CUDA / MPS / CPU correctly.
- Lightweight Dockerfile.cpu based on python:3.10-slim instead of the 4 GB+ NVIDIA CUDA base image.
- Deprecated version: keys removed from all docker-compose files.
- protobuf force-upgraded after requirements install to keep onnx and descript-audiotools compatible.
- FastAPI pinned <0.116 to avoid the starlette 1.0 TemplateResponse breaking change.
- numpy constraint relaxed to <2.0 for dependency compatibility.
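As a rough illustration of what an auto device setting implies, the sketch below resolves a requested device from availability flags. The function name resolve_device and the CUDA, then MPS, then CPU preference order are assumptions for the example, not the server's actual logic.

```python
def resolve_device(requested, cuda_available, mps_available):
    """Map a configured device string to a concrete device name."""
    requested = requested.lower()
    if requested != "auto":
        # An explicit setting is passed through unchanged.
        return requested
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


if __name__ == "__main__":
    # On an Apple Silicon machine without CUDA:
    print(resolve_device("auto", cuda_available=False, mps_available=True))
```

With PyTorch installed, the two flags would typically come from torch.cuda.is_available() and torch.backends.mps.is_available().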
Bug fixes
- Chunking error on text with stray dashes (#144). The bullet-point detection regex was matching \s+ after -, including newlines. A line like -Within the Tunnels of Alexander- followed by a paragraph break was read as a single bullet item that swallowed the rest of the paragraph, producing one huge unsplit chunk that crashed the engine. The pattern now requires a regular space or tab after the bullet character, so genuine "- item" and "1. item" lines still split and stray dashes do not. Reported by @Torlek.
- Colab Web UI proxy URL. Fixed using kernel.proxyPort. Via @bakamono12 in #141.
- Frontend error serialization. Fixed via @htrex in #123.
- Multiple chunking, config backup, filename, and save-toggle fixes.
- Colab notebook reworked. Perth watermarker patch, protobuf force-upgrade, two-step chatterbox install (--no-deps only for onnx/s3tokenizer), kernel proxy URL fix.
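The difference between \s+ and a space-or-tab class in the chunking fix can be reproduced in a few lines. The two patterns below are simplified stand-ins written for this example; the project's actual BULLET_POINT_PATTERN may differ in detail.

```python
import re

# Hypothetical patterns for illustration only.
LOOSE = re.compile(r"(?:[-*]|\d+\.)\s+")      # \s+ also matches newlines
STRICT = re.compile(r"(?:[-*]|\d+\.)[ \t]+")  # only a space or tab may follow

stray = "-Within the Tunnels of Alexander-\n\nThe next paragraph starts here."


def looks_like_bullet(pattern, text):
    """Return True if the pattern finds a bullet marker anywhere in text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    # The trailing dash plus the paragraph break satisfies the loose pattern.
    print(looks_like_bullet(LOOSE, stray))
    # The strict pattern ignores it but still accepts genuine list items.
    print(looks_like_bullet(STRICT, stray))
    print(looks_like_bullet(STRICT, "- item"), looks_like_bullet(STRICT, "1. item"))
```

The loose pattern matches the trailing dash together with the blank line that follows it, which is how a heading-like line could absorb the next paragraph.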
Upgrade
Standard install:
git pull
python start.py --reinstall
Docker (pick your stack):
docker compose -f docker-compose.yml up -d # cu121, RTX 30/40
docker compose -f docker-compose-cu128.yml up -d # cu128, RTX 50xx (Blackwell)
docker compose -f docker-compose-cu130.yml up -d # cu130, sm_121 / DGX Spark
docker compose -f docker-compose-rocm.yml up -d # AMD ROCm
docker compose -f docker-compose-strixhalo.yml up -d # AMD Strix Halo
docker compose -f docker-compose-cpu.yml up -d # CPU only
Compatibility notes
- Default behavior is preserved across the board. stream, TTS_BF16, ssl_certfile / ssl_keyfile, and the LLM-preprocessor proposal (deferred) all default off.
- The voice conditioning cache and the BULLET_POINT_PATTERN fix apply unconditionally and have no API surface change.
Thanks
Contributors merged into 2.0: @osos (#135), @sebastiondev (#147), @bakamono12 (#141), @D34DC3N73R (#140), @0xrushi (#125), @htrex (#117, #123), @CameronSima (#103), @sjoelund (#127), @keturn (#97), @ther3zz (#87). Reporters whose issues drove fixes: @Torlek, @jonas3245, @Qriist, @sloptimize, @doctorcolossus, @skafiend, @gnusupport, @warlocc, and many others linked above.