devnen/Chatterbox-TTS-Server v2.0.0 on GitHub

Chatterbox TTS Server v2.0.0

The complete Chatterbox family on every major GPU stack, behind one OpenAI-compatible API and a modern Web UI.

v1.0.0 shipped an English-only, CUDA 12.1 / 12.8 server. v2.0.0 closes the gap to a full release: all three Chatterbox models (Original, Multilingual, Turbo) hot-swappable from the UI, every consumer GPU family supported (NVIDIA cu121 / cu128 / cu130, AMD ROCm, AMD Strix Halo, Apple MPS, CPU), Portable Mode on Windows so the entire app folder can be copied to a USB stick, plus a long list of install, security, and quality fixes from community PRs and reports.

Headline changes

Complete Chatterbox lineup, hot-swappable

Original Chatterbox (0.5B, English, emotion exaggeration, LLaMA backbone).
Chatterbox Multilingual (0.5B, 23 languages, voice cloning, emotion control).
Chatterbox Turbo (350M, 1-step diffusion decoder, paralinguistic tags like [laugh], [cough], [chuckle]).
An engine selector dropdown at the top of the Web UI swaps the backend without a restart or config change. All three models share the same /tts and /v1/audio/speech surface.
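Because all three engines sit behind the same /v1/audio/speech route, a client request should not need to change when the engine does. The sketch below only builds the request and does not send it. The field names follow the public OpenAI speech API (model, input, voice, response_format). The base URL, the model value, and the voice file name are placeholders assumed for illustration, not values taken from this release.

```python
import json
import urllib.request

# Assumption: placeholder host and port; use whatever your server listens on.
BASE_URL = "http://localhost:8000"


def build_speech_request(text, voice, base_url=BASE_URL):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    payload = {
        "model": "chatterbox",      # hypothetical model value
        "input": text,
        "voice": voice,             # hypothetical reference voice file name
        "response_format": "wav",
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_speech_request("Hello from Chatterbox.", "reference.wav")
    print(req.get_method(), req.full_url)
    # To send it against a running server:
    # with urllib.request.urlopen(req) as resp:
    #     open("out.wav", "wb").write(resp.read())
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing it at a live server.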

Portable Mode (Windows)

The launcher now offers a fully self-contained installation: Python 3.10 runtime, venv, models, and dependencies all live inside the project folder.
Copy the folder to a USB drive, zip it, or move it anywhere; the target machine needs no Python install. Double-click start.bat to launch.
Selected by default during first-time setup, or pass --portable / --no-portable to skip the prompt.

Every GPU stack covered

NVIDIA CUDA 12.1 (default, RTX 30/40 series).
NVIDIA CUDA 12.8 (docker-compose-cu128.yml, RTX 5060 Ti / 5070 / 5070 Ti / 5080 / 5090, sm_120).
NVIDIA CUDA 13.0 (docker-compose-cu130.yml, sm_121, DGX Spark / GB10), via @osos in #135.
AMD ROCm 6.1 with PyTorch 2.5.1, two-step install so pip cannot replace ROCm wheels with CPU-only versions.
AMD Strix Halo (docker-compose-strixhalo.yml, ROCm 7.2, HSA_OVERRIDE), via @0xrushi in #125.
Apple Silicon (MPS) with Turbo float64 fix.
CPU on a small python:3.10-slim Docker image (no more 4 GB CUDA base layer for CPU users).

New features

Streaming /tts endpoint. Opt-in stream: true parameter returns a StreamingResponse that flushes WAV bytes per chunk with 20 ms crossfades. Default behavior unchanged. Useful for long-form / audiobook workloads where time-to-first-byte matters. Via @D34DC3N73R in #140.
Voice conditioning cache. Repeated requests against the same reference voice skip re-encoding. Real latency win for batch and OpenAI-endpoint workflows. Cleared automatically on reload_model(). Via @0xrushi in #125.
Opt-in BF16 inference. TTS_BF16=on (or =auto) converts T3 to bfloat16 and runs generate() under autocast. Default is off to preserve existing behavior on upgrade. Roughly 40% throughput improvement on bf16-capable cards according to the contributor.
HTTPS / SSL support. Optional ssl_certfile and ssl_keyfile in config.yaml.
/api/unload endpoint. Releases GPU memory without restarting the server. Via @CameronSima in #103.
/v1/audio/voices endpoint. OpenAI-compatible voice listing with optional model parameter. Via @sjoelund in #127.
Dynamic language selector. UI populates the language dropdown from SUPPORTED_LANGUAGES exposed by the multilingual engine. Via @htrex in #117.
Paralinguistic tag presets. New ui/presets.yaml entries demonstrating [laugh], [cough], [chuckle] for Turbo.
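The 20 ms crossfade mentioned for the streaming endpoint can be pictured with a small sketch. This is not the server's code: it assumes a linear fade, samples held as plain Python floats, and a 24 kHz sample rate, none of which is confirmed by these notes. The function name crossfade_join is hypothetical.

```python
def crossfade_join(prev_chunk, next_chunk, sample_rate=24000, fade_ms=20):
    """Join two lists of float samples with a linear crossfade.

    The last `fade_ms` of prev_chunk is blended with the first `fade_ms`
    of next_chunk, so the result is shorter than the plain concatenation
    by the length of the overlap.
    """
    overlap = min(int(sample_rate * fade_ms / 1000), len(prev_chunk), len(next_chunk))
    if overlap == 0:
        # Nothing to blend; fall back to plain concatenation.
        return list(prev_chunk) + list(next_chunk)

    head = list(prev_chunk[:-overlap])
    tail = prev_chunk[-overlap:]
    blended = []
    for i in range(overlap):
        # Weight of the incoming chunk, strictly between 0 and 1.
        w = (i + 1) / (overlap + 1)
        blended.append(tail[i] * (1.0 - w) + next_chunk[i] * w)
    return head + blended + list(next_chunk[overlap:])


if __name__ == "__main__":
    loud = [1.0] * 1000
    quiet = [0.0] * 1000
    joined = crossfade_join(loud, quiet)
    print(len(joined))
```

Blending the overlap instead of butting chunks together avoids an audible click where the waveform would otherwise jump between chunk boundaries.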

Security

CWE-22 path traversal fixed in /tts and /v1/audio/speech. Voice file parameters now go through utils.safe_resolve_within() and reject .. / absolute paths with HTTP 400. Default-deny posture. Via @sebastiondev in #147.
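A containment check of this kind can be sketched with pathlib. The function below is a hypothetical stand-in, not the project's utils.safe_resolve_within(), whose real signature is not shown in these notes. It raises ValueError on rejection; mapping that to HTTP 400 would be the caller's job.

```python
from pathlib import Path


def resolve_within(base_dir, user_path):
    """Return the resolved path if it stays inside base_dir, else raise ValueError.

    Hypothetical helper for illustration only.
    """
    base = Path(base_dir).resolve()
    candidate = Path(user_path)

    # Reject absolute paths outright (default-deny).
    if candidate.is_absolute():
        raise ValueError("absolute paths are not allowed")

    # Resolve collapses any ".." segments before the containment check.
    resolved = (base / candidate).resolve()
    if not resolved.is_relative_to(base):  # Python 3.9+
        raise ValueError("path escapes the base directory")
    return resolved


if __name__ == "__main__":
    print(resolve_within("voices", "speaker.wav"))
    for bad in ("../config.yaml", "/etc/passwd"):
        try:
            resolve_within("voices", bad)
        except ValueError as exc:
            print("rejected:", bad, "->", exc)
```

Resolving first and comparing afterwards matters: a string check for ".." alone would miss inputs that only escape the directory once normalized.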

Install fixes (the long list)

Chatterbox installed with --no-deps across every install path. Eliminates ONNX source builds, torch downgrade conflicts, and CMake errors that affected most users on first install.
All chatterbox dependencies (conformer, diffusers, transformers, s3tokenizer, etc.) now listed and pinned explicitly in each requirements file.
onnx==1.16.0 pinned everywhere so pip uses pre-built wheels.
Apple Silicon Turbo crash fixed (Cannot convert a MPS Tensor to float64). Forces float32 in s3tokenizer and voice_encoder. Patch baked into the chatterbox-v2 fork and also applied as a post-install patch in start.py for users on other chatterbox versions. Reported by @jonas3245 in #93.
ROCm requirements switched to PyTorch's official ROCm 6.1 wheel index. Two-step install (requirements-rocm-init.txt first) prevents pip from replacing ROCm torch with CPU-only wheels.
Python 3.10 is now hard-required, not just recommended. It is the only version with pre-built wheels for the full dependency tree. Portable Mode handles this automatically on Windows.
config.yaml default device changed from cuda to auto so the engine picks CUDA / MPS / CPU correctly.
Lightweight Dockerfile.cpu based on python:3.10-slim instead of the 4 GB+ NVIDIA CUDA base image.
Deprecated version: keys removed from all docker-compose files.
protobuf force-upgraded after requirements install to keep onnx and descript-audiotools compatible.
FastAPI pinned <0.116 to avoid the starlette 1.0 TemplateResponse breaking change.
numpy constraint relaxed to <2.0 for dependency compatibility.
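The device: auto default above amounts to a simple fallback chain. The sketch below assumes a CUDA, then MPS, then CPU preference order and takes availability as plain booleans so it runs without PyTorch installed. The function name pick_device is hypothetical and this is not the server's own selection code.

```python
def pick_device(requested="auto", cuda_available=False, mps_available=False):
    """Resolve a configured device string to a concrete device name.

    Any explicit value is passed through untouched; only "auto" triggers
    detection. The preference order here is an assumption.
    """
    requested = requested.lower()
    if requested != "auto":
        return requested
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


if __name__ == "__main__":
    # With PyTorch installed, the flags could come from:
    #   torch.cuda.is_available()
    #   torch.backends.mps.is_available()
    print(pick_device("auto", cuda_available=False, mps_available=True))
```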

Bug fixes

Chunking error on text with stray dashes (#144). Bullet-point detection regex was matching \s+ after -, including newlines. A line like -Within the Tunnels of Alexander- followed by a paragraph break got read as a single bullet item that swallowed the rest of the paragraph, producing one huge unsplit chunk that crashed the engine. The pattern now requires a regular space or tab after the bullet char so genuine - item and 1. item still split, and stray dashes do not. Reported by @Torlek.
Colab Web UI proxy URL. Fixed using kernel.proxyPort. Via @bakamono12 in #141.
Frontend error serialization. Fixed via @htrex in #123.
Multiple chunking, config backup, filename, and save-toggle fixes.
Colab notebook reworked. Perth watermarker patch, protobuf force-upgrade, two-step chatterbox install (--no-deps only for onnx/s3tokenizer), kernel proxy URL fix.
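The stray-dash chunking bug comes down to \s matching newlines. The two patterns below are simplified stand-ins assumed for illustration, not the project's actual bullet-point pattern. They show why requiring a space or tab after the bullet character stops a trailing dash at the end of a line from being read as a bullet.

```python
import re

# Hypothetical stand-ins for the pattern before and after the fix.
OLD_BULLET = re.compile(r"(?:[-*]|\d+\.)\s+")      # \s also matches "\n"
NEW_BULLET = re.compile(r"(?:[-*]|\d+\.)[ \t]+")   # space or tab only

text = "-Within the Tunnels of Alexander-\n\nThe paragraph continues here."

# The trailing "-" followed by a paragraph break looks like a bullet to the old pattern.
old_hit = OLD_BULLET.search(text)
# The new pattern ignores it because a newline is not a space or tab.
new_hit = NEW_BULLET.search(text)

if __name__ == "__main__":
    print("old pattern matched:", repr(old_hit.group()) if old_hit else None)
    print("new pattern matched:", repr(new_hit.group()) if new_hit else None)
    # Genuine list items are still detected by the new pattern.
    print(bool(NEW_BULLET.match("- item")), bool(NEW_BULLET.match("1. item")))
```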

Upgrade

Standard install:

git pull
python start.py --reinstall

Docker (pick your stack):

docker compose -f docker-compose.yml          up -d   # cu121, RTX 30/40
docker compose -f docker-compose-cu128.yml    up -d   # cu128, RTX 50xx (Blackwell)
docker compose -f docker-compose-cu130.yml    up -d   # cu130, sm_121 / DGX Spark
docker compose -f docker-compose-rocm.yml     up -d   # AMD ROCm
docker compose -f docker-compose-strixhalo.yml up -d  # AMD Strix Halo
docker compose -f docker-compose-cpu.yml      up -d   # CPU only

Compatibility notes

Default behavior is preserved across the board. stream, TTS_BF16, and ssl_certfile / ssl_keyfile all default to off, and the LLM-preprocessor proposal is deferred.
The voice conditioning cache and the BULLET_POINT_PATTERN fix apply unconditionally and have no API surface change.

Thanks

Contributors merged into 2.0: @osos (#135), @sebastiondev (#147), @bakamono12 (#141), @D34DC3N73R (#140), @0xrushi (#125), @htrex (#117, #123), @CameronSima (#103), @sjoelund (#127), @keturn (#97), @ther3zz (#87). Reporters whose issues drove fixes: @Torlek, @jonas3245, @Qriist, @sloptimize, @doctorcolossus, @skafiend, @gnusupport, @warlocc, and many others linked above.
