github devnen/Chatterbox-TTS-Server v2.0.0
v2.0.0 - Complete Chatterbox Family: Multilingual + Turbo + Portable Mode + Multi-GPU

10 days ago

Chatterbox TTS Server v2.0.0

The complete Chatterbox family on every major GPU stack, behind one OpenAI-compatible API and a modern Web UI.

v1.0.0 shipped an English-only, CUDA 12.1 / 12.8 server. v2.0.0 closes the gap to a full release: all three Chatterbox models (Original, Multilingual, Turbo) hot-swappable from the UI, every consumer GPU family supported (NVIDIA cu121 / cu128 / cu130, AMD ROCm, AMD Strix Halo, Apple MPS, CPU), Portable Mode on Windows so the entire app folder can be copied to a USB stick, plus a long list of install, security, and quality fixes from community PRs and reports.

Headline changes

Complete Chatterbox lineup, hot-swappable

  • Original Chatterbox (0.5B, English, emotion exaggeration, LLaMA backbone).
  • Chatterbox Multilingual (0.5B, 23 languages, voice cloning, emotion control).
  • Chatterbox Turbo (350M, 1-step diffusion decoder, paralinguistic tags like [laugh], [cough], [chuckle]).
  • Engine selector dropdown at the top of the Web UI. The backend swaps engines without a restart or config change, and all three models share the same /tts and /v1/audio/speech surface.
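
Because all three engines sit behind the same OpenAI-compatible endpoint, a client does not need to change when the engine is swapped. The sketch below builds such a request with the Python standard library. The field names follow the OpenAI speech API; the base URL, the model string, and the voice file name are placeholders assumed for illustration, not values taken from these notes.

```python
import json
import urllib.request

# Assumption: adjust to wherever your server is listening.
BASE_URL = "http://localhost:8004"


def build_speech_request(text, voice, base_url=BASE_URL):
    """Build a POST request for /v1/audio/speech using OpenAI-style fields."""
    payload = {
        "model": "chatterbox",      # placeholder model name (assumption)
        "input": text,
        "voice": voice,             # hypothetical reference voice file name
        "response_format": "wav",
    }
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, voice, out_path):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, voice)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

With a server running, synthesize("Hello there.", "my_voice.wav", "hello.wav") would save the generated audio; the same call should work whichever engine is selected in the UI.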

Portable Mode (Windows)

  • The launcher now offers a fully self-contained installation: Python 3.10 runtime, venv, models, and dependencies all live inside the project folder.
  • Copy the folder to a USB drive, zip it, or move it anywhere. No Python install is needed on the target machine; just double-click start.bat.
  • Selected by default during first-time setup. Pass --portable or --no-portable to skip the prompt.

Every GPU stack covered

  • NVIDIA CUDA 12.1 (default, RTX 30/40 series).
  • NVIDIA CUDA 12.8 (docker-compose-cu128.yml, RTX 5060 Ti / 5070 / 5070 Ti / 5080 / 5090, sm_120).
  • NVIDIA CUDA 13.0 (docker-compose-cu130.yml, sm_121, DGX Spark / GB10), via @osos in #135.
  • AMD ROCm 6.1 with PyTorch 2.5.1, two-step install so pip cannot replace ROCm wheels with CPU-only versions.
  • AMD Strix Halo (docker-compose-strixhalo.yml, ROCm 7.2, HSA_OVERRIDE), via @0xrushi in #125.
  • Apple Silicon (MPS) with Turbo float64 fix.
  • CPU on a small python:3.10-slim Docker image (no more 4 GB CUDA base layer for CPU users).

New features

  • Streaming /tts endpoint. Opt-in stream: true parameter returns a StreamingResponse that flushes WAV bytes per chunk with 20 ms crossfades. Default behavior unchanged. Useful for long-form / audiobook workloads where time-to-first-byte matters. Via @D34DC3N73R in #140.
  • Voice conditioning cache. Repeated requests against the same reference voice skip re-encoding. Real latency win for batch and OpenAI-endpoint workflows. Cleared automatically on reload_model(). Via @0xrushi in #125.
  • Opt-in BF16 inference. TTS_BF16=on (or =auto) converts T3 to bfloat16 and runs generate() under autocast. Default is off to preserve existing behavior on upgrade. Roughly 40% throughput improvement on bf16-capable cards according to the contributor.
  • HTTPS / SSL support. Optional ssl_certfile and ssl_keyfile in config.yaml.
  • /api/unload endpoint. Releases GPU memory without restarting the server. Via @CameronSima in #103.
  • /v1/audio/voices endpoint. OpenAI-compatible voice listing with optional model parameter. Via @sjoelund in #127.
  • Dynamic language selector. UI populates the language dropdown from SUPPORTED_LANGUAGES exposed by the multilingual engine. Via @htrex in #117.
  • Paralinguistic tag presets. New ui/presets.yaml entries demonstrating [laugh], [cough], [chuckle] for Turbo.
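
The 20 ms crossfade used by the streaming endpoint can be pictured as blending the tail of one audio chunk into the head of the next. The sketch below uses a linear fade over plain Python lists of samples. The fade curve, the function names, and the 24 kHz sample rate in the usage note are assumptions for illustration; the notes do not say how the server implements the blend.

```python
def overlap_samples(sample_rate, fade_ms=20):
    """Number of samples covered by a fade of fade_ms milliseconds."""
    return sample_rate * fade_ms // 1000


def crossfade(prev, nxt, overlap):
    """Join two chunks, linearly blending the last `overlap` samples of
    `prev` with the first `overlap` samples of `nxt`."""
    overlap = min(overlap, len(prev), len(nxt))
    if overlap == 0:
        return list(prev) + list(nxt)

    tail_start = len(prev) - overlap
    blended = []
    for i in range(overlap):
        # Weight of the incoming chunk rises from just above 0 to just below 1.
        w = (i + 1) / (overlap + 1)
        blended.append(prev[tail_start + i] * (1.0 - w) + nxt[i] * w)

    return list(prev[:tail_start]) + blended + list(nxt[overlap:])
```

At an assumed 24 kHz sample rate, overlap_samples(24000) gives 480 samples, so each joined pair of chunks is 480 samples shorter than the two chunks laid end to end.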

Security

  • CWE-22 path traversal fixed in /tts and /v1/audio/speech. Voice file parameters now go through utils.safe_resolve_within() and reject .. / absolute paths with HTTP 400. Default-deny posture. Via @sebastiondev in #147.
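
The idea behind a "resolve within a base directory" check can be sketched as follows. This is not the project's utils.safe_resolve_within(), whose signature and behavior are not shown in these notes; the function name and error type below are hypothetical. A server handler would typically translate the raised error into an HTTP 400 response.

```python
from pathlib import Path


def safe_resolve_within_sketch(base_dir, user_path):
    """Return user_path resolved under base_dir, or raise ValueError.

    Rejects absolute paths and any '..' component up front, then verifies
    that the fully resolved path (symlinks included) is still inside base_dir.
    """
    base = Path(base_dir).resolve()
    candidate = Path(user_path)

    if candidate.is_absolute() or ".." in candidate.parts:
        raise ValueError(f"rejected path: {user_path!r}")

    resolved = (base / candidate).resolve()
    # Path.is_relative_to requires Python 3.9 or newer.
    if not resolved.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return resolved
```

Checking the resolved path as well as the raw components means a symlink inside the voices folder that points elsewhere is also rejected.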

Install fixes (the long list)

  • Chatterbox installed with --no-deps across every install path. Eliminates ONNX source builds, torch downgrade conflicts, and CMake errors that affected most users on first install.
  • All chatterbox dependencies (conformer, diffusers, transformers, s3tokenizer, etc.) now listed and pinned explicitly in each requirements file.
  • onnx==1.16.0 pinned everywhere so pip uses pre-built wheels.
  • Apple Silicon Turbo crash fixed (Cannot convert a MPS Tensor to float64). Forces float32 in s3tokenizer and voice_encoder. Patch baked into the chatterbox-v2 fork and also applied as a post-install patch in start.py for users on other chatterbox versions. Reported by @jonas3245 in #93.
  • ROCm requirements switched to PyTorch's official ROCm 6.1 wheel index. Two-step install (requirements-rocm-init.txt first) prevents pip from replacing ROCm torch with CPU-only wheels.
  • Python 3.10 is now hard-required, not just recommended. It is the only version with pre-built wheels for the full dependency tree. Portable Mode handles this automatically on Windows.
  • config.yaml default device changed from cuda to auto so the engine picks CUDA / MPS / CPU correctly.
  • Lightweight Dockerfile.cpu based on python:3.10-slim instead of the 4 GB+ NVIDIA CUDA base image.
  • Deprecated version: keys removed from all docker-compose files.
  • protobuf force-upgraded after requirements install to keep onnx and descript-audiotools compatible.
  • FastAPI pinned <0.116 to avoid the starlette 1.0 TemplateResponse breaking change.
  • numpy constraint relaxed to <2.0 for dependency compatibility.
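
As a rough illustration of what the new device: auto default implies, the sketch below picks a backend in the order CUDA, MPS, CPU. The function name and its handling of explicit values are assumptions; the server's actual selection logic is not shown in these notes. Availability is passed in as booleans so the example runs without PyTorch installed.

```python
def resolve_device(requested, cuda_available, mps_available):
    """Map a configured device string to a concrete backend name.

    'auto' picks the best available backend in the order CUDA, MPS, CPU.
    Any other value is returned unchanged (lower-cased), so an explicit
    setting in the config is left alone.
    """
    requested = requested.strip().lower()
    if requested != "auto":
        return requested
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```

In a real process the two flags would typically come from torch.cuda.is_available() and torch.backends.mps.is_available().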

Bug fixes

  • Chunking error on text with stray dashes (#144). Bullet-point detection regex was matching \s+ after -, including newlines. A line like -Within the Tunnels of Alexander- followed by a paragraph break got read as a single bullet item that swallowed the rest of the paragraph, producing one huge unsplit chunk that crashed the engine. The pattern now requires a regular space or tab after the bullet char so genuine - item and 1. item still split, and stray dashes do not. Reported by @Torlek.
  • Colab Web UI proxy URL. Fixed using kernel.proxyPort. Via @bakamono12 in #141.
  • Frontend error serialization. Fixed via @htrex in #123.
  • Multiple chunking, config backup, filename, and save-toggle fixes.
  • Colab notebook reworked. Perth watermarker patch, protobuf force-upgrade, two-step chatterbox install (--no-deps only for onnx/s3tokenizer), kernel proxy URL fix.
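
The whitespace issue behind the stray-dash chunking bug (#144) can be reproduced in isolation. The two patterns below are simplified stand-ins written for this illustration, not the project's actual BULLET_POINT_PATTERN, which likely carries additional anchoring. They differ only in the character class that follows the bullet marker.

```python
import re

# Hypothetical "before": \s+ also matches newlines after the marker.
OLD_MARKER = re.compile(r"(?:[-*•]|\d+\.)\s+")

# Hypothetical "after": only a regular space or tab may follow the marker.
NEW_MARKER = re.compile(r"(?:[-*•]|\d+\.)[ \t]+")

text = "-Within the Tunnels of Alexander-\n\nThe next paragraph."

# The trailing dash plus the paragraph break looks like a bullet marker
# to the old pattern, but not to the new one.
old_hit = OLD_MARKER.search(text)
new_hit = NEW_MARKER.search(text)

# Genuine list items are still recognized by the stricter pattern.
dash_item = NEW_MARKER.match("- item")
numbered_item = NEW_MARKER.match("1. item")
```

Here old_hit covers the closing dash and the two newlines after it, while new_hit is None; dash_item and numbered_item both match.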

Upgrade

Standard install:

git pull
python start.py --reinstall

Docker (pick your stack):

docker compose -f docker-compose.yml          up -d   # cu121, RTX 30/40
docker compose -f docker-compose-cu128.yml    up -d   # cu128, RTX 50xx (Blackwell)
docker compose -f docker-compose-cu130.yml    up -d   # cu130, sm_121 / DGX Spark
docker compose -f docker-compose-rocm.yml     up -d   # AMD ROCm
docker compose -f docker-compose-strixhalo.yml up -d  # AMD Strix Halo
docker compose -f docker-compose-cpu.yml      up -d   # CPU only

Compatibility notes

  • Default behavior is preserved across the board. stream, TTS_BF16, ssl_certfile / ssl_keyfile, and the LLM-preprocessor proposal (deferred) all default to off.
  • The voice conditioning cache and the BULLET_POINT_PATTERN fix apply unconditionally and have no API surface change.

Thanks

Contributors merged into 2.0: @osos (#135), @sebastiondev (#147), @bakamono12 (#141), @D34DC3N73R (#140), @0xrushi (#125), @htrex (#117, #123), @CameronSima (#103), @sjoelund (#127), @keturn (#97), @ther3zz (#87). Reporters whose issues drove fixes: @Torlek, @jonas3245, @Qriist, @sloptimize, @doctorcolossus, @skafiend, @gnusupport, @warlocc, and many others linked above.
