🌍 ChatterBox Language Expansion
New Languages Added (7 new, 11 total models)
- 🇮🇹 Italian - Bilingual model with automatic `[it]` prefix for Italian text
- 🇫🇷 French - 1400+ hours training dataset with voice cloning
- 🇷🇺 Russian - Complete model with training artifacts
- 🇦🇲 Armenian - Full model with unique architecture
- 🇬🇪 Georgian - Full model with specialized features
- 🇯🇵 Japanese - With proper Japanese tokenizer support
- 🇰🇷 Korean - With Korean tokenizer support
- 🇩🇪 German variants:
  - havok2 - Multi-speaker hybrid, best quality
  - SebastianBodza - Emotion control with `<haha>`, `<wow>` tags
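
The automatic `[it]` prefix listed for the Italian model can be pictured with a small sketch. It is not ChatterBox's actual code: the function name `add_language_prefix`, the explicit `language` argument, and the choice to put no space after the prefix are all assumptions made for illustration.

```python
IT_PREFIX = "[it]"  # prefix named in the release notes


def add_language_prefix(text: str, language: str) -> str:
    """Prepend the Italian marker when the text is flagged as Italian.

    Hypothetical helper: the real model may detect the language itself
    or format the prefix differently.
    """
    stripped = text.strip()
    if language.lower() != "it":
        # Other languages pass through untouched.
        return stripped
    if stripped.startswith(IT_PREFIX):
        # Avoid adding the prefix twice.
        return stripped
    return f"{IT_PREFIX}{stripped}"


print(add_language_prefix("Buongiorno a tutti", "it"))  # [it]Buongiorno a tutti
print(add_language_prefix("Good morning", "en"))        # Good morning
```

Making the helper idempotent means callers that already prefix their text keep working.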
Critical Fixes
- Fixed tokenizer discovery for non-Latin-script languages (Japanese and Korean were using the English tokenizer)
- Fixed vocabulary size mismatches for extended vocabularies
- Fixed state dict key format issues for incomplete models
- Added Italian `[it]` prefix system for proper bilingual support
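
The notes do not say what the state dict key format issue was. One common form of the problem is checkpoint keys that carry a wrapper prefix, such as the `module.` prefix that PyTorch's `DataParallel` adds. The sketch below shows that kind of normalization on a plain dictionary. The function name, the second prefix `model.`, and the sample keys are hypothetical.

```python
def normalize_state_dict_keys(state_dict, prefixes=("module.", "model.")):
    """Return a copy of state_dict with one known wrapper prefix stripped per key.

    Assumption: the mismatch is a leading wrapper prefix. The actual fix
    in ChatterBox may address a different key format.
    """
    normalized = {}
    for key, value in state_dict.items():
        for prefix in prefixes:
            if key.startswith(prefix):
                key = key[len(prefix):]
                break  # strip at most one prefix
        normalized[key] = value
    return normalized


# Hypothetical checkpoint keys; values stand in for tensors.
raw = {"module.encoder.weight": 1, "encoder.bias": 2}
print(normalize_state_dict_keys(raw))
```

The resulting dictionary could then be handed to a loader that expects unprefixed keys.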
Technical Improvements
- Unified model architecture support for the Italian single-checkpoint model
- Smart tokenizer discovery prioritizing language-specific files
- Extended vocabulary support (1500 tokens for Italian)
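
Tokenizer discovery that prioritizes language-specific files might look roughly like the following sketch. The file naming patterns (`tokenizer_<lang>.json`, `<lang>_tokenizer.json`, `tokenizer.json`) and the function name are assumptions, not the names ChatterBox uses.

```python
from pathlib import Path
from typing import Optional


def discover_tokenizer(model_dir, language: str) -> Optional[Path]:
    """Pick a tokenizer file, trying language-specific names before the generic one.

    All file names here are hypothetical examples.
    """
    model_dir = Path(model_dir)
    candidates = [
        model_dir / f"tokenizer_{language}.json",   # language-specific, first choice
        model_dir / f"{language}_tokenizer.json",   # alternative language-specific name
        model_dir / "tokenizer.json",               # generic fallback
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # nothing found; the caller decides how to fail
```

Checking the language-specific names first is what keeps a Japanese or Korean model from silently picking up a generic tokenizer that sits in the same directory.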
All models download automatically from Hugging Face on first use.