🎙️ CosyVoice3 TTS Engine - Zero-Shot Voice Conversion!
Major new engine addition! CosyVoice3 provides high-quality TTS and zero-shot voice conversion. Until now, ChatterBox was the only zero-shot voice conversion option (RVC requires a trained model). CosyVoice3 VC now gives you a second high-quality option!
✨ New Features
CosyVoice3 Engine
- Zero-shot voice conversion (VC) - Convert any voice to match a reference voice without training, as an alternative to ChatterBox VC
- Iterative refinement cache - Improve VC quality by refining the output over multiple passes
- Multilingual TTS - 4 core languages (Chinese, English, Japanese, Korean) plus 5 additional languages
- Three TTS modes: zero-shot, instruction, and cross-lingual voice cloning
- Two model variants: standard and RL-enhanced (higher quality; used by default)
- Paralinguistic tag support for natural speech effects (laughter, breath, cough, etc.)
- Character voice switching with [CharacterName] syntax
- Language switching with [lang:code] syntax
- SRT subtitle timing support for synchronized audio generation
- Per-segment parameter control ([seed:42], [speed:1.5])
- Live generation progress with token-by-token updates and ETA
🔧 Improvements
- Fix ChatterBox and RVC model discovery to work with custom model paths (extra_model_paths.yaml)
- Fix RVC audio chunking errors with short segments
📚 Documentation
- New CosyVoice3 paralinguistic tags guide
- Updated README with CosyVoice3 features and examples
🙏 Credits
Initial CosyVoice3 implementation by @tazztone