- Complete Model Overhaul:
  - Upgraded to Kokoro v1.0 model architecture; deprecated v0.19 support
  - Integration with hexgrad/kokoro and hexgrad/misaki packages
  - Pre-installed all multi-language support from Misaki:
    - English (en), Japanese (ja), Korean (ko), Chinese (zh), Vietnamese (vi)
    - Note: This will likely be controlled via an environment variable in upcoming versions
  - All voice packs included for supported languages, along with the original versions
- Enhanced Audio Generation Features:
  - Per-word timestamped caption generation
  - Phoneme generation and phoneme-based audio generation (510-token cap)
- Web UI Improvements:
  - Weighted voice mixing
  - Text file upload support
  - Improved text editor, user interface changes
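The 510-token cap on phoneme-based generation means longer phoneme sequences have to be split before they are submitted. The sketch below is a client-side illustration only: these notes do not say whether the server splits or rejects oversized input, `chunk_tokens` is a hypothetical helper, and it assumes each list element counts as one token, which may not match the real tokenizer.

```python
from typing import List, Sequence

# Cap stated in these release notes for phoneme-based audio generation.
MAX_PHONEME_TOKENS = 510


def chunk_tokens(tokens: Sequence[str], limit: int = MAX_PHONEME_TOKENS) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most `limit` tokens.

    Hypothetical helper: assumes one list element equals one token.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Step through the sequence in strides of `limit`; the last chunk may be shorter.
    return [list(tokens[i:i + limit]) for i in range(0, len(tokens), limit)]


if __name__ == "__main__":
    phonemes = ["a"] * 1200  # placeholder tokens, not real phonemes
    print([len(chunk) for chunk in chunk_tokens(phonemes)])
```

Splitting at fixed offsets can cut a word in half, so a real client would probably prefer to break at word or sentence boundaries that fall under the cap.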
What's Changed
- Combine Voices endpoint now returns a .pt file; voice combinations requested at generation time are otherwise built on the fly
- Bumping PyTorch version to 2.6.0, CUDA 12.4
- Adjustments to Docker workflows + Incorporating Docker Bake
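As a rough illustration of what combining voices with weights can mean, the sketch below averages voice vectors with normalized weights. It is an assumption rather than the project's implementation: these notes do not describe how mixing is computed, `mix_voices` and the voice names are hypothetical, and plain Python lists stand in for whatever the `.pt` file actually stores.

```python
from typing import Dict, List, Sequence


def mix_voices(voices: Dict[str, Sequence[float]], weights: Dict[str, float]) -> List[float]:
    """Return the weighted average of the named voice vectors.

    Hypothetical sketch: weights are normalized so they sum to 1.
    """
    if not weights:
        raise ValueError("at least one voice weight is required")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    lengths = {len(voices[name]) for name in weights}
    if len(lengths) != 1:
        raise ValueError("all voice vectors must have the same length")

    mixed = [0.0] * lengths.pop()
    for name, weight in weights.items():
        share = weight / total  # normalized contribution of this voice
        for i, value in enumerate(voices[name]):
            mixed[i] += share * value
    return mixed


if __name__ == "__main__":
    # Toy two-dimensional "voices"; real voice packs are far larger.
    packs = {"voice_a": [1.0, 0.0], "voice_b": [0.0, 1.0]}
    print(mix_voices(packs, {"voice_a": 2, "voice_b": 1}))
```

Normalizing the weights means that `2:1` and `4:2` produce the same mix, which keeps the result on the same scale as the input vectors.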
Full Changelog: v0.1.4...v0.2.0