remsky/Kokoro-FastAPI v0.2.0 on GitHub

Complete Model Overhaul:
- Upgraded to the Kokoro v1.0 model architecture; v0.19 support is deprecated
- Integration with hexgrad/kokoro and hexgrad/misaki packages
- Pre-installed all multi-language support from Misaki:
  - English (en), Japanese (ja), Korean (ko), Chinese (zh), Vietnamese (vi)
  - Note: This will likely be controlled via an environment variable in upcoming versions
- All voice packs included for supported languages, along with the original versions
Enhanced Audio Generation Features:
- Per-word timestamped caption generation
- Phoneme generation and phoneme-based audio generation (capped at 510 tokens)
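
Because phoneme-based generation is capped at 510 tokens, longer phoneme strings have to be split on the client side before they are sent. The sketch below shows one way to do that. It is illustrative only and not part of this project: `chunk_phonemes` is a hypothetical helper, and it assumes that one phoneme character counts as one token, which may not match how the server counts.

```python
MAX_PHONEME_TOKENS = 510  # cap stated in the release notes


def chunk_phonemes(phonemes: str, limit: int = MAX_PHONEME_TOKENS) -> list[str]:
    """Split a phoneme string into chunks of at most `limit` characters.

    Assumption: one character counts as one token. Chunks break on
    whitespace where possible; a single word longer than `limit` is
    hard-split.
    """
    chunks: list[str] = []
    current = ""
    for word in phonemes.split():
        # A word that can never fit in one chunk is cut into limit-sized pieces.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this word would exceed the cap, so start a new chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be submitted as its own request and the resulting audio concatenated in order.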
Web UI Improvements:
- Weighted voice mixing
- Text file upload support
- Improved text editor, user interface changes
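
Weighted voice mixing can be thought of as a normalized weighted average of voice embeddings. The sketch below illustrates that idea under stated assumptions: the `name(weight)+name(weight)` spec syntax, the helper names `parse_voice_mix` and `mix_embeddings`, and the voice names in the usage note are all hypothetical, and plain Python lists stand in for whatever tensors the voice packs actually contain. It is not the project's implementation.

```python
import re

# Assumed term syntax: a voice name, optionally followed by "(weight)".
_TERM = re.compile(r"^\s*([A-Za-z0-9_]+)\s*(?:\(\s*([0-9]*\.?[0-9]+)\s*\))?\s*$")


def parse_voice_mix(spec: str) -> dict[str, float]:
    """Parse a spec such as "voice_a(2)+voice_b(1)" into normalized weights."""
    weights: dict[str, float] = {}
    for term in spec.split("+"):
        match = _TERM.match(term)
        if match is None:
            raise ValueError(f"invalid voice term: {term!r}")
        name, raw = match.group(1), match.group(2)
        # A term without an explicit weight defaults to 1.
        weights[name] = weights.get(name, 0.0) + (float(raw) if raw else 1.0)
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: weight / total for name, weight in weights.items()}


def mix_embeddings(
    embeddings: dict[str, list[float]], weights: dict[str, float]
) -> list[float]:
    """Return the weighted sum of equal-length embedding vectors."""
    names = list(weights)
    length = len(embeddings[names[0]])
    mixed = [0.0] * length
    for name in names:
        vector = embeddings[name]
        if len(vector) != length:
            raise ValueError(f"embedding length mismatch for {name!r}")
        for i, value in enumerate(vector):
            mixed[i] += weights[name] * value
    return mixed
```

For example, `parse_voice_mix("voice_a(3)+voice_b(1)")` yields weights of 0.75 and 0.25, which `mix_embeddings` then applies element-wise to the two vectors.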

What's Changed

- The Combine Voices endpoint now returns a .pt file; voice combinations requested during generation are otherwise created on the fly
- Bumped PyTorch to 2.6.0 and CUDA to 12.4
- Adjusted the Docker workflows and incorporated Docker Bake

Full Changelog: v0.1.4...v0.2.0