github remsky/Kokoro-FastAPI v0.2.0

7 months ago
  • Complete Model Overhaul:
    • Upgraded to the Kokoro v1.0 model architecture; v0.19 support is deprecated
    • Integration with hexgrad/kokoro and hexgrad/misaki packages
    • Pre-installed all multi-language support from Misaki:
      • English (en), Japanese (ja), Korean (ko), Chinese (zh), Vietnamese (vi)
      • Note: This will likely be controlled via an environment variable in upcoming versions
    • All voice packs included for supported languages, along with the original versions
  • Enhanced Audio Generation Features:
    • Per-word timestamped caption generation
    • Phoneme generation and phoneme-based audio generation (510-token cap)
  • Web UI Improvements:
    • Weighted voice mixing
    • Text file upload support
    • Improved text editor, user interface changes
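
The 510-token cap on phoneme-based generation means longer phoneme sequences have to be split before they are submitted. A minimal sketch of one way a client might do that follows. The helper name split_tokens and the constant MAX_PHONEME_TOKENS are hypothetical, the tokens are placeholders, and this is not taken from the project's code, which may count or split tokens differently.

```python
from typing import List, Sequence

# Cap stated in the release notes; the constant name is an assumption.
MAX_PHONEME_TOKENS = 510


def split_tokens(tokens: Sequence[str], limit: int = MAX_PHONEME_TOKENS) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most `limit` tokens."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [list(tokens[i:i + limit]) for i in range(0, len(tokens), limit)]


if __name__ == "__main__":
    placeholder_tokens = ["a"] * 1200  # stand-in tokens, not real phonemes
    chunks = split_tokens(placeholder_tokens)
    print([len(chunk) for chunk in chunks])  # [510, 510, 180]
```

Splitting at fixed offsets ignores word and sentence boundaries, so a real client would probably prefer to cut at pauses to avoid audible seams.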

What's Changed

  • Combine Voices endpoint now returns a .pt file; otherwise, voice combinations are generated on the fly
  • Bumping PyTorch version to 2.6.0, CUDA 12.4
  • Adjustments to Docker workflows + Incorporating Docker Bake
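
Weighted voice mixing and on-the-fly voice combination can be pictured as a weight-normalized average of per-voice vectors. The sketch below is only an illustration under that assumption: the function mix_voices, the voice names, and the use of plain Python lists are all hypothetical, and the project itself works with .pt voice files and may combine them differently.

```python
from typing import Dict, List


def mix_voices(voices: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Return the weight-normalized average of the named voice vectors."""
    if not weights:
        raise ValueError("at least one weight is required")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # All vectors must share one length; take it from the first requested voice.
    length = len(voices[next(iter(weights))])
    mixed = [0.0] * length
    for name, weight in weights.items():
        vector = voices[name]
        if len(vector) != length:
            raise ValueError(f"voice {name!r} has a different vector length")
        for i, value in enumerate(vector):
            mixed[i] += (weight / total) * value
    return mixed


if __name__ == "__main__":
    # Hypothetical two-dimensional "voices" for illustration only.
    voices = {"voice_a": [1.0, 0.0], "voice_b": [0.0, 1.0]}
    print(mix_voices(voices, {"voice_a": 1.0, "voice_b": 1.0}))  # [0.5, 0.5]
```

Normalizing by the sum of the weights means a 2:1 request and a 4:2 request produce the same mix.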

Full Changelog: v0.1.4...v0.2.0
