kyutai-labs/pocket-tts v2.0.0 on GitHub

New models! 🎉 🎉 🎉

The highlight of this release is a set of new models. You can now select the pre-trained model with the --language argument from the CLI, or with the language= argument in TTSModel.load_model(). Here is the list of all available models/languages:

english_2026-01: The only model that was available until now. 6 layers.
english_2026-04: The new and improved English model. Handles short sentences better and has better voice cloning. 6 layers.
english: This is just an alias for english_2026-04.
italian: Our new pocket-tts in Italian! 6 layers.
italian_24l: The undistilled Italian model. We would love reports if you find bugs that are present in the italian model but not in the italian_24l model. 24 layers.
german: Our new pocket-tts in German! 6 layers.
german_24l: The undistilled German model. We would love reports if you find bugs that are present in the german model but not in the german_24l model. 24 layers.
spanish: Our new pocket-tts in Spanish! 6 layers.
spanish_24l: The undistilled Spanish model. We would love reports if you find bugs that are present in the spanish model but not in the spanish_24l model. 24 layers.
portuguese: Our new pocket-tts in Portuguese! 6 layers.
portuguese_24l: The undistilled Portuguese model. We would love reports if you find bugs that are present in the portuguese model but not in the portuguese_24l model. 24 layers.
french_24l: The undistilled French model. Distilling the French model has been more painful than anticipated because of data quality. While we fix those issues, we want to unblock the French pocket-tts community, which is why we are releasing the undistilled version here. 24 layers.
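
For scripts that need to reason about the list above (for example, to warn before loading a 24-layer model on a slow CPU), the names, alias, and layer counts can be written down as a small lookup table. This is only a sketch: resolve_model and undistilled_counterpart are hypothetical helpers written for this note, not part of pocket-tts. The only library call referenced is TTSModel.load_model(language=...), which is mentioned above and appears here in a comment only.

```python
# Layer counts copied from the model list in these release notes.
MODEL_LAYERS = {
    "english_2026-01": 6,
    "english_2026-04": 6,
    "italian": 6,
    "italian_24l": 24,
    "german": 6,
    "german_24l": 24,
    "spanish": 6,
    "spanish_24l": 24,
    "portuguese": 6,
    "portuguese_24l": 24,
    "french_24l": 24,
}

# "english" is just an alias for the newest English model.
ALIASES = {"english": "english_2026-04"}


def resolve_model(name):
    """Hypothetical helper: map a --language value to (canonical name, layers)."""
    canonical = ALIASES.get(name, name)
    if canonical not in MODEL_LAYERS:
        known = sorted(set(MODEL_LAYERS) | set(ALIASES))
        raise ValueError(f"unknown model {name!r}; expected one of {known}")
    return canonical, MODEL_LAYERS[canonical]


def undistilled_counterpart(name):
    """Hypothetical helper: the 24-layer model to compare against, or None."""
    canonical, layers = resolve_model(name)
    candidate = canonical if layers == 24 else f"{canonical}_24l"
    return candidate if candidate in MODEL_LAYERS else None


canonical, layers = resolve_model("english")
# The resolved name could then be passed on, e.g.
# TTSModel.load_model(language=canonical)
```

When reporting a bug in a distilled model, undistilled_counterpart("italian") gives the name of the 24-layer model to test against; it returns None for the English models, which have no 24-layer variant in this release.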

If the 24-layer models are too slow to run in real time on your CPU, you can try the new --quantize option! You can expect a performance improvement of roughly 30% in most cases.
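
For readers unfamiliar with the idea behind int8 dynamic quantization (the technique named in #147), here is a toy pure-Python sketch. It is not the pocket-tts implementation: the function names, the per-row weight scales, and the symmetric rounding scheme are all assumptions chosen for illustration. Weights are quantized once ahead of time, activations are quantized at call time, and the dot products are accumulated in integers before being rescaled to floats.

```python
def quantize_int8(values):
    """Symmetric quantization of a list of floats to integers in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dynamic_int8_linear(x, weight_codes, weight_scales):
    """Compute y = W @ x with pre-quantized weights and on-the-fly activations."""
    # "Dynamic": the activation scale is computed from the actual input.
    x_codes, x_scale = quantize_int8(x)
    out = []
    for row, w_scale in zip(weight_codes, weight_scales):
        acc = sum(w * a for w, a in zip(row, x_codes))  # integer accumulation
        out.append(acc * w_scale * x_scale)  # rescale back to float
    return out


# Quantize the weights once, one scale per output row (an illustrative choice).
weight = [[0.5, -1.0, 0.25], [2.0, 0.1, -0.7]]
quantized_rows = [quantize_int8(row) for row in weight]
weight_codes = [codes for codes, _ in quantized_rows]
weight_scales = [scale for _, scale in quantized_rows]

x = [1.0, 2.0, -3.0]
approx = dynamic_int8_linear(x, weight_codes, weight_scales)
exact = [sum(w * a for w, a in zip(row, x)) for row in weight]
```

The quantized result is close to the float result but not identical. That small loss of precision is the price of the faster integer arithmetic.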

The pre-defined voices are all English. For other languages, we recommend using voice cloning with a voice prompt that corresponds to your language.

Note for maintainers of alternative implementations

This section should be especially helpful to @LaurentMazare @KevinAHM @babybirdprd @ekzhang @jishnuvenugopal @VolgaGerm @csukuangfj @TheAjaykrishnanR

The pocket-tts community has been amazing! We were blown away by the number of alternative implementations of pocket-tts in other languages and frameworks! We want to make it easy for them to adapt their code to the new models. I added comments to the commit that made the architecture changes. If you carry over the change made next to each comment, that should be enough to make your alternative implementation work!

Here is the list

Notable pull requests:

Changed the implementation to fuse the transformers by @darknight054 in #85
Raise minimum huggingface_hub to 0.13.0 for consistent offline behavior by @joshwhiton in #137
Add int8 dynamic quantization support by @nabil-tazi in #147
Split long sentences on commas to prevent skipped words by @costajohnt in #143
Add french, italian, portuguese, spanish, german by @gabrieldemarmiesse in #155

New Contributors

@ai-joe-git made their first contribution in #136
@joshwhiton made their first contribution in #137
@alkmei made their first contribution in #138
@dooart made their first contribution in #139
@tonelord made their first contribution in #140
@VolgaGerm made their first contribution in #142
@csukuangfj made their first contribution in #150
@nabil-tazi made their first contribution in #147
@costajohnt made their first contribution in #143
@markd89 made their first contribution in #157
@TheAjaykrishnanR made their first contribution in #159
@dodgyrabbit made their first contribution in #160

Many thanks to the community for being so awesome! ❤️

Full Changelog: v1.1.1...v2.0.0