github kyutai-labs/pocket-tts v2.0.0

12 hours ago

New models! 🎉 🎉 🎉

pocket-tts-multilingual-v2 (2)

Credit: @vvolhejn

The headline of this release is a set of new models. The CLI now has a --language argument to select the pre-trained model, and TTSModel.load_model() has a matching language= argument. Here is the list of all available models/languages:

  • english_2026-01: The only model that was available until now. 6 layers.
  • english_2026-04: The new and improved English model. Handles short sentences better and has better voice cloning. 6 layers.
  • english: This is just an alias for english_2026-04.
  • italian: Our new pocket-tts in Italian! 6 layers.
  • italian_24l: The undistilled Italian model. We would love reports if you find bugs that are present in the italian model but not in the italian_24l model. 24 layers.
  • german: Our new pocket-tts in German! 6 layers.
  • german_24l: The undistilled German model. We would love reports if you find bugs that are present in the german model but not in the german_24l model. 24 layers.
  • spanish: Our new pocket-tts in Spanish! 6 layers.
  • spanish_24l: The undistilled Spanish model. We would love reports if you find bugs that are present in the spanish model but not in the spanish_24l model. 24 layers.
  • portuguese: Our new pocket-tts in Portuguese! 6 layers.
  • portuguese_24l: The undistilled Portuguese model. We would love reports if you find bugs that are present in the portuguese model but not in the portuguese_24l model. 24 layers.
  • french_24l: The undistilled French model. Distilling the French model has been more painful than anticipated because of data quality. While we fix these issues, we want to unblock the French pocket-tts community, which is why we are releasing the undistilled version here. 24 layers.
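
For anyone scripting around the model names, here is a minimal sketch of the list above as a lookup table. The helper names (resolve_model, undistilled_counterpart) are hypothetical and not part of pocket-tts; only the model names and layer counts come from this release.

```python
# Model names and layer counts copied from the list above.
MODEL_LAYERS = {
    "english_2026-01": 6,
    "english_2026-04": 6,
    "italian": 6,
    "italian_24l": 24,
    "german": 6,
    "german_24l": 24,
    "spanish": 6,
    "spanish_24l": 24,
    "portuguese": 6,
    "portuguese_24l": 24,
    "french_24l": 24,
}

# "english" is an alias, not a separate model.
ALIASES = {"english": "english_2026-04"}


def resolve_model(name: str) -> str:
    """Hypothetical helper: map an alias to its concrete model name."""
    concrete = ALIASES.get(name, name)
    if concrete not in MODEL_LAYERS:
        raise ValueError(f"unknown model/language: {name!r}")
    return concrete


def undistilled_counterpart(name: str):
    """Hypothetical helper: return the 24-layer sibling of a model, or None."""
    candidate = f"{resolve_model(name)}_24l"
    return candidate if candidate in MODEL_LAYERS else None
```

The second helper is handy when checking whether a bug shows up in the distilled model only, e.g. undistilled_counterpart("german") gives the model to compare against.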

If the 24-layer models are too slow to run in real time on your CPU, you can try the new --quantize option! You can expect a speedup of roughly 30% in most cases.
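
As a rough sketch of how the two new options might be combined from a script, the snippet below only builds the command line and prints it. Assumptions: the generate subcommand and the --text flag are taken from the existing pocket-tts CLI, and --quantize is treated as a plain on/off flag; only --language and --quantize are named in these notes. build_command is a hypothetical helper.

```python
import shlex


def build_command(text: str, language: str = "english", quantize: bool = False) -> list:
    """Hypothetical helper: assemble a pocket-tts invocation as an argv list."""
    # "generate" and "--text" are assumed from the existing CLI, not from this release.
    argv = ["pocket-tts", "generate", "--language", language, "--text", text]
    if quantize:
        # Assumed to be a boolean flag that takes no value.
        argv.append("--quantize")
    return argv


# Print the command instead of running it, so nothing is downloaded here.
print(shlex.join(build_command("Bonjour tout le monde.", "french_24l", quantize=True)))
```

To actually run it, the argv list can be handed to subprocess.run(argv, check=True).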

The pre-defined voices are all English. For other languages, we recommend using voice cloning with a voice prompt in the target language.
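
Here is a hedged sketch of that recommendation on the Python side. Assumptions: get_state_for_audio_prompt, generate_audio, sample_rate and the "alba" voice name are taken from the existing pocket-tts README and may differ in v2.0.0; only language= in TTSModel.load_model() comes from these notes. The synthesize wrapper is hypothetical, and it turns the recommendation into a hard check purely for illustration.

```python
from pathlib import Path


def synthesize(text, language="english", voice_prompt=None, out_path="output.wav"):
    """Hypothetical wrapper: generate speech, requiring a voice prompt for non-English models."""
    if voice_prompt is None:
        if not language.startswith("english"):
            # The pre-defined voices are all English, so ask for a prompt instead.
            raise ValueError("pass a voice prompt recorded in the target language")
        voice_prompt = "alba"  # assumed name of a pre-defined English voice

    # Imported lazily so the check above works without the packages installed.
    import scipy.io.wavfile
    from pocket_tts import TTSModel

    model = TTSModel.load_model(language=language)  # language= is new in this release
    # Assumed API: accepts a pre-defined voice name or a path to an audio file.
    voice_state = model.get_state_for_audio_prompt(str(voice_prompt))
    audio = model.generate_audio(voice_state, text)
    scipy.io.wavfile.write(out_path, model.sample_rate, audio.numpy())
    return Path(out_path)
```

A call such as synthesize("Guten Morgen!", language="german", voice_prompt="my_german_voice.wav") would then clone the voice from the given recording; the file name is a placeholder.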

Note for maintainers of alternative implementations

This section should be especially helpful to @LaurentMazare @KevinAHM @babybirdprd @ekzhang @jishnuvenugopal @VolgaGerm @csukuangfj @TheAjaykrishnanR

The pocket-tts community has been amazing! We were blown away by the number of alternative implementations of pocket-tts in other languages and frameworks! We want to make it easy for their maintainers to adapt their code to the new models. I added comments to the commit that made the architecture changes. If you port the changes made next to each comment, that should be enough to make your alternative implementation work!

Here is the list

Notable pull requests:

New Contributors

Many thanks to the community for being so awesome! ❤️

Full Changelog: v1.1.1...v2.0.0
