
๐Ÿธ v0.1.0

In a nutshell, there are a ton of updates in this release. I don't know if we can cover them all here, but let's try.

From this release on, 🐸 TTS stands on the following architecture:

  • Trainer API for training.
  • Synthesizer API for inference.
  • ModelManager API for managing the 🐸 TTS model zoo.
  • SpeakerManager API for managing speakers in a multi-speaker setting.
  • (TBI) Exporter API for exporting models to ONNX, TorchScript, etc.
  • (TBI) Data Processing API for making a dataset ready for training.
  • Model API for implementing models, compatible with all the other components above.

Updates

💾 Code updates

  • Brand new Trainer API

    We unified all the training code in a lightweight but feature-complete Trainer API. From now on, all 🐸 TTS models use this new API for training.

    It provides mixed-precision training (with NVIDIA's APEX or torch.amp) and multi-GPU training for all the models.
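
    Here is a minimal sketch of the new workflow, trimmed from the LJSpeech GlowTTS recipe shipped with this release (import paths and the init_training helper follow that recipe and may change in later versions):

      import os

      from TTS.trainer import Trainer, TrainingArgs, init_training
      from TTS.tts.configs import BaseDatasetConfig, GlowTTSConfig

      output_path = os.path.dirname(os.path.abspath(__file__))

      # Point the dataset config at a local copy of LJSpeech.
      dataset_config = BaseDatasetConfig(
          name="ljspeech", meta_file_train="metadata.csv", path="LJSpeech-1.1/"
      )
      config = GlowTTSConfig(
          batch_size=32,
          run_eval=True,
          epochs=1000,
          mixed_precision=True,  # APEX / torch.amp backed
          output_path=output_path,
          datasets=[dataset_config],
      )

      # init_training parses command line args and prepares the output folder.
      args, config, output_path, _, c_logger, tb_logger = init_training(TrainingArgs(), config)
      trainer = Trainer(args, config, output_path, c_logger, tb_logger)
      trainer.fit()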

  • Brand new Model API

    The abstract BaseModel class and its BaseTTS and BaseVocoder children are now the basis of all 🐸 TTS models. Any model that implements one of these classes works seamlessly with the Trainer and the Synthesizer.
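
    To give the idea, here is a hypothetical skeleton (the import path, hook names, and batch keys are assumptions based on how the Trainer drives models, not a verbatim copy of the base class):

      from TTS.tts.models.base_tts import BaseTTS  # assumed module path

      class MyModel(BaseTTS):
          """Toy model sketching only the hooks the Trainer relies on."""

          def forward(self, text, text_lengths, mel, mel_lengths):
              # Produce training-time outputs for a batch.
              raise NotImplementedError

          def inference(self, text):
              # Produce a spectrogram at synthesis time.
              raise NotImplementedError

          def train_step(self, batch, criterion):
              # Return model outputs and a dict of named losses.
              outputs = self.forward(
                  batch["text_input"], batch["text_lengths"],
                  batch["mel_input"], batch["mel_lengths"],
              )
              loss_dict = criterion(outputs, batch)
              return outputs, loss_dict

          def eval_step(self, batch, criterion):
              # Evaluation mirrors the training step here.
              return self.train_step(batch, criterion)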

  • Brand new 🐸 TTS recipes.

    We decided to merge the recipes into the main project. We now host recipes for the LJSpeech dataset covering all the implemented models, so you can pick the model you want, change the parameters, and easily train your own model.

    Thanks to the new Trainer API and 👩‍✈️ Coqpit integration, we could implement these recipes in pure Python.
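
    For example, training from a recipe is just running its script (the path below follows the recipes/ljspeech layout in the repository; adjust it to the model you pick):

      python recipes/ljspeech/glow_tts/train_glowtts.py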

  • Updated SpeakerManager API

    TTS.utils.SpeakerManager is now the core unit for managing speakers in a multi-speaker model and for interfacing a SpeakerEncoder model with the tts and vocoder models.
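
    A rough sketch of the intended use (the import path, constructor argument, and attribute below are illustrative assumptions, not verified against this exact version):

      from TTS.tts.utils.speakers import SpeakerManager  # assumed module path

      # Load the speaker metadata saved during multi-speaker training.
      manager = SpeakerManager(speakers_file="speakers.json")  # assumed kwarg
      print(manager.num_speakers)  # number of known speakers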

  • Updated model training mechanics.

    You can now use pure Python to define your model and run the training. This is useful for training models in a Jupyter Notebook or any other Python environment.

    We also keep the old mechanics via `TTS/bin/train_tts.py` or `TTS/bin/train_vocoder.py`. You just need to replace the previous training script name with one of these two, depending on your model.

    python TTS/bin/train_tacotron.py --config_path config.json

    becomes

    python TTS/bin/train_tts.py --config_path config.json
  • Use 👩‍✈️ Coqpit for managing model class arguments.

    Now all the model arguments are defined in a Coqpit class and imported by the model config.
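
    For instance, a model's arguments now read like a plain dataclass (the class below is made up for illustration; save_json and load_json are Coqpit's serialization helpers):

      from dataclasses import dataclass

      from coqpit import Coqpit

      @dataclass
      class MyModelArgs(Coqpit):
          """Hypothetical model arguments, serializable to and from JSON."""
          hidden_channels: int = 256
          num_flow_blocks: int = 12
          dropout_p: float = 0.05

      args = MyModelArgs()
      args.save_json("model_args.json")  # round-trips via load_json("model_args.json")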

  • gruut-based character-to-phoneme conversion (👑 @synesthesiam)

    It is a drop-in replacement for the previous solution and is compatible with the released models, so all these models are functional again without version nitpicking.

  • Set `test_sentences` in the config rather than providing a txt file.

  • Set the maximum number of decoder steps for Tacotron 1-2 models in the config.
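
    Both of the last two items live in the model config. A short illustration (the Tacotron2Config import path is assumed for this release; the values are made up):

      from TTS.tts.configs import Tacotron2Config

      config = Tacotron2Config(
          max_decoder_steps=1000,  # hard upper bound for the autoregressive decoder
          test_sentences=[
              "Be a voice, not an echo.",
              "It took me quite a long time to develop a voice.",
          ],
      )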

๐Ÿƒโ€โ™€๏ธ Operational Updates

๐Ÿ… Model implementations

🚀 Model releases

We solved the compatibility issues and re-released some of the models. You can see them in the released binaries section.

You don't need to change anything. If you use v0.1.0, these new models are used by default.
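
As a quick sanity check, you can list the model zoo entries and synthesize with one of them (flag names follow TTS/bin/synthesize.py in this release; the model name below is just one example entry):

    python TTS/bin/synthesize.py --list_models
    python TTS/bin/synthesize.py \
        --model_name "tts_models/en/ljspeech/tacotron2-DDC" \
        --text "Hello from Coqui TTS." \
        --out_path output.wav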
