🐸 v0.1.0
In a nutshell, there are a ton of updates in this release. I don't know if we can cover them all here but let's try.
After this release, 🐸TTS stands on the following architecture.
- `Trainer API` for training.
- `Synthesizer API` for inference.
- `ModelManager API` for managing the 🐸TTS model zoo.
- `SpeakerManager API` for managing speakers in a multi-speaker setting.
- (TBI) `Exporter API` for exporting models to ONNX, TorchScript, etc.
- (TBI) `Data Processing API` for making a dataset ready for training.
- `Model API` for implementing models, compatible with all the other components above.
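To picture how a shared model interface lets one training loop and one inference wrapper serve every model, here is a minimal, self-contained sketch. Every class and method name in it (`ToyBaseModel`, `ToyTTS`, `ToyTrainer`, `ToySynthesizer`, `train_step`, `inference`) is a hypothetical stand-in invented for illustration, not the library's actual signatures, and the "model" only fits a single number so the example runs with the standard library alone.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence


class ToyBaseModel(ABC):
    """Hypothetical stand-in for an abstract model interface."""

    @abstractmethod
    def train_step(self, batch: List[float]) -> Dict[str, float]:
        """Consume one batch and return a dict holding at least a 'loss'."""

    @abstractmethod
    def inference(self, text: str) -> List[float]:
        """Turn input text into output values (a fake 'waveform' here)."""


class ToyTTS(ToyBaseModel):
    """Toy model that learns one scalar, just to keep the sketch runnable."""

    def __init__(self) -> None:
        self.scale = 0.0

    def train_step(self, batch: List[float]) -> Dict[str, float]:
        target = sum(batch) / len(batch)
        loss = (self.scale - target) ** 2
        # Move halfway toward the batch mean (a stand-in for an optimizer step).
        self.scale += 0.5 * (target - self.scale)
        return {"loss": loss}

    def inference(self, text: str) -> List[float]:
        # One output value per input character.
        return [self.scale] * len(text)


class ToyTrainer:
    """Hypothetical trainer: it only relies on the abstract interface."""

    def __init__(self, model: ToyBaseModel, data: Sequence[List[float]], epochs: int = 1) -> None:
        self.model = model
        self.data = data
        self.epochs = epochs

    def fit(self) -> List[float]:
        losses = []
        for _ in range(self.epochs):
            for batch in self.data:
                losses.append(self.model.train_step(batch)["loss"])
        return losses


class ToySynthesizer:
    """Hypothetical inference wrapper around any ToyBaseModel."""

    def __init__(self, model: ToyBaseModel) -> None:
        self.model = model

    def tts(self, text: str) -> List[float]:
        return self.model.inference(text)


model = ToyTTS()
losses = ToyTrainer(model, [[1.0, 1.0]], epochs=2).fit()
wav = ToySynthesizer(model).tts("hi")
print(losses, wav)
```

Because the trainer and the synthesizer touch only the abstract methods, adding a new model in this sketch means writing one new subclass and nothing else.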
Updates
Code updates
- Brand new `Trainer API`
  We unified all the training code in a lightweight but feature-complete `Trainer API`. From now on, all the 🐸TTS models will use this new API for training.
  It provides mixed precision (with Nvidia's APEX or `torch.amp`) and multi-GPU training for all the models.
- Brand new `Model API`
  Abstract `BaseModel` and its `BaseTTS` and `BaseVocoder` child classes are now used as the basis of the 🐸TTS models.
  Any model that implements one of these classes works seamlessly with the `Trainer` and `Synthesizer`.
- Brand new 🐸TTS recipes.
  We decided to merge the recipes into the main project. Now we host recipes for the LJSpeech dataset, covering all the implemented models.
  So you can pick the model you want, change the parameters, and train your own model easily.
  Thanks to the new `Trainer API` and 👩‍✈️Coqpit integration, we could implement these recipes in pure Python.
- Updated `SpeakerManager API`
  `TTS.utils.SpeakerManager` is now the core unit to manage speakers in a multi-speaker model and to interface a `SpeakerEncoder` model with the `tts` and `vocoder` models.
- Updated model training mechanics.
  You can now use pure Python to define your model and run the training. This is useful for training models in a Jupyter Notebook or other Python environments.
  We also keep the old mechanics through `TTS/bin/train_tts.py` and `TTS/bin/train_vocoder.py`. You just need to replace the previous training script name with one of these two, based on your model.
  `python TTS/bin/train_tacotron.py --config_path config.json`
  becomes
  `python TTS/bin/train_tts.py --config_path config.json`
- Use 👩‍✈️Coqpit for managing model class arguments.
  Now all the model arguments are defined in a `coqpit` class and imported by the model config.
- `gruut`-based character-to-phoneme conversion. (@synesthesiam)
  It is a drop-in replacement for the previous solution and is compatible with the released models. So now all these models are functional again without version nitpicking.
- Set `test_sentences` in the config rather than providing a txt file.
- Set the maximum number of decoder steps of `Tacotron1-2` models in the config.
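The config-related items above (model arguments kept in their own class, `test_sentences` in the config, a decoder step limit set in the config) can be pictured with a small sketch. It uses plain standard-library dataclasses as a stand-in for 👩‍✈️Coqpit so that it runs without any install. The class names, `hidden_dim`, `batch_size`, the field name `max_decoder_steps`, and the helper `config_from_dict` are all assumptions made for illustration, not the real config schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ToyModelArgs:
    """Hypothetical model arguments, grouped in one class."""

    hidden_dim: int = 256
    # Assumed field name for the cap on decoder steps.
    max_decoder_steps: int = 500


@dataclass
class ToyTrainConfig:
    """Hypothetical training config that imports the model arguments."""

    model_args: ToyModelArgs = field(default_factory=ToyModelArgs)
    batch_size: int = 32
    # Test sentences live in the config instead of a separate txt file.
    test_sentences: List[str] = field(default_factory=lambda: ["Hello world."])

    def validate(self) -> None:
        """Reject values that could never produce a usable run."""
        if self.model_args.max_decoder_steps <= 0:
            raise ValueError("max_decoder_steps must be positive")
        if not all(s.strip() for s in self.test_sentences):
            raise ValueError("test_sentences must not contain empty strings")


def config_from_dict(data: dict) -> ToyTrainConfig:
    """Rebuild the nested config from a plain dict (e.g. parsed JSON)."""
    data = dict(data)
    data["model_args"] = ToyModelArgs(**data["model_args"])
    return ToyTrainConfig(**data)


config = ToyTrainConfig(batch_size=16)
config.model_args.max_decoder_steps = 1000
config.test_sentences.append("A second test sentence.")
config.validate()

# Round-trip through JSON, the way a config file on disk would be used.
config_json = json.dumps(asdict(config), indent=2)
restored = config_from_dict(json.loads(config_json))
print(restored == config)
```

Keeping the model arguments in their own class means the same object can be handed to the model constructor and serialized with the rest of the training settings.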
Operational Updates
- FINALLY DOCUMENTATION!! https://tts.readthedocs.io
- Enable support for Python 3.9
- Changes for PyTorch 1.9.0
Model implementations
- UnivNet GAN Vocoder: https://arxiv.org/pdf/2106.07889.pdf (@rishikksh20)
Model releases
We solved the compatibility issues and re-released some of the models. You can see them in the released binaries section.
You don't need to change anything. If you use v0.1.0, by default, it uses these new models.