Highlights

Example Pipelines

torchaudio is expanding its support for models and end-to-end applications. Please file an issue on GitHub to provide feedback on them.

Speech Recognition: Building on the addition of the Wav2Letter model for speech recognition in the last release, we added an example training pipeline for speech recognition that uses the LibriSpeech dataset.
Text-to-Speech: With the goal of supporting text-to-speech applications, we added a vocoder based on the WaveRNN model. The model is based on the implementation from this repository, and the original architecture was introduced in "Efficient Neural Audio Synthesis". We provide an example training pipeline in the example folder that uses the LibriTTS dataset added to torchaudio in this release.
Source Separation: We also support source separation with the addition of the ConvTasNet model, based on the paper "Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation." An example training pipeline is provided with the wsj0-mix dataset.

I/O Improvements

As announced in the last release, we are in the process of making sox_io the new default backend. It ships with new features such as TorchScript support and performance improvements. If you want to benefit from these features now, we encourage you to migrate. For more information, see issue #903.

Backwards Incompatible Changes

Switched all %-based string formatting to str.format to adopt changes in PyTorch, leading to improved error messages for TorchScript (#850)
Split sox_utils.list_formats() for read and write (#811)
Made directory traversal order alphabetical and breadth-first, consistent across operating systems (#814)
Changed GTZAN so that it only traverses filenames belonging to the dataset (#791)
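
To illustrate the traversal order described in #814, here is a minimal sketch in plain Python. It is not torchaudio's implementation: the function name walk_files_bfs and its suffix parameter are hypothetical, chosen only to show what "alphabetical and breadth-first" means in practice.

```python
import os
from collections import deque
from typing import Iterator


def walk_files_bfs(root: str, suffix: str = "") -> Iterator[str]:
    """Yield file paths under root, one directory level at a time.

    Hypothetical helper for illustration only. Entries inside each
    directory are visited in sorted order, so the result does not depend
    on the order in which the operating system lists them.
    """
    queue = deque([root])
    while queue:
        current = queue.popleft()
        # os.listdir returns entries in arbitrary order; sorting makes
        # the traversal deterministic across platforms.
        for name in sorted(os.listdir(current)):
            path = os.path.join(current, name)
            if os.path.isdir(path):
                # Subdirectories are deferred until the files of all
                # shallower directories have been yielded.
                queue.append(path)
            elif name.endswith(suffix):
                yield path
```

Code that relied on the previous, platform-dependent ordering (for example, indexing a dataset by position) may see items in a different order after upgrading.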

New Features

Added ConvTasNet model (#920, #933) with pipeline (#894)
Added canonical pipeline with wav2letter (#632)
The WaveRNN model (#705, #797, #801, #810, #836) is available with a canonical pipeline (#749, #802, #831, #863)
Added all 3 releases of tedlium dataset (#882, #934, #945, #895)
Added VCTK_092 dataset (#812)
Added LibriTTS (#790, #820)
Added SPHERE support to sox_io backend (#871)
Added torchscript sox effects (#760)
Added a flag that switches the interface of the soundfile backend to match that of the sox_io backend. (#922)

Improvements

Added soundfile compatibility backend. (#922)
Improved the speed of torchaudio.compliance.kaldi.fbank (#947)
Improved the speed of phaser (#660)
Added warning when a Mel filter is all zero (#914)
Added pathlib.Path support to sox_io backend (#907)
Simplified C++ registration with TORCH_LIBRARY (#840)
Merged sox effect and sox_io C++ implementation (#779)

Internal

CI: Added test to validate torchscript backward compatibility (#838)
CI: Used mocked datasets to test CMUArctic (#829), CommonVoice (#827), Speech Commands (#824), LJSpeech (#826), LibriSpeech (#825), YESNO (#792, #832)
CI: Made *nix unit test fail if C++ extension is not available (#847, #849)
CI: Separated I/O in testing. (#813, #773, #783)
CI: Added smoke tests to sox_io and sox_effects (#806)
CI: Refactored test utilities (#805, #808, #809, #817, #822, #831)
Doc: Added how to run tests (#843)
Doc: Added 0.6.0 to version matrix in README (#833)

Bug Fixes

Fixed device in interactive ASR example (#900)
Fixed incorrect extension parsing (#885)
Fixed dither with noise_shaping = True (#865)
Ran unit tests with a non-editable installation (#845), and set zip_safe = False to disable egg installation (#842)
Sorted the GTZAN dataset and used on-the-fly data in the GTZAN test (#819)

Deprecations

Removed istft wrapper in favor of torch.istft. (#841)
Deprecated SoxEffect and SoxEffectsChain (#787)
I/O: Deprecated sox backend. (#904)
I/O: Deprecated the current interface of soundfile. (#922)
I/O: Deprecated load_wav functions. (#905)
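
For code that used the removed istft wrapper, the following is a rough migration sketch that calls torch.stft and torch.istft directly. It assumes a PyTorch version in which torch.stft accepts return_complex=True; the signal, n_fft, and hop_length values are arbitrary illustrations and do not come from this release.

```python
import math

import torch

# Illustrative parameters (assumptions, not values taken from torchaudio).
n_fft = 400
hop_length = 100
window = torch.hann_window(n_fft)

# One second of a 440 Hz sine wave at 16 kHz as a stand-in signal.
t = torch.arange(16000) / 16000.0
waveform = torch.sin(2 * math.pi * 440.0 * t)

# Forward transform to a complex spectrogram.
spec = torch.stft(
    waveform,
    n_fft=n_fft,
    hop_length=hop_length,
    window=window,
    return_complex=True,
)

# Inverse transform with torch.istft, the replacement for the removed wrapper.
# Passing length trims the output back to the original number of samples.
reconstructed = torch.istft(
    spec,
    n_fft=n_fft,
    hop_length=hop_length,
    window=window,
    length=waveform.shape[-1],
)

max_error = (waveform - reconstructed).abs().max().item()
```

The same window and hop length must be passed to both calls for the round trip to reconstruct the input closely.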

torchaudio v0.7.0 on PyPI