torchaudio 0.4 improves on current transformations, datasets, and backend support.
- We introduce an interactive speech recognition demo. (#266, #229, #248)
- SoX is now optional, and a new extensible backend dispatch mechanism exposes SoundFile as an alternative to SoX.
- The interface for datasets has been unified. This enables the addition of two large datasets: LibriSpeech and Common Voice.
- New filters such as biquad, data augmentations such as time and frequency masking, transforms such as gain and dither, and new feature computations such as deltas are now available.
- Transformations now support batches and are jitable.
We would again like to thank our contributors and the wider community for their significant contributions to this release. In particular, we'd like to thank @keunwoochoi, @ksanjeevan, and all the other maintainers and contributors of torchaudio-contrib for their valuable additions around augmentations (#285) and batching (#327).
Breaking Changes
- torchaudio now requires PyTorch 1.3.0 or newer; see https://pytorch.org/ for installation instructions. (#312)
- We make jit compilation optional for functions and use nn.Module where possible. (#314, #326, #342, #369)
- By unifying the interface for datasets, we changed the interface for VCTK and YESNO (#303, #316). In particular, the construction parameters `downsample`, `transform`, `target_transform`, and `return_dict` are being deprecated.
- SoxEffectsChain.EFFECTS_AVAILABLE replaced by SoxEffectsChain().EFFECTS_AVAILABLE (#355)
- This is the last version to support Python 2.
New Features
- SoX is now optional, and a new extensible backend dispatch mechanism exposes SoundFile as an alternative to SoX. This makes it possible to use torchaudio even when SoX or SoundFile are not installed or available. (#355)
- We now have a unified dataset interface that loads only one item into memory at a time, enabling new large datasets: LibriSpeech and Common Voice. (#303, #316, #330)
- We introduce a pitch detection algorithm: `torchaudio.functional.detect_pitch_frequency`. (#313, #322)
- We offer data augmentations in `torchaudio.transforms`: `TimeStretch`, `FrequencyMasking`, `TimeMasking`. (#285, #333, #348)
- We introduce a complex norm transform: `torchaudio.transforms.ComplexNorm`. (#285, #333)
- We now have a new audio feature generation for computing deltas: `torchaudio.functional.compute_deltas`. (#268, #326)
- We introduce `torchaudio.functional.gain` and `torchaudio.functional.dither` (#319, #360). We welcome work to continue the effort to implement features available in SoX, see #260.
- We now include `equalizer_biquad` (#315, #340), `lowpass_biquad`, `highpass_biquad` (#275), `lfilter`, and `biquad` (#275, #291, #326) in `torchaudio.functional`.
- MFCC is available as `torchaudio.functional.mfcc`. (#228)
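For readers new to biquad filters, the idea behind the new filter functions can be sketched in plain Python. This is an illustration only, not torchaudio's implementation: the helper names `lowpass_biquad_coeffs` and `biquad_filter` and the default `q` are assumptions made for this example. The coefficients follow the widely used Audio EQ Cookbook low-pass formulas, and the filter is applied as a direct-form difference equation.

```python
import math


def lowpass_biquad_coeffs(sample_rate, cutoff_freq, q=0.707):
    """Audio EQ Cookbook low-pass coefficients (hypothetical helper)."""
    w0 = 2.0 * math.pi * cutoff_freq / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    b0 = (1.0 - cos_w0) / 2.0
    b1 = 1.0 - cos_w0
    b2 = b0
    a0 = 1.0 + alpha
    a1 = -2.0 * cos_w0
    a2 = 1.0 - alpha
    return b0, b1, b2, a0, a1, a2


def biquad_filter(samples, b0, b1, b2, a0, a1, a2):
    """Apply y[n] = (b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]) / a0."""
    x1 = x2 = y1 = y2 = 0.0  # filter state, assumed zero before the first sample
    out = []
    for x0 in samples:
        y0 = (b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


coeffs = lowpass_biquad_coeffs(sample_rate=16000, cutoff_freq=1000)

# A constant (0 Hz) signal passes through with unit gain once the filter settles.
dc = biquad_filter([1.0] * 2000, *coeffs)

# A signal alternating at the Nyquist frequency is suppressed.
nyquist = biquad_filter([(-1.0) ** n for n in range(2000)], *coeffs)
```

The library functions operate on tensors rather than Python lists, so treat this only as a picture of the arithmetic a second-order filter performs per sample.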
Improvements
- We now support batching in transforms. (#327, #337, #404)
- Functions are now jitable, and nn.Module is used where possible. (#314, #326, #342, #362, #369, #395)
- Downloads of large files are now automatically resumed with a new download function. (#320)
- New tests for ISTFT are added. (#279)
- We introduce nightly builds. (#301)
- We now have smoke tests for builds. (#346, #359)
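Resuming an interrupted download generally relies on HTTP range requests. The sketch below shows that general technique with the Python 3 standard library; it is not torchaudio's download function, and the names `build_resume_request` and `resume_download` are hypothetical. It assumes the server supports the `Range` header and falls back to a full download when it does not.

```python
import os
import urllib.request


def build_resume_request(url, partial_path):
    """Build a request asking only for the bytes not yet on disk."""
    request = urllib.request.Request(url)
    already = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    if already > 0:
        # Ask the server to start at the first missing byte.
        request.add_header("Range", "bytes={}-".format(already))
    return request, already


def resume_download(url, partial_path, chunk_size=32 * 1024):
    """Download url to partial_path, appending to any existing partial file."""
    request, already = build_resume_request(url, partial_path)
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the Range header was honoured;
        # any other success status means the server sent the whole file.
        mode = "ab" if already and response.status == 206 else "wb"
        with open(partial_path, mode) as f:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                f.write(chunk)
```

One caveat with this sketch: if the partial file is already complete, many servers answer the range request with status 416, which `urlopen` raises as an `HTTPError`, so a caller would need to handle that case.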
Bug Fixes
- Fix mismatch between `MelScale` and librosa. (#294)
- Fix `torchaudio.compliance.kaldi.resample_waveform`, where internal variables were not moved to the GPU when used. (#277)
- Fix a bug that occurred when importing torchaudio built outside of a git repository. (#276)
- Fix `istft`, where internal parameters were not created with the same `dtype` and `device` as the tensor provided by the user. (#264)
- Fix size mismatch when saving and loading from state dictionary (`load_state_dict`). (#246)
- Clarified internal naming convention within transforms and functionals. (#298)
- Fix build script to be more tolerant to download drops. (#280, #284, #305)
- Correct documentation for SoxEffectsChain. (#283)
- Fix resample error with CUDA tensors. (#277)
- Fix error when importing version outside of git. (#276)
- Fix missing asound in Linux build. (#254)
- Fix deprecated torch. (#254)
- Fix link in README. (#253)
- Fix window device in ISTFT. (#240)
- Documentation: Fix range in documentation for `torchaudio.load` to [-1, 1]. (#283)