New Model additions
WavLM
WavLM was proposed in WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
WavLM sets a new SOTA on the SUPERB benchmark.
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=wavlm
- Add WavLM by @patrickvonplaten in #14354
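A minimal usage sketch (not taken from the release itself) for pulling hidden states out of WavLM; the `microsoft/wavlm-base-plus` checkpoint is one of the compatible checkpoints linked above and can be swapped for any other:

```python
import torch
from transformers import AutoFeatureExtractor, WavLMModel

checkpoint = "microsoft/wavlm-base-plus"  # assumption: one compatible hub checkpoint
feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint)
model = WavLMModel.from_pretrained(checkpoint)

# One second of silent 16 kHz audio stands in for a real waveform.
waveform = torch.zeros(16000).numpy()
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    hidden_states = model(**inputs).last_hidden_state  # (batch, frames, hidden_size)
```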
Wav2Vec2Phoneme
Wav2Vec2Phoneme was proposed in Simple and Effective Zero-shot Cross-lingual Phoneme Recognition by Qiantong Xu, Alexei Baevski, Michael Auli.
Wav2Vec2Phoneme makes it possible to perform phoneme classification as part of automatic speech recognition.
- [Wav2Vec2 Phoneme] Let phonemizer lang default to tokenizer's settings by @patrickvonplaten in #14829
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=phoneme-recognition
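As a sketch of the new phoneme recognition flow (assuming the `facebook/wav2vec2-lv-60-espeak-cv-ft` checkpoint from the hub link above), the new `Wav2Vec2PhonemeCTCTokenizer` decodes greedy CTC predictions into phoneme strings:

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForCTC, Wav2Vec2PhonemeCTCTokenizer

checkpoint = "facebook/wav2vec2-lv-60-espeak-cv-ft"  # assumption: one compatible hub checkpoint
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(checkpoint)
tokenizer = Wav2Vec2PhonemeCTCTokenizer.from_pretrained(checkpoint)
model = Wav2Vec2ForCTC.from_pretrained(checkpoint)

# One second of silent 16 kHz audio stands in for a real recording.
waveform = torch.zeros(16000).numpy()
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding produces phonemes rather than characters.
predicted_ids = torch.argmax(logits, dim=-1)
print(tokenizer.batch_decode(predicted_ids))
```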
UniSpeech-SAT
UniSpeech-SAT was proposed in UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
UniSpeech-SAT is especially good at speaker related tasks.
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=unispeech-sat
UniSpeech
UniSpeech was proposed in UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
Three new models are released as part of the ImageGPT integration: `ImageGPTModel`, `ImageGPTForCausalImageModeling`, `ImageGPTForImageClassification`, in PyTorch.
Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=unispeech
New Tasks
Speaker Diarization and Verification
Wav2Vec2-like architectures now have speaker diarization and speaker verification heads added to them.
You can try out the new task here: https://huggingface.co/spaces/microsoft/wavlm-speaker-verification
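A minimal speaker verification sketch with the new heads (assumptions: the `microsoft/wavlm-base-plus-sv` checkpoint backing the space above, and `WavLMForXVector` as the verification head added in #14847) embeds two utterances and compares them with cosine similarity:

```python
import torch
from transformers import AutoFeatureExtractor, WavLMForXVector

checkpoint = "microsoft/wavlm-base-plus-sv"  # assumption: the checkpoint used by the space above
feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint)
model = WavLMForXVector.from_pretrained(checkpoint)

# Two dummy 16 kHz utterances; replace with real recordings of the speakers to compare.
utterances = [torch.zeros(16000).numpy(), torch.zeros(16000).numpy()]
inputs = feature_extractor(utterances, sampling_rate=16000, padding=True, return_tensors="pt")

with torch.no_grad():
    embeddings = model(**inputs).embeddings  # one x-vector embedding per utterance

# A high cosine similarity suggests the two utterances come from the same speaker.
similarity = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=-1)
print(float(similarity))
```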
What's Changed
- Move import to avoid circular import by @sgugger in #14787
- PoC for conserving old links by @sgugger in #14754
- Removes images to put them in a dataset by @LysandreJik in #14781
- Post sphinx-clean up and contributing guide updates by @sgugger in #14790
- Fix the build documentation job by @sgugger in #14788
- Update CONTRIBUTING.md by @kamalkraj in #14799
- Update CONTRIBUTING.md by @kamalkraj in #14800
- Train step fix by @Rocketknight1 in #14796
- [Generate] Make generate multi-modal by @patrickvonplaten in #14784
- Remove `require_datasets` testing utility by @LysandreJik in #14795
- [WavLM] Correct position bias computation by @patrickvonplaten in #14805
- Fix Perceiver multi GPU test by @NielsRogge in #14810
- [WavLM] Layerdrop is not allowed for first layer by @patrickvonplaten in #14811
- [Generate] Correct input_ids detection by @patrickvonplaten in #14815
- Implement head_mask for Flax BERT and other models copied from BERT by @stancld in #14620
- Convert rst to mdx bert by @LysandreJik in #14806
- Wav2Vec2 meets phonemes by @patrickvonplaten in #14353
- [ImageGPT] Deprecate pixel_values input name to input_ids by @patrickvonplaten in #14801
- [Seq2SeqTrainer] Remove model input name hack by @patrickvonplaten in #14802
- [WavLM] Fix slow tests by @patrickvonplaten in #14845
- Add SD and SV heads for WavLM by @anton-l in #14847
- Add an argument to set bucket_cap_mb for PyTorch DDP by @changlan in #14756
- Update CONTRIBUTING.md by @kamalkraj in #14835
- Fix dead link to benchmarks.ipynb by @DerekChia in #14842
- [Perceiver] Skip multi-gpu tests for now by @patrickvonplaten in #14813
- Add 'with torch.no_grad()' to DeBERTa integration test forward pass by @henholm in #14821
- Add 'with torch.no_grad()' to BERT integration test forward pass by @henholm in #14820
- Add a main_input_name attribute to all models by @sgugger in #14803
- [doc] typo by @stas00 in #14849
- [logging] implement warning_advice / TRANSFORMERS_NO_ADVISORY_WARNINGS by @stas00 in #14669
- Make the onnx submodule init lazy by @sgugger in #14855
- Convert docstrings of modeling files by @sgugger in #14850
- [Bart] better error message by @patrickvonplaten in #14854
- Only create the model card on process 0 by @sgugger in #14857
- [ASR example] Improve example + add more examples by @patrickvonplaten in #14848
- Fix the value error typo of AdamW's betas' valid values checking by @dourgey in #14780
- Add custom `stopping_criteria` and `logits_processor` to `generate` by @lvwerra in #14779
- Replace commit sha by commit url for update jobs by @sgugger in #14852
- [examples/summarization] deal with None in data records by @stas00 in #14816
- [doc porting] several docs by @stas00 in #14858
- Mass conversion of documentation from rst to Markdown by @sgugger in #14866
- Fix FLAX_MULTIPLE_CHOICE_SAMPLE typo by @mishig25 in #14871
- Fixes in marian doc by @sgugger in #14872
- Fix `FlaxMarianMTModel` return block. by @sgugger in #14873
- Fix doc mistakes by @sgugger in #14874
- Convert model files from rst to mdx by @LysandreJik in #14865
- update the arguments `add_prefix_space` and `trim_offsets` in `backend_tokenizer.post_processor` of `RobertaTokenizerFast` by @SaulLu in #14752
- Feature/fix slow test in mluke by @Ryou0634 in #14749
- Updated deberta attention by @guillaume-be in #14625
- IterableDatasetShard should use per device batch size instead of real… by @SysuCharon in #14714
- Fix Perceiver code example by @NielsRogge in #14879
- Fix pytorch image classification example by @mariosasko in #14883
- Onnx enable tasks for supported models (part 2) by @michaelbenayoun in #14700
- Properly indent return block by @sgugger in #14887
New Contributors
- @changlan made their first contribution in #14756
- @DerekChia made their first contribution in #14842
- @henholm made their first contribution in #14821
- @dourgey made their first contribution in #14780
- @SysuCharon made their first contribution in #14714
Full Changelog: v4.14.0...v4.15.0