v4.4.0: S2T, M2M100, I-BERT, mBART-50, DeBERTa-v2, XLSR-Wav2Vec2
SpeechToText
Two new models are released as part of the S2T implementation: Speech2TextModel and Speech2TextForConditionalGeneration, in PyTorch.
Speech2Text is a speech model that accepts a float tensor of log-mel filter-bank features extracted from the speech signal. It’s a transformer-based seq2seq model, so the transcripts/translations are generated autoregressively.
The Speech2Text model was proposed in fairseq S2T: Fast Speech-to-Text Modeling with fairseq by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=speech_to_text
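As a quick orientation, here is a minimal sketch of transcribing audio with the new classes. It assumes torchaudio is installed; the facebook/s2t-small-librispeech-asr checkpoint and the placeholder waveform are illustrative assumptions, not part of this release's official examples.

```python
import numpy as np
from transformers import Speech2TextForConditionalGeneration, Speech2TextProcessor

# Checkpoint name is an assumption; any speech_to_text checkpoint from the Hub filter above should work.
processor = Speech2TextProcessor.from_pretrained("facebook/s2t-small-librispeech-asr")
model = Speech2TextForConditionalGeneration.from_pretrained("facebook/s2t-small-librispeech-asr")

# Placeholder waveform: replace with real 16 kHz mono audio.
speech = np.random.randn(16_000).astype(np.float32) * 0.01
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

# Transcripts are generated autoregressively from the log-mel filter-bank features.
generated_ids = model.generate(inputs["input_features"], attention_mask=inputs["attention_mask"])
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```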
- Speech2TextTransformer #10175 (@patil-suraj)
M2M100
Two new models are released as part of the M2M100 implementation: M2M100Model and M2M100ForConditionalGeneration, in PyTorch.
M2M100 is a multilingual encoder-decoder (seq-to-seq) model primarily intended for translation tasks.
The M2M100 model was proposed in Beyond English-Centric Multilingual Machine Translation by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=m2m_100
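A minimal translation sketch with the new classes follows; the facebook/m2m100_418M checkpoint and the example sentence are assumptions for illustration.

```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

# Checkpoint name is an assumption; any m2m_100 checkpoint from the Hub filter above should work.
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")

tokenizer.src_lang = "en"
encoded = tokenizer("Life is like a box of chocolates.", return_tensors="pt")

# Force the first generated token to be the target-language id (French here).
generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("fr"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```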
- Add m2m100 #10236 (@patil-suraj)
I-BERT
Six new models are released as part of the I-BERT implementation: IBertModel, IBertForMaskedLM, IBertForSequenceClassification, IBertForMultipleChoice, IBertForTokenClassification and IBertForQuestionAnswering, in PyTorch.
I-BERT is a quantized version of RoBERTa running inference up to four times faster.
The I-BERT framework in PyTorch identifies the best parameters for quantization. Once the model is exported to a framework that supports int8 execution (such as TensorRT), a speedup of up to 4x can be observed, with no loss in performance thanks to the parameter search.
The I-BERT model was proposed in I-BERT: Integer-only BERT Quantization by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney and Kurt Keutzer.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=ibert
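A minimal sketch of loading one of these models is shown below; the kssteven/ibert-roberta-base checkpoint name and the config-flag note in the comments are assumptions for illustration.

```python
from transformers import AutoTokenizer, IBertModel

# Checkpoint name is an assumption; I-BERT reuses the RoBERTa tokenizer.
tokenizer = AutoTokenizer.from_pretrained("kssteven/ibert-roberta-base")
model = IBertModel.from_pretrained("kssteven/ibert-roberta-base")

inputs = tokenizer("Integer-only inference with I-BERT.", return_tensors="pt")
last_hidden_state = model(**inputs).last_hidden_state

# Quantization-aware fine-tuning is toggled through the model config (a quant_mode flag);
# see the model documentation for the exact workflow — the flag name here is an assumption.
```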
- I-BERT model support #10153 (@kssteven418)
- [IBert] Correct link to paper #10445 (@patrickvonplaten)
- Add I-BERT to README #10462 (@LysandreJik)
mBART-50
mBART-50 is created from the original mbart-large-cc25 checkpoint by extending its embedding layers with randomly initialized vectors for an extra set of 25 language tokens, and is then further pretrained on 50 languages.
The MBart model was presented in Multilingual Translation with Extensible Multilingual Pretraining and Finetuning by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=mbart-50
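A minimal translation sketch is shown below; the facebook/mbart-large-50-many-to-many-mmt checkpoint and the example sentence are assumptions for illustration.

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Checkpoint name is an assumption; any mBART-50 checkpoint from the Hub filter above should work.
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-many-to-many-mmt", src_lang="en_XX")

encoded = tokenizer("The head of the United Nations says there is no military solution in Syria.", return_tensors="pt")

# Force the decoder to start with the French language token.
generated = model.generate(**encoded, forced_bos_token_id=tokenizer.lang_code_to_id["fr_XX"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```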
- Add mBART-50 #10154 (@patil-suraj)
DeBERTa-v2
Five new models are released as part of the DeBERTa-v2 implementation: DebertaV2Model, DebertaV2ForMaskedLM, DebertaV2ForSequenceClassification, DebertaV2ForTokenClassification and DebertaV2ForQuestionAnswering, in PyTorch.
The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It is based on Google’s BERT model released in 2018 and Facebook’s RoBERTa model released in 2019.
It builds on RoBERTa with disentangled attention and an enhanced mask decoder, and is trained with half of the data used in RoBERTa.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=deberta-v2
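A minimal sketch of loading one of these models follows, assuming sentencepiece is installed; microsoft/deberta-v2-xlarge is used purely as an example of a compatible checkpoint.

```python
from transformers import DebertaV2Model, DebertaV2Tokenizer

# Checkpoint name is an assumption; any deberta-v2 checkpoint from the Hub filter above should work.
tokenizer = DebertaV2Tokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
model = DebertaV2Model.from_pretrained("microsoft/deberta-v2-xlarge")

inputs = tokenizer("DeBERTa-v2 uses disentangled attention and an enhanced mask decoder.", return_tensors="pt")
last_hidden_state = model(**inputs).last_hidden_state
```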
- Integrate DeBERTa v2(the 1.5B model surpassed human performance on Su… #10018 (@BigBird01)
- DeBERTa-v2 fixes #10328 (@LysandreJik)
Wav2Vec2
XLSR-Wav2Vec2
The XLSR-Wav2Vec2 model was proposed in Unsupervised Cross-Lingual Representation Learning For Speech Recognition by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
The checkpoint corresponding to that model is added to the model hub: facebook/wav2vec2-large-xlsr-53
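Since the XLSR checkpoint is pretrained without a CTC head, the sketch below only extracts hidden states; the default feature-extractor settings and the placeholder waveform are assumptions for illustration.

```python
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Default feature-extractor settings are an assumption here.
feature_extractor = Wav2Vec2FeatureExtractor(sampling_rate=16_000)
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-large-xlsr-53")

# Placeholder waveform: replace with real 16 kHz mono audio.
speech = np.random.randn(16_000).astype(np.float32) * 0.01
inputs = feature_extractor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    hidden_states = model(inputs.input_values).last_hidden_state
```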
- [XLSR-Wav2Vec2] Add multi-lingual Wav2Vec2 models #10648 (@patrickvonplaten)
Training script
A fine-tuning script showcasing how the Wav2Vec2 model can be trained has been added.
- Add Fine-Tuning for Wav2Vec2 #10145 (@patrickvonplaten)
Further improvements
The Wav2Vec2 architecture has been stabilized through several changes. This release introduces feature extractors and feature processors as the pre-processing components of multi-modal speech models; the sketch below shows how they fit together.
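A minimal sketch of the processor-based workflow: the processor bundles the new Wav2Vec2FeatureExtractor with the tokenizer. The facebook/wav2vec2-base-960h checkpoint and the placeholder waveform are assumptions for illustration.

```python
import numpy as np
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Checkpoint name is an assumption; any English CTC checkpoint should behave similarly.
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Placeholder waveform: replace with real 16 kHz mono audio.
speech = np.random.randn(16_000).astype(np.float32) * 0.01
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding back to text via the processor.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
```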
- Deprecate Wav2Vec2ForMaskedLM and add Wav2Vec2ForCTC #10089 (@patrickvonplaten)
- Fix example in Wav2Vec2 documentation #10096 (@abhishekkrthakur)
- [Wav2Vec2] Remove unused config #10457 (@patrickvonplaten)
- [Wav2Vec2FeatureExtractor] smal fixes #10455 (@patil-suraj)
- [Wav2Vec2] Improve Tokenizer & Model for batched inference #10117 (@patrickvonplaten)
- [PretrainedFeatureExtractor] + Wav2Vec2FeatureExtractor, Wav2Vec2Processor, Wav2Vec2Tokenizer #10324 (@patrickvonplaten)
- [Wav2Vec2 Example Script] Typo #10547 (@patrickvonplaten)
- [Wav2Vec2] Make wav2vec2 test deterministic #10714 (@patrickvonplaten)
- [Wav2Vec2] Fix documentation inaccuracy #10694 (@MikeG112)
AMP & XLA Support for TensorFlow models
Most of the TensorFlow models are now compatible with automatic mixed precision and have XLA support.
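For reference, here is a hedged sketch of how one would typically enable both features from user code; the checkpoint name is an example, and the tf.function argument name depends on the TensorFlow version (older releases use experimental_compile instead of jit_compile).

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Enable automatic mixed precision globally via the Keras policy.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # example checkpoint
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

inputs = tokenizer("AMP and XLA compliant forward pass.", return_tensors="tf")

# Compile the forward pass with XLA.
@tf.function(jit_compile=True)
def forward(batch):
    return model(batch, training=False).logits

logits = forward(dict(inputs))
```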
- Add AMP for TF Albert #10141 (@jplu)
- Unlock XLA test for TF ConvBert #10207 (@jplu)
- Making TF BART-like models XLA and AMP compliant #10191 (@jplu)
- Making TF XLM-like models XLA and AMP compliant #10211 (@jplu)
- Make TF CTRL compliant with XLA and AMP #10209 (@jplu)
- Making TF GPT2 compliant with XLA and AMP #10230 (@jplu)
- Making TF Funnel compliant with AMP #10216 (@jplu)
- Making TF Lxmert model compliant with AMP #10257 (@jplu)
- Making TF MobileBert model compliant with AMP #10259 (@jplu)
- Making TF MPNet model compliant with XLA #10260 (@jplu)
- Making TF T5 model compliant with AMP and XLA #10262 (@jplu)
- Making TF TransfoXL model compliant with AMP #10264 (@jplu)
- Making TF OpenAI GPT model compliant with AMP and XLA #10261 (@jplu)
- Rework the AMP for TF XLNet #10274 (@jplu)
- Making TF Longformer-like models compliant with AMP #10233 (@jplu)
SageMaker Trainer for model parallelism
We are rolling out experimental support for model parallelism on SageMaker with a new SageMakerTrainer that can be used in place of the regular Trainer. This is a temporary class that will be removed in a future version; the end goal is to have Trainer support this feature out of the box.
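A hedged sketch of the intended drop-in replacement is shown below. It is only meaningful inside a SageMaker model-parallel training job, and the transformers.sagemaker import path, checkpoint, and dataset choice are assumptions based on the PRs listed after it.

```python
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
# Import path is an assumption; see the PRs below for the actual module layout.
from transformers.sagemaker import SageMakerTrainer, SageMakerTrainingArguments

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Small illustrative dataset; in practice this would be your own training data.
train_dataset = load_dataset("imdb", split="train[:1%]").map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"), batched=True
)

args = SageMakerTrainingArguments(output_dir="./output", per_device_train_batch_size=8)
trainer = SageMakerTrainer(model=model, args=args, train_dataset=train_dataset)  # same API as Trainer
trainer.train()
```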
- Add SageMakerTrainer for model paralellism #10122 (@sgugger)
- Extend trainer logging for sm #10633 (@philschmid)
- Sagemaker Model Parallel tensoboard writing fix #10403 (@mansimane)
- Multiple fixes in SageMakerTrainer #10687 (@sgugger)
- Add DistributedSamplerWithLoop #10746 (@sgugger)
General improvements and bugfixes
- Removing run_pl_glue.py from text classification docs, include run_xnli.py & run_tf_text_classification.py #10066 (@cbjuan)
- remove token_type_ids from TokenizerBertGeneration output #10070 (@sadakmed)
- [deepspeed tests] transition to new tests dir #10080 (@stas00)
- Added integration tests for Pytorch implementation of the ELECTRA model #10073 (@spatil6)
- [examples/s2s] add test set predictions #10085 (@patil-suraj)
- Logging propagation #10092 (@LysandreJik)
- Fix some edge cases in report_to and add deprecation warnings #10100 (@sgugger)
- Add head_mask and decoder_head_mask to TF LED #9988 (@stancld)
- Fix Faiss Import #10103 (@patrickvonplaten)
- [RAG] fix generate #10094 (@patil-suraj)
- Fix TFConvBertModelIntegrationTest::test_inference_masked_lm Test #10104 (@abhishekkrthakur)
- doc: update W&B related doc #10086 (@borisdayma)
- Remove speed metrics from default compute objective [WIP] #10107 (@shiva-z)
- [scheduled github CI] add deepspeed fairscale deps #10116 (@stas00)
- Line endings should be LF across repo and not CRLF #10119 (@LysandreJik)
- remove adjust_logits_during_generation method #10087 (@patil-suraj)
- [examples/run_s2s] remove task_specific_params and update rouge computation #10133 (@patil-suraj)
- [hf_api] delete deprecated methods and tests #10159 (@julien-c)
- Revert propagation #10171 (@LysandreJik)
- Conversion from slow to fast for BPE spm vocabs contained an error. #10120 (@Narsil)
- [Doc] Fix version control in internal pages #10124 (@sgugger)
- Fix v2 model loading issue #10129 (@BigBird01)
- Add new model to labels that should not stale #10187 (@LysandreJik)
- [RAG] fix tokenizer #10167 (@patil-suraj)
- fix run_seq2seq.py; porting trainer tests to it #10162 (@stas00)
- Specify dataset dtype #10195 (@LysandreJik)
- [CI] make the examples sub-group of tests run always #10196 (@stas00)
- [WIP][examples/seq2seq] move old s2s scripts to legacy #10136 (@patil-suraj)
- set tgt_lang of MBart Tokenizer for summarization #10205 (@HeroadZ)
- Fix add_token_positions in custom datasets tutorial #10217 (@joeddav)
- Factor out methods #10215 (@LysandreJik)
- [trainer] refactor place_model_on_device logic, add deepspeed #10243 (@stas00)
- Introduce warmup_ratio training argument #10229 (@tanmay17061)
- Script for distilling zero-shot classifier to more efficient student #10244 (@joeddav)
- [trainer] implement support for full fp16 in evaluation/predict #10268 (@stas00)
- [ISSUES.md] propose using google colab to reproduce problems #10270 (@stas00)
- Introduce logging_strategy training argument #10267 (@tanmay17061)
- Patch zero shot distillation script cuda issue #10284 (@joeddav)
- Add note to resize token embeddings matrix when adding new tokens to voc #10331 (@LysandreJik)
- [examples/seq2seq] defensive programming + expand/correct README #10295 (@stas00)
- [Trainer] implement gradient_accumulation_steps support in DeepSpeed integration #10310 (@stas00)
- Loading from last checkpoint functionality in Trainer.train #10334 (@tanmay17061)
- [trainer] add Trainer methods for metrics logging and saving #10266 (@stas00)
- Fix evaluation with label smoothing in Trainer #10338 (@sgugger)
- Fix broken examples/seq2seq/README.md markdown #10344 (@Wikidepia)
- [bert-base-german-cased] use model repo, not external bucket #10353 (@julien-c)
- [Trainer/Deepspeed] handle get_last_lr() before first step() #10362 (@stas00)
- ConvBERT fix torch <> tf weights conversion #10314 (@abhishekkrthakur)
- fix deprecated reference tokenizer.max_len in glue.py #10220 (@poedator)
- [trainer] move secondary methods into a separate file #10363 (@stas00)
- Run GA on every push even on forks #10383 (@LysandreJik)
- GA: only run model templates once #10388 (@LysandreJik)
- Bugfix: Removal of padding_idx in BartLearnedPositionalEmbedding #10200 (@mingruimingrui)
- Remove unused variable in example for Q&A #10392 (@abhishekkrthakur)
- Ignore unexpected weights from PT conversion #10397 (@LysandreJik)
- Add support for ZeRO-2/3 and ZeRO-offload in fairscale #10354 (@sgugger)
- Fix None in add_token_positions - issue #10210 #10374 (@andreabac3)
- Fix run_glue evaluation when model has a label correspondence #10401 (@sgugger)
- [ci, flax] non-existing models are unlikely to pass tests #10409 (@julien-c)
- [LED] Correct Docs #10419 (@patrickvonplaten)
- Add Ray Tune hyperparameter search integration test #10414 (@krfricke)
- Fix conda-build #10431 (@LysandreJik)
- [run_seq2seq.py] restore functionality: saving to test_generations.txt #10428 (@stas00)
- updated logging and saving metrics #10436 (@bhadreshpsavani)
- Introduce save_strategy training argument #10286 (@tanmay17061)
- Adds terms to Glossary #10443 (@darigovresearch)
- Fixes compatibility bug when using grouped beam search and constrained decoding together #10475 (@mnschmit)
- Generate can return cross-attention weights too #10493 (@Mehrad0711)
- Fix typos #10489 (@WybeKoper)
- [T5] Fix speed degradation bug t5 #10496 (@patrickvonplaten)
- feat(docs): navigate with left/right arrow keys #10481 (@ydcjeff)
- Refactor checkpoint name in BERT and MobileBERT #10424 (@sgugger)
- remap MODEL_FOR_QUESTION_ANSWERING_MAPPING classes to names auto-generated file #10487 (@stas00)
- Fix the bug in constructing the all_hidden_states of DeBERTa v2 #10466 (@felixgwu)
- Remove unsupported methods from ModelOutput doc #10505 (@sgugger)
- Not always consider a local model a checkpoint in run_glue #10517 (@sgugger)
- Removes overwrites for output_dir #10521 (@philschmid)
- [ProphetNet] Bart-like Refactor #10501 (@patrickvonplaten)
- Fix example of custom Trainer to reflect signature of compute_loss #10537 (@lewtun)
- Fix torch 1.8.0 segmentation fault #10546 (@LysandreJik)
- Typo correction. #10531 (@cliang1453)
- Stale Bot #10509 (@LysandreJik)
- Refactoring checkpoint names for multiple models #10527 (@danielpatrickhug)
- fix tf doc bug #10570 (@Sniper970119)
- Fix typo in docstring for pipeline #10591 (@silvershine157)
- wrong model used for BART Summarization example #10582 (@orena1)
- [M2M100] fix positional embeddings #10590 (@patil-suraj)
- Enable torch 1.8.0 on GPU CI #10593 (@LysandreJik)
- tokenization_marian.py: use current_spm for decoding #10357 (@Mehrad0711)
- Added max_sample_ arguments #10551 (@bhadreshpsavani)
- [examples tests on multigpu] resolving require_torch_non_multi_gpu_but_fix_me #10561 (@stas00)
- Check layer types for Optimizer construction #10598 (@sgugger)
- Speedup tf tests #10601 (@LysandreJik)
- [docs] How to solve "Title level inconsistent" sphinx error #10600 (@stas00)
- [FeatureExtractorSavingUtils] Refactor PretrainedFeatureExtractor #10594 (@patrickvonplaten)
- fix flaky m2m100 test #10604 (@patil-suraj)
- [examples template] added max_sample args and metrics changes #10602 (@bhadreshpsavani)
- Fixes an issue in text-classification where MNLI eval/test datasets are not being preprocessed. #10621 (@allenwang28)
- [M2M100] remove final_logits_bias #10606 (@patil-suraj)
- Copy tokenizer files in each of their repo #10624 (@sgugger)
- Document Trainer limitation on custom models #10635 (@sgugger)
- Fix Longformer tokenizer filename #10653 (@LysandreJik)
- Update README.md #10647 (@Arvid-pku)
- Ensure metric results are JSON-serializable #10632 (@sgugger)
- S2S + M2M100 should be available in tokenization_auto #10657 (@LysandreJik)
- Remove special treatment for custom vocab files #10637 (@sgugger)
- [S2T] fix example in docs #10667 (@patil-suraj)
- W2v2 test require torch #10665 (@LysandreJik)
- Fix Marian/TFMarian tokenization tests #10661 (@LysandreJik)
- Fixes Pegasus tokenization tests #10671 (@LysandreJik)
- Onnx fix test #10663 (@mfuntowicz)
- Specify minimum version for sacrebleu #10662 (@LysandreJik)
- Add DeBERTa to MODEL_FOR_PRETRAINING_MAPPING #10668 (@jeswan)
- Fix broken link #10656 (@WybeKoper)
- fix typing error for HfArgumentParser for Optional[bool] #10672 (@bfineran)
- MT5 integration test: adjust loss difference #10669 (@LysandreJik)
- TensorFlow tests: having from_pt set to True requires torch to be installed. #10664 (@LysandreJik)
- Add auto_wrap option in fairscale integration #10673 (@sgugger)
- fix: #10628 expanduser path in TrainingArguments #10660 (@PaulLerner)
- Pass encoder outputs into GenerationMixin #10599 (@ymfa)
- [wip] [deepspeed] AdamW is now supported by default #9624 (@stas00)
- [Tests] RAG #10679 (@patrickvonplaten)
- enable loading Mbart50Tokenizer with AutoTokenizer #10690 (@patil-suraj)
- GPT2DoubleHeadsModel made parallelizable #10658 (@ishalyminov)
- split seq2seq script into summarization & translation #10611 (@theo-m)
- Adding required flags to non-default arguments in hf_argparser #10688 (@Craigacp)
- Fix backward compatibility with EvaluationStrategy #10718 (@sgugger)
- Tests run on Docker #10681 (@LysandreJik)
- Rename zero-shot pipeline multi_class argument #10727 (@joeddav)
- independent training / eval with local files #10710 (@riklopfer)
- Flax testing should not run the full torch test suite #10725 (@patrickvonplaten)