v4.7.0: DETR, RoFormer, ByT5, HuBERT, support for torch 1.9.0

DETR (@NielsRogge)

Three new models are released as part of the DETR implementation: DetrModel, DetrForObjectDetection and DetrForSegmentation, in PyTorch.

DETR consists of a convolutional backbone followed by an encoder-decoder Transformer that can be trained end-to-end for object detection. It removes much of the complexity of models like Faster R-CNN and Mask R-CNN, which rely on hand-designed components such as region proposals, non-maximum suppression, and anchor generation. Moreover, DETR extends naturally to panoptic segmentation by simply adding a mask head on top of the decoder outputs.

DETR supports any convolutional backbone from the timm library.

The DETR model was proposed in End-to-End Object Detection with Transformers by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=detr
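As a quick illustration, here is a minimal inference sketch using the facebook/detr-resnet-50 checkpoint from the Hub; the sample image URL and the exact checkpoint name are illustrative choices, not part of this release:

```python
import requests
from PIL import Image
from transformers import DetrFeatureExtractor, DetrForObjectDetection

# Load a sample image (any RGB image works; this COCO URL is illustrative)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = DetrFeatureExtractor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

# Resize and normalize the image, then run detection
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)

# Per object query: class logits and normalized (cx, cy, w, h) boxes
logits = outputs.logits      # (batch_size, num_queries, num_labels + 1)
boxes = outputs.pred_boxes   # (batch_size, num_queries, 4)
```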

ByT5 (@patrickvonplaten)

A new tokenizer is released as part of the ByT5 implementation: ByT5Tokenizer. It can be used with the T5 family of models.

The ByT5 model was presented in ByT5: Towards a token-free future with pre-trained byte-to-byte models by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?search=byt5
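A minimal sketch of how the byte-level tokenizer pairs with the existing T5 architecture, assuming the google/byt5-small checkpoint from the Hub search above:

```python
from transformers import ByT5Tokenizer, T5ForConditionalGeneration

tokenizer = ByT5Tokenizer.from_pretrained("google/byt5-small")
model = T5ForConditionalGeneration.from_pretrained("google/byt5-small")

# ByT5 tokenizes raw UTF-8 bytes, so no subword vocabulary is involved
inputs = tokenizer("Life is like a box of chocolates.", return_tensors="pt")
labels = tokenizer("La vie est comme une boîte de chocolat.", return_tensors="pt").input_ids

# Standard seq2seq forward pass, e.g. for fine-tuning
loss = model(**inputs, labels=labels).loss
```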

RoFormer (@JunnYu)

14 new models are released as part of the RoFormer implementation: seven in PyTorch (RoFormerModel, RoFormerForCausalLM, RoFormerForMaskedLM, RoFormerForSequenceClassification, RoFormerForTokenClassification, RoFormerForQuestionAnswering, and RoFormerForMultipleChoice) and seven in TensorFlow (TFRoFormerModel, TFRoFormerForCausalLM, TFRoFormerForMaskedLM, TFRoFormerForSequenceClassification, TFRoFormerForTokenClassification, TFRoFormerForQuestionAnswering, and TFRoFormerForMultipleChoice).

RoFormer is a BERT-like autoencoding model with rotary position embeddings. Rotary position embeddings have shown improved performance on classification tasks with long texts. The RoFormer model was proposed in RoFormer: Enhanced Transformer with Rotary Position Embedding by Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen, and Yunfeng Liu.

  • Add new model RoFormer (use rotary position embedding) #11684 (@JunnYu)

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=roformer
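For reference, a minimal PyTorch sketch; the junnyu/roformer_chinese_base checkpoint name is an assumption based on the Hub filter above, and the tokenizer depends on the rjieba package for Chinese pre-tokenization:

```python
import torch
from transformers import RoFormerModel, RoFormerTokenizer

# The tokenizer additionally requires `pip install rjieba`
tokenizer = RoFormerTokenizer.from_pretrained("junnyu/roformer_chinese_base")
model = RoFormerModel.from_pretrained("junnyu/roformer_chinese_base")

inputs = tokenizer("今天天气非常好。", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

last_hidden_state = outputs.last_hidden_state  # (batch, seq_len, hidden_size)
```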

HuBERT (@patrickvonplaten)

HuBERT is a speech model that accepts a float array corresponding to the raw waveform of the speech signal.

HuBERT was proposed in HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.

Two new models are released as part of the HuBERT implementation: HubertModel and HubertForCTC, in PyTorch.

Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=hubert
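A minimal CTC inference sketch; the facebook/hubert-large-ls960-ft checkpoint name and the dummy waveform are illustrative assumptions:

```python
import numpy as np
import torch
from transformers import Wav2Vec2Processor, HubertForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/hubert-large-ls960-ft")
model = HubertForCTC.from_pretrained("facebook/hubert-large-ls960-ft")

# One second of dummy 16 kHz mono audio; replace with a real waveform
waveform = np.random.randn(16_000).astype(np.float32)
inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
```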

Hugging Face Course - Part 1

On Monday, June 14th, 2021, we released the first part of the Hugging Face Course. The course focuses on the Hugging Face ecosystem, including transformers. Most of the course material is now linked from the transformers documentation, which also includes videos explaining individual concepts.

TensorFlow additions

The Wav2Vec2 model can now be used in TensorFlow, via the new TFWav2Vec2Model class (see the sketch below).
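A minimal sketch, assuming the facebook/wav2vec2-base-960h checkpoint; if a given checkpoint only ships PyTorch weights, pass from_pt=True when loading:

```python
import numpy as np
from transformers import Wav2Vec2FeatureExtractor, TFWav2Vec2Model

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
model = TFWav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")

# One second of dummy 16 kHz mono audio for illustration
waveform = np.random.randn(16_000).astype(np.float32)
inputs = feature_extractor(waveform, sampling_rate=16_000, return_tensors="tf")

outputs = model(inputs.input_values)
hidden_states = outputs.last_hidden_state  # (batch, frames, hidden_size)
```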

PyTorch 1.9 support

Notebooks

General improvements and bugfixes
