huggingface/transformers v4.16.0 on GitHub

New models

Nyströmformer

The Nyströmformer model was proposed in Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, and Vikas Singh.

The Nyströmformer model overcomes the quadratic complexity of self-attention on the input sequence length by adapting the Nyström method to approximate standard self-attention, enabling longer sequences with thousands of tokens as input.
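To make the idea concrete, here is a minimal NumPy sketch of a Nyström-style attention approximation. It assumes segment-mean landmarks and uses an exact pseudo-inverse for clarity; the function names and sizes are illustrative and this is not the library's implementation.

```python
import numpy as np


def softmax(x):
    """Row-wise softmax with max-subtraction for numerical stability."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def exact_attention(q, k, v):
    """Standard softmax attention; builds the full n x n score matrix."""
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v


def nystrom_attention(q, k, v, num_landmarks):
    """Nystrom-style approximation with segment-mean landmarks (sketch)."""
    n, d = q.shape
    if n % num_landmarks != 0:
        raise ValueError("sequence length must be divisible by num_landmarks")
    # Landmarks: average queries and keys over contiguous segments.
    q_land = q.reshape(num_landmarks, n // num_landmarks, d).mean(axis=1)
    k_land = k.reshape(num_landmarks, n // num_landmarks, d).mean(axis=1)
    scale = np.sqrt(d)
    f = softmax(q @ k_land.T / scale)       # shape (n, m)
    a = softmax(q_land @ k_land.T / scale)  # shape (m, m)
    b = softmax(q_land @ k.T / scale)       # shape (m, n)
    # Exact pseudo-inverse is an assumption of this sketch, chosen for clarity.
    return f @ np.linalg.pinv(a) @ (b @ v)


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 16)) for _ in range(3))
approx = nystrom_attention(q, k, v, num_landmarks=8)
exact = exact_attention(q, k, v)
print("mean abs error:", np.abs(approx - exact).mean())
```

With m landmarks, no n x n matrix is ever formed, so the cost grows with n * m rather than n * n. When the number of landmarks equals the sequence length, the approximation reduces to exact attention.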

Add Nystromformer by @novice03 in #14659

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=nystromformer

REALM

The REALM model was proposed in REALM: Retrieval-Augmented Language Model Pre-Training by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.

It is a retrieval-augmented language model that first retrieves documents from a textual knowledge corpus and then uses the retrieved documents to answer questions.
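The retrieve-then-read flow can be sketched in a few lines of plain Python. The corpus, the hand-made three-dimensional embeddings, and the `retrieve` helper below are hypothetical and only illustrate ranking documents by inner product; a real system would use learned embeddings and a reader model on top of the retrieved text.

```python
def inner_product(a, b):
    """Relevance score between a query embedding and a document embedding."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_vec, doc_vecs, top_k):
    """Return indices of the top_k documents, highest inner product first."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: inner_product(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical toy corpus with hand-made embeddings (illustration only).
corpus = [
    "Paris is the capital of France.",
    "The Nile flows through Egypt.",
    "Mount Everest is in the Himalayas.",
]
doc_vecs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.9]]
query_vec = [1.0, 0.0, 0.1]  # stands in for an embedded question about France

top = retrieve(query_vec, doc_vecs, top_k=2)
retrieved = [corpus[i] for i in top]
# A reader model would now answer the question from `retrieved`.
print(retrieved)
```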

Add REALM by @qqaatw in #13292
Add FastTokenizer to REALM by @qqaatw in #15211

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=realm

ViTMAE

The ViTMAE model was proposed in Masked Autoencoders Are Scalable Vision Learners by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.

The paper shows that, by pre-training a Vision Transformer (ViT) to reconstruct pixel values for masked patches, one can get results after fine-tuning that outperform supervised pre-training.
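The masking step can be illustrated with a short sketch that picks which patches stay visible. The 0.75 mask ratio and the 16-pixel patch size are assumptions chosen for illustration, and `random_patch_mask` is a hypothetical helper, not a function from the library.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Split patch indices into a visible set and a masked set at random."""
    rng = random.Random(seed)
    num_keep = int(num_patches * (1 - mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    kept = sorted(order[:num_keep])     # patches the encoder would see
    masked = sorted(order[num_keep:])   # patches whose pixels get reconstructed
    return kept, masked


# A 224x224 image cut into 16x16 patches gives 14 * 14 = 196 patches.
kept, masked = random_patch_mask(num_patches=196, mask_ratio=0.75, seed=0)
print(len(kept), "visible patches,", len(masked), "masked patches")
```

Only the visible patches need to go through the encoder, which is part of what makes this style of pre-training cheap relative to processing every patch.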

Add MAE by @NielsRogge in #15120

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=vit_mae

ViLT

The ViLT model was proposed in ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision by Wonjae Kim, Bokyung Son, Ildoo Kim.

ViLT incorporates text embeddings into a Vision Transformer (ViT), allowing it to have a minimal design for Vision-and-Language Pre-training (VLP).

Add ViLT by @NielsRogge in #14895

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=vilt

Swin Transformer

The Swin Transformer was proposed in Swin Transformer: Hierarchical Vision Transformer using Shifted Windows by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.

The Swin Transformer serves as a general-purpose backbone for computer vision. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection. This hierarchical architecture has the flexibility to model at various scales and has linear computational complexity with respect to image size.
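The linear-complexity claim can be checked with a back-of-the-envelope cost model. The sketch below compares global self-attention with window-restricted attention using the usual multiply-add estimates; the channel width of 96 and window size of 7 are illustrative assumptions, and both function names are hypothetical.

```python
def global_attention_cost(h, w, channels):
    """Approximate cost of global self-attention over all h * w tokens."""
    tokens = h * w
    return 4 * tokens * channels**2 + 2 * tokens**2 * channels


def window_attention_cost(h, w, channels, window):
    """Approximate cost when attention stays inside window x window blocks."""
    tokens = h * w
    return 4 * tokens * channels**2 + 2 * window**2 * tokens * channels


for side in (56, 112, 224):
    g = global_attention_cost(side, side, 96)
    l = window_attention_cost(side, side, 96, 7)
    print(f"{side}x{side} tokens: global={g:.3e} windowed={l:.3e} ratio={g / l:.1f}")
```

Doubling the height and width multiplies the windowed cost by exactly four, the same factor as the token count, while the global cost grows faster because of its quadratic term.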

Add Swin Transformer by @novice03 in #15085

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=swin

YOSO

The YOSO model was proposed in You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.

YOSO approximates standard softmax self-attention via a Bernoulli sampling scheme based on Locality Sensitive Hashing (LSH). In principle, all the Bernoulli random variables can be sampled with a single hash.
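The following sketch illustrates the sampling idea under one assumption: that the hash is a sign pattern against random hyperplanes, so two vectors collide with a probability that depends only on the angle between them. One hash draw yields a 0/1 collision sample for every query-key pair at once. All names are hypothetical and this is not the library's kernel.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def collision_probability(q, k, num_planes):
    """Chance that q and k get the same code under random hyperplane hashing."""
    cos = dot(q, k) / math.sqrt(dot(q, q) * dot(k, k))
    angle = math.acos(max(-1.0, min(1.0, cos)))
    return (1.0 - angle / math.pi) ** num_planes


def hash_code(vec, planes):
    """Sign pattern of vec against each hyperplane normal."""
    return tuple(dot(p, vec) >= 0 for p in planes)


def sample_collisions(queries, keys, num_planes, rng):
    """Draw ONE hash and return a 0/1 collision matrix for all pairs."""
    dim = len(queries[0])
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_planes)]
    q_codes = [hash_code(q, planes) for q in queries]
    k_codes = [hash_code(k, planes) for k in keys]
    return [[int(qc == kc) for kc in k_codes] for qc in q_codes]


rng = random.Random(0)
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [1.0, 1.0]]
trials = 5000
totals = [[0, 0], [0, 0]]
for _ in range(trials):
    sample = sample_collisions(queries, keys, num_planes=2, rng=rng)
    for i in range(2):
        for j in range(2):
            totals[i][j] += sample[i][j]

estimate = [[t / trials for t in row] for row in totals]
expected = [[collision_probability(q, k, 2) for k in keys] for q in queries]
print("estimated:", estimate)
print("expected: ", expected)
```

Averaging the collision samples over repeated draws recovers the angle-based weights, which is what lets a hash stand in for the full pairwise score matrix.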

Add YOSO by @novice03 in #15091

Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=yoso

Add model like

To help contributors add new models more easily to Transformers, there is a new command that clones an existing model and sets up the various hooks in the library, so that you only have to write the tweaks needed in the modeling file. Just run transformers-cli add-new-model-like and fill in the questionnaire!

Add model like by @sgugger in #14992

Training scripts

New training scripts were introduced: one for speech seq2seq models and one for image pre-training with the ViTMAE models.
An image captioning example in Flax was also added to the library.

Add Speech Seq2Seq Training script by @patrickvonplaten in #14792
[ViTMAE] Add image pretraining script by @NielsRogge in #15242
Add Flax image captioning example by @ydshieh in #14864

Pipelines

The automatic-speech-recognition (ASR) pipeline now supports long audio files, as well as audio models with a language model (LM), which lowers the word error rate (WER) on many tasks. See the blogpost.
Arguments and framework support also continue to become more consistent across all pipelines.
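Handling long files comes down to splitting the input into overlapping chunks and discarding the overlap when the results are stitched back together. The sketch below shows that bookkeeping on a plain list standing in for audio samples; `split_with_stride` and `merge_kept` are hypothetical helpers and the sizes are arbitrary, so treat this as an illustration of the idea rather than the pipeline's code.

```python
def split_with_stride(samples, chunk_len, stride_left, stride_right):
    """Cut samples into overlapping chunks; record how much overlap to drop."""
    step = chunk_len - stride_left - stride_right
    if step <= 0:
        raise ValueError("chunk_len must exceed the combined strides")
    chunks = []
    start = 0
    n = len(samples)
    while True:
        end = min(start + chunk_len, n)
        is_last = end == n
        left = 0 if start == 0 else stride_left  # first chunk has no left context
        right = 0 if is_last else stride_right   # last chunk has no right context
        chunks.append((samples[start:end], left, right))
        if is_last:
            break
        start += step
    return chunks


def merge_kept(chunks):
    """Drop the overlapping context from each chunk and concatenate the rest."""
    merged = []
    for chunk, left, right in chunks:
        merged.extend(chunk[left:len(chunk) - right])
    return merged


audio = list(range(25))  # stand-in for raw audio samples
chunks = split_with_stride(audio, chunk_len=10, stride_left=2, stride_right=2)
merged = merge_kept(chunks)
print(len(chunks), "chunks; round trip ok:", merged == audio)
```

The overlap gives the model context on both sides of every chunk boundary, so predictions near the edges are made with neighbouring audio in view before the edges are trimmed away.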

Large audio chunking for the existing ASR pipeline by @anton-l in #14896
Enabling TF on image-classification pipeline. by @Narsil in #15030
Pipeline ASR with LM. by @Narsil in #15071
ChunkPipeline: batch_size enabled on zero-cls and qa pipelines. by @Narsil in #14225

PyTorch improvements

The ELECTRA model can now be used as a decoder, enabling an ELECTRA encoder-decoder model.

Add ElectraForCausalLM -> Enable Electra encoder-decoder model by @stancld in #14729

TensorFlow improvements

Keras metric callback by @Rocketknight1 and @merveenoyan in #14867

The vision encoder decoder model can now be used in TensorFlow.

Add TFVisionEncoderDecoderModel by @ydshieh in #14148

CLIP gets ported to TensorFlow.

Add TFCLIPModel by @ydshieh in #13967

Flax improvements

RoFormer gets ported to Flax.

Add Flax RoFormer by @stancld in #15005

Deprecations

Deprecates AdamW and adds --optim by @manuelciosici in #14744

Documentation

The documentation has been fully migrated to Markdown. If you are making a contribution, make sure to read the updated guide on how to write good docstrings.

Convert rst files by @sgugger in #14888
Doc styler v2 by @sgugger in #14950
Convert last rst file by @sgugger in #14952
Doc styler examples by @sgugger in #14953
[doc] consistent True/False/None default format by @stas00 in #14951
[doc] :obj: hunt by @stas00 in #14954
[doc] :class: hunt by @stas00 in #14955

Bugfixes and improvements

Fix installation instructions for BART ONNX example by @lewtun in #14885
Fix doc examples: ... takes no keyword arguments by @ydshieh in #14701
Fix AttributeError from PreTrainedTokenizerFast.decoder by @aphedges in #14691
Add 'with torch.no_grad()' to ALBERT integration test forward pass by @henholm in #14808
Add ONNX support for MarianMT models by @lewtun in #14586
add custom stopping criteria to human eval script by @lvwerra in #14897
Set run_name in MLflowCallback by @YangDong2002 in #14894
[AutoTokenizer] Fix incorrect from pretrained by @patrickvonplaten in #14900
[Tests] Update speech diarization and WavLM tolerances by @anton-l in #14902
[doc] post-porting by @stas00 in #14890
[Generate] Remove attention_mask and integrate model_main_input_name by @patrickvonplaten in #14856
Fix failing GPU trainer tests by @sgugger in #14903
Better logic for getting tokenizer config in AutoTokenizer by @sgugger in #14906
[doc] install - add link to jax installation by @stas00 in #14912
[WavLM] fix wavlm docs by @patrickvonplaten in #14910
Fix Perceiver docs by @Sanster in #14917
fix to issue #14833 in data_collator - consider no labels by @kleinay in #14930
Fix duplicate call to save_checkpoint when using deepspeed by @MihaiBalint in #14946
[WavLM] give model more precision tolerance in tests by @patrickvonplaten in #14958
[Speech Recognition Examples] Update README.md by @patrickvonplaten in #14965
[Tests] Speed up tokenizer tests by @patrickvonplaten in #14964
[Wav2Vec2] Rename model's feature extractor to feature encoder by @patrickvonplaten in #14959
Replace assertion with exception by @jaketae in #14970
remove absl workaround as it's no longer needed by @stas00 in #14909
Fixing a pathological case for slow tokenizers by @Narsil in #14981
[AutoProcessor] Correct AutoProcessor and automatically add processor… by @patrickvonplaten in #14881
[Generate] correct encoder_outputs are passed without attention_mask by @patrickvonplaten in #14980
Adding num_return_sequences support for text2text generation. by @Narsil in #14988
Enabling tokenizers upgrade. by @Narsil in #14941
Allow training to resume even if RNG states are not properly loaded by @sgugger in #14994
Map model_type and doc pages names by @sgugger in #14944
Fixing t2t pipelines lists outputs. by @Narsil in #15008
Improve truncation_side by @Narsil in #14947
Fix doc examples: name 'torch' is not defined by @ydshieh in #15016
[Tests] Correct Wav2Vec2 & WavLM tests by @patrickvonplaten in #15015
[doc] Update parallelism.mdx by @hyunwoongko in #15013
Fix Code block speech pretraining example by @flozi00 in #14983
Fix a little typo by @milyiyo in #15002
Hotfix chunk_length_s instead of _ms. by @Narsil in #15029
[doc] Update parallelism.mdx by @hyunwoongko in #15018
[megatron convert] PYTHONPATH requirements by @stas00 in #14956
Fix doc example: mask_time_indices (numpy) has no attribute 'to' by @ydshieh in #15033
Adding QoL for batch_size arg (like others enabled everywhere). by @Narsil in #15027
[CLIP] Fix PT test by @patrickvonplaten in #15041
[SpeechEncoderDecoder] Fix from pretrained by @patrickvonplaten in #15043
[CLIP] Fix TF test by @patil-suraj in #15042
Wrap Roberta integration test forward passes with torch.no_grad() by @mattchurgin in #15037
Add Detectron2 to Github actions by @NielsRogge in #15053
Remove old asserts. by @Narsil in #15012
Add 'with torch.no_grad()' to BertGeneration integration test forward passes by @itsTurner in #14963
Update run_speech_recognition_seq2seq.py (max_eval_samples instead of train_samples) by @flozi00 in #14967
[VisionTextDualEncoder] Fix doc example by @ydshieh in #15057
Resubmit changes after rebase to master by @kct22aws in #14982
[Fix doc examples] missing from_pretrained by @ydshieh in #15044
[VisionTextDualEncoder] Add token_type_ids param by @ydshieh in #15073
Fix convert for newer megatron-lm bert model by @yoquankara in #14082
[Wav2Vec2 Speech Event] Add speech event v2 by @patrickvonplaten in #15083
fix model table cell text alignment by @ydshieh in #14999
Update check_repo.py by @kamalkraj in #15014
Make OpenAIGPTTokenizer work with SpaCy 2.x and 3.x by @cody-moveworks in #15019
Change assignee for tokenizers by @LysandreJik in #15088
support the trocr small models by @liminghao1630 in #14893
[Fix doc example] RagModel by @ydshieh in #15076
Model summary doc page horizontal banners by @mishig25 in #15058
Use tqdm.auto in Pipeline docs by @bryant1410 in #14920
[doc] normalize HF Transformers string by @stas00 in #15023
Happy New Year! by @sgugger in #15094
[DOC] fix doc examples for bart-like models by @patil-suraj in #15093
[performance doc] Power and Cooling by @stas00 in #14935
Add test to check reported training loss by @sgugger in #15096
Take gradient accumulation into account when defining samplers by @sgugger in #15095
[Fix doc example] Speech2TextForConditionalGeneration by @ydshieh in #15092
Fix cookiecutter by @NielsRogge in #15100
[Wav2Vec2ProcessorWithLM] improve decoder download by @patrickvonplaten in #15040
Adds IBERT to models exportable with ONNX by @MaximovaIrina in #14868
change metric_key_prefix in seq2seq_trainer.py by @JejuWayfarer in #15099
Print out durations of all scheduled tests by @LysandreJik in #15102
Fix failing W2V2 test by @LysandreJik in #15104
Doc styler tip by @sgugger in #15105
Update ONNX docs by @lewtun in #14904
Fix saving FlaubertTokenizer configs by @vmaryasin in #14991
Update TF test_step to match train_step by @Rocketknight1 in #15111
use block_size instead of max_seq_length in tf run_clm example by @riklopfer in #15036
fix: switch from slow to generic tokenizer class by @lvwerra in #15122
Fix TFEncoderDecoder labels handling #14357 by @ydshieh in #15001
Add ONNX configuration classes to docs by @lewtun in #15121
Add with torch.no_grad() to DistilBERT integration test forward pass by @jaketae in #14979
mBART support for run_summarization.py by @banda-larga in #15125
doc-builder -> doc-build by @LysandreJik in #15134
[Fix doc example] - ProphetNetDecoder by @ydshieh in #15124
[examples/flax/language-modeling] set loglevel by @stas00 in #15129
Update model_sharing.mdx by @carlos-aguayo in #15142
Enable AMP for xla:gpu device in trainer class by @ymwangg in #15022
[deepspeed tests] fix summarization by @stas00 in #15149
Check the repo consistency in model templates test by @sgugger in #15141
Add TF glu activation function by @gante in #15146
Make sure all submodules are properly registered by @sgugger in #15144
[Fix doc example] - OpenAIGPTDoubleHeadsModel by @ydshieh in #15143
fix BertTokenizerFast tokenize_chinese_chars arg by @SaulLu in #15158
Fix typo in test_configuration_common.py by @novice03 in #15160
Add "open in hf spaces" gradio button issue #73 by @AK391 in #15106
TF Bert inference - support np.ndarray optional arguments by @gante in #15074
Fixing flaky test (hopefully). by @Narsil in #15154
Better dummies by @sgugger in #15148
Update from keras2onnx to tf2onnx by @gante in #15162
[doc] performance: Efficient Software Prebuilds by @stas00 in #15147
[Speech models] Disable non-existing chunking in tests by @patrickvonplaten in #15163
Added forward pass of test_inference_image_classification_head by @MrinalTyagi in #14777
Fix dtype issue in TF BART by @Rocketknight1 in #15178
[doc] new MoE paper by @stas00 in #15184
Mark bad tokenizers version by @sgugger in #15188
[Fix doc example] UniSpeechSatForPreTraining by @ydshieh in #15152
is_ctc needs to be updated to `self.type == "ctc"`. by @Narsil in #15194
[Fix doc example] TFRagModel by @ydshieh in #15187
Error when code examples are improperly closed by @sgugger in #15186
Fix deprecation warnings for int div by @sgugger in #15180
Copies and docstring styling by @sgugger in #15202
[ASR pipeline] correct with lm pipeline by @patrickvonplaten in #15200
Remove dependency to quiet Dependabot by @sgugger in #15205
Ignore empty subfolders when identifying submodules by @sgugger in #15204
[MBartTokenizer] remove dep on xlm-roberta tokenizer by @patil-suraj in #15201
fix: #14486 do not use BertPooler in DPR by @PaulLerner in #15068
[Fix doc example] Wrong checkpoint name by @ydshieh in #15079
[Robust Speech Event] Add guides by @patrickvonplaten in #15155
Enable tqdm toggling by @jaketae in #15167
[FLAX] glue training example refactor by @kamalkraj in #13815
Rename compute_loss in TF models by @Rocketknight1 in #15207
Build dev documentation by @LysandreJik in #15210
[Fix doc example] TFFunnelTokenizer' is not defined by @ydshieh in #15225
Correct Speech Event Readme by @patrickvonplaten in #15226
[ViTMAE] Various fixes by @NielsRogge in #15221
[Speech Event] Fix speech event readme by @patil-suraj in #15227
Fix typo in BERT tokenization file by @qqaatw in #15228
Fix PR number by @LysandreJik in #15231
Adapt Common Voice Talk Title and Abstract by @patrickvonplaten in #15233
Update Trainer code example by @NielsRogge in #15070
Make chuking smartly (long files) work on asr ctc_with_lm. by @Narsil in #15219
Fix usage of additional kwargs in from_encoder_decoder_pretrained in encoder-decoder models by @jsnfly in #15056
Update README.md by @anton-l in #15239
Update README.md by @anton-l in #15246
Update pipelines.mdx by @kamalkraj in #15243
[Fix doc example] missing import by @ydshieh in #15240
Fixes tf_default_data_collator sometimes guessing the wrong dtype for labels by @Rocketknight1 in #15234
Make sure to raise NotImplementedError with correct method name by @kumapo in #15253
Fix crash when logs are empty because Keras has wiped them out of spite by @Rocketknight1 in #15258
Tentative workflow improvement by @LysandreJik in #15255
Fix code examples by @NielsRogge in #15257
Adds missing module_specs for usages of _LazyModule by @jkuball in #15230
Prepare ONNX export for torch v1.11 by @lewtun in #15270
Fix by @novice03 in #15276
Move BART + ONNX example to research_projects by @lewtun in #15271
Specify providers explicitly in ORT session initialization by @wangyems in #15235
Fixes Benchmark example link by @evandrosks in #15278
[Robust Speech Challenge] Add timeline by @patrickvonplaten in #15274
[Fix doc example] TFLayoutLMForTokenClassification: missing import tf by @ydshieh in #15268
[Wav2Vec2ProcessorWithLM] improve multi processing by @patrickvonplaten in #15247
Refine errors for pretrained objects by @sgugger in #15261
[PyTorch-nightly-test] Fix Wav2Vec2 LM & Phoneme tests by @patrickvonplaten in #15272
Update eval.py by @patrickvonplaten in #15310
Update CONTRIBUTING.md by @kamalkraj in #15290
Fix a typo in tag addition by @sgugger in #15286
Remove old debug code leftover. by @Narsil in #15306
[Fix doc example] fix missing import jnp by @ydshieh in #15291
[LayoutLMV2 Tests] Make sure input is on GPU by @patrickvonplaten in #15314
Replace NystromformerTokenizer with AutoTokenizer by @novice03 in #15312
[Beam Search] Correct returned beam scores by @patrickvonplaten in #14654
[Examples] Correct run ner label2id for fine-tuned models by @patrickvonplaten in #15017
Avoid using get_list_of_files by @sgugger in #15287
[Tests] Fix test by @NielsRogge in #15324
Add 🤗 Accelerate tutorial by @stevhliu in #15263
Added missing code in exemplary notebook - custom datasets fine-tuning by @Pawloch247 in #15300
Fix encoder-decoder models when labels is passed by @ydshieh in #15172
Fix table formatting in SegFormer docs by @deppen8 in #15337
Fix deepspeed docs by @ngoquanghuy99 in #15346
Fix 'eval_split_name' described as defaulting to 'train' by @FremyCompany in #15348
Update doc writing guide by @sgugger in #15350
Add YOSO by @novice03 in #15091
[docs] post-PR merge fix by @stas00 in #15355
Fix YosoConfig doc by @sgugger in #15353
[DocTests Speech] Add doc tests for all speech models by @patrickvonplaten in #15031
Push to hub save by @sgugger in #15327
Fix KerasMetricCallback prediction with generate() and inference of column names by @Rocketknight1 in #15351
Add a device argument to the eval script by @anton-l in #15371
improve saving strategy of sentencepiece tokenizer by @SaulLu in #15328
Implement fixes for TrainingArguments doc by @sgugger in #15370
Super-small fix stops us confusing Keras console logging by modifying… by @Rocketknight1 in #15373
Add proper documentation for Keras callbacks by @sgugger in #15374
Example script for PushToHubCallback by @Rocketknight1 in #15375

Impressive community contributors

The community contributors below have significantly contributed to the v4.16.0 release. Thank you!

@novice03, for contributing Nyströmformer, Swin Transformer and YOSO
@qqaatw, for contributing REALM
@stancld, for adding support for ELECTRA as a decoder, and porting RoFormer to Flax
@ydshieh, for a myriad of documentation fixes, the port of CLIP to TensorFlow, the addition of the TensorFlow vision encoder-decoder model, and the contribution of an image captioning example in Flax.

New Contributors

@YangDong2002 made their first contribution in #14894
@Sanster made their first contribution in #14917
@kleinay made their first contribution in #14930
@MihaiBalint made their first contribution in #14946
@milyiyo made their first contribution in #15002
@mattchurgin made their first contribution in #15037
@itsTurner made their first contribution in #14963
@kct22aws made their first contribution in #14982
@yoquankara made their first contribution in #14082
@cody-moveworks made their first contribution in #15019
@MaximovaIrina made their first contribution in #14868
@JejuWayfarer made their first contribution in #15099
@novice03 made their first contribution in #14659
@banda-larga made their first contribution in #15125
@manuelciosici made their first contribution in #14744
@carlos-aguayo made their first contribution in #15142
@gante made their first contribution in #15146
@AK391 made their first contribution in #15106
@MrinalTyagi made their first contribution in #14777
@jsnfly made their first contribution in #15056
@jkuball made their first contribution in #15230
@wangyems made their first contribution in #15235
@evandrosks made their first contribution in #15278
@Pawloch247 made their first contribution in #15300
@deppen8 made their first contribution in #15337
@ngoquanghuy99 made their first contribution in #15346

Full Changelog: v4.15.0...v4.16.0
