v4.5.0: BigBird, GPT Neo, Examples, Flax support
BigBird (@vasudevgupta7)
Seven new models are released as part of the BigBird implementation: BigBirdModel, BigBirdForPreTraining, BigBirdForMaskedLM, BigBirdForCausalLM, BigBirdForSequenceClassification, BigBirdForMultipleChoice, and BigBirdForQuestionAnswering, in PyTorch.
BigBird is a sparse-attention based transformer which extends Transformer-based models, such as BERT, to much longer sequences. In addition to sparse attention, BigBird also applies global attention as well as random attention to the input sequence.
The BigBird model was proposed in Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
It is released with an accompanying blog post: Understanding BigBird's Block Sparse Attention
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=big_bird
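A minimal usage sketch, assuming the google/bigbird-roberta-base checkpoint from the Hub filter above; the attention_type setting is the config switch between the sparse and full attention variants:

```python
from transformers import BigBirdModel, BigBirdTokenizer

# Assumed checkpoint name; any of the big_bird checkpoints on the Hub should load the same way.
tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")
model = BigBirdModel.from_pretrained(
    "google/bigbird-roberta-base",
    attention_type="block_sparse",  # sparse attention; "original_full" uses standard full attention
)

inputs = tokenizer("BigBird scales transformers to much longer sequences.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```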
- BigBird #10183 (@vasudevgupta7)
- [BigBird] Fix big bird gpu test #10967 (@patrickvonplaten)
- [Notebook] add BigBird trivia qa notebook #10995 (@patrickvonplaten)
- [Docs] Add blog to BigBird docs #10997 (@patrickvonplaten)
GPT Neo (@patil-suraj)
Two new models are released as part of the GPT Neo implementation: GPTNeoModel and GPTNeoForCausalLM, in PyTorch.
GPT-Neo is the code name for a family of transformer-based language models loosely styled around the GPT architecture. EleutherAI's primary goal is to replicate a GPT-3 DaVinci-sized model and open-source it to the public.
The implementation within Transformers is a GPT2-like causal language model trained on the Pile dataset.
Compatible checkpoints can be found on the Hub: https://huggingface.co/models?filter=gpt_neo
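A minimal generation sketch, assuming the EleutherAI/gpt-neo-1.3B checkpoint from the Hub and reusing the GPT2 tokenizer as noted above:

```python
from transformers import GPT2Tokenizer, GPTNeoForCausalLM

# Assumed checkpoint name; see the Hub filter above for the available GPT Neo checkpoints.
tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")

input_ids = tokenizer("GPT-Neo was trained on the Pile,", return_tensors="pt").input_ids
generated = model.generate(input_ids, max_length=40, do_sample=True, temperature=0.9)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```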
- GPT Neo #10848 (@patil-suraj)
- GPT Neo few fixes #10968 (@patil-suraj)
- GPT Neo configuration needs to be set to use GPT2 tokenizer #10992 (@LysandreJik)
- [GPT Neo] fix example in config #10993 (@patil-suraj)
- GPT Neo cleanup #10985 (@patil-suraj)
Examples
Features have been added to some examples, and additional examples have been added.
Raw training loop examples
Based on the accelerate library, examples that completely expose the training loop are now part of the library, making them easy to customize if you want to try a new research idea.
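The sketch below condenses the pattern these scripts follow; the toy dataset and checkpoint are illustrative, not taken from the scripts themselves:

```python
import torch
from torch.utils.data import DataLoader
from accelerate import Accelerator
from transformers import AdamW, AutoModelForSequenceClassification, AutoTokenizer

# Toy data standing in for the tokenized dataset the real scripts build.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
texts = ["a great movie", "a terrible movie"]
enc = tokenizer(texts, padding=True, return_tensors="pt")
dataset = [
    {"input_ids": enc["input_ids"][i], "attention_mask": enc["attention_mask"][i], "labels": torch.tensor(i % 2)}
    for i in range(len(texts))
]
train_dataloader = DataLoader(dataset, batch_size=2)

accelerator = Accelerator()
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)
optimizer = AdamW(model.parameters(), lr=5e-5)
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)

model.train()
for epoch in range(3):
    for batch in train_dataloader:
        loss = model(**batch).loss
        accelerator.backward(loss)  # replaces loss.backward() to handle distributed / mixed precision
        optimizer.step()
        optimizer.zero_grad()
```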
- Expand a bit the presentation of examples #10799 (@sgugger)
- Add examples/multiple-choice/run_swag_no_trainer.py #10934 (@stancld)
- Update the example template for a no Trainer option #10865 (@sgugger)
- Add examples/run_ner_no_trainer.py #10902 (@stancld)
- Add examples/language_modeling/run_mlm_no_trainer.py #11001 (@hemildesai)
- Add examples/language_modeling/run_clm_no_trainer.py #11026 (@hemildesai)
Standardize examples with Trainer
Thanks to the amazing contributions of @bhadreshpsavani, all examples with Trainer are now standardized: they all support the predict stage and return/save metrics in the same fashion.
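A short sketch of what the standardized predict stage looks like; the trainer and test_dataset objects are assumed to be set up beforehand, as in the example scripts:

```python
# `trainer` and `test_dataset` are assumed to be built beforehand, as in the example scripts.
predictions = trainer.predict(test_dataset, metric_key_prefix="test")

trainer.log_metrics("test", predictions.metrics)   # print metrics in the shared format
trainer.save_metrics("test", predictions.metrics)  # save them to a test_results.json file
```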
- [Example] Updating Question Answering examples for Predict Stage #10792 (@bhadreshpsavani)
- [Examples] Added predict stage and Updated Example Template #10868 (@bhadreshpsavani)
- [Example] Fixed finename for Saving null_odds in the evaluation stage in QA Examples #10939 (@bhadreshpsavani)
- [trainer] Fixes Typo in Predict Method of Trainer #10861 (@bhadreshpsavani)
Trainer & SageMaker Model Parallelism
The Trainer now supports SageMaker model parallelism out of the box; the old SageMakerTrainer is deprecated as a consequence and will be removed in version 5.
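On the SageMaker side the training script itself is unchanged; model parallelism is requested through the estimator's distribution configuration. A sketch, assuming the sagemaker Python SDK's HuggingFace estimator, with illustrative parallelism parameters and version pins:

```python
from sagemaker.huggingface import HuggingFace

# All values below are illustrative; adjust entry_point, role, versions, and parallelism
# parameters to your own setup.
smp_options = {"enabled": True, "parameters": {"partitions": 2, "microbatches": 4, "ddp": True}}
mpi_options = {"enabled": True, "processes_per_host": 8}

huggingface_estimator = HuggingFace(
    entry_point="run_glue.py",
    source_dir="./examples/text-classification",
    instance_type="ml.p3.16xlarge",
    instance_count=1,
    role="<your-sagemaker-execution-role>",
    transformers_version="4.5.0",
    pytorch_version="1.7.1",
    py_version="py36",
    distribution={"smdistributed": {"modelparallel": smp_options}, "mpi": mpi_options},
    hyperparameters={"model_name_or_path": "bert-large-uncased", "output_dir": "/opt/ml/model"},
)
huggingface_estimator.fit()
```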
- Merge trainers #10975 (@sgugger)
- added new notebook and merge of trainer #11015 (@philschmid)
FLAX
FLAX support has been widened to cover all model heads of the BERT architecture, alongside a general conversion script to load PyTorch checkpoints in FLAX.
Auto models now have a FLAX implementation.
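A minimal sketch of both additions, assuming a PyTorch-only BERT checkpoint; the from_pt flag is used here to convert the PyTorch weights when loading:

```python
from transformers import FlaxAutoModel, FlaxBertForMaskedLM

# Load a PyTorch checkpoint directly into a Flax model; the checkpoint name is just an example.
flax_mlm = FlaxBertForMaskedLM.from_pretrained("bert-base-cased", from_pt=True)

# The new Flax auto class picks the architecture from the config.
flax_model = FlaxAutoModel.from_pretrained("bert-base-cased", from_pt=True)
```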
- [Flax] Add general conversion script #10809 (@patrickvonplaten)
- [Flax] Add other BERT classes #10977 (@patrickvonplaten)
- Refactor AutoModel classes and add Flax Auto classes #11027 (@sgugger)
General improvements and bugfixes
- Patches the full import failure and adds a test #10750 (@LysandreJik)
- Patches full import failure when sentencepiece is not installed #10752 (@LysandreJik)
- [Deepspeed] Allow HF optimizer and scheduler to be passed to deepspeed #10464 (@cli99)
- Fix ProphetNet Flaky Test #10771 (@patrickvonplaten)
- [doc] [testing] extend the pytest -k section with more examples #10761 (@stas00)
- Wav2Vec2 - fix flaky test #10773 (@patrickvonplaten)
- [DeepSpeed] simplify init #10762 (@stas00)
- [DeepSpeed] improve checkpoint loading code plus tests #10760 (@stas00)
- [trainer] make failure to find a resume checkpoint fatal + tests #10777 (@stas00)
- [Issue template] need to update/extend who to tag #10728 (@stas00)
- [examples] document resuming #10776 (@stas00)
- Check copies blackify #10775 (@sgugger)
- Smmp batch not divisible by microbatches fix #10778 (@mansimane)
- Add support for detecting intel-tensorflow version #10781 (@mfuntowicz)
- wav2vec2: support datasets other than LibriSpeech #10581 (@elgeish)
- add run_common_voice script #10767 (@patil-suraj)
- Fix bug in input check for LengthGroupSampler #10783 (@thominj)
- [file_utils] do not gobble certain kinds of requests.ConnectionError #10235 (@julien-c)
- from_pretrained: check that the pretrained model is for the right model architecture #10586 (@vimarshc)
- [examples/seq2seq/README.md] fix t5 examples #10734 (@stas00)
- Fix distributed evaluation #10795 (@sgugger)
- Add XLSR-Wav2Vec2 Fine-Tuning README.md #10786 (@patrickvonplaten)
- addressing vulnerability report in research project deps #10802 (@stas00)
- fix backend tokenizer args override: key mismatch #10686 (@theo-m)
- [XLSR-Wav2Vec2 Info doc] Add a couple of lines #10806 (@patrickvonplaten)
- Add transformers id to hub requests #10811 (@philschmid)
- wav2vec doc tweaks #10808 (@julien-c)
- Sort init import #10801 (@sgugger)
- [wav2vec sprint doc] add doc for Local machine #10828 (@patil-suraj)
- Add new community notebook - wav2vec2 with GPT #10794 (@voidful)
- [Wav2Vec2] Small improvements for wav2vec2 info script #10829 (@patrickvonplaten)
- [Wav2Vec2] Small tab fix #10846 (@patrickvonplaten)
- Fix: typo in FINE_TUNE_XLSR_WAV2VEC2.md #10849 (@qqhann)
- Bump jinja2 from 2.11.2 to 2.11.3 in /examples/research_projects/lxmert #10818 (@dependabot[bot])
- [vulnerability] in example deps fix #10817 (@stas00)
- Correct AutoConfig call docstrings #10822 (@Sebelino)
- [makefile] autogenerate target #10814 (@stas00)
- Fix on_step_begin and on_step_end Callback Sequencing #10839 (@siddk)
- feat(wandb): logging and configuration improvements #10826 (@borisdayma)
- Modify the Trainer class to handle simultaneous execution of Ray Tune and Weights & Biases #10823 (@ruanchaves)
- Use DataCollatorForSeq2Seq in run_summarization in all cases #10856 (@elsanns)
- [Generate] Add save mode logits processor to remove nans and infs if necessary #10769 (@patrickvonplaten)
- Make convert_to_onnx runable as script again #10857 (@sgugger)
- [trainer] fix nan in full-fp16 label_smoothing eval #10815 (@stas00)
- Fix p_mask cls token masking in question-answering pipeline #10863 (@mmaslankowska-neurosys)
- Amazon SageMaker Documentation #10867 (@philschmid)
- [file_utils] import refactor #10859 (@stas00)
- Fixed confusing order of args in generate() docstring #10862 (@RafaelWO)
- Sm trainer smp init fix #10870 (@philschmid)
- Fix test_trainer_distributed #10875 (@sgugger)
- Add new notebook links in the docs #10876 (@sgugger)
- error type of tokenizer in init definition #10879 (@ZhengZixiang)
- [Community notebooks] Add notebook for fine-tuning Bart with Trainer in two langs #10883 (@elsanns)
- Fix overflowing bad word ids #10889 (@LysandreJik)
- Remove version warning in pretrained BART models #10890 (@sgugger)
- Update Training Arguments Documentation: ignore_skip_data -> ignore_data_skip #10891 (@siddk)
- run_glue_no_trainer: datasets -> raw_datasets #10898 (@jethrokuan)
- updates sagemaker documentation #10899 (@philschmid)
- Fix comment in modeling_t5.py #10886 (@lexhuismans)
- Rename NLP library to Datasets library #10920 (@tomy0000000)
- [vulnerability] fix dependency #10914 (@stas00)
- Add ImageFeatureExtractionMixin #10905 (@sgugger)
- Return global attentions (see #7514) #10906 (@gui11aume)
- Updated colab links in readme of examples #10932 (@WybeKoper)
- Fix initializing BertJapaneseTokenizer with AutoTokenizers #10936 (@singletongue)
- Instantiate model only once in pipeline #10888 (@sgugger)
- Use pre-computed lengths, if available, when grouping by length #10953 (@pcuenca)
- [trainer metrics] fix cpu mem metrics; reformat runtime metric #10937 (@stas00)
- [vulnerability] dep fix #10954 (@stas00)
- Fixes in the templates #10951 (@sgugger)
- Sagemaker test #10925 (@philschmid)
- Fix summarization notebook link #10959 (@philschmid)
- improved sagemaker documentation for git_config and examples #10966 (@philschmid)
- Fixed a bug where the pipeline.framework would actually contain a fully qualified model #10970 (@Narsil)
- added py7zr #10971 (@philschmid)
- fix md file to avoid evaluation crash #10962 (@ydshieh)
- Fixed some typos and removed legacy url #10989 (@WybeKoper)
- Sagemaker test fix #10987 (@philschmid)
- Fix the checkpoint for I-BERT #10994 (@LysandreJik)
- Add more metadata to the user agent #10972 (@sgugger)
- Enforce string-formatting with f-strings #10980 (@sgugger)
- In the group by length documentation length is misspelled as legnth #11000 (@JohnnyC08)
- Fix Adafactor documentation (recommend correct settings) #10526 (@jsrozner)
- Improve the speed of adding tokens from added_tokens.json #10780 (@cchen-dialpad)
- Add Vision Transformer and ViTFeatureExtractor #10950 (@NielsRogge)
- DebertaTokenizer Rework closes #10258 #10703 (@cronoik)
- [doc] no more bucket #10793 (@julien-c)
- Layout lm tf 2 #10636 (@atahmasb)
- fixed typo: logging instead of logger #11025 (@versis)
- Add a script to check inits are consistent #11024 (@sgugger)
- fix incorrect case for s|Pretrained|PreTrained| #11048 (@stas00)
- [doc] fix code-block rendering #11053 (@erensahin)
- Pin docutils #11062 (@LysandreJik)
- Remove unnecessary space #11060 (@LysandreJik)
- Some models have no tokenizers #11064 (@LysandreJik)
- Documentation about loading a fast tokenizer within Transformers #11029 (@LysandreJik)
- Add example for registering callbacks with trainers #10928 (@amalad)
- Replace pkg_resources with importlib_metadata #11061 (@konstin)
- Add center_crop to ImageFeatureExtractionMixin #11066 (@sgugger)
- Document common config attributes #11070 (@sgugger)
- Fix distributed gather for tuples of tensors of varying sizes #11071 (@sgugger)
- Make a base init in FeatureExtractionMixin #11074 (@sgugger)
- Add Readme for language modeling scripts with custom training loop and accelerate #11073 (@hemildesai)
- HF emoji unicode doesn't work in console #11081 (@stas00)
- added social thumbnail for docs #11083 (@philschmid)
- added new merged Trainer test #11090 (@philschmid)