v4.1.1: TAPAS, MPNet, model parallelization, Sharded DDP, conda, multi-part downloads.

TAPAS (@NielsRogge)

Four new models are released as part of the TAPAS implementation: TapasModel, TapasForQuestionAnswering, TapasForMaskedLM and TapasForSequenceClassification, in PyTorch.

TAPAS is a question answering model used to answer queries over a table. It is a multi-modal model, jointly encoding the text of the query and the tabular data.

The TAPAS model was proposed in TAPAS: Weakly Supervised Table Parsing via Pre-training by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
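A core idea in TAPAS is that the table is flattened into a token sequence in which every cell token carries its row and column index, so the model can attend over the table's structure. A minimal sketch of that flattening, using a hypothetical `flatten_table` helper (an illustration of the idea, not the library's implementation):

```python
# Hypothetical sketch: flatten a table into (token, row, column) triples,
# with the query tokens tagged (0, 0), mirroring TAPAS's use of row/column
# index embeddings. Not the actual TapasTokenizer implementation.

def flatten_table(query, table):
    """Join query tokens and cell tokens; tag each cell token with its
    (row, column) position so a model can attend over the structure."""
    tokens = [(tok, 0, 0) for tok in query.split()]  # row 0 / col 0 = query
    for r, row in enumerate(table, start=1):
        for c, cell in enumerate(row, start=1):
            for tok in str(cell).split():
                tokens.append((tok, r, c))
    return tokens

table = [["Player", "Goals"], ["Messi", "30"], ["Ronaldo", "28"]]
seq = flatten_table("who scored most goals", table)
```

In the library itself, this preprocessing is handled by the TAPAS tokenizer, which accepts the query and the table together.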

MPNet (@StillKeepTry)

Six new models are released as part of the MPNet implementation: MPNetModel, MPNetForMaskedLM, MPNetForSequenceClassification, MPNetForMultipleChoice, MPNetForTokenClassification, MPNetForQuestionAnswering, in both PyTorch and TensorFlow.

MPNet introduces a novel self-supervised objective named masked and permuted language modeling for language understanding. It inherits the advantages of both masked language modeling (MLM) and permuted language modeling (PLM), addresses their respective limitations, and further reduces the inconsistency between the pre-training and fine-tuning paradigms.

The MPNet model was proposed in MPNet: Masked and Permuted Pre-training for Language Understanding by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.

  • MPNet: Masked and Permuted Pre-training for Language Understanding #8971 (@StillKeepTry)
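To make the objective concrete: the sequence is permuted, the leading part of the permutation serves as context, and the tail tokens are predicted from mask placeholders while the position index of every token remains visible (unlike MLM, the order is permuted; unlike PLM, full position information is available). A minimal sketch with a hypothetical `mpnet_example` helper, for illustration only and not MPNet's actual preprocessing:

```python
import random

# Hypothetical sketch of masked-and-permuted language modeling:
# permute positions, keep the head of the permutation as context, and
# replace the tail tokens with [MASK] while keeping their position ids.
def mpnet_example(tokens, num_predicted, seed=0):
    rng = random.Random(seed)
    order = list(range(len(tokens)))
    rng.shuffle(order)
    context, predicted = order[:-num_predicted], order[-num_predicted:]
    # inputs carry (token, original_position); masked slots keep positions
    inputs = [(tokens[i], i) for i in context] + [("[MASK]", i) for i in predicted]
    targets = [(tokens[i], i) for i in predicted]
    return inputs, targets

inputs, targets = mpnet_example(["the", "cat", "sat", "down"], num_predicted=2)
```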

Model parallel (@alexorona)

Model parallelism is introduced, allowing users to load very large models on two or more GPUs by spreading the model layers across them, making GPU training possible even for models too large to fit on a single device.
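The layer-spreading can be described by a device map that assigns contiguous blocks of layers to devices, so each GPU holds only part of the model. A minimal sketch with a hypothetical `make_device_map` helper (an illustration of the partitioning idea, not the library's code):

```python
# Hypothetical sketch: split `num_layers` transformer layers into
# contiguous chunks, one chunk per device, so each GPU only holds
# (and runs) its own slice of the model.
def make_device_map(num_layers, devices):
    per_device = -(-num_layers // len(devices))  # ceiling division
    return {
        dev: list(range(i * per_device, min((i + 1) * per_device, num_layers)))
        for i, dev in enumerate(devices)
    }

device_map = make_device_map(12, [0, 1])  # 12 layers over 2 GPUs
```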

Conda release (@LysandreJik)

Transformers welcomes its first conda releases, with v4.0.0, v4.0.1 and v4.1.0. The conda packages are now officially maintained on the huggingface channel.

Multi-part uploads (@julien-c)

For the first time, very large models can be uploaded to the model hub, by using multi-part uploads.
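The idea behind multi-part transfer is to split a large payload into fixed-size chunks that can be sent (and retried) independently, then reassembled on the other side. A minimal sketch of the chunking step, with a hypothetical `split_parts` helper (illustration only, not the hub client's implementation):

```python
# Hypothetical sketch: split a payload into fixed-size parts that can be
# transferred and retried independently, then joined back together.
def split_parts(data: bytes, part_size: int):
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]

parts = split_parts(b"x" * 10, part_size=4)  # three parts: 4 + 4 + 2 bytes
```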

New examples and reorganization (@sgugger)

We introduced a refactored SQuAD example and notebook, which are faster and simpler than the previous scripts.

The example directory has been reorganized, introducing a separation between "examples", which are maintained examples showcasing how to do one specific task, and "research projects", which are bigger projects maintained by the community.

Introduction of fairscale with Sharded DDP (@sgugger)

We introduce support for fairscale's ShardedDDP in the Trainer, allowing reduced memory usage when training models in a distributed fashion.
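The memory saving comes from sharding: instead of every rank keeping optimizer state for all parameters, each rank owns the state for its own shard only. A minimal sketch of the partitioning, with a hypothetical `shard_params` helper (an illustration of the idea, not fairscale's implementation):

```python
# Hypothetical sketch: round-robin assignment of parameters to ranks,
# so each rank stores optimizer state for only ~1/world_size of them.
def shard_params(param_ids, world_size):
    shards = [[] for _ in range(world_size)]
    for i, p in enumerate(param_ids):
        shards[i % world_size].append(p)
    return shards

shards = shard_params(list(range(8)), world_size=4)  # 2 params per rank
```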

Barthez (@moussaKam)

The BARThez model is a French variant of the BART model. We welcome its specific tokenizer to the library and multiple checkpoints to the model hub.

General improvements and bugfixes
