ProphetNet, Blenderbot, SqueezeBERT, DeBERTa
ProphetNet
Two new models are released as part of the ProphetNet implementation: ProphetNet and XLM-ProphetNet.
ProphetNet is an encoder-decoder model that can predict n future tokens for “ngram” language modeling instead of just the next token.
XLM-ProphetNet is an encoder-decoder model with an identical architecture to ProphetNet, but the model was trained on the multilingual “wiki100” Wikipedia dump.
The ProphetNet model was proposed in ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, and Ming Zhou on 13 Jan 2020.
It was added to the library in PyTorch with the following checkpoints:
microsoft/xprophetnet-large-wiki100-cased-xglue-ntg
microsoft/prophetnet-large-uncased
microsoft/prophetnet-large-uncased-cnndm
microsoft/xprophetnet-large-wiki100-cased
microsoft/xprophetnet-large-wiki100-cased-xglue-qg
Contributions:
- ProphetNet #7157 (@qiweizhen, @patrickvonplaten)
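A minimal summarization sketch with the `microsoft/prophetnet-large-uncased-cnndm` checkpoint listed above (the article text is a placeholder):

```python
import torch
from transformers import ProphetNetForConditionalGeneration, ProphetNetTokenizer

tokenizer = ProphetNetTokenizer.from_pretrained("microsoft/prophetnet-large-uncased-cnndm")
model = ProphetNetForConditionalGeneration.from_pretrained("microsoft/prophetnet-large-uncased-cnndm")

article = "..."  # any news article to summarize (placeholder)
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=512)

# Standard beam-search decoding; the n-gram prediction objective is only used during pre-training.
with torch.no_grad():
    summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=100, early_stopping=True)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])
```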
BlenderBot
Blenderbot is an encoder-decoder model for open-domain chat. It uses a standard transformer-based seq2seq architecture.
The Blender chatbot model was proposed in Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, and Jason Weston on 30 Apr 2020.
It was added to the library in PyTorch with the following checkpoints:
facebook/blenderbot-90M
facebook/blenderbot-3B
Contributions:
- Blenderbot #7418 (@sshleifer)
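A minimal chat sketch with the `facebook/blenderbot-3B` checkpoint (the utterance is a placeholder, and the 3B model needs a sizeable amount of memory):

```python
from transformers import BlenderbotForConditionalGeneration, BlenderbotTokenizer

tokenizer = BlenderbotTokenizer.from_pretrained("facebook/blenderbot-3B")
model = BlenderbotForConditionalGeneration.from_pretrained("facebook/blenderbot-3B")

utterance = "My friends are cool but they eat too many carbs."  # placeholder user message
inputs = tokenizer([utterance], return_tensors="pt")

# Generate the bot's reply with the model's default generation settings.
reply_ids = model.generate(**inputs)
print(tokenizer.batch_decode(reply_ids, skip_special_tokens=True)[0])
```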
SqueezeBERT
The SqueezeBERT model was proposed in SqueezeBERT: What can computer vision teach NLP about efficient neural networks? by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, Kurt W. Keutzer. It’s a bidirectional transformer similar to the BERT model. The key difference between the BERT architecture and the SqueezeBERT architecture is that SqueezeBERT uses grouped convolutions instead of fully-connected layers for the Q, K, V and FFN layers.
It was added to the library in PyTorch with the following checkpoints:
squeezebert/squeezebert-mnli
squeezebert/squeezebert-uncased
squeezebert/squeezebert-mnli-headless
Contributions:
- SqueezeBERT architecture #7083 (@forresti)
- Fix squeezebert docs #7587 (@LysandreJik)
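A brief sketch of the MNLI checkpoint in use (the premise/hypothesis pair is a made-up example; label names are whatever the checkpoint's config defines):

```python
import torch
from transformers import SqueezeBertForSequenceClassification, SqueezeBertTokenizer

tokenizer = SqueezeBertTokenizer.from_pretrained("squeezebert/squeezebert-mnli")
model = SqueezeBertForSequenceClassification.from_pretrained("squeezebert/squeezebert-mnli")

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."
inputs = tokenizer(premise, hypothesis, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, return_dict=True)

# Predicted NLI class id and its name as defined by the checkpoint's config.
predicted_class = outputs.logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_class])
```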
DeBERTa
The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. It is based on Google’s BERT model released in 2018 and Facebook’s RoBERTa model released in 2019.
It was added to the library in PyTorch with the following checkpoints:
microsoft/deberta-base
microsoft/deberta-large
Contributions:
- Add DeBERTa model #5929 (@BigBird01)
- Fix DeBERTa integration tests #7729 (@LysandreJik)
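A minimal feature-extraction sketch with the base checkpoint (the input sentence is arbitrary):

```python
import torch
from transformers import DebertaModel, DebertaTokenizer

tokenizer = DebertaTokenizer.from_pretrained("microsoft/deberta-base")
model = DebertaModel.from_pretrained("microsoft/deberta-base")

inputs = tokenizer("DeBERTa uses disentangled attention over content and position.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, return_dict=True)

# Contextual token embeddings of shape (batch_size, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)
```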
Both SentencePiece and Tokenizers are now optional libraries
Support for SentencePiece is now part of the `tokenizers` library! Thanks to this we now have near-full support of fast tokenizers in the library.
With this new feature, we slightly change the paradigm regarding installation:
- SentencePiece is now an optional dependency, paving the way to a fully-featured conda install in the near future.
- Tokenizers is now also an optional dependency, making it possible to install and use the library even when Rust cannot be compiled on the machine.
- [Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies #7659 (@thomwolf)
The main `__init__` has been improved to always import the same functions and classes. If someone then tries to use a class that requires an optional dependency, an `ImportError` will be raised at init (with instructions on how to install the missing dependency) #7537 (@sgugger)
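For example, with sentencepiece not installed, a SentencePiece-based tokenizer class can still be imported, but using it raises an informative error (`T5Tokenizer` is just one such class, used here for illustration):

```python
# Assumes the optional sentencepiece dependency is NOT installed.
from transformers import T5Tokenizer  # the import itself succeeds

try:
    tokenizer = T5Tokenizer.from_pretrained("t5-small")
except ImportError as err:
    # The error message explains how to install the missing dependency.
    print(err)
```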
Improvements made to the Trainer
The `Trainer` API has been improved to work with models requiring several labels or returning several outputs, and to have clearer progress tracking. A new `TrainerCallback` class has been added to allow the user to easily customize the default training loop.
- Remove config assumption in Trainer #7464 (@sgugger)
- Clean the Trainer state #7490 (@sgugger)
- Small QOL improvements to TrainingArguments #7475 (@sgugger)
- Allow nested tensors in predicted logits #7542 (@sgugger)
- Trainer callbacks #7596 (@sgugger)
- Add specific notebook ProgressCalback #7793 (@sgugger)
- Small fixes to NotebookProgressCallback #7813 (@sgugger)
- Add predict step accumulation #7767 (@sgugger)
- Don't use `store_xxx` on optional bools #7786 (@sgugger)
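As a rough sketch of the new callback API (the callback class below is illustrative, not something shipped in the release):

```python
from transformers import TrainerCallback

class EpochLoggerCallback(TrainerCallback):
    """Toy callback that reports progress at the end of every epoch."""

    def on_epoch_end(self, args, state, control, **kwargs):
        print(f"Epoch {state.epoch:.0f} done after {state.global_step} optimization steps")
        return control

# Passed to the Trainer alongside the usual arguments, e.g.:
# trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset,
#                   callbacks=[EpochLoggerCallback()])
```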
Seq2Seq Trainer
A child of `Trainer` specialized for training seq2seq models, from @patil-suraj and @sshleifer. Accessible through `examples/seq2seq/finetune_trainer.py`.
- example scripts at `examples/seq2seq/builtin_trainer/`
- same functionality as `examples/seq2seq/finetune.py`, but better TPU support.
- [examples/s2s] clean up finetune_trainer #7509 (@patil-suraj)
- [s2s] trainer scripts: Remove --run_name, thanks sylvain! #7521 (@sshleifer)
- [s2s] Adafactor support for builtin trainer #7522 (@sshleifer)
- [s2s] add config params like Dropout in Seq2SeqTrainingArguments #7532 (@patil-suraj)
- Distributed Trainer: 2 little fixes #7461 (@sshleifer)
- [s2sTrainer] test + code cleanup #7467 (@sshleifer)
- Seq2SeqDataset: avoid passing src_lang everywhere #7470 (@amanpreet692)
- [s2strainer] fix eval dataset loading #7477 (@patil-suraj)
- [pseudolabels] cleanup markdown table #7653 (@sshleifer)
Distributed Generation
- You can run `model.generate` in PyTorch on a large dataset and split the work across multiple GPUs, using `examples/seq2seq/run_distributed_eval.py`
- [s2s] release pseudolabel links and instructions #7639 (@sshleifer)
- [s2s] Fix t5 warning for distributed eval #7487 (@sshleifer)
- [s2s] fix kwargs style #7488 (@sshleifer)
- [s2s] fix lockfile and peg distillation constants #7545 (@sshleifer)
- [s2s] fix nltk pytest race condition with FileLock #7515 (@sshleifer)
Notebooks
- Train T5 in TensorFlow 2 Community Notebook #7428 (@HarrisDePerceptron)
General improvements and bugfixes
- remove codecov PR comments #7400 (@sshleifer)
- Get a better error when check_copies fails #7457 (@sgugger)
- Multi-GPU Testing setup #7453 (@LysandreJik)
- Fix LXMERT with DataParallel #7471 (@LysandreJik)
- Number of GPUs for multi-gpu #7472 (@LysandreJik)
- Make transformers install check positive #7473 (@FremyCompany)
- Alphabetize model lists #7478 (@sgugger)
- Bump isort version. #7484 (@sgugger)
- Add forgotten return_dict argument in the docs #7483 (@sgugger)
- Enable pegasus fp16 by clamping large activations #7243 (@sshleifer)
- Update LayoutLM doc #7388 (@al31415)
- Report Tune metrics in final evaluation #7507 (@krfricke)
- Fix Ray Tune progress_reporter kwarg #7508 (@krfricke)
- [Seq2Seq] Fix a couple of bugs and clean examples #7474 (@patrickvonplaten)
- [Attention Mask] Fix data type #7513 (@patrickvonplaten)
- Fix seq2seq example test #7518 (@sgugger)
- Remove labels from the RagModel example #7560 (@sgugger)
- added script for fine-tuning roberta for sentiment analysis task #7505 (@DhavalTaunk08)
- LayoutLM: add exception handling for bbox values #7452 (@al31415)
- Cleanup documentation for BART, Marian, MBART and Pegasus #7523 (@sgugger)
- Add Electra unexpected keys #7569 (@LysandreJik)
- Fix tokenization in SQuAD for RoBERTa, Longformer, BART #7387 (@tholor)
- docs(pretrained_models): fix num parameters #7575 (@amineabdaoui)
- Update Code example according to deprecation of AutoModeWithLMHead #7555 (@jshamg)
- Allow soft dependencies in the namespace with ImportErrors at use #7537 (@sgugger)
- Fix post_init of some TrainingArguments #7525 (@sgugger)
- Check and update model list in index.rst automatically #7527 (@sgugger)
- Expand test to locate flakiness #7580 (@sgugger)
- Custom TF weights loading #7422 (@jplu)
- Documentation fixes #7585 (@sgugger)
- Documentation framework toggle should stick #7586 (@LysandreJik)
- Support T5 Distillation w/hidden state supervision #7599 (@sshleifer)
- [makefile] check only .py files #7588 (@stas00)
- [TF generation] Fix typo #7582 (@SidJain1412)
- change return dicitonary for DataCollatorForNextSentencePrediction from masked_lm_labels to labels #7595 (@gmihaila)
- Docker GPU Images: Add NVIDIA/apex to the cuda images with pytorch #7598 (@AdrienDS)
- typo fix #7611 (@agemagician)
- [bart] fix config.classif_dropout #7593 (@sshleifer)
- [s2s] save first batch to json for debugging purposes #6810 (@sshleifer)
- Add GPT2ForSequenceClassification based on DialogRPT #7501 (@LysandreJik)
- Fix wrong reference name/filename in docstring of `SquadProcessor` #7616 (@phiyodr)
- Fix tokenizer UnboundLocalError when padding is set to PaddingStrategy.MAX_LENGTH #7610 (@GabrielePicco)
- Add GPT2 to sequence classification auto model #7630 (@LysandreJik)
- Replaced torch.load for loading the pretrained vocab of TransformerXL tokenizer to pickle.load #6935 (@w4nderlust)
- Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer #7141 (@thomwolf)
- Green tests: update torch-hub test dependencies (add protobuf and pin tokenizer 0.9.0-RC2) #7658 (@thomwolf)
- Fix RobertaForCausalLM docs #7642 (@LysandreJik)
- [s2s] configure lr_scheduler from command line #7641 (@patil-suraj)
- [pseudo] Switch URLS to CDN #7661 (@sshleifer)
- [s2s] Switch README urls to cdn #7670 (@sshleifer)
- fix nn.DataParallel compatibility with PyTorch 1.5 #7671 (@guhur)
- Update XLM-RoBERTa pretrained model details #7669 (@noahtren)
- Fix dataset cardinality #7678 (@jplu)
- [pegasus] Faster tokenizer tests #7672 (@stas00)
- Delete extra test file in repo root #7681 (@sshleifer)
- Better links for models in README and doc index #7680 (@sgugger)
- Import integration libraries first #7650 (@dsblank)
- Fix title level in Blenderbot doc #7687 (@sgugger)
- Fix flaky test in test_trainer #7689 (@sgugger)
- Adds license information for default and distilbert models #7688 (@ankane)
- Fix docstring in AutoModel class #7694 (@al31415)
- [examples] bump pl=0.9.0 #7053 (@sshleifer)
- Corrected typo: maked → masked #7703 (@MiggyMigz)
- fixed typo in warning line 207. #7718 (@Berowne)
- Fix typo in all model docs #7714 (@sgugger)
- Fix check for xla in PreTrainedModel.save_pretrained() #7699 (@fteufel)
- Minor spelling corrections in docstrings. "information" is uncountable in English and has no plural. #7696 (@AndreaSottana)
- The input training data files (multiple files in glob format). #7717 (@kfkelvinng)
- Fix trainer callback #7720 (@cccntu)
- Fix tf text class #7724 (@jplu)
- Fix #7731 #7732 (@LysandreJik)
- Fix 3 failing slow bart/blender tests #7652 (@sshleifer)
- Add license info to nlptown/bert-base-multilingual-uncased-sentiment #7738 (@alexcombessie)
- [marian] Automate Tatoeba-Challenge conversion #7709 (@sshleifer)
- ElectraTokenizerFast #7754 (@LysandreJik)
- Gpt1 for sequence classification #7683 (@fmcurti)
- [Rag] Fix loading of pretrained Rag Tokenizer #7756 (@patrickvonplaten)
- Do not softmax when num_labels==1 #7726 (@LysandreJik)
- Avoid unnecessary DDP synchronization when gradient_accumulation_steps > 1 #7742 (@noamwies)
- fixed lots of typos. #7758 (@NieTiger)
- Adding optional trial argument to model_init #7759 (@madlag)
- Faster pegasus tokenization test with reduced data size #7762 (@sshleifer)
- Fix bert position ids in DPR convert script #7776 (@lhoestq)
- Add batch inferencing support for GPT2LMHeadModel #7552 (@cccntu)
- fix examples/rag imports, tests #7712 (@sshleifer)
- Fix TF savedmodel in Roberta #7795 (@jplu)
- Improving Pipelines by defaulting to framework='tf' when pytorch seems unavailable. #7728 (@Narsil)
- Upgrading in pipelines TFAutoModelWithLMHead to new Causal/Masked/Seq2Seq LM classes #7730 (@Narsil)
- fix wandb/comet problems #7830 (@stas00)
- [utils/check_copies.py] fix DeprecationWarning #7834 (@stas00)
- [DOC] Typo and fix the input of labels to `cross_entropy` #7841 (@katarinaslama)
- [seq2seq] get_git_info fails gracefully #7843 (@stas00)
- [Pipelines] Fix links to model lists #7826 (@julien-c)
- Herbert polish model #7798 (@rmroczkowski)
- [cleanup] assign todos, faster bart-cnn test #7835 (@sshleifer)
- Remove masked_lm_labels from returned dictionary in DataCollatorForNextSentencePrediction #7818 (@vblagoje)
- [testing] fix/hide warnings #7837 (@stas00)
- Small fixes to HP search #7839 (@sgugger)
- [testing] disable FutureWarning in examples tests #7842 (@stas00)
- Fix missing reference titles in retrieval evaluation of RAG #7817 (@lhoestq)
- [seq2seq testing] improve readability #7845 (@stas00)
- [s2s testing] turn all to unittests, use auto-delete temp dirs #7859 (@stas00)
- Fix Rag example docstring #7872 (@patrickvonplaten)
- Remove duplicated mish activation function #7856 (@Razcle)
- [tests] fix slow bart cnn test, faster marian tests #7888 (@sshleifer)
- Fix small type hinting error #7820 (@AndreaSottana)
- Add support to provide initial tokens to decoder of encoder-decoder type models #7577 (@ayushtiku5)
- style: fix typo #7883 (@rememberYou)
- [testing] remove USE_CUDA #7861 (@stas00)
- [CIs] report slow tests add --durations=0 to some pytest jobs #7884 (@stas00)
- style: fix typo in the README #7882 (@rememberYou)
- [RAG] Propagating of n_docs as parameter to all RagModel's related functions #7891 (@lalitpagaria)
- Trainer with Iterable Dataset #7858 (@j-rossi-nl)
- Allow Custom Dataset in RAG Retriever #7763 (@lhoestq)
- Modelling Encoder-Decoder | Error :- `decoder_config` used before intialisation #7903 (@ayubSubhaniya)
- [Docstring] fix t5 training docstring #7911 (@patrickvonplaten)
- Raise error when using AMP on non-CUDA device #7869 (@BramVanroy)
- [EncoderDecoder] Fix Typo #7915 (@patrickvonplaten)
- [testing] rename skip targets + docs #7863 (@stas00)