Pegasus, mBART, DPR, self-documented outputs and new pipelines
Pegasus
The Pegasus model from PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization by Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu, was added to the library in PyTorch.
Model implemented as a collaboration between Jingqing Zhang and @sshleifer in #6340
- PegasusForConditionalGeneration (torch version) #6340
- add pegasus finetuning script #6811 (warning: very slow)
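Below is a minimal summarization sketch, not an official recipe: it assumes the google/pegasus-xsum checkpoint and the prepare_seq2seq_batch tokenizer helper described later in these notes.

```python
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

# Assumption: the google/pegasus-xsum checkpoint; other Pegasus checkpoints
# follow the same pattern.
model_name = "google/pegasus-xsum"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

text = "PG&E scheduled the blackouts in response to forecasts for high winds."
batch = tokenizer.prepare_seq2seq_batch([text], return_tensors="pt")
summary_ids = model.generate(**batch)  # generation settings come from the model config
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True))
```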
DPR
The DPR model from Dense Passage Retrieval for Open-Domain Question Answering by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih was added to the library in PyTorch.
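As a rough sketch of the two-tower retrieval usage (the facebook/dpr-* checkpoint names and the dot-product scoring are assumptions based on the released models and the paper, not an official example):

```python
import torch
from transformers import (
    DPRContextEncoder,
    DPRContextEncoderTokenizer,
    DPRQuestionEncoder,
    DPRQuestionEncoderTokenizer,
)

q_name = "facebook/dpr-question_encoder-single-nq-base"  # assumed checkpoint names
ctx_name = "facebook/dpr-ctx_encoder-single-nq-base"
q_tokenizer = DPRQuestionEncoderTokenizer.from_pretrained(q_name)
q_encoder = DPRQuestionEncoder.from_pretrained(q_name)
ctx_tokenizer = DPRContextEncoderTokenizer.from_pretrained(ctx_name)
ctx_encoder = DPRContextEncoder.from_pretrained(ctx_name)

# Each encoder returns its pooled embedding as the first element of the output.
question_emb = q_encoder(**q_tokenizer("Who wrote Hamlet?", return_tensors="pt"))[0]
passage_emb = ctx_encoder(
    **ctx_tokenizer("Hamlet is a tragedy written by William Shakespeare.", return_tensors="pt")
)[0]
score = torch.matmul(question_emb, passage_emb.T)  # dot-product relevance score
```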
DeeBERT
The DeeBERT model from DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference by Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy Lin, has been added to the examples/ folder alongside its training script, in PyTorch.
Self-documented outputs
In addition to returning tuples, PyTorch and TensorFlow models can now return an appropriate subclass of ModelOutput. A ModelOutput is a dataclass containing all model returns, which allows for easier inspection and for self-documenting model outputs.
- Change model outputs types to self-document outputs #5438 (@sgugger)
- Tf model outputs #6247 (@sgugger)
Models return tuples by default, and return self-documented outputs if the return_dict configuration flag is set to True or if the return_dict=True keyword argument is passed to the forward/call method.
Summary of the behavior:
```python
from transformers import BertForSequenceClassification

# The new outputs are opt-in: you have to activate them explicitly with `return_dict=True`,
# either at instantiation
model = BertForSequenceClassification.from_pretrained('bert-base-cased', return_dict=True)
# or when calling the model (`inputs` is a dict of tensors produced by a tokenizer)
outputs = model(**inputs, return_dict=True)

# You can access the elements of the outputs with
# (1) named attributes
loss = outputs.loss
logits = outputs.logits

# (2) their names as strings, like a dict
loss = outputs["loss"]
logits = outputs["logits"]

# (3) their index as integers or slices, as in the pre-3.1.0 output tuples
loss = outputs[0]
logits = outputs[1]
loss, logits = outputs[:2]

# One **breaking behavior** of these new outputs (and the reason you have to opt in
# to use them): iterating over the outputs now returns the names (keys) instead of
# the values:
print([element for element in outputs])
>>> ['loss', 'logits']

# Thus you cannot unpack outputs as in pre-3.1.0 (you would get the string names
# instead of the values), but you can still query a slice as shown in (3) above:
loss_key, logits_key = outputs
```
Encoder-Decoder framework
The encoder-decoder framework has been enhanced to allow more encoder-decoder model combinations, e.g. Bert2Bert, Bert2GPT2, Roberta2Roberta, Longformer2Roberta, and more; a short usage sketch follows the list of related PRs.
- [EncoderDecoder] Add encoder-decoder for roberta/ vanilla longformer #6411 (@patrickvonplaten)
- [EncoderDecoder] Add Cross Attention for GPT2 #6415 (@patrickvonplaten)
- [EncoderDecoder] Add functionality to tie encoder decoder weights #6538 (@patrickvonplaten)
- Multiple combinations of EncoderDecoder models have been fine-tuned and evaluated on CNN/Daily-Mail summarization: https://huggingface.co/models?search=cnn_dailymail-fp16 (@patrickvonplaten)
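For illustration, a minimal Bert2Bert warm-start sketch (the checkpoint choice here is arbitrary):

```python
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# Initialize the encoder and the decoder from two pretrained checkpoints.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"  # encoder, decoder
)

inputs = tokenizer("A long article to condense.", return_tensors="pt")
# During training the decoder consumes the target sequence; here the source ids
# are reused only to show the forward signature.
outputs = model(input_ids=inputs["input_ids"], decoder_input_ids=inputs["input_ids"])
```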
TensorFlow as a first-class citizen
As we continue working towards having TensorFlow be a first-class citizen, we continually improve on our TensorFlow API and models.
- [Almost all TF models] TF clean up: add missing CLM / MLM loss; fix T5 naming and keras compile #5395 (@patrickvonplaten)
- [Benchmark] Add benchmarks for TF Training #5594 (@patrickvonplaten)
Machine Translation
MarianMTModel
- en-zh and 357 other checkpoints for machine translation were added from the Helsinki-NLP group's Tatoeba Project (@sshleifer + @jorgtied). There are now more than 1,300 supported pairs for machine translation; a short translation sketch follows this list.
- Marian converter updates #6342 (@sshleifer)
- Marian distill scripts + integration test #6799 (@sshleifer)
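A short translation sketch (the Helsinki-NLP/opus-mt-en-zh checkpoint name follows the group's naming scheme; treat it as an assumption):

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-zh"  # assumed en->zh checkpoint name
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

batch = tokenizer.prepare_seq2seq_batch(["How are you today?"], return_tensors="pt")
translated_ids = model.generate(**batch)
print(tokenizer.batch_decode(translated_ids, skip_special_tokens=True))
```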
mBART
The mBART model from Multilingual Denoising Pre-training for Neural Machine Translation can now be accessed through MBartForConditionalGeneration; a usage sketch follows the list of related PRs below.
- Add mbart-large-cc25, support translation finetuning #5129 (@sshleifer)
- [mbart] prepare_translation_batch passes **kwargs to allow DeprecationWarning #5581 (@sshleifer)
- MBartForConditionalGeneration #6441 (@patil-suraj)
- [fix] mbart_en_ro_generate test now identical to fairseq #5731 (@sshleifer)
- [Doc] explaining romanian postprocessing for MBART BLEU hacking #5943 (@sshleifer)
- [test] partial coverage for train_mbart_enro_cc25.sh #5976 (@sshleifer)
- MbartTokenizer: do not hardcode vocab size #5998 (@sshleifer)
- MBART: support summarization tasks where max_src_len > max_tgt_len #6003 (@sshleifer)
- Fix #6096: MBartTokenizer's mask token #6098 (@sshleifer)
- [s2s] Document better mbart finetuning command #6229 (@sshleifer)
- mBART Conversion script #6230 (@sshleifer)
- [s2s] add BartTranslationDistiller for distilling mBART #6363 (@sshleifer)
- [Doc] add more MBart and other doc #6490 (@patil-suraj)
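A minimal English-to-Romanian sketch (it assumes the facebook/mbart-large-en-ro checkpoint, whose config sets the Romanian decoder start token):

```python
from transformers import MBartForConditionalGeneration, MBartTokenizer

model_name = "facebook/mbart-large-en-ro"
tokenizer = MBartTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

batch = tokenizer.prepare_seq2seq_batch(
    ["UN Chief Says There Is No Military Solution in Syria"], return_tensors="pt"
)
translated_ids = model.generate(**batch)  # the decoder start token comes from the config
print(tokenizer.batch_decode(translated_ids, skip_special_tokens=True))
```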
examples/seq2seq
- examples/seq2seq/finetune.py supports --task translation
- All sequence-to-sequence tokenizers (T5, Bart, Marian, Pegasus) expose a prepare_seq2seq_batch method that makes batches for sequence-to-sequence training; a short sketch follows this list.
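A hedged sketch of building a training batch with the shared method, using T5; it assumes the behavior at the time of this release, where target texts come back under the labels key and the model derives decoder inputs from them:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

batch = tokenizer.prepare_seq2seq_batch(
    src_texts=["translate English to German: How are you?"],
    tgt_texts=["Wie geht es dir?"],
    return_tensors="pt",
)
outputs = model(
    input_ids=batch["input_ids"],
    attention_mask=batch["attention_mask"],
    labels=batch["labels"],  # decoder_input_ids are created from labels internally
)
loss = outputs[0]  # models still return tuples unless return_dict=True
```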
PRs:
- Seq2SeqDataset uses linecache to save memory #5792 (@Pradhy729)
- [examples/seq2seq]: add --label_smoothing option #5919 (@sshleifer)
- seq2seq/run_eval.py can take decoder_start_token_id #5949 (@sshleifer)
- [examples (seq2seq)] fix preparing decoder_input_ids for T5 #5994 (@patil-suraj)
- [s2s] add support for overriding config params #6149 (@stas00)
- s2s: fix LR logging, remove some dead code. #6205 (@sshleifer)
- [s2s] tiny QOL improvement: run_eval prints scores #6341 (@sshleifer)
- [s2s] fix label_smoothed_nll_loss #6344 (@patil-suraj)
- [s2s] fix --gpus clarg collision #6358 (@sshleifer)
- [s2s] Script to save wmt data to disk #6403 (@sshleifer)
- rename prepare_translation_batch -> prepare_seq2seq_batch #6103 (@sshleifer)
- Mult rouge by 100: standard units #6359 (@sshleifer)
- allow spaces in bash args with "$@" #6521 (@sshleifer)
- [seq2seq] MAX_LEN env var for MT commands #5837 (@sshleifer)
- [seq2seq] distillation.py accepts trainer arguments #5865 (@sshleifer)
- [s2s]Use prepare_translation_batch for Marian finetuning #6293 (@sshleifer)
- [BartTokenizer] add prepare s2s batch #6212 (@patil-suraj)
- [T5Tokenizer] add prepare_seq2seq_batch method #6122 (@patil-suraj)
- [s2s] round runtime in run_eval #6798 (@sshleifer)
- [s2s README] Add more dataset download instructions #6737 (@sshleifer)
- [s2s] round bleu, rouge to 4 digits #6704 (@sshleifer)
- [s2s] command line args for faster val steps #6833
New documentation
Several new documentation pages have been added and older documentation has been tweaked to be more accurate and understandable. An "Open in Colab" button has been added on the tutorial pages.
- Guide to fixed-length model perplexity evaluation #5449 (@joeddav)
- Improvements to PretrainedConfig documentation #5642 (@sgugger)
- Document model outputs #5673 (@sgugger)
- docs(wandb): explain how to use W&B integration #5607 (@borisdayma)
- Model utils doc #6005 (@sgugger)
- ONNX documentation #5992 (@mfuntowicz)
- Tokenizer documentation #6110 (@sgugger)
- Pipeline documentation #6175 (@sgugger)
- Encoder decoder config docs #6195 (@afcruzs)
- Colab button #6389 (@sgugger)
- Generation documentation #6470 (@sgugger)
- Add custom datasets tutorial #6466 (@joeddav)
- Logging documentation #6852 (@sgugger)
Trainer updates
New additions to the Trainer (a hyperparameter-search sketch follows the list):
- Added data collator for permutation (XLNet) language modeling and related calls #5522 (@shngt)
- Trainer support for iterabledataset #5834 (@Pradhy729)
- Adding PaddingDataCollator #6442 (@sgugger)
- Add hyperparameter search to Trainer #6576 (@sgugger)
- [examples] Add trainer support for question-answering #4829 (@patil-suraj)
- Adds comet_ml to the list of auto-experiment loggers #6176 (@dsblank)
- Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task #6644 (@HuangLianzhe)
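A hedged, self-contained sketch of the new Trainer.hyperparameter_search entry point (#6576), assuming optuna (the default backend) is installed and using a toy dataset in place of a real one:

```python
import torch
from transformers import BertForSequenceClassification, Trainer, TrainingArguments

class ToyDataset(torch.utils.data.Dataset):
    """A tiny stand-in dataset so the sketch is self-contained."""
    def __len__(self):
        return 8
    def __getitem__(self, i):
        return {"input_ids": torch.tensor([101, 2023, 102]), "labels": torch.tensor(0)}

def model_init():
    # The model is rebuilt from scratch for every trial of the search.
    return BertForSequenceClassification.from_pretrained("bert-base-cased")

trainer = Trainer(
    args=TrainingArguments(output_dir="hpo_out"),
    model_init=model_init,
    train_dataset=ToyDataset(),
    eval_dataset=ToyDataset(),
)
best_run = trainer.hyperparameter_search(n_trials=10, direction="minimize")
print(best_run.hyperparameters)
```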
New models & model architectures
The following model architectures have been added to the library:
- FlaubertForTokenClassification #5644 (@stas00)
- TFXLMForTokenClassification #5614 (@LysandreJik)
- TFXLMForMultipleChoice #5614 (@LysandreJik)
- TFFlaubertForTokenClassification #5614 (@LysandreJik)
- TFFlaubertForMultipleChoice #5614 (@LysandreJik)
- TFElectraForSequenceClassification #6227 (@jplu)
- TFElectraForMultipleChoice #6227 (@jplu)
- TF Longformer #5764 (@patrickvonplaten)
- CamembertForCausalLM #6577 (@patil-suraj)
Regression testing on TPU & TPU CI
Thanks to @zcain117 we now have access to TPU CI for the PyTorch/xla framework. This enables regression testing on the TPU aspects of the Trainer, and offers very simple regression testing on model training performance.
- Test XLA examples #5583
- Add setup for TPU CI to run every hour. #6219 (@zcain117)
- Add missing docker arg for TPU CI. #6393 (@zcain117)
- Get GKE logs via kubectl logs instead of gcloud logging read. #6446 (@zcain117)
New pipelines
New pipelines have been added (a zero-shot usage sketch follows the list):
- Zero shot classification pipeline #5760 (@joeddav)
- Addition of a DialoguePipeline #5516 (@guillaume-be)
- Add targets arg to fill-mask pipeline #6239 (@joeddav)
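A quick zero-shot sketch; the default model for the task is downloaded automatically, and the output is a dict with labels and scores sorted by score:

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
result = classifier(
    "The new Transformers release adds Pegasus, mBART and DPR.",
    candidate_labels=["software", "sports", "politics"],
)
print(result["labels"][0], result["scores"][0])  # highest-scoring label and its score
```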
Community notebooks
- Fine-tune Electra and interpret with Integrated Gradients #6321 (@elsanns)
- Update ONNX notebook to include section on quantization. #6831 (@mfuntowicz)
Centralized logging
Logging is now centralized. The library offers methods to control the verbosity level of all loggers contained in the library; see the logging documentation for details, and the short sketch after the PR below:
- Centralize logging #6434 (@LysandreJik)
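A sketch of the new verbosity controls; the transformers.utils.logging import path matches the module added in #6434 at the time of this release, and should be treated as an assumption for other versions:

```python
from transformers.utils import logging

logging.set_verbosity_info()     # show INFO-level messages from all library loggers
level = logging.get_verbosity()  # the current level as an int
logging.set_verbosity_error()    # silence everything below ERROR
```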
Bug fixes and improvements
- [Reformer] Adapt Reformer MaskedLM Attn mask #5560 (@patrickvonplaten)
- Make T5 compatible with ONNX #5518 (@abelriboulot)
- [Bart] enable test_torchscript, update test_tie_weights #5457 (@sshleifer)
- [docs] fix model_doc links in model summary #5566 (@patil-suraj)
- [Benchmark] Readme for benchmark #5363 (@patrickvonplaten)
- Fix Inconsistent NER Grouping (Pipeline) #4987 (@enzoampil)
- QA pipeline BART compatible #5496 (@mfuntowicz)
- More explicit error when failing to tensorize overflowing tokens #5633 (@LysandreJik)
- Should check that torch TPU is available #5636 (@LysandreJik)
- Fixed TextGenerationPipeline on torch + GPU #5629 (@TevenLeScao)
- Fixed use of memories in XLNet (caching for language generation + warning when loading improper memoryless model) #5632 (@TevenLeScao)
- Pipeline model type check #5679 (@JetRunner)
- rename the functions to match the rest of the test convention #5692 (@stas00)
- [Longformer] fix longformer global attention output #5659 (@patrickvonplaten)
- [Fix] github actions CI by reverting #5138 #5686 (@sshleifer)
- [Reformer classification head] Implement the reformer model classification head for text classification #5198 (@as-stevens)
- Cleanup bart caching logic #5640 (@sshleifer)
- [AutoModels] Fix config params handling of all PT and TF AutoModels #5665 (@patrickvonplaten)
- [cleanup] T5 test, warnings #5761 (@sshleifer)
- [fix] T5 ONNX test: model.to(torch_device) #5769 (@mfuntowicz)
- [Benchmark] fix benchmark non standard model #5801 (@patrickvonplaten)
- [Benchmark] Fix models without architectures param in config #5808 (@patrickvonplaten)
- [Longformer] fix longformer slow-down #5811 (@patrickvonplaten)
- [seq2seq] pack_dataset.py rewrites dataset in max_tokens format #5819 (@sshleifer)
- [seq2seq] Don't copy self.source in sortishsampler #5818 (@sshleifer)
- [cleanups] make Marian save as Marian #5830 (@sshleifer)
- [Reformer] - Cache hidden states and buckets to speed up inference #5578 (@patrickvonplaten)
- Update tokenizers to 0.8.1.rc to fix Mac OS X issues #5867 (@sepal)
- Xlnet outputs #5883 (@TevenLeScao)
- [cleanup] squad processor #5868 (@sshleifer)
- [Fix] seq2seq pack_dataset.py actually packs #5913 (@sshleifer)
- [CI] self-scheduled runner tests examples/ #5927 (@sshleifer)
- [CI] Install examples/requirements.txt #5956 (@sshleifer)
- Expose padding_strategy on squad processor to fix QA pipeline performance regression #5932 (@mfuntowicz)
- [docs] Add integration test example to copy pasta template #5961 (@sshleifer)
- Cleanup Trainer and expose customization points #5982 (@sgugger)
- Avoid unnecessary warnings when loading pretrained model #5922 (@sgugger)
- Ensure OpenAI GPT position_ids is correctly initialized and registered at init. #5773 (@mfuntowicz)
- [CI] Don't test apex #6021 (@sshleifer)
- add a summary report flag for run_examples on CI #6035 (@stas00)
- don't complain about missing W&B when WANDB_DISABLED=true #6036 (@stas00)
- Allow to set Adam beta1, beta2 in TrainingArgs #5592 (@gonglinyuan)
- Fix the return documentation rendering for all model outputs #6022 (@sgugger)
- Fix typo (model saving TF) #5734 (@Colanim)
- Add new AutoModel classes in pipeline #6062 (@patil-suraj)
- [pack_dataset] don't sort before packing, only pack train #5954 (@sshleifer)
- CL util to convert models to fp16 before upload #5953 (@sshleifer)
- [fix] no warning for position_ids buffer #6063 (@sshleifer)
- Pipelines should use tuples instead of namedtuples #6061 (@LysandreJik)
- Moving transformers package import statements to relative imports in some files #5796 (@afcruzs)
- github issue template suggests who to tag #5790 (@sshleifer)
- [s2s] Delete useless method, log tokens_per_batch #6081 (@sshleifer)
- Logs should not be hidden behind a logger.info #6097 (@LysandreJik)
- Fix zero-shot pipeline single seq output shape #6104 (@joeddav)
- [fix] add bart to LM_MAPPING #6099 (@sshleifer)
- [Fix] position_ids tests again #6100 (@sshleifer)
- Fix deebert tests #6102 (@sshleifer)
- Added capability to quantize a model while exporting through ONNX. #6089 (@mfuntowicz)
- XLNet PLM Readme #6121 (@LysandreJik)
- Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} #5614
- Actually the extra_id are from 0-99 and not from 1-100 #5967 (@orena1)
- Fix FlauBERT GPU test #6142 (@LysandreJik)
- Enable ONNX/ONNXRuntime optimizations through converter script #6131 (@mfuntowicz)
- Add Pytorch Native AMP support in Trainer #6151 (@prajjwal1)
- Replace mecab-python3 with fugashi for Japanese tokenization #6086 (@polm)
- parse arguments from dict #4869 (@patil-suraj)
- Add script to convert BERT tf2.x checkpoint to PyTorch #5791 (@mar-muel)
- Empty assert hunt #6056 (@TevenLeScao)
- Adds train_batch_size, eval_batch_size, and n_gpu to to_sanitized_dict output for logging. #5331 (@jaymody)
- [DataCollatorForLanguageModeling] fix labels #6213 (@patil-suraj)
- Fix _shift_right function in TFT5PreTrainedModel #6214 (@maurice-g)
- Remove outdated BERT tips #6217 (@JetRunner)
- run_hans label fix #6221 (@VictorSanh)
- Make the order of additional special tokens deterministic #5704 (@gonglinyuan)
- test_tokenization_common.py: Remove redundant coverage #6224 (@sshleifer)
- [Reformer] fix reformer fp16 test #6237 (@patrickvonplaten)
- [Reformer] Make random seed generator available on random seed and not on model device #6244 (@patrickvonplaten)
- Update to match renamed attributes in fairseq master #5972 (@LilianBordeau)
- [WIP] lightning_base: support --lr_scheduler with multiple possibilities #6232 (@stas00)
- Trainer + wandb quality of life logging tweaks #6241 (@TevenLeScao)
- Add strip_accents to basic BertTokenizer. #6280 (@PhilipMay)
- Argument to set GPT2 inner dimension #6296 (@TevenLeScao)
- [Reformer] fix default generators for pytorch < 1.6 #6300 (@patrickvonplaten)
- Remove redundant line in run_pl_glue.py #6305 (@xujiaze13)
- [Fix] text-classification PL example #6027 (@bhashithe)
- fix the shuffle agrument usage and the default #6307 (@stas00)
- CI dependency wheel caching #6287 (@LysandreJik)
- Patch GPU failures #6281 (@LysandreJik)
- fix consistency CrossEntropyLoss in modeling_bart #6265 (@idoh)
- Add a script to check all models are tested and documented #6298 (@sgugger)
- [examples] consistently use --gpus, instead of --n_gpu #6315 (@stas00)
- Patch models #6326 (@LysandreJik)
- Ci GitHub caching #6382 (@LysandreJik)
- [EncoderDecoderModel] add a add_cross_attention boolean to config #6377 (@patrickvonplaten)
- Feed forward chunking #6024 (@Pradhy729)
- testing utils: capturing std streams context manager #6231 (@stas00)
- [Performance improvement] "Bad tokens ids" optimization #6064 (@guillaume-be)
- pl version: examples/requirements.txt is single source of truth #6309 (@stas00)
- [pl] restore lr logging behavior for glue, ner examples #6314 (@stas00)
- lr_schedulers: add get_polynomial_decay_schedule_with_warmup #6361 (@stas00)
- [examples] add pytest dependency #6425 (@sshleifer)
- [test] replace capsys with the more refined CaptureStderr/CaptureStdout #6422 (@stas00)
- Fixes to make life easier with the nlp library #6423 (@sgugger)
- Move prediction_loss_only to TrainingArguments #6426 (@sgugger)
- Fix docs and bad word tokens generation_utils.py #6387 (@ZhuBaohe)
- Test model outputs equivalence #6445 (@LysandreJik)
- add LongformerTokenizerFast in AutoTokenizer #6463 (@patil-suraj)
- add BartTokenizerFast in AutoTokenizer #6464 (@patil-suraj)
- Add POS tagging and Phrase chunking token classification examples #6457 (@vblagoje)
- Clean directory after script testing #6453 (@JetRunner)
- Use hash to clean the test dirs #6475 (@JetRunner)
- Sort unique_no_split_tokens to make it deterministic #6461 (@lhoestq)
- Support additional dictionaries for BERT Japanese tokenizers #6515 (@singletongue)
- Remove deprecated assertEquals #6532 (@JetRunner)
- [testing] a new TestCasePlus subclass + get_auto_remove_tmp_dir() #6494 (@stas00)
- [sched] polynomial_decay_schedule use default power=1.0 #6473 (@stas00)
- Fix flaky ONNX tests #6531 (@mfuntowicz)
- [doc] make the text more readable, fix some typos, add some disambiguation #6508 (@stas00)
- [doc] multiple corrections to "Summary of the tasks" #6509 (@stas00)
- Fixed label datatype for STS-B #6492 (@amodaresi)
- [docs] Fix wrong newline in the middle of a paragraph #6573 (@romainr)
- [docs] Fix number of 'ug' occurrences in tokenizer_summary #6574 (@romainr)
- add BartConfig.force_bos_token_to_be_generated #6526 (@sshleifer)
- Fix bart base test #6587 (@sshleifer)
- Feed forward chunking others #6365 (@Pradhy729)
- tf generation utils: remove unused kwargs #6591 (@sshleifer)
- [BartTokenizerFast] add prepare_seq2seq_batch #6543 (@patil-suraj)
- [docs] Copy code button misses '...' prefixed code #6518 (@romainr)
- removed redundant arg in prepare_inputs #6614 (@prajjwal1)
- add intro to nlp lib & dataset links to custom datasets tutorial #6583 (@joeddav)
- Add tests/test_tokenization_reformer.py #6485 (@D-Roberts)
- [Tests] fix attention masks in Tests #6621 (@patrickvonplaten)
- XLNet Bug when training with apex 16-bit precision #6567 (@johndolgov)
- Move threshold up for flaky test with Electra #6622 (@sgugger)
- Regression test for pegasus bugfix #6606 (@sshleifer)
- Trainer automatically drops unused columns in nlp datasets #6449 (@sgugger)
- [Docs model summaries] Add pegasus to docs #6640 (@patrickvonplaten)
- [Doc model summary] add MBart model summary #6649 (@patil-suraj)
- Specify config filename in HfArgumentParser #6626 (@jarednielsen)
- Don't reset the dataset type + plug for rm unused columns #6683 (@sgugger)
- Fixed DataCollatorForLanguageModeling not accepting lists of lists #6685 (@TevenLeScao)
- [doc] remove BartForConditionalGeneration.generate #6659 (@stas00)
- [fixdoc] Add import to pegasus usage doc #6698 (@sshleifer)
- Remove hard-coded uses of float32 to fix mixed precision use #6648 (@schmidek)
- Add typing.overload for convert_ids_tokens #6637 (@tamuhey)
- Allow tests in examples to use cuda or fp16, if they are available #5512 (@Joel-hanson)
- ci/gh/self-scheduled: add newline to make examples tests run even if src/ tests fail #6706 (@sshleifer)
- tensor.nonzero() is deprecated in PyTorch 1.6 #6715 (@mfuntowicz)
- [Albert] Add position ids to allowed uninitialized weights #6719 (@patrickvonplaten)
- Fix ONNX test_quantize unittest #6716 (@mfuntowicz)
- [squad] make examples and dataset accessible from SquadDataset object #6710 (@lazovich)
- Fix pegasus-xsum integration test #6726 (@sshleifer)
- T5Tokenizer adds EOS token if not already added #5866 (@sshleifer)
- [Torchscript] Fix docs #6740 (@patrickvonplaten)
- Add "tie_word_embeddings" config param #6692 (@patrickvonplaten)
- Fix tf boolean mask in graph mode #6741 (@JayYip)
- [TF Longformer] Improve Speed for TF Longformer #6447 (@patrickvonplaten)
- [s2s] run_eval.py QOL improvements and cleanup #6746 (@sshleifer)
- s2s distillation uses AutoModelForSeqToSeqLM #6761 (@sshleifer)
- Add AdaFactor optimizer from fairseq #6722 (@moscow25)
- Adds Adafactor to the docs and slightly fixes the formatting #6765 (@LysandreJik)
- Fix the TF Trainer gradient accumulation and the TF NER example #6713 (@jplu)
- Fix run_squad.py to work with BART #6756 (@tomgrek)
- Add NLP install to self-scheduled CI #6767 (@sshleifer)
- [testing] replace hardcoded paths to allow running tests from anywhere #6523 (@stas00)
- [test schedulers] adjust to test the first step's reading #6429 (@stas00)
- PL: --adafactor option #6776 (@sshleifer)
- [style] set the minimal required version for black #6784 (@stas00)
- Transformer-XL: Improved tokenization with sacremoses #6322 (@RafaelWO)
- prepare_seq2seq_batch makes labels/ decoder_input_ids made later. #6654 (@sshleifer)
- t5 model should make decoder_attention_mask #6800 (@sshleifer)
- [s2s] Test hub configs in self-scheduled CI #6809 (@sshleifer)
- [bart] rename self-attention -> attention #6708 (@sshleifer)
- Fixed open in colab link #6825 (@PandaWhoCodes)
- clarify shuffle #6312 (@xujiaze13)
- TF Flaubert w/ pre-norm #6841 (@LysandreJik)
- Only access loss tensor every logging_steps #6802 (@jysohn23)
- Add checkpointing to Ray Tune HPO #6747 (@krfricke)
- Fix marian slow test #6854 (@sshleifer)
- Bart can make decoder_input_ids from labels #6758 (@sshleifer)
- Restore PaddingStrategy.MAX_LENGTH on QAPipeline while no v2. #6875 (@mfuntowicz)
- [Generate] Facilitate PyTorch generate using ModelOutputs #6735 (@patrickvonplaten)