transformers 2.9.0
Trainer, TFTrainer, Multilingual BART, Encoder-decoder improvements, Generation Pipeline


Trainer & TFTrainer (@julien-c)

Version 2.9 introduces a new Trainer class for PyTorch, and its equivalent TFTrainer for TF 2.

This let us reorganize the example scripts completely for a cleaner codebase.

The main features of the Trainer are:

  • Same user-facing API for PyTorch and TF 2
  • Support for CPU, GPU, Multi-GPU, and TPU
  • Easier than ever to share your fine-tuned models
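As a hedged illustration (mirroring the bundled GLUE example; the toy dataset, checkpoint, and argument values below are placeholders, not recommended settings), fine-tuning with the PyTorch Trainer looks roughly like this:

```python
import torch
from torch.utils.data import Dataset

from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    InputFeatures,
    Trainer,
    TrainingArguments,
)


class ToyDataset(Dataset):
    """A stand-in dataset: two labelled sentences, just enough to run."""

    def __init__(self, tokenizer):
        texts, labels = ["great movie", "terrible movie"], [1, 0]
        self.features = [
            InputFeatures(
                **tokenizer.encode_plus(t, max_length=16, pad_to_max_length=True),
                label=l,
            )
            for t, l in zip(texts, labels)
        ]

    def __len__(self):
        return len(self.features)

    def __getitem__(self, i):
        return self.features[i]


tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

training_args = TrainingArguments(
    output_dir="./results",      # checkpoints are written here
    num_train_epochs=1,
    per_gpu_train_batch_size=2,  # argument name as of this release
    logging_dir="./logs",        # picked up by TensorBoard
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=ToyDataset(tokenizer),
    eval_dataset=ToyDataset(tokenizer),
)
trainer.train()
trainer.evaluate()
```

The same dataclass-configured arguments drive the TF 2 side through TFTrainer.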

The TFTrainer was largely contributed by awesome community member @jplu! 🔥 🔥

A few additional features of the example scripts are:

  • Generate argparsers from type hints on dataclasses
  • Can load arguments from json files
  • Logging through TensorBoard and wandb
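The dataclass-to-argparse plumbing is handled by HfArgumentParser; a minimal sketch (the ModelArguments dataclass and file name are illustrative):

```python
from dataclasses import dataclass, field

from transformers import HfArgumentParser, TrainingArguments


@dataclass
class ModelArguments:
    # Every typed field becomes a --flag on the generated argparser.
    model_name_or_path: str = field(default="bert-base-uncased")


parser = HfArgumentParser((ModelArguments, TrainingArguments))

# Parse from the command line, e.g.: python train.py --output_dir ./out
model_args, training_args = parser.parse_args_into_dataclasses()

# ...or load the very same arguments from a JSON file instead:
# model_args, training_args = parser.parse_json_file("args.json")
```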

Documentation for the Trainer is still a work in progress; please consider contributing improvements.

TPU Support

  • Both the TensorFlow and PyTorch trainers have TPU support (@jplu, @LysandreJik, @julien-c). An additional utility is added so that the TPU scripts may be launched in a similar manner to torch.distributed.
  • This was built with the support of @jysohn23, member of the Google TPU team

Multilingual BART (@sshleifer)

New BART checkpoint converted: this adds the mbart-en-ro model, a BART variant fine-tuned on English-Romanian translation.
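A hedged sketch of using it for translation (the hub identifier and tokenizer class below are assumptions, not confirmed by these notes; check the model card for the published name):

```python
from transformers import AutoTokenizer, BartForConditionalGeneration

# Assumed hub identifier for the converted checkpoint.
name = "facebook/mbart-large-en-ro"
tokenizer = AutoTokenizer.from_pretrained(name)
model = BartForConditionalGeneration.from_pretrained(name)

input_ids = tokenizer.encode(
    "UN Chief Says There Is No Military Solution in Syria", return_tensors="pt"
)
translated = model.generate(input_ids)
print(tokenizer.decode(translated[0], skip_special_tokens=True))
```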

Improved support for huggingface/tokenizers

  • Additional tests and support have been added for the huggingface/tokenizers fast tokenizers. (@mfuntowicz, @thomwolf)
  • TensorFlow models work out-of-the-box with the new tokenizers (@LysandreJik)
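For example (a small sketch; use_fast requests the Rust-backed implementation where one exists for the checkpoint):

```python
from transformers import AutoTokenizer, TFAutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)
model = TFAutoModel.from_pretrained("bert-base-uncased")

# Fast (Rust) tokenizers now feed TF 2 models directly.
inputs = tokenizer.encode_plus("Fast tokenizers work with TF models", return_tensors="tf")
outputs = model(inputs)
```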

Decoder caching for T5 (@patrickvonplaten)

Auto-regressive decoding for T5 has been greatly sped up by storing past key/value states. Work done on both PyTorch and TensorFlow.

Breaking change

This introduces a breaking change: the default output tuple of T5Model and T5ForConditionalGeneration grows from 4 to 5 elements, now including the past_key_value_states.
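For generate() users nothing changes except speed; code that indexes into the raw output tuple must account for the extra element. A hedged sketch (the exact position of the cached states within the tuple is an assumption):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

input_ids = tokenizer.encode(
    "translate English to German: The house is wonderful.", return_tensors="pt"
)

# generate() now reuses cached key/value states, so each decoding step
# only computes attention for the newly produced token.
output_ids = model.generate(input_ids)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

# Direct calls now return 5 elements instead of 4; the cached states
# (assumed to sit right after the logits) are what generate() feeds back.
outputs = model(input_ids=input_ids, decoder_input_ids=input_ids, use_cache=True)
print(len(outputs))  # 5
```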

Encoder-Decoder enhancements

  • Apply Encoder Decoder 1.5GB memory savings to TF as well (@patrickvonplaten, translation of same work on PyTorch models by @sshleifer)
  • BART Summarization fine-tuning script now works for T5 as well (@sshleifer)
  • Clean Encoder-Decoder models with Bart/T5-like API and add generate possibility (@patrickvonplaten)
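The reworked API warm-starts an encoder-decoder from two pretrained checkpoints; a minimal sketch (the BERT-to-BERT pairing is illustrative, and the generate() kwargs may need adjusting to this release's exact signature):

```python
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Warm-start both halves from BERT; the cross-attention weights are
# freshly initialized, so the model still needs fine-tuning.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)

input_ids = tokenizer.encode("Hello, my dog is cute", return_tensors="pt")
generated = model.generate(input_ids, decoder_start_token_id=tokenizer.cls_token_id)
```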

Additional model architectures

Question Answering support for Albert and Roberta in TF (@Pierrci):

  • TFAlbertForQuestionAnswering
  • TFRobertaForQuestionAnswering
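A short usage sketch (the question-answering head on top of a base checkpoint is randomly initialized until fine-tuned on SQuAD-style data, so the predicted spans below are not meaningful):

```python
from transformers import AlbertTokenizer, TFAlbertForQuestionAnswering

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = TFAlbertForQuestionAnswering.from_pretrained("albert-base-v2")

question = "Who contributed TF question answering?"
context = "Question answering support for Albert and Roberta in TF was contributed."
inputs = tokenizer.encode_plus(question, context, return_tensors="tf")

# The model returns start and end logits over the input tokens.
start_scores, end_scores = model(inputs)
```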

Pipelines

  • The question answering pipeline now handles impossible answers (@bryant1410)
  • Remove tqdm logging (@mfuntowicz)
  • Sentiment analysis pipeline can now handle more than two sequences (@xxbidiao)
  • Rewritten batch support in pipelines (@mfuntowicz)
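For example, batching with the sentiment-analysis pipeline (the default checkpoint is downloaded on first use):

```python
from transformers import pipeline

nlp = pipeline("sentiment-analysis")

# Any number of sequences in a single call.
print(nlp(["I love this movie.", "I hate this movie.", "It was fine, I guess."]))
```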

Text Generation pipeline (@enzoampil)

Implements a text generation pipeline, GenerationPipeline, which works with any model that has a language modeling head.
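A hedged sketch, assuming the pipeline is registered under the "text-generation" task name and defaults to a GPT-2 checkpoint:

```python
from transformers import pipeline

generator = pipeline("text-generation")  # task name assumed; defaults to GPT-2
print(generator("Version 2.9 of transformers introduces", max_length=30))
```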

Fixes and improvements

  • Clean the generate testing functions (@patrickvonplaten)
  • Notebooks updated in the documentation (@LysandreJik)
  • Fix RoBERTa/XLNet Pad Token in run_multiple_choice.py (@ethanjperez)
  • Fixed RoBERTa conversion script (@myleott)
  • Speedup torch summarization tests (@sshleifer)
  • Optimize causal mask using torch.where (@Akababa)
  • Improved benchmarking utils (@patrickvonplaten)
  • Fixed edge case for bert tokenization (@patrickvonplaten)
  • SummarizationDataset cleanup (@sshleifer)
  • BART: Replace config.output_past with use_cache kwarg (@sshleifer)
  • Better documentation for Summarization and Translation pipeline (@julien-c)
  • Additional documentation for model cards (@julien-c)
  • Fix force_download of files on Windows (@calpt)
  • Fix shuffling issue for distributed training (@elk-cloner)
  • Shift labels internally within TransfoXLLMHeadModel when called with labels (@TevenLeScao)
  • Remove output_past everywhere and replace by use_cache argument (@patrickvonplaten)
  • Added unit test for run_bart_sum (@sshleifer)
  • Cleaner code by factoring a few methods back into PreTrainedModel (@sshleifer)
  • [Bert] remove hard-coded pad token id (@patrickvonplaten)
  • Clean pipelines test and remove unnecessary code (@patrickvonplaten)
  • JITting is not compatible with PyTorch/XLA or other frameworks that require serialization; the JITted methods were removed (@LysandreJik)
  • Change newstest2013 to newstest2014 and clean up (@patrickvonplaten)
  • Factor out tensor conversion method in PretrainedTokenizer (@sshleifer)
  • Remove tanh torch warnings (@aryanshomray)
  • Fix token_type_id in BERT question-answering example (@siboehm)
  • Add CircleCI workflow to build docs for preview (@harupy)
  • Higher tolerance for past testing in T5 and TF T5 (@patrickvonplaten)
  • XLM tokenizer should encode with bos token (@LysandreJik, @patrickvonplaten)
  • Fix summarization do_predict (@sshleifer)
  • Encode to max length of input not max length of tokenizer for batch input (@patrickvonplaten)
  • Add qas_id to SquadResult and SquadExample (@jarednielsen)
  • Fix bug in run_*.py scripts: double wrap into DataParallel during eval (@and-kul)
  • Fix torchhub integration (@julien-c)
  • Fix TFAlbertForSequenceClassification classifier dropout probability (@jarednielsen)
  • Change uses of pow(x, 3) to pow(x, 3.0) (@mneilly-et)
  • Shuffle train subset for summarization example (@Colanim)
  • Removed the boto3 dependency (@julien-c)
  • Add dialogpt training tips (@patrickvonplaten)
  • Generation can now start with an empty prompt (@patrickvonplaten)
  • GPT-2 is now traceable (@jazzcook15)
  • Add known 3rd party to setup.cfg; removes local/circle ci isort discrepancy. (@sshleifer)
  • Allow a more backward compatible behavior of max_len_single_sentence and max_len_sentences_pair (@thomwolf)
  • Now using CDN urls for weights (@julien-c)
  • [Fix common tests on GPU] send model, ids to torch_device (@sshleifer)
  • Fix TF input docstrings to refer to tf.Tensor rather than torch.Float (@jarednielsen)
  • Additional metadata added to training arguments (@parmarsuraj99)
  • [ci] Load pretrained models into the default (long-lived) cache (@julien-c)
  • add timeout_decorator to tests (@sshleifer)
  • Added XLM-R to the multilingual section in the documentation (@stefan-it)
  • Better num_labels in configuration objects
  • Updated pytorch lightning scripts (@williamFalcon)
  • Tests now pass with torch 1.5.0 (@LysandreJik)
  • Ensure fast tokenizer can construct single-element tensor without pad token (@mfuntowicz)
