Trainer & TFTrainer (@julien-c)
Version 2.9 introduces a new Trainer class for PyTorch, and its equivalent TFTrainer for TF 2.
This let us reorganize the example scripts completely for a cleaner codebase.
The main features of the Trainer are:
- Same user-facing API for PyTorch and TF 2
- Support for CPU, GPU, Multi-GPU, and TPU
- Easier than ever to share your fine-tuned models
The TFTrainer was largely contributed by awesome community member @jplu! 🔥 🔥
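As a quick taste of the unified API, here is a minimal sketch of fine-tuning a small causal language model with the PyTorch Trainer. The checkpoint, hyperparameters, and toy corpus are illustrative choices, not taken from this release's example scripts:

```python
from transformers import (
    AutoModelWithLMHead,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    TextDataset,
    Trainer,
    TrainingArguments,
)

# Write a tiny corpus so the sketch is self-contained.
with open("tiny_corpus.txt", "w") as f:
    f.write("Hello world. " * 200)

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelWithLMHead.from_pretrained("distilgpt2")

train_dataset = TextDataset(
    tokenizer=tokenizer, file_path="tiny_corpus.txt", block_size=64
)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="./results",      # checkpoints land here
    num_train_epochs=1,
    per_gpu_train_batch_size=4,  # v2.9 uses per_gpu_* argument names
    logging_dir="./logs",        # TensorBoard logs
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
)
trainer.train()
trainer.save_model("./fine_tuned")  # the saved directory is easy to share
```

The same script structure carries over to TFTrainer with a TF 2 model and a tf.data dataset.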
A few additional features of the example scripts are:
- Generate argparsers from type hints on dataclasses (see the sketch after this list)
- Can load arguments from JSON files
- Logging through TensorBoard and wandb
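Under the hood, the example scripts build their CLIs with HfArgumentParser, which turns dataclass fields into argparse arguments. A minimal sketch, where the ModelArguments dataclass is an illustrative stand-in for what each script defines:

```python
from dataclasses import dataclass, field

from transformers import HfArgumentParser, TrainingArguments


@dataclass
class ModelArguments:
    """Illustrative script-specific arguments; field types drive the argparser."""

    model_name_or_path: str = field(
        default="bert-base-uncased",
        metadata={"help": "Checkpoint to fine-tune from."},
    )


parser = HfArgumentParser((ModelArguments, TrainingArguments))

# Parse from the command line, e.g.: python run_foo.py --output_dir ./out
model_args, training_args = parser.parse_args_into_dataclasses()

# Alternatively, load the same keys from a JSON file:
# model_args, training_args = parser.parse_json_file("args.json")
```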
Documentation for the Trainer is still a work in progress; please consider contributing improvements.
TPU Support
- Both the TensorFlow and PyTorch trainers have TPU support (@jplu, @LysandreJik, @julien-c). An additional utility was added so that TPU scripts may be launched in a manner similar to torch.distributed.
- This was built with the support of @jysohn23, a member of the Google TPU team.
Multilingual BART (@sshleifer)
New BART checkpoint converted: this adds the mbart-en-ro model, a BART variant fine-tuned on English-Romanian translation.
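A hedged sketch of using the checkpoint for English-to-Romanian translation. The hub identifier (facebook/mbart-large-en-ro) and the generation settings are assumptions, so check the model page for canonical usage:

```python
from transformers import AutoModelWithLMHead, AutoTokenizer

# Checkpoint identifier is an assumption, not taken from the release notes.
tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-en-ro")
model = AutoModelWithLMHead.from_pretrained("facebook/mbart-large-en-ro")

input_ids = tokenizer.encode("The weather is nice today.", return_tensors="pt")
generated = model.generate(input_ids, num_beams=4, max_length=40)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```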
Improved support for huggingface/tokenizers
- Additional tests and support have been added for the huggingface/tokenizers fast tokenizers (@mfuntowicz, @thomwolf)
- TensorFlow models work out-of-the-box with the new tokenizers (@LysandreJik)
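For example, a fast tokenizer's output can now be fed straight into a TF 2 model (a minimal sketch with an arbitrary checkpoint):

```python
from transformers import BertTokenizerFast, TFBertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = TFBertModel.from_pretrained("bert-base-uncased")

# return_tensors="tf" yields tf.Tensor inputs the model consumes directly.
inputs = tokenizer.encode_plus(
    "Fast tokenizers work with TF models too.", return_tensors="tf"
)
outputs = model(inputs)
print(outputs[0].shape)  # (batch, sequence, hidden): the last hidden states
```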
Decoder caching for T5 (@patrickvonplaten)
Auto-regressive decoding for T5 has been greatly sped up by storing past key/value states. Work done on both PyTorch and TensorFlow.
Breaking change
This introduces a breaking change, in that it increases the default output length of T5Model and T5ForConditionalGeneration from 4 to 5 (including the past_key_value_states).
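A minimal sketch of what changes at the call site; the kwarg names follow the description above, and the exact tuple layout should be checked against the docs:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

input_ids = tokenizer.encode(
    "translate English to German: How are you?", return_tensors="pt"
)
decoder_input_ids = tokenizer.encode("Wie geht", return_tensors="pt")

# With caching enabled (use_cache=True), the output tuple additionally
# carries past key/value states, which generate() reuses instead of
# recomputing attention over all previously decoded tokens.
outputs = model(
    input_ids=input_ids, decoder_input_ids=decoder_input_ids, use_cache=True
)
print(len(outputs))  # one element longer than before: the past states
```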
Encoder-Decoder enhancements
- Apply the Encoder-Decoder 1.5GB memory savings to TF as well (@patrickvonplaten; a port of the same work done on PyTorch models by @sshleifer)
- BART Summarization fine-tuning script now works for T5 as well (@sshleifer)
- Clean Encoder-Decoder models with Bart/T5-like API and add generate possibility (@patrickvonplaten)
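A minimal sketch of the new Bart/T5-like encoder-decoder API, warm-starting a BERT-to-BERT model; the decoding settings are illustrative, and the freshly initialized cross-attention means the output is meaningless until fine-tuned:

```python
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# Warm-start an encoder-decoder model from two pretrained BERT checkpoints.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)

input_ids = tokenizer.encode("A short input sentence.", return_tensors="pt")
# generate() now works on encoder-decoder models as well.
generated = model.generate(
    input_ids, decoder_start_token_id=tokenizer.cls_token_id, max_length=20
)
print(tokenizer.decode(generated[0]))
```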
Additional model architectures
Question Answering support for ALBERT and RoBERTa in TF (@Pierrci):
- TFAlbertForQuestionAnswering
- TFRobertaForQuestionAnswering
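A minimal sketch of the TF question-answering head. The base checkpoint is arbitrary and the QA head is freshly initialized, so predictions are meaningless until fine-tuned (e.g. on SQuAD):

```python
import tensorflow as tf
from transformers import AlbertTokenizer, TFAlbertForQuestionAnswering

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = TFAlbertForQuestionAnswering.from_pretrained("albert-base-v2")

inputs = tokenizer.encode_plus(
    "Who wrote it?", "It was written by Jane.", return_tensors="tf"
)
start_logits, end_logits = model(inputs)[:2]

# Pick the most likely answer span from the start/end logits.
start = int(tf.argmax(start_logits, axis=1)[0])
end = int(tf.argmax(end_logits, axis=1)[0])
answer_ids = inputs["input_ids"].numpy()[0, start : end + 1].tolist()
print(tokenizer.decode(answer_ids))
```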
Pipelines
- The question answering pipeline now handles impossible answers (@bryant1410)
- Remove tqdm logging (@mfuntowicz)
- Sentiment analysis pipeline can now handle more than two sequences (@xxbidiao)
- Rewritten batch support in pipelines (@mfuntowicz)
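A hedged sketch of both changes; the handle_impossible_answer kwarg name is our reading of the new impossible-answer support:

```python
from transformers import pipeline

# Sentiment analysis over a batch of more than two sequences.
classifier = pipeline("sentiment-analysis")
print(classifier(["Great!", "Awful.", "Fine, I guess.", "Loved it."]))

# Question answering that can flag unanswerable questions.
qa = pipeline("question-answering")
print(
    qa(
        question="Who discovered penicillin?",
        context="This text says nothing about discoveries.",
        handle_impossible_answer=True,  # assumed kwarg for the new option
    )
)
```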
Text Generation pipeline (@enzoampil)
Implements a text generation pipeline, GenerationPipeline, which works with any ModelWithLMHead.
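A minimal usage sketch, assuming the pipeline is registered under the "text-generation" task and that extra kwargs are forwarded to generate():

```python
from transformers import pipeline

# Task name and default model are assumptions; a specific LM-head model can
# be requested explicitly via pipeline("text-generation", model="gpt2").
generator = pipeline("text-generation")
print(generator("In a distant galaxy,", max_length=30))
```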
Fixes and improvements
- Clean the generate testing functions (@patrickvonplaten)
- Notebooks updated in the documentation (@LysandreJik)
- Fix RoBERTa/XLNet Pad Token in run_multiple_choice.py (@ethanjperez)
- Fixed RoBERTa conversion script (@myleott)
- Speedup torch summarization tests (@sshleifer)
- Optimize causal mask using torch.where (@Akababa)
- Improved benchmarking utils (@patrickvonplaten)
- Fixed edge case for bert tokenization (@patrickvonplaten)
- SummarizationDataset cleanup (@sshleifer)
- BART: Replace config.output_past with use_cache kwarg (@sshleifer)
- Better documentation for Summarization and Translation pipeline (@julien-c)
- Additional documentation for model cards (@julien-c)
- Fix force_download of files on Windows (@calpt)
- Fix shuffling issue for distributed training (@elk-cloner)
- Shift labels internally within TransfoXLLMHeadModel when called with labels (@TevenLeScao)
- Remove output_past everywhere and replace it with a use_cache argument (@patrickvonplaten)
- Added unit test for run_bart_sum (@sshleifer)
- Cleaner code by factoring a few methods back into PreTrainedModel (@sshleifer)
- [Bert] Remove hard-coded pad token id (@patrickvonplaten)
- Clean pipelines test and remove unnecessary code (@patrickvonplaten)
- JITting is not compatible with PyTorch/XLA or any other framework that requires serialization, so the JITted methods were removed (@LysandreJik)
- Change newstest2013 to newstest2014 and clean up (@patrickvonplaten)
- Factor out tensor conversion method in PreTrainedTokenizer (@sshleifer)
- Remove tanh torch warnings (@aryanshomray)
- Fix token_type_id in BERT question-answering example (@siboehm)
- Add CircleCI workflow to build docs for preview (@harupy)
- Higher tolerance for past testing in T5 and TF T5 (@patrickvonplaten)
- XLM tokenizer should encode with bos token (@LysandreJik, @patrickvonplaten)
- Fix summarization do_predict (@sshleifer)
- Encode to max length of input not max length of tokenizer for batch input (@patrickvonplaten)
- Add qas_id to SquadResult and SquadExample (@jarednielsen)
- Fix bug in run_*.py scripts: double wrap into DataParallel during eval (@and-kul)
- Fix torchhub integration (@julien-c)
- Fix TFAlbertForSequenceClassification classifier dropout probability (@jarednielsen)
- Change uses of pow(x, 3) to pow(x, 3.0) (@mneilly-et)
- Shuffle train subset for summarization example (@Colanim)
- Removed the boto3 dependency (@julien-c)
- Add dialogpt training tips (@patrickvonplaten)
- Generation can now start with an empty prompt (@patrickvonplaten)
- GPT-2 is now traceable (@jazzcook15)
- Add known 3rd party to setup.cfg; removes local/circle ci isort discrepancy. (@sshleifer)
- Allow a more backward compatible behavior of max_len_single_sentence and max_len_sentences_pair (@thomwolf)
- Now using CDN urls for weights (@julien-c)
- [Fix common tests on GPU] send model, ids to torch_device (@sshleifer)
- Fix TF input docstrings to refer to tf.Tensor rather than torch.Float (@jarednielsen)
- Additional metadata added to training arguments (@parmarsuraj99)
- [ci] Load pretrained models into the default (long-lived) cache (@julien-c)
- add timeout_decorator to tests (@sshleifer)
- Added XLM-R to the multilingual section in the documentation (@stefan-it)
- Better num_labels in configuration objects
- Updated PyTorch Lightning scripts (@williamFalcon)
- Tests now pass with torch 1.5.0 (@LysandreJik)
- Ensure fast tokenizer can construct single-element tensor without pad token (@mfuntowicz)