transformers 2.9.0
Trainer, TFTrainer, Multilingual BART, Encoder-decoder improvements, Generation Pipeline


Trainer & TFTrainer (@julien-c)

Version 2.9 introduces a new Trainer class for PyTorch, and its equivalent TFTrainer for TF 2.

This let us reorganize the example scripts completely for a cleaner codebase.

The main features of the Trainer are:

  • Same user-facing API for PyTorch and TF 2
  • Support for CPU, GPU, Multi-GPU, and TPU
  • Easier than ever to share your fine-tuned models
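As a hedged illustration (mirroring the bundled GLUE example; the toy dataset, checkpoint, and argument values below are placeholders, not recommended settings), fine-tuning with the PyTorch Trainer looks roughly like this:

```python
import torch
from torch.utils.data import Dataset

from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    InputFeatures,
    Trainer,
    TrainingArguments,
)


class ToyDataset(Dataset):
    """A stand-in dataset: two labelled sentences, just enough to run."""

    def __init__(self, tokenizer):
        texts, labels = ["great movie", "terrible movie"], [1, 0]
        self.features = [
            InputFeatures(
                **tokenizer.encode_plus(t, max_length=16, pad_to_max_length=True),
                label=l,
            )
            for t, l in zip(texts, labels)
        ]

    def __len__(self):
        return len(self.features)

    def __getitem__(self, i):
        return self.features[i]


tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

training_args = TrainingArguments(
    output_dir="./results",      # checkpoints are written here
    num_train_epochs=1,
    per_gpu_train_batch_size=2,  # argument name as of this release
    logging_dir="./logs",        # picked up by TensorBoard
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=ToyDataset(tokenizer),
    eval_dataset=ToyDataset(tokenizer),
)
trainer.train()
trainer.evaluate()
```

The same dataclass-configured arguments drive the TF 2 side through TFTrainer.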

The TFTrainer was largely contributed by awesome community member @jplu! 🔥 🔥

A few additional features of the example scripts are:

  • Generate argparsers from type hints on dataclasses
  • Can load arguments from json files
  • Logging through TensorBoard and wandb
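The dataclass-to-argparse plumbing is handled by HfArgumentParser; a minimal sketch (the ModelArguments dataclass and file name are illustrative):

```python
from dataclasses import dataclass, field

from transformers import HfArgumentParser, TrainingArguments


@dataclass
class ModelArguments:
    # Every typed field becomes a --flag on the generated argparser.
    model_name_or_path: str = field(default="bert-base-uncased")


parser = HfArgumentParser((ModelArguments, TrainingArguments))

# Parse from the command line, e.g.: python train.py --output_dir ./out
model_args, training_args = parser.parse_args_into_dataclasses()

# ...or load the very same arguments from a JSON file instead:
# model_args, training_args = parser.parse_json_file("args.json")
```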

Documentation for the Trainer is still a work in progress; please consider contributing improvements.

TPU Support

  • Both the TensorFlow and PyTorch trainers have TPU support (@jplu, @LysandreJik, @julien-c). An additional utility is added so that the TPU scripts may be launched in a similar manner to torch.distributed.
  • This was built with the support of @jysohn23, member of the Google TPU team

Multilingual BART (@sshleifer)

New BART checkpoint converted: this adds the mbart-en-ro model, a BART variant fine-tuned on English-Romanian translation.
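A hedged sketch of using it for translation (the hub identifier and tokenizer class below are assumptions, not confirmed by these notes; check the model card for the published name):

```python
from transformers import AutoTokenizer, BartForConditionalGeneration

# Assumed hub identifier for the converted checkpoint.
name = "facebook/mbart-large-en-ro"
tokenizer = AutoTokenizer.from_pretrained(name)
model = BartForConditionalGeneration.from_pretrained(name)

input_ids = tokenizer.encode(
    "UN Chief Says There Is No Military Solution in Syria", return_tensors="pt"
)
translated = model.generate(input_ids)
print(tokenizer.decode(translated[0], skip_special_tokens=True))
```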

Improved support for huggingface/tokenizers

  • Additional tests and support have been added for the huggingface/tokenizers fast tokenizers. (@mfuntowicz, @thomwolf)
  • TensorFlow models work out-of-the-box with the new tokenizers (@LysandreJik)
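For example (a small sketch; use_fast requests the Rust-backed implementation where one exists for the checkpoint):

```python
from transformers import AutoTokenizer, TFAutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)
model = TFAutoModel.from_pretrained("bert-base-uncased")

# Fast (Rust) tokenizers now feed TF 2 models directly.
inputs = tokenizer.encode_plus("Fast tokenizers work with TF models", return_tensors="tf")
outputs = model(inputs)
```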

Decoder caching for T5 (@patrickvonplaten)

Auto-regressive decoding for T5 has been greatly sped up by storing past key/value states. Work done on both PyTorch and TensorFlow.

Breaking change

This introduces a breaking change: the default output tuple of T5Model and T5ForConditionalGeneration grows from 4 to 5 elements, now including the past_key_value_states.
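For generate() users nothing changes except speed; code that indexes into the raw output tuple must account for the extra element. A hedged sketch (the exact position of the cached states within the tuple is an assumption):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

input_ids = tokenizer.encode(
    "translate English to German: The house is wonderful.", return_tensors="pt"
)

# generate() now reuses cached key/value states, so each decoding step
# only computes attention for the newly produced token.
output_ids = model.generate(input_ids)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

# Direct calls now return 5 elements instead of 4; the cached states
# (assumed to sit right after the logits) are what generate() feeds back.
outputs = model(input_ids=input_ids, decoder_input_ids=input_ids, use_cache=True)
print(len(outputs))  # 5
```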

Encoder-Decoder enhancements

  • Apply Encoder Decoder 1.5GB memory savings to TF as well (@patrickvonplaten, translation of same work on PyTorch models by @sshleifer)
  • BART Summarization fine-tuning script now works for T5 as well (@sshleifer)
  • Clean Encoder-Decoder models with Bart/T5-like API and add generate possibility (@patrickvonplaten)
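The reworked API warm-starts an encoder-decoder from two pretrained checkpoints; a minimal sketch (the BERT-to-BERT pairing is illustrative, and the generate() kwargs may need adjusting to this release's exact signature):

```python
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Warm-start both halves from BERT; the cross-attention weights are
# freshly initialized, so the model still needs fine-tuning.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)

input_ids = tokenizer.encode("Hello, my dog is cute", return_tensors="pt")
generated = model.generate(input_ids, decoder_start_token_id=tokenizer.cls_token_id)
```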

Additional model architectures

Question Answering support for Albert and Roberta in TF (@Pierrci):

  • TFAlbertForQuestionAnswering
  • TFRobertaForQuestionAnswering
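A short usage sketch (the question-answering head on top of a base checkpoint is randomly initialized until fine-tuned on SQuAD-style data, so the predicted spans below are not meaningful):

```python
from transformers import AlbertTokenizer, TFAlbertForQuestionAnswering

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = TFAlbertForQuestionAnswering.from_pretrained("albert-base-v2")

question = "Who contributed TF question answering?"
context = "Question answering support for Albert and Roberta in TF was contributed."
inputs = tokenizer.encode_plus(question, context, return_tensors="tf")

# The model returns start and end logits over the input tokens.
start_scores, end_scores = model(inputs)
```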

Pipelines

  • The question answering pipeline now handles impossible answers (@bryant1410)
  • Remove tqdm logging (@mfuntowicz)
  • Sentiment analysis pipeline can now handle more than two sequences (@xxbidiao)
  • Rewritten batch support in pipelines (@mfuntowicz)
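For example, batching with the sentiment-analysis pipeline (the default checkpoint is downloaded on first use):

```python
from transformers import pipeline

nlp = pipeline("sentiment-analysis")

# Any number of sequences in a single call.
print(nlp(["I love this movie.", "I hate this movie.", "It was fine, I guess."]))
```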

Text Generation pipeline (@enzoampil)

Implements a text generation pipeline, GenerationPipeline, which works with any model that has a language modeling head.
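A hedged sketch, assuming the pipeline is registered under the "text-generation" task name and defaults to a GPT-2 checkpoint:

```python
from transformers import pipeline

generator = pipeline("text-generation")  # task name assumed; defaults to GPT-2
print(generator("Version 2.9 of transformers introduces", max_length=30))
```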

Fixes and improvements

  • Clean the generate testing functions (@patrickvonplaten)
  • Notebooks updated in the documentation (@LysandreJik)
  • Fix RoBERTa/XLNet Pad Token in run_multiple_choice.py (@ethanjperez)
  • Fixed RoBERTa conversion script (@myleott)
  • Speedup torch summarization tests (@sshleifer)
  • Optimize causal mask using torch.where (@Akababa)
  • Improved benchmarking utils (@patrickvonplaten)
  • Fixed edge case for bert tokenization (@patrickvonplaten)
  • SummarizationDataset cleanup (@sshleifer)
  • BART: Replace config.output_past with use_cache kwarg (@sshleifer)
  • Better documentation for Summarization and Translation pipeline (@julien-c)
  • Additional documentation for model cards (@julien-c)
  • Fix force_download of files on Windows (@calpt)
  • Fix shuffling issue for distributed training (@elk-cloner)
  • Shift labels internally within TransfoXLLMHeadModel when called with labels (@TevenLeScao)
  • Remove output_past everywhere and replace by use_cache argument (@patrickvonplaten)
  • Added unit test for run_bart_sum (@sshleifer)
  • Cleaner code by factoring a few methods back into PreTrainedModel (@sshleifer)
  • [Bert] remove hard-coded pad token id (@patrickvonplaten)
  • Clean pipelines test and remove unnecessary code (@patrickvonplaten)
  • JITting is not compatible with PyTorch/XLA or other frameworks that require serialization; the JITted methods were removed (@LysandreJik)
  • Change newstest2013 to newstest2014 and clean up (@patrickvonplaten)
  • Factor out tensor conversion method in PretrainedTokenizer (@sshleifer)
  • Remove tanh torch warnings (@aryanshomray)
  • Fix token_type_id in BERT question-answering example (@siboehm)
  • Add CircleCI workflow to build docs for preview (@harupy)
  • Higher tolerance for past testing in T5 and TF T5 (@patrickvonplaten)
  • XLM tokenizer should encode with bos token (@LysandreJik, @patrickvonplaten)
  • Fix summarization do_predict (@sshleifer)
  • Encode to max length of input not max length of tokenizer for batch input (@patrickvonplaten)
  • Add qas_id to SquadResult and SquadExample (@jarednielsen)
  • Fix bug in run_*.py scripts: double wrap into DataParallel during eval (@and-kul)
  • Fix torchhub integration (@julien-c)
  • Fix TFAlbertForSequenceClassification classifier dropout probability (@jarednielsen)
  • Change uses of pow(x, 3) to pow(x, 3.0) (@mneilly-et)
  • Shuffle train subset for summarization example (@Colanim)
  • Removed the boto3 dependency (@julien-c)
  • Add dialogpt training tips (@patrickvonplaten)
  • Generation can now start with an empty prompt (@patrickvonplaten)
  • GPT-2 is now traceable (@jazzcook15)
  • Add known 3rd party to setup.cfg; removes local/circle ci isort discrepancy. (@sshleifer)
  • Allow a more backward compatible behavior of max_len_single_sentence and max_len_sentences_pair (@thomwolf)
  • Now using CDN urls for weights (@julien-c)
  • [Fix common tests on GPU] send model, ids to torch_device (@sshleifer)
  • Fix TF input docstrings to refer to tf.Tensor rather than torch.Float (@jarednielsen)
  • Additional metadata added to training arguments (@parmarsuraj99)
  • [ci] Load pretrained models into the default (long-lived) cache (@julien-c)
  • add timeout_decorator to tests (@sshleifer)
  • Added XLM-R to the multilingual section in the documentation (@stefan-it)
  • Better num_labels in configuration objects
  • Updated pytorch lightning scripts (@williamFalcon)
  • Tests now pass with torch 1.5.0 (@LysandreJik)
  • Ensure fast tokenizer can construct single-element tensor without pad token (@mfuntowicz)
