Transformers v4.0.0: Fast tokenizers, model outputs, file reorganization



Breaking changes since v3.x

Version v4.0.0 introduces several breaking changes that were necessary.

1. AutoTokenizers and pipelines now use fast (rust) tokenizers by default.

The python and rust tokenizers have roughly the same API, but the rust tokenizers have a more complete feature set. The main breaking change is the handling of overflowing tokens between the python and rust tokenizers.

How to obtain the same behavior as v3.x in v4.x

In version v3.x:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xxx")

to obtain the same in version v4.x:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xxx", use_fast=False)

2. SentencePiece is removed from the required dependencies

The requirement on the SentencePiece dependency has been lifted from the setup.py. This is done so that we may have a channel on anaconda cloud without relying on conda-forge. This means that the tokenizers that depend on the SentencePiece library will not be available with a standard transformers installation.

This includes the slow versions of:

  • XLNetTokenizer
  • AlbertTokenizer
  • CamembertTokenizer
  • MBartTokenizer
  • PegasusTokenizer
  • T5Tokenizer
  • ReformerTokenizer
  • XLMRobertaTokenizer

How to obtain the same behavior as v3.x in v4.x

In order to obtain the same behavior as version v3.x, you should install sentencepiece additionally:

In version v3.x:

pip install transformers

to obtain the same in version v4.x:

pip install transformers[sentencepiece]

or

pip install transformers sentencepiece
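Since SentencePiece is now optional, code that may run against a bare `pip install transformers` can check for the dependency at runtime before instantiating one of the slow tokenizers listed above. A minimal sketch (the helper name is ours, not part of transformers):

```python
import importlib.util

def sentencepiece_available() -> bool:
    """Return True if the optional sentencepiece package is importable."""
    return importlib.util.find_spec("sentencepiece") is not None

if sentencepiece_available():
    print("slow SentencePiece-based tokenizers (e.g. T5Tokenizer) will work")
else:
    print("install with: pip install transformers[sentencepiece]")
```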

3. The architecture of the repo has been updated so that each model resides in its folder

The past and foreseeable addition of new models means that the number of files in the src/transformers directory keeps growing, making it harder to navigate and understand. We therefore chose to put each model, together with its accompanying files, in its own sub-directory.

This is a breaking change: importing intermediate layers directly from a model's module must now be done via a different path.

How to obtain the same behavior as v3.x in v4.x

In order to obtain the same behavior as version v3.x, you should update the path used to access the layers.

In version v3.x:

from transformers.modeling_bert import BertLayer

to obtain the same in version v4.x:

from transformers.models.bert.modeling_bert import BertLayer
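Code that must run under both v3.x and v4.x can wrap the import in a fallback. A small shim sketch using the paths above, deferred inside a function so the module is only required when the function is actually called:

```python
def import_bert_layer():
    """Import BertLayer from the new v4.x per-model path, falling back
    to the old v3.x flat module path for older installations."""
    try:
        from transformers.models.bert.modeling_bert import BertLayer
    except ImportError:
        from transformers.modeling_bert import BertLayer
    return BertLayer
```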

4. Switching the return_dict argument to True by default

The return_dict argument enables the return of namedtuple-like Python objects containing the model outputs, instead of the standard tuples. These objects are self-documenting: values can be retrieved by key, while the objects also behave like tuples, so users can still retrieve values by index or by slice.

This is a breaking change because, unlike the previous tuples, these objects cannot be unpacked directly: value0, value1 = outputs will no longer work.

How to obtain the same behavior as v3.x in v4.x

In order to obtain the same behavior as version v3.x, you should specify the return_dict argument to False, either in the model configuration or during the forward pass.

In version v3.x:

outputs = model(**inputs)

to obtain the same in version v4.x:

outputs = model(**inputs, return_dict=False)
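To illustrate the hybrid dict/tuple behavior, here is a toy sketch of an output object supporting key, index, and slice access. This is an illustration only, not the actual transformers ModelOutput implementation:

```python
from collections import OrderedDict

class SimpleModelOutput(OrderedDict):
    """Toy dict/tuple hybrid: values are retrievable by key (like a dict),
    by attribute, or by integer index / slice (like a tuple)."""

    def __getitem__(self, key):
        if isinstance(key, str):
            return OrderedDict.__getitem__(self, key)
        # Integer or slice access falls back to tuple semantics.
        return tuple(self.values())[key]

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

outputs = SimpleModelOutput(loss=0.5, logits=[0.1, 0.9])
print(outputs["loss"])   # key access
print(outputs.logits)    # attribute access
print(outputs[0])        # index access, like a tuple
print(outputs[:2])       # slice access
```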

5. Removed some deprecated attributes

Attributes that were deprecated have been removed if they had been deprecated for at least a month. The full list of deprecated attributes can be found in #8604.

Here is a list of these attributes/methods/arguments and what their replacements should be:

In several models, the labels argument becomes consistent with the other models:

  • masked_lm_labels becomes labels in AlbertForMaskedLM and AlbertForPreTraining.
  • masked_lm_labels becomes labels in BertForMaskedLM and BertForPreTraining.
  • masked_lm_labels becomes labels in DistilBertForMaskedLM.
  • masked_lm_labels becomes labels in ElectraForMaskedLM.
  • masked_lm_labels becomes labels in LongformerForMaskedLM.
  • masked_lm_labels becomes labels in MobileBertForMaskedLM.
  • masked_lm_labels becomes labels in RobertaForMaskedLM.
  • lm_labels becomes labels in BartForConditionalGeneration.
  • lm_labels becomes labels in GPT2DoubleHeadsModel.
  • lm_labels becomes labels in OpenAIGPTDoubleHeadsModel.
  • lm_labels becomes labels in T5ForConditionalGeneration.

In several models, the caching mechanism becomes consistent with the other models:

  • decoder_cached_states becomes past_key_values in all BART-like, FSMT and T5 models.
  • decoder_past_key_values becomes past_key_values in all BART-like, FSMT and T5 models.
  • past becomes past_key_values in all CTRL models.
  • past becomes past_key_values in all GPT-2 models.

Regarding the tokenizer classes:

  • The tokenizer attribute max_len becomes model_max_length.
  • The tokenizer attribute return_lengths becomes return_length.
  • The tokenizer encoding argument is_pretokenized becomes is_split_into_words.
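The renames above are mechanical, so call sites can be migrated with a simple keyword-rewriting helper. This is a hypothetical illustration (DEPRECATED_KWARGS and migrate_kwargs are not part of transformers), covering a subset of the renames from the lists above:

```python
# Map of deprecated v3.x argument names to their v4.x replacements
# (illustrative subset of the renames listed above).
DEPRECATED_KWARGS = {
    "masked_lm_labels": "labels",
    "lm_labels": "labels",
    "is_pretokenized": "is_split_into_words",
    "return_lengths": "return_length",
}

def migrate_kwargs(kwargs):
    """Return a copy of kwargs with deprecated v3.x names rewritten."""
    return {DEPRECATED_KWARGS.get(k, k): v for k, v in kwargs.items()}

print(migrate_kwargs({"is_pretokenized": True, "padding": "max_length"}))
```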

Regarding the Trainer class:

  • The Trainer argument tb_writer is removed in favor of the callback TensorBoardCallback(tb_writer=...).
  • The Trainer argument prediction_loss_only is removed in favor of the class argument args.prediction_loss_only.
  • The Trainer attribute data_collator should be a callable.
  • The Trainer method _log is deprecated in favor of log.
  • The Trainer method _training_step is deprecated in favor of training_step.
  • The Trainer method _prediction_loop is deprecated in favor of prediction_loop.
  • The Trainer method is_local_master is deprecated in favor of is_local_process_zero.
  • The Trainer method is_world_master is deprecated in favor of is_world_process_zero.

Regarding the TFTrainer class:

  • The TFTrainer argument prediction_loss_only is removed in favor of the class argument args.prediction_loss_only.
  • The TFTrainer method _log is deprecated in favor of log.
  • The TFTrainer method _prediction_loop is deprecated in favor of prediction_loop.
  • The TFTrainer method _setup_wandb is deprecated in favor of setup_wandb.
  • The TFTrainer method _run_model is deprecated in favor of run_model.

Regarding the TrainingArguments and TFTrainingArguments classes:

  • The TrainingArguments argument evaluate_during_training is deprecated in favor of evaluation_strategy.
  • The TFTrainingArguments argument evaluate_during_training is deprecated in favor of evaluation_strategy.

Regarding the Transfo-XL model:

  • The Transfo-XL configuration attribute tie_weight becomes tie_words_embeddings.
  • The Transfo-XL modeling method reset_length becomes reset_memory_length.

Regarding pipelines:

  • The FillMaskPipeline argument topk becomes top_k.

Model Templates

Version 4.0.0 will be the first to include the experimental feature of model templates. These model templates aim to facilitate the addition of new models to the library by doing most of the work: generating the model/configuration/tokenization/test files that fit the API, according to the user's choices of naming and functionality.

This release includes a model template for an encoder model (similar to the BERT architecture). Generating a model from the template creates the files, puts them at the appropriate location, references them throughout the codebase, and generates a working test suite. The user then only needs to modify the files to their liking, rather than creating the model from scratch.

Feedback is welcome; to get started, see the model templates README.

New model additions

mT5 and T5 version 1.1 (@patrickvonplaten)

T5 v1.1 is an improved version of the original T5 model; see the released checkpoints overview: https://github.com/google-research/text-to-text-transfer-transformer/blob/master/released_checkpoints.md

The multilingual T5 model (mT5) was presented in https://arxiv.org/abs/2010.11934 and is based on the T5 v1.1 architecture.

Multiple pre-trained checkpoints have been added to the library.


TF DPR

The DPR model has been added in TensorFlow to match its PyTorch counterpart, by @ratthachat.

TF Longformer

Additional heads have been added to the TensorFlow Longformer implementation: SequenceClassification, MultipleChoice and TokenClassification.

Bug fixes and improvements
