huggingface/transformers v2.7.0
T5 Model, BART summarization example and reduced memory, translation pipeline

T5 Model (@patrickvonplaten, @thomwolf )

T5 is a powerful encoder-decoder model that casts every NLP problem into a text-to-text format. It achieves state-of-the-art results on a variety of NLP tasks (summarization, question answering, ...).

Five sets of pre-trained weights (pre-trained on a multi-task mixture of unsupervised and supervised tasks) are released, in ascending order of size from 60 million to 11 billion parameters:

t5-small, t5-base, t5-large, t5-3b, t5-11b
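
As a quick sketch of the text-to-text format, the task to perform is selected purely by a natural-language prefix on the input. The class names below follow the current transformers API and may differ slightly from those in this release:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Minimal sketch of T5's text-to-text interface; class names follow the
# current transformers API and may differ slightly from this release.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The task is selected purely by a text prefix on the input.
input_ids = tokenizer.encode(
    "translate English to German: How old are you?", return_tensors="pt"
)
output_ids = model.generate(input_ids, max_length=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```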

T5 can now be used with the translation and summarization pipelines.

Big thanks to the original authors, especially @craffel who helped answer our questions, reviewed PRs and tested T5 extensively.

New BART checkpoint: bart-large-xsum (@sshleifer)

These weights are from BART fine-tuned on the XSum abstractive summarization challenge, which encourages shorter (more abstractive) summaries. It achieves state-of-the-art results on this task.
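
A minimal sketch of loading the new checkpoint through the summarization pipeline; the facebook/ prefix follows the current hub naming, while at release time the checkpoint was addressed as plain bart-large-xsum:

```python
from transformers import pipeline

# Usage sketch; "facebook/bart-large-xsum" is the current hub id for the
# checkpoint released here as "bart-large-xsum".
summarizer = pipeline("summarization", model="facebook/bart-large-xsum")

article = (
    "The tower is 324 metres tall, about the same height as an 81-storey "
    "building, and was the tallest man-made structure in the world for 41 years."
)
print(summarizer(article, max_length=30, min_length=5)[0]["summary_text"])
```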

BART summarization example with pytorch-lightning (@acarrera94)

New example: BART for summarization, using PyTorch Lightning. The script fine-tunes on CNN/DailyMail and evaluates the resulting model.
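
The pattern the example follows looks roughly like the sketch below. BartSummarizer is a hypothetical name, and the argument names follow current transformers and pytorch-lightning releases, which differ in detail from the script as shipped:

```python
import pytorch_lightning as pl
import torch
from transformers import BartForConditionalGeneration

class BartSummarizer(pl.LightningModule):  # hypothetical name, for illustration
    def __init__(self):
        super().__init__()
        self.model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

    def training_step(self, batch, batch_idx):
        # labels are shifted inside the model to build decoder inputs
        outputs = self.model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            labels=batch["labels"],
        )
        return outputs.loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=3e-5)

# trainer = pl.Trainer(max_epochs=1)
# trainer.fit(BartSummarizer(), train_dataloaders=cnn_dm_loader)  # loader not shown
```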

Translation pipeline (@patrickvonplaten)

A new translation pipeline is available, leveraging the T5 model. T5 was also added to the summarization pipeline.
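
A minimal usage sketch; the task names cover the English-to-German/French/Romanian pairs from T5's pre-training mixture:

```python
from transformers import pipeline

# English-to-German translation through the new pipeline, backed by T5.
translator = pipeline("translation_en_to_de", model="t5-base")
print(translator("How old are you?")[0]["translation_text"])
```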

Memory improvements with BART (@sshleifer)

In an effort to reduce the memory footprint and the computing power necessary to run inference on BART, several improvements have been made to the model:

  • Remove the LM head and use the embedding matrix instead (~200MB; see the weight-tying sketch after this list)
  • Call encoder before expanding input_ids (~1GB)
  • SelfAttention only returns weights if config.output_attentions (~500MB)
  • Two separate, smaller decoder attention masks (~500MB)
  • Drop columns consisting exclusively of pad_token_id from input_ids in the evaluate_cnn example.
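
A minimal sketch of the weight-tying trick behind the first item, assuming BART-large dimensions (vocab 50265, hidden size 1024). TiedLMHead is a hypothetical module written for illustration, not the one in the library:

```python
import torch
import torch.nn as nn

class TiedLMHead(nn.Module):  # hypothetical module, for illustration only
    def __init__(self, embedding: nn.Embedding):
        super().__init__()
        self.embedding = embedding  # shared with the input embedding

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Project onto the vocabulary with the shared embedding weights:
        # no separate (vocab_size x hidden_size) LM head matrix is stored.
        return hidden_states @ self.embedding.weight.t()

embedding = nn.Embedding(50265, 1024)  # BART-large vocab and hidden sizes
lm_head = TiedLMHead(embedding)
logits = lm_head(torch.randn(2, 7, 1024))
print(logits.shape)  # torch.Size([2, 7, 50265])
```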

TensorFlow models may now be serialized (@gthb)

Keras layers now support JSON serialization via get_config overrides, so that they can be sent to TensorBoard to display a conceptual graph of the model. TensorFlow models may now be saved using model.save, like any other Keras model.
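
A minimal sketch of the Keras-native saving this enables (the output path is arbitrary):

```python
import tensorflow as tf
from transformers import TFBertModel

model = TFBertModel.from_pretrained("bert-base-uncased")
# Build the model by running a dummy batch through it once.
model(tf.constant([[101, 7592, 2088, 102]]))
# Keras-native saving, enabled by the get_config overrides.
model.save("bert_saved_model")
```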

New model: XLMForTokenClassification (@sakares)

A new head was added to XLM: XLMForTokenClassification.
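
A minimal sketch, written against the current API; num_labels=9 is an arbitrary illustrative choice (e.g. a CoNLL-style NER tag set):

```python
import torch
from transformers import XLMForTokenClassification, XLMTokenizer

tokenizer = XLMTokenizer.from_pretrained("xlm-mlm-en-2048")
model = XLMForTokenClassification.from_pretrained("xlm-mlm-en-2048", num_labels=9)

inputs = tokenizer("Hello from New York", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (batch, seq_len, num_labels)
print(logits.shape)
```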
