huggingface/transformers v3.4.0
ProphetNet, Blenderbot, SqueezeBERT, DeBERTa

ProphetNet

Two new models are released as part of the ProphetNet implementation: ProphetNet and XLM-ProphetNet.

ProphetNet is an encoder-decoder model that predicts n future tokens for “ngram” language modeling instead of just the next token.

XLM-ProphetNet is an encoder-decoder model with an architecture identical to ProphetNet, but the model was trained on the multilingual “wiki100” Wikipedia dump.

The ProphetNet model was proposed in ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training, by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou on 13 Jan, 2020.

It was added to the library in PyTorch with the following checkpoints:

  • microsoft/xprophetnet-large-wiki100-cased-xglue-ntg
  • microsoft/prophetnet-large-uncased
  • microsoft/prophetnet-large-uncased-cnndm
  • microsoft/xprophetnet-large-wiki100-cased
  • microsoft/xprophetnet-large-wiki100-cased-xglue-qg
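
As a quick orientation, here is a minimal sketch of summarization with the CNN/DailyMail fine-tuned checkpoint using the PyTorch classes added with this release; the input text and the generation settings (beam size, lengths) are illustrative assumptions, not prescribed values.

```python
from transformers import ProphetNetTokenizer, ProphetNetForConditionalGeneration

# Summarization sketch with the CNN/DailyMail fine-tuned checkpoint.
tokenizer = ProphetNetTokenizer.from_pretrained("microsoft/prophetnet-large-uncased-cnndm")
model = ProphetNetForConditionalGeneration.from_pretrained("microsoft/prophetnet-large-uncased-cnndm")

article = "The US Food and Drug Administration approved ..."  # any news article text
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=512)

# Beam search settings below are illustrative, not the recommended defaults.
summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=100, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```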

Contributions:

BlenderBot

Blenderbot is an encoder-decoder model for open-domain chat. It uses a standard transformer-based seq2seq architecture.

The Blender chatbot model was proposed in Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston on 30 Apr 2020.

It was added to the library in PyTorch with the following checkpoints:

  • facebook/blenderbot-90M
  • facebook/blenderbot-3B
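
For illustration, a hedged sketch of a single chat turn with the 3B checkpoint; the user utterance is made up and generation uses the library defaults.

```python
from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration

name = "facebook/blenderbot-3B"
tokenizer = BlenderbotTokenizer.from_pretrained(name)
model = BlenderbotForConditionalGeneration.from_pretrained(name)

# One chat turn: encode the user utterance, generate a reply, decode it.
utterance = "My friends are cool but they eat too many carbs."
inputs = tokenizer([utterance], return_tensors="pt")
reply_ids = model.generate(**inputs)
print(tokenizer.decode(reply_ids[0], skip_special_tokens=True))
```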

Contributions:

SqueezeBERT

The SqueezeBERT model was proposed in SqueezeBERT: What can computer vision teach NLP about efficient neural networks? by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, Kurt W. Keutzer. It’s a bidirectional transformer similar to the BERT model. The key difference between the BERT architecture and the SqueezeBERT architecture is that SqueezeBERT uses grouped convolutions instead of fully-connected layers for the Q, K, V and FFN layers.

It was added to the library in PyTorch with the following checkpoints:

  • squeezebert/squeezebert-mnli
  • squeezebert/squeezebert-uncased
  • squeezebert/squeezebert-mnli-headless
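
For example, the MNLI checkpoint can be used for sentence-pair classification. A minimal sketch follows; the premise/hypothesis strings are made up and the label mapping is simply read from the model config.

```python
import torch
from transformers import SqueezeBertTokenizer, SqueezeBertForSequenceClassification

tokenizer = SqueezeBertTokenizer.from_pretrained("squeezebert/squeezebert-mnli")
model = SqueezeBertForSequenceClassification.from_pretrained("squeezebert/squeezebert-mnli")

# NLI-style sentence pair: does the premise entail the hypothesis?
premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."
inputs = tokenizer(premise, hypothesis, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs)[0]  # first element of the model output is the classification logits
predicted = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted])
```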

Contributions:

DeBERTa

The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It is based on Google’s BERT model released in 2018 and Facebook’s RoBERTa model released in 2019.

It was added to the library in PyTorch with the following checkpoints:

  • microsoft/deberta-base
  • microsoft/deberta-large
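
A minimal feature-extraction sketch with the base checkpoint (the input sentence is arbitrary):

```python
import torch
from transformers import DebertaTokenizer, DebertaModel

tokenizer = DebertaTokenizer.from_pretrained("microsoft/deberta-base")
model = DebertaModel.from_pretrained("microsoft/deberta-base")

inputs = tokenizer("Disentangled attention separates content and position embeddings.", return_tensors="pt")
with torch.no_grad():
    # First element of the output: last hidden states, shape (batch, seq_len, hidden_size).
    last_hidden_state = model(**inputs)[0]
print(last_hidden_state.shape)
```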

Contributions:

Both SentencePiece and Tokenizers are now optional libraries

Support for SentencePiece is now part of the tokenizers library! Thanks to this, we now have near-full support for fast tokenizers in the library.

With this new feature, we slightly change the paradigm regarding installation:

  • SentencePiece is now an optional dependency, paving the way to a fully-featured conda install in the near future

  • Tokenizers is now also an optional dependency, making it possible to install and use the library even when Rust cannot be compiled on the machine.

  • [Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies #7659 (@thomwolf)

The main __init__ has been improved to always import the same functions and classes. If someone then tries to use a class that requires an optional dependency, an ImportError will be raised at init (with instructions on how to install the missing dependency) #7537 (@sgugger)
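
To illustrate the behavior described above, here is a hedged sketch using a SentencePiece-based tokenizer: the import itself always succeeds, and the ImportError (with installation instructions) only surfaces when the class is actually used without the optional dependency installed.

```python
# The import always succeeds, whether or not sentencepiece is installed.
from transformers import T5Tokenizer

try:
    tokenizer = T5Tokenizer.from_pretrained("t5-small")
except ImportError as err:
    # Raised only when the optional dependency is missing; the message explains how to
    # install it, e.g. via `pip install sentencepiece`.
    print(err)
```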

Improvements made to the Trainer

The Trainer API has been improved to work with models requiring several labels or returning several outputs, and to have clearer progress tracking. A new TrainerCallback class has been added to allow the user to easily customize the default training loop.
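
As an example of the new hook, here is a hedged sketch of a custom TrainerCallback that prints the training loss whenever the Trainer logs metrics; the class name and surrounding variables are made up for illustration.

```python
from transformers import TrainerCallback

class LossPrinterCallback(TrainerCallback):
    """Print the loss every time the Trainer logs metrics."""

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs is not None and "loss" in logs:
            print(f"step {state.global_step}: loss = {logs['loss']:.4f}")

# Passed to the Trainer alongside the usual arguments, e.g.:
# trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset,
#                   callbacks=[LossPrinterCallback()])
```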

Seq2Seq Trainer

A subclass of Trainer specialized for training seq2seq models, contributed by @patil-suraj and @sshleifer. It is accessible through examples/seq2seq/finetune_trainer.py.

Distributed Generation

  • You can run model.generate in PyTorch on a large dataset and split the work across multiple GPUs using examples/seq2seq/run_distributed_eval.py (see the sketch after this list)
  • [s2s] release pseudolabel links and instructions #7639 (@sshleifer)
  • [s2s] Fix t5 warning for distributed eval #7487 (@sshleifer)
  • [s2s] fix kwargs style #7488 (@sshleifer)
  • [s2s] fix lockfile and peg distillation constants #7545 (@sshleifer)
  • [s2s] fix nltk pytest race condition with FileLock #7515 (@sshleifer)
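
For orientation, below is a hedged sketch of the idea behind the script (not the script itself): each process loads the model, runs generation on its own shard of the input lines, and writes a per-rank output file that a final step would merge. The checkpoint name, file names, and batch size are assumptions for illustration, and the processes are expected to be launched with torch.distributed (one process per GPU).

```python
import torch
import torch.distributed as dist
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

dist.init_process_group(backend="nccl")                # one process per GPU
rank, world_size = dist.get_rank(), dist.get_world_size()
device = torch.device("cuda", rank)

model_name = "sshleifer/distilbart-cnn-12-6"           # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device).eval()

with open("source.txt") as f:                          # hypothetical input file, one example per line
    lines = f.read().splitlines()
shard = lines[rank::world_size]                        # each rank takes every world_size-th line

outputs = []
for i in range(0, len(shard), 8):                      # simple fixed-size batching
    batch = tokenizer(shard[i:i + 8], return_tensors="pt", padding=True, truncation=True)
    batch = {k: v.to(device) for k, v in batch.items()}
    with torch.no_grad():
        generated = model.generate(batch["input_ids"], attention_mask=batch["attention_mask"], num_beams=4)
    outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))

# Each rank writes its own file; merging the per-rank files reconstructs the full output.
with open(f"generated.rank{rank}.txt", "w") as f:
    f.write("\n".join(outputs))
```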

Notebooks

General improvements and bugfixes
