pypi transformers 2.8.0
ELECTRA, Bad word filters, bugfixes & improvements


ELECTRA Model (@LysandreJik)

ELECTRA is a new method for self-supervised language representation learning. It can be used to pre-train transformer networks using relatively little compute. ELECTRA models are trained to distinguish "real" input tokens from "fake" input tokens generated by another neural network, similar to the discriminator of a GAN. At small scale, ELECTRA achieves strong results even when trained on a single GPU. At large scale, ELECTRA achieves state-of-the-art results on the SQuAD 2.0 dataset.

This release comes with 6 ELECTRA checkpoints:

  • google/electra-small-discriminator
  • google/electra-small-generator
  • google/electra-base-discriminator
  • google/electra-base-generator
  • google/electra-large-discriminator
  • google/electra-large-generator

Thanks to the author @clarkkev for his help during the implementation.

Thanks to community members @hfl-rc @stefan-it @shoarora for already sharing more fine-tuned ELECTRA variants!

Bad word filters in generate (@patrickvonplaten)

The generate method now supports a bad-word filter: token sequences passed via the `bad_words_ids` argument are banned from appearing in the generated output.

Fixes and improvements

  • Decoder input ids are not necessary for T5 training anymore (@patrickvonplaten)
  • Update encoder and decoder on set_input_embedding for BART (@sshleifer)
  • Using loaded checkpoint with --do_predict (instead of random init) for PyTorch Lightning scripts (@ethanjperez)
  • Clean summarization and translation example testing files for T5 and Bart (@patrickvonplaten)
  • Cleaner examples (@julien-c)
  • Extensive testing for T5 model (@patrickvonplaten)
  • Force models outputs to always have batch_size as their first dim (@patrickvonplaten)
  • Fix for continuing training in some scripts (@xeb)
  • Resizing embedding matrix before sending it to the optimizer (@ngarneau)
  • BertJapaneseTokenizer accepts options for MeCab (@tamuhey)
  • Speed up GELU computation with torch.jit (@mryab)
  • Fix argument order of the update_mems function in the TF version (@patrickvonplaten, @dmytyar)
  • Split generate test function into beam search, no beam search (@patrickvonplaten)
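For reference, the GELU speed-up listed above amounts to compiling the erf-based activation with torch.jit.script, which fuses the elementwise operations. A minimal sketch, not the library's exact code:

```python
import math

import torch

@torch.jit.script
def gelu(x: torch.Tensor) -> torch.Tensor:
    # Exact (erf-based) GELU, matching torch.nn.functional.gelu.
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
```

The scripted function behaves identically to the plain Python version; the gain comes from avoiding Python dispatch overhead on the intermediate tensors.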
