github huggingface/transformers v4.20.0
v4.20.0 Big Model inference, BLOOM, CvT, GPT Neo-X, LayoutLMv3, LeViT, LongT5, M-CTC-T, Trajectory Transformer and Wav2Vec2-Conformer

Big model inference

You can now use Accelerate's big model inference directly in any call to from_pretrained by specifying device_map="auto" (or your own device_map). The model is loaded automatically across your GPU(s), with whatever doesn't fit offloaded to CPU RAM, or even to the hard drive if you run out of RAM. The model can then be used for inference as usual, with nothing else to do.

from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(
  "bigscience/T0pp", revision="sharded", device_map="auto"
)
  • Use Accelerate in from_pretrained for big model inference by @sgugger in #17341

BLOOM

The BLOOM model has been proposed, in its various versions, through the BigScience Workshop. The architecture of BLOOM is essentially similar to GPT-3 (an auto-regressive model for next-token prediction), but it has been trained on 46 different languages, including code.
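
As a minimal sketch (assuming the smaller bigscience/bloom-560m checkpoint; the full 176B model would typically be combined with the big model inference feature above), BLOOM can be used through the standard causal language modeling classes:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# bigscience/bloom-560m is one of the smaller published BLOOM checkpoints (assumed here).
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

inputs = tokenizer("The BigScience Workshop is", return_tensors="pt")
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(generated[0], skip_special_tokens=True))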

CvT

The Convolutional vision Transformer (CvT) improves the Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.
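
A quick sketch using the image-classification pipeline, assuming the microsoft/cvt-13 checkpoint:

from PIL import Image
from transformers import pipeline

# microsoft/cvt-13 is one of the CvT checkpoints on the Hub (assumed here).
classifier = pipeline("image-classification", model="microsoft/cvt-13")

image = Image.new("RGB", (224, 224))  # replace with a real image
print(classifier(image))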

GPT Neo-X

GPT-NeoX-20B is a 20 billion parameter autoregressive language model trained on the Pile, whose weights are made freely and openly available to the public through a permissive license. GPT-NeoX-20B is a particularly powerful few-shot reasoner and gains far more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models.
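
Since the 20B checkpoint is large, it pairs naturally with the big model inference feature above; a sketch, assuming the EleutherAI/gpt-neox-20b checkpoint and enough combined GPU/CPU memory:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
# device_map="auto" spreads the 20B parameters over the available GPUs and
# offloads the rest to CPU RAM (or disk), as described above.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b", device_map="auto", torch_dtype="auto"
)

inputs = tokenizer("GPT-NeoX-20B is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))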

LayoutLMv3

LayoutLMv3 simplifies LayoutLMv2 by using patch embeddings (as in ViT) instead of leveraging a CNN backbone, and pre-trains the model on 3 objectives: masked language modeling (MLM), masked image modeling (MIM) and word-patch alignment (WPA).
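
A minimal sketch, assuming the microsoft/layoutlmv3-base checkpoint and caller-supplied words and boxes (apply_ocr=False avoids any OCR dependency):

import torch
from PIL import Image
from transformers import LayoutLMv3Model, LayoutLMv3Processor

# microsoft/layoutlmv3-base is the base (not fine-tuned) checkpoint (assumed here);
# with apply_ocr=False, words and 0-1000 normalized boxes are supplied by the caller.
processor = LayoutLMv3Processor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=False)
model = LayoutLMv3Model.from_pretrained("microsoft/layoutlmv3-base")

image = Image.new("RGB", (224, 224), color="white")  # replace with a document image
words = ["hello", "world"]
boxes = [[10, 10, 60, 30], [70, 10, 130, 30]]

encoding = processor(image, words, boxes=boxes, return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoding)
print(outputs.last_hidden_state.shape)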

LeViT

LeViT improves the Vision Transformer (ViT) in performance and efficiency through a few architectural differences, such as activation maps with decreasing resolutions in the Transformer and the introduction of an attention bias to integrate positional information.
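
A short sketch, assuming the facebook/levit-128S checkpoint; the WithTeacher head averages the classification and distillation logits at inference time:

import torch
from PIL import Image
from transformers import AutoFeatureExtractor, LevitForImageClassificationWithTeacher

# facebook/levit-128S is one of the LeViT checkpoints on the Hub (assumed here).
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/levit-128S")
model = LevitForImageClassificationWithTeacher.from_pretrained("facebook/levit-128S")

image = Image.new("RGB", (224, 224))  # replace with a real image
inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[int(logits.argmax(-1))])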

LongT5

The LongT5 model is an extension of the T5 model that enables using one of two efficient attention mechanisms: (1) Local attention or (2) Transient-Global attention. It can handle input sequences of up to 16,384 tokens.
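
A sketch, assuming the google/long-t5-tglobal-base checkpoint (the Transient-Global variant); the base checkpoint is not fine-tuned, so the generation is only illustrative:

import torch
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
model = LongT5ForConditionalGeneration.from_pretrained("google/long-t5-tglobal-base")

# Inputs far longer than the usual 512-token T5 limit can be encoded directly.
long_document = "summarize: " + "A very long document. " * 2000
inputs = tokenizer(long_document, return_tensors="pt", truncation=True, max_length=16384)
with torch.no_grad():
    summary_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))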

M-CTC-T

The M-CTC-T model is a 1B-parameter transformer encoder with a CTC head over 8065 character labels and a language identification head over 60 language ID labels. It is trained on Common Voice (version 6.1, December 2020 release) and VoxPopuli; after this joint training, the model is trained further on Common Voice only. The labels are unnormalized character-level transcripts (punctuation and capitalization are not removed). The model takes as input Mel filterbank features from a 16 kHz audio signal.
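
A rough sketch, assuming the speechbrain/m-ctc-t-large checkpoint referenced in the model documentation; real 16 kHz audio would replace the random placeholder:

import numpy as np
import torch
from transformers import MCTCTForCTC, MCTCTProcessor

processor = MCTCTProcessor.from_pretrained("speechbrain/m-ctc-t-large")
model = MCTCTForCTC.from_pretrained("speechbrain/m-ctc-t-large")

# One second of 16 kHz audio; the processor turns it into Mel filterbank features.
speech = (np.random.randn(16000) * 0.01).astype(np.float32)
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(processor.batch_decode(torch.argmax(logits, dim=-1)))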

Trajectory Transformer

This Transformer is used for deep reinforcement learning. To use it, you need to create sequences of states, actions and rewards from all previous timesteps; the model treats all these elements together as one big sequence (a trajectory).
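
A rough sketch of the expected input, assuming the CarlCochet/trajectory-transformer-halfcheetah-medium-v2 checkpoint; every timestep contributes its discretized state, action and reward tokens to one long sequence:

import torch
from transformers import TrajectoryTransformerModel

# Assumed checkpoint name; the model consumes discretized trajectory tokens.
model = TrajectoryTransformerModel.from_pretrained(
    "CarlCochet/trajectory-transformer-halfcheetah-medium-v2"
)
model.eval()

# For HalfCheetah: 17 observation dims + 6 action dims + 1 reward per timestep,
# each discretized into an integer token and concatenated into a single sequence.
observation_dim, action_dim, timesteps = 17, 6, 4
seq_length = (observation_dim + action_dim + 1) * timesteps
trajectories = torch.randint(0, 10, (1, seq_length))  # placeholder tokens

with torch.no_grad():
    outputs = model(trajectories)
print(outputs.logits.shape)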

Wav2Vec2-Conformer

Wav2Vec2-Conformer was added in an updated version of fairseq S2T: Fast Speech-to-Text Modeling with fairseq. It requires more parameters than Wav2Vec2, but also yields an improved word error rate.
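
A sketch, assuming the facebook/wav2vec2-conformer-rope-large-960h-ft checkpoint (a variant fine-tuned on 960h of LibriSpeech); it is used exactly like Wav2Vec2:

import numpy as np
import torch
from transformers import Wav2Vec2ConformerForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-conformer-rope-large-960h-ft")
model = Wav2Vec2ConformerForCTC.from_pretrained("facebook/wav2vec2-conformer-rope-large-960h-ft")

speech = (np.random.randn(16000) * 0.01).astype(np.float32)  # replace with real 16 kHz audio
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(processor.batch_decode(torch.argmax(logits, dim=-1)))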

TensorFlow implementations

Data2VecVision for semantic segmentation, OPT and Swin are now available in TensorFlow.
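
For instance, the TensorFlow Swin port can be used like any other TF vision model (a sketch, assuming the microsoft/swin-tiny-patch4-window7-224 checkpoint):

from PIL import Image
from transformers import AutoFeatureExtractor, TFSwinForImageClassification

feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/swin-tiny-patch4-window7-224")
# Pass from_pt=True if the checkpoint only ships PyTorch weights.
model = TFSwinForImageClassification.from_pretrained("microsoft/swin-tiny-patch4-window7-224")

image = Image.new("RGB", (224, 224))  # replace with a real image
inputs = feature_extractor(images=image, return_tensors="tf")
logits = model(**inputs).logits
print(model.config.id2label[int(logits.numpy().argmax(-1)[0])])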

Flax implementations

OPT is now available in Flax.
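
A minimal sketch, assuming the facebook/opt-125m checkpoint (the smallest OPT variant):

from transformers import AutoTokenizer, FlaxOPTForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
# Pass from_pt=True if the repository does not ship Flax weights.
model = FlaxOPTForCausalLM.from_pretrained("facebook/opt-125m")

inputs = tokenizer("Flax OPT says", return_tensors="np")
outputs = model(**inputs)
print(outputs.logits.shape)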

Documentation translation in Italian and Portuguese

A community effort has been started to translate the documentation into two new languages: Italian and Portuguese.

Improvements and bugfixes

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @sayakpaul
    • Include a comment to reflect Amy's contributions (#17689)
    • Add TFData2VecVision for semantic segmentation (#17271)
  • @jianan-gu
    • Extend Transformers Trainer Class to Enable PyTorch Torchscript for Inference (#17153)
    • Extend Transformers Trainer Class to Enable CPU AMP and Integrate Intel Extension for PyTorch (#17138)
  • @stancld
    • Add LongT5 model (#16792)
    • Fix a typo relative_postion_if_large -> relative_position_if_large (#17366)
  • @mfumanelli
    • Translation/autoclass (#17615)
    • Add installation.mdx Italian translation (#17530)
    • Setup for Italian translation and add quicktour.mdx translation (#17472)
  • @cwkeam
  • @zphang
    • Remove RuntimeErrors for NaN-checking in 20B (#17563)
    • Adding GPT-NeoX-20B (#16659)
  • @AnugunjNaman
    • fix integration test levit (#17555)
    • Adding LeViT Model by Facebook (#17466)
    • Fix cvt docstrings (#17367)
  • @yharyarias
    • Spanish translation of the file preprocessing.mdx (#16299)
  • @mraunak
    • Add Information Gain Filtration algorithm (#16953)
  • @rzimmerdev
    • Added translation of installation.mdx to Portuguese Issue #16824 (#16979)
