transformers 4.25.1 on Python PyPI

PyTorch 2.0 stack support

We are very excited by the newly announced PyTorch 2.0 stack. You can enable torch.compile on any of our models, and the Trainer (along with all our PyTorch examples) supports it through the torchdynamo training argument. For instance, just add --torchdynamo inductor when launching those examples from the command line.

This API is still experimental and may be subject to changes as the PyTorch 2.0 stack matures.

Note that to get the best performance, we recommend:

- using an Ampere GPU (or more recent)
- sticking to fixed shapes for now (so use --pad_to_max_length in our examples)

Repurpose torchdynamo training args towards torch._dynamo by @sgugger in #20498

Audio Spectrogram Transformer

The Audio Spectrogram Transformer model was proposed in AST: Audio Spectrogram Transformer by Yuan Gong, Yu-An Chung, James Glass. The Audio Spectrogram Transformer applies a Vision Transformer to audio, by turning audio into an image (spectrogram). The model obtains state-of-the-art results for audio classification.

Add Audio Spectogram Transformer by @NielsRogge in #19981
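To make "turning audio into an image" concrete, the sketch below counts how many ViT patches a spectrogram yields. It assumes the defaults described for AST (128 mel bins, 1024 time frames, 16x16 patches, frequency and time strides of 10, so neighbouring patches overlap); the helper names are hypothetical and this is not the transformers implementation.

```python
def num_patches_1d(length: int, patch: int, stride: int) -> int:
    """Number of patch positions along one axis, without padding."""
    return (length - patch) // stride + 1


def num_ast_patches(num_mel_bins: int = 128, max_length: int = 1024,
                    patch_size: int = 16, frequency_stride: int = 10,
                    time_stride: int = 10) -> int:
    # The spectrogram is treated as a 1-channel image of shape (mel bins, frames),
    # and square patches are cut along both axes, as a Vision Transformer would do.
    f = num_patches_1d(num_mel_bins, patch_size, frequency_stride)
    t = num_patches_1d(max_length, patch_size, time_stride)
    return f * t


print(num_ast_patches())  # 12 frequency positions * 101 time positions = 1212
```

Using a stride smaller than the patch size trades a longer token sequence for overlapping patches; with non-overlapping patches (stride 16) the same input would give only 8 * 64 = 512 patches.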

Jukebox

The Jukebox model was proposed in Jukebox: A generative model for music by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever. It introduces a generative music model that can produce minute-long samples conditioned on an artist, genres and lyrics.

Add Jukebox model (replaces #16875) by @ArthurZucker in #17826

Switch Transformers

The SwitchTransformers model was proposed in Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity by William Fedus, Barret Zoph, Noam Shazeer.

It is the first MoE (mixture-of-experts) model supported in transformers; the largest checkpoint currently available contains 1T parameters.

Add Switch transformers by @younesbelkada and @ArthurZucker in #19323
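As a rough illustration of the sparsity in a Switch layer, the stdlib-only sketch below implements top-1 ("switch") routing as described in the paper: each token is sent to the single expert with the highest router probability, that probability is the gate that would scale the expert's output, and each expert accepts at most ceil(tokens / experts * capacity_factor) tokens. The function names and the first-come, first-served handling of overflow tokens are illustrative assumptions, not the transformers implementation.

```python
import math
from typing import List, Optional, Tuple


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def switch_route(router_logits: List[List[float]],
                 capacity_factor: float = 1.0) -> List[Tuple[Optional[int], float]]:
    """Top-1 routing: (expert index, or None if dropped; gate probability) per token."""
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    # Expert capacity: maximum number of tokens each expert processes in this batch.
    capacity = math.ceil(num_tokens / num_experts * capacity_factor)
    load = [0] * num_experts
    routes: List[Tuple[Optional[int], float]] = []
    for logits in router_logits:
        probs = softmax(logits)
        expert = max(range(num_experts), key=probs.__getitem__)
        if load[expert] < capacity:
            load[expert] += 1
            routes.append((expert, probs[expert]))
        else:
            # Overflow token: skipped by every expert, it only flows through the residual path.
            routes.append((None, probs[expert]))
    return routes


logits = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [2.5, 0.0, 0.0], [1.0, 0.0, 0.0]]
routes = switch_route(logits)
for token, (expert, gate) in enumerate(routes):
    print(token, expert, round(gate, 3))
```

With 4 tokens and 3 experts the capacity is 2, so the third token that prefers expert 0 is dropped. Only one expert runs per token, which is why parameter count can grow with the number of experts while per-token compute stays roughly constant.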

RoCBert

The RoCBert model was proposed in RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining by Hui Su, Weiwei Shi, Xiaoyu Shen, Xiao Zhou, Tuo Ji, Jiarui Fang, Jie Zhou. It’s a pretrained Chinese language model that is robust under various forms of adversarial attacks.

Add RocBert by @sww9370 in #20013

CLIPSeg

The CLIPSeg model was proposed in Image Segmentation Using Text and Image Prompts by Timo Lüddecke and Alexander Ecker. CLIPSeg adds a minimal decoder on top of a frozen CLIP model for zero- and one-shot image segmentation.

Add CLIPSeg by @NielsRogge in #20066

NAT and DiNAT

NAT

NAT was proposed in Neighborhood Attention Transformer by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi.

It is a hierarchical vision transformer based on Neighborhood Attention, a sliding-window self-attention pattern.

DiNAT

DiNAT was proposed in Dilated Neighborhood Attention Transformer by Ali Hassani and Humphrey Shi.

It extends NAT by adding a Dilated Neighborhood Attention pattern to capture global context, and shows significant performance improvements over it.

Add Neighborhood Attention Transformer (NAT) and Dilated NAT (DiNAT) models by @alihassanijr in #20219
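The difference between the two attention patterns can be shown on a 1D sequence; NAT and DiNAT apply the same idea in 2D over feature maps, so treat this as a simplified sketch. It lists which key positions each query attends to, assuming neighborhood attention keeps exactly `kernel_size` neighbors per query (the window is shifted inward at the borders instead of being padded) and that dilation picks neighbors `dilation` positions apart within the query's own dilation group. The helper name is hypothetical.

```python
from typing import List


def neighborhood(i: int, length: int, kernel_size: int, dilation: int = 1) -> List[int]:
    """Key indices that query i attends to under (dilated) neighborhood attention, in 1D."""
    # A dilated neighborhood stays inside the query's dilation group:
    # the positions that share its remainder modulo `dilation`.
    group = list(range(i % dilation, length, dilation))
    if kernel_size > len(group):
        raise ValueError("kernel_size is too large for this length and dilation")
    pos = group.index(i)
    # Center the window on the query, then shift it back inside the group at the edges.
    start = min(max(pos - kernel_size // 2, 0), len(group) - kernel_size)
    return group[start:start + kernel_size]


for q in (0, 4, 9):
    print(q, neighborhood(q, length=10, kernel_size=3),
          neighborhood(q, length=10, kernel_size=3, dilation=2))
```

With dilation 1 (NAT) query 4 sees positions 3 to 5; with dilation 2 (DiNAT) it sees 2, 4 and 6, a wider span for the same number of keys, which is how the dilated pattern gathers more global context at the same cost.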

MobileNetV2

The MobileNetV2 model was proposed in MobileNetV2: Inverted Residuals and Linear Bottlenecks by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.

add MobileNetV2 model by @hollance in #17845

MobileNetV1

The MobileNet model was proposed in MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.

add MobileNetV1 model by @hollance in #17799

Image processors

Image processors replace feature extractors as the processing class for computer vision models.

Important changes:

- The size parameter is now a dictionary of the form {"height": h, "width": w}, {"shortest_edge": s} or {"shortest_edge": s, "longest_edge": l}, instead of an int or tuple.
- Addition of a data_format flag: you can now specify whether images are returned in "channels_first" (NCHW) or "channels_last" (NHWC) format.
- Processing flags such as do_resize can be passed directly to the preprocess method instead of modifying the class attribute: image_processor([image_1, image_2], do_resize=False, return_tensors="pt", data_format="channels_last")
- Leaving return_tensors unset returns a list of numpy arrays.
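The two data_format values differ only in where the channel axis sits. The NumPy sketch below is an illustration of the layouts, not the image processors' own code: per image, "channels_first" is (channels, height, width) and "channels_last" is (height, width, channels), and batching adds the leading N axis.

```python
import numpy as np

# A dummy RGB image in channels_first layout: (channels, height, width).
image_chw = np.random.rand(3, 224, 160).astype(np.float32)

# channels_first -> channels_last: move the channel axis to the end (HWC).
image_hwc = np.transpose(image_chw, (1, 2, 0))

# channels_last -> channels_first: move it back to the front (CHW).
roundtrip = np.transpose(image_hwc, (2, 0, 1))

# Batching adds the leading N axis: NCHW vs NHWC.
batch_nchw = np.stack([image_chw, image_chw])
batch_nhwc = np.stack([image_hwc, image_hwc])
print(batch_nchw.shape, batch_nhwc.shape)  # (2, 3, 224, 160) (2, 224, 160, 3)
```

The pixel values are identical in both layouts; only the axis order changes, so the right choice is whichever layout the downstream framework or model expects.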

The classes are backwards compatible and can be created from existing feature extractor configurations, with the size parameter converted to the new format.
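The conversion of a legacy size value can be sketched as follows. This is a hypothetical helper written for illustration, not the library's code: it assumes an int becomes a square {"height", "width"} dict for models that resize to a fixed square, or {"shortest_edge": s} (plus an optional "longest_edge") for models that resize the shortest edge, and that a (height, width) tuple becomes {"height": h, "width": w}. transformers has its own helper for this (get_size_dict in transformers.image_processing_utils), which may handle more cases.

```python
from typing import Dict, Optional, Sequence, Union

SizeDict = Dict[str, int]


def legacy_size_to_dict(size: Union[int, Sequence[int], SizeDict],
                        default_to_square: bool = True,
                        max_size: Optional[int] = None) -> SizeDict:
    """Hypothetical helper: turn a legacy int/tuple `size` into the new dict format."""
    if isinstance(size, dict):
        return dict(size)  # already in the new format
    if isinstance(size, int):
        if default_to_square:
            return {"height": size, "width": size}
        size_dict = {"shortest_edge": size}
        if max_size is not None:
            size_dict["longest_edge"] = max_size
        return size_dict
    if isinstance(size, (tuple, list)) and len(size) == 2:
        height, width = size
        return {"height": height, "width": width}
    raise ValueError(f"Unsupported size value: {size!r}")


print(legacy_size_to_dict(224))  # {'height': 224, 'width': 224}
print(legacy_size_to_dict(800, default_to_square=False, max_size=1333))
# {'shortest_edge': 800, 'longest_edge': 1333}
print(legacy_size_to_dict((384, 512)))  # {'height': 384, 'width': 512}
```

The explicit keys remove the ambiguity of a bare int, which could previously mean either a square target or a shortest-edge target depending on the model.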

Add Image Processors by @amyeroberts in #19796
Add Donut image processor by @amyeroberts in #20425
Add segmentation + object detection image processors by @amyeroberts in #20160
AutoImageProcessor by @amyeroberts in #20111

Backbone for computer vision models

We're adding support for a general AutoBackbone class, which turns any vision model (like ConvNeXt, Swin Transformer) into a backbone to be used with frameworks like DETR and Mask R-CNN. The design is in early stages and we welcome feedback.

Add AutoBackbone + ResNetBackbone by @NielsRogge in #20229
Improve backbone by @NielsRogge in #20380
[AutoBackbone] Improve API by @NielsRogge in #20407

Support for `safetensors` offloading

If the model you are using has a safetensors checkpoint and you have the safetensors library installed, offloading to disk takes advantage of this to be more memory-efficient and roughly 33% faster.

Safetensors offload by @sgugger in #20321

Contrastive search in the `generate` method

Generate: TF contrastive search with XLA support by @gante in #20050
Generate: contrastive search with full optional outputs by @gante in #19963
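For context, contrastive search picks each next token from the top_k most likely candidates by trading model confidence against a degeneration penalty, the maximum cosine similarity between the candidate's hidden state and those of the previous tokens: score(v) = (1 - α) * p(v | context) - α * max_j cos(h_v, h_j). In generate it is selected by passing penalty_alpha (α) together with top_k, e.g. model.generate(input_ids, penalty_alpha=0.6, top_k=4). The stdlib-only sketch below implements just the selection step for one position; the probabilities and toy 2D hidden states are made up for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def contrastive_pick(candidate_probs: List[float],
                     candidate_states: List[List[float]],
                     context_states: List[List[float]],
                     penalty_alpha: float) -> int:
    """Return the index of the top-k candidate with the best contrastive score."""
    best_idx, best_score = -1, -math.inf
    for idx, (prob, state) in enumerate(zip(candidate_probs, candidate_states)):
        # Degeneration penalty: similarity to the most similar previous token.
        penalty = max(cosine(state, ctx) for ctx in context_states)
        score = (1 - penalty_alpha) * prob - penalty_alpha * penalty
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx


context = [[1.0, 0.0], [0.0, 1.0]]      # hidden states of the tokens so far
candidates = [[1.0, 0.05], [0.6, -0.8]]  # hidden states of the top_k = 2 candidates
probs = [0.55, 0.45]                     # model probabilities of those candidates

print(contrastive_pick(probs, candidates, context, penalty_alpha=0.0))  # 0: greedy choice
print(contrastive_pick(probs, candidates, context, penalty_alpha=0.6))  # 1: less repetitive
```

With α = 0 this reduces to greedy decoding; a larger α favours candidates whose representation differs from what was already generated, which is what reduces repetition.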

Breaking changes

🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in convert_tokens_to_string by @beneyal in #15775

Bugfixes and improvements

add dataset by @stevhliu in #20005
Add BERT resources by @stevhliu in #19852
Add LayoutLMv3 resource by @stevhliu in #19932
fix typo by @stevhliu in #20006
Update object detection pipeline to use post_process_object_detection methods by @alaradirik in #20004
clean up vision/text config dict arguments by @ydshieh in #19954
make sentencepiece import conditional in bertjapanesetokenizer by @ripose-jp in #20012
Fix gradient checkpoint test in encoder-decoder by @ydshieh in #20017
Quality by @sgugger in #20002
Update auto processor to check image processor created by @amyeroberts in #20021
[Doctest] Add configuration_deberta_v2.py by @Saad135 in #19995
Improve model tester by @ydshieh in #19984
Fix doctest by @ydshieh in #20023
Show installed libraries and their versions in CI jobs by @ydshieh in #20026
reorganize glossary by @stevhliu in #20010
Now supporting pathlike in pipelines too. by @Narsil in #20030
Add **kwargs by @amyeroberts in #20037
Fix some doctests after PR 15775 by @ydshieh in #20036
[Doctest] Add configuration_camembert.py by @Saad135 in #20039
[Whisper Tokenizer] Make more user-friendly by @sanchit-gandhi in #19921
[FuturWarning] Add futur warning for LEDForSequenceClassification by @ArthurZucker in #19066
fix jit trace error for model forward sequence is not aligned with jit.trace tuple input sequence, update related doc by @sywangyi in #19891
Update esmfold conversion script by @Rocketknight1 in #20028
Fixed torch.finfo issue with torch.fx by @michaelbenayoun in #20040
Only resize embeddings when necessary by @sgugger in #20043
Speed up TF token classification postprocessing by converting complete tensors to numpy by @deutschmn in #19976
Fix ESM LM head test by @Rocketknight1 in #20045
Update README.md by @bofenghuang in #20063
fix tokenizer_type to avoid error when loading checkpoint back by @pacman100 in #20062
[Trainer] Fix model name in push_to_hub by @sanchit-gandhi in #20064
PoolformerImageProcessor defaults to match previous FE by @amyeroberts in #20048
change constant torch.tensor to torch.full by @MerHS in #20061
Update READMEs for ESMFold and add notebooks by @Rocketknight1 in #20067
Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 by @jordiclive in #20068
Allow passing arguments to model testers for CLIP-like models by @ydshieh in #20044
Show installed libraries and their versions in GA jobs by @ydshieh in #20069
Update defaults and logic to match old FE by @amyeroberts in #20065
Update modeling_tf_utils.py by @cakiki in #20076
Update hub.py by @cakiki in #20075
[Doctest] Add configuration_dpr.py by @Saad135 in #20080
Removing RobertaConfig inheritance from CamembertConfig by @Saad135 in #20059
Skip 2 tests in VisionTextDualEncoderProcessorTest by @ydshieh in #20098
Replace unsupported facebookresearch/bitsandbytes by @tomaarsen in #20093
docs: Resolve many typos in the English docs by @tomaarsen in #20088
use huggingface_hub.model_inifo() to get pipline_tag by @y-tag in #20077
Fix generate_dummy_inputs for ImageGPTOnnxConfig by @ydshieh in #20103
docs: Fixed variables in f-strings by @tomaarsen in #20087
Add new terms to the glossary by @stevhliu in #20051
Replace awkward timm link with the expected one by @tomaarsen in #20109
Fix AutoTokenizer with subfolder passed by @sgugger in #20110
[Audio Processor] Only pass sr to feat extractor by @sanchit-gandhi in #20022
Update github pr docs actions by @mishig25 in #20125
Adapt has_labels test when no labels were found by @sgugger in #20113
Improve tiny model creation script by @ydshieh in #20119
Remove BertConfig inheritance from RobertaConfig by @Saad135 in #20124
[Swin] Add Swin SimMIM checkpoints by @NielsRogge in #20034
Update CLIPSegModelTester by @ydshieh in #20134
Update SwinForMaskedImageModeling doctest values by @amyeroberts in #20139
Attempting to test automatically the _keys_to_ignore. by @Narsil in #20042
Generate: move generation_.py src files into generation/.py by @gante in #20096
add cv + audio labels by @stevhliu in #20114
Update VisionEncoderDecoder to use an image processor by @amyeroberts in #20137
[CLIPSeg] Add resources by @NielsRogge in #20118
Make DummyObject more robust by @mariosasko in #20146
Add RoCBertTokenizer to TOKENIZER_MAPPING_NAMES by @ydshieh in #20141
Adding support for LayoutLMvX variants for object-detection. by @Narsil in #20143
Add doc tests by @NielsRogge in #20158
doc comment fix: Args was in wrong place by @hollance in #20164
Update OnnxConfig.generate_dummy_inputs to check ImageProcessingMixin by @ydshieh in #20157
Generate: fix TF doctests by @gante in #20159
Fix arg names for our models by @Rocketknight1 in #20166
[processor] Add 'model input names' property by @sanchit-gandhi in #20117
Fix object-detection bug (height, width inversion). by @Narsil in #20167
[OWL-ViT] Make model consistent with CLIP by @NielsRogge in #20144
Fix type - update any PIL.Image.Resampling by @amyeroberts in #20172
Fix tapas scatter by @Bearnardd in #20149
Update README.md by @code-with-rajeev in #19530
Proposal Remove the weird inspect in ASR pipeline and make WhisperEncoder just nice to use. by @Narsil in #19571
Pytorch type hints by @IMvision12 in #20112
Generate: TF sample doctest result update by @gante in #20208
[ROC_BERT] Make CI happy by @younesbelkada in #20175
add _keys_to_ignore_on_load_unexpected = [r"pooler"] by @ArthurZucker in #20210
docs: translated index page to korean by @wonhyeongseo in #20180
feat: add i18n issue template by @wonhyeongseo in #20199
[Examples] Generalise Seq2Seq ASR to handle Whisper by @sanchit-gandhi in #19519
mark test_save_load_fast_init_from_base as is_flaky by @ydshieh in #20200
Update README.md by @Nietism in #20188
Downgrade log warning -> info by @amyeroberts in #20202
Generate: add Bloom fixes for contrastive search by @gante in #20213
Adding chunking for whisper (all seq2seq actually). Very crude matching algorithm. by @Narsil in #20104
[docs] set overflowing image width to auto-scale by @wonhyeongseo in #20197
Update tokenizer_summary.mdx by @bofenghuang in #20135
Make ImageSegmentationPipelineTests less flaky by @ydshieh in #20147
update relative positional embedding by @ArthurZucker in #20203
[WHISPER] Update modeling tests by @ArthurZucker in #20162
Add accelerate support for ViT family by @younesbelkada in #20174
Add param_name to size_dict logs & tidy by @amyeroberts in #20205
Add object detection + segmentation transforms by @amyeroberts in #20003
Typo on doctring in ElectraTokenizer by @FacerAin in #20192
Remove authorized_missing_keysin favor of _keys_to_ignore_on_load_missing by @ArthurZucker in #20228
Add missing ESM autoclass by @Rocketknight1 in #20177
fix device issue by @ydshieh in #20227
fixed spelling error in testing.mdx by @kasmith11 in #20220
Fix run_clip.py by @ydshieh in #20234
Fix docstring of CLIPTokenizer(Fast) by @TilmannR in #20233
Fix MaskformerFeatureExtractor by @NielsRogge in #20100
New logging support to "Trainer" Class (ClearML Logger) by @skinan in #20184
Enable PyTorch 1.13 by @sgugger in #20168
[CLIP] allow loading projection layer in vision and text model by @patil-suraj in #18962
Slightly alter Keras dummy loss by @Rocketknight1 in #20232
Add to DeBERTa resources by @Saad135 in #20155
Add clip resources to the transformers documentation by @ambujpawar in #20190
Update reqs to include min gather_for_metrics Accelerate version by @muellerzr in #20242
Allow trainer to return eval. loss for CLIP-like models by @ydshieh in #20214
Adds image-guided object detection support to OWL-ViT by @alaradirik in #20136
Adding audio-classification example in the doc. by @Narsil in #20235
Updating the doctest for conversational. by @Narsil in #20236
Adding doctest for fill-mask pipeline. by @Narsil in #20241
Adding doctest for feature-extraction. by @Narsil in #20240
Adding ASR pipeline example. by @Narsil in #20226
Adding doctest for document-question-answering by @Narsil in #20239
Adding an example for depth-estimation pipeline. by @Narsil in #20237
Complete doc migration by @mishig25 in #20267
Fix result saving errors of pytorch examples by @li-plus in #20276
Adding a doctest for table-question-answering pipeline. by @Narsil in #20260
Adding doctest for image-segmentation pipeline. by @Narsil in #20256
Adding doctest for text2text-generation pipeline. by @Narsil in #20261
Adding doctest for text-generation pipeline. by @Narsil in #20264
Add TF protein notebook to notebooks doc by @Rocketknight1 in #20271
Rephrasing the link. by @Narsil in #20253
Add Chinese-CLIP implementation by @yangapku in #20368
Adding doctest example for image-classification pipeline. by @Narsil in #20254
Adding doctest for zero-shot-image-classification pipeline. by @Narsil in #20272
Adding doctest for zero-shot-classification pipeline. by @Narsil in #20268
Adding doctest for visual-question-answering pipeline. by @Narsil in #20266
Adding doctest for text-classification pipeline. by @Narsil in #20262
Adding doctest for question-answering pipeline. by @Narsil in #20259
[Docs] Add resources of OpenAI GPT by @shogohida in #20084
Adding doctest for image-to-text pipeline. by @Narsil in #20257
Adding doctest for token-classification pipeline. by @Narsil in #20265
remaining pytorch type hints by @IMvision12 in #20217
Data collator for token classification pads labels column when receives pytorch tensors by @markovalexander in #20244
[Doctest] Add configuration_deformable_detr.py by @Saad135 in #20273
Fix summarization script by @muellerzr in #20286
[DOCTEST] Fix the documentation of RoCBert by @ArthurZucker in #20142
[bnb] Let's warn users when saving 8-bit models by @younesbelkada in #20282
Adding zero-shot-object-detection pipeline doctest. by @Narsil in #20274
Adding doctest for object-detection pipeline. by @Narsil in #20258
Image transforms functionality used instead by @amyeroberts in #20278
TF: add test for PushToHubCallback by @gante in #20231
Generate: general TF XLA constrastive search are now slow tests by @gante in #20277
Fixing the doctests failures. by @Narsil in #20294
set the default cache_enable to True, aligned with the default value in pytorch cpu/cuda amp autocast by @sywangyi in #20289
Add docstrings for canine model by @raghavanone in #19457
Add missing report button for Example test by @ydshieh in #20293
refactor test by @younesbelkada in #20300
[Tiny model creation] deal with ImageProcessor by @ydshieh in #20298
Fix blender bot missleading doc by @ArthurZucker in #20301
remove two tokens that should not be suppressed by @ArthurZucker in #20302
[ASR Examples] Update README for Whisper by @sanchit-gandhi in #20230
Add padding image transformation by @amyeroberts in #19838
Pin TensorFlow by @sgugger in #20313
Add AnyPrecisionAdamW optimizer by @atturaioe in #18961
[Proposal] Breaking change zero-shot-object-detection for improved consistency. by @Narsil in #20280
Fix flakey test with seed by @muellerzr in #20318
Pin TF 2.10.1 for Push CI by @ydshieh in #20319
Remove double brackets by @stevhliu in #20307
TF: future proof our keras imports by @gante in #20317
organize pipelines by modality by @stevhliu in #20306
Fix torch device issues by @ydshieh in #20304
Generate: add generation config class by @gante in #20218
translate zh quicktour (#20095) by @bfss in #20181
Add Spanish translation of serialization.mdx by @donelianc in #20245
Add LayerScale to NAT/DiNAT by @alihassanijr in #20325
[Switch Transformers] Fix failing slow test by @younesbelkada in #20346
fix: "BigSicence" typo in docs by @rajrajhans in #20331
Generate: model_kwargs can also be an input to prepare_inputs_for_generation by @gante in #20353
Update Special Language Tokens for PLBART by @jordiclive in #19980
Add resources by @NielsRogge in #20296
Enhance HfArgumentParser functionality and ease of use by @konstantinjdobler in #20323
Add inference section to task guides by @stevhliu in #18781
Fix toctree for Section 3 in Spanish Documentation by @donelianc in #20360
Generate: shorter XLA contrastive search tests by @gante in #20354
revert keys_to_ignore for M2M100 by @younesbelkada in #20381
add accelerate support for ESM by @younesbelkada in #20379
Fix nightly runs by @sgugger in #20352
Optimizes DonutProcessor token2json method for speed by @michaelnation26 in #20283
Indicate better minimal version of PyTorch in big model inference by @sgugger in #20385
Fix longformer onnx broken export by @fxmarty in #20292
Use tiny models for ONNX tests - text modality by @lewtun in #20333
[ESM] fix accelerate tests for esmfold by @younesbelkada in #20387
Generate: fix plbart generation tests by @gante in #20391
[bloom] convert script tweaks by @stas00 in #18593
Fix doctest file path by @ydshieh in #20400
[Image Transformers] to_pil fix float edge cases by @patrickvonplaten in #20406
make daily CI happy by @younesbelkada in #20410
fix nasty bnb bug by @younesbelkada in #20408
change the way sentinel tokens can retrived by @raghavanone in #20373
[BNB] Throw ValueError when trying to cast or assign by @younesbelkada in #20409
Use updated model_max_length when saving tokenizers by @ydshieh in #20401
Add Spanish translation of pr_checks.mdx by @donelianc in #20339
fix device in longformer onnx path by @fxmarty in #20419
Fix ModelOutput instantiation when there is only one tuple by @sgugger in #20416
accelerate support for OwlViT by @younesbelkada in #20411
[AnyPrecisionAdamW] test fix by @stas00 in #20454
fix word_to_tokens docstring format by @SaulLu in #20450
Fix typo in FSMT Tokenizer by @kamalkraj in #20456
Fix device issues in CLIPSegModelIntegrationTest by @ydshieh in #20467
Fix links for contrastive_loss by @ydshieh in #20455
Fix doctests for audio models by @ydshieh in #20468
Fix ESM checkpoints for tests by @Rocketknight1 in #20436
More TF int dtype fixes by @Rocketknight1 in #20384
make tensors in function build_relative_position created on proper device instead of always on cpu by @qq775294390 in #20434
update cpu related doc by @sywangyi in #20444
with pytorch cpu only version. without --no_cuda, using --bf16 will trigger error like "Your setup doesn't support bf16/gpu. You need torch>=1.10, using Ampere GPU with cuda>=11.0" by @sywangyi in #20445
[CLIPTokenizer] Improve warning by @patrickvonplaten in #20458
Replace assertions with value errors on distilbert model by @JuheonChu in #20463
[Doctest] Add configuration_fsmt.py by @sha016 in #19936
Replace assertion with ValueError exceptions in run_image_captioning_flax.py by @katiele47 in #20365
[FLAX] Add dtype to embedding for bert/bart/opt/t5 by @merrymercy in #20340
fix both failing RoCBert tests by @ArthurZucker in #20469
Include image processor in add-new-model-like by @amyeroberts in #20439
chore: add link to the video cls notebook. by @sayakpaul in #20386
add timeout option for deepspeed engine by @henghuiz in #20443
[Maskformer] Add MaskFormerSwin backbone by @NielsRogge in #20344
Extract warnings from CI artifacts by @ydshieh in #20474
Add Donut image processor by @amyeroberts in #20425
Fix torch meshgrid warnings by @fxmarty in #20475
Fix init import_structure sorting by @sgugger in #20477
extract warnings in GH workflows by @ydshieh in #20487
add in layer gpt2 tokenizer by @piEsposito in #20421
Replace assert statements with raise exceptions by @miyu386 in #20478
fixed small typo by @sandeepgadhwal in #20490
Fix documentation code to import facebook/detr-resnet-50 model by @JuanFKurucz in #20491
Fix disk offload for full safetensors checkpoints by @sgugger in #20497
[modelcard] Check for IterableDataset by @sanchit-gandhi in #20495
[modelcard] Set model name if empty by @sanchit-gandhi in #20496
Add segmentation + object detection image processors by @amyeroberts in #20160
remove attention_mask truncation in whisper by @ydshieh in #20488
Make add_special_tokens more clear by @ydshieh in #20424
[OPT/Galactica] Load large galactica models by @younesbelkada in #20390
Support extraction of both train and eval XLA graphs by @jeffhataws in #20492
fix ipex+fp32 jit trace error in ipex 1.13 by @sywangyi in #20504
Expected output for the test changed by @ArthurZucker in #20493
Fix TF nightly tests by @Rocketknight1 in #20507
Update doc examples feature extractor -> image processor by @amyeroberts in #20501
Fix Typo in Docs for GPU by @julianpollmann in #20509
Fix minimum version for device_map by @sgugger in #20489
Update AutomaticSpeechRecognitionPipeline doc example by @ydshieh in #20512
Add natten for CI by @ydshieh in #20511
Fix Data2VecTextForCasualLM example code documentation by @JuanFKurucz in #20510
Add some warning for Dynamo and enable TF32 when it's set by @sgugger in #20515
[modelcard] Update dataset tags by @sanchit-gandhi in #20506
Change Doctests CI launch time by @ydshieh in #20523
Fix PLBart doctest by @ydshieh in #20527
Fix ConditionalDetrForSegmentation doc example by @ydshieh in #20531
add doc for by @younesbelkada in #20525
Update ZeroShotObjectDetectionPipeline doc example by @ydshieh in #20528
update post_process_image_guided_detection by @fcakyon in #20521
QnA example: add speed metric by @sywangyi in #20522
Fix doctest by @NielsRogge in #20534
Fix Hubert models in TFHubertModel and TFHubertForCTC documentation code by @JuanFKurucz in #20516
Fix link in pipeline device map by @stevhliu in #20517

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@sww9370
- Add RocBert (#20013)
@IMvision12
- Pytorch type hints (#20112)
- remaining pytorch type hints (#20217)
@alihassanijr
- Add Neighborhood Attention Transformer (NAT) and Dilated NAT (DiNAT) models (#20219)
- Add LayerScale to NAT/DiNAT (#20325)
@bfss
- translate zh quicktour(#20095) (#20181)
@donelianc
- Add Spanish translation of serialization.mdx (#20245)
- Fix toctree for Section 3 in Spanish Documentation (#20360)
- Add Spanish translation of pr_checks.mdx (#20339)
@yangapku
- Add Chinese-CLIP implementation (#20368)
