PyTorch 2.0 stack support
We are very excited by the newly announced PyTorch 2.0 stack. You can enable torch.compile on any of our models, and get support with the Trainer (and in all our PyTorch examples) by using the torchdynamo training argument. For instance, just add --torchdynamo inductor when launching those examples from the command line.
This API is still experimental and may be subject to changes as the PyTorch 2.0 stack matures.
Note that to get the best performance, we recommend:
- using an Ampere GPU (or more recent)
- sticking to fixed shapes for now (so use --pad_to_max_length in our examples)
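As a minimal sketch of the two recommendations above, the helper below assembles a launch command that enables the inductor backend and pads to fixed shapes. The helper name build_launch_command, the script name, and the script arguments are placeholders chosen for illustration; only --torchdynamo inductor and --pad_to_max_length come from the notes above.

```python
import shlex


def build_launch_command(script, script_args, backend="inductor", fixed_shapes=True):
    """Return the argv list for launching an example script with TorchDynamo enabled."""
    command = ["python", script, *script_args, "--torchdynamo", backend]
    if fixed_shapes:
        # Pad every batch to the same length so the compiled graph sees fixed shapes.
        command.append("--pad_to_max_length")
    return command


# Hypothetical invocation: the script name and its arguments are placeholders.
command = build_launch_command(
    "run_glue.py",
    ["--model_name_or_path", "bert-base-cased", "--task_name", "mrpc",
     "--do_train", "--output_dir", "/tmp/mrpc"],
)
print(shlex.join(command))
```

Since the API is experimental, the flag names may change as the PyTorch 2.0 stack matures.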
Audio Spectrogram Transformer
The Audio Spectrogram Transformer model was proposed in AST: Audio Spectrogram Transformer by Yuan Gong, Yu-An Chung, James Glass. The Audio Spectrogram Transformer applies a Vision Transformer to audio, by turning audio into an image (spectrogram). The model obtains state-of-the-art results for audio classification.
- Add Audio Spectogram Transformer by @NielsRogge in #19981
Jukebox
The Jukebox model was proposed in Jukebox: A generative model for music by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever. It introduces a generative music model which can produce minute-long samples that can be conditioned on an artist, genres, and lyrics.
- Add Jukebox model (replaces #16875) by @ArthurZucker in #17826
Switch Transformers
The SwitchTransformers model was proposed in Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity by William Fedus, Barret Zoph, Noam Shazeer.
It is the first MoE model supported in transformers, with the largest checkpoint currently available containing 1T parameters.
- Add Switch transformers by @younesbelkada and @ArthurZucker in #19323
RoCBert
The RoCBert model was proposed in RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining by Hui Su, Weiwei Shi, Xiaoyu Shen, Xiao Zhou, Tuo Ji, Jiarui Fang, Jie Zhou. It’s a pretrained Chinese language model that is robust under various forms of adversarial attacks.
CLIPSeg
The CLIPSeg model was proposed in Image Segmentation Using Text and Image Prompts by Timo Lüddecke and Alexander Ecker. CLIPSeg adds a minimal decoder on top of a frozen CLIP model for zero- and one-shot image segmentation.
- Add CLIPSeg by @NielsRogge in #20066
NAT and DiNAT
NAT
NAT was proposed in Neighborhood Attention Transformer by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi.
It is a hierarchical vision transformer based on Neighborhood Attention, a sliding-window self-attention pattern.
DiNAT
DiNAT was proposed in Dilated Neighborhood Attention Transformer by Ali Hassani and Humphrey Shi.
It extends NAT by adding a Dilated Neighborhood Attention pattern to capture global context, and shows significant performance improvements over it.
- Add Neighborhood Attention Transformer (NAT) and Dilated NAT (DiNAT) models by @alihassanijr in #20219
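To illustrate the difference between the two attention patterns, here is a simplified one-dimensional sketch of which key positions a query attends to. The models themselves work on 2D feature maps; the function name neighborhood and the border handling (shifting the window inward so each query keeps kernel_size keys) are assumptions made for this example, not code from the NAT or DiNAT implementations.

```python
def neighborhood(index, length, kernel_size, dilation=1):
    """Return the key positions a query at `index` attends to (1D sketch)."""
    # Positions sharing the query's residue modulo `dilation` form its dilated grid.
    grid = list(range(index % dilation, length, dilation))
    if len(grid) < kernel_size:
        raise ValueError("sequence too short for this kernel size and dilation")
    center = grid.index(index)
    # Shift the window inward at the borders so every query keeps `kernel_size` keys.
    start = min(max(center - kernel_size // 2, 0), len(grid) - kernel_size)
    return grid[start:start + kernel_size]


# Plain neighborhood attention: the nearest neighbours of position 5.
print(neighborhood(5, length=12, kernel_size=3))
# Dilated neighborhood attention: same number of keys, wider span.
print(neighborhood(5, length=12, kernel_size=3, dilation=2))
```

With the same kernel size, the dilated variant covers a wider span of the sequence, which is how dilation helps capture more global context at no extra attention cost per query.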
MobileNetV2
The MobileNet model was proposed in MobileNetV2: Inverted Residuals and Linear Bottlenecks by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.
MobileNetV1
The MobileNet model was proposed in MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.
Image processors
Image processors replace feature extractors as the processing class for computer vision models.
Important changes:
- size parameter is now a dictionary of {"height": h, "width": w}, {"shortest_edge": s}, or {"shortest_edge": s, "longest_edge": l} instead of int or tuple.
- Addition of data_format flag. You can now specify if you want your images to be returned in "channels_first" (NCHW) or "channels_last" (NHWC) format.
- Processing flags e.g. do_resize can be passed directly to the preprocess method instead of modifying the class attribute: image_processor([image_1, image_2], do_resize=False, return_tensors="pt", data_format="channels_last")
- Leaving return_tensors unset will return a list of numpy arrays.
The classes are backwards compatible and can be created using existing feature extractor configurations, with the size parameter converted.
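To make the size conversion concrete, here is a minimal sketch of how a legacy int or tuple could map onto the new dictionary form. The helper name to_size_dict, its default_to_square and max_size parameters, and the (height, width) tuple order are assumptions made for illustration; this is not the library's own conversion code.

```python
def to_size_dict(size, default_to_square=True, max_size=None):
    """Convert a legacy `size` value (int or 2-tuple) into the dictionary form."""
    if isinstance(size, dict):
        # Already in the new format: return a copy unchanged.
        return dict(size)
    if isinstance(size, int):
        if default_to_square:
            return {"height": size, "width": size}
        if max_size is not None:
            return {"shortest_edge": size, "longest_edge": max_size}
        return {"shortest_edge": size}
    if isinstance(size, (tuple, list)) and len(size) == 2:
        # Assumption: legacy tuples are ordered (height, width).
        height, width = size
        return {"height": height, "width": width}
    raise ValueError(f"Unsupported size value: {size!r}")


print(to_size_dict(224))
print(to_size_dict(256, default_to_square=False, max_size=512))
print(to_size_dict((480, 640)))
```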
- Add Image Processors by @amyeroberts in #19796
- Add Donut image processor by @amyeroberts in #20425
- Add segmentation + object detection image processors by @amyeroberts in #20160
- AutoImageProcessor by @amyeroberts in #20111
Backbone for computer vision models
We're adding support for a general AutoBackbone class, which turns any vision model (like ConvNeXt, Swin Transformer) into a backbone to be used with frameworks like DETR and Mask R-CNN. The design is in early stages and we welcome feedback.
- Add AutoBackbone + ResNetBackbone by @NielsRogge in #20229
- Improve backbone by @NielsRogge in #20380
- [AutoBackbone] Improve API by @NielsRogge in #20407
Support for safetensors offloading
If the model you are using has a safetensors checkpoint and you have the library installed, offload to disk will take advantage of this to be more memory efficient and roughly 33% faster.
Contrastive search in the generate method
- Generate: TF contrastive search with XLA support by @gante in #20050
- Generate: contrastive search with full optional outputs by @gante in #19963
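As a rough illustration of what contrastive search does at each decoding step: among the top-k candidate tokens, it picks the one that balances model confidence against similarity to the tokens already generated. The sketch below is a toy, pure-Python version of that selection rule under its usual formulation (score = (1 - alpha) * probability - alpha * maximum cosine similarity to the context). The function names and the toy vectors are hypothetical, and this is not the code behind generate, where contrastive search is enabled by passing penalty_alpha together with top_k.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def contrastive_step(candidates, context_states, penalty_alpha):
    """Pick the next token from top-k `candidates`.

    candidates: list of (token, probability, hidden_state) tuples.
    context_states: hidden states of the tokens generated so far.
    """
    best_token, best_score = None, -math.inf
    for token, probability, hidden in candidates:
        # Degeneration penalty: how close this candidate is to any previous token.
        penalty = max(cosine(hidden, state) for state in context_states)
        score = (1 - penalty_alpha) * probability - penalty_alpha * penalty
        if score > best_score:
            best_token, best_score = token, score
    return best_token


# Toy data: "the" is more probable but identical to the existing context state.
context_states = [[1.0, 0.0]]
candidates = [("the", 0.6, [1.0, 0.0]), ("cat", 0.4, [0.0, 1.0])]
print(contrastive_step(candidates, context_states, penalty_alpha=0.6))
print(contrastive_step(candidates, context_states, penalty_alpha=0.0))
```

With penalty_alpha set to 0 the rule reduces to picking the most probable candidate; larger values penalize repetition more strongly.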
Breaking changes
- 🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in convert_tokens_to_string by @beneyal in #15775
Bugfixes and improvements
- add dataset by @stevhliu in #20005
- Add BERT resources by @stevhliu in #19852
- Add LayoutLMv3 resource by @stevhliu in #19932
- fix typo by @stevhliu in #20006
- Update object detection pipeline to use post_process_object_detection methods by @alaradirik in #20004
- clean up vision/text config dict arguments by @ydshieh in #19954
- make sentencepiece import conditional in bertjapanesetokenizer by @ripose-jp in #20012
- Fix gradient checkpoint test in encoder-decoder by @ydshieh in #20017
- Quality by @sgugger in #20002
- Update auto processor to check image processor created by @amyeroberts in #20021
- [Doctest] Add configuration_deberta_v2.py by @Saad135 in #19995
- Improve model tester by @ydshieh in #19984
- Fix doctest by @ydshieh in #20023
- Show installed libraries and their versions in CI jobs by @ydshieh in #20026
- reorganize glossary by @stevhliu in #20010
- Now supporting pathlike in pipelines too. by @Narsil in #20030
- Add **kwargs by @amyeroberts in #20037
- Fix some doctests after PR 15775 by @ydshieh in #20036
- [Doctest] Add configuration_camembert.py by @Saad135 in #20039
- [Whisper Tokenizer] Make more user-friendly by @sanchit-gandhi in #19921
- [FuturWarning] Add futur warning for LEDForSequenceClassification by @ArthurZucker in #19066
- fix jit trace error for model forward sequence is not aligned with jit.trace tuple input sequence, update related doc by @sywangyi in #19891
- Update esmfold conversion script by @Rocketknight1 in #20028
- Fixed torch.finfo issue with torch.fx by @michaelbenayoun in #20040
- Only resize embeddings when necessary by @sgugger in #20043
- Speed up TF token classification postprocessing by converting complete tensors to numpy by @deutschmn in #19976
- Fix ESM LM head test by @Rocketknight1 in #20045
- Update README.md by @bofenghuang in #20063
- fix tokenizer_type to avoid error when loading checkpoint back by @pacman100 in #20062
- [Trainer] Fix model name in push_to_hub by @sanchit-gandhi in #20064
- PoolformerImageProcessor defaults to match previous FE by @amyeroberts in #20048
- change constant torch.tensor to torch.full by @MerHS in #20061
- Update READMEs for ESMFold and add notebooks by @Rocketknight1 in #20067
- Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 by @jordiclive in #20068
- Allow passing arguments to model testers for CLIP-like models by @ydshieh in #20044
- Show installed libraries and their versions in GA jobs by @ydshieh in #20069
- Update defaults and logic to match old FE by @amyeroberts in #20065
- Update modeling_tf_utils.py by @cakiki in #20076
- Update hub.py by @cakiki in #20075
- [Doctest] Add configuration_dpr.py by @Saad135 in #20080
- Removing RobertaConfig inheritance from CamembertConfig by @Saad135 in #20059
- Skip 2 tests in VisionTextDualEncoderProcessorTest by @ydshieh in #20098
- Replace unsupported facebookresearch/bitsandbytes by @tomaarsen in #20093
- docs: Resolve many typos in the English docs by @tomaarsen in #20088
- use huggingface_hub.model_inifo() to get pipline_tag by @y-tag in #20077
- Fix generate_dummy_inputs for ImageGPTOnnxConfig by @ydshieh in #20103
- docs: Fixed variables in f-strings by @tomaarsen in #20087
- Add new terms to the glossary by @stevhliu in #20051
- Replace awkward timm link with the expected one by @tomaarsen in #20109
- Fix AutoTokenizer with subfolder passed by @sgugger in #20110
- [Audio Processor] Only pass sr to feat extractor by @sanchit-gandhi in #20022
- Update github pr docs actions by @mishig25 in #20125
- Adapt has_labels test when no labels were found by @sgugger in #20113
- Improve tiny model creation script by @ydshieh in #20119
- Remove BertConfig inheritance from RobertaConfig by @Saad135 in #20124
- [Swin] Add Swin SimMIM checkpoints by @NielsRogge in #20034
- Update CLIPSegModelTester by @ydshieh in #20134
- Update SwinForMaskedImageModeling doctest values by @amyeroberts in #20139
- Attempting to test automatically the _keys_to_ignore. by @Narsil in #20042
- Generate: move generation_*.py src files into generation/*.py by @gante in #20096
- add cv + audio labels by @stevhliu in #20114
- Update VisionEncoderDecoder to use an image processor by @amyeroberts in #20137
- [CLIPSeg] Add resources by @NielsRogge in #20118
- Make DummyObject more robust by @mariosasko in #20146
- Add RoCBertTokenizer to TOKENIZER_MAPPING_NAMES by @ydshieh in #20141
- Adding support for LayoutLMvX variants for object-detection. by @Narsil in #20143
- Add doc tests by @NielsRogge in #20158
- doc comment fix: Args was in wrong place by @hollance in #20164
- Update OnnxConfig.generate_dummy_inputs to check ImageProcessingMixin by @ydshieh in #20157
- Generate: fix TF doctests by @gante in #20159
- Fix arg names for our models by @Rocketknight1 in #20166
- [processor] Add 'model input names' property by @sanchit-gandhi in #20117
- Fix object-detection bug (height, width inversion). by @Narsil in #20167
- [OWL-ViT] Make model consistent with CLIP by @NielsRogge in #20144
- Fix type - update any PIL.Image.Resampling by @amyeroberts in #20172
- Fix tapas scatter by @Bearnardd in #20149
- Update README.md by @code-with-rajeev in #19530
- Proposal Remove the weird inspect in ASR pipeline and make WhisperEncoder just nice to use. by @Narsil in #19571
- Pytorch type hints by @IMvision12 in #20112
- Generate: TF sample doctest result update by @gante in #20208
- [ROC_BERT] Make CI happy by @younesbelkada in #20175
- add _keys_to_ignore_on_load_unexpected = [r"pooler"] by @ArthurZucker in #20210
- docs: translated index page to korean by @wonhyeongseo in #20180
- feat: add i18n issue template by @wonhyeongseo in #20199
- [Examples] Generalise Seq2Seq ASR to handle Whisper by @sanchit-gandhi in #19519
- mark test_save_load_fast_init_from_base as is_flaky by @ydshieh in #20200
- Update README.md by @Nietism in #20188
- Downgrade log warning -> info by @amyeroberts in #20202
- Generate: add Bloom fixes for contrastive search by @gante in #20213
- Adding chunking for whisper (all seq2seq actually). Very crude matching algorithm. by @Narsil in #20104
- [docs] set overflowing image width to auto-scale by @wonhyeongseo in #20197
- Update tokenizer_summary.mdx by @bofenghuang in #20135
- Make ImageSegmentationPipelineTests less flaky by @ydshieh in #20147
- update relative positional embedding by @ArthurZucker in #20203
- [WHISPER] Update modeling tests by @ArthurZucker in #20162
- Add accelerate support for ViT family by @younesbelkada in #20174
- Add param_name to size_dict logs & tidy by @amyeroberts in #20205
- Add object detection + segmentation transforms by @amyeroberts in #20003
- Typo on doctring in ElectraTokenizer by @FacerAin in #20192
- Remove authorized_missing_keys in favor of _keys_to_ignore_on_load_missing by @ArthurZucker in #20228
- Add missing ESM autoclass by @Rocketknight1 in #20177
- fix device issue by @ydshieh in #20227
- fixed spelling error in testing.mdx by @kasmith11 in #20220
- Fix run_clip.py by @ydshieh in #20234
- Fix docstring of CLIPTokenizer(Fast) by @TilmannR in #20233
- Fix MaskformerFeatureExtractor by @NielsRogge in #20100
- New logging support to "Trainer" Class (ClearML Logger) by @skinan in #20184
- Enable PyTorch 1.13 by @sgugger in #20168
- [CLIP] allow loading projection layer in vision and text model by @patil-suraj in #18962
- Slightly alter Keras dummy loss by @Rocketknight1 in #20232
- Add to DeBERTa resources by @Saad135 in #20155
- Add clip resources to the transformers documentation by @ambujpawar in #20190
- Update reqs to include min gather_for_metrics Accelerate version by @muellerzr in #20242
- Allow trainer to return eval. loss for CLIP-like models by @ydshieh in #20214
- Adds image-guided object detection support to OWL-ViT by @alaradirik in #20136
- Adding audio-classification example in the doc. by @Narsil in #20235
- Updating the doctest for conversational. by @Narsil in #20236
- Adding doctest for fill-mask pipeline. by @Narsil in #20241
- Adding doctest for feature-extraction. by @Narsil in #20240
- Adding ASR pipeline example. by @Narsil in #20226
- Adding doctest for document-question-answering by @Narsil in #20239
- Adding an example for depth-estimation pipeline. by @Narsil in #20237
- Complete doc migration by @mishig25 in #20267
- Fix result saving errors of pytorch examples by @li-plus in #20276
- Adding a doctest for table-question-answering pipeline. by @Narsil in #20260
- Adding doctest for image-segmentation pipeline. by @Narsil in #20256
- Adding doctest for text2text-generation pipeline. by @Narsil in #20261
- Adding doctest for text-generation pipeline. by @Narsil in #20264
- Add TF protein notebook to notebooks doc by @Rocketknight1 in #20271
- Rephrasing the link. by @Narsil in #20253
- Add Chinese-CLIP implementation by @yangapku in #20368
- Adding doctest example for image-classification pipeline. by @Narsil in #20254
- Adding doctest for zero-shot-image-classification pipeline. by @Narsil in #20272
- Adding doctest for zero-shot-classification pipeline. by @Narsil in #20268
- Adding doctest for visual-question-answering pipeline. by @Narsil in #20266
- Adding doctest for text-classification pipeline. by @Narsil in #20262
- Adding doctest for question-answering pipeline. by @Narsil in #20259
- [Docs] Add resources of OpenAI GPT by @shogohida in #20084
- Adding doctest for image-to-text pipeline. by @Narsil in #20257
- Adding doctest for token-classification pipeline. by @Narsil in #20265
- remaining pytorch type hints by @IMvision12 in #20217
- Data collator for token classification pads labels column when receives pytorch tensors by @markovalexander in #20244
- [Doctest] Add configuration_deformable_detr.py by @Saad135 in #20273
- Fix summarization script by @muellerzr in #20286
- [DOCTEST] Fix the documentation of RoCBert by @ArthurZucker in #20142
- [bnb] Let's warn users when saving 8-bit models by @younesbelkada in #20282
- Adding zero-shot-object-detection pipeline doctest. by @Narsil in #20274
- Adding doctest for object-detection pipeline. by @Narsil in #20258
- Image transforms functionality used instead by @amyeroberts in #20278
- TF: add test for PushToHubCallback by @gante in #20231
- Generate: general TF XLA constrastive search are now slow tests by @gante in #20277
- Fixing the doctests failures. by @Narsil in #20294
- set the default cache_enable to True, aligned with the default value in pytorch cpu/cuda amp autocast by @sywangyi in #20289
- Add docstrings for canine model by @raghavanone in #19457
- Add missing report button for Example test by @ydshieh in #20293
- refactor test by @younesbelkada in #20300
- [Tiny model creation] deal with ImageProcessor by @ydshieh in #20298
- Fix blender bot missleading doc by @ArthurZucker in #20301
- remove two tokens that should not be suppressed by @ArthurZucker in #20302
- [ASR Examples] Update README for Whisper by @sanchit-gandhi in #20230
- Add padding image transformation by @amyeroberts in #19838
- Pin TensorFlow by @sgugger in #20313
- Add AnyPrecisionAdamW optimizer by @atturaioe in #18961
- [Proposal] Breaking change zero-shot-object-detection for improved consistency. by @Narsil in #20280
- Fix flakey test with seed by @muellerzr in #20318
- Pin TF 2.10.1 for Push CI by @ydshieh in #20319
- Remove double brackets by @stevhliu in #20307
- TF: future proof our keras imports by @gante in #20317
- organize pipelines by modality by @stevhliu in #20306
- Fix torch device issues by @ydshieh in #20304
- Generate: add generation config class by @gante in #20218
- translate zh quicktour by @bfss in #20095
- Add Spanish translation of serialization.mdx by @donelianc in #20245
- Add LayerScale to NAT/DiNAT by @alihassanijr in #20325
- [Switch Transformers] Fix failing slow test by @younesbelkada in #20346
- fix: "BigSicence" typo in docs by @rajrajhans in #20331
- Generate: model_kwargs can also be an input to prepare_inputs_for_generation by @gante in #20353
- Update Special Language Tokens for PLBART by @jordiclive in #19980
- Add resources by @NielsRogge in #20296
- Enhance HfArgumentParser functionality and ease of use by @konstantinjdobler in #20323
- Add inference section to task guides by @stevhliu in #18781
- Fix toctree for Section 3 in Spanish Documentation by @donelianc in #20360
- Generate: shorter XLA contrastive search tests by @gante in #20354
- revert keys_to_ignore for M2M100 by @younesbelkada in #20381
- add accelerate support for ESM by @younesbelkada in #20379
- Fix nightly runs by @sgugger in #20352
- Optimizes DonutProcessor token2json method for speed by @michaelnation26 in #20283
- Indicate better minimal version of PyTorch in big model inference by @sgugger in #20385
- Fix longformer onnx broken export by @fxmarty in #20292
- Use tiny models for ONNX tests - text modality by @lewtun in #20333
- [ESM] fix accelerate tests for esmfold by @younesbelkada in #20387
- Generate: fix plbart generation tests by @gante in #20391
- [bloom] convert script tweaks by @stas00 in #18593
- Fix doctest file path by @ydshieh in #20400
- [Image Transformers] to_pil fix float edge cases by @patrickvonplaten in #20406
- make daily CI happy by @younesbelkada in #20410
- fix nasty bnb bug by @younesbelkada in #20408
- change the way sentinel tokens can retrived by @raghavanone in #20373
- [BNB] Throw ValueError when trying to cast or assign by @younesbelkada in #20409
- Use updated model_max_length when saving tokenizers by @ydshieh in #20401
- Add Spanish translation of pr_checks.mdx by @donelianc in #20339
- fix device in longformer onnx path by @fxmarty in #20419
- Fix ModelOutput instantiation when there is only one tuple by @sgugger in #20416
- accelerate support for OwlViT by @younesbelkada in #20411
- [AnyPrecisionAdamW] test fix by @stas00 in #20454
- fix word_to_tokens docstring format by @SaulLu in #20450
- Fix typo in FSMT Tokenizer by @kamalkraj in #20456
- Fix device issues in CLIPSegModelIntegrationTest by @ydshieh in #20467
- Fix links for contrastive_loss by @ydshieh in #20455
- Fix doctests for audio models by @ydshieh in #20468
- Fix ESM checkpoints for tests by @Rocketknight1 in #20436
- More TF int dtype fixes by @Rocketknight1 in #20384
- make tensors in function build_relative_position created on proper device instead of always on cpu by @qq775294390 in #20434
- update cpu related doc by @sywangyi in #20444
- with pytorch cpu only version. without --no_cuda, using --bf16 will trigger error like "Your setup doesn't support bf16/gpu. You need torch>=1.10, using Ampere GPU with cuda>=11.0" by @sywangyi in #20445
- [CLIPTokenizer] Improve warning by @patrickvonplaten in #20458
- Replace assertions with value errors on distilbert model by @JuheonChu in #20463
- [Doctest] Add configuration_fsmt.py by @sha016 in #19936
- Replace assertion with ValueError exceptions in run_image_captioning_flax.py by @katiele47 in #20365
- [FLAX] Add dtype to embedding for bert/bart/opt/t5 by @merrymercy in #20340
- fix both failing RoCBert tests by @ArthurZucker in #20469
- Include image processor in add-new-model-like by @amyeroberts in #20439
- chore: add link to the video cls notebook. by @sayakpaul in #20386
- add timeout option for deepspeed engine by @henghuiz in #20443
- [Maskformer] Add MaskFormerSwin backbone by @NielsRogge in #20344
- Extract warnings from CI artifacts by @ydshieh in #20474
- Add Donut image processor by @amyeroberts in #20425
- Fix torch meshgrid warnings by @fxmarty in #20475
- Fix init import_structure sorting by @sgugger in #20477
- extract warnings in GH workflows by @ydshieh in #20487
- add in layer gpt2 tokenizer by @piEsposito in #20421
- Replace assert statements with raise exceptions by @miyu386 in #20478
- fixed small typo by @sandeepgadhwal in #20490
- Fix documentation code to import facebook/detr-resnet-50 model by @JuanFKurucz in #20491
- Fix disk offload for full safetensors checkpoints by @sgugger in #20497
- [modelcard] Check for IterableDataset by @sanchit-gandhi in #20495
- [modelcard] Set model name if empty by @sanchit-gandhi in #20496
- Add segmentation + object detection image processors by @amyeroberts in #20160
- remove attention_mask truncation in whisper by @ydshieh in #20488
- Make add_special_tokens more clear by @ydshieh in #20424
- [OPT/Galactica] Load large galactica models by @younesbelkada in #20390
- Support extraction of both train and eval XLA graphs by @jeffhataws in #20492
- fix ipex+fp32 jit trace error in ipex 1.13 by @sywangyi in #20504
- Expected output for the test changed by @ArthurZucker in #20493
- Fix TF nightly tests by @Rocketknight1 in #20507
- Update doc examples feature extractor -> image processor by @amyeroberts in #20501
- Fix Typo in Docs for GPU by @julianpollmann in #20509
- Fix minimum version for device_map by @sgugger in #20489
- Update AutomaticSpeechRecognitionPipeline doc example by @ydshieh in #20512
- Add natten for CI by @ydshieh in #20511
- Fix Data2VecTextForCasualLM example code documentation by @JuanFKurucz in #20510
- Add some warning for Dynamo and enable TF32 when it's set by @sgugger in #20515
- [modelcard] Update dataset tags by @sanchit-gandhi in #20506
- Change Doctests CI launch time by @ydshieh in #20523
- Fix PLBart doctest by @ydshieh in #20527
- Fix ConditionalDetrForSegmentation doc example by @ydshieh in #20531
- add doc for by @younesbelkada in #20525
- Update ZeroShotObjectDetectionPipeline doc example by @ydshieh in #20528
- update post_process_image_guided_detection by @fcakyon in #20521
- QnA example: add speed metric by @sywangyi in #20522
- Fix doctest by @NielsRogge in #20534
- Fix Hubert models in TFHubertModel and TFHubertForCTC documentation code by @JuanFKurucz in #20516
- Fix link in pipeline device map by @stevhliu in #20517
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @sww9370
- Add RocBert (#20013)
- @IMvision12
- @alihassanijr
- @bfss
- @donelianc
- @yangapku
- Add Chinese-CLIP implementation (#20368)