v4.18.0: Checkpoint sharding, vision models


New model additions

You'll notice that we are starting to add several older vision models. This is because those models are used as backbones in recent architectures. While we could rely on existing libraries for such pretrained models, we will ultimately need support for those backbones in PyTorch, TensorFlow and JAX, and there is currently no library that supports all three frameworks. This is why we are starting to add those models to Transformers directly (here, ResNet and VAN).

GLPN

The GLPN model was proposed in Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim. GLPN combines SegFormer’s hierarchical mix-Transformer with a lightweight decoder for monocular depth estimation. The proposed decoder shows better performance than the previously proposed decoders, with considerably less computational complexity.
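A minimal depth-estimation sketch using the new GLPN classes; the `vinvino02/glpn-kitti` checkpoint name and the example image URL are assumptions, and any GLPN checkpoint on the Hub should work the same way:

```python
import requests
import torch
from PIL import Image
from transformers import GLPNFeatureExtractor, GLPNForDepthEstimation

# Checkpoint name is an assumption; substitute any GLPN checkpoint from the Hub.
feature_extractor = GLPNFeatureExtractor.from_pretrained("vinvino02/glpn-kitti")
model = GLPNForDepthEstimation.from_pretrained("vinvino02/glpn-kitti")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One depth value per pixel, shape (batch_size, height, width)
predicted_depth = outputs.predicted_depth
```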

ResNet

The ResNet model was proposed in Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun. Our implementation follows the small changes made by Nvidia: we apply stride=2 for downsampling in the bottleneck's 3x3 convolution rather than in the first 1x1. This variant is generally known as “ResNet v1.5”.

ResNet introduced residual connections, which allow training networks with a previously unattainable number of layers (up to 1,000). ResNet won the 2015 ILSVRC & COCO competitions, an important milestone in deep computer vision.
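A short image-classification sketch with the new ResNet classes, assuming the `microsoft/resnet-50` checkpoint and the example image URL used below:

```python
import requests
import torch
from PIL import Image
from transformers import AutoFeatureExtractor, ResNetForImageClassification

# Checkpoint name is an assumption; any ResNet checkpoint on the Hub works similarly.
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/resnet-50")
model = ResNetForImageClassification.from_pretrained("microsoft/resnet-50")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring logit back to its label
print(model.config.id2label[logits.argmax(-1).item()])
```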

VAN

The VAN model was proposed in Visual Attention Network by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.

This paper introduces a new attention layer based on convolution operations that captures both local and distant relationships. This is done by combining normal and large-kernel convolution layers. The latter use a dilated convolution to capture distant correlations.
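As an illustration only (not the library's implementation), a large-kernel attention block along these lines can be sketched in PyTorch; the kernel sizes and dilation below are assumptions made for the sketch:

```python
import torch
from torch import nn

class LargeKernelAttentionSketch(nn.Module):
    """Rough sketch of VAN-style attention: a depthwise conv captures local context,
    a depthwise dilated conv captures distant context, and a 1x1 conv mixes channels;
    the result is used as an attention map over the input features."""

    def __init__(self, dim: int):
        super().__init__()
        self.local = nn.Conv2d(dim, dim, kernel_size=5, padding=2, groups=dim)
        self.distant = nn.Conv2d(dim, dim, kernel_size=7, padding=9, dilation=3, groups=dim)
        self.pointwise = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.pointwise(self.distant(self.local(x)))
        return attn * x  # element-wise attention over the input


# Quick shape check on a dummy feature map
block = LargeKernelAttentionSketch(dim=64)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```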

VisionTextDualEncoder

The VisionTextDualEncoderModel can be used to initialize a vision-text dual encoder model with any pretrained vision autoencoding model as the vision encoder (e.g. ViT, BEiT, DeiT) and any pretrained text autoencoding model as the text encoder (e.g. RoBERTa, BERT). Two projection layers are added on top of both the vision and text encoders to project the output embeddings to a shared latent space. The projection layers are randomly initialized, so the model should be fine-tuned on a downstream task. This model can be used to align the vision-text embeddings with CLIP-like contrastive image-text training and can then be used for zero-shot vision tasks such as image classification or retrieval.

In LiT: Zero-Shot Transfer with Locked-image Text Tuning, it is shown how leveraging pre-trained (locked/frozen) image and text models for contrastive learning yields significant improvement on new zero-shot vision tasks such as image classification or retrieval.
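A minimal sketch of initializing such a dual encoder from pre-trained ViT and BERT checkpoints; the two checkpoint names are just illustrative choices:

```python
from transformers import (
    AutoFeatureExtractor,
    AutoTokenizer,
    VisionTextDualEncoderModel,
    VisionTextDualEncoderProcessor,
)

# Any vision and text auto-encoding checkpoints can be combined; these two are examples.
model = VisionTextDualEncoderModel.from_vision_text_pretrained(
    "google/vit-base-patch16-224", "bert-base-uncased"
)

feature_extractor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
processor = VisionTextDualEncoderProcessor(feature_extractor, tokenizer)

# The projection layers are randomly initialized, so the model still needs
# contrastive fine-tuning before it is useful for zero-shot tasks.
model.save_pretrained("vit-bert-dual-encoder")
```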

DiT

DiT was proposed in DiT: Self-supervised Pre-training for Document Image Transformer by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei. DiT applies the self-supervised objective of BEiT (BERT pre-training of Image Transformers) to 42 million document images, allowing for state-of-the-art results on tasks including:

  • document image classification: the RVL-CDIP dataset (a collection of 400,000 images belonging to one of 16 classes).
  • document layout analysis: the PubLayNet dataset (a collection of more than 360,000 document images constructed by automatically parsing PubMed XML files).
  • table detection: the ICDAR 2019 cTDaR dataset (a collection of 600 training images and 240 testing images).
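Since DiT reuses an existing image-Transformer architecture, its checkpoints load through the auto classes. A small document-classification sketch, assuming the `microsoft/dit-base-finetuned-rvlcdip` checkpoint name and a local document image:

```python
import torch
from PIL import Image
from transformers import AutoFeatureExtractor, AutoModelForImageClassification

# Checkpoint name is an assumption; it refers to a DiT model fine-tuned on RVL-CDIP.
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/dit-base-finetuned-rvlcdip")
model = AutoModelForImageClassification.from_pretrained("microsoft/dit-base-finetuned-rvlcdip")

image = Image.open("scanned_document.png").convert("RGB")  # any document image
inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# One of the 16 RVL-CDIP document classes (letter, invoice, email, ...)
print(model.config.id2label[logits.argmax(-1).item()])
```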

DPT

The DPT model was proposed in Vision Transformers for Dense Prediction by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun. DPT is a model that leverages the Vision Transformer (ViT) as backbone for dense prediction tasks like semantic segmentation and depth estimation.
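A brief semantic-segmentation sketch with the new DPT classes; the `Intel/dpt-large-ade` checkpoint name and image URL are assumptions, and DPTForDepthEstimation works analogously for depth estimation:

```python
import requests
import torch
from PIL import Image
from transformers import DPTFeatureExtractor, DPTForSemanticSegmentation

# Checkpoint name is an assumption; it refers to a DPT model fine-tuned on ADE20k.
feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large-ade")
model = DPTForSemanticSegmentation.from_pretrained("Intel/dpt-large-ade")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (batch_size, num_labels, height', width')

segmentation = logits.argmax(dim=1)  # per-pixel class ids at the reduced resolution
```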

Checkpoint sharding

Large models are becoming more and more the norm and having a checkpoint in a single file is challenging for several reasons:

  • it's tougher to upload/download files bigger than 20/30 GB efficiently
  • the whole checkpoint might not fit into RAM even if you have enough GPU memory

That's why the save_pretrained method will now automatically shard a checkpoint into several files when it goes above a 10GB threshold for PyTorch models. from_pretrained will handle such sharded checkpoints as if there were only one file.
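For example, here is a small sketch; the max_shard_size argument is lowered well below the default just to make the sharding visible on a small model:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-cased")

# Above the 10GB default threshold, sharding happens automatically; here we force
# a much smaller shard size so the behavior is visible on a small model.
model.save_pretrained("local-bert", max_shard_size="200MB")
# The directory now contains several pytorch_model-XXXXX-of-YYYYY.bin shards
# plus an index file mapping each weight to its shard.

# Loading is unchanged: from_pretrained reassembles the shards transparently.
model = AutoModel.from_pretrained("local-bert")
```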

TensorFlow implementations

GPT-J and ViTMAE are now available in TensorFlow.

Documentation guides

The documentation IA migration is wrapped up, with a new conceptual guide now available.

Improvements and bugfixes

Impressive community contributors

The community members below contributed significantly to the v4.18.0 release. Thank you!

@sayakpaul, for contributing the TensorFlow version of ViTMAE
@stancld, for contributing the TensorFlow version of GPT-J

New Contributors

  • @Soonhwan-Kwon made their first contribution in #13727
  • @jonatasgrosman made their first contribution in #15428
  • @ToluClassics made their first contribution in #15432
  • @peregilk made their first contribution in #15423
  • @bugface made their first contribution in #15480
  • @AyushExel made their first contribution in #14582
  • @thinksoso made their first contribution in #15403
  • @davidleonfdez made their first contribution in #15473
  • @sanchit-gandhi made their first contribution in #15519
  • @arron1227 made their first contribution in #15084
  • @cimeister made their first contribution in #15504
  • @cwkeam made their first contribution in #15416
  • @Albertobegue made their first contribution in #13831
  • @derenrich made their first contribution in #15614
  • @tkukurin made their first contribution in #15636
  • @muzhi1991 made their first contribution in #15638
  • @versae made their first contribution in #15590
  • @jonrbates made their first contribution in #15617
  • @arampacha made their first contribution in #15413
  • @FrancescoSaverioZuppichini made their first contribution in #15657
  • @coyotte508 made their first contribution in #15680
  • @heytanay made their first contribution in #15531
  • @gautierdag made their first contribution in #15702
  • @SSardorf made their first contribution in #15741
  • @Crabzmatic made their first contribution in #15740
  • @dreamgonfly made their first contribution in #15644
  • @lsb made their first contribution in #15468
  • @pbelevich made their first contribution in #15776
  • @sayakpaul made their first contribution in #15750
  • @rahul003 made their first contribution in #15877
  • @rhjohnstone made their first contribution in #15884
  • @cosmoquester made their first contribution in #15913
  • @konstantinjdobler made their first contribution in #15951
  • @yhavinga made their first contribution in #15963
  • @dlwh made their first contribution in #15961
  • @basilevh made their first contribution in #15972
  • @andstor made their first contribution in #16033
  • @davidsbatista made their first contribution in #16063
  • @feifang24 made their first contribution in #16065
  • @kevinpl07 made their first contribution in #15245
  • @johnnv1 made their first contribution in #16088
  • @Abdelrhman-Hosny made their first contribution in #16097
  • @p-mishra1 made their first contribution in #16099
  • @jbrry made their first contribution in #16108
  • @jorgtied made their first contribution in #16124
  • @vumichien made their first contribution in #16110
  • @merveenoyan made their first contribution in #16138
  • @yharyarias made their first contribution in #16047
  • @bhavika made their first contribution in #16129
  • @PepijnBoers made their first contribution in #16107
  • @soomiles made their first contribution in #16121
  • @Tegzes made their first contribution in #16126
  • @jacobdineen made their first contribution in #16106
  • @wpan03 made their first contribution in #16123
  • @infinite-Joy made their first contribution in #16147
  • @marxav made their first contribution in #16132
  • @Duedme made their first contribution in #16158
  • @MarkusSagen made their first contribution in #16087
  • @mowafess made their first contribution in #16163
  • @jcmc00 made their first contribution in #16174
  • @utkusaglm made their first contribution in #16178
  • @johko made their first contribution in #16181
  • @johnryan465 made their first contribution in #16090
  • @daysm made their first contribution in #16208
  • @forsc made their first contribution in #16212
  • @Sophylax made their first contribution in #16227
  • @function2-llx made their first contribution in #15795
  • @ktzsh made their first contribution in #16131
  • @louisowen6 made their first contribution in #16247
  • @omarespejel made their first contribution in #16215
  • @dinesh-GDK made their first contribution in #16266
  • @aflah02 made their first contribution in #16115
  • @PolarisRisingWar made their first contribution in #16291
  • @happyXia made their first contribution in #16284
  • @robotjellyzone made their first contribution in #16270
  • @yhl48 made their first contribution in #16257
  • @johnnygreco made their first contribution in #16244
  • @IvanLauLinTiong made their first contribution in #16307
  • @beomseok-lee made their first contribution in #15593
  • @clefourrier made their first contribution in #16200
  • @OllieBroadhurst made their first contribution in #16356
  • @reichenbch made their first contribution in #16281
  • @edbeeching made their first contribution in #15845
  • @xuzhao9 made their first contribution in #16034
  • @Dahlbomii made their first contribution in #16376
  • @simonzli made their first contribution in #16377
  • @Gladiator07 made their first contribution in #16406
  • @silvererudite made their first contribution in #16414
  • @garfieldnate made their first contribution in #15293
  • @basicv8vc made their first contribution in #15932
  • @kurianbenoy made their first contribution in #16113
  • @jaesuny made their first contribution in #16405
  • @FernandoLpz made their first contribution in #16149
  • @arnaudstiegler made their first contribution in #16398
  • @wesleyacheng made their first contribution in #16467
  • @akashe made their first contribution in #16416
  • @sanderland made their first contribution in #16451
  • @AdityaKane2001 made their first contribution in #16481
  • @dctelus made their first contribution in #16493
  • @tomerip made their first contribution in #16492
  • @roywei made their first contribution in #16371
  • @chenbohua3 made their first contribution in #16490
  • @SimplyJuanjo made their first contribution in #16329
  • @lilianabs made their first contribution in #16229
  • @Sangohe made their first contribution in #16176
  • @Agoniii made their first contribution in #16531
  • @akuma12 made their first contribution in #16498
  • @fschlatt made their first contribution in #16536
  • @KMFODA made their first contribution in #16521
  • @andrescodas made their first contribution in #16530
  • @JohnGiorgi made their first contribution in #16485
  • @JunMa11 made their first contribution in #16612

Full Changelog: v4.17.0...v4.18.0
