New model additions
You'll notice that we are starting to add several older vision models. This is because those models are used as backbones in recent architectures. While we could rely on existing libraries for such pretrained models, we will ultimately need support for those backbones in PyTorch, TensorFlow and JAX, and there is currently no library that supports all three frameworks. This is why we are starting to add those models to Transformers directly (here, ResNet and VAN).
GLPN
The GLPN model was proposed in Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim. GLPN combines SegFormer’s hierarchical mix-Transformer with a lightweight decoder for monocular depth estimation. The proposed decoder shows better performance than the previously proposed decoders, with considerably less computational complexity.
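A minimal depth-estimation sketch, assuming the KITTI checkpoint released with the paper (any GLPN checkpoint works the same way):

```python
from PIL import Image
import requests
import torch
from transformers import GLPNFeatureExtractor, GLPNForDepthEstimation

feature_extractor = GLPNFeatureExtractor.from_pretrained("vinvino02/glpn-kitti")
model = GLPNForDepthEstimation.from_pretrained("vinvino02/glpn-kitti")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    # predicted_depth is a (batch, height, width) tensor of depth values
    predicted_depth = model(**inputs).predicted_depth
```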
- Add GLPN by @NielsRogge in #16199
ResNet
The ResNet model was proposed in Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun. Our implementation follows the small changes made by Nvidia: we apply stride=2 for downsampling in the bottleneck's 3x3 conv rather than in the first 1x1. This variant is generally known as "ResNet v1.5".
ResNet introduced residual connections, which allow training networks with an unprecedented number of layers (up to 1,000). ResNet won the 2015 ILSVRC & COCO competitions, an important milestone in deep computer vision.
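A minimal image-classification sketch, assuming the reference microsoft/resnet-50 ImageNet checkpoint:

```python
from PIL import Image
import requests
import torch
from transformers import AutoFeatureExtractor, ResNetForImageClassification

feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/resnet-50")
model = ResNetForImageClassification.from_pretrained("microsoft/resnet-50")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = feature_extractor(image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# map the highest-scoring logit back to an ImageNet label
print(model.config.id2label[logits.argmax(-1).item()])
```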
- Resnet by @FrancescoSaverioZuppichini in #15770
VAN
The VAN model was proposed in Visual Attention Network by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
This paper introduces a new attention layer based on convolution operations that captures both local and distant relationships. This is done by combining normal and large-kernel convolution layers; the latter uses a dilated convolution to capture distant correlations.
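To make that concrete, here is a rough PyTorch sketch of the decomposed large-kernel attention the paper describes. The kernel sizes follow the paper's 21x21 decomposition; treat the module as an illustration, not the library's implementation:

```python
import torch
from torch import nn

class LargeKernelAttention(nn.Module):
    """Sketch of VAN-style large-kernel attention.

    A 5x5 depth-wise conv captures local structure, a 7x7 depth-wise
    conv with dilation 3 captures distant context, and a 1x1 conv mixes
    channels; together they approximate a 21x21 kernel cheaply.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.dw_conv = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.dw_dilated = nn.Conv2d(dim, dim, 7, padding=9, dilation=3, groups=dim)
        self.pointwise = nn.Conv2d(dim, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attention = self.pointwise(self.dw_dilated(self.dw_conv(x)))
        return attention * x  # the attention map modulates the input

# shape check: spatial dimensions are preserved
x = torch.randn(1, 64, 32, 32)
assert LargeKernelAttention(64)(x).shape == x.shape
```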
- Visual Attention Network (VAN) by @FrancescoSaverioZuppichini in #16027
VisionTextDualEncoder
The VisionTextDualEncoderModel can be used to initialize a vision-text dual encoder model with any pretrained vision autoencoding model as the vision encoder (e.g. ViT, BEiT, DeiT) and any pretrained text autoencoding model as the text encoder (e.g. RoBERTa, BERT). Two projection layers are added on top of both the vision and text encoders to project the output embeddings to a shared latent space. The projection layers are randomly initialized, so the model should be fine-tuned on a downstream task. This model can be used to align the vision-text embeddings with CLIP-like contrastive image-text training, and can then be used for zero-shot vision tasks such as image classification or retrieval.
In LiT: Zero-Shot Transfer with Locked-image Text Tuning, it is shown that leveraging pre-trained (locked/frozen) image and text models for contrastive learning yields significant improvement on new zero-shot vision tasks such as image classification or retrieval.
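A minimal sketch of pairing a vision and a text encoder (ViT and RoBERTa here are just example checkpoints; any vision/text autoencoding models can be substituted):

```python
from transformers import (
    AutoFeatureExtractor,
    AutoTokenizer,
    VisionTextDualEncoderModel,
    VisionTextDualEncoderProcessor,
)

model = VisionTextDualEncoderModel.from_vision_text_pretrained(
    "google/vit-base-patch16-224", "roberta-base"
)
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
feature_extractor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
processor = VisionTextDualEncoderProcessor(feature_extractor, tokenizer)

# The projection layers are freshly initialized, so the model should be
# fine-tuned contrastively (e.g. with the CLIP fine-tuning script below)
# before its similarity scores are meaningful.
```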
- add VisionTextDualEncoder and CLIP fine-tuning script by @patil-suraj in #15701
DiT
DiT was proposed in DiT: Self-supervised Pre-training for Document Image Transformer by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei. DiT applies the self-supervised objective of BEiT (BERT pre-training of Image Transformers) to 42 million document images, allowing for state-of-the-art results on tasks including:
- document image classification: the RVL-CDIP dataset (a collection of 400,000 images belonging to one of 16 classes).
- document layout analysis: the PubLayNet dataset (a collection of more than 360,000 document images constructed by automatically parsing PubMed XML files).
- table detection: the ICDAR 2019 cTDaR dataset (a collection of 600 training images and 240 testing images).
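Since DiT reuses the BEiT architecture, the released checkpoints load through the Auto classes. A minimal sketch, assuming the RVL-CDIP fine-tuned checkpoint:

```python
from transformers import AutoFeatureExtractor, AutoModelForImageClassification

# document image classification with the checkpoint fine-tuned on RVL-CDIP
feature_extractor = AutoFeatureExtractor.from_pretrained(
    "microsoft/dit-base-finetuned-rvlcdip"
)
model = AutoModelForImageClassification.from_pretrained(
    "microsoft/dit-base-finetuned-rvlcdip"
)
```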
- Add Document Image Transformer (DiT) by @NielsRogge in #15984
DPT
The DPT model was proposed in Vision Transformers for Dense Prediction by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun. DPT is a model that leverages the Vision Transformer (ViT) as backbone for dense prediction tasks like semantic segmentation and depth estimation.
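A minimal depth-estimation sketch, assuming the reference Intel/dpt-large checkpoint (a semantic-segmentation head, DPTForSemanticSegmentation, is also available):

```python
from PIL import Image
import requests
import torch
from transformers import DPTFeatureExtractor, DPTForDepthEstimation

feature_extractor = DPTFeatureExtractor.from_pretrained("Intel/dpt-large")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-large")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    predicted_depth = model(**inputs).predicted_depth
```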
- Add DPT by @NielsRogge in #15991
Checkpoint sharding
Large models are becoming more and more the norm, and having a checkpoint in a single file is challenging for several reasons:
- it's tougher to upload/download files bigger than 20/30 GB efficiently
- the whole checkpoint might not fit into RAM even if you have enough GPU memory
That's why the `save_pretrained` method will now automatically shard a checkpoint into several files when it goes above a 10GB threshold for PyTorch models. `from_pretrained` will handle such sharded checkpoints as if there were only one file.
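A minimal sketch using the `max_shard_size` argument to override the default 10GB threshold ("200MB" here just forces sharding on a small model so the effect is visible):

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-cased")

# Shards are written alongside an index file mapping weights to shards.
model.save_pretrained("sharded-bert", max_shard_size="200MB")

# Loading is unchanged: from_pretrained reassembles the shards.
model = AutoModel.from_pretrained("sharded-bert")
```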
TensorFlow implementations
GPT-J and ViTMAE are now available in TensorFlow.
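The TensorFlow ports load like their PyTorch counterparts; a minimal sketch with the reference checkpoints (GPT-J is a very large download):

```python
from transformers import TFGPTJForCausalLM, TFViTMAEModel

gptj = TFGPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
vit_mae = TFViTMAEModel.from_pretrained("facebook/vit-mae-base")
```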
- Add TF implementation of GPT-J by @stancld in #15623
- Add TF ViT MAE by @sayakpaul in #16255
Documentation guides
The documentation information architecture (IA) migration is wrapped up, with a new conceptual guide now available.
Improvements and bugfixes
- Fix doc links in release utils by @sgugger in #15903
- Fix a TF Vision Encoder Decoder test by @ydshieh in #15896
- [Fix link in pipeline doc] by @patrickvonplaten in #15906
- Fix and improve REALM fine-tuning by @qqaatw in #15297
- Freeze FlaxWav2Vec2 Feature Encoder by @sanchit-gandhi in #15873
- The tests were not updated after the addition of `torch.diag` by @Narsil in #15890
- [Doctests] Fix ignore bug and add more doc tests by @patrickvonplaten in #15911
- Enabling MaskFormer in pipelines by @Narsil in #15917
- Minor fixes for MaskFormer by @FrancescoSaverioZuppichini in #15916
- Add vision models to doc tests by @NielsRogge in #15905
- Fix #15898 by @davidleonfdez in #15928
- Update doc test readme by @patrickvonplaten in #15926
- Re-enabling all fast pipeline tests. by @Narsil in #15924
- Support CLIPTokenizerFast for CLIPProcessor by @cosmoquester in #15913
- Updating the slow tests: by @Narsil in #15893
- Adding `MODEL_FOR_INSTANCE_SEGMENTATION_MAPPING` by @Narsil in #15934
- Add missing support for Flax XLM-RoBERTa by @versae in #15900
- [FlaxT5 Example] Fix flax t5 example pretraining by @patrickvonplaten in #15835
- Do not change the output from tuple to list - to match PT's version by @ydshieh in #15918
- Tests for MaskFormerFeatureExtractor's post_process*** methods by @FrancescoSaverioZuppichini in #15929
- Constrained Beam Search [With Disjunctive Decoding] by @cwkeam in #15761
- [LayoutLMv2] Update requires_backends of feature extractor by @NielsRogge in #15941
- Made MaskFormerModelTest faster by @FrancescoSaverioZuppichini in #15942
- [Bug Fix] Beam search example in docs fails & a fix (integrating `max_length` in `BeamScorer.finalize()`) by @cwkeam in #15555
- remove re-definition of FlaxWav2Vec2ForCTCModule by @patil-suraj in #15965
- Support modern list type hints in HfArgumentParser by @konstantinjdobler in #15951
- Backprop Test for Freeze FlaxWav2Vec2 Feature Encoder by @sanchit-gandhi in #15938
- Fix Embedding Module Bug in Flax Models by @sanchit-gandhi in #15920
- Make is_thing_map in Feature Extractor post_process_panoptic_segmentation default to all instances by @FrancescoSaverioZuppichini in #15954
- Update training scripts docs by @stevhliu in #15931
- Set scale_embedding to False in some TF tests by @ydshieh in #15952
- Fix LayoutLMv2 test by @NielsRogge in #15939
- [Tests] Fix ViTMAE integration test by @NielsRogge in #15949
- Returning outputs only when asked for for MaskFormer. by @Narsil in #15936
- Speedup T5 Flax training by using Numpy instead of JAX for batch shuffling by @yhavinga in #15963
- Do a pull in case docs were updated during build by @sgugger in #15922
- Fix TFEncDecModelTest - Pytorch device by @ydshieh in #15979
- [Env Command] Add hf hub to env version command by @patrickvonplaten in #15981
- TF: Update multiple choice example by @gante in #15868
- TF generate refactor - past without encoder outputs by @gante in #15944
- Seed _get_train_sampler's generator with arg seed to improve reproducibility by @dlwh in #15961
- Add `ForInstanceSegmentation` models to `image-segmentation` pipelines by @Narsil in #15937
- [Doctests] Move doctests to new GPU & Fix bugs by @patrickvonplaten in #15969
- Removed an outdated check about hdf5_version by @ydshieh in #16011
- Swag example: Update doc format by @gante in #16014
- Fix github actions comment by @LysandreJik in #16009
- Simplify release utils by @sgugger in #15921
- Make `pos` optional in `PerceiverAudioPreprocessor` to avoid crashing `PerceiverModel` operation by @basilevh in #15972
- Fix MaskFormer failing test on master by @FrancescoSaverioZuppichini in #16012
- Fix broken code blocks in README.md by @upura in #15967
- Use tiny models for get_pretrained_model in TFEncoderDecoderModelTest by @ydshieh in #15989
- Add ONNX export for ViT by @lewtun in #15658
- Add FlaxBartForCausalLM by @sanchit-gandhi in #15995
- add doctests for bart like seq2seq models by @patil-suraj in #15987
- Fix warning message in ElectraForCausalLM by @pbelevich in #16023
- Freeze Feature Encoder in FlaxSpeechEncoderDecoder by @sanchit-gandhi in #15997
- Fix dependency error message in ServeCommand by @andstor in #16033
- [Docs] Improve PyTorch, Flax generate API by @patrickvonplaten in #15988
- [Tests] Add attentions_option to ModelTesterMixin by @NielsRogge in #15909
- [README] fix url for Preprocessing tutorial by @patil-suraj in #16042
- Fix Bug in Flax-Speech-Encoder-Decoder Test by @sanchit-gandhi in #16041
- Fix TFDebertaV2ConvLayer in TFDebertaV2Model by @ydshieh in #16031
- Build the doc in a separate folder then move it by @sgugger in #16020
- Don't compute metrics in LM examples on TPU by @sgugger in #16029
- TF: Unpack model inputs through a decorator by @gante in #15907
- Fix Bug in Flax Seq2Seq Models by @sanchit-gandhi in #16021
- DeBERTa/DeBERTa-v2/SEW Support for torch 1.11 by @LysandreJik in #16043
- support new marian models by @patil-suraj in #15831
- Fix duplicate arguments passed to dummy inputs in ONNX export by @lewtun in #16045
- FIX: updating doc/example for fine-tune for downstream Token Classification by @davidsbatista in #16063
- Fix a TF test name (LayoutLMModelTest) by @ydshieh in #16061
- Move QDQBert in just PyTorch block by @sgugger in #16062
- Remove assertion over possible activation functions in DistilBERT by @mfuntowicz in #16066
- Fix torch-scatter version by @LysandreJik in #16072
- Add type annotations for BERT and copies by @Rocketknight1 in #16074
- Adding type hints for TFRoBERTa by @Rocketknight1 in #16057
- Make sure `'torch.dtype'` has str-type value in config and all nested dicts for JSON serializability by @feifang24 in #16065
- Run daily doctests without time-out at least once by @patrickvonplaten in #16077
- Add soft length regulation for sequence generation by @kevinpl07 in #15245
- Update troubleshoot guide by @stevhliu in #16001
- Add type annotations for ImageGPT by @johnnv1 in #16088
- Rebuild deepspeed by @LysandreJik in #16081
- Add missing type hints for all flavors of RoBERTa PyTorch models. by @ChainYo in #16086
- [Fix doc example] FSMT by @ydshieh in #16085
- Audio/vision task guides by @stevhliu in #15808
- [ZeRO] Fixes issue with embedding resize by @jeffra in #16093
- [Deepspeed] add support for bf16 mode by @stas00 in #14569
- Change unpacking of TF Bart inputs to use decorator by @osanseviero in #16094
- add unpack_inputs decorator to mbart tf by @Abdelrhman-Hosny in #16097
- Add type annotations for segformer pytorch by @p-mishra1 in #16099
- Add unpack_input decorator to ViT model by @johnnv1 in #16102
- Add type hints to XLM model (PyTorch) by @jbrry in #16108
- Add missing type hints for all flavors of LayoutLMv2 PyTorch models. by @ChainYo in #16089
- Add TFCamembertForCausalLM and ONNX integration test by @lewtun in #16073
- Fix and document Zero Shot Image Classification by @osanseviero in #16079
- Fix Loading of Flax(Speech)EncoderDecoderModel kwargs from PreTrained Encoder-Decoder Checkpoints by @sanchit-gandhi in #16056
- Update convert_marian_to_pytorch.py by @jorgtied in #16124
- Make TF pt-tf equivalence test more aggressive by @ydshieh in #15839
- Fix ProphetNetTokenizer by @ydshieh in #16082
- Change unpacking of TF mobilebert inputs to use decorator by @vumichien in #16110
- Steps strategy fix for PushtoHubCallback and changed docstring by @merveenoyan in #16138
- [ViTMAE] Add copied from statements and fix prefix by @NielsRogge in #16119
- Spanish translation of the file training.mdx by @yharyarias in #16047
- Added missing type hints - ELECTRA PyTorch by @kamalkraj in #16103
- Added missing type hints - Deberta V1 and V2 by @kamalkraj in #16105
- [Fix doc example] Fix checkpoint name in docstring example by @ydshieh in #16083
- Better input variable naming for OpenAI (TF) by @bhavika in #16129
- Improve model variable naming - CLIP [TF] by @bhavika in #16128
- Add type hints for TFDistilBert by @PepijnBoers in #16107
- Choose framework for ONNX export by @michaelbenayoun in #16018
- Add type hints for Luke in PyTorch by @bhavika in #16111
- Add type hints for PoolFormer in Pytorch by @soomiles in #16121
- Add type hints for SqueezeBert PyTorch by @Tegzes in #16126
- Added missing type hints - ELECTRA TF by @kamalkraj in #16104
- Docker images runtime -> devel by @LysandreJik in #16141
- Add type annotations for CLIP (torch) (#16059) by @jacobdineen in #16106
- Add type hints for FNet PyTorch by @wpan03 in #16123
- Use `HF_ENDPOINT` for custom endpoints by @sgugger in #16139
- update albert with tf decorator by @infinite-Joy in #16147
- clearer model variable naming: ELECTRA by @kamalkraj in #16143
- Add type hints for GPTNeo PyTorch by @Tegzes in #16127
- Improve Swin for VisionEncoderDecoder by @NielsRogge in #16070
- Make transformers.utils.fx._SUPPORTED_MODELS unique by @pbelevich in #16015
- Shift responsibilities a bit for issues by @patrickvonplaten in #16154
- typo "conaining" -> "containing" by @marxav in #16132
- Configurable Relative Position Max. Distance by @agemagician in #16155
- Added spanish translation of quicktour.mdx by @Duedme in #16158
- Use templates by @sgugger in #16142
- [Fix doc example] Fix first example for the custom_datasets tutorial by @MarkusSagen in #16087
- [Fix doc example] Fix 2 PyTorch Vilt docstring examples by @ydshieh in #16076
- TF XLA greedy generation by @Rocketknight1 in #15786
- clearer model variable naming: pegasus by @kamalkraj in #16152
- Change unpacking of TF layoutlm inputs to use decorator by @vumichien in #16112
- update transformer XL with tf decorator by @infinite-Joy in #16166
- added type hints to yoso by @mowafess in #16163
- Framework split by @sgugger in #16030
- [MT5Config] add relative_attention_max_distance in config by @patil-suraj in #16170
- clearer model variable naming: Tapas by @kamalkraj in #16145
- clearer model variable naming: Deberta by @kamalkraj in #16146
- Add flaubert types by @ChainYo in #16118
- clearer model variable naming: xlnet by @kamalkraj in #16150
- Add type hints for Perceiver Pytorch by @jcmc00 in #16174
- Add type hints for Reformer PyTorch by @Tegzes in #16175
- Fix some Flax models' `hidden_states` by @ydshieh in #16167
- Add the XTREME-S fine-tuning example by @anton-l in #15985
- [Xtreme-S] fix some namings by @patrickvonplaten in #16183
- Replace all deprecated `jax.ops` operations with jnp's `at` by @sanchit-gandhi in #16078
- clearer model variable naming: funnel by @utkusaglm in #16178
- clearer model variable naming: blenderbot by @utkusaglm in #16192
- Minor fixes to XTREME-S by @anton-l in #16193
- unpack_input decorator for tf_convnext by @johko in #16181
- clearer model variable naming: blenderbot_small by @utkusaglm in #16194
- Adding type hints for Distilbert by @johnryan465 in #16090
- ResNet: update modules names by @FrancescoSaverioZuppichini in #16196
- Update a CI job step name by @ydshieh in #16189
- Fix loading CLIPVisionConfig and CLIPTextConfig by @patil-suraj in #16198
- TF: add beam search tests by @gante in #16202
- Swin support for any input size by @FrancescoSaverioZuppichini in #15986
- Fix generation min length by @patrickvonplaten in #16206
- Add/type annotations/model vision by @johnnv1 in #16151
- VAN: update modules names by @FrancescoSaverioZuppichini in #16201
- Fixes Loss for TransfoXL when using Trainer API v2 by @LysandreJik in #16140
- [Tests] Fix DiT test by @NielsRogge in #16218
- Fix FlaxRoFormerClassificationHead activation by @ydshieh in #16168
- Fix typos in docstrings of data_collator.py by @daysm in #16208
- Fix reproducibility in Training for PyTorch 1.11 by @sgugger in #16209
- Fix readmes by @qqaatw in #16217
- MaskFormer: fix device on test by @FrancescoSaverioZuppichini in #16219
- Adding Unpack Decorator For DPR model by @forsc in #16212
- Skip equivalence test for TransfoXL by @LysandreJik in #16224
- Fix Type Hint of Nan/Inf Logging Filter Arg by @Sophylax in #16227
- [Flax] remove jax.ops.index by @patil-suraj in #16220
- Support PEP 563 for HfArgumentParser by @function2-llx in #15795
- add unpack_inputs decorator for marian by @johko in #16226
- fix(flax): generate with logits processor/warper by @borisdayma in #16231
- [FlaxSpeechEncoderDecoderModel] Skip from_encoder_decoder_pretrained by @patil-suraj in #16236
- [Generate Docs] Correct docs by @patrickvonplaten in #16133
- [Deepspeed] non-HF Trainer doc update by @stas00 in #16238
- integrations: mlflow: skip start_run() if a run is already active and sanity check on enabling integration by @ktzsh in #16131
- Update expected slices for pillow > 9 by @NielsRogge in #16117
- Attention mask is important in the case of batching... by @Narsil in #16222
- Change assertion to warning when passing past_key_value to T5 encoder by @ZhaofengWu in #16153
- Override _pad in LEDTokenizer to deal with global_attention_mask by @ydshieh in #15940
- Update XLM with TF decorator by @louisowen6 in #16247
- Add unpack_inputs decorator for ctrl by @johko in #16242
- update jax version and re-enable some tests by @patil-suraj in #16254
- [Constrained Beam Search] Adding Notebook Example & Minor Typo Fix by @cwkeam in #16246
- value check for typical sampling by @cimeister in #16165
- Make Flax pt-flax equivalence test more aggressive by @ydshieh in #15841
- Aggressive PT/TF equivalence test on PT side by @ydshieh in #16250
- Update flaubert with TF decorator by @Tegzes in #16258
- Fix links in guides by @stevhliu in #16182
- Small fixes to the documentation by @sgugger in #16180
- [WIP] add `has_attentions` as done in PyTorch side by @ydshieh in #16259
- Make `add-new-model-like` work in an env without all frameworks by @sgugger in #16239
- Deberta v2 code simplification by @guillaume-be in #15732
- Add Slack notification support for doc tests by @patrickvonplaten in #16253
- Framework split for Spanish version of doc quicktour.mdx by @omarespejel in #16215
- Removed the 'optional' string (in DETR post_process) by @dinesh-GDK in #16266
- Draft a guide with our code quirks for new models by @sgugger in #16237
- Fixed Error Raised Due to Wrongly Accessing Training Sample by @aflah02 in #16115
- Fix XGLM cross attention by @patil-suraj in #16290
- Fix a typo (add a coma) by @PolarisRisingWar in #16291
- Add type hints to xlnet by @mowafess in #16214
- Remove disclaimer from Longformer docs by @gchhablani in #16296
- Add argument "cache_dir" for transformers.onnx by @happyXia in #16284
- Add type hints transfoxl by @jcmc00 in #16267
- added type hints for BART model by @robotjellyzone in #16270
- ResNet & VAN: Fixed code sample tests by @FrancescoSaverioZuppichini in #16294
- GPT2 TensorFlow Type Hints by @cakiki in #16261
- Added type hints for PyTorch T5 model by @yhl48 in #16257
- Fix Marian conversion script by @patil-suraj in #16300
- [SegFormer] Remove unused attributes by @NielsRogge in #16285
- Update troubleshoot with more content by @stevhliu in #16243
- fix last element in hidden_states for XGLM by @ydshieh in #16301
- [FlaxGPTJ] Fix bug in rotary embeddings by @patil-suraj in #16298
- Add missing type hints for PyTorch Longformer models by @johnnygreco in #16244
- Fix Seq2SeqTrainingArguments docs by @gchhablani in #16295
- [xtreme-s] Update Minds14 results by @anton-l in #16241
- added type hints for blenderbot and blenderbot_small (v2) by @IvanLauLinTiong in #16307
- Update Makefile Phonies by @gchhablani in #16306
- TF - update (vision_)encoder_decoder past variable by @gante in #16260
- Add Flaubert OnnxConfig to Transformers by @ChainYo in #16279
- TFLongformer: Add missing type hints and unpack inputs decorator by @johnnygreco in #16228
- add xglm conversion script by @patil-suraj in #16305
- Fix bugs of s2t fairseq model converting by @beomseok-lee in #15593
- Add type hints for Pegasus model (PyTorch) by @Tegzes in #16324
- Funnel type hints by @AMontgomerie in #16323
- Add type hints for ProphetNet PyTorch by @Tegzes in #16272
- [GLPN] Improve docs by @NielsRogge in #16331
- Added type hints for Pytorch Marian calls by @clefourrier in #16200
- VAN: Code sample tests by @FrancescoSaverioZuppichini in #16340
- Add type annotations for Rembert/Splinter and copies by @jacobdineen in #16338
- [Bug template] Shift responsibilities for long-range by @patrickvonplaten in #16344
- Fix code repetition in serialization guide by @osanseviero in #16346
- Adopt framework-specific blocks for content by @stevhliu in #16342
- Updates the default branch from master to main by @LysandreJik in #16326
- [T5] Add t5 download script by @patrickvonplaten in #16328
- Reorganize file utils by @sgugger in #16264
- [FlaxBart] make sure no grads are computed on bias by @patrickvonplaten in #16345
- Trainer evaluation delay by @OllieBroadhurst in #16356
- Adding missing type hints for mBART model (TF) by @reichenbch in #16281
- Add type annotations of config for vision models by @johnnv1 in #16263
- TF - Fix interchangeable past/past_key_values and revert output variable name in GPT2 by @gante in #16332
- Swap inequalities by @OllieBroadhurst in #16368
- Make Transformers use cache files when hf.co is down by @sgugger in #16362
- Decision transformer gym by @edbeeching in #15845
- add GPT-J ONNX config to Transformers by @ChainYo in #16274
- Update docs/README.md by @ydshieh in #16333
- Make BigBird model compatible with fp16 dtype by @xuzhao9 in #16034
- [Doctests] Make roberta-like meaningful by @patrickvonplaten in #16363
- [Doctests] Make TFRoberta-like meaningful by @ydshieh in #16370
- Update readme with how to train offline and fix BPE command by @ncoop57 in #15897
- Fix BigBirdModelTester by @ydshieh in #16310
- Type hints and decorator for TF T5 by @Dahlbomii in #16376
- Add type hints for ConvBert model by @simonzli in #16377
- Update pt flax equivalence tests in pt by @ydshieh in #16280
- Bump cookiecutter version by @ydshieh in #16387
- Fix style by @LysandreJik in #16391
- Fix readme links and add CI check by @sgugger in #16392
- variable naming for Distilbert model by @robotjellyzone in #16384
- Added type hints by @yhl48 in #16389
- Rename semantic segmentation outputs by @NielsRogge in #15849
- Make FeaturesManager.get_model_from_feature a static method by @michaelbenayoun in #16357
- Big file_utils cleanup by @sgugger in #16396
- fixed typo from enable to disable in disable_progress_bar function by @Gladiator07 in #16406
- Rename master to main for notebooks links and leftovers by @sgugger in #16397
- TF PushToHubCallback fixes and updates by @Rocketknight1 in #16409
- Add ONNX support for Blenderbot and BlenderbotSmall by @lewtun in #15875
- [FlaxSpeechEncoderDecoder] Fix feature extractor gradient test by @sanchit-gandhi in #16407
- Fix Typo in Argument of FlaxWav2Vec2ForPreTrainingModule by @sanchit-gandhi in #16084
- Removed inputs_processing and replaced with decorator for lxmert by @silvererudite in #16414
- remove references to PDF reading via PIL by @garfieldnate in #15293
- Update comments in class BatchEncoding by @basicv8vc in #15932
- Fix broken links by @kurianbenoy in #16113
- `cached_download ∘ hf_hub_url` is `hf_hub_download` by @julien-c in #16375
- QDQBert example update by @shangz-ai in #16395
- [Flax] Improve Robustness of Back-Prop Tests by @sanchit-gandhi in #16418
- Fix typo in language modeling example comment by @dreamgonfly in #16421
- Use doc builder styler by @sgugger in #16412
- Fix PerceiverMLP and test by @jaesuny in #16405
- [FlaxSpeechEncoderDecoderModel] Ensure Input and Output Word Embeddings Are Not Tied by @sanchit-gandhi in #16444
- Translation from english to spanish of file pipeline_tutorial.mdx by @FernandoLpz in #16149
- Remove kwargs argument from IBERT MLM forward pass by @lewtun in #16449
- Fix blenderbot conversion script by @patil-suraj in #16472
- Adding DocTest to TrOCR by @arnaudstiegler in #16398
- [MNLI example] Prevent overwriting matched with mismatched metrics by @eldarkurtic in #16475
- Remove duplicate mLuke by @stevhliu in #16460
- Fix missing output_attentions in PT/Flax equivalence test by @ydshieh in #16271
- Fix some TF GPT-J CI testings by @ydshieh in #16454
- Fix example test and test_fetcher for examples by @sgugger in #16478
- fix wrong variable name by @wesleyacheng in #16467
- Add TF vision model code samples by @ydshieh in #16477
- missing trainer import by @wesleyacheng in #16469
- Add type hints for UniSpeech by @Tegzes in #16399
- TF: properly handle kwargs in encoder_decoder architectures by @gante in #16465
- added typehints for RAG pytorch models by @akashe in #16416
- Avoid accessing .dataset of a DataLoader in Trainer by @sanderland in #16451
- TF GPT2: clearer model variable naming with @unpack_inputs by @cakiki in #16311
- Raise diff tolerance value for TFViTMAEModelTest by @ydshieh in #16483
- Do not initialize `torch.distributed` process group if one is already initialized by @Yard1 in #16487
- TF GPT-J Type hints and TF decorator by @Dahlbomii in #16488
- Nit: MCSCOCO -> MSCOCO by @AdityaKane2001 in #16481
- Add length to PreTrainedTokenizer train_new_from_iterator by @dctelus in #16493
- Add support for exporting GPT-J to ONNX-TRT by @tomerip in #16492
- TF: unpack inputs on Convbert, GPTJ, LED, and templates by @gante in #16491
- Feature Extractor accepts `segmentation_maps` by @FrancescoSaverioZuppichini in #15964
- [examples] max samples can't be bigger than the len of dataset by @stas00 in #16501
- update smddp api to v1.4.0 by @roywei in #16371
- Support reduce_bucket_size="auto" for deepspeed stages <3 by @manuelciosici in #16496
- Modeling Outputs by @FrancescoSaverioZuppichini in #16341
- make tuple annotation more specific to avoid failures during symbolic_trace by @chenbohua3 in #16490
- Spanish translation of the file multilingual.mdx by @SimplyJuanjo in #16329
- Translate installation.mdx to Spanish by @lilianabs in #16229
- Translate accelerate.mdx from english to spanish by @Sangohe in #16176
- [Typo][Example] Fixed a typo in `run_qa_no_trainer.py` by @bhadreshpsavani in #16508
- added type hints to xglm pytorch by @mowafess in #16500
- Fix syntax error in generate docstrings by @sgugger in #16516
- [research] link to the XTREME-S paper by @anton-l in #16519
- Fixed a typo in seq2seq_trainer.py by @Agoniii in #16531
- Add ONNX export for BeiT by @akuma12 in #16498
- call on_train_end when optuna trial is pruned by @fschlatt in #16536
- Type hints added to OpenAIGPT by @Dahlbomii in #16529
- Fix Bart type hints by @gchhablani in #16297
- Add VisualBert type hints by @gchhablani in #16544
- Adding missing type hints for mBART model (PyTorch) by @reichenbch in #16429
- Remove MBart subclass of XLMRoberta in tokenzier docs by @gchhablani in #16546
- Use random_attention_mask for TF tests by @ydshieh in #16517
- [GLPN] Improve code example by @NielsRogge in #16450
- Pin tokenizers version <0.13 by @LysandreJik in #16539
- add code samples for TF speech models by @ydshieh in #16494
- [FlaxSpeechEncoderDecoder] Fix dtype bug by @patrickvonplaten in #16581
- Making the impossible to connect error actually report the right URL. by @Narsil in #16446
- Fix flax import in `__init__.py`: modeling_xglm -> modeling_flax_xglm by @stancld in #16556
- Add utility to find model labels by @sgugger in #16526
- Enable doc in Spanish by @sgugger in #16518
- Add use_auth to load_datasets for private datasets to PT and TF examples by @KMFODA in #16521
- add a test checking the format of `convert_tokens_to_string`'s output by @SaulLu in #16540
- TF: Finalize `unpack_inputs`-related changes by @gante in #16499
- [SpeechEncoderDecoderModel] Correct Encoder Last Hidden State Output by @sanchit-gandhi in #16586
- initialize the default rank set on TrainerState by @andrescodas in #16530
- Fix CI: test_inference_for_pretraining in ViTMAEModelTest by @ydshieh in #16591
- add a template to add missing tokenization test by @SaulLu in #16553
- PretrainedModel: made `_load_pretrained_model_low_mem` static + bug fix by @FrancescoSaverioZuppichini in #16548
- handle torch_dtype in low cpu mem usage by @patil-suraj in #16580
- [Doctests] Correct filenaming by @patrickvonplaten in #16599
- Adding new train_step logic to make things less confusing for users by @Rocketknight1 in #15994
- Adding missing type hints for BigBird model by @reichenbch in #16555
- [deepspeed] fix typo, adjust config name by @stas00 in #16597
- Add global_attention_mask to gen_kwargs in Seq2SeqTrainer.prediction_step by @JohnGiorgi in #16485
- [benchmark tool] trainer-benchmark.py by @stas00 in #14934
- Update summary of the tasks by @stevhliu in #16528
- added type hints to CTRL pytorch by @anmolsjoshi in #16593
- fix default num_attention_heads in segformer doc by @JunMa11 in #16612
- [Docs] Correct quicktour minds14 dataset by @patrickvonplaten in #16626
- Fix seq2seq doc tests by @patil-suraj in #16606
- don't load state_dict twice when using low_cpu_mem_usage in from_pretrained by @patil-suraj in #16602
- Use CLIP model config to set some kwargs for components by @ydshieh in #16609
- [modeling_utils] typo by @stas00 in #16621
- [Speech2Text Doc] Fix docs by @patrickvonplaten in #16611
- [FlaxSpeechEncoderDecoderModel] More Rigorous PT-Flax Equivalence Tests by @sanchit-gandhi in #16589
- Fix TFTransfoXLLMHeadModel outputs by @ydshieh in #16590
Impressive community contributors
The community contributors below have significantly contributed to the v4.18.0 release. Thank you!
@sayakpaul, for contributing the TensorFlow version of ViTMAE
@stancld, for contributing the TensorFlow version of GPT-J
New Contributors
- @Soonhwan-Kwon made their first contribution in #13727
- @jonatasgrosman made their first contribution in #15428
- @ToluClassics made their first contribution in #15432
- @peregilk made their first contribution in #15423
- @bugface made their first contribution in #15480
- @AyushExel made their first contribution in #14582
- @thinksoso made their first contribution in #15403
- @davidleonfdez made their first contribution in #15473
- @sanchit-gandhi made their first contribution in #15519
- @arron1227 made their first contribution in #15084
- @cimeister made their first contribution in #15504
- @cwkeam made their first contribution in #15416
- @Albertobegue made their first contribution in #13831
- @derenrich made their first contribution in #15614
- @tkukurin made their first contribution in #15636
- @muzhi1991 made their first contribution in #15638
- @versae made their first contribution in #15590
- @jonrbates made their first contribution in #15617
- @arampacha made their first contribution in #15413
- @FrancescoSaverioZuppichini made their first contribution in #15657
- @coyotte508 made their first contribution in #15680
- @heytanay made their first contribution in #15531
- @gautierdag made their first contribution in #15702
- @SSardorf made their first contribution in #15741
- @Crabzmatic made their first contribution in #15740
- @dreamgonfly made their first contribution in #15644
- @lsb made their first contribution in #15468
- @pbelevich made their first contribution in #15776
- @sayakpaul made their first contribution in #15750
- @rahul003 made their first contribution in #15877
- @rhjohnstone made their first contribution in #15884
- @cosmoquester made their first contribution in #15913
- @konstantinjdobler made their first contribution in #15951
- @yhavinga made their first contribution in #15963
- @dlwh made their first contribution in #15961
- @basilevh made their first contribution in #15972
- @andstor made their first contribution in #16033
- @davidsbatista made their first contribution in #16063
- @feifang24 made their first contribution in #16065
- @kevinpl07 made their first contribution in #15245
- @johnnv1 made their first contribution in #16088
- @Abdelrhman-Hosny made their first contribution in #16097
- @p-mishra1 made their first contribution in #16099
- @jbrry made their first contribution in #16108
- @jorgtied made their first contribution in #16124
- @vumichien made their first contribution in #16110
- @merveenoyan made their first contribution in #16138
- @yharyarias made their first contribution in #16047
- @bhavika made their first contribution in #16129
- @PepijnBoers made their first contribution in #16107
- @soomiles made their first contribution in #16121
- @Tegzes made their first contribution in #16126
- @jacobdineen made their first contribution in #16106
- @wpan03 made their first contribution in #16123
- @infinite-Joy made their first contribution in #16147
- @marxav made their first contribution in #16132
- @Duedme made their first contribution in #16158
- @MarkusSagen made their first contribution in #16087
- @mowafess made their first contribution in #16163
- @jcmc00 made their first contribution in #16174
- @utkusaglm made their first contribution in #16178
- @johko made their first contribution in #16181
- @johnryan465 made their first contribution in #16090
- @daysm made their first contribution in #16208
- @forsc made their first contribution in #16212
- @Sophylax made their first contribution in #16227
- @function2-llx made their first contribution in #15795
- @ktzsh made their first contribution in #16131
- @louisowen6 made their first contribution in #16247
- @omarespejel made their first contribution in #16215
- @dinesh-GDK made their first contribution in #16266
- @aflah02 made their first contribution in #16115
- @PolarisRisingWar made their first contribution in #16291
- @happyXia made their first contribution in #16284
- @robotjellyzone made their first contribution in #16270
- @yhl48 made their first contribution in #16257
- @johnnygreco made their first contribution in #16244
- @IvanLauLinTiong made their first contribution in #16307
- @beomseok-lee made their first contribution in #15593
- @clefourrier made their first contribution in #16200
- @OllieBroadhurst made their first contribution in #16356
- @reichenbch made their first contribution in #16281
- @edbeeching made their first contribution in #15845
- @xuzhao9 made their first contribution in #16034
- @Dahlbomii made their first contribution in #16376
- @simonzli made their first contribution in #16377
- @Gladiator07 made their first contribution in #16406
- @silvererudite made their first contribution in #16414
- @garfieldnate made their first contribution in #15293
- @basicv8vc made their first contribution in #15932
- @kurianbenoy made their first contribution in #16113
- @jaesuny made their first contribution in #16405
- @FernandoLpz made their first contribution in #16149
- @arnaudstiegler made their first contribution in #16398
- @wesleyacheng made their first contribution in #16467
- @akashe made their first contribution in #16416
- @sanderland made their first contribution in #16451
- @AdityaKane2001 made their first contribution in #16481
- @dctelus made their first contribution in #16493
- @tomerip made their first contribution in #16492
- @roywei made their first contribution in #16371
- @chenbohua3 made their first contribution in #16490
- @SimplyJuanjo made their first contribution in #16329
- @lilianabs made their first contribution in #16229
- @Sangohe made their first contribution in #16176
- @Agoniii made their first contribution in #16531
- @akuma12 made their first contribution in #16498
- @fschlatt made their first contribution in #16536
- @KMFODA made their first contribution in #16521
- @andrescodas made their first contribution in #16530
- @JohnGiorgi made their first contribution in #16485
- @JunMa11 made their first contribution in #16612
Full Changelog: v4.17.0...v4.18.0