github huggingface/transformers v4.22.0
v4.22.0: Swin Transformer v2, VideoMAE, Donut, Pegasus-X, X-CLIP, ERNIE


Swin Transformer v2

The Swin Transformer V2 model was proposed in Swin Transformer V2: Scaling Up Capacity and Resolution by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.

Swin Transformer v2 improves the original Swin Transformer using 3 main techniques: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) a log-spaced continuous position bias method to effectively transfer models pre-trained using low-resolution images to downstream tasks with high-resolution inputs; 3) a self-supervised pre-training method, SimMIM, to reduce the need for vast amounts of labeled images.
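
A minimal usage sketch for image classification (the checkpoint name is an assumption for illustration):

import requests
import torch
from PIL import Image
from transformers import AutoFeatureExtractor, Swinv2ForImageClassification

# Checkpoint name assumed for illustration.
checkpoint = "microsoft/swinv2-tiny-patch4-window8-256"
feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint)
model = Swinv2ForImageClassification.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])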

VideoMAE

The VideoMAE model was proposed in VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training by Zhan Tong, Yibing Song, Jue Wang, Limin Wang. VideoMAE extends masked autoencoders (MAE) to video, claiming state-of-the-art performance on several video classification benchmarks.

VideoMAE is an extension of ViTMAE for video.
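
A minimal sketch with a randomly initialized model, so it runs without downloading a checkpoint:

import torch
from transformers import VideoMAEConfig, VideoMAEForVideoClassification

# Randomly initialized model; no pretrained weights needed for this shape walk-through.
config = VideoMAEConfig(num_labels=400)
model = VideoMAEForVideoClassification(config)

# VideoMAE expects pixel_values of shape (batch, num_frames, channels, height, width);
# 16 frames of 224x224 is the default configuration.
video = torch.randn(1, 16, 3, 224, 224)
with torch.no_grad():
    logits = model(pixel_values=video).logits  # shape: (1, 400)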

Donut

The Donut model was proposed in OCR-free Document Understanding Transformer by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park. Donut consists of an image Transformer encoder and an autoregressive text Transformer decoder to perform document understanding tasks such as document image classification, form understanding and visual question answering.
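
A minimal sketch for document visual question answering (the checkpoint name and task prompt are assumptions based on the DocVQA fine-tune):

from transformers import DonutProcessor, VisionEncoderDecoderModel

# Checkpoint name and prompt format assumed for illustration.
processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")

# `image` is a PIL.Image.Image of a document page loaded beforehand.
pixel_values = processor(image, return_tensors="pt").pixel_values
task_prompt = "<s_docvqa><s_question>What is the invoice number?</s_question><s_answer>"
decoder_input_ids = processor.tokenizer(task_prompt, add_special_tokens=False, return_tensors="pt").input_ids
outputs = model.generate(pixel_values, decoder_input_ids=decoder_input_ids, max_length=512)
print(processor.batch_decode(outputs)[0])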

Pegasus-X

The PEGASUS-X model was proposed in Investigating Efficiently Extending Transformers for Long Input Summarization by Jason Phang, Yao Zhao and Peter J. Liu.

PEGASUS-X (PEGASUS eXtended) extends the PEGASUS models for long input summarization through additional long input pretraining and using staggered block-local attention with global tokens in the encoder.
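
A minimal summarization sketch (the checkpoint name is an assumption for illustration):

from transformers import AutoTokenizer, PegasusXForConditionalGeneration

# Checkpoint name assumed for illustration.
tokenizer = AutoTokenizer.from_pretrained("google/pegasus-x-base")
model = PegasusXForConditionalGeneration.from_pretrained("google/pegasus-x-base")

long_document = "..."  # a long article; PEGASUS-X targets inputs of up to 16K tokens
inputs = tokenizer(long_document, return_tensors="pt", truncation=True, max_length=16384)
summary_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])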

X-CLIP

The X-CLIP model was proposed in Expanding Language-Image Pretrained Models for General Video Recognition by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling. X-CLIP is a minimal extension of CLIP for video. The model consists of a text encoder, a cross-frame vision encoder, a multi-frame integration Transformer, and a video-specific prompt generator.

X-CLIP is a minimal extension of CLIP for video-language understanding.
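
A minimal sketch with dummy frames (the checkpoint name is an assumption; this variant samples 8 frames per clip):

import numpy as np
import torch
from transformers import XCLIPModel, XCLIPProcessor

# Checkpoint name assumed for illustration.
processor = XCLIPProcessor.from_pretrained("microsoft/xclip-base-patch32")
model = XCLIPModel.from_pretrained("microsoft/xclip-base-patch32")

video = [np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8) for _ in range(8)]  # dummy frames
texts = ["playing basketball", "cooking", "walking a dog"]
inputs = processor(text=texts, videos=video, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_video.softmax(dim=1)  # similarity of the clip to each text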

ERNIE

ERNIE is a series of powerful models proposed by Baidu, particularly strong on Chinese tasks, including ERNIE 1.0, ERNIE 2.0, ERNIE 3.0, ERNIE-Gram, ERNIE-health, etc.
These models are contributed by nghuyong and the official code can be found in PaddleNLP (in PaddlePaddle).
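
A minimal sketch (the checkpoint name is an assumption, one of the checkpoints ported by @nghuyong):

import torch
from transformers import AutoModel, AutoTokenizer

# Checkpoint name assumed for illustration.
tokenizer = AutoTokenizer.from_pretrained("nghuyong/ernie-3.0-base-zh")
model = AutoModel.from_pretrained("nghuyong/ernie-3.0-base-zh")

inputs = tokenizer("欢迎使用 ERNIE!", return_tensors="pt")
with torch.no_grad():
    last_hidden_state = model(**inputs).last_hidden_state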

TensorFlow models

MobileViT and LayoutLMv3 are now available in TensorFlow.
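
A minimal sketch of the TensorFlow port (the checkpoint name is an assumption for illustration):

import tensorflow as tf
from transformers import AutoFeatureExtractor, TFMobileViTForImageClassification

# Checkpoint name assumed for illustration; add from_pt=True if the hub checkpoint
# only ships PyTorch weights.
feature_extractor = AutoFeatureExtractor.from_pretrained("apple/mobilevit-small")
model = TFMobileViTForImageClassification.from_pretrained("apple/mobilevit-small")

# `image` is a PIL.Image.Image loaded beforehand.
inputs = feature_extractor(images=image, return_tensors="tf")
logits = model(**inputs).logits
print(model.config.id2label[int(tf.math.argmax(logits, axis=-1)[0])])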

New task-specific architectures

A new question answering head was added for the LayoutLM model.
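
A minimal sketch of loading the new head (the checkpoint name is an assumption, a community LayoutLM checkpoint fine-tuned for document QA):

from transformers import LayoutLMForQuestionAnswering

# Checkpoint name assumed for illustration.
model = LayoutLMForQuestionAnswering.from_pretrained("impira/layoutlm-document-qa")
# Like other extractive QA heads it returns start_logits and end_logits over the token
# sequence, but it additionally consumes `bbox` word bounding boxes from an OCR step.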

New pipelines

Two new pipelines are available in transformers: a document question answering pipeline, as well as an image-to-text generation pipeline.
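
A minimal sketch of both pipelines (the file names are placeholders; the document question answering pipeline may additionally require pytesseract for the OCR step):

from transformers import pipeline

# Task names as added in this release; the default checkpoints are resolved automatically.
doc_qa = pipeline("document-question-answering")
print(doc_qa(image="invoice.png", question="What is the invoice number?"))

image_to_text = pipeline("image-to-text")
print(image_to_text("photo.jpg"))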

M1 support

Pipelines and the Trainer now support Apple M1 (mps) devices when running on PyTorch.
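
A minimal sketch, assuming an Apple Silicon machine with a PyTorch build that ships MPS support and that the pipeline accepts a torch.device here:

import torch
from transformers import pipeline

# Fall back to CPU when MPS is not available.
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
classifier = pipeline("text-classification", device=device)
print(classifier("Transformers now runs on Apple Silicon."))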

Backend version compatibility

Starting from version v4.22.0, we officially support the PyTorch and TensorFlow versions released within the last two years.
Versions older than two years will no longer be supported going forward.

We're making this change as we begin actively testing transformers compatibility with these older versions.
This effort can be followed on the transformers GitHub repository.

Generate method updates

The generate method now enforces stronger validation of its arguments to ensure proper usage; a short sketch follows the list below.

  • Generate: validate model_kwargs (and catch typos in generate arguments) by @gante in #18261
  • Generate: validate model_kwargs on TF (and catch typos in generate arguments) by @gante in #18651
  • Generate: add model class validation by @gante in #18902
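
As a quick illustration (a sketch; the exact exception type and message may differ), a typo in a generate argument that previously went unnoticed is now flagged:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Hello", return_tensors="pt")

# A typo such as `max_new_token` (instead of `max_new_tokens`) used to be silently ignored;
# generate now raises an error pointing at the unused argument.
try:
    model.generate(**inputs, max_new_token=5)
except (ValueError, TypeError) as err:
    print(err)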

API changes

The as_target_tokenizer and as_target_processor context managers have been deprecated. The new API is to use the call method of the tokenizer/processor with keyword arguments. For instance:

with tokenizer.as_target_tokenizer():
    encoded_labels = tokenizer(labels, padding=True)

becomes

encoded_labels = tokenizer(text_target=labels, padding=True)

  • Replace as_target context managers by direct calls by @sgugger in #18325

Bits and bytes integration

The bitsandbytes library is now integrated within transformers. This feature can reduce the size of large models by a factor of up to 2, with little loss in precision.
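
A minimal sketch of the new int8 loading path (the checkpoint is an arbitrary example; a CUDA GPU plus the bitsandbytes and accelerate packages are required):

from transformers import AutoModelForCausalLM

# Weights are loaded in 8-bit (LLM.int8()), roughly halving memory versus fp16.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7",
    device_map="auto",
    load_in_8bit=True,
)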

Large model support

Models that have sharded checkpoints in PyTorch can be loaded in Flax.
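
A minimal sketch (the checkpoint name is only a stand-in): passing from_pt=True converts the PyTorch weights on the fly, and this now also works when the checkpoint is split into multiple shards:

from transformers import FlaxAutoModelForCausalLM

# `from_pt=True` loads the PyTorch checkpoint (sharded or not) and converts it to Flax.
model = FlaxAutoModelForCausalLM.from_pretrained("gpt2", from_pt=True)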

TensorFlow improvements

The TensorFlow examples have been rewritten to support all recent features developed in the past months.

DeBERTa-v2 is now trainable with XLA.
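
A minimal sketch, using a small randomly initialized configuration so it runs without downloading weights; XLA is switched on through Keras' jit_compile flag:

import tensorflow as tf
from transformers import DebertaV2Config, TFDebertaV2ForSequenceClassification

# Small randomly initialized model for illustration only.
config = DebertaV2Config(
    hidden_size=128, num_hidden_layers=2, num_attention_heads=2, intermediate_size=256, num_labels=2
)
model = TFDebertaV2ForSequenceClassification(config)
# XLA compilation is enabled via Keras' `jit_compile` flag.
model.compile(optimizer=tf.keras.optimizers.Adam(3e-5), jit_compile=True)
# model.fit(train_dataset, epochs=1)  # train_dataset: a tf.data.Dataset of tokenized batches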

Documentation changes

Improvements and bugfixes

  • sentencepiece shouldn't be required for the fast LayoutXLM tokenizer by @LysandreJik in #18320
  • Fix sacremoses soft dependency for Transformers XL by @sgugger in #18321
  • Owlvit test fixes by @alaradirik in #18303
  • [Flax] Fix incomplete batches in example scripts by @sanchit-gandhi in #17863
  • start from 1.12, torch_ccl is renamed as oneccl_bindings_for_pytorch … by @sywangyi in #18229
  • Update feature extractor docs by @stevhliu in #18324
  • fixed typo by @banda-larga in #18331
  • updated translation by @banda-larga in #18333
  • Updated _toctree.yml by @nickprock in #18337
  • Update automatic_speech_recognition.py by @bofenghuang in #18339
  • Fix codeparrot deduplication - ignore whitespaces by @loubnabnl in #18023
  • Remove Flax OPT from doctest for now by @ydshieh in #18338
  • Include tensorflow-aarch64 as a candidate by @ankrgyl in #18345
  • [BLOOM] Deprecate position_ids by @thomasw21 in #18342
  • Migrate metric to Evaluate library for tensorflow examples by @VijayKalmath in #18327
  • Migrate metrics used in flax examples to Evaluate by @VijayKalmath in #18348
  • [Docs] Fix Speech Encoder Decoder doc sample by @sanchit-gandhi in #18346
  • Fix OwlViT torchscript tests by @ydshieh in #18347
  • Fix some doctests by @ydshieh in #18359
  • [FX] Symbolic trace for Bloom by @michaelbenayoun in #18356
  • Fix TFSegformerForSemanticSegmentation doctest by @ydshieh in #18362
  • fix FSDP ShardedGradScaler by @pacman100 in #18358
  • Migrate metric to Evaluate in Pytorch examples by @atturaioe in #18369
  • Correct the spelling of bleu metric by @ToluClassics in #18375
  • Remove pt-like calls on tf tensor by @amyeroberts in #18393
  • Fix from_pretrained kwargs passing by @YouJiacheng in #18387
  • Add a check regarding the number of occurrences of ``` by @ydshieh in #18389
  • Add evaluate to test dependencies by @sgugger in #18396
  • Fix OPT doc tests by @ArthurZucker in #18365
  • Fix doc tests by @NielsRogge in #18397
  • Add balanced strategies for device_map in from_pretrained by @sgugger in #18349
  • Fix docs by @NielsRogge in #18399
  • Adding fine-tuning models to LUKE by @ikuyamada in #18353
  • Fix ROUGE add example check and update README by @sgugger in #18398
  • Add Flax BART pretraining script by @duongna21 in #18297
  • Rewrite push_to_hub to use upload_files by @sgugger in #18366
  • Layoutlmv2 tesseractconfig by @kelvinAI in #17733
  • fix: create a copy for tokenizer object by @YBooks in #18408
  • Fix uninitialized parameter in conformer relative attention. by @PiotrDabkowski in #18368
  • Fix the hub user name in a longformer doctest checkpoint by @ydshieh in #18418
  • Change audio kwarg to images in TROCR processor by @ydshieh in #18421
  • update maskformer docs by @alaradirik in #18423
  • Fix test_load_default_pipelines_tf test error by @ydshieh in #18422
  • fix run_clip README by @ydshieh in #18332
  • Improve generate docstring by @JoaoLages in #18198
  • Accept trust_remote_code and ignore it in PreTrainedModel.from_pretrained by @ydshieh in #18428
  • Update pipeline word heuristic to work with whitespace in token offsets by @davidbenton in #18402
  • Add programming languages by @cakiki in #18434
  • fixing error when using sharded ddp by @pacman100 in #18435
  • Update _toctree.yml by @stevhliu in #18440
  • support ONNX export of XDropout in deberta{,_v2} and sew_d by @garymm in #17502
  • Add Spanish translation of run_scripts.mdx by @donelianc in #18415
  • Update no trainer scripts for language modeling and image classification examples by @nandwalritik in #18443
  • Update pinned hhub version by @osanseviero in #18448
  • Fix failing tests for XLA generation in TF by @dsuess in #18298
  • add zero-shot obj detection notebook to docs by @alaradirik in #18453
  • fix: keras fit tests for segformer tf and minor refactors. by @sayakpaul in #18412
  • Fix torch version comparisons by @LSinev in #18460
  • [BLOOM] Clean modeling code by @thomasw21 in #18344
  • change shape to support dynamic batch input in tf.function XLA generate for tf serving by @nlpcat in #18372
  • HFTracer.trace can now take callables and torch.nn.Module by @michaelbenayoun in #18457
  • Update no trainer scripts for multiple-choice by @kiansierra in #18468
  • Fix load of model checkpoints in the Trainer by @sgugger in #18470
  • Add FX support for torch.baddbmm and torch.Tensor.baddbmm by @thomasw21 in #18363
  • Add machine type in the artifact of Examples directory job by @ydshieh in #18459
  • Update no trainer examples for QA and Semantic Segmentation by @kiansierra in #18474
  • Add TF_MODEL_FOR_SEMANTIC_SEGMENTATION_MAPPING by @ydshieh in #18469
  • Fixing issue where generic model types wouldn't load properly with the pipeline by @Narsil in #18392
  • Fix TFSwinSelfAttention to have relative position index as non-trainable weight by @harrydrippin in #18226
  • Refactor TFSwinLayer to increase serving compatibility by @harrydrippin in #18352
  • Add TF prefix to TF-Res test class by @ydshieh in #18481
  • Remove py.typed by @sgugger in #18485
  • Fix pipeline tests by @sgugger in #18487
  • Use new huggingface_hub tools for download models by @sgugger in #18438
  • Fix test_dbmdz_english by updating expected values by @ydshieh in #18482
  • Move cache folder to huggingface/hub for consistency with hf_hub by @sgugger in #18492
  • Update some expected values in quicktour.mdx for resampy 0.3.0 by @ydshieh in #18484
  • disable Onnx test for google/long-t5-tglobal-base by @ydshieh in #18454
  • Typo reported by Joel Grus on TWTR by @julien-c in #18493
  • Just re-reading the whole doc every couple of months 😬 by @julien-c in #18489
  • transformers-cli login => huggingface-cli login by @julien-c in #18490
  • Add seed setting to image classification example by @regisss in #18519
  • [DX fix] Fixing QA pipeline streaming a dataset. by @Narsil in #18516
  • Clean up hub by @sgugger in #18497
  • update fsdp docs by @pacman100 in #18521
  • Fix compatibility with 1.12 by @sgugger in #17925
  • Specify en in doc-builder README example by @ankrgyl in #18526
  • New cache fixes: add safeguard before looking in folders by @sgugger in #18522
  • unpin resampy by @ydshieh in #18527
  • ✨ update to use interlibrary links instead of Markdown by @stevhliu in #18500
  • Add example of multimodal usage to pipeline tutorial by @stevhliu in #18498
  • [VideoMAE] Add model to doc tests by @NielsRogge in #18523
  • Update perf_train_gpu_one.mdx by @mishig25 in #18532
  • Update no_trainer.py scripts to include accelerate gradient accumulation wrapper by @Rasmusafj in #18473
  • Add Spanish translation of converting_tensorflow_models.mdx by @donelianc in #18512
  • Spanish translation of summarization.mdx by @AguilaCudicio in #15947
  • Let's not cast them all by @younesbelkada in #18471
  • fix: data2vec-vision Onnx ready-made configuration. by @NikeNano in #18427
  • Add mt5 onnx config by @ChainYo in #18394
  • Minor update of run_call_with_unpacked_inputs by @ydshieh in #18541
  • BART - Fix attention mask device issue on copied models by @younesbelkada in #18540
  • Adding a new align_to_words param to qa pipeline. by @Narsil in #18010
  • 📝 update metric with evaluate by @stevhliu in #18535
  • Restore _init_weights value in no_init_weights by @YouJiacheng in #18504
  • 📝 update documentation build section by @stevhliu in #18548
  • Preserve hub-related kwargs in AutoModel.from_pretrained by @sgugger in #18545
  • Use commit hash to look in cache instead of calling head by @sgugger in #18534
  • Update philosophy to include other preprocessing classes by @stevhliu in #18550
  • Properly move cache when it is not in default path by @sgugger in #18563
  • Adds CLIP to models exportable with ONNX by @unography in #18515
  • raise atol for MT5OnnxConfig by @ydshieh in #18560
  • fix string by @mrwyattii in #18568
  • Segformer TF: fix output size in documentation by @joihn in #18572
  • Fix resizing bug in OWL-ViT by @alaradirik in #18573
  • Fix LayoutLMv3 documentation by @pocca2048 in #17932
  • Change BartLearnedPositionalEmbedding's forward method signature to support Opacus training by @donebydan in #18486
  • german docs translation by @flozi00 in #18544
  • Deberta V2: Fix critical trace warnings to allow ONNX export by @iiLaurens in #18272
  • [FX] _generate_dummy_input supports audio-classification models for labels by @michaelbenayoun in #18580
  • Fix docstrings with last version of hf-doc-builder styler by @sgugger in #18581
  • fix owlvit tests, update docstring examples by @alaradirik in #18586
  • Return the permuted hidden states if return_dict=True by @amyeroberts in #18578
  • Add type hints for ViLT models by @donelianc in #18577
  • update doc for perf_train_cpu_many, add intel mpi introduction by @sywangyi in #18576
  • typos by @stas00 in #18594
  • FSDP bug fix for load_state_dict by @pacman100 in #18596
  • Add TFAutoModelForSemanticSegmentation to the main __init__.py by @ydshieh in #18600
  • Fix URLs by @NielsRogge in #18604
  • Update BLOOM parameter counts by @Muennighoff in #18531
  • [doc] fix anchors by @stas00 in #18591
  • [fsmt] deal with -100 indices in decoder ids by @stas00 in #18592
  • small change by @younesbelkada in #18584
  • Flax Remat for LongT5 by @KMFODA in #17994
  • Change scheduled CIs to use torch 1.12.1 by @ydshieh in #18644
  • Add checks for some workflow jobs by @ydshieh in #18583
  • TF: Fix generation repetition penalty with XLA by @gante in #18648
  • Update longt5.mdx by @flozi00 in #18634
  • Update run_translation_no_trainer.py by @zhoutang776 in #18637
  • [bnb] Minor modifications by @younesbelkada in #18631
  • Examples: add Bloom support for token classification by @stefan-it in #18632
  • Fix Yolos ONNX export test by @ydshieh in #18606
  • Fix matmul inputs dtype by @JingyaHuang in #18585
  • Update feature extractor methods to enable type cast before normalize by @amyeroberts in #18499
  • Allow users to force TF availability by @Rocketknight1 in #18650
  • [LongT5] Correct docs long t5 by @patrickvonplaten in #18669
  • Generate: validate model_kwargs on FLAX (and catch typos in generate arguments) by @gante in #18653
  • Ping detectron2 for CircleCI tests by @ydshieh in #18680
  • Rename method to avoid clash with property by @amyeroberts in #18677
  • Rename second input dimension from "sequence" to "num_channels" for CV models by @regisss in #17976
  • Fix repo consistency by @lewtun in #18682
  • Fix breaking change in onnxruntime for ONNX quantization by @severinsimmler in #18336
  • Add evaluate to examples requirements by @muellerzr in #18666
  • [bnb] Move documentation by @younesbelkada in #18671
  • Add an examples folder for code downstream tasks by @loubnabnl in #18679
  • model.tie_weights() should be applied after accelerator.prepare() by @Gladiator07 in #18676
  • Generate: add missing **model_kwargs in sample tests by @gante in #18696
  • Temp fix for broken detectron2 import by @patrickvonplaten in #18699
  • [Hotfix] pin detectron2 5aeb252 to avoid test fix by @ydshieh in #18701
  • Fix Data2VecVision ONNX test by @ydshieh in #18587
  • Add missing tokenizer tests - Longformer by @tgadeliya in #17677
  • remove check for main process for trackers initialization by @Gladiator07 in #18706
  • Unpin detectron2 by @ydshieh in #18727
  • Removing warning of model type for microsoft/tapex-base-finetuned-wtq by @Narsil in #18711
  • improve add_tokens docstring by @SaulLu in #18687
  • CLI: Don't check the model head when there is no model head by @gante in #18733
  • Update perf_infer_gpu_many.mdx by @mishig25 in #18744
  • Add minor doc-string change to include hp_name param in hyperparameter_search by @constantin-huetterer in #18700
  • fix pipeline_tutorial.mdx doctest by @ydshieh in #18717
  • Add TF implementation of XGLMModel by @stancld in #16543
  • fixed docstring typos by @JadeKim042386 in #18739
  • add warning to let the user know that the __call__ method is faster than encode + pad for a fast tokenizer by @SaulLu in #18693
  • examples/run_summarization_no_trainer: fixed incorrect param to hasattr by @rahular in #18720
  • Add ONNX support for Longformer by @deutschmn in #17176
  • Determine framework automatically before ONNX export by @rachthree in #18615
  • streamlining 'checkpointing_steps' parsing by @rahular in #18755
  • CLI: Improved error control and updated hub requirement by @gante in #18752
  • [VisionEncoderDecoder] Add gradient checkpointing by @patrickvonplaten in #18697
  • [Wav2vec2 + LM Test] Improve wav2vec2 with lm tests and make torch version dependent for now by @patrickvonplaten in #18749
  • Fix incomplete outputs of FlaxBert by @duongna21 in #18772
  • Fix broken link DeepSpeed documentation link by @philschmid in #18783
  • fix missing block when there is no failure by @ydshieh in #18775
  • fix a possible typo in auto feature extraction by @fcakyon in #18779
  • Fix memory leak issue in torch_fx tests by @ydshieh in #18547
  • Fix mock in test_cached_files_are_used_when_internet_is_down by @Wauplin in #18804
  • Add SegFormer and ViLT links by @NielsRogge in #18808
  • send model to the correct device by @ydshieh in #18800
  • Revert to and safely handle flag in owlvit config by @amyeroberts in #18750
  • Add docstring for BartForCausalLM by @ekagra-ranjan in #18795
  • up by @qqaatw in #18805
  • [Swin, Swinv2] Fix attn_mask dtype by @NielsRogge in #18803
  • Run tests if skip condition not met by @amyeroberts in #18764
  • Remove ViltForQuestionAnswering from check_repo by @NielsRogge in #18762
  • Adds OWLViT to models exportable with ONNX by @unography in #18588
  • Adds GroupViT to models exportable with ONNX by @unography in #18628
  • LayoutXLMProcessor: ensure 1-to-1 mapping between samples and images, and add test for it by @anthony2261 in #18774
  • Added Docstrings for Deberta and DebertaV2 [PyTorch] by @Tegzes in #18610
  • Improving the documentation for "word", within the pipeline. by @Narsil in #18763
  • Disable nightly CI temporarily by @ydshieh in #18820
  • Pin max tf version by @gante in #18818
  • Fix cost condition in DetrHungarianMatcher and YolosHungarianMatcher to allow zero-cost by @kongzii in #18647
  • oob performance improvement for cpu DDP by @sywangyi in #18595
  • Warn on TPUs when the custom optimizer and model device are not the same by @muellerzr in #18668
  • Update location identification by @LysandreJik in #18834
  • fix bug: register_for_auto_class should be defined on TFPreTrainedModel instead of TFSequenceSummary by @azonti in #18607
  • [DETR] Add num_channels attribute by @NielsRogge in #18714
  • Pin ffspec by @sgugger in #18837
  • Improve GPT2 doc by @ekagra-ranjan in #18787
  • Add an option to HfArgumentParser.parse_{dict,json_file} to raise an Exception when there extra keys by @FelixSchneiderZoom in #18692
  • Improve Text Generation doc by @ekagra-ranjan in #18788
  • Add SegFormer ONNX support by @NielsRogge in #18006
  • Add security warning about the from_pretrained() method by @lewtun in #18801
  • Owlvit memory leak fix by @alaradirik in #18734
  • Create pipeline_tutorial.mdx german docs by @flozi00 in #18625
  • Unpin fsspec by @albertvillanova in #18846
  • Delete state_dict to release memory as early as possible by @ydshieh in #18832
  • Generate: smaller TF serving test by @gante in #18840
  • add a script to get time info. from GA workflow jobs by @ydshieh in #18822
  • Pin rouge_score by @albertvillanova in #18247
  • Minor typo in prose of model outputs documentation. by @pcuenca in #18848
  • reflect max_new_tokens in Seq2SeqTrainer by @kumapo in #18786
  • Adds timeout argument to training_args to avoid socket timeouts in DDP by @gugarosa in #18562
  • Cache results of is_torch_tpu_available() by @comaniac in #18777
  • Tie weights after preparing the model in run_clm by @sgugger in #18855
  • Pin revision for LayoutLMForQuestionAnswering and TFLayoutLMForQuestionAnswering tests by @ankrgyl in #18854
  • Split docs on modality by @stevhliu in #18205
  • if learning rate is a tensor, get item (float) by @kmckiern in #18861
  • Fix naming issue with ImageToText pipeline by @OlivierDehaene in #18864
  • [LayoutLM] Add clarification to docs by @NielsRogge in #18716
  • Add OWL-ViT to the appropriate section by @NielsRogge in #18867
  • Clean up utils.hub using the latest from hf_hub by @sgugger in #18857
  • pin Slack SDK to 3.18.1 to avoid failing issue by @ydshieh in #18869
  • Fix number of examples for iterable datasets in multiprocessing by @sgugger in #18856
  • postpone bnb load until it's needed by @stas00 in #18859
  • A script to download artifacts and perform CI error statistics by @ydshieh in #18865
  • Remove cached torch_extensions on CI runners by @ydshieh in #18868
  • Update docs landing page by @stevhliu in #18590
  • Finetune guide for semantic segmentation by @stevhliu in #18640
  • Add Trainer to quicktour by @stevhliu in #18723
  • TF: TFMarianMTModel final logits bias as a layer by @gante in #18833
  • Mention TF and Flax checkpoints by @LysandreJik in #18894
  • Correct naming pegasus x by @patrickvonplaten in #18896
  • Update perf_train_gpu_one.mdx by @thepurpleowl in #18442
  • Add type hints to XLM-Roberta-XL models by @asofiaoliveira in #18475
  • Update Chinese documentation by @zkep in #18893
  • Generate: get the correct beam index on eos token by @gante in #18851
  • Mask t5 relative position bias then head pruned by @hadaev8 in #17968
  • updating gather function with gather_for_metrics in run_wav2vec2_pretraining by @arun99481 in #18877
  • Fix decode_input_ids to bare T5Model and improve doc by @ekagra-ranjan in #18791
  • Fix test_tf_encode_plus_sent_to_model for LayoutLMv3 by @ydshieh in #18898
  • fixes bugs to handle non-dict output by @alaradirik in #18897
  • Further reduce the number of calls to head for cached objects by @sgugger in #18871
  • unpin slack_sdk version by @ydshieh in #18901
  • Fix incorrect size of input for 1st strided window length in Perplexity of fixed-length models by @ekagra-ranjan in #18906
  • [VideoMAE] Improve code examples by @NielsRogge in #18919
  • Add checks for more workflow jobs by @ydshieh in #18905
  • Accelerator end training by @nbroad1881 in #18910
  • update the train_batch_size in case HPO change batch_size_per_device by @sywangyi in #18918
  • Update TF fine-tuning docs by @Rocketknight1 in #18654
  • TF: final bias as a layer in seq2seq models (replicate TFMarian fix) by @gante in #18903
  • remove _create_and_check_torch_fx_tracing in specific test files by @ydshieh in #18667
  • [DeepSpeed ZeRO3] Fix performance degradation in sharded models by @tjruwase in #18911
  • pin TF 2.9.1 for self-hosted CIs by @ydshieh in #18925
  • Fix XLA fp16 and bf16 error checking by @ymwangg in #18913
  • Starts on a list of external deps required for dev by @colindean in #18929
  • Add image height and width to ONNX dynamic axes by @lewtun in #18915
  • Skip some doctests in quicktour by @stevhliu in #18927
  • Fix LayoutXLM wrong link in README by @Devlee247 in #18932
  • Update translation requests contact by @NimaBoscarino in #18941
  • [JAX] Replace all jax.tree_* calls with jax.tree_util.tree_* by @sanchit-gandhi in #18361
  • Neptune.ai integration improvements by @Raalsky in #18934
  • Generate: Simplify is_pad_token_not_equal_to_eos_token_id by @ekagra-ranjan in #18933
  • Fix train_step, test_step and tests for CLIP by @Rocketknight1 in #18684
  • Exit early in load if no weights are in the sharded state dict by @sgugger in #18937
  • update black target version by @BramVanroy in #18955
  • RFC: Replace custom TF embeddings by Keras embeddings by @gante in #18939
  • TF: unpin maximum TF version by @gante in #18917
  • Revert "TF: unpin maximum TF version by @sgugger in #18917)"
  • remove unused activation dropout by @shijie-wu in #18842
  • add DDP HPO support for sigopt by @sywangyi in #18931
  • Remove decoder_position_ids from check_decoder_model_past_large_inputs by @ydshieh in #18980
  • create Past CI results as tables for GitHub issue by @ydshieh in #18953
  • Remove dropout in embedding layer of OPT by @shijie-wu in #18845
  • Fix TF start docstrings by @Rocketknight1 in #18991
  • Align try_to_load_from_cache with huggingface_hub by @sgugger in #18966
  • Fix tflongformer int dtype by @Rocketknight1 in #18907
  • TF: correct TFBart embeddings weights name when load_weight_prefix is passed by @gante in #18993
  • fix checkpoint name for wav2vec2 conformer by @ydshieh in #18994
  • added type hints by @daspartho in #18996
  • TF: TF 2.10 unpin + related onnx test skips by @gante in #18995
  • Fixed typo by @tnusser in #18921
  • Removed issue in wav2vec link by @chrisemezue in #18945
  • Fix MaskFormerFeatureExtractor instance segmentation preprocessing bug by @alaradirik in #18997
  • Add type hints for M2M by @daspartho in #18998
  • Fix tokenizer for XLMRobertaXL by @ydshieh in #19004
  • Update default revision for document-question-answering by @ankrgyl in #18938
  • Fixed bug which caused overwrite_cache to always be True by @rahular in #19000
  • add DDP HPO support for optuna by @sywangyi in #19002
  • add missing require_tf for TFOPTGenerationTest by @ydshieh in #19010
  • Re-add support for single url files in objects download by @sgugger in #19014

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @nandwalritik
    • Add swin transformer v2 (#17469)
    • Update no trainer scripts for language modeling and image classification examples (#18443)
  • @ankrgyl
    • Include tensorflow-aarch64 as a candidate (#18345)
    • Specify en in doc-builder README example (#18526)
    • Add LayoutLMForQuestionAnswering model (#18407)
    • Pin revision for LayoutLMForQuestionAnswering and TFLayoutLMForQuestionAnswering tests (#18854)
    • Add DocumentQuestionAnswering pipeline (#18414)
    • Update default revision for document-question-answering (#18938)
  • @ikuyamada
    • Adding fine-tuning models to LUKE (#18353)
  • @duongna21
    • Add Flax BART pretraining script (#18297)
    • Fix incomplete outputs of FlaxBert (#18772)
  • @donelianc
    • Add Spanish translation of run_scripts.mdx (#18415)
    • Add Spanish translation of converting_tensorflow_models.mdx (#18512)
    • Add type hints for ViLT models (#18577)
  • @sayakpaul
    • fix: keras fit tests for segformer tf and minor refactors. (#18412)
    • TensorFlow MobileViT (#18555)
  • @flozi00
    • german docs translation (#18544)
    • Update longt5.mdx (#18634)
    • Create pipeline_tutorial.mdx german docs (#18625)
  • @stancld
    • Add TF implementation of XGLMModel (#16543)
  • @ChrisFugl
    • [LayoutLMv3] Add TensorFlow implementation (#18678)
  • @zphang
  • @nghuyong
    • add task_type_id to BERT to support ERNIE-2.0 and ERNIE-3.0 models (#18686)
  • @SO0529
    • Add support for Japanese GPT-NeoX-based model by ABEJA, Inc. (#18814)
