github huggingface/transformers v4.49.0
v4.49.0: Helium, Qwen2.5-VL, SuperGlue, Granite Vision, Zamba2, GOT-OCR 2.0, DAB-DETR, Depth Pro, RT-DETRv2, GPTQModel


New models

Helium

Helium-1 preview is a lightweight language model with 2B parameters, targeting edge and mobile devices. It supports English, French, German, Italian, Portuguese, and Spanish.


Qwen2.5-VL

The Qwen2.5-VL model is an update to Qwen2-VL from the Qwen team at Alibaba Group.

The abstract from this update is the following:

Qwen2.5-VL marks a major step forward from Qwen2-VL, built upon the latest Qwen2.5 LLM. We’ve accelerated training and testing through the strategic implementation of window attention within the ViT. The ViT architecture itself has been refined with SwiGLU and RMSNorm, aligning it more closely with the LLM’s structure. A key innovation is the expansion of native dynamic resolution to encompass the temporal dimension, in addition to spatial aspects. Furthermore, we’ve upgraded MRoPE, incorporating absolute time alignment on the time axis to allow the model to effectively capture temporal dynamics, regardless of frame rate, leading to superior video understanding.
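The absolute time alignment in MRoPE can be illustrated with a small sketch. It assumes a hypothetical `tokens_per_second` of 2 and one temporal position per frame (a simplification, since Qwen2.5-VL groups frames into temporal patches). The temporal position ID is derived from a frame's timestamp rather than its index, so the same moment gets the same ID at any frame rate:

```python
def frame_index_ids(num_frames):
    """Index-based temporal IDs: they depend on how many frames were sampled."""
    return list(range(num_frames))


def time_aligned_ids(num_frames, fps, tokens_per_second=2.0):
    """Time-aligned temporal IDs: timestamp in seconds times an assumed rate, truncated to int."""
    return [int((i / fps) * tokens_per_second) for i in range(num_frames)]


# The same 2-second clip sampled at 2 fps and at 4 fps.
slow = time_aligned_ids(num_frames=4, fps=2.0)
fast = time_aligned_ids(num_frames=8, fps=4.0)
print(slow)  # [0, 1, 2, 3]
print(fast)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

With index-based IDs, the frame at t = 1 s would get ID 2 in the slow clip and ID 4 in the fast one; with time-aligned IDs, both get ID 2.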


SuperGlue

The SuperGlue model was proposed in SuperGlue: Learning Feature Matching with Graph Neural Networks by Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz and Andrew Rabinovich.

The model matches two sets of interest points, one detected in each of two images. Paired with the SuperPoint model, it can be used to match two images and estimate the pose between them, which makes it useful for tasks such as image matching and homography estimation.
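SuperGlue turns a matrix of match scores into a soft assignment with Sinkhorn normalization (an optimal transport step). The sketch below shows only that core step in plain Python. It omits the graph neural network that produces the scores and the extra "dustbin" row and column SuperGlue uses for unmatched points; the score matrix and `threshold` are made-up values:

```python
import math


def sinkhorn(scores, num_iters=50):
    """Turn raw match scores into a (near) doubly stochastic soft assignment."""
    p = [[math.exp(s) for s in row] for row in scores]
    for _ in range(num_iters):
        row_sums = [sum(row) for row in p]
        p = [[v / s for v in row] for row, s in zip(p, row_sums)]  # each row sums to 1
        col_sums = [sum(col) for col in zip(*p)]
        p = [[v / s for v, s in zip(row, col_sums)] for row in p]  # each column sums to 1
    return p


def mutual_matches(p, threshold=0.2):
    """Keep (i, j) only if j is i's best match, i is j's best match, and the score is confident."""
    matches = []
    for i, row in enumerate(p):
        j = max(range(len(row)), key=lambda k: row[k])
        best_i = max(range(len(p)), key=lambda k: p[k][j])
        if best_i == i and row[j] > threshold:
            matches.append((i, j))
    return matches


# Hypothetical scores between 3 keypoints in image A (rows) and 3 in image B (columns).
scores = [[5.0, 0.0, 0.0], [0.0, 0.0, 5.0], [0.0, 5.0, 0.0]]
print(mutual_matches(sinkhorn(scores)))  # [(0, 0), (1, 2), (2, 1)]
```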


Granite Vision Support

The Granite Vision model is a variant of LLaVA-NeXT, leveraging a Granite language model alongside a SigLIP visual encoder. It uses multiple concatenated vision hidden states as its image features, similar to VipLLaVA. It also uses a larger set of image grid pinpoints than the original LLaVA-NeXT models to support additional aspect ratios.
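To show how a larger set of grid pinpoints supports more aspect ratios, here is a sketch of the resolution-selection rule used by LLaVA-NeXT-style image processors: pick the candidate that preserves the most original pixels after an aspect-preserving resize, breaking ties by the least padding. The pinpoint list is hypothetical, not Granite Vision's actual configuration, and all sizes are (height, width):

```python
def select_best_resolution(original_size, possible_resolutions):
    """Pick the (height, width) grid that keeps the most original pixels, then wastes the fewest."""
    orig_h, orig_w = original_size
    best_fit = None
    max_effective, min_wasted = -1, float("inf")
    for height, width in possible_resolutions:
        scale = min(width / orig_w, height / orig_h)  # aspect-preserving resize factor
        down_w, down_h = int(orig_w * scale), int(orig_h * scale)
        effective = min(down_w * down_h, orig_w * orig_h)  # upscaling adds no real detail
        wasted = width * height - effective  # area that ends up as padding
        if effective > max_effective or (effective == max_effective and wasted < min_wasted):
            best_fit, max_effective, min_wasted = (height, width), effective, wasted
    return best_fit


PINPOINTS = [(384, 384), (384, 768), (768, 384), (768, 768)]  # hypothetical grid
print(select_best_resolution((256, 512), PINPOINTS))  # wide image -> (384, 768)
print(select_best_resolution((256, 256), PINPOINTS))  # square image -> (384, 384)
```

Adding more pinpoints gives the rule more aspect ratios to choose from, so less of each image is lost to downscaling or padding.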

Zamba2

Zamba2 is a large language model (LLM) trained by Zyphra, and made available under an Apache 2.0 license.

Zamba2-1.2B, Zamba2-2.7B and Zamba2-7B are hybrid models combining state-space models (specifically Mamba) and transformer layers, and were trained using next-token prediction. Zamba2 uses shared transformer layers after every 6 Mamba blocks. It uses the Mistral v0.1 tokenizer. Zyphra arrived at this architecture after a series of small-scale ablations. The three models were pre-trained on between 2T and 3T tokens.
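The weight sharing can be sketched as a layer layout in which a single transformer block object is reused after every sixth Mamba block. This is a simplification: `Block` is a hypothetical placeholder, and real Zamba2 checkpoints may differ in how many shared blocks they alternate between and in what surrounds them:

```python
class Block:
    """Hypothetical placeholder for a layer; only its name matters here."""

    def __init__(self, name):
        self.name = name


def build_layers(num_mamba_blocks, shared_every=6):
    shared = Block("shared_transformer")  # one set of weights...
    layers = []
    for i in range(1, num_mamba_blocks + 1):
        layers.append(Block(f"mamba_{i}"))  # each Mamba block has its own weights
        if i % shared_every == 0:
            layers.append(shared)  # ...reused at every insertion point
    return layers


layers = build_layers(12)
print([layer.name for layer in layers])
```

Because the shared entries are the same object, adding more insertion points increases depth without adding transformer parameters.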


GOT-OCR 2.0

GOT-OCR2 works on a wide range of tasks, including plain document OCR, scene text OCR, formatted document OCR, and even OCR for tables, charts, mathematical formulas, geometric shapes, molecular formulas and sheet music. While this implementation of the model outputs only plain text, the outputs can be further processed into the desired format with packages such as pdftex, mathpix, matplotlib, tikz, verovio or pyecharts. The model can also be used for interactive OCR, where the user specifies the region to recognize by providing the coordinates or the color of the region's bounding box.


DAB-DETR

DAB-DETR is an enhanced variant of Conditional DETR. It utilizes dynamically updated anchor boxes to provide both a reference query point (x, y) and a reference anchor size (w, h), improving cross-attention computation. This new approach achieves 45.7% AP when trained for 50 epochs with a single ResNet-50 model as the backbone.
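Layer-by-layer anchor refinement can be sketched as follows. The sketch assumes the common DETR-family convention of normalized (x, y, w, h) boxes updated in inverse-sigmoid (logit) space; the per-layer offsets, which the real model predicts, are made-up numbers here:

```python
import math


def inverse_sigmoid(x, eps=1e-5):
    x = min(max(x, eps), 1 - eps)  # keep the log finite
    return math.log(x / (1 - x))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def refine_anchor(anchor, delta):
    """One decoder layer: shift each of (x, y, w, h) in logit space, then map back to (0, 1)."""
    return tuple(sigmoid(inverse_sigmoid(a) + d) for a, d in zip(anchor, delta))


def refine_through_layers(anchor, per_layer_deltas):
    """Each layer starts from the previous layer's anchor, so boxes are updated dynamically."""
    history = [anchor]
    for delta in per_layer_deltas:
        anchor = refine_anchor(anchor, delta)
        history.append(anchor)
    return history


# Made-up offsets for a 3-layer decoder: both the point (x, y) and the size (w, h) move.
start = (0.5, 0.4, 0.2, 0.3)
deltas = [(0.2, -0.1, 0.3, 0.0), (0.05, 0.0, 0.1, 0.1), (0.0, 0.0, 0.02, 0.0)]
for box in refine_through_layers(start, deltas):
    print(tuple(round(v, 3) for v in box))
```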


Depth PRO

DepthPro is a foundation model for zero-shot metric monocular depth estimation, designed to generate high-resolution depth maps with remarkable sharpness and fine-grained details. It employs a multi-scale Vision Transformer (ViT)-based architecture, where images are downsampled, divided into patches, and processed using a shared DINOv2 encoder. The extracted patch-level features are merged, upsampled, and refined using a DPT-like fusion stage, enabling precise depth estimation.
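The multi-scale patching step can be sketched as computing overlapping tile positions at several downsampled resolutions. The input size (1536), patch size (384), scales, and overlap ratios below are illustrative assumptions, not values read from the Depth Pro configuration:

```python
def tile_starts(side, patch, overlap):
    """Start offsets of square tiles of size `patch` covering `side` pixels with the given overlap ratio."""
    if side <= patch:
        return [0]
    stride = max(1, int(patch * (1 - overlap)))
    starts = list(range(0, side - patch + 1, stride))
    if starts[-1] + patch < side:  # make sure the last tile reaches the image border
        starts.append(side - patch)
    return starts


def multiscale_patches(side=1536, patch=384, scales=((1.0, 0.25), (0.5, 0.5), (0.25, 0.0))):
    """Return (scale, top, left) for every patch; all patches would go through one shared encoder."""
    patches = []
    for scale, overlap in scales:
        scaled_side = int(side * scale)  # image downsampled to this resolution
        starts = tile_starts(scaled_side, patch, overlap)
        patches.extend((scale, top, left) for top in starts for left in starts)
    return patches


print(len(multiscale_patches()))  # 5x5 + 3x3 + 1x1 = 35 patches
```

Because every patch has the same size, a single encoder can process all scales in one batch; the coarse scale gives global context and the fine scale gives detail.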

RT-DETRv2

An improved Real-Time DEtection TRansformer (RT-DETR). RT-DETRv2 refines RT-DETR by introducing selective multi-scale feature extraction and a discrete sampling operator for broader deployment compatibility. These improvements yield a 0.3 to 1.4 point increase in mAP on the COCO dataset, while maintaining the same parameter count and frames-per-second (FPS) performance.
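The difference between bilinear sampling (as performed by operators such as `grid_sample`) and a discrete sampling operator can be sketched on a tiny feature map. Here "discrete" is assumed to mean rounding the sampling location to the nearest grid cell, which replaces interpolation with a plain index lookup at some cost in precision:

```python
import math


def bilinear_sample(feature, x, y):
    """Interpolate between the 4 neighbouring cells (assumes 0 <= x <= W-1 and 0 <= y <= H-1)."""
    h, w = len(feature), len(feature[0])
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def discrete_sample(feature, x, y):
    """Read the single nearest cell: no interpolation, just clamped integer indexing."""
    h, w = len(feature), len(feature[0])
    xi = min(max(int(math.floor(x + 0.5)), 0), w - 1)
    yi = min(max(int(math.floor(y + 0.5)), 0), h - 1)
    return feature[yi][xi]


feature = [[0.0, 1.0], [2.0, 3.0]]
print(bilinear_sample(feature, 0.75, 0.25))  # 1.25
print(discrete_sample(feature, 0.75, 0.25))  # 1.0
```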

Transformers-CLI

Transformers' CLI welcomes a new command: chat. This command starts a conversation with the model of your choosing directly in your terminal.

This feature previously existed in TRL and has been migrated to transformers for easier usage.


Processor Standardization

An ongoing effort is to standardize the image processors so that their APIs are equivalent. Additionally, the processors are given a fast variant so that they never become a bottleneck in image processing pipelines.

In this release, several processors have been standardized and have received a fast version.

Breaking changes

DPT segmentation maps

DPT image processors previously did not support segmentation_maps and accepted only images. This has been fixed.
Because this adds an argument to the preprocess method, users who pass arguments positionally to that method may see changed behavior. We recommend passing keyword arguments to such methods so that newly added features do not affect your code.

  • 🔴 🔴 🔴 Added segmentation maps support for DPT image processor by @simonreise in #34345
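The following sketch shows why a positional call can silently change meaning when a new parameter is inserted into a signature. Both functions are hypothetical stand-ins, not the real DPT `preprocess` signature:

```python
def preprocess_v1(images, do_resize=True):
    """Hypothetical 'before' signature."""
    return {"images": images, "do_resize": do_resize}


def preprocess_v2(images, segmentation_maps=None, do_resize=True):
    """Hypothetical 'after' signature: a new parameter sits right after `images`."""
    return {"images": images, "segmentation_maps": segmentation_maps, "do_resize": do_resize}


# Positional call: before, False meant do_resize; now it is bound to segmentation_maps.
print(preprocess_v1(["img"], False))
print(preprocess_v2(["img"], False))

# Keyword call: means the same thing under both signatures.
print(preprocess_v2(["img"], do_resize=False))
```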

Image classification pipeline and single vs multi-label

The problem_type in the config.json file was read incorrectly by the pipeline, which applied the multi-label activation (sigmoid) to single-label models and the single-label activation (softmax) to multi-label models. This has been fixed.

  • 🚨🚨🚨 image-classification pipeline single-label and multi-label prob type squashing fns (sigmoid vs softmax) are backwards by @rwightman in #35848
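The intended mapping can be sketched in plain Python: softmax for `single_label_classification` (classes compete and the scores sum to 1) and an independent sigmoid per label for `multi_label_classification`. This is a simplified illustration, not the pipeline's actual post-processing code:

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def postprocess(logits, problem_type):
    if problem_type == "single_label_classification":
        return softmax(logits)  # exactly one label is correct
    if problem_type == "multi_label_classification":
        return [sigmoid(x) for x in logits]  # each label is an independent yes/no
    raise ValueError(f"unsupported problem_type: {problem_type}")


print(postprocess([3.0, 3.0], "single_label_classification"))  # two equal shares that sum to 1
print(postprocess([3.0, 3.0], "multi_label_classification"))  # both labels independently likely
```

Swapping the two functions is exactly the bug: a multi-label model with two strongly positive labels would see each capped near 0.5 by softmax.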

Fixing the LayerNorm beta/gamma renames

The description of the pull request linked below is the easiest way to understand the problem, why it exists, and how it is solved:

  • 🚨🚨🚨 An attempt to fix #29554. Include 'LayerNorm.' in gamma/beta rename scope, optimize string search. by @rwightman in #35615
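A simplified sketch of the difference: renaming every key that contains `gamma` or `beta` also rewrites unrelated parameters (for example a layer-scale `gamma`), whereas scoping the rename to `LayerNorm.` leaves them alone. The function names are hypothetical, and this is not the library's actual loading code:

```python
def rename_key_unscoped(key):
    """Sketch of the old behaviour: any 'gamma'/'beta' substring is rewritten."""
    return key.replace("gamma", "weight").replace("beta", "bias")


def rename_key_scoped(key):
    """Sketch of the fixed behaviour: only LayerNorm parameters are rewritten."""
    if "LayerNorm.gamma" in key:
        return key.replace("LayerNorm.gamma", "LayerNorm.weight")
    if "LayerNorm.beta" in key:
        return key.replace("LayerNorm.beta", "LayerNorm.bias")
    return key


for key in ["encoder.LayerNorm.gamma", "blocks.0.ls1.gamma"]:
    print(key, "->", rename_key_unscoped(key), "|", rename_key_scoped(key))
```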

VLM cleanup

The ignore_index property of the llava configuration has been removed as it was not serving a purpose.

Quantization

Quantization has received several improvements and fixes, including the contribution of FP8 quantization and the HIGGS quantization interface.

Additionally, we're replacing the AutoGPTQ implementation with GPTQModel from ModelCloud (see the GPTQModel repository).

GPTQModel originated as a major refactor of AutoGPTQ but is now a full drop-in replacement with a cleaner API, up-to-date model support, faster inference, and higher-quality quants.

  • Enable gptqmodel by @jiqing-feng in #35012
  • Split and clean up GGUF quantization tests by @Isotr0py in #35502
  • Display warning for unknown quants config instead of an error by @SunMarc in #35963
  • Adding FP8 Quantization to transformers by @MekkCyber in #36026
  • New HIGGS quantization interfaces, JIT kernel compilation support. by @BlackSamorez in #36148
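FP8 quantization stores values in an 8-bit floating-point format together with a scale. The sketch below emulates per-tensor E4M3 quantization in pure Python: it scales the tensor so its largest magnitude maps to 448 (the E4M3 maximum), then rounds each value to 3 mantissa bits. It is an illustration only; the transformers FP8 integration operates on real FP8 tensors and may use finer-grained (for example block-wise) scales:

```python
import math

E4M3_MAX = 448.0  # largest finite magnitude in the FP8 E4M3 format


def round_to_e4m3(v):
    """Round a float to the nearest value representable in E4M3 (3 mantissa bits)."""
    if v == 0.0:
        return 0.0
    mag = min(abs(v), E4M3_MAX)  # saturate instead of overflowing
    _, e = math.frexp(mag)  # mag = m * 2**e with 0.5 <= m < 1
    exponent = max(e - 1, -6)  # -6 is the smallest normal exponent; below it, spacing is fixed
    step = 2.0 ** (exponent - 3)  # gap between neighbouring values with 3 mantissa bits
    q = round(mag / step) * step  # round half to even
    return math.copysign(min(q, E4M3_MAX), v)


def quantize_per_tensor(values):
    amax = max(abs(x) for x in values) or 1.0
    scale = amax / E4M3_MAX  # one scale for the whole tensor
    return [round_to_e4m3(x / scale) for x in values], scale


def dequantize(q, scale):
    return [x * scale for x in q]


weights = [0.3, -1.7, 2.9, 0.01]
q, scale = quantize_per_tensor(weights)
print(dequantize(q, scale))  # close to the originals, within E4M3 precision
```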

Generate

  • [generate] revert change in Aria: the maximum cache length must match max_length by @gante in #36120
  • 🧹 remove generate-related objects and methods scheduled for removal in v4.48 by @gante in #35677
  • [generate] can instantiate GenerationConfig(cache_implementation="static") by @gante in #35679
  • [generate] return Cache object even if passed in a legacy format by @gante in #35673
  • [generate] update docstring of SequenceBiasLogitsProcessor by @gante in #35699
  • Test: generate with torch.compile(model.forward) as a fast test by @gante in #34544
  • [generate] move max time tests by @gante in #35962
  • [generate] shape checks in tests compatible with fixed-length caches (+ some minor fixes) by @gante in #35993

Pipelines

Pipelines have received several bug fixes and improvements which are detailed below.

Bugfixes and improvements

  • Fix flaky test_custom_4d_attention_mask by @ydshieh in #35606
  • Use inherit tempdir makers for tests + fix failing DS tests by @muellerzr in #35600
  • Added error when sequence length is bigger than max_position_embeddings by @Taha1506 in #32156
  • Let EarlyStoppingCallback not require load_best_model_at_end by @muellerzr in #35101
  • Fix flaky test_beam_search_low_memory by @ydshieh in #35611
  • Skip MobileNetV1ModelTest::test_batching_equivalence for now by @ydshieh in #35614
  • Update codeowners with individual model owners by @Rocketknight1 in #35595
  • Fix device in rope module when using dynamic updates by @Cyrilvallez in #35608
  • Fix whisper compile by @jiqing-feng in #35413
  • Removed some duplicated code by @Sai-Suraj-27 in #35637
  • [Phi] bias should be True by @ArthurZucker in #35650
  • Enable different torch dtype in sub models by @zucchini-nlp in #34873
  • [Compile] Only test compiling model forward pass by @ArthurZucker in #35658
  • [tests] make cuda-only tests device-agnostic by @faaany in #35607
  • [i18n-ar] Translated file : docs/source/ar/tasks/token_classification.md into Arabic by @AhmedAlmaghz in #35193
  • Fix zero_shot_image_classification documentation guide link in SigLIP by @aretrace in #35671
  • Fix : adding einops lib in the CI docker for some bitsandbytes tests by @MekkCyber in #35652
  • Update torchao.md: use auto-compilation by @martin0258 in #35490
  • Fix : HQQ config when hqq not available by @MekkCyber in #35655
  • Fix expected output for ggml test by @MekkCyber in #35686
  • Fix : add require_read_token for gemma2 gated model by @MekkCyber in #35687
  • Enhanced Installation Section in README.md by @egojoseph in #35094
  • Enhance DataCollatorForLanguageModeling with Configurable Token Replacement Probabilities by @mahdibaghbanzadeh in #35251
  • Clean-up composite configs by @zucchini-nlp in #34603
  • Add future import for Py < 3.10 by @Rocketknight1 in #35666
  • Fix : Nemotron Processor in GGUF conversion by @MekkCyber in #35708
  • Fix typo in /docs/source/ja/model_doc/decision_transformer.md URL by @hiroaki222 in #35705
  • Replace deprecated batch_size with max_batch_size when using HybridCache by @mtreinik in #35498
  • Fix: Falcon tie_word_embeddings in GGUF by @MekkCyber in #35715
  • Fix condition when GA loss bug fix is not performed by @techkang in #35651
  • Fix the bug that Trainer cannot correctly call torch_jit_model_eval by @Wanguy in #35722
  • [generation] fix type hint by @gante in #35725
  • Add proper jinja2 error by @Rocketknight1 in #35533
  • Optimize ForCausalLMLoss by removing unnecessary contiguous() call to reduce memory overhead by @efsotr in #35646
  • Modular: support for importing functions from any file by @Cyrilvallez in #35692
  • Remove batch size argument warning when unjustified by @quintenroets in #35519
  • [cache] add a test to confirm we can use cache at train time by @gante in #35709
  • Remove pt_to_tf by @gante in #35672
  • Added resource class configuration option for check_circleci_user job by @Sai-Suraj-27 in #32866
  • Fix some tests by @Cyrilvallez in #35682
  • Unable to use MimiModel with DeepSpeed ZeRO-3 by @anferico in #34735
  • check is added for the report_to variable in TrainingArguments by @alpertunga-bile in #35403
  • Added liger_kernel compatibility with PeftModel by @ambroser53 in #35680
  • Restore is_torch_greater_or_equal_than for backward compatibility by @tlrmchlsmth in #35734
  • Revert "Unable to use MimiModel with DeepSpeed ZeRO-3" by @eustlb in #35755
  • ci: fix xpu skip condition for test_model_parallel_beam_search by @dvrogozh in #35742
  • Use AMD CI workflow defined in hf-workflows by @ivarflakstad in #35058
  • Fix CI for VLMs by @zucchini-nlp in #35690
  • Security fix for self-comment-ci.yml by @ydshieh in #35548
  • [ViTPose] Convert more checkpoints by @NielsRogge in #35638
  • fix register_buffer in MimiEuclideanCodebook by @anferico in #35759
  • remove code owners as it was generating too much noise BUT by @ArthurZucker in #35784
  • Skip Falcon 7B GGML Test by @MekkCyber in #35783
  • [fix] cannot import name 'Pop2PianoFeatureExtractor' from 'transformers' by @faaany in #35604
  • transformers.image_transforms.normalize wrong types by @CalOmnie in #35773
  • Patch moonshine by @eustlb in #35731
  • Don't import torch.distributed when it's not available by @booxter in #35777
  • Fix vits low-precision dtype by @jiqing-feng in #35418
  • Tool calling: support more types by @aymeric-roucher in #35776
  • Fixes, improvements to timm import behaviour by @rwightman in #35800
  • modular_model_converter bugfix on assignments by @nikosanto13 in #35642
  • Deterministic sorting in modular converter when adding new functions by @Cyrilvallez in #35795
  • Fix "test_chat_template_dict" in video LLMs by @zucchini-nlp in #35660
  • Update AMD Docker image by @ivarflakstad in #35804
  • Add LlavaImageProcessor by @NielsRogge in #33191
  • Byebye test_batching_equivalence's flakiness by @ydshieh in #35729
  • [Doc] Adding blog post to model doc for TimmWrapper by @ariG23498 in #35744
  • add a new flax example for Bert model inference by @louie-tsai in #34794
  • Support adamw_torch_8bit by @fzyzcjy in #34993
  • Auto-add timm tag to timm-wrapper models. by @pcuenca in #35794
  • Fix : BLOOM tie_word_embeddings in GGUF by @MekkCyber in #35812
  • Fixed typo in autoawq version number in an error message for IPEX backend requirements. by @InfroLab in #35815
  • Remove deprecated get_cached_models by @Wauplin in #35809
  • Optimized set_initialized_submodules. by @LagPixelLOL in #35493
  • [i18n-ar] Translated file: docs/source/ar/tasks/masked_language_modeling.md into Arabic by @AhmedAlmaghz in #35198
  • move fastspeech to audio models by @eustlb in #35788
  • Improve modular documentation by @Cyrilvallez in #35737
  • [Mimi] update test expected values for t4 runners by @eustlb in #35696
  • Remove old benchmark code by @gante in #35730
  • Remove pyav pin to allow python 3.11 to be used by @CalOmnie in #35823
  • Another security patch for self-comment-ci.yml by @ydshieh in #35816
  • Init cache on meta device by @zucchini-nlp in #35164
  • Hotfix: missing working-directory in self-comment-ci.yml by @ydshieh in #35833
  • [gpt2] fix generation tests by @gante in #35822
  • Fix : Nemotron tokenizer for GGUF format by @MekkCyber in #35836
  • Fix head_dim in config extracted from Gemma2 GGUF model by @Isotr0py in #35818
  • [chat] docs fix by @gante in #35840
  • Fix compatibility issues when using auto_gptq with these older versions by @LRL-ModelCloud in #35830
  • Add PyTorch version check for FA backend on AMD GPUs by @mht-sharma in #35813
  • Fix NoneType type as it requires py>=3.10 by @SunMarc in #35843
  • [ tests] remove some flash attention class tests by @ArthurZucker in #35817
  • [Backend support] Allow num_logits_to_keep as Tensor + add flag by @Cyrilvallez in #35757
  • Fix GA loss for Deepspeed by @timjeffrey10 in #35808
  • Fix uploading processors/tokenizers to WandB on train end by @jack89roberts in #35701
  • Fix more CI tests by @ArthurZucker in #35661
  • [DOC] Fix contamination and missing paragraph in translation by @Yosshi999 in #35851
  • Fix typo by @SilverSoldier in #35854
  • fix apply_chat_template() padding choice by @baoyf4244 in #35828
  • Fix test_pipelines_video_classification that was always failing by @CalOmnie in #35842
  • Fix Llava-NeXT / Llava-NeXT Video / Llava-OneVision's token unpadding mismatch by @sheryc in #35779
  • use torch.testing.assertclose instead to get more details about error in cis by @ArthurZucker in #35659
  • add xpu device check in device_placement by @faaany in #35865
  • Add Rocketknight1 to self-comment-ci.yml by @ydshieh in #35881
  • [doctest] Fixes by @stevhliu in #35863
  • Fix fast image processor warnings in object detection examples by @sugendran in #35892
  • Update deepspeed amd image by @ivarflakstad in #35906
  • Fix typing in audio_utils.chroma_filter_bank by @CalOmnie in #35888
  • [docs] uv install by @stevhliu in #35821
  • Fix the config class comparison for remote code models by @Rocketknight1 in #35592
  • Close Zamba2Config code block by @Rocketknight1 in #35914
  • [docs] Fix Zamba2 by @stevhliu in #35916
  • Remove _supports_static_cache = True for some model classes by @ydshieh in #34975
  • Use rocm6.2 for AMD images by @ivarflakstad in #35930
  • Add default TP plan for all models with backend support by @Cyrilvallez in #35870
  • Fix: loading DBRX back from saved path by @zucchini-nlp in #35728
  • Fix mask slicing for models with HybridCache by @Cyrilvallez in #35681
  • Qwen-2-5-VL: fix CI by @zucchini-nlp in #35935
  • Fix TP initialization by @Cyrilvallez in #35860
  • fix(FA): QKV not being casted to target_dtype for FA with dpo lora by @NanoCode012 in #35834
  • Remove INC notebook reference in documentation by @echarlaix in #35936
  • use torch constraints to check if covariance is positive definite during mean resizing. by @abuelnasr0 in #35693
  • fix test_generated_length_assisted_generation by @keyboardAnt in #34935
  • Update unwrap_and_save_reload_schedule to use weights_only=False by @ydshieh in #35952
  • Update squad_convert_example_to_features to work with numpy v2 by @ydshieh in #35955
  • Fix flaky test_assisted_decoding_matches_greedy_search by @ydshieh in #35951
  • Trainer Refactor: Part 1 by @muellerzr in #35567
  • update docker file transformers-pytorch-deepspeed-latest-gpu by @ydshieh in #35940
  • [tests] further fix Tester object has no attribute '_testMethodName' by @faaany in #35781
  • Update README.md by @BlessedTatonka in #35958
  • fix iterator overflow when gradient accumulation is 1 by @winglian in #35960
  • Fix is_causal being a tensor by @IlyasMoutawwakil in #35791
  • [bart] minor test fixes by @gante in #35965
  • Pixtral: vectorize patch embeddings and enable tests by @zucchini-nlp in #35122
  • Whisper: fix static cache CI by @zucchini-nlp in #35852
  • Less flaky for TimmBackboneModelTest::test_batching_equivalence by @ydshieh in #35971
  • Support batching for UsefulSensors Moonshine by @njeffrie in #35922
  • not to use A100 for benchmark.yml by @ydshieh in #35974
  • Handle empty change indices in SAM's mask to rle conversion by @MSt-10 in #35665
  • Add support for nested images to LLava and VipLLava by @yonigozlan in #35558
  • [Moonshine] compute head_dim_padding at init by @eustlb in #35984
  • [Moshi] disable automatic compilation if the model can't compile by @gante in #35992
  • use torch 2.6 for daily CI by @ydshieh in #35985
  • Update-tp test by @ArthurZucker in #35844
  • Add mean_resizing for every VLMs' resizing_token_embeddings() by @YenFuLin in #35717
  • Update Granite Vision Model Path / Tests by @alex-jw-brooks in #35998
  • Qwen2-VL: fix rope delta calculation by @zucchini-nlp in #36013
  • Fix custom kernel for DeformableDetr, RT-Detr, GroindingDINO, OmDet-Turbo in Pytorch 2.6.0 by @qubvel in #35979
  • apply_chat_template: consistent behaviour for return_assistant_tokens_mask=True return_tensors=True by @mrsndmn in #35582
  • layernorm_decay_fix by @Ryoo72 in #35927
  • Update Mistral converter by @Cyrilvallez in #35967
  • Refactor (and fix) gpt_neox by @Cyrilvallez in #35610
  • Fix device mismatch error in Whisper model during feature extraction by @thedebugger in #35866
  • Fix RMSNormGated in Zamba2 by @pglorio in #35943
  • Commont bot CI for other jobs (generation / quantization) by @ydshieh in #35341
  • Hotfix for self-comment-ci.yml by @ydshieh in #36030
  • feat(ci): ignore trufflehog unverified results by @McPatate in #36031
  • CircleCI with python 3.9 by @ydshieh in #36027
  • Update tests regarding attention types after #35235 by @ydshieh in #36024
  • Fix Gemma2 synced multi-GPU generation by @ManukyanD in #35232
  • Fix synced multi-GPU generation with LLMs and VLMs by @ManukyanD in #35893
  • Add XPU type for work-around -inf mask causing sdpa NaN issue in modeling files by @Liangliang-Ma in #35647
  • add support for empty list as input to create_model_card by @ROZBEH in #36042
  • DeepSpeed github repo move sync by @stas00 in #36021
  • [docs] no hard coding cuda as bnb has multi-backend support by @faaany in #35867
  • [docs] fix bugs in the bitsandbytes documentation by @faaany in #35868
  • [docs] no hard-coding cuda by @faaany in #36043
  • Fix how we compute the final non-padding token for ForSequenceClassification models by @Rocketknight1 in #35911
  • Add Qwen2VLImageProcessorFast into Qwen2VLProcessor by @yeliudev in #35987
  • Iterative generation using Input embeds and past_key_values by @yaswanth19 in #35890
  • Fix usage of unpad_input function by @pavelgein in #35925
  • Fix repo consistency by @ydshieh in #36063
  • Update test_flash_attn_2_can_dispatch_composite_models by @ydshieh in #36050
  • Paligemma: fix generation with Gemma2 by @zucchini-nlp in #36044
  • Save checkpoint to temporary directory to handle partial saves during failures by @SilverSoldier in #35580
  • Nail in edge case of torch dtype being overriden permantly in the case of an error by @muellerzr in #35845
  • Fix words typos in ggml test. by @zhanluxianshen in #36060
  • Fix model kwargs by @muellerzr in #35875
  • Fix StopStringCriteria to handle tokens above len(tokenizer) by @Rocketknight1 in #35797
  • [docs] fix outdated example code in trainer.md by @faaany in #36066
  • Adding RT-DETRv2 for object detection by @jadechoghari in #34773
  • Fix bug in apply_rotary_pos_emb_flashatt: in Qwen2-5-VL by @DeepWaved in #36065
  • Move audio top_k tests to the right file and add slow decorator by @Rocketknight1 in #36072
  • Fix OS err by @muellerzr in #36094
  • [docs] fix model checkpoint name by @faaany in #36075
  • [docs] fix typo by @faaany in #36080
  • [docs] fix not-working example code in perf_infer_gpu_one.md by @faaany in #36087
  • fix MllamaVisionAttention typehint by @kylesayrs in #35975
  • Processors: allow tuples of images when checking by @zucchini-nlp in #36084
  • Chat template: update for processor by @zucchini-nlp in #35953
  • Paligemma: revert #36084 by @zucchini-nlp in #36113
  • Support constant lr with cooldown by @LoserCheems in #35453
  • Enable pytest live log and show warning logs on GitHub Actions CI runs by @ydshieh in #35912
  • Refactor OPT model by @jiqing-feng in #36101
  • Revert checkpoint tmp dir by @SunMarc in #36112
  • [Bugfix] fix file name of docstring in utils/check_table.py by @kkscilife in #36108
  • fix bnb warning by @SunMarc in #36116
  • AutoformerForPrediction test add atol by @ivarflakstad in #36017
  • Fix nighlty CIs: missing atols by @ArthurZucker in #35903
  • Add common test for torch.export and fix some vision models by @qubvel in #35124
  • fix: typos in documentation files by @maximevtush in #36122
  • update awesome-transformers.md. by @zhanluxianshen in #36115
  • Fix max size deprecated warning by @HichTala in #34998
  • Fix CI issues by @molbap in #35662
  • update tiktoken integ to use converted by @ArthurZucker in #36135
  • Make output_dir Optional in TrainingArguments #27866 by @sambhavnoobcoder in #35735
  • [docs] minor doc fix by @faaany in #36127
  • [docs] update awq doc by @faaany in #36079
  • Add pipeline parallel plan to PretrainedConfig and PreTrainedModel by @hmellor in #36091
  • add RAdamScheduleFree optimizer by @nhamanasu in #35313
  • added warning to Trainer when label_names is not specified for PeftModel by @MilkClouds in #32085
  • Whisper: remove redundant assisted generation tests by @gante in #34814
  • Add utility for Reload Transformers imports cache for development workflow #35508 by @sambhavnoobcoder in #35858
  • VLM: enable skipped tests by @zucchini-nlp in #35746
  • [commands] remove deprecated/inoperational commands by @gante in #35718
  • Fix Gradient Checkpointing for Deberta & Deberta-V2 using PEFT / Adapters by @lenglaender in #35898
  • 🚨 Remove cache migration script by @Wauplin in #35810
  • multi-gpu: fix tensor device placements for various models by @dvrogozh in #35763
  • Optim: APOLLO optimizer integration by @zhuhanqing in #36062
  • Fix multi gpu loss sync condition, add doc and test by @techkang in #35743
  • adding option to save/reload scaler by @hsilva664 in #34932
  • Update doc re list of models supporting TP by @kwen2501 in #35864
  • Add more rigerous non-slow grad accum tests by @muellerzr in #35668
  • Fix test fetcher by @ydshieh in #36129
  • skip test_initialization for VitPoseBackboneModelTest for now by @ydshieh in #36154
  • Add git LFS to AMD docker image by @ivarflakstad in #36016
  • Mllama fsdp by @blbadger in #36000
  • Fix PaliGemma Pad Token Masking During Training #35855 by @sambhavnoobcoder in #35859
  • Add reminder config to issue template and print DS version in env by @Ben-Schneider-code in #35156
  • Fix Gemma2 dtype issue when storing weights in float16 precision by @Nerogar in #35398
  • Replace deprecated update_repo_visibility by @Wauplin in #35970
  • Fix tests for vision models by @qubvel in #35654
  • qwen2.5vl: fix bugs when using flash2+bf16 or num_return_sequences>1 by @gewenbin0992 in #36083
  • docs: fix return type annotation of get_default_model_revision by @MarcoGorelli in #35982
  • Fix PretrainedTokenizerFast check => Fix PretrainedTokenizerFast Save by @CL-ModelCloud in #35835
  • Move DataCollatorForMultipleChoice from the docs to the package by @bauwenst in #34763
  • Helium documentation fixes by @LysandreJik in #36170
  • Remove loading custom kernel for RT-DETRv2 by @qubvel in #36098
  • [Modular] skip modular checks based on diff by @gante in #36130
  • Fix red CI by @ArthurZucker in #36174
  • Fix : fix doc fp8 by @MekkCyber in #36173
  • Efficient Inference Kernel for SpQR by @elvircrn in #34976
  • fix training issues by @ArthurZucker in #36158
  • add disable compile option by @ArthurZucker in #36161
  • CI: avoid human error, automatically infer generative models by @gante in #33212
  • Use tqdm auto by @SmartManoj in #35726
  • Optimize Qwen2VL vision model by precomputing cos/sin embeds before ViT blocks by @li-plus in #35837
  • Make check_repository_consistency run faster by MP by @ydshieh in #36175
  • Fix the key name for _load_rng_state under torch.cuda by @wizyoung in #36138
  • Follow up to SpQR integration by @MekkCyber in #36176
  • Fix a mistake in #36175 by @ydshieh in #36179
  • Fix make_batched_videos and add tests by @yonigozlan in #36143
  • Uniformize OwlViT and Owlv2 processors by @yonigozlan in #35700
  • Add support for partial rotary embeddings in Phi3 model by @garg-amit in #35947
  • CI: fix test-save-trainer by @zucchini-nlp in #36191
  • Chat template docs by @zucchini-nlp in #36163
  • Add ImageProcessorFast to Qwen2.5-VL processor by @Isotr0py in #36164
  • Prepare processors for VideoLLMs by @zucchini-nlp in #36149
  • Add require_read_token to fp8 tests by @MekkCyber in #36189
  • Revert qwen2 breaking changes related to attention refactor by @ArthurZucker in #36162
  • Guard against unset resolved_archive_file by @dmlap in #35628
  • [Bugfix] Fix reloading of pixtral/llava configs by @kylesayrs in #36077

Significant community contributions

The following contributors have made significant changes to the library over the last release:
