New Model additions
VidEoMT
Video Encoder-only Mask Transformer (VidEoMT) is a lightweight encoder-only model for online video segmentation built on a plain Vision Transformer (ViT). It eliminates the need for dedicated tracking modules by introducing a lightweight query propagation mechanism that carries information across frames and employs a query fusion strategy that combines propagated queries with temporally-agnostic learned queries. VidEoMT achieves competitive accuracy while being 5x-10x faster than existing approaches, running at up to 160 FPS with a ViT-L backbone.
Links: Documentation | Paper
- Add VidEoMT (#44285) by @NielsRogge in #44285
UVDoc
UVDoc is a machine learning model for document image rectification. Its main purpose is to apply geometric transformations that correct distortion, skew, perspective deformation, and similar problems in document images. It supports both single-image and batched inference for processing distorted document images.
Links: Documentation
- [Model] Add UVDoc Model Support (#43385) by @XingweiDeng in #43385
Jina Embeddings v3
Jina-Embeddings-v3 is a multilingual, multi-task text embedding model designed for a variety of NLP applications. Based on the XLM-RoBERTa architecture, it replaces absolute position embeddings with Rotary Position Embeddings (RoPE) to support long input sequences of up to 8192 tokens. It also features five built-in task-specific LoRA adapters that let the model generate task-specific embeddings (e.g., for retrieval vs. classification) without significantly increasing inference latency.
Links: Documentation | Paper
- Add `Jina-Embeddings-V3` Model (#44251) by @Sai-Suraj-27 in #44251
Mistral4
Mistral 4 is a powerful hybrid model that can act as both a general instruction model and a reasoning model. It unifies the capabilities of three different model families - Instruct, Reasoning (previously called Magistral), and Devstral - into a single model. It features a Mixture-of-Experts (MoE) architecture with 128 experts (4 active per token), 119B total parameters with 6.5B activated per token, and a 256k context length, and supports multimodal input with both text and image processing capabilities.
Links: Documentation
- Add Mistral 4 (#44760) by @juliendenize in #44760
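For a quick sense of how sparse this routing is, the quoted figures can be checked with a few lines of arithmetic (a sketch using only the numbers above; the gap between the two ratios comes from the dense, non-expert parameters that are always active):

```python
# Sparsity implied by the quoted Mistral 4 figures.
experts_total, experts_active = 128, 4
params_total, params_active = 119e9, 6.5e9

expert_fraction = experts_active / experts_total  # routed experts per token
param_fraction = params_active / params_total     # activated params per token

print(f"{expert_fraction:.1%} of experts active per token")    # 3.1%
print(f"{param_fraction:.1%} of parameters active per token")  # 5.5%
```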
PI0
PI0 is a vision-language-action model for robotics manipulation that jointly processes visual observations and language instructions to generate robot actions. It uses a novel flow matching architecture built on top of a pre-trained vision-language model to inherit Internet-scale semantic knowledge. The model can perform complex dexterous tasks like laundry folding, table cleaning, and assembling boxes across multiple robot platforms including single-arm robots, dual-arm robots, and mobile manipulators.
Links: Documentation | Paper
SLANeXt
SLANeXt is a series of dedicated lightweight models for table structure recognition, focusing on accurately recognizing table structures in documents and natural scenes. The series is a new generation of table structure recognition models developed by the Baidu PaddlePaddle Vision Team, with dedicated weights trained separately for wired and wireless tables (i.e., tables with and without visible borders). Recognition ability for all table types is significantly improved, especially for wired tables.
Links: Documentation
- [Model] Add SLANeXt Model Support (#43707) by @liu-jiaxuan in #43707
PP-OCRv5_mobile_rec
PP-OCRv5_mobile_rec is a dedicated lightweight model for text recognition, focusing specifically on efficient recognition and understanding of text elements in multi-language documents and natural scenes. It is designed to efficiently and accurately support the recognition of Simplified Chinese, Traditional Chinese, English, and Japanese, as well as complex text scenarios such as handwriting, vertical text, pinyin, and rare characters, with a single model. While maintaining recognition performance, it also balances inference speed and model robustness, providing efficient and accurate technical support for document understanding in various scenarios.
Links: Documentation
- [Model] Add PP-OCRv5_server_rec and PP-OCRv5_mobile_rec models Support (#44808) by @zhang-prog in #44808
PP-OCRv5_server_rec
PP-OCRv5_server_rec is a text recognition model optimized for server-side deployment, focusing on accurate recognition and understanding of text elements in multi-language documents and natural scenes. It is designed to efficiently and accurately support the recognition of Simplified Chinese, Traditional Chinese, English, and Japanese, as well as complex text scenarios such as handwriting, vertical text, pinyin, and rare characters, with a single model. While maintaining recognition performance, it also balances inference speed and model robustness, providing efficient and accurate technical support for document understanding in various scenarios.
Links: Documentation
- [Model] Add PP-OCRv5_server_rec and PP-OCRv5_mobile_rec models Support (#44808) by @zhang-prog in #44808
PP-OCRv5_mobile_det
PP-OCRv5_mobile_det is a dedicated lightweight model for text detection, focusing specifically on efficient detection and understanding of text elements in multi-language documents and natural scenes. It is part of the latest generation of text detection models developed by the PaddleOCR team that efficiently and accurately supports the detection of text in diverse scenarios—including handwriting, vertical, rotated, and curved text—across multiple languages such as Simplified Chinese, Traditional Chinese, English, and Japanese. The model features robust handling of complex layouts, varying text sizes, and challenging backgrounds, making it suitable for practical applications like document analysis, license plate recognition, and scene text detection.
Links: Documentation
- [Model] Add PP-OCRV5_mobile_det Model Support (#43247) by @XingweiDeng in #43247
PPLCNet
PP-LCNet is a family of efficient, lightweight convolutional neural networks designed for real-world document understanding and OCR tasks. It balances accuracy, speed, and model size, making it ideal for both server-side and edge deployment. The model has three main variants optimized for specific tasks: document image orientation classification, table classification, and text line orientation classification.
Links: Documentation
- [Model] Add PP-OCRV5_mobile_det Model Support (#43247) by @XingweiDeng in #43247
PPLCNetV3
PPLCNetV3 is a lightweight CPU-optimized convolutional backbone designed for efficient image classification and downstream vision tasks. It builds on the PP-LCNet architecture with improved training strategies and structural refinements for better accuracy-latency tradeoffs on CPU hardware.
Links: Documentation | Paper
- [Model] Add PP-OCRV5_mobile_det Model Support (#43247) by @XingweiDeng in #43247
PP-OCRv5_server_det
PP-OCRv5_server_det is a high-performance text detection model optimized for server-side applications, focusing on accurate detection of multi-language text in documents and natural scenes. It supports the detection of text in diverse scenarios—including handwriting, vertical, rotated, and curved text—across multiple languages such as Simplified Chinese, Traditional Chinese, English, and Japanese. The model features robust handling of complex layouts, varying text sizes, and challenging backgrounds, making it suitable for practical applications like document analysis, license plate recognition, and scene text detection.
Links: Documentation
- [Model] Add PP-OCRV5_server_det Model Support (#43274) by @XingweiDeng in #43274
CHMv2
CHMv2 is a global, meter-resolution canopy height mapping model that uses DINOv3 to estimate forest canopy heights from high-resolution optical satellite imagery. Building on the original canopy height maps released in 2024, CHMv2 delivers substantial improvements in accuracy, detail, and global consistency by leveraging Meta's self-supervised vision model. The model is trained against airborne laser scanning data and provides essential information for quantifying forest carbon, monitoring restoration and degradation, and assessing habitat structure.
Links: Documentation | Paper | Blog Post
- Add CHMv2 (#44595) by @yonigozlan in #44595
Breaking changes
The dual BaseImageProcessor/BaseImageProcessorFast design has been replaced with a unified backend architecture, and the image_processing_utils_fast module has been removed — users should migrate to the new unified image_processing_utils module.
- 🚨🚨 Refactor Image Processors to support different backends (#43514) by @yonigozlan
PreTrainedConfig and model config classes have been refactored to use @dataclass and no longer accept positional arguments — users must update any config instantiation calls to use keyword arguments only.
- 🚨 Validate config attributes (#41250) by @zucchini-nlp
Flash Attention 2 (FA2) support now requires version 2.3.3 or newer, and initial Flash Attention 4 (FA4) support has been added — users on older FA2 versions must upgrade to at least 2.3.3.
Weight tying behavior has changed so that weights are now tied even when both keys are already present in a checkpoint — users relying on the previous behavior (e.g., with .bin checkpoints containing duplicate keys) should verify their models load as expected.
- [tie weights] 🚨 If both weights are present with same weights, still tie them (#44497) by @Cyrilvallez
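A toy sketch of the new rule (illustrative only, not the actual loading code): when a checkpoint ships both the source and the tied target key, the target is now overwritten by the tie rather than loaded as an independent copy.

```python
def load_with_tying(state_dict, tied_keys):
    """Toy loader: each (source, target) pair is always tied, even if
    the checkpoint contains its own (possibly stale) copy of target."""
    weights = dict(state_dict)
    for source, target in tied_keys:
        weights[target] = weights[source]  # target now aliases source
    return weights

ckpt = {
    "embed_tokens.weight": [1.0, 2.0],
    "lm_head.weight": [9.0, 9.0],  # duplicate key, e.g. from a .bin checkpoint
}
loaded = load_with_tying(ckpt, [("embed_tokens.weight", "lm_head.weight")])
print(loaded["lm_head.weight"])  # [1.0, 2.0]: the duplicate copy is discarded
```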
The cache_position argument has been removed from the forward signatures of most major models — users passing cache_position directly to these models should remove it, as it is now handled internally by generate.
- [core] 🚨 Completely remove cache positions (#44181) by @Cyrilvallez
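The argument became redundant because the positions are fully determined by the cache state; roughly (a simplified sketch, not the actual `generate` internals), the new tokens just continue from wherever the cache ends:

```python
def next_cache_positions(past_length: int, num_new_tokens: int) -> list[int]:
    """Positions of the new tokens in the KV cache: a contiguous range
    starting right after the tokens already cached."""
    return list(range(past_length, past_length + num_new_tokens))

print(next_cache_positions(0, 5))  # prefill a 5-token prompt -> [0, 1, 2, 3, 4]
print(next_cache_positions(5, 1))  # first decode step        -> [5]
print(next_cache_positions(6, 1))  # second decode step       -> [6]
```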
Parallelization
Several bug fixes and improvements were made to pipeline parallel (PP) and tensor parallel (TP) support, including fixing supports_tp/pp_plan detection, resolving attribute errors in PP for Qwen2VL-based models, correcting FSDP loading with meta devices, and ensuring TP weight sharding properly updates parent module attributes (e.g., in_features/out_features) to improve compatibility with libraries like PEFT.
- Fix several based models' pipeline parallel support (#44699) by @hmellor in [#44699]
- [Model] Add PP-Chart2Table Model Support (#43767) by @XingweiDeng in [#43767]
- enable tp for benchmark (#43750) by @sywangyi in [#43750]
- Fix `supports_{tp/pp}_plan` (#44696) by @hmellor in [#44696]
- Allow to disable stdout hiding for TP (#44608) by @michaelbenayoun in [#44608]
- fix FSDP loading with meta devices (#44473) by @winglian in [#44473]
- Fix: Conditionally import `torch.distributed.fsdp` in `trainer_seq2seq.py` (#44507) by @0xDELUXA in [#44507]
- Supplement skip logic for XPU in the CPU-only tp tests (#44536) by @YangKai0616 in [#44536]
- Update parent module attributes when sharding with TP (#44421) by @michaelbenayoun in [#44421]
- trigger tensor parallel utils test in the CI (#44460) by @3outeille in [#44460]
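The attribute-update fix matters because adapter libraries such as PEFT size their matrices from a module's `in_features`/`out_features` attributes. A toy sketch of the idea (hypothetical names, pure Python, no torch): when a layer is column-sharded, its attributes must be updated to the local shard size.

```python
class ToyLinear:
    def __init__(self, in_features: int, out_features: int):
        self.in_features = in_features
        self.out_features = out_features
        # weight stored as out_features rows of in_features columns
        self.weight_shape = (out_features, in_features)

def shard_columns(layer: ToyLinear, world_size: int) -> ToyLinear:
    """Column-parallel sharding: split the output dimension across ranks
    AND update the module attributes so downstream code (e.g. a LoRA
    adapter sizing itself from out_features) sees the local shard size."""
    layer.out_features = layer.out_features // world_size
    layer.weight_shape = (layer.out_features, layer.in_features)
    return layer

layer = shard_columns(ToyLinear(in_features=4096, out_features=11008), world_size=4)
print(layer.out_features, layer.weight_shape)  # 2752 (2752, 4096)
```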
Quantization
Quantization support was improved with up to 30x faster FP8 grouped and batched matmuls, static FP8 expert support for multi-GPU setups, and a torchao minimum version bump to 0.15.0. Additionally, MXFP4 dependency error messages were made more actionable, and AWQ tests were updated to align with the GPTQModel migration.
- fix: split MXFP4 dependency checks for specific error messages (#44930) by @javierdejesusda in [#44930]
- Add static FP8 expert support (#44895) by @SunMarc in [#44895]
- Bump torchao >=0.15 and fix quantization CI (#44604) by @SunMarc in [#44604]
- Fix AWQ tests for GPTQModel migration (#44654) by @jiqing-feng in [#44654]
- [Performance] FP8 Grouped and Batched Matmuls (#44231) by @IlyasMoutawwakil in [#44231]
- Fix PR comment CI for quantization job (#44579) by @ydshieh in [#44579]
Tokenization
Several performance improvements were made to tokenizer loading and saving, including eliminating redundant file parsing and unnecessary deep copies of large vocabularies that caused significant overhead. Additionally, bug fixes were applied for incorrect tokenizer class names on the Hub (DeepSeek V2/V3, ModernBERT), a clean_up_tokenization_spaces misconfiguration in Llama 3 tokenizer conversion, and a string replacement issue in AutoTokenizer class name resolution.
- fix: improve processor loading performance by avoiding redundant tokenizer parsing (#44927) by @ydshieh in [#44927]
- fix `processing_utils.py`: avoid deepcopying tokenizer in `ProcessorMixin` to improve performance (#44894) by @ydshieh in [#44894]
- fix: set `clean_up_tokenization_spaces=False` in Llama 3 tokenizer conversion (#44914) by @maxsloef-goodfire in [#44914]
- deepseek_v2, deepseek_v3, and modernbert fix for having incorrect tokenizer class on the hub (#44801) by @itazap in [#44801]
- Add XPU Expectations for vibe voice acoustic tokenizer tests (#44428) by @kaixuanliu in [#44428]
- fix(tokenizer): Only strip Fast from class names in AutoTokenizer if used as a suffix (#44443) by @harshaljanjani in [#44443]
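The common thread in the performance fixes is avoiding copies of large vocabularies. One stdlib way to share a vocabulary safely, shown here purely as an illustration of the principle (not the actual `ProcessorMixin` change), is a read-only mapping view:

```python
import copy
import types

vocab = {f"token_{i}": i for i in range(50_000)}  # stand-in for a large vocab

# Before: each consumer deep-copies the whole mapping (slow, memory-heavy).
_ = copy.deepcopy(vocab)

# After: all consumers share one read-only view; nothing is copied.
shared = types.MappingProxyType(vocab)
print(shared["token_42"])      # served straight from the original dict
try:
    shared["token_42"] = 0     # the view cannot be mutated
except TypeError:
    print("read-only view")
```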
Kernels
Kernel support has been expanded with Flash Attention 4 fallback integration, a paged_attention kernel for continuous batching, and Neuron device support for custom kernels. Several stability fixes were also made, including bumping the kernels version dependency to prevent crashes and correcting the LFM2 kernel path.
- [`FA4`] Add kernels fallback (#44797) by @vasqu in [#44797]
- Bump kernels version dependency to avoid crashes (#44887) by @Cyrilvallez in [#44887]
- Fix lfm2 kernel path (#44634) by @Cyrilvallez in [#44634]
- [CB] Add paged_attention kernel (#44379) by @remi-or in [#44379]
- Neuron kernels integration (#44417) by @michaelbenayoun in [#44417]
Cache
Several cache-related fixes and improvements were made, including aligning LFM2's cache implementation with other Mamba caches, fixing a tensor indexing crash in KV cache continuation for the transformers serve streaming endpoint, and resolving a generation bug in Idefics3 when using use_cache=False. A caching layer was also added to the model linter to skip unchanged valid files and improve build performance.
- Align lfm2 cache to other mamba caches (#44866) by @Cyrilvallez in [#44866]
- feat: added cache to the model linter (#44790) by @tarekziade in [#44790]
- Fix tensor indexing crash in serve generate_response KV cache continuation (#44735) by @mango766 in [#44735]
- Idefics3 without cache fix (#44607) by @gabe-l-hart in [#44607]
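The linter cache follows a familiar pattern: record a content hash per file and re-lint only files whose hash changed (a self-contained sketch of the pattern; the real linter's cache format may differ):

```python
import hashlib

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

def lint_files(files: dict[str, str], cache: dict[str, str]) -> list[str]:
    """Return the files that actually need linting; `cache` maps
    filename -> hash of the last version seen."""
    to_lint = []
    for name, text in files.items():
        digest = content_hash(text)
        if cache.get(name) != digest:
            to_lint.append(name)
            cache[name] = digest  # record the hash for the next run
    return to_lint

cache: dict[str, str] = {}
files = {"a.py": "x = 1\n", "b.py": "y = 2\n"}
print(lint_files(files, cache))  # first run: ['a.py', 'b.py']
files["b.py"] = "y = 3\n"        # only b.py changes
print(lint_files(files, cache))  # second run: ['b.py']
```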
Vision
Fixed backward compatibility for full-path imports of Fast Image Processors and resolved a Llama4 vision rotary embedding initialization error where freqs_ci was not registered as a buffer, causing failures when loading models with device_map="auto".
- Fix backward compatibility for full path imports of Fast Image Processors (#44926) by @yonigozlan in [#44926]
- fix(models, testing): Fix Llama4 vision rotary meta tensor initialization and MyT5 get_tokenizer signature (#44581) by @harshaljanjani in [#44581]
- Fix AMD Docker image build timeout by pinning Flash Attention commit (#44546) by @Abdennacer-Badaoui in [#44546]
Generation
The cache_position argument has been fully removed from the generation pipeline, as all models have been updated to no longer use it (with a backward-compatibility path retained for remote code models). Additionally, integration tests for LASR with chunked decoding were added, and outdated references to deprecated pipeline tasks were cleaned up.
- [generate] Never use `cache_position` anymore in generation (#44816) by @Cyrilvallez in [#44816]
- Add an integration test for LASR using pipe and chunked decoding (#42823) by @kho in [#42823]
- Fix: Remove references to `text2text-generation`, `summarization` and `translation` pipeline tasks (#44510) by @math-hiyoko in [#44510]
Bugfixes and improvements
- Dynamic weight conversion is recursive (#44300) by @zucchini-nlp in [#44300]
- Don't run `tests_hub` if no tests found (#45014) by @ydshieh in [#45014]
- Fix type hint for `attention_chunk_size` in `Llama4TextConfig` (#45002) by @hmellor in [#45002]
- Fix AutoProcessor.from_pretrained silently dropping hub kwargs (#44710) by @he-yufeng in [#44710]
- Fix `maybe_autocast` crashing on meta device tensors (#44984) by @Butanium in [#44984]
- fix: remove Copied from comments between @torch.jit.script and def for Python 3.13 compat (#44986) by @Krishnachaitanyakc in [#44986]
- More small vllm fixes (#44990) by @ArthurZucker in [#44990]
- fix(models): Fix Perceiver interpolate_pos_encoding interpolating to the source size (#44899) by @harshaljanjani in [#44899]
- Allow `mm_token_type` be non-padded lists (#44563) by @zucchini-nlp in [#44563]
- Fix CPU 16 bytes alignment issue using equivalent fallback (#44970) by @IlyasMoutawwakil in [#44970]
- refactor: unify QA calls (#44879) by @tarekziade in [#44879]
- Fix tie_word_embedding issues with `Qwen2VL` (#44976) by @hmellor in [#44976]
- Support Modular (!!) + Configs in `check_auto_docstrings` (#44803) by @yonigozlan in [#44803]
- [`vllm x v5`] nit (#44971) by @ArthurZucker in [#44971]
- LwDetrImageLoss: Fix dtype casting to prevent crash when using amp on cuda device (#44886) by @m-matthias in [#44886]
- [AMD CI] Gemma3/Gemma3n Expectations (#44972) by @Abdennacer-Badaoui in [#44972]
- Officially launch parse_response (#44674) by @Rocketknight1 in [#44674]
- fix load_best_model_checkpoint_at_end do not load the best model chec… (#44583) by @wilnn in [#44583]
- Fix failing `T5ModelIntegrationTest` (#44934) by @Sai-Suraj-27 in [#44934]
- Config kwargs (#44953) by @zucchini-nlp in [#44953]
- [CB] [Minor] Simplify test suite (#44858) by @remi-or in [#44858]
- Allow arbitrary template kwargs in processors (#44881) by @zucchini-nlp in [#44881]
- Fix missing post_processor in DebertaV2Tokenizer causing no special t… (#44570) by @umbilnm in [#44570]
- incorrect model list update (#44880) by @itazap in [#44880]
- refactor: mlinter as its own package (#44939) by @tarekziade in [#44939]
- [CB] Add an option to return logprobs (#44835) by @remi-or in [#44835]
- [docs] peft (#44804) by @stevhliu in [#44804]
- Continuous batching thread safety (#44924) by @Qubitium in [#44924]
- Fix variable shadowing in pipeline example and typo in BART docs (BERT → BART) (#44935) by @VanshikaSohal in [#44935]
- Fix failing job `Update Transformers metadata` after #43514 (#44941) by @ydshieh in [#44941]
- Clearer type hints and fix rope validation in configs (#44943) by @zucchini-nlp in [#44943]
- Correct docstrings for `from_pretrained` (url input deprecated) (#44946) by @BSchilperoort in [#44946]
- fix(i18n): replace broken relative links to awesome-transformers.md with absolute URLs (#44905) by @NicoleRobin in [#44905]
- chore(typing): added rule 11 (#44865) by @tarekziade in [#44865]
- fix(camembert): add tie_word_embeddings=True to CamembertConfig (#44931) by @r266-tech in [#44931]
- Support SizeDict import in get_size_dict (#44903) by @yonigozlan in [#44903]
- Add big angry code agent warnings! (#44890) by @Rocketknight1 in [#44890]
- [docs] model cards (#44837) by @stevhliu in [#44837]
- Add backward compatibility for direct imports from legacy `image_processing_utils_fast` (#44897) by @yonigozlan in [#44897]
- Fix core dumped when `NemotronH` is torch compiled (#44854) by @ydshieh in [#44854]
- fix(testing): Fix PaliGemma 2 and PaddleOCR-VL test failures on main (#44765) by @harshaljanjani in [#44765]
- Fix dtype guessing from state dict (#44883) by @Cyrilvallez in [#44883]
- Add missing dunder methods to `SizeDict` (#44884) by @hmellor in [#44884]
- Fix VL model rope_deltas batch size mismatch in online RL training (#44873) by @sergiopaniego in [#44873]
- Fix `layer_types` type hint for `AFMoE` and `Llama4` (#44874) by @hmellor in [#44874]
- Fix nemotron config docstrings (#44878) by @Cyrilvallez in [#44878]
- Fix nemotron_h modular (#44876) by @Cyrilvallez in [#44876]
- [Mistral] Fix query scaling for Mistral4 and Ministral3 (#44860) by @Cyrilvallez in [#44860]
- Update some type hints (#44851) by @zucchini-nlp in [#44851]
- Fix glm dsa (#44564) by @ArthurZucker in [#44564]
- Update AFMoE architecture to use v5-style MoE impl (#44063) by @AutumnAurelium in [#44063]
- Fix KeyError in convert_to_native_format for dict vocab (#44452) by @ in [#44452]
- fix: XLNet: relative_positional_encoding computes on CPU every forward (#44782) by @JiwaniZakir in [#44782]
- Fix annotations reader for python 3.14 in `PreTrainedModel` (#44672) by @neo in [#44672]
- [CB] Better parametrization for compile (#44578) by @remi-or in [#44578]
- Fix `KeyError` when patching mistral regex (#43376) by @LeonardoEmili in [#43376]
- Correct code block formatting in weightconverter.md (#44839) by @zhulinchng in [#44839]
- feat(ci): added a network debug report (#44636) by @tarekziade in [#44636]
- Add GreedyLR adaptive learning rate scheduler (#44271) by @balak4 in [#44271]
- Fix unexpected `position_ids` keys when loading OwlViT models (#44508) by @KartikPawade in [#44508]
- Update more modular examples (#44834) by @Cyrilvallez in [#44834]
- Fix and re-run modular converter on examples (#44833) by @Cyrilvallez in [#44833]
- Remove cache_position in more models (4 and last one) (#44828) by @Cyrilvallez in [#44828]
- Fix loading issue in Sam3 (#44831) by @zucchini-nlp in [#44831]
- feat(integration): Add KubeflowCallback to enable automatic progress … (#44487) by @abhijeet-dhumal in [#44487]
- Add GGUF support for MiniMax-M2.1 model (#44526) by @JoursBleu in [#44526]
- Centralize AI agent templates in `.ai` (#44489) by @tarekziade in [#44489]
- support xxxFast alias in v5 tokenizers (#44766) by @itazap in [#44766]
- Remove cache_position in more models (3) (#44759) by @Cyrilvallez in [#44759]
- [CI] Temporarily skip Mistral4 tests as they almost all fail (#44825) by @Cyrilvallez in [#44825]
- [Gemma] Update conversion scripts for Transformers v5 Compatibility (#44631) by @RyanMullins in [#44631]
- fix bug embedding_size mismatch with hidden_size in electra model test (#44657) by @kaixuanliu in [#44657]
- Fix pegasus conversion (#44571) by @ArthurZucker in [#44571]
- Fix repo-check bot (#44812) by @ydshieh in [#44812]
- [docs] is_causal feature (#44777) by @stevhliu in [#44777]
- docs(tasks): remove references to removed question-answering pipeline (#44787) by @ in [#44787]
- Fix configs with `@strict` (#44770) by @zucchini-nlp in [#44770]
- [AMD CI] Fix test failures across important models (#44632) by @Abdennacer-Badaoui in [#44632]
- Move VLM conversions to the main mapping (#44627) by @zucchini-nlp in [#44627]
- Fix config loading issues (type issues) (#44789) by @ydshieh in [#44789]
- Remove `is_causal` from `EuroBertConfig` (#44774) by @ydshieh in [#44774]
- model-linter: Added rule 10 (#44761) by @tarekziade in [#44761]
- [fix] mistral 4 docs (#44776) by @stevhliu in [#44776]
- Fix: Eurobert model was missing @strict decorator and invalid test kwargs (#44767) by @tarekziade in [#44767]
- fix: sig lip import (#44764) by @tarekziade in [#44764]
- Disable async loading when quantizing on the fly (#44576) by @SunMarc in [#44576]
- [MistralCommonBackend] Upgrade mistral-common to v1.10.0 (#44656) by @juliendenize in [#44656]
- Fix `mlcd` auto config/model/mapping issues (#44730) by @ydshieh in [#44730]
- Fix bug and add XPU Expectations for qwen2 and jamba tests (#44733) by @kaixuanliu in [#44733]
- [medasr] doc update (#44633) by @eustlb in [#44633]
- Fix missing / incorrect `config` class in some model class definitions (#44715) by @ydshieh in [#44715]
- Update Nvidia CI docker file to use torch 2.10 (#44712) by @ydshieh in [#44712]
- [`FA`] Fix fa detection (#44703) by @vasqu in [#44703]
- Fix `set_encoder` (#44698) by @hmellor in [#44698]
- [docs] cb config (#44675) by @stevhliu in [#44675]
- Fix more model tester missing `parent` issue (#44685) by @ydshieh in [#44685]
- Add register method for `ParallelInterface` (#44640) by @michaelbenayoun in [#44640]
- [CB] [Bug] Fix crashes when running without cuda (#44673) by @remi-or in [#44673]
- Another (small) set of fixes required for tiny model creation (#44666) by @ydshieh in [#44666]
- Fix CookieCutter (#44334) by @NielsRogge in [#44334]
- pipelines do not have modelcard (#44621) by @KoichiYasuoka in [#44621]
- [`Chmv2`] Fix conversion after capture refactor (#44665) by @vasqu in [#44665]
- [CB] Add dedicated config (#44434) by @remi-or in [#44434]
- fix(models): Forward timm model kwargs to timm.create_model for OmDet-Turbo (#44611) by @harshaljanjani in [#44611]
- Ensure same `dtype` for subconfig when `_from_config` (#44629) by @zucchini-nlp in [#44629]
- Remove `cache_position` in more models (2) (#44602) by @Cyrilvallez in [#44602]
- fix: cast to proper dtype in EmbeddingParallel (#44612) by @michaelbenayoun in [#44612]
- Remove many output_attentions and other traced outputs on 100+ models (#43590) by @molbap in [#43590]
- fix: raise error if mm_token_type_ids not supplied (#44433) by @leopold-tzafon in [#44433]
- Fix output capturing for Backbones (#44638) by @Cyrilvallez in [#44638]
- Fix for `VibeVoiceAcousticTokenizer` (#44628) by @ydshieh in [#44628]
- Fix off-by-one in decode_spans boundary check (#44584) by @mvanhorn in [#44584]
- Fix more wrong HF hub checkpoint names (#44624) by @ydshieh in [#44624]
- Update agentic contributions guidelines in AGENTS.md to force yielding. (#44411) by @burtenshaw in [#44411]
- Expand model-structure lint rules with a fast AST-based, ruff-like framework (#44174) by @tarekziade in [#44174]
- feat: add neuron in tensor parallelism initialization (#44498) by @michaelbenayoun in [#44498]
- [WIP] FIX Make Mixtral LoRA loading work (#44478) by @BenjaminBossan in [#44478]
- Fix Llava tests for torch too! (#44476) by @Rocketknight1 in [#44476]
- Fix training ci and clean some tests (#44491) by @SunMarc in [#44491]
- Remove useless identity assignment (#44600) by @Cyrilvallez in [#44600]
- Add Yoni to run-slow workflow (#44598) by @vasqu in [#44598]
- Add shared VLM tests (#42964) by @Rocketknight1 in [#42964]
- Fix wrong (non-existing) checkpoints (#44549) by @ydshieh in [#44549]
- Remove `cache_position` in more models (#44330) by @Cyrilvallez in [#44330]
- Fix CircleCI summary report not showing due to missing dependency (#44597) by @ydshieh in [#44597]
- Fix typos in add_new_model_like docstrings (#43544) by @Olexandr88 in [#43544]
- Fix UnboundLocalError for tp_plan_alt when tp_plan is empty (#44540) by @YangKai0616 in [#44540]
- FIX Multiple PEFT errors after v5 transition (#44592) by @BenjaminBossan in [#44592]
- Fix missing BPE token conversion step in Chameleon (#44582) by @yonigozlan in [#44582]
- Make paligemma embed tokens standard (#44432) by @zucchini-nlp in [#44432]
- chore(typing): Add type checking to `src/transformers/quantizers` (#44412) by @tarekziade in [#44412]
- Fix: AQLM quantizer to match updated replace_with_aqlm_linear signature (#44577) by @tarekziade in [#44577]
- [device_map] Fix device_map computation by correctly adjusting memory available (#44565) by @Cyrilvallez in [#44565]
- Fix error message label and docstring default in load_sharded_checkpoint (#44523) by @jnMetaCode in [#44523]
- Correct Tapas initialization (#44575) by @Rocketknight1 in [#44575]
- [`fix`] Prevent crash with Apertus without xielu installed (#44567) by @tomaarsen in [#44567]
- Fix failing `MusicgenStereo` integration tests (#44527) by @Sai-Suraj-27 in [#44527]
- Fix zamba2 rotary embedding call when use_mem_rope is False (#44551) by @echarlaix in [#44551]
- [Bugfix] fix video inference of qwen3vl and qwen3.5 series (#44474) by @JJJYmmm in [#44474]
- add XPU Expectations for `higgs_audio_v2` tests (#44482) by @kaixuanliu in [#44482]
- chameleon added to MODELS_WITH_INCORRECT_HUB_TOKENIZER_CLASS (#44475) by @itazap in [#44475]
- Revert "test merge queue 1" (#44552) by @ydshieh in [#44552]
- test merge queue 1 (#44529) by @ydshieh2 in [#44529]
- fix(testing): Fix MoonshineEncoder UnboundLocalError and Florence2VisionBackbone dtype mismatch (#44503) by @harshaljanjani in [#44503]
- Fix: Remove references to transformers run command (#44513) by @math-hiyoko in [#44513]
- [LW-DETR] Fix training (#44441) by @NielsRogge in [#44441]
- Make `_prepare_input_fn` and `_prepare_output_fn` instance methods (#44499) by @michaelbenayoun in [#44499]
- Fix ShieldGemma2 non-reproducible outputs by adding _tied_weights_keys (#44358) by @hardikmeisheri in [#44358]
- Tensor Parallelism and `mps` device (#44506) by @michaelbenayoun in [#44506]
- Fix failing `GPTNeoModelLanguageGenerationTest` (#44515) by @Sai-Suraj-27 in [#44515]
- Fix failing `MarianIntegrationTests` (#44519) by @Sai-Suraj-27 in [#44519]
- fix pin_memory for contiguous batching (#44455) by @jiqing-feng in [#44455]
- Fix continuous batching for multimodal models (#44436) by @jw9603 in [#44436]
- Fix KeyError in _parse_type_hint when Union contains Any (#44525) by @jnMetaCode in [#44525]
- Fix AssistantTracker.is_active() returning False after activation with empty lists (#44524) by @jnMetaCode in [#44524]
- Fix and re-enable extra_state tests (#43510) by @pstjohn in [#43510]
- Fix ansi codes in loading reports when not connected to terminal (#44544) by @Cyrilvallez in [#44544]
- Follow-up typing checking fixes (#44500) by @tarekziade in [#44500]
- Fix backend dependency (#44542) by @Cyrilvallez in [#44542]
- Add a new job in `build_pr_documentation.yml` (will be the new required job) (#44538) by @ydshieh in [#44538]
- Update `build_pr_documentation` workflow for `merge_group` event (#44532) by @ydshieh in [#44532]
- Fixed typo in docs/source/en/kv_cache.md (#44501) by @frogNotToad in [#44501]
- Docs: fix SigLIP2 usage examples (#43641) by @KOKOSde in [#43641]
- Fix type checker (#44502) by @Cyrilvallez in [#44502]
- Add MLU bf16 support to is_torch_bf16_gpu_available (#44381) by @carcel-yu in [#44381]
- fix model parallelism bug for eurobert model (#44490) by @kaixuanliu in [#44490]
- Update `ty` to 0.0.20 (#44494) by @tarekziade in [#44494]
- Add auto-docstring on configs (#44296) by @zucchini-nlp in [#44296]
- Fix failed unit tests for moonshine_streaming model (#43936) by @kaixuanliu in [#43936]
- Update distributed tests (#44338) by @SunMarc in [#44338]
- Add `diffusers` to CI docker file (#44480) by @ydshieh in [#44480]
- Replace placeholder tokens as specified in added_tokens_decoder (#44468) by @itazap in [#44468]
- [vLLM] Fix backward compatibility with hardcoded subprocessors classes in processors (#44447) by @yonigozlan in [#44447]
- [remote code/vllm] Fix incorrect tied weights (#44469) by @Cyrilvallez in [#44469]
- Integrate the Neuron device to TrainingArguments (#44302) by @michaelbenayoun in [#44302]
- Fix failing `DepthProModelIntegrationTest` (#44456) by @Sai-Suraj-27 in [#44456]
- [timesfm2_5] fix loss scaling (#44465) by @kashif in [#44465]
- Fix failing `ProphetNetModelIntegrationTest` (#44439) by @Sai-Suraj-27 in [#44439]
- [Trainer] fix SP loss (#44461) by @kashif in [#44461]
- skip 1 invalid test case for higgs_audio_v2 (#44350) by @kaixuanliu in [#44350]
- Fix position_ids typo in Qwen3_5TextModel forward pass (#44399) by @ in [#44399]
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @ydshieh
  - Don't run `tests_hub` if no tests found (#45014)
  - Fix failing job `Update Transformers metadata` after #43514 (#44941)
  - fix: improve processor loading performance by avoiding redundant tokenizer parsing (#44927)
  - fix `processing_utils.py`: avoid deepcopying tokenizer in `ProcessorMixin` to improve performance (#44894)
  - Fix core dumped when `NemotronH` is torch compiled (#44854)
  - Fix repo-check bot (#44812)
  - Fix config loading issues (type issues) (#44789)
  - Remove `is_causal` from `EuroBertConfig` (#44774)
  - Fix `mlcd` auto config/model/mapping issues (#44730)
  - Fix missing / incorrect `config` class in some model class definitions (#44715)
  - Update Nvidia CI docker file to use torch 2.10 (#44712)
  - Fix more model tester missing `parent` issue (#44685)
  - Another (small) set of fixes required for tiny model creation (#44666)
  - Fix for `VibeVoiceAcousticTokenizer` (#44628)
  - Fix more wrong HF hub checkpoint names (#44624)
  - Fix wrong (non-existing) checkpoints (#44549)
  - Fix CircleCI summary report not showing due to missing dependency (#44597)
  - Fix PR comment CI for quantization job (#44579)
  - Revert "test merge queue 1" (#44552)
  - Add a new job in `build_pr_documentation.yml` (will be the new required job) (#44538)
  - Update `build_pr_documentation` workflow for `merge_group` event (#44532)
  - Add `diffusers` to CI docker file (#44480)
- @NielsRogge
- @tarekziade
  - refactor: unify QA calls (#44879)
  - refactor: mlinter as its own package (#44939)
  - chore(typing): added rule 11 (#44865)
  - feat: added cache to the model linter (#44790)
  - feat(ci): added a network debug report (#44636)
  - Centralize AI agent templates in `.ai` (#44489)
  - model-linter: Added rule 10 (#44761)
  - Fix: Eurobert model was missing @strict decorator and invalid test kwargs (#44767)
  - fix: sig lip import (#44764)
  - Expand model-structure lint rules with a fast AST-based, ruff-like framework (#44174)
  - chore(typing): Add type checking to `src/transformers/quantizers` (#44412)
  - Fix: AQLM quantizer to match updated replace_with_aqlm_linear signature (#44577)
  - Follow-up typing checking fixes (#44500)
  - Update `ty` to 0.0.20 (#44494)
- @Sai-Suraj-27
  - Fix failing `T5ModelIntegrationTest` (#44934)
  - Add `Jina-Embeddings-V3` Model (#44251)
  - Fix failing `MusicgenStereo` integration tests (#44527)
  - Fix failing `GPTNeoModelLanguageGenerationTest` (#44515)
  - Fix failing `MarianIntegrationTests` (#44519)
  - Fix failing `DepthProModelIntegrationTest` (#44456)
  - Fix failing `ProphetNetModelIntegrationTest` (#44439)
- @remi-or
- @XingweiDeng
- @vasqu
- @liu-jiaxuan
  - [Model] Add SLANeXt Model Support (#43707)
- @zhang-prog
  - [Model] Add PP-OCRv5_server_rec and PP-OCRv5_mobile_rec models Support (#44808)
- @balak4
  - Add GreedyLR adaptive learning rate scheduler (#44271)
- @kaixuanliu
  - fix bug embedding_size mismatch with hidden_size in electra model test (#44657)
  - Fix bug and add XPU Expectations for qwen2 and jamba tests (#44733)
  - Add XPU Expectations for vibe voice acoustic tokenizer tests (#44428)
  - add XPU Expectations for `higgs_audio_v2` tests (#44482)
  - fix model parallelism bug for eurobert model (#44490)
  - Fix failed unit tests for moonshine_streaming model (#43936)
  - skip 1 invalid test case for higgs_audio_v2 (#44350)
- @juliendenize
- @molbap
- @JJJYmmm
  - [Bugfix] fix video inference of qwen3vl and qwen3.5 series (#44474)
- @math-hiyoko