New model additions
EXAONE-MoE
K-EXAONE is a large-scale multilingual language model developed by LG AI Research. Built on the EXAONE-MoE Mixture-of-Experts architecture, K-EXAONE has 236 billion total parameters, 23 billion of which are active during inference. Evaluations across a range of benchmarks show that K-EXAONE excels at reasoning, agentic tasks, general knowledge, multilingual understanding, and long-context processing.
- Add EXAONE-MoE implementations (#43080) by @nuxlear
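A minimal loading sketch for quick experimentation; the repo id below is a placeholder, not an official checkpoint name from this release:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "LGAI-EXAONE/K-EXAONE"  # placeholder id, not an official checkpoint name
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, dtype="auto", device_map="auto")

inputs = tokenizer("The capital of South Korea is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```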
PP-DocLayoutV3
PP-DocLayoutV3 is a unified and high-efficiency model designed for comprehensive layout analysis. It addresses the challenges of complex physical distortions—such as skewing, curving, and adverse lighting—by integrating instance segmentation and reading order prediction into a single, end-to-end framework.
- [Model] Add PP-DocLayoutV3 Model Support (#43098) by @zhang-prog
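A hedged usage sketch via the generic Auto classes; the repo id is a placeholder, and the exact task head and output format may differ from the final API:

```python
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

repo = "PaddlePaddle/PP-DocLayoutV3"  # placeholder id
processor = AutoImageProcessor.from_pretrained(repo)
model = AutoModel.from_pretrained(repo)

image = Image.open("page.png")  # a scanned or photographed document page
outputs = model(**processor(images=image, return_tensors="pt"))  # layout instances and reading order
```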
Youtu-LLM
Youtu-LLM is a new, small yet powerful LLM: it contains only 1.96B parameters, supports a 128k-token context, and has native agentic abilities. On general evaluations, Youtu-LLM significantly outperforms SOTA LLMs of similar size on commonsense, STEM, coding, and long-context capabilities; in agent-related testing, it surpasses larger leading models and can complete multiple end-to-end agent tasks.
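A hedged chat sketch; the repo id is a placeholder. At 1.96B parameters the model fits comfortably on a single consumer GPU:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "TencentYoutu/Youtu-LLM"  # placeholder id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Plan the steps to refactor a Python module."}]
ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(ids, max_new_tokens=64)[0], skip_special_tokens=True))
```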
GLM-OCR
GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization. The model integrates the CogViT visual encoder pre-trained on large-scale image–text data, a lightweight cross-modal connector with efficient token downsampling, and a GLM-0.5B language decoder. Combined with a two-stage pipeline of layout analysis and parallel recognition based on PP-DocLayoutV3, GLM-OCR delivers robust and high-quality OCR performance across diverse document layouts.
- [GLM-OCR] GLM-OCR Support (#43391) by @zRzRzRzRzRzRzR
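A hedged OCR sketch; the repo id is a placeholder, and the prompt/processor details may differ from the released model card:

```python
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

repo = "zai-org/GLM-OCR"  # placeholder id
processor = AutoProcessor.from_pretrained(repo)
model = AutoModelForImageTextToText.from_pretrained(repo, dtype="auto", device_map="auto")

image = Image.open("invoice.png")
inputs = processor(images=image, text="Transcribe this document.", return_tensors="pt").to(model.device)
print(processor.decode(model.generate(**inputs, max_new_tokens=256)[0], skip_special_tokens=True))
```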
Breaking changes
- 🚨 T5Gemma2 model structure (#43633) - Makes sure the attention implementation is set on all sub-configs. `config.encoder.text_config` was not getting its attn implementation set because it is not passed to `PreTrainedModel.__init__`. Since the model structure cannot be changed without breaking, a call to `self.adjust_attn_implementation` was manually re-added in the modeling code.
- 🚨 Generation cache preparation (#43679) - Refactors cache initialization in generation to ensure sliding-window configurations are properly respected. Previously, some models (like Afmoe) created caches without passing the model config, causing sliding-window limits to be ignored. This is breaking because models with sliding-window attention will now enforce their window size limits during generation, which may change generation behavior or require adjusting sequence lengths in existing code (see the sketch after this list).
- 🚨 Delete duplicate code in backbone utils (#43323) - Cleans up the backbone utilities. We currently have 5 different config attributes that decide which backbone to load, most of which are redundant and can be merged into one. After this PR, there is only one `config.backbone_config` as the single source of truth: models load the backbone `from_config` and load pretrained weights only if the checkpoint has any weights saved, the same idea as in other composite models. A few config arguments are removed as a result (see the sketch after this list).
- 🚨 Refactor DETR to updated standards (#41549) - Standardizes the DETR model to be closer to other vision models in the library.
- 🚨 Fix floating-point precision in JanusImageProcessor resize (#43187) - Replaces an `int()` with `round()`; expect slight numerical differences (see the sketch after this list).
- 🚨 Remove deprecated AnnotionFormat (#42983) - Removes a misnamed class in favour of `AnnotationFormat`.
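To make the behavior changes above concrete, here are a few hedged sketches; checkpoint ids and config values are illustrative, not taken from the PRs.

For #43679, caches are now built from the model config, so `sliding_window` is respected during generation:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "mistralai/Mistral-7B-v0.1"  # a model whose config defines `sliding_window`
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, dtype="auto", device_map="auto")

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
# Tokens beyond `model.config.sliding_window` are now evicted from the cache,
# which can change outputs for prompts longer than the window.
out = model.generate(**inputs, max_new_tokens=16)
```

For #43323, `config.backbone_config` becomes the single source of truth for composite vision models:

```python
from transformers import DetrConfig, DetrForObjectDetection, ResNetConfig

backbone_config = ResNetConfig()                      # the backbone is fully described by its own config
config = DetrConfig(backbone_config=backbone_config)  # single source of truth for the backbone
model = DetrForObjectDetection(config)                # backbone built from_config, randomly initialized here
```

For #43187, the reason outputs shift slightly is that `int()` truncates while `round()` rounds to nearest:

```python
scale = 0.35
for size in (297, 298, 299):
    print(size, int(size * scale), round(size * scale))
# 297 -> 103 vs 104, 298 -> 104 vs 104, 299 -> 104 vs 105
```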
Bugfixes and improvements
- fix(models): Migrate legacy segmentation_indices to out_indices in BeitConfig (#43505) by @harshaljanjani
- [docs] Update torch version (#42135) by @stevhliu
- Remove SDPA workarounds for torch 2.4+ (#43754) by @cyyever
- add use_deterministic to guarantee the consistency for youtu-llm model (#43759) by @kaixuanliu
- fix: add compatible_model_types to suppress model type mismatch warnings (#43495) by @leoneperdigao
- Fix T5 v1.1 detection (#43681) by @githubnemo
- Add moonshine streaming (#43702) by @eustlb
- Allow bi-directional attention for all models (#43705) by @Cyrilvallez
- Docs: fix Training step by removing tokenizer from trainer initialization (#43733) by @nesjett
- Fix scheduler initialization order (#43711) by @SunMarc
- Fix accelerate integration import (#43732) by @SunMarc
- Update torch minimum version to 2.4 (#41307) by @cyyever
- Fix dtype in image-text-to-text pipe (#43731) by @zucchini-nlp
- Preventing initialization of siglip's lecun_normal_, default_flax_embed_init in ZeRO3 (#43574) by @jp1924
- fix: AttributeError for Qwen3_omni_moe (#43593) by @Vallabh-1504
- Improve typing/explanations for general model properties (#43712) by @Cyrilvallez
- [Kernels] kernel migration updates for activation kernels (#43518) by @ariG23498
- [`feat`] Allow loading T5Gemma2Encoder with AutoModel (#43559) by @tomaarsen
- Added S110 - try-except-pass rule (#43687) by @tarekziade
- [docs] benchmarks (#43694) by @stevhliu
- fix norm_eps dtype (#43669) by @fschlatt
- Llava onevision: output align for tests and add `image_sizes` input param (#43678) by @kaixuanliu
- Fix CLIPOutput attentions not being returned (#43657) by @jonathan-fulton
- [`Attn`] Fixup interface usage after refactor (#43706) by @vasqu
- Fix model/processor mismatch in SigLIP2 quantization example (#43652) by @jonathan-fulton
- Fix crash of custom models in Notebook or Repl (#43690) by @Cyrilvallez
- Simplify TrainingArguments docstring (#43568) by @SunMarc
- Composite model inherit automatically all important properties from their children (#43691) by @Cyrilvallez
- Update configuration_qwen3.py (#43703) by @francesco-bertolotti
- fix gptoss tp crash (#43695) by @sywangyi
- [CB] Keep order of incoming requests (#43626) by @remi-or
- Fix Apertus model loading (NotImplementedError: Cannot copy out of meta tensor; no data!) (#43473) by @xenova
- Remove `num_frames` in ASR pipeline (#43546) by @jiqing-feng
- remove ipex and ccl for xpu and cpu (#42852) by @yao-matrix
- update guide with new attr name for toks (#43689) by @itazap
- Docs: fix typos in Get started (index, quicktour) (#43666) by @CodeByKodi
- the cache class is deprecated by @vasqu (direct commit on main)
- custom tok init fix (#43591) by @itazap
- More export friendly rewrites and skipping the failing ones (#43436) by @IlyasMoutawwakil
- Cast byte_count to int in caching_allocator_warmup for MPS compatibility (#43608) by @tobyliu2004
- [Docs] Complete missing Llama4 configuration docs (#43460) by @udaymehta
- Fix t5 failures (#43374) by @Abdennacer-Badaoui
- Add EoMT with DINOv3 backbone (#41212) by @NielsRogge
- Update DBRX docs to reference re-uploaded checkpoint (#43196) by @qgallouedec
- [loading] Fix forced upcasting to fp32 (#43683) by @Cyrilvallez
- Fix FP8Expert for Qwen (#43670) by @yiliu30
- Simplify loading structure (#43589) by @Cyrilvallez
- [CB] Refactor logic for inputs and outputs outside of the main API (#43569) by @remi-or
- Make sure hub errors are surfaced in `PreTrainedTokenizerBase` (#43675) by @tarekziade
- Fix `FP8Expert` for DeepSeek R1 (#43616) by @yiliu30
- Use correct sampling rate in chat template (#43674) by @zucchini-nlp
- [`HunYuan`] Fix RoPE init (#43411) by @vasqu
- XPU now supports MoE kernel (MegaBlocks) implementation (#43435) by @YangKai0616
- [`Sam`] Fixup training flags (#43567) by @vasqu
- remove torchao.autoquant from transformers (#43561) by @vkuzo
- [DeepSpeed] properly handle MoE weight conversion (#43524) by @kashif
- Tie zamba weights correctly (#43623) by @zucchini-nlp
- [kernels] Centralize kernels tests (#42819) by @MekkCyber
- Fix `process_bad_commit_report.py`: avoid items appearing under a `null` author in the report (#43662) by @ydshieh
- Fix `KeyError` in `check_bad_commit.py` (#43655) by @ydshieh
- [Benchmark] Minor fix for benchmark: kernel is not correctly called (#43428) by @sywangyi
- Add explicit commit info to PR comment CI feedback (#43635) by @ydshieh
- Better new failures reporting for PR comment CI (#43629) by @ydshieh
- [docs] serving (#42853) by @stevhliu
- add XPU expected output for MixedInt8GPT2Test (#43615) by @kaixuanliu
- Don't modify mappings in tests (#43634) by @Rocketknight1
- Allow Attention and Experts to be used as standalone modules (#43622) by @Cyrilvallez
- Don't modify `tied_weight_keys` in-place (#43619) by @zucchini-nlp
- [`Rope`] Revert #43410 and make inheritance implicit again (#43620) by @vasqu
- [vllm compat] Separate renaming from conversion ops (#43621) by @Cyrilvallez
- refactor + robusts tests for Tensor Parallel (#42809) by @3outeille
- add contiguous operation for diffllama model for xpu to enable compile mode. (#43614) by @kaixuanliu
- add xpu expectation for lw_detr model (#43339) by @kaixuanliu
- minimax_m2: fix failed test case for XPU (#43324) by @kaixuanliu
- Improve new failures reporting (#43628) by @ydshieh
- Fix extras on all supported Python versions (#43490) by @tarekziade
- fix(models): Fix suno/bark-small CPU offload device mismatch causing CI failures (#43607) by @harshaljanjani
- [CB] [Serve] Fix broken serve tests (#43594) by @remi-or
- Docs: fix typo in weight converter guide (#43610) by @KOKOSde
- [MoE] Use int input for histc on CUDA to support deterministic algorithms (#43583) by @YangKai0616
- Fixes configuration default values (#43592) by @zucchini-nlp
- Fix `make_batched_video` with 5D arrays (#43486) by @zucchini-nlp
- Operation Green CI II (#43537) by @Rocketknight1
- enable cpu paged cache (#42869) by @jiqing-feng
- Qwen3 omni - fix get video features (#43588) by @zucchini-nlp
- [GLM-Image] Add batch > 1 support and fix configuration defaults (#43342) by @JaredforReal
- [Model] Refactor modernbert with the attention interface (#43030) by @YangKai0616
- Regex post processing in loading (#43585) by @Cyrilvallez
- simplify extra tokens logic in base (#43230) by @itazap
- Add XPU support to the tests for solar_open (#43579) by @YangKai0616
- remove FbgemmFp8LinearTest (#43545) by @sywangyi
- Increase default ReadTimeout in tests (#43586) by @Wauplin
- Fix mistral checkpoint loading in `utils/fetch_hub_objects_for_ci.py`: avoid too many requests and/or timeout (#43584) by @ydshieh
- [CI][AMD] Fix Pipeline CI (#43178) by @Abdennacer-Badaoui
- fix(converter): speed up `MistralConverter.extract_vocab_merges_from_model` (#43557) by @tarekziade
- Improve GPU monitoring: switch to multiprocessing and use amdsmi for AMD GPUs (#43552) by @Abdennacer-Badaoui
- Update test of Youtu-LLM to pr-aligned repos (#43578) by @LuJunru
- Rework dependencies and extras + Remove outdated `templates` folder (#43536) by @Cyrilvallez
- Fix repo. consistency bot (push permission issue) (#43570) by @ydshieh
- Fix Wav2vec and a few others (#43566) by @Cyrilvallez
- [`Modular`] Allow adding new bases that are not present in the inherited class (#43556) by @vasqu
- add an option to disable Sam3VideoModel progress bar (#43564) by @ndeybach
- check/fix repo. check bot workflow (#43565) by @ydshieh
- Increase timeout when preparing CI (#43560) by @Rocketknight1
- 43054: Add Siglip2Tokenizer to enforce training-time text preprocessing defaults (#43101) by @vaibhav-research
- check PR bot permission - part 3 (try content attribute) (#43555) by @ydshieh
- check PR bot permission - part 2 (style only) (#43554) by @ydshieh
- check PR bot permission - part 1 (#43553) by @ydshieh
- Fix failing tests due to no attribute `pad_token_id` (#43453) by @Sai-Suraj-27
- fix: GPT OSS Conversion Script Enhancements (#42901) by @KyleMylonakisProtopia
- [Quantization] Fix triton_kernels name after being renamed to gpt-oss-triton-kernels (#43528) by @MekkCyber
- [Quantization] Add cutlass kernel for FP8 (#43304) by @MekkCyber
- [CB] Minor perf improvements and ty compatibility (#43521) by @remi-or
- Fix tiles mixing for batched input, add tie_word_embeddings to LFM2VL config (#43379) by @ankke
- fix: return labels instead of label in reduce_label method in BeitImageProcessorFast (#43527) by @sbucaille
- [`RoPE`] Make explicit inheritance (#43410) by @vasqu
- Fix for #43530 (#43535) by @Rocketknight1
- Operation Green CI (#43530) by @Rocketknight1
- Tie the weights even if initializing from a config on meta device (#43523) by @Cyrilvallez
- [kernels] Update cv_utils name (#43529) by @MekkCyber
- add trackio to training notebooks (#43442) by @merveenoyan
- Mark test_prompt_lookup_decoding as flaky (#42184) by @Rocketknight1
- Fix some MoE routers (#43445) by @IlyasMoutawwakil
- batched_mm is slow on cpu (#43438) by @IlyasMoutawwakil
- fix: initialize BatchNorm2d buffers only when needed (#43520) by @tarekziade
- Fix loading of Qwen3 FP8 (#43494) by @githubnemo
- fix `ShieldGemma2IntegrationTest::test_model` (#43343) by @sywangyi
- Update `SamHQModelIntegrationTest::test_inference_mask_generation_batched_points_batched_images` for XPU (#43511) by @sywangyi
- Revert utils files changes from PR #42845 (#43507) by @ydshieh
- Move hardcoded time_step params to config for Bamba, FalconH1, GraniteMoeHybrid (#43461) by @raimbekovm
- Prepare inputs for generation is called from `super()` (#43280) by @zucchini-nlp
- Enhance repo. consistency bot (#43503) by @ydshieh
- Add `pytest-random-order` for reproducible test randomization (#43483) by @tarekziade
- Add missing GPURawMetrics.from_dict() method in benchmark_v2 (#43499) by @Abdennacer-Badaoui
- push dev version 5.0.1.dev0 by @ArthurZucker (direct commit on main)
- Fix failing `markuplm` & `perception_lm` integration tests (#43464) by @Sai-Suraj-27
- fix(Phi4Multimodal): Fix incorrect default vision/audio config initialization in Phi4MultimodalConfig (#43480) by @charlieJ107
- handle 1D position_ids for modeling_flash_attention_utils as well (#43403) by @kaixuanliu
- Remove stale TODO comments in UDOP tied weights (#43477) by @raimbekovm
- Fix Mxfp4 dequantize (#43326) by @Cyrilvallez
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @cyyever
  - Update torch minimum version to 2.4 (#41307)
  - Remove SDPA workarounds for torch 2.4+ (#43754)
- @eustlb
  - Add moonshine streaming (#43702)
- @tarekziade
  - Added S110 - try-except-pass rule (#43687)
  - Make sure hub errors are surfaced in `PreTrainedTokenizerBase` (#43675)
  - Fix extras on all supported Python versions (#43490)
  - fix(converter): speed up `MistralConverter.extract_vocab_merges_from_model` (#43557)
  - fix: initialize BatchNorm2d buffers only when needed (#43520)
  - Add `pytest-random-order` for reproducible test randomization (#43483)
- @nuxlear
  - Add EXAONE-MoE implementations (#43080)
- @vasqu
  - [`Attn`] Fixup interface usage after refactor (#43706)
  - the cache class is deprecated
  - [`HunYuan`] Fix RoPE init (#43411)
  - [`Sam`] Fixup training flags (#43567)
  - [`Rope`] Revert #43410 and make inheritance implicit again (#43620)
  - [`Modular`] Allow adding new bases that are not present in the inherited class (#43556)
  - [`RoPE`] Make explicit inheritance (#43410)
- @remi-or
  - [CB] Keep order of incoming requests (#43626)
  - [CB] Refactor logic for inputs and outputs outside of the main API (#43569)
  - [CB] [Serve] Fix broken serve tests (#43594)
  - [CB] Minor perf improvements and ty compatibility (#43521)
- @NielsRogge
  - Add EoMT with DINOv3 backbone (#41212)
- @YangKai0616
  - XPU now supports MoE kernel (MegaBlocks) implementation (#43435)
  - [MoE] Use int input for histc on CUDA to support deterministic algorithms (#43583)
  - [Model] Refactor modernbert with the attention interface (#43030)
  - Add XPU support to the tests for solar_open (#43579)
- @ydshieh
  - Fix `process_bad_commit_report.py`: avoid items appearing under a `null` author in the report (#43662)
  - Fix `KeyError` in `check_bad_commit.py` (#43655)
  - Add explicit commit info to PR comment CI feedback (#43635)
  - Better new failures reporting for PR comment CI (#43629)
  - Improve new failures reporting (#43628)
  - Fix mistral checkpoint loading in `utils/fetch_hub_objects_for_ci.py`: avoid too many requests and/or timeout (#43584)
  - Fix repo. consistency bot (push permission issue) (#43570)
  - check/fix repo. check bot workflow (#43565)
  - check PR bot permission - part 3 (try content attribute) (#43555)
  - check PR bot permission - part 2 (style only) (#43554)
  - check PR bot permission - part 1 (#43553)
  - Revert utils files changes from PR #42845 (#43507)
  - Enhance repo. consistency bot (#43503)
- @JaredforReal
  - [GLM-Image] Add batch > 1 support and fix configuration defaults (#43342)
- @zhang-prog
  - [Model] Add PP-DocLayoutV3 Model Support (#43098)
- @LuJunru
  - Update test of Youtu-LLM to pr-aligned repos (#43578)
- @zRzRzRzRzRzRzR
  - [GLM-OCR] GLM-OCR Support (#43391)