transformers v5.1.0: EXAONE-MoE, PP-DocLayoutV3, Youtu-LLM, GLM-OCR


New Model additions

EXAONE-MoE


K-EXAONE is a large-scale multilingual language model developed by LG AI Research. Built using a Mixture-of-Experts architecture, K-EXAONE features 236 billion total parameters, with 23 billion active during inference. Performance evaluations across various benchmarks demonstrate that K-EXAONE excels in reasoning, agentic capabilities, general knowledge, multilingual understanding, and long-context processing.
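
A minimal text-generation sketch for trying the new model. The checkpoint id below is an assumption for illustration only; check the LG AI Research organization on the Hub for the actual released name:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LGAI-EXAONE/K-EXAONE-236B-A23B"  # assumed repo id, not verified

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype="auto", device_map="auto")

# Only ~23B of the 236B parameters are active per token thanks to MoE routing.
inputs = tokenizer("Explain mixture-of-experts routing in one paragraph.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```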

PP-DocLayoutV3


PP-DocLayoutV3 is a unified and high-efficiency model designed for comprehensive layout analysis. It addresses the challenges of complex physical distortions—such as skewing, curving, and adverse lighting—by integrating instance segmentation and reading order prediction into a single, end-to-end framework.
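
A rough usage sketch. The repo id is assumed, and the exact task head and post-processing helpers are model-specific, so this uses the generic Auto classes:

```python
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

model_id = "PaddlePaddle/PP-DocLayoutV3"  # assumed repo id, not verified

processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

image = Image.open("scanned_page.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# A single forward pass yields both layout instances and reading-order predictions.
outputs = model(**inputs)
```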

Youtu-LLM


Youtu-LLM is a new, small yet powerful LLM: it contains only 1.96B parameters, supports 128k long context, and has native agentic capabilities. On general evaluations, Youtu-LLM significantly outperforms SOTA LLMs of similar size on commonsense, STEM, coding, and long-context capabilities; in agent-related testing, it surpasses larger leading models and is genuinely capable of completing multiple end-to-end agent tasks.
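
A chat-style generation sketch; the repo id below is an assumption for illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Youtu-LLM-2B"  # assumed repo id, not verified

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```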

GLM-OCR


GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization. The model integrates the CogViT visual encoder pre-trained on large-scale image–text data, a lightweight cross-modal connector with efficient token downsampling, and a GLM-0.5B language decoder. Combined with a two-stage pipeline of layout analysis and parallel recognition based on PP-DocLayout-V3, GLM-OCR delivers robust and high-quality OCR performance across diverse document layouts.
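
A hedged OCR sketch using the generic image-text-to-text API; the repo id and image URL are placeholders:

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "zai-org/GLM-OCR"  # assumed repo id, not verified

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, dtype="auto", device_map="auto")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/invoice.png"},  # placeholder image
            {"type": "text", "text": "Transcribe this document."},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the generated portion of the sequence.
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```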

Breaking changes

  • 🚨 T5Gemma2 model structure (#43633) - Ensures the attention implementation is propagated to all sub-configs. config.encoder.text_config was not getting its attention implementation set because it is not passed to PreTrainedModel.__init__. Since the model structure cannot be changed without breaking compatibility, a call to self.adjust_attn_implementation was manually re-added in the modeling code.

  • 🚨 Generation cache preparation (#43679) - Refactors cache initialization in generation so that sliding-window configurations are properly respected. Previously, some models (like Afmoe) created caches without passing the model config, causing sliding-window limits to be ignored. This is breaking because models with sliding-window attention will now enforce their window size limits during generation, which may change generation behavior or require adjusting sequence lengths in existing code (see the sketch after this list).

  • 🚨 Delete duplicate code in backbone utils (#43323) - Cleans up the backbone utilities. We currently have 5 different config attributes that decide which backbone to load; most of them are redundant and can be merged into one.
    After this PR, config.backbone_config is the single source of truth. Models load the backbone from the config and load pretrained weights only if the checkpoint has any saved. The overall idea is the same as in other composite models. A few config arguments are removed as a result.

  • 🚨 Refactor DETR to updated standards (#41549) - Standardizes the DETR model to bring it closer to other vision models in the library.

  • 🚨 Fix floating-point precision in JanusImageProcessor resize (#43187) - Replaces an int() with round(); expect slight numerical differences.

  • 🚨 Remove deprecated AnnotionFormat (#42983) - Removes the misnamed AnnotionFormat class in favour of AnnotationFormat.
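
To see whether the generation cache change above affects you, here is a small sketch; the checkpoint is just an illustrative sliding-window model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # illustrative sliding-window checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype="auto", device_map="auto")

window = getattr(model.config, "sliding_window", None)
print(f"sliding_window = {window}")

# As of v5.1.0 the generation cache enforces this limit: once the sequence
# grows past `window` tokens, older keys/values fall outside the window, so
# very long prompts may produce different continuations than on earlier
# releases. If you relied on the old behavior, adjust your sequence lengths.
inputs = tokenizer("A very long prompt ...", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```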

Bugfixes and improvements

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @cyyever
    • Remove SDPA workarounds for torch 2.4+ (#43754)
    • Update torch minimum version to 2.4 (#41307)
    • 🚨 Remove deprecated AnnotionFormat (#42983)
  • @eustlb
    • Add moonshine streaming (#43702)
  • @tarekziade
    • Added S110 - try-except-pass rule (#43687)
    • Make sure hub errors are surfaced in PreTrainedTokenizerBase (#43675)
    • Fix extras on all supported Python versions (#43490)
    • fix(converter): speed up MistralConverter.extract_vocab_merges_from_model (#43557)
    • fix: initialize BatchNorm2d buffers only when needed (#43520)
    • Add pytest-random-order for reproducible test randomization (#43483)
  • @nuxlear
    • Add EXAONE-MoE implementations (#43080)
  • @vasqu
    • [Attn] Fixup interface usage after refactor (#43706)
    • [HunYuan] Fix RoPE init (#43411)
    • [Sam] Fixup training flags (#43567)
    • [Rope] Revert #43410 and make inheritance implicit again (#43620)
    • [Modular] Allow to add new bases that are not present in the inherited class (#43556)
    • [RoPE] Make explicit inheritance (#43410)
  • @remi-or
    • [CB] Keep order of incoming requests (#43626)
    • [CB] Refactor logic for inputs and outputs outside of the main API (#43569)
    • [CB] [Serve] Fix broken serve tests (#43594)
    • [CB] Minor perf improvements and ty compatibility (#43521)
  • @NielsRogge
    • Add EoMT with DINOv3 backbone (#41212)
  • @YangKai0616
    • XPU now supports MoE kernel (MegaBlocks) implementation (#43435)
    • [MoE] Use int input for histc on CUDA to support deterministic algorithms (#43583)
    • [Model] Refactor modernbert with the attention interface (#43030)
    • Add XPU support to the tests for solar_open (#43579)
  • @ydshieh
    • Fix process_bad_commit_report.py: avoid items to appear in null author in the report (#43662)
    • Fix KeyError in check_bad_commit.py (#43655)
    • Add explicit commit info to PR comment CI feedback (#43635)
    • Better new failures reporting for PR comment CI (#43629)
    • Improve new failures reporting (#43628)
    • Fix mistral checkpoint loading in utils/fetch_hub_objects_for_ci.py: avoid too many requests and/or timeout (#43584)
    • Fix repo. consistency bot (push permission issue) (#43570)
    • check/fix repo. check bot workflow (#43565)
    • check PR bot permission - part 3 (try content attribute) (#43555)
    • check PR bot permission - part 2 (style only) (#43554)
    • check PR bot permission - part 1 (#43553)
    • Revert utils files changes from PR #42845 (#43507)
    • Enhance repo. consistency bot (#43503)
  • @JaredforReal
    • [GLM-Image] Add batch > 1 support and fix configuration defaults (#43342)
  • @zhang-prog
    • [Model] Add PP-DocLayoutV3 Model Support (#43098)
  • @LuJunru
    • Update test of Youtu-LLM to pr-aligned repos (#43578)
    • Add Youtu-LLM model (#43166)
  • @zRzRzRzRzRzRzR
    • [GLM-OCR] GLM-OCR Support (#43391)
