github huggingface/transformers v5.9.0
Release v5.9.0

4 hours ago

New Model additions

Cohere2Moe

Command A+ is a Mixture-of-Experts (MoE) language model from Cohere that features a hybrid attention pattern combining sliding window and full attention layers. The model incorporates both shared and routed experts and supports a very large context window for processing extensive text sequences.

Links: Documentation
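
To illustrate the hybrid attention pattern described above, here is a minimal sketch in plain Python of a causal sliding-window mask next to a full causal mask, plus a per-layer schedule that mixes the two. The function names, the window size, and the "every N-th layer is full attention" schedule are illustrative assumptions; they are not taken from the Cohere2Moe implementation or its configuration.

```python
def causal_mask(seq_len: int) -> list[list[bool]]:
    """Full causal attention: position i may attend to every position j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Causal sliding window: position i sees itself and the previous window - 1 positions."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [[i - window < j <= i for j in range(seq_len)] for i in range(seq_len)]


def layer_pattern(num_layers: int, full_every: int) -> list[str]:
    """Hypothetical schedule: every `full_every`-th layer uses full attention."""
    if full_every < 1:
        raise ValueError("full_every must be at least 1")
    return [
        "full" if (idx + 1) % full_every == 0 else "sliding"
        for idx in range(num_layers)
    ]


if __name__ == "__main__":
    for kind in layer_pattern(num_layers=4, full_every=4):
        mask = causal_mask(4) if kind == "full" else sliding_window_mask(4, window=2)
        print(kind, mask[-1])  # which positions the last token can attend to
```

Sliding-window layers keep the attention cost per token bounded by the window size, while the occasional full-attention layer lets information travel across the whole context.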

Parakeet TDT (#44171)

HRM-Text

HRM-Text is an improved autoregressive language-modeling variant of the Hierarchical Reasoning Model (HRM) that uses a hierarchical recurrent forward pass with two transformer stacks - one for slow, abstract planning (H) and one for fast, detailed computation (L) - reused inside a nested recurrence. It features PrefixLM attention where instruction tokens attend bidirectionally while response tokens attend causally, per-head sigmoid output gates, and parameterless RMSNorm. The model is designed as a base language model without instruction tuning or chat templates.

Links: Documentation | Paper
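
The PrefixLM attention rule described above (instruction tokens attend bidirectionally, response tokens attend causally) can be sketched as a boolean mask. This is a minimal illustration in plain Python, assuming the instruction occupies the first `prefix_len` positions of the sequence; the function names are hypothetical and do not come from the HRM-Text code.

```python
def prefix_lm_mask(prefix_len: int, total_len: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j."""
    if not 0 <= prefix_len <= total_len:
        raise ValueError("prefix_len must be between 0 and total_len")
    mask = []
    for i in range(total_len):
        row = []
        for j in range(total_len):
            both_in_prefix = i < prefix_len and j < prefix_len  # bidirectional block
            causal = j <= i  # ordinary left-to-right attention
            row.append(both_in_prefix or causal)
        mask.append(row)
    return mask


def render(mask: list[list[bool]]) -> str:
    """Draw the mask with 'x' for allowed and '.' for blocked positions."""
    return "\n".join("".join("x" if ok else "." for ok in row) for row in mask)


if __name__ == "__main__":
    # 3 instruction tokens followed by 2 response tokens
    print(render(prefix_lm_mask(prefix_len=3, total_len=5)))
    # xxx..
    # xxx..
    # xxx..
    # xxxx.
    # xxxxx
```

The first three rows show every instruction token seeing the whole instruction; the last two rows show response tokens seeing the instruction plus only the response tokens generated so far.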

Breaking changes

The text_embeds input for SAM3, EdgeTAM, and SAM3-Lite-Text models now expects full text embeddings instead of just pooler outputs, aligning with other models in the library. Users must update their inputs accordingly.

  • 🚨Fix memory leaks caused by lru decorators in vision models (#45922) by @yonigozlan
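
For background on how an lru decorator can leak memory, here is a generic sketch using only the standard library. When `functools.lru_cache` wraps an instance method, `self` becomes part of the cache key, so the cache keeps every instance alive. The class and function names below are hypothetical, and the approach shown (moving the cached computation into a module-level function) is one common remedy; it is not necessarily what #45922 does.

```python
import functools
import gc
import weakref


class LeakyProcessor:
    """Caching on the method stores `self` in the cache key."""

    @functools.lru_cache(maxsize=None)
    def grid(self, size: int) -> tuple:
        return tuple(range(size))


@functools.lru_cache(maxsize=32)
def _grid(size: int) -> tuple:
    """Module-level cache keyed only on hashable arguments, never on an instance."""
    return tuple(range(size))


class SaferProcessor:
    """Delegates to the module-level cached function."""

    def grid(self, size: int) -> tuple:
        return _grid(size)


def survives_deletion(cls) -> bool:
    """Return True if an instance is still alive after its last visible reference is dropped."""
    obj = cls()
    obj.grid(4)
    ref = weakref.ref(obj)
    del obj
    gc.collect()
    return ref() is not None


if __name__ == "__main__":
    print("leaky instance kept alive:", survives_deletion(LeakyProcessor))
    print("safer instance kept alive:", survives_deletion(SaferProcessor))
```

For a model or processor object that holds large tensors, an instance pinned by a cache in this way can mean its memory is never released.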

Audio

Audio support was expanded with the addition of AudioFlamingoNext model checkpoints and improved compilability of audio/vision encoders via standalone pure functions. Additional improvements include better error messaging when loading audio from video files and new documentation for audio/video processors.

Generation

Generation fixes cover inputs_embeds and per_layer_inputs handling for Gemma4, an AttributeError in RAG's generate() caused by missing config fields, and flaky VLM generation tests, which were stabilized by blocking special image tokens during sampling.
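
The idea of blocking specific tokens during sampling can be sketched without any library: set the logits of the blocked token IDs to negative infinity before the softmax, so they receive zero probability. The helper names and the token ID used below are hypothetical; this is not the mechanism the transformers test suite actually uses, only an illustration of the technique.

```python
import math
import random


def block_tokens(logits: list[float], blocked_ids: set[int]) -> list[float]:
    """Replace the logits of blocked token IDs with -inf."""
    return [-math.inf if i in blocked_ids else x for i, x in enumerate(logits)]


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax; -inf entries map to probability 0."""
    top = max(logits)
    if top == -math.inf:
        raise ValueError("every token is blocked, nothing left to sample")
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: list[float], blocked_ids: set[int], rng: random.Random) -> int:
    """Sample a token ID, never returning a blocked one."""
    probs = softmax(block_tokens(logits, blocked_ids))
    allowed = [i for i, p in enumerate(probs) if p > 0.0]
    return rng.choices(allowed, weights=[probs[i] for i in allowed], k=1)[0]


if __name__ == "__main__":
    IMAGE_TOKEN_ID = 2  # hypothetical ID of a special image token
    rng = random.Random(0)
    draws = [sample_token([1.0, 2.0, 3.0], {IMAGE_TOKEN_ID}, rng) for _ in range(10)]
    print(draws)  # contains only 0 and 1, even though token 2 has the highest logit
```

Without the block, a sampled special image token can corrupt the generated sequence at random, which is the kind of nondeterminism that makes a test flaky.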

Bugfixes and improvements

Significant community contributions

The following contributors have made significant changes to the library since the last release:

  • @lmaksym
  • @eustlb
    • user friendly error when loading audio from video (#45221)
    • [MultimodalLM] add language_model to the get/set_input_embeddings logic (#46029)
  • @remi-or
    • [CB] Remove OpenTelemetry (#45984)
    • [CB] [Major] Add tensor parallelism (#45821)
    • [CB] Hide activation footprint by using the CUDA graph pool (#45911)
  • @abcd1927
