Release v5.9.0
New Model additions
Cohere2Moe
Command A+ is a Mixture-of-Experts (MoE) language model from Cohere that features a hybrid attention pattern combining sliding window and full attention layers. The model incorporates both shared and routed experts and supports a very large context window for processing extensive text sequences.
Links: Documentation
- Add new cohere2_moe model (#46115) by @Cyrilvallez in [#46115]
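To make the hybrid attention pattern concrete, here is a minimal sketch of how a sliding-window causal mask differs from a full causal mask. It uses plain Python and is not the library's implementation. The helper name `causal_mask`, the window size of 3, and the three-sliding-then-one-full layer pattern are assumptions for illustration only; the actual layer layout and window size come from the model config.

```python
def causal_mask(seq_len, window=None):
    """Return mask[i][j] == True when query position i may attend to key position j.

    window=None gives full causal attention. An integer limits each query to
    the most recent `window` positions, itself included.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            visible = j <= i  # causal: never look at future positions
            if window is not None:
                visible = visible and (i - j) < window  # drop keys that are too old
            row.append(visible)
        mask.append(row)
    return mask


# Hypothetical layer pattern (an assumption): three sliding-window layers, then one full layer.
layer_types = ["sliding", "sliding", "sliding", "full"]
masks = [causal_mask(6, window=3 if kind == "sliding" else None) for kind in layer_types]
```

In this sketch the last query position sees only the three most recent keys in a sliding-window layer, while a full-attention layer lets it see the whole sequence.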
Parakeet tdt (#44171)
HRM-Text
HRM-Text is an improved autoregressive language-modeling variant of the Hierarchical Reasoning Model (HRM) that uses a hierarchical recurrent forward pass with two transformer stacks - one for slow, abstract planning (H) and one for fast, detailed computation (L) - reused inside a nested recurrence. It features PrefixLM attention where instruction tokens attend bidirectionally while response tokens attend causally, per-head sigmoid output gates, and parameterless RMSNorm. The model is designed as a base language model without instruction tuning or chat templates.
Links: Documentation | Paper
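The PrefixLM attention described above can be sketched as a boolean mask. This is a simplified, plain-Python illustration under the assumption that the instruction occupies the first `prefix_len` positions of the sequence; the function name `prefix_lm_mask` is hypothetical and is not part of the library.

```python
def prefix_lm_mask(prefix_len, seq_len):
    """Return mask[i][j] == True when position i may attend to position j.

    Positions below prefix_len (the instruction) see each other in both
    directions. Later positions (the response) see the whole prefix plus
    earlier response tokens, as in ordinary causal attention.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            # j <= i is the causal rule; j < prefix_len opens up the prefix block.
            row.append(j <= i or j < prefix_len)
        mask.append(row)
    return mask


# Two instruction tokens followed by two response tokens.
mask = prefix_lm_mask(prefix_len=2, seq_len=4)
```

With `prefix_len=2`, the first instruction token can already see the second one, while each response token sees only the prefix and the response tokens before it.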
Breaking changes
The text_embeds input for SAM3, EdgeTAM, and SAM3-Lite-Text models now expects full text embeddings instead of just pooler outputs, aligning with other models in the library. Users who pass text_embeds must update their inputs accordingly.
- 🚨Fix memory leaks caused by lru decorators in vision models (#45922) by @yonigozlan
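For context, one common way an lru decorator can leak memory in Python is when `functools.lru_cache` wraps an instance method: the cache lives on the class and stores `self` as part of each key, so instances are never freed. The sketch below demonstrates that general pattern with hypothetical classes (`LeakyEncoder`, `SafeEncoder`); it is not necessarily the exact code path changed in #45922.

```python
import functools
import gc
import weakref


class LeakyEncoder:
    # The cache is shared at class level and keeps a strong reference to `self`.
    @functools.lru_cache(maxsize=None)
    def grid(self, size):
        return tuple(range(size))


class SafeEncoder:
    def __init__(self):
        # A per-instance cache is released together with the instance.
        self._grid_cache = {}

    def grid(self, size):
        if size not in self._grid_cache:
            self._grid_cache[size] = tuple(range(size))
        return self._grid_cache[size]


def is_freed_after_use(cls):
    """Create an instance, call the cached method, drop it, and report whether it was freed."""
    obj = cls()
    obj.grid(4)
    ref = weakref.ref(obj)
    del obj
    gc.collect()
    return ref() is None


leaky_freed = is_freed_after_use(LeakyEncoder)
safe_freed = is_freed_after_use(SafeEncoder)
```

Here `leaky_freed` ends up `False` because the class-level cache still holds the instance, while `safe_freed` is `True`.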
Audio
Audio support was expanded with the addition of AudioFlamingoNext model checkpoints and improved compilability of audio/vision encoders via standalone pure functions. Additional improvements include better error messaging when loading audio from video files and new documentation for audio/video processors.
- user friendly error when loading audio from video (#45221) by @eustlb in [#45221]
- [docs] adding audio/video processors (#45795) by @stevhliu in [#45795]
- Support Audio Flamingo Next checkpoints (#44830) by @lashahub in [#44830]
- Extract dynamic vision/audio tensors into standalone pure functions (#45396) by @IlyasMoutawwakil in [#45396]
Generation
Fixed generation issues including inputs_embeds and per_layer_inputs handling for Gemma4, an AttributeError in RAG's generate() caused by missing config fields, and flaky VLM generation tests by blocking special image tokens during sampling.
- Fix Gemma4 generation from inputs_embeds and per_layer_inputs (#46049) by @Cyrilvallez in [#46049]
- Fix AttributeError in RAG generate() for missing config fields (#46035) by @Sriniketh24 in [#46035]
- Block image_start/end_token_id in generation test sampling (#45914) by @Rocketknight1 in [#45914]
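The idea behind blocking special tokens during sampling can be sketched as follows: set the logits of the blocked ids to negative infinity so they receive zero probability. This is a minimal, standard-library illustration, not the test code from #45914; the function name `sample_with_blocked_ids` and the example token ids are assumptions.

```python
import math
import random


def sample_with_blocked_ids(logits, blocked_ids, rng):
    """Sample one token id from softmax(logits), never returning a blocked id."""
    masked = [-math.inf if i in blocked_ids else x for i, x in enumerate(logits)]
    peak = max(masked)
    if peak == -math.inf:
        raise ValueError("every token id is blocked")
    # exp(-inf) is 0.0, so blocked ids get zero sampling weight.
    weights = [math.exp(x - peak) for x in masked]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


rng = random.Random(0)
logits = [0.1, 5.0, 5.0, 0.2]
# Assume ids 1 and 2 stand in for the image start/end tokens.
draws = {sample_with_blocked_ids(logits, {1, 2}, rng) for _ in range(200)}
```

Even though ids 1 and 2 carry the highest logits, they never appear in `draws`.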
Bugfixes and improvements
- Remove mask visualization tool from masking_utils.py (#46066) by @Cyrilvallez in [#46066]
- fix: owned_by field in GET /v1/models returns list instead of string (#46006) by @nileshpatil6 in [#46006]
- [CB] Remove OpenTelemetry (#45984) by @remi-or in [#45984]
- docs(readme): use canonical huggingface.co domain in prose links (#46042) by @kiwigitops in [#46042]
- Fix remaining RAG doc examples that crash on current transformers (#46044) by @Sriniketh24 in [#46044]
- Init the actual tensor, not a copy (#46030) by @Rocketknight1 in [#46030]
- docs: sync legacy ACL anthology URLs and update metrics across i18n READMEs (#46027) by @irfaan101 in [#46027]
- [MultimodalLM] add language_model to the get/set_input_embeddings logic (#46029) by @eustlb in [#46029]
- [HRM Text] Add integration tests (#46033) by @vasqu in [#46033]
- hy_v3: add XPU expectations (#45858) by @kaixuanliu in [#45858]
- exaone4_5: add XPU expectations (#45890) by @kaixuanliu in [#45890]
- hyperclovax: add XPU Expectations for CI test (#45926) by @kaixuanliu in [#45926]
- chore(ci): remove dead env vars from circleci-failure-summary-comment.yml (#45972) by @XciD in [#45972]
- [CB] [Major] Add tensor parallelism (#45821) by @remi-or in [#45821]
- docs: update models architecture count and sync ACL anthology URLs (#46001) by @irfaan101 in [#46001]
- bugfix(ci): avoid E2BIG in pr_slow_ci_suggestion (#45983) by @tarekziade in [#45983]
- RFDetr - use correct Roboflow org for release (#45946) by @sbucaille in [#45946]
- docs: Fix formatting issues in weightconverter.md (#45988) by @ArjunSrivastava1 in [#45988]
- Fix colqwen2 test (#45981) by @IlyasMoutawwakil in [#45981]
- Fix M-RoPE device mismatch in Qwen3VL family under FSDP2 CPU offload (#45861) by @jamesbraza in [#45861]
- [docs] chat template prefill (#45947) by @stevhliu in [#45947]
- [docs] decode fast path (#45899) by @stevhliu in [#45899]
- fix: restore _attn_implementation and fix request offset in generate_batch() (#45943) by @sergiopaniego in [#45943]
- Expose per_layer_inputs for every Gemma4 variants (#45927) by @Cyrilvallez in [#45927]
- chore: update benchmark_v2.yml (#45966) by @hf-security-analysis[bot] in [#45966]
- fix(ci): set persist-credentials: false on actions/checkout and close remaining template injection findings (#45964) by @XciD in [#45964]
- chore(ci): set default workflow permissions to contents: read (#45961) by @XciD in [#45961]
- fix(ci): remove template injection on pull_request_target workflows (#45956) by @XciD in [#45956]
- chore(ci): pin all GitHub Actions and reusable workflows by SHA (#45955) by @XciD in [#45955]
- [docs] ALMModelTest (#45900) by @stevhliu in [#45900]
- Enhance apply_chat_template to support custom field prefilling (reasoning_content, thinking, etc.) (#45896) by @Mamiglia in [#45896]
- BUGFIX: Support hubert models that don't have conv_pos_batch_norm configured (#45921) by @igordertigor in [#45921]
- Revert 45777 (#45942) by @Rocketknight1 in [#45942]
- pass the otel secrets (#45933) by @tarekziade in [#45933]
- Add initial torch_tpu backend support (#45918) by @tengomucho in [#45918]
- [CB] Hide activation footprint by using the CUDA graph pool (#45911) by @remi-or in [#45911]
- Require input_ids for repetition penalty (#45389) by @ruben-aghayan in [#45389]
- Fix undefined 'input' variable (#45895) by @fullyz in [#45895]
- Fix post processing RF-DETR (#46041) by @yonigozlan (direct commit on v5.9.0)
- [loading] Free up tensors faster inside ConversionOps (#46110) by @Cyrilvallez (direct commit on v5.9.0)
- Add new cohere2_moe model (#46115) by @Cyrilvallez (direct commit on v5.9.0)
- Fix cohere2 tp_plan for release by @Cyrilvallez (direct commit on v5.9.0)
- Release v5.9.0 by @Cyrilvallez (direct commit on v5.9.0)
Significant community contributions
The following contributors have made significant changes to the library over the last release: