New Model additions
VoxtralRealtime
VoxtralRealtime is a streaming speech-to-text model from Mistral AI, designed for real-time automatic speech recognition (ASR). Unlike the offline Voxtral model, which processes complete audio files, VoxtralRealtime is architected for low-latency, incremental transcription, processing audio in chunks as they arrive.
The model combines an audio encoder with a Mistral-based language model decoder, using time conditioning embeddings and causal convolutions with padding caches to enable efficient streaming inference.
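Streaming encoders of this kind typically keep a small left-context cache so their causal convolutions produce the same outputs chunk by chunk as they would over the full sequence. The sketch below illustrates that mechanism in isolation; `CachedCausalConv1d` and its shapes are illustrative and not the VoxtralRealtime implementation.

```python
import torch
import torch.nn as nn

class CachedCausalConv1d(nn.Module):
    """Causal Conv1d whose left padding is carried across chunks in a cache."""

    def __init__(self, channels: int, kernel_size: int):
        super().__init__()
        self.left_context = kernel_size - 1
        self.conv = nn.Conv1d(channels, channels, kernel_size)

    def forward(self, chunk, cache=None):
        # chunk: (batch, channels, frames); cache holds the last `kernel_size - 1`
        # frames seen so far (zeros before the first chunk).
        if cache is None:
            cache = chunk.new_zeros(chunk.shape[0], chunk.shape[1], self.left_context)
        x = torch.cat([cache, chunk], dim=-1)    # prepend cached left context
        new_cache = x[..., -self.left_context:]  # tail becomes the next chunk's context
        return self.conv(x), new_cache

conv = CachedCausalConv1d(channels=8, kernel_size=5).eval()
audio = torch.randn(1, 8, 160)

with torch.no_grad():
    offline, _ = conv(audio)                 # one pass over the whole sequence
    cache, outs = None, []
    for chunk in audio.split(40, dim=-1):    # four 40-frame chunks arriving over time
        out, cache = conv(chunk, cache)
        outs.append(out)

# Chunked streaming reproduces the offline result.
assert torch.allclose(torch.cat(outs, dim=-1), offline, atol=1e-5)
```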
GLM-5 - GlmMoeDsa
The zAI team launches GLM-5 and introduces it as follows:
GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity.
Reinforcement learning aims to bridge the gap between competence and excellence in pre-trained models. However, deploying it at scale for LLMs is challenging due to RL training inefficiency. To this end, we developed slime, a novel asynchronous RL infrastructure that substantially improves training throughput and efficiency, enabling more fine-grained post-training iterations. With advances in both pre-training and post-training, GLM-5 delivers significant improvements over GLM-4.7 across a wide range of academic benchmarks and achieves best-in-class performance among all open-source models on reasoning, coding, and agentic tasks, closing the gap with frontier models.
- Add GlmMoeDsa (#43858) by @Cyrilvallez
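For readers new to mixture-of-experts sizing, the "744B parameters (40B active)" figure comes from top-k expert routing: each token is dispatched to only a few expert MLPs, so only a small slice of the weights participates in any single forward pass. The toy router below is a generic illustration of that idea, not the GlmMoeDsa code; all names and sizes are made up.

```python
import torch
import torch.nn as nn

class TinyTopKMoE(nn.Module):
    """Toy top-k mixture-of-experts layer: most expert weights stay idle per token."""

    def __init__(self, hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden))
             for _ in range(num_experts)]
        )

    def forward(self, x):  # x: (tokens, hidden)
        scores = self.router(x)                                     # (tokens, num_experts)
        weights, idx = torch.topk(scores.softmax(-1), self.top_k)   # keep only the top-k experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():                                       # only the chosen experts run
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TinyTopKMoE(hidden=64)
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64]); only 2 of 8 expert MLPs ran per token
```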
Qwen3.5, Qwen3.5 Moe
The Qwen team launches Qwen3.5 and introduces it as follows:
We are delighted to announce the official release of Qwen3.5, introducing the open weights of the first model in the Qwen3.5 series, namely Qwen3.5-397B-A17B. As a native vision-language model, Qwen3.5-397B-A17B demonstrates outstanding results across a full range of benchmark evaluations, including reasoning, coding, agent capabilities, and multimodal understanding, empowering developers and enterprises to achieve significantly greater productivity. Built on an innovative hybrid architecture that fuses linear attention (via Gated Delta Networks) with a sparse mixture-of-experts, the model attains remarkable inference efficiency: although it comprises 397 billion total parameters, just 17 billion are activated per forward pass, optimizing both speed and cost without sacrificing capability. We have also expanded our language and dialect support from 119 to 201, providing broader accessibility and enhanced support to users around the world.
- Adding Support for Qwen3.5 (#43830) by @bozheng-hit
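The "linear attention via Gated Delta Networks" part of the hybrid architecture replaces a growing KV cache with a fixed-size state that is decayed and updated with a delta rule at every step. Below is a minimal per-token recurrence in that spirit; it follows the gated delta rule from the DeltaNet literature and is not the Qwen3.5 implementation.

```python
import torch
import torch.nn.functional as F

def gated_delta_rule(q, k, v, alpha, beta):
    """q, k: (T, d_k); v: (T, d_v); alpha, beta: (T,) gates in (0, 1)."""
    d_k, d_v = k.shape[-1], v.shape[-1]
    S = torch.zeros(d_v, d_k)                 # constant-size state instead of a KV cache
    outputs = []
    for t in range(q.shape[0]):
        k_t, v_t, q_t = k[t], v[t], q[t]
        # Decay the old state, "erase" the value previously bound to k_t, write the new one.
        S = alpha[t] * S @ (torch.eye(d_k) - beta[t] * torch.outer(k_t, k_t))
        S = S + beta[t] * torch.outer(v_t, k_t)
        outputs.append(S @ q_t)               # read out with the query
    return torch.stack(outputs)               # (T, d_v)

T, d_k, d_v = 16, 32, 64
out = gated_delta_rule(
    torch.randn(T, d_k),
    F.normalize(torch.randn(T, d_k), dim=-1),
    torch.randn(T, d_v),
    torch.rand(T),
    torch.rand(T),
)
print(out.shape)  # torch.Size([16, 64]) -- memory does not grow with sequence length
```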
VibeVoice Acoustic Tokenizer
VibeVoice is a novel framework for synthesizing high-fidelity, long-form speech with multiple speakers by employing a next-token diffusion approach within a Large Language Model (LLM) structure. It's designed to capture the authentic conversational "vibe" and is particularly suited for generating audio content like podcasts and multi-participant audiobooks.
One key feature of VibeVoice is the use of two continuous audio tokenizers, one for extracting acoustic features and another for semantic features.
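A "continuous audio tokenizer" in this sense is an encoder that compresses the waveform into a low-rate sequence of continuous latent vectors rather than discrete codebook indices. The toy encoder below sketches that idea; the class name, strides, and dimensions are invented for illustration and are not the VibeVoice tokenizer.

```python
import torch
import torch.nn as nn

class ToyContinuousAudioTokenizer(nn.Module):
    """Strided conv encoder: waveform in, continuous latent frames out (no codebook)."""

    def __init__(self, latent_dim: int = 64):
        super().__init__()
        # Each strided conv downsamples in time; 4 * 4 * 5 = ~80x overall.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=8, stride=4, padding=2), nn.GELU(),
            nn.Conv1d(32, 64, kernel_size=8, stride=4, padding=2), nn.GELU(),
            nn.Conv1d(64, latent_dim, kernel_size=10, stride=5),
        )

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) -> continuous latents: (batch, frames, latent_dim)
        return self.encoder(waveform.unsqueeze(1)).transpose(1, 2)

tok = ToyContinuousAudioTokenizer()
latents = tok(torch.randn(1, 16000))   # 1 second of 16 kHz audio
print(latents.shape)                   # torch.Size([1, 199, 64]): ~200 continuous frames
```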
Breaking changes
- 🚨 [`Attn`] New attn mask interface everywhere (#42848)
- 🚨 Modify ModernBERT's default attention implementation to stop using FA (#43764)
Bugfixes and improvements
- [docs] deploying (#43241) by @stevhliu
- [Trainer] Move NEFTune impl to standalone functions (#43714) by @SunMarc
- Fix `convert_rope_params_to_dict` so it uses `rope_theta` from the config (#43766) by @hmellor
- Bump dev version (#43777) by @qgallouedec
- Improved `AGENTS.md` (#43763) by @tarekziade
- Fix-release-ubild (#43773) by @ArthurZucker
- unpin torch for CircleCI (#43790) by @ydshieh
- [`Modular Dependencies`] Fixup qwen rms norms (#43772) by @vasqu
- fix(testing): Fix BLOOM tokenizer, CLAP audio features, and CLVP text tester usage in tests (#43798) by @harshaljanjani
- Remove unconditional train_batch_size assignment (#43770) by @lordaarush
- [`Repo Consistency`] Fix rms norm (#43803) by @vasqu
- fix: Prevent AutoTokenizer type mismatch from directory name substrin… (#43791) by @tarekziade
- Refactor trainer data_collator and callbacks tests (#43776) by @SunMarc
- [core] Faster and thread-safe `check_model_inputs` implementation (#43765) by @Cyrilvallez
- [Trainer] use deepspeed SP process group when Accelerate doesn’t build a mesh (#43799) by @kashif
- fix(flaky): enforce manual seed to reduce flakiness (#43794) by @tarekziade
- Add TRL CI bot workflow to trigger tests on PR comments (#43809) by @qgallouedec
- Fix DeepSpeed model preparation logic in Trainer class (#43780) by @qgallouedec
- [docs] reveal more in toctree (#43808) by @stevhliu
- Fix markdown documentation (#43076) by @cyyever
- Fix slack-report workflow file (#43851) by @ydshieh
- add `do_sample=False` to qwen2_5_vl model tests to stablize the output (#43728) by @kaixuanliu
- Fix incorrect timestamp calculation in Qwen3VL Processor (#43659) by @jonathan-fulton
- Remove GPU tracking from TrackioCallback and remove env var support (#43371) by @qgallouedec
- Add id and resume support to SwanLab integration (#43719) by @i-pj
- fix gptoss crash in tp (#43853) by @sywangyi
- Delete batch_split from EncoderDecoderCache (#43814) by @cyyever
- delete unnecessary code to make moe compatible to full graph compile (#43855) by @kaixuanliu
- Update ModelType for Unigram tokenizer (#43860) by @pavel-esir
- [docs] Remove pipeline() examples from summarization/translation tasks (#43831) by @Mr-Neutr0n
- Fix video interpolation in pe_audio_video (#43811) by @Rocketknight1
- Look for the pad_token_id in the right place for Llama4 (#43539) by @Rocketknight1
- Fix cardinality error for DETR models without explicit background class (#43513) by @heathdutton
- docs: Add Switch Transformers docstring notes and update spectrogram comment (#43336) by @harshaljanjani
- [xLSTM] Fix bugs preventing small model training (#43209) by @Anri-Lombard
- docs: correct typo 'neccessary' to 'necessary' (#43868) by @thecaptain789
- Improve PR comment CI feedback (#43852) by @ydshieh
- Fix init weights in remote code (#43768) by @zucchini-nlp
- Fix GlmMoeDsaConfig default mlp_layer_types in modular conversion (#43876) by @OiPunk
- [MistralCommonBackend] fix loading proc (#43887) by @eustlb
- [`Jamba`] Fallback to slow path and warn instead of error out (#43889) by @vasqu
- Fix SwanLab callback to forward resume init args (#43848) by @OiPunk
- Fix old tech stack in doc (#43879) by @cyyever
- Update TrainingArguments (#43806) by @SunMarc
- Remove unnecessary code or checks for PT 2.4+ (#43787) by @cyyever
- Make it possible to evaluate when using sequence parallel in HF Trainer (#43517) by @jp1924
- [Trainer] Move optimizer cls init to trainer_optimizer.py (#43738) by @SunMarc
- fix the error of tests/quantization/fbgemm_fp8/test_fbgemm_fp8.py::Fb… (#43547) by @sywangyi
- fix fbgemm fp8 multi-device load failure. (#43581) by @sywangyi
- Refactor trainer init (#43807) by @SunMarc
- [`fix`] Use `last_hidden_state` key from `get_image_features` for llama4 (#43882) by @tomaarsen
- [Docs] Add docs for GLM-OCR and fix EomT-DINOv3 (#43710) by @NielsRogge
- Update hub metadata (#43892) by @zucchini-nlp
- [fix] DAC model: Apply STE in Dac.from_latents to match the forward pass (#43820) by @harshaljanjani
- Separate `check_model_inputs` into `capture_outputs` and `merge_with_config_defaults` + ensure correctness (#43862) by @Cyrilvallez
- Remove mask slicing in all eager attentions (#42186) by @Cyrilvallez
- Fix expected DAC outputs due to (old) change in CI settings. (#43896) by @ebezzam
- Minor changes trainer (#43744) by @SunMarc
- adding BC for custom toks accessing slow tok attrs deprecated in v5 (#43898) by @itazap
- Fix typo in quantization_operations in PEFT integrations (#43821) by @redpanda1995
- Update KERNELS_MIN_VERSION to 0.10.2 to be the same as setup.py (#43753) by @cyyever
- Decorate cache updates with no_grad, just in case (#43897) by @Rocketknight1
- revert place_model_on_device to property (#43895) by @SunMarc
- Train sampler unification (#43138) by @jiosephlee
- fix(moe): Handle dtype mismatch in torch._grouped_mm with autocast (#43839) by @Mr-Neutr0n
- Fix missing fast image patch counter in Glm46V (#43877) by @OiPunk
- Fix old tech stack in doc (#43902) by @cyyever
- Move `_keys_to_ignore_on_load_missing` for now (#43893) by @ArthurZucker
- Changes to cache_utils should trigger all tests all the time (#43920) by @Cyrilvallez
- Ernie4 5 vl moe (#43755) by @kaixuanliu
- Harmonize `input_embeds` to `inputs_embeds` everywhere (#43916) by @Cyrilvallez
- fix: TextClassificationPipeline docs mentioning deprecated return_all_scores (#43903) by @math-hiyoko
- Revert #43897 (#43923) by @Rocketknight1
- Fix AttributeError in OwlViT conversion script for Python 3.10+ (#43922) by @DimiChatzipavlis
- add openAI style `image_url` content support in `apply_chat_template` (#43786) by @kaixuanliu
- Prepare and keep track of position ids in `generate` (#43734) by @zucchini-nlp
- Fix lifted_tensor in Gemma3n export which dynamo can't reason about (#43801) by @robell
- Fix bark test (#43942) by @Cyrilvallez
- Fix docker files (#43946) by @ydshieh
- Fix flaky test for multimodal LLMs (#43944) by @Rocketknight1
- Add explicit utf-8 encoding to CircleCI scripts for Windows compatibility (#43925) by @
- Modernize string formatting (f-strings) in conversion scripts (#43943) by @
- Fix weight decay exclusions in `run_*_no-trainer.py` examples (#42769) by @casinca
- fix: Better weight decay exclusion in `run_*_no-trainer.py` examples (#43947) by @casinca
- Timm backbone saves and loads `out_features` (#43886) by @zucchini-nlp
- Fix qwen-vl position ids when generating several times (#43952) by @zucchini-nlp
- Fix `get_number_of_image_tokens` (#43948) by @zucchini-nlp
- Fix typos in docstrings, comments, and error messages (#43949) by @
- Fix LASR test layerdrop issue (#43954) by @Rocketknight1
- [kernels] fix kernel versions (#43955) by @MekkCyber
- [Doc tests] Fix bug (#43729) by @NielsRogge
- fix(models): Preserve custom token IDs through DiaConfig save and load (#43928) by @harshaljanjani
- update somes audio models (#43865) by @Deep-unlearning
- Improve memory allocator during loading (#43945) by @Cyrilvallez
- Inclusion of process_group in the gather_full_tensor function in tensor_parallel.py (#43932) by @quic-meetkuma
- Fix sync gradient (#43919) by @SunMarc
- Reorder Trainer methods (#43914) by @SunMarc
- Fix TypeError in dot_natural_key when state_dict keys have mixed types at same position (#43966) by @shtse8
- Enhance JSON schema generation to support instance, static, and class methods (#43968) by @qgallouedec
- Remove unused squeeze from VJEPA2 embeddings rotation (#43984) by @materight
- Improve new failing test analysis for PR comment CI (#44033) by @ydshieh
- Remove `other_workflow_run_ids` for `issue_comment` in `utils/notification_service.py` (#44036) by @ydshieh
- stable grouped_mm API (#43977) by @IlyasMoutawwakil
- create .git-blame-ignore-revs file (#43982) by @SunMarc
- docs: fix typos across documentation files (#43993) by @saurav0369
- update python requirement to 3.10+ to match codebase (#44009) by @mariam851
- Improve use of torch.is_autocast_enabled (#43930) by @cyyever
- Use torch.xlogy (#44006) by @cyyever
- [Deespeed] fix WeightConverter.convert() use (#43926) by @kashif
- Reduce reduce CUDA sync (#44005) by @cyyever
- split out accelerator args builder method (#43987) by @winglian
- SINQ quantization strategy integration (adapted for Transformers V5) (#43112) by @ChiaraBoretti
- fix(models): Unpack BitNet packed weights to fix CI failure (#43721) by @harshaljanjani
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @ChiaraBoretti
- SINQ quantization strategy integration (adapted for Transformers V5) (#43112)
- @cyyever
- Reduce reduce CUDA sync (#44005)
- Use torch.xlogy (#44006)
- Improve use of torch.is_autocast_enabled (#43930)
- Fix old tech stack in doc (#43902)
- Update KERNELS_MIN_VERSION to 0.10.2 to be the same as setup.py (#43753)
- Remove unnecessary code or checks for PT 2.4+ (#43787)
- Fix old tech stack in doc (#43879)
- Delete batch_split from EncoderDecoderCache (#43814)
- Fix markdown documentation (#43076)
- @eustlb
- @ebezzam
- @vasqu
- @bozheng-hit
- Adding Support for Qwen3.5 (#43830)