github huggingface/transformers v5.2.0
v5.2.0: GLM-5, Qwen3.5, Voxtral Realtime, VibeVoice Acoustic Tokenizer


New Model additions

VoxtralRealtime


VoxtralRealtime is a streaming speech-to-text model from Mistral AI, designed for real-time automatic speech recognition (ASR). Unlike the offline Voxtral model which processes complete audio files, VoxtralRealtime is architected for low-latency, incremental transcription by processing audio in chunks as they arrive.

The model combines an audio encoder with a Mistral-based language model decoder, using time conditioning embeddings and causal convolutions with padding caches to enable efficient streaming inference.
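The padding-cache idea can be illustrated in isolation. The sketch below is not the VoxtralRealtime implementation, just a toy demonstration of the general technique it describes: a causal 1-D convolution that carries the last `K - 1` input samples between chunks, so chunk-by-chunk inference reproduces offline inference exactly. All names and shapes are illustrative.

```python
import numpy as np

def causal_conv1d(x, w, cache=None):
    """Causal 1-D convolution over one chunk.

    x:     (T,) input chunk
    w:     (K,) kernel
    cache: (K-1,) trailing samples from the previous chunk (zeros at start)
    Returns (y, new_cache) where y has the same length as x.
    """
    k = len(w)
    if cache is None:
        cache = np.zeros(k - 1)
    padded = np.concatenate([cache, x])  # left-pad with cached history
    # y[t] = sum_i w[i] * x[t - i]  (causal: only past and current samples)
    y = np.array([padded[t : t + k] @ w[::-1] for t in range(len(x))])
    return y, padded[-(k - 1):]          # new cache = last K-1 input samples

rng = np.random.default_rng(0)
signal = rng.standard_normal(16)
kernel = rng.standard_normal(4)

# Offline: convolve the whole signal at once.
offline, _ = causal_conv1d(signal, kernel)

# Streaming: two chunks, carrying the padding cache across the boundary.
y1, cache = causal_conv1d(signal[:7], kernel)
y2, _ = causal_conv1d(signal[7:], kernel, cache=cache)
streaming = np.concatenate([y1, y2])

assert np.allclose(offline, streaming)  # chunked output matches offline
```

Because the cache makes each chunk see exactly the history a full-signal convolution would see, latency drops to one chunk without any change in the output.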

GLM-5 - GlmMoeDsa


The zAI team launches GLM-5, and introduces it as follows:

GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), substantially reducing deployment cost while preserving long-context capacity.

Reinforcement learning aims to bridge the gap between competence and excellence in pre-trained models. However, deploying it at scale for LLMs is a challenge due to the inefficiency of RL training. To this end, we developed slime, a novel asynchronous RL infrastructure that substantially improves training throughput and efficiency, enabling more fine-grained post-training iterations. With advances in both pre-training and post-training, GLM-5 delivers significant improvements compared to GLM-4.7 across a wide range of academic benchmarks and achieves best-in-class performance among all open-source models in the world on reasoning, coding, and agentic tasks, closing the gap with frontier models.
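The gap between 744B total and 40B active parameters comes from mixture-of-experts routing: a router selects a few experts per token, so only their weights participate in each forward pass. The toy sketch below illustrates that mechanism in general; the expert count, dimensions, and top-k value are illustrative, not GLM-5's actual configuration.

```python
import numpy as np

def topk_moe(x, router_w, experts, k=2):
    """Route one token through the top-k of E experts.

    x:        (d,) token representation
    router_w: (E, d) router weights
    experts:  list of E (d, d) expert weight matrices
    """
    logits = router_w @ x
    top = np.argsort(logits)[-k:]                 # indices of top-k experts
    weights = np.exp(logits[top])
    gates = weights / weights.sum()               # renormalized softmax gates
    # Only the selected experts' parameters are touched for this token.
    out = sum(g * (experts[i] @ x) for g, i in zip(gates, top))
    return out, top

rng = np.random.default_rng(1)
d, E = 8, 16
experts = [rng.standard_normal((d, d)) for _ in range(E)]
router_w = rng.standard_normal((E, d))
x = rng.standard_normal(d)

out, used = topk_moe(x, router_w, experts, k=2)

total_params = E * d * d   # parameters stored in all experts
active_params = 2 * d * d  # parameters actually used for this token
```

Here only 2 of 16 experts run per token, so compute and activation memory scale with `active_params` while model capacity scales with `total_params`.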

Qwen3.5, Qwen3.5 Moe


The Qwen team launches Qwen 3.5, and introduces it as follows:

We are delighted to announce the official release of Qwen3.5, introducing the open weights of the first model in the Qwen3.5 series, namely Qwen3.5-397B-A17B. As a native vision-language model, Qwen3.5-397B-A17B demonstrates outstanding results across a full range of benchmark evaluations, including reasoning, coding, agent capabilities, and multimodal understanding, empowering developers and enterprises to achieve significantly greater productivity. Built on an innovative hybrid architecture that fuses linear attention (via Gated Delta Networks) with a sparse mixture-of-experts, the model attains remarkable inference efficiency: although it comprises 397 billion total parameters, just 17 billion are activated per forward pass, optimizing both speed and cost without sacrificing capability. We have also expanded our language and dialect support from 119 to 201, providing broader accessibility and enhanced support to users around the world.
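The linear-attention side of that hybrid can be sketched with a toy gated delta rule, the recurrence behind Gated Delta Networks in general: instead of a KV cache that grows with sequence length, the layer keeps a fixed-size matrix state that is decayed by a forget gate and corrected toward each new value, giving constant memory per token. Dimensions, gate values, and function names below are illustrative, not Qwen3.5's actual implementation.

```python
import numpy as np

def gated_delta_step(S, q, k, v, alpha, beta):
    """One step of a toy gated delta rule.

    S:     (d_v, d_k) fixed-size state matrix
    q, k:  (d_k,) query / key (k assumed unit-norm)
    v:     (d_v,) value
    alpha: scalar forget gate in (0, 1]
    beta:  scalar write strength in (0, 1]
    Returns (new_state, output_for_this_token).
    """
    S = alpha * S                            # gated decay of old memory
    S = S + beta * np.outer(v - S @ k, k)    # delta rule: correct S toward v
    return S, S @ q                          # read out with the query

d_k, d_v, T = 4, 4, 10
rng = np.random.default_rng(2)
S = np.zeros((d_v, d_k))
outs = []
for _ in range(T):
    k = rng.standard_normal(d_k)
    k /= np.linalg.norm(k)                   # unit-norm key
    q = rng.standard_normal(d_k)
    v = rng.standard_normal(d_v)
    S, o = gated_delta_step(S, q, k, v, alpha=0.95, beta=0.5)
    outs.append(o)
```

With `alpha = beta = 1` and a unit-norm key, one step stores the association exactly (`S @ k == v`); the state never grows, which is the efficiency argument for mixing this with sparse MoE layers.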

VibeVoice Acoustic Tokenizer


VibeVoice is a novel framework for synthesizing high-fidelity, long-form speech with multiple speakers by employing a next-token diffusion approach within a Large Language Model (LLM) structure. It's designed to capture the authentic conversational "vibe" and is particularly suited for generating audio content like podcasts and multi-participant audiobooks.

One key feature of VibeVoice is the use of two continuous audio tokenizers, one for extracting acoustic features and another for semantic features.

Breaking changes

  • 🚨 [Attn] New attn mask interface everywhere (#42848)
  • 🚨 Modify ModernBERT's default attention implementation to stop using FA (#43764)

Bugfixes and improvements

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @ChiaraBoretti
    • SINQ quantization strategy integration (adapted for Transformers V5) (#43112)
  • @cyyever
    • Reduce CUDA sync (#44005)
    • Use torch.xlogy (#44006)
    • Improve use of torch.is_autocast_enabled (#43930)
    • Fix old tech stack in doc (#43902)
    • Update KERNELS_MIN_VERSION to 0.10.2 to be the same as setup.py (#43753)
    • Remove unnecessary code or checks for PT 2.4+ (#43787)
    • Fix old tech stack in doc (#43879)
    • Delete batch_split from EncoderDecoderCache (#43814)
    • Fix markdown documentation (#43076)
  • @eustlb
    • Add Voxtral Realtime (#43769)
    • [MistralCommonBackend] fix loading proc (#43887)
  • @ebezzam
    • Fix expected DAC outputs due to (old) change in CI settings. (#43896)
    • Add VibeVoice Acoustic Tokenizer (#43400)
  • @vasqu
    • [Jamba] Fallback to slow path and warn instead of error out (#43889)
    • 🚨 [Attn] New attn mask interface everywhere (#42848)
    • [Repo Consistency] Fix rms norm (#43803)
    • [Modular Dependencies] Fixup qwen rms norms (#43772)
  • @bozheng-hit
    • Adding Support for Qwen3.5 (#43830)
