Feb 23, 2026
- Add token distillation training support to distillation task wrappers
- Remove some torch.jit usage in prep for official deprecation
- Cautious optimizer option added to AdamP
- Call reset_parameters() even with meta-device init so that buffers get initialized when using hacks like init_empty_weights
- Tweak Muon optimizer to work with DTensor/FSDP2 (clamp_ instead of clamp_min_, alternate NS branch for DTensor)
- Release 1.0.25
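A minimal sketch of the DTensor/FSDP2-friendly clamp pattern referenced above (this is an illustrative stand-in, not the actual Muon update code; the helper name and the RMS-normalization context are assumptions). `Tensor.clamp_min_` doesn't work on DTensor shards, while the equivalent `Tensor.clamp_(min=...)` does:

```python
import torch


def rms_normalized(t: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    # Hypothetical helper: normalize a tensor by its norm, flooring the
    # denominator in place. Using clamp_(min=eps) rather than clamp_min_(eps)
    # keeps the op compatible with DTensor / FSDP2 sharded tensors.
    denom = t.norm().clamp_(min=eps)  # was: t.norm().clamp_min_(eps)
    return t / denom
```

On plain tensors the two spellings are numerically identical; only the DTensor dispatch path differs.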
Jan 21, 2026
- Compat Break: Fix oversight w/ QKV vs MLP bias in ParallelScalingBlock (& DiffParallelScalingBlock). Does not impact any trained timm models but could impact downstream use.
What's Changed
- Token distill task & distill task refactoring by @rwightman in #2647
- Fix distilled head dropout using wrong token in PiT forward_head by @hassonofer in #2649
- Fix #2653, no models with weights impacted so just a clean fix by @rwightman in #2654
- Add the cautious optimizer to AdamP. by @Yuan-Jinghui in #2657
- Enhance the numerical stability of the Cautious Optimizer by @Yuan-Jinghui in #2658
- Some misc fixes for torch.jit deprecation and meta device init by @rwightman in #2664
- fix(optim): replace bare except with Exception in Lion optimizer by @llukito in #2666
- Change clamp_min_ to clamp_(min=) as the former doesn't work with DTensor / FSDP2 by @rwightman in #2668
- Add DTensor compatible NS impl for Muon by @rwightman in #2669
New Contributors
- @Yuan-Jinghui made their first contribution in #2657
- @llukito made their first contribution in #2666
Full Changelog: v1.0.24...v1.0.25