github huggingface/pytorch-image-models v1.0.25
Release v1.0.25

Feb 23, 2026

  • Add token distillation training support to distillation task wrappers
  • Remove some torch.jit usage in prep for official deprecation
  • Add cautious optimizer ('caution') support to the AdamP optimizer
  • Call reset_parameters() even under meta-device init so that buffers are still initialized when using hacks like init_empty_weights
  • Tweak Muon optimizer to work with DTensor/FSDP2 (clamp_ instead of clamp_min_, alternate Newton-Schulz (NS) branch for DTensor)
  • Release 1.0.25
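
The meta-device item above is easier to follow with a small example. This is a minimal sketch in plain PyTorch, not timm's internal code: `nn.BatchNorm1d` is used here only as a stand-in for a module that owns buffers, and the sketch assumes PyTorch 2.0 or later (for `torch.device` as a context manager). It shows why re-running `reset_parameters()` matters once real storage has been allocated.

```python
import torch
import torch.nn as nn

# Build the module on the meta device: shapes and dtypes exist, storage does not.
with torch.device("meta"):
    bn = nn.BatchNorm1d(4)

assert bn.running_mean.is_meta

# Allocate real but *uninitialized* storage for parameters and buffers.
bn = bn.to_empty(device="cpu")

# Re-run the module's own init so parameters and buffers hold defined values.
# For BatchNorm this also resets running_mean / running_var.
bn.reset_parameters()

print(bn.running_mean)
print(bn.running_var)
```

Without the final `reset_parameters()` call, the buffers would contain whatever happened to be in the freshly allocated memory.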

Jan 21, 2026

  • Compat Break: Fix oversight w/ QKV vs MLP bias in ParallelScalingBlock (& DiffParallelScalingBlock)
    • Does not impact any trained timm models but could impact downstream use.

What's Changed

  • Token distill task & distill task refactoring by @rwightman in #2647
  • Fix distilled head dropout using wrong token in PiT forward_head by @hassonofer in #2649
  • Fix #2653, no models with weights impacted so just a clean fix by @rwightman in #2654
  • Add the cautious optimizer to AdamP. by @Yuan-Jinghui in #2657
  • Enhance the numerical stability of the Cautious Optimizer by @Yuan-Jinghui in #2658
  • Some misc fixes for torch.jit deprecation and meta device init by @rwightman in #2664
  • fix(optim): replace bare except with Exception in Lion optimizer by @llukito in #2666
  • Change clamp_min_ to clamp_(min=) as former doesn't work with DTensor / FSDP2 by @rwightman in #2668
  • Add DTensor compatible NS impl for Muon by @rwightman in #2669
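
For the clamp_min_ change in #2668, the following sketch may help. It only demonstrates that the two spellings give the same result on ordinary tensors; the DTensor / FSDP2 incompatibility is taken from the release note and is not reproduced here. The `eps` value and the `norms` tensor are made up for illustration.

```python
import torch

eps = 1e-7  # hypothetical floor value, chosen for illustration
norms = torch.tensor([0.0, 1e-9, 0.5, 2.0])

# Older spelling, reported in this release as not working with DTensor / FSDP2.
a = norms.clone().clamp_min_(eps)

# Spelling the release switches to; same lower-bound clamp on a plain tensor.
b = norms.clone().clamp_(min=eps)

assert torch.equal(a, b)
```

Downstream optimizer code that needs to run under FSDP2 could apply the same substitution, since the result is unchanged for regular tensors.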

Full Changelog: v1.0.24...v1.0.25
