Dec 30, 2025
- Add better NAdaMuon trained `dpwee`, `dwee`, `dlittle` (differential) ViTs with a small boost over previous runs
- Add a ~21M param `timm` variant of the CSATv2 model at 512x512 & 640x640 (loading sketch after this list)
  - https://huggingface.co/timm/csatv2_21m.sw_r640_in1k (83.13% top-1)
  - https://huggingface.co/timm/csatv2_21m.sw_r512_in1k (82.58% top-1)
- Factor non-persistent param init out of `__init__` into a common method that can be externally called via `init_non_persistent_buffers()` after meta-device init (see the sketch after this list).
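To try the new CSATv2 weights, the usual timm loading pattern applies. A sketch, assuming a recent `timm` install where these weights are published:

```python
import timm
import torch

# Load the released 640x640 variant from the HF Hub via timm's hf_hub: prefix.
model = timm.create_model('hf_hub:timm/csatv2_21m.sw_r640_in1k', pretrained=True)
model.eval()

# Resolve matching preprocessing from the pretrained config; `transform`
# is what you would apply to a PIL image before batching.
data_cfg = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_cfg, is_training=False)

with torch.inference_mode():
    out = model(torch.randn(1, *data_cfg['input_size']))  # dummy input for shape check
print(out.shape)  # (1, 1000) for an ImageNet-1k head
```

And a minimal sketch of the meta-device flow from the last bullet. The meta-device context and `to_empty()` are standard PyTorch; the `init_non_persistent_buffers()` call follows the description above, and the weight-loading step in between is left open:

```python
import timm
import torch

# Create on the meta device: no real storage is allocated, and buffer init
# that would require actual data is deferred.
with torch.device('meta'):
    model = timm.create_model('vit_base_patch16_224')

# Materialize empty storage on a real device, then load weights by whatever
# mechanism fits (state_dict, checkpoint loader, etc.).
model = model.to_empty(device='cpu')

# Re-run the factored-out init for non-persistent buffers (values that are
# not part of the state_dict and were skipped during meta init).
model.init_non_persistent_buffers()
```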
Dec 12, 2025
- Add CSATv2 model (thanks https://github.com/gusdlf93) -- a lightweight but high-res model with DCT stem & spatial attention. https://huggingface.co/Hyunil/CSATv2
- Add AdaMuon and NAdaMuon optimizer support to the existing `timm` Muon impl. Appears more competitive vs AdamW with familiar hparams for image tasks (usage sketch after this list).
- End of year PR cleanup, merge aspects of several long-open PRs
- Merge differential attention (`DiffAttention`), add corresponding `DiffParallelScalingBlock` (for ViT), train some wee ViTs (see the attention sketch after this list)
- Add a few pooling modules, `LsePlus` and `SimPool` (LSE pooling background after this list)
- Cleanup, optimize `DropBlock2d` (also add support to ByobNet based models)
- Bump unit tests to PyTorch 2.9.1 + Python 3.13 on the upper end; the lower bound remains PyTorch 1.13 + Python 3.10
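A usage sketch for the new optimizers via timm's optimizer factory. The registered name `'nadamuon'` is an assumption, not confirmed by this changelog; `list_optimizers()` shows what the factory in your installed version actually accepts:

```python
import timm
from timm.optim import create_optimizer_v2, list_optimizers

model = timm.create_model('vit_small_patch16_224')

# Inspect the registry for the Muon-family names actually available.
print([name for name in list_optimizers() if 'muon' in name])

# Hypothetical name and hparams: AdamW-like settings, per the note above
# that the new optimizers are competitive with familiar configs.
opt = create_optimizer_v2(model, opt='nadamuon', lr=1e-3, weight_decay=0.05)
```

On the differential attention bullet: the core idea (from the Diff Transformer paper) is to compute two softmax attention maps and subtract them with a learnable lambda, cancelling common-mode attention noise. The module below is a simplified sketch of that idea only, not timm's `DiffAttention` (the paper version adds details such as per-head normalization and a re-parameterized lambda):

```python
import torch
import torch.nn as nn

class DiffAttnSketch(nn.Module):
    """Two softmax attention maps subtracted with a learnable lambda.
    Simplified sketch of differential attention, not timm's DiffAttention."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # Two q/k projections produce the two attention maps; a single v.
        self.q1 = nn.Linear(dim, dim, bias=False)
        self.q2 = nn.Linear(dim, dim, bias=False)
        self.k1 = nn.Linear(dim, dim, bias=False)
        self.k2 = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, dim, bias=False)
        self.proj = nn.Linear(dim, dim)
        self.lam = nn.Parameter(torch.tensor(0.5))  # learnable lambda

    def _heads(self, t: torch.Tensor) -> torch.Tensor:
        B, N, _ = t.shape
        return t.reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, _ = x.shape
        q1, q2 = self._heads(self.q1(x)), self._heads(self.q2(x))
        k1, k2 = self._heads(self.k1(x)), self._heads(self.k2(x))
        v = self._heads(self.v(x))
        scale = self.head_dim ** -0.5
        attn1 = (q1 @ k1.transpose(-2, -1) * scale).softmax(dim=-1)
        attn2 = (q2 @ k2.transpose(-2, -1) * scale).softmax(dim=-1)
        # Differential map: subtracting the second softmax suppresses
        # attention mass that both maps assign regardless of content.
        out = (attn1 - self.lam * attn2) @ v
        return self.proj(out.transpose(1, 2).reshape(B, N, -1))
```

And as background for the pooling additions: classic log-sum-exp (LSE) pooling is a smooth stand-in between average and max pooling. A generic sketch only; the exact formulation of `LsePlus` may differ:

```python
import math
import torch

def lse_pool2d(x: torch.Tensor, r: float = 10.0) -> torch.Tensor:
    """Log-sum-exp pooling over spatial dims: interpolates smoothly between
    average pooling (r -> 0) and max pooling (r -> inf)."""
    b, c, h, w = x.shape
    lse = torch.logsumexp(x.reshape(b, c, -1) * r, dim=-1)
    return (lse - math.log(h * w)) / r  # (B, C) pooled features
```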
Dec 1, 2025
- Add a lightweight task abstraction; add logits and feature distillation support to the train script via new tasks (loss sketch after this list).
- Remove old APEX AMP support
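The losses typically involved in logits and feature distillation look like the following. This is a generic sketch of the standard KL (logits) and MSE (feature) formulations, not timm's task API:

```python
import torch.nn.functional as F
from torch import Tensor

def logits_distill_loss(student_logits: Tensor, teacher_logits: Tensor, T: float = 4.0) -> Tensor:
    """Classic KL-based logits distillation (Hinton et al.): match the
    student's temperature-softened distribution to the teacher's."""
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    # T^2 rescales gradients to be comparable across temperatures.
    return F.kl_div(s, t, reduction='batchmean') * (T * T)

def feature_distill_loss(student_feat: Tensor, teacher_feat: Tensor) -> Tensor:
    # Simple feature matching; real setups often insert a projection layer
    # when student and teacher feature dims differ.
    return F.mse_loss(student_feat, teacher_feat)
```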
What's Changed
- Add val-interval argument by @t0278611 in #2606
- Add coord attn and some variants that I had lying around by @rwightman in #2617
- Distill fixups by @rwightman in #2598
- A simplification and some fixes for DropBlock2d. by @rwightman in #2620
- Other pooling... by @rwightman in #2621
- Experimenting with differential attention by @rwightman in #2314
- Differential + parallel attn by @rwightman in #2625
- AdaMuon impl w/ a few other ideas based on recent reading by @rwightman in #2626
- Csatv2 contribution by @rwightman in #2627
- Add HParams sections to hfdocs by @rwightman in #2630
- Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in #2633
- [BUG] Modify autocasting in fast normalization functions to handle optional weight params safely by @tesfaldet in #2631
- 'init_non_persistent_buffers' scheme by @rwightman in #2632
- Add docstrings to layer helper functions and modules by @raimbekovm in #2634
- refactor(scheduler): add type hints to CosineLRScheduler by @haru-256 in #2640
- A few misc weights to close out 2025 by @rwightman in #2639
- Update typing in other scheduler classes. Add unit tests. by @rwightman in #2641
New Contributors
- @t0278611 made their first contribution in #2606
- @salmanmkc made their first contribution in #2633
- @tesfaldet made their first contribution in #2631
- @raimbekovm made their first contribution in #2634
- @haru-256 made their first contribution in #2640
Full Changelog: v1.0.22...v1.0.23