Dec 30, 2025
- Add better NAdaMuon trained `dpwee`, `dwee`, `dlittle` (differential) ViTs with a small boost over previous runs
- Add a ~21M param `timm` variant of the CSATv2 model at 512x512 & 640x640 (loading sketch after this list)
  - https://huggingface.co/timm/csatv2_21m.sw_r640_in1k (83.13% top-1)
  - https://huggingface.co/timm/csatv2_21m.sw_r512_in1k (82.58% top-1)
- Factor non-persistent param init out of `__init__` into a common method that can be externally called via `init_non_persistent_buffers()` after meta-device init (see the sketch after this list).
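To try the new CSATv2 weights, the usual timm loading pattern applies. A sketch, assuming a recent `timm` install where these weights are published:

```python
import timm
import torch

# Load the released 640x640 variant from the HF Hub via timm's hf_hub: prefix.
model = timm.create_model('hf_hub:timm/csatv2_21m.sw_r640_in1k', pretrained=True)
model.eval()

# Resolve matching preprocessing from the pretrained config; `transform`
# is what you would apply to a PIL image before batching.
data_cfg = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_cfg, is_training=False)

with torch.inference_mode():
    out = model(torch.randn(1, *data_cfg['input_size']))  # dummy input for shape check
print(out.shape)  # (1, 1000) for an ImageNet-1k head
```

And a minimal sketch of the meta-device flow from the last bullet. The meta-device context and `to_empty()` are standard PyTorch; the `init_non_persistent_buffers()` call follows the description above, and the weight-loading step in between is left open:

```python
import timm
import torch

# Create on the meta device: no real storage is allocated, and buffer init
# that would require actual data is deferred.
with torch.device('meta'):
    model = timm.create_model('vit_base_patch16_224')

# Materialize empty storage on a real device, then load weights by whatever
# mechanism fits (state_dict, checkpoint loader, etc.).
model = model.to_empty(device='cpu')

# Re-run the factored-out init for non-persistent buffers (values that are
# not part of the state_dict and were skipped during meta init).
model.init_non_persistent_buffers()
```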
Dec 12, 2025
- Add CSATv2 model (thanks https://github.com/gusdlf93) -- a lightweight but high-res model with DCT stem & spatial attention. https://huggingface.co/Hyunil/CSATv2
- Add AdaMuon and NAdaMuon optimizer support to the existing `timm` Muon impl. Appears more competitive vs AdamW with familiar hparams for image tasks (usage sketch after this list).
- End of year PR cleanup, merge aspects of several long-open PRs
- Merge differential attention (`DiffAttention`), add corresponding `DiffParallelScalingBlock` (for ViT), train some wee ViTs (see the attention sketch after this list)
- Add a few pooling modules, `LsePlus` and `SimPool` (LSE pooling background after this list)
- Cleanup, optimize `DropBlock2d` (also add support to ByobNet based models)
- Bump unit tests to PyTorch 2.9.1 + Python 3.13 on the upper end; the lower bound remains PyTorch 1.13 + Python 3.10
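A usage sketch for the new optimizers via timm's optimizer factory. The registered name `'nadamuon'` is an assumption, not confirmed by this changelog; `list_optimizers()` shows what the factory in your installed version actually accepts:

```python
import timm
from timm.optim import create_optimizer_v2, list_optimizers

model = timm.create_model('vit_small_patch16_224')

# Inspect the registry for the Muon-family names actually available.
print([name for name in list_optimizers() if 'muon' in name])

# Hypothetical name and hparams: AdamW-like settings, per the note above
# that the new optimizers are competitive with familiar configs.
opt = create_optimizer_v2(model, opt='nadamuon', lr=1e-3, weight_decay=0.05)
```

On the differential attention bullet: the core idea (from the Diff Transformer paper) is to compute two softmax attention maps and subtract them with a learnable lambda, cancelling common-mode attention noise. The module below is a simplified sketch of that idea only, not timm's `DiffAttention` (the paper version adds details such as per-head normalization and a re-parameterized lambda):

```python
import torch
import torch.nn as nn

class DiffAttnSketch(nn.Module):
    """Two softmax attention maps subtracted with a learnable lambda.
    Simplified sketch of differential attention, not timm's DiffAttention."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # Two q/k projections produce the two attention maps; a single v.
        self.q1 = nn.Linear(dim, dim, bias=False)
        self.q2 = nn.Linear(dim, dim, bias=False)
        self.k1 = nn.Linear(dim, dim, bias=False)
        self.k2 = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, dim, bias=False)
        self.proj = nn.Linear(dim, dim)
        self.lam = nn.Parameter(torch.tensor(0.5))  # learnable lambda

    def _heads(self, t: torch.Tensor) -> torch.Tensor:
        B, N, _ = t.shape
        return t.reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, _ = x.shape
        q1, q2 = self._heads(self.q1(x)), self._heads(self.q2(x))
        k1, k2 = self._heads(self.k1(x)), self._heads(self.k2(x))
        v = self._heads(self.v(x))
        scale = self.head_dim ** -0.5
        attn1 = (q1 @ k1.transpose(-2, -1) * scale).softmax(dim=-1)
        attn2 = (q2 @ k2.transpose(-2, -1) * scale).softmax(dim=-1)
        # Differential map: subtracting the second softmax suppresses
        # attention mass that both maps assign regardless of content.
        out = (attn1 - self.lam * attn2) @ v
        return self.proj(out.transpose(1, 2).reshape(B, N, -1))
```

And as background for the pooling additions: classic log-sum-exp (LSE) pooling is a smooth stand-in between average and max pooling. A generic sketch only; the exact formulation of `LsePlus` may differ:

```python
import math
import torch

def lse_pool2d(x: torch.Tensor, r: float = 10.0) -> torch.Tensor:
    """Log-sum-exp pooling over spatial dims: interpolates smoothly between
    average pooling (r -> 0) and max pooling (r -> inf)."""
    b, c, h, w = x.shape
    lse = torch.logsumexp(x.reshape(b, c, -1) * r, dim=-1)
    return (lse - math.log(h * w)) / r  # (B, C) pooled features
```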
Dec 1, 2025
- Add a lightweight task abstraction; add logits and feature distillation support to the train script via new tasks (loss sketch after this list).
- Remove old APEX AMP support
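The losses typically involved in logits and feature distillation look like the following. This is a generic sketch of the standard KL (logits) and MSE (feature) formulations, not timm's task API:

```python
import torch.nn.functional as F
from torch import Tensor

def logits_distill_loss(student_logits: Tensor, teacher_logits: Tensor, T: float = 4.0) -> Tensor:
    """Classic KL-based logits distillation (Hinton et al.): match the
    student's temperature-softened distribution to the teacher's."""
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    # T^2 rescales gradients to be comparable across temperatures.
    return F.kl_div(s, t, reduction='batchmean') * (T * T)

def feature_distill_loss(student_feat: Tensor, teacher_feat: Tensor) -> Tensor:
    # Simple feature matching; real setups often insert a projection layer
    # when student and teacher feature dims differ.
    return F.mse_loss(student_feat, teacher_feat)
```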
What's Changed
- Add val-interval argument by @t0278611 in #2606
- Add coord attn and some variants that I had lying around by @rwightman in #2617
- Distill fixups by @rwightman in #2598
- A simplification and some fixes for DropBlock2d. by @rwightman in #2620
- Other pooling... by @rwightman in #2621
- Experimenting with differential attention by @rwightman in #2314
- Differential + parallel attn by @rwightman in #2625
- AdaMuon impl w/ a few other ideas based on recent reading by @rwightman in #2626
- Csatv2 contribution by @rwightman in #2627
- Add HParams sections to hfdocs by @rwightman in #2630
- Upgrade GitHub Actions for Node 24 compatibility by @salmanmkc in #2633
- [BUG] Modify autocasting in fast normalization functions to handle optional weight params safely by @tesfaldet in #2631
- 'init_non_persistent_buffers' scheme by @rwightman in #2632
- Add docstrings to layer helper functions and modules by @raimbekovm in #2634
- refactor(scheduler): add type hints to CosineLRScheduler by @haru-256 in #2640
- A few misc weights to close out 2025 by @rwightman in #2639
- Update typing in other scheduler classes. Add unit tests. by @rwightman in #2641
New Contributors
- @t0278611 made their first contribution in #2606
- @salmanmkc made their first contribution in #2633
- @tesfaldet made their first contribution in #2631
- @raimbekovm made their first contribution in #2634
- @haru-256 made their first contribution in #2640
Full Changelog: v1.0.22...v1.0.23