Oct 16-20, 2025
- Add an implementation of the Muon optimizer (based on https://github.com/KellerJordan/Muon) with customizations:
  - extra flexibility and improved handling for conv weights, plus fallbacks for weight shapes not suited to orthogonalization
  - small speedup for Newton-Schulz (NS) iterations from reduced allocations and fused (b)add(b)mm ops
  - by default, uses AdamW (or NAdamW if nesterov=True) updates when Muon is not suitable for a parameter's shape (or the parameter is excluded via a param group flag)
  - like the torch implementation, select from several LR scale adjustment functions via adjust_lr_fn
  - select from several NS coefficient presets or specify your own via ns_coefficients
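
The Newton-Schulz step mentioned above can be illustrated with a small standalone sketch. This is not the timm implementation: `newton_schulz_orthogonalize` is a hypothetical helper, it handles only 2D matrices, and the coefficients are the quintic values from the reference Muon repository, which may or may not match a timm preset. It shows how the two polynomial updates in each iteration map onto single fused `torch.addmm` calls.

```python
import torch

# Quintic coefficients from the reference Muon repository. Whether these
# correspond to a particular timm preset is an assumption.
NS_COEFFS = (3.4445, -4.7750, 2.0315)


def newton_schulz_orthogonalize(G, steps=5, coeffs=NS_COEFFS, eps=1e-7):
    """Approximately orthogonalize a 2D matrix (hypothetical helper)."""
    a, b, c = coeffs
    X = G.to(torch.float32)
    transposed = X.size(0) > X.size(1)
    if transposed:
        # Work in the wide orientation so X @ X.mT is the smaller Gram matrix.
        X = X.mT
    # Dividing by the Frobenius norm keeps every singular value at or below 1.
    X = X / (X.norm() + eps)
    for _ in range(steps):
        A = X @ X.mT
        # B = b * A + c * (A @ A), computed in one fused call.
        B = torch.addmm(A, A, A, beta=b, alpha=c)
        # X = a * X + B @ X, computed in one fused call.
        X = torch.addmm(X, B, X, beta=a)
    return X.mT if transposed else X


torch.manual_seed(0)
W = torch.randn(16, 32)
O = newton_schulz_orthogonalize(W)
# Singular values of O are pushed toward 1 rather than made exactly 1.
singular_values = torch.linalg.svdvals(O)
```

The result is only approximately orthogonal: this coefficient choice trades exactness for fast growth of small singular values over a handful of iterations.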
 
- First 2 steps of 'meta' device model initialization supported:
  - Fix several ops that were breaking creation under the 'meta' device context
  - Add device & dtype factory kwarg support to all models and modules (anything inheriting from nn.Module) in timm
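
As a rough illustration of the factory kwarg pattern, here is a minimal sketch using plain PyTorch rather than timm models. `ScaledLinear` is a hypothetical module, and reading the "first 2 steps" as construction on 'meta' followed by materialization with `to_empty` is an assumption, as is treating weight re-initialization as the remaining step.

```python
import torch
import torch.nn as nn


class ScaledLinear(nn.Module):
    """Hypothetical module that forwards device/dtype factory kwargs."""

    def __init__(self, in_features, out_features, device=None, dtype=None):
        super().__init__()
        dd = {"device": device, "dtype": dtype}
        # Pass the factory kwargs to every submodule and tensor constructor.
        self.fc = nn.Linear(in_features, out_features, **dd)
        self.scale = nn.Parameter(torch.empty(out_features, **dd))
        self.reset_parameters()

    def reset_parameters(self):
        nn.init.ones_(self.scale)

    def forward(self, x):
        return self.fc(x) * self.scale


# Assumed step 1: build on the meta device. Shapes and dtypes exist,
# but no parameter storage is allocated.
model = ScaledLinear(8, 4, device="meta", dtype=torch.float32)
was_meta = all(p.is_meta for p in model.parameters())

# Assumed step 2: allocate real but uninitialized storage on the target device.
model = model.to_empty(device="cpu")

# Assumed remaining step: initialize values, since to_empty leaves them undefined.
model.fc.reset_parameters()
model.reset_parameters()

out = model(torch.zeros(2, 8))
```

Building on 'meta' first avoids allocating and initializing weights that would be overwritten anyway, for example when a checkpoint is loaded right after construction.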
 
- License fields added to pretrained cfgs in code
- Release 1.0.21
What's Changed
- Add calculate_drop_path_rates helper by @rwightman in #2589
- Review huggingface_hub integration by @Wauplin in #2592
- Adding device/dtype factory_kwargs to modules and models by @rwightman in #2591
- Consistent license handling throughout timm by @alexanderdann in #2585
- Add impl of Muon optimizer. Fix #2580 by @rwightman in #2596
- Rename 'simple' flag for Muon to 'fallback' by @rwightman in #2599
New Contributors
- @alexanderdann made their first contribution in #2585
Full Changelog: v1.0.20...v1.0.21