What's Changed
- Update version.txt after release by @loadams in #7675
- [modal ci] fixes by @stas00 in #7676
- leaf modules: explain better by @stas00 in #7674
- disable nv-lightning-v100.yml CI by @stas00 in #7681
- allow separate learning rates "muon_lr" and "adam_lr" for Muon optimizer by @delock in #7658
- see_mem_usage: make it always work by @stas00 in #7688
- make debug utils more resilient by @stas00 in #7690
- zero stage 1-2: don't pin memory if not configured by @stas00 in #7689
- modal ci: fix group concurrency by @stas00 in #7691
- Use pytorch utils to detect ninja by @Emrys-Merlin in #7687
- Update SECURITY.md to point to GitHub reporting rather than Microsoft by @loadams in #7692
- Add Qwen2.5 to AutoTP model list by @delock in #7696
- Trust intel server for XPU tests by @tohtana in #7698
- PyTorch-compatible backward API by @tohtana in #7665
- Add news about Ray x DeepSpeed Meetup by @PKUWZP in #7704
- Put Muon optimizer momentum buffer on GPU by @delock in #7648
- [ROCm] Relax tolerances for FP8 unit test for fp16 and bf16 cases by @rraminen in #7655
- Fix ds_secondary_tensor possibly being dirty when loading a model or ZeRO checkpoint for ZeRO++ by @zhengchenyu in #7707
- fix: skip aio wait when the swap tensor list is empty by @xylian86 in #7712
- Low-precision master params/grads/optimizer states by @tohtana in #7700
- Enable compiled autograd for the backward pass by @deepcharm in #7667
- Wall clock timers API by @sfc-gh-truwase in #7714
New Contributors
- @Emrys-Merlin made their first contribution in #7687
Full Changelog: v0.18.2...v0.18.3