What's Changed
- Add ZenFlow code for Stage 3 by @JoshWoo2003 in #7516
- [XPU][CI] recover xpu-max1100 workflow by @Liangliang-Ma in #7630
- Take **kwargs in init of DeepSpeedZeroOptimizer subclasses by @eternalNight in #7634
- add support for tensor learning rate (vs scalar) by @NirSonnenschein in #7633
- Fix illegal memory access with multi_tensor_apply size above INT_MAX by @wangyan-mms in #7639
- No Muon optimizer for embedding and lm_head layer by @delock in #7641
- z2: report param name and not zero id in assert by @stas00 in #7637
- z2: don't pass `dtype` to `report_ipg_memory_usage` by @stas00 in #7636
- Ulysses HF Accelerate integration by @stas00 in #7638
- Add DataStates-LLM: Asynchronous Checkpointing Engine Support by @mauryaavinash95 in #7166
New Contributors
- @JoshWoo2003 made their first contribution in #7516
- @wangyan-mms made their first contribution in #7639
Full Changelog: v0.18.0...v0.18.1