What's Changed
- Respect $TRITON_HOME by @Flamefire in #7907
- Add Feature Universal Checkpoint for AutoTP by @nathon-lee in #7908
- fix: remove unnecessary shell=True in ROCm GPU architecture detection by @instantraaamen in #7915
- Don't detect local GPU if $DS_IGNORE_CUDA_DETECTION is set by @Flamefire in #7896
- Add HuggingFace tp_plan support for AutoTP by @delock in #7901
- fix: handle non-existent path in is_nfs_path for Triton autotune cache by @Krishnachaitanyakc in #7921
- Fix backward compatibility of torch.amp.custom_fwd for PyTorch < 2.4 by @tohtana in #7920
- Extending Muon Optimizer Support for ZeRO Stage 3 by @PKUWZP in #7919
- Add news item for ASPLOS 2026 Best Paper Award by @PKUWZP in #7923
- fix(superoffload) preserve multi-group updates with shared cpu buffers (#7905) by @xylian86 in #7906
- AGENTS.md: Add pre-commit command to existing CI requirements line by @delock in #7930
- Update README with latest news from DeepSpeed by @PKUWZP in #7931
- Merging AutoSP into DeepSpeed by @neeldani in #7860
- Add fallback to full test by @tohtana in #7933
- Remove Microsoft Corporation copyright from AGENTS.md and CLAUDE.md by @PKUWZP in #7932
- Update version.txt for latest incoming release 0.18.9 by @loadams in #7935
New Contributors
- @instantraaamen made their first contribution in #7915
- @Krishnachaitanyakc made their first contribution in #7921
- @neeldani made their first contribution in #7860
Full Changelog: v0.18.8...v0.18.9