What's Changed
- Update version after latest release (v0.18.9) by @loadams in #7936
- Refactor consolidate transpose by @nathon-lee in #7934
- Fix AutoTP universal checkpoint CI by @tohtana in #7937
- Fix process hang in process-group shutdown by @Flamefire in #7941
- ZeRO-3 defragment utility by @nathon-lee in #7940
- [SP] add SP deny list instead of allow list by @kashif in #7887
- fix(zero): detach flat buffer to prevent autograd inplace error on CP… by @delock in #7948
- Fix FPQuantizer build by @Flamefire in #7963
- Fix ZeRO-1/2 CPU-offloaded gradient norm by @alek6kun in #7967
- Fix overlap-comm buffer lifetimes by @tohtana in #7965
- Fix DeepCompile+Z3 on PyTorch v2.9/2.10 by @tohtana in #7951
- Fix WarmupCosineLR multi-group initialization by @tohtana in #7969
- Enable PyTorch version selection for full test by @tohtana in #7968
- fix(fp_quantizer): fix UB and negative shift warnings in fp_quantize_impl.cu by @Cursx in #7973
- fix(op_builder): avoid duplicate/wrong -gencode flags by @Cursx in #7974
- Rename dequantization template parameters by @Flamefire in #7976
- Avoid CUDA reinit error in CI tests by @tohtana in #7977
- Fix ZeRO-1/2 CPU-offloaded gradient loss with multiple backward() per step by @roycho96 in #7981
- deepcompile: Fix backward graph recompilation due to unbalanced forward/backward visits by @eternalNight in #7980
- Fix Adam subgroup inconsistency by @st-bang97 in #7982
- Dynamic offload compatible with static optimizer offload by @sfc-gh-truwase in #7979
- Fix Modal CI timeout by @sfc-gh-truwase in #7989
- Fix BF16_Optimizer last-microbatch grad leak under ZeRO-1 by @maxyu1115 in #7985
- fix: topkgating major bug by @excepshenal in #7986
- Add DeepSpeed NVTX domain support by @heurry in #7988
- Add Gram Newton-Schulz orthogonalization for Muon optimizer by @delock in #7953
- [AutoSP] (Sequence Parallelism) support for Multimodal Models (ViT + LLM) by @nathon-lee in #7984
- Update version.txt before 0.19.0 release by @loadams in #7995
New Contributors
- @alek6kun made their first contribution in #7967
- @Cursx made their first contribution in #7973
- @roycho96 made their first contribution in #7981
- @st-bang97 made their first contribution in #7982
- @maxyu1115 made their first contribution in #7985
- @excepshenal made their first contribution in #7986
- @heurry made their first contribution in #7988
Full Changelog: v0.18.9...v0.19.0