What's Changed
- Update version by @sfc-gh-truwase in #7719
- Disable deterministic option in compile tests by @tohtana in #7720
- Fix SuperOffloadOptimizer_Stage3 crash due to missing param_names parameter by @ImaGoodFella in #7715
- [AMD][ROCm] Improve AMD support by @k-artem in #7448
- Fix typo by @stas00 in #7722
- Skip none in backward hook by @tohtana in #7725
- [Engine] Only scale gradients if scale_wrt_gas is True by @kashif in #7724
- Fix test cases that depend on Triton by @k-artem in #7731
- Fix rare hang in DeepSpeed Async I/O wait by releasing the Python GIL by @xylian86 in #7727
- Fix #7733: Replace torch.sqrt with math.sqrt in scale_lr for sqrt method by @Rakshit-gen in #7735
- Replace MoE checkpoint dp_world_size with seq_dp_world_size by @wukong1992 in #7732
- [BUG] Fix UlyssesSPAttentionHF.register_with_transformers() crash with PEFT models by @Rakshit-gen in #7737
- Add core API update blog by @tohtana in #7738
- Fix Nebula checkpoint engine commit() API mismatch by @Rakshit-gen in #7740
- Fix DecoupledCheckpointEngine deadlock and improve reliability by @Rakshit-gen in #7742
- Fix OnebitLamb NaN propagation with empty parameters by @Rakshit-gen in #7736
- fix: remove premature MPI environment variable check in OpenMPIRunner by @leejianwoo-collab in #7751
- Enable Python 3.11 and 3.12 tests by @loadams in #7007
- Add CI workflow to run tests on AWS by @tohtana in #7753
- Add fallback to BF16 support check by @tohtana in #7754
- Fix DeepCompile for PyTorch 2.8/2.9 compatibility by @tohtana in #7755
- Remove AMP test cases by @k-artem in #7745
- fix: avoid IndexError in BF16_Optimizer.destroy() when using DummyOptim by @leejianwoo-collab in #7763
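
As an illustration of the scale_lr entry above (#7735): torch.sqrt expects a tensor argument, so calling it on a plain Python number raises a TypeError, while math.sqrt works on scalars. The sketch below shows the general idea of square-root learning-rate scaling with math.sqrt. The function signature, parameter names, and the "linear" branch are assumptions made for illustration only; this is not DeepSpeed's actual code.

```python
import math


def scale_lr(base_lr: float, base_batch_size: int, batch_size: int,
             method: str = "linear") -> float:
    """Hypothetical helper: scale a learning rate with the batch size.

    The name mirrors the release note, but the signature is an assumption.
    """
    ratio = batch_size / base_batch_size
    if method == "linear":
        return base_lr * ratio
    if method == "sqrt":
        # math.sqrt accepts plain Python numbers; torch.sqrt would need a Tensor.
        return base_lr * math.sqrt(ratio)
    raise ValueError(f"unknown scaling method: {method!r}")


if __name__ == "__main__":
    # Quadrupling the batch size doubles the learning rate under sqrt scaling.
    print(scale_lr(0.1, 32, 128, method="sqrt"))
```

Using math.sqrt keeps the result a plain Python float, which is what a scalar learning-rate setting typically expects.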
New Contributors
- @ImaGoodFella made their first contribution in #7715
- @k-artem made their first contribution in #7448
- @kashif made their first contribution in #7724
- @Rakshit-gen made their first contribution in #7735
- @leejianwoo-collab made their first contribution in #7751
Full Changelog: v0.18.3...v0.18.4