What's Changed
- Update version post release by @loadams in #7850
- Z1/2 init: flatten params on device by @ksugama in #7828
- Enable shm_comm support for arm by @phalani-paladugu in #7800
- Add news entry for DeepSpeed updates by @PKUWZP in #7854
- Add EXAONE 4.0 model support for Inference V2 by @Bias92 in #7853
- Fix ROCm BF16 conversion intrinsics in inference v2 (#7843) by @tohtana in #7846
- Fix compilation of Evoformer by @Flamefire in #7862
- Throw error when parameter is modified in GatheredParameters by @tohtana in #7832
- Fix ZeRO-3 static scale assertion in fp16 test by @tohtana in #7866
- Schedule nightly full test by @tohtana in #7870
- Fix broken links and add AutoTP Training tutorial to sidebar nav by @tohtana in #7874
- fix: replace 35 bare except clauses with except Exception by @haosenwang1018 in #7873
- perf: use deque for FIFO queues in sequence parallel, superoffload, and compile by @giulio-leone in #7880
- Fix: only add parameter with grads to parameter group by @delock in #7869
- Fix no-grad grad-fn lookup in ZeRO hook counting on PyTorch 2.3 (#7830) by @tohtana in #7841
- Fix import deepspeed crash on PyTorch v2.3 + Python 3.12 by @tohtana in #7875
- XPU: use stock PyTorch instead of Intel Extension for PyTorch by @delock in #7877
- Remove amp() from abstract accelerator by @delock in #7879
- Add document section explaining autocast nesting by @tohtana in #7883
- Fix hook count performance regression from v0.18.5 by @tohtana in #7886
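
For readers wondering why the bare `except` cleanup in #7873 matters, here is a minimal sketch of the general Python behavior involved. It is not code from the PR, and all function names are hypothetical. A bare `except:` also catches `KeyboardInterrupt` and `SystemExit`, while `except Exception` lets them propagate:

```python
# Illustration only: these helpers are hypothetical, not DeepSpeed code.

def run_with_bare_except(fn):
    """Call fn, swallowing anything it raises (including Ctrl-C)."""
    try:
        fn()
    except:  # noqa: E722 - catches BaseException subclasses too
        return "swallowed"
    return "ok"


def run_with_except_exception(fn):
    """Call fn, swallowing only ordinary errors derived from Exception."""
    try:
        fn()
    except Exception:
        return "swallowed"
    return "ok"


def raise_value_error():
    raise ValueError("ordinary failure")


def raise_keyboard_interrupt():
    # Simulates the user pressing Ctrl-C while fn is running.
    raise KeyboardInterrupt


if __name__ == "__main__":
    # Both variants handle an ordinary error the same way.
    print(run_with_bare_except(raise_value_error))
    print(run_with_except_exception(raise_value_error))

    # Only the bare except hides the interrupt from the caller.
    print(run_with_bare_except(raise_keyboard_interrupt))
    try:
        run_with_except_exception(raise_keyboard_interrupt)
    except KeyboardInterrupt:
        print("interrupt propagated")
```

With `except Exception`, an interrupt or a deliberate `sys.exit()` still reaches the caller, which is usually the intended behavior in long-running training jobs.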
New Contributors
- @ksugama made their first contribution in #7828
- @phalani-paladugu made their first contribution in #7800
- @Bias92 made their first contribution in #7853
- @haosenwang1018 made their first contribution in #7873
- @giulio-leone made their first contribution in #7880
Full Changelog: v0.18.6...v0.18.7