deepspeed 0.18.7 on PyPI

What's Changed

Update version post release by @loadams in #7850
Z1/2 init: flatten params on device by @ksugama in #7828
Enable shm_comm support for arm by @phalani-paladugu in #7800
Add news entry for DeepSpeed updates by @PKUWZP in #7854
Add EXAONE 4.0 model support for Inference V2 by @Bias92 in #7853
Fix ROCm BF16 conversion intrinsics in inference v2 (#7843) by @tohtana in #7846
Fix compilation of Evoformer by @Flamefire in #7862
Throw error when parameter is modified in GatheredParameters by @tohtana in #7832
Fix Zero-3 static scale assertion in fp16 test by @tohtana in #7866
Schedule nightly full test by @tohtana in #7870
Fix broken links and add AutoTP Training tutorial to sidebar nav by @tohtana in #7874
fix: replace 35 bare except clauses with except Exception by @haosenwang1018 in #7873
perf: use deque for FIFO queues in sequence parallel, superoffload, and compile by @giulio-leone in #7880
Fix: only add parameter with grads to parameter group by @delock in #7869
Fix no-grad grad-fn lookup in ZeRO hook counting on PyTorch 2.3 (#7830) by @tohtana in #7841
Fix import deepspeed crash on PyTorch v2.3 + Python 3.12 by @tohtana in #7875
XPU use stock pytorch instead of Intel Extension for PyTorch by @delock in #7877
Remove amp() from abstract accelerator by @delock in #7879
Add document section explaining autocast nesting by @tohtana in #7883
Fix hook count performance regression from v0.18.5 by @tohtana in #7886
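
For readers wondering why #7873 matters: in Python, a bare `except:` also catches `SystemExit` and `KeyboardInterrupt`, whereas `except Exception:` lets them propagate. The sketch below illustrates that general language behavior only. The function names are hypothetical and are not taken from the DeepSpeed code base.

```python
def swallow_everything(action):
    """Old pattern: a bare except also traps SystemExit and KeyboardInterrupt."""
    try:
        action()
        return "ok"
    except:  # noqa: E722 (bare except kept here only for contrast)
        return "swallowed"


def swallow_errors_only(action):
    """New pattern: only Exception subclasses are handled here."""
    try:
        action()
        return "ok"
    except Exception:
        return "swallowed"


def fail():
    # An ordinary error: both patterns handle it.
    raise ValueError("ordinary failure")


def request_exit():
    # SystemExit derives from BaseException, not Exception.
    raise SystemExit(1)


if __name__ == "__main__":
    print(swallow_everything(fail))          # handled by both patterns
    print(swallow_errors_only(fail))
    print(swallow_everything(request_exit))  # exit request is silently eaten
    try:
        swallow_errors_only(request_exit)
    except SystemExit:
        print("SystemExit propagated")       # exit request reaches the caller
```

With `except Exception`, an interpreter shutdown or Ctrl-C is no longer silently absorbed by error-handling code that was only meant to cover ordinary failures.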

Full Changelog: v0.18.6...v0.18.7