What's Changed
- Update version.txt after release by @loadams in #7675
- [modal ci] fixes by @stas00 in #7676
- leaf modules: explain better by @stas00 in #7674
- disable nv-lightning-v100.yml CI by @stas00 in #7681
- allow separate learning rates "muon_lr" and "adam_lr" for Muon optimizer by @delock in #7658
- see_mem_usage: make it always work by @stas00 in #7688
- make debug utils more resilient by @stas00 in #7690
- zero stage 1-2: don't pin memory if not configured by @stas00 in #7689
- modal ci: fix group concurrency by @stas00 in #7691
- Use pytorch utils to detect ninja by @Emrys-Merlin in #7687
- Update SECURITY.md to point to GitHub reporting rather than Microsoft by @loadams in #7692
- Add Qwen2.5 to AutoTP model list by @delock in #7696
- Trust intel server for XPU tests by @tohtana in #7698
- PyTorch-compatible backward API by @tohtana in #7665
- Add news about Ray x DeepSpeed Meetup by @PKUWZP in #7704
- Put Muon optimizer momentum buffer on GPU by @delock in #7648
- [ROCm] Relax tolerances for FP8 unit test for fp16 and bf16 cases by @rraminen in #7655
- Fix ds_secondary_tensor possibly being dirty when loading a model or ZeRO checkpoint for ZeRO++ by @zhengchenyu in #7707
- fix: skip aio wait when the swap tensor list is empty by @xylian86 in #7712
- Low-precision master params/grads/optimizer states by @tohtana in #7700
- Enable compiled autograd for the backward pass by @deepcharm in #7667
- Wall clock timers API by @sfc-gh-truwase in #7714
New Contributors
- @Emrys-Merlin made their first contribution in #7687
Full Changelog: v0.18.2...v0.18.3