What's Changed
- Update version.txt after 0.14.5 release by @loadams in #5982
- move pynvml install to setup.py by @Rohan138 in #5840
- add moe topk(k>2) gate support by @inkcherry in #5881
- Move inf_or_nan_tracker to cpu for cpu offload by @BacharL in #5826
- Enable dynamic shapes for pipeline parallel engine inputs by @tohtana in #5481
- Add and Remove ZeRO 3 Hooks by @jomayeri in #5658
- DeepNVMe GDS by @jomayeri in #5852
- Pin transformers version on nv-nightly by @loadams in #6002
- DeepSpeed on Windows blog by @tjruwase in #6364
- Bug Fix 5880 by @jomayeri in #6378
- Update linear.py compatible with torch 2.4.0 by @terry-for-github in #5811
- GDS Swapping Fix by @jomayeri in #6386
- Long sequence parallelism (Ulysses) integration with HuggingFace by @samadejacobs in #5774
- reduce cpu host overhead when using moe by @ranzhejiang in #5578
- fix fp16 Qwen2 series model to DeepSpeed-FastGen by @ZonePG in #6028
- Add Japanese translation of Windows support blog by @tohtana in #6394
- Correct op_builder path to xpu files for triggering XPU tests by @loadams in #6398
- add pip install cutlass version check by @GuanhuaWang in #6393
- [XPU] API align with new intel pytorch extension release by @YizhouZ in #6395
- Pydantic v2 migration by @mrwyattii in #5167
- Fix torch check by @loadams in #6402
New Contributors
- @Rohan138 made their first contribution in #5840
- @terry-for-github made their first contribution in #5811
- @ranzhejiang made their first contribution in #5578
Full Changelog: v0.14.5...v0.15.0