What's Changed
- Update version.txt after 0.17.5 release by @loadams in #7502
- Support DeepSpeed offload and reload states with ZeRO1 and ZeRO2 by @LYMDLUT in #7421
- CI funding shout out to modal.com by @stas00 in #7503
- Fix assert when 'pp_int' object has no attribute 'custom_print_str' by @aeeeeeep in #7507
- Update TSC Committers by @PKUWZP in #7517
- Enabling Muon Optimizer in DeepSpeed by @PKUWZP in #7509
- Enable non-ZeRO mode by @sfc-gh-truwase in #7515
- Update README with ZenFlow release blog featured by PyTorch. by @Antlera in #7520
- Add riscv64 cpu support in deepspeed_shm_comm op by @heyujiao99 in #7519
- ZeRO3: Improve mismatch detection by @sfc-gh-truwase in #7525
- fix typo s/1014 /1024 by @digger-yu in #7528
- undo the revert by @stas00 in #7536
- [logging] less startup noise by @stas00 in #7526
- [doc] fixing moe tutorial by @stas00 in #7538
- docs typo: lrrt.md, reference to cycle_min_lr should be cycle_max_lr by @jakehemmerle in #7530
- fixed DeepSpeedCPULion with ZeRO-Offload bug by @qibin0506 in #7531
- Fix scaling and allgather with torch.autocast by @tohtana in #7534
- Fix zenflow_torch_adam.py by @stas00 in #7544
- Relax restrictions of torch.autocast integration by @tohtana in #7543
- Autotune ZenFlow affinity by @delock in #7506
- fix get_cuda_compile_flag by @mingjielu in #7521
- avoid setting device_id to init_process_group by @kaixuanliu in #7542
- Improve error message and reduce validation in autocast test by @tohtana in #7547
- Revert "Add index to HPU devices (#7497)" by @deepcharm in #7545
- [ALST tutorial] support bs>1 by @sfc-gh-sbekman in #7550
- [MoE] Fix misuse of num_experts as expert parallel group size (ep_size) by @Flakes342 in #7551
- Limit random seed range in tests by @tohtana in #7553
- Fix gradient buffer access for DeepCompile Z1/2 by @tohtana in #7548
- Move modal tests to tests/v1 by @tohtana in #7557
- Add dependency for deepcompile test by @tohtana in #7558
- deepcompile: Create dummy inputs using empty_strided by @eternalNight in #7564
- deepcompile: Record graph order using OrderedDict by @eternalNight in #7563
- deepcompile: Create a full list of no-copy ops by @eternalNight in #7562
- fix npu device_id AttributeError issue by @we1sper in #7560
- Make Muon optimizer easier to enable by @delock in #7555
- scripts: Check .is_cuda only in non-C++ files by @eternalNight in #7561
- [bugfix] fix partition context unpatch by @hjh0119 in #7566
New Contributors
- @LYMDLUT made their first contribution in #7421
- @aeeeeeep made their first contribution in #7507
- @heyujiao99 made their first contribution in #7519
- @jakehemmerle made their first contribution in #7530
- @qibin0506 made their first contribution in #7531
- @mingjielu made their first contribution in #7521
- @kaixuanliu made their first contribution in #7542
- @sfc-gh-sbekman made their first contribution in #7550
- @Flakes342 made their first contribution in #7551
- @we1sper made their first contribution in #7560
- @hjh0119 made their first contribution in #7566
Full Changelog: v0.17.5...v0.17.6