github deepspeedai/DeepSpeed v0.19.2
v0.19.2 Patch Release

What's Changed

  • fix(fp16): filter requires_grad in FP16 optimizer flat buffer init by @avicooper1 in #8029
  • Run AutoSP compile tests sequentially by @tohtana in #8020
  • Fix PR-target workflow concurrency groups by @tohtana in #8017
  • Fix full CI test isolation for ZeRO chmod and NVMe quantization tests by @tohtana in #8008
  • Keep required CI checks visible for ignored paths by @tohtana in #8019
  • Bump version by @sfc-gh-truwase in #8030
  • Add engine.coalesce_grad_reduction() for ZeRO 1/2/3 multi-backward by @roycho96 in #7992
  • feat(zero): enable torch.func transforms on engine for ZeRO 0/1/2 by @roycho96 in #8026
  • Simplify module_inject.transpose by @xbcReal in #8028
  • Fix DeepCompile all-gather scheduler candidate selection by @tohtana in #8033
  • Version fix to unblock pypi by @sfc-gh-truwase in #8039
  • Bump version after 0.19.1 release by @tohtana in #8040
  • Fix DeepCompile ZeRO-3 release parameter lifetime by @tohtana in #8032
  • Fix ZenFlow ZeRO-3 selective optimizer crash with parameter offload on nvme by @Antlera in #8042
  • Add test coverage for Muon muon_lr/adam_lr overrides by @sowndappan5 in #8047
  • Avoid HF Hub access in CPU unit test setup by @tohtana in #8053
  • Fix DeepCompile ZeRO-1 grad target lifetime by @tohtana in #8036
  • Normalize ZeRO-3 DeepCompile grad dtype before reduction by @tohtana in #8038
  • Remove AutoSP assertion against Transformers version by @tohtana in #8044
  • fix(transformer): use correct stride in Transpose_Kernel shared memory indexing to eliminate bank conflicts by @flutist in #8055
  • zero3: invalidate coordinator trace on hook re-registration by @roycho96 in #8043
  • Consistent fp32 grads flow by @sfc-gh-truwase in #8056
  • Add AutoEP by @tohtana in #7938
  • Fix: ZenFlow Adam integration for updated PyTorch backward flow (#7759) by @Antlera in #7771
  • Pass expected grad dtype to register_z3_param in ZeRO-3 release test by @tohtana in #8063
  • Add Biren SUPA accelerator support by @frozenleaves in #8054
  • Mixed-precision: per-policy param/buffer dtype cast (preserve fp32 buffers) by @sfc-gh-truwase in #8066

Full Changelog: v0.19.1...v0.19.2
