github microsoft/DeepSpeed v0.14.4
v0.14.4 Patch release

14 days ago

What's Changed

  • Update version.txt after 0.14.3 release by @mrwyattii in #5651
  • [CPU] SHM based allreduce improvement for small message size by @delock in #5571
  • _exec_forward_pass: place zeros(1) on the same device as the param by @nelyahu in #5576
  • [XPU] adapt lazy_call func to different versions by @YizhouZ in #5670
  • fix IDEX dependence in xpu accelerator by @Liangliang-Ma in #5666
  • Remove compile wrapper to simplify access to model attributes by @tohtana in #5581
  • Fix hpZ with zero element by @samadejacobs in #5652
  • Fixing the reshape bug in sequence parallel alltoall, which corrupted all QKV data by @YJHMITWEB in #5664
  • enable yuan autotp & add conv tp by @Yejing-Lai in #5428
  • Fix latest pytorch '_get_socket_with_port' import error by @Yejing-Lai in #5654
  • Fix numpy upgrade to 2.0.0 BUFSIZE import error by @Yejing-Lai in #5680
  • Update BUFSIZE to come from autotuner's constants.py, not numpy by @loadams in #5686
  • [XPU] support op builder from intel_extension_for_pytorch kernel path by @YizhouZ in #5425
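
The two BUFSIZE entries above (#5680, #5686) concern a constant that could no longer be imported from numpy after the 2.0.0 upgrade. As a rough illustration only, and not the actual DeepSpeed patch, the general idea of keeping such a constant in the project's own code can be sketched as follows. The names `DEFAULT_BUFSIZE` and `resolve_bufsize` are hypothetical, and the value 8192 is an assumption about the older numpy default.

```python
from types import SimpleNamespace

# Hypothetical project-local constant (assumed value), so that callers
# do not depend on a third-party module continuing to export it.
DEFAULT_BUFSIZE = 8192


def resolve_bufsize(module=None):
    """Return module.BUFSIZE when the attribute exists, else the local default.

    `module` can be any object (for example an imported library); passing
    None simply yields the local default.
    """
    return getattr(module, "BUFSIZE", DEFAULT_BUFSIZE)


# Stand-ins for a library that still exports the constant and one that does not.
old_style_lib = SimpleNamespace(BUFSIZE=4096)
new_style_lib = SimpleNamespace()

size_from_old = resolve_bufsize(old_style_lib)  # uses the library's value
size_from_new = resolve_bufsize(new_style_lib)  # falls back to DEFAULT_BUFSIZE
```

Defining the constant locally avoids a hard import that fails at module load time when a dependency drops a name from its public namespace.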

Full Changelog: v0.14.3...v0.14.4
