microsoft/DeepSpeed v0.13.3
v0.13.3 Patch release


What's Changed

  • Update version.txt after 0.13.2 release by @mrwyattii in #5119
  • Stop tracking backward chain of broadcast (ZeRO3) by @tohtana in #5113
  • [NPU]ZeRO-Infinity feature compatibility by @misstek in #5077
  • BF16 optimizer: Improve device utilization by immediate grad update by @deepcharm in #4975
  • Removed redundant `if` condition in the `collate_fn is None` check by @bm-synth in #5107
  • Disable compile tests for torch<2.1 by @mrwyattii in #5121
  • Update inference test model names by @mrwyattii in #5127
  • Fix issue with zero-sized file after merging file on curriculum map_reduce by @bm-synth in #5106
  • Update return codes in PyTest to properly error out if tests fail by @loadams in #5122
  • Add missing methods to MPS_Accelerator by @mrwyattii in #5134
  • Solve tensor vs numpy dtype conflicts in data efficiency map-reduce. by @bm-synth in #5108
  • Fix broadcast deadlock for incomplete batches in data sample for data analysis by @bm-synth in #5117
  • Avoid zero-sized microbatches for incomplete minibatches when doing curriculum learning by @bm-synth in #5118
  • Remove mandatory index key from output of metric_function in DataAnalysis map operation by @bm-synth in #5112
  • TensorBoard logging: avoid `item()` outside gradient accumulation steps to improve performance by @nelyahu in #5135
  • Check overflow on device without host synchronization for each tensor by @BacharL in #5115
  • Update nv-inference torch version by @loadams in #5128
  • Method run_map_reduce to fix errors when running run_map followed by run_reduce by @bm-synth in #5131
  • Added missing isinstance check in PR 5112 by @bm-synth in #5142
  • Fix UserWarning: The torch.cuda.*DtypeTensor constructors are no long… by @ShukantPal in #5018
  • TestEmptyParameterGroup: replace fusedAdam with torch.optim.AdamW by @nelyahu in #5139
  • Update deprecated HuggingFace function by @mrwyattii in #5144
  • Pin to PyTest 8.0.0 by @loadams in #5163
  • get_grad_norm_direct: fix a case of empty norm group by @nelyahu in #5148
  • Distributed in-memory map-reduce for data analyzer by @bm-synth in #5129
  • DeepSpeedZeroOptimizer_Stage3: remove cuda specific optimizer by @nelyahu in #5138
  • MOE: Fix save checkpoint when TP > 1 by @mosheisland in #5157
  • Fix gradient clipping by @tohtana in #5150
  • Use ninja to speed up build by @jinzhen-lin in #5088
  • Update flops profiler to handle attn and matmul by @KimmiShi in #4724
  • Fix allreduce for BF16 and ZeRO0 by @tohtana in #5170
  • Write multiple items to output file at once, in distributed data analyzer. by @bm-synth in #5169
  • Fix typos in blogs/ by @jinyouzhi in #5172
  • Inference V2 Human Eval by @lekurile in #4804
  • Reduce ds_id name length by @jomayeri in #5176
  • Switch cpu-inference workflow from --extra-index-url to --index-url by @loadams in #5182
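Several of the changes above (#5115, #5135) avoid per-tensor host synchronization: calling `.item()` or a host-side check on every tensor forces the GPU pipeline to stall once per tensor. A minimal, generic PyTorch sketch of the idea, not DeepSpeed's actual implementation, accumulates a single device-side flag and synchronizes once per group:

```python
import torch

def found_overflow(grads):
    # Accumulate non-finite counts into one device-side scalar instead of
    # checking each tensor on the host, which would sync once per tensor.
    flag = torch.zeros(1, device=grads[0].device)
    for g in grads:
        flag += (~torch.isfinite(g)).sum()
    # A single host synchronization for the whole gradient group.
    return bool(flag.item() > 0)

grads = [torch.randn(4), torch.tensor([float("inf"), 1.0])]
print(found_overflow(grads))  # True
```

On CPU tensors the sync cost is moot, but on CUDA devices the single `.item()` at the end replaces N blocking round-trips with one.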

Full Changelog: v0.13.2...v0.13.3
