microsoft/DeepSpeed v0.13.3 (Patch release) on GitHub

What's Changed

Update version.txt after 0.13.2 release by @mrwyattii in #5119
Stop tracking backward chain of broadcast (ZeRO3) by @tohtana in #5113
[NPU] ZeRO-Infinity feature compatibility by @misstek in #5077
BF16 optimizer: Improve device utilization by immediate grad update by @deepcharm in #4975
removed if condition in if collate_fn is None by @bm-synth in #5107
disable compile tests for torch<2.1 by @mrwyattii in #5121
Update inference test model names by @mrwyattii in #5127
Fix issue with zero-sized file after merging file on curriculum map_reduce by @bm-synth in #5106
Update return codes in PyTest to properly error out if tests fail by @loadams in #5122
add missing methods to MPS_Accelerator by @mrwyattii in #5134
Solve tensor vs numpy dtype conflicts in data efficiency map-reduce. by @bm-synth in #5108
Fix broadcast deadlock for incomplete batches in data sample for data analysis by @bm-synth in #5117
Avoid zero-sized microbatches for incomplete minibatches when doing curriculum learning by @bm-synth in #5118
remove mandatory index key from output of metric_function in DataAnalysis map operation by @bm-synth in #5112
tensorboard logging: avoid item() outside gas to improve performance by @nelyahu in #5135
Check overflow on device without host synchronization for each tensor by @BacharL in #5115
Update nv-inference torch version by @loadams in #5128
Method run_map_reduce to fix errors when running run_map followed by run_reduce by @bm-synth in #5131
Added missing isinstance check in PR 5112 by @bm-synth in #5142
Fix UserWarning: The torch.cuda.*DtypeTensor constructors are no long… by @ShukantPal in #5018
TestEmptyParameterGroup: replace fusedAdam with torch.optim.AdamW by @nelyahu in #5139
Update deprecated HuggingFace function by @mrwyattii in #5144
Pin to PyTest 8.0.0 by @loadams in #5163
get_grad_norm_direct: fix a case of empty norm group by @nelyahu in #5148
Distributed in-memory map-reduce for data analyzer by @bm-synth in #5129
DeepSpeedZeroOptimizer_Stage3: remove cuda specific optimizer by @nelyahu in #5138
MOE: Fix save checkpoint when TP > 1 by @mosheisland in #5157
Fix gradient clipping by @tohtana in #5150
Use ninja to speed up build by @jinzhen-lin in #5088
Update flops profiler to handle attn and matmul by @KimmiShi in #4724
Fix allreduce for BF16 and ZeRO0 by @tohtana in #5170
Write multiple items to output file at once, in distributed data analyzer. by @bm-synth in #5169
Fix typos in blogs/ by @jinyouzhi in #5172
Inference V2 Human Eval by @lekurile in #4804
Reduce ds_id name length by @jomayeri in #5176
Switch cpu-inference workflow from --extra-index-url to --index-url by @loadams in #5182

New Contributors

@bm-synth made their first contribution in #5107
@KimmiShi made their first contribution in #4724

Full Changelog: v0.13.2...v0.13.3
