microsoft/DeepSpeed v0.16.3 (v0.16.3 Patch Release) on GitHub

What's Changed

Update version.txt after 0.16.2 release by @loadams in #6893
Allow to compile collective for PT>2.3 by @NirSonnenschein in #6899
Zero2: avoid graph breaks in torch.compile by using param_idx by @nelyahu in #6803
hpu_accelerator: use torch.use_deterministic_algorithms by @nelyahu in #6897
Fix error caused by all_reduce call in domino by @hwchen2017 in #6880
Update Gaudi2 jobs to latest 1.19 build by @raza-sikander in #6905
Change compile for pipeline module torch.compile by @NirSonnenschein in #6478
Stage3: Use new torch grad accumulation hooks API by @deepcharm in #6773
Cleanup ops/transformer/inference tests by @loadams in #6830
Fix checkpointable_layers Logic by @Quentin-Anthony in #6881
[BUG FIX]:fix get torch.version.cuda error when cuda is None in rocm by @hj-wei in #6909
Add fp8_gemm fallback for non-triton systems by @oelayan7 in #6916
Reduce the device bubble introduced by heavy loop synchronization in coalesced fetch/release(z3_leaf_module) by @inkcherry in #6694
Cleanup ops/transformer/inference tests by @loadams in #6925
Check transformers version in BLOOM for inference v1 by @lekurile in #6766
inference: remove unused _validate_args function by @nelyahu in #5505
Use torch.log1p by @kit1980 in #6930
Update python version classifiers by @loadams in #6933
Fix building on Windows with presence of Triton by @woct0rdho in #6749
Fix windows blog examples by @loadams in #6934
Add deepseek autotp by @Yejing-Lai in #6937
Add position_ids arg to OPTEmbedding forward function by @lekurile in #6939
Add information on security expectations with this software by @loadams in #6941
Support pure meta model lm_head tp by @Yejing-Lai in #6812
Remove op compilation flags due to perf issue by @NirSonnenschein in #6944
Pin nv-a6000 workflow by @loadams in #6938
[inf] Add config var to enable keeping module on host by @oelayan7 in #6846
warn to warning by @qgallouedec in #6952
Add extra_repr to Linear classes for debugging purpose by @Xia-Weiwen in #6954
Update import for torchvision.transformers by @loadams in #6958
Remove Duplicate Declaration of pandas in Dockerfile by @Zerohertz in #6959
Add the missing view operations from sequence parallel(async). by @inkcherry in #6750
Update torch.norm to torch.linalg.norm and torch.linalg.vector_norm by @loadams in #6931
Using explicit GPU upcast for ZeRO-Offload by @xylian86 in #6962
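
Several entries above are small API modernizations. As a minimal sketch of the likely motivation behind two of them, judging only from the titles of #6930 ("Use torch.log1p") and #6952 ("warn to warning"), the example below uses the Python standard library rather than torch or DeepSpeed. `math.log1p` stands in for `torch.log1p`, and `naive_log1p` and `make_logger` are hypothetical helpers, not DeepSpeed code.

```python
import io
import logging
import math


def naive_log1p(x: float) -> float:
    """Compute log(1 + x) directly; 1 + x rounds to 1.0 when x is tiny."""
    return math.log(1.0 + x)


def make_logger(stream) -> logging.Logger:
    """Hypothetical helper: a logger that writes 'LEVEL:message' to stream."""
    logger = logging.getLogger("changelog_sketch")
    logger.handlers.clear()
    logger.setLevel(logging.WARNING)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
    logger.addHandler(handler)
    return logger


# log1p keeps precision for inputs far below float64 epsilon,
# whereas log(1 + x) loses the input entirely.
tiny = 1e-20
precise = math.log1p(tiny)
lossy = naive_log1p(tiny)

# Logger.warning() is the supported spelling; Logger.warn() is a
# deprecated alias in the standard logging module.
buffer = io.StringIO()
log = make_logger(buffer)
log.warning("use warning(), not the deprecated warn() alias")
logged = buffer.getvalue()
```

The same reasoning should carry over to tensors: a dedicated `log1p` avoids forming `1 + x` explicitly, which is where the precision is lost for small inputs.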

New Contributors

@hj-wei made their first contribution in #6909
@kit1980 made their first contribution in #6930
@woct0rdho made their first contribution in #6749
@Xia-Weiwen made their first contribution in #6954
@Zerohertz made their first contribution in #6959

Full Changelog: v0.16.2...v0.16.3
