New features
- DeepSpeed Data Efficiency: A composable library that makes better use of data, increases training efficiency, and improves model quality
- DeepSpeed Data Efficiency Library by @conglongli in #2585
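The Data Efficiency library is driven through the DeepSpeed config JSON. A minimal sketch of what enabling it might look like — key names such as `data_efficiency`, `data_sampling` (curriculum learning), and `data_routing` (random-LTD) follow the library's documentation at release time, so treat the exact fields as illustrative and check the current docs:

```json
{
  "data_efficiency": {
    "enabled": true,
    "data_sampling": {
      "enabled": true,
      "curriculum_learning": { "enabled": true }
    },
    "data_routing": {
      "enabled": true,
      "random_ltd": { "enabled": true }
    }
  }
}
```

The two techniques are composable: either sub-section can be enabled independently or together in the same training run.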
What's Changed
- fix blog link by @conglongli in #2600
- Migrate ops tests to new inference_ops marker by @cmikeh2 in #2599
- Move layer norm to new schedule by @lokoppakmsft in #2590
- [deepspeed/autotuner] Bug fix for binary search for batch size by @rahilbathwal5 in #2162
- Fix for older versions of pydantic by @mrwyattii in #2611
- Use rocm/pytorch:latest for ROCm Dockerfile by @jithunnair-amd in #2613
- skip torch.zeros and tensor.copy_ when model parallel is not used by @guoyejun in #2479
- call empty_cache to really free up GPU memory as described in comment by @guoyejun in #2620
- Remove GatheredParameters context from replace_with_policy by @lekurile in #2591
- fixes #2498 by @clumsy in #2603
- Update AVX512 Detection by @cmikeh2 in #2621
- Add Megatron CI workflow by @mrwyattii in #2614
- [inference] check for unsupported model generate args by @jeffra in #2627
- [launcher] parse hostfile via regex and added error checks by @jeffra in #2626
- Unit tests setup own venv by @mrwyattii in #2628
- Fix #2409: add enable_each_rank_log to deepspeed/launcher/runner.py by @inkcherry in #2571
- Fix typo in autotuner.py by @eltociear in #2639
- [zero-3] Handle forward parameter return correctly in nested cases by @samyam in #2642
- [inference] ds-attention refactor w.r.t. ops by @jeffra in #2623
- Fix issue w. bloom int8 when changing tp size by @jeffra in #2645
- fix assertion error in zero stage 3 by @GuanhuaWang in #2647
- tweaks to ds-attn, distilbert policy, and mup by @jeffra in #2649
- [doc] fix `min_loss_scale` default by @stas00 in #2660
- [launcher] fail gracefully if hostname -i doesn't work as expected by @jeffra in #2631
- Fix Opt injection by @RezaYazdaniAminabadi in #2541
- Abstract accelerator (step 2) by @delock in #2560
- Remove unnecessary device synchronization for stage 2 by @li-yi-dong in #2500
- [Bug Fixed] torch.cuda.is_available -> torch.cuda.is_available() by @wkcn in #2661
- [fp16] lower `initial_scale_power` to 16 by @stas00 in #2663
- fix Tensor contiguous bug in model_compression by @xiaoxiawu-microsoft in #2671
- [inference] ds-mlp refactor w.r.t. ops by @jeffra in #2668
- real_accelerator validation check for both accelerator and deepspeed accelerator path by @delock in #2685
- fix typo and remove duplicated code in ZeRO stage 1 and 2 by @wkcn in #2655
- Add mlflow logging for aml by @cassieesvelt in #2495
- Fix import error of op_builder by @tohtana in #2687
- Pass training flag to forward call from module config by @lokoppakmsft in #2604
- Extend quantization utils features by @lokoppakmsft in #2683
- [GatheredParameters] add support for any iterable by @stas00 in #2664
- Fix for latest diffusers by @mrwyattii in #2699
- exclude benchmarks during install by @jeffra in #2698
- Correct loss scale in ZeRO step by @jomayeri in #2695
- [ZeRO] non-MoE stage 1 requires CG disabled by @jeffra in #2703
- remove print side effect from importing deepspeed by @jeffra in #2704
- ZeRO3 handling frozen weights by @tjruwase in #2653
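Two of the launcher changes above (#2626, #2571) touch hostfile parsing and per-rank logging. For reference, a DeepSpeed hostfile is a plain-text file with one `<hostname> slots=<n>` entry per node, which #2626 now validates via regex (hostnames below are placeholders):

```text
# hostfile: one entry per node
worker-1 slots=8
worker-2 slots=8
```

With #2571, passing the launcher an `enable_each_rank_log` directory writes each rank's output to its own log file; the exact flag shape is per the PR title, so check `deepspeed --help` for your installed version.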
New Contributors
- @eltociear made their first contribution in #2639
- @li-yi-dong made their first contribution in #2500
- @wkcn made their first contribution in #2661
- @xiaoxiawu-microsoft made their first contribution in #2671
- @cassieesvelt made their first contribution in #2495
- @tohtana made their first contribution in #2687
Full Changelog: v0.7.7...v0.8.0