New features
- DeepSpeed Data Efficiency: A composable library that makes better use of data, increases training efficiency, and improves model quality
- DeepSpeed Data Efficiency Library by @conglongli in #2585
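The Data Efficiency library is driven through the DeepSpeed config JSON. A minimal sketch of what enabling it might look like — key names such as `data_efficiency`, `data_sampling` (curriculum learning), and `data_routing` (random-LTD) follow the library's documentation at release time, so treat the exact fields as illustrative and check the current docs:

```json
{
  "data_efficiency": {
    "enabled": true,
    "data_sampling": {
      "enabled": true,
      "curriculum_learning": { "enabled": true }
    },
    "data_routing": {
      "enabled": true,
      "random_ltd": { "enabled": true }
    }
  }
}
```

The two techniques are composable: either sub-section can be enabled independently or together in the same training run.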
What's Changed
- fix blog link by @conglongli in #2600
- Migrate ops tests to new inference_ops marker by @cmikeh2 in #2599
- Move layer norm to new schedule by @lokoppakmsft in #2590
- [deepspeed/autotuner] Bug fix for binary search for batch size by @rahilbathwal5 in #2162
- Fix for older versions of pydantic by @mrwyattii in #2611
- Use rocm/pytorch:latest for ROCm Dockerfile by @jithunnair-amd in #2613
- skip torch.zeros and tensor.copy_ when model parallel is not used by @guoyejun in #2479
- call empty_cache to really free up GPU memory as described in comment by @guoyejun in #2620
- Remove GatheredParameters context from replace_with_policy by @lekurile in #2591
- fixes #2498 by @clumsy in #2603
- Update AVX512 Detection by @cmikeh2 in #2621
- Add Megatron CI workflow by @mrwyattii in #2614
- [inference] check for unsupported model generate args by @jeffra in #2627
- [launcher] parse hostfile via regex and added error checks by @jeffra in #2626
- Unit tests setup own venv by @mrwyattii in #2628
- Fix #2409: add enable_each_rank_log to deepspeed/launcher/runner.py by @inkcherry in #2571
- Fix typo in autotuner.py by @eltociear in #2639
- [zero-3] Handle forward parameter return correctly in nested cases by @samyam in #2642
- [inference] ds-attention refactor w.r.t. ops by @jeffra in #2623
- Fix issue w. bloom int8 when changing tp size by @jeffra in #2645
- fix assertion error in zero stage 3 by @GuanhuaWang in #2647
- tweaks to ds-attn, distilbert policy, and mup by @jeffra in #2649
- [doc] fix `min_loss_scale` default by @stas00 in #2660
- [launcher] fail gracefully if hostname -i doesn't work as expected by @jeffra in #2631
- Fix Opt injection by @RezaYazdaniAminabadi in #2541
- Abstract accelerator (step 2) by @delock in #2560
- Remove unnecessary device synchronization for stage 2 by @li-yi-dong in #2500
- [Bug Fixed] torch.cuda.is_available -> torch.cuda.is_available() by @wkcn in #2661
- [fp16] lower `initial_scale_power` to 16 by @stas00 in #2663
- fix Tensor contiguous bug in model_compression by @xiaoxiawu-microsoft in #2671
- [inference] ds-mlp refactor w.r.t. ops by @jeffra in #2668
- real_accelerator validation check for both accelerator and deepspeed accelerator path by @delock in #2685
- fix typo and remove duplicated code in ZeRO stage 1 and 2 by @wkcn in #2655
- Add mlflow logging for aml by @cassieesvelt in #2495
- Fix import error of op_builder by @tohtana in #2687
- Pass training flag to forward call from module config by @lokoppakmsft in #2604
- Extend quantization utils features by @lokoppakmsft in #2683
- [GatheredParameters] add support for any iterable by @stas00 in #2664
- Fix for latest diffusers by @mrwyattii in #2699
- exclude benchmarks during install by @jeffra in #2698
- Correct loss scale in ZeRO step by @jomayeri in #2695
- [ZeRO] non-MoE stage 1 requires CG disabled by @jeffra in #2703
- remove print side effect from importing deepspeed by @jeffra in #2704
- ZeRO3 handling frozen weights by @tjruwase in #2653
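Two of the launcher changes above (#2626, #2571) touch hostfile parsing and per-rank logging. For reference, a DeepSpeed hostfile is a plain-text file with one `<hostname> slots=<n>` entry per node, which #2626 now validates via regex (hostnames below are placeholders):

```text
# hostfile: one entry per node
worker-1 slots=8
worker-2 slots=8
```

With #2571, passing the launcher an `enable_each_rank_log` directory writes each rank's output to its own log file; the exact flag shape is per the PR title, so check `deepspeed --help` for your installed version.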
New Contributors
- @eltociear made their first contribution in #2639
- @li-yi-dong made their first contribution in #2500
- @wkcn made their first contribution in #2661
- @xiaoxiawu-microsoft made their first contribution in #2671
- @cassieesvelt made their first contribution in #2495
- @tohtana made their first contribution in #2687
Full Changelog: v0.7.7...v0.8.0