New Features
- DeepSpeed Compression: https://www.microsoft.com/en-us/research/blog/deepspeed-compression-a-composable-library-for-extreme-compression-and-zero-cost-quantization/ (usage sketch below)
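As context for the new compression library, here is a minimal sketch of wrapping a model for compression-aware training. The `init_compression`/`redundancy_clean` helpers and their import path are assumptions based on the DeepSpeed Compression tutorial, and the config file is assumed to already contain a compression section; verify both against the official docs.

```python
# Hedged sketch, not the definitive API: init_compression / redundancy_clean and
# their import path are assumptions from the DeepSpeed Compression tutorial, and
# "ds_config.json" is assumed to define a compression section.
import torch
from deepspeed.compression.compress import init_compression, redundancy_clean

model = torch.nn.Linear(512, 512)  # placeholder model

# Inject quantization/pruning wrappers according to the compression config.
model = init_compression(model, "ds_config.json")

# ... normal (or knowledge-distillation) training loop goes here ...

# After training, fold the compression wrappers back into plain layers.
model = redundancy_clean(model, "ds_config.json")
```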
What's Changed
- Add DeepSpeed Compression Composer by @yaozhewei in #2105
- Remove hardcoded ROCm install path by @mrwyattii in #2093
- Fix softmax dim of Residual MoE implementation in moe/layer.py by @hero007feng in #2110
- Reduce ds-inference log verbosity by @jeffra in #2111
- DeepSpeed Compression announcement by @conglongli in #2114
- Checkpoint reshaping by @tjruwase in #1953
- Fix init_process_group by @Quentin-Anthony in #2121
- DS Benchmarks QoL Improvements by @Quentin-Anthony in #2120
- [ROCm] Fix wrong command that broke the ROCm build by @jpvillam-amd in #2118
- DeepSpeed Communication Profiling and Logging by @Quentin-Anthony in #2012 (config sketch after this list)
- Add flake8 to pre-commit checks by @aphedges in #2051
- Fix conflict between Tutel and top-2 gate in MoE layer by @yetiansh in #2053
- Add HF Accelerate + DeepSpeed tests workflow by @pacman100 in #2134
- [inference tests] Turn off time check for now by @jeffra in #2142
- Allow turning off loss scaling w.r.t. gradient accumulation steps + update throughput calculator by @jeffra in #2140
- Refactor ZeRO configs to use Pydantic by @mrwyattii in #2004 (config sketch after this list)
- Add purely-local sliding window sparse attention config by @Quentin-Anthony in #1962
- Add Nebula checkpoint engine by @trajepl in #2085
- Graceful exit on failures for multi-node runs by @jerrymannil in #2008
- Fix BF16_Optimizer compatibility issue by @shjwudp in #2152
- Fix random token-generation issue + MP-checkpoint loading/saving by @RezaYazdaniAminabadi in #2132
- Add retain_graph as a kwarg to the main engine backward function by @ncilfone in #1149 (usage sketch after this list)
- Elastic Training support in DeepSpeed by @aj-prime in #2156 (config sketch after this list)
- Prevent CUDA 10 builds of inference kernels on Ampere by @jeffra in #2157
- [zero-3] Shut down zero.Init from within ds.init by @jeffra in #2150
- Enable fp16 input autocasting by @jeffra in #2158
- Release swap buffers for persisted params by @tjruwase in #2089
- Tensor parallelism for Mixture of Experts by @siddharth9820 in #2074
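The communication profiling and logging added in #2012 is driven from the DeepSpeed config. Below is a minimal sketch; the `comms_logger` section and its keys are assumptions based on the feature description and should be checked against the documentation.

```python
# Sketch only: the "comms_logger" section and its keys are assumptions based on
# the communication logging feature (#2012); verify against the DeepSpeed docs.
ds_config = {
    "train_batch_size": 16,
    "comms_logger": {
        "enabled": True,   # turn communication logging on
        "verbose": False,  # print every collective call as it happens
        "prof_all": True,  # profile all communication operations
        "debug": False,    # include caller information in each record
    },
}
```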
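The Pydantic refactor in #2004 changes how ZeRO settings are validated internally; the user-facing config schema is unchanged. A minimal sketch of passing a ZeRO stage-2 config dict to `deepspeed.initialize` (the model, optimizer settings, and batch size are placeholders):

```python
import torch
import deepspeed

model = torch.nn.Linear(512, 512)  # placeholder model

# Standard ZeRO stage-2 config; after #2004 these fields are parsed and
# validated by Pydantic models, so typos or invalid values surface as
# validation errors at initialization time.
ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "reduce_bucket_size": 5e8,
        "allgather_bucket_size": 5e8,
    },
}

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```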
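PR #1149 adds `retain_graph` as a keyword argument to the engine's `backward`, mirroring PyTorch's `retain_graph` flag. A hedged usage sketch, reusing a `model_engine` returned by `deepspeed.initialize` (e.g. from the ZeRO sketch above); the inputs and loss are placeholders:

```python
# Sketch only: the relevant piece is the retain_graph kwarg on the engine's
# backward call (added in #1149); everything else is placeholder.
inputs = torch.randn(8, 512, device=model_engine.device,
                     dtype=torch.half, requires_grad=True)
loss = model_engine(inputs).pow(2).mean()

# retain_graph=True keeps the autograd graph alive, so an extra autograd query
# over the same graph (below) is still possible after the engine's backward pass.
model_engine.backward(loss, retain_graph=True)
input_grads = torch.autograd.grad(loss, inputs)

model_engine.step()
```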
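Elastic training in #2156 is likewise configured through the DeepSpeed config. The sketch below uses the documented `elasticity` section; the exact key set and the `version` value required for the new elastic-scaling behavior are assumptions to verify against the elasticity docs.

```python
# Sketch only: the "elasticity" keys follow the documented schema as best
# understood; the version value required by the elastic training support in
# #2156 is an assumption.
ds_config = {
    "elasticity": {
        "enabled": True,
        "max_train_batch_size": 2048,    # upper bound on the global batch size
        "micro_batch_sizes": [2, 4, 8],  # acceptable per-GPU micro-batch sizes
        "min_gpus": 1,                   # smallest world size to plan for
        "max_gpus": 64,                  # largest world size to plan for
        "min_time": 20,                  # minimum runtime (minutes) before rescaling (assumption)
        "version": 0.2,                  # elastic scaling support (assumption)
    },
}
```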
New Contributors
- @hero007feng made their first contribution in #2110
- @jpvillam-amd made their first contribution in #2118
- @yetiansh made their first contribution in #2053
- @pacman100 made their first contribution in #2134
- @jimwu6 made their first contribution in #2144
- @trajepl made their first contribution in #2085
- @ncilfone made their first contribution in #1149
Full Changelog: v0.6.7...v0.7.0