What's Changed
- Documentation for DeepSpeed Accelerator Abstraction Interface by @delock in #3184
- FP8 unittest for H100 by @jomayeri in #3731
- Fix apex install bugs by @loadams in #3741
- Fix Autotuner get_gas_from_user_config by @straywarrior in #3664
- Include cublas error details when getting cublas handle fails by @jli in #3695
- fix hybrid engine mlp module by @tensor-tang in #3736
- Fix output transpose dimension bugs by @loadams in #3747
- remove UtilsBuilder load, use torch (un)flatten ops by @inkcherry in #3728
- add Chinese Zhihu social account by @conglongli in #3755
- Account for expert parameters when calculating the total number of pa… by @alito in #3720
- fix ccl_backend and residual_add problems by @dc3671 in #3642
- Fix url in getting-started guide (docs) by @acforvs in #3768
- Update deepspeed-chat/japanese/README.md by @eltociear in #3765
- Add H100 workflow and status badge. by @loadams in #3754
- Zero++ tutorial PR by @HeyangQin in #3783
- [Fix] _conv_flops_compute when padding is a str and stride=1 by @zhiruiluo in #3169
- fix interpolate flops compute by @cli99 in #3782
- use Flops Profiler to test model.generate() by @CaffreyR in #2515
- [zero] revert PR #3611 by @jeffra in #3786
New Contributors
- @straywarrior made their first contribution in #3664
- @alito made their first contribution in #3720
- @acforvs made their first contribution in #3768
- @zhiruiluo made their first contribution in #3169
- @CaffreyR made their first contribution in #2515
Full Changelog: v0.9.4...v0.9.5