What's Changed
- [MiCS] [Fix] saving and loading model checkpoint logic for MiCS sharding by @zarzen in #3440
- Fix some typos by @digger-yu in #3675
- Use logger in accelerator by @tjruwase in #3682
- Update README to add ICS'23 paper on Tensor Parallel MoEs by @siddharth9820 in #3687
- Non-JIT build fix on ROCm by @rraminen in #3638
- Fix local rank mismatch error when training on nodes with different number of GPUs by @byungsoo-oh in #3409
- Correct world_size/backend for MPI by @abhilash1910 in #3694
- Fix incorrectly formatted f-string in hostfile checking by @loadams in #3698
- fix typo name of hybrid engine func by @tensor-tang in #3689
- Revert "fix typo name (#3689)" by @loadams in #3702
- Fix GPT-J inference issue by @RezaYazdaniAminabadi in #3639
- change partititon_name to partition_name by @digger-yu in #3700
- Fix unit test typo in tests/unit/ops/transformer/inference by @mrwyattii in #3697
- Small tweak on CUDA version mismatch documentation by @jli in #3706
- DeepSpeed overview in Japanese by @conglongli in #3709
- ZeRO-3 performance optimizations by @hablb in #3622
- Fix typo in name of hybrid engine function by @loadams in #3704
- Increase tensor creator coverage by @tjruwase in #3684
- [Bugfix][CPU] Remove C++ version in CPU OpBuilder by @delock in #3643
- Single Node is using unreferenced pdsh kill cmd while terminating by @abhilash1910 in #3730
- Update Dockerfile with newer CUDA and torch by @loadams in #3716
New Contributors
- @byungsoo-oh made their first contribution in #3409
- @abhilash1910 made their first contribution in #3694
- @tensor-tang made their first contribution in #3689
- @jli made their first contribution in #3706
Full Changelog: v0.9.3...v0.9.4