What's Changed
- Update version after 0.18.1 release by @loadams in #7647
- Deduplicate fp32 weights under torch autocast and ZeRO3 by @eternalNight in #7651
- ulysses mpu: additional api by @stas00 in #7649
- ALST/UlyssesSP: more intuitive API wrt variable seqlen by @stas00 in #7656
- Fix misplaced overflow handling return in fused_optimizer.py by @rraminen in #7645
- [bug]: fixed comm_dtype in extra_large_param_to_reduce by @therealnaveenkamal in #7660
- UlyssesSP: TiledMLP doc - recomputes forward twice by @stas00 in #7664
- resolved a 0-dim tensor slicing bug from _get_state_without_padding by @therealnaveenkamal in #7659
- Fix typo in pytorch-profiler.md documentation by @kunheek in #7652
- README refresh by @sfc-gh-truwase in #7668
New Contributors
Full Changelog: v0.18.1...v0.18.2