What's Changed
- Update version.txt post 0.17.6 release by @loadams in #7572
- DeepCompile ZeRO-3: robust allgather for uneven shards; fix profiling… by @juyterman1000 in #7489
- logging: Also set log level of logger handlers by @eternalNight in #7576
- Deepcompile: Fix bugs when applying deepcompile to VLA-like models by @eternalNight in #7569
- Broadcast fp16 overflow in Z1 by @sfc-gh-truwase in #7580
- Deepcompile: Make size of activation to free configurable by @eternalNight in #7582
- SuperOffload Release by @xylian86 in #7559
- Include init file for superoffload folder by @nguyen599 in #7591
- Disable the ZeRO checkpoint loading path when stage=0 by @therealnaveenkamal in #7586
- Simplify leaf module hook by @tohtana in #7592
- Fix the universal checkpoint issue for stage 3 when there are multiple subgroups by @zhengchenyu in #7585
- Change current_device() to current_device_name() by @delock in #7600
- Fix the universal checkpoint loading error in multi-machine mode by @zhengchenyu in #7601
- DeepCompile: Specify tensor aliasing in C++ op schema by @eternalNight in #7597
- DeepCompile: Fuse allgather and downcast by @eternalNight in #7588
- Add blog for SuperOffload by @xylian86 in #7594
- Add venv to .gitignore by @zhengchenyu in #7605
- Handle the case where DeepCompile is enabled but not activated by @tohtana in #7603
- DeepCompile: Fix IPG bucket clearing by @eternalNight in #7610
- Minor fix in the SuperOffload blog by @xylian86 in #7612
- Fix universal checkpoint loading for stage 3 when the world size is expanded by @zhengchenyu in #7599
- Fix a save_checkpoint race when consolidating NVMe-offloaded tensors by @H1manshu21 in #7613
- [wall_clock_breakdown] always log stats when enabled by @stas00 in #7617
- DeepCompile: Use min_cut_rematerialization for partitioning joint graphs by @eternalNight in #7609
- Show mismatching values when DeepCompile test fails by @tohtana in #7618
- Improve leaf module interface (enable via config, relax matching criteria, add document, etc.) by @tohtana in #7604
- Add print_dist util by @stas00 in #7621
- SuperOffload blog (Chinese version) by @delock in #7620
- Enable grad scaler for ZeRO-0 + torch.autocast path by @tohtana in #7619
- Blog on ZenFlow binding study by @delock in #7614
- Clarify document of leaf module config by @tohtana in #7623
- [TiledMLP] MoE support by @stas00 in #7622
- Update email address by @sfc-gh-truwase in #7624
New Contributors
- @juyterman1000 made their first contribution in #7489
- @nguyen599 made their first contribution in #7591
- @zhengchenyu made their first contribution in #7585
- @H1manshu21 made their first contribution in #7613
Full Changelog: v0.17.6...v0.18.0