Version v0.1.7 Released Today
Highlights
- Started torch.fx integration for auto-parallel training
- Updated the ZeRO mechanism with ColoTensor
- Fixed various bugs
What's Changed
Hotfix
- [hotfix] prevent nested ZeRO (#1140) by ver217
- [hotfix] fix bugs caused by refactored pipeline (#1133) by YuliangLiu0306
- [hotfix] fix param op hook (#1131) by ver217
- [hotfix] fix zero init ctx numel (#1128) by ver217
- [hotfix] change to fit latest p2p (#1100) by YuliangLiu0306
- [hotfix] fix chunk comm src rank (#1072) by ver217
Zero
- [zero] avoid zero hook spam by changing log to debug level (#1137) by Frank Lee (see the sketch after this list)
- [zero] added error message to handle on-the-fly import of torch Module class (#1135) by Frank Lee
- [zero] fixed api consistency (#1098) by Frank Lee
- [zero] zero optim copy chunk rather than copy tensor (#1070) by ver217
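Because #1137 demotes the per-hook messages to debug level, they no longer appear by default. A minimal sketch of turning them back on with standard Python logging; the logger name "colossalai" is an assumption, not a confirmed detail of the library:

```python
import logging

# Show DEBUG-level records (the zero hook messages were demoted to DEBUG
# in #1137). The logger name "colossalai" is an assumption; adjust it to
# whatever logger your build actually uses.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("colossalai").setLevel(logging.DEBUG)
```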
Ddp
- [ddp] add save/load state dict for ColoDDP (#1127) by ver217 (see the sketch after this list)
- [ddp] add set_params_to_ignore for ColoDDP (#1122) by ver217
- [ddp] supported customized torch ddp configuration (#1123) by Frank Lee
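Taken together, the ColoDDP changes add torch-style checkpointing and parameter exclusion. The sketch below assumes a `ColoDDP` class importable from `colossalai.nn.parallel` with a torch-like `state_dict()`/`load_state_dict()` pair and a `set_params_to_ignore` helper; the import path and signatures are inferred from the PR titles, and a distributed environment is assumed to be initialized already:

```python
import torch
import torch.nn as nn
from colossalai.nn.parallel import ColoDDP  # import path is an assumption

# Assumes torch.distributed / colossalai have already been initialized.
model = ColoDDP(nn.Linear(16, 16))

# #1122: exclude parameters from DDP management (signature inferred from
# the PR title; treat this as a sketch, not the confirmed API).
ColoDDP.set_params_to_ignore([model.module.bias])

# #1127: torch-style save/load of the wrapped module's state dict.
torch.save(model.state_dict(), "colo_ddp_ckpt.pt")
model.load_state_dict(torch.load("colo_ddp_ckpt.pt"))
```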
Pipeline
- [pipeline] support List of Dict data (#1125) by YuliangLiu0306 (see the sketch after this list)
- [pipeline] supported more flexible dataflow control for pipeline parallel training (#1108) by Frank Lee
- [pipeline] refactor the pipeline module (#1087) by Frank Lee
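As a rough illustration of the data format enabled by #1125, a micro-batch stream can now be a list of dicts of named tensors; the field names below are made up for the example:

```python
import torch

# Each micro-batch is a dict of named tensors; the pipeline consumes a
# list of them. Field names ("input_ids", "labels") are illustrative only.
micro_batches = [
    {
        "input_ids": torch.randint(0, 100, (4, 16)),
        "labels": torch.randint(0, 100, (4, 16)),
    }
    for _ in range(2)
]
for mb in micro_batches:
    assert isinstance(mb, dict) and set(mb) == {"input_ids", "labels"}
```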
Fx
- [fx] add autoparallel passes (#1121) by YuliangLiu0306
- [fx] added unit test for coloproxy (#1119) by Frank Lee
- [fx] added coloproxy (#1115) by Frank Lee
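ColoProxy (#1115) builds on torch.fx's tracing machinery, in which a Proxy records every operation into a graph that later passes (such as the auto-parallel passes in #1121) can rewrite. The snippet below uses only the public torch.fx API to show the kind of graph those passes operate on:

```python
import torch
import torch.fx

class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(8, 8)
        self.fc2 = torch.nn.Linear(8, 8)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

# Symbolic tracing records each op into a torch.fx.Graph; ColoProxy plays
# the Proxy role inside ColossalAI so tensor meta data can be captured too.
gm = torch.fx.symbolic_trace(MLP())
print(gm.graph)  # the IR that auto-parallel passes transform
```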
Gemini
- [gemini] gemini mgr supports "cpu" placement policy (#1118) by ver217 (see the sketch after this list)
- [gemini] zero supports gemini (#1093) by ver217
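To illustrate what a "cpu" placement policy (#1118) means, here is a toy dispatcher: parameter chunks stay resident in host memory and are moved to the device only when needed. None of these names are ColossalAI APIs; this is a conceptual sketch only:

```python
import torch

# Toy illustration of placement policies; not ColossalAI code.
def place_chunk(chunk: torch.Tensor, policy: str) -> torch.Tensor:
    if policy == "cpu":
        # "cpu" policy (#1118): chunks live in host memory between uses.
        return chunk.cpu()
    if policy == "cuda":
        # device-resident policy: chunks stay on the GPU.
        return chunk.cuda()
    raise ValueError(f"unknown placement policy: {policy!r}")

chunk = place_chunk(torch.empty(1024), "cpu")
print(chunk.device)  # cpu
```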
Test
- [test] fixed hybrid parallel test case on 8 GPUs (#1106) by Frank Lee
- [test] skip tests when not enough GPUs are detected (#1090) by Frank Lee (see the sketch after this list)
- [test] ignore 8 gpu test (#1080) by Frank Lee
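The GPU-count guard from #1090 can be written with standard pytest and torch calls, along these lines (the decorator and `device_count` are real APIs; the test body is illustrative):

```python
import pytest
import torch

# Skip rather than fail when the machine has fewer GPUs than the test
# needs, mirroring #1090. Eight matches the 8-GPU cases mentioned above.
@pytest.mark.skipif(torch.cuda.device_count() < 8,
                    reason="requires at least 8 GPUs")
def test_hybrid_parallel_8gpu():
    assert torch.cuda.device_count() >= 8  # placeholder test body
```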
Tensor
- [tensor] refactor param op hook (#1097) by ver217 (see the sketch after this list)
- [tensor] refactor chunk mgr and impl MemStatsCollectorV2 (#1077) by ver217
- [Tensor] fix equal assert (#1091) by Ziyue Jiang
- [Tensor] 1d row embedding (#1075) by Ziyue Jiang
- [tensor] chunk manager monitor mem usage (#1076) by ver217
- [Tensor] fix optimizer for CPU parallel (#1069) by Ziyue Jiang
- [Tensor] add hybrid device demo and fix bugs (#1059) by Ziyue Jiang
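The param op hook refactored in #1097 boils down to callbacks that fire around any operator touching a set of parameters. The toy re-implementation below is purely illustrative of that shape; it is not the ColossalAI interface:

```python
from typing import Callable, List
import torch

# Toy version of the param-op-hook idea (#1097); not ColossalAI code.
class ParamOpHook:
    def pre_op(self, params: List[torch.Tensor]) -> None:
        print(f"pre: op will touch {len(params)} tensors")

    def post_op(self, params: List[torch.Tensor]) -> None:
        print("post: op finished")

def run_with_hook(hook: ParamOpHook, params: List[torch.Tensor],
                  op: Callable) -> torch.Tensor:
    hook.pre_op(params)      # e.g. gather/allocate before compute
    out = op(*params)
    hook.post_op(params)     # e.g. release/offload after compute
    return out

w, x = torch.randn(4, 4), torch.randn(4)
y = run_with_hook(ParamOpHook(), [w, x], torch.mv)
```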
Workflow
- [workflow] fixed 8-gpu test workflow (#1101) by Frank Lee
- [workflow] added regular 8 GPU testing (#1099) by Frank Lee
- [workflow] disable p2p via shared memory on non-nvlink machine (#1086) by Frank Lee
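For #1086, NCCL exposes a documented `NCCL_SHM_DISABLE` environment variable that turns off its shared-memory transport; whether the workflow uses exactly this knob is an assumption on our part:

```python
import os

# Disable NCCL's shared-memory transport, as on non-NVLink CI machines
# (#1086). NCCL_SHM_DISABLE is documented by NCCL; its use here for the
# workflow is our assumption. Must be set before NCCL is initialized.
os.environ["NCCL_SHM_DISABLE"] = "1"
```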
Context
- [context] support lazy init of module (#1088) by Frank Lee
- [context] maintain the context object in with statement (#1073) by Frank Lee
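The two context changes fit together: a context object stays registered for the duration of a with statement (#1073), and module construction inside it can be deferred (#1088). The class below is a hypothetical stand-in showing the shape of the mechanism, not the library's API:

```python
# Hypothetical stand-in, not the ColossalAI API.
class LazyInitContext:
    current = None  # the active context, maintained per #1073

    def __enter__(self):
        LazyInitContext.current = self
        self.deferred = []  # module-building callables, deferred per #1088
        return self

    def __exit__(self, exc_type, exc, tb):
        for build in self.deferred:   # materialize modules on exit
            build()
        LazyInitContext.current = None
        return False

with LazyInitContext() as ctx:
    ctx.deferred.append(lambda: print("module initialized lazily"))
```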
Refactory
- [refactory] add nn.parallel module (#1068) by Jiarui Fang
Full Changelog: v0.1.6...v0.1.7