Overview
Here are the main improvements of this release:
- MoE and BERT models can be trained with ZeRO.
- Provide unified checkpointing for all kinds of parallelism.
- Optimize ZeRO-offload and improve model scaling.
- Design a unified model memory tracer.
- Implement an efficient hybrid Adam optimizer (CPU and CUDA kernels).
- Improve activation offloading.
- Beta version of the profiler TensorBoard plugin.
- Refactor the pipeline module for closer integration with the engine.
- Chinese tutorials, plus WeChat and Slack user groups.
What's Changed
Features
- [zero] get memory usage for sharded param by @feifeibear in #536
- [zero] improve the accuracy of get_memory_usage of sharded param by @feifeibear in #538
- [zero] refactor model data tracing by @feifeibear in #537
- [zero] get memory usage of sharded optim v2. by @feifeibear in #542
- [zero] polish ZeroInitContext by @ver217 in #540
- [zero] optimize grad offload by @ver217 in #539
- [zero] non model data tracing by @feifeibear in #545
- [zero] add zero config to neutralize zero context init by @1SAA in #546
- [zero] dump memory stats for sharded model by @feifeibear in #548
- [zero] add stateful tensor by @feifeibear in #549
- [zero] label state for param fp16 and grad by @feifeibear in #551
- [zero] hijack p.grad in sharded model by @ver217 in #554
- [utils] update colo tensor moving APIs by @feifeibear in #553
- [polish] rename col_attr -> colo_attr by @feifeibear in #558
- [zero] trace states of fp16/32 grad and fp32 param by @ver217 in #571
- [zero] adapt zero for unsharded parameters by @1SAA in #561
- [refactor] memory utils by @feifeibear in #577
- Feature/checkpoint gloo by @kurisusnowdeng in #589
- [zero] add sampling time for memstats collector by @Gy-Lu in #610
- [model checkpoint] checkpoint utils by @kurisusnowdeng in #592
- [model checkpoint][hotfix] unified layers for save&load by @kurisusnowdeng in #593
- Feature/checkpoint 2D by @kurisusnowdeng in #595
- Feature/checkpoint 1D by @kurisusnowdeng in #594
- [model checkpoint] CPU communication ops by @kurisusnowdeng in #590
- Feature/checkpoint 2.5D by @kurisusnowdeng in #596
- Feature/Checkpoint 3D by @kurisusnowdeng in #597
- [model checkpoint] checkpoint hook by @kurisusnowdeng in #598
- Feature/Checkpoint tests by @kurisusnowdeng in #599
- [zero] adapt zero for unsharded parameters (Optimizer part) by @1SAA in #601
- [zero] polish init context by @feifeibear in #645
- refactor pipeline---put runtime schedule into engine. by @YuliangLiu0306 in #627
Bug Fix
- [Zero] process no-leaf-module in Zero by @1SAA in #535
- Add gather_out arg to Linear by @Wesley-Jzy in #541
- [hotfix] fix parallel_input flag for Linear1D_Col gather_output by @Wesley-Jzy in #579
- [hotfix] add hybrid adam to init by @ver217 in #584
- Hotfix/path check util by @kurisusnowdeng in #591
- [hotfix] fix sharded optim zero grad by @ver217 in #604
- Add tensor parallel input check by @Wesley-Jzy in #621
- [hotfix] Raise messages for indivisible batch sizes with tensor parallelism by @number1roy in #622
- [zero] fixed the activation offload by @Gy-Lu in #647
- fixed bugs in CPU Adam by @1SAA in #633
- Revert "[zero] polish init context" by @feifeibear in #657
- [hotfix] fix a bug in model data stats tracing by @feifeibear in #655
- fix bugs for unsharded parameters when restoring data by @1SAA in #664
Unit Testing
- [zero] test zero tensor utils by @FredHuang99 in #609
- remove hybrid adam in test_moe_zero_optim by @1SAA in #659
Documentation
- Refactored docstring to google style by @number1roy in #532
- [docs] updated docs of hybrid adam and cpu adam by @Gy-Lu in #552
- html refactor by @number1roy in #555
- [doc] polish docstring of zero by @ver217 in #612
- [doc] update rst by @ver217 in #615
- [doc] polish amp docstring by @ver217 in #616
- [doc] polish moe docstring by @ver217 in #618
- [doc] polish optimizer docstring by @ver217 in #619
- [doc] polish utils docstring by @ver217 in #620
- [NFC] polish colossalai/kernel/cuda_native/csrc/kernels/cuda_util.cu … by @GaryGky in #625
- [doc] polish checkpoint docstring by @ver217 in #637
- update GPT-2 experiment result by @Sze-qq in #666
- [NFC] polish code by @binmakeswell in #646
Model Zoo
Miscellaneous
- [logging] polish logger format by @feifeibear in #543
- [profiler] add MemProfiler by @raejaf in #356
- [Bot] Synchronize Submodule References by @github-actions in #501
- [tool] create .clang-format for pre-commit by @BoxiangW in #578
- [GitHub] Add prefix and label in issue template by @binmakeswell in #652
Full Changelog: v0.1.1...v0.1.2