Overview
We are happy to release version v0.1.0 today. Compared to the previous version, we have a brand new ZeRO module and have updated many aspects of our system for better performance and usability. The latest version can be installed now with `pip install colossalai`. We will update our examples and documentation accordingly in the next few days.
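To confirm which version ended up in your environment after installing, here is a minimal sketch using only the Python standard library. The helper name `installed_version` is illustrative and is not part of Colossal-AI.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not present in the current environment.
        return None


if __name__ == "__main__":
    version = installed_version("colossalai")
    if version is None:
        print("colossalai is not installed; run `pip install colossalai`")
    else:
        print(f"colossalai {version} is installed")
```

The lookup reads package metadata rather than importing the package, so it should work even in an environment without a GPU.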
Highlights:
Note:
a. Only the major base commits are chosen to display. Successive commits which enhance/update the base commit are not shown.
b. Some commits have no associated pull request ID, for unknown reasons.
c. The list is ordered by time.
Features
- add moe context, moe utilities and refactor gradient handler (#455) By @1SAA
- [zero] Update initialize for ZeRO (#458) By @ver217
- [zero] hybrid cpu adam (#445) By @feifeibear
- added Multiply Jitter and capacity factor eval for MOE (#434) By @1SAA
- [fp16] refactored fp16 optimizer (#392) By @FrankLeeeee
- [zero] memtracer to record cuda memory usage of model data and overall system (#395) By @feifeibear
- Added tensor detector (#393) By @Gy-Lu
- Added activation offload (#331) By @Gy-Lu
- [zero] zero init context collect numel of model (#375) By @feifeibear
- Added PCIE profiler to detect data transmission (#373) By @1SAA
- Added Profiler Context to manage all profilers (#340) By @1SAA
- set criterion as optional in colossalai initialize (#336) By @FrankLeeeee
- [zero] Update sharded model v2 using sharded param v2 (#323) By @ver217
- [zero] zero init context (#321) By @feifeibear
- Added profiler communication operations By @1SAA
- added buffer sync to naive amp model wrapper (#291) By @FrankLeeeee
- [zero] cpu adam kernel (#288) By @Gy-Lu
- Feature/zero (#279) By @feifeibear @FrankLeeeee @ver217
- impl shard optim v2 and add unit test By @ver217
- [profiler] primary memory tracer By @raejaf
- add sharded adam By @ver217
Unit Testing
- [test] fixed amp convergence comparison test (#454) By @FrankLeeeee
- [test] optimized zero data parallel test (#452) By @FrankLeeeee
- [test] make zero engine test really work (#447) By @feifeibear
- optimized context test time consumption (#446) By @FrankLeeeee
- [unitest] polish zero config in unittest (#438) By @feifeibear
- added testing module (#435) By @FrankLeeeee
- [zero] polish ShardedOptimV2 unittest (#385) By @feifeibear
- [unit test] Refactored test cases with component func (#339) By @FrankLeeeee
Documentation
- [doc] Update docstring for ZeRO (#459) By @ver217
- update README and images path (#384) By @binmakeswell
- add badge and contributor list By @FrankLeeeee
- add community group and update issue template (#271) By @binmakeswell
- update experimental visualization (#253) By @Sze-qq
- add Chinese README By @binmakeswell
CI/CD
- update github CI with the current workflow (#441) By @FrankLeeeee
- update unit testing CI rules By @FrankLeeeee
- added compatibility CI and options for release ci By @FrankLeeeee
- added pypi publication CI and remove formatting CI By @FrankLeeeee
Bug Fix
- fix gpt attention mask (#461) By @ver217
- [bug] Fixed device placement bug in memory monitor thread (#433) By @FrankLeeeee
- fixed fp16 optimizer none grad bug (#432) By @FrankLeeeee
- fixed gpt attention mask in pipeline (#430) By @FrankLeeeee
- [hotfix] fixed bugs in ShardStrategy and PcieProfiler (#394) By @1SAA
- fixed bug in activation checkpointing test (#387) By @FrankLeeeee
- [profiler] Fixed bugs in CommProfiler and PcieProfiler (#377) By @1SAA
- fixed CI dataset directory; fixed import error of 2.5d accuracy (#255) By @kurisusnowdeng
- fixed padding index issue for vocab parallel embedding layers; updated 3D linear to be compatible with examples in the tutorial By @kurisusnowdeng
Miscellaneous
- [log] better logging display with rich (#426) By @feifeibear