What's Changed
Workflow
- [workflow] fixed broken release workflows (#2604) by Frank Lee
- [workflow] added cuda extension build test before release (#2598) by Frank Lee
- [workflow] hooked pypi release with lark (#2596) by Frank Lee
- [workflow] hooked docker release with lark (#2594) by Frank Lee
- [workflow] added test-pypi check before release (#2591) by Frank Lee
- [workflow] fixed the typo in the example check workflow (#2589) by Frank Lee
- [workflow] hook compatibility test failure to lark (#2586) by Frank Lee
- [workflow] hook example test alert with lark (#2585) by Frank Lee
- [workflow] added notification if scheduled build fails (#2574) by Frank Lee
- [workflow] added discussion stats to community report (#2572) by Frank Lee
- [workflow] refactored compatibility test workflow for maintainability (#2560) by Frank Lee
- [workflow] adjust the GPU memory threshold for scheduled unit test (#2558) by Frank Lee
- [workflow] fixed example check workflow (#2554) by Frank Lee
- [workflow] fixed typos in the leaderboard workflow (#2567) by Frank Lee
- [workflow] added contributor and user-engagement report (#2564) by Frank Lee
- [workflow] only report coverage for changed files (#2524) by Frank Lee
- [workflow] fixed the precommit CI (#2525) by Frank Lee
- [workflow] fixed changed file detection (#2515) by Frank Lee
- [workflow] fixed the skip condition of example weekly check workflow (#2481) by Frank Lee
- [workflow] automated bdist wheel build (#2459) by Frank Lee
- [workflow] automated the compatibility test (#2453) by Frank Lee
- [workflow] fixed the on-merge condition check (#2452) by Frank Lee
- [workflow] make test coverage report collapsible (#2436) by Frank Lee
- [workflow] report test coverage even if below threshold (#2431) by Frank Lee
- [workflow] auto comment with test coverage report (#2419) by Frank Lee
- [workflow] auto comment if precommit check fails (#2417) by Frank Lee
- [workflow] added translation for non-english comments (#2414) by Frank Lee
- [workflow] added precommit check for code consistency (#2401) by Frank Lee
- [workflow] refactored the example check workflow (#2411) by Frank Lee
- [workflow] added nightly release to pypi (#2403) by Frank Lee
- [workflow] added missing file change detection output (#2387) by Frank Lee
- [workflow] New version: Create workflow files for examples' auto check (#2298) by ziyuhuang123
- [workflow] fixed pypi release workflow error (#2328) by Frank Lee
- [workflow] fixed pypi release workflow error (#2327) by Frank Lee
- [workflow] added workflow to release to pypi upon version change (#2320) by Frank Lee
- [workflow] removed unused assign reviewer workflow (#2318) by Frank Lee
- [workflow] rebuild cuda kernels when kernel-related files change (#2317) by Frank Lee
Doc
- [doc] updated readme for CI/CD (#2600) by Frank Lee
- [doc] fixed issue link in pr template (#2577) by Frank Lee
- [doc] updated the CHANGE_LOG.md for github release page (#2552) by Frank Lee
- [doc] fixed the typo in pr template (#2556) by Frank Lee
- [doc] added pull request template (#2550) by Frank Lee
- [doc] update example link (#2520) by binmakeswell
- [doc] update opt and tutorial links (#2509) by binmakeswell
- [doc] added documentation for CI/CD (#2420) by Frank Lee
- [doc] updated kernel-related optimisers' docstring (#2385) by Frank Lee
- [doc] updated readme regarding pypi installation (#2406) by Frank Lee
- [doc] hotfix #2377 by Jiarui Fang
- [doc] hotfix #2377 by jiaruifang
- [doc] update stable diffusion link (#2322) by binmakeswell
- [doc] update diffusion doc (#2296) by binmakeswell
- [doc] update news (#2295) by binmakeswell
- [doc] update news by binmakeswell
Setup
- [setup] fixed inconsistent version meta (#2578) by Frank Lee
- [setup] refactored setup.py for dependency graph (#2413) by Frank Lee
- [setup] support pre-build and jit-build of cuda kernels (#2374) by Frank Lee
- [setup] make cuda extension build optional (#2336) by Frank Lee
- [setup] remove torch dependency (#2333) by Frank Lee
- [setup] removed the build dependency on colossalai (#2307) by Frank Lee
Tutorial
- [tutorial] polish README (#2568) by binmakeswell
- [tutorial] update fastfold tutorial (#2565) by oahzxl
Polish
- [polish] polish ColoTensor and its submodules (#2537) by HELSON
- [polish] polish code for get_static_torch_model (#2405) by HELSON
Hotfix
- [hotfix] fix zero ddp warmup check (#2545) by ver217
- [hotfix] fix autoparallel demo (#2533) by YuliangLiu0306
- [hotfix] fix lightning error (#2529) by HELSON
- [hotfix] meta tensor default device. (#2510) by Super Daniel
- [hotfix] gpt example titans bug #2493 (#2494) by Jiarui Fang
- [hotfix] gpt example titans bug #2493 by jiaruifang
- [hotfix] add norm clearing for the overflow step (#2416) by HELSON
- [hotfix] add DISTPAN argument for benchmark (#2412) by HELSON
- [hotfix] fix gpt gemini example (#2404) by HELSON
- [hotfix] issue #2388 by Jiarui Fang
- [hotfix] issue #2388 by jiaruifang
- [hotfix] fix implement error in diffusers by Jiarui Fang
- [hotfix] fix implement error in diffusers by 1SAA
Autochunk
- [autochunk] add benchmark for transformer and alphafold (#2543) by oahzxl
- [autochunk] support multi outputs chunk search (#2538) by oahzxl
- [autochunk] support transformer (#2526) by oahzxl
- [autochunk] support parsing blocks (#2506) by oahzxl
- [autochunk] support autochunk on evoformer (#2497) by oahzxl
- [autochunk] support evoformer tracer (#2485) by oahzxl
- [autochunk] add autochunk feature by Jiarui Fang
Git
- [git] remove invalid submodule (#2540) by binmakeswell
Gemini
- [gemini] add profiler in the demo (#2534) by HELSON
- [gemini] update the gpt example (#2527) by HELSON
- [gemini] update ddp strict mode (#2518) by HELSON
- [gemini] add get static torch model (#2356) by HELSON
Example
- [example] Add fastfold tutorial (#2528) by LuGY
- [example] update lightning dependency for stable diffusion (#2522) by Jiarui Fang
- Merge pull request #2499 from feifeibear/dev0116_10 by Fazzie-Maqianli
- [example] dreambooth example by jiaruifang
- [example] fix requirements (#2488) by binmakeswell
- [example] titans for gpt (#2484) by Jiarui Fang
- [example] titans for gpt by jiaruifang
- [example] stable diffusion add roadmap (#2482) by Jiarui Fang
- [example] stable diffusion add roadmap by jiaruifang
- [example] update gpt gemini example ci test (#2477) by ver217
- [example] integrate seq-parallel tutorial with CI (#2463) by Frank Lee
- [example] update vit ci script (#2469) by ver217
- [example] integrate autoparallel demo with CI (#2466) by Frank Lee
- [example] fixed seed error in train_dreambooth_colossalai.py (#2445) by Haofan Wang
- [example] updated large-batch optimizer tutorial (#2448) by Frank Lee
- [example] updated the hybrid parallel tutorial (#2444) by Frank Lee
- [example] improved the clarity of the example readme (#2427) by Frank Lee
- [example] removed duplicated stable diffusion example (#2424) by Frank Lee
- [example] gpt, shard init on all processes (#2366) by Jiarui Fang
- [example] upload auto parallel gpt2 demo (#2354) by YuliangLiu0306
- [example] add google doc for benchmark results of GPT (#2355) by Jiarui Fang
- [example] make gpt example directory more clear (#2353) by Jiarui Fang
- [example] simplify opt example (#2344) by Jiarui Fang
- [example] add example requirement (#2345) by binmakeswell
- [example] diffusion update diffusion, Dreambooth (#2329) by Fazzie-Maqianli
- [example] update diffusion readme with official lightning (#2304) by Jiarui Fang
- [example] update gemini benchmark bash (#2306) by HELSON
Zero
- [zero] add zero wrappers (#2523) by HELSON
- [zero] fix gradient clipping in hybrid parallelism (#2521) by HELSON
- [zero] add strict ddp mode (#2508) by HELSON
- [zero] add unit testings for hybrid parallelism (#2486) by HELSON
- [zero] add unit test for low-level zero init (#2474) by HELSON
- [zero] polish low level optimizer (#2473) by HELSON
- [zero] low level optim supports ProcessGroup (#2464) by Jiarui Fang
- [zero] add warning for ignored parameters (#2446) by HELSON
- [zero] fix state_dict and load_state_dict for ddp ignored parameters (#2443) by HELSON
- [zero] add inference mode and its unit test (#2418) by HELSON
Autoparallel
- [autoparallel] accelerate gpt2 training (#2495) by YuliangLiu0306
- [autoparallel] support origin activation ckpt on autoparallel system (#2468) by YuliangLiu0306
- [autoparallel] update binary elementwise handler (#2451) by YuliangLiu0306
- [autoparallel] integrate device mesh initialization into autoparallelize (#2393) by YuliangLiu0306
- [autoparallel] add shard option (#2423) by YuliangLiu0306
- [autoparallel] bypass MetaInfo when unavailable and modify BCAST_FUNC_OP metainfo (#2293) by Boyuan Yao
Utils
- [utils] lazy init. (#2148) by Super Daniel
Fx
- [fx] allow control of ckpt_codegen init (#2498) by oahzxl
- [fx] allow native ckpt trace and codegen. (#2438) by Super Daniel
Ci
- [CI] add test_ci.sh for palm, opt and gpt (#2475) by Jiarui Fang
Cli
- [cli] fixed hostname mismatch error (#2465) by Frank Lee
- [cli] provided more details if colossalai run fail (#2442) by Frank Lee
- [cli] updated installation check cli for aot/jit build (#2395) by Frank Lee
Examples
- [examples] update autoparallel tutorial demo (#2449) by YuliangLiu0306
- [examples] adding tflops to PaLM (#2365) by ZijianYY
- [examples] adding tp to PaLM (#2319) by ZijianYY
- [example] fix dreambooth format (#2315) by Fazzie-Maqianli
Device
- [device] find best logical mesh by Jiarui Fang
- [device] find best logical mesh by YuliangLiu0306
- [device] alpha beta profiler (#2311) by YuliangLiu0306
Pipeline
- [Pipeline] Refine GPT PP Example by Jiarui Fang
Builder
- [builder] correct readme (#2375) by Jiarui Fang
- [builder] reconfig op_builder for pypi install (#2314) by Jiarui Fang
- [builder] MOE builder (#2277) by Jiarui Fang
Autockpt
- Merge pull request #2258 from hpcaitech/debug/ckpt-autoparallel by Boyuan Yao
Full Changelog: v0.2.1...v0.2.0