What's Changed
Release
- [release] update version (#6041) by Hongxin Liu
FP8
- [fp8] disable all_to_all_fp8 in intranode (#6045) by Hanks
- [fp8] fix linear hook (#6046) by Hongxin Liu
- [fp8] optimize all-gather (#6043) by Hongxin Liu
- [FP8] unsqueeze scale to make it compatible with torch.compile (#6040) by Guangyao Zhang
- Merge pull request #6012 from hpcaitech/feature/fp8_comm by Hongxin Liu
- Merge pull request #6033 from wangbluo/fix by Wang Binluo
- Merge pull request #6024 from wangbluo/fix_merge by Wang Binluo
- Merge pull request #6023 from wangbluo/fp8_merge by Wang Binluo
- [fp8] Merge feature/fp8_comm to main branch of Colossalai (#6016) by Wang Binluo
- [fp8] zero support fp8 linear. (#6006) by flybird11111
- [fp8] add use_fp8 option for MoeHybridParallelPlugin (#6009) by Wang Binluo
- [fp8] update reduce-scatter test (#6002) by flybird11111
- [fp8] linear perf enhancement by botbw
- [fp8] update torch.compile for linear_fp8 to >= 2.4.0 (#6004) by botbw
- [fp8] support asynchronous FP8 communication (#5997) by flybird11111
- [fp8] refactor fp8 linear with compile (#5993) by Hongxin Liu
- [fp8] support hybrid parallel plugin (#5982) by Wang Binluo (see the sketch after this list)
- [fp8] MoE support fp8 communication (#5977) by flybird11111
- [fp8] use torch compile (torch >= 2.3.0) (#5979) by botbw
- [fp8] support gemini plugin (#5978) by Hongxin Liu
- [fp8] support fp8 amp for hybrid parallel plugin (#5975) by Hongxin Liu
- [fp8] add fp8 linear (#5967) by Hongxin Liu
- [fp8] support all2all fp8 (#5953) by flybird11111
- [FP8] rebase main (#5963) by flybird11111
- Merge pull request #5961 from ver217/feature/zeor-fp8 by Hanks
- [fp8] add fp8 comm for low level zero by ver217
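Taken together, these PRs add FP8 mixed-precision compute and FP8 collective communication to the hybrid parallel, MoE, Gemini, and zero plugins. Below is a minimal sketch of how a user might opt in; the flag names `use_fp8` and `fp8_communication` are assumptions inferred from the PR titles (#5975, #5982, #6009), not a confirmed API.

```python
# Hypothetical sketch: enabling FP8 compute and FP8 collective communication.
# The flag names below are inferred from the PR titles and may differ from
# the released API.
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

colossalai.launch_from_torch()  # set up the distributed environment (run via torchrun)

plugin = HybridParallelPlugin(
    tp_size=2,
    pp_size=1,
    precision="bf16",
    use_fp8=True,            # assumed flag: FP8 linear layers (torch.compile-backed)
    fp8_communication=True,  # assumed flag: FP8 all-gather / all-to-all / reduce-scatter
)
booster = Booster(plugin=plugin)
# model, optimizer, criterion, and dataloader are then wrapped via booster.boost(...)
```

Both flags are sketched as opt-in; FP8 communication mainly saves inter-node traffic, which is consistent with #6045 disabling the FP8 all-to-all path for intra-node links.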
Hotfix
- [Hotfix] Remove deprecated install (#6042) by Tong Li
- [Hotfix] Fix llama fwd replacement bug (#6031) by Wenxuan Tan
- [Hotfix] Avoid fused RMSnorm import error without apex (#5985) by Edenzzzz
- [Hotfix] README link (#5966) by Tong Li
- [hotfix] Remove unused plan section (#5957) by Tong Li
Colossalai/checkpoint_io/...
- [colossalai/checkpoint_io/...] fix bug in load_state_dict_into_model; format error msg (#6020) by Gao, Ruiyuan
Plugin
- [plugin] hotfix zero plugin (#6036) by Hongxin Liu
- [plugin] add cast inputs option for zero (#6003) (#6022) by Hongxin Liu
- [plugin] add cast inputs option for zero (#6003) by Hongxin Liu
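PR #6003 (re-landed in #6022) exposes a cast-inputs switch on the zero plugin so callers can keep batch tensors in their original dtype instead of having them cast to the plugin's mixed precision. A minimal sketch, assuming the option is named `cast_inputs` on `LowLevelZeroPlugin`:

```python
# Hypothetical sketch: disabling automatic input casting under the zero plugin.
# `cast_inputs` is the option named in PR #6003; its exact placement and
# default value are assumptions here.
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

plugin = LowLevelZeroPlugin(
    stage=2,
    precision="fp16",
    cast_inputs=False,  # assumed: leave batch tensors in their original dtype
)
booster = Booster(plugin=plugin)
```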
CI
- [CI] Remove triton version for compatibility bug; update req torch >=2.2 (#6018) by Wenxuan Tan
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] pre-commit autoupdate (#5995) by pre-commit-ci[bot]
Misc
- [misc] Use dist logger in plugins (#6011) by Edenzzzz
- [misc] update compatibility (#6008) by Hongxin Liu
- [misc] Bypass the huggingface bug to solve the mask mismatch problem (#5991) by Haze188
- [misc] remove useless condition by haze188
- [misc] fix ci failure: change default value to false in moe plugin by haze188
- [misc] remove incompatible test config by haze188
- [misc] remove debug/print code by haze188
- [misc] skip redundant test by haze188
- [misc] solve booster hang by renaming the variable by haze188
Feature
- [Feature] Zigzag Ring attention (#5905) by Edenzzzz
- [Feature]: support FP8 communication in DDP, FSDP, Gemini (#5928) by Hanks
- [Feature] llama shardformer fp8 support (#5938) by Guangyao Zhang
- [Feature] MoE Ulysses Support (#5918) by Haze188
Chat
- [Chat] fix readme (#5989) by YeAnbang
- Merge pull request #5962 from hpcaitech/colossalchat by YeAnbang
- [Chat] Fix lora (#5946) by YeAnbang
Test CI
- [test ci] Feature/fp8 comm (#5981) by flybird11111
Docs
- [Docs] clarify launch port by Edenzzzz
Test
- [test] add zero fp8 test case by ver217
- [test] add check by hxwang
- [test] fix test: test_zero1_2 by hxwang
- [test] add mixtral modelling test by botbw
- [test] pass mixtral shardformer test by botbw
- [test] mixtral pp shard test by hxwang
- [test] add mixtral transformer test by hxwang
- [test] add mixtral for sequence classification by hxwang
Lora
- [lora] lora support hybrid parallel plugin (#5956) by Wang Binluo
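#5956 extends LoRA support to the hybrid parallel plugin. A minimal sketch, assuming the existing `booster.enable_lora` entry point and a peft-style `LoraConfig`; names and signatures should be checked against the release docs:

```python
# Hypothetical sketch: LoRA fine-tuning through the booster API with the
# hybrid parallel plugin. booster.enable_lora and the peft LoraConfig are
# assumed entry points, not a confirmed recipe from this release.
from peft import LoraConfig
from transformers import AutoModelForCausalLM

from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin

plugin = HybridParallelPlugin(tp_size=2, pp_size=1, precision="bf16")
booster = Booster(plugin=plugin)

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = booster.enable_lora(model, lora_config=lora_config)
# model, optimizer, and dataloader are then wrapped via booster.boost(...)
```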
Chore
- [chore] remove redundant test case, print string & reduce test tokens by botbw
- [chore] docstring by hxwang
- [chore] change moe_pg_mesh to private by hxwang
- [chore] solve moe ckpt test failure and some other arg pass failure by hxwang
- [chore] minor fix after rebase by hxwang
- [chore] minor fix by hxwang
- [chore] arg pass & remove drop token by hxwang
- [chore] trivial fix by botbw
- [chore] manually revert unintended commit by botbw
- [chore] handle non member group by hxwang
MoE
- [moe] solve dp axis issue by botbw
- [moe] remove force_overlap_comm flag and add warning instead by hxwang
- Revert "[moe] implement submesh initialization" by hxwang
- [moe] refactor mesh assignment by hxwang
- [moe] deepseek moe sp support by haze188
- [moe] remove ops by hxwang
- [moe] full test for deepseek and mixtral (pp + sp to fix) by hxwang
- [moe] finalize test (no pp) by hxwang
- [moe] init moe plugin comm setting with sp by hxwang
- [moe] clean legacy code by hxwang
- [moe] test deepseek by hxwang
- [moe] implement tp by botbw
- [moe] add mixtral dp grad scaling when not all experts are activated by botbw
- [moe] implement submesh initialization by botbw
- [moe] implement transition between non-MoE TP and EP by botbw
- [moe] fix plugin by hxwang
Doc
- [doc] add MoeHybridParallelPlugin docstring by botbw
Deepseek
- [deepseek] replace attn (a workaround for a bug in transformers) by hxwang
Bug
- [bug] fix: somehow logger hangs the program by botbw
Full Changelog: v0.4.2...v0.4.3