github hpcaitech/ColossalAI v0.3.5
Version v0.3.5 Release Today!

2 months ago

What's Changed

Checkpointio

  • [checkpointio] fix gemini and hybrid parallel optim checkpoint (#5347) by Hongxin Liu

Shardformer

  • [shardformer] HybridParallelPlugin supports gradient accumulation (#5246) by flybird11111
  • [shardformer] llama support DistCrossEntropy (#5176) by flybird11111
  • [shardformer]: support gpt-j, falcon, Mistral and add interleaved pipeline for bert (#5088) by Wenhao Chen
  • [shardformer] fix flash attention: when the mask is causal, skip unpadding (#5084) by flybird11111
  • [shardformer] fix llama error when transformers upgraded. (#5055) by flybird11111
  • [shardformer] Fix serialization error with Tensor Parallel state saving (#5018) by Jun Gao
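
The gradient-accumulation support added in #5246 follows the standard pattern: scale each micro-batch gradient by 1/k and apply the optimizer step only once every k micro-batches. A minimal framework-free sketch of the idea (all names are illustrative, not ColossalAI's API):

```python
# Sketch of gradient accumulation: average gradients over `accum_steps`
# micro-batches before applying one optimizer step. Illustrative only;
# this is not the HybridParallelPlugin interface.

def sgd_with_accumulation(w, micro_batches, grad_fn, lr=0.1, accum_steps=4):
    """Apply one SGD step per group of `accum_steps` micro-batches."""
    acc = 0.0
    for i, batch in enumerate(micro_batches, start=1):
        acc += grad_fn(w, batch) / accum_steps  # scale so the sum is a mean
        if i % accum_steps == 0:
            w -= lr * acc  # one optimizer step for the whole group
            acc = 0.0
    return w

# Toy loss L(w) = (w - target)^2 per sample, so grad = 2 * (w - target).
grad = lambda w, t: 2.0 * (w - t)
w = sgd_with_accumulation(0.0, [1.0, 1.0, 1.0, 1.0], grad, lr=0.5, accum_steps=4)
print(w)  # → 1.0 (one step with the mean gradient over 4 micro-batches)
```

This makes the effective batch size k times the micro-batch size without increasing peak activation memory, which is why it pairs naturally with pipeline-parallel plugins.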

Pipeline

  • [pipeline] A more general _communicate in p2p (#5062) by Elsa Granger
  • [pipeline]: add p2p fallback order and fix interleaved pp deadlock (#5214) by Wenhao Chen
  • [pipeline]: support arbitrary batch size in forward_only mode (#5201) by Wenhao Chen
  • [pipeline]: fix p2p comm, add metadata cache and support llama interleaved pp (#5134) by Wenhao Chen
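
The metadata cache mentioned in #5134 exploits the fact that pipeline stages exchange tensors of the same shape and dtype at every step, so the (shape, dtype) header only needs to be transmitted when it changes. A hedged sketch of the idea (names are illustrative, not the actual ColossalAI p2p implementation):

```python
# Sketch of a p2p metadata cache: send the tensor header once, then skip
# it on subsequent sends while the metadata is unchanged. Illustrative
# only; not ColossalAI's _communicate API.

class MetadataCachingSender:
    def __init__(self):
        self._last_meta = None
        self.messages = []  # stands in for the real comm channel

    def send(self, shape, dtype, payload):
        meta = (tuple(shape), dtype)
        if meta != self._last_meta:
            # First send, or metadata changed: transmit the header.
            self.messages.append(("meta", meta))
            self._last_meta = meta
        # The payload itself is always transmitted.
        self.messages.append(("data", payload))

sender = MetadataCachingSender()
sender.send((4, 8), "fp16", b"step0")
sender.send((4, 8), "fp16", b"step1")  # same metadata: header skipped
sender.send((2, 8), "fp16", b"step2")  # shape changed: header resent
header_count = sum(1 for kind, _ in sender.messages if kind == "meta")
print(header_count)  # → 2
```

Skipping the header removes an extra round of blocking communication per step, which is where the efficiency gain in steady-state pipeline execution comes from.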

Colossal-LLaMA-2

  • [Colossal-LLaMA-2] Release Colossal-LLaMA-2-13b-base model (#5224) by Tong Li
  • [Colossal-Llama-2] Add finetuning Colossal-Llama-2 example (#4878) by Yuanchen

ColossalEval

  • [ColossalEval] Support GSM, Data Leakage Evaluation and Tensor Parallel (#5169) by Yuanchen

Plugin

  • [plugin]fix 3d checkpoint load when booster boost without optimizer. (#5135) by flybird11111

Inference

  • [inference] refactor examples and fix schedule (#5077) by Hongxin Liu
  • [inference] update examples and engine (#5073) by Xu Kai
  • [inference] Refactor inference architecture (#5057) by Xu Kai
  • [Inference] Fix bug in ChatGLM2 Tensor Parallelism (#5014) by Jianghai

Hotfix/hybridengine

  • [hotfix/hybridengine] Fix init model with random parameters in benchmark (#5074) by Bin Jia
  • [hotfix/hybridengine] fix bug when tp*pp size = 1 (#5069) by Bin Jia

Example

  • [example] fix llama example's loss error when using the gemini plugin (#5060) by flybird11111

Pipeline, Shardformer

  • [pipeline,shardformer] Fix p2p efficiency in pipeline, allow skipping loading weight not in weight_map when strict=False, fix llama flash attention forward, add flop estimation by megatron in llama benchmark (#5017) by Elsa Granger
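
The "flop estimation by megatron" added to the llama benchmark in #5017 presumably follows the well-known Megatron-LM estimate for decoder-only transformers. Without activation recomputation, forward plus backward FLOPs per iteration are approximately 72·B·L·s·h²·(1 + s/(6h) + V/(12·L·h)). A hedged sketch (the exact formula used in the benchmark may differ):

```python
# Megatron-style training-FLOP estimate for a GPT/LLaMA-like decoder,
# without activation recomputation. B=batch, s=sequence length,
# L=layers, h=hidden size, V=vocab size. Illustrative sketch only.

def megatron_flops_per_iteration(batch, seq_len, layers, hidden, vocab):
    """Approximate forward + backward FLOPs for one training iteration."""
    return (
        72 * batch * layers * seq_len * hidden**2
        * (1 + seq_len / (6 * hidden) + vocab / (12 * layers * hidden))
    )

# Rough numbers for a LLaMA-7B-like configuration (illustrative only).
flops = megatron_flops_per_iteration(
    batch=1, seq_len=4096, layers=32, hidden=4096, vocab=32000
)
print(f"{flops / 1e12:.1f} TFLOPs per iteration")  # → 187.9 TFLOPs per iteration
```

Dividing this estimate by the measured iteration time gives the achieved TFLOPs/s figure that benchmarks typically report.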

Full Changelog: v0.3.4...v0.3.5
