hpcaitech/ColossalAI v0.3.8
Version v0.3.8 Release Today!

What's Changed

Gemini

  • Merge pull request #5749 from hpcaitech/prefetch by botbw
  • Merge pull request #5754 from Hz188/prefetch by botbw
  • [Gemini] add some code for reduce-scatter overlap, chunk prefetch in llama benchmark. (#5751) by Haze188
  • [gemini] async grad chunk reduce (all-reduce&reduce-scatter) (#5713) by botbw
  • Merge pull request #5733 from Hz188/feature/prefetch by botbw
  • Merge pull request #5731 from botbw/prefetch by botbw
  • [gemini] init auto policy prefetch by hxwang
  • Merge pull request #5722 from botbw/prefetch by botbw
  • [gemini] maxprefetch means maximum work to keep by hxwang
  • [gemini] use compute_chunk to find next chunk by hxwang
  • [gemini] prefetch chunks by hxwang
  • [gemini] remove registered gradient hooks (#5696) by flybird11111
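
Taken together, the prefetch and async-reduce commits above surface as tuning knobs on the GeminiPlugin. A minimal sketch of how they would be enabled, assuming the `max_prefetch` and `enable_async_reduce` argument names suggested by the commit titles (verify against the signature in your installed version):

```python
import torch

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam

colossalai.launch_from_torch()  # distributed init; older versions require config={}

# max_prefetch / enable_async_reduce follow the commit titles above;
# treat both names as assumptions, not a verified API.
plugin = GeminiPlugin(
    placement_policy="auto",   # the auto policy also drives prefetch decisions
    max_prefetch=2,            # upper bound on chunks fetched ahead of compute
    enable_async_reduce=True,  # overlap grad all-reduce/reduce-scatter with compute
)
booster = Booster(plugin=plugin)

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = HybridAdam(model.parameters(), lr=1e-3)
model, optimizer, *_ = booster.boost(model, optimizer)
```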

Chore

  • [chore] refactor profiler utils by hxwang
  • [chore] remove unnecessary assert since compute list might not be recorded by hxwang
  • [chore] remove unnecessary test & changes by hxwang
  • Merge pull request #5738 from botbw/prefetch by Haze188
  • [chore] fix init error by hxwang
  • [chore] Update placement_policy.py by botbw
  • [chore] remove debugging info by hxwang
  • [chore] remove print by hxwang
  • [chore] refactor & sync by hxwang
  • [chore] sync by hxwang

Bugs

  • [bugs] fix args.profile=False DummyProfiler error by genghaozhe
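
The fix concerns the no-op profiler that stands in for torch.profiler when profiling is disabled. A generic sketch of that pattern; the class and helper names here are illustrative, not the repository's exact code:

```python
import torch

class DummyProfiler:
    """No-op stand-in matching torch.profiler's context/step interface."""
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

    def step(self):
        pass  # keeps the training loop's prof.step() call valid

def get_profiler(enable: bool):
    # Return a real profiler only when requested (args.profile=True);
    # otherwise the dummy lets the loop run unchanged.
    if enable:
        return torch.profiler.profile(
            activities=[torch.profiler.ProfilerActivity.CPU],
            schedule=torch.profiler.schedule(wait=1, warmup=1, active=2),
        )
    return DummyProfiler()

with get_profiler(enable=False) as prof:
    for _ in range(4):
        prof.step()
```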

Feature

  • [Feature] auto-cast optimizers to distributed version (#5746) by Edenzzzz
  • [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) by Edenzzzz
  • Merge pull request #5588 from hpcaitech/feat/online-serving by Jianghai
  • [Feature] qlora support (#5586) by linsj20
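
Per #5746 and #5694, boosting with a parallel plugin should now swap a supported optimizer (Lamb, GaLore, CAME, Adafactor) for its distributed counterpart automatically. A minimal sketch under that assumption; the plugin sizes are illustrative and require a matching multi-GPU launch:

```python
import torch

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from colossalai.nn.optimizer import Lamb

colossalai.launch_from_torch()  # e.g. via torchrun with 2 GPUs

model = torch.nn.Linear(4096, 4096).cuda()
# A plain Lamb optimizer; per #5746 the booster is expected to cast it
# to the distributed variant when tensor parallelism is enabled.
optimizer = Lamb(model.parameters(), lr=1e-3)

plugin = HybridParallelPlugin(tp_size=2, pp_size=1)
booster = Booster(plugin=plugin)
model, optimizer, *_ = booster.boost(model, optimizer)
print(type(optimizer))  # expect a distributed Lamb wrapper, per the PR titles
```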

Colossal-Inference

  • [Colossal-Inference] (v0.1.0) Merge pull request #5739 from hpcaitech/feature/colossal-infer by Yuanheng Zhao
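
A minimal usage sketch for the merged inference stack, following the `InferenceConfig`/`InferenceEngine` entry points documented on the feature/colossal-infer branch; field names may differ in your installed version:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

import colossalai
from colossalai.inference import InferenceConfig, InferenceEngine

colossalai.launch_from_torch()

model_path = "meta-llama/Llama-2-7b-hf"  # any supported causal LM
model = AutoModelForCausalLM.from_pretrained(model_path).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Config fields follow the branch's docs; verify before relying on them.
config = InferenceConfig(max_batch_size=4, max_input_len=128, max_output_len=64)
engine = InferenceEngine(model, tokenizer, config)
print(engine.generate(prompts=["What is tensor parallelism?"]))
```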

Shardformer

  • [Shardformer] Add parallel output for shardformer models(bloom, falcon) (#5702) by Haze188
  • [Shardformer]fix the num_heads assert for llama model and qwen model (#5704) by Wang Binluo
  • [Shardformer] Support the Qwen2 model (#5699) by Wang Binluo
  • Merge pull request #5684 from wangbluo/parallel_output by Wang Binluo
  • [Shardformer] add assert for num of attention heads divisible by tp_size (#5670) by Wang Binluo
  • [shardformer] support bias_gelu_jit_fused for models (#5647) by flybird11111
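
#5670 and #5704 make the head-count constraint explicit: tensor parallelism splits attention heads across ranks, so both the attention and KV head counts must be divisible by tp_size. A plain-Python pre-flight check mirroring that assert (not Shardformer's internal code):

```python
def check_tp_size(num_attention_heads: int, num_key_value_heads: int, tp_size: int) -> None:
    # Mirrors the constraint asserted in #5670/#5704: every TP rank must
    # own a whole number of attention (and KV) heads.
    if num_attention_heads % tp_size != 0:
        raise ValueError(f"{num_attention_heads} attention heads not divisible by tp_size={tp_size}")
    if num_key_value_heads % tp_size != 0:
        raise ValueError(f"{num_key_value_heads} KV heads not divisible by tp_size={tp_size}")

check_tp_size(num_attention_heads=32, num_key_value_heads=8, tp_size=4)   # OK
check_tp_size(num_attention_heads=32, num_key_value_heads=8, tp_size=16)  # raises ValueError
```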

Colossal-LLaMA

  • [Colossal-LLaMA] Fix sft issue for llama2 (#5719) by Tong Li

Fix

  • [Fix] Llama3 Load/Omit CheckpointIO Temporarily (#5717) by Runyu Lu
  • [Fix] Fix Inference Example, Tests, and Requirements (#5688) by Yuanheng Zhao
  • [Fix] Fix & Update Inference Tests (compatibility w/ main) by Yuanheng Zhao

Inference/Feat

  • [Inference/Feat] Add convert_fp8 op for fp8 test in the future (#5706) by 傅剑寒
  • [Inference/Feat] Add quant kvcache interface (#5700) by 傅剑寒
  • [Inference/Feat] Add quant kvcache support for decode_kv_cache_memcpy (#5686) by 傅剑寒
  • [Inference/Feat] Add kvcache quant support for fused_rotary_embedding_cache_copy (#5680) by 傅剑寒
  • [Inference/Feat] Feat quant kvcache step2 (#5674) by 傅剑寒
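
The thread running through these PRs is storing the kv cache quantized (fp8, per the convert_fp8 op) and dequantizing inside the fused kernels. A conceptual round-trip sketch in plain PyTorch, not the CUDA kernels these PRs add; real kernels also carry quantization scales, omitted here:

```python
import torch

def quantize_kv_fp8(kv: torch.Tensor) -> torch.Tensor:
    # Store the kv-cache block as fp8 (e4m3), halving memory vs fp16.
    return kv.to(torch.float8_e4m3fn)

def dequantize_kv_fp8(kv_fp8: torch.Tensor) -> torch.Tensor:
    # Upcast back to fp16 before the attention matmul.
    return kv_fp8.to(torch.float16)

kv = torch.randn(2, 8, 128, 64, dtype=torch.float16)  # (batch, heads, seq, head_dim)
kv_q = quantize_kv_fp8(kv)
err = (dequantize_kv_fp8(kv_q) - kv).abs().max().item()
print(f"max abs round-trip error: {err:.4f}")
```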

Online Server

  • [Online Server] Chat Api for streaming and not streaming response (#5470) by Jianghai
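
#5470 adds a chat endpoint with both streaming and non-streaming responses. A hedged client sketch; the host, route, and payload shape below are assumptions, so check the server's actual API docs:

```python
import requests

# Endpoint and schema are assumptions based on the PR title.
URL = "http://localhost:8000/chat"
payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,  # set False for a single non-streaming response
}

with requests.post(URL, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(line.decode("utf-8"))
```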

Inference/Kernel

  • [Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663) by Steve Luo

Full Changelog: v0.3.7...v0.3.8
