hpcaitech/ColossalAI v0.3.8
Version v0.3.8 Release Today!

What's Changed

Gemini

  • Merge pull request #5749 from hpcaitech/prefetch by botbw
  • Merge pull request #5754 from Hz188/prefetch by botbw
  • [Gemini] add some code for reduce-scatter overlap, chunk prefetch in llama benchmark. (#5751) by Haze188
  • [gemini] async grad chunk reduce (all-reduce&reduce-scatter) (#5713) by botbw
  • Merge pull request #5733 from Hz188/feature/prefetch by botbw
  • Merge pull request #5731 from botbw/prefetch by botbw
  • [gemini] init auto policy prefetch by hxwang
  • Merge pull request #5722 from botbw/prefetch by botbw
  • [gemini] maxprefetch means maximum work to keep by hxwang
  • [gemini] use compute_chunk to find next chunk by hxwang
  • [gemini] prefetch chunks by hxwang
  • [gemini] remove registered gradient hooks (#5696) by flybird11111
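
Taken together, the prefetch and async-reduce commits above surface as tuning knobs on the GeminiPlugin. A minimal sketch of how they would be enabled, assuming the `max_prefetch` and `enable_async_reduce` argument names suggested by the commit titles (verify against the signature in your installed version):

```python
import torch

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam

colossalai.launch_from_torch()  # distributed init; older versions require config={}

# max_prefetch / enable_async_reduce follow the commit titles above;
# treat both names as assumptions, not a verified API.
plugin = GeminiPlugin(
    placement_policy="auto",   # the auto policy also drives prefetch decisions
    max_prefetch=2,            # upper bound on chunks fetched ahead of compute
    enable_async_reduce=True,  # overlap grad all-reduce/reduce-scatter with compute
)
booster = Booster(plugin=plugin)

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = HybridAdam(model.parameters(), lr=1e-3)
model, optimizer, *_ = booster.boost(model, optimizer)
```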

Chore

  • [chore] refactor profiler utils by hxwang
  • [chore] remove unnecessary assert since compute list might not be recorded by hxwang
  • [chore] remove unnecessary test & changes by hxwang
  • Merge pull request #5738 from botbw/prefetch by Haze188
  • [chore] fix init error by hxwang
  • [chore] Update placement_policy.py by botbw
  • [chore] remove debugging info by hxwang
  • [chore] remove print by hxwang
  • [chore] refactor & sync by hxwang
  • [chore] sync by hxwang

Bugs

  • [bugs] fix args.profile=False DummyProfiler error by genghaozhe
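
The fix concerns the no-op profiler that stands in for torch.profiler when profiling is disabled. A generic sketch of that pattern; the class and helper names here are illustrative, not the repository's exact code:

```python
import torch

class DummyProfiler:
    """No-op stand-in matching torch.profiler's context/step interface."""
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

    def step(self):
        pass  # keeps the training loop's prof.step() call valid

def get_profiler(enable: bool):
    # Return a real profiler only when requested (args.profile=True);
    # otherwise the dummy lets the loop run unchanged.
    if enable:
        return torch.profiler.profile(
            activities=[torch.profiler.ProfilerActivity.CPU],
            schedule=torch.profiler.schedule(wait=1, warmup=1, active=2),
        )
    return DummyProfiler()

with get_profiler(enable=False) as prof:
    for _ in range(4):
        prof.step()
```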

Feature

  • [Feature] auto-cast optimizers to distributed version (#5746) by Edenzzzz
  • [Feature] Distributed optimizers: Lamb, Galore, CAME and Adafactor (#5694) by Edenzzzz
  • Merge pull request #5588 from hpcaitech/feat/online-serving by Jianghai
  • [Feature] qlora support (#5586) by linsj20
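
Per #5746 and #5694, boosting with a parallel plugin should now swap a supported optimizer (Lamb, GaLore, CAME, Adafactor) for its distributed counterpart automatically. A minimal sketch under that assumption; the plugin sizes are illustrative and require a matching multi-GPU launch:

```python
import torch

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import HybridParallelPlugin
from colossalai.nn.optimizer import Lamb

colossalai.launch_from_torch()  # e.g. via torchrun with 2 GPUs

model = torch.nn.Linear(4096, 4096).cuda()
# A plain Lamb optimizer; per #5746 the booster is expected to cast it
# to the distributed variant when tensor parallelism is enabled.
optimizer = Lamb(model.parameters(), lr=1e-3)

plugin = HybridParallelPlugin(tp_size=2, pp_size=1)
booster = Booster(plugin=plugin)
model, optimizer, *_ = booster.boost(model, optimizer)
print(type(optimizer))  # expect a distributed Lamb wrapper, per the PR titles
```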

Colossal-Inference

  • [Colossal-Inference] (v0.1.0) Merge pull request #5739 from hpcaitech/feature/colossal-infer by Yuanheng Zhao
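
A minimal usage sketch for the merged inference stack, following the `InferenceConfig`/`InferenceEngine` entry points documented on the feature/colossal-infer branch; field names may differ in your installed version:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

import colossalai
from colossalai.inference import InferenceConfig, InferenceEngine

colossalai.launch_from_torch()

model_path = "meta-llama/Llama-2-7b-hf"  # any supported causal LM
model = AutoModelForCausalLM.from_pretrained(model_path).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Config fields follow the branch's docs; verify before relying on them.
config = InferenceConfig(max_batch_size=4, max_input_len=128, max_output_len=64)
engine = InferenceEngine(model, tokenizer, config)
print(engine.generate(prompts=["What is tensor parallelism?"]))
```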

Shardformer

  • [Shardformer] Add parallel output for shardformer models(bloom, falcon) (#5702) by Haze188
  • [Shardformer]fix the num_heads assert for llama model and qwen model (#5704) by Wang Binluo
  • [Shardformer] Support the Qwen2 model (#5699) by Wang Binluo
  • Merge pull request #5684 from wangbluo/parallel_output by Wang Binluo
  • [Shardformer] add assert for num of attention heads divisible by tp_size (#5670) by Wang Binluo
  • [shardformer] support bias_gelu_jit_fused for models (#5647) by flybird11111
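
#5670 and #5704 make the head-count constraint explicit: tensor parallelism splits attention heads across ranks, so both the attention and KV head counts must be divisible by tp_size. A plain-Python pre-flight check mirroring that assert (not Shardformer's internal code):

```python
def check_tp_size(num_attention_heads: int, num_key_value_heads: int, tp_size: int) -> None:
    # Mirrors the constraint asserted in #5670/#5704: every TP rank must
    # own a whole number of attention (and KV) heads.
    if num_attention_heads % tp_size != 0:
        raise ValueError(f"{num_attention_heads} attention heads not divisible by tp_size={tp_size}")
    if num_key_value_heads % tp_size != 0:
        raise ValueError(f"{num_key_value_heads} KV heads not divisible by tp_size={tp_size}")

check_tp_size(num_attention_heads=32, num_key_value_heads=8, tp_size=4)   # OK
check_tp_size(num_attention_heads=32, num_key_value_heads=8, tp_size=16)  # raises ValueError
```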

Colossal-LLaMA

  • [Colossal-LLaMA] Fix sft issue for llama2 (#5719) by Tong Li

Fix

  • [Fix] Llama3 Load/Omit CheckpointIO Temporarily (#5717) by Runyu Lu
  • [Fix] Fix Inference Example, Tests, and Requirements (#5688) by Yuanheng Zhao
  • [Fix] Fix & Update Inference Tests (compatibility w/ main) by Yuanheng Zhao

Inference/Feat

  • [Inference/Feat] Add convert_fp8 op for fp8 test in the future (#5706) by 傅剑寒
  • [Inference/Feat] Add quant kvcache interface (#5700) by 傅剑寒
  • [Inference/Feat] Add quant kvcache support for decode_kv_cache_memcpy (#5686) by 傅剑寒
  • [Inference/Feat] Add kvcache quant support for fused_rotary_embedding_cache_copy (#5680) by 傅剑寒
  • [Inference/Feat] Feat quant kvcache step2 (#5674) by 傅剑寒
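
The thread running through these PRs is storing the kv cache quantized (fp8, per the convert_fp8 op) and dequantizing inside the fused kernels. A conceptual round-trip sketch in plain PyTorch, not the CUDA kernels these PRs add; real kernels also carry quantization scales, omitted here:

```python
import torch

def quantize_kv_fp8(kv: torch.Tensor) -> torch.Tensor:
    # Store the kv-cache block as fp8 (e4m3), halving memory vs fp16.
    return kv.to(torch.float8_e4m3fn)

def dequantize_kv_fp8(kv_fp8: torch.Tensor) -> torch.Tensor:
    # Upcast back to fp16 before the attention matmul.
    return kv_fp8.to(torch.float16)

kv = torch.randn(2, 8, 128, 64, dtype=torch.float16)  # (batch, heads, seq, head_dim)
kv_q = quantize_kv_fp8(kv)
err = (dequantize_kv_fp8(kv_q) - kv).abs().max().item()
print(f"max abs round-trip error: {err:.4f}")
```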

Online Server

  • [Online Server] Chat Api for streaming and not streaming response (#5470) by Jianghai
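
#5470 adds a chat endpoint with both streaming and non-streaming responses. A hedged client sketch; the host, route, and payload shape below are assumptions, so check the server's actual API docs:

```python
import requests

# Endpoint and schema are assumptions based on the PR title.
URL = "http://localhost:8000/chat"
payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,  # set False for a single non-streaming response
}

with requests.post(URL, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(line.decode("utf-8"))
```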

Inference/Kernel

  • [Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy oper… (#5663) by Steve Luo

Full Changelog: v0.3.7...v0.3.8
