What's Changed
Release
- [release] update version (#5912) by Hongxin Liu
Misc
- [misc] support torch2.3 (#5893) by Hongxin Liu
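Version-gated support like the torch 2.3 item above is typically implemented with a runtime version check. A minimal sketch (the function names here are hypothetical, not ColossalAI's actual API):

```python
def version_tuple(version: str) -> tuple:
    """Parse a version string like '2.3.0+cu121' into (2, 3, 0),
    dropping any local build suffix after '+'."""
    return tuple(int(part) for part in version.split("+")[0].split(".")[:3])

def supports_torch_23(installed: str) -> bool:
    """Return True when the installed torch version is at least 2.3."""
    return version_tuple(installed) >= (2, 3, 0)

print(supports_torch_23("2.3.0+cu121"))  # True
print(supports_torch_23("2.2.1"))        # False
```

In practice the installed version would come from `torch.__version__`; it is passed in as a string here so the sketch stays self-contained.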
Compatibility
- [compatibility] support torch 2.2 (#5875) by Guangyao Zhang
Chat
- Merge pull request #5901 from hpcaitech/colossalchat by YeAnbang
- Merge pull request #5850 from hpcaitech/rlhf_SimPO by YeAnbang
Shardformer
- [ShardFormer] fix qwen2 sp (#5903) by Guangyao Zhang
- [ShardFormer] Add Ulysses Sequence Parallelism support for Command-R, Qwen2 and ChatGLM (#5897) by Guangyao Zhang
- [shardformer] DeepseekMoE support (#5871) by Haze188
- [shardformer] fix the moe (#5883) by Wang Binluo
- [Shardformer] change qwen2 modeling into gradient checkpointing style (#5874) by Jianghai
- [shardformer]delete xformers (#5859) by flybird11111
Auto parallel
- [Auto Parallel]: Speed up intra-op plan generation by 44% (#5446) by Stephan Kö
Zero
- [zero] support all-gather overlap (#5898) by Hongxin Liu
Pre-commit.ci
- [pre-commit.ci] auto fixes from pre-commit.com hooks by pre-commit-ci[bot]
- [pre-commit.ci] pre-commit autoupdate (#5878) by pre-commit-ci[bot]
- [pre-commit.ci] pre-commit autoupdate (#5572) by pre-commit-ci[bot]
Hotfix
- [HotFix] CI,import,requirements-test for #5838 (#5892) by Runyu Lu
- [Hotfix] Fix OPT gradient checkpointing forward by Edenzzzz
- [hotfix] fix the bug that large tensor exceed the maximum capacity of TensorBucket (#5879) by Haze188
- [Hoxfix] Fix CUDA_DEVICE_MAX_CONNECTIONS for comm overlap by Edenzzzz
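The `CUDA_DEVICE_MAX_CONNECTIONS` fix above concerns a CUDA environment variable that must be set before CUDA initializes. A minimal sketch of the usual pattern (the specific value `"1"`, commonly used so communication kernels are not starved by compute kernels, is an assumption here, not taken from the PR):

```python
import os

# Limit the number of concurrent device connections so that
# communication kernels can reliably overlap with compute.
# This must run before any CUDA context is created (i.e. before
# the first CUDA call in torch), or it has no effect.
os.environ["CUDA_DEVICE_MAX_CONNECTIONS"] = "1"

print(os.environ["CUDA_DEVICE_MAX_CONNECTIONS"])
```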
Quant
- [quant] fix bitsandbytes version check (#5882) by Hongxin Liu
Doc
- [doc] Update llama + sp compatibility; fix dist optim table by Edenzzzz
Full Changelog: v0.4.0...v0.4.1