What's Changed
Doc
- [doc] moved doc test command to bottom (#3075) by Frank Lee
- [doc] specified operating system requirement (#3019) by Frank Lee
- [doc] update nvme offload doc (#3014) by ver217
- [doc] add ISC tutorial (#2997) by binmakeswell
- [doc] add deepspeed citation and copyright (#2996) by ver217
- [doc] added reference to related works (#2994) by Frank Lee
- [doc] update news (#2983) by binmakeswell
- [doc] fix chatgpt inference typo (#2964) by binmakeswell
- [doc] add env scope (#2933) by binmakeswell
- [doc] added readme for documentation (#2935) by Frank Lee
- [doc] removed read-the-docs (#2932) by Frank Lee
- [doc] update installation for GPT (#2922) by binmakeswell
- [doc] add os scope, update tutorial install and tips (#2914) by binmakeswell
- [doc] fix GPT tutorial (#2860) by dawei-wang
- [doc] fix typo in opt inference tutorial (#2849) by Zheng Zeng
- [doc] update OPT serving (#2804) by binmakeswell
- [doc] update example and OPT serving link (#2769) by binmakeswell
- [doc] add opt service doc (#2747) by Frank Lee
- [doc] fixed a typo in GPT readme (#2736) by cloudhuang
- [doc] updated documentation version list (#2730) by Frank Lee
Workflow
- [workflow] fixed doc build trigger condition (#3072) by Frank Lee
- [workflow] supported conda package installation in doc test (#3028) by Frank Lee
- [workflow] fixed the post-commit failure when no formatting needed (#3020) by Frank Lee
- [workflow] added auto doc test on PR (#2929) by Frank Lee
- [workflow] moved pre-commit to post-commit (#2895) by Frank Lee
Example
- [example] fix redundant note (#3065) by binmakeswell
- [example] fixed opt model downloading from huggingface by Tomek
- [example] add LoRA support (#2821) by Haofan Wang
Autochunk
- [autochunk] refactor chunk memory estimation (#2762) by Xuanlei Zhao
ChatGPT
- [chatgpt] change critic input as state (#3042) by wenjunyang
- [chatgpt] fix readme (#3025) by BlueRum
- [chatgpt] Add saving ckpt callback for PPO (#2880) by LuGY
- [chatgpt] fix inference model load (#2988) by BlueRum
- [chatgpt] allow shard init and display warning (#2986) by ver217
- [chatgpt] fix lora gemini conflict in RM training (#2984) by BlueRum
- [chatgpt] making experience support dp (#2971) by ver217
- [chatgpt] fix lora bug (#2974) by BlueRum
- [chatgpt] fix inference demo loading bug (#2969) by BlueRum
- [ChatGPT] fix README (#2966) by Fazzie-Maqianli
- [chatgpt] add inference example (#2944) by BlueRum
- [chatgpt] support opt & gpt for rm training (#2876) by BlueRum
- [chatgpt] Support saving ckpt in examples (#2846) by BlueRum
- [chatgpt] fix rm eval (#2829) by BlueRum
- [chatgpt] add test checkpoint (#2797) by ver217
- [chatgpt] update readme about checkpoint (#2792) by ver217
- [chatgpt] strategy add prepare method (#2766) by ver217
- [chatgpt] disable shard init for colossalai (#2767) by ver217
- [chatgpt] support colossalai strategy to train rm (#2742) by BlueRum
- [chatgpt] fix train_rm bug with lora (#2741) by BlueRum
DTensor
- [DTensor] refactor CommSpec (#3034) by YuliangLiu0306
- [DTensor] refactor sharding spec (#2987) by YuliangLiu0306
- [DTensor] implementation of dtensor (#2946) by YuliangLiu0306
Hotfix
- [hotfix] skip auto checkpointing tests (#3029) by YuliangLiu0306
- [hotfix] add shard dim to avoid backward communication error (#2954) by YuliangLiu0306
- [hotfix]: Remove math.prod dependency (#2837) by Jiatong (Julius) Han
- [hotfix] fix autoparallel compatibility test issues (#2754) by YuliangLiu0306
- [hotfix] fix chunk size can not be divided (#2867) by HELSON
- Hotfix/auto parallel zh doc (#2820) by YuliangLiu0306
- [hotfix] add copyright for solver and device mesh (#2803) by YuliangLiu0306
- [hotfix] add correct device for fake_param (#2796) by HELSON
Revert
- [revert] recover "[refactor] restructure configuration files"
Format
- [format] applied code formatting on changed files in pull request 3025 (#3026) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 2997 (#3008) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 2933 (#2939) by github-actions[bot]
- [format] applied code formatting on changed files in pull request 2922 (#2923) by github-actions[bot]
Pipeline
- [pipeline] Add Simplified Alpa DP Partition (#2507) by Ziyue Jiang
Fx
- [fx] remove depreciated algorithms. (#2312) (#2313) by Super Daniel
Refactor
- [refactor] restructure configuration files (#2977) by Saurav Maheshkar
Autoparallel
- [autoparallel] apply repeat block to reduce solving time (#2912) by YuliangLiu0306
- [autoparallel] find repeat blocks (#2854) by YuliangLiu0306
- [autoparallel] Patch meta information for nodes that will not be handled by SPMD solver (#2823) by Boyuan Yao
- [autoparallel] Patch meta information of `torch.where` (#2822) by Boyuan Yao
- [autoparallel] Patch meta information of `torch.tanh()` and `torch.nn.Dropout` (#2773) by Boyuan Yao
- [autoparallel] Patch tensor related operations meta information (#2789) by Boyuan Yao
- [autoparallel] rotor solver refactor (#2813) by Boyuan Yao
- [autoparallel] Patch meta information of `torch.nn.Embedding` (#2760) by Boyuan Yao
- [autoparallel] distinguish different parallel strategies (#2699) by YuliangLiu0306
Zero
- [zero] trivial zero optimizer refactoring (#2869) by YH
- [zero] fix wrong import (#2777) by Boyuan Yao
NFC
- [NFC] polish colossalai/engine/schedule/_pipeline_schedule.py code style (#2744) by Michelle
- [NFC] polish code format by binmakeswell
- [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/graph_analysis.py code style (#2737) by xyupeng
- [NFC] polish colossalai/context/process_group_initializer/initializer_2d.py code style (#2726) by Zirui Zhu
- [NFC] polish colossalai/auto_parallel/tensor_shard/deprecated/op_handler/batch_norm_handler.py code style (#2728) by Zangwei Zheng
- [NFC] polish colossalai/cli/cli.py code style (#2734) by Wangbo Zhao(黑色枷锁)
Example
- [exmaple] add bert and albert (#2824) by Jiarui Fang
Full Changelog: v0.2.5...v0.2.6