Version v0.1.7 Released Today
Highlights
- Started torch.fx integration for auto-parallel training
- Updated the ZeRO mechanism with ColoTensor
- Fixed various bugs
What's Changed
Hotfix
- [hotfix] prevent nested ZeRO (#1140) by ver217
- [hotfix] fix bugs caused by refactored pipeline (#1133) by YuliangLiu0306
- [hotfix] fix param op hook (#1131) by ver217
- [hotfix] fix zero init ctx numel (#1128) by ver217
- [hotfix] change to fit latest p2p (#1100) by YuliangLiu0306
- [hotfix] fix chunk comm src rank (#1072) by ver217
Zero
- [zero] avoid zero hook spam by changing log to debug level (#1137) by Frank Lee (see the sketch after this list)
- [zero] added error message to handle on-the-fly import of torch Module class (#1135) by Frank Lee
- [zero] fixed api consistency (#1098) by Frank Lee
- [zero] zero optim copy chunk rather than copy tensor (#1070) by ver217
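Because #1137 demotes the per-hook messages to debug level, they no longer appear by default. A minimal sketch of turning them back on with standard Python logging; the logger name "colossalai" is an assumption, not a confirmed detail of the library:

```python
import logging

# Show DEBUG-level records (the zero hook messages were demoted to DEBUG
# in #1137). The logger name "colossalai" is an assumption; adjust it to
# whatever logger your build actually uses.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("colossalai").setLevel(logging.DEBUG)
```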
Ddp
- [ddp] add save/load state dict for ColoDDP (#1127) by ver217 (see the sketch after this list)
- [ddp] add set_params_to_ignore for ColoDDP (#1122) by ver217
- [ddp] supported customized torch ddp configuration (#1123) by Frank Lee
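Taken together, the ColoDDP changes add torch-style checkpointing and parameter exclusion. The sketch below assumes a `ColoDDP` class importable from `colossalai.nn.parallel` with a torch-like `state_dict()`/`load_state_dict()` pair and a `set_params_to_ignore` helper; the import path and signatures are inferred from the PR titles, and a distributed environment is assumed to be initialized already:

```python
import torch
import torch.nn as nn
from colossalai.nn.parallel import ColoDDP  # import path is an assumption

# Assumes torch.distributed / colossalai have already been initialized.
model = ColoDDP(nn.Linear(16, 16))

# #1122: exclude parameters from DDP management (signature inferred from
# the PR title; treat this as a sketch, not the confirmed API).
ColoDDP.set_params_to_ignore([model.module.bias])

# #1127: torch-style save/load of the wrapped module's state dict.
torch.save(model.state_dict(), "colo_ddp_ckpt.pt")
model.load_state_dict(torch.load("colo_ddp_ckpt.pt"))
```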
Pipeline
- [pipeline] support List of Dict data (#1125) by YuliangLiu0306 (see the sketch after this list)
- [pipeline] supported more flexible dataflow control for pipeline parallel training (#1108) by Frank Lee
- [pipeline] refactor the pipeline module (#1087) by Frank Lee
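As a rough illustration of the data format enabled by #1125, a micro-batch stream can now be a list of dicts of named tensors; the field names below are made up for the example:

```python
import torch

# Each micro-batch is a dict of named tensors; the pipeline consumes a
# list of them. Field names ("input_ids", "labels") are illustrative only.
micro_batches = [
    {
        "input_ids": torch.randint(0, 100, (4, 16)),
        "labels": torch.randint(0, 100, (4, 16)),
    }
    for _ in range(2)
]
for mb in micro_batches:
    assert isinstance(mb, dict) and set(mb) == {"input_ids", "labels"}
```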
Fx
- [fx] add autoparallel passes (#1121) by YuliangLiu0306
- [fx] added unit test for coloproxy (#1119) by Frank Lee
- [fx] added coloproxy (#1115) by Frank Lee
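ColoProxy (#1115) builds on torch.fx's tracing machinery, in which a Proxy records every operation into a graph that later passes (such as the auto-parallel passes in #1121) can rewrite. The snippet below uses only the public torch.fx API to show the kind of graph those passes operate on:

```python
import torch
import torch.fx

class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(8, 8)
        self.fc2 = torch.nn.Linear(8, 8)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

# Symbolic tracing records each op into a torch.fx.Graph; ColoProxy plays
# the Proxy role inside ColossalAI so tensor meta data can be captured too.
gm = torch.fx.symbolic_trace(MLP())
print(gm.graph)  # the IR that auto-parallel passes transform
```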
Gemini
- [gemini] gemini mgr supports "cpu" placement policy (#1118) by ver217 (see the sketch after this list)
- [gemini] zero supports gemini (#1093) by ver217
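To illustrate what a "cpu" placement policy (#1118) means, here is a toy dispatcher: parameter chunks stay resident in host memory and are moved to the device only when needed. None of these names are ColossalAI APIs; this is a conceptual sketch only:

```python
import torch

# Toy illustration of placement policies; not ColossalAI code.
def place_chunk(chunk: torch.Tensor, policy: str) -> torch.Tensor:
    if policy == "cpu":
        # "cpu" policy (#1118): chunks live in host memory between uses.
        return chunk.cpu()
    if policy == "cuda":
        # device-resident policy: chunks stay on the GPU.
        return chunk.cuda()
    raise ValueError(f"unknown placement policy: {policy!r}")

chunk = place_chunk(torch.empty(1024), "cpu")
print(chunk.device)  # cpu
```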
Test
- [test] fixed hybrid parallel test case on 8 GPUs (#1106) by Frank Lee
- [test] skip tests when not enough GPUs are detected (#1090) by Frank Lee (see the sketch after this list)
- [test] ignore 8 gpu test (#1080) by Frank Lee
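The GPU-count guard from #1090 can be written with standard pytest and torch calls, along these lines (the decorator and `device_count` are real APIs; the test body is illustrative):

```python
import pytest
import torch

# Skip rather than fail when the machine has fewer GPUs than the test
# needs, mirroring #1090. Eight matches the 8-GPU cases mentioned above.
@pytest.mark.skipif(torch.cuda.device_count() < 8,
                    reason="requires at least 8 GPUs")
def test_hybrid_parallel_8gpu():
    assert torch.cuda.device_count() >= 8  # placeholder test body
```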
Tensor
- [tensor] refactor param op hook (#1097) by ver217 (see the sketch after this list)
- [tensor] refactor chunk mgr and impl MemStatsCollectorV2 (#1077) by ver217
- [Tensor] fix equal assert (#1091) by Ziyue Jiang
- [Tensor] 1d row embedding (#1075) by Ziyue Jiang
- [tensor] chunk manager monitor mem usage (#1076) by ver217
- [Tensor] fix optimizer for CPU parallel (#1069) by Ziyue Jiang
- [Tensor] add hybrid device demo and fix bugs (#1059) by Ziyue Jiang
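The param op hook refactored in #1097 boils down to callbacks that fire around any operator touching a set of parameters. The toy re-implementation below is purely illustrative of that shape; it is not the ColossalAI interface:

```python
from typing import Callable, List
import torch

# Toy version of the param-op-hook idea (#1097); not ColossalAI code.
class ParamOpHook:
    def pre_op(self, params: List[torch.Tensor]) -> None:
        print(f"pre: op will touch {len(params)} tensors")

    def post_op(self, params: List[torch.Tensor]) -> None:
        print("post: op finished")

def run_with_hook(hook: ParamOpHook, params: List[torch.Tensor],
                  op: Callable) -> torch.Tensor:
    hook.pre_op(params)      # e.g. gather/allocate before compute
    out = op(*params)
    hook.post_op(params)     # e.g. release/offload after compute
    return out

w, x = torch.randn(4, 4), torch.randn(4)
y = run_with_hook(ParamOpHook(), [w, x], torch.mv)
```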
Workflow
- [workflow] fixed 8-gpu test workflow (#1101) by Frank Lee
- [workflow] added regular 8 GPU testing (#1099) by Frank Lee
- [workflow] disable p2p via shared memory on non-nvlink machine (#1086) by Frank Lee
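For #1086, NCCL exposes a documented `NCCL_SHM_DISABLE` environment variable that turns off its shared-memory transport; whether the workflow uses exactly this knob is an assumption on our part:

```python
import os

# Disable NCCL's shared-memory transport, as on non-NVLink CI machines
# (#1086). NCCL_SHM_DISABLE is documented by NCCL; its use here for the
# workflow is our assumption. Must be set before NCCL is initialized.
os.environ["NCCL_SHM_DISABLE"] = "1"
```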
Context
- [context] support lazy init of module (#1088) by Frank Lee
- [context] maintain the context object in with statement (#1073) by Frank Lee
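The two context changes fit together: a context object stays registered for the duration of a with statement (#1073), and module construction inside it can be deferred (#1088). The class below is a hypothetical stand-in showing the shape of the mechanism, not the library's API:

```python
# Hypothetical stand-in, not the ColossalAI API.
class LazyInitContext:
    current = None  # the active context, maintained per #1073

    def __enter__(self):
        LazyInitContext.current = self
        self.deferred = []  # module-building callables, deferred per #1088
        return self

    def __exit__(self, exc_type, exc, tb):
        for build in self.deferred:   # materialize modules on exit
            build()
        LazyInitContext.current = None
        return False

with LazyInitContext() as ctx:
    ctx.deferred.append(lambda: print("module initialized lazily"))
```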
Refactory
- [refactory] add nn.parallel module (#1068) by Jiarui Fang
Full Changelog: v0.1.6...v0.1.7