hpcaitech/ColossalAI v0.1.9 on GitHub

What's Changed

Zero

[zero] add chunk_managerV2 for all-gather chunk (#1441) by HELSON
[zero] add chunk size searching algorithm for parameters in different groups (#1436) by HELSON
[zero] add has_inf_or_nan in AgChunk; enhance the unit test of AgChunk (#1426) by HELSON
[zero] add unit test for AgChunk's append, close, access (#1423) by HELSON
[zero] add AgChunk (#1417) by HELSON
[zero] ZeroDDP supports controlling outputs' dtype (#1399) by ver217
[zero] alleviate memory usage in ZeroDDP state_dict (#1398) by HELSON
[zero] chunk manager allows filtering ex-large params (#1393) by ver217
[zero] zero optim state_dict takes only_rank_0 (#1384) by ver217

Fx

[fx] add vanilla activation checkpoint search with test on resnet and densenet (#1433) by Super Daniel
[fx] modify the calculation of node_size in MetaInfoProp for activation checkpointing usages (#1425) by Super Daniel
[fx] fixed torchaudio conformer tracing (#1392) by Frank Lee
[fx] patched torch.max and data movement operator (#1391) by Frank Lee
[fx] fixed indentation error in checkpointing codegen (#1385) by Frank Lee
[fx] patched torch.full for huggingface opt (#1386) by Frank Lee
[fx] update split module pass and add customized policy (#1373) by YuliangLiu0306
[fx] add torchaudio test (#1369) by Super Daniel
[fx] Add colotracer compatibility test on torchrec (#1370) by Boyuan Yao
[fx] add gpt2 passes for pipeline performance test (#1366) by YuliangLiu0306
[fx] added activation checkpoint codegen support for torch < 1.12 (#1359) by Frank Lee
[fx] added activation checkpoint codegen (#1355) by Frank Lee
[fx] fixed apex normalization patch exception (#1352) by Frank Lee
[fx] added activation checkpointing annotation (#1349) by Frank Lee
[fx] update MetaInfoProp pass to process more complex node.meta (#1344) by YuliangLiu0306
[fx] refactor tracer to trace complete graph (#1342) by YuliangLiu0306
[fx] tested the complete workflow for auto-parallel (#1336) by Frank Lee
[fx] refactor tracer (#1335) by YuliangLiu0306
[fx] recovered skipped pipeline tests (#1338) by Frank Lee
[fx] fixed compatibility issue with torch 1.10 (#1331) by Frank Lee
[fx] fixed unit tests for torch 1.12 (#1327) by Frank Lee
[fx] add balanced policy v2 (#1251) by YuliangLiu0306
[fx] Add unit test and fix bugs for transform_mlp_pass (#1299) by XYE
[fx] added apex normalization to patched modules (#1300) by Frank Lee

Recommendation System

[FAW] export FAW in _ops (#1438) by Jiarui Fang
[FAW] move coloparam setting in test code. (#1429) by Jiarui Fang
[FAW] parallel FreqAwareEmbedding (#1424) by Jiarui Fang
[FAW] add cache manager for the cached embedding (#1419) by Jiarui Fang

Global Tensor

[tensor] add shape consistency feature to support auto spec transform (#1418) by YuliangLiu0306
[tensor] build sharding spec to replace distspec in future. (#1405) by YuliangLiu0306

Hotfix

[hotfix] zero optim prevents calling inner optim.zero_grad (#1422) by ver217
[hotfix] fix CPUAdam kernel nullptr (#1410) by ver217
[hotfix] adapt ProcessGroup and Optimizer to ColoTensor (#1388) by HELSON
[hotfix] fix a running error in test_colo_checkpoint.py (#1387) by HELSON
[hotfix] fix some bugs during gpt2 testing (#1379) by YuliangLiu0306
[hotfix] fix zero optim save/load state dict (#1381) by ver217
[hotfix] fix zero ddp buffer cast (#1376) by ver217
[hotfix] fix no optimizer in save/load (#1363) by HELSON
[hotfix] fix megatron_init in test_gpt2.py (#1357) by HELSON
[hotfix] ZeroDDP use new process group (#1333) by ver217
[hotfix] shared model returns cpu state_dict (#1328) by ver217
[hotfix] fix ddp for unit test test_gpt2 (#1326) by HELSON
[hotfix] fix unit test test_module_spec (#1321) by HELSON
[hotfix] fix PipelineSharedModuleGradientHandler (#1314) by ver217
[hotfix] fix ColoTensor GPT2 unit test (#1309) by HELSON
[hotfix] add missing file (#1308) by Jiarui Fang
[hotfix] remove potential circular import (#1307) by Jiarui Fang
[hotfix] skip some unit tests due to CI environment. (#1301) by YuliangLiu0306
[hotfix] fix shape error in backward when using ColoTensor (#1298) by HELSON
[hotfix] Dist Mgr gather torch version (#1284) by Jiarui Fang

Communication

[communication] add p2p_v2.py to support communication with List[Any] (#1407) by Kirigaya Kazuto

Device

[device] add DeviceMesh class to support logical device layout (#1394) by YuliangLiu0306

Chunk

[chunk] add PG check for tensor appending (#1383) by Jiarui Fang

DDP

[DDP] test ddp state dict uses more strict threshold (#1382) by ver217

Checkpoint

[checkpoint] add kwargs for load_state_dict (#1374) by HELSON
[checkpoint] use args, kwargs in save_checkpoint, load_checkpoint (#1368) by HELSON
[checkpoint] sharded optim save/load grad scaler (#1350) by ver217
[checkpoint] use gather_tensor in checkpoint and update its unit test (#1339) by HELSON
[checkpoint] add ColoOptimizer checkpointing (#1316) by Jiarui Fang
[checkpoint] add test for bert and hotfix save bugs (#1297) by Jiarui Fang

Util

[util] standard checkpoint function naming (#1377) by Frank Lee

Nvme

[nvme] CPUAdam and HybridAdam support NVMe offload (#1360) by ver217

Colotensor

[colotensor] use cpu memory to store state_dict (#1367) by HELSON
[colotensor] add Tensor.view op and its unit test (#1343) by HELSON

Unit test

[unit test] add megatron init test in zero_optim (#1358) by HELSON

Docker

[docker] add tensornvme in docker (#1354) by ver217

Doc

[doc] update rst and docstring (#1351) by ver217

Refactor

[refactor] refactor ColoTensor's unit tests (#1340) by HELSON

Workflow

[workflow] update docker build workflow to use proxy (#1334) by Frank Lee
[workflow] update 8-gpu test to use torch 1.11 (#1332) by Frank Lee
[workflow] roll back to use torch 1.11 for unit testing (#1325) by Frank Lee
[workflow] fixed trigger condition for 8-gpu unit test (#1323) by Frank Lee
[workflow] updated release bdist workflow (#1318) by Frank Lee
[workflow] disable SHM for compatibility CI on rtx3080 (#1315) by Frank Lee
[workflow] updated pytorch compatibility test (#1311) by Frank Lee

Test

[test] removed outdated unit test for meta context (#1329) by Frank Lee

Utils

[utils] integrated colotensor with lazy init context (#1324) by Frank Lee

Optimizer

[Optimizer] Remove useless ColoOptimizer (#1312) by Jiarui Fang
[Optimizer] polish the init method of ColoOptimizer (#1310) by Jiarui Fang

Full Changelog: v0.1.8...v0.1.9
