github hpcaitech/ColossalAI v0.1.9
Version v0.1.9 Release Today!

21 months ago

What's Changed

Zero

  • [zero] add chunk_managerV2 for all-gather chunk (#1441) by HELSON
  • [zero] add chunk size searching algorithm for parameters in different groups (#1436) by HELSON
  • [zero] add has_inf_or_nan in AgChunk; enhance the unit test of AgChunk (#1426) by HELSON
  • [zero] add unit test for AgChunk's append, close, access (#1423) by HELSON
  • [zero] add AgChunk (#1417) by HELSON
  • [zero] ZeroDDP supports controlling outputs' dtype (#1399) by ver217
  • [zero] alleviate memory usage in ZeroDDP state_dict (#1398) by HELSON
  • [zero] chunk manager allows filtering ex-large params (#1393) by ver217
  • [zero] zero optim state_dict takes only_rank_0 (#1384) by ver217

Fx

Recommendation System

Global Tensor

Hotfix

  • [hotfix] zero optim prevents calling inner optim.zero_grad (#1422) by ver217
  • [hotfix] fix CPUAdam kernel nullptr (#1410) by ver217
  • [hotfix] adapt ProcessGroup and Optimizer to ColoTensor (#1388) by HELSON
  • [hotfix] fix a running error in test_colo_checkpoint.py (#1387) by HELSON
  • [hotfix] fix some bugs during gpt2 testing (#1379) by YuliangLiu0306
  • [hotfix] fix zero optim save/load state dict (#1381) by ver217
  • [hotfix] fix zero ddp buffer cast (#1376) by ver217
  • [hotfix] fix no optimizer in save/load (#1363) by HELSON
  • [hotfix] fix megatron_init in test_gpt2.py (#1357) by HELSON
  • [hotfix] ZeroDDP use new process group (#1333) by ver217
  • [hotfix] shared model returns cpu state_dict (#1328) by ver217
  • [hotfix] fix ddp for unit test test_gpt2 (#1326) by HELSON
  • [hotfix] fix unit test test_module_spec (#1321) by HELSON
  • [hotfix] fix PipelineSharedModuleGradientHandler (#1314) by ver217
  • [hotfix] fix ColoTensor GPT2 unit test (#1309) by HELSON
  • [hotfix] add missing file (#1308) by Jiarui Fang
  • [hotfix] remove potential circular import (#1307) by Jiarui Fang
  • [hotfix] skip some unit tests due to CI environment (#1301) by YuliangLiu0306
  • [hotfix] fix shape error in backward when using ColoTensor (#1298) by HELSON
  • [hotfix] Dist Mgr gather torch version (#1284) by Jiarui Fang

Communication

Device

Chunk

DDP

  • [DDP] test ddp state dict uses more strict threshold (#1382) by ver217

Checkpoint

  • [checkpoint] add kwargs for load_state_dict (#1374) by HELSON
  • [checkpoint] use args, kwargs in save_checkpoint, load_checkpoint (#1368) by HELSON
  • [checkpoint] sharded optim save/load grad scaler (#1350) by ver217
  • [checkpoint] use gather_tensor in checkpoint and update its unit test (#1339) by HELSON
  • [checkpoint] add ColoOptimizer checkpointing (#1316) by Jiarui Fang
  • [checkpoint] add test for bert and hotfix save bugs (#1297) by Jiarui Fang

Util

Nvme

  • [nvme] CPUAdam and HybridAdam support NVMe offload (#1360) by ver217

Colotensor

  • [colotensor] use cpu memory to store state_dict (#1367) by HELSON
  • [colotensor] add Tensor.view op and its unit test (#1343) by HELSON

Unit test

  • [unit test] add megatron init test in zero_optim (#1358) by HELSON

Docker

Doc

Refactor

  • [refactor] refactor ColoTensor's unit tests (#1340) by HELSON

Workflow

  • [workflow] update docker build workflow to use proxy (#1334) by Frank Lee
  • [workflow] update 8-gpu test to use torch 1.11 (#1332) by Frank Lee
  • [workflow] roll back to use torch 1.11 for unit testing (#1325) by Frank Lee
  • [workflow] fixed trigger condition for 8-gpu unit test (#1323) by Frank Lee
  • [workflow] updated release bdist workflow (#1318) by Frank Lee
  • [workflow] disable SHM for compatibility CI on rtx3080 (#1315) by Frank Lee
  • [workflow] updated pytorch compatibility test (#1311) by Frank Lee

Test

  • [test] removed outdated unit test for meta context (#1329) by Frank Lee

Utils

  • [utils] integrated colotensor with lazy init context (#1324) by Frank Lee

Optimizer

Full Changelog: v0.1.8...v0.1.9
