hpcaitech/ColossalAI v0.1.3 Released!


Overview

Here are the main improvements of this release:

  1. Gemini: a heterogeneous memory space manager that places tensors across CUDA and CPU memory (sketched below)
  2. A refactored pipeline parallelism API
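
Gemini's stateful tensor manager decides, per tensor, whether its payload should currently live in CUDA or CPU memory, based on how much GPU memory is still available. The snippet below is a minimal conceptual sketch of such a placement policy in plain PyTorch; it is not the ColossalAI API, and the class and method names (`NaivePlacementPolicy`, `place`) are invented for illustration.

```python
import torch

class NaivePlacementPolicy:
    """Toy stand-in for an 'auto' tensor placement policy: keep a tensor in
    CUDA memory while a budget allows, otherwise park it in CPU memory."""

    def __init__(self, cuda_budget_fraction: float = 0.8):
        self.cuda_budget_fraction = cuda_budget_fraction

    def _fits_on_cuda(self, tensor: torch.Tensor) -> bool:
        # Compare currently allocated CUDA memory plus this tensor's size
        # against a fixed fraction of total device memory.
        total = torch.cuda.get_device_properties(0).total_memory
        budget = total * self.cuda_budget_fraction
        needed = tensor.numel() * tensor.element_size()
        return torch.cuda.memory_allocated() + needed <= budget

    def place(self, tensor: torch.Tensor) -> torch.Tensor:
        if torch.cuda.is_available() and self._fits_on_cuda(tensor):
            return tensor.to("cuda", non_blocking=True)
        return tensor.to("cpu")

# Usage (hypothetical): right before a module runs, a manager would do
# something like `param.data = policy.place(param.data)`. A real manager
# also tracks access statistics so cold tensors can be evicted back to CPU;
# this sketch omits that bookkeeping.
policy = NaivePlacementPolicy()
```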

What's Changed

Features

  • [zero] initialize a stateful tensor manager by @feifeibear in #614
  • [pipeline] refactor pipeline by @YuliangLiu0306 in #679
  • [zero] stateful tensor manager by @ver217 in #687
  • [zero] adapt zero hooks for unsharded module by @1SAA in #699
  • [zero] refactor memstats collector by @ver217 in #706
  • [zero] improve adaptability for non-sharded parameters by @1SAA in #708
  • [zero] check whether gradients have inf and nan in gpu by @1SAA in #712 (see the sketch after this list)
  • [refactor] refactor the memory utils by @feifeibear in #715
  • [util] support detection of number of processes on current node by @FrankLeeeee in #723
  • [utils] add synchronized cuda memory monitor by @1SAA in #740
  • [zero] refactor ShardedParamV2 by @1SAA in #742
  • [zero] add tensor placement policies by @ver217 in #743
  • [zero] use factory pattern for tensor_placement_policy by @feifeibear in #752
  • [zero] refactor memstats_collector by @1SAA in #746
  • [gemini] init gemini individual directory by @feifeibear in #754
  • refactor shard and gather operation by @1SAA in #773
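
On "[zero] check whether gradients have inf and nan in gpu" (#712): the point is to detect non-finite gradients without first copying them to the CPU. Below is a minimal sketch of that idea in plain PyTorch; the function name `grads_have_inf_or_nan` is made up, and this is not the ColossalAI implementation.

```python
import torch
import torch.nn as nn

def grads_have_inf_or_nan(model: nn.Module) -> torch.Tensor:
    """Return a 0/1 tensor (on the model's device) flagging whether any
    gradient contains inf or NaN."""
    device = next(model.parameters()).device
    found = torch.zeros(1, device=device)
    for p in model.parameters():
        if p.grad is not None:
            # torch.isfinite is False for both inf and NaN entries.
            found += (~torch.isfinite(p.grad)).any()
    # Keep the flag on the GPU; it only forces a device sync when read.
    return (found > 0).float()
```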

Bug Fix

  • [zero] fix init bugs in zero context by @1SAA in #686
  • [hotfix] update requirements-test by @ver217 in #701
  • [hotfix] fix a bug in 3d vocab parallel embedding by @kurisusnowdeng in #707
  • [compatibility] fixed tensor parallel compatibility with torch 1.9 by @FrankLeeeee in #700
  • [hotfix] fixed bugs of assigning grad states to non-leaf nodes by @Gy-Lu in #711
  • [hotfix] fix stateful tensor manager's cuda model data size by @ver217 in #710
  • [bug] fixed broken test_found_inf by @FrankLeeeee in #725
  • [util] fixed activation checkpointing on torch 1.9 by @FrankLeeeee in #719
  • [util] fixed communication API with PyTorch 1.9 by @FrankLeeeee in #721
  • [bug] removed zero installation requirements by @FrankLeeeee in #731
  • [hotfix] remove duplicated param register to stateful tensor manager by @feifeibear in #728
  • [utils] correct cpu memory used and capacity in the context of multi-process by @feifeibear in #726
  • [bug] fixed grad scaler compatibility with torch 1.8 by @FrankLeeeee in #735
  • [bug] fixed DDP compatibility with torch 1.8 by @FrankLeeeee in #739
  • [hotfix] fix memory leak in backward of sharded model by @ver217 in #741
  • [hotfix] fix initialize about zero by @ver217 in #748
  • [hotfix] fix prepare grads in sharded optim by @ver217 in #749
  • [hotfix] layernorm by @kurisusnowdeng in #750
  • [hotfix] fix auto tensor placement policy by @ver217 in #753
  • [hotfix] fix reuse_fp16_shard of sharded model by @ver217 in #756
  • [hotfix] fix test_stateful_tensor_mgr by @ver217 in #762
  • [compatibility] used backward-compatible API for global process group by @FrankLeeeee in #758
  • [hotfix] fix the ckpt hook bugs when using DDP by @Gy-Lu in #769
  • [hotfix] polish sharded optim docstr and warning by @ver217 in #770

Miscellaneous

  • [Bot] Synchronize Submodule References by @github-actions in #556
  • [Bot] Synchronize Submodule References by @github-actions in #695
  • [refactor] zero directory by @feifeibear in #724
  • [Bot] Synchronize Submodule References by @github-actions in #751

Full Changelog: v0.1.2...v0.1.3
