Main Features
- Enhance ColoTensor and build a demo that trains BERT (from Hugging Face) with Tensor Parallelism without modifying the model.
What's Changed
ColoTensor
- [Tensor] add ColoTensor TP1Dcol Embedding by @Wesley-Jzy in #899
- [Tensor] add embedding tp1d row by @Wesley-Jzy in #904
- [Tensor] update pytest.mark.parametrize in tensor tests by @Wesley-Jzy in #913
- [Tensor] init ColoParameter by @feifeibear in #914
- [Tensor] add a basic bert. by @Wesley-Jzy in #911
- [Tensor] polish model test by @feifeibear in #915
- [Tensor] fix test_model by @Wesley-Jzy in #916
- [Tensor] add 1d vocab loss by @Wesley-Jzy in #918
- [Graph] building computing graph with ColoTensor, Linear only by @feifeibear in #917
- [Tensor] add from_pretrained support and bert pretrained test by @Wesley-Jzy in #921
- [Tensor] test pretrain loading on multi-process by @feifeibear in #922
- [tensor] hijack addmm for colo tensor by @ver217 in #923
- [tensor] colo tensor overrides mul by @ver217 in #927
- [Tensor] simplify named param by @Wesley-Jzy in #928
- [Tensor] fix init context by @Wesley-Jzy in #931
- [Tensor] add optimizer to bert test by @Wesley-Jzy in #933
- [tensor] design DistSpec and DistSpecManager for ColoTensor by @ver217 in #934
- [Tensor] add DistSpec for loss and test_model by @Wesley-Jzy in #947
- [tensor] derive compute pattern from dist spec by @ver217 in #971
Pipeline Parallelism
- [pipelinable]use pipelinable to support GPT model. by @YuliangLiu0306 in #903
CI
- [CI] add CI for releasing bdist wheel by @ver217 in #901
- [CI] fix release bdist CI by @ver217 in #902
- [ci] added wheel build scripts by @FrankLeeeee in #910
Misc
- [Bot] Synchronize Submodule References by @github-actions in #907
- [Bot] Synchronize Submodule References by @github-actions in #912
- [setup] update cuda ext cc flags by @ver217 in #919
- [setup] support more cuda architectures by @ver217 in #920
- [NFC] update results on a single GPU, highlight quick view by @binmakeswell in #981
Full Changelog: v0.1.4...v0.1.5