What's Changed
Embedding
- [embedding] cache_embedding small improvement (#1564) by CsRic
- [embedding] polish parallel embedding tablewise (#1545) by Jiarui Fang
- [embedding] freq_aware_embedding: add small functions for caller application (#1537) by CsRic
- [embedding] fix a bug in table wise sharding (#1538) by Jiarui Fang
- [embedding] tablewise sharding polish (#1535) by Jiarui Fang
- [embedding] add tablewise sharding for FAW (#1526) by CsRic
NFC
- [NFC] polish test component gpt code style (#1567) by アマデウス
- [NFC] polish doc style for ColoTensor (#1457) by Jiarui Fang
- [NFC] global vars should be upper case (#1456) by Jiarui Fang
Pipeline/tuning
- [pipeline/tuning] improve dispatch performance in both time and space cost (#1544) by Kirigaya Kazuto
Fx
- [fx] provide a stable but not accurate enough version of profiler. (#1547) by Super Daniel
- [fx] Add common node in model linearize (#1542) by Boyuan Yao
- [fx] support meta tracing for aten level computation graphs like functorch. (#1536) by Super Daniel
- [fx] Modify solver linearize and add corresponding test (#1531) by Boyuan Yao
- [fx] add test for meta tensor. (#1527) by Super Daniel
- [fx] patch nn.functional convolution (#1528) by YuliangLiu0306
- [fx] Fix wrong index in annotation and minimal flops in ckpt solver (#1521) by Boyuan Yao
- [fx] hack torch_dispatch for meta tensor and autograd. (#1515) by Super Daniel
- [fx] Fix activation codegen dealing with checkpointing first op (#1510) by Boyuan Yao
- [fx] fix the discretize bug (#1506) by Boyuan Yao
- [fx] fix wrong variable name in solver rotor (#1502) by Boyuan Yao
- [fx] Add activation checkpoint solver rotor (#1496) by Boyuan Yao
- [fx] add more op patches for profiler and error message for unsupported ops. (#1495) by Super Daniel
- [fx] fixed adaptive pooling size concatenation error (#1489) by Frank Lee
- [fx] add profiler for fx nodes. (#1480) by Super Daniel
- [fx] Fix ckpt functions' definitions in forward (#1476) by Boyuan Yao
- [fx] fix MetaInfoProp for incorrect calculations and add detections for inplace op. (#1466) by Super Daniel
- [fx] add rules to linearize computation graphs for searching. (#1461) by Super Daniel
- [fx] Add use_reentrant=False to checkpoint in codegen (#1463) by Boyuan Yao
- [fx] fix test and algorithm bugs in activation checkpointing. (#1451) by Super Daniel
- [fx] Use colossalai checkpoint and add offload recognition in codegen (#1439) by Boyuan Yao
- [fx] fix the false interpretation of algorithm 3 in https://arxiv.org/abs/1604.06174. (#1446) by Super Daniel
Autoparallel
- [autoparallel] add backward cost info into strategies (#1524) by YuliangLiu0306
- [autoparallel] support function in operator handler (#1529) by YuliangLiu0306
- [autoparallel] change the merge node logic (#1533) by YuliangLiu0306
- [autoparallel] added liveness analysis (#1516) by Frank Lee
- [autoparallel] add more sharding strategies to conv (#1487) by YuliangLiu0306
- [autoparallel] add cost graph class (#1481) by YuliangLiu0306
- [autoparallel] added namespace constraints (#1490) by Frank Lee
- [autoparallel] integrate auto parallel with torch fx (#1479) by Frank Lee
- [autoparallel] added dot handler (#1475) by Frank Lee
- [autoparallel] introduced baseclass for op handler and reduced code redundancy (#1471) by Frank Lee
- [autoparallel] standardize the code structure (#1469) by Frank Lee
- [autoparallel] Add conv handler to generate strategies and costs info for conv (#1467) by YuliangLiu0306
Utils
- [utils] refactor parallel layers checkpoint and bcast model on loading checkpoint (#1548) by ver217
- [utils] optimize partition_tensor_parallel_state_dict (#1546) by ver217
- [utils] Add use_reentrant=False in utils.activation_checkpoint (#1460) by Boyuan Yao
- [utils] Impl clip_grad_norm for ColoTensor and ZeroOptimizer (#1442) by ver217
Hotfix
- [hotfix] change namespace for meta_trace. (#1541) by Super Daniel
- [hotfix] fix init context (#1543) by ver217
- [hotfix] avoid conflict of meta registry with torch 1.13.0. (#1530) by Super Daniel
- [hotfix] fix coloproxy typos. (#1519) by Super Daniel
Pipeline/pipleline_process_group
- [pipeline/pipleline_process_group] finish PipelineProcessGroup to manage local and global rank in TP, DP and PP (#1508) by Kirigaya Kazuto
Doc
- [doc] docstring for FreqAwareEmbeddingBag (#1525) by Jiarui Fang
- [doc] update readme with the new xTrimoMultimer project (#1477) by Sze-qq
- [doc] update docstring in ProcessGroup (#1468) by Jiarui Fang
- [Doc] add more doc for ColoTensor. (#1458) by Jiarui Fang
Autoparellel
- [autoparellel] add strategies constructor (#1505) by YuliangLiu0306
FAW
- [FAW] cpu caching operations (#1520) by Jiarui Fang
- [FAW] refactor reorder() for CachedParamMgr (#1514) by Jiarui Fang
- [FAW] LFU initialize with dataset freq (#1513) by Jiarui Fang
- [FAW] shrink freq_cnter size (#1509) by CsRic
- [FAW] remove code related to chunk (#1501) by Jiarui Fang
- [FAW] add more docs and fix a warning (#1500) by Jiarui Fang
- [FAW] FAW embedding uses LRU as eviction strategy, initialized with dataset stats (#1494) by CsRic
- [FAW] LFU cache for the FAW by CsRic
- [FAW] init an LFU implementation for FAW (#1488) by Jiarui Fang
- [FAW] reorganize the inheritance struct of FreqCacheEmbedding (#1448) by Geng Zhang
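For readers unfamiliar with the eviction policy named in the FAW entries above, here is a minimal, generic sketch of a least-frequently-used (LFU) cache whose counters can be seeded with precomputed frequencies. It is not the ColossalAI implementation: the class name `LFUCache`, its methods, and the `initial_freq` parameter are hypothetical and chosen only for illustration, and eviction is a linear scan for clarity rather than speed.

```python
class LFUCache:
    """Tiny least-frequently-used cache (illustrative only, not ColossalAI code)."""

    def __init__(self, capacity, initial_freq=None):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._data = {}
        # Assumption: counters may start from precomputed dataset frequencies,
        # so keys known to be hot are less likely to be evicted early.
        self._freq = dict(initial_freq or {})

    def __contains__(self, key):
        return key in self._data

    def get(self, key, default=None):
        """Return the cached value and count the access, or `default` on a miss."""
        if key not in self._data:
            return default
        self._freq[key] = self._freq.get(key, 0) + 1
        return self._data[key]

    def put(self, key, value):
        """Insert or update `key`, evicting the least-used resident key if full."""
        if key not in self._data and len(self._data) >= self.capacity:
            # O(n) scan over resident keys; ties go to the oldest inserted key.
            victim = min(self._data, key=lambda k: self._freq.get(k, 0))
            del self._data[victim]
        self._data[key] = value
        # Counters are kept after eviction, so access history survives re-insertion.
        self._freq[key] = self._freq.get(key, 0) + 1
```

With a capacity of 2 and `initial_freq={"a": 10}`, inserting `a`, `b`, then `c` evicts `b`, because the seeded count keeps `a` resident.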
Pipeline/rpc
- [pipeline/rpc] update outstanding mechanism | optimize dispatching strategy (#1497) by Kirigaya Kazuto
- [pipeline/rpc] implement distributed optimizer | test with assert_close (#1486) by Kirigaya Kazuto
- [pipeline/rpc] support interleaving | fix checkpoint bug | change logic when dispatch data in work_list to ensure steady 1F1B (#1483) by Kirigaya Kazuto
- [pipeline/rpc] implement a demo for PP with cuda rpc framework (#1470) by Kirigaya Kazuto
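The "steady 1F1B" mentioned above refers to the one-forward-one-backward pipeline schedule. As a rough illustration of the commonly described non-interleaved form (not the RPC-based implementation in these PRs), the sketch below lists the order of forward and backward steps for a single stage. The function name `one_f_one_b_schedule` and its parameters are hypothetical.

```python
def one_f_one_b_schedule(stage, num_stages, num_microbatches):
    """Return [("F" | "B", microbatch_id), ...] for one pipeline stage.

    Generic non-interleaved 1F1B ordering: warm-up forwards, a steady phase
    alternating one forward and one backward, then cool-down backwards.
    """
    if not 0 <= stage < num_stages:
        raise ValueError("stage must be in [0, num_stages)")
    if num_microbatches < 1:
        raise ValueError("num_microbatches must be at least 1")

    # Earlier stages run more forwards before their first backward arrives.
    warmup = min(num_stages - stage - 1, num_microbatches)
    steps = [("F", i) for i in range(warmup)]

    fwd, bwd = warmup, 0
    # Steady phase: one forward followed by one backward.
    while fwd < num_microbatches:
        steps.append(("F", fwd))
        fwd += 1
        steps.append(("B", bwd))
        bwd += 1

    # Cool-down: drain the backward passes still outstanding.
    while bwd < num_microbatches:
        steps.append(("B", bwd))
        bwd += 1
    return steps
```

In this ordering the last stage alternates forward and backward from the start, while the first stage holds the most in-flight microbatches (at most `num_stages`), which is what bounds activation memory.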
Tensor
- [tensor] add 1D device mesh (#1492) by YuliangLiu0306
- [tensor] support runtime ShardingSpec apply (#1453) by YuliangLiu0306
- [tensor] shape consistency generate transform path and communication cost (#1435) by YuliangLiu0306
- [tensor] added linear implementation for the new sharding spec (#1416) by Frank Lee
FCE
- [FCE] update interface for frequency statistics in FreqCacheEmbedding (#1462) by Geng Zhang
Workflow
Test
Engin/schedule
- [engin/schedule] use p2p_v2 to reconstruct pipeline_schedule (#1408) by Kirigaya Kazuto
Full Changelog: v0.1.10...v0.1.9