v0.9.1 is a minor release with the following update:
Distributed Graph Partitioning Pipeline
DGL now supports partitioning and preprocessing graph data using multiple machines. At its core is a new data format called Chunked Graph Data Format (CGDF) which stores graph data in chunks. The new pipeline processes data chunks in parallel, which not only reduces the memory requirement of each machine but also significantly accelerates the entire procedure. For the same random graph with 1B nodes/5B edges, using a cluster of 8 AWS EC2 x1e.4xlarge instances (16 vCPUs, 488GB RAM each), the new pipeline reduces the running time to 2.7 hours and cuts the monetary cost by 3.7x. Read the feature highlight blog for more details.
To get started with this new feature, check out the new user guide chapter.
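For a flavor of the idea, below is a minimal, hypothetical sketch of CGDF-style metadata expressed as a Python dict: topology and features are split into per-chunk files so each machine only loads the chunks assigned to it. The key names and layout are illustrative assumptions, not the authoritative schema; see the user guide chapter for the exact format.

```python
# Hypothetical sketch of chunked graph metadata. Key names are illustrative
# assumptions, not the exact CGDF schema (consult the user guide for that).
num_chunks = 4
metadata = {
    "graph_name": "random_graph_1B",
    # One inner list per node/edge type; each entry is the size of one chunk.
    "num_nodes_per_chunk": [[1_000_000_000 // num_chunks] * num_chunks],
    "num_edges_per_chunk": [[5_000_000_000 // num_chunks] * num_chunks],
    "edges": {
        "_N:_E:_N": {  # canonical edge type "src:rel:dst"
            "format": {"name": "csv", "delimiter": " "},
            "data": [f"edges/chunk_{i}.txt" for i in range(num_chunks)],
        }
    },
    "node_data": {
        "_N": {
            "feat": {
                "format": {"name": "numpy"},
                "data": [f"node_feat/chunk_{i}.npy" for i in range(num_chunks)],
            }
        }
    },
}
```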
New Additions
- A new example of the SEAL model for OGBL datasets: https://github.com/dmlc/dgl/tree/master/examples/pytorch/ogb/seal_ogbl (#4291)
- A new example of Directional Graph Substructure Networks (GSN) for the OGBG-MolPCBA dataset: https://github.com/dmlc/dgl/tree/master/examples/pytorch/ogb/directional_GSN (#4405)
- A new example of the Network In Graph Neural Network model for OGBL datasets: https://github.com/dmlc/dgl/tree/master/examples/pytorch/ogb/ngnn (#4328)
- PyTorch multi-GPU examples have been moved to `dgl/examples/pytorch/multigpu/`, together with a new multi-GPU graph property prediction example that achieves a 9.5x speedup on 8 GPUs. (#4385)
- A new example of the Heterogeneous RGCN model on the OGBN-MAG dataset: https://github.com/dmlc/dgl/tree/master/examples/pytorch/ogb/ogbn-mag (#4331)
- Refactored the code style of several frequently visited examples: RGCN, GIN, GAT. (#4327) (#4280) (#4240)
System Enhancement
- Two new APIs, `dgl.use_libxsmm` and `dgl.is_libxsmm_enabled`, to enable/disable Intel LibXSMM. (#4455) See the usage sketch after this list.
- Added a new option `exclude_self` to exclude self-loop edges for `dgl.knn_graph`. The API now supports creating a batch of KNN graphs. (#4389)
- The distributed training program launched by DGL now reports an error when any trainer/server fails.
- Sped up the DataLoader by adding CPU affinity support. (#4126)
- Enabled the graph partition book to support canonical edge types. (#4343)
- Improved the performance of CUDA SpMMCsr. (#4363)
- Added CUDA weighted neighborhood sampling. (#4064)
- Enabled UVA for weighted samplers. (#4314)
- Allowed adding edge data to the self-loops created by `AddSelfLoop` or `add_self_loop`. (#4261)
- Added CUDA weighted random walk sampling. (#4243)
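A brief usage sketch of several items above, assuming a CPU build of DGL 0.9.1 with LibXSMM support and the PyTorch backend. The exact argument forms (the boolean flag of `dgl.use_libxsmm` and the `edge_feat_names`/`fill_data` keywords of `add_self_loop`) are assumptions inferred from the notes, not confirmed signatures:

```python
# Usage sketch, not authoritative: argument names below are assumed from
# the release notes rather than taken from confirmed API signatures.
import torch
import dgl

# Toggle the Intel LibXSMM CPU kernels and query the current state.
dgl.use_libxsmm(False)
assert not dgl.is_libxsmm_enabled()
dgl.use_libxsmm(True)

# Build a 5-NN graph over 100 random 3-D points, excluding self-loop edges.
points = torch.rand(100, 3)
g = dgl.knn_graph(points, k=5, exclude_self=True)

# A 3-D (batch, num_points, dim) input yields a batch of KNN graphs.
batched_g = dgl.knn_graph(torch.rand(4, 100, 3), k=5, exclude_self=True)

# Attach data to the self-loops created by add_self_loop (assumed keywords).
g2 = dgl.graph((torch.tensor([0, 1]), torch.tensor([1, 2])))
g2.edata['w'] = torch.ones(2, 1)
g2 = dgl.add_self_loop(g2, edge_feat_names=['w'], fill_data=1.0)
```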
Deprecation & Cleanup
- Removed the already deprecated `AsyncTransferer` class. Its functionality has been incorporated into the DGL DataLoader. (#4505)
- Removed the already deprecated `num_servers` and `num_workers` arguments of `dgl.distributed.initialize`; see the migration sketch after this list. (#4284)
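A minimal migration sketch for the removed arguments, assuming a standard setup where server/trainer counts are supplied by the distributed launch tooling rather than in code:

```python
# Migration sketch: the deprecated keyword arguments are simply dropped.
# This call is meant to run inside a trainer process started by the DGL
# distributed launch tooling, which supplies the server/worker counts.
import dgl

# Before v0.9.1 (already deprecated):
# dgl.distributed.initialize('ip_config.txt', num_servers=1, num_workers=4)

# From v0.9.1 on:
dgl.distributed.initialize('ip_config.txt')
```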
Dependency Update
Starting from this release, we drop support for CUDA 10.1 and 11.0. On Windows, we additionally drop support for CUDA 10.2.
Linux: CentOS 7+ / Ubuntu 18.04+
| PyTorch ver. \ CUDA ver. | 10.2 | 11.1 | 11.3 | 11.5 | 11.6 |
|---|---|---|---|---|---|
| 1.9 | ✅ | ✅ | | | |
| 1.10 | ✅ | ✅ | ✅ | | |
| 1.11 | ✅ | ✅ | ✅ | ✅ | |
| 1.12 | ✅ | | ✅ | | ✅ |
Windows: Windows 10+ / Windows Server 2016+
| PyTorch ver. \ CUDA ver. | 11.1 | 11.3 | 11.5 | 11.6 |
|---|---|---|---|---|
| 1.9 | ✅ | | | |
| 1.10 | ✅ | ✅ | | |
| 1.11 | ✅ | ✅ | ✅ | |
| 1.12 | | ✅ | | ✅ |
Bugfixes
- Fix a crash caused by an incorrect dtype in dgl.to_block() (#4487)
- Fix a bug related to unpinning when tensoradaptor is not available (#4450)
- Fix a bug related to pinning empty tensors and graphs (#4393)
- Remove duplicate entries of the CUB submodule (#4499)
- Fix a broken static_assert (#4342)
- Multiple fixes in edge_softmax_hetero (#4336)
- Fix the default value of `num_bases` in the RelGraphConv module (#4321)
- Fix etype check in DistGraph.edge_subgraph (#4322)
- Fix incorrect _bias and bias usage (#4310)
- Enable DistGraph.find_edge() to work with str or tuple of str (#4319)
- Fix a numerical bug related to SparseAdagrad. (#4253)