v0.9.1 is a minor release with the following update:
Distributed Graph Partitioning Pipeline
DGL now supports partitioning and preprocessing graph data using multiple machines. At its core is a new data format called Chunked Graph Data Format (CGDF) which stores graph data in chunks. The new pipeline processes data chunks in parallel, which not only reduces the memory requirement of each machine but also significantly accelerates the entire procedure. For the same random graph with 1B nodes/5B edges, using a cluster of 8 AWS EC2 x1e.4xlarge instances (16 vCPUs, 488GB RAM each), the new pipeline reduces the running time to 2.7 hours and cuts the monetary cost by 3.7x. Read the feature highlight blog for more details.
To get started with this new feature, check out the new user guide chapter.
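For a flavor of the idea, below is a minimal, hypothetical sketch of CGDF-style metadata expressed as a Python dict: topology and features are split into per-chunk files so each machine only loads the chunks assigned to it. The key names and layout are illustrative assumptions, not the authoritative schema; see the user guide chapter for the exact format.

```python
# Hypothetical sketch of chunked graph metadata. Key names are illustrative
# assumptions, not the exact CGDF schema (consult the user guide for that).
num_chunks = 4
metadata = {
    "graph_name": "random_graph_1B",
    # One inner list per node/edge type; each entry is the size of one chunk.
    "num_nodes_per_chunk": [[1_000_000_000 // num_chunks] * num_chunks],
    "num_edges_per_chunk": [[5_000_000_000 // num_chunks] * num_chunks],
    "edges": {
        "_N:_E:_N": {  # canonical edge type "src:rel:dst"
            "format": {"name": "csv", "delimiter": " "},
            "data": [f"edges/chunk_{i}.txt" for i in range(num_chunks)],
        }
    },
    "node_data": {
        "_N": {
            "feat": {
                "format": {"name": "numpy"},
                "data": [f"node_feat/chunk_{i}.npy" for i in range(num_chunks)],
            }
        }
    },
}
```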
New Additions
- A new example of the SEAL model for OGBL datasets: https://github.com/dmlc/dgl/tree/master/examples/pytorch/ogb/seal_ogbl (#4291)
- A new example of Directional Graph Substructure Networks (GSN) for the OGBG-MolPCBA dataset: https://github.com/dmlc/dgl/tree/master/examples/pytorch/ogb/directional_GSN (#4405)
- A new example of the Network In Graph Neural Network model for OGBL datasets: https://github.com/dmlc/dgl/tree/master/examples/pytorch/ogb/ngnn (#4328)
- PyTorch multi-GPU examples have been moved to `dgl/examples/pytorch/multigpu/`, together with a new multi-GPU graph property prediction example that achieves a 9.5x speedup on 8 GPUs. (#4385)
- A new example of the Heterogeneous RGCN model on the OGBN-MAG dataset: https://github.com/dmlc/dgl/tree/master/examples/pytorch/ogb/ogbn-mag (#4331)
- Refactored the code style of several frequently visited examples: RGCN, GIN, GAT. (#4327) (#4280) (#4240)
System Enhancement
- Two new APIs, `dgl.use_libxsmm` and `dgl.is_libxsmm_enabled`, to enable/disable Intel LibXSMM. (#4455) See the usage sketch after this list.
- Added a new option `exclude_self` to exclude self-loop edges for `dgl.knn_graph`. The API now supports creating a batch of KNN graphs. (#4389)
- The distributed training program launched by DGL now reports an error when any trainer/server fails.
- Sped up the DataLoader by adding CPU affinity support. (#4126)
- Enabled the graph partition book to support canonical edge types. (#4343)
- Improved the performance of CUDA SpMMCsr. (#4363)
- Added CUDA weighted neighborhood sampling. (#4064)
- Enabled UVA for weighted samplers. (#4314)
- Allowed adding edge data to the self-loops created by `AddSelfLoop` or `add_self_loop`. (#4261)
- Added CUDA weighted random walk sampling. (#4243)
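A brief usage sketch of several items above, assuming a CPU build of DGL 0.9.1 with LibXSMM support and the PyTorch backend. The exact argument forms (the boolean flag of `dgl.use_libxsmm` and the `edge_feat_names`/`fill_data` keywords of `add_self_loop`) are assumptions inferred from the notes, not confirmed signatures:

```python
# Usage sketch, not authoritative: argument names below are assumed from
# the release notes rather than taken from confirmed API signatures.
import torch
import dgl

# Toggle the Intel LibXSMM CPU kernels and query the current state.
dgl.use_libxsmm(False)
assert not dgl.is_libxsmm_enabled()
dgl.use_libxsmm(True)

# Build a 5-NN graph over 100 random 3-D points, excluding self-loop edges.
points = torch.rand(100, 3)
g = dgl.knn_graph(points, k=5, exclude_self=True)

# A 3-D (batch, num_points, dim) input yields a batch of KNN graphs.
batched_g = dgl.knn_graph(torch.rand(4, 100, 3), k=5, exclude_self=True)

# Attach data to the self-loops created by add_self_loop (assumed keywords).
g2 = dgl.graph((torch.tensor([0, 1]), torch.tensor([1, 2])))
g2.edata['w'] = torch.ones(2, 1)
g2 = dgl.add_self_loop(g2, edge_feat_names=['w'], fill_data=1.0)
```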
Deprecation & Cleanup
- Removed the already deprecated `AsyncTransferer` class. Its functionality has been incorporated into the DGL DataLoader. (#4505)
- Removed the already deprecated `num_servers` and `num_workers` arguments of `dgl.distributed.initialize`; see the migration sketch after this list. (#4284)
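A minimal migration sketch for the removed arguments, assuming a standard setup where server/trainer counts are supplied by the distributed launch tooling rather than in code:

```python
# Migration sketch: the deprecated keyword arguments are simply dropped.
# This call is meant to run inside a trainer process started by the DGL
# distributed launch tooling, which supplies the server/worker counts.
import dgl

# Before v0.9.1 (already deprecated):
# dgl.distributed.initialize('ip_config.txt', num_servers=1, num_workers=4)

# From v0.9.1 on:
dgl.distributed.initialize('ip_config.txt')
```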
Dependency Update
Starting from this release, we drop support for CUDA 10.1 and 11.0. On Windows, we additionally drop support for CUDA 10.2.
Linux: CentOS 7+ / Ubuntu 18.04+
| PyTorch ver. \ CUDA ver. | 10.2 | 11.1 | 11.3 | 11.5 | 11.6 |
|---|---|---|---|---|---|
| 1.9 | ✅ | ✅ | | | |
| 1.10 | ✅ | ✅ | ✅ | | |
| 1.11 | ✅ | ✅ | ✅ | ✅ | |
| 1.12 | ✅ | | ✅ | | ✅ |
Windows: Windows 10+ / Windows Server 2016+
| PyTorch ver. \ CUDA ver. | 11.1 | 11.3 | 11.5 | 11.6 |
|---|---|---|---|---|
| 1.9 | ✅ | | | |
| 1.10 | ✅ | ✅ | | |
| 1.11 | ✅ | ✅ | ✅ | |
| 1.12 | | ✅ | | ✅ |
Bugfixes
- Fix a crash caused by an incorrect dtype in dgl.to_block() (#4487)
- Fix a bug related to unpinning when tensoradaptor is not available (#4450)
- Fix a bug related to pinning empty tensors and graphs (#4393)
- Remove duplicate entries of the CUB submodule (#4499)
- Fix a broken static_assert (#4342)
- Multiple fixes in edge_softmax_hetero (#4336)
- Fix the default value of `num_bases` in the RelGraphConv module (#4321)
- Fix etype check in DistGraph.edge_subgraph (#4322)
- Fix incorrect _bias and bias usage (#4310)
- Enable DistGraph.find_edge() to work with str or tuple of str (#4319)
- Fix a numerical bug related to SparseAdagrad. (#4253)