This is a new major release with various system optimizations, new features and enhancements, new models, and bug fixes.
Important: Change on PyPI Installation
DGL pip wheels are no longer shipped on PyPI. Use the following commands to install DGL with pip:
- For CPU: `pip install dgl -f https://data.dgl.ai/wheels/repo.html`
- For CUDA: `pip install dgl-cuXX -f https://data.dgl.ai/wheels/repo.html`
- For CPU nightly builds: `pip install --pre dgl -f https://data.dgl.ai/wheels-test/repo.html`
- For CUDA nightly builds: `pip install --pre dgl-cuXX -f https://data.dgl.ai/wheels-test/repo.html`
This does not impact conda installation.
GPU-based Neighbor Sampling
DGL now supports uniform neighbor sampling and MFG conversion on GPU, contributed by @nv-dlasalle from NVIDIA. Experiments with GraphSAGE on the ogbn-products graph show a >10x speedup (reduced from 113s to 11s per epoch) on a g3.16x instance. The following docs have been updated accordingly, and a minimal usage sketch follows the list:
- A new user guide chapter Using GPU for Neighborhood Sampling about when and how to use this new feature.
- The API doc of NodeDataLoader.
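Below is a hedged sketch of how the feature is typically enabled. It assumes a homogeneous graph `g` and a tensor of training node IDs `train_nids` defined elsewhere, and that the graph fits in GPU memory; argument names follow the 0.7 `dgl.dataloading` API.

```python
# Hedged sketch: GPU-based neighbor sampling with NodeDataLoader.
# Assumes `g` (a DGLGraph) and `train_nids` (a 1-D tensor of seed node IDs)
# are defined elsewhere and that a CUDA device is available.
import dgl
import torch

device = torch.device('cuda')
g = g.to(device)                    # the graph structure must live on the GPU
train_nids = train_nids.to(device)  # seed nodes must be on the same device

sampler = dgl.dataloading.MultiLayerNeighborSampler([15, 10, 5])
dataloader = dgl.dataloading.NodeDataLoader(
    g, train_nids, sampler,
    device=device,     # produce the MFGs directly on the GPU
    batch_size=1024,
    shuffle=True,
    drop_last=False,
    num_workers=0)     # GPU sampling requires single-process loading

for input_nodes, output_nodes, mfgs in dataloader:
    # mfgs (message flow graphs) are already on the GPU, ready for message passing
    ...
```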
New Tutorials for Multi-GPU and Distributed Training
The release brings two new tutorials about multi-GPU training for node classification and graph classification, respectively. There is also a new tutorial about distributed training across multiple machines. All of them are available at https://docs.dgl.ai/.
Improved CPU Message Passing Kernel
The update includes a new CPU implementation of the core GSpMM kernel for GNN message passing, thanks to @sanchit-misra from Intel. The new kernel performs tiling on the sparse CSR matrix and leverages Intel's LibXSMM for kernel generation, giving up to a 4.4x speedup over the old kernel. For details, please read their paper: https://arxiv.org/abs/2104.06700.
More efficient NodeEmbedding for multi-GPU training and distributed training
DGL now utilizes NCCL to synchronize the gradients of sparse node embeddings (`dgl.nn.NodeEmbedding`) during training (credits to @nv-dlasalle from NVIDIA). The NCCL feature is available in both `dgl.optim.SparseAdam` and `dgl.optim.SparseAdagrad`. Experiments show a 20% speedup (reduced from 47.2s to 39.5s per epoch) on a g4dn.12xlarge (4 T4 GPUs) instance for training RGCN on the ogbn-mag graph. The optimization is automatically turned on when NCCL backend support is detected.
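As a hedged illustration of the pieces involved (the sizes and the initializer below are made up; the NCCL path is engaged automatically under multi-GPU training when available):

```python
# Hedged sketch: a sparse node embedding paired with the sparse Adam optimizer.
import dgl
import torch

num_nodes, emb_dim = 1_000_000, 128   # illustrative sizes

def initializer(emb):
    # called once to initialize the embedding tensor in place
    torch.nn.init.uniform_(emb, -0.05, 0.05)
    return emb

node_emb = dgl.nn.NodeEmbedding(num_nodes, emb_dim, name='node_emb',
                                init_func=initializer)
emb_optimizer = dgl.optim.SparseAdam(params=[node_emb], lr=0.01)

# Inside the training loop (per minibatch):
#   feats = node_emb(input_nodes, device)  # gather rows for the sampled nodes
#   ... forward pass, loss.backward() ...
#   emb_optimizer.step()                   # sparse update of the touched rows only
```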
The sparse optimizers for `dgl.distributed.DistEmbedding` now use a synchronized gradient update strategy. We also add a new optimizer, `dgl.distributed.optim.SparseAdam`. `dgl.distributed.SparseAdagrad` has been moved to `dgl.distributed.optim.SparseAdagrad`.
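A hedged sketch of the distributed counterpart, assuming a trainer script in which `dgl.distributed.initialize()` has already been called and a `DistGraph` named `g` exists:

```python
# Hedged sketch: distributed node embeddings with the new synchronized sparse Adam.
import dgl

emb = dgl.distributed.DistEmbedding(g.num_nodes(), 128, name='emb')
optimizer = dgl.distributed.optim.SparseAdam([emb], lr=0.01)

# Per minibatch:
#   feats = emb(batch_nids)     # pull embedding rows for the sampled nodes
#   ... forward pass, loss.backward() ...
#   optimizer.step()            # gradient updates are synchronized across trainers
```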
Sparse-sparse Matrix Multiplication and Addition Support
We add two new APIs, `dgl.adj_product_graph` and `dgl.adj_sum_graph`, which perform sparse-sparse matrix multiplication and addition as graph operations, respectively. They run on both CPU and GPU with autograd support. An example usage of these functions is Graph Transformer Networks.
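A small, hedged usage sketch (the toy graphs and weights below are arbitrary; 'w' is simply the edge-weight feature name chosen here):

```python
# Hedged sketch: sparse-sparse matrix multiplication and addition as graph ops.
import dgl
import torch

A = dgl.graph(([0, 1, 2], [1, 2, 0]))
B = dgl.graph(([0, 1, 2], [2, 0, 1]))
A.edata['w'] = torch.randn(A.num_edges(), requires_grad=True)
B.edata['w'] = torch.randn(B.num_edges(), requires_grad=True)

C = dgl.adj_product_graph(A, B, 'w')   # adjacency of C = adj(A) @ adj(B)
D = dgl.adj_sum_graph([A, B], 'w')     # adjacency of D = adj(A) + adj(B)

# The resulting edge weights participate in autograd:
C.edata['w'].sum().backward()
print(A.edata['w'].grad.shape)
```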
PyTorch Lightning Compatibility
DGL is now compatible with PyTorch Lightning for single-GPU training or training with DistributedDataParallel. See these examples of training GraphSAGE with PyTorch Lightning (a stripped-down sketch follows below):
- Node classification: https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/train_lightning.py
- Unsupervised learning: https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/train_lightning_unsupervised.py
We thank @justusschock for making DGL DataLoaders compatible with PyTorch Lightning (#2886).
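For orientation, here is a hedged, stripped-down sketch of the pattern; the class and variable names are illustrative, not part of either library's API, and the linked examples above are the full versions.

```python
# Hedged sketch: a two-layer GraphSAGE wrapped in a LightningModule and fed by
# a DGL NodeDataLoader. Assumes g.ndata contains 'feat' and integer 'label'.
import dgl
import dgl.nn as dglnn
import pytorch_lightning as pl
import torch
import torch.nn.functional as F

class SAGELightning(pl.LightningModule):
    def __init__(self, in_feats, n_hidden, n_classes):
        super().__init__()
        self.conv1 = dglnn.SAGEConv(in_feats, n_hidden, 'mean')
        self.conv2 = dglnn.SAGEConv(n_hidden, n_classes, 'mean')

    def training_step(self, batch, batch_idx):
        input_nodes, output_nodes, mfgs = batch
        x = mfgs[0].srcdata['feat']
        h = F.relu(self.conv1(mfgs[0], x))
        h = self.conv2(mfgs[1], h)
        loss = F.cross_entropy(h, mfgs[-1].dstdata['label'])
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# sampler = dgl.dataloading.MultiLayerNeighborSampler([10, 25])
# loader = dgl.dataloading.NodeDataLoader(g, train_nids, sampler,
#                                         batch_size=1024, shuffle=True)
# pl.Trainer(gpus=1, max_epochs=10).fit(SAGELightning(in_feats, 256, n_classes), loader)
```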
New Models
Nineteen new model examples are added to DGL in 0.7, bringing the total number to 90+. Users can now use the search bar on https://www.dgl.ai/ to quickly locate examples by tagged keywords. Below is the list of new models.
- Interaction Networks for Learning about Objects, Relations, and Physics (https://arxiv.org/abs/1612.00222.pdf) (#2794, @Ericcsr)
- Multi-GPU RGAT for OGB-LSC Node Classification (#2835, @maqy1995)
- Network Embedding with Completely-imbalanced Labels (https://ieeexplore.ieee.org/document/8979355) (#2813, @Fizyhsp)
- Temporal Graph Networks improved (#2860, @Ericcsr)
- Diffusion Convolutional Recurrent Neural Network (https://arxiv.org/abs/1707.01926) (#2858, @Ericcsr)
- Gated Attention Networks for Learning on Large and Spatiotemporal Graphs (https://arxiv.org/abs/1803.07294) (#2858, @Ericcsr)
- DeeperGCN (https://arxiv.org/abs/2006.07739) (#2831, @xnuohz)
- Deep Graph Contrastive Representation Learning (https://arxiv.org/abs/2006.04131) (#2828, #3009, @hengruizhang98)
- Graph Neural Networks Inspired by Classical Iterative Algorithms (https://arxiv.org/abs/2103.06064) (#2770, @ffttyy)
- GraphSAINT (#2792) (@lt610)
- Label Propagation (#2852, @xnuohz)
- Combining Label Propagation and Simple Models Out-performs Graph Neural Networks (https://arxiv.org/abs/2010.13993) (#2852, @xnuohz)
- GCNII (#2874, @kyawlin)
- Latent Dirichlet Allocation on GPU (#2883, @yifeim)
- A Heterogeneous Information Network based Cross Domain Insurance Recommendation System for Cold Start Users (#2864, @KounianhuaDu)
- Five heterogeneous graph models: HetGNN/GTN/HAN/NSHE/MAGNN (#2993, @Theheavens)
- New OGB-arxiv and OGB-proteins results (#3018, @Espylapiza)
- Heterogeneous Graph Attention Networks with minibatch sampling (#3005, @maqy1995)
- Learning Hierarchical Graph Neural Networks for Image Clustering (https://arxiv.org/abs/2107.01319) (#3087, #3105)
New Datasets
- Two fake news datasets, Gossipcop and Politifact. (#2876, #2939, @kayzliu)
- Two fraud datasets extracted from Yelp and Amazon. See https://arxiv.org/pdf/2008.08692.pdf and https://ponderly.github.io/pub/PCGNN_WWW2021.pdf for details. (#2876, #2908, @kayzliu)
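A hedged loading sketch for the datasets above; the constructor arguments shown (dataset name and node-feature type) follow the new `dgl.data` classes in this release, but check the API reference before relying on them.

```python
# Hedged sketch: loading the new fake news and fraud datasets.
import dgl.data

# Fake news: 'gossipcop' or 'politifact'; the second argument picks the node feature type.
gossipcop = dgl.data.FakeNewsDataset('gossipcop', 'bert')

# Fraud: 'yelp' or 'amazon'; yields a heterogeneous graph with several relation types.
fraud_yelp = dgl.data.FraudDataset('yelp')
graph = fraud_yelp[0]
print(graph.canonical_etypes)
print(graph.ndata['label'].shape)
```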
New Functionalities
- KD-Tree, Brute-force family, and NN-descent implementation of KNN (#2767, #2892, #2941) (@lygztq)
- BLAS-based KNN implementation on GPU (#2868, @milesial)
- A new API `dgl.sample_neighbors_biased` for biased neighbor sampling, where each node has a tag and each tag has its own (unnormalized) probability (#1665, #2987, @soodoshll). We also provide two helper functions, `sort_csr_by_tag` and `sort_csc_by_tag`, which sort the internal storage of a graph based on tags to enable this kind of neighbor sampling (#1664, @soodoshll).
- Distributed sparse Adam node embedding optimizer (#2733)
- A heterogeneous graph's `multi_update_all` now supports user-defined cross-type reducers (#2891, @Secbone)
- Add `in_degrees` and `out_degrees` support to `dgl.DistGraph` (#2918)
- A new API `dgl.sampling.node2vec_random_walk` for node2vec random walks (#2992, @Smilexuhc); see the sketch after this list.
- `dgl.node_subgraph`, `dgl.edge_subgraph`, `dgl.in_subgraph` and `dgl.out_subgraph` all have a `relabel_nodes` argument to allow graph compaction, i.e. removing the nodes with no edges. (#2929)
- Allow direct slicing of a batched graph without constructing a new data structure. (#2349, #2851, #2965)
- Allow setting the distributed node embeddings with `NodeEmbedding.all_set_embedding()` (#3047)
- Graphs can be directly created from CSR or CSC representations on either CPU or GPU (#3045). See the API doc of `dgl.graph` for more details.
- A new `dgl.reorder` API to permute a graph according to RCMK, METIS, or a custom strategy (#3063)
- `dgl.nn.GraphConv` now has a left normalization which divides the outgoing messages by out-degrees, equivalent to random-walk normalization (#3114)
- A new `exclude='self'` option for EdgeDataLoader to exclude only the edges sampled in the current minibatch during neighbor sampling, for when reverse edges are not available (#3122)
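As promised above, a hedged sketch of one of the new sampling APIs, `dgl.sampling.node2vec_random_walk`; the toy graph and the p/q values are arbitrary.

```python
# Hedged sketch: node2vec-style random walks on a small toy graph.
import dgl
import torch

g = dgl.graph(([0, 1, 1, 2, 3], [1, 2, 3, 0, 0]))
start = torch.tensor([0, 1, 2, 3])

# p is the return parameter, q the in-out parameter of node2vec.
walks = dgl.sampling.node2vec_random_walk(g, start, p=0.5, q=2.0, walk_length=4)
print(walks.shape)  # (4, 5): one walk of walk_length + 1 node IDs per start node
```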
Performance Optimizations
- Check if a COO is sorted to avoid sync during forward/backward and parallelize sorted COO/CSR conversion. (#2645, @nv-dlasalle)
- Faster uniform sampling with replacement (#2953)
- Eliminate constructor, destructor, and `IsNullArray` overheads in random walks (#2990, @AjayBrahmakshatriya)
- GatedGCNConv shortcut with one edge type (#2994)
- Hierarchical partitioning in distributed training with a 25% speedup (#3000, @soodoshll)
- Reduce memory usage of `node_split` and `edge_split` during partitioning (#3132, @JingchengYu94)
Other Enhancements
- Graph partitioning now returns ID mapping from old nodes/edges to new ones (#2857)
- Better error message when `idx_list` is out of bounds (#2848)
- Kill training jobs on remote machines in distributed training when receiving KeyboardInterrupt (#2881)
- Provide a `dgl.multiprocessing` namespace for multiprocess training with fork and OpenMP (#2905)
- GAT supports multidimensional input features (#2912)
- Users can now specify the graph format for distributed training (#2948)
- CI now runs on Kubernetes (#2957)
- `to_heterogeneous(to_homogeneous(hg))` now returns the same `hg` (#2958); see the round-trip sketch after this list.
- `remove_nodes` and `remove_edges` now preserve batch information. (#3119)
Bug Fixes
- Multiprocessing sampling in distributed training hangs in Python 3.8 (#2315, #2826)
- Use correct NIC for distributed training (#2798, @Tonny-Gu)
- Fix potential TypeError in HGT example (#2830, @zhangtianle)
- Distributed training initialization fails with graphs without node/edge data (#2366, #2838)
- DGL Sparse Optimizer will crash when some DGL NodeEmbedding is not involved in the forward pass (#2856, #2859)
- Fix GATConv shape issues with Residual Connections (#2867, #2921, #2922, #2947, #2962, @xieweiyi, @jxgu1016)
- Moving a graph to GPU will change the default CUDA device (#2895, #2897)
- Remove the `__len__` method to stop polluting PyCharm outputs (#2902)
- Inconsistency in the typing of node types and edge types returned by `load_partition` (#2742, @chwan-rice)
- `NodeDataLoader` and `EdgeDataLoader` now support `DistributedDataParallel` with proper shuffling and batching (#2539, #2911)
- Nonuniform sampling with replacement may dereference a null pointer (#2942, #2943, @nv-dlasalle)
- Strange behavior of `bipartite_from_networkx()` (#2808, #2917)
- Make the GCMC example compatible with torchtext 0.9+ (#2985, @alexpod1000)
- `dgl.to_homogeneous` doesn't work correctly on graphs with 0 nodes of a given type (#2870, #3011)
- TU regression datasets throw errors (#2952, #3010)
- RGCN generates NaN in PyTorch 1.8 but not in PyTorch 1.7.x (#2760, #3013, @nv-dlasalle)
- Handle the situation where `num_layers` equals 1 for GraphSAGE (#3066, @Wang-Yu-Qing)
- Lengthen the timeout for distributed node embedding (#2966, #2967, @sojiadeshina)
- Misc fixes in code and documentation (#2844, #2869, #2840, #2879, #2863, #2822, #2907, #2928, #2935, #2960, #2938, #2968, #2961, #2983, #2981, #3017, #3051, #3040, #3064, #3065, #3133, #3139) (@Theheavens, @ab-10, @yunshiuan, @moritzblum, @kayzliu, @universvm, @europeanplaice, etc.)
Deprecations
- The `preserve_nodes` argument in `dgl.edge_subgraph` is deprecated and renamed to `relabel_nodes`.