PyG 2.2.0: Accelerations and Scalability


We are excited to announce the release of PyG 2.2 🎉🎉🎉

  • Highlights
  • Breaking Changes
  • Full Changelog

PyG 2.2 is the culmination of work from 78 contributors, comprising over 320 commits of new features and bug fixes since torch-geometric==2.1.0.

Highlights

pyg-lib Integration

We are proud to release and integrate pyg-lib==0.1.0 into PyG, the first stable version of our new low-level Graph Neural Network library to drive all CPU and GPU acceleration needs of PyG (#5330, #5347, #5384, #5388).

You can install pyg-lib as described in our README.md:

pip install pyg-lib -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html

Here, ${TORCH} and ${CUDA} refer to your installed PyTorch and CUDA versions (e.g., torch-1.13.0+cu117). Afterwards, you can verify the installation via:

import pyg_lib

Once installed, pyg-lib is automatically picked up by PyG, e.g., to accelerate neighborhood sampling routines or heterogeneous GNN execution:

  • pyg-lib provides fast and optimized CPU routines to iteratively sample neighbors in homogeneous and heterogeneous graphs, significantly improving upon the neighborhood sampling techniques previously used in PyG (see the sketch after this list).


  • pyg-lib provides efficient GPU-based routines to parallelize workloads in heterogeneous graphs across different node types and edge types. We achieve this by leveraging type-dependent transformations via NVIDIA CUTLASS integration, which is flexible enough to implement most heterogeneous GNNs and efficient even for sparse edge types or a large number of different node types.

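A minimal sketch of how these accelerations surface in user code (the graph, feature sizes, and sampling parameters below are purely illustrative):

import torch
from torch_geometric.data import Data
from torch_geometric.loader import NeighborLoader
from torch_geometric.nn import HeteroLinear

# (1) With pyg-lib installed, `NeighborLoader` transparently dispatches to
#     its optimized neighbor sampling routines; no code changes are needed:
data = Data(x=torch.randn(1000, 16),
            edge_index=torch.randint(0, 1000, (2, 5000)))
loader = NeighborLoader(data, num_neighbors=[10, 10], batch_size=128)

# (2) `HeteroLinear` applies a different weight matrix per node type and
#     leverages `pyg_lib.ops.segment_matmul` under the hood when available:
lin = HeteroLinear(in_channels=16, out_channels=32, num_types=3)
x = torch.randn(100, 16)
type_vec = torch.randint(0, 3, (100,))  # node type of each row
out = lin(x, type_vec)  # shape: [100, 32]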

GraphStore and FeatureStore Abstractions

PyG 2.2 introduces primitives for scalable graph machine learning, enabling users to train GNNs on graphs far larger than their machine's available memory. It does so via simple, easy-to-use, and extensible FeatureStore and GraphStore abstractions that plug directly into existing, familiar PyG interfaces (see here for the accompanying tutorial).

from torch_geometric.loader import NodeLoader

feature_store = CustomFeatureStore()
feature_store['paper', 'x', None] = ...  # Add paper features
feature_store['author', 'x', None] = ...  # Add author features

graph_store = CustomGraphStore()
graph_store['edge', 'coo'] = ...  # Add edges in "COO" format

# `CustomGraphSampler` knows how to sample on `CustomGraphStore`:
graph_sampler = CustomGraphSampler(
    graph_store=graph_store,
    num_neighbors=[10, 20],
    ...
)

# `NodeLoader` pulls graph structure from `graph_store` and features from
# `feature_store` to assemble each mini-batch:
loader = NodeLoader(
    data=(feature_store, graph_store),
    node_sampler=graph_sampler,
    batch_size=20,
    input_nodes='paper',
)

for batch in loader:
    pass

Data loading and sampling routines are refactored and decomposed into torch_geometric.loader and torch_geometric.sampler modules, respectively (#5563, #5820, #5456, #5457, #5312, #5365, #5402, #5404, #5418).
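As a rough sketch of this decomposition (assuming the NeighborSampler constructor shown below), a sampler from torch_geometric.sampler now plugs into a loader from torch_geometric.loader:

import torch
from torch_geometric.data import Data
from torch_geometric.loader import NodeLoader
from torch_geometric.sampler import NeighborSampler

data = Data(x=torch.randn(1000, 16),
            edge_index=torch.randint(0, 1000, (2, 5000)))

# The sampler defines *how* to sample; the loader defines *what* to iterate over:
node_sampler = NeighborSampler(data, num_neighbors=[10, 10])
loader = NodeLoader(data, node_sampler=node_sampler, batch_size=128)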

Optimized and Fused Aggregations

PyG 2.2 further accelerates scatter aggregations based on CPU/GPU and with/without backward computation paths (requires torch>=1.12.0 and torch-scatter>=2.1.0) (#5232, #5241, #5353, #5386, #5399, #6051, #6052).
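These accelerations apply transparently wherever PyG scatters messages. A minimal sketch of the underlying primitive (assuming utils.scatter follows the torch_scatter.scatter signature):

import torch
from torch_geometric.utils import scatter

src = torch.randn(6, 16)                   # messages
index = torch.tensor([0, 0, 1, 1, 2, 2])   # destination of each message
out = scatter(src, index, dim=0, reduce='mean')  # shape: [3, 16]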

We also optimized the usage of nn.aggr.MultiAggregation by fusing the computation of multiple aggregations together (see here for more details) (#6036, #6040).
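A minimal sketch of requesting multiple aggregations at once, where fusable reductions are grouped and computed in a single pass:

import torch
from torch_geometric.nn import aggr

multi_aggr = aggr.MultiAggregation(['sum', 'mean', 'min', 'max'])

x = torch.randn(6, 16)                     # 6 elements with 16 features each
index = torch.tensor([0, 0, 1, 1, 2, 2])   # group assignment of each element
out = multi_aggr(x, index)                 # shape: [3, 4 * 16] (concatenated)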

Here are some benchmarking results on PyTorch 1.12 (summed over 1000 runs):

| Aggregators           | Vanilla | Fusion  |
|-----------------------|---------|---------|
| [sum, mean]           | 0.3325s | 0.1996s |
| [sum, mean, min, max] | 0.7139s | 0.5037s |
| [sum, mean, var]      | 0.6849s | 0.3871s |
| [sum, mean, var, std] | 1.0955s | 0.3973s |

Lastly, we have incorporated "fused" GNN operators via the dgNN package, starting with a FusedGATConv implementation (#5140).

Community Sprint: Type Hints and TorchScript Support

We are running regular community sprints to get our community more involved in building PyG. Whether you are just beginning to use graph learning or have been leveraging GNNs in research or production, the community sprints welcome members of all levels with different types of projects.

We held our first community sprint on 10/12 to fully incorporate type hints and TorchScript support across the entire codebase, with the goal of improving its usability and cleanliness. 20 contributors participated, contributing 120 type hint improvements and adding around 2,400 lines of code within two weeks (#5842, #5603, #5659, #5664, #5665, #5666, #5667, #5668, #5669, #5673, #5675, #5673, #5678, #5682, #5683, #5684, #5685, #5687, #5688, #5695, #5699, #5701, #5702, #5703, #5706, #5707, #5710, #5714, #5715, #5716, #5722, #5724, #5725, #5726, #5729, #5730, #5731, #5732, #5733, #5743, #5734, #5735, #5736, #5737, #5738, #5747, #5752, #5753, #5754, #5756, #5757, #5758, #5760, #5766, #5767, #5768, #5781, #5778, #5797, #5798, #5799, #5800, #5806, #5810, #5811, #5828, #5847, #5851, #5852).

Explainability

Our second community sprint began on 11/15 with the goal of improving the explainability capabilities of PyG. As part of this effort, we introduce the torch_geometric.explain module, which provides a unified set of tools to explain the predictions of a PyG model or the underlying phenomenon of a dataset.

Some of the features developed in the sprint are incorporated into this release:

from captum.attr import IntegratedGradients

from torch_geometric.nn import to_captum_input, to_captum_model

data = HeteroData(...)
model = HeteroGNN(...)

# Explain predictions on heterogeneous graphs for output node 10:
captum_model = to_captum_model(model, mask_type, output_idx, metadata)
inputs, additional_forward_args = to_captum_input(data.x_dict, data.edge_index_dict, mask_type)

ig = IntegratedGradients(captum_model)
ig_attr = ig.attribute(
    inputs=inputs,
    target=int(y[output_idx]),  # `y` holds the ground-truth labels
    additional_forward_args=additional_forward_args,
    internal_batch_size=1,
)

Breaking Changes

  • Renamed drop_unconnected_nodes to drop_unconnected_node_types and drop_orig_edges to drop_orig_edge_types in AddMetaPaths (#5490); see the sketch below
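
For illustration, a minimal sketch of the renamed arguments (the metapath itself and the graph it assumes are illustrative):

from torch_geometric.transforms import AddMetaPaths

# Connect authors that share a paper via a new metapath-induced edge type:
metapaths = [[('author', 'paper'), ('paper', 'author')]]
transform = AddMetaPaths(
    metapaths,
    drop_orig_edge_types=True,         # previously: drop_orig_edges
    drop_unconnected_node_types=True,  # previously: drop_unconnected_nodes
)

data = transform(data)  # `data` is assumed to be a `HeteroData` object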


Full Changelog

Added
  • Extended GNNExplainer to support edge level explanations (#6056)
  • Added CPU affinitization for NodeLoader (#6005)
  • Added triplet sampling in LinkNeighborLoader (#6004)
  • Added FusedAggregation of simple scatter reductions (#6036)
  • Added a to_smiles function (#6038)
  • Added option to make normalization coefficients trainable in PNAConv (#6039)
  • Added semi_grad option in VarAggregation and StdAggregation (#6042)
  • Allow for fused aggregations in MultiAggregation (#6036, #6040)
  • Added HeteroData support for to_captum_model and added to_captum_input (#5934)
  • Added HeteroData support in RandomNodeLoader (#6007)
  • Added bipartite GraphSAGE example (#5834)
  • Added LRGBDataset to include 5 datasets from the Long Range Graph Benchmark (#5935)
  • Added a warning for invalid node and edge type names in HeteroData (#5990)
  • Added PyTorch 1.13 support (#5975)
  • Added int32 support in NeighborLoader (#5948)
  • Add dgNN support and FusedGATConv implementation (#5140)
  • Added lr_scheduler_solver and customized lr_scheduler classes (#5942)
  • Add to_fixed_size graph transformer (#5939)
  • Add support for symbolic tracing of SchNet model (#5938)
  • Add support for customizable interaction graph in SchNet model (#5919)
  • Started adding torch.sparse support to PyG (#5906, #5944, #6003)
  • Added HydroNet water cluster dataset (#5537, #5902, #5903)
  • Added explainability support for heterogeneous GNNs (#5886)
  • Added SparseTensor support to SuperGATConv (#5888)
  • Added TorchScript support for AttentiveFP (#5868)
  • Added num_steps argument to training and inference benchmarks (#5898)
  • Added torch.onnx.export support (#5877, #5997)
  • Enable VTune ITT in inference and training benchmarks (#5830, #5878)
  • Add training benchmark (#5774)
  • Added a "Link Prediction on MovieLens" Colab notebook (#5823)
  • Added custom sampler support in LightningDataModule (#5820)
  • Added a return_semantic_attention_weights argument to HANConv (#5787)
  • Added disjoint argument to NeighborLoader and LinkNeighborLoader (#5775)
  • Added support for input_time in NeighborLoader (#5763)
  • Added disjoint mode for temporal LinkNeighborLoader (#5717)
  • Added HeteroData support for transforms.Constant (#5700)
  • Added np.memmap support in NeighborLoader (#5696)
  • Added assortativity that computes the degree assortativity coefficient (#5587)
  • Added SSGConv layer (#5599)
  • Added shuffle_node, mask_feature and add_random_edge augmentation methods (#5548)
  • Added dropout_path augmentation that drops edges from a graph based on random walks (#5531)
  • Add support for filling labels with dummy values in HeteroData.to_homogeneous() (#5540)
  • Added temporal_strategy option to neighbor_sample (#5576)
  • Added torch_geometric.sampler package to docs (#5563)
  • Added the DGraphFin dynamic graph dataset (#5504)
  • Added dropout_edge augmentation that randomly drops edges from a graph - the usage of dropout_adj is now deprecated (#5495)
  • Added dropout_node augmentation that randomly drops nodes from a graph (#5481)
  • Added AddRandomMetaPaths that adds edges based on random walks along a metapath (#5397)
  • Added WLConvContinuous for performing WL refinement with continuous attributes (#5316)
  • Added print_summary method for the torch_geometric.data.Dataset interface (#5438)
  • Added sampler support to LightningDataModule (#5456, #5457)
  • Added official splits to MalNetTiny dataset (#5078)
  • Added IndexToMask and MaskToIndex transforms (#5375, #5455)
  • Added FeaturePropagation transform (#5387)
  • Added PositionalEncoding (#5381)
  • Consolidated sampler routines behind torch_geometric.sampler, enabling ease of extensibility in the future (#5312, #5365, #5402, #5404, #5418)
  • Added pyg-lib neighbor sampling (#5384, #5388)
  • Added pyg_lib.segment_matmul integration within HeteroLinear (#5330, #5347)
  • Enabled bf16 support in benchmark scripts (#5293, #5341)
  • Added Aggregation.set_validate_args option to skip validation of dim_size (#5290)
  • Added SparseTensor support to inference and training benchmark suite (#5242, #5258, #5881)
  • Added experimental mode in inference benchmarks (#5254)
  • Added node classification example instrumented with Weights and Biases (W&B) logging and W&B Sweeps (#5192)
  • Added experimental mode for utils.scatter (#5232, #5241, #5386)
  • Added missing test labels in HGBDataset (#5233)
  • Added BaseStorage.get() functionality (#5240)
  • Added a test to confirm that to_hetero works with SparseTensor (#5222)
  • Added torch_geometric.explain module with base functionality for explainability methods (#5804, #6054, #6089)
Changed
  • Moved and adapted GNNExplainer from torch_geometric.nn to torch_geometric.explain.algorithm (#5967, #6065)
  • Optimized scatter implementations for CPU/GPU, both with and without backward computation (#6051, #6052)
  • Support temperature value in dense_mincut_pool (#5908)
  • Fixed a bug in which VirtualNode mistakenly treated node features as edge features (#5819)
  • Fixed setter and getter handling in BaseStorage (#5815)
  • Fixed path in hetero_conv_dblp.py example (#5686)
  • Fix auto_select_device routine in GraphGym for PyTorch Lightning>=1.7 (#5677)
  • Support in_channels with tuple in GENConv for bipartite message passing (#5627, #5641)
  • Handle cases of not having enough possible negative edges in RandomLinkSplit (#5642)
  • Fix RGCN+pyg-lib for LongTensor input (#5610)
  • Improved type hint support (#5842, #5603, #5659, #5664, #5665, #5666, #5667, #5668, #5669, #5673, #5675, #5673, #5678, #5682, #5683, #5684, #5685, #5687, #5688, #5695, #5699, #5701, #5702, #5703, #5706, #5707, #5710, #5714, #5715, #5716, #5722, #5724, #5725, #5726, #5729, #5730, #5731, #5732, #5733, #5743, #5734, #5735, #5736, #5737, #5738, #5747, #5752, #5753, #5754, #5756, #5757, #5758, #5760, #5766, #5767, #5768, #5781, #5778, #5797, #5798, #5799, #5800, #5806, #5810, #5811, #5828, #5847, #5851, #5852)
  • Avoid modifying mode_kwargs in MultiAggregation (#5601)
  • Changed BatchNorm to allow for batches of size one during training (#5530, #5614)
  • Integrated better temporal sampling support by requiring that local neighborhoods are sorted according to time (#5516, #5602)
  • Fixed a bug when applying several scalers with PNAConv (#5514)
  • Allow . in ParameterDict key names (#5494)
  • Renamed drop_unconnected_nodes to drop_unconnected_node_types and drop_orig_edges to drop_orig_edge_types in AddMetaPaths (#5490)
  • Improved utils.scatter performance by explicitly choosing better implementation for add and mean reduction (#5399)
  • Fix to_dense_adj with empty edge_index (#5476)
  • The AttentionalAggregation module can now be applied to compute attention on a per-feature level (#5449)
  • Ensure equal lengths of num_neighbors across edge types in NeighborLoader (#5444)
  • Fixed a bug in TUDataset in which node features were wrongly constructed whenever node_attributes only holds a single feature (e.g., in PROTEINS) (#5441)
  • Breaking change: removed num_neighbors as an attribute of loader (#5404)
  • ASAPooling is now jittable (#5395)
  • Updated unsupervised GraphSAGE example to leverage LinkNeighborLoader (#5317)
  • Replace in-place operations with out-of-place ones to align with torch.scatter_reduce API (#5353)
  • Breaking bugfix: PointTransformerConv now correctly uses sum aggregation (#5332)
  • Improve out-of-bounds error message in MessagePassing (#5339)
  • Allow file names of a Dataset to be specified as either a property or a method (#5338)
  • Fixed separating a list of SparseTensor within InMemoryDataset (#5299)
  • Improved name resolving of normalization layers (#5277)
  • Fail gracefully on GLIBC errors within torch-spline-conv (#5276)
  • Fixed Dataset.num_classes in case a transform modifies data.y (#5274)
  • Allow customization of the activation function within PNAConv (#5262)
  • Do not fill InMemoryDataset cache on dataset.num_features (#5264)
  • Changed tests relying on dblp datasets to instead use synthetic data (#5250)
  • Fixed a bug for the initialization of activation function examples in custom_graphgym (#5243)
  • Allow any integer tensors when checking edge_index input to message passing (#5281)
Removed
  • Removed scatter_reduce option from experimental mode (#5399)

Full commit list: 2.1.0...2.2.0
