PyG 2.2: Accelerations and Scalability

We are excited to announce the release of PyG 2.2 🎉🎉🎉

PyG 2.2 is the culmination of work from 78 contributors who contributed features and bug fixes across more than 320 commits since torch-geometric==2.1.0.

Highlights

pyg-lib Integration

We are proud to release and integrate pyg-lib==0.1.0 into PyG, the first stable version of our new low-level Graph Neural Network library to drive all CPU and GPU acceleration needs of PyG (#5330, #5347, #5384, #5388).

You can install pyg-lib as described in our README.md, where ${TORCH} and ${CUDA} should be replaced by your installed PyTorch and CUDA versions (e.g., 1.13.0 and cu117):

pip install pyg-lib -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html

Once pyg-lib is installed, PyG will automatically pick it up, e.g., to accelerate neighborhood sampling routines or heterogeneous GNN execution:

  • pyg-lib provides fast and optimized CPU routines to iteratively sample neighbors in homogeneous and heterogeneous graphs, and significantly improves upon the neighborhood sampling techniques previously used in PyG.
  • pyg-lib provides efficient GPU-based routines to parallelize workloads in heterogeneous graphs across different node types and edge types. We achieve this by leveraging type-dependent transformations via NVIDIA CUTLASS integration, which is flexible enough to implement most heterogeneous GNNs and remains efficient even for sparse edge types or a large number of different node types.
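
Because this dispatch happens inside PyG, existing sampling-based training loops benefit without any code changes. A minimal sketch (the Cora dataset is chosen purely for illustration):

import torch
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import NeighborLoader

dataset = Planetoid(root='data/Planetoid', name='Cora')
data = dataset[0]

# With pyg-lib installed, the sampling below transparently dispatches to its
# optimized routines; the user-facing API stays exactly the same:
loader = NeighborLoader(
    data,
    num_neighbors=[10, 10],  # sample up to 10 neighbors per node for 2 hops
    batch_size=128,
    input_nodes=data.train_mask,
)

batch = next(iter(loader))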

GraphStore and FeatureStore Abstractions

PyG 2.2 includes numerous primitives for scalable graph machine learning, enabling users to train GNNs on graphs far larger than their machine's available memory. It does so by introducing simple, easy-to-use, and extensible FeatureStore and GraphStore abstractions that plug directly into existing, familiar PyG interfaces (see here for the accompanying tutorial).

from torch_geometric.loader import NodeLoader

feature_store = CustomFeatureStore()
feature_store['paper', 'x', None] = ...  # Add paper features
feature_store['author', 'x', None] = ...  # Add author features

graph_store = CustomGraphStore()
graph_store['edge', 'coo'] = ...  # Add edges in "COO" format

# `CustomGraphSampler` knows how to sample on `CustomGraphStore`:
graph_sampler = CustomGraphSampler(
    graph_store=graph_store,
    num_neighbors=[10, 20],
    ...
)

loader = NodeLoader(
    data=(feature_store, graph_store),
    node_sampler=graph_sampler,
    batch_size=20,
    input_nodes='paper',
)

for batch in loader:
    pass

Data loading and sampling routines have been refactored and decomposed into the torch_geometric.loader and torch_geometric.sampler modules, respectively (#5563, #5820, #5456, #5457, #5312, #5365, #5402, #5404, #5418).
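
With this decomposition, the built-in loaders become thin compositions of a loader and a sampler that can also be combined manually. A minimal sketch, assuming a homogeneous data object and the built-in NeighborSampler from the new torch_geometric.sampler module:

from torch_geometric.loader import NodeLoader
from torch_geometric.sampler import NeighborSampler

# The sampling logic now lives in its own module and can be re-used:
node_sampler = NeighborSampler(data, num_neighbors=[10, 20])

# `NodeLoader` is only responsible for batching and feature fetching:
loader = NodeLoader(data, node_sampler=node_sampler, batch_size=20)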

Optimized and Fused Aggregations

PyG 2.2 further accelerates scatter aggregations across CPU and GPU, both with and without backward computation paths (requires torch>=1.12.0 and torch-scatter>=2.1.0) (#5232, #5241, #5353, #5386, #5399, #6051, #6052).
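
These scatter aggregations are the core reduction primitive behind message passing. A minimal sketch of such an aggregation via torch-scatter (shapes chosen for illustration):

import torch
from torch_scatter import scatter

x = torch.randn(6, 16)                    # 6 messages with 16 features each
index = torch.tensor([0, 0, 1, 1, 1, 2])  # target node of each message

out = scatter(x, index, dim=0, reduce='sum')  # shape: [3, 16]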

We also optimized the usage of nn.aggr.MultiAggregation by fusing the computation of multiple aggregations together (see here for more details) (#6036, #6040).
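
A minimal sketch of a fused multi-aggregation (input shapes chosen for illustration):

import torch
from torch_geometric.nn import aggr

x = torch.randn(6, 16)
index = torch.tensor([0, 0, 1, 1, 1, 2])

# The four aggregations below share intermediate computation when fused:
aggregation = aggr.MultiAggregation(['sum', 'mean', 'var', 'std'])
out = aggregation(x, index)  # shape: [3, 4 * 16] (concatenated by default)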

Here are some benchmarking results on PyTorch 1.12 (summed over 1000 runs):

Aggregators               Vanilla    Fusion
[sum, mean]               0.3325s    0.1996s
[sum, mean, min, max]     0.7139s    0.5037s
[sum, mean, var]          0.6849s    0.3871s
[sum, mean, var, std]     1.0955s    0.3973s

Lastly, we have incorporated "fused" GNN operators via the dgNN package, starting with a FusedGATConv implementation (#5140).
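
A sketch of how the fused operator can be used, assuming a CUDA device and an installed dgNN package; the to_graph_format conversion helper follows the FusedGATConv documentation, and all tensor shapes are illustrative:

import torch
from torch_geometric.nn import FusedGATConv

x = torch.randn(100, 16, device='cuda')
edge_index = torch.randint(0, 100, (2, 500), device='cuda')

# The fused dgNN kernels operate on CSR/CSC representations instead of
# the classic `edge_index` format:
csr, csc, perm = FusedGATConv.to_graph_format(edge_index, size=(100, 100))

conv = FusedGATConv(16, 32, heads=4).cuda()
out = conv(x, csr, csc, perm)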

Community Sprint: Type Hints and TorchScript Support

We are running regular community sprints to get our community more involved in building PyG. Whether you are just beginning to use graph learning or have been leveraging GNNs in research or production, the community sprints welcome members of all levels with different types of projects.

We had our first community sprint on 10/12 to fully incorporate type hints and TorchScript support over the entire code base. The goal was to improve the usability and cleanliness of our codebase. 20 contributors participated, contributing 120 type-hint improvements and around 2,400 lines of code within two weeks (#5842, #5603, #5659, #5664, #5665, #5666, #5667, #5668, #5669, #5673, #5675, #5673, #5678, #5682, #5683, #5684, #5685, #5687, #5688, #5695, #5699, #5701, #5702, #5703, #5706, #5707, #5710, #5714, #5715, #5716, #5722, #5724, #5725, #5726, #5729, #5730, #5731, #5732, #5733, #5743, #5734, #5735, #5736, #5737, #5738, #5747, #5752, #5753, #5754, #5756, #5757, #5758, #5760, #5766, #5767, #5768, #5781, #5778, #5797, #5798, #5799, #5800, #5806, #5810, #5811, #5828, #5847, #5851, #5852).

Explainability

Our second community sprint began on 11/15 with the goal of improving the explainability capabilities of PyG. With it, we introduce the torch_geometric.explain module, which provides a unified set of tools to explain the predictions of a PyG model or the underlying phenomenon of a dataset.

Some of the features developed in the sprint are incorporated into this release:

  • Added the torch_geometric.explain module (#5804, #6054, #6089)
  • Moved and adapted the GNNExplainer module to torch_geometric.explain (#5967, #6065). See here and here for the accompanying examples.
  • Extended GNNExplainer to support edge level explanations (#6056)
  • Added explainability support for heterogeneous GNNs via to_captum_model and to_captum_input (#5886, #5934):

from captum.attr import IntegratedGradients

from torch_geometric.data import HeteroData
from torch_geometric.nn import to_captum_model, to_captum_input

data = HeteroData(...)
model = HeteroGNN(...)

mask_type = 'edge'  # explain via edge masks ('node' and 'node_and_edge' also work)
output_idx = 10     # index of the output node to explain

# Explain predictions on heterogeneous graphs for output node 10:
captum_model = to_captum_model(model, mask_type, output_idx, data.metadata())
inputs, additional_forward_args = to_captum_input(data.x_dict, data.edge_index_dict, mask_type)

ig = IntegratedGradients(captum_model)
ig_attr = ig.attribute(
    inputs=inputs,
    target=int(y[output_idx]),  # ground-truth label of the output node
    additional_forward_args=additional_forward_args,
    internal_batch_size=1,
)

Breaking Changes

  • Renamed drop_unconnected_nodes to drop_unconnected_node_types and drop_orig_edges to drop_orig_edge_types in AddMetapaths (#5490)

Deprecations

  • The usage of nn.models.GNNExplainer is now deprecated in favor of explain.GNNExplainer
  • The usage of utils.dropout_adj is now deprecated in favor of utils.dropout_edge (see the sketch after this list)
  • The usage of loader.RandomNodeSampler is now deprecated in favor of loader.RandomNodeLoader
  • The usage of to_captum is now deprecated in favor of to_captum_model
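
For example, migrating from dropout_adj to dropout_edge is a one-line change; note that the second return value is now a boolean edge mask rather than edge attributes (tensors below are illustrative):

import torch
from torch_geometric.utils import dropout_edge

edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])

# Before: edge_index, edge_attr = dropout_adj(edge_index, p=0.5)
edge_index, edge_mask = dropout_edge(edge_index, p=0.5)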

Features

Layers, Models and Examples

  • Added a "Link Prediction on MovieLens" Colab notebook (#5823)
  • Added a bipartite link-prediction example (#5834)
  • Added the SSGConv layer (#5599)
  • Added the WLConvContinuous layer for performing WL-refinement with continuous attributes (#5316)
  • Added the PositionalEncoding module (#5381)
  • Added a node classification example instrumented with Weights and Biases (#5192)

Data Loaders

  • Added support for triplet sampling in LinkNeighborLoader (#6004)
  • Added a temporal_strategy option ('uniform' or 'last') to NeighborLoader and LinkNeighborLoader (#5576); see the sketch after this list
  • Added a disjoint option to NeighborLoader and LinkNeighborLoader (#5717, #5775)
  • Added HeteroData support in RandomNodeLoader (#6007)
  • Added int32-based edge_index support in NeighborLoader (#5948)
  • Added support for input_time in NeighborLoader (#5763)
  • Added np.memmap support in NeighborLoader (#5696)
  • Added CPU affinitization support to NeighborLoader (#6005)
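
A minimal sketch of temporal neighborhood sampling, assuming a data object that carries a per-node time attribute (all tensors are illustrative):

import torch
from torch_geometric.data import Data
from torch_geometric.loader import NeighborLoader

data = Data(
    x=torch.randn(100, 16),
    edge_index=torch.randint(0, 100, (2, 500)),
    time=torch.arange(100),  # one timestamp per node
)

# Only neighbors that do not lie in the future of a seed node are sampled;
# 'uniform' picks them uniformly at random, 'last' picks the most recent ones:
loader = NeighborLoader(
    data,
    num_neighbors=[10, 10],
    time_attr='time',
    temporal_strategy='uniform',
    batch_size=32,
)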

Transformations

  • Added a FeaturePropagation transform (#5387)
  • Added IndexToMask and MaskToIndex transforms (#5375, #5455)
  • Added shuffle_node, mask_feature and add_random_edge augmentations (#5548)
  • Added dropout_node, dropout_edge and dropout_path augmentations (#5481, #5495, #5531); see the sketch after this list
  • Added an AddRandomMetaPaths transform that adds edges based on random walks along a metapath (#5397)
  • Added a utils.to_smiles function (#6038)
  • Added HeteroData support for transforms.Constant (#5700)
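
A minimal sketch of the new augmentation utilities (tensor shapes chosen for illustration):

import torch
from torch_geometric.utils import add_random_edge, dropout_node, mask_feature

x = torch.randn(100, 16)
edge_index = torch.randint(0, 100, (2, 500))

# Randomly drop nodes together with their incident edges:
edge_index, edge_mask, node_mask = dropout_node(edge_index, p=0.1)

# Randomly mask out feature columns:
x, feature_mask = mask_feature(x, p=0.3)

# Add a fraction of new random edges:
edge_index, added_edges = add_random_edge(edge_index, p=0.1)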

Datasets

  • Added the LRGBDataset, covering the 5 datasets of the Long Range Graph Benchmark (#5935)
  • Added the HydroNet water cluster dataset (#5537, #5902, #5903)
  • Added the DGraphFin dynamic graph dataset (#5504)
  • Added the official splits to the MalNetTiny dataset (#5078)
  • Added a print_summary method for the torch_geometric.data.Dataset interface (#5438)
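
A quick sketch of the new summary output on any built-in dataset (the TUDataset choice is illustrative):

from torch_geometric.datasets import TUDataset

dataset = TUDataset(root='data/TUDataset', name='MUTAG')
dataset.print_summary()  # prints dataset statistics such as graph, node and edge counts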

General Improvements

  • Added training and inference benchmark scripts (#5774, #5830, #5878, #5293, #5341, #5242, #5258, #5881, #5254)
  • Added the utils.assortativity function to compute the degree assortativity coefficient (#5587)
  • Added support for filling labels with dummy values in HeteroData.to_homogeneous() (#5540)
  • Added torch.onnx.export support (#5877, #5997)
  • Added option to make normalization coefficients trainable in PNAConv (#6039)
  • Added a semi_grad option in VarAggregation and StdAggregation (#6042)
  • Added a warning for invalid node and edge type names in HeteroData (#5990)
  • Added lr_scheduler_solver and customized lr_scheduler classes (#5942)
  • Added to_fixed_size graph transformer (#5939)
  • Added support for symbolic tracing in the SchNet model (#5938)
  • Added support for customizing the interaction graph in the SchNet model (#5919)
  • Added SparseTensor support to SuperGATConv (#5888)
  • Added TorchScript support for AttentiveFP (#5868)
  • Added a return_semantic_attention_weights argument to HANConv (#5787)
  • Added temperature value customization in dense_mincut_pool (#5908)
  • Added support for a tuple of in_channels in GENConv for bipartite message passing (#5627, #5641)
  • Added Aggregation.set_validate_args option to skip validation of dim_size (#5290)
  • Added BaseStorage.get() functionality (#5240)
  • Added support for batches of size one in BatchNorm (#5530, #5614)
  • The AttentionalAggregation module can now be applied to compute attention on a per-feature level (#5449)
  • Added TorchScript support to ASAPooling (#5395)
  • Updated the unsupervised GraphSAGE example to leverage LinkNeighborLoader (#5317)
  • Added better out-of-bounds error message in MessagePassing (#5339)
  • Added support to customize the activation function in PNAConv (#5262)

Bugfixes

  • Fixed a bug in TUDataset, in which node features were wrongly constructed whenever node_attributes only holds a single feature (e.g., in PROTEINS) (#5441)
  • Fixed a bug in the VirtualNode transform, in which node features were mistakenly treated as edge features (#5819)
  • Fixed a bug when applying several scalers with PNAConv (#5514)
  • Fixed setter and getter handling in BaseStorage (#5815)
  • Fixed the auto_select_device routine in GraphGym for pytorch_lightning>=1.7 (#5677)
  • Fixed RandomLinkSplit in case there aren't enough negative edges to sample (#5642)
  • Fixed the in-place modification to mode_kwargs in MultiAggregation (#5601)
  • Fixed the utils.to_dense_adj routine in case edge_index is empty (#5476)
  • Fixed the PointTransformerConv to now correctly use sum aggregation (#5332)
  • Fixed the output of Dataset.num_classes in case a transform modifies data.y (#5274)
  • Fail gracefully on GLIBC errors within torch-spline-conv (#5276)

Full Changelog


Full commit list: https://github.com/pyg-team/pytorch_geometric/compare/2.1.0...2.2.0
