PyG 2.2: Accelerations and Scalability

We are excited to announce the release of PyG 2.2 🎉🎉🎉

PyG 2.2 is the culmination of work from 78 contributors who contributed features and bug fixes across more than 320 commits since torch-geometric==2.1.0.

Highlights

pyg-lib Integration

We are proud to release and integrate pyg-lib==0.1.0 into PyG, the first stable version of our new low-level Graph Neural Network library to drive all CPU and GPU acceleration needs of PyG (#5330, #5347, #5384, #5388).

You can install pyg-lib as described in our README.md, where ${TORCH} and ${CUDA} should be replaced by your installed PyTorch and CUDA versions (e.g., 1.13.0 and cu117):

pip install pyg-lib -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html

Once pyg-lib is installed, PyG will automatically pick it up, e.g., to accelerate neighborhood sampling routines or heterogeneous GNN execution:

  • pyg-lib provides fast and optimized CPU routines to iteratively sample neighbors in homogeneous and heterogeneous graphs, and significantly improves upon the neighborhood sampling techniques previously used in PyG.
  • pyg-lib provides efficient GPU-based routines to parallelize workloads in heterogeneous graphs across different node types and edge types. We achieve this by leveraging type-dependent transformations via NVIDIA CUTLASS integration, which is flexible enough to implement most heterogeneous GNNs and remains efficient even for sparse edge types or a large number of different node types.
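
Because this dispatch happens inside PyG, existing sampling-based training loops benefit without any code changes. A minimal sketch (the Cora dataset is chosen purely for illustration):

import torch
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import NeighborLoader

dataset = Planetoid(root='data/Planetoid', name='Cora')
data = dataset[0]

# With pyg-lib installed, the sampling below transparently dispatches to its
# optimized routines; the user-facing API stays exactly the same:
loader = NeighborLoader(
    data,
    num_neighbors=[10, 10],  # sample up to 10 neighbors per node for 2 hops
    batch_size=128,
    input_nodes=data.train_mask,
)

batch = next(iter(loader))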

GraphStore and FeatureStore Abstractions

PyG 2.2 includes numerous primitives for scalable graph machine learning, enabling users to train GNNs on graphs far larger than their machine's available memory. It does so by introducing simple, easy-to-use, and extensible FeatureStore and GraphStore abstractions that plug directly into existing, familiar PyG interfaces (see here for the accompanying tutorial).

from torch_geometric.loader import NodeLoader

feature_store = CustomFeatureStore()
feature_store['paper', 'x', None] = ...  # Add paper features
feature_store['author', 'x', None] = ...  # Add author features

graph_store = CustomGraphStore()
graph_store['edge', 'coo'] = ...  # Add edges in "COO" format

# `CustomGraphSampler` knows how to sample on `CustomGraphStore`:
graph_sampler = CustomGraphSampler(
    graph_store=graph_store,
    num_neighbors=[10, 20],
    ...
)

loader = NodeLoader(
    data=(feature_store, graph_store),
    node_sampler=graph_sampler,
    batch_size=20,
    input_nodes='paper',
)

for batch in loader:
    pass

Data loading and sampling routines have been refactored and decomposed into the torch_geometric.loader and torch_geometric.sampler modules, respectively (#5563, #5820, #5456, #5457, #5312, #5365, #5402, #5404, #5418).
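
With this decomposition, the built-in loaders become thin compositions of a loader and a sampler that can also be combined manually. A minimal sketch, assuming a homogeneous data object and the built-in NeighborSampler from the new torch_geometric.sampler module:

from torch_geometric.loader import NodeLoader
from torch_geometric.sampler import NeighborSampler

# The sampling logic now lives in its own module and can be re-used:
node_sampler = NeighborSampler(data, num_neighbors=[10, 20])

# `NodeLoader` is only responsible for batching and feature fetching:
loader = NodeLoader(data, node_sampler=node_sampler, batch_size=20)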

Optimized and Fused Aggregations

PyG 2.2 further accelerates scatter aggregations across CPU and GPU, both with and without backward computation paths (requires torch>=1.12.0 and torch-scatter>=2.1.0) (#5232, #5241, #5353, #5386, #5399, #6051, #6052).
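
These scatter aggregations are the core reduction primitive behind message passing. A minimal sketch of such an aggregation via torch-scatter (shapes chosen for illustration):

import torch
from torch_scatter import scatter

x = torch.randn(6, 16)                    # 6 messages with 16 features each
index = torch.tensor([0, 0, 1, 1, 1, 2])  # target node of each message

out = scatter(x, index, dim=0, reduce='sum')  # shape: [3, 16]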

We also optimized the usage of nn.aggr.MultiAggregation by fusing the computation of multiple aggregations together (see here for more details) (#6036, #6040).
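
A minimal sketch of a fused multi-aggregation (input shapes chosen for illustration):

import torch
from torch_geometric.nn import aggr

x = torch.randn(6, 16)
index = torch.tensor([0, 0, 1, 1, 1, 2])

# The four aggregations below share intermediate computation when fused:
aggregation = aggr.MultiAggregation(['sum', 'mean', 'var', 'std'])
out = aggregation(x, index)  # shape: [3, 4 * 16] (concatenated by default)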

Here are some benchmarking results on PyTorch 1.12 (summed over 1000 runs):

Aggregators               Vanilla    Fusion
[sum, mean]               0.3325s    0.1996s
[sum, mean, min, max]     0.7139s    0.5037s
[sum, mean, var]          0.6849s    0.3871s
[sum, mean, var, std]     1.0955s    0.3973s

Lastly, we have incorporated "fused" GNN operators via the dgNN package, starting with a FusedGATConv implementation (#5140).
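
A sketch of how the fused operator can be used, assuming a CUDA device and an installed dgNN package; the to_graph_format conversion helper follows the FusedGATConv documentation, and all tensor shapes are illustrative:

import torch
from torch_geometric.nn import FusedGATConv

x = torch.randn(100, 16, device='cuda')
edge_index = torch.randint(0, 100, (2, 500), device='cuda')

# The fused dgNN kernels operate on CSR/CSC representations instead of
# the classic `edge_index` format:
csr, csc, perm = FusedGATConv.to_graph_format(edge_index, size=(100, 100))

conv = FusedGATConv(16, 32, heads=4).cuda()
out = conv(x, csr, csc, perm)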

Community Sprint: Type Hints and TorchScript Support

We are running regular community sprints to get our community more involved in building PyG. Whether you are just beginning to use graph learning or have been leveraging GNNs in research or production, the community sprints welcome members of all levels with different types of projects.

We had our first community sprint on 10/12 to fully incorporate type hints and TorchScript support over the entire code base. The goal was to improve the usability and cleanliness of our codebase. 20 contributors participated, contributing 120 type-hint improvements and around 2,400 lines of code within two weeks (#5842, #5603, #5659, #5664, #5665, #5666, #5667, #5668, #5669, #5673, #5675, #5673, #5678, #5682, #5683, #5684, #5685, #5687, #5688, #5695, #5699, #5701, #5702, #5703, #5706, #5707, #5710, #5714, #5715, #5716, #5722, #5724, #5725, #5726, #5729, #5730, #5731, #5732, #5733, #5743, #5734, #5735, #5736, #5737, #5738, #5747, #5752, #5753, #5754, #5756, #5757, #5758, #5760, #5766, #5767, #5768, #5781, #5778, #5797, #5798, #5799, #5800, #5806, #5810, #5811, #5828, #5847, #5851, #5852).

Explainability

Our second community sprint began on 11/15 with the goal of improving the explainability capabilities of PyG. With it, we introduce the torch_geometric.explain module, which provides a unified set of tools to explain the predictions of a PyG model or the underlying phenomenon of a dataset.

Some of the features developed in the sprint are incorporated into this release:

  • Added the torch_geometric.explain module (#5804, #6054, #6089)
  • Moved and adapted the GNNExplainer module to torch_geometric.explain (#5967, #6065). See here and here for the accompanying examples.
  • Extended GNNExplainer to support edge level explanations (#6056)
  • Added explainability support for heterogeneous GNNs via to_captum_model and to_captum_input (#5886, #5934):

from captum.attr import IntegratedGradients

from torch_geometric.data import HeteroData
from torch_geometric.nn import to_captum_model, to_captum_input

data = HeteroData(...)
model = HeteroGNN(...)

mask_type = 'edge'  # explain via edge masks ('node' and 'node_and_edge' also work)
output_idx = 10     # index of the output node to explain

# Explain predictions on heterogeneous graphs for output node 10:
captum_model = to_captum_model(model, mask_type, output_idx, data.metadata())
inputs, additional_forward_args = to_captum_input(data.x_dict, data.edge_index_dict, mask_type)

ig = IntegratedGradients(captum_model)
ig_attr = ig.attribute(
    inputs=inputs,
    target=int(y[output_idx]),  # ground-truth label of the output node
    additional_forward_args=additional_forward_args,
    internal_batch_size=1,
)

Breaking Changes

  • Renamed drop_unconnected_nodes to drop_unconnected_node_types and drop_orig_edges to drop_orig_edge_types in AddMetapaths (#5490)

Deprecations

  • The usage of nn.models.GNNExplainer is now deprecated in favor of explain.GNNExplainer
  • The usage of utils.dropout_adj is now deprecated in favor of utils.dropout_edge (see the sketch after this list)
  • The usage of loader.RandomNodeSampler is now deprecated in favor of loader.RandomNodeLoader
  • The usage of to_captum is now deprecated in favor of to_captum_model
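
For example, migrating from dropout_adj to dropout_edge is a one-line change; note that the second return value is now a boolean edge mask rather than edge attributes (tensors below are illustrative):

import torch
from torch_geometric.utils import dropout_edge

edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])

# Before: edge_index, edge_attr = dropout_adj(edge_index, p=0.5)
edge_index, edge_mask = dropout_edge(edge_index, p=0.5)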

Features

Layers, Models and Examples

  • Added a "Link Prediction on MovieLens" Colab notebook (#5823)
  • Added a bipartite link-prediction example (#5834)
  • Added the SSGConv layer (#5599)
  • Added the WLConvContinuous layer for performing WL-refinement with continuous attributes (#5316)
  • Added the PositionalEncoding module (#5381)
  • Added a node classification example instrumented with Weights and Biases (#5192)

Data Loaders

  • Added support for triplet sampling in LinkNeighborLoader (#6004)
  • Added a temporal_strategy option ('uniform' or 'last') to NeighborLoader and LinkNeighborLoader (#5576); see the sketch after this list
  • Added a disjoint option to NeighborLoader and LinkNeighborLoader (#5717, #5775)
  • Added HeteroData support in RandomNodeLoader (#6007)
  • Added int32-based edge_index support in NeighborLoader (#5948)
  • Added support for input_time in NeighborLoader (#5763)
  • Added np.memmap support in NeighborLoader (#5696)
  • Added CPU affinitization support to NeighborLoader (#6005)
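
A minimal sketch of temporal neighborhood sampling, assuming a data object that carries a per-node time attribute (all tensors are illustrative):

import torch
from torch_geometric.data import Data
from torch_geometric.loader import NeighborLoader

data = Data(
    x=torch.randn(100, 16),
    edge_index=torch.randint(0, 100, (2, 500)),
    time=torch.arange(100),  # one timestamp per node
)

# Only neighbors that do not lie in the future of a seed node are sampled;
# 'uniform' picks them uniformly at random, 'last' picks the most recent ones:
loader = NeighborLoader(
    data,
    num_neighbors=[10, 10],
    time_attr='time',
    temporal_strategy='uniform',
    batch_size=32,
)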

Transformations

  • Added a FeaturePropagation transform (#5387)
  • Added IndexToMask and MaskToIndex transforms (#5375, #5455)
  • Added shuffle_node, mask_feature and add_random_edge augmentations (#5548)
  • Added dropout_node, dropout_edge and dropout_path augmentations (#5481, #5495, #5531); see the sketch after this list
  • Added an AddRandomMetaPaths transform that adds edges based on random walks along a metapath (#5397)
  • Added a utils.to_smiles function (#6038)
  • Added HeteroData support for transforms.Constant (#5700)
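
A minimal sketch of the new augmentation utilities (tensor shapes chosen for illustration):

import torch
from torch_geometric.utils import add_random_edge, dropout_node, mask_feature

x = torch.randn(100, 16)
edge_index = torch.randint(0, 100, (2, 500))

# Randomly drop nodes together with their incident edges:
edge_index, edge_mask, node_mask = dropout_node(edge_index, p=0.1)

# Randomly mask out feature columns:
x, feature_mask = mask_feature(x, p=0.3)

# Add a fraction of new random edges:
edge_index, added_edges = add_random_edge(edge_index, p=0.1)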

Datasets

  • Added the LRGBDataset, covering the 5 datasets of the Long Range Graph Benchmark (#5935)
  • Added the HydroNet water cluster dataset (#5537, #5902, #5903)
  • Added the DGraphFin dynamic graph dataset (#5504)
  • Added the official splits to the MalNetTiny dataset (#5078)
  • Added a print_summary method for the torch_geometric.data.Dataset interface (#5438)
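
A quick sketch of the new summary output on any built-in dataset (the TUDataset choice is illustrative):

from torch_geometric.datasets import TUDataset

dataset = TUDataset(root='data/TUDataset', name='MUTAG')
dataset.print_summary()  # prints dataset statistics such as graph, node and edge counts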

General Improvements

  • Added training and inference benchmark scripts (#5774, #5830, #5878, #5293, #5341, #5242, #5258, #5881, #5254)
  • Added the utils.assortativity function to compute the degree assortativity coefficient (#5587)
  • Added support for filling labels with dummy values in HeteroData.to_homogeneous() (#5540)
  • Added torch.onnx.export support (#5877, #5997)
  • Added option to make normalization coefficients trainable in PNAConv (#6039)
  • Added a semi_grad option in VarAggregation and StdAggregation (#6042)
  • Added a warning for invalid node and edge type names in HeteroData (#5990)
  • Added lr_scheduler_solver and customized lr_scheduler classes (#5942)
  • Added to_fixed_size graph transformer (#5939)
  • Added support for symbolic tracing in the SchNet model (#5938)
  • Added support for customizing the interaction graph in the SchNet model (#5919)
  • Added SparseTensor support to SuperGATConv (#5888)
  • Added TorchScript support for AttentiveFP (#5868)
  • Added a return_semantic_attention_weights argument to HANConv (#5787)
  • Added temperature value customization in dense_mincut_pool (#5908)
  • Added support for a tuple of in_channels in GENConv for bipartite message passing (#5627, #5641)
  • Added Aggregation.set_validate_args option to skip validation of dim_size (#5290)
  • Added BaseStorage.get() functionality (#5240)
  • Added support for batches of size one in BatchNorm (#5530, #5614)
  • The AttentionalAggregation module can now be applied to compute attention on a per-feature level (#5449)
  • Added TorchScript support to ASAPooling (#5395)
  • Updated the unsupervised GraphSAGE example to leverage LinkNeighborLoader (#5317)
  • Added better out-of-bounds error message in MessagePassing (#5339)
  • Added support to customize the activation function in PNAConv (#5262)

Bugfixes

  • Fixed a bug in TUDataset, in which node features were wrongly constructed whenever node_attributes only holds a single feature (e.g., in PROTEINS) (#5441)
  • Fixed a bug in the VirtualNode transform, in which node features were mistakenly treated as edge features (#5819)
  • Fixed a bug when applying several scalers with PNAConv (#5514)
  • Fixed setter and getter handling in BaseStorage (#5815)
  • Fixed the auto_select_device routine in GraphGym for pytorch_lightning>=1.7 (#5677)
  • Fixed RandomLinkSplit in case there aren't enough negative edges to sample (#5642)
  • Fixed the in-place modification to mode_kwargs in MultiAggregation (#5601)
  • Fixed the utils.to_dense_adj routine in case edge_index is empty (#5476)
  • Fixed the PointTransformerConv to now correctly use sum aggregation (#5332)
  • Fixed the output of Dataset.num_classes in case a transform modifies data.y (#5274)
  • Fail gracefully on GLIBC errors within torch-spline-conv (#5276)

Full Changelog


Full commit list: https://github.com/pyg-team/pytorch_geometric/compare/2.1.0...2.2.0
