@rusty1s
Last active August 17, 2022 10:18
PyG 2.1: Principled aggregations, link-level and temporal samplers, data pipe support, ...

We are excited to announce the release of PyG 2.1 🎉

PyG 2.1 is the culmination of work from over 60 contributors who have worked on features and bug-fixes for a total of over 320 commits since torch-geometric==2.0.4.

Highlights

Principled Aggregations

See here for the accompanying tutorial.

Aggregation functions play an important role in the message passing framework and the readout functions of Graph Neural Networks. Specifically, many works in the literature (Hamilton et al. (2017), Xu et al. (2018), Corso et al. (2020), Li et al. (2020), Tailor et al. (2021), Bartunov et al. (2022)) demonstrate that the choice of aggregation functions contributes significantly to the representational power and performance of the model.

To facilitate further experimentation and unify the concepts of aggregation within GNNs across both MessagePassing and global readouts, we have made the concept of Aggregation a first-class principle in PyG (#4379, #4522, #4687, #4721, #4731, #4762, #4749, #4779, #4863, #4864, #4865, #4866, #4872, #4927, #4934, #4935, #4957, #4973, #4973, #4986, #4995, #5000, #5021, #5034, #5036, #5039, #4522, #5033, #5085, #5097, #5099, #5104, #5113, #5130, #5098, #5191). As of now, PyG provides support for various aggregations — from simple ones (e.g., mean, max, sum), to advanced ones (e.g., median, var, std), learnable ones (e.g., SoftmaxAggregation, PowerMeanAggregation), and exotic ones (e.g., LSTMAggregation, SortAggregation, EquilibriumAggregation). Furthermore, multiple aggregations can be combined and stacked together:

from torch_geometric.nn import MessagePassing, SoftmaxAggregation

class MyConv(MessagePassing):
    def __init__(self):  # Further constructor arguments omitted for brevity.
        # Combines a set of aggregations and concatenates their results.
        # The interface also supports automatic resolution.
        super().__init__(aggr=['mean', 'std', SoftmaxAggregation(learn=True)])
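
To make the effect of passing a list of aggregations more concrete, the following dependency-free sketch mimics it in plain Python: messages are grouped by their target node, each aggregation reduces every group, and the per-aggregation results are concatenated per node. The helper names `aggregate` and `multi_aggregate` are hypothetical and not part of PyG. Scalar messages, a zero default for nodes without messages, and the population standard deviation are assumptions of this sketch, not necessarily the conventions PyG uses.

```python
from collections import defaultdict
from statistics import mean, pstdev


def aggregate(messages, index, num_nodes, reduce_fn):
    """Reduce all messages that share the same target node index."""
    groups = defaultdict(list)
    for msg, target in zip(messages, index):
        groups[target].append(msg)
    # Assumption of this sketch: nodes without incoming messages yield 0.0.
    return [reduce_fn(groups[i]) if groups[i] else 0.0 for i in range(num_nodes)]


def multi_aggregate(messages, index, num_nodes, reduce_fns):
    """Apply several aggregations and concatenate their results per node."""
    columns = [aggregate(messages, index, num_nodes, fn) for fn in reduce_fns]
    return [list(row) for row in zip(*columns)]


messages = [1.0, 3.0, 2.0, 4.0, 6.0]  # one scalar message per edge
index = [0, 0, 1, 1, 1]               # target node of each message
out = multi_aggregate(messages, index, num_nodes=3, reduce_fns=[mean, max, pstdev])
# `out` holds one row per node with one entry per aggregation.
```

Each node ends up with as many output entries as there are aggregations, which is why a layer that combines several aggregations by concatenation sees a correspondingly wider feature dimension after aggregation.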

Link-level Neighbor Loader

We added a new LinkNeighborLoader class for training scalable GNNs that perform edge-level predictions on giant graphs (#4396, #4439, #4441, #4446, #4508, #4509, #4868). LinkNeighborLoader comes with automatic support for both homogeneous and heterogeneous data, and supports link prediction via automatic negative sampling as well as edge-level classification and regression models:

from torch_geometric.loader import LinkNeighborLoader

loader = LinkNeighborLoader(
    data,
    num_neighbors=[30] * 2,  # Sample 30 neighbors for each node for 2 iterations
    batch_size=128,  # Use a batch size of 128 for sampling training links
    edge_label_index=data.edge_index,  # Use the entire graph for supervision
    negative_sampling_ratio=1.0,  # Sample negative edges
)

sampled_data = next(iter(loader))
print(sampled_data)
>>> Data(x=[1368, 1433], edge_index=[2, 3103], edge_label_index=[2, 256], edge_label=[256])
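
The 256 entries in edge_label_index and edge_label above are consistent with 128 positive links plus 128 sampled negatives (negative_sampling_ratio=1.0). The following plain-Python sketch illustrates that bookkeeping. The helper `sample_link_batch` is hypothetical and not part of PyG; rejecting candidates that collide with a positive edge of the batch is a design choice of this sketch and may differ from what the loader does internally.

```python
import random


def sample_link_batch(pos_edges, num_nodes, negative_sampling_ratio, rng):
    """Pair a batch of positive edges with randomly drawn negative edges."""
    existing = set(pos_edges)
    num_neg = int(len(pos_edges) * negative_sampling_ratio)
    neg_edges = []
    while len(neg_edges) < num_neg:
        candidate = (rng.randrange(num_nodes), rng.randrange(num_nodes))
        # Sketch-only choice: skip candidates that are positive edges in this batch.
        if candidate not in existing:
            neg_edges.append(candidate)
    # Positives come first and are labeled 1, negatives follow and are labeled 0.
    edge_label_index = list(pos_edges) + neg_edges
    edge_label = [1.0] * len(pos_edges) + [0.0] * num_neg
    return edge_label_index, edge_label


rng = random.Random(0)
pos_edges = [(i, i + 1) for i in range(128)]  # a batch of 128 supervision links
edge_label_index, edge_label = sample_link_batch(pos_edges, 1000, 1.0, rng)
```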

Neighborhood Sampling based on Temporal Constraints

Both NeighborLoader and LinkNeighborLoader now support temporal sampling via the time_attr argument (#4025, #4877, #4908, #5137, #5173). If set, temporal sampling will be used such that neighbors are guaranteed to fulfill temporal constraints, i.e. neighbors have an earlier timestamp than the center node:

import torch
from torch_geometric.loader import NeighborLoader

# `data` is a heterogeneous graph with a 'paper' node type.
data['paper'].time = torch.arange(data['paper'].num_nodes)

loader = NeighborLoader(
    data,
    input_nodes='paper',
    time_attr='time',  # Only sample papers that appeared before the seed paper
    num_neighbors=[30] * 2,
    batch_size=128,
)

Note that this feature requires torch-sparse>=0.6.14.
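
The temporal constraint can be illustrated with a small plain-Python sketch: before sampling, the candidate neighbors of a seed node are restricted to those with an earlier timestamp. The helper `sample_temporal_neighbors` is hypothetical and not part of PyG. The strict comparison follows the wording above; how the actual implementation treats equal timestamps is not covered by this sketch.

```python
import random


def sample_temporal_neighbors(seed, adjacency, time, num_neighbors, rng):
    """Sample up to `num_neighbors` neighbors that are older than `seed`."""
    # Keep only neighbors whose timestamp precedes the seed's timestamp.
    candidates = [n for n in adjacency.get(seed, []) if time[n] < time[seed]]
    if len(candidates) <= num_neighbors:
        return candidates
    return rng.sample(candidates, num_neighbors)


adjacency = {3: [0, 1, 2, 4, 5], 1: [0, 3]}  # node -> list of neighbors
time = [0, 1, 2, 3, 4, 5]                    # one timestamp per node
rng = random.Random(0)
neighbors_of_3 = sample_temporal_neighbors(3, adjacency, time, 2, rng)
neighbors_of_1 = sample_temporal_neighbors(1, adjacency, time, 2, rng)
```

Here, node 3 can only draw from nodes 0, 1 and 2, and node 1 can only draw from node 0, even though both have newer neighbors in the graph.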

Functional DataPipes

See here for the accompanying example.

PyG now fully supports data loading using the newly introduced concept of DataPipes in PyTorch for easily constructing flexible and performant data pipelines (#4302, #4345, #4349). PyG provides DataPipe support for batching multiple PyG data objects together and for applying any PyG transform:

# Assumes that the torchdata package is installed.
from torchdata.datapipes.iter import FileLister, FileOpener

# Example 1: Molecular graphs parsed from SMILES strings in a CSV file.
datapipe = FileOpener(['SMILES_HIV.csv'])
datapipe = datapipe.parse_csv_as_dict()
datapipe = datapipe.parse_smiles(target_key='HIV_active')
datapipe = datapipe.in_memory_cache()  # Cache graph instances in-memory.
datapipe = datapipe.shuffle()
datapipe = datapipe.batch_graphs(batch_size=32)

# Example 2: Point clouds sampled from the mesh files found below `root_dir`.
datapipe = FileLister([root_dir], masks='*.off', recursive=True)
datapipe = datapipe.read_mesh()
datapipe = datapipe.in_memory_cache()  # Cache graph instances in-memory.
datapipe = datapipe.sample_points(1024)  # Use PyG transforms from here.
datapipe = datapipe.knn_graph(k=8)
datapipe = datapipe.shuffle()
datapipe = datapipe.batch_graphs(batch_size=32)
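
The chained, functional style used above can be mimicked with a toy pipeline in plain Python, in which every call wraps the previous stage and returns a new iterable. The class `TinyPipe` and its methods are hypothetical stand-ins written for this illustration; they are not the torchdata or PyG API, and the full-materialization shuffle is a simplification made here.

```python
import random


class TinyPipe:
    """A toy, re-iterable pipeline stage with chainable functional methods."""

    def __init__(self, make_iter):
        self._make_iter = make_iter  # zero-argument callable returning an iterator

    def __iter__(self):
        return self._make_iter()

    def map(self, fn):
        # Lazily apply `fn` to every item of the previous stage.
        return TinyPipe(lambda: (fn(item) for item in self))

    def shuffle(self, seed=0):
        def make_iter():
            items = list(self)  # simplification: materialize everything first
            random.Random(seed).shuffle(items)
            return iter(items)
        return TinyPipe(make_iter)

    def batch(self, batch_size):
        def make_iter():
            batch = []
            for item in self:
                batch.append(item)
                if len(batch) == batch_size:
                    yield batch
                    batch = []
            if batch:  # emit the final, smaller batch
                yield batch
        return TinyPipe(make_iter)


pipe = TinyPipe(lambda: iter(range(10)))
pipe = pipe.map(lambda i: {'graph_id': i})
pipe = pipe.shuffle()
pipe = pipe.batch(batch_size=4)
batches = list(pipe)
```

Because each stage only wraps the one before it, stages can be reordered or swapped without touching the rest of the pipeline, which is the property the DataPipe examples above rely on.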

Breaking Changes

Deprecations

Features

Layers, Models and Examples

Transformations

Datasets

General Improvements

Bugfixes

  • Fixed a bug in RGATConv that produced device mismatches for "f-scaled" mode (#5187)
  • Fixed a bug in GINEConv for non-Sequential neural network layers (#5154)
  • Fixed a bug in HGTLoader which produced outputs with missing edge types; the fix requires torch-sparse>=0.6.15 (#5067)
  • Fixed a bug in load_state_dict for Linear with strict=False mode (#5094)
  • Fixed data.num_node_features computation for sparse matrices (#5089)
  • Fixed a bug in which GraphGym did not create new non-linearity functions but re-used existing ones (#4978)
  • Fixed BasicGNN for num_layers=1, which now respects a desired number of out_channels (#4943)
  • Fixed a bug in data.subgraph for 0-dim tensors (#4932)
  • Fixed a bug in InMemoryDataset inferring wrong length for lists of tensors (#4837)
  • Fixed a bug in TUDataset where pre_filter was not applied whenever pre_transform was present (#4842)
  • Fixed access of edge types in HeteroData via two node types when there exist multiple relations between them (#4782)
  • Fixed a bug in HANConv in which destination node features rather than source node features were propagated (#4753)
  • Fixed a ranking protocol bug in the RGCN link prediction example (#4688)
  • Fixed the interplay between TUDataset and pre_transform transformations that modify node features (#4669)
  • The bias argument in TAGConv is now correctly applied (#4597)
  • Fixed filtering of attributes in samplers in case __cat_dim__ != 0 (#4629)
  • Fixed SparseTensor support in NeighborLoader (#4320)
  • Fixed average degree handling in PNAConv (#4312)
  • Fixed a bug in from_networkx in case some attributes are PyTorch tensors (#4486)
  • Fixed a missing clamp in the DimeNet model (#4506, #4562)
  • Fixed the download link in DBP15K (#4428)
  • Fixed an autograd bug in DimeNet when resetting parameters (#4424)
  • Fixed bipartite message passing in case flow="target_to_source" (#4418)
  • Fixed a bug in which num_nodes was not properly updated in the FixedPoints transform (#4394)
  • Fixed a bug in which GATConv was not jittable (#4347)
  • Fixed a bug in which nn.models.GAT did not produce out_channels-many output channels (#4299)
  • Fixed a bug in mini-batching with empty lists as attributes (#4293)
  • Fixed a bug in which GCNConv could not be combined with to_hetero on heterogeneous graphs with one node type (#4279)

Full Changelog

Added
  • Added edge_label_time argument to LinkNeighborLoader (#5137, #5173)
  • Let ImbalancedSampler accept torch.Tensor as input (#5138)
  • Added flow argument to gcn_norm to correctly normalize the adjacency matrix in GCNConv (#5149)
  • NeighborSampler supports graphs without edges (#5072)
  • Added the MeanSubtractionNorm layer (#5068)
  • Added pyg_lib.segment_matmul integration within RGCNConv (#5052, #5096)
  • Support SparseTensor as edge label in LightGCN (#5046)
  • Added support for BasicGNN models within to_hetero (#5091)
  • Added support for computing weighted metapaths in AddMetaPaths (#5049)
  • Added inference benchmark suite (#4915)
  • Added a dynamically sized batch sampler for filling a mini-batch with a variable number of samples up to a maximum size (#4972)
  • Added fine grained options for setting bias and dropout per layer in the MLP model (#4981)
  • Added EdgeCNN model (#4991)
  • Added scalable inference mode in BasicGNN with layer-wise neighbor loading (#4977)
  • Added inference benchmarks (#4892, #5107)
  • Added PyTorch 1.12 support (#4975)
  • Added unbatch_edge_index functionality for splitting an edge_index tensor according to a batch vector (#4903)
  • Added node-wise normalization mode in LayerNorm (#4944)
  • Added support for normalization_resolver (#4926, #4951, #4958, #4959)
  • Added notebook tutorial for torch_geometric.nn.aggr package to documentation (#4927)
  • Added support for follow_batch for lists or dictionaries of tensors (#4837)
  • Added Data.validate() and HeteroData.validate() functionality (#4885)
  • Added LinkNeighborLoader support to LightningDataModule (#4868)
  • Added predict() support to the LightningNodeData module (#4884)
  • Added time_attr argument to LinkNeighborLoader (#4877, #4908)
  • Added a filter_per_worker argument to data loaders to allow filtering of data within sub-processes (#4873)
  • Added a NeighborLoader benchmark script (#4815, #4862)
  • Added support for FeatureStore and GraphStore in NeighborLoader (#4817, #4851, #4854, #4856, #4857, #4882, #4883, #4929, #4992, #4962, #4968, #5037, #5088)
  • Added a normalize parameter to dense_diff_pool (#4847)
  • Added size=None explanation to jittable MessagePassing modules in the documentation (#4850)
  • Added documentation to the DataLoaderIterator class (#4838)
  • Added GraphStore support to Data and HeteroData (#4816)
  • Added FeatureStore support to Data and HeteroData (#4807, #4853)
  • Added FeatureStore and GraphStore abstractions (#4534, #4568)
  • Added support for dense aggregations in global_*_pool (#4827)
  • Added Python version requirement (#4825)
  • Added TorchScript support to JumpingKnowledge module (#4805)
  • Added a max_sample argument to AddMetaPaths in order to tackle very dense metapath edges (#4750)
  • Test HANConv with empty tensors (#4756, #4841)
  • Added the bias vector to the GCN model definition in the "Create Message Passing Networks" tutorial (#4755)
  • Added transforms.RootedSubgraph interface with two implementations: RootedEgoNets and RootedRWSubgraph (#3926)
  • Added ptr vectors for follow_batch attributes within Batch.from_data_list (#4723)
  • Added torch_geometric.nn.aggr package (#4687, #4721, #4731, #4762, #4749, #4779, #4863, #4864, #4865, #4866, #4872, #4934, #4935, #4957, #4973, #4973, #4986, #4995, #5000, #5034, #5036, #5039, #4522, #5033, #5085, #5097, #5099, #5104, #5113, #5130, #5098, #5191)
  • Added the DimeNet++ model (#4432, #4699, #4700, #4800)
  • Added an example of using PyG with PyTorch Ignite (#4487)
  • Added GroupAddRev module with support for reducing training GPU memory (#4671, #4701, #4715, #4730)
  • Added benchmarks via wandb (#4656, #4672, #4676)
  • Added unbatch functionality (#4628)
  • Confirm that to_hetero() works with custom functions, e.g., dropout_adj (#4653)
  • Added the MLP.plain_last=False option (#4652)
  • Added a check in HeteroConv and to_hetero() to ensure that MessagePassing.add_self_loops is disabled (#4647)
  • Added HeteroData.subgraph(), HeteroData.node_type_subgraph() and HeteroData.edge_type_subgraph() support (#4635)
  • Added the AQSOL dataset (#4626)
  • Added HeteroData.node_items() and HeteroData.edge_items() functionality (#4644)
  • Added PyTorch Lightning support in GraphGym (#4511, #4516, #4531, #4689, #4843)
  • Added support for returning embeddings in MLP models (#4625)
  • Added faster initialization of NeighborLoader in case edge indices are already sorted (via is_sorted=True) (#4620, #4702)
  • Added AddPositionalEncoding transform (#4521)
  • Added HeteroData.is_undirected() support (#4604)
  • Added the Genius and Wiki datasets to datasets.LINKXDataset (#4570, #4600)
  • Added nn.aggr.EquilibriumAggregation implicit global layer (#4522)
  • Added support for graph-level outputs in to_hetero (#4582)
  • Added CHANGELOG.md (#4581)
  • Added HeteroData support to the RemoveIsolatedNodes transform (#4479)
  • Added HeteroData.num_features functionality (#4504)
  • Added support for projecting features before propagation in SAGEConv (#4437)
  • Added Geom-GCN splits to the Planetoid datasets (#4442)
  • Added a LinkNeighborLoader for training scalable link prediction models (#4396, #4439, #4441, #4446, #4508, #4509)
  • Added an unsupervised GraphSAGE example on PPI (#4416)
  • Added support for LSTM aggregation in SAGEConv (#4379)
  • Added support for floating-point labels in RandomLinkSplit (#4311, #4383)
  • Added support for torch.data DataPipes (#4302, #4345, #4349)
  • Added support for the cosine argument in the KNNGraph/RadiusGraph transforms (#4344)
  • Added support for graph-level attributes in networkx conversion (#4343)
  • Added support for renaming node types via HeteroData.rename (#4329)
  • Added an example to load a trained PyG model in C++ (#4307)
  • Added a MessagePassing.explain_message method to customize making explanations on messages (#4278, #4448)
  • Added support for GATv2Conv in the nn.models.GAT model (#4357)
  • Added HeteroData.subgraph functionality (#4243)
  • Added the MaskLabel module and a corresponding masked label propagation example (#4197)
  • Added temporal sampling support to NeighborLoader (#4025)
  • Added an example for unsupervised heterogeneous graph learning based on "Deep Multiplex Graph Infomax" (#3189)
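
As an illustration of the unbatch_edge_index entry above, the following plain-Python sketch splits a batched edge list into one edge list per graph according to a batch vector. The function `unbatch_edge_index_sketch` is a hypothetical re-implementation on nested lists, not the PyG function itself. It assumes that the batch vector is sorted, that an edge never connects two different graphs, and that node indices are shifted to start at 0 within each graph.

```python
def unbatch_edge_index_sketch(edge_index, batch):
    """Split a [2, num_edges] edge list into one edge list per graph."""
    num_graphs = max(batch) + 1 if batch else 0

    # The offset of a graph is the number of nodes in all preceding graphs.
    counts = [0] * num_graphs
    for graph_id in batch:
        counts[graph_id] += 1
    offsets = [sum(counts[:g]) for g in range(num_graphs)]

    result = [([], []) for _ in range(num_graphs)]
    for src, dst in zip(*edge_index):
        graph_id = batch[src]  # an edge belongs to the graph of its source node
        result[graph_id][0].append(src - offsets[graph_id])
        result[graph_id][1].append(dst - offsets[graph_id])
    return [[rows, cols] for rows, cols in result]


edge_index = [[0, 1, 1, 2, 2, 3, 4, 5, 5, 6],
              [1, 0, 2, 1, 3, 2, 5, 4, 6, 5]]
batch = [0, 0, 0, 0, 1, 1, 1]  # nodes 0-3 form graph 0, nodes 4-6 form graph 1
per_graph = unbatch_edge_index_sketch(edge_index, batch)
```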
Changed
  • Changed docstring for RandomLinkSplit (#5190)
  • Switched to PyTorch scatter_reduce implementation - experimental feature (#5120)
  • Fixed RGATConv device mismatches for f-scaled mode (#5187)
  • Allow for multi-dimensional edge_labels in LinkNeighborLoader (#5186)
  • Fixed GINEConv bug with non-sequential input (#5154)
  • Improved error message (#5095)
  • Fixed HGTLoader bug which produced outputs with missing edge types (#5067)
  • Fixed dynamic inheritance issue in data batching (#5051)
  • Fixed load_state_dict in Linear with strict=False mode (#5094)
  • Fixed typo in MaskLabel.ratio_mask (#5093)
  • Fixed data.num_node_features computation for sparse matrices (#5089)
  • Fixed torch.fx bug with torch_geometric.nn.aggr package (#5021)
  • Fixed GenConv test (#4993)
  • Fixed packaging tests for Python 3.10 (#4982)
  • Changed act_dict (part of graphgym) to create individual instances instead of reusing the same ones everywhere (#4978)
  • Fixed issue where one-hot tensors were passed to F.one_hot (#4970)
  • Fixed bool arguments in argparse in benchmark/ (#4967)
  • Fixed BasicGNN for num_layers=1, which now respects a desired number of out_channels (#4943)
  • len(batch) will now return the number of graphs inside the batch, not the number of attributes (#4931)
  • Fixed data.subgraph generation for 0-dim tensors (#4932)
  • Removed unnecessary inclusion of self-loops when sampling negative edges (#4880)
  • Fixed InMemoryDataset inferring wrong len for lists of tensors (#4837)
  • Fixed Batch.separate when using it for lists of tensors (#4837)
  • Corrected the docstring for SAGEConv (#4852)
  • Fixed a bug in TUDataset where pre_filter was not applied whenever pre_transform was present
  • Renamed RandomTranslate to RandomJitter - the usage of RandomTranslate is now deprecated (#4828)
  • Do not allow accessing edge types in HeteroData with two node types when there exist multiple relations between these types (#4782)
  • Allow edge_type == rev_edge_type argument in RandomLinkSplit (#4757)
  • Fixed a numerical instability in the GeneralConv and neighbor_sample tests (#4754)
  • Fixed a bug in HANConv in which destination node features rather than source node features were propagated (#4753)
  • Fixed versions of checkout and setup-python in CI (#4751)
  • Fixed protobuf version (#4719)
  • Fixed the ranking protocol bug in the RGCN link prediction example (#4688)
  • Math support in Markdown (#4683)
  • Allow for setter properties in Data (#4682, #4686)
  • Allow for optional edge_weight in GCN2Conv (#4670)
  • Fixed the interplay between TUDataset and pre_transform transformations that modify node features (#4669)
  • Make use of the pyg_sphinx_theme documentation template (#4664, #4667)
  • Refactored reading molecular positions from sdf file for qm9 datasets (#4654)
  • Fixed MLP.jittable() bug in case return_emb=True (#4645, #4648)
  • The generated node features of StochasticBlockModelDataset are now ordered with respect to their labels (#4617)
  • Fixed typos in the documentation (#4616, #4824, #4895, #5161)
  • The bias argument in TAGConv is now actually applied (#4597)
  • Fixed subclass behaviour of process and download in Dataset (#4586)
  • Fixed filtering of attributes for loaders in case __cat_dim__ != 0 (#4629)
  • Fixed SparseTensor support in NeighborLoader (#4320)
  • Fixed average degree handling in PNAConv (#4312)
  • Fixed a bug in from_networkx in case some attributes are PyTorch tensors (#4486)
  • Added a missing clamp in DimeNet (#4506, #4562)
  • Fixed the download link in DBP15K (#4428)
  • Fixed an autograd bug in DimeNet when resetting parameters (#4424)
  • Fixed bipartite message passing in case flow="target_to_source" (#4418)
  • Fixed a bug in which num_nodes was not properly updated in the FixedPoints transform (#4394)
  • PyTorch Lightning >= 1.6 support (#4377)
  • Fixed a bug in which GATConv was not jittable (#4347)
  • Fixed a bug in which the GraphGym config was not stored in each specific experiment directory (#4338)
  • Fixed a bug in which nn.models.GAT did not produce out_channels-many output channels (#4299)
  • Fixed mini-batching with empty lists as attributes (#4293)
  • Fixed a bug in which GCNConv could not be combined with to_hetero on heterogeneous graphs with one node type (#4279)
Removed
  • Remove internal metrics in favor of torchmetrics (#4287)

Full commit list: https://github.com/pyg-team/pytorch_geometric/compare/2.0.4...master
