We are excited to announce the release of PyG 2.2 🎉🎉🎉
PyG 2.2 is the culmination of work from 78 contributors who have worked on features and bug-fixes for a total of over 320 commits since `torch-geometric==2.1.0`.
We are proud to release and integrate `pyg-lib==0.1.0` into PyG, the first stable version of our new low-level Graph Neural Network library to drive all CPU and GPU acceleration needs of PyG (#5330, #5347, #5384, #5388).
You can install `pyg-lib` as described in our `README.md`:
pip install pyg-lib -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
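The `${TORCH}` and `${CUDA}` placeholders in the command above stand for your installed PyTorch version and compute platform. As a minimal sketch, the hypothetical helper below (not part of PyG) fills them in; the values `1.13.0` and `cu117` are assumed examples only, so substitute whatever matches your environment.

```python
def pyg_wheel_index(torch_version: str, cuda_tag: str) -> str:
    """Fill the ${TORCH} and ${CUDA} placeholders of the wheel index URL."""
    return f"https://data.pyg.org/whl/torch-{torch_version}+{cuda_tag}.html"


# Assumed example values: PyTorch 1.13.0 with a CUDA 11.7 build.
url = pyg_wheel_index("1.13.0", "cu117")
print(f"pip install pyg-lib -f {url}")
```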
Once `pyg-lib` is installed, it will get automatically picked up by PyG, e.g., to accelerate neighborhood sampling routines or to accelerate heterogeneous GNN execution:
- `pyg-lib` provides fast and optimized CPU routines to iteratively sample neighbors in homogeneous and heterogeneous graphs, and heavily improves upon the neighborhood sampling techniques previously used in PyG.
- `pyg-lib` provides efficient GPU-based routines to parallelize workloads in heterogeneous graphs across different node types and edge types. We achieve this by leveraging type-dependent transformations via NVIDIA CUTLASS integration, which is flexible enough to implement most heterogeneous GNNs and remains efficient even for sparse edge types or a large number of different node types.
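To make "iteratively sample neighbors" concrete, here is a pure-Python sketch of multi-hop sampling with a per-hop fan-out. It only illustrates the idea and is not how `pyg-lib` is implemented; the function name `sample_neighbors` and the adjacency-dict input are assumptions made for this example.

```python
import random
from typing import Dict, List, Set


def sample_neighbors(
    adj: Dict[int, List[int]],
    seeds: List[int],
    num_neighbors: List[int],
    rng: random.Random,
) -> List[Set[int]]:
    """Return, for each hop, the set of nodes reached for the first time.

    `adj[v]` lists the neighbors of node `v`, and `num_neighbors[i]` caps how
    many neighbors are drawn per frontier node in hop `i`.
    """
    visited: Set[int] = set(seeds)
    frontier = sorted(visited)
    hops: List[Set[int]] = []
    for fanout in num_neighbors:
        reached: Set[int] = set()
        for node in frontier:
            neighbors = adj.get(node, [])
            # Sample without replacement, at most `fanout` neighbors per node.
            reached.update(rng.sample(neighbors, min(fanout, len(neighbors))))
        new_nodes = reached - visited
        visited |= new_nodes
        hops.append(new_nodes)
        frontier = sorted(new_nodes)  # the next hop expands only new nodes
    return hops


# Toy graph (assumed data): node 0 is the seed.
adj = {0: [1, 2, 3], 1: [4, 5], 2: [5, 6], 3: [], 4: [0], 6: [7]}
hops = sample_neighbors(adj, seeds=[0], num_neighbors=[2, 1], rng=random.Random(0))
print(hops)
```

The cost of each hop is bounded by the fan-out rather than by node degree, which is what keeps mini-batch sizes predictable on large graphs.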
PyG 2.2 includes numerous primitives to easily integrate with simple paradigms for scalable graph machine learning, enabling users to train GNNs on graphs far larger than the size of their machine's available memory. It does so by introducing simple, easy-to-use, and extensible abstractions of a `FeatureStore` and a `GraphStore` that plug directly into existing familiar PyG interfaces (see here for the accompanying tutorial).
feature_store = CustomFeatureStore()
feature_store['paper', 'x', None] = ...  # Add paper features
feature_store['author', 'x', None] = ...  # Add author features

graph_store = CustomGraphStore()
graph_store['edge', 'coo'] = ...  # Add edges in "COO" format

# `CustomGraphSampler` knows how to sample on `CustomGraphStore`:
graph_sampler = CustomGraphSampler(
    graph_store=graph_store,
    num_neighbors=[10, 20],
    ...
)

from torch_geometric.loader import NodeLoader
loader = NodeLoader(
    data=(feature_store, graph_store),
    node_sampler=graph_sampler,
    batch_size=20,
    input_nodes='paper',
)

for batch in loader:
    pass
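The `Custom*` classes above are placeholders. As a rough, pure-Python sketch of what such stores might look like, the hypothetical `InMemoryFeatureStore` and `InMemoryGraphStore` below mimic the same indexing pattern. They do not subclass PyG's real base classes, and the key layout is an assumption modeled on the snippet above.

```python
from typing import Dict, List, Optional, Tuple

Rows = List[List[float]]
Coo = Tuple[List[int], List[int]]


class InMemoryFeatureStore:
    """Hypothetical store for node features, keyed by (group, attribute)."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], Rows] = {}

    def __setitem__(self, key: Tuple[str, str, None], value: Rows) -> None:
        group, attr, _ = key  # third slot is the row index; None means "all"
        self._data[(group, attr)] = value

    def __getitem__(self, key: Tuple[str, str, Optional[List[int]]]) -> Rows:
        group, attr, index = key
        rows = self._data[(group, attr)]
        if index is None:
            return rows
        return [rows[i] for i in index]  # fetch only the requested rows


class InMemoryGraphStore:
    """Hypothetical store for edges, keyed by (edge type, layout)."""

    def __init__(self) -> None:
        self._edges: Dict[str, Coo] = {}

    def __setitem__(self, key: Tuple[str, str], value: Coo) -> None:
        edge_type, layout = key
        if layout != "coo":
            raise ValueError("this sketch only supports the 'coo' layout")
        self._edges[edge_type] = value

    def __getitem__(self, key: Tuple[str, str]) -> Coo:
        edge_type, layout = key
        if layout != "coo":
            raise ValueError("this sketch only supports the 'coo' layout")
        return self._edges[edge_type]


feature_store = InMemoryFeatureStore()
feature_store["paper", "x", None] = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

graph_store = InMemoryGraphStore()
graph_store["cites", "coo"] = ([0, 1, 2], [1, 2, 0])

# A loader would request features only for the nodes a sampler returned:
batch_x = feature_store["paper", "x", [2, 0]]
print(batch_x)
```

Because features are fetched by index per mini-batch, a real backend could keep the full feature matrix on disk or in a remote database instead of in memory.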
Data loading and sampling routines are refactored and decomposed into `torch_geometric.loader` and `torch_geometric.sampler` modules, respectively (#5563, #5820, #5456, #5457, #5312, #5365, #5402, #5404, #5418).
PyG 2.2 further accelerates `scatter` aggregations based on CPU/GPU and with/without backward computation paths (requires `torch>=1.12.0` and `torch-scatter>=2.1.0`) (#5232, #5241, #5353, #5386, #5399, #6051, #6052).
We also optimized the usage of `nn.aggr.MultiAggregation` by fusing the computation of multiple aggregations together (see here for more details) (#6036, #6040).
Here are some benchmarking results on PyTorch 1.12 (summed over 1000 runs):
| Aggregators | Vanilla | Fusion |
|---|---|---|
| `[sum, mean]` | 0.3325s | 0.1996s |
| `[sum, mean, min, max]` | 0.7139s | 0.5037s |
| `[sum, mean, var]` | 0.6849s | 0.3871s |
| `[sum, mean, var, std]` | 1.0955s | 0.3973s |
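The intuition behind fusing aggregations is that several statistics can share intermediate results instead of each scanning the inputs again. The pure-Python sketch below illustrates this for `sum`, `mean`, `var`, and `std`. It is an assumption-laden illustration (hypothetical function name, biased variance), not the code path PyG actually uses.

```python
import math
from typing import Dict, List


def fused_aggregate(
    values: List[float], index: List[int], num_groups: int
) -> Dict[str, List[float]]:
    """Compute sum, mean, var and std per group from one pass over `values`."""
    total = [0.0] * num_groups
    total_sq = [0.0] * num_groups
    count = [0] * num_groups
    for v, i in zip(values, index):  # single scatter-style pass
        total[i] += v
        total_sq[i] += v * v
        count[i] += 1

    # Everything below reuses the three accumulators above.
    mean = [t / max(c, 1) for t, c in zip(total, count)]
    # Biased variance E[x^2] - E[x]^2, clamped at zero against round-off.
    var = [
        max(sq / max(c, 1) - m * m, 0.0)
        for sq, c, m in zip(total_sq, count, mean)
    ]
    std = [math.sqrt(v) for v in var]
    return {"sum": total, "mean": mean, "var": var, "std": std}


# Assumed toy data: four values in group 0, one value in group 1.
out = fused_aggregate([1.0, 2.0, 3.0, 4.0, 10.0], [0, 0, 0, 0, 1], num_groups=2)
print(out)
```

Adding `std` on top of `var` costs only a square root here, which is consistent with the small gap between the last two rows of the table.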
Lastly, we have incorporated "fused" GNN operators via the `dgNN` package, starting with a `FusedGATConv` implementation (#5140).
We are running regular community sprints to get our community more involved in building PyG. Whether you are just beginning to use graph learning or have been leveraging GNNs in research or production, the community sprints welcome members of all levels with different types of projects.
We had our first community sprint on 10/12 to fully incorporate type hints and TorchScript support over the entire code base. The goal was to improve usability and cleanliness of our codebase. 20 contributors participated, contributing to 120 type hints within 2 weeks and adding around 2400 lines of code (#5842, #5603, #5659, #5664, #5665, #5666, #5667, #5668, #5669, #5673, #5675, #5678, #5682, #5683, #5684, #5685, #5687, #5688, #5695, #5699, #5701, #5702, #5703, #5706, #5707, #5710, #5714, #5715, #5716, #5722, #5724, #5725, #5726, #5729, #5730, #5731, #5732, #5733, #5743, #5734, #5735, #5736, #5737, #5738, #5747, #5752, #5753, #5754, #5756, #5757, #5758, #5760, #5766, #5767, #5768, #5781, #5778, #5797, #5798, #5799, #5800, #5806, #5810, #5811, #5828, #5847, #5851, #5852).
Our second community sprint began on 11/15 with the goal of improving the explainability capabilities of PyG. With this, we introduce the `torch_geometric.explain` module to provide a unified set of tools to explain the predictions of a PyG model or to explain the underlying phenomenon of a dataset.
Some of the features developed in the sprint are incorporated into this release:
- Added the `torch_geometric.explain` module (#5804, #6054, #6089)
- Moved and adapted the `GNNExplainer` module to `torch_geometric.explain` (#5967, #6065). See here and here for the accompanying examples.
- Extended `GNNExplainer` to support edge level explanations (#6056)
- Added explainability support for heterogeneous GNNs via `to_captum_model` and `to_captum_input` (#5886, #5934)
data = HeteroData(...)
model = HeteroGNN(...)

# Explain predictions on heterogeneous graphs for output node 10:
captum_model = to_captum_model(model, mask_type, output_idx, metadata)
inputs, additional_forward_args = to_captum_input(data.x_dict, data.edge_index_dict, mask_type)

ig = IntegratedGradients(captum_model)
ig_attr = ig.attribute(
    inputs=inputs,
    target=int(y[output_idx]),
    additional_forward_args=additional_forward_args,
    internal_batch_size=1,
)
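For readers unfamiliar with integrated gradients, the method attributes a prediction to each input by accumulating gradients along a straight path from a baseline to the actual input. The sketch below shows the idea in pure Python under simplifying assumptions: a toy scalar function stands in for the model, and central finite differences replace autograd. It is not Captum's implementation.

```python
from typing import Callable, List


def integrated_gradients(
    f: Callable[[List[float]], float],
    x: List[float],
    baseline: List[float],
    steps: int = 50,
    eps: float = 1e-5,
) -> List[float]:
    """Approximate integrated gradients of `f` at `x` with a midpoint rule."""
    attributions = [0.0] * len(x)
    for s in range(steps):
        alpha = (s + 0.5) / steps  # midpoint of the s-th path segment
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for j in range(len(x)):
            up = list(point)
            up[j] += eps
            down = list(point)
            down[j] -= eps
            grad_j = (f(up) - f(down)) / (2 * eps)  # finite-difference gradient
            attributions[j] += grad_j * (x[j] - baseline[j]) / steps
    return attributions


def toy_model(v: List[float]) -> float:
    # Hypothetical stand-in for a model output: one interaction, one linear term.
    return v[0] * v[1] + 2.0 * v[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attr = integrated_gradients(toy_model, x, baseline)
print(attr)
```

A useful sanity check is completeness: the attributions should sum to approximately `toy_model(x) - toy_model(baseline)`.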
- Renamed `drop_unconnected_nodes` to `drop_unconnected_node_types` and `drop_orig_edges` to `drop_orig_edge_types` in `AddMetaPaths` (#5490)
- The usage of `nn.models.GNNExplainer` is now deprecated in favor of `explain.GNNExplainer`
- The usage of `utils.dropout_adj` is now deprecated in favor of `utils.dropout_edge`
- The usage of `loader.RandomNodeSampler` is now deprecated in favor of `loader.RandomNodeLoader`
- The usage of `to_captum` is now deprecated in favor of `to_captum_model`
- Added a "Link Prediction on MovieLens" Colab notebook (#5823)
- Added a bipartite link-prediction example (#5834)
- Added the `SSGConv` layer (#5599)
- Added the `WLConvContinuous` layer for performing WL-refinement with continuous attributes (#5316)
- Added the `PositionalEncoding` module (#5381)
- Added a node classification example instrumented with Weights and Biases (#5192)
- Added support for triplet sampling in `LinkNeighborLoader` (#6004)
- Added a `temporal_strategy = uniform/last` option to `NeighborLoader` and `LinkNeighborLoader` (#5576)
- Added a `disjoint` option to `NeighborLoader` and `LinkNeighborLoader` (#5717, #5775)
- Added `HeteroData` support in `RandomNodeLoader` (#6007)
- Added `int32`-based `edge_index` support in `NeighborLoader` (#5948)
- Added support for `input_time` in `NeighborLoader` (#5763)
- Added `np.memmap` support in `NeighborLoader` (#5696)
- Added CPU affinitization support to `NeighborLoader` (#6005)
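To illustrate the difference between the `uniform` and `last` temporal strategies listed above, here is a pure-Python sketch under assumed semantics: only neighbors whose timestamp does not exceed the seed's time are eligible, `last` keeps the most recent ones, and `uniform` draws at random among the eligible ones. The function name and the data layout are hypothetical, and PyG's actual behavior may differ in its details.

```python
import random
from typing import List, Tuple


def sample_temporal_neighbors(
    neighbors: List[Tuple[int, float]],
    seed_time: float,
    num_neighbors: int,
    strategy: str,
    rng: random.Random,
) -> List[int]:
    """Pick up to `num_neighbors` neighbor ids with timestamp <= `seed_time`.

    `neighbors` holds (node id, timestamp) pairs.
    """
    if num_neighbors <= 0:
        return []
    # Temporal constraint: never look into the future of the seed node.
    valid = [(n, t) for n, t in neighbors if t <= seed_time]
    if strategy == "last":
        valid.sort(key=lambda pair: pair[1])
        chosen = valid[-num_neighbors:]  # most recent eligible neighbors
    elif strategy == "uniform":
        chosen = rng.sample(valid, min(num_neighbors, len(valid)))
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [n for n, _ in chosen]


# Assumed toy data: neighbor 13 lies in the future of the seed (time 3.0).
neighbors = [(10, 1.0), (11, 2.0), (12, 3.0), (13, 5.0)]
rng = random.Random(0)
latest = sample_temporal_neighbors(neighbors, 3.0, 2, "last", rng)
uniform = sample_temporal_neighbors(neighbors, 3.0, 2, "uniform", rng)
print(latest, uniform)
```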
- Added a `FeaturePropagation` transform (#5387)
- Added `IndexToMask` and `MaskToIndex` transforms (#5375, #5455)
- Added `shuffle_node`, `mask_feature` and `add_random_edge` augmentations (#5548)
- Added `dropout_node`, `dropout_edge` and `dropout_path` augmentations (#5481, #5495, #5531)
- Added an `AddRandomMetaPaths` transform that adds edges based on random walks along a metapath (#5397)
- Added a `utils.to_smiles` function (#6038)
- Added `HeteroData` support for `transforms.Constant` (#5700)
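As an illustration of what an edge-dropout augmentation does, the pure-Python sketch below removes each edge independently with probability `p` and reports which edges survived. The helper `drop_edges` is hypothetical and operates on plain lists; it is meant to convey the idea rather than reproduce `dropout_edge` itself.

```python
import random
from typing import List, Tuple

Coo = Tuple[List[int], List[int]]


def drop_edges(edge_index: Coo, p: float, rng: random.Random) -> Tuple[Coo, List[bool]]:
    """Drop each edge independently with probability `p`.

    Returns the retained edges in COO format and a boolean mask over the
    original edges (True means the edge was kept).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"dropout probability must be in [0, 1], got {p}")
    row, col = edge_index
    edge_mask = [rng.random() >= p for _ in row]
    kept_row = [r for r, keep in zip(row, edge_mask) if keep]
    kept_col = [c for c, keep in zip(col, edge_mask) if keep]
    return (kept_row, kept_col), edge_mask


# Assumed toy graph with four directed edges.
edge_index = ([0, 1, 2, 3], [1, 2, 3, 0])
kept, mask = drop_edges(edge_index, p=0.5, rng=random.Random(0))
print(kept, mask)
```

Returning the mask alongside the edges makes it possible to filter edge attributes consistently with the edges that were retained.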
- Added the `LRGBDataset` to include 5 datasets from the Long Range Graph Benchmark (#5935)
- Added the `HydroNet` water cluster dataset (#5537, #5902, #5903)
- Added the `DGraphFin` dynamic graph dataset (#5504)
- Added the official splits to the `MalNetTiny` dataset (#5078)
- Added a `print_summary` method for the `torch_geometric.data.Dataset` interface (#5438)
- Added training and inference benchmark scripts (#5774, #5830, #5878, #5293, #5341, #5242, #5258, #5881, #5254)
- Added the `utils.assortativity` function to compute the degree assortativity coefficient (#5587)
- Added support for filling labels with dummy values in `HeteroData.to_homogeneous()` (#5540)
- Added `torch.onnx.export` support (#5877, #5997)
- Added an option to make normalization coefficients trainable in `PNAConv` (#6039)
- Added a `semi_grad` option in `VarAggregation` and `StdAggregation` (#6042)
- Added a warning for invalid node and edge type names in `HeteroData` (#5990)
- Added `lr_scheduler_solver` and customized `lr_scheduler` classes (#5942)
- Added a `to_fixed_size` graph transformer (#5939)
- Added support for symbolic tracing in the `SchNet` model (#5938)
- Added support for customizing the interaction graph in the `SchNet` model (#5919)
- Added `SparseTensor` support to `SuperGATConv` (#5888)
- Added TorchScript support for `AttentiveFP` (#5868)
- Added a `return_semantic_attention_weights` argument to `HANConv` (#5787)
- Added temperature value customization in `dense_mincut_pool` (#5908)
- Added support for a tuple of `in_channels` in `GENConv` for bipartite message passing (#5627, #5641)
- Added an `Aggregation.set_validate_args` option to skip validation of `dim_size` (#5290)
- Added `BaseStorage.get()` functionality (#5240)
- Added support for batches of size one in `BatchNorm` (#5530, #5614)
- The `AttentionalAggregation` module can now be applied to compute attention on a per-feature level (#5449)
- Added TorchScript support to `ASAPooling` (#5395)
- Updated the unsupervised `GraphSAGE` example to leverage `LinkNeighborLoader` (#5317)
- Added a better out-of-bounds error message in `MessagePassing` (#5339)
- Added support to customize the activation function in `PNAConv` (#5262)
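For context on the degree assortativity coefficient mentioned in the list above: it is the Pearson correlation between the degrees of the two endpoints of each edge, so positive values mean high-degree nodes tend to connect to other high-degree nodes. The pure-Python sketch below computes it for an undirected graph. The function name and the edge-list input are assumptions for illustration, not the signature of `utils.assortativity`.

```python
import math
from typing import Dict, List, Tuple


def degree_assortativity(edges: List[Tuple[int, int]]) -> float:
    """Pearson correlation of endpoint degrees over all edges.

    `edges` must contain every undirected edge in both directions.
    """
    degree: Dict[int, int] = {}
    for src, _ in edges:
        degree[src] = degree.get(src, 0) + 1

    xs = [degree[src] for src, _ in edges]
    ys = [degree[dst] for _, dst in edges]
    n = len(edges)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("undefined when all endpoint degrees are equal")
    return cov / math.sqrt(var_x * var_y)


# Star graph: the hub (node 0) only connects to degree-1 leaves.
star = [(0, 1), (1, 0), (0, 2), (2, 0), (0, 3), (3, 0)]
# Path graph 0 - 1 - 2 - 3.
path = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
r_star = degree_assortativity(star)
r_path = degree_assortativity(path)
print(r_star, r_path)
```

The star graph is maximally disassortative because every edge joins the highest-degree node to a lowest-degree node.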
- Fixed a bug in `TUDataset`, in which node features were wrongly constructed whenever `node_attributes` only holds a single feature (e.g., in `PROTEINS`) (#5441)
- Fixed a bug in the `VirtualNode` transform, in which node features were mistakenly treated as edge features (#5819)
- Fixed a bug when applying several scalers with `PNAConv` (#5514)
- Fixed `setter` and `getter` handling in `BaseStorage` (#5815)
- Fixed the `auto_select_device` routine in GraphGym for `pytorch_lightning>=1.7` (#5677)
- Fixed `RandomLinkSplit` in case there aren't enough negative edges to sample (#5642)
- Fixed the in-place modification to `mode_kwargs` in `MultiAggregation` (#5601)
- Fixed the `utils.to_dense_adj` routine in case `edge_index` is empty (#5476)
- Fixed the `PointTransformerConv` to now correctly use `sum` aggregation (#5332)
- Fixed the output of `Dataset.num_classes` in case a `transform` modifies `data.y` (#5274)
- Fail gracefully on `GLIBC` errors within `torch-spline-conv` (#5276)
Full commit list: https://github.com/pyg-team/pytorch_geometric/compare/2.1.0...2.2.0