Graph Transformers to the MAX: NOTES

RPI (Rensselaer Polytechnic Institute) - the oldest technological research university in the English-speaking world.

Hosts the most powerful private-university supercomputer (AiMOS).

Graph learning.

Primer on transformers

Query, Key, Value

Pairwise similarity

$$ q_i = W_Q x_i $$

$$ k_i = W_K x_i $$

$$ v_i = W_V x_i $$

$$ A = \mathrm{softmax}\!\left(\frac{Q K^T}{\sqrt{d_k}}\right) $$

$$ X' = A V $$

Multiple projections are used in parallel; each projection is a head (multi-head attention).
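
A minimal NumPy sketch of the single-head attention above; the shapes, the row-vector convention, and the random inputs are illustrative assumptions, not from the talk:

```python
# Sketch: single-head scaled dot-product self-attention (row-vector form).
import numpy as np

def self_attention(X, W_Q, W_K, W_V):
    """X: (n, d_model) node/token features; W_*: (d_model, d_k) projections."""
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V        # q_i = W_Q x_i, etc.
    d_k = K.shape[-1]
    S = Q @ K.T / np.sqrt(d_k)                  # pairwise similarity scores
    S = S - S.max(axis=-1, keepdims=True)       # numerical stability
    P = np.exp(S)
    A = P / P.sum(axis=-1, keepdims=True)       # row-wise softmax
    return A @ V                                # X' = A V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                     # 5 tokens, d_model = 8
W_Q, W_K, W_V = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, W_Q, W_K, W_V).shape)   # (5, 4)
```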

Learning graphs:

  • learn node representations
  • update a node's embedding based on its neighbours
  • homophily: similar nodes keep similar company
  • node and edge embeddings
  • node-, edge-, and graph-level prediction - e.g. graph regression predicts numeric properties of a whole graph

Usage:

  • Protein contacts
  • Link prediction in social networks

Early work:

  • GCN (see the message-passing sketch after this list)
  • transformers - self-attention gives direct interaction between any pair of nodes; unlike GCN, there is no restriction on node distance
  • Linearizing attention in transformers
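
As an illustration of the neighbour-based update, a minimal sketch of one GCN-style layer; mean aggregation over neighbours is a simplification here (the original GCN uses symmetric degree normalization):

```python
# Sketch: one GCN-style layer. Each node's embedding is updated by
# averaging its neighbours' (and its own) embeddings, then projecting.
import numpy as np

def gcn_layer(A, X, W):
    """A: (n, n) adjacency; X: (n, d_in) node features; W: (d_in, d_out)."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)     # degrees with self-loops
    H = (A_hat / deg) @ X                      # mean over the neighbourhood
    return np.maximum(H @ W, 0.0)              # projection + ReLU
```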

Third order interaction

  • geometric data
  • interactions among triples of nodes, not just pairs
  • third order: angles, area of a triangle
  • fourth order: dihedral angles, volume of a tetrahedron

Crucial for 3D geometry prediction.
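
As a concrete example of a third-order quantity, the angle at node $j$ in a triple $(i, j, k)$ follows from the node positions (standard geometry, not a formula from the talk):

$$ \cos\theta_{ijk} = \frac{(x_i - x_j) \cdot (x_k - x_j)}{\lVert x_i - x_j \rVert \, \lVert x_k - x_j \rVert} $$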

Edge-augmented Graph Transformer (EGT)

$$ A_{att} = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}} + E\right) \odot \sigma(G) $$

In attention, every pairwise interaction is computed anyway.

Learn edge representations and add them to the attention logits as a bias ($E$); gate the channels with $\sigma(G)$, similar to LSTM gating. A sketch follows below.
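
A sketch of the gated, edge-biased attention above; treating $E$ and $G$ as precomputed per-pair scalars for a single head is my simplifying assumption:

```python
# Sketch: edge-augmented attention. Learned edge features contribute an
# additive bias E to the logits and a sigmoid gate G on the weights.
import numpy as np

def softmax_rows(S):
    S = S - S.max(axis=-1, keepdims=True)
    P = np.exp(S)
    return P / P.sum(axis=-1, keepdims=True)

def egt_attention(Q, K, V, E, G):
    """Q, K, V: (n, d_k); E, G: (n, n) edge-derived bias and gate logits."""
    d_k = K.shape[-1]
    A = softmax_rows(Q @ K.T / np.sqrt(d_k) + E)   # bias the logits
    A = A * (1.0 / (1.0 + np.exp(-G)))             # gate channels, sigmoid(G)
    return A @ V
```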

Node degree matters in graphs, so degree scaling is applied.

  • SVD positional encoding
  • Laplacian positional encoding (sketch below)
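
A sketch of Laplacian positional encodings, using the first $k$ non-trivial eigenvectors of the symmetric normalized Laplacian; the normalization choice is my assumption, and the sign ambiguity of eigenvectors is glossed over here:

```python
# Sketch: Laplacian positional encodings. Eigenvectors of the graph
# Laplacian act as node "positions", analogous to sinusoidal encodings
# for sequences. Note: eigenvector signs are arbitrary in practice.
import numpy as np

def laplacian_pe(A, k):
    """A: (n, n) symmetric adjacency; returns (n, k) encodings."""
    deg = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    L = np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt
    _, vecs = np.linalg.eigh(L)                 # eigenvalues ascending
    return vecs[:, 1:k + 1]                     # drop the trivial eigenvector
```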

OGB dataset - predict the HOMO-LUMO gap.

Many combinations of variables have to be tried.

Vision transformer - chop images into patches and treat them like text. Requires massive scaling of compute. Lesson from this: push ideas to the limit - and know when to stop.

EGT - start with original edges and learn new edges.

Triplet Graph Transformer (TGT)

Extends EGT: node channels hold node representations, edge channels hold pair representations. Adds attention between the pairs themselves, which removes the need to route through an intermediate node.

E.g. for 3 nodes i, j, k. (See slide photos.)

Both inward and outward directions - generalizes to all three-way interactions.

Edges pay attention to each other
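
A rough sketch of edge-to-edge (triplet) attention, in which the pair embedding for $(i, j)$ attends over pairs $(i, k)$ sharing an endpoint; the exact TGT parameterization differs, and showing only one direction is a simplifying assumption:

```python
# Rough sketch: triplet attention. Pair embedding e[i, j] attends over
# pairs e[i, k] sharing anchor node i ("outward"); TGT also uses the
# inward direction, covering all three-way interactions.
import numpy as np

def triplet_attention(E, W_Q, W_K, W_V):
    """E: (n, n, d) pair embeddings; W_*: (d, d_k) projections."""
    Q, K, V = E @ W_Q, E @ W_K, E @ W_V          # (n, n, d_k)
    d_k = K.shape[-1]
    out = np.empty_like(V)
    for i in range(E.shape[0]):                  # fix the shared endpoint i
        S = Q[i] @ K[i].T / np.sqrt(d_k)         # (i, j) attends over (i, k)
        S = S - S.max(axis=-1, keepdims=True)
        P = np.exp(S)
        out[i] = (P / P.sum(axis=-1, keepdims=True)) @ V[i]
    return out
```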

Open Catalyst Challenge

Directed Graph Transformers

Look it up - a possible use case for directed edges: triangle counting?

Personalized medicine - use medical guidelines; combine mining ⛏️ and learning.

Food recommendations and suggesting recipes

Search for EGT and TGT on GitHub.

Weisfeiler-Lehman (WL) tests - isomorphism tests; stronger variants distinguish more non-isomorphic (sub)graphs. Extending this line of work to graphs of large sizes is difficult.
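
A minimal sketch of 1-WL colour refinement, the base version of the test (a standard algorithm, not code from the talk):

```python
# Sketch: 1-WL colour refinement. Each round, a node's new colour is a
# relabelling of its colour plus the multiset of its neighbours' colours;
# two graphs whose colour histograms differ are certainly non-isomorphic.
def wl_refine(adj, rounds=3):
    """adj: dict mapping node -> list of neighbours; returns colour map."""
    colours = {v: 0 for v in adj}                       # uniform start
    for _ in range(rounds):
        sigs = {v: (colours[v], tuple(sorted(colours[u] for u in adj[v])))
                for v in adj}
        palette = {s: c for c, s in enumerate(sorted(set(sigs.values())))}
        colours = {v: palette[sigs[v]] for v in adj}    # compact relabel
    return colours
```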

  • BERT - looks at past and future; masked language modelling.
  • GPT - looks only at the past; next-word prediction.
  • RLHF - reinforcement learning from human feedback - human alignment.
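
A sketch of the masking difference between the two: BERT attends bidirectionally, while GPT applies a causal mask so position $i$ only sees positions $\le i$ (shapes are illustrative):

```python
# Sketch: bidirectional (BERT-style) vs. causal (GPT-style) attention.
import numpy as np

def attention_weights(scores, causal=False):
    """scores: (n, n) raw attention logits; returns row-softmax weights."""
    if causal:
        n = scores.shape[0]
        future = np.triu(np.ones((n, n), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)   # block future positions
    scores = scores - scores.max(axis=-1, keepdims=True)
    P = np.exp(scores)
    return P / P.sum(axis=-1, keepdims=True)
```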
