Graph Transformers to the MAX: NOTES

RPI (Rensselaer Polytechnic Institute) - the oldest technological research university in the English-speaking world.

Hosts the most powerful private-university supercomputer (AiMOS).

Graph learning.

Primer on transformers

Query, Key, Value

Pairwise similarity

$$ q_i = W_Q x_i $$

$$ k_i = W_K x_i $$

$$ v_i = W_V x_i $$

$$ A = \mathrm{softmax}\!\left(\frac{Q K^T}{\sqrt{d_k}}\right) $$

$$ X' = A V $$

Multiple projections are used in parallel; each projection is a head (multi-head attention).
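
A minimal NumPy sketch of the single-head attention above; the shapes, the row-vector convention, and the random inputs are illustrative assumptions, not from the talk:

```python
# Sketch: single-head scaled dot-product self-attention (row-vector form).
import numpy as np

def self_attention(X, W_Q, W_K, W_V):
    """X: (n, d_model) node/token features; W_*: (d_model, d_k) projections."""
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V        # q_i = W_Q x_i, etc.
    d_k = K.shape[-1]
    S = Q @ K.T / np.sqrt(d_k)                  # pairwise similarity scores
    S = S - S.max(axis=-1, keepdims=True)       # numerical stability
    P = np.exp(S)
    A = P / P.sum(axis=-1, keepdims=True)       # row-wise softmax
    return A @ V                                # X' = A V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                     # 5 tokens, d_model = 8
W_Q, W_K, W_V = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, W_Q, W_K, W_V).shape)   # (5, 4)
```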

Learning graphs:

  • learn node representations
  • update a node's embedding based on its neighbours
  • homophily: similar nodes keep similar company
  • node and edge embeddings
  • node-, edge-, and graph-level prediction - e.g. graph regression predicts numeric properties of a whole graph

Usage:

  • Protein contacts
  • Link prediction in social networks

Early work:

  • GCN (see the message-passing sketch after this list)
  • transformers - self-attention gives direct interaction between any pair of nodes; unlike GCN, there is no restriction on node distance
  • Linearizing attention in transformers
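
As an illustration of the neighbour-based update, a minimal sketch of one GCN-style layer; mean aggregation over neighbours is a simplification here (the original GCN uses symmetric degree normalization):

```python
# Sketch: one GCN-style layer. Each node's embedding is updated by
# averaging its neighbours' (and its own) embeddings, then projecting.
import numpy as np

def gcn_layer(A, X, W):
    """A: (n, n) adjacency; X: (n, d_in) node features; W: (d_in, d_out)."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)     # degrees with self-loops
    H = (A_hat / deg) @ X                      # mean over the neighbourhood
    return np.maximum(H @ W, 0.0)              # projection + ReLU
```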

Third order interaction

  • geometric data
  • interactions among triples of nodes, not just pairs
  • third order: angles, area of a triangle
  • fourth order: dihedral angles, volume of a tetrahedron

Crucial for 3D geometry prediction.
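
As a concrete example of a third-order quantity, the angle at node $j$ in a triple $(i, j, k)$ follows from the node positions (standard geometry, not a formula from the talk):

$$ \cos\theta_{ijk} = \frac{(x_i - x_j) \cdot (x_k - x_j)}{\lVert x_i - x_j \rVert \, \lVert x_k - x_j \rVert} $$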

Edge-augmented Graph Transformer (EGT)

$$ A_{att} = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}} + E\right) \odot \sigma(G) $$

In attention, every pairwise interaction is computed anyway.

Learn edge representations and add them to the attention logits as a bias ($E$); gate the channels with $\sigma(G)$, similar to LSTM gating. A sketch follows below.
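
A sketch of the gated, edge-biased attention above; treating $E$ and $G$ as precomputed per-pair scalars for a single head is my simplifying assumption:

```python
# Sketch: edge-augmented attention. Learned edge features contribute an
# additive bias E to the logits and a sigmoid gate G on the weights.
import numpy as np

def softmax_rows(S):
    S = S - S.max(axis=-1, keepdims=True)
    P = np.exp(S)
    return P / P.sum(axis=-1, keepdims=True)

def egt_attention(Q, K, V, E, G):
    """Q, K, V: (n, d_k); E, G: (n, n) edge-derived bias and gate logits."""
    d_k = K.shape[-1]
    A = softmax_rows(Q @ K.T / np.sqrt(d_k) + E)   # bias the logits
    A = A * (1.0 / (1.0 + np.exp(-G)))             # gate channels, sigmoid(G)
    return A @ V
```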

Node degree matters in graphs, so degree scaling is applied.

  • SVD positional encoding
  • Laplacian positional encoding (sketch below)
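
A sketch of Laplacian positional encodings, using the first $k$ non-trivial eigenvectors of the symmetric normalized Laplacian; the normalization choice is my assumption, and the sign ambiguity of eigenvectors is glossed over here:

```python
# Sketch: Laplacian positional encodings. Eigenvectors of the graph
# Laplacian act as node "positions", analogous to sinusoidal encodings
# for sequences. Note: eigenvector signs are arbitrary in practice.
import numpy as np

def laplacian_pe(A, k):
    """A: (n, n) symmetric adjacency; returns (n, k) encodings."""
    deg = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    L = np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt
    _, vecs = np.linalg.eigh(L)                 # eigenvalues ascending
    return vecs[:, 1:k + 1]                     # drop the trivial eigenvector
```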

OGB dataset - predict the HOMO-LUMO gap.

Many combinations of variables have to be tried.

Vision transformer - chop images into patches and treat them like text. Requires massive scaling of compute. Lesson from this: push ideas to the limit - and know when to stop.

EGT - start with original edges and learn new edges.

Triplet Graph Transformer (TGT)

Extends EGT: node channels hold node representations, edge channels hold pair representations. Adds attention between the pairs themselves, which removes the need to route through an intermediate node.

E.g. for 3 nodes i, j, k. (See slide photos.)

Both inward and outward directions - generalizes to all three-way interactions.

Edges pay attention to each other
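
A rough sketch of edge-to-edge (triplet) attention, in which the pair embedding for $(i, j)$ attends over pairs $(i, k)$ sharing an endpoint; the exact TGT parameterization differs, and showing only one direction is a simplifying assumption:

```python
# Rough sketch: triplet attention. Pair embedding e[i, j] attends over
# pairs e[i, k] sharing anchor node i ("outward"); TGT also uses the
# inward direction, covering all three-way interactions.
import numpy as np

def triplet_attention(E, W_Q, W_K, W_V):
    """E: (n, n, d) pair embeddings; W_*: (d, d_k) projections."""
    Q, K, V = E @ W_Q, E @ W_K, E @ W_V          # (n, n, d_k)
    d_k = K.shape[-1]
    out = np.empty_like(V)
    for i in range(E.shape[0]):                  # fix the shared endpoint i
        S = Q[i] @ K[i].T / np.sqrt(d_k)         # (i, j) attends over (i, k)
        S = S - S.max(axis=-1, keepdims=True)
        P = np.exp(S)
        out[i] = (P / P.sum(axis=-1, keepdims=True)) @ V[i]
    return out
```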

Open Catalyst Challenge

Directed Graph Transformers

Look it up - a possible use case for directed edges: triangle counting?

Personalized medicine - use medical guidelines; combine mining ⛏️ and learning.

Food recommendations and suggesting recipes

Search for EGT and TGT on GitHub.

Weisfeiler-Lehman (WL) tests - isomorphism tests; stronger variants distinguish more non-isomorphic (sub)graphs. Extending this line of work to graphs of large sizes is difficult.
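
A minimal sketch of 1-WL colour refinement, the base version of the test (a standard algorithm, not code from the talk):

```python
# Sketch: 1-WL colour refinement. Each round, a node's new colour is a
# relabelling of its colour plus the multiset of its neighbours' colours;
# two graphs whose colour histograms differ are certainly non-isomorphic.
def wl_refine(adj, rounds=3):
    """adj: dict mapping node -> list of neighbours; returns colour map."""
    colours = {v: 0 for v in adj}                       # uniform start
    for _ in range(rounds):
        sigs = {v: (colours[v], tuple(sorted(colours[u] for u in adj[v])))
                for v in adj}
        palette = {s: c for c, s in enumerate(sorted(set(sigs.values())))}
        colours = {v: palette[sigs[v]] for v in adj}    # compact relabel
    return colours
```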

  • BERT - looks at past and future; masked language modelling.
  • GPT - looks only at the past; next-word prediction.
  • RLHF - reinforcement learning from human feedback - human alignment.
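
A sketch of the masking difference between the two: BERT attends bidirectionally, while GPT applies a causal mask so position $i$ only sees positions $\le i$ (shapes are illustrative):

```python
# Sketch: bidirectional (BERT-style) vs. causal (GPT-style) attention.
import numpy as np

def attention_weights(scores, causal=False):
    """scores: (n, n) raw attention logits; returns row-softmax weights."""
    if causal:
        n = scores.shape[0]
        future = np.triu(np.ones((n, n), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)   # block future positions
    scores = scores - scores.max(axis=-1, keepdims=True)
    P = np.exp(scores)
    return P / P.sum(axis=-1, keepdims=True)
```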
