Rui Campos RuiFilipeCampos

RuiFilipeCampos / dotprod_to_metric.md
Created March 25, 2024 15:58
From scaled dot product to metric tensor


NOTE: WIP

In this section, we point out that the multi-headed scaled dot product attention introduced in 2017 is equivalent to a general quadratic form, which lends itself to a more efficient reformulation. Furthermore, we argue, on grounds of efficiency, interpretability and regularization, that this form should be constrained to be a metric. What follows is a short exposition of scaled dot product attention using Ricci calculus, transitioning into the proposed quadratic and metric attentions.
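The equivalence claimed above can be checked numerically. The following is a minimal NumPy sketch, not the author's implementation: the tensor shapes, the index order `bnck`, and the names `M`, `A` and `G` are assumptions made for illustration. It shows that the per-head scores $q \cdot k$ coincide with a bilinear form $x^\top M x$ where $M = Q^\top K$, and that one possible way to obtain a symmetric form (assumed here, as $G = A^\top A$) yields scores that are symmetric in the two sequence positions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: b sequences, c embeddings each, embedding dim d,
# n heads, head dim k.
b, c, d, n, k = 2, 5, 8, 3, 4

x = rng.normal(size=(b, c, d))   # input embeddings x^{bcd}
Q = rng.normal(size=(n, k, d))   # n query maps from R^d to R^k
K = rng.normal(size=(n, k, d))   # n key maps from R^d to R^k

# Standard route: project, then take scaled dot products.
q = np.einsum("nkd,bcd->bnck", Q, x)
key = np.einsum("nkd,bcd->bnck", K, x)
scores_qk = np.einsum("bnck,bnCk->bncC", q, key) / np.sqrt(k)

# Quadratic-form route: fold both maps into one d x d matrix per head.
M = np.einsum("nkd,nke->nde", Q, K)          # M = Q^T K
scores_form = np.einsum("bcd,nde,bCe->bncC", x, M, x) / np.sqrt(k)

# Assumed symmetric construction: G = A^T A is symmetric and
# positive semi-definite for any real A.
A = rng.normal(size=(n, k, d))
G = np.einsum("nkd,nke->nde", A, A)
scores_metric = np.einsum("bcd,nde,bCe->bncC", x, G, x) / np.sqrt(k)

print(np.allclose(scores_qk, scores_form))
print(np.allclose(scores_metric, scores_metric.transpose(0, 1, 3, 2)))
```

In this sketch the symmetric variant needs only one learnable map per head where the standard route uses separate query and key maps, which is one way to read the efficiency argument.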

Let $Q_d^{nk}$, $K_d^{nk}$ and $V_d^{nk}$ each be $n$ learnable linear maps from $\mathbf R^d$ to $\mathbf R^k$ that act on $b$ sequences of $c$ input embeddings to produce the well-known queries, keys and values,

$$

RuiFilipeCampos / metric_attention_cuda.md
Last active March 25, 2024 15:56
CUDA Kernel of the Metric Tensor Attention


NOTE: WIP

Forward Pass

Let $P^{nk}_d$ be $N_n$ learnable projections from $\mathbf R^{N_d}$ to $\mathbf R^{N_k}$ and $x^{bcd}$ a batch of $N_b$ sequences containing $N_c$ embeddings from $\mathbf R^{N_d}$. The action of these projections is expressed in Ricci notation by
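As a hedged illustration of such a projection in code, the contraction over the shared index $d$ can be sketched with `numpy.einsum`. The output name `p`, its index order `bnck`, and the concrete sizes are assumptions made for this example; this is a CPU reference sketch, not the CUDA kernel itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for N_b, N_c, N_d, N_n, N_k.
N_b, N_c, N_d, N_n, N_k = 2, 5, 8, 3, 4

P = rng.normal(size=(N_n, N_k, N_d))   # projections P^{nk}_d
x = rng.normal(size=(N_b, N_c, N_d))   # batch of sequences x^{bcd}

# Contract over the repeated index d (Einstein summation).
p = np.einsum("nkd,bcd->bnck", P, x)

# Reference check: explicit loops applying each projection
# to each embedding as a plain matrix-vector product.
p_ref = np.empty((N_b, N_n, N_c, N_k))
for bi in range(N_b):
    for ni in range(N_n):
        for ci in range(N_c):
            p_ref[bi, ni, ci] = P[ni] @ x[bi, ci]

print(p.shape)
print(np.allclose(p, p_ref))
```

The explicit loops serve as a slow reference that a GPU kernel's output could be compared against.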

RuiFilipeCampos / test.md
Last active February 23, 2024 10:45
test
