leopd / almost-attention.py
Last active April 21, 2023 22:35
Explanatory (non-vectorized) code for how attention works
# This code doesn't work, and isn't intended to.
# The goal of this code is to explain how attention mechanisms work, in code.
# It is deliberately not vectorized to make it clearer.
from typing import List
from torch import Tensor  # imports added for clarity; Tensor assumed to be a PyTorch tensor

def attention(self, X_in: List[Tensor]):
    # For every token, transform the previous layer's output
    # into a query, key, and value vector.
    for i in range(self.sequence_length):
        query[i] = self.Q * X_in[i]
        key[i] = self.K * X_in[i]
        value[i] = self.V * X_in[i]
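
As a runnable companion to the explanatory snippet above, here is a minimal, non-vectorized single-head scaled dot-product attention sketch in plain NumPy. It is not part of leopd's gist; the toy dimensions, random weights, and variable names are assumptions chosen purely for illustration.

# Minimal non-vectorized scaled dot-product attention (illustrative sketch only).
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8        # assumed toy dimensions
X_in = [rng.normal(size=d_model) for _ in range(seq_len)]   # previous layer's outputs
Q = rng.normal(size=(d_k, d_model))    # query projection matrix
K = rng.normal(size=(d_k, d_model))    # key projection matrix
V = rng.normal(size=(d_k, d_model))    # value projection matrix

def softmax(xs):
    e = np.exp(xs - np.max(xs))
    return e / e.sum()

# Project every token into query, key, and value vectors (one per token).
query = [Q @ x for x in X_in]
key   = [K @ x for x in X_in]
value = [V @ x for x in X_in]

# For each token: score its query against every key, normalize with softmax,
# then take the attention-weighted sum of the value vectors.
out = []
for i in range(seq_len):
    scores = np.array([query[i] @ key[j] / np.sqrt(d_k) for j in range(seq_len)])
    weights = softmax(scores)
    out.append(sum(w * value[j] for j, w in enumerate(weights)))

print(np.stack(out).shape)   # (4, 8): one attended vector per input token

The double loop (over tokens, and over keys for each token) is exactly what the vectorized formula softmax(QKᵀ/√d_k)V computes in one batched matrix product.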