@wael34218
Last active December 16, 2021 16:22
| Type | Complexity | Weights per layer | Sequential operations | Max path length |
|---|---|---|---|---|
| Self Attention | O(l·n²·d) | 3·(d·d_key) + d² + 2·d·d_forward | O(1) | O(1) |
| Restricted Self Attention | O(l·r²·d·n/r) = O(l·n·r·d) | 3·(d·d_key) + d² + 2·d·d_forward | O(1) | O(n/r) |
| LSTM | O(l·d²·n) | 8·d² | O(n) | O(n) |
| GRU | O(l·d²·n) | 6·d² | O(n) | O(n) |
| CNN | O(l·(f_in·k_w·k_h)·f_out·(w·h)) | (f_in·k_w·k_h)·f_out | O(1) | - |
| CNN 1D | O(l·d·(k·d)·(n·1)) = O(l·k·n·d_in·d_out) | k·d_in·d_out | O(1) | O(n/(2k)) without dilation; O(log_k(n)) with dilation |
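
The two path-length entries in the CNN 1D row can be checked with a small counting sketch. It assumes stride 1, a receptive field that grows by (k - 1) times the dilation per layer, and a dilation schedule of k^i at layer i; the function name `layers_needed` is hypothetical. The counts agree with the table only up to constant factors.

```python
def layers_needed(n, k, dilated):
    """Count stacked 1D conv layers until the receptive field spans n positions.

    Assumes stride 1. With dilated=True, layer i uses dilation k**i
    (an assumed schedule, not something the table specifies).
    """
    receptive_field = 1
    dilation = 1
    layers = 0
    while receptive_field < n:
        # Each layer widens the receptive field by (k - 1) * dilation.
        receptive_field += (k - 1) * dilation
        layers += 1
        if dilated:
            dilation *= k
    return layers


if __name__ == "__main__":
    n, k = 1024, 3
    print(layers_needed(n, k, dilated=False))  # 512, grows linearly in n/k
    print(layers_needed(n, k, dilated=True))   # 7, grows like log_k(n)
```

Under this schedule the receptive field after L layers is exactly k^L, which is where the logarithmic path length comes from.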

f: Number of filters (f_in: input, f_out: output)

n: Sequence length

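d: Dimension size (d_in: input, d_out: output, d_key: attention projection size, d_forward: feed-forward hidden size)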

l: Number of layers

r: Neighborhood size in restricted self-attention

k: Kernel size (k_w: kernel width, k_h: kernel height)

w: Width

h: Height
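
As a rough illustration, the table's formulas can be evaluated for concrete sizes. The sketch below takes the formulas at face value (no bias terms, and d_key treated as the full projection width); all function names and the example sizes are hypothetical.

```python
def self_attention_weights(d, d_key, d_forward):
    # Q, K, V projections (3 * d * d_key), output projection (d * d),
    # and two feed-forward matrices (2 * d * d_forward).
    return 3 * d * d_key + d * d + 2 * d * d_forward


def lstm_weights(d):
    # Four gates, each with an input and a recurrent d x d matrix.
    return 8 * d * d


def gru_weights(d):
    # Three gates, each with an input and a recurrent d x d matrix.
    return 6 * d * d


def conv2d_weights(f_in, f_out, k_w, k_h):
    return f_in * k_w * k_h * f_out


def conv1d_weights(d_in, d_out, k):
    return k * d_in * d_out


def self_attention_ops(l, n, d):
    # O(l * n^2 * d), constants dropped.
    return l * n * n * d


def restricted_attention_ops(l, n, d, r):
    # O(l * n * r * d), constants dropped.
    return l * n * r * d


def recurrent_ops(l, n, d):
    # O(l * d^2 * n), shared by the LSTM and GRU rows.
    return l * d * d * n


if __name__ == "__main__":
    d = 512  # example size, chosen only for illustration
    print(self_attention_weights(d, d_key=512, d_forward=2048))  # 3145728
    print(lstm_weights(d))                                       # 2097152
    print(gru_weights(d))                                        # 1572864
    print(conv1d_weights(d, d, k=3))                             # 786432
    # Self-attention needs fewer operations than a recurrent layer when n < d.
    print(self_attention_ops(6, 128, d) < recurrent_ops(6, 128, d))  # True
```

Comparing the operation counts directly shows the crossover implied by the table: self-attention is cheaper than a recurrent layer exactly when n < d.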
