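Type | Complexity | Weights per layer | Sequential Operations | Max path length |
---|---|---|---|---|
Self Attention | $O(l \cdot n^2 \cdot d)$ | $3(d \cdot d_{\text{key}}) + d^2 + 2 \cdot d \cdot d_{\text{forward}}$ | $O(1)$ | $O(1)$ |
Restricted Self Attention | $O(l \cdot r^2 \cdot d \cdot n/r) = O(l \cdot n \cdot r \cdot d)$ | $3(d \cdot d_{\text{key}}) + d^2 + 2 \cdot d \cdot d_{\text{forward}}$ | $O(1)$ | $O(n/r)$ |
LSTM | $O(l \cdot d^2 \cdot n)$ | $8d^2$ | $O(n)$ | $O(n)$ |
GRU | $O(l \cdot d^2 \cdot n)$ | $6d^2$ | $O(n)$ | $O(n)$ |
CNN | $O(l \cdot (f_{in} \cdot k_w \cdot k_h) \cdot f_{out} \cdot (w \cdot h))$ | $(f_{in} \cdot k_w \cdot k_h) \cdot f_{out}$ | $O(1)$ | - |
CNN 1D | $O(l \cdot d \cdot (k \cdot d) \cdot (n \cdot 1)) = O(l \cdot k \cdot n \cdot d_{in} \cdot d_{out})$ | $k \cdot d_{in} \cdot d_{out}$ | $O(1)$ | $O(n/2k)$ without dilation, $O(\log_k n)$ with dilation |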

- f: Number of filters
- n: Sequence length
- d: Dimension size
- l: Number of layers
- k: Kernel size
- w: Width
- h: Height
- r: Neighborhood size (restricted self attention)
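
To make the table concrete, the formulas can be evaluated for specific sizes. The following is a minimal sketch, not a benchmark: the function names and the example values (d = 512, d_forward = 2048, n = 1024, l = 6, r = 64, k = 3) are assumptions chosen for illustration, and the counts ignore bias terms and constant factors, as the table does.

```python
# Evaluate the table's weight-count and complexity formulas for given sizes.
# All function names and example values are illustrative assumptions.


def attention_weights(d, d_key, d_forward):
    # 3(d * d_key) + d^2 + 2 * d * d_forward, as in the table
    return 3 * d * d_key + d * d + 2 * d * d_forward


def lstm_weights(d):
    # 8 d^2 (biases ignored)
    return 8 * d * d


def gru_weights(d):
    # 6 d^2 (biases ignored)
    return 6 * d * d


def conv1d_weights(k, d_in, d_out):
    # k * d_in * d_out
    return k * d_in * d_out


def self_attention_cost(l, n, d):
    # O(l * n^2 * d)
    return l * n * n * d


def restricted_attention_cost(l, n, r, d):
    # O(l * r^2 * d * n / r) = O(l * n * r * d)
    return l * n * r * d


def recurrent_cost(l, n, d):
    # O(l * d^2 * n), the same order for LSTM and GRU
    return l * d * d * n


def conv1d_cost(l, k, n, d_in, d_out):
    # O(l * k * n * d_in * d_out)
    return l * k * n * d_in * d_out


if __name__ == "__main__":
    d, d_key, d_forward = 512, 512, 2048
    n, l, r, k = 1024, 6, 64, 3

    print("weights per layer")
    print("  self attention:", attention_weights(d, d_key, d_forward))
    print("  LSTM:          ", lstm_weights(d))
    print("  GRU:           ", gru_weights(d))
    print("  CNN 1D:        ", conv1d_weights(k, d, d))

    print("cost (up to constant factors)")
    print("  self attention:           ", self_attention_cost(l, n, d))
    print("  restricted self attention:", restricted_attention_cost(l, n, r, d))
    print("  LSTM / GRU:               ", recurrent_cost(l, n, d))
    print("  CNN 1D:                   ", conv1d_cost(l, k, n, d, d))
```

With these values the sequence is longer than the model dimension (n > d), so full self attention costs more than a recurrent layer, while restricting attention to a neighborhood of r positions cuts the attention cost by a factor of n/r.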