Instantly share code, notes, and snippets.

Last active February 25, 2023 21:01
Show Gist options
• Save danbst/fdf604ae279f9e01c8a28f1f84f9876e to your computer and use it in GitHub Desktop.
Matrix multiplication in logarithmic space. Incoming matrices are logs of their values, result values are logs of real values. The benefit is that multiplication is not used

# Matrix multiplication without multiplication

The basic idea is to transform dot product into log form:

``````log(a.b) = log(a1*b1 + a2*b2 + ... + an*bn)
= log(exp(log a1+log b1) + exp(log a2+log b2) + ... + exp(log an+log bn))
``````

and then apply something like Horner's method, but for exp-s, so result looks like (I'll simplify log a1 to la1, log a2 to la2, ...):

``````  = log(exp(la1+lb1) * (1+ exp(la2+lb2-la1-lb1)*(1+ exp(la3+lb3-la2-lb2)*(1+ ...) ))
``````

and then cancel exp/log:

``````  = la1+lb1 + log(1+ exp(la2+lb2-la1-lb1)*(1+ exp(la3+lb3-la2-lb2)*(1+ ...) )
``````

and then expand log(1+x) as log(x) + addenLogPlus1(x) (which is a new function we'll talk about later):

``````   (D1 = exp(la2+lb2-la1-lb1)*(1+ exp(la3+lb3-la2-lb2)*(1+ ...))

= la1+lb1 + log(1+ D1) = la1+lb1 + log D1 + addenLogPlus1(D1)
``````

I've added D1 variable to capture the "rest". But as we see, it is not computable. Instead, we should go "backwards", from the inner most entry:

``````  D[n] = lan+lbn - la[n-1]-la[n-1]
D[n-1] = la[n-1]+lb[n-1] - la[n-2]-la[n-2] + D[n] + addenLogPlus1(exp(D[n]))
D[n-2] = la[n-2]+lb[n-2] - la[n-3]-la[n-3] + D[n-1] + addenLogPlus1(exp(D[n-1]))
...
``````

Then D[0] will contain log value of our a.b

Now, back to addenLogPlus1. This is a simple function:

``````   addenLogPlus1(x) = log(1+x) - log(x)
``````

But we can modify it and require input to be log:

``````   addenLogPlus1(x) = log(1+exp(x)) - x
``````

This way we remove exp from D[..] computation.

``````  D[n] = lan+lbn - la[n-1]-lb[n-1]
D[n-1] = la[n-1]+lb[n-1] - la[n-2]-lb[n-2] + D[n] + addenLogPlus1(D[n])
D[n-2] = la[n-2]+lb[n-2] - la[n-3]-lb[n-3] + D[n-1] + addenLogPlus1(D[n-1])
...
D[0] = la[0] + lb[0] + D[1] + addenLogPlus1(D[1])
``````

To make it multiplication free we can get rid of `addenLogPlus1` by... converting it to lookup table! For this we stick with float16 datatype and convert the function into lookup table. Without much thinking, I interpret float16 bitwise as uint16 and use as an index in lookup table. Table size is 64K, which doesn't create much overhead:

``````D = la[n]+lb[n] - la[n-1]-lb[n-1]
loop i from n-1 to 0:
D = D + la[i]+lb[i] - la[i-1]-lb[i-1] + lookup_table[D]
return D
``````

And dot product is done! In similar way, I can do matrix multiplication by utilizing `numpy` arrays. The code is available below.

## Extra thoughts

Even though this is slower than built-in `np.dot` (or GPU-based multiplication), this poses a question:

• Would this be faster than normal multiplication if this is implemented using vectorized intrinsics?
• Can the weights of matrices for large neural model be stored in log space and processed with this algorithm?
• Do we even need "multiplication" for large neural model inferences? Do we even need to leave log space?
• Current TPUs dedicate large processor space to efficient multiplication schemes, maybe it can be traded to addition for log-space matrices?
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
 import numpy as np from math import log, exp def addenLogPlus1(x): x = min(500, max(-500, x)) # clamping too big and too small values return log(exp(x)+1) - x # lookup table for every float16 x -> log(exp(x) + 1) - x plus1_lookup = np.vectorize(addenLogPlus1)(np.arange(2**16, dtype='int16').view('