B-Y-P/reciprocal.md

## reciprocal.md

      
    Raw
  

              reciprocal.md
            
          
    R. Alverson - Integer Division Using Reciprocals

For simplicity, this will just go over the unsigned case

Written for someone, such as myself, who isn't as quick to put together the same logical steps
Given $y \in \mathbb{Z}$ where $y &gt; 0$

Find $a,sh \in \mathbb{Z}$ such that $\forall x \in \mathbb{Z} \ (\ \lfloor \frac{x}{y} \rfloor = \lfloor x \cdot a / 2^{sh} \rfloor)$

To motivate what we'll be doing, lets start by considering:
$$
\begin{align*}
\frac{x \cdot a}{2^{sh}} &= \frac{x}{y}      \\
\frac{a}{2^{sh}}         &= \frac{1}{y}      \\
a &= \frac{2^{sh}}{y} \\
\end{align*}
$$
What we want to show is that $a = \lfloor \frac{2^{sh}}{y} \rfloor$ can still be used to compute $\lfloor \frac{x}{y} \rfloor$ exactly

So, let $sh = w + \lceil \log_2y \rceil$ where w is our 'effective wordsize' in bits
From an implementation standpoint, we'd like to be able to write:
Recip(uint64_t y){
  uint64_t a, sh;
  // compute a, sh from y
  return (a, sh);
}
which would seem to imply $w = 64$, however we musn't be so hasty.

Given that $\lceil \log_2y \rceil \in \mathbb{Z}$ we have:
$$
\begin{alignedat}{3}
\log_2y  \  &\leq & \              \lceil \log_2y \rceil       & &&<    \     \log_2y + 1          \\
2^{\log_2y} \  &\leq & \           2^{\lceil \log_2y \rceil}      & &&<    \  2^{\log_2y + 1}         \\
2^{\log_2y} \  &\leq & \           2^{\lceil \log_2y \rceil}      & &&<    \  2^{\log_2y} \cdot 2     \\
y  \  &\leq & \           2^{\lceil \log_2y \rceil}      & &&<    \  2y                      \\
y  \  &\leq & \           2^{\lceil \log_2y \rceil}      & &&\leq \  2y-1  \quad \text{since } 2y,\ 2^{\lceil \log_2y \rceil} \in \mathbb{Z}  \\
2^w \cdot y  \  &\leq & \ 2^w \cdot 2^{\lceil \log_2y \rceil}      & &&\leq \  2^w \cdot (2y-1)        \\
2^w \cdot y  \  &\leq & \           2^{w + \lceil \log_2y \rceil}  & &&\leq \  2^{w+1} \cdot y - 2^{w} \\
2^w \cdot y  \  &\leq & \           2^{sh}                         & &&\leq \  2^{w+1} \cdot y - 2^{w} \\
2^w \  &\leq & \     \frac{2^{sh}}{y}                     & &&\leq \  2^{w+1} - \frac{2^w}{y} \\
2^w \  &\leq & \     a                                    & &&\leq \  2^{w+1} - 1 \quad \text{when } y \leq 2^w \\
\end{alignedat}
$$
So what do we get from the bounds $2^w \leq a \leq \ 2^{w+1} - 1$?

Well, if we know $a$ fits in $w+1$ bits, and since we want to store $a$ in a uint64_t $\Rightarrow w+1 = 64 \Rightarrow w = 63$

Note that above we restricted $y \leq 2^w \equiv y \leq 2^{63}$ which is no longer true for all values uint64_t

If that's a deal breaker, then you'll have to implement the extended precision case for $a$ as an exercise for the reader ;)
Okay, so we know how to find $a,sh$, but how do we know that they compute $\lfloor \frac{x}{y} \rfloor$ exactly?

Well, we're going to have to be a bit more precise with what those $\lfloor ... \rfloor$ and $\lceil ... \rceil$ are doing
$$
\begin{alignedat}{3}
\text{let } q &= \lfloor \frac{x}{y} \rfloor &&\Rightarrow \exists \ r_q \in \mathbb{Z} & \ x &&= q \cdot y + r_q & \quad \text{where } 0 \leq r_q < y  \\
\text{let } a &= \lceil \frac{2^{sh}}{y} \rceil &&\Rightarrow \exists \ r_a \in \mathbb{Z} & \quad 2^{sh} &&= a \cdot y - r_a & \quad \text{where } 0 \leq r_a < y  \\
\end{alignedat}
$$
Note, $\lfloor ... \rfloor$ makes $q$ smaller, so $r_q$ is added on to compensate

while $\lceil ... \rceil$ makes $a$ larger, so $r_a$ is taken away to compensate
Now, if we can show that $r_q$ and $r_a$ make no difference when calculating $q = \lfloor \frac{x}{y} \rfloor = \lfloor x \cdot a / 2^{sh} \rfloor$ then we're done :)

But first, we'll get the easy case out of the way. When $x=0$ it should be clear that $q = \lfloor \frac{0}{y} \rfloor = \lfloor 0 \cdot a / 2^{sh} \rfloor = 0$

So for the rest of the proof can make use of the case where $x \geq 1$
$$
\begin{alignedat}{3}
\text{let } Q &=           &&\lfloor                x   \cdot a              \ /\ 2^{sh} \rfloor \\
&=           &&\lfloor (q \cdot y + r_q) \cdot a               \ /\ 2^{sh} \rfloor \\
&=           &&\lfloor (q \cdot a y                  + a\ r_q) \ /\ 2^{sh} \rfloor \\
&=           &&\lfloor (q \cdot (2^{sh} + r_a)       + a\ r_q) \ /\ 2^{sh} \rfloor \\
&=           &&\lfloor (q \cdot 2^{sh} + q \cdot r_a + a\ r_q) \ /\ 2^{sh} \rfloor \\
&= \ q \ + \ &&\lfloor (q\ r_a + a\ r_q) \ /\ 2^{sh} \rfloor \\
\end{alignedat}
$$
To guarantee that $Q = q$  we need to show that $0 \leq q\ r_a + a\ r_q &lt; 2^{sh}$

It should come to no surprise that this combination of non-negative values is non-negative, so all that's left is the upper bound
$$
\begin{alignedat}{3}
& r_a &&<    y                       \\
& r_a &&\leq y-1                     \\
& q \ r_a &&\leq q\ y - q            \\
& q \ r_a &&\leq (x - r_q) - q       \\
& q \ r_a &&\leq (x - q) - r_q       \\
\end{alignedat}
$$
$$
\text{let } 1 \leq x \leq 2^w \ \text{along with } y \leq 2^w \ \land \ q = \lfloor\frac{x}{y}\rfloor \\
\Rightarrow \  q>0 \  \Rightarrow \  (x-q) \ <\ x\ \leq \ 2^w
$$
$$
\begin{alignedat}{3}
& \ q \ r_a               &&\leq (x - q) - r_q           \\
& \ q \ r_a               &&<     (2^w) - r_q           \\
& \ q \ r_a \ + \ a \ r_q &&< 2^w  \ + \ a \ r_q - r_q   \\
& \ q \ r_a \ + \ a \ r_q &&< 2^w  \ + \ (a-1) \ r_q     \\
& \ q \ r_a \ + \ a \ r_q &&< 2^w  \ + \ (a-1) \ (y-1)   \\
& \ q \ r_a \ + \ a \ r_q &&< 2^w  \ + \ ay - a - y + 1  \\
& \ q \ r_a \ + \ a \ r_q &&< 2^w  \ + \ (2^{sh}+r_a) - a - y + 1  \\
& \ q \ r_a \ + \ a \ r_q &&< 2^w  \ + \ 2^{sh}+(y-1) - a - y + 1  \\
& \ q \ r_a \ + \ a \ r_q &&< 2^w  \ + \ 2^{sh} - a \\
& \ q \ r_a \ + \ a \ r_q &&< 2^{sh}  \ + \ 2^w - a \\
& \ q \ r_a \ + \ a \ r_q &&< 2^{sh}  \quad \text{since } 2^w \leq a \quad \text{might have to scroll up a bit ;)} \\
\end{alignedat}
$$

All right, that's enough math, time to get a bit gung-ho with writing the code for it
Recip(uint64_t y){
  assert(y <= (1ull << 63));    // Ensure y <= 2^63
  
  uint64_t sh = 0;
  while(y > (1ull << sh)){ sh++; }         // sh' = ceil(log2(y))
  sh += 63;                                // sh  = ceil(log2(y)) + w
  
  uint64_t a = ((1ull << sh) + y-1) / y    //  a  = ceil(2^sh / y) = floor((2^sh + y-1) / y)
  return (a, sh);
}

FastDivExamp(uint64_t[] x, uint64_t y){
  uint64_t a, sh;
  a, sh = Recip(y);
  for(i, 0, x.count){
    //x[i] = x[i] / y;        // q = floor(x/y)
    x[i] = (x[i]*a) >> sh;    // Q = (x*a) / 2^sh
  }
}
Ahhh, such an elegant mapping from theory to implementation.

Unfortunately, it fails the smell test for anyone paying attention :P
If we tried plugging in y = 8, we'd get sh = 3 + 63 = 66

So, pray tell, what do we expect to get from either (1ull << sh) or (x*a)?
Instead, we'll explicitly handle the pseudo uint128_t operations using a 64:64 tuple

This has the benefit where if we have (v1:v0) then returning v1 is equivalent to (v1:v0) >> 64 which we can bake into sh

So the only hurdle left is calculating ((1<<sh) + y-1) / y
We have a convenient mapping from (1<<sh) + (y-1) to (v1:v0)

v0 = y-1 when $y-1 &lt; 2^{64}$

v1 = 1 << (sh-64) when $2^{64} \leq 2^{sh} &lt; 2^{128}$

The condition for v1 is only true when $64 \leq sh \  \Leftrightarrow \ 64 \leq 63 + \lceil \log_2y \rceil \ \Rightarrow \ 1 \leq \lceil \log_2y \rceil \ \Rightarrow \ 2 \leq y$

Now, for the case when $y=1$, I propose another algorithm. $\forall x \in \mathbb{Z} \ (\ \lfloor \frac{x}{1} \rfloor = x)$
Finally, we can take our (v1,v0) tuple and pass it to the x64 DIV

While this can raise an exception if the result won't fit within a uint64_t, we've already proven that $a \leq \ 2^{w+1} - 1$
The same idea can be used with x64 MUL and only using the high 64 bits to compute the aforementioned (v1:v0) >> 64
Recip(uint64_t y){
  assert(1 < y && y <= (1ull << 63));    // Ensure 1 < y <= 2^63
  
  uint64_t sh = 0;
  while(y > (1ull << sh)){ sh++; }         // sh' = ceil(log2(y))
  sh += 63 - 64;                           // sh  = ceil(log2(y)) + w - 64
  
  uint64_t a = Div128(1ull<<sh, y-1, y);   //  a  = ceil(2^sh / y) = floor((2^sh + y-1) / y)
  return (a, sh);
}

FastDivExamp(uint64_t[] x, uint64_t y){
  uint64_t a, sh;
  a, sh = Recip(y);
  for(i, 0, x.count){
    //x[i] = x[i] / y;                // q = floor(x/y)
    x[i] = MulHi64(x[i], a) >> sh;    // Q = (x*a) / 2^sh
  }
}