Skip to content

Instantly share code, notes, and snippets.

@B-Y-P
Last active June 19, 2023 05:35
Show Gist options
  • Save B-Y-P/5872dbaaf768c204480109007f64a915 to your computer and use it in GitHub Desktop.
Save B-Y-P/5872dbaaf768c204480109007f64a915 to your computer and use it in GitHub Desktop.

For simplicity, this will just go over the unsigned case
Written for someone, such as myself, who isn't as quick to put together the same logical steps

Given $y \in \mathbb{Z}$ where $y > 0$
Find $a,sh \in \mathbb{Z}$ such that $\forall x \in \mathbb{Z} \ (\ \lfloor \frac{x}{y} \rfloor = \lfloor x \cdot a / 2^{sh} \rfloor)$
To motivate what we'll be doing, lets start by considering:

$$ \begin{align*} \frac{x \cdot a}{2^{sh}} &= \frac{x}{y} \\ \frac{a}{2^{sh}} &= \frac{1}{y} \\ a &= \frac{2^{sh}}{y} \\ \end{align*} $$

What we want to show is that $a = \lfloor \frac{2^{sh}}{y} \rfloor$ can still be used to compute $\lfloor \frac{x}{y} \rfloor$ exactly
So, let $sh = w + \lceil \log_2y \rceil$ where w is our 'effective wordsize' in bits

From an implementation standpoint, we'd like to be able to write:

Recip(uint64_t y){
  uint64_t a, sh;
  // compute a, sh from y
  return (a, sh);
}

which would seem to imply $w = 64$, however we musn't be so hasty.
Given that $\lceil \log_2y \rceil \in \mathbb{Z}$ we have:

$$ \begin{alignedat}{3} \log_2y \ &\leq & \ \lceil \log_2y \rceil & &&< \ \log_2y + 1 \\ 2^{\log_2y} \ &\leq & \ 2^{\lceil \log_2y \rceil} & &&< \ 2^{\log_2y + 1} \\ 2^{\log_2y} \ &\leq & \ 2^{\lceil \log_2y \rceil} & &&< \ 2^{\log_2y} \cdot 2 \\ y \ &\leq & \ 2^{\lceil \log_2y \rceil} & &&< \ 2y \\ y \ &\leq & \ 2^{\lceil \log_2y \rceil} & &&\leq \ 2y-1 \quad \text{since } 2y,\ 2^{\lceil \log_2y \rceil} \in \mathbb{Z} \\ 2^w \cdot y \ &\leq & \ 2^w \cdot 2^{\lceil \log_2y \rceil} & &&\leq \ 2^w \cdot (2y-1) \\ 2^w \cdot y \ &\leq & \ 2^{w + \lceil \log_2y \rceil} & &&\leq \ 2^{w+1} \cdot y - 2^{w} \\ 2^w \cdot y \ &\leq & \ 2^{sh} & &&\leq \ 2^{w+1} \cdot y - 2^{w} \\ 2^w \ &\leq & \ \frac{2^{sh}}{y} & &&\leq \ 2^{w+1} - \frac{2^w}{y} \\ 2^w \ &\leq & \ a & &&\leq \ 2^{w+1} - 1 \quad \text{when } y \leq 2^w \\ \end{alignedat} $$

So what do we get from the bounds $2^w \leq a \leq \ 2^{w+1} - 1$?
Well, if we know $a$ fits in $w+1$ bits, and since we want to store $a$ in a uint64_t $\Rightarrow w+1 = 64 \Rightarrow w = 63$
Note that above we restricted $y \leq 2^w \equiv y \leq 2^{63}$ which is no longer true for all values uint64_t
If that's a deal breaker, then you'll have to implement the extended precision case for $a$ as an exercise for the reader ;)

Okay, so we know how to find $a,sh$, but how do we know that they compute $\lfloor \frac{x}{y} \rfloor$ exactly?
Well, we're going to have to be a bit more precise with what those $\lfloor ... \rfloor$ and $\lceil ... \rceil$ are doing

$$ \begin{alignedat}{3} \text{let } q &= \lfloor \frac{x}{y} \rfloor &&\Rightarrow \exists \ r_q \in \mathbb{Z} & \ x &&= q \cdot y + r_q & \quad \text{where } 0 \leq r_q < y \\ \text{let } a &= \lceil \frac{2^{sh}}{y} \rceil &&\Rightarrow \exists \ r_a \in \mathbb{Z} & \quad 2^{sh} &&= a \cdot y - r_a & \quad \text{where } 0 \leq r_a < y \\ \end{alignedat} $$

Note, $\lfloor ... \rfloor$ makes $q$ smaller, so $r_q$ is added on to compensate
while $\lceil ... \rceil$ makes $a$ larger, so $r_a$ is taken away to compensate

Now, if we can show that $r_q$ and $r_a$ make no difference when calculating $q = \lfloor \frac{x}{y} \rfloor = \lfloor x \cdot a / 2^{sh} \rfloor$ then we're done :)
But first, we'll get the easy case out of the way. When $x=0$ it should be clear that $q = \lfloor \frac{0}{y} \rfloor = \lfloor 0 \cdot a / 2^{sh} \rfloor = 0$
So for the rest of the proof can make use of the case where $x \geq 1$

$$ \begin{alignedat}{3} \text{let } Q &= &&\lfloor x \cdot a \ /\ 2^{sh} \rfloor \\ &= &&\lfloor (q \cdot y + r_q) \cdot a \ /\ 2^{sh} \rfloor \\ &= &&\lfloor (q \cdot a y + a\ r_q) \ /\ 2^{sh} \rfloor \\ &= &&\lfloor (q \cdot (2^{sh} + r_a) + a\ r_q) \ /\ 2^{sh} \rfloor \\ &= &&\lfloor (q \cdot 2^{sh} + q \cdot r_a + a\ r_q) \ /\ 2^{sh} \rfloor \\ &= \ q \ + \ &&\lfloor (q\ r_a + a\ r_q) \ /\ 2^{sh} \rfloor \\ \end{alignedat} $$

To guarantee that $Q = q$ we need to show that $0 \leq q\ r_a + a\ r_q &lt; 2^{sh}$
It should come to no surprise that this combination of non-negative values is non-negative, so all that's left is the upper bound

$$ \begin{alignedat}{3} & r_a &&< y \\ & r_a &&\leq y-1 \\ & q \ r_a &&\leq q\ y - q \\ & q \ r_a &&\leq (x - r_q) - q \\ & q \ r_a &&\leq (x - q) - r_q \\ \end{alignedat} $$

$$ \text{let } 1 \leq x \leq 2^w \ \text{along with } y \leq 2^w \ \land \ q = \lfloor\frac{x}{y}\rfloor \\ \Rightarrow \ q>0 \ \Rightarrow \ (x-q) \ <\ x\ \leq \ 2^w $$

$$ \begin{alignedat}{3} & \ q \ r_a &&\leq (x - q) - r_q \\ & \ q \ r_a &&< (2^w) - r_q \\ & \ q \ r_a \ + \ a \ r_q &&< 2^w \ + \ a \ r_q - r_q \\ & \ q \ r_a \ + \ a \ r_q &&< 2^w \ + \ (a-1) \ r_q \\ & \ q \ r_a \ + \ a \ r_q &&< 2^w \ + \ (a-1) \ (y-1) \\ & \ q \ r_a \ + \ a \ r_q &&< 2^w \ + \ ay - a - y + 1 \\ & \ q \ r_a \ + \ a \ r_q &&< 2^w \ + \ (2^{sh}+r_a) - a - y + 1 \\ & \ q \ r_a \ + \ a \ r_q &&< 2^w \ + \ 2^{sh}+(y-1) - a - y + 1 \\ & \ q \ r_a \ + \ a \ r_q &&< 2^w \ + \ 2^{sh} - a \\ & \ q \ r_a \ + \ a \ r_q &&< 2^{sh} \ + \ 2^w - a \\ & \ q \ r_a \ + \ a \ r_q &&< 2^{sh} \quad \text{since } 2^w \leq a \quad \text{might have to scroll up a bit ;)} \\ \end{alignedat} $$


All right, that's enough math, time to get a bit gung-ho with writing the code for it

Recip(uint64_t y){
  assert(y <= (1ull << 63));    // Ensure y <= 2^63
  
  uint64_t sh = 0;
  while(y > (1ull << sh)){ sh++; }         // sh' = ceil(log2(y))
  sh += 63;                                // sh  = ceil(log2(y)) + w
  
  uint64_t a = ((1ull << sh) + y-1) / y    //  a  = ceil(2^sh / y) = floor((2^sh + y-1) / y)
  return (a, sh);
}

FastDivExamp(uint64_t[] x, uint64_t y){
  uint64_t a, sh;
  a, sh = Recip(y);
  for(i, 0, x.count){
    //x[i] = x[i] / y;        // q = floor(x/y)
    x[i] = (x[i]*a) >> sh;    // Q = (x*a) / 2^sh
  }
}

Ahhh, such an elegant mapping from theory to implementation.
Unfortunately, it fails the smell test for anyone paying attention :P

If we tried plugging in y = 8, we'd get sh = 3 + 63 = 66
So, pray tell, what do we expect to get from either (1ull << sh) or (x*a)?

Instead, we'll explicitly handle the pseudo uint128_t operations using a 64:64 tuple
This has the benefit where if we have (v1:v0) then returning v1 is equivalent to (v1:v0) >> 64 which we can bake into sh
So the only hurdle left is calculating ((1<<sh) + y-1) / y

We have a convenient mapping from (1<<sh) + (y-1) to (v1:v0)
v0 = y-1 when $y-1 &lt; 2^{64}$
v1 = 1 << (sh-64) when $2^{64} \leq 2^{sh} &lt; 2^{128}$
The condition for v1 is only true when $64 \leq sh \ \Leftrightarrow \ 64 \leq 63 + \lceil \log_2y \rceil \ \Rightarrow \ 1 \leq \lceil \log_2y \rceil \ \Rightarrow \ 2 \leq y$
Now, for the case when $y=1$, I propose another algorithm. $\forall x \in \mathbb{Z} \ (\ \lfloor \frac{x}{1} \rfloor = x)$

Finally, we can take our (v1,v0) tuple and pass it to the x64 DIV
While this can raise an exception if the result won't fit within a uint64_t, we've already proven that $a \leq \ 2^{w+1} - 1$

The same idea can be used with x64 MUL and only using the high 64 bits to compute the aforementioned (v1:v0) >> 64

Recip(uint64_t y){
  assert(1 < y && y <= (1ull << 63));    // Ensure 1 < y <= 2^63
  
  uint64_t sh = 0;
  while(y > (1ull << sh)){ sh++; }         // sh' = ceil(log2(y))
  sh += 63 - 64;                           // sh  = ceil(log2(y)) + w - 64
  
  uint64_t a = Div128(1ull<<sh, y-1, y);   //  a  = ceil(2^sh / y) = floor((2^sh + y-1) / y)
  return (a, sh);
}

FastDivExamp(uint64_t[] x, uint64_t y){
  uint64_t a, sh;
  a, sh = Recip(y);
  for(i, 0, x.count){
    //x[i] = x[i] / y;                // q = floor(x/y)
    x[i] = MulHi64(x[i], a) >> sh;    // Q = (x*a) / 2^sh
  }
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment