Skip to content

Instantly share code, notes, and snippets.

@micycle1
Created November 25, 2020 14:48
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save micycle1/5349fe1f5eab897cd8e8e10b569360cd to your computer and use it in GitHub Desktop.
Save micycle1/5349fe1f5eab897cd8e8e10b569360cd to your computer and use it in GitHub Desktop.
http://stereopsis.com/2div.html
Ben and 2 Divides
Two years ago, Ben Weiss (the guy who wrote Kai's Power Tools, Goo, Soap, etc.) was telling me about this idea he had to combine two divides. I just didn't pay attention.
Luckily, he told me again.
Ben's idea is really simple, and I use it everywhere now.
2 divides at once
Suppose you want to compute:
a/b AND c/d
In the real number system, this is the same as:
ad/bd AND cb/bd
In case this isn't getting through, that's:
inv := 1 / (b * d)
inv * a * d AND inv * b * d
Multiply instead
Of course, modern processors are LOTS faster at multiplies than divides. In fact, in optimal conditions, most of them can do a multiply per cycle, and a divide every 20-40 cycles. So an extra 5 multiplies spread over 2 divides should be worth it, right?
Well, I did some speed tests:
// This snippet computes c <- a / b, component-wise
for (int j = 0; j < n; j += 2) {
float ibd = 1.0f / (b[j] * b[j+1]);
c[j] = a[j] * b[j+1] * ibd;
c[j+1] = a[j+1] * b[j] * ibd;
}
If you put your P2's FPU in single-precision mode, this code runs in a blazing 9 CYCLES per divide (compared to 17 per divide normally), and you only drop about 1.5 bits in accuracy. If you're in double-precision, you can extend the technique to 3 divides per loop, and you still get even faster.
I've used this in texture mappers, perspective projection code (vertex transform, for example), and it's just really a wonder.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment