Skip to content

Instantly share code, notes, and snippets.

@pervognsen
Last active May 16, 2020 00:02
Show Gist options
  • Star 3 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save pervognsen/c3b2365b0696d20f6ff3d4abe1b1ec87 to your computer and use it in GitHub Desktop.
Save pervognsen/c3b2365b0696d20f6ff3d4abe1b1ec87 to your computer and use it in GitHub Desktop.

Let's say you have to perform a 4-way case analysis and are given a choice between cascaded 2-way branches or a single 4-way branch.

What's the difference in branch misprediction penalties, assuming the 4 cases are random?

Denote the branch penalty by x (which is 15-20 cycles on Sky Lake according to Agner Fog).

With the 4-way branch, the worst-case branch penalty is x. The expected branch penalty is 3/4 x: there's a 3/4 chance of x penalty.

With the cascaded 2-way branches, the worst-case branch penalty is 2x. The expected branch penalty is 1/2 x + 1/4 2x = x: there's a 1/4 chance of 2x penalty, and there's a 1/4 + 1/4 = 1/2 chance of x penalty.

A difference of 1/4 x for x = 20 cycles is 5 cycles. If you want a 4-way branch implemented with a jump table to win, you need to keep the fixed overhead down to less than 5 cycles, at least in this toy model. An L1-hitting load with a complex addressing mode is 5 cycles on Sky Lake, but instructions that only feed into a branch instruction aren't part of the subsequent dependency chain as far as latency goes, so it isn't as simple as adding up the latencies.

Now suppose we extend this comparison to a n-way case analysis where n = 2^k. The expected penalty for the multi-way branch is (n-1)/n x, which is ~x for larger n. The expected penalty for the cascaded 2-way branches is given by the recurrence relation f(n) = 1/2 x + f(n/2), f(1) = 0, which has the closed-form solution f(n) = k/2 x.

There isn't really any grand conclusion from this other than a more quantitative assessment of how multi-way branches widen the gap as n increases. I'm just trying to see if there's some useful napkin math for thinking about different branch structures for case analysis. E.g. how skewed does the case distribution need to be before you should pull out the most frequent cases from the multi-way branch into leading 2-way branches. The easy answer is that you should test it on real hardware with real datasets and pick the winner, but it would be nice to have a mental model that captures the major factors in simpler instances.

@pervognsen
Copy link
Author

The reason I kept the analysis to the simplest case of fully random cases is that it seems too hard to analyze with napkin math beyond that point given how modern branch predictors work. You can see some examples in this thread: https://twitter.com/pervognsen/status/1256369404817715200

@pervognsen
Copy link
Author

pervognsen commented May 8, 2020

I just looked at the code you linked. I'm not sure how it would work out in practice given your use case, but all my code relies heavily on unaligned reads and writes for exactly this reason. You can see an example here: https://gist.github.com/pervognsen/12d358c8fdd531d21fbccdca251fc2cd#file-spaceskip-c-L15. You can usually ensure that the data you're reading has sufficient end-padding to avoid the issue with unowned memory. For something like a serialization format where you own both the write side and the read side, it's especially easy; for something like a programming language lexer as in the above example, it's a little trickier (but usually doable with some trickery) if you want to use memory-mapped file IO or read user data directly.

@aardappel
Copy link

@pervognsen thanks, I agree unaligned reads would be the best way to implement this, as I mentioned in the comments. Problem is, this is in a very general library that reads data from any (pointer + size_t) pair, so strictly it is not possible to guarantee an unaligned read does not touch a page boundary, unless I explicitly specified it is up to the client to guarantee this is not the case, which is tricky. Maybe I could make it an #ifdef option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment