Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
<style>
body {
font-family: monospace;
font-size: 16px;
color: #202122;
white-space: pre-wrap;
}
.nibble-sep {
margin: 0 -.15em;
font-size: 0.8em;
}
.line {
display: inline-block;
width: 24em;
border-top: solid 1px black;
vertical-align: 0.25em;
}
.red { color: #b32424; }
.blue { color: #2a4b8d; }
.green { color: #14866d; }
.brown { color: #ac6600; }
</style>
This is based on <a href="https://en.wikipedia.org/wiki/Hamming_weight#Efficient_implementation">the Wikipedia example</a>, <a href="https://stackoverflow.com/a/109025/362030">the StackOverflow thread</a>, and <a href="https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel">Bit Twiddling Hacks</a> by Sean Anderson.
Note that in practice, modern CPUs have a dedicated instruction for this, interoperable enough that <a href="https://github.com/sunfishcode/wasm-reference-manual/blob/master/WebAssembly.md#integer-population-count">it's even in WebAssembly</a>. <a href="https://vaibhavsagar.com/blog/2019/09/08/popcount/">This post</a> explains what this instruction is useful for. (A variation of this parallelization trick can also be used for <a href="https://graphics.stanford.edu/~seander/bithacks.html#ReverseParallel">reversing the bits in a word</a> (<a href="https://stackoverflow.com/a/746203/362030">see also StackOverflow</a>), which has a dedicated CPU instruction in ARM but not x86, so maybe that's more useful.)
As an example, consider 1,825,859,237<sub>10</sub> = 0110<span class="nibble-sep">&sdot;</span>1100 1101<span class="nibble-sep">&sdot;</span>0100 0110<span class="nibble-sep">&sdot;</span>0110 1010<span class="nibble-sep">&sdot;</span>0101<sub>2</sub>, which has 16 ones.
(For readability, the bytes are separated by spaces and the nibbles are separated by dots.)
We can of course go through and count them one-by-one, but there's a clever trick where we count them in parallel.
Sketch:
First, to count the ones in each pair of bits AB, we just want to do A+B. If we mask, we can pull out the Bs; if we shift and mask, we can pull out the As; then we can add them:
0110<span class="nibble-sep">&sdot;</span>1100 1101<span class="nibble-sep">&sdot;</span>0100 0110<span class="nibble-sep">&sdot;</span>0110 1010<span class="nibble-sep">&sdot;</span>0101
&amp; <span class="blue">0101<span class="nibble-sep">&sdot;</span>0101 1101<span class="nibble-sep">&sdot;</span>0101 0101<span class="nibble-sep">&sdot;</span>0101 0101<span class="nibble-sep">&sdot;</span>0101</span>
<span class="line"></span>
<span class="green">0100<span class="nibble-sep">&sdot;</span>0100 0101<span class="nibble-sep">&sdot;</span>0100 0100<span class="nibble-sep">&sdot;</span>0100 0000<span class="nibble-sep">&sdot;</span>0101</span>
0110<span class="nibble-sep">&sdot;</span>1100 1101<span class="nibble-sep">&sdot;</span>0100 0110<span class="nibble-sep">&sdot;</span>0110 1010<span class="nibble-sep">&sdot;</span>0101 &gt;&gt; 1
= <span class="red">0011<span class="nibble-sep">&sdot;</span>0110 0110<span class="nibble-sep">&sdot;</span>1010 0011<span class="nibble-sep">&sdot;</span>0011 0101<span class="nibble-sep">&sdot;</span>0010</span>
<span class="red">0011<span class="nibble-sep">&sdot;</span>0110 0110<span class="nibble-sep">&sdot;</span>1010 0011<span class="nibble-sep">&sdot;</span>0011 0101<span class="nibble-sep">&sdot;</span>0010</span>
&amp; <span class="blue">0101<span class="nibble-sep">&sdot;</span>0101 1101<span class="nibble-sep">&sdot;</span>0101 0101<span class="nibble-sep">&sdot;</span>0101 0101<span class="nibble-sep">&sdot;</span>0101</span>
<span class="line"></span>
<span class="brown">0001<span class="nibble-sep">&sdot;</span>0100 0100<span class="nibble-sep">&sdot;</span>0000 0001<span class="nibble-sep">&sdot;</span>0001 0101<span class="nibble-sep">&sdot;</span>0000</span>
Then for each pair of 2-bit counts, we can do the same trick, mask with 0011<span class="nibble-sep">&sdot;</span>0011 0011<span class="nibble-sep">&sdot;</span>0011 0011<span class="nibble-sep">&sdot;</span>0011 0011<span class="nibble-sep">&sdot;</span>0011 to pull out every other 2-bit count, then shift and mask to pull out the <i>other</i> every other 2-bit counts, and add them, then repeat with 0000<span class="nibble-sep">&sdot;</span>1111 0000<span class="nibble-sep">&sdot;</span>1111 0000<span class="nibble-sep">&sdot;</span>1111 0000<span class="nibble-sep">&sdot;</span>1111, etc:
x = (x & 0x55555555) + ((x >> 1) & 0x55555555)
x = (x & 0x33333333) + ((x >> 2) & 0x33333333)
x = (x & 0x0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F)
x = (x & 0x00FF00FF) + ((x >> 8) & 0x00FF00FF)
x = (x & 0x0000FFFF) + ((x >> 16) & 0x0000FFFF)
This is pretty cool, but there's 3 tricks we can do to even further optimize this:
Trick 1: We can simplify the first step because for just two bits, instead of (x & 01) + ((x >> 1) & 01), it so happens that you can just do x - ((x >> 1) & 01); this easy to verify by trying all 4 values of 2 bits (0, 1, 2, 3).
Trick 2: We can simplify the third step by adding before masking, rather than two separate masks before adding, because in the maximum of 8 possible ones easily fits in 4 bits. (Can't do this in earlier steps because, for example, the maximum of 4 possible ones does not fit in 2 bits.)
Trick 3: We can simplify the fourth and later steps by noticing that the maximum possible total of 32 ones easily fits in 8 bits, so we can just multiply by 0x01010101, which is equivalent to x + (x&lt;&lt;8) + (x&lt;&lt;16) + (x&lt;&lt;24).
End result:
x = x - ((x >> 1) & 0x55555555); // Trick 1
x = (x & 0x33333333) + ((x >> 2) & 0x33333333); // original
x = (x + (x >> 4)) & 0x0F0F0F0F; // Trick 2
x = x * 0x01010101; // Trick 3
@laughinghan
Copy link
Author

laughinghan commented Jun 9, 2021

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment