Skip to content

Instantly share code, notes, and snippets.

@drwelby
Last active October 8, 2019 15:21
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
Star You must be signed in to star a gist
Save drwelby/5977601 to your computer and use it in GitHub Desktop.
Bitwise binary range tree sorting, a la SBN

The SBN spatial index is some sort of binary range tree. I've been pretty close in reproducing the indexing by using a tree method, where the input bounding box is sorted through a tree structure by alternately comparing min/max coordinate pair against each division.

However, it's always seemed like there would be some faster way to sort features based on looking at individual bits in the binary representation of the coordinate value.

For example, let's say we have a 4 bit index space ( a 16 x 16 grid). We're trying to bin a bounding box with a max coordinate of (11,9) and min of (8,7).

Using the tree method, we would first look at the first split which occurs at an x of 8. Since xmin is equal to or greater than 8 that would sort the bounding box into the right bin. Next we would look at a y value of 8 for the second division. In this case the y value splits the bounding box, so the box cannot be sorted further and remains in the third bin.

We can also do this bitwise. What we first do is reverse the binary representation of each point and interleave the coordinates:

max: (1011,1001) =>

                11010011

min: (1000,0111) =>

                10101001

If we AND them together, we get a bitmask(?) representing bin number

                10000001

If we XOR them together, we get a bitmask representing a skip flag

                01111010

To sort features, we would start with a list representing the first bin. Each feature object stores the feature ID, and the two calulated sort parameters: the bin number and the skip flag. We then iterate through the features, looking at the first bit of the two parameters. First we look at the skip flag. If it's 1, then the feature is spanning the dividing line and can not be sorted. So we would then move on to the next feature. If it's 0 then we would look at the bit in the bin number, which would tell us which of the two child bins to move the feature to. When all the features in the list have been sorted, we then move on to the next bin and iterate through all of its features.

So in this example, lets say we're sorting only the above feature, and we have a dictionary B to store the bins. Initially the feature is in B[0]. This is our first sort, so we look at the first bit. The skip flag is 0, so the feature is sortable. The first bin bit is 1, so the feature would be moved to the second bin of the second tree level, which is B[2]. We're done with sorting B[0], so we move to B[1]. It's empty, so we move to B[2]. Since we're in the second level, we look at the second skip bit. It's 1, so we skip the feature, and with no more features to sort and no more successive bins we're done.

The bins in SBN seem to split when they reach 8 features, so that's easy to accomplish in this implementation: before starting on a new bin, check the length of its contents. If it's under 8, skip to the next bin.

But you might say "Why are you walking down the bins just to uncover the bin bits? Why not just take in all the bits up to the skip bit?". And that's a good question. I'm not sure why SBN's bother holding features until a bin has 8 features before splitting.

Let's say you think that's dumb, but you're willing to accept the fixed size of the SBN tree (8 features or less on average per bin). So you can generate a level mask by taking the total number of features in binary, dividing by 8 and int'ing, and OR'ing it with itself bitshifted to left 1 bit 8 times. So if you had 165 features, that's 20, or 00010100. After the shifts and OR's, you get 00111111. If your feature didn't have a skip bit in any of those first 6 bits, you could OR the level mask with the bin mask and get the bin number.

Now how to deal with skip bits in the skip mask? We'll do something similar to generating the level mask, only we OR the bit mask with itself shifted left. So 01111010 becomes 11111110. Take the complement and you get 00000001, which when ANDed with the level mask gives you 00000001 which when ANDed with the bin mask gives you the bin number.

Next up is to code this up and see how it compares to 'official' SBN sorting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment