@creationix
Last active April 2, 2024 20:16
I've come up with an interesting design for a hash map that, instead of bucket lists, recursively embeds more hash maps, with a new key for each level.

Each hash map is super simple, with two slots and 1-bit hash keys. The original key is first hashed to a 32-bit integer, and each bit in that number serves as the key for one level of the recursive hash map.

In C, my structure looks like:

struct hash_tree_node {
  void *key;
  void *value;
  struct hash_tree_node *left;
  struct hash_tree_node *right;
};

In other words, this is shaped exactly like a binary tree.

Algorithm for setting a value, starting at the root node:

  • If the slot is free or its key matches ours, use it.
  • Otherwise:
    • If the hash ends in 1, recurse on the left child (shift the hash right 1 bit to expose the new final bit).
    • Else recurse on the right child (shifting the hash the same way).
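To make the steps above concrete, here's a minimal sketch in C. The gist doesn't specify a hash function or allocator, so `hash_key` is a hypothetical integer mixer and nodes come from `calloc`; any good 32-bit hash would do.

```c
#include <stdint.h>
#include <stdlib.h>
#include <assert.h>

struct hash_tree_node {
  void *key;
  void *value;
  struct hash_tree_node *left;
  struct hash_tree_node *right;
};

/* Hypothetical 32-bit integer hash (not from the gist); any good
 * bit mixer over the key works. */
static uint32_t hash_key(void *key) {
  uint32_t h = (uint32_t)(uintptr_t)key;
  h ^= h >> 16;
  h *= 0x45d9f3bu;
  h ^= h >> 16;
  return h;
}

/* Set: walk the pseudo-random path given by the hash bits until a
 * free or matching slot turns up, allocating nodes as needed. */
static void hash_set(struct hash_tree_node **slot, void *key, void *value) {
  uint32_t bits = hash_key(key);
  for (;;) {
    struct hash_tree_node *node = *slot;
    if (!node) {
      node = calloc(1, sizeof(*node));
      *slot = node;
    }
    if (!node->key || node->key == key) {
      node->key = key;
      node->value = value;
      return;
    }
    /* Low bit chooses the child; shift right to expose the next bit. */
    slot = (bits & 1) ? &node->left : &node->right;
    bits >>= 1;
  }
}
```

Note that the loop carries a pointer-to-slot rather than a pointer-to-node, so the same code handles allocating the root and allocating a missing child.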

As you can see, finding a free slot is O(log n). Each hash key gives you a unique pseudo-random search path to traverse the binary tree.

I optimize a little with a special top node that records a running total of all value nodes for quick length checks.

struct hash_tree {
  struct hash_tree_node *root;
  uintptr_t count;
};

Reading a value from the hash also costs O(log n), as we must walk the tree using the hashed bits.

Deleting a value has similar cost. Once found, we simply set the key to null, and the set algorithm knows it can reuse this slot instead of looking deeper.
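Lookup and this simple delete can be sketched under the same assumptions (the stand-in `hash_key` mixer from above, which must match the one used when setting):

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

struct hash_tree_node {
  void *key;
  void *value;
  struct hash_tree_node *left;
  struct hash_tree_node *right;
};

/* Stand-in 32-bit integer hash; must match the one used by set. */
static uint32_t hash_key(void *key) {
  uint32_t h = (uint32_t)(uintptr_t)key;
  h ^= h >> 16;
  h *= 0x45d9f3bu;
  h ^= h >> 16;
  return h;
}

/* Get: follow the same bit path the set algorithm used. */
static void *hash_get(struct hash_tree_node *node, void *key) {
  uint32_t bits = hash_key(key);
  while (node) {
    if (node->key == key) return node->value;
    node = (bits & 1) ? node->left : node->right;
    bits >>= 1;
  }
  return NULL;
}

/* Delete: find the node the same way, then clear its key so the
 * set algorithm can reuse the slot. Returns 1 if the key was found. */
static int hash_delete(struct hash_tree_node *node, void *key) {
  uint32_t bits = hash_key(key);
  while (node) {
    if (node->key == key) {
      node->key = NULL;
      node->value = NULL;
      return 1;
    }
    node = (bits & 1) ? node->left : node->right;
    bits >>= 1;
  }
  return 0;
}
```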

Iterating over the tree can be done without recursion or a stack by using Morris traversal.


Is there a name for this structure?

How does it compare to traditional hash maps and balanced binary trees?

Note that while this is shaped like a binary tree, there is no balancing. Nodes can be empty and reused later. Assuming a good integer hash for the bitfield used for the path, the tree should naturally stay balanced.

Since my bit path is 32 bits long, if you happened to store more than 4 billion unique keys in this, the search paths would degenerate into linear chains past depth 32. But there will be 4,294,967,296 of those chains, so it shouldn't matter unless you went way past this value.

Of course, if you're storing that many keys in my simple hash/tree, you're probably using it wrong. You could simply use a 64-bit hash instead to prevent the poor performance for trees deeper than 32 levels. More likely, you need to rethink your solution further back.

@creationix
Author

I'm sure this isn't the best hash map purely from a CPU performance perspective. My use cases are on very memory constrained systems where I often already have native data structures for cons pairs (which can be used to build the tree).

For example, in one system, I have a tiny embedded scripting language that has exactly one native data structure. The cons pair. It's a fixed tuple of length two that can hold two arbitrary values. Using this I can construct these nodes in the following manner:

node = ((key,value),(left,right))

Since all the garbage collected data structures are based on this tuple type, I can use very compact heap space with fixed width and simple GC tracking (usually using mark-sweep).

I write high-level functions in C that know how to implement this algorithm on top of the simple cons pair system to reduce the number of jumps between script and vm space. The special top node I wrote in C can be done in these cons pairs as:

hash = (len, root)

Then if I can fit in a few bits to tag the tuples with types, this can be typed as a hashmap and the vm won't expose the internals to the script by default. It will look like a new powerful primitive, but really be built out of simple primitives under the hood.

@creationix
Author

When I say this is compact, I mean it's really compact. For microcontrollers with between 5 and 100 KB of free RAM, my tuples are often just 4 bytes each (including allocation metadata, type tags, GC flags, etc). Each value added to the hash adds between 2 and 3 nodes (8-12 bytes).

Compare with Lua, where an empty hash table is around 40 bytes, plus 40 bytes for each value index (allocated in powers of 2). (If your indexes are only integers, it's only 16 bytes each, but still allocated in powers of 2.)

@nojvek

nojvek commented Apr 20, 2020

If I understand correctly, your hashmap is basically a trie of keys, right?

so you have key = hash(val), and key is a uint32

@creationix
Author

I believe so. Not sure I understood the trie concept back when I wrote this.

@dragoncoder047

For example, in one system, I have a tiny embedded scripting language that has exactly one native data structure. The cons pair. It's a fixed tuple of length two that can hold two arbitrary values. Using this I can construct these nodes in the following manner:

node = ((key,value),(left,right))

I found this -- very smart idea. I would implement it the same way with cons pairs.

However, there is a problem when you want to delete elements. Picture an empty hashmap: the first element you set always lands in the root node. Now add another element with hash X, delete the root element (by setting its key and value to nil), and then insert X again. X gets placed into the now-free root node, but X's original node still exists deeper in the tree. The search algorithm always finds the root copy first, so this shadow node is invisible and doesn't appear to be a problem right now. Except when you later try to delete X by finding its node and setting key and value to nil: the search algorithm then falls through to the original X node, so X appears to still exist even though it was supposed to be deleted.

So, when either setting or deleting an element from the map, add this: you cannot return as soon as the first target node is found, because there may be shadow nodes deeper along the hash search path, and these need to be cleared as well.
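The fix described here might look like the following sketch (`hash_key` is the same stand-in mixer assumed earlier, not code from the gist): instead of returning at the first match, keep walking the full hash path and clear every node holding the key.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

struct hash_tree_node {
  void *key;
  void *value;
  struct hash_tree_node *left;
  struct hash_tree_node *right;
};

/* Stand-in 32-bit integer hash, as elsewhere in this sketch. */
static uint32_t hash_key(void *key) {
  uint32_t h = (uint32_t)(uintptr_t)key;
  h ^= h >> 16;
  h *= 0x45d9f3bu;
  h ^= h >> 16;
  return h;
}

/* Shadow-aware delete: clear every node on the key's hash path that
 * holds the key, since a reinserted key can leave a stale copy
 * deeper down after a slot is freed and reused. */
static void hash_delete_all(struct hash_tree_node *node, void *key) {
  uint32_t bits = hash_key(key);
  while (node) {
    if (node->key == key) {
      node->key = NULL;
      node->value = NULL;
      /* no early return: shadow copies may sit deeper on this path */
    }
    node = (bits & 1) ? node->left : node->right;
    bits >>= 1;
  }
}
```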
