Skip to content

Instantly share code, notes, and snippets.

@avnerbarr
Created January 23, 2018 08:10
Show Gist options
  • Save avnerbarr/82fb20d25c9b040f2caca8cdcefe7fc2 to your computer and use it in GitHub Desktop.
Save avnerbarr/82fb20d25c9b040f2caca8cdcefe7fc2 to your computer and use it in GitHub Desktop.
64bit "murmur" hash in scala. The builtin murmur hash only outputs 32 bits so if hashing millions of items you will get quite a few conflicts. This method will allow hashing millions of items without any conflicts
// Simple function to generate 64 bit hashes using the builtin Murmur hash functionaliy which only outputs 32bits
// we had cases when hashing 15M+ items where we got around 0.08% conflicts
// using this method we were able to hash 15M items with 0 conflicts
object ExtendedMurmurHash {
val seed = 0xf7ca7fd2
def hash(u: String): Long = {
val a = scala.util.hashing.MurmurHash3.stringHash(u, seed)
val b = scala.util.hashing.MurmurHash3.stringHash(u.reverse.toString, seed)
// shift a 32 bits to the left, leaving 32 lower bits zeroed
// mask b with "1111...1" (32 bits)
// bitwise "OR" between both leaving a 64 bit hash of the string using Murmur32 on each portion
val c: Long = a.toLong << 32 | (b & 0xffffffffL)
c
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment