Skip to content

Instantly share code, notes, and snippets.

@joeldrapper
Created May 13, 2024 12:09
Show Gist options
  • Save joeldrapper/e5e345f8904f93d7a602677ccba4722c to your computer and use it in GitHub Desktop.
Save joeldrapper/e5e345f8904f93d7a602677ccba4722c to your computer and use it in GitHub Desktop.
Murmur hash in pure Ruby
module Murmur
MASK32 = 0xffffffff
def self.hash(str, seed = 0)
# Initialize variables
# h1: The main hash value that will be iteratively updated
# k1: A temporary value used to process chunks of 4 bytes
# i: Counter to keep track of the number of bytes processed
h1 = seed
k1 = 0
i = 0
# Iterate over each byte of the input string
str.each_byte do |byte|
# Combine the current byte with the existing k1 value
# This is done to build a chunk of 4 bytes
k1 |= byte << (i * 8)
i += 1
# Process the combined bytes when i reaches 4
# This ensures that the hash is computed in chunks of 4 bytes
next unless i == 4
# Perform bitwise operations on k1
# These operations are designed to mix and scramble the bits of k1
# The constants 0xcc9e2d51 and 0x1b873593 are chosen for optimal mixing
k1 = (k1 * 0xcc9e2d51) & MASK32
k1 = ((k1 << 15) | (k1 >> (32 - 15))) & MASK32
k1 = (k1 * 0x1b873593) & MASK32
# Update the hash value h1 by XORing it with the scrambled k1
# This combines the current chunk's hash with the overall hash
# The shift and addition operations further mix the bits of h1
h1 ^= k1
h1 = ((h1 << 13) | (h1 >> (32 - 13))) & MASK32
h1 = ((h1 * 5) + 0xe6546b64) & MASK32
# Reset i and k1 for the next iteration
i = 0
k1 = 0
end
# Process any remaining bytes if the string length is not a multiple of 4
# This ensures that all bytes contribute to the final hash value
if i > 0
k1 = (k1 * 0xcc9e2d51) & MASK32
k1 = ((k1 << 15) | (k1 >> (32 - 15))) & MASK32
k1 = (k1 * 0x1b873593) & MASK32
h1 ^= k1
end
# Finalize the hash value
# These operations mix the bits of h1 with the length of the input string
# The constants used are chosen for optimal mixing and avalanche effect
h1 ^= str.bytesize
h1 ^= h1 >> 16
h1 = (h1 * 0x85ebca6b) & MASK32
h1 ^= h1 >> 13
h1 = (h1 * 0xc2b2ae35) & MASK32
h1 ^ (h1 >> 16)
end
end
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment