Last active
February 4, 2024 19:49
vaiorabbit/5657561
FNV-1a Hash (http://isthe.com/chongo/tech/comp/fnv/) in JavaScript.
// 32 bit FNV-1a hash
// Ref.: http://isthe.com/chongo/tech/comp/fnv/
function fnv32a( str )
{
    var FNV1_32A_INIT = 0x811c9dc5;
    var hval = FNV1_32A_INIT;
    for ( var i = 0; i < str.length; ++i )
    {
        hval ^= str.charCodeAt(i);
        hval += (hval << 1) + (hval << 4) + (hval << 7) + (hval << 8) + (hval << 24);
    }
    return hval >>> 0;
}

/*
print( "Lorem -> " + fnv32a('Lorem') ); // Lorem -> 1789342528
*/

/* for Mac OSX JSC:
$ cat words.txt | /System/Library/Frameworks/JavaScriptCore.framework/Resources/jsc -f fnv32a.js > tmp_js.txt

var line;
while((line = readline()) != "")
{
    var re = /\"(.*)\"\,/;
    var word = re.exec(line);
    print( word[1] + " -> " + fnv32a(word[1]) );
}
*/
charCodeAt() covers 16 bits, not 8, so you're good up to 0xFFFF (65535). (EDIT: Not exactly; you do get a 16-bit value, but it's UTF-16 encoded, see below.) codePointAt() is useful for characters beyond that range, such as some emoji 👩🏿⚖️ (a composite of several emoji/chars). So what happens here is that you'll still get a result through charCodeAt(), but it's UTF-16 surrogates (two 16-bit code units). And even knowing why we get those values, note that FNV is meant to operate on one byte at a time (not one character). JS strings are UTF-16 by default, so for anything exceeding one byte we need to encode to UTF-8 instead.

We can take the code point for the unicorn 🦄 (0x1f984) and convert it to octal notation as an easy way to visualize the conversion to UTF-8, the way the conversion is explained/visualized on Wikipedia. You would then need to either modify the method to use the bytes directly or convert those bytes into a string; in JS, TextEncoder can handle that conversion.
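To illustrate, here is a sketch of what charCodeAt()/codePointAt() actually return for the unicorn, plus a hypothetical byte-wise variant (fnv32aUtf8 is my name, not part of the gist) that encodes to UTF-8 with TextEncoder before hashing:

```javascript
const u = "🦄"; // U+1F984, outside the Basic Multilingual Plane

console.log(u.length);                      // 2: stored as a UTF-16 surrogate pair
console.log(u.charCodeAt(0).toString(16));  // "d83e" (high surrogate)
console.log(u.charCodeAt(1).toString(16));  // "dd84" (low surrogate)
console.log(u.codePointAt(0).toString(16)); // "1f984" (the actual code point)
console.log(Array.from(new TextEncoder().encode(u))); // [240, 159, 166, 132]: four UTF-8 bytes

// Hypothetical UTF-8-aware variant: encode first, then run FNV-1a per byte.
function fnv32aUtf8(str) {
  const bytes = new TextEncoder().encode(str); // Uint8Array of UTF-8 bytes
  let hval = 0x811c9dc5;
  for (let i = 0; i < bytes.length; ++i) {
    hval ^= bytes[i];
    hval += (hval << 1) + (hval << 4) + (hval << 7) + (hval << 8) + (hval << 24);
  }
  return hval >>> 0;
}

console.log(fnv32aUtf8("Lorem")); // 1789342528, same as fnv32a for pure ASCII
```

For ASCII-only input the UTF-8 bytes equal the char codes, so this variant agrees with the gist's fnv32a; they only diverge once a character needs more than one byte.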
For those same reasons, take Я (code point 1071 == 0x42f): it is two bytes in UTF-8 (length of 1 in UTF-16). The expected hexadecimal value of the hash with that as a string input would be 0x80c353e0 (not 0x2a064b5e); you can see that is the case with this FNV calculator. UTF-8 string + char codes/bytes: "Я" == [208, 175]
Might as well point out, for anyone else landing here and curious: the bit shifting instead of the usual multiply-by-prime you might see elsewhere is required for JS, not just an optimization. JS numbers are 64-bit floats with 53 bits of integer precision, not 32-bit uints, so a plain multiply gives inaccurate results once intermediate values exceed what the float can represent exactly. The bitwise/bitshift operators operate on 32-bit integers instead, avoiding the issue.
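A small sketch of that precision point, using Math.imul (a true 32-bit multiply) as the reference. The FNV prime 0x01000193 equals 1 + 2 + 16 + 128 + 256 + 2**24, which is exactly what the chain of shifts adds up to; the example intermediate value is 0x811c9dc5 ^ 0x42f, i.e. the state after XOR-ing in "Я" as a UTF-16 code unit:

```javascript
const FNV_PRIME = 0x01000193;
const hval = 0x811c99ea; // example intermediate hash state

// The gist's decomposition: hval * (1 + 2 + 16 + 128 + 256 + 2**24) mod 2^32.
const viaShifts =
  (hval + (hval << 1) + (hval << 4) + (hval << 7) + (hval << 8) + (hval << 24)) >>> 0;

// Math.imul multiplies as 32-bit integers, so it matches the shift trick.
const viaImul = Math.imul(hval, FNV_PRIME) >>> 0;

// A plain float multiply: the product (~3.6e16) exceeds 2**53, so low bits are lost.
const viaFloat = (hval * FNV_PRIME) >>> 0;

console.log(viaShifts === viaImul); // true
console.log(viaFloat === viaImul);  // false: the float multiply went inaccurate
```

Math.imul is a cleaner way to express the same 32-bit multiply in modern JS; the shift decomposition is what was available when the gist was written.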
The final return hval >>> 0 is also important to get a positive number (uint). Otherwise, if you expect to get the same results as hexadecimal strings (toString(16)), it won't work as expected when negative values are converted. For anyone who wants a version that handles char code values beyond 128, or hashes of more than 32 bits, see sindresorhus's NPM package.