@vaiorabbit
Last active February 4, 2024 19:49
FNV-1a Hash (http://isthe.com/chongo/tech/comp/fnv/) in JavaScript.
// 32 bit FNV-1a hash
// Ref.: http://isthe.com/chongo/tech/comp/fnv/
function fnv32a( str )
{
    var FNV1_32A_INIT = 0x811c9dc5;
    var hval = FNV1_32A_INIT;
    for ( var i = 0; i < str.length; ++i )
    {
        hval ^= str.charCodeAt(i);
        hval += (hval << 1) + (hval << 4) + (hval << 7) + (hval << 8) + (hval << 24);
    }
    return hval >>> 0;
}
/*
print( "Lorem -> " + fnv32a('Lorem') ); // Lorem -> 1789342528
*/
/* for Mac OSX JSC:
$ cat words.txt | /System/Library/Frameworks/JavaScriptCore.framework/Resources/jsc -f fnv32a.js > tmp_js.txt

var line;
while ( (line = readline()) != "" )
{
    var re = /\"(.*)\"\,/;
    var word = re.exec(line);
    print( word[1] + " -> " + fnv32a(word[1]) );
}
*/
@stuartpb

stuartpb commented Aug 6, 2016

@prdoyle No. The shifts add the partial products that multiplying the hash by the prime would produce. Because the shifted values are added to the variable that already contains the hash, the rightmost bit of the prime (which would be hval << 0, i.e. a multiplication by 1) is incorporated implicitly. hval << 1, in turn, adds the contribution of the second-rightmost set bit (a multiplication by 2), and so on.
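To make that concrete, here is a small check (a sketch, not from the thread) that the shift-add sequence equals an exact 32-bit multiplication by the FNV prime 0x01000193 (set bits 0, 1, 4, 7, 8 and 24), using Math.imul as the exact multiply:

```javascript
// FNV prime 0x01000193 = 2^24 + 2^8 + 2^7 + 2^4 + 2^1 + 2^0
function mulViaShifts(h) {
  // The bare 'h' term is the implicit 'h << 0' (multiplication by 1).
  return (h + (h << 1) + (h << 4) + (h << 7) + (h << 8) + (h << 24)) >>> 0;
}

function mulViaImul(h) {
  // Math.imul performs exact 32-bit integer multiplication.
  return Math.imul(h, 0x01000193) >>> 0;
}

// Equal for any 32-bit input:
console.log(mulViaShifts(0x811c9dc5) === mulViaImul(0x811c9dc5)); // true
```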

@max0x7ba

hval ^= str.charCodeAt(i); only works for 8-bit encodings.
Try "Я".charCodeAt(0).

@josiahbryan

@max0x7ba How would you adjust it for non-8-bit encodings?
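One possible adjustment (a sketch, not from the gist, assuming a runtime with TextEncoder such as browsers or Node): encode the string to UTF-8 bytes first, then run the same FNV-1a loop over the bytes instead of the UTF-16 code units:

```javascript
// FNV-1a over UTF-8 bytes instead of UTF-16 code units.
function fnv32aUtf8(str) {
  const bytes = new TextEncoder().encode(str); // Uint8Array of UTF-8 bytes
  let hval = 0x811c9dc5; // 32-bit FNV-1a offset basis
  for (const b of bytes) {
    hval ^= b;
    // Multiply by the FNV prime 0x01000193 via shifts.
    hval += (hval << 1) + (hval << 4) + (hval << 7) + (hval << 8) + (hval << 24);
  }
  return hval >>> 0;
}

console.log(fnv32aUtf8('Я').toString(16)); // "80c353e0"
```

For pure-ASCII input the UTF-8 bytes equal the char codes, so this matches the original fnv32a (e.g. 'Lorem' still hashes to 1789342528).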

@erenes

erenes commented Dec 5, 2017

It appears to work for me without problem if I try a Russian string in Firefox (57.0), IE11 and Chrome (64):

fnv32a("Сегодня я банан")
< 3081921716

"Я" is code 1071 in my browser(s):

"Я".charCodeAt(0)
< 1071

@polarathene

polarathene commented Jul 8, 2020

charCodeAt() returns 16-bit values, not 8-bit, so you're good up to 0xFFFF (65535). (EDIT: Not exactly: you do get a 16-bit value, but it's UTF-16 encoded, see below.) codePointAt() is useful for characters beyond that range, such as some emoji 👩🏿‍⚖️ (a composite of several code points).

// Length 4, not 2 as it might look..
const unicorn = '🦄🌈';

console.log(`The character code is: ${unicorn.charCodeAt(0)}, its full code point is: ${unicorn.codePointAt(0)}`);
// Outputs: "The character code is: 55358, its full code point is: 129412"
// Hexadecimal equivalents ('.toString(16)'): charCode == 0xD83E, codePoint == 0x1F984

// charCode is acquired as explained for encoding to UTF-16 here:
// https://en.wikipedia.org/wiki/UTF-16#Examples
// 55358 == 0xD83E is the "high surrogate", calculated from the code point
// minus 0x10000 (0x1F984 - 0x10000 == 0xF984), using integer division:
Math.floor(0xF984 / 0x400) + 0xD800 == 0xD83E

// The next value returned from 'charCodeAt()' when the index increments is not
// for the rainbow, but the second UTF-16 code unit of the unicorn. Note that
// 'codePointAt()' at that index now matches the charCode value; it no longer
// points to the code point of the unicorn.
// 56708 == 0xDD84 is the "low surrogate", calculated as follows:
0xF984 % 0x400 + 0xDC00 == 0xDD84

So what happens here is that charCodeAt() does give you a result, but it is UTF-16 surrogates (a pair of 16-bit code units). FNV is meant to operate on one byte at a time (not one character). JS strings are UTF-16 by default, so for anything exceeding one byte we need to encode to UTF-8 first.

We can take the code point for the unicorn 🦄 (0x1F984) and convert it to octal notation as an easy way to visualize the conversion to UTF-8, as the conversion is explained/visualized on Wikipedia:

// Both convert to octal (base 8) and output "374604"
console.log( Number(0x1f984).toString(8) )
console.log( '🦄'.codePointAt().toString(8) )

// The code point exceeds 0x10000, so it is represented as 4 bytes in UTF-8.
// Leading octal digits for a 4-byte UTF-8 sequence: 36x 2xx 2xx 2xx, with the
// code point's octal digits (padded to 7: 0374604) split across the x slots.
// A '0' prefix followed by digits is (legacy) JS syntax for octal values:
const UTF8_bytes = [0360 + 000, 0200 + 037, 0200 + 046, 0200 + 004];
// Which is equivalent to: [0360, 0237, 0246, 0204] in octal,
// or in decimal: [240, 159, 166, 132]
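As a cross-check (assuming a runtime with TextEncoder), a real UTF-8 encoder produces the same bytes as the octal arithmetic above:

```javascript
// TextEncoder encodes a JS (UTF-16) string to its UTF-8 bytes.
const encoded = Array.from(new TextEncoder().encode('🦄'));
console.log(encoded); // [240, 159, 166, 132]
```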

You would then need to either modify the method to use bytes directly or convert those bytes into a string. In JS you can also have code handle that conversion like so:

const UTF8_string = unescape(encodeURIComponent('🦄'))
// Results in a string of length 4 whose char codes are the UTF-8 bytes
// (240, 159, 166, 132); it renders as mojibake because each byte is being
// reinterpreted as an individual character.

For the same reasons, Я (code point 1071 == 0x42F) encodes to two bytes in UTF-8 (though it has length 1 in UTF-16). The expected hexadecimal hash with those bytes as input is 0x80c353e0 (not 0x2a064b5e), which you can confirm with an FNV calculator.

UTF-8 bytes of "Я": [208, 175]


Might as well point out for anyone else landing here and curious: the bit shifting you see in place of the usual multiplication by the prime is required in JS, not just an optimization. JS numbers are 64-bit floats with only 53 bits of integer precision, not 32-bit uints, so a plain multiply gives inaccurate results once the product exceeds 2^53. The bitwise/bitshift operators operate on 32-bit integers, avoiding the issue.
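A quick illustration of the precision problem (a sketch; Math.imul stands in for an exact 32-bit multiply):

```javascript
const h = 0x811c9dc5;     // FNV offset basis
const prime = 0x01000193; // FNV prime

// The exact product is about 3.6e16, beyond 2^53, so the float result is
// rounded before '>>> 0' can reduce it mod 2^32.
const viaFloat = (h * prime) >>> 0;        // inexact
const viaImul = Math.imul(h, prime) >>> 0; // exact low 32 bits
console.log(viaFloat === viaImul); // false
```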

The final return hval >>> 0 is also important: it converts the result to a positive number (uint32). Otherwise, if you expect results such as hexadecimal strings (toString(16)), negative values won't convert as expected.
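For example (a minimal sketch): bitwise results are signed 32-bit integers, so a hash with its top bit set comes out negative without the final '>>> 0':

```javascript
// '| 0' forces the value into the signed 32-bit range, as the
// intermediate hash values are during the loop.
const signed = 0x811c9dc5 | 0;
console.log(signed);                      // -2128831035
console.log(signed.toString(16));         // "-7ee3623b" (sign + magnitude, not the bits)
console.log((signed >>> 0).toString(16)); // "811c9dc5"
```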


For anyone that wants a version that handles char code values beyond 128, or more than 32-bit hashes, see sindresorhus NPM package.
