Skip to content

Instantly share code, notes, and snippets.

@jlevy
Last active December 13, 2024 21:53
Show Gist options
  • Save jlevy/c246006675becc446360a798e2b2d781 to your computer and use it in GitHub Desktop.
Save jlevy/c246006675becc446360a798e2b2d781 to your computer and use it in GitHub Desktop.
Fast and simple insecure string hash for JavaScript
// These hashes are for algorithmic use cases, such as bucketing in hashtables, where security isn't
// needed and 32 or 64 bits is enough (that is, rare collisions are acceptable). These are way simpler
// than sha1 (and all its deps) or similar, and with a short, clean (base 36 alphanumeric) result.
// A simple, *insecure* 32-bit hash that's short, fast, and has no dependencies.
// Output is always 7 characters.
// Loosely based on the Java version; see
// https://stackoverflow.com/questions/6122571/simple-non-secure-hash-function-for-javascript
const simpleHash = str => {
let hash = 0;
for (let i = 0; i < str.length; i++) {
const char = str.charCodeAt(i);
hash = (hash << 5) - hash + char;
}
// Convert to 32bit unsigned integer in base 36 and pad with "0" to ensure length is 7.
return (hash >>> 0).toString(36).padStart(7, '0');
};
// An improved alternative:
// cyrb53 (c) 2018 bryc (github.com/bryc). License: Public domain. Attribution appreciated.
// A fast and simple 64-bit (or 53-bit) string hash function with decent collision resistance.
// Largely inspired by MurmurHash2/3, but with a focus on speed/simplicity.
// See https://stackoverflow.com/questions/7616461/generate-a-hash-from-string-in-javascript/52171480#52171480
// https://github.com/bryc/code/blob/master/jshash/experimental/cyrb53.js
const cyrb64 = (str, seed = 0) => {
let h1 = 0xdeadbeef ^ seed, h2 = 0x41c6ce57 ^ seed;
for(let i = 0, ch; i < str.length; i++) {
ch = str.charCodeAt(i);
h1 = Math.imul(h1 ^ ch, 2654435761);
h2 = Math.imul(h2 ^ ch, 1597334677);
}
h1 = Math.imul(h1 ^ (h1 >>> 16), 2246822507);
h1 ^= Math.imul(h2 ^ (h2 >>> 13), 3266489909);
h2 = Math.imul(h2 ^ (h2 >>> 16), 2246822507);
h2 ^= Math.imul(h1 ^ (h1 >>> 13), 3266489909);
// For a single 53-bit numeric return value we could return
// 4294967296 * (2097151 & h2) + (h1 >>> 0);
// but we instead return the full 64-bit value:
return [h2>>>0, h1>>>0];
};
// An improved, *insecure* 64-bit hash that's short, fast, and has no dependencies.
// Output is always 14 characters.
const cyrb64Hash = (str, seed = 0) => {
const [h2, h1] = cyrb64(str, seed);
return h2.toString(36).padStart(7, '0') + h1.toString(36).padStart(7, '0');
}
@jlevy
Copy link
Author

jlevy commented Oct 23, 2019

@ArashPartow
Copy link

A comprehensive list of general purpose hash functions and their implementations can found here:

https://www.partow.net/programming/hashfunctions/index.html

@cgarrovillo
Copy link

A comprehensive list of general purpose hash functions and their implementations can found here:

https://www.partow.net/programming/hashfunctions/index.html

This is a JS Gist. That article is in C.

@Philzen
Copy link

Philzen commented Sep 26, 2023

@jlevy that's a nice and practical one. By accident i discover in a test in our CI pipeline that sometimes it produces less than 6 digits of output. Is that intended?

@lehni
Copy link

lehni commented Oct 25, 2023

Thanks for this! I propose a slight simplification:

  • Instead of new Uint32Array([hash])[0] you can simply use (hash >>> 0)
  • Instead of hash &= hash you can use (…) | 0 to achieve the same
function simpleHash(str) {
  let hash = 0
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) - hash + str.charCodeAt(i)) | 0
  }
  return (hash >>> 0).toString(36)
}

@ashconnell
Copy link

ashconnell commented Feb 14, 2024

Here's some more info about the type of hash this generates:

  • It operates on 32-bit integers and converts the result to a base 36 string.
  • This means the total number of possible hash outputs is limited to 2^32, or about 4.29 billion unique values.
  • While this may seem large, it's relatively small in the context of other hash functions, especially if you're looking to hash a larger number of strings.
  • For 100 hashes, the probability of at least one collision is about 0.000115%, which is extremely low.
  • For 1,000 hashes, the probability increases to approximately 0.0116%, still quite low.
  • For 10,000 hashes, the probability jumps to about 1.16%, becoming more significant.
  • For 50,000 hashes, the probability of at least one collision is roughly 25.25%, a substantial risk.
  • For 100,000 hashes, the probability of at least one collision is around 68.78%, which is very high.

@jlevy
Copy link
Author

jlevy commented Feb 15, 2024

Yes, like with hashCode() in Java, this is for algorithmic use, and not use cases where collisions are unacceptable.

I've updated the comments, simplified as suggested, and for convenience padded so that output is always 7 characters.

That said, arguably a more modern algorithm like cyrb53 would be preferable for a lot of use cases.

@jlevy
Copy link
Author

jlevy commented Feb 15, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment