Instantly share code, notes, and snippets.

Embed
What would you like to do?
import basin
import string
from hashlib import sha1
## Fixed-length compact ids for compound keys. Given one or more strings, this will
## concat with delim (default '|'), sha1 hash the result, convert to the given base
## (default 62), truncate to the given length (default 12) and left-pad with zeros.
##
## *** WARNING MATH AHEAD ***
## Here's a handy way to approximate the birthday number for a given bitspace and a
## probability of collision. Take your desired probability, say one-in-a-million
## prob = 0.999999
## And calculate the bitspace. In this case, 12 base-62 digits
## space = 62 ** 12
## Take the natural log of the probability, times -2, times the bitspace. The sqrt
## of that is approximately the number of items that can be shoved into the bitspace
## before there is a one-in-a-million chance of collision.
## int(math.sqrt((-2 * math.log(prob)) * space))
## --> 80327683
## With the default settings, with 80 million items, the odds of a collision are
## still 10**6 to 1 against.
##
## Why not base64? or base92? Or base 256, or....
## One of the more annoying things about base64 and above is that they do not
## round-trip well through various systems like URLs, json, etc. every protocol and
## format has their own (sometimes two!) escaping regimes that conflict and result
## in sadness. base62 is a compromise between compression and resistance to corruption.
ALPHA = string.digits + string.ascii_letters
def generate(*tokens, **kwargs):
length = kwargs.get('length', 12)
base = kwargs.get('base', 62)
delim = kwargs.get('delim', '|')
return (basin.encode(ALPHA[0:base], int(sha1(delim.join(tokens)).hexdigest(), 16))[0:length]).zfill(length)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment