Last active
September 15, 2017 17:43
-
-
Save fritzy/10427e8032d77ded8e13 to your computer and use it in GitHub Desktop.
Fight the German Tank Problem in Postgresql with short, unique, url-safe ids.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
CREATE EXTENSION IF NOT EXISTS "pgcrypto"; | |
-- Create a trigger function that takes no arguments. | |
-- Trigger functions automatically have OLD, NEW records | |
-- and TG_TABLE_NAME as well as others. | |
CREATE OR REPLACE FUNCTION unique_short_id() | |
RETURNS TRIGGER AS $$ | |
-- Declare the variables we'll be using. | |
DECLARE | |
key TEXT; | |
qry TEXT; | |
found TEXT; | |
BEGIN | |
-- generate the first part of a query as a string with safely | |
-- escaped table name, using || to concat the parts | |
qry := 'SELECT id FROM ' || quote_ident(TG_TABLE_NAME) || ' WHERE id='; | |
-- This loop will probably only run once per call until we've generated | |
-- millions of ids. | |
LOOP | |
-- Generate our string bytes and re-encode as a base64 string. | |
key := encode(gen_random_bytes(6), 'base64'); | |
-- Base64 encoding contains 2 URL unsafe characters by default. | |
-- The URL-safe version has these replacements. | |
key := replace(key, '/', '_'); -- url safe replacement | |
key := replace(key, '+', '-'); -- url safe replacement | |
-- Concat the generated key (safely quoted) with the generated query | |
-- and run it. | |
-- SELECT id FROM "test" WHERE id='blahblah' INTO found | |
-- Now "found" will be the duplicated id or NULL. | |
EXECUTE qry || quote_literal(key) INTO found; | |
-- Check to see if found is NULL. | |
-- Ff we checked to see if found = NULL it would always be FALSE | |
-- because (NULL = NULL) is always FALSE. | |
IF found IS NULL THEN | |
-- If we didn't find a collision then leave the LOOP. | |
EXIT; | |
END IF; | |
-- We haven't EXITed yet, so return to the top of the LOOP | |
-- and try again. | |
END LOOP; | |
-- NEW and OLD are available in TRIGGER PROCEDURES. | |
-- NEW is the mutated row that will actually be INSERTed. | |
-- We're replacing id, regardless of what it was before | |
-- with our key variable. | |
NEW.id = key; | |
-- The RECORD returned here is what will actually be INSERTed, | |
-- or what the next trigger will get if there is one. | |
RETURN NEW; | |
END; | |
$$ language 'plpgsql'; |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
CREATE TABLE test (id TEXT PRIMARY KEY, name TEXT); | |
-- We name the trigger "trigger_test_genid" so that we can remove | |
-- or replace it later. | |
-- If an INSERT contains multiple RECORDs, each one will call | |
-- unique_short_id individually. | |
CREATE TRIGGER trigger_test_genid BEFORE INSERT ON test FOR EACH ROW EXECUTE PROCEDURE unique_short_id(); | |
INSERT INTO test (name) VALUES ('cheese'), ('ham'), ('turkey'), ('chicken'); | |
SELECT * FROM test; |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
id | name | |
----------+--------- | |
Ixw1yIj7 | cheese | |
SXq0jZ-q | ham | |
KKWXEtBu | turkey | |
DRRXFs1U | chicken | |
(4 rows) |
Hi fritzy, shouldn't it be 25% change of collision after 1,300,000 IDs ie 1.3M and NOT 0.13M ?
Yes @insightfulprogrammer, that was a type-o.
This gist is linked to in https://blog.andyet.com/2016/02/23/generating-shortids-in-postgres/
How do you know that the postgresql encode
function does not also output =
characters in it's output? I looked but couldn't find this anywhere in the postgresql documentation. @fritzy
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Generate a short, unique url-safe id that isn't sequential for your table. 2^48 possibilities. The Birthday Problem probability table says that we can expect a 1% chance of collisions after we've generated 240,000 IDs, 25% after 1,300,000 IDs, and 50% at 20,000,000 IDs.
Edit: To be clear, those percentages are that there was a single collision within all of those ids. The chance of every additional key being a collision is much, much, much lower.