Skip to content

Instantly share code, notes, and snippets.

@fritzy
Last active September 15, 2017 17:43
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save fritzy/10427e8032d77ded8e13 to your computer and use it in GitHub Desktop.
Save fritzy/10427e8032d77ded8e13 to your computer and use it in GitHub Desktop.
Fight the German Tank Problem in Postgresql with short, unique, url-safe ids.
CREATE EXTENSION IF NOT EXISTS "pgcrypto";
-- Create a trigger function that takes no arguments.
-- Trigger functions automatically have OLD, NEW records
-- and TG_TABLE_NAME as well as others.
CREATE OR REPLACE FUNCTION unique_short_id()
RETURNS TRIGGER AS $$
-- Declare the variables we'll be using.
DECLARE
key TEXT;
qry TEXT;
found TEXT;
BEGIN
-- generate the first part of a query as a string with safely
-- escaped table name, using || to concat the parts
qry := 'SELECT id FROM ' || quote_ident(TG_TABLE_NAME) || ' WHERE id=';
-- This loop will probably only run once per call until we've generated
-- millions of ids.
LOOP
-- Generate our string bytes and re-encode as a base64 string.
key := encode(gen_random_bytes(6), 'base64');
-- Base64 encoding contains 2 URL unsafe characters by default.
-- The URL-safe version has these replacements.
key := replace(key, '/', '_'); -- url safe replacement
key := replace(key, '+', '-'); -- url safe replacement
-- Concat the generated key (safely quoted) with the generated query
-- and run it.
-- SELECT id FROM "test" WHERE id='blahblah' INTO found
-- Now "found" will be the duplicated id or NULL.
EXECUTE qry || quote_literal(key) INTO found;
-- Check to see if found is NULL.
-- Ff we checked to see if found = NULL it would always be FALSE
-- because (NULL = NULL) is always FALSE.
IF found IS NULL THEN
-- If we didn't find a collision then leave the LOOP.
EXIT;
END IF;
-- We haven't EXITed yet, so return to the top of the LOOP
-- and try again.
END LOOP;
-- NEW and OLD are available in TRIGGER PROCEDURES.
-- NEW is the mutated row that will actually be INSERTed.
-- We're replacing id, regardless of what it was before
-- with our key variable.
NEW.id = key;
-- The RECORD returned here is what will actually be INSERTed,
-- or what the next trigger will get if there is one.
RETURN NEW;
END;
$$ language 'plpgsql';
CREATE TABLE test (id TEXT PRIMARY KEY, name TEXT);
-- We name the trigger "trigger_test_genid" so that we can remove
-- or replace it later.
-- If an INSERT contains multiple RECORDs, each one will call
-- unique_short_id individually.
CREATE TRIGGER trigger_test_genid BEFORE INSERT ON test FOR EACH ROW EXECUTE PROCEDURE unique_short_id();
INSERT INTO test (name) VALUES ('cheese'), ('ham'), ('turkey'), ('chicken');
SELECT * FROM test;
id | name
----------+---------
Ixw1yIj7 | cheese
SXq0jZ-q | ham
KKWXEtBu | turkey
DRRXFs1U | chicken
(4 rows)
@fritzy
Copy link
Author

fritzy commented Feb 8, 2016

Generate a short, unique url-safe id that isn't sequential for your table. 2^48 possibilities. The Birthday Problem probability table says that we can expect a 1% chance of collisions after we've generated 240,000 IDs, 25% after 1,300,000 IDs, and 50% at 20,000,000 IDs.

Edit: To be clear, those percentages are that there was a single collision within all of those ids. The chance of every additional key being a collision is much, much, much lower.

@mythical-programmer
Copy link

Hi fritzy, shouldn't it be 25% change of collision after 1,300,000 IDs ie 1.3M and NOT 0.13M ?

@fritzy
Copy link
Author

fritzy commented May 11, 2017

Yes @insightfulprogrammer, that was a type-o.

@fritzy
Copy link
Author

fritzy commented May 11, 2017

@evbots
Copy link

evbots commented Sep 15, 2017

How do you know that the postgresql encode function does not also output = characters in it's output? I looked but couldn't find this anywhere in the postgresql documentation. @fritzy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment