Skip to content

Instantly share code, notes, and snippets.

@heri16
Last active April 26, 2024 09:08
Show Gist options
  • Save heri16/98e7d39b881cf1f8a0bc9ac1ce126438 to your computer and use it in GitHub Desktop.
Save heri16/98e7d39b881cf1f8a0bc9ac1ce126438 to your computer and use it in GitHub Desktop.
Secure Short Unique IDs (for generating Invites Codes or Promo Codes or URL slug from Postgres Database)

What

This is open-source code that lets you generate short unique identifiers from numbers (within Postgres DB). These IDs are URL-safe, can encode several numbers, and do not contain common profanity words.

This is what they look like:

https://example.com/order/FHxkSB1ai

Sample

Why

Many applications require unique codes that cannot be reversed-engineered or leak-information, such as unique order IDs, invite codes, coupon codes, or short urls.

Typically, one would use gen_random_bytes(), but the result is not collision-resistant, meaning generated identifiers or codes are not guaranteed to be truly unique. UUIDs are also way too long for use as short urls, promo codes or invitation codes.

Another plausible solution is to use a Keyed Crytographic PRF or Keyed One-way Hash Function such as SHA1 or Keccak. However, the output of most secure Hash Functions are way too long (256-bit or more) to be used for short identifiers. Truncating the hash-digest output also meant loss of collision-resistance, meaning you are no longer guaranteed unique codes and identifiers.

Another plausible solution is to obfuscate using a custom encoding format, with shuffling of the alphabet set. However, such methods have been proven to be easily decoded using cryptanalysis techniques. This meant someone can trivially reverse-engineer your hashid, and generate their own valid codes (to trigger an enumeration attack on your system). This means an attacker would not need to perform any brute-force attack at all to determine valid codes.

How it works

Our simple and elegant solution uses a Feistel Cipher, which guarantees unique short codes without any collision. We use Blowfish, a 64-bit block-size Fiestel Cipher that is built into vanilla Postgresql, that has been heavily audited by the security community.

First, a 448-bit secret key (derived from your Postgres DB root keys) is used to convert a unique bigint/int8 number to a secure 64-bit ciphertext (also unique). The popular hashid encoding-scheme is then used to produce between 9 to 13 alphanumeric characters (also unique). This method also ensures that no profanity appears in the pseudo-random unique codes that are shared to your users.

Our alphabet excludes the letters I, l, O, and 0 to avoid confusion and abuse. Even with these extra protections, our max character-length of 13 is still much shorter than other competing implementations (that typically produces alphanumeric IDs that are 26-characters or more).

Need shorter codes?

See NIST's FPE-FF1 Encryption in pure PL/pgsql that can generate shorter codes. Suitable for practical applications such as:

  • Inputing Some Digits with a phone keypad (or PIN pad)
  • Masking sensitive data (such as National ID Numbers, without revealing the masquerading)

In most cases, the solution here (that uses Blowfish) is much faster and slightly more secure.

To issue codes that contain only numerical digits (such as prepaid topup vouchers), use gen_digit_code(id, secret).

Performance

Performance is very good as all cryptography and encoding is implemented in C via common postgres extensions.

Usage

A. Generated Column

If you would like to store the generated code, use Postgres' generated-column:

-- Create a table
CREATE TABLE profile (
    id bigint PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    short_id text GENERATED ALWAYS AS (gen_short_code(id, gen_short_secret('_profile'))) STORED,
    public_id numeric(20) GENERATED ALWAYS AS (gen_digit_code(id, gen_short_secret('_profile'))) STORED,
    ...
);

-- (optional) add an index on the generated column to speed up the query
CREATE INDEX profile_short_id_idx ON profile (short_id text_pattern_ops) WITH (fillfactor = 50);

B. Computed Column

If you don't want to store the generated short codes on-disk, use Postgres' computed-column:

-- Create a table
CREATE TABLE profile (
    id bigint PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    ...
);

-- Create a computed field
CREATE FUNCTION short_id(profile)
SECURITY DEFINER SET search_path = ''
RETURNS text AS $$
  SELECT gen_short_code($1.id, gen_short_secret('_profile'));
$$ LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE;

-- Query the computed field
SELECT *, profile.short_id FROM profile;

-- (optional) add an index on the computed field to speed up the query
CREATE INDEX profile_short_id_idx ON profile (short_id(profile) text_pattern_ops) WITH (fillfactor = 50);

Tested on

  • Supabase
  • Postgres v15

Inspired By

-- Enable the "pgsodium" extension
CREATE EXTENSION IF NOT EXISTS pgsodium;
-- Enable the "pgcrypto" extension
CREATE EXTENSION IF NOT EXISTS pgcrypto WITH SCHEMA extensions;
-- Enable the "pg_hashids" extension
CREATE EXTENSION IF NOT EXISTS pg_hashids WITH SCHEMA extensions;
-- Create the gen_short_secret(key_realm, key_rotate_idx, key_length) function
CREATE OR REPLACE FUNCTION gen_short_secret(char(8), int8 DEFAULT 1, integer DEFAULT 56)
RETURNS bytea AS $$
-- Use a server-managed secret key (with optional key-rotation)
SELECT pgsodium.derive_key($2, $3, $1::bytea);
$$ LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE;
-- Create the gen_short_code(id, secret, min_char_length, custom_alphabet_set) function
CREATE OR REPLACE FUNCTION gen_short_code(int8, bytea, integer DEFAULT 6, varchar(512) DEFAULT 'abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ123456789')
RETURNS text AS $$
SELECT id_encode(('x' || encode(encrypt(int8send($1), $2, 'bf-ecb/pad:none'), 'hex'))::bit(64)::int8, '', $3, $4);
$$ LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE;
-- Create the gen_digit_code(id, secret) function
CREATE OR REPLACE FUNCTION gen_digit_code(int8, bytea)
RETURNS numeric(20, 0) AS $$
WITH v(num) AS (VALUES (('x' || encode(encrypt(int8send($1), $2, 'bf-ecb/pad:none'), 'hex'))::bit(64)::int8))
SELECT (CASE WHEN v.num < 0 THEN 18446744073709551616::numeric + v.num ELSE v.num END) FROM v;
$$ LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE;
-- Create a table
DROP TABLE org_invite CASCADE;
CREATE TABLE IF NOT EXISTS org_invite (
id int8 PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Create a computed field
CREATE OR REPLACE FUNCTION code(org_invite)
RETURNS text AS $$
SELECT gen_short_code($1.id, gen_short_secret('_invite_'));
$$ LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE
SECURITY DEFINER SET search_path = 'extensions', 'public';
-- Generate 100,000 records
INSERT INTO org_invite (created_at) SELECT CURRENT_TIMESTAMP FROM generate_series(1, 100000);
SELECT *, org_invite.code FROM org_invite LIMIT 100;
SELECT *, org_invite.code FROM org_invite ORDER BY id DESC LIMIT 100;
-- (optional) add an index on the computed field to speed up queries
DROP INDEX IF EXISTS org_invite_code_idx;
CREATE INDEX org_invite_code_idx ON org_invite (code(org_invite) text_pattern_ops) WITH (fillfactor = 50);
-- Creating the index may take some time (as a full table scan is needed)
-- Query within 1000,000 records
SELECT *, org_invite.code FROM org_invite WHERE org_invite.code LIKE '7mD%';
-- (Optional) reload postgrest.org Schema Cache
NOTIFY pgrst, 'reload schema'
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment