Skip to content

Instantly share code, notes, and snippets.

@fabiolimace
Last active April 24, 2024 09:40
Show Gist options
  • Star 24 You must be signed in to star a gist
  • Fork 3 You must be signed in to fork a gist
  • Save fabiolimace/5e7923803566beefaf3c716d1343ae27 to your computer and use it in GitHub Desktop.
Save fabiolimace/5e7923803566beefaf3c716d1343ae27 to your computer and use it in GitHub Desktop.
Functions for generating Segment's KSUIDs on PostgreSQL
/*
* MIT License
*
* Copyright (c) 2023 Fabio Lima
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
/**
* Returns a Segment's KSUID.
*
* ------------------------------
* Structure
* ------------------------------
* 2HiFB j6X9oGTDYLDVn8qqfjfE9C
* ^ ^
* | |
* | +----- random (128b)
* +----------- seconds (32b)
* ------------------------------
*
* Use COLLATE "C" or COLLATE "POSIX" on column to sort by ASCII order.
* "The C and POSIX collations both specify “traditional C” behavior, in
* which only the ASCII letters “A” through “Z” are treated as letters,
* and sorting is done strictly by character code byte values."
* Source: https://www.postgresql.org/docs/current/collation.html
*
* Reference implementation: https://github.com/segmentio/ksuid
* Also read: https://segment.com/blog/a-brief-history-of-the-uuid/
*
* MIT License.
*/
create or replace function ksuid() returns text as $$
declare
v_time timestamp with time zone := null;
v_seconds numeric(50) := null;
v_numeric numeric(50) := null;
v_epoch numeric(50) = 1400000000; -- 2014-05-13T16:53:20Z
v_base62 text := '';
v_alphabet char array[62] := array[
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
'U', 'V', 'W', 'X', 'Y', 'Z',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
'u', 'v', 'w', 'x', 'y', 'z'];
i integer := 0;
begin
-- Get the current time
v_time := clock_timestamp();
-- Extract epoch seconds
v_seconds := EXTRACT(EPOCH FROM v_time) - v_epoch;
-- Generate a KSUID in a numeric variable
v_numeric := v_seconds * pow(2::numeric(50), 128) -- 32 bits for seconds and 128 bits for randomness
+ ((random()::numeric(70,20) * pow(2::numeric(70,20), 48))::numeric(50) * pow(2::numeric(50), 80)::numeric(50))
+ ((random()::numeric(70,20) * pow(2::numeric(70,20), 40))::numeric(50) * pow(2::numeric(50), 40)::numeric(50))
+ (random()::numeric(70,20) * pow(2::numeric(70,20), 40))::numeric(50);
-- Encode it to base-62
while v_numeric <> 0 loop
v_base62 := v_base62 || v_alphabet[mod(v_numeric, 62) + 1];
v_numeric := div(v_numeric, 62);
end loop;
v_base62 := reverse(v_base62);
v_base62 := lpad(v_base62, 27, '0');
return v_base62;
end $$ language plpgsql;
-- EXAMPLE:
-- select ksuid() ksuid, clock_timestamp()-statement_timestamp() time_taken;
-- EXAMPLE OUTPUT:
-- |ksuid |time_taken |
-- |---------------------------|---------------|
-- |2HeIj5n6zGw76bbU6FCvHv0DQ16|00:00:00.000373|
-------------------------------------------------------------------
-- FOR TEST: the expected result is an empty result set
-------------------------------------------------------------------
-- with t as (
-- select ksuid() as id from generate_series(1, 1000)
-- )
-- select * from t where (id is null or not id ~ '^[a-zA-Z0-9]{27}$');
/*
* MIT License
*
* Copyright (c) 2023 Fabio Lima
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
/**
* Returns a Segment's KSUID with microsecond precision.
*
* -------------------------------
* Structure
* -------------------------------
* 2HiFJ Omk JQ0tyawHfJwUJO9IomG
* ^ ^ ^
* | | |
* | | +-- random (108b)
* | +------- micros (20b)
* +----------- seconds (32b)
* -------------------------------
*
* Use COLLATE "C" or COLLATE "POSIX" on column to sort by ASCII order.
* "The C and POSIX collations both specify “traditional C” behavior, in
* which only the ASCII letters “A” through “Z” are treated as letters,
* and sorting is done strictly by character code byte values."
* Source: https://www.postgresql.org/docs/current/collation.html
*
* Reference implementation: https://github.com/segmentio/ksuid
* Also read: https://segment.com/blog/a-brief-history-of-the-uuid/
*
* MIT License.
*/
create or replace function ksuid_micros() returns text as $$
declare
v_time timestamp with time zone := null;
v_seconds numeric(50) := null;
v_micros numeric(50) := null;
v_numeric numeric(50) := null;
v_epoch numeric(50) = 1400000000; -- 2014-05-13T16:53:20Z
v_base62 text := '';
v_alphabet char array[62] := array[
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
'U', 'V', 'W', 'X', 'Y', 'Z',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
'u', 'v', 'w', 'x', 'y', 'z'];
i integer := 0;
begin
-- Get the current time
v_time := clock_timestamp();
-- Extract the epoch seconds and microseconds
v_seconds := EXTRACT(EPOCH FROM v_time) - v_epoch;
v_micros := MOD((EXTRACT(microseconds FROM v_time)::numeric(50)), 1e6::numeric(50));
-- Generate a KSUID in a numeric variable
v_numeric := (v_seconds * pow(2::numeric(50), 128)) -- 32 bits for seconds
+ (v_micros * pow(2::numeric(50), 108)) -- 20 bits for microseconds and 108 bits for randomness
+ ((random()::numeric(70,20) * pow(2::numeric(70,20), 54))::numeric(50) * pow(2::numeric(50), 54)::numeric(50))
+ (random()::numeric(70,20) * pow(2::numeric(70,20), 54))::numeric(50);
-- Encode it to base-62
while v_numeric <> 0 loop
v_base62 := v_base62 || v_alphabet[mod(v_numeric, 62) + 1];
v_numeric := div(v_numeric, 62);
end loop;
v_base62 := reverse(v_base62);
v_base62 := lpad(v_base62, 27, '0');
return v_base62;
end $$ language plpgsql;
-- EXAMPLE:
-- select ksuid_micros() ksuid, clock_timestamp()-statement_timestamp() time_taken;
-- EXAMPLE OUTPUT:
-- |ksuid |time_taken |
-- |---------------------------|---------------|
-- |2HgTLqTCTz2A7Z4u1luNJ4cHSBG|00:00:00.000305|
-------------------------------------------------------------------
-- FOR TEST: the expected result is an empty result set
-------------------------------------------------------------------
-- with t as (
-- select ksuid_micros() as id from generate_series(1, 1000)
-- )
-- select * from t where (id is null or not id ~ '^[a-zA-Z0-9]{27}$');
/*
* MIT License
*
* Copyright (c) 2023 Fabio Lima
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
-- Install `pgcrypto` module
-- CREATE EXTENSION IF NOT EXISTS pgcrypto;
/**
* Returns a Segment's KSUID.
*
* ------------------------------
* Structure
* ------------------------------
* 2HiFB j6X9oGTDYLDVn8qqfjfE9C
* ^ ^
* | |
* | +----- random (128b)
* +----------- seconds (32b)
* ------------------------------
*
* Use COLLATE "C" or COLLATE "POSIX" on column to sort by ASCII order.
* "The C and POSIX collations both specify “traditional C” behavior, in
* which only the ASCII letters “A” through “Z” are treated as letters,
* and sorting is done strictly by character code byte values."
* Source: https://www.postgresql.org/docs/current/collation.html
*
* Reference implementation: https://github.com/segmentio/ksuid
* Also read: https://segment.com/blog/a-brief-history-of-the-uuid/
*
* MIT License.
*/
create or replace function ksuid_pgcrypto() returns text as $$
declare
v_time timestamp with time zone := null;
v_seconds numeric(50) := null;
v_numeric numeric(50) := null;
v_epoch numeric(50) = 1400000000; -- 2014-05-13T16:53:20Z
v_payload bytea := null;
v_base62 text := '';
v_alphabet char array[62] := array[
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
'U', 'V', 'W', 'X', 'Y', 'Z',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
'u', 'v', 'w', 'x', 'y', 'z'];
i integer := 0;
begin
-- Get the current time
v_time := clock_timestamp();
-- Extract the epoch seconds
v_seconds := EXTRACT(EPOCH FROM v_time) - v_epoch;
-- Generate a KSUID in a numeric variable
v_numeric := v_seconds * pow(2::numeric(50), 128); -- 32 bits for seconds
-- Add 128 random bits to it
v_payload := gen_random_bytes(16);
while i < 16 loop
i := i + 1;
v_numeric := v_numeric + (get_byte(v_payload, i - 1)::numeric(50) * pow(2::numeric(50), (16 - i) * 8));
end loop;
-- Encode it to base-62
while v_numeric <> 0 loop
v_base62 := v_base62 || v_alphabet[mod(v_numeric, 62) + 1];
v_numeric := div(v_numeric, 62);
end loop;
v_base62 := reverse(v_base62);
v_base62 := lpad(v_base62, 27, '0');
return v_base62;
end $$ language plpgsql;
-- EXAMPLE:
-- select ksuid_pgcrypto() ksuid, clock_timestamp()-statement_timestamp() time_taken;
-- EXAMPLE OUTPUT:
-- |ksuid |time_taken |
-- |---------------------------|---------------|
-- |2HeIj5n6zGw76bbU6FCvHv0DQ16|00:00:00.000542|
-------------------------------------------------------------------
-- FOR TEST: the expected result is an empty result set
-------------------------------------------------------------------
-- with t as (
-- select ksuid_pgcrypto() as id from generate_series(1, 1000)
-- )
-- select * from t where (id is null or not id ~ '^[a-zA-Z0-9]{27}$');
/*
* MIT License
*
* Copyright (c) 2023 Fabio Lima
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
-- Install `pgcrypto` module
-- CREATE EXTENSION IF NOT EXISTS pgcrypto;
/**
* Returns a Segment's KSUID with microsecond precision.
*
* -------------------------------
* Structure
* -------------------------------
* 2HiFJ Omk JQ0tyawHfJwUJO9IomG
* ^ ^ ^
* | | |
* | | +-- random (108b)
* | +------- micros (20b)
* +----------- seconds (32b)
* -------------------------------
*
* Use COLLATE "C" or COLLATE "POSIX" on column to sort by ASCII order.
* "The C and POSIX collations both specify “traditional C” behavior, in
* which only the ASCII letters “A” through “Z” are treated as letters,
* and sorting is done strictly by character code byte values."
* Source: https://www.postgresql.org/docs/current/collation.html
*
* Reference implementation: https://github.com/segmentio/ksuid
* Also read: https://segment.com/blog/a-brief-history-of-the-uuid/
*
* MIT License.
*/
create or replace function ksuid_pgcrypto_micros() returns text as $$
declare
v_time timestamp with time zone := null;
v_seconds numeric(50) := null;
v_micros numeric(50) := null;
v_numeric numeric(50) := null;
v_epoch numeric(50) = 1400000000; -- 2014-05-13T16:53:20Z
v_payload bytea := null;
v_base62 text := '';
v_alphabet char array[62] := array[
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
'U', 'V', 'W', 'X', 'Y', 'Z',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
'u', 'v', 'w', 'x', 'y', 'z'];
i integer := 0;
begin
-- Get the current time
v_time := clock_timestamp();
-- Extract the epoch seconds and microseconds
v_seconds := EXTRACT(EPOCH FROM v_time) - v_epoch;
v_micros := MOD((EXTRACT(microseconds FROM v_time)::numeric(50)), 1e6::numeric(50));
-- Generate a KSUID in a numeric variable
v_numeric := (v_seconds * pow(2::numeric(50), 128)) -- 32 bits for seconds
+ (v_micros * pow(2::numeric(50), 108)); -- 20 bits for microseconds
-- Add 108 random bits to it
v_payload := gen_random_bytes(14);
v_payload := set_byte(v_payload::bytea, 0, get_byte(v_payload, 0) >> 4);
while i < 14 loop
i := i + 1;
v_numeric := v_numeric + (get_byte(v_payload, i - 1)::numeric(50) * pow(2::numeric(50), (14 - i) * 8));
end loop;
-- Encode it to base-62
while v_numeric <> 0 loop
v_base62 := v_base62 || v_alphabet[mod(v_numeric, 62) + 1];
v_numeric := div(v_numeric, 62);
end loop;
v_base62 := reverse(v_base62);
v_base62 := lpad(v_base62, 27, '0');
return v_base62;
end $$ language plpgsql;
-- EXAMPLE:
-- select ksuid_pgcrypto_micros() ksuid, clock_timestamp()-statement_timestamp() time_taken;
-- EXAMPLE OUTPUT:
-- |ksuid |time_taken |
-- |---------------------------|---------------|
-- |2HeIj5n6zGw76bbU6FCvHv0DQ16|00:00:00.000542|
-------------------------------------------------------------------
-- FOR TEST: the expected result is an empty result set
-------------------------------------------------------------------
-- with t as (
-- select ksuid_pgcrypto_micros() as id from generate_series(1, 1000)
-- )
-- select * from t where (id is null or not id ~ '^[a-zA-Z0-9]{27}$');
create or replace function base62(v_numeric numeric(50)) returns text as $$
declare
v_base62 text := '';
v_alphabet char array[62] := array[
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
'U', 'V', 'W', 'X', 'Y', 'Z',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
'u', 'v', 'w', 'x', 'y', 'z'];
begin
while v_numeric <> 0 loop
v_base62 := v_base62 || v_alphabet[mod(v_numeric, 62) + 1];
v_numeric := div(v_numeric, 62);
end loop;
v_base62 := reverse(v_base62);
v_base62 := lpad(v_base62, 27, '0');
return v_base62;
end $$ language plpgsql;
-------------------------------------------------------------------
-- FOR TEST: the expected result is an empty result set
-------------------------------------------------------------------
select x.returned, x.expected from (
select base62(1094000954754373858329439098194926247752183157359) as returned, 'RLDUcKfyvzQRYwJBsQW1L7vKsJj' as expected union
select base62(803726687961417839458477625038381806898304379214) as returned, 'K5Q5xuh29Xt8VUy25cuiMn05BWg' as expected union
select base62(282830277175380297300274393782804602302411954413) as returned, '74GEWkWGTaPKWOiOw16UKYTzcdx' as expected union
select base62(269743924535260533410669459152221482693724560881) as returned, '6jyzfVl1sZc2LcxteGGsqHKJTKT' as expected union
select base62(1232833462799826170754278676272145150887465015607) as returned, 'UoLURViWlhacl6gJ30lO5D3NxoN' as expected union
select base62(611685135043540023398697716909848119640254999737) as returned, 'FHqAT2n3RG8ZqdKBTD3kjSjJ4Vl' as expected union
select base62(953273934005400200413489513400209108025609068785) as returned, 'Np9U7w95oMT99F2i6m39kZJrhjt' as expected union
select base62(287799250508902479787818127455725505651887833700) as returned, '7BxcDUNRPTU308dYi32pmxQ0jf2' as expected union
select base62(1269445476918462270919269682464131617414365219008) as returned, 'Vj4tu1BILeXFTfA0OGvGjVPStv6' as expected union
select base62(3954130933644675224642164484265366830115163718) as returned, '067smOqdTVnWq3hzUk1LBQ7pSX0' as expected union
select base62(345641316609833697798081885877628305432574046802) as returned, '8daeTnisiTVSaPNaPJxPeUro5qc' as expected union
select base62(1129536812589497328763363904561814015504391517342) as returned, 'SEHVx6H2MwGVOSLvecadlfUYZbi' as expected union
select base62(554074711635299389573660024537366959798186900662) as returned, 'DqZNz4YFit22SdXdIJ4HQq2d4Tu' as expected union
select base62(211545714751478621992362264272443490096979868901) as returned, '5Hnk2Go32Er5Sal8jdRXo0poTqv' as expected union
select base62(1122147277177701839538064255395407069562592623492) as returned, 'S2paBX5vSuVwhygR3vErNmt27me' as expected union
select base62(549637052663303562390233361703146990581651651083) as returned, 'Djh34JnfTZ1chDvN6LJ8XxXAkuZ' as expected union
select base62(1326792246883660381116617122368113981186484000021) as returned, 'X9wLvz1J1tbyiw9OYqNheGGrMBx' as expected union
select base62(1331018637943432897627658883337434264726313140218) as returned, 'XGUOQreDFUWeDwy38gygYNgyBcw' as expected union
select base62(1262740421871068384005470783812819586488855522859) as returned, 'VYgjF6GHX71WLjF1LIzJsq8MXgh' as expected union
select base62(1394986737188438074145339963201202753304334264989) as returned, 'YrbyPZRXKbNssd0DDdZWCFbFS4z' as expected union
select base62(989786329925282917354828926247290266746679973042) as returned, 'OjjKD93Tqx3ygN5aiQhdfGLk2bK' as expected union
select base62(272848539500980401072352496662863287941284799188) as returned, '6onGJ7UTEyKFKMYtrTh5ADpjyui' as expected union
select base62(194979784954437431546352727231543280562464939889) as returned, '4s8D4lLsk3jRQGCd063Bjf1Q4hN' as expected union
select base62(163275117725960045266192986606773032500097522222) as returned, '450GEAH6VjBc3G2bE05iFLBAijO' as expected union
select base62(354062610499821756491827913935939118405986326006) as returned, '8qdhurjRDIWWHBxznPmjC7qIb0I' as expected union
select base62(214575622952506321763672917024608429534835721470) as returned, '5MUpgRklVIjQOPSFHuHUrqIXyXu' as expected union
select base62(132135404711945898765787565488168286161327811593) as returned, '3IkaWqoGfRUXxuO0mpJ6pErAi5Z' as expected union
select base62(494144654821592338878548002890523930721626293248) as returned, 'CLhkbXBnkQpZt2YcQVTeEHCqbKK' as expected union
select base62(1270695998531412664968507531343768732741513504189) as returned, 'Vl12dOdhb2LaWaKpywbWa9IFk21' as expected union
select base62(963077232187361469697269858423682695341147886147) as returned, 'O4LJTKLTTBxwzONAQlq0daWUqTT' as expected union
select base62(144532502461690832895378098850858526500125585312) as returned, '3bxbqC0uTtIk0R7PJ8A7gMwWAq0' as expected union
select base62(1094224968032854741593917231169963405027570804451) as returned, 'RLZ0x6ijnpeWa30QTTvTCieuJzH' as expected union
select base62(132713840657997327115971832083293668078940720023) as returned, '3Je9yyk2Tkf2bg8qbR1yPCqz7I7' as expected union
select base62(1066273605705451469272329918518168076974282643312) as returned, 'QeFefFqlEgVUFPeZVPCH4hVKbcu' as expected union
select base62(1448360743349679757124659169201740540817760218260) as returned, 'aCJkfxar8G3rNzDtNCSkrChtZHo' as expected union
select base62(1390445916446380440916793466693663154607071621225) as returned, 'YkZj0rkCaX5KAbDxdZMXwhWIQf3' as expected union
select base62(1425446024484817316918863270429942698110164703784) as returned, 'ZcoH9d1jRJKspxJ4EKHkIEYGrWq' as expected union
select base62(508333429243734482806304589960148916273748787636) as returned, 'Chgu2RJGaB4UWnlbQt0sdjB9cIu' as expected union
select base62(886419213548817321818654189390469230632150354393) as returned, 'M9YZzMu13UlXFi0fzSbM7LTCM3t' as expected union
select base62(1403509722803565745702453829220450694899683234323) as returned, 'Z4onZh618VvENM7eIFQNVRpp2ZP' as expected union
select base62(630529724204306927890959905551991731421307324371) as returned, 'Fl2cGHVpptRK9HTVDRje1hethQp' as expected union
select base62(253722814632472052714578554603928859795078237426) as returned, '6L9nvchOgqoAalSs3gzy09sPqZm' as expected union
select base62(572381799991837782018250883138489919930366428308) as returned, 'EIwC9eVwgtWzwUym24aJkG0k2fE' as expected union
select base62(922969835414424383066970543907189812827357911902) as returned, 'N4C5lX6MaHtrXEioUrix1QarLzi' as expected union
select base62(41266075603431206345381999397839311195000168805) as returned, '11wXNMBgjOjpRWydlAwySoTdp5B' as expected union
select base62(566396805976680424705807005981823040751367794319) as returned, 'E9fCXQmbMPOflbO1HyM4yRwQ5CZ' as expected union
select base62(863545687754103825941585075306150330458503798236) as returned, 'La73plEoPc8lLJXWWelZGugCiYm' as expected union
select base62(111618949667441316268308585673854441325895333360) as returned, '2mxWGX89ZivvoXdVCYtfTs8oPho' as expected union
select base62(1339399109469104676955217834595538976959459439076) as returned, 'XTTWi6j7BD2AT4ZibdMhQRJzDiW' as expected union
select base62(341497984000312167860142722284087913678088998339) as returned, '8XAaiedVT50nARbGTSIeNYksqgN' as expected union
select base62(849040839924022055366717759309632987097402544182) as returned, 'LDdXi00nkmfJrciqYfiRXEMN2dK' as expected union
select base62(1198681997809301815274162398903418390833813924496) as returned, 'TxQTFAUJlBBeLxkAEFCwlCsGnJI' as expected union
select base62(534477259082191167401700960027129079930518266691) as returned, 'DMCbl81sUEdG7tLDNxV74l1tNvX' as expected union
select base62(727885602179425167136722614138614960427215070814) as returned, 'IBtqXhqVC8Y0SJVMFz0FNNqFDfK' as expected union
select base62(744123733002226289921775022739971689375259387620) as returned, 'Ib3sy4PnLY19c8tms7tOvIpGNee' as expected union
select base62(918142913182024548488522331263912199727240607363) as returned, 'MwiMCgsCARzFX6NGi0A55SSy0tH' as expected union
select base62(425595676189680805450290781266150803595059288731) as returned, 'AdU4cLUBzN7dnBJ6cGaQc8hKzbX' as expected union
select base62(336807727396704148028128600719877738405159312823) as returned, '8PtzCq1xsyif0ex0GB910mwwBoV' as expected union
select base62(5026800380673467683963836593226574338601929312) as returned, '07mw8RMm7LDZxH1HIrQYlU40Jv6' as expected union
select base62(238038919721547093896375404168450088923193211413) as returned, '5wr0oF40ZXAIl7yTdgncWdZ1Adp' as expected union
select base62(100407561140307680110580255209263953985286701846) as returned, '2VaPd9y7XcCFesudZPIUCnHDwRS' as expected union
select base62(604212314874050562105384817180631528218960533287) as returned, 'F6GEcAMCeChAVhfAaYK8irN0O3z' as expected union
select base62(972905451468625843808520346597202786045722351858) as returned, 'OJZXG9cIOxaMBzhqLmMF5lEswSo' as expected union
select base62(309811506696521274726101507949801845917995034265) as returned, '7k4ODeMr9Zo7s5fMNMECRue8PsX' as expected union
select base62(862320955752636054817321236951532685187142902316) as returned, 'LYDOiYzMELe3XNbyaa9P3oJfHbA' as expected union
select base62(428538376377275576155723099843499412196234820663) as returned, 'Ai2moLEVduJyxTgyeVkB5DuLdAt' as expected union
select base62(734385462933023869297938072380239497818576304666) as returned, 'ILyIxkyufKQHJe6t2cOX3DC79z8' as expected union
select base62(1139629846945349162323792210197703781129473426896) as returned, 'STvB6vZcjLDrdPWXZqkQkTVpiSW' as expected union
select base62(1297668704827523652627376560924946045471834402049) as returned, 'WQoNYe1vhqabvlEiMYGmsBitqOv' as expected union
select base62(136462187058110350693159493498468281000856132670) as returned, '3PSH0XDJuxwuSadT5BPU7TNn4D0' as expected union
select base62(1327035444998200576691201231569326073851820067332) as returned, 'XAJiXm95VO4ocfOq8JDHR1SzmcO' as expected union
select base62(235675204439882094223712355017999284666663884680) as returned, '5tBvL6fIcPmksKzP1Tfh0sMgzyy' as expected union
select base62(228497188346252373155088745655109446960802057719) as returned, '5i4JUFzvE42Zg5zkmaGnbmO5OAJ' as expected union
select base62(1372811697868462100339768029275940627442606482136) as returned, 'YJFYn6whrx9loof64uD0RlCufge' as expected union
select base62(1126256261462614995034921796534640846532336956828) as returned, 'S9CLLjx3ejekVyNnfm0trXTffqu' as expected union
select base62(1144342744377620691709953728734712481628222683945) as returned, 'SbDxU4Nlof6cV30poQ9ErVyfCPJ' as expected union
select base62(218563645565804445228689747063039984222793141975) as returned, '5SfyL988hSw9DvvhTIy2VD4BSp5' as expected union
select base62(973333937351724731696419889902063270895523888944) as returned, 'OKEhXH5BGPKzSbo0yCg4N26eAIC' as expected union
select base62(791246917408240668458175277095925222806186486656) as returned, 'Jm58DG1MOVEuPdmEKp63ClOkpnc' as expected union
select base62(1288914421387995204430709552951743721013680461582) as returned, 'WDFKfXe58vIROebqbay0ugemXuw' as expected union
select base62(1310744629213541635999240463086808420097777961030) as returned, 'Wl4cIhIAHp5A9CafWGg7jY8sVZ0' as expected union
select base62(88358015058561824054678394839841197701430712749) as returned, '2CumVgqZjOunlKuERsKt4F0Loar' as expected union
select base62(978764504430526703738783379816265365163009403533) as returned, 'OSeQhy28KBVDG8EmGAJKIBFPRvB' as expected union
select base62(906824305810740123097651243821217350165188557156) as returned, 'MfAwv4o7v6H2vUDt6chc2DdEcnE' as expected union
select base62(790005076089751180484792529913641620673348236425) as returned, 'Jk9pBXPC4XFBy2thu6ajd28Vb5t' as expected union
select base62(822738095536477666968857408630827581614415724386) as returned, 'KYsZPguMI9uQT03QorCwAJwuCXq' as expected union
select base62(123764105701181979299394727957566790771570415268) as returned, '35mKt2q6cxegCIQrhrpSRyGxtxM' as expected union
select base62(647498808314905624027040012220973019751750015851) as returned, 'GBKsbsubM1Ulq6Y93knZG9IYi9T' as expected union
select base62(374397105980876264336743767628659056105542388324) as returned, '9M9IKyGo9b0O5FJ1VcZ57bJ1NkS' as expected union
select base62(846028666166109899541270931449012380538499109264) as returned, 'L8y9hCoFOBn5LLJVQv1aB09kNUG' as expected union
select base62(539245100854380444945794326639596656836264144209) as returned, 'DTafPNcMu1Sm85CVtEFunW3iQaX' as expected union
select base62(716819481781801467434042607849295587886120335477) as returned, 'HukhCHvHdIzfmWf2A2CmAjcEgrd' as expected union
select base62(76417955431651763552315529992585559312851817294) as returned, '1uPfXtcouH1zfZwuw3JOi76g0TO' as expected union
select base62(733000838193027705053637894089773194460660846992) as returned, 'IJpHRgxCBDHIZJZPQ28BmG2TaPw' as expected union
select base62(594233146199866508661997705517749610683196301260) as returned, 'EqnVhHAxPsnNMRO0mTRwlbIjwCa' as expected union
select base62(671934333003976146001159829463644377645736597543) as returned, 'GnCSoMcEiYVXJg0SdmLkzPrWE6B' as expected union
select base62(511776262913017310069727729889273446418671739129) as returned, 'Cn1fHEEnbW9gOlNkIL4zF8iTrOz' as expected union
select base62(1146942457906847646142868854096423325324764192196) as returned, 'SfFig45w6ajBRCYjjltd4g21cii' as expected union
select base62(290033691586065276473629867468810106288324097356) as returned, '7FQHfH0kFyKVIyHu2WQ0YCzv0U0' as expected
) as x where x.returned <> x.expected;
@d1rtym0nk3y
Copy link

returns a random number of null value on postgres 14.5

with t as (
    select ksuid() as id from generate_series(1, 1000)
)
select * from t where id is null;

+----+
|id  |
+----+
|null|
|null|
|null|
|null|
|null|
|null|
|null|
|null|
|null|
+----+

@fabiolimace
Copy link
Author

fabiolimace commented Nov 16, 2022

I forgot to put in a floor to truncate the numeric variable, causing an off-by-one error.

Thank you @d1rtym0nk3y !

@fabiolimace
Copy link
Author

I changed the numeric variable declaration to numeric(50). Now floor is no longer needed.

@fabiolimace
Copy link
Author

fabiolimace commented Nov 16, 2022

Last updates:

  • Renamed fn_ksuid() to ksuid()
  • Added test_base62.sh
  • Optimized ksuid.sql to avoid md5()
  • Added ksuid_pgcrypto.sql to be used with pgcrypto module

@jackpmastrangelo
Copy link

Hi @fabiolimace, thanks for this implementation, we've found it really helpful!.

One thing I wanted to share is how a Postgres database's collation can affect the time-sorting property of a KSUID.

Our database defaulted to the en_US.UTF-8 collation. This collation sorted lowercase and uppercase versions of letters together, such that sort(['A', 'B', 'a', 'b']) => ['a', 'A', 'b', 'B'].

This is different from sorting assumptions of ASCII, which base62 shares where sort(['A', 'B', 'a', 'b']) => ['A', 'B', 'a', 'b'].

We had to change the collation on all of our databases from en_US.UTF-8 to C.UTF-8 which maintained ascii sort invariants on normal letters. Before we did this, we were getting errors between the expected sort order of our IDs, and the actual sort order.

Postgres collations are impossible to change without dropping and recreating the DB, or setting them on a per-column level, so I wanted to leave this for anyone who comes across this Gist and is planning to use Postgres KSUIDs in their new project.

@fabiolimace
Copy link
Author

Hi @jackpmastrangelo!

It is really very important to take into account the difference between collations.

This issue was found in a discussion about new UUID formats.

Many thanks for the notice!

P.S.: I found an interesting Gist on the subject: PostgreSQL collation is a massive footgun.

@fabiolimace
Copy link
Author

fabiolimace commented Mar 12, 2023

Added this note to all function comments.:

 * Use COLLATE "C" or COLLATE "POSIX" on column to sort by ASCII order.
 * "The C and POSIX collations both specify “traditional C” behavior, in
 * which only the ASCII letters “A” through “Z” are treated as letters, 
 * and sorting is done strictly by character code byte values."
 * Source: https://www.postgresql.org/docs/current/collation.html

The code below shows the difference in behavior between C and en_US collations.

----------------------------------------------------
-- CREATING TABLES WITH DIFFERENT COLLATIONS
----------------------------------------------------

-- Standard Collation (ASCII order)
create temporary table tmp_collate_c (
	name character varying(255) COLLATE "C"
);

-- Collation for American Englisth ('natural' order?)
create temporary table tmp_collate_us (
	name character varying(255) COLLATE "en_US"
);

----------------------------------------------------
-- INSERTING VALUES
----------------------------------------------------

insert into tmp_collate_c values ('AAAAAA');
insert into tmp_collate_c values ('BBBBBB');
insert into tmp_collate_c values ('CCCCCC');
insert into tmp_collate_c values ('aaaaaa');
insert into tmp_collate_c values ('bbbbbb');
insert into tmp_collate_c values ('cccccc');
insert into tmp_collate_c values ('AAAaaa');
insert into tmp_collate_c values ('BBBbbb');
insert into tmp_collate_c values ('CCCccc');
insert into tmp_collate_c values ('aaaAAA');
insert into tmp_collate_c values ('bbbBBB');
insert into tmp_collate_c values ('cccCCC');

insert into tmp_collate_us values ('AAAAAA');
insert into tmp_collate_us values ('BBBBBB');
insert into tmp_collate_us values ('CCCCCC');
insert into tmp_collate_us values ('aaaaaa');
insert into tmp_collate_us values ('bbbbbb');
insert into tmp_collate_us values ('cccccc');
insert into tmp_collate_us values ('AAAaaa');
insert into tmp_collate_us values ('BBBbbb');
insert into tmp_collate_us values ('CCCccc');
insert into tmp_collate_us values ('aaaAAA');
insert into tmp_collate_us values ('bbbBBB');
insert into tmp_collate_us values ('cccCCC');

----------------------------------------------------
-- THE RESULTS
----------------------------------------------------

select * from tmp_collate_c order by 1;

AAAAAA
AAAaaa
BBBBBB
BBBbbb
CCCCCC
CCCccc
aaaAAA
aaaaaa
bbbBBB
bbbbbb
cccCCC
cccccc

select * from tmp_collate_us order by 1;

aaaaaa
aaaAAA
AAAaaa
AAAAAA
bbbbbb
bbbBBB
BBBbbb
BBBBBB
cccccc
cccCCC
CCCccc
CCCCCC

@fabiolimace
Copy link
Author

fabiolimace commented Mar 12, 2023

Fixed ksuid_micros.sql and ksuid_pgcrypto_micros.sql.

Before:

v_micros  := MOD((EXTRACT(microseconds FROM current_timestamp)::numeric(50)), 1e6::numeric(50));

After:

v_micros  := MOD((EXTRACT(microseconds FROM v_time)::numeric(50)), 1e6::numeric(50));

Differences between clock_timestamp() and current_timestamp could cause wrong sort order in microseconds level.

For example:

select clock_timestamp()::varchar, current_timestamp::varchar
|clock_timestamp              |current_timestamp            |
|-----------------------------|-----------------------------|
|2023-03-12 01:41:33.728935-03|2023-03-12 01:41:33.728694-03|

@simkimsia
Copy link

Sorry @fabiolimace

Can I use this to generate ksuid as autogenerated primary key?

@arobert93
Copy link

Thank you for this! Really appreciate the effort you've made. Do you recommend using varchar(27) or text when defining the column? I'm new to Postgresql and I'm having a hard time choosing the type.

@fabiolimace
Copy link
Author

According to the PostgreSQL manual:

There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.

I think text is not suitable because it has no length limit.

@arobert93
Copy link

@fabiolimace Thank you for referencing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment