Skip to content

Instantly share code, notes, and snippets.

@kodekracker
Last active April 27, 2022 18:02
Show Gist options
  • Save kodekracker/297146ddc607a9458e64a6c15b8f7c47 to your computer and use it in GitHub Desktop.
Save kodekracker/297146ddc607a9458e64a6c15b8f7c47 to your computer and use it in GitHub Desktop.
A bounded pseudo-random generator of unique values in postgres database inspired from pseudo_encrypt
CREATE FUNCTION pseudo_encrypt_24(VALUE int) returns int AS $$
DECLARE
l1 int;
l2 int;
r1 int;
r2 int;
i int:=0;
BEGIN
l1:= (VALUE >> 12) & (4096-1);
r1:= VALUE & (4096-1);
WHILE i < 3 LOOP
l2 := r1;
r2 := l1 # ((((1366 * r1 + 150889) % 714025) / 714025.0) * (4096-1))::int;
l1 := l2;
r1 := r2;
i := i + 1;
END LOOP;
RETURN ((l1 << 12) + r1);
END;
$$ LANGUAGE plpgsql strict immutable;
CREATE FUNCTION bounded_pseudo_encrypt(VALUE int, max int) returns int AS $$
BEGIN
loop
value := pseudo_encrypt_24(value);
exit when value <= max;
end loop;
return value;
END
$$ LANGUAGE plpgsql strict immutable;
@kodekracker
Copy link
Author

Pseudo encrypt constrained to an arbitrary range

pseudo_encrypt outputs random-looking unique values in the 32-bit range (-2,147,483,648 (-231) through 2,147,483,647 (231 - 1))

To constrain values to a smaller domain [0..N] where N is for instance a power of 10, the Cycle-Walking Cipher technique can be used on top of the Feistel Network.

A variant of pseudo_encrypt() is necessary, with the following changes implemented:

Reduce the block size to the nearest even power of 2 greater than the range size. For example for the range [0..10,000,000], the nearest is 224 (16,777,216). Shifts and masks will be adjusted in the code for 2 half-blocks of 12 bits each.
Suppress the self-inverse property, so that a value wouldn't cycle back immediately to itself. It's done by inverting the blocks when they're recombined at the end of the loop.
Add an outer loop that applies the cipher until the result belongs to the expected range. This is done below in a separate function for clarity.

Sample output:

=> select i,bounded_pseudo_encrypt(i, 10000000) as rnd from generate_series(0,10) as x(i);

 i  |   rnd   
----+---------
  0 | 7388415
  1 | 4878904
  2 | 3247539
  3 | 7670618
  4 | 6551624
  5 | 1212319
  6 | 6156301
  7 |  893851
  8 |  337577
  9 |    4289
 10 |  316941

Query proving that the output is unique, and that the set of values covers exactly the intended range (might take a few minutes to run)

=> select count(distinct rnd) from
   (select bounded_pseudo_encrypt(i, 10000000) as rnd
       from generate_series(0,10000000) as x(i)
   ) as list
 where rnd between 0 and 10000000;

  count   
----------
 10000001

Reference: https://wiki.postgresql.org/wiki/Pseudo_encrypt_constrained_to_an_arbitrary_range

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment