Skip to content

Instantly share code, notes, and snippets.

@raymondberg
Last active October 23, 2015 03:19
Show Gist options
  • Save raymondberg/7bd3ab308a3ad13df667 to your computer and use it in GitHub Desktop.
Save raymondberg/7bd3ab308a3ad13df667 to your computer and use it in GitHub Desktop.
Quick Setup of PSEUDO-RANDOM Record sorting with Grouping

Problem

Group rows with a certain record value together but randomize the rest.

SQLITE Setup Commands

.open newdb.sqlite
.load ./digest.so
create table never_use_access(unique_data varchar(64), sortable_data varchar(64));
insert into never_use_access values ("this is jake", "I'm a dude");
insert into never_use_access values ("this is pat", null); 
insert into never_use_access values ("this is pam", "I'm a chick");
insert into never_use_access values ("this is steph", "I'm a chick"); 
insert into never_use_access values ("this is barry", "I'm a great singer");
insert into never_use_access values ("this is larry", "I'm a dude"); 
insert into never_use_access values ("this is john", "I'm a poet");
insert into never_use_access values ("this is suzy", "I'm a rockstar"); 
insert into never_use_access values ("this is gwen", "I'm a chick");
insert into never_use_access values ("this is alan", "I'm a dude"); 
insert into never_use_access values ("this is frank", null);
insert into never_use_access values ("this is don", null); 
insert into never_use_access values ("this is ben", "I'm just happy to be here");

Query 1 - Nulls treated as common value

select unique_data
, sortable_data
, hex(sha1(sortable_data)) as hash 
from never_use_access 
order by hex(sha1(sortable_data));

Produces:

this is barry|I'm a great singer|556140E2E6796882AAD0C3DDB9CDF8FD099E980B
this is ben|I'm just happy to be here|6D98E5DE225E02B244AFA136574E9249191A9814
this is john|I'm a poet|C5519C5A8DD34216A5E3C8D88468DAC36AF0440A
this is jake|I'm a dude|C930A4CB6E03004DCE04AF6EF798D17E93F8A81E
this is larry|I'm a dude|C930A4CB6E03004DCE04AF6EF798D17E93F8A81E
this is alan|I'm a dude|C930A4CB6E03004DCE04AF6EF798D17E93F8A81E
this is pam|I'm a chick|CA1E6FEFB6ACC10ECD454F27B8B95446ACCC4CDF
this is steph|I'm a chick|CA1E6FEFB6ACC10ECD454F27B8B95446ACCC4CDF
this is gwen|I'm a chick|CA1E6FEFB6ACC10ECD454F27B8B95446ACCC4CDF
this is pat||DA39A3EE5E6B4B0D3255BFEF95601890AFD80709
this is frank||DA39A3EE5E6B4B0D3255BFEF95601890AFD80709
this is don||DA39A3EE5E6B4B0D3255BFEF95601890AFD80709

Query 2 - Nulls treated as unique values

select unique_data
, sortable_data
, case when sortable_data is null then hex(randomblob(24)) else hex(sha1(sortable_data)) end as hash 
from never_use_access 
order by case when sortable_data is null then hex(randomblob(24)) else hex(sha1(sortable_data)) end;

Produces:

this is suzy|I'm a rockstar|1F4B0E5FBA9DB576F1117AD45540B135AB608E15
this is barry|I'm a great singer|556140E2E6796882AAD0C3DDB9CDF8FD099E980B
this is ben|I'm just happy to be here|6D98E5DE225E02B244AFA136574E9249191A9814
this is pat||10FFBE690DC8D9D3864AD81F1A6E1C1CA88A850EAC97F7F1
this is john|I'm a poet|C5519C5A8DD34216A5E3C8D88468DAC36AF0440A
this is frank||F86ACECCB01E0A9D84C3146A93072F515E329B02B57E2955
this is jake|I'm a dude|C930A4CB6E03004DCE04AF6EF798D17E93F8A81E
this is larry|I'm a dude|C930A4CB6E03004DCE04AF6EF798D17E93F8A81E
this is alan|I'm a dude|C930A4CB6E03004DCE04AF6EF798D17E93F8A81E
this is pam|I'm a chick|CA1E6FEFB6ACC10ECD454F27B8B95446ACCC4CDF
this is steph|I'm a chick|CA1E6FEFB6ACC10ECD454F27B8B95446ACCC4CDF
this is gwen|I'm a chick|CA1E6FEFB6ACC10ECD454F27B8B95446ACCC4CDF
this is don||C45C44ABB2EDE1F57617818579F8F4425B12A5600EC075D3

This is randomly generated so the nulls will move around while everything else stays the same. Here's a second run of the same query on the same data:

this is don||4B51E444A48B2253D8606210FBEF190E239F850D791288C2
this is suzy|I'm a rockstar|1F4B0E5FBA9DB576F1117AD45540B135AB608E15
this is barry|I'm a great singer|556140E2E6796882AAD0C3DDB9CDF8FD099E980B
this is ben|I'm just happy to be here|6D98E5DE225E02B244AFA136574E9249191A9814
this is pat||EDC913B5EF965DC2AC7B454651FB855F37E46A206646B00C
this is frank||7943CF42CE6CF9B34C6D21D9C1D8C9BC1E1D6173BC0D1A79
this is john|I'm a poet|C5519C5A8DD34216A5E3C8D88468DAC36AF0440A
this is jake|I'm a dude|C930A4CB6E03004DCE04AF6EF798D17E93F8A81E
this is larry|I'm a dude|C930A4CB6E03004DCE04AF6EF798D17E93F8A81E
this is alan|I'm a dude|C930A4CB6E03004DCE04AF6EF798D17E93F8A81E
this is pam|I'm a chick|CA1E6FEFB6ACC10ECD454F27B8B95446ACCC4CDF
this is steph|I'm a chick|CA1E6FEFB6ACC10ECD454F27B8B95446ACCC4CDF
this is gwen|I'm a chick|CA1E6FEFB6ACC10ECD454F27B8B95446ACCC4CDF

Query 3 - Quiet and with a random seed

select unique_data
, sortable_data
from never_use_access 
order by case when sortable_data is null then hex(randomblob(24)) else hex(sha1(sortable_data || "this is a crappy shuffle")) end;

Produces:

this is pam|I'm a chick
this is steph|I'm a chick
this is gwen|I'm a chick
this is don|
this is frank|
this is ben|I'm just happy to be here
this is john|I'm a poet
this is pat|
this is suzy|I'm a rockstar
this is barry|I'm a great singer
this is jake|I'm a dude
this is larry|I'm a dude
this is alan|I'm a dude

Tools

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment