Skip to content

Instantly share code, notes, and snippets.

@jmindek
Last active January 31, 2024 15:48
Show Gist options
  • Star 41 You must be signed in to star a gist
  • Fork 2 You must be signed in to fork a gist
  • Save jmindek/62c50dd766556b7b16d6 to your computer and use it in GitHub Desktop.
Save jmindek/62c50dd766556b7b16d6 to your computer and use it in GitHub Desktop.
DISTINCT ON like functionality for Redshift

distinct column -> For each row returned, return only the unique members of a set. Think of it as for each row in a projection, concatenate all the column values and return only the strings that are unique.

test_db=# SELECT DISTINCT parent_id, child_id, id FROM test.foo_table ORDER BY parent_id, child_id, id LIMIT 10;
parent_id | child_id | id
-----------+------------+-----------------------------
1000040 | 103 | 1000040|2645405726|0001|103
1000040 | 103 | 1000040|2650805748|0002|103
1000040 | 103 | 1000040|2653406206|0001|103
1000040 | 108 | 1000040|2645405726|0001|108
1000040 | 108 | 1000040|2653406206|0001|108
1000040 | 113 | 1000040|2645405726|0001|113
1000040 | 113 | 1000040|2653406206|0001|113
1000040 | 117 | 1000040|2645405726|0001|117
1000040 | 117 | 1000040|2653406206|0001|117
1000040 | 118 | 1000040|2645405726|0001|118
(10 rows)

Each row is unique. However, notice that are multiple rows for pairs of parent_id and child_id.

PostgreSQL has a nice enhancement to distinct that allows us to easily get only one row of a group of values in a particular column.

distinct on (column) -> Do a distinct, but only give me one record in a set of rows with the same field value in the named column.

test_db=# SELECT DISTINCT ON (parent_id) parent_id, child_id, id FROM test.foo_table ORDER BY parent_id, child_id, id limit 10;
parent_id | child_id | id
-----------+------------+-----------------------------
1000040 | 103 | 1000040|2645405726|0001|103
1000046 | 103 | 1000046|2664405890|0001|103
100008 | 103 | 100008|2601400960|0001|103
1000168 | 103 | 1000168|2461006072|0001|103
1000212 | 103 | 1000212|2405206458|0001|103
1000216 | 103 | 1000216|2642205628|0001|103
1000524 | 103 | 1000524|2459806672|0001|103
1000526 | 103 | 1000526|2458206280|0001|103
1000528 | 103 | 1000528|2422005896|0001|103
1000562 | 103 | 1000562|2808805598|0001|103
(10 rows)

Sadly, Redshift and other popular DBMSs do not have this enhancement.

Convert using rank():

SELECT 
        DISTINCT ON (parent_id) parent_id, 
        child_id, id 
FROM test.foo_table 
ORDER BY parent_id, child_id, id 
LIMIT 10;

to

SELECT *
FROM 
(SELECT parent_id, 
        child_id, 
        id, 
        rank() OVER (PARTITION BY parent_id ORDER BY child_id, id) AS parent_id_ranked
 FROM test.foo_table ORDER BY parent_id, child_id, id LIMIT 10
) AS ranked
WHERE ranked.parent_id_ranked = 1;
@heyalexchoi
Copy link

good god thank you

@austinjalexander
Copy link

Also check out ROW_NUMBER Window Function (example).

@filitchp
Copy link

filitchp commented Dec 2, 2016

thank you, this is amazing!

@benoittgt
Copy link

I prefer to use ROW_NUMBER() rather than RANK() because is deterministic. See : http://dba.stackexchange.com/a/165945/100437

If columns in PARTITION BY and ORDER have the same value they will have the same rank (1) and will not be marked as duplicated.

I would prefer :

SELECT *
FROM
  (SELECT parent_id,
          child_id,
          id,
          ROW_NUMBER() OVER (PARTITION BY parent_id
                             ORDER BY child_id, id) AS parent_id_ranked
   FROM test.foo_table
   ORDER BY parent_id,
            child_id,
            id
   LIMIT 10) AS ranked
WHERE ranked.parent_id_ranked = 1;

@wushugene
Copy link

@benoittgt that's the exact scenario I encountered. thanks!

@gregroberts
Copy link

+1 this shit's tight!

@spencersmith
Copy link

👍 Saved my ass last night.

Thanks!

@ededdneddyfan
Copy link

Thank you @benoittgt!

@stephenakearns
Copy link

Thanks Jerry & Benoit!

@christophlingg
Copy link

Thank you @benoittgt!

@emekdahl
Copy link

Thanks, @benoittgt! This worked for me! :)

@shcyiza
Copy link

shcyiza commented Dec 9, 2020

Exacly what i needed! thanks guys, these snippets saved my day

@BalmungSan
Copy link

Thank you very much @benoittgt !

@mpaloni
Copy link

mpaloni commented Apr 13, 2022

Very good! for once got window function working in first try ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment