jmindek/gist:62c50dd766556b7b16d6

## gistfile1.md

      
distinct column -> Return only the unique rows of a result set. Two rows count as duplicates only when every selected column matches.
Think of it as concatenating all the column values of each row in a projection and returning only the strings that are unique.
test_db=# SELECT DISTINCT parent_id, child_id, id FROM test.foo_table ORDER BY parent_id, child_id, id LIMIT 10;
parent_id | child_id | id
-----------+------------+-----------------------------
1000040 | 103 | 1000040|2645405726|0001|103
1000040 | 103 | 1000040|2650805748|0002|103
1000040 | 103 | 1000040|2653406206|0001|103
1000040 | 108 | 1000040|2645405726|0001|108
1000040 | 108 | 1000040|2653406206|0001|108
1000040 | 113 | 1000040|2645405726|0001|113
1000040 | 113 | 1000040|2653406206|0001|113
1000040 | 117 | 1000040|2645405726|0001|117
1000040 | 117 | 1000040|2653406206|0001|117
1000040 | 118 | 1000040|2645405726|0001|118
(10 rows)

Each row is unique.
However, notice that there are multiple rows for each pair of parent_id and child_id.
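To make the point concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table contents are made up for illustration and are not the real test.foo_table data. It shows that DISTINCT compares the whole tuple of selected columns, so rows sharing parent_id and child_id survive as long as id differs.

```python
import sqlite3

# Hypothetical miniature of test.foo_table; the values are invented for this sketch.
rows = [
    (1, 103, "1|a|103"),
    (1, 103, "1|a|103"),  # exact duplicate of the row above
    (1, 103, "1|b|103"),  # same (parent_id, child_id), different id
    (1, 108, "1|a|108"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo_table (parent_id INTEGER, child_id INTEGER, id TEXT)")
conn.executemany("INSERT INTO foo_table VALUES (?, ?, ?)", rows)

# DISTINCT over three columns removes only the exact duplicate row.
three_cols = conn.execute(
    "SELECT DISTINCT parent_id, child_id, id FROM foo_table "
    "ORDER BY parent_id, child_id, id"
).fetchall()

# Dropping id from the select list collapses rows that differ only in id.
two_cols = conn.execute(
    "SELECT DISTINCT parent_id, child_id FROM foo_table "
    "ORDER BY parent_id, child_id"
).fetchall()

print(three_cols)
print(two_cols)
```

The only way plain DISTINCT yields one row per (parent_id, child_id) pair is to leave id out of the select list, which is exactly the limitation that motivates what follows.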
PostgreSQL has a nice extension to distinct that makes it easy to get only one row for each group of rows sharing a value in a particular column.
distinct on (column) -> Do a distinct, but only give me one record from each set of rows with the same value in the named column. The record kept is the first one in each set according to the ORDER BY clause.
test_db=# SELECT DISTINCT ON (parent_id) parent_id, child_id, id FROM test.foo_table ORDER BY parent_id, child_id, id limit 10;
parent_id | child_id | id
-----------+------------+-----------------------------
1000040 | 103 | 1000040|2645405726|0001|103
1000046 | 103 | 1000046|2664405890|0001|103
100008 | 103 | 100008|2601400960|0001|103
1000168 | 103 | 1000168|2461006072|0001|103
1000212 | 103 | 1000212|2405206458|0001|103
1000216 | 103 | 1000216|2642205628|0001|103
1000524 | 103 | 1000524|2459806672|0001|103
1000526 | 103 | 1000526|2458206280|0001|103
1000528 | 103 | 1000528|2422005896|0001|103
1000562 | 103 | 1000562|2808805598|0001|103
(10 rows)

Sadly, Redshift and other popular DBMSs do not have this enhancement.
Convert using rank():
SELECT 
        DISTINCT ON (parent_id) parent_id, 
        child_id, id 
FROM test.foo_table 
ORDER BY parent_id, child_id, id 
LIMIT 10;

to
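SELECT parent_id, child_id, id
FROM
(SELECT parent_id,
        child_id,
        id,
        -- rank 1 = first row per parent_id by (child_id, id); rows tied on both would all get rank 1
        rank() OVER (PARTITION BY parent_id ORDER BY child_id, id) AS parent_id_ranked
 FROM test.foo_table
) AS ranked
WHERE ranked.parent_id_ranked = 1
-- ORDER BY and LIMIT belong on the outer query so they apply after the rank filter
ORDER BY parent_id, child_id, id
LIMIT 10;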
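
The window-function approach can be tried without a Redshift cluster. The sketch below again uses Python's sqlite3 module as a stand-in (this assumes SQLite 3.25 or newer, which added window functions) and a made-up table containing one deliberate exact duplicate. The helper name first_per_parent is hypothetical. LIMIT sits on the outer query here so that it counts rows after the rank filter, and the sketch runs the same query with rank() and with row_number() to show how they differ on ties.

```python
import sqlite3

# Hypothetical sample data; the last two rows are exact duplicates on purpose.
rows = [
    (1, 103, "1|a|103"),
    (1, 108, "1|a|108"),
    (2, 103, "2|a|103"),
    (2, 103, "2|a|103"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo_table (parent_id INTEGER, child_id INTEGER, id TEXT)")
conn.executemany("INSERT INTO foo_table VALUES (?, ?, ?)", rows)

QUERY = """
SELECT parent_id, child_id, id
FROM (
    SELECT parent_id, child_id, id,
           {fn}() OVER (PARTITION BY parent_id ORDER BY child_id, id) AS rn
    FROM foo_table
) AS ranked
WHERE rn = 1
ORDER BY parent_id, child_id, id
LIMIT 10
"""


def first_per_parent(fn):
    # fn is only ever one of the two fixed names used below, never user input.
    return conn.execute(QUERY.format(fn=fn)).fetchall()


with_rank = first_per_parent("rank")              # tied rows all get rank 1
with_row_number = first_per_parent("row_number")  # exactly one row per parent_id

print(with_rank)
print(with_row_number)
```

With this data, rank() returns both copies of the duplicated row for parent_id 2, while row_number() returns a single row per parent_id, which is closer to what DISTINCT ON does. If (child_id, id) is unique within each parent_id, the two behave the same.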