Skip to content

Instantly share code, notes, and snippets.

@nelsonsequiera
Last active December 31, 2020 06:04
Show Gist options
  • Save nelsonsequiera/d92d2cd1f1a6611fb5850916ca97d26f to your computer and use it in GitHub Desktop.
Save nelsonsequiera/d92d2cd1f1a6611fb5850916ca97d26f to your computer and use it in GitHub Desktop.
Redshift best distribution key test
* get current sort and dist keys
* Materialize a single column to check distribution
* Identify the table OID
* get skewed data stats
SELECT "column",
TYPE,
distkey,
sortkey
FROM pg_table_def
WHERE schemaname = 'public'
AND tablename = '<tablename>'
AND (
distkey = TRUE
OR sortkey <> 0
);
CREATE TEMP TABLE t1 DISTKEY (<column>) AS
SELECT <column> FROM <tablename>;
SELECT 't1'::regclass::oid;
-- gives an oid back
SELECT "table" tablename, skew_rows,
ROUND(CAST(max_blocks_per_slice AS FLOAT) /
GREATEST(NVL(min_blocks_per_slice,0)::int,1)::FLOAT,5) storage_skew,
ROUND(CAST(100*dist_slice AS FLOAT) /
(SELECT COUNT(DISTINCT slice) FROM stv_slices),2) pct_populated
FROM svv_table_info ti
JOIN (SELECT tbl, MIN(c) min_blocks_per_slice,
MAX(c) max_blocks_per_slice,
COUNT(DISTINCT slice) dist_slice
FROM (SELECT b.tbl, b.slice, COUNT(*) AS c
FROM STV_BLOCKLIST b
GROUP BY b.tbl, b.slice)
WHERE tbl = <oid> GROUP BY tbl) iq ON iq.tbl = ti.table_id;
/*
sample output of bad dist key:
tablename | skew_rows | storage_skew | pct_populated
-----------+-----------+--------------+---------------
t1 | 42.81 | 21.97688 | 66.67
sample output of good dist key:
tablename | skew_rows | storage_skew | pct_populated
-----------+-----------+--------------+---------------
t2 | 1.00 | 1.00063 | 66.67
*/
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment