Skip to content

Instantly share code, notes, and snippets.

@nelsonsequiera
Last active December 23, 2020 06:12
Show Gist options
  • Save nelsonsequiera/8525fad20984cfe2ca2863c30b81c8ba to your computer and use it in GitHub Desktop.
Save nelsonsequiera/8525fad20984cfe2ca2863c30b81c8ba to your computer and use it in GitHub Desktop.
redshift best suggested recommended column encoding by data types and by encoding types. column encodings: raw AZ64 Byte-dictionary Delta LZO Mostly Runlength Text255 ZSTD Data types:BOOLEAN DOUBLE SMALLINT INTEGER BIGINT DECIMAL REAL DOUBLE CHAR VARCHAR DATE TIMESTAMP TIMESTAMPTZ TIME
style use case
key same value same slice, for joins
all dimension small table for frequent joins, data is copied to all nodes first slice
even unknown cases
  • Set DISTKEY to the column most used in a JOIN
  • Set SORTKEY to the column(s) most used in a WHERE
column type encoding
BOOLEAN raw
DOUBLE PRECISION raw
SMALLINT AZ64, ZSTD, delta, mostly
INTEGER AZ64, ZSTD, delta, mostly
BIGINT AZ64, ZSTD, delta, mostly
DECIMAL AZ64, ZSTD, delta, mostly
REAL raw
DOUBLE PRECISION raw
CHAR LZO, ZSTD, Text255 and Text32k
VARCHAR LZO, ZSTD, Text255 and Text32k
DATE AZ64
TIMESTAMP AZ64
TIMESTAMPTZ AZ64
TIME AZ64
encoding column type
raw sort keys, BOOLEAN, REAL, DOUBLE PRECISION
AZ64 numeric, date, and time
Byte-dictionary fewer than 256 unique values
Delta numeric sequence
LZO CHAR, VARCHAR, free form text
Mostly data type for a column is larger than most of the stored values
Runlength value that is repeated consecutively
Text255 and Text32k VARCHAR columns in which the same words recur often
ZSTD ALL, CHAR, VARCHAR
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment