Skip to content

Instantly share code, notes, and snippets.

@slpsys
Last active November 26, 2017 17:39
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save slpsys/46bc7816a388eda9b5bca2fe00cb27d4 to your computer and use it in GitHub Desktop.
Save slpsys/46bc7816a388eda9b5bca2fe00cb27d4 to your computer and use it in GitHub Desktop.
AWS Redshift - Filtering on a sort key

Execution plans

EXTRACT (YEAR FROM date_field) for a given year:

warehouse=# explain select count(*) from views.gamesave where EXTRACT (YEAR FROM event_created_at) = 2016;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 XN Aggregate  (cost=8152608.99..8152608.99 rows=1 width=0)
   ->  XN Seq Scan on gamesave  (cost=0.00..8145820.80 rows=2715274 width=0)
         Filter: ("date_part"('year'::text, event_created_at) = 2016)
(3 rows)

DATE_TRUNC('year', date_field) for a given year:

warehouse=# explain select count(*) from views.gamesave where DATE_TRUNC('year',event_created_at) = '2017-01-01';
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 XN Aggregate  (cost=8152608.99..8152608.99 rows=1 width=0)
   ->  XN Seq Scan on gamesave  (cost=0.00..8145820.80 rows=2715274 width=0)
         Filter: (date_trunc('year'::text, event_created_at) = '2017-01-01 00:00:00'::timestamp without time zone)
(3 rows)

Directly comparing the timestamp to two fencepost milliseconds that bound the year using between:

warehouse=# explain select count(*) from views.gamesave where event_created_at between '2017-01-01' and '2018-01-01';
                                                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 XN Aggregate  (cost=0.67..0.67 rows=1 width=0)
   ->  XN Seq Scan on gamesave  (cost=0.00..0.57 rows=39 width=0)
         Filter: ((event_created_at >= '2017-01-01 00:00:00'::timestamp without time zone) AND (event_created_at <= '2018-01-01 00:00:00'::timestamp without time zone))
(3 rows)

Results

We've seen that WHERE timestamp BETWEEN YYYY-MM-DD AND YY+1-MM-DD is consistently 10x faster, if it is a sort key (this table is several billion rows, query returns ~1B):

warehouse=# select count(*) from views.gamesave where EXTRACT (YEAR FROM event_created_at) = 2016;
Time: 3683.316 ms

warehouse=# select count(*) from views.gamesave where DATE_TRUNC('year', event_created_at) = '2016-01-01';
Time: 3001.326 ms

warehouse=# select count(*) from views.gamesave where event_created_at between '2016-01-01' and '2017-01-01';
Time: 252.388 ms
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment