Skip to content

Instantly share code, notes, and snippets.

@slpsys
Last active September 15, 2015 21:28
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save slpsys/5e43d8237fd8aa924015 to your computer and use it in GitHub Desktop.
Save slpsys/5e43d8237fd8aa924015 to your computer and use it in GitHub Desktop.
warehouse=# CREATE VIEW events.events AS SELECT * FROM events.a UNION SELECT * FROM events.b;
CREATE
warehouse=# CREATE VIEW events.events AS SELECT * FROM events.a UNION ALL SELECT * FROM events.b;
CREATE
warehouse=# explain select count(*) from events.events where user_id = 1234567;
QUERY PLAN
---------------------------------------------------------------------------------------------------------
XN Aggregate (cost=97603805.53..97603805.53 rows=1 width=0)
-> XN Subquery Scan events (cost=0.00..97593174.46 rows=4252427 width=0)
-> XN Unique (cost=0.00..97550650.19 rows=4252427 width=3373)
-> XN Append (cost=0.00..96934048.27 rows=4252427 width=3373)
-> XN Subquery Scan "*SELECT* 1" (cost=0.00..86274211.60 rows=5480 width=1320)
-> XN Seq Scan on "a" (cost=0.00..86274156.80 rows=5480 width=1320)
Filter: (user_id = 1234567)
-> XN Subquery Scan "*SELECT* 2" (cost=0.00..10659836.67 rows=4246947 width=3373)
-> XN Seq Scan on b (cost=0.00..10617367.20 rows=4246947 width=3373)
Filter: (user_id = 1234567)
(10 rows)
warehouse=# explain select count(*) from events.events2 where user_id = 1234567;
QUERY PLAN
------------------------------------------------------------------------------------------------
XN Aggregate (cost=96987203.61..96987203.61 rows=1 width=0)
-> XN Subquery Scan events2 (cost=0.00..96976572.54 rows=4252427 width=0)
-> XN Append (cost=0.00..96934048.27 rows=4252427 width=0)
-> XN Subquery Scan "*SELECT* 1" (cost=0.00..86274211.60 rows=5480 width=0)
-> XN Seq Scan on "a" (cost=0.00..86274156.80 rows=5480 width=0)
Filter: (user_id = 1234567)
-> XN Subquery Scan "*SELECT* 2" (cost=0.00..10659836.67 rows=4246947 width=0)
-> XN Seq Scan on b (cost=0.00..10617367.20 rows=4246947 width=0)
Filter: (user_id = 1234567)
(9 rows)
warehouse=# select count(*) from events.a where user_id = 1234567;
count
-------
3885
(1 row)
Time: 476.454 ms
warehouse=# select count(*) from events.events2 where user_id = 1234567;
count
-------
3885
(1 row)
Time: 504.726 ms
warehouse=# select count(*) from events.events where user_id = 1234567;
count
-------
3885
(1 row)
Time: 70111.247 ms
warehouse=# explain select name from events.events2 where user_id = 1234567;
QUERY PLAN
-------------------------------------------------------------------------------------------
XN Subquery Scan events2 (cost=0.00..96976572.54 rows=4252427 width=32)
-> XN Append (cost=0.00..96934048.27 rows=4252427 width=19)
-> XN Subquery Scan "*SELECT* 1" (cost=0.00..86274211.60 rows=5480 width=16)
-> XN Seq Scan on "a" (cost=0.00..86274156.80 rows=5480 width=16)
Filter: (user_id = 1234567)
-> XN Subquery Scan "*SELECT* 2" (cost=0.00..10659836.67 rows=4246947 width=19)
-> XN Seq Scan on b (cost=0.00..10617367.20 rows=4246947 width=19)
Filter: (user_id = 1234567)
(8 rows)
Time: 20.512 ms
warehouse=# explain select name, page from events.events2 where user_id = 1234567;
QUERY PLAN
-------------------------------------------------------------------------------------------
XN Subquery Scan events2 (cost=0.00..96976572.54 rows=4252427 width=64)
-> XN Append (cost=0.00..96934048.27 rows=4252427 width=37)
-> XN Subquery Scan "*SELECT* 1" (cost=0.00..86274211.60 rows=5480 width=31)
-> XN Seq Scan on "a" (cost=0.00..86274156.80 rows=5480 width=31)
Filter: (user_id = 1234567)
-> XN Subquery Scan "*SELECT* 2" (cost=0.00..10659836.67 rows=4246947 width=37)
-> XN Seq Scan on b (cost=0.00..10617367.20 rows=4246947 width=37)
Filter: (user_id = 1234567)
@slpsys
Copy link
Author

slpsys commented Sep 15, 2015

The two tables, a and b contain about 8B and 1B rows, respectively; b contains no rows where user_id is not null.

@slpsys
Copy link
Author

slpsys commented Sep 15, 2015

3_runtimes.sql shows that the difference between SELECT COUNT(*) over events2 (with the UNION ALL) is only ~5% more expensive than querying the constituent table a by itself--e.g. it likely is not doing a 60-way join (there are 60 columns in the raw tables) before simply filtering on one column and counting the results.
select count(*) from events.events, however, takes 147x as long to run. Again, there are ~9 billion rows, so that the DISTINCT becomes very expensive for a large N is unsurprising.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment