Skip to content

Instantly share code, notes, and snippets.

@awreece
Last active November 28, 2017 18:28
Show Gist options
  • Save awreece/28c359c6d834717ab299665022b19fd6 to your computer and use it in GitHub Desktop.
Save awreece/28c359c6d834717ab299665022b19fd6 to your computer and use it in GitHub Desktop.
psql> explain analyze select investments.id
from contributions
JOIN investments ON contributions.investment_id = investments.id
JOIN contribution_investment_metrics cim on cim.investment_id = investments.id
WHERE contributions.id = '\x58c9c0d3ee944c48b32f814d';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=27098.32..27106.89 rows=1 width=13) (actual time=1322.324..1322.329 rows=1 loops=1)
Join Filter: (contributions.investment_id = investments.id)
-> Merge Join (cost=27098.24..27102.94 rows=1 width=26) (actual time=1322.296..1322.300 rows=1 loops=1)
Merge Cond: ("*SELECT* 1".investment_id = contributions.investment_id)
-> GroupAggregate (cost=27094.15..27098.15 rows=200 width=149) (actual time=1183.553..1312.280 rows=140770 loops=1)
Group Key: "*SELECT* 1".investment_id
-> Sort (cost=27094.15..27095.85 rows=3400 width=13) (actual time=1183.546..1236.100 rows=423140 loops=1)
Sort Key: "*SELECT* 1".investment_id
Sort Method: external merge Disk: 10712kB
-> Hash Join (cost=26774.06..27054.26 rows=3400 width=13) (actual time=535.677..817.481 rows=476281 loops=1)
Hash Cond: ("*SELECT* 1".metric = metrics.id)
-> HashAggregate (cost=26772.95..26892.95 rows=40000 width=34) (actual time=535.649..700.100 rows=476281 loops=1)
Group Key: "*SELECT* 1".investment_id, "*SELECT* 1".metric
-> Append (cost=10329.18..26290.92 rows=482027 width=26) (actual time=89.445..319.620 rows=482027 loops=1)
-> Subquery Scan on "*SELECT* 1" (cost=10329.18..10349.44 rows=5788 width=26) (actual time=89.444..90.553 rows=5788 loops=1)
-> Sort (cost=10329.18..10332.08 rows=5788 width=42) (actual time=89.443..89.910 rows=5788 loops=1)
Sort Key: contributions_1.investment_id, cm.metric, cm.last_update_on DESC
Sort Method: quicksort Memory: 645kB
-> Hash Join (cost=105.62..10256.84 rows=5788 width=42) (actual time=1.893..85.305 rows=5788 loops=1)
Hash Cond: (contributions_1.id = cm.contribution_id)
-> Seq Scan on contributions contributions_1 (cost=0.00..9694.49 rows=351495 width=26) (actual time=0.002..38.401 rows=351495 loops=1)
-> Hash (cost=85.36..85.36 rows=5788 width=34) (actual time=1.876..1.876 rows=5788 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 453kB
-> Seq Scan on contribution_metrics cm (cost=0.00..85.36 rows=5788 width=34) (actual time=0.003..0.970 rows=5788 loops=1)
-> Subquery Scan on "*SELECT* 2" (cost=0.08..15941.48 rows=476239 width=26) (actual time=0.018..199.051 rows=476239 loops=1)
-> Index Only Scan using investment_metrics_investment_id_metric_last_updated_on_idx1 on investment_metrics im (cost=0.08..14512.76 rows=476239 width=42) (actual time=0.017..157.291 rows=47
Heap Fetches: 476239
-> Hash (cost=1.05..1.05 rows=17 width=13) (actual time=0.014..0.014 rows=17 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on metrics (cost=0.00..1.05 rows=17 width=13) (actual time=0.006..0.008 rows=17 loops=1)
-> Sort (cost=4.09..4.09 rows=1 width=13) (actual time=0.038..0.038 rows=1 loops=1)
Sort Key: contributions.investment_id
Sort Method: quicksort Memory: 25kB
-> Index Scan using contributions_pkey on contributions (cost=0.08..4.09 rows=1 width=13) (actual time=0.021..0.022 rows=1 loops=1)
Index Cond: (id = '\x58c9c0d3ee944c48b32f814d'::bytea)
-> Index Only Scan using investments_pkey on investments (cost=0.08..3.95 rows=1 width=13) (actual time=0.015..0.016 rows=1 loops=1)
Index Cond: (id = "*SELECT* 1".investment_id)
Heap Fetches: 1
Planning time: 0.878 ms
Execution time: 1328.332 ms
select investments.id
from contributions
JOIN investments ON contributions.investment_id = investments.id
JOIN LATERAL (
select *
from contribution_investment_metrics cim
where cim.investment_id = investments.id
) cim ON cim.investment_id = investments.id
WHERE contributions.id = '\x58c9c0d3ee944c48b32f814d';
psql> explain analyze select investment_id
from contribution_investment_metrics
where investment_id = (select investments.id
from investments
join contributions on investments.id = contributions.investment_id
where contributions.id = '\x58c9c0d3ee944c48b32f814d');
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on contribution_investment_metrics (cost=28.77..29.88 rows=4 width=13) (actual time=0.080..0.080 rows=1 loops=1)
InitPlan 1 (returns $1)
-> Nested Loop (cost=0.17..8.18 rows=1 width=13) (actual time=0.016..0.017 rows=1 loops=1)
-> Index Scan using contributions_pkey on contributions contributions_1 (cost=0.08..4.09 rows=1 width=13) (actual time=0.007..0.008 rows=1 loops=1)
Index Cond: (id = '\x58c9c0d3ee944c48b32f814d'::bytea)
-> Index Only Scan using investments_pkey on investments (cost=0.08..4.09 rows=1 width=13) (actual time=0.006..0.006 rows=1 loops=1)
Index Cond: (id = contributions_1.investment_id)
Heap Fetches: 1
-> GroupAggregate (cost=20.59..21.68 rows=4 width=149) (actual time=0.080..0.080 rows=1 loops=1)
Group Key: grouped_metrics.investment_id
-> Hash Join (cost=20.59..21.67 rows=4 width=13) (actual time=0.074..0.078 rows=3 loops=1)
Hash Cond: (metrics.id = grouped_metrics.metric)
-> Seq Scan on metrics (cost=0.00..1.05 rows=17 width=13) (actual time=0.006..0.007 rows=17 loops=1)
-> Hash (cost=20.58..20.58 rows=4 width=26) (actual time=0.060..0.060 rows=3 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Subquery Scan on grouped_metrics (cost=14.76..20.58 rows=4 width=26) (actual time=0.055..0.058 rows=3 loops=1)
-> GroupAggregate (cost=14.76..20.57 rows=4 width=34) (actual time=0.055..0.057 rows=3 loops=1)
Group Key: "*SELECT* 1".investment_id, "*SELECT* 1".metric
-> Merge Append (cost=14.76..20.55 rows=4 width=26) (actual time=0.050..0.052 rows=5 loops=1)
Sort Key: "*SELECT* 1".metric
-> Subquery Scan on "*SELECT* 1" (cost=14.68..14.68 rows=1 width=26) (actual time=0.041..0.042 rows=2 loops=1)
-> Sort (cost=14.68..14.68 rows=1 width=42) (actual time=0.041..0.042 rows=2 loops=1)
Sort Key: cm.metric, cm.last_update_on DESC
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=0.14..14.68 rows=1 width=42) (actual time=0.031..0.032 rows=2 loops=1)
-> Index Scan using contributions_investment_id_idx1 on contributions (cost=0.08..4.77 rows=2 width=26) (actual time=0.025..0.025 rows=1 loops=1)
Index Cond: (investment_id = $1)
-> Index Only Scan using contribution_metrics_contribution_id_metric_last_update_on_idx1 on contribution_metrics cm (cost=0.06..4.95 rows=2 width=34) (actual time=0.004..0.005 r
Index Cond: (contribution_id = contributions.id)
Heap Fetches: 2
-> Subquery Scan on "*SELECT* 2" (cost=0.08..5.86 rows=3 width=26) (actual time=0.009..0.010 rows=3 loops=1)
-> Index Only Scan using investment_metrics_investment_id_metric_last_updated_on_idx1 on investment_metrics im (cost=0.08..5.85 rows=3 width=42) (actual time=0.009..0.010 rows=3 loops=1)
Index Cond: (investment_id = $1)
Heap Fetches: 3
Planning time: 0.782 ms
Execution time: 0.172 ms
CREATE TABLE metrics (
id bytea NOT NULL primary key,
description text,
name text COLLATE pg_catalog."C"
);
CREATE TABLE investments (
name text,
metadata jsonb,
id bytea NOT NULL PRIMARY KEY
);
CREATE TABLE investment_metrics (
last_updated_on timestamp with time zone,
metric bytea NOT NULL,
projected double precision,
investment_id bytea NOT NULL references investments(id) on delete cascade on update cascade,
primary key (investment_id, metric)
);
CREATE INDEX on investment_metrics (investment_id, metric, last_updated_on DESC);
--
-- An allocation of some amount to an investment. Note that investment_id *ought*
-- to be a foreign key, but isn't at the moment. I tried adding the foreign key and
-- the obvious query didn't work.
--
-- In practice, almost every investment has ~1 contribution.
--
CREATE TABLE contributions (
id bytea NOT NULL PRIMARY KEY,
amount double precision,
investment_id bytea,
metadata jsonb
);
CREATE INDEX ON contributions (investment_id);
--
-- Some metrics for a contribution, used to override the metric for the underlying investment.
-- In practice, very few contributions have metrics.
--
CREATE TABLE contribution_metrics (
last_update_on timestamp with time zone,
metric bytea NOT NULL,
projected double precision,
contribution_id bytea NOT NULL references contributions(id) on delete cascade on update cascade,
primary key (contribution_id, metric)
);
CREATE INDEX on contribution_metrics (contribution_id, metric, last_update_on DESC);
-- Create a function that always returns the first non-NULL item
CREATE OR REPLACE FUNCTION first_agg ( anyelement, anyelement )
RETURNS anyelement LANGUAGE SQL IMMUTABLE STRICT AS $$
SELECT $1;
$$;
-- And then wrap an aggregate around it
CREATE AGGREGATE FIRST (
sfunc = first_agg,
basetype = anyelement,
stype = anyelement
);
--
-- Get the exploded metrics for a contribution by choosing the value
-- in contribution_metrics if available and otherwise defaulting to
-- the value in investment_metrics.
--
CREATE VIEW contribution_investment_metrics AS
SELECT grouped_metrics.investment_id,
max(
CASE metrics.name
WHEN 'Yield'::text THEN grouped_metrics.value
ELSE NULL::double precision
END) AS yield
from (
select investment_id, metric, first(value) as value
from (
(
select investment_id, cm.metric, cm.projected as value
from contribution_metrics cm
join contributions on cm.contribution_id = contributions.id
order by investment_id, cm.metric, cm.last_update_on desc
)
UNION ALL
(
select investment_id, metric, projected as value
from investment_metrics im
order by investment_id, metric, last_updated_on desc
)
) raw_metrics
group by investment_id, metric
) grouped_metrics
JOIN metrics ON metrics.id = grouped_metrics.metric
group by investment_id;
psql> select count(*) from contributions;
count
--------
351495
(1 row)
psql> select count(*) from investments;
count
--------
312558
(1 row)
psql> select count(*) from contribution_metrics;
count
-------
5788
(1 row)
psql> select count(*) from investment_metrics;
count
--------
476239
(1 row)
psql> select count(*) from metrics;
count
-------
17
(1 row)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment