Timeseries are problematic in Postgres; here's part 2 of an attempt to make them a little less so.

#Implementing an Array-Based Timeseries Store in Postgres Part 2

See Part 1 here for an explanation of what we're up to. In this bit, I'll be taking this to a larger data store and seeing how the results scale.

##Data Setup and Recap of Part 1

The data is the same as in Part 1, except that we're now using a much larger data set, and we're only using the test2 table that I defined earlier. For a refresher, here's the original schema:

CREATE SCHEMA data;
CREATE TABLE data.generators (
    id          varchar primary key,
    ptid        int NOT NULL UNIQUE,
    name        varchar,
    location    point,
    owner       text,
    address     jsonb,
    fields      jsonb
);
CREATE TABLE data.rt_lbmp_generators(
    generator          varchar references data.generators(id) ON UPDATE CASCADE ON DELETE RESTRICT,
    record_time        timestamptz NOT NULL,
    lbmp               float,
    losses             float,
    congestion         float,
    price_version      int,
    PRIMARY KEY(generator,record_time)
);
CREATE INDEX rt_lbmp_gen_record_time_idx ON data.rt_lbmp_generators (record_time);
CREATE INDEX rt_lbmp_gen_generator_idx ON data.rt_lbmp_generators (generator);

And here's the summary data type and the test2 table we use (note that the type has to be created before the table, since test2's columns depend on it):

CREATE TYPE summary AS (
	nn_count	int,
	minimum		double precision,
	maximum		double precision,
	total		double precision,
	average		double precision);

CREATE TABLE test2(
	generator		varchar REFERENCES data.generators(id),
	record_range	tstzrange,
	lbmp			summary,
	losses			summary,
	congestion		summary,
	rt_lbmp			data.rt_lbmp_generators[]);
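The five fields are chosen because they compose across segments: counts, sums, minimums, and maximums merge directly, and the average can be re-derived from the merged total and count. As a sketch, here's a hypothetical summary_merge helper (not part of the original post) that makes that concrete:

CREATE OR REPLACE FUNCTION summary_merge(a summary, b summary)
RETURNS summary AS $$
  -- Counts and totals add; least/greatest ignore NULL min/max values
  -- in Postgres; the average is recomputed, not averaged.
  SELECT (a.nn_count + b.nn_count,
          least(a.minimum, b.minimum),
          greatest(a.maximum, b.maximum),
          a.total + b.total,
          (a.total + b.total) / nullif(a.nn_count + b.nn_count, 0))::summary;
$$ LANGUAGE sql IMMUTABLE;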
	

To transfer the data from the original table to our new test2 table we'll use a query like this:

WITH t as (SELECT d.* FROM 
  (SELECT id FROM data.generators WHERE id NOT IN (
    SELECT DISTINCT generator FROM test2) ORDER BY id LIMIT 50) g 
      INNER JOIN data.rt_lbmp_generators d ON d.generator=g.id)  
INSERT INTO test2 
SELECT generator, range_merge(to_range(min(record_time),null::tstzrange),
  to_range(max(record_time), null::tstzrange)) as record_range, 
  (count(lbmp) , min(lbmp) , max(lbmp) , sum(lbmp) , avg(lbmp) )::summary as lbmp, 
  (count(losses) , min(losses) , max(losses) , sum(losses) , avg(losses))::summary as losses, 
  (count(congestion) , min(congestion) , max(congestion) , sum(congestion) , avg(congestion))::summary as congestion,
  array_agg(t.*::data.rt_lbmp_generators) as rt_lbmp FROM t 
GROUP BY generator, EXTRACT(year FROM record_time),EXTRACT(month from record_time);

Then just run it multiple times until there's no more data to transfer. Batching the work by 50 generators at a time means that if something errors out partway through, we haven't lost one very long-running query.
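To know when the transfer is done, a check like this (reusing the transfer query's predicate) bottoms out at the number of generators that have no data at all:

SELECT count(*) FROM data.generators
WHERE id NOT IN (SELECT DISTINCT generator FROM test2);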

##Some Nice Aggregation Advantages

So now if I wanted to give you an idea of what the data looked like, I might run a query like this over the original data:

WITH t as (SELECT * FROM data.rt_lbmp_generators)  
  SELECT generator, range_merge(to_range(min(record_time),null::tstzrange),
    to_range(max(record_time), null::tstzrange)) as record_range, 
    count(*) as num_records FROM t GROUP BY generator;

Unfortunately, with the large number of records we have, this takes a long time: about 200 seconds!

HashAggregate  (cost=5586985.16..5587087.66 rows=200 width=40) (actual time=198774.940..198837.819 rows=498 loops=1)
  Group Key: t.generator
  Buffers: shared hit=9400 read=1162389, temp written=864703
  CTE t
    ->  Seq Scan on rt_lbmp_generators  (cost=0.00..2275588.04 rows=110379904 width=51) (actual time=0.931..52191.970 rows=109080312 loops=1)
          Buffers: shared hit=9400 read=1162389
  ->  CTE Scan on t  (cost=0.00..2207598.08 rows=110379904 width=40) (actual time=0.937..133870.390 rows=109080312 loops=1)
        Buffers: shared hit=9400 read=1162389, temp written=864703
Planning time: 0.167 ms
Execution time: 199128.589 ms

If we instead use a slightly modified version to run on our table and take advantage of the fact that we've already aggregated the counts inside our segments:

EXPLAIN (ANALYZE,BUFFERS) WITH t as (SELECT * FROM test2)  SELECT generator, range_merge(to_range(min(lower(record_range)),null::tstzrange),
  to_range(max(upper(record_range)), null::tstzrange)) as record_range, 
  sum((lbmp).nn_count) as num_records FROM t GROUP BY generator ORDER BY generator;
-------
Sort  (cost=1125.14..1125.64 rows=200 width=96) (actual time=102.391..102.480 rows=498 loops=1)
  Sort Key: t.generator
  Sort Method: quicksort  Memory: 69kB
  Buffers: shared hit=434
  CTE t
    ->  Seq Scan on test2  (cost=0.00..563.11 rows=12911 width=238) (actual time=0.073..10.604 rows=12911 loops=1)
          Buffers: shared hit=434
  ->  HashAggregate  (cost=451.89..554.39 rows=200 width=96) (actual time=36.896..100.448 rows=498 loops=1)
        Group Key: t.generator
        Buffers: shared hit=434
        ->  CTE Scan on t  (cost=0.00..258.22 rows=12911 width=96) (actual time=0.080..22.479 rows=12911 loops=1)
              Buffers: shared hit=434
Planning time: 0.374 ms
Execution time: 103.222 ms

There are probably better ways to get the count of the whole table (see the sketch just after the results below), but this definitely makes things easier over the groups we have. The whole point of this exercise is to take advantage of natural grouping in our tables, things we know about that the query planner doesn't, in order to make queries work better, and this is a nice little example of that.

generator | record_range | num_records
--- | --- | ---
59TH STREET_GT_1 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684
74TH STREET_GT_1 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684
74TH STREET_GT_2 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684
ADK HUDSON___FALLS | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684
ADK_NYS___DAM | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684
ADK RESOURCE___RCVRY | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684
ADK S GLENS___FALLS | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684
AIR_PRODUCTS___DRP | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684
ALBANY___1 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684
ALBANY___2 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684
... | ... | ...
TRIGEN___CC | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684
UNION___PROCESSING_DRP | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684
UPPER HUDSON___HYD | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684
UPPER RAQUET___HYD | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684
VISCHER___FERRY HYD | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684
WADING RIVER_IC_1 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684
WADING RIVER_IC_2 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684
WADING RIVER_IC_3 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684
WALDEN___HYDRO | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684
WARRENSBURG____ | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684
WATERSIDE___6 8 9 | ["2014-04-16 04:05:00+00","2016-05-12 04:00:00+00"] | 219684

With 498 generators in total, this is basically the same as before, except that instead of only the first few generators we now have all of them that had data available in the time range we're looking at. Also note that Part 1 ran on my local machine, whereas this is running on a DigitalOcean instance (4GB RAM, 2 CPUs, 60GB SSD), so don't compare the actual numbers between Part 1 and Part 2. And we've grown from ~2 million rows to ~110 million rows in the table.
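As an aside, the pre-aggregated counts also make the whole-table record count nearly free; a minimal sketch, using the same composite-field access as the query above:

SELECT sum((lbmp).nn_count) AS num_records FROM test2;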

##Did it Compress?

In Part 1 we got very large compression, much of it coming from index size, which I think will shrink as a fraction of total table size as the table grows, so we'd expect somewhat less compression here:

relation | total_size
--- | ---
data.rt_lbmp_generators | 25 GB
public.test2 | 1552 MB
data.generators | 144 kB

relation | size
--- | ---
data.rt_lbmp_generators | 9155 MB
data.rt_lbmp_generators_pkey | 8689 MB
data.rt_lbmp_gen_generator_idx | 4406 MB
data.rt_lbmp_gen_record_time_idx | 3171 MB
pg_toast.pg_toast_81927 | 1530 MB
pg_toast.pg_toast_81927_index | 17 MB
public.test2 | 3472 kB
public.test2_record_range_idx | 840 kB
public.test2_generator_idx | 456 kB

So only a 16x decrease in size this time rather than two orders of magnitude, but still not bad. I'm not sure how much this ratio changes with the number of columns in the table, but that would be an interesting topic to explore. Without indexes, we get a consistent 5-6x decrease in the size of the raw data from the compression that Postgres' TOAST (The Oversized Attribute Storage Technique) does natively. (If you'd like to read more about TOAST and Postgres storage in general, I recommend this and this as good intros.)
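For reference, here's a minimal sketch of the kind of catalog queries behind the size listings above; the original post doesn't show the exact queries used, so this is just one way to get similar numbers using standard pg_catalog functions:

-- Total size (heap + indexes + TOAST) per table
SELECT c.oid::regclass AS relation,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r' AND n.nspname IN ('data', 'public')
ORDER BY pg_total_relation_size(c.oid) DESC;

-- Where test2's bytes actually live: heap proper vs TOAST
SELECT pg_size_pretty(pg_relation_size('test2')) AS heap_size,
       pg_size_pretty(pg_total_relation_size('test2')
                      - pg_relation_size('test2')
                      - pg_indexes_size('test2')) AS toast_size;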

My guess is that the 5-6x size decrease is the long-term limit for the types of tables we're working with here.

UPDATE: Here are the results from Jim Nasby's new, better size-estimation query.

oid | table_schema | table_name | row_estimate | total_bytes | index_bytes | toast_bytes | table_bytes | total | index | toast | table
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
49487 | data | rt_lbmp_generators | 1.02142e+08 | 26657562624 | 17055834112 | 8192 | 9601720320 | 25 GB | 16 GB | 8192 bytes | 9157 MB
81927 | public | test2 | 12911 | 1627070464 | 1327104 | 1622163456 | 3579904 | 1552 MB | 1296 kB | 1547 MB | 3496 kB

##But is it fast?

We'll use the same basic set of queries here as we did before.

 EXPLAIN (ANALYZE,BUFFERS) SELECT record_time,lbmp,losses FROM data.rt_lbmp_generators 
  WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2014-07-21]'::tstzrange @> record_time;
------
  Index Scan using rt_lbmp_gen_generator_idx on rt_lbmp_generators  (cost=0.57..37564.55 rows=1152 width=24) (actual time=0.059..81.303 rows=21829 loops=1)
  Index Cond: ((generator)::text = '74TH STREET_GT_1'::text)
  Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange @> record_time)
  Rows Removed by Filter: 197855
  Buffers: shared hit=7157
Planning time: 0.169 ms
Execution time: 85.080 ms

Compared to:

 EXPLAIN (ANALYZE,BUFFERS) SELECT  (t.rt).record_time, (t.rt).lbmp, (t.rt).losses FROM (
  SELECT unnest(rt_lbmp)::data.rt_lbmp_generators as rt FROM test2 WHERE generator='74TH STREET_GT_1' 
    AND '[2014-05-07,2014-07-21]'::tstzrange && record_range) as t WHERE '[2014-05-07,2014-07-21]'::tstzrange @> (t.rt).record_time;
------
Subquery Scan on t  (cost=0.29..8.25 rows=2 width=32) (actual time=1.384..34.257 rows=21829 loops=1)
  Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
  Rows Removed by Filter: 4929
  Buffers: shared hit=60
  ->  Index Scan using test2_generator_idx on test2  (cost=0.29..4.50 rows=300 width=18) (actual time=1.377..12.938 rows=26758 loops=1)
        Index Cond: ((generator)::text = '74TH STREET_GT_1'::text)
        Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange && record_range)
        Rows Removed by Filter: 23
        Buffers: shared hit=60
Planning time: 0.212 ms
Execution time: 38.413 ms

So for the basic SELECT query we're still doing significantly better with our array-based store than otherwise. The real speedup happens with aggregates, though:

EXPLAIN (ANALYZE,BUFFERS) SELECT sum(lbmp) as total_lbmp, sum(losses) as total_losses FROM data.rt_lbmp_generators 
  WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2015-07-21]'::tstzrange @> record_time;
-----
Aggregate  (cost=37570.31..37570.32 rows=1 width=16) (actual time=150.348..150.349 rows=1 loops=1)
  Buffers: shared hit=7157
  ->  Index Scan using rt_lbmp_gen_generator_idx on rt_lbmp_generators  (cost=0.57..37564.55 rows=1152 width=16) (actual time=0.046..108.721 rows=127615 loops=1)
        Index Cond: ((generator)::text = '74TH STREET_GT_1'::text)
        Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_time)
        Rows Removed by Filter: 92069
        Buffers: shared hit=7157
Planning time: 0.193 ms
Execution time: 150.407 ms

Compared to the array-based version, which works in two branches: segments whose range is fully contained in the time window contribute their pre-computed summary totals directly, while segments that merely overlap the window's edges get unnested and filtered row by row:

EXPLAIN (ANALYZE,BUFFERS) SELECT sum(b.lbmp) total_lbmp, sum(b.losses) total_losses FROM (
  SELECT sum(lbmp) lbmp, sum(losses) losses FROM (
    SELECT (lbmp).total as lbmp, (losses).total as losses FROM test2 
    WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2015-07-21]'::tstzrange @> record_range) as t1
  UNION ALL
  SELECT sum( (t.rt).lbmp) lbmp, sum((t.rt).losses) losses FROM (
    SELECT unnest(rt_lbmp)::data.rt_lbmp_generators as rt FROM test2 
    WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2015-07-21]'::tstzrange && record_range AND
      record_range NOT IN (
        SELECT record_range FROM test2 WHERE '[2014-05-07,2015-07-21]'::tstzrange @> record_range)) as t
  WHERE '[2014-05-07,2015-07-21]'::tstzrange @> (t.rt).record_time) as b;
-----
Aggregate  (cost=631.95..631.96 rows=1 width=16) (actual time=29.571..29.572 rows=1 loops=1)
  Buffers: shared hit=482
  ->  Append  (cost=3.07..631.94 rows=2 width=16) (actual time=0.068..29.564 rows=2 loops=1)
        Buffers: shared hit=482
        ->  Aggregate  (cost=3.07..3.08 rows=1 width=122) (actual time=0.068..0.068 rows=1 loops=1)
              Buffers: shared hit=4
              ->  Index Scan using test2_generator_idx on test2  (cost=0.29..3.01 rows=13 width=122) (actual time=0.042..0.055 rows=13 loops=1)
                    Index Cond: ((generator)::text = '74TH STREET_GT_1'::text)
                    Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_range)
                    Rows Removed by Filter: 13
                    Buffers: shared hit=4
        ->  Aggregate  (cost=628.83..628.84 rows=1 width=32) (actual time=29.495..29.495 rows=1 loops=1)
              Buffers: shared hit=478
              ->  Subquery Scan on t  (cost=612.04..628.81 rows=4 width=32) (actual time=10.914..24.432 rows=13122 loops=1)
                    Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
                    Rows Removed by Filter: 4939
                    Buffers: shared hit=478
                    ->  Index Scan using test2_generator_idx on test2 test2_1  (cost=612.04..618.81 rows=800 width=18) (actual time=10.904..16.656 rows=18061 loops=1)
                          Index Cond: ((generator)::text = '74TH STREET_GT_1'::text)
                          Filter: (('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange && record_range) AND (NOT (hashed SubPlan 1)))
                          Rows Removed by Filter: 24
                          Buffers: shared hit=478
                          SubPlan 1
                            ->  Seq Scan on test2 test2_2  (cost=0.00..595.39 rows=6548 width=22) (actual time=0.006..5.727 rows=6461 loops=1)
                                  Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_range)
                                  Rows Removed by Filter: 6450
                                  Buffers: shared hit=434
Planning time: 0.431 ms
Execution time: 29.678 ms

##But what if we're not running the same query all day?

So here's the really interesting bit: what happens when we aren't selecting/aggregating the same thing over and over again? I'd noticed that the first run of a query tended to take longer than subsequent calls of the same query, so I chose a few different generators and re-ran the same queries to check that out. All we've done here is change the name of the generator.

EXPLAIN (ANALYZE,BUFFERS) SELECT sum(lbmp) as total_lbmp, sum(losses) as total_losses FROM data.rt_lbmp_generators 
  WHERE generator='BLUE_CIRC_CHEM_DRP' AND '[2014-05-07,2015-07-21]'::tstzrange @> record_time;
-----
--First Run
Aggregate  (cost=37570.31..37570.32 rows=1 width=16) (actual time=1730.634..1730.635 rows=1 loops=1)
  Buffers: shared hit=3395 read=3800
  ->  Index Scan using rt_lbmp_gen_generator_idx on rt_lbmp_generators  (cost=0.57..37564.55 rows=1152 width=16) (actual time=3.245..1683.733 rows=127615 loops=1)
        Index Cond: ((generator)::text = 'BLUE_CIRC_CHEM_DRP'::text)
        Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_time)
        Rows Removed by Filter: 92069
        Buffers: shared hit=3395 read=3800
Planning time: 0.150 ms
Execution time: 1730.808 ms
--Second Run
Aggregate  (cost=37570.31..37570.32 rows=1 width=16) (actual time=158.710..158.711 rows=1 loops=1)
  Buffers: shared hit=7195
  ->  Index Scan using rt_lbmp_gen_generator_idx on rt_lbmp_generators  (cost=0.57..37564.55 rows=1152 width=16) (actual time=0.100..117.741 rows=127615 loops=1)
        Index Cond: ((generator)::text = 'BLUE_CIRC_CHEM_DRP'::text)
        Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_time)
        Rows Removed by Filter: 92069
        Buffers: shared hit=7195
Planning time: 0.378 ms
Execution time: 158.791 ms

So with the standard aggregation query, when we're no longer working in shared buffers, there's an order of magnitude speedup from the first to the second run of the same query, likely mostly due to disk read times on the first pass. So let's see how our array-based system fared:

EXPLAIN (ANALYZE,BUFFERS) SELECT sum(b.lbmp) total_lbmp, sum(b.losses) total_losses FROM (
  SELECT sum(lbmp) lbmp, sum(losses) losses FROM (
    SELECT (lbmp).total as lbmp, (losses).total as losses FROM test2 
    WHERE generator='BLUE_CIRC_CHEM_DRP' AND '[2014-05-07,2015-07-21]'::tstzrange @> record_range) as t1
  UNION ALL
  SELECT sum( (t.rt).lbmp) lbmp, sum((t.rt).losses) losses FROM (
    SELECT unnest(rt_lbmp)::data.rt_lbmp_generators as rt FROM test2 
    WHERE generator='BLUE_CIRC_CHEM_DRP' AND '[2014-05-07,2015-07-21]'::tstzrange && record_range AND
      record_range NOT IN (
        SELECT record_range FROM test2 WHERE '[2014-05-07,2015-07-21]'::tstzrange @> record_range)) as t
  WHERE '[2014-05-07,2015-07-21]'::tstzrange @> (t.rt).record_time) as b;
-----
--First Run
Aggregate  (cost=631.95..631.96 rows=1 width=16) (actual time=57.715..57.715 rows=1 loops=1)
  Buffers: shared hit=443 read=34
  ->  Append  (cost=3.07..631.94 rows=2 width=16) (actual time=1.418..57.708 rows=2 loops=1)
        Buffers: shared hit=443 read=34
        ->  Aggregate  (cost=3.07..3.08 rows=1 width=122) (actual time=1.416..1.416 rows=1 loops=1)
              Buffers: shared hit=2 read=1
              ->  Index Scan using test2_generator_idx on test2  (cost=0.29..3.01 rows=13 width=122) (actual time=1.354..1.388 rows=13 loops=1)
                    Index Cond: ((generator)::text = 'BLUE_CIRC_CHEM_DRP'::text)
                    Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_range)
                    Rows Removed by Filter: 13
                    Buffers: shared hit=2 read=1
        ->  Aggregate  (cost=628.83..628.84 rows=1 width=32) (actual time=56.288..56.289 rows=1 loops=1)
              Buffers: shared hit=441 read=33
              ->  Subquery Scan on t  (cost=612.04..628.81 rows=4 width=32) (actual time=31.516..50.543 rows=13122 loops=1)
                    Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
                    Rows Removed by Filter: 4939
                    Buffers: shared hit=441 read=33
                    ->  Index Scan using test2_generator_idx on test2 test2_1  (cost=612.04..618.81 rows=800 width=18) (actual time=31.494..41.488 rows=18061 loops=1)
                          Index Cond: ((generator)::text = 'BLUE_CIRC_CHEM_DRP'::text)
                          Filter: (('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange && record_range) AND (NOT (hashed SubPlan 1)))
                          Rows Removed by Filter: 24
                          Buffers: shared hit=441 read=33
                          SubPlan 1
                            ->  Seq Scan on test2 test2_2  (cost=0.00..595.39 rows=6548 width=22) (actual time=0.035..17.811 rows=6461 loops=1)
                                  Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_range)
                                  Rows Removed by Filter: 6450
                                  Buffers: shared hit=434
Planning time: 0.919 ms
Execution time: 58.051 ms
--Second Run
Aggregate  (cost=631.95..631.96 rows=1 width=16) (actual time=29.849..29.849 rows=1 loops=1)
  Buffers: shared hit=477
  ->  Append  (cost=3.07..631.94 rows=2 width=16) (actual time=0.110..29.842 rows=2 loops=1)
        Buffers: shared hit=477
        ->  Aggregate  (cost=3.07..3.08 rows=1 width=122) (actual time=0.108..0.108 rows=1 loops=1)
              Buffers: shared hit=3
              ->  Index Scan using test2_generator_idx on test2  (cost=0.29..3.01 rows=13 width=122) (actual time=0.064..0.086 rows=13 loops=1)
                    Index Cond: ((generator)::text = 'BLUE_CIRC_CHEM_DRP'::text)
                    Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_range)
                    Rows Removed by Filter: 13
                    Buffers: shared hit=3
        ->  Aggregate  (cost=628.83..628.84 rows=1 width=32) (actual time=29.729..29.729 rows=1 loops=1)
              Buffers: shared hit=474
              ->  Subquery Scan on t  (cost=612.04..628.81 rows=4 width=32) (actual time=11.059..24.372 rows=13122 loops=1)
                    Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
                    Rows Removed by Filter: 4939
                    Buffers: shared hit=474
                    ->  Index Scan using test2_generator_idx on test2 test2_1  (cost=612.04..618.81 rows=800 width=18) (actual time=11.051..16.621 rows=18061 loops=1)
                          Index Cond: ((generator)::text = 'BLUE_CIRC_CHEM_DRP'::text)
                          Filter: (('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange && record_range) AND (NOT (hashed SubPlan 1)))
                          Rows Removed by Filter: 24
                          Buffers: shared hit=474
                          SubPlan 1
                            ->  Seq Scan on test2 test2_2  (cost=0.00..595.39 rows=6548 width=22) (actual time=0.111..5.851 rows=6461 loops=1)
                                  Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_range)
                                  Rows Removed by Filter: 6450
                                  Buffers: shared hit=434
Planning time: 0.696 ms
Execution time: 30.022 ms

Well, that's nice: roughly a 30x speedup over the normal table when the query doesn't fully hit shared buffers (1731ms vs. 58ms on the first runs). Buffers matter far less here because the table is much, much smaller: the non-TOAST-ed part that the first filter runs against is tiny and likely stays in shared buffers, and while the per-row filters still have to run, they run against the compressed TOAST-ed data, which (I think) is stored sequentially on disk and so is relatively cheap to read anyway. I'll also note that I ran both of these queries with a few other generators substituted in, sometimes starting on the normal table and sometimes on the array-based one, and the results were consistent with those above; there wasn't anything weird going on where running the query on the first table pulled something into buffers for the second because of shared foreign keys or the like. We'd definitely expect something like this in the aggregation case. But what about the more normal SELECT case? I wouldn't have expected the speedup there to be quite as large.

EXPLAIN (ANALYZE,BUFFERS) SELECT record_time,lbmp,losses FROM data.rt_lbmp_generators 
  WHERE generator='CARTHAGE___PAPER' AND '[2014-05-07,2014-07-21]'::tstzrange @> record_time;
-----
--First Run
Index Scan using rt_lbmp_gen_generator_idx on rt_lbmp_generators  (cost=0.57..37564.55 rows=1152 width=24) (actual time=2.246..1760.533 rows=21829 loops=1)
  Index Cond: ((generator)::text = 'CARTHAGE___PAPER'::text)
  Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange @> record_time)
  Rows Removed by Filter: 197855
  Buffers: shared hit=3300 read=3807
Planning time: 0.209 ms
Execution time: 1765.724 ms
--Second Run
Index Scan using rt_lbmp_gen_generator_idx on rt_lbmp_generators  (cost=0.57..37564.55 rows=1152 width=24) (actual time=0.091..78.138 rows=21829 loops=1)
  Index Cond: ((generator)::text = 'CARTHAGE___PAPER'::text)
  Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange @> record_time)
  Rows Removed by Filter: 197855
  Buffers: shared hit=7107
Planning time: 0.217 ms
Execution time: 81.957 ms

Compared to:

EXPLAIN (ANALYZE,BUFFERS) SELECT  (t.rt).record_time, (t.rt).lbmp, (t.rt).losses FROM (
  SELECT unnest(rt_lbmp)::data.rt_lbmp_generators as rt FROM test2 WHERE generator='CARTHAGE___PAPER' 
    AND '[2014-05-07,2014-07-21]'::tstzrange && record_range) as t WHERE '[2014-05-07,2014-07-21]'::tstzrange @> (t.rt).record_time;
-----
--First Run
Subquery Scan on t  (cost=0.29..8.25 rows=2 width=32) (actual time=6.281..41.960 rows=21829 loops=1)
  Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
  Rows Removed by Filter: 4929
  Buffers: shared hit=12 read=49
  ->  Index Scan using test2_generator_idx on test2  (cost=0.29..4.50 rows=300 width=18) (actual time=6.267..19.989 rows=26758 loops=1)
        Index Cond: ((generator)::text = 'CARTHAGE___PAPER'::text)
        Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange && record_range)
        Rows Removed by Filter: 23
        Buffers: shared hit=12 read=49
Planning time: 0.205 ms
Execution time: 46.844 ms
--Second Run
Subquery Scan on t  (cost=0.29..8.25 rows=2 width=32) (actual time=1.361..29.782 rows=21829 loops=1)
  Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
  Rows Removed by Filter: 4929
  Buffers: shared hit=61
  ->  Index Scan using test2_generator_idx on test2  (cost=0.29..4.50 rows=300 width=18) (actual time=1.341..11.730 rows=26758 loops=1)
        Index Cond: ((generator)::text = 'CARTHAGE___PAPER'::text)
        Filter: ('["2014-05-07 00:00:00+00","2014-07-21 00:00:00+00"]'::tstzrange && record_range)
        Rows Removed by Filter: 23
        Buffers: shared hit=61
Planning time: 0.249 ms
Execution time: 33.508 ms

But we got an even bigger speedup again: nearly 40x (1766ms vs. 47ms). So yeah. That ain't bad at all.

I'd love thoughts/comments sparked by this: Are these comparisons fair? What use cases do you think would break this horribly, or what sorts of queries would perform badly against this setup? And what are the most important steps toward automating this kind of storage and querying?


djk447 commented Jun 6, 2016

Here's a better version of the aggregate query we've been working on. The NOT IN clause had been bugging me for a bit: for some reason there isn't a range/array operator equivalent to !@>, i.e. "not contained by," which makes it hard to find the ranges that overlap the range of interest but aren't fully contained by it. However, I finally realized that I could use the union (+) operator and just check whether the result is the same as the range of interest, i.e. record_range + '[2014-05-07,2015-07-21]'::tstzrange <> '[2014-05-07,2015-07-21]'::tstzrange. It eliminates some rather annoying query planner behavior as well and speeds up the query a bit.
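To see why this works, here's a toy example (the ranges are made up purely for illustration): a segment range that pokes past the window makes the union larger than the window, so the inequality holds.

SELECT tstzrange('2015-07-01','2015-08-01') + tstzrange('2014-05-07','2015-07-21')
       <> tstzrange('2014-05-07','2015-07-21') AS overlaps_but_not_contained;
-- true: the union extends past the range of interest

Here's the full rewritten query: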

EXPLAIN (ANALYZE,BUFFERS) SELECT sum(b.lbmp) total_lbmp, sum(b.losses) total_losses FROM (
  SELECT sum(lbmp) lbmp, sum(losses) losses FROM (
    SELECT (lbmp).total as lbmp, (losses).total as losses FROM test2 
    WHERE generator='GOWANUS_GT1_8' AND '[2014-05-07,2015-07-21]'::tstzrange @> record_range) as t1
  UNION ALL
  SELECT sum( (t.rt).lbmp) lbmp, sum((t.rt).losses) losses FROM (
    SELECT unnest(rt_lbmp)::data.rt_lbmp_generators as rt FROM test2 
    WHERE generator='GOWANUS_GT1_8' AND '[2014-05-07,2015-07-21]'::tstzrange && record_range AND
      record_range + '[2014-05-07,2015-07-21]'::tstzrange <> '[2014-05-07,2015-07-21]'::tstzrange) as t
  WHERE '[2014-05-07,2015-07-21]'::tstzrange @> (t.rt).record_time) as b;

-----
--First Run
Aggregate  (cost=32.51..32.52 rows=1 width=16) (actual time=31.057..31.057 rows=1 loops=1)
  Buffers: shared hit=10 read=38
  ->  Append  (cost=3.07..32.50 rows=2 width=16) (actual time=1.258..31.049 rows=2 loops=1)
        Buffers: shared hit=10 read=38
        ->  Aggregate  (cost=3.07..3.08 rows=1 width=122) (actual time=1.258..1.258 rows=1 loops=1)
              Buffers: shared hit=3 read=1
              ->  Index Scan using test2_generator_idx on test2  (cost=0.29..3.01 rows=13 width=122) (actual time=1.207..1.239 rows=13 loops=1)
                    Index Cond: ((generator)::text = 'GOWANUS_GT1_8'::text)
                    Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_range)
                    Rows Removed by Filter: 13
                    Buffers: shared hit=3 read=1
        ->  Aggregate  (cost=29.39..29.40 rows=1 width=32) (actual time=29.788..29.788 rows=1 loops=1)
              Buffers: shared hit=7 read=37
              ->  Subquery Scan on t  (cost=0.29..29.35 rows=8 width=32) (actual time=6.382..24.554 rows=13122 loops=1)
                    Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
                    Rows Removed by Filter: 4939
                    Buffers: shared hit=7 read=37
                    ->  Index Scan using test2_generator_idx on test2 test2_1  (cost=0.29..10.60 rows=1500 width=18) (actual time=6.364..16.414 rows=18061 loops=1)
                          Index Cond: ((generator)::text = 'GOWANUS_GT1_8'::text)
                          Filter: (('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange && record_range) AND ((record_range + '["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange) <> '["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange))
                          Rows Removed by Filter: 24
                          Buffers: shared hit=7 read=37
Planning time: 0.509 ms
Execution time: 31.189 ms

--Second Run
Aggregate  (cost=32.51..32.52 rows=1 width=16) (actual time=22.401..22.401 rows=1 loops=1)
  Buffers: shared hit=48
  ->  Append  (cost=3.07..32.50 rows=2 width=16) (actual time=0.101..22.358 rows=2 loops=1)
        Buffers: shared hit=48
        ->  Aggregate  (cost=3.07..3.08 rows=1 width=122) (actual time=0.100..0.100 rows=1 loops=1)
              Buffers: shared hit=4
              ->  Index Scan using test2_generator_idx on test2  (cost=0.29..3.01 rows=13 width=122) (actual time=0.073..0.085 rows=13 loops=1)
                    Index Cond: ((generator)::text = 'GOWANUS_GT1_8'::text)
                    Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_range)
                    Rows Removed by Filter: 13
                    Buffers: shared hit=4
        ->  Aggregate  (cost=29.39..29.40 rows=1 width=32) (actual time=22.254..22.254 rows=1 loops=1)
              Buffers: shared hit=44
              ->  Subquery Scan on t  (cost=0.29..29.35 rows=8 width=32) (actual time=1.557..16.532 rows=13122 loops=1)
                    Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
                    Rows Removed by Filter: 4939
                    Buffers: shared hit=44
                    ->  Index Scan using test2_generator_idx on test2 test2_1  (cost=0.29..10.60 rows=1500 width=18) (actual time=1.549..8.031 rows=18061 loops=1)
                          Index Cond: ((generator)::text = 'GOWANUS_GT1_8'::text)
                          Filter: (('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange && record_range) AND ((record_range + '["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange) <> '["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange))
                          Rows Removed by Filter: 24
                          Buffers: shared hit=44
Planning time: 0.480 ms
Execution time: 22.556 ms


djk447 commented Jun 10, 2016

Jim thought the CTEs and aggregates in the middle of the tests weren't fair because they forced materialization, and in many cases that's not an option. So I ran some comparisons on aggregates without pre-aggregation, just to check it out, rewriting the aggregation query with a full unnest on the filter condition. It still beats the original by an order of magnitude the first time through, and performs comparably on the second run once things are in shared buffers. That makes me think this isn't so much an index win as a disk-read win: the basic thing we're doing here is clustering data on disk, making I/O sequential rather than random for much of the task, and simply because the data is compressed, there's less I/O to do. The issue you'll run into is that some ad hoc queries will need a sequential scan, but often that's the case anyway.
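A quick way to sanity-check the sequential-vs-random I/O theory (assuming statistics are up to date): pg_stats has a correlation column, where values near 1 mean a column's values are laid out nearly sequentially on disk, and values near 0 mean reads ordered by that column will be scattered.

SELECT attname, correlation
FROM pg_stats
WHERE schemaname = 'data' AND tablename = 'rt_lbmp_generators';

Here's the rewritten full-unnest query: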

EXPLAIN (ANALYZE,BUFFERS) SELECT sum(b.lbmp) total_lbmp, sum(b.losses) total_losses FROM (
  SELECT sum( (t.rt).lbmp) lbmp, sum((t.rt).losses) losses FROM (
    SELECT unnest(rt_lbmp)::data.rt_lbmp_generators as rt FROM test2 
    WHERE generator='BAYONEEC___CTG4' AND '[2014-05-07,2015-07-21]'::tstzrange && record_range ) as t
  WHERE '[2014-05-07,2015-07-21]'::tstzrange @> (t.rt).record_time) as b;

--First Run
Aggregate  (cost=29.28..29.29 rows=1 width=16) (actual time=258.193..258.194 rows=1 loops=1)
  Buffers: shared hit=56 read=243
  ->  Aggregate  (cost=29.26..29.27 rows=1 width=32) (actual time=258.187..258.188 rows=1 loops=1)
        Buffers: shared hit=56 read=243
        ->  Subquery Scan on t  (cost=0.29..29.22 rows=8 width=32) (actual time=8.904..204.881 rows=127615 loops=1)
              Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
              Rows Removed by Filter: 4939
              Buffers: shared hit=56 read=243
              ->  Index Scan using test2_generator_idx on test2  (cost=0.29..10.47 rows=1500 width=18) (actual time=8.884..137.419 rows=132554 loops=1)
                    Index Cond: ((generator)::text = 'BAYONEEC___CTG4'::text)
                    Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange && record_range)
                    Rows Removed by Filter: 11
                    Buffers: shared hit=56 read=243
Planning time: 0.245 ms
Execution time: 258.433 ms
--Second Run
Aggregate  (cost=29.28..29.29 rows=1 width=16) (actual time=164.062..164.062 rows=1 loops=1)
  Buffers: shared hit=299
  ->  Aggregate  (cost=29.26..29.27 rows=1 width=32) (actual time=164.054..164.054 rows=1 loops=1)
        Buffers: shared hit=299
        ->  Subquery Scan on t  (cost=0.29..29.22 rows=8 width=32) (actual time=1.334..115.306 rows=127615 loops=1)
              Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> (t.rt).record_time)
              Rows Removed by Filter: 4939
              Buffers: shared hit=299
              ->  Index Scan using test2_generator_idx on test2  (cost=0.29..10.47 rows=1500 width=18) (actual time=1.328..51.885 rows=132554 loops=1)
                    Index Cond: ((generator)::text = 'BAYONEEC___CTG4'::text)
                    Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange && record_range)
                    Rows Removed by Filter: 11
                    Buffers: shared hit=299
Planning time: 0.180 ms
Execution time: 164.128 ms

Compared to the original:

EXPLAIN (ANALYZE,BUFFERS) SELECT sum(lbmp) as total_lbmp, sum(losses) as total_losses FROM data.rt_lbmp_generators 
  WHERE generator='BAYONEEC___CTG4' AND '[2014-05-07,2015-07-21]'::tstzrange @> record_time;
-----
--First Run
Aggregate  (cost=37173.89..37173.90 rows=1 width=16) (actual time=2556.618..2556.618 rows=1 loops=1)
  Buffers: shared hit=2657 read=3321
  ->  Index Scan using rt_lbmp_gen_generator_idx on rt_lbmp_generators  (cost=0.57..37168.20 rows=1138 width=16) (actual time=2.467..2510.484 rows=127615 loops=1)
        Index Cond: ((generator)::text = 'BAYONEEC___CTG4'::text)
        Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_time)
        Rows Removed by Filter: 92069
        Buffers: shared hit=2657 read=3321
Planning time: 0.506 ms
Execution time: 2556.777 ms
--Second Run
Aggregate  (cost=37173.89..37173.90 rows=1 width=16) (actual time=150.463..150.464 rows=1 loops=1)
  Buffers: shared hit=5978
  ->  Index Scan using rt_lbmp_gen_generator_idx on rt_lbmp_generators  (cost=0.57..37168.20 rows=1138 width=16) (actual time=0.111..108.846 rows=127615 loops=1)
        Index Cond: ((generator)::text = 'BAYONEEC___CTG4'::text)
        Filter: ('["2014-05-07 00:00:00+00","2015-07-21 00:00:00+00"]'::tstzrange @> record_time)
        Rows Removed by Filter: 92069
        Buffers: shared hit=5978
Planning time: 0.377 ms
Execution time: 150.546 ms
