#Implementing an Array-Based Timeseries Store in Postgres

UPDATE: This is Part 1, Part 2 is here, enjoy!

In response to Jim Nasby's question over at ElephantStack, I thought I'd play around with the concept and see where it took me. I'm hoping to spark ideas from others on how to store data that has both a column and a row element more effectively with Postgres.

##Data Setup

The data I'm using for this example is publicly available data from the NYISO historical data set located here: http://www.nyiso.com/public/markets_operations/market_data/custom_report/index.jsp?report=rt_lbmp_gen

It's the real-time, location-based marginal price of electricity at multiple generators in New York State. The tables I've used to store data from the NYISO site are below (I probably ought to be using text instead of varchar):

CREATE SCHEMA data;
CREATE TABLE data.generators (
    id          varchar primary key,
    ptid        int NOT NULL UNIQUE,
    name        varchar,
    location    point,
    owner       text,
    address     jsonb,
    fields      jsonb
);
CREATE TABLE data.rt_lbmp_generators(
    generator          varchar references data.generators(id) ON UPDATE CASCADE ON DELETE RESTRICT,
    record_time        timestamptz NOT NULL,
    lbmp               float,
    losses             float,
    congestion         float,
    price_version      int,
    PRIMARY KEY(generator,record_time)
);
CREATE INDEX rt_lbmp_gen_record_time_idx ON data.rt_lbmp_generators (record_time);
CREATE INDEX rt_lbmp_gen_generator_idx ON data.rt_lbmp_generators (generator);

I have some of the data from NYISO downloaded and inserted into the rt_lbmp_generators table. It's not a whole lot to start with, only ~2 million rows. Here's a summary of the data:

| generator | record_range | num_records |
|---|---|---|
| CRESCENT___HYD | ["2014-04-16 00:05:00-04","2016-05-12 00:00:00-04"] | 219684 |
| ARTHUR_KILL_2 | ["2014-04-16 00:05:00-04","2016-05-12 00:00:00-04"] | 219684 |
| AST_ENERGY_2_CC4 | ["2014-04-16 00:05:00-04","2016-05-12 00:00:00-04"] | 219684 |
| CH_MISC_IPPS | ["2014-04-16 00:05:00-04","2016-05-12 00:00:00-04"] | 219684 |
| BARRETT_IC_8 | ["2014-04-16 00:05:00-04","2016-05-12 00:00:00-04"] | 219684 |
| 74TH STREET_GT_1 | ["2014-04-16 00:05:00-04","2016-05-12 00:00:00-04"] | 219684 |
| BEAVER RIVER___HYD | ["2014-04-16 00:05:00-04","2016-05-12 00:00:00-04"] | 219684 |
| ASTORIA_GT2_1 | ["2014-04-16 00:05:00-04","2016-05-12 00:00:00-04"] | 219684 |
| BOWLINE___1 | ["2014-04-16 00:05:00-04","2016-05-12 00:00:00-04"] | 219684 |
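
For reference, that summary came from a query along these lines (just a sketch; the formatting of record_range in the real output may differ slightly):

SELECT generator,
       tstzrange(min(record_time), max(record_time), '[]') AS record_range,
       count(*) AS num_records
FROM data.rt_lbmp_generators
GROUP BY generator
ORDER BY generator;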

##Queries

We're going to look at two pretty basic types of queries on this data: the first is a simple selection of some of the data over a few months within the range covered by the data set; the second is an aggregation over a longer time period. We'll use these to explore a few possible array-aggregation strategies. Here are the basic queries:

SELECT record_time,lbmp,losses FROM data.rt_lbmp_generators WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2014-07-21]'::tstzrange @> record_time; 

and

SELECT sum(lbmp) as total_lbmp, sum(losses) as total_losses FROM data.rt_lbmp_generators 
WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2015-07-21]'::tstzrange @> record_time; 

NOTE: '[2014-05-07,2014-07-21]'::tstzrange @> record_time is functionally the same as record_time BETWEEN '2014-05-07'::timestamptz AND '2014-07-21'::timestamptz. But I used the range-based version since the two seem to run at basically the same speed and the analogy to the range-based queries used on the array-aggregated tables is clearer.

 EXPLAIN ANALYZE SELECT record_time,lbmp,losses FROM data.rt_lbmp_generators WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2014-07-21]'::tstzrange @> record_time ;
 -----
 Bitmap Heap Scan on rt_lbmp_generators  (cost=5308.35..29649.64 rows=1088 width=24) (actual time=19.566..50.069 rows=21829 loops=1)
  Recheck Cond: ((generator)::text = '74TH STREET_GT_1'::text)
  Filter: ('["2014-05-07 00:00:00-04","2014-07-21 00:00:00-04"]'::tstzrange @> record_time)
  Rows Removed by Filter: 197855
  Heap Blocks: exact=2498
  ->  Bitmap Index Scan on rt_lbmp_gen_generator_idx  (cost=0.00..5308.07 rows=217553 width=0) (actual time=19.265..19.265 rows=219684 loops=1)
        Index Cond: ((generator)::text = '74TH STREET_GT_1'::text)
Planning time: 0.091 ms
Execution time: 51.078 ms
EXPLAIN ANALYZE SELECT sum(lbmp) as total_lbmp, sum(losses) as total_losses FROM data.rt_lbmp_generators WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2015-07-21]'::tstzrange @> record_time;
-----
Aggregate  (cost=29655.08..29655.09 rows=1 width=16) (actual time=75.041..75.041 rows=1 loops=1)
  ->  Bitmap Heap Scan on rt_lbmp_generators  (cost=5308.35..29649.64 rows=1088 width=16) (actual time=18.704..56.603 rows=127618 loops=1)
        Recheck Cond: ((generator)::text = '74TH STREET_GT_1'::text)
        Filter: ('["2014-05-07 00:00:00-04","2015-07-21 00:00:00-04"]'::tstzrange @> record_time)
        Rows Removed by Filter: 92066
        Heap Blocks: exact=2498
        ->  Bitmap Index Scan on rt_lbmp_gen_generator_idx  (cost=0.00..5308.07 rows=217553 width=0) (actual time=18.346..18.346 rows=219684 loops=1)
              Index Cond: ((generator)::text = '74TH STREET_GT_1'::text)
Planning time: 0.092 ms
Execution time: 75.079 ms

##Implementing Array-Aggregation

First we'll create a table called test to try out a few ideas about how we might do this. It will contain the generator, a record_range (the range of timestamps in record_time covered by the array aggregate we create), and rt_lbmp, an array of rows of the same type as our initial table.

CREATE TABLE test(
	generator 		varchar REFERENCES data.generators(id) ,
	record_range	tstzrange,
	rt_lbmp			data.rt_lbmp_generators[]);
CREATE INDEX ON test USING GIST (record_range);
CREATE INDEX ON test (generator);

NOTE: For the next bit, I'll be using the very helpful range_type_functions extension from the folks over at Moat. If you don't want to install the full extension, on PG 9.5 and above you can copy the few functions you need from the repo and create them directly (you may need to drop them if you later decide to install the full extension). I'm just using the to_range functions for now:

create function to_range(low anyelement, high anyelement, bounds text, range anyrange) returns anyrange
language plpgsql immutable as $$
declare
    l_range text;
begin
    execute format('select %s($1,$2,$3)',pg_typeof(range)) using low, high, bounds into l_range;
    return l_range;
end
$$;
comment on function to_range(low anyelement, high anyelement, bounds text, range anyrange)
is E'Given a lower bound, upper bound, bounds description, return a range of the given range type.';
create function to_range(elem anyelement, range anyrange) returns anyrange
language sql immutable set search_path from current as $$
select to_range(elem,elem,'[]',range);
$$;
comment on function to_range(elem anyelement, range anyrange)
is E'Convert an element e into the range [e].';
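
As a quick illustration of what the to_range functions return (the exact output depends on your server's timezone setting):

SELECT to_range('2014-05-07'::timestamptz, '2014-07-21'::timestamptz, '[]', null::tstzrange);
-- => ["2014-05-07 00:00:00-04","2014-07-21 00:00:00-04"]
SELECT to_range('2014-05-07'::timestamptz, null::tstzrange);
-- => ["2014-05-07 00:00:00-04","2014-05-07 00:00:00-04"] (a single-element range)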

Now we'll need to get our data over to the test table from our original table; a relatively simple query will do. Given the shape of our data, we'll split it into months, but none of our queries will rely on that fact.

WITH t as (SELECT * FROM data.rt_lbmp_generators) INSERT INTO test 
  SELECT generator, 
    range_merge(to_range(min(record_time),null::tstzrange),to_range(max(record_time), null::tstzrange)) as record_range, 
    array_agg(t.*::data.rt_lbmp_generators) as rt_lbmp 
  FROM t 
  GROUP BY generator, EXTRACT(year FROM record_time), EXTRACT(month FROM record_time);

We're extracting the correct ranges here and aggregating the records into arrays of full records. This is a bit simpler than splitting each column out into its own array; the column-by-column approach might turn out to be more efficient, but I thought this would be simpler for now.
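
For reference, the column-by-column alternative mentioned above might look something like the table below (purely a sketch; the name and layout are hypothetical and it isn't used anywhere in this post):

CREATE TABLE test_columnar(
    generator       varchar REFERENCES data.generators(id),
    record_range    tstzrange,
    record_times    timestamptz[],
    lbmps           float[],
    losses          float[],
    congestions     float[]);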

##Did it Compress?

So let's see if this accomplished our compression!

NOTE: We'll use a few different queries from this nice guide. Only the total size query, which returns the size of each table with all indexes and related TOAST tables included, is shown below, though a few others are used as well:

  SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
  ORDER BY pg_total_relation_size(C.oid) DESC
  LIMIT 10;
| relation | total_size |
|---|---|
| data.rt_lbmp_generators | 1937 MB |
| public.test | 28 MB |

Seems we got close to two orders of magnitude! Not bad.

Just as a note, a lot of that is the indexes on the rt_lbmp_generators table, especially the primary key (which I probably ought to get rid of at some point):

| relation | size |
|----------------------------------|---------|
| data.rt_lbmp_generators_pkey | 1161 MB |
| data.rt_lbmp_gen_record_time_idx | 546 MB |
| data.rt_lbmp_generators | 165 MB |
| data.rt_lbmp_gen_generator_idx | 65 MB |
| pg_toast.pg_toast_382304 | 28 MB |
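
That breakdown came from a per-relation size query along these lines (a sketch of the kind of query involved; pg_relation_size reports just each relation's main data, without indexes or TOAST):

  SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_relation_size(C.oid)) AS "size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  ORDER BY pg_relation_size(C.oid) DESC
  LIMIT 10;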

The pg_toast table is the table used to actually store much of the data for our test table, so even without the indexes there's close to an order of magnitude decrease in on-disk size. We could mimic the primary key behavior using another GIST index and an exclusion constraint, but I won't get into that here; it should be significantly smaller and, as the table grows, would help speed up the queries below.
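
For reference, that exclusion constraint might look something like this (just a sketch, not run here; it needs the btree_gist extension so that plain equality on generator can be used in a GIST index, and it prevents two rows for the same generator from having overlapping record_ranges, roughly mimicking the original primary key):

CREATE EXTENSION IF NOT EXISTS btree_gist;
ALTER TABLE test
  ADD CONSTRAINT test_generator_record_range_excl
  EXCLUDE USING GIST (generator WITH =, record_range WITH &&);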

##But is it Fast?

Okay, the data is compressing, but can we still query it reasonably?

EXPLAIN ANALYZE SELECT (t.rt).record_time, (t.rt).lbmp, (t.rt).losses FROM (
  SELECT unnest(rt_lbmp)::data.rt_lbmp_generators as rt FROM test WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2014-07-21]'::tstzrange && record_range) as t WHERE '[2014-05-07,2014-07-21]'::tstzrange @> (t.rt).record_time;
-----
Subquery Scan on t  (cost=0.00..11.75 rows=2 width=32) (actual time=1.558..14.688 rows=21829 loops=1)
  Filter: ('["2014-05-07 00:00:00-04","2014-07-21 00:00:00-04"]'::tstzrange @> (t.rt).record_time)
  Rows Removed by Filter: 4929
  ->  Seq Scan on test  (cost=0.00..8.00 rows=300 width=18) (actual time=1.551..6.674 rows=26758 loops=1)
        Filter: (('["2014-05-07 00:00:00-04","2014-07-21 00:00:00-04"]'::tstzrange && record_range) AND ((generator)::text = '74TH STREET_GT_1'::text))
        Rows Removed by Filter: 231
Planning time: 0.132 ms
Execution time: 15.829 ms

This is a simple query that selects the rows from test where record_range overlaps the range of interest, unnests the result set, and then filters on record_time to remove any individual points that fall outside the range. It's significantly faster than the normal storage case, even with indexes on the normal storage and even though we're doing a sequential scan over the result set. It works well because the scan only touches a much smaller set of rows: the initial range filter dramatically reduces how much data has to be unnested and re-filtered.

EXPLAIN ANALYZE SELECT (t1.rt).record_time, (t1.rt).lbmp, (t1.rt).losses FROM 
(SELECT unnest(rt_lbmp)::data.rt_lbmp_generators as rt FROM test WHERE generator='74TH STREET_GT_1' AND 
'[2014-05-07,2014-07-21]'::tstzrange @> record_range) as t1
UNION ALL
SELECT (t.rt).record_time, (t.rt).lbmp, (t.rt).losses FROM 
(SELECT unnest(rt_lbmp)::data.rt_lbmp_generators as rt FROM test WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2014-07-21]'::tstzrange && record_range AND record_range NOT IN (SELECT record_range FROM test WHERE '[2014-05-07,2014-07-21]'::tstzrange @> record_range)) as t WHERE '[2014-05-07,2014-07-21]'::tstzrange @> (t.rt).record_time;
-----
Append  (cost=0.00..24.56 rows=101 width=24) (actual time=1.231..15.278 rows=21829 loops=1)
  ->  Subquery Scan on t1  (cost=0.00..8.01 rows=100 width=24) (actual time=1.230..5.037 rows=8729 loops=1)
        ->  Seq Scan on test  (cost=0.00..7.01 rows=100 width=18) (actual time=1.227..2.440 rows=8729 loops=1)
              Filter: (('["2014-05-07 00:00:00-04","2014-07-21 00:00:00-04"]'::tstzrange @> record_range) AND ((generator)::text = '74TH STREET_GT_1'::text))
              Rows Removed by Filter: 233
  ->  Subquery Scan on t  (cost=5.95..16.54 rows=1 width=32) (actual time=1.129..8.045 rows=13100 loops=1)
        Filter: ('["2014-05-07 00:00:00-04","2014-07-21 00:00:00-04"]'::tstzrange @> (t.rt).record_time)
        Rows Removed by Filter: 4929
        ->  Seq Scan on test test_1  (cost=5.95..14.04 rows=200 width=18) (actual time=1.123..3.686 rows=18029 loops=1)
              Filter: (('["2014-05-07 00:00:00-04","2014-07-21 00:00:00-04"]'::tstzrange && record_range) AND (NOT (hashed SubPlan 1)) AND ((generator)::text = '74TH STREET_GT_1'::text))
              Rows Removed by Filter: 232
              SubPlan 1
                ->  Seq Scan on test test_2  (cost=0.00..5.93 rows=11 width=22) (actual time=0.004..0.047 rows=9 loops=1)
                      Filter: ('["2014-05-07 00:00:00-04","2014-07-21 00:00:00-04"]'::tstzrange @> record_range)
                      Rows Removed by Filter: 225
Planning time: 0.303 ms
Execution time: 16.445 ms

This query returns the same results via a somewhat more complicated path: I was trying to avoid re-filtering the rows unnested from arrays whose ranges are completely contained in the query range (the @> operator; see the docs if you want more detail on range operators). It doesn't seem to speed things up much, and it makes for a more confusing query plan, but this type of query will likely be more helpful when we get to our aggregate functions.

##But is it Fast for Aggregates?

We'll start off with a naive implementation of aggregation and then move to more complex cases. The naive implementation just unnests the rows after a first filter on the range column and then aggregates the rows that pass a second, per-row filter. It's very similar to the simple select query from before.

EXPLAIN ANALYZE SELECT sum((t.rt).lbmp) lbmp, sum((t.rt).losses) losses FROM (
  SELECT unnest(rt_lbmp)::data.rt_lbmp_generators as rt FROM test 
  WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2015-07-21]'::tstzrange && record_range) as t 
WHERE '[2014-05-07,2015-07-21]'::tstzrange @> (t.rt).record_time;
-----
Aggregate  (cost=32.76..32.77 rows=1 width=32) (actual time=71.965..71.965 rows=1 loops=1)
  ->  Subquery Scan on t  (cost=0.00..32.72 rows=8 width=32) (actual time=1.328..48.932 rows=127618 loops=1)
        Filter: ('["2014-05-07 00:00:00-04","2015-07-21 00:00:00-04"]'::tstzrange @> (t.rt).record_time)
        Rows Removed by Filter: 4936
        ->  Seq Scan on test  (cost=0.00..13.97 rows=1500 width=18) (actual time=1.325..25.062 rows=132554 loops=1)
              Filter: (('["2014-05-07 00:00:00-04","2015-07-21 00:00:00-04"]'::tstzrange && record_range) AND ((generator)::text = '74TH STREET_GT_1'::text))
              Rows Removed by Filter: 219
Planning time: 0.138 ms
Execution time: 72.000 ms

The naive implementation takes about as long as a query on a normal table with pretty significant indexes. (Partial indexes would probably work better, and there are a lot of things I could do to make the normal approach faster; it's a bit of a straw man, I know. Still, the things we're doing here to improve performance are not completely crazy as first-order things to try, so I'm not going to be too mad about it.) But let's see how a slightly less naive implementation might look:

EXPLAIN ANALYZE SELECT sum(b.lbmp) total_lbmp, sum(b.losses) total_losses FROM (
  SELECT sum((t1.rt).lbmp) lbmp, sum((t1.rt).losses) losses FROM (
    SELECT unnest(rt_lbmp)::data.rt_lbmp_generators as rt FROM test 
    WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2015-07-21]'::tstzrange @> record_range) as t1
  UNION ALL
  SELECT sum((t.rt).lbmp) lbmp, sum((t.rt).losses) losses FROM (
    SELECT unnest(rt_lbmp)::data.rt_lbmp_generators as rt FROM test WHERE generator='74TH STREET_GT_1' 
    AND '[2014-05-07,2015-07-21]'::tstzrange && record_range 
    AND record_range NOT IN (
      SELECT record_range FROM test WHERE '[2014-05-07,2015-07-21]'::tstzrange @> record_range)) as t
  WHERE '[2014-05-07,2015-07-21]'::tstzrange @> (t.rt).record_time) as b; 
-----
Aggregate  (cost=44.67..44.68 rows=1 width=16) (actual time=42.909..42.909 rows=1 loops=1)
  ->  Append  (cost=32.48..44.66 rows=2 width=16) (actual time=42.887..42.906 rows=2 loops=1)
        ->  Aggregate  (cost=32.48..32.49 rows=1 width=32) (actual time=42.886..42.886 rows=1 loops=1)
              ->  Seq Scan on test  (cost=0.00..12.98 rows=1300 width=18) (actual time=1.061..21.582 rows=114499 loops=1)
                    Filter: (('["2014-05-07 00:00:00-04","2015-07-21 00:00:00-04"]'::tstzrange @> record_range) AND ((generator)::text = '74TH STREET_GT_1'::text))
                    Rows Removed by Filter: 221
        ->  Aggregate  (cost=12.14..12.15 rows=1 width=32) (actual time=0.018..0.018 rows=1 loops=1)
              ->  Subquery Scan on t  (cost=6.37..12.14 rows=1 width=32) (actual time=0.016..0.016 rows=0 loops=1)
                    Filter: ('["2014-05-07 00:00:00-04","2015-07-21 00:00:00-04"]'::tstzrange @> (t.rt).record_time)
                    ->  Index Scan using test_generator_idx on test test_1  (cost=6.37..10.89 rows=100 width=18) (actual time=0.016..0.016 rows=0 loops=1)
                          Index Cond: ((generator)::text = '74TH STREET_GT_1'::text)
                          Filter: (('["2014-05-07 00:00:00-04","2015-07-21 00:00:00-04"]'::tstzrange && record_range) AND (NOT (hashed SubPlan 1)))
                          SubPlan 1
                            ->  Seq Scan on test test_2  (cost=0.00..5.93 rows=119 width=22) (never executed)
                                  Filter: ('["2014-05-07 00:00:00-04","2015-07-21 00:00:00-04"]'::tstzrange @> record_range)
Planning time: 0.251 ms
Execution time: 42.963 ms

Here we can take advantage of not having to filter each row in the ranges that are completely contained by the range of interest, which speeds the query up significantly. But we can do a bit better still: we'll create a summary data type that stores some pre-computed aggregates alongside each array.

CREATE TYPE summary AS (
	nn_count		int,
	minimum		double precision,
	maximum		double precision,
	total		double precision,
	average		double precision);
CREATE TABLE test2(
	generator 		varchar REFERENCES data.generators(id) ,
	record_range	tstzrange,
	lbmp			summary,
	losses			summary,
	congestion		summary,
	rt_lbmp			data.rt_lbmp_generators[]);

Then we'll have to insert into our new table in a slightly different way than before.

WITH t as (SELECT * FROM data.rt_lbmp_generators) INSERT INTO test2 
  SELECT generator, range_merge(to_range(min(record_time),null::tstzrange),to_range(max(record_time), null::tstzrange)) as record_range, 
  (count(lbmp) , min(lbmp) , max(lbmp) , sum(lbmp) , avg(lbmp) )::summary as lbmp, 
  (count(losses) , min(losses) , max(losses) , sum(losses) , avg(losses))::summary as losses, 
  (count(congestion) , min(congestion) , max(congestion) , sum(congestion) , avg(congestion))::summary as congestion,
  array_agg(t.*::data.rt_lbmp_generators) as rt_lbmp FROM t 
  GROUP BY generator, EXTRACT(year FROM record_time),EXTRACT(month from record_time);
  
--Then create our indexes
CREATE INDEX ON test2 USING GIST (record_range);
CREATE INDEX ON test2 (generator);
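
Just to illustrate what's now stored, here's a quick look at a few of the summary rows for one generator (note the (column).field syntax for reading fields out of the composite summary values):

SELECT record_range, (lbmp).nn_count, (lbmp).minimum, (lbmp).maximum, (lbmp).average
FROM test2
WHERE generator='74TH STREET_GT_1'
ORDER BY record_range
LIMIT 3;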

Now we've got an interesting set of things we can do: instead of even accessing the arrays stored in the ranges that are entirely contained in our query, we can just read the already-stored summary values and sum them. We only need to unnest the arrays that overlap but are not completely contained by the range of interest, and we can go from there.

EXPLAIN ANALYZE SELECT sum(b.lbmp) total_lbmp, sum(b.losses) total_losses FROM (
  SELECT sum(lbmp) lbmp, sum(losses) losses FROM (
    SELECT (lbmp).total as lbmp, (losses).total as losses FROM test2 
    WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2015-07-21]'::tstzrange @> record_range) as t1
  UNION ALL
  SELECT sum((t.rt).lbmp) lbmp, sum((t.rt).losses) losses FROM (
    SELECT unnest(rt_lbmp)::data.rt_lbmp_generators as rt FROM test2 
    WHERE generator='74TH STREET_GT_1' AND '[2014-05-07,2015-07-21]'::tstzrange && record_range AND
      record_range NOT IN (
        SELECT record_range FROM test2 WHERE '[2014-05-07,2015-07-21]'::tstzrange @> record_range)) as t
  WHERE '[2014-05-07,2015-07-21]'::tstzrange @> (t.rt).record_time) as b;
-----
Aggregate  (cost=48.94..48.95 rows=1 width=16) (actual time=10.789..10.789 rows=1 loops=1)
  ->  Append  (cost=11.57..48.93 rows=2 width=16) (actual time=0.093..10.786 rows=2 loops=1)
        ->  Aggregate  (cost=11.57..11.58 rows=1 width=122) (actual time=0.093..0.093 rows=1 loops=1)
              ->  Seq Scan on test2  (cost=0.00..11.51 rows=13 width=122) (actual time=0.011..0.085 rows=13 loops=1)
                    Filter: (('["2014-05-07 00:00:00-04","2015-07-21 00:00:00-04"]'::tstzrange @> record_range) AND ((generator)::text = '74TH STREET_GT_1'::text))
                    Rows Removed by Filter: 221
        ->  Aggregate  (cost=37.32..37.33 rows=1 width=32) (actual time=10.693..10.693 rows=1 loops=1)
              ->  Subquery Scan on t  (cost=11.22..37.30 rows=4 width=32) (actual time=1.243..7.882 rows=13119 loops=1)
                    Filter: ('["2014-05-07 00:00:00-04","2015-07-21 00:00:00-04"]'::tstzrange @> (t.rt).record_time)
                    Rows Removed by Filter: 4936
                    ->  Seq Scan on test2 test2_1  (cost=11.22..27.30 rows=800 width=18) (actual time=1.238..4.305 rows=18055 loops=1)
                          Filter: (('["2014-05-07 00:00:00-04","2015-07-21 00:00:00-04"]'::tstzrange && record_range) AND (NOT (hashed SubPlan 1)) AND ((generator)::text = '74TH STREET_GT_1'::text))
                          Rows Removed by Filter: 232
                          SubPlan 1
                            ->  Seq Scan on test2 test2_2  (cost=0.00..10.93 rows=119 width=22) (actual time=0.002..0.062 rows=117 loops=1)
                                  Filter: ('["2014-05-07 00:00:00-04","2015-07-21 00:00:00-04"]'::tstzrange @> record_range)
                                  Rows Removed by Filter: 117
Planning time: 0.255 ms
Execution time: 10.847 ms

Not bad. We sped up the original query by about a factor of 7. Now I'm sure there are lots of things I could be doing to make these queries better, both the ones with the array-based storage and, more likely, the ones without. I'd love thoughts and comments from anyone who has them. I did this on a relatively small data set, but hopefully I'll get around to a part 2 sometime soon and explore it with more of the data from the NYISO set.
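
As a side note, because the summary type also stores counts, minima, maxima and averages, other aggregates over the fully-contained ranges can be answered straight from the summaries without touching the arrays at all. Here's a sketch (it only covers ranges completely contained in the query range; the partially-overlapping ranges at the edges would still need the unnest-and-filter treatment from above):

SELECT min((lbmp).minimum) AS min_lbmp,
       max((lbmp).maximum) AS max_lbmp,
       sum((lbmp).total) / sum((lbmp).nn_count) AS avg_lbmp
FROM test2
WHERE generator='74TH STREET_GT_1'
  AND '[2014-05-07,2015-07-21]'::tstzrange @> record_range;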

UPDATE: Part 2 is here, enjoy!
