Skip to content

Instantly share code, notes, and snippets.

@amosr
Last active November 26, 2015 03:37
Show Gist options
  • Save amosr/4bcc54d65f8e5fe1018e to your computer and use it in GitHub Desktop.
Save amosr/4bcc54d65f8e5fe1018e to your computer and use it in GitHub Desktop.
Playing with postgres

Comparing the Icicle against Postgres

Using the ASX data. SQL to create table and index is:

create table asx (company char(3), date date, open double precision, high double precision, low double precision, close double precision, volume double precision, adjclose double precision);

250mb dataset (500mb as json psv)

Postgres

Load: 15s

copy asx from '/Users/amos/proj/ambiata/postgres-test/asx.csv' csv header;

Load with index: 90s

create index asx_company_date on asx (company, date);
copy asx from '/Users/amos/proj/ambiata/postgres-test/asx.csv' csv header;

Avg open: 1.5s

Avg open, no index: 1.3s

select company, avg(open) from asx where date < '2015-01-01' group by company;

Bunch of things: 3.3s

Bunch of things, no index: 3.3s

select company,
    avg(open), avg(high), avg(low), avg(close), avg(close - open), max(close - open), min(close - open), sum( case when close > open then 1 else 0 end)
from asx where date < '2015-01-01' group by company;

Slow sum of open/close: 10s

select a.company, sum(x.close_prev - x.open_this) from
(select distinct company from asx) a,
(select lag(close,1) over (partition by company order by date asc) as close_prev, open as open_this, company as company_that from asx where date < '2015-01-01') x
where a.company = x.company_that group by a.company;

Faster sum of open/close gap: 3.5s

Faster sum of open/close gap, no index: 100s

select x.company_that,
       avg(x.closeprice_prev - x.openprice_this)
from (
       select lag(adjclose,1) over (partition by company order by date asc) as closeprice_prev,
       open as openprice_this,
       company as company_that
       from asx
       where date < '2015-01-01'
) x
group by company_that;

Icicle

Mean: 4.0s (0.5s compiling, 3.5 running)

All together: 6.3s (2.6s compiling, 3.7 running)

12gb data (18gb for json psv)

(Times for this are in minutes)

Postgres

Load no index: 8m

copy asx from '/Users/amos/proj/ambiata/postgres-test/large/asx.csv' csv header;

Load with index: 45m

create index asx_company_date on asx (company, date);
copy asx from '/Users/amos/proj/ambiata/postgres-test/large/asx.csv' csv header;

Mean: 0m40

Mean, no index: 1m30

select company,
    avg(open)
from asx where date < '2015-01-01' group by company;

Bunch of things: 1m30

Bunch of things, no index: 1m30

select company,
    avg(open), avg(high), avg(low), avg(close), avg(close - open), max(close - open), min(close - open), sum( case when close > open then 1 else 0 end)
from asx where date < '2015-01-01' group by company;

Faster sum of open/close gap: 53m

Faster sum of open/close gap, no index: ???

select x.company_that,
       avg(x.closeprice_prev - x.openprice_this)
from (
       select lag(adjclose,1) over (partition by company order by date asc) as closeprice_prev,
       open as openprice_this,
       company as company_that
       from asx
       where date < '2015-01-01'
) x

Icicle

Just mean: 3m20

Everything: 3m30

Icicle Dictionary

[feature.mean_open]
  expression = "feature data ~> mean open"

[feature.mean_high]
  expression = "feature data ~> mean high"

[feature.mean_low]
  expression = "feature data ~> mean low"

[feature.mean_close]
  expression = "feature data ~> mean close"

[feature.mean_gap]
  expression = "feature data ~> mean (close-open)"

[feature.max_gap]
  expression = "feature data ~> max (close-open)"

[feature.min_gap]
  expression = "feature data ~> min (close-open)"

[feature.close_open]
  expression = "feature data ~> filter close > open ~> count close"

[feature.gap_between_days]
  expression = "feature data ~> fold1 prev_close = (0, close) : case prev_close | (sum,prev) -> (sum + (prev-open), close) end ~> case prev_close | (sum,_) -> sum end"
@amosr
Copy link
Author

amosr commented Nov 26, 2015

don't take this too seriously - this is really just for fun

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment