Used Google Cloud Platform BigQuery WebUI to edit SQL queries for GOOGL prediction code competition.
Found some very good examples of BigQuery complex queries here. Really critical findings are:
- Can use
lag(<prop name>, <previous row count) over (ORDER BY <prop name>)
, andlead
to access the window of the result set. - When combined with functions like
AVG(<prop name>)
, makes it really easy to calculate moving averages
SELECT
DATE(S1.date) AS join_date,
S1.high AS observed_equity_high,
LAG(S2.high, 1) OVER (ORDER BY S2.date ASC) AS prior_day_vix_futures_high,
AVG((S2.high - S2.open) / S2.open) OVER (ORDER BY S2.date ASC rows BETWEEN 3 preceding AND 1 preceding) AS future_vol_delta_norm_3dMA,
AVG((S1.high - S1.open) / S1.open) OVER (ORDER BY S1.date ASC rows BETWEEN 3 preceding AND 1 preceding) AS equity_delta_norm_3dMA,
LAG(S1.volume, 1) OVER (ORDER BY s1.date ASC) AS prior_day_volume
FROM
quandl_googl_vix.googl AS S1
JOIN EACH quandl_googl_vix.vixgog AS S2
ON
S1.date = S2.date
ORDER BY
join_date ASC;
- Exported to Google Sheet in csv.
The data points for this test are:
- Classifier
- observed_equity_high
- Dimensions
- prior_day_vix_futures_high
- future_vol_delta_norm_3dMA
- equity_delta_norm_3dMA
- step through "Tensorflow without a PhD" examples, and use exported table here.
- investigate whether ensuring that the data is correctly ordered in the view, is sufficient to avoid
OVER (ORDER BY)
in all selections