Ran Google Cloud Platform BigQuery examples for working with time-series financial data. Used the `gcloud` command-line tool to connect to the Google Cloud project `financials-test`, initialized `bq` (the BigQuery CLI) and pointed it at the project, then imported two large CSV files of GBP/USD exchange rates from 2014.
```sh
# my GCP project
gcloud config set project financials-test

# my temporary dataset that is imported to BigQuery
bq mk timeseries

# note the column definitions appended to the `load` command
bq load timeseries.gbpusd_0114 \
  gs://solutions-public-assets/time-series-master/GBPUSD_2014_01.csv \
  venue:STRING,currencies:STRING,time:TIMESTAMP,bid:FLOAT,ask:FLOAT
bq load timeseries.gbpusd_0214 \
  gs://solutions-public-assets/time-series-master/GBPUSD_2014_02.csv \
  venue:STRING,currencies:STRING,time:TIMESTAMP,bid:FLOAT,ask:FLOAT
```
Then I switched to the BigQuery web UI and used the SQL query builder to run queries against the data sets. I ran a simple `SELECT * FROM <table> ORDER BY <column> LIMIT 1000;` to check the data. Then I updated the query to use the `FORMAT_UTC_USEC(time) AS time` pattern, to print microseconds.
```sql
SELECT
  venue,
  currencies,
  FORMAT_UTC_USEC(time) AS time,
  bid,
  ask
FROM
  timeseries.gbpusd_0114
ORDER BY
  time ASC
LIMIT 1000;
```
Next, I selected the `time` and `bid` column values where `time` fell within a 30-minute window at the start of the data set.
```sql
SELECT
  TIME(time) AS time,
  bid
FROM
  timeseries.gbpusd_0114
WHERE
  time BETWEEN TIMESTAMP("2014-01-01 00:00:00.000")
    AND TIMESTAMP("2014-01-01 00:29:59.999")
ORDER BY
  time ASC;
```
I learned to query data across two tables with common columns, based on a query of the table `id` itself. For example: find the `min` and `max` `time` across every table whose `id` contains `"gbpusd"`.
```sql
SELECT
  MIN(time) AS time1,
  MAX(time) AS time2
FROM
  (TABLE_QUERY(timeseries, 'table_id CONTAINS "gbpusd"'));
```
I briefly used a string formatter to clip the fractional seconds off of the `time` values, so that I could easily view results spanning the boundary between the two tables, without the extra string overhead.
```sql
SELECT
  STRFTIME_UTC_USEC(time, "%m-%d-%Y %r") AS time,
  ask
FROM
  [timeseries.gbpusd_0114], [timeseries.gbpusd_0214]
WHERE
  time BETWEEN TIMESTAMP("2014-01-31 23:30:00.000")
    AND TIMESTAMP("2014-02-01 00:29:59.999")
ORDER BY
  time ASC;
```
To produce a histogram, I grouped the count of ticks (`time` records) by `hour`, for 12- and 24-hour periods. Then, to make it more interesting, I calculated the average ticks per hour over all the days of the data set.
```sql
SELECT
  HOUR(time) AS hour,
  COUNT(time) / COUNT(DISTINCT DAY(time)) AS avg_ticks
FROM
  [timeseries.gbpusd_0114]
GROUP BY
  hour
ORDER BY
  hour ASC;
```
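The per-hour arithmetic can be sketched in plain Python. The ticks below are made-up sample timestamps, not rows from the real data set:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample ticks; the real tables hold millions of rows.
ticks = [
    datetime(2014, 1, 1, 9, 0, 5),
    datetime(2014, 1, 1, 9, 30, 0),
    datetime(2014, 1, 2, 9, 15, 0),
    datetime(2014, 1, 1, 10, 0, 0),
]

counts = defaultdict(int)   # total ticks per hour-of-day
days = defaultdict(set)     # distinct days contributing to each hour-of-day

for t in ticks:
    counts[t.hour] += 1
    days[t.hour].add(t.date())

# Mirrors COUNT(time) / COUNT(DISTINCT DAY(time)) per HOUR(time) group:
avg_ticks = {hour: counts[hour] / len(days[hour]) for hour in counts}
```

Each bucket divides the total tick count by the number of distinct days that contributed to it, which is what the `COUNT(time) / COUNT(DISTINCT DAY(time))` expression computes per `HOUR(time)` group.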
Last, and most importantly, I calculated a 60-second Simple Moving Average (SMA) for each tick in a 3-hour period, using a self-join on the table. I selected `time` and `AVG(bid)`, from two copies of the table joined using `JOIN EACH` for optimization, where the second copy's timestamp was between the tick's `time` and `time - 60` seconds, and where the tick itself fell within the 3-hour period.
I expect to reproduce this method, to calculate more interesting indicators.
```sql
SELECT
  TIME(S1.time) AS bid_time,
  AVG(S2.bid) AS avg_bid
FROM
  timeseries.gbpusd_0114 AS S1
JOIN EACH
  timeseries.gbpusd_0114 AS S2
ON
  S1.currencies = S2.currencies
WHERE
  TIMESTAMP_TO_SEC(S2.time)
    BETWEEN (TIMESTAMP_TO_SEC(S1.time) - 60)
    AND TIMESTAMP_TO_SEC(S1.time)
  AND S1.time
    BETWEEN TIMESTAMP("2014-01-16 09:00:00.000")
    AND TIMESTAMP("2014-01-16 11:59:59.999")
GROUP BY
  bid_time
ORDER BY
  bid_time ASC;
```
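To sanity-check the windowing logic, here is a small Python sketch of the same 60-second trailing average over a handful of hypothetical ticks (the real query computes this inside BigQuery via the join above):

```python
from datetime import datetime, timedelta

# Hypothetical (time, bid) ticks; in BigQuery these come from the table itself.
ticks = [
    (datetime(2014, 1, 16, 9, 0, 0), 1.6400),
    (datetime(2014, 1, 16, 9, 0, 30), 1.6410),
    (datetime(2014, 1, 16, 9, 1, 0), 1.6420),
    (datetime(2014, 1, 16, 9, 5, 0), 1.6380),
]

def sma(ticks, window=timedelta(seconds=60)):
    """For each tick, average all bids whose time falls in
    [tick_time - window, tick_time], inclusive on both ends,
    matching the BETWEEN clause in the query."""
    out = []
    for t1, _ in ticks:
        bids = [bid for t2, bid in ticks if t1 - window <= t2 <= t1]
        out.append((t1, sum(bids) / len(bids)))
    return out
```

For example, the tick at 09:01:00 averages the three bids from 09:00:00 onward, while the isolated tick at 09:05:00 averages only itself.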