
# Gist by @ckashby, created June 10, 2024 14:40
import sqlite3
import pandas as pd
conn = sqlite3.connect("ladder.db")
# pd.read_sql("SELECT * FROM mytable;", conn)
# pd.read_sql("SELECT * FROM pets;", conn)
# pd.read_sql("SELECT * FROM employees;", conn)
# pd.read_sql("SELECT * FROM transactions;", conn)
pd.read_sql("SELECT * FROM yum;", conn)
# # Window Functions and More
# The only new topic in this file is window functions. By itself, the topic is not difficult, so we
# will start out slow. However, window functions are very useful in more complex queries.
# Solutions to some of these problems will be even longer and more involved than in the previous file,
# and we'll be mixing in a lot of prior material.
# ## Window Functions
# 78) Create a daily cumulative sum of the trading volume of the Yum! stock in `yum`.
pd.read_sql("SELECT date, volume, SUM(volume) OVER (ORDER BY date) AS cumulative_volume FROM yum;", conn)
# 79) Create a cumulative sum of the trading volume of Yum! across months. That is, the final row of this query should be the cumulative sum of all months from 2015 through 2019.
# We want one row per month, so aggregate to monthly totals first,
# then take a running total of those monthly sums.
pd.read_sql("""
WITH monthly AS (
    SELECT
        strftime('%Y', date) AS year,
        strftime('%m', date) AS month,
        SUM(volume) AS monthly_volume
    FROM yum
    GROUP BY year, month
)
SELECT
    year,
    month,
    monthly_volume,
    SUM(monthly_volume) OVER (ORDER BY year, month) AS cumulative_volume
FROM monthly;
""", conn)
# 84) Next, let's create the [**Stochastic Oscillator**](https://www.investopedia.com/terms/s/stochasticoscillator.asp) of `yum`. The stochastic oscillator is actually two lines: One called _%K_ and the other called _%D_. They are computed as follows:
# * Let `h14` and `l14` denote the 14-day highs and lows (similar to last problem)
# * `percent_k = (close - l14) / (h14 - l14)`
# * `percent_d` is the 3-day moving average of `percent_k`
# My solution to this problem involved making _TWO_ CTEs (the second one uses the first one). My query was 29 lines long. As a guide to see if you got the answer correct, here's the first few lines of my (sorted) solution:
# ```
# date percent_k percent_d
# ---------- ---------- ----------
# 2015-01-02
# 2015-01-05 -0.7417199
# 2015-01-06 -0.3257592 -0.7417199
# ```
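# One possible shape for the two-CTE approach (not the author's 29-line query,
# and it may not reproduce the sample output above exactly). The sketch below is
# self-contained: it builds a tiny in-memory stand-in for `yum` with made-up
# prices so it runs without `ladder.db`. Against the real table, run the same
# query with `conn` instead of `demo`. Requires SQLite 3.25+ for window functions.

```python
import sqlite3
import pandas as pd

# Tiny made-up stand-in for the real `yum` table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE yum (date TEXT, high REAL, low REAL, close REAL);")
demo.executemany(
    "INSERT INTO yum VALUES (?, ?, ?, ?);",
    [(f"2015-01-{d:02d}", 50 + d, 48 + d, 49 + d) for d in range(1, 21)],
)

# CTE `k` computes percent_k from 14-day highs/lows; the outer query
# builds percent_d as the 3-day moving average of percent_k.
stoch = pd.read_sql("""
WITH k AS (
    SELECT
        date,
        (close - MIN(low) OVER w) * 1.0
            / (MAX(high) OVER w - MIN(low) OVER w) AS percent_k
    FROM yum
    WINDOW w AS (ORDER BY date ROWS BETWEEN 13 PRECEDING AND CURRENT ROW)
)
SELECT
    date,
    percent_k,
    AVG(percent_k) OVER
        (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS percent_d
FROM k
ORDER BY date;
""", demo)
print(stoch.tail())
```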
# 85) In my opinion, this is the hardest problem in the ladder challenge. For each month between 2015 and 2019, as in the final problem from the `03` file, we'll attach Yum! stock data to the `transactions` data. Let's condense our `yum` data to show relevant monthly statistics. That is, for each month of each year, create a table with the following columns:
# * Year
# * Month
# * The total revenue from our company in `transactions`
# * The monthly low (ie, the lowest low that month)
# * The monthly high (ie, the highest high that month)
# * The monthly open (ie, the opening value in the first trading day that month)
# * The monthly close (ie, the closing value of the last trading day that month)
# * The total trade volume of Yum! that month
# My solution to this problem is 38 lines long. For reference, here are the top 3 rows of the solution:
# ```
# year month company_revenue yum_low yum_high yum_open yum_close yum_volume
# ---------- ---------- --------------- ---------- ---------- ---------- ---------- ----------
# 2015 01 $14,106 49.88 53.87 53.12 53.28 89,074,400
# 2015 02 $20,739 50.68 59.29 56.95 58.31 98,621,800
# 2015 03 $21,232 54.92 59.55 58.23 58.81 108,827,60
# ```
# Some hints:
# * I used two CTEs, but you may not need to.
# * You'll need the `FIRST_VALUE()` and `LAST_VALUE()` window functions.
# * To find the first in each month, you'll need the `PARTITION BY` statement in those window functions. `PARTITION BY` acts a lot like `GROUP BY`, but for window functions.
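# One possible shape for this solution (not the author's 38-line query). The
# sketch is self-contained: it builds tiny made-up in-memory stand-ins for `yum`
# and `transactions`, so it runs without `ladder.db`. In particular, the
# `transactions` column `amount` is an assumption; check the real schema.
# Dollar formatting of the revenue column is omitted.

```python
import sqlite3
import pandas as pd

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE yum (date TEXT, open REAL, high REAL, low REAL, close REAL, volume INTEGER);")
demo.execute("CREATE TABLE transactions (date TEXT, amount REAL);")  # `amount` is assumed
demo.executemany("INSERT INTO yum VALUES (?, ?, ?, ?, ?, ?);", [
    ("2015-01-02", 53.12, 53.90, 52.00, 53.00, 100),
    ("2015-01-30", 53.50, 54.00, 49.90, 53.28, 200),
    ("2015-02-02", 56.95, 59.00, 55.00, 58.31, 300),
])
demo.executemany("INSERT INTO transactions VALUES (?, ?);", [
    ("2015-01-10", 100.0), ("2015-01-20", 50.0), ("2015-02-05", 75.0),
])

monthly = pd.read_sql("""
WITH y AS (
    SELECT
        strftime('%Y', date) AS year,
        strftime('%m', date) AS month,
        low, high, volume,
        FIRST_VALUE(open) OVER w AS yum_open,
        -- LAST_VALUE needs an explicit frame reaching the partition's end;
        -- the default frame stops at the current row.
        LAST_VALUE(close) OVER (
            PARTITION BY strftime('%Y', date), strftime('%m', date)
            ORDER BY date
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS yum_close
    FROM yum
    WINDOW w AS (PARTITION BY strftime('%Y', date), strftime('%m', date) ORDER BY date)
),
t AS (
    SELECT
        strftime('%Y', date) AS year,
        strftime('%m', date) AS month,
        SUM(amount) AS company_revenue
    FROM transactions
    GROUP BY year, month
)
SELECT
    y.year,
    y.month,
    t.company_revenue,
    MIN(y.low)  AS yum_low,
    MAX(y.high) AS yum_high,
    y.yum_open,
    y.yum_close,
    SUM(y.volume) AS yum_volume
FROM y
JOIN t ON y.year = t.year AND y.month = t.month
GROUP BY y.year, y.month;
""", demo)
print(monthly)
```

# Against the real data, run the same query with `conn` instead of `demo`.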
# 80) For March 2017, create a table from `yum` with the following columns:
# * Day of the month
# * Row number (ie, the nth trading day of the month)
# * Cumulative low (ie, lowest low so far this month)
# * Cumulative high (ie, highest high so far this month)
# * Cumulative total volume
conn.execute("DROP TABLE IF EXISTS cumulative_march_2017;")
conn.execute("""
CREATE TABLE cumulative_march_2017 AS
SELECT
strftime('%d', date) AS day_of_month,
row_number() OVER (PARTITION BY strftime('%m', date) ORDER BY date) AS row_number,
MIN(low) OVER (PARTITION BY strftime('%m', date) ORDER BY date) AS cumulative_low,
MAX(high) OVER (PARTITION BY strftime('%m', date) ORDER BY date) AS cumulative_high,
SUM(volume) OVER (PARTITION BY strftime('%m', date) ORDER BY date) AS cumulative_volume
FROM yum
WHERE date BETWEEN '2017-03-01' AND '2017-03-31';
""")
pd.read_sql("SELECT * FROM cumulative_march_2017;", conn)
# 81) From `yum`, create a column that represents the 7-day moving average of the closing price.
# This syntax is very confusing. You can read more about it
# [here](https://www.sqlitetutorial.net/sqlite-window-functions/sqlite-window-frame/).
conn.execute("DROP TABLE IF EXISTS moving_avg;")
conn.execute("""
CREATE TABLE moving_avg AS
SELECT
date,
close,
AVG(close) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg
FROM yum;
""")
pd.read_sql("SELECT * FROM moving_avg;", conn)
# 82) Repeat the March 2017 problem but instead of cumulative highs, lows, and totals,
# show the 5-day moving highs and lows. (No need for volume here.)
conn.execute("DROP TABLE IF EXISTS moving_high_low;")
conn.execute("""
CREATE TABLE moving_high_low AS
SELECT
strftime('%d', date) AS day_of_month,
row_number() OVER (PARTITION BY strftime('%m', date) ORDER BY date) AS row_number,
MAX(high) OVER (ORDER BY date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS moving_high,
MIN(low) OVER (ORDER BY date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS moving_low
FROM yum
WHERE date BETWEEN '2017-03-01' AND '2017-03-31';
""")
conn.execute("DROP TABLE IF EXISTS avg_high_low;")
conn.execute("""
CREATE TABLE avg_high_low AS
SELECT
date,
high,
low,
AVG(high) OVER (ORDER BY date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS moving_avg_high,
AVG(low) OVER (ORDER BY date ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS moving_avg_low
FROM yum
WHERE date BETWEEN '2017-03-01' AND '2017-03-31';
""")
pd.read_sql("SELECT * FROM moving_high_low;", conn)
# 83) The [**Williams %R**](https://www.investopedia.com/terms/w/williamsr.asp) is an economic trendline indicator of a stock. Query `yum` to only include two columns: the `date` (unmodified), and the 7-day Williams %R of the stock at that date, call it `williams_r`. It is computed as follows:
# * Let `h7` be the running 7-day high (ie, highest high of the last 7 trading days).
# * Let `l7` be the running 7-day low (ie, lowest low of the last 7 trading days).
# * `williams_r = (h7 - close) / (h7 - l7)`
# The easiest way to do this problem is to make a CTE containing `h7` and `l7`, and then produce `williams_r` by querying your CTE.
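# A sketch of that CTE approach. It is self-contained, using a tiny made-up
# in-memory stand-in for `yum` so it runs without `ladder.db`; against the real
# table, run the same query with `conn` instead of `demo`.

```python
import sqlite3
import pandas as pd

# Tiny made-up stand-in for the real `yum` table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE yum (date TEXT, high REAL, low REAL, close REAL);")
demo.executemany(
    "INSERT INTO yum VALUES (?, ?, ?, ?);",
    [(f"2015-01-{d:02d}", 50 + d, 48 + d, 49 + d) for d in range(1, 11)],
)

# CTE `hl` holds the running 7-day high/low; the outer query computes
# williams_r = (h7 - close) / (h7 - l7) from it.
williams = pd.read_sql("""
WITH hl AS (
    SELECT
        date,
        close,
        MAX(high) OVER w AS h7,
        MIN(low)  OVER w AS l7
    FROM yum
    WINDOW w AS (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
)
SELECT
    date,
    (h7 - close) * 1.0 / (h7 - l7) AS williams_r
FROM hl;
""", demo)
print(williams)
```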