sachinsdate

@sachinsdate
sachinsdate / boston_monthly_tmax_1998_2019.csv
Created June 26, 2024 10:27
Monthly average maximum temperature in Boston, MA
Date Monthly Average Maximum
1/15/1998 39.71
2/15/1998 40.97
3/15/1998 48.75
4/15/1998 56.74
5/15/1998 68.75
6/15/1998 72
7/15/1998 82.62
8/15/1998 80.2
9/15/1998 74.44
sachinsdate / southern_oscillations_standardized_long_may24.csv
Created June 26, 2024 10:25
The El Niño-Southern Oscillation (ENSO) index. Data source: NOAA
Date Y_t
1951-01-01 1.5
1951-02-01 0.9
1951-03-01 -0.1
1951-04-01 -0.3
1951-05-01 -0.7
1951-06-01 0.2
1951-07-01 -1.0
1951-08-01 -0.2
1951-09-01 -1.1
sachinsdate / pacf.py
Created June 21, 2024 11:55
Partial Auto-Correlation
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
from statsmodels.graphics.tsaplots import plot_pacf
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.stattools import pacf
from statsmodels.tsa.stattools import acf
import statsmodels.api as sm
from patsy import dmatrices
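
The preview above shows only the script's imports, so the following is a minimal sketch rather than the gist's own code. It assumes the textbook regression definition of partial autocorrelation: the lag-k value is the coefficient on y[t-k] when y[t] is regressed on an intercept and its first k lags. The function name `pacf_by_regression` and the synthetic AR(1) series are illustrative assumptions; in practice the `pacf` function from `statsmodels.tsa.stattools`, imported above, covers the same ground.

```python
import numpy as np

def pacf_by_regression(y, max_lag):
    """Partial autocorrelations at lags 1..max_lag via OLS.

    The lag-k value is the coefficient on y[t-k] in a regression of y[t]
    on an intercept and y[t-1], ..., y[t-k].
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    values = []
    for k in range(1, max_lag + 1):
        target = y[k:]
        # Column j-1 holds the series lagged by j periods, aligned with target
        lags = np.column_stack([y[k - j:n - j] for j in range(1, k + 1)])
        X = np.column_stack([np.ones(n - k), lags])
        coeffs, *_ = np.linalg.lstsq(X, target, rcond=None)
        values.append(coeffs[-1])  # coefficient on the k-th lag
    return np.array(values)

# Synthetic AR(1) series with coefficient 0.6 (illustrative data only)
rng = np.random.default_rng(0)
n_obs = 2000
noise = rng.normal(size=n_obs)
series = np.zeros(n_obs)
for t in range(1, n_obs):
    series[t] = 0.6 * series[t - 1] + noise[t]

pacf_values = pacf_by_regression(series, max_lag=3)
print(pacf_values)
```

For an AR(1) process the lag-1 value should sit near the autoregressive coefficient and the higher lags near zero, which is the pattern a PACF plot is normally used to detect.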
sachinsdate / Staten_Island_Ferry_Ridership_Counts_20240611.csv
Created June 15, 2024 07:52
Staten Island Ferry daily ridership counts (Source: NYC OpenData)
Date Whitehall Terminal St. George Terminal
01/01/2019 27101 23385
01/02/2019 39425 39746
01/03/2019 39430 37988
01/04/2019 37593 39140
01/05/2019 19537 17925
01/06/2019 21129 19085
01/07/2019 33400 33596
01/08/2019 32859 32521
01/09/2019 35588 35060
sachinsdate / system_of_regression_equations.py
Created December 14, 2022 14:03
A tutorial on solving a linear system of regression equations using Generalized Least Squares
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from patsy import dmatrices
from matplotlib import pyplot as plt
import numpy as np
import seaborn as sb
#Create a list of the assets whose capital asset pricing models will make up the
sachinsdate / 90day_RAR_on_assets.csv
Created December 12, 2022 09:57
Trailing 90-day returns of 8 stocks on NYSE and NASDAQ, and of the corresponding sector indexes. Source: Yahoo Finance, under its terms of use: https://legal.yahoo.com/us/en/yahoo/terms/otos/index.html
Date,RAR_Energy,RAR_Metals,RAR_Auto,RAR_Technology,RAR_Chevron,RAR_Halliburton,RAR_Alcoa,RAR_Nucor,RAR_USSteel,RAR_Ford,RAR_Tesla,RAR_Google,RAR_Microsoft
2019-05-10,6.5961545570058915,-0.382100259291267,16.980423096067693,20.144324542716525,7.838687140506143,-9.476220040520879,-6.9431669207317,6.42141943672513,-17.767082658022694,29.022405063291146,-25.13538238802106,8.952849356982366,23.35190786030733
2019-05-13,6.077872310603407,-0.5278551837630419,16.595338809034903,21.905609130918425,8.573040434742577,-11.501168785151826,-8.83865472560975,4.248187263317492,-22.706320346320346,27.202982005141386,-26.78069516580104,9.053695816906558,24.28270581842493
2019-05-14,3.459818903497883,-3.7857422081352388,14.098548786527978,18.566912229335955,7.393579678758356,-12.890760028149195,-14.120176429075507,0.13789141713202824,-28.08288102261554,24.362673267326734,-29.245256175442353,2.2745797648289487,19.99829490827037
2019-05-15,2.550591352362299,-4.029390347163423,11.923854856180043,18.67323611276973,6.390994854783631
sachinsdate / risk_adjusted_return_ds.py
Created November 20, 2022 11:34
A Python script to calculate the 90-day risk-adjusted return on assets
import pandas as pd
df_asset_prices = pd.read_csv('asset_prices.csv', header=0, parse_dates=['Date'], index_col=0)
df_asset_prices_shifted89 = df_asset_prices.shift(89).dropna()
df_asset_prices_trunc89 = df_asset_prices[89:]
df_asset_prices_90day_return = (df_asset_prices_trunc89-df_asset_prices_shifted89)/df_asset_prices_shifted89*100
df_DTB3 = pd.read_csv('DTB3.csv', header=0, parse_dates=['DATE'], index_col=0)
df_DTB3 = df_DTB3.dropna()
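
The preview stops before the risk-adjusted return itself is computed, so here is a hedged sketch of the remaining arithmetic, not the gist's actual continuation. It repeats the 89-row shift from the script and then subtracts a T-bill rate from the percentage return. The price series, the column name `AssetA`, and the constant rate of 2.4 are made-up stand-ins for `asset_prices.csv` and `DTB3.csv`. Whether the T-bill rate should be rescaled to a 90-day horizon is not visible in the preview, so the sketch subtracts it as-is.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs standing in for asset_prices.csv and DTB3.csv
dates = pd.bdate_range('2019-01-01', periods=120)
prices = pd.DataFrame({'AssetA': 100 * 1.001 ** np.arange(120)}, index=dates)
tbill_rate = pd.Series(2.4, index=dates, name='DTB3')  # in percent

# Percentage return over 89 rows, the same arithmetic as the script above
shifted = prices.shift(89)
return_90day = ((prices - shifted) / shifted * 100).dropna()

# Risk-adjusted return: asset return minus the T-bill rate on the same date
aligned_rate = tbill_rate.reindex(return_90day.index)
rar = return_90day.sub(aligned_rate, axis=0).add_prefix('RAR_')

print(rar.head())
```

Reindexing the rate series onto the return dates before subtracting keeps the two aligned even when the rate file has missing days, which is presumably why the script drops NaN rows from `df_DTB3`.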
sachinsdate / 90day_RAR_on_assets.csv
Created November 20, 2022 11:31
90-day risk-adjusted return (RAR) of 8 stocks on NYSE, along with the RAR of the corresponding S&P industry or sector index. The RAR is calculated by subtracting the rate on the 90-day T-bill from the 90-day return on the corresponding asset.
Date,RAR_Energy,RAR_Metals,RAR_Auto,RAR_Technology,RAR_Chevron,RAR_Halliburton,RAR_Alcoa,RAR_Nucor,RAR_Ford,RAR_Tesla,RAR_Google,RAR_Microsoft
2019-05-10,6.5961545570058915,-0.382100259291267,16.980423096067693,20.144324542716525,7.838687140506143,-9.476220040520879,-6.9431669207317,6.421419436725146,29.022405063291146,-25.13538238802106,8.952849356982366,23.35190786030733
2019-05-13,6.077872310603407,-0.5278551837630419,16.595338809034903,21.905609130918425,8.573040434742577,-11.501168785151815,-8.83865472560975,4.248187263317492,27.202982005141386,-26.78069516580104,9.053695816906558,24.28270581842493
2019-05-14,3.459818903497883,-3.7857422081352388,14.098548786527978,18.566912229335955,7.393579678758356,-12.890760028149195,-14.120176429075507,0.13789141713202824,24.362673267326734,-29.245256175442353,2.2745797648289487,19.99829490827037
2019-05-15,2.550591352362299,-4.029390347163423,11.923854856180043,18.67323611276973,6.3909948547836315,-13.617494795281056,-14.408592540464461,0.30434117238267255,22.55984
sachinsdate / system_of_regression_equations.py
Created November 20, 2022 11:04
A tutorial on how to solve a system of regression equations. The assumption is that the residuals of the individual regression models are correlated across models, but only within the same time period. The data set is in the file 90day_RAR_on_assets.csv.
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from patsy import dmatrices
from matplotlib import pyplot as plt
import numpy as np
asset_names = ['Chevron', 'Halliburton', 'Alcoa', 'Nucor', 'Ford', 'Tesla', 'Google', 'Microsoft']
#M = number of equations
M = len(asset_names)
sachinsdate / gls.py
Created October 30, 2022 17:02
A tutorial on fitting a linear model using the Generalized Least Squares estimator
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from patsy import dmatrices
from matplotlib import pyplot as plt
import numpy as np
#Load the US Census Bureau data into a Dataframe
df = pd.read_csv('us_census_bureau_acs_2015_2019_subset.csv', header=0)
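
The preview ends right after the data is loaded, so what follows is a minimal sketch of the Generalized Least Squares estimator in its standard closed form, beta = (X' Ω⁻¹ X)⁻¹ X' Ω⁻¹ y, and not the tutorial's own code. The helper name `gls_fit`, the synthetic heteroskedastic data, and the assumption that the error covariance Ω is known exactly are all illustrative. In practice Ω usually has to be estimated, and statsmodels provides a `GLS` model class for this.

```python
import numpy as np

def gls_fit(X, y, omega):
    """GLS coefficients for y = X @ beta + e, where Cov(e) = omega."""
    omega_inv = np.linalg.inv(omega)
    xt_omega_inv = X.T @ omega_inv
    # Solve (X' Omega^-1 X) beta = X' Omega^-1 y instead of inverting again
    return np.linalg.solve(xt_omega_inv @ X, xt_omega_inv @ y)

# Synthetic data: the error standard deviation grows with the regressor
rng = np.random.default_rng(42)
n_obs = 500
x = rng.uniform(1, 10, size=n_obs)
X = np.column_stack([np.ones(n_obs), x])  # intercept plus one regressor
true_beta = np.array([2.0, 0.5])
errors = rng.normal(scale=0.2 * x)
y = X @ true_beta + errors

# Assumed-known diagonal covariance matrix of the errors
omega = np.diag((0.2 * x) ** 2)

beta_gls = gls_fit(X, y, omega)
beta_ols = gls_fit(X, y, np.eye(n_obs))  # identity omega reduces GLS to OLS
print(beta_gls, beta_ols)
```

Passing an identity matrix for Ω recovers ordinary least squares, which makes a convenient sanity check on the implementation.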