

Up to my ears in regression modeling

sachinsdate

@sachinsdate
sachinsdate / system_of_regression_equations.py
Created December 14, 2022 14:03
A tutorial on solving a linear system of regression equations using Generalized Least Squares
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from patsy import dmatrices
from matplotlib import pyplot as plt
import numpy as np
import seaborn as sb
# Create a list of the assets whose capital asset pricing models will make up the
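Each equation in such a system is a CAPM-style regression of an asset's risk-adjusted return on that of its sector index. Below is a minimal sketch of one equation fitted by OLS. It assumes Chevron is paired with the energy index, and the numbers are randomly generated, not taken from 90day_RAR_on_assets.csv. Only the column names mirror that file.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for two columns of 90day_RAR_on_assets.csv (not real data)
rng = np.random.default_rng(0)
n = 250
energy = rng.normal(0.0, 10.0, size=n)
# Assumed data-generating process: alpha = 1.0, beta = 1.2
chevron = 1.0 + 1.2 * energy + rng.normal(0.0, 3.0, size=n)
df = pd.DataFrame({'RAR_Energy': energy, 'RAR_Chevron': chevron})

# One equation of the system: the asset's RAR regressed on its sector index's RAR.
# The formula API adds the intercept (the CAPM alpha) automatically.
capm_model = smf.ols('RAR_Chevron ~ RAR_Energy', data=df).fit()
alpha = capm_model.params['Intercept']
beta = capm_model.params['RAR_Energy']
print(f'alpha={alpha:.3f}, beta={beta:.3f}')
```

Fitting each equation separately like this ignores any correlation between the equations' errors. That correlation is what the GLS treatment of the whole system is meant to exploit.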
sachinsdate / 90day_RAR_on_assets.csv
Created December 12, 2022 09:57
Trailing 90-day returns of 9 stocks on the NYSE and NASDAQ, and of the corresponding sector indexes. Source: Yahoo Finance, used under the Terms of Use: https://legal.yahoo.com/us/en/yahoo/terms/otos/index.html
Date,RAR_Energy,RAR_Metals,RAR_Auto,RAR_Technology,RAR_Chevron,RAR_Halliburton,RAR_Alcoa,RAR_Nucor,RAR_USSteel,RAR_Ford,RAR_Tesla,RAR_Google,RAR_Microsoft
2019-05-10,6.5961545570058915,-0.382100259291267,16.980423096067693,20.144324542716525,7.838687140506143,-9.476220040520879,-6.9431669207317,6.42141943672513,-17.767082658022694,29.022405063291146,-25.13538238802106,8.952849356982366,23.35190786030733
2019-05-13,6.077872310603407,-0.5278551837630419,16.595338809034903,21.905609130918425,8.573040434742577,-11.501168785151826,-8.83865472560975,4.248187263317492,-22.706320346320346,27.202982005141386,-26.78069516580104,9.053695816906558,24.28270581842493
2019-05-14,3.459818903497883,-3.7857422081352388,14.098548786527978,18.566912229335955,7.393579678758356,-12.890760028149195,-14.120176429075507,0.13789141713202824,-28.08288102261554,24.362673267326734,-29.245256175442353,2.2745797648289487,19.99829490827037
2019-05-15,2.550591352362299,-4.029390347163423,11.923854856180043,18.67323611276973,6.390994854783631
sachinsdate / risk_adjusted_return_ds.py
Created November 20, 2022 11:34
A Python script to calculate the 90-day risk-adjusted return on assets
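import pandas as pd
# Load the daily closing prices, indexed by date
df_asset_prices = pd.read_csv('asset_prices.csv', header=0, parse_dates=['Date'], index_col=0)
# Price 89 rows earlier, so that each return spans a 90-observation window
df_asset_prices_shifted89 = df_asset_prices.shift(89).dropna()
# Drop the first 89 rows (positionally) so that both frames cover the same dates
df_asset_prices_trunc89 = df_asset_prices.iloc[89:]
# 90-day percentage return on each asset
df_asset_prices_90day_return = (df_asset_prices_trunc89-df_asset_prices_shifted89)/df_asset_prices_shifted89*100
# Load the 3-month T-bill rate (FRED series DTB3); some FRED CSV downloads mark missing values with '.'
df_DTB3 = pd.read_csv('DTB3.csv', header=0, parse_dates=['DATE'], index_col=0, na_values='.')
df_DTB3 = df_DTB3.dropna()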
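The risk-adjusted return described in this listing is the 90-day return minus the T-bill rate, taken on dates where both series exist. Below is a minimal sketch of that alignment step on synthetic data. The tickers' prices, the flat 2% T-bill rate, and the single missing T-bill observation are invented for illustration. The lag of 89 rows mirrors the script above.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices for two tickers (not real data): one rising, one falling
dates = pd.bdate_range('2019-01-01', periods=120)
t = np.arange(120)
prices = pd.DataFrame({'Chevron': 100.0 * 1.001 ** t,
                       'Ford': 10.0 * 0.999 ** t}, index=dates)

# Synthetic T-bill rate in percent, with one missing observation
tbill = pd.Series(2.0, index=dates, name='DTB3')
tbill.iloc[100] = np.nan

lag = 89  # compare each price with the one 89 rows earlier, as in the script
ret_90 = ((prices / prices.shift(lag) - 1.0) * 100.0).dropna()

# Keep only dates present in both series, then subtract the rate row by row
tbill = tbill.dropna()
common = ret_90.index.intersection(tbill.index)
rar = ret_90.loc[common].sub(tbill.loc[common], axis=0).add_prefix('RAR_')
print(rar.head())
```

`prices / prices.shift(lag) - 1` is algebraically the same as `(trunc - shifted) / shifted`. Because pandas aligns on the index, the explicit truncation is not needed.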
sachinsdate / 90day_RAR_on_assets.csv
Created November 20, 2022 11:31
90-day risk-adjusted return (RAR) of 8 stocks on the NYSE and NASDAQ, together with the RAR of the corresponding S&P industry or sector index. The RAR is calculated by subtracting the rate on the 90-day T-bill from the 90-day return on the corresponding asset.
Date,RAR_Energy,RAR_Metals,RAR_Auto,RAR_Technology,RAR_Chevron,RAR_Halliburton,RAR_Alcoa,RAR_Nucor,RAR_Ford,RAR_Tesla,RAR_Google,RAR_Microsoft
2019-05-10,6.5961545570058915,-0.382100259291267,16.980423096067693,20.144324542716525,7.838687140506143,-9.476220040520879,-6.9431669207317,6.421419436725146,29.022405063291146,-25.13538238802106,8.952849356982366,23.35190786030733
2019-05-13,6.077872310603407,-0.5278551837630419,16.595338809034903,21.905609130918425,8.573040434742577,-11.501168785151815,-8.83865472560975,4.248187263317492,27.202982005141386,-26.78069516580104,9.053695816906558,24.28270581842493
2019-05-14,3.459818903497883,-3.7857422081352388,14.098548786527978,18.566912229335955,7.393579678758356,-12.890760028149195,-14.120176429075507,0.13789141713202824,24.362673267326734,-29.245256175442353,2.2745797648289487,19.99829490827037
2019-05-15,2.550591352362299,-4.029390347163423,11.923854856180043,18.67323611276973,6.3909948547836315,-13.617494795281056,-14.408592540464461,0.30434117238267255,22.55984
sachinsdate / system_of_regression_equations.py
Created November 20, 2022 11:04
A tutorial on how to solve a system of regression equations. The assumption is that the residuals of the individual regression models are correlated across models, but only within the same time period. The data set is in the file 90day_RAR_on_assets.csv.
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from patsy import dmatrices
from matplotlib import pyplot as plt
import numpy as np
asset_names = ['Chevron', 'Halliburton', 'Alcoa', 'Nucor', 'Ford', 'Tesla', 'Google', 'Microsoft']
#M = number of equations
M = len(asset_names)
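Under this assumption, errors are correlated across equations within a period but not across periods. With the equations stacked one after another, the error covariance is then Σ ⊗ I_T, where Σ is an M × M matrix. One standard way to exploit this is Zellner's seemingly unrelated regressions (SUR), estimated by feasible GLS. Below is a minimal sketch on synthetic data with M = 3 equations. The coefficients and Σ are invented, and this is not necessarily how the tutorial script proceeds.

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(42)
M, T = 3, 200  # M equations, T time periods (synthetic)

# Each equation: y_m = a_m + b_m * x_m + e_m, errors correlated across equations
true_coefs = [np.array([1.0, 0.5]), np.array([-2.0, 1.5]), np.array([0.5, -1.0])]
true_sigma = np.array([[1.0, 0.8, 0.5],
                       [0.8, 1.0, 0.6],
                       [0.5, 0.6, 1.0]])
errors = rng.multivariate_normal(np.zeros(M), true_sigma, size=T)  # shape (T, M)

X_list, y_list = [], []
for m in range(M):
    X_m = np.column_stack([np.ones(T), rng.normal(size=T)])
    X_list.append(X_m)
    y_list.append(X_m @ true_coefs[m] + errors[:, m])

# Step 1: equation-by-equation OLS to obtain residuals
resid = np.column_stack([
    y_m - X_m @ np.linalg.lstsq(X_m, y_m, rcond=None)[0]
    for X_m, y_m in zip(X_list, y_list)
])  # shape (T, M)

# Step 2: estimate the M x M contemporaneous covariance of the residuals
sigma_hat = resid.T @ resid / T

# Step 3: stack the system; the inverse of Sigma kron I_T is inv(Sigma) kron I_T
X = block_diag(*X_list)     # shape (M*T, 2*M)
y = np.concatenate(y_list)  # shape (M*T,)
omega_inv = np.kron(np.linalg.inv(sigma_hat), np.eye(T))

# Step 4: GLS estimate of all 2*M coefficients at once
XtOi = X.T @ omega_inv
beta_fgls = np.linalg.solve(XtOi @ X, XtOi @ y)
print(beta_fgls.reshape(M, 2))
```

Forming the full (M·T) × (M·T) matrix is fine at this size. For long samples, one would instead exploit the Kronecker structure rather than materialize it.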
sachinsdate / gls.py
Created October 30, 2022 17:02
A tutorial on fitting a linear model using the Generalized Least Squares estimator
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from patsy import dmatrices
from matplotlib import pyplot as plt
import numpy as np
#Load the US Census Bureau data into a Dataframe
df = pd.read_csv('us_census_bureau_acs_2015_2019_subset.csv', header=0)
sachinsdate / white_hc_matrix.py
Created September 25, 2022 07:09
A comparison of heteroskedasticity-consistent estimators
import pandas as pd
import statsmodels.formula.api as smf
from patsy import dmatrices
from matplotlib import pyplot as plt
#Load the US Census Bureau data into a Dataframe
df = pd.read_csv('us_census_bureau_acs_2015_2019_subset.csv', header=0)
#Construct the model's equation in Patsy syntax. Statsmodels will automatically add the intercept and so we don't explicitly specify it in the model's equation
reg_expr = 'Percent_Households_Below_Poverty_Level ~ Median_Age + Homeowner_Vacancy_Rate + Percent_Pop_25_And_Over_With_College_Or_Higher_Educ'
#Build and train the model and print the training summary
sachinsdate / instrumental_variables_regression.py
Last active August 30, 2022 14:30
A tutorial on instrumental variables regression using the IV2SLS class of statsmodels
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
from statsmodels.api import add_constant
from statsmodels.sandbox.regression.gmm import IV2SLS
#Load the Panel Study of Income Dynamics (PSID) into a Dataframe
df = pd.read_csv('PSID1976.csv', header=0)
sachinsdate / us_census_bureau_acs_2015_2019_subset.csv
Last active October 27, 2022 10:54
A subset of the 2015–2019 American Community Survey (ACS) 5-Year Estimates conducted by the US Census Bureau, used under the following terms of use: https://www.census.gov/data/developers/about/terms-of-service.html
County Percent_Households_Below_Poverty_Level Median_Age Homeowner_Vacancy_Rate Percent_Pop_25_And_Over_With_College_Or_Higher_Educ
Autauga, Alabama 14.7 38.2 1.4 26.6
Baldwin, Alabama 10.5 43 3.3 31.9
Barbour, Alabama 27.5 40.4 3.8 11.6
Bibb, Alabama 18.4 40.9 1.5 10.4
Blount, Alabama 14.2 40.7 0.7 13.1
Bullock, Alabama 28.2 40.2 0.2 12.1
Butler, Alabama 20.5 40.8 3.7 16.1
Calhoun, Alabama 18 39.6 2.1 18.5
Chambers, Alabama 18.1 42 2.7 13.3