sachinsdate
Up to my ears in regression modeling
sachinsdate / risk_adjusted_return_ds.py
Created Nov 20, 2022
A Python script to calculate the 90-day risk adjusted return on assets
View risk_adjusted_return_ds.py
import pandas as pd
df_asset_prices = pd.read_csv('asset_prices.csv', header=0, parse_dates=['Date'], index_col=0)
df_asset_prices_shifted89 = df_asset_prices.shift(89).dropna()
df_asset_prices_trunc89 = df_asset_prices[89:]
df_asset_prices_90day_return = (df_asset_prices_trunc89-df_asset_prices_shifted89)/df_asset_prices_shifted89*100
df_DTB3 = pd.read_csv('DTB3.csv', header=0, parse_dates=['DATE'], index_col=0)
df_DTB3 = df_DTB3.dropna()
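The shift-and-truncate steps in the script above amount to a percentage change over 89 rows. A minimal sketch on synthetic prices, assuming two hypothetical assets (`AssetA`, `AssetB`) rather than the author's `asset_prices.csv`, shows that `pct_change(periods=89)` gives the same numbers:

```python
import numpy as np
import pandas as pd

# Synthetic daily prices for two hypothetical assets (not the author's data)
rng = np.random.default_rng(0)
dates = pd.bdate_range("2019-01-01", periods=120)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(120, 2)), axis=0)),
    index=dates,
    columns=["AssetA", "AssetB"],
)

# Approach used in the script: the price 89 rows earlier, aligned on the date index
shifted = prices.shift(89).dropna()
returns_shift = (prices[89:] - shifted) / shifted * 100

# Equivalent formulation: percentage change over 89 rows
returns_pct = (prices.pct_change(periods=89) * 100).dropna()

print(np.allclose(returns_shift.values, returns_pct.values))
```

Either form drops the first 89 rows, because no earlier price exists for them.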
sachinsdate / 90day_RAR_on_assets.csv
Created Nov 20, 2022
90-day risk-adjusted return (RAR) of 8 stocks on the NYSE, along with the RAR of the corresponding S&P industry or sector index. The RAR is calculated by subtracting the rate on the 90-day T-bill from the 90-day return on the corresponding asset.
View 90day_RAR_on_assets.csv
Date,RAR_Energy,RAR_Metals,RAR_Auto,RAR_Technology,RAR_Chevron,RAR_Halliburton,RAR_Alcoa,RAR_Nucor,RAR_Ford,RAR_Tesla,RAR_Google,RAR_Microsoft
2019-05-10,6.5961545570058915,-0.382100259291267,16.980423096067693,20.144324542716525,7.838687140506143,-9.476220040520879,-6.9431669207317,6.421419436725146,29.022405063291146,-25.13538238802106,8.952849356982366,23.35190786030733
2019-05-13,6.077872310603407,-0.5278551837630419,16.595338809034903,21.905609130918425,8.573040434742577,-11.501168785151815,-8.83865472560975,4.248187263317492,27.202982005141386,-26.78069516580104,9.053695816906558,24.28270581842493
2019-05-14,3.459818903497883,-3.7857422081352388,14.098548786527978,18.566912229335955,7.393579678758356,-12.890760028149195,-14.120176429075507,0.13789141713202824,24.362673267326734,-29.245256175442353,2.2745797648289487,19.99829490827037
2019-05-15,2.550591352362299,-4.029390347163423,11.923854856180043,18.67323611276973,6.3909948547836315,-13.617494795281056,-14.408592540464461,0.30434117238267255,22.55984
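The subtraction described for this file can be sketched as follows. The asset names, return values, and T-bill rates below are made up for illustration, and the rate is used exactly as given in percent, which is an assumption about how the author treats `DTB3`. The T-bill series deliberately lacks one date to show why the two sources need aligning first.

```python
import pandas as pd

# Hypothetical 90-day returns (percent) for two made-up assets
dates = pd.to_datetime(["2019-05-10", "2019-05-13", "2019-05-14"])
ret_90d = pd.DataFrame(
    {"AssetA": [8.0, 7.5, 6.0], "AssetB": [-2.0, -1.0, 0.5]},
    index=dates,
)

# Hypothetical T-bill rates (percent); 2019-05-13 is missing on purpose
tbill = pd.Series(
    [2.4, 2.5],
    index=pd.to_datetime(["2019-05-10", "2019-05-14"]),
    name="DTB3",
)

# Keep only the dates present in both sources, then subtract row by row
common = ret_90d.index.intersection(tbill.index)
rar = ret_90d.loc[common].sub(tbill.loc[common], axis=0).add_prefix("RAR_")

print(rar)
```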
sachinsdate / system_of_regression_equations.py
Created Nov 20, 2022
A tutorial on how to solve a system of regression equations. The assumption is that the residuals from individual regression models are correlated across different models but only for the same time period. The data set is in the file 90day_RAR_on_assets.csv
View system_of_regression_equations.py
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from patsy import dmatrices
from matplotlib import pyplot as plt
import numpy as np
asset_names = ['Chevron', 'Halliburton', 'Alcoa', 'Nucor', 'Ford', 'Tesla', 'Google', 'Microsoft']
#M = number of equations
M = len(asset_names)
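The stated assumption, residuals correlated across equations within the same time period, is the setting usually called seemingly unrelated regressions. A minimal NumPy sketch of the first step, estimating the cross-equation residual covariance from equation-by-equation OLS, might look like this. It uses synthetic data for two equations and a single hypothetical regressor, not the author's code or the RAR file.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 500                                   # number of time periods
asset_names = ["Chevron", "Halliburton"]  # two of the eight, for brevity
M = len(asset_names)                      # number of equations

# Hypothetical regressor shared by both equations, plus an intercept column
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])

# Errors are correlated across equations within a period, independent across periods
true_cov = np.array([[1.0, 0.8], [0.8, 1.0]])
errors = rng.multivariate_normal(np.zeros(M), true_cov, size=T)
Y = np.column_stack([1.0 + 2.0 * x, -0.5 + 1.5 * x]) + errors

# Equation-by-equation OLS: column j of beta holds the coefficients of equation j
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ beta

# M x M estimate of the contemporaneous residual covariance
sigma_hat = resid.T @ resid / T
print(sigma_hat)
```

A large off-diagonal entry in `sigma_hat` is what motivates estimating the equations jointly rather than one at a time.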
sachinsdate / markov_switching_dynamic_regression.py
Created Nov 13, 2021
A tutorial on Markov Switching Dynamic Regression Model using Python and statsmodels
View markov_switching_dynamic_regression.py
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import statsmodels.api as sm
#Load the PCE and UMCSENT datasets
df = pd.read_csv(filepath_or_buffer='UMCSENT_PCE.csv', header=0, index_col=0,
                 infer_datetime_format=True, parse_dates=['DATE'])
#Set the index frequency to 'Month-Start'
df = df.asfreq('MS')
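To see what the `asfreq('MS')` call does, here is a small sketch on a made-up monthly series with one month missing. The column name `value` and the dates are illustrative only.

```python
import pandas as pd

# Made-up month-start observations; March 2020 is missing on purpose
idx = pd.to_datetime(["2020-01-01", "2020-02-01", "2020-04-01"])
df = pd.DataFrame({"value": [1.0, 2.0, 4.0]}, index=idx)

# An index built from parsed dates carries no frequency information
freq_before = df.index.freq

# asfreq reindexes onto a regular month-start grid and records the frequency;
# months absent from the data appear as NaN rows
df_ms = df.asfreq("MS")

print(freq_before, df_ms.index.freqstr)
print(df_ms)
```

Time series models in statsmodels use the index frequency when it is available, so setting it explicitly avoids leaving it to be inferred.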
sachinsdate / poisson_regression.py
Last active Nov 13, 2022
Poisson Regression model
View poisson_regression.py
import pandas as pd
from patsy import dmatrices
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
#Create a pandas DataFrame for the counts data set.
df = pd.read_csv('nyc_bb_bicyclist_counts.csv', header=0, infer_datetime_format=True, parse_dates=[0], index_col=[0])
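For readers who want to see what a Poisson regression fit computes, below is a library-free sketch using Newton's method on synthetic counts. It is not the author's statsmodels code; the single regressor, the coefficients, and the sample size are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Synthetic design matrix: intercept plus one hypothetical regressor
x = rng.uniform(0, 1, size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([1.0, 0.5])

# Counts drawn from a Poisson distribution with a log-linear mean
y = rng.poisson(np.exp(X @ true_beta))

# Newton's method on the Poisson log-likelihood, started at the intercept-only fit
beta = np.array([np.log(y.mean()), 0.0])
for _ in range(25):
    mu = np.exp(X @ beta)              # fitted means under the current beta
    grad = X.T @ (y - mu)              # score vector
    hess = X.T @ (X * mu[:, None])     # negative Hessian of the log-likelihood
    step = np.linalg.solve(hess, grad)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break

print(beta)
```

The fitted coefficients act on the log of the expected count, so exponentiating a coefficient gives a multiplicative effect on the mean.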
sachinsdate / gls.py
Created Oct 30, 2022
A tutorial on fitting a linear model using the Generalized Least Squares estimator
View gls.py
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from patsy import dmatrices
from matplotlib import pyplot as plt
import numpy as np
#Load the US Census Bureau data into a Dataframe
df = pd.read_csv('us_census_bureau_acs_2015_2019_subset.csv', header=0)
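As a rough illustration of the Generalized Least Squares estimator itself, the sketch below applies the closed form to synthetic data whose error variance is assumed known. The data, the variance pattern, and the variable names are hypothetical and unrelated to the census file.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200

# Synthetic regressor and design matrix with an intercept
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), x])

# Assumed error structure: standard deviation grows with x (heteroskedastic)
sigma = 0.5 * x
y = X @ np.array([2.0, 3.0]) + rng.normal(0, sigma)

# GLS closed form: solve (X' Omega^-1 X) beta = X' Omega^-1 y
omega_inv = np.diag(1.0 / sigma**2)
beta_gls = np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)

# Same estimate via OLS on data rescaled by the error standard deviation
Xw = X / sigma[:, None]
yw = y / sigma
beta_wls, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

print(beta_gls, beta_wls)
```

In practice the error covariance is unknown and has to be estimated first, which is where a feasible GLS procedure comes in.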
sachinsdate / us_census_bureau_acs_2015_2019_subset.csv
Last active Oct 27, 2022
A subset of the 2015–2019 American Community Survey (ACS) 5-Year Estimates conducted by the US Census Bureau used under the following terms of use: https://www.census.gov/data/developers/about/terms-of-service.html
View us_census_bureau_acs_2015_2019_subset.csv
County,Percent_Households_Below_Poverty_Level,Median_Age,Homeowner_Vacancy_Rate,Percent_Pop_25_And_Over_With_College_Or_Higher_Educ
"Autauga, Alabama",14.7,38.2,1.4,26.6
"Baldwin, Alabama",10.5,43,3.3,31.9
"Barbour, Alabama",27.5,40.4,3.8,11.6
"Bibb, Alabama",18.4,40.9,1.5,10.4
"Blount, Alabama",14.2,40.7,0.7,13.1
"Bullock, Alabama",28.2,40.2,0.2,12.1
"Butler, Alabama",20.5,40.8,3.7,16.1
"Calhoun, Alabama",18,39.6,2.1,18.5
"Chambers, Alabama",18.1,42,2.7,13.3
sachinsdate / stanford_heart_transplant_dataset_full.csv
Created Nov 20, 2020
The Stanford heart transplant data set is taken from https://statistics.stanford.edu/research/covariance-analysis-heart-transplant-survival-data and available for personal use/research purposes only.
View stanford_heart_transplant_dataset_full.csv
PATIENT_ID,YR_OF_ACCEPTANCE,AGE,SURVIVAL_STATUS,SURVIVAL_TIME,PRIOR_SURGERY,TRANSPLANT_STATUS,WAITING_TIME_FOR_TRANSPLANT,MISMATCH_ON_ALLELES,MISMATCH_ON_ANTIGEN,MISMATCH_SCORE
15,68,53,1,1,0,0,,,,
43,70,43,1,2,0,0,,,,
61,71,52,1,2,0,0,,,,
75,72,52,1,2,0,0,,,,
6,68,54,1,3,0,0,,,,
42,70,36,1,3,0,0,,,,
54,71,47,1,3,0,0,,,,
38,70,41,1,5,0,1,5,3,0,0.87
85,73,47,1,5,0,0,,,,
sachinsdate / white_hc_matrix.py
Created Sep 25, 2022
A comparison of heteroskedasticity consistent estimators
View white_hc_matrix.py
import pandas as pd
import statsmodels.formula.api as smf
from patsy import dmatrices
from matplotlib import pyplot as plt
#Load the US Census Bureau data into a Dataframe
df = pd.read_csv('us_census_bureau_acs_2015_2019_subset.csv', header=0)
#Construct the model's equation in Patsy syntax. Statsmodels will automatically add the intercept and so we don't explicitly specify it in the model's equation
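To make the comparison concrete, the following NumPy sketch computes the classical OLS covariance alongside two heteroskedasticity-consistent variants, HC0 (White's estimator) and HC1 (HC0 with a degrees-of-freedom correction). The data are synthetic with an assumed variance pattern; this is not the author's script.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 300, 2  # observations, number of coefficients

# Synthetic data whose error variance grows with x
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.5 * x)

# OLS fit and residuals
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Classical covariance: assumes one common error variance
cov_classical = XtX_inv * (resid @ resid) / (n - k)

# HC0 sandwich: (X'X)^-1 [X' diag(e_i^2) X] (X'X)^-1
meat = X.T @ (X * resid[:, None] ** 2)
cov_hc0 = XtX_inv @ meat @ XtX_inv

# HC1: HC0 scaled by n / (n - k)
cov_hc1 = cov_hc0 * n / (n - k)

print(np.sqrt(np.diag(cov_classical)))
print(np.sqrt(np.diag(cov_hc0)))
print(np.sqrt(np.diag(cov_hc1)))
```

The square roots of the diagonals are the standard errors each estimator would report for the same coefficients.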
View proxy_variables.py
import pandas as pd
import statsmodels.formula.api as smf
#Load the US Census Bureau data into a Dataframe
df = pd.read_csv('us_census_bureau_acs_2015_2019_subset.csv', header=0)
#Construct the model's equation in Patsy syntax. Statsmodels will automatically add the intercept and so we don't explicitly specify it in the model's equation
reg_expr = 'Percent_Households_Below_Poverty_Level ~ Median_Age + Homeowner_Vacancy_Rate + Percent_Pop_25_And_Over_With_College_Or_Higher_Educ'
#Build and train the model and print the training summary
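The remark about the automatically added intercept can be illustrated without statsmodels. The sketch below uses only the nine Alabama counties shown in the census subset preview on this page and builds the design matrix by hand, with an explicit column of ones standing in for the intercept that the formula interface adds. Nine rows are far too few for a meaningful fit, so treat the numbers as a mechanical demonstration only; it is not the author's model-building code.

```python
import numpy as np
import pandas as pd

# The nine rows from the census subset preview
df = pd.DataFrame({
    "Percent_Households_Below_Poverty_Level": [14.7, 10.5, 27.5, 18.4, 14.2, 28.2, 20.5, 18.0, 18.1],
    "Median_Age": [38.2, 43.0, 40.4, 40.9, 40.7, 40.2, 40.8, 39.6, 42.0],
    "Homeowner_Vacancy_Rate": [1.4, 3.3, 3.8, 1.5, 0.7, 0.2, 3.7, 2.1, 2.7],
    "Percent_Pop_25_And_Over_With_College_Or_Higher_Educ": [26.6, 31.9, 11.6, 10.4, 13.1, 12.1, 16.1, 18.5, 13.3],
})

regressors = [
    "Median_Age",
    "Homeowner_Vacancy_Rate",
    "Percent_Pop_25_And_Over_With_College_Or_Higher_Educ",
]
y = df["Percent_Households_Below_Poverty_Level"].to_numpy()

# Explicit column of ones plays the role of the implicit intercept in the formula
X = np.column_stack([np.ones(len(df)), df[regressors].to_numpy()])

# Ordinary least squares via a least-squares solve
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

print(dict(zip(["Intercept"] + regressors, beta)))
```

Because the design matrix includes an intercept column, the residuals sum to zero up to floating-point error.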