# sachinsdate

Up to my ears in regression modeling

Created December 14, 2022 14:03
A tutorial on solving a linear system of regression equations using Generalized Least Squares
```python
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from patsy import dmatrices
from matplotlib import pyplot as plt
import numpy as np
import seaborn as sb

# Create a list of the assets whose capital asset pricing models will make up the ...
```
Created December 12, 2022 09:57
Trailing 90-day returns of 8 stocks on the NYSE and NASDAQ and of the corresponding sector indexes. Source: Yahoo Finance, used under its terms of use: https://legal.yahoo.com/us/en/yahoo/terms/otos/index.html
Created November 20, 2022 11:34
A Python script to calculate the 90-day risk-adjusted return on assets
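```python
import pandas as pd

# Daily asset prices, indexed by date
df_asset_prices = pd.read_csv('asset_prices.csv', header=0, parse_dates=['Date'], index_col=0)
# Price 89 rows earlier; dropna() drops the first 89 rows, which have no lagged value
df_asset_prices_shifted89 = df_asset_prices.shift(89).dropna()
# Rows 89 onward, so both frames cover the same dates
df_asset_prices_trunc89 = df_asset_prices[89:]
# Percent return over each 90-row window; pandas aligns the two frames on the date index
df_asset_prices_90day_return = (df_asset_prices_trunc89 - df_asset_prices_shifted89) / df_asset_prices_shifted89 * 100
# 90-day T-bill rate (DTB3), with missing dates dropped
df_DTB3 = pd.read_csv('DTB3.csv', header=0, parse_dates=['DATE'], index_col=0)
df_DTB3 = df_DTB3.dropna()
```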
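The shift-based return computation in the script can be checked on a toy series. A minimal sketch, assuming a single hypothetical asset column with made-up prices 100, 101, ..., 199: because `shift(89)` pairs each row with the row 89 positions earlier, each return spans 90 rows inclusive.

```python
import numpy as np
import pandas as pd

# Hypothetical daily prices for one asset (made-up values, 100 business days)
dates = pd.bdate_range('2022-01-03', periods=100)
df_prices = pd.DataFrame({'ASSET': np.arange(100, 200, dtype=float)}, index=dates)

# Price 89 rows earlier; the first 89 rows have no lagged value and are dropped
df_shifted89 = df_prices.shift(89).dropna()
# Rows 89 onward, i.e. the rows that have a lagged partner
df_trunc89 = df_prices.iloc[89:]

# Percent return over the 90-row window; alignment happens on the date index
df_90day_return = (df_trunc89 - df_shifted89) / df_shifted89 * 100
print(df_90day_return)
```

Here the first return is (189 − 100) / 100 × 100 = 89 percent, and 11 returns remain out of 100 prices.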
Created November 20, 2022 11:31
90-day risk-adjusted return (RAR) of 8 stocks on the NYSE, along with the RAR of the corresponding S&P industry or sector index. The RAR is calculated by subtracting the rate on the 90-day T-bill from the 90-day return on the corresponding asset.
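A minimal sketch of the RAR definition above, assuming both the asset returns and the T-bill rate are in percent and indexed by date, and keeping only dates present in both series. It follows the description literally, subtracting the quoted rate as-is; the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical 90-day returns (%) for two assets, indexed by date
df_returns = pd.DataFrame(
    {'ASSET_A': [5.0, 6.0, 4.0], 'ASSET_B': [-2.0, 1.0, 3.0]},
    index=pd.to_datetime(['2022-11-14', '2022-11-15', '2022-11-16']),
)

# Hypothetical 90-day T-bill rate (%), with one date missing
s_tbill = pd.Series(
    [4.0, 4.1],
    index=pd.to_datetime(['2022-11-14', '2022-11-16']),
    name='DTB3',
)

# Inner join keeps only dates present in both; then subtract the rate row by row
df_joined = df_returns.join(s_tbill, how='inner')
df_rar = df_joined.drop(columns='DTB3').sub(df_joined['DTB3'], axis=0)
print(df_rar)
```

The inner join matters: a date with a return but no T-bill quote is dropped instead of producing a NaN RAR.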
Created November 20, 2022 11:04
A tutorial on how to solve a system of regression equations. The residuals of the individual regression models are assumed to be correlated across models, but only within the same time period. The data set is in the file 90day_RAR_on_assets.csv.
```python
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from patsy import dmatrices
from matplotlib import pyplot as plt
import numpy as np

asset_names = ['Chevron', 'Halliburton', 'Alcoa', 'Nucor', 'Ford', 'Tesla', 'Google', 'Microsoft']
# M = number of equations
M = len(asset_names)
```
Created October 30, 2022 17:02
A tutorial on fitting a linear model using the Generalized Least Squares estimator
```python
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from patsy import dmatrices
from matplotlib import pyplot as plt
import numpy as np

# Load the US Census Bureau data into a DataFrame
df = pd.read_csv('us_census_bureau_acs_2015_2019_subset.csv', header=0)
```
Created September 25, 2022 07:09
A comparison of heteroskedasticity-consistent estimators
```python
import pandas as pd
import statsmodels.formula.api as smf
from patsy import dmatrices
from matplotlib import pyplot as plt

# Load the US Census Bureau data into a DataFrame
df = pd.read_csv('us_census_bureau_acs_2015_2019_subset.csv', header=0)

# Construct the model's equation in Patsy syntax. Statsmodels will automatically add
# the intercept and so we don't explicitly specify it in the model's equation
```
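statsmodels exposes the common heteroskedasticity-consistent covariance estimators through the `cov_type` argument of `fit()` (`'HC0'` to `'HC3'`). A minimal sketch on synthetic heteroskedastic data (the data and coefficients are assumptions, not the census model) that prints the classical and HC standard errors side by side.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 300
x = rng.uniform(0.0, 5.0, size=n)
# Synthetic data: error spread grows with x, so classical OLS standard errors are unreliable
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 + x, size=n)
df_sim = pd.DataFrame({'y': y, 'x': x})

model = smf.ols('y ~ x', data=df_sim)

# Same point estimates each time; only the covariance estimator changes
bse_table = pd.DataFrame({
    cov: model.fit(cov_type=cov).bse
    for cov in ['nonrobust', 'HC0', 'HC1', 'HC2', 'HC3']
})
print(bse_table)
```

HC1 rescales HC0 by n / (n − k), while HC2 and HC3 inflate each squared residual by 1 / (1 − hᵢᵢ) and 1 / (1 − hᵢᵢ)², so the standard errors satisfy HC0 ≤ HC2 ≤ HC3 coefficient by coefficient.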
Created September 10, 2022 06:19
The code used in my article on proxy variables: https://towardsdatascience.com/how-to-use-proxy-variables-in-a-regression-model-539f723ab587
```python
import pandas as pd
import statsmodels.formula.api as smf

# Load the US Census Bureau data into a DataFrame
df = pd.read_csv('us_census_bureau_acs_2015_2019_subset.csv', header=0)

# Construct the model's equation in Patsy syntax. Statsmodels will automatically add
# the intercept and so we don't explicitly specify it in the model's equation
reg_expr = 'Percent_Households_Below_Poverty_Level ~ Median_Age + Homeowner_Vacancy_Rate + Percent_Pop_25_And_Over_With_College_Or_Higher_Educ'

# Build and train the model and print the training summary
```
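The idea behind a proxy variable shows up in a small simulation: when an unobserved variable q affects y and is correlated with x, leaving it out biases the coefficient on x, and adding a noisy but observable proxy for q reduces, though does not remove, that bias. A minimal sketch with synthetic data; the variable names and coefficients are assumptions and are unrelated to the census model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 5000

# Synthetic data (assumed data-generating process)
q = rng.normal(size=n)                             # unobserved factor
x = 0.8 * q + rng.normal(size=n)                   # observed regressor, correlated with q
proxy = q + rng.normal(scale=0.5, size=n)          # noisy but observable stand-in for q
y = 1.0 + 1.0 * x + 2.0 * q + rng.normal(size=n)   # true coefficient on x is 1.0

df_sim = pd.DataFrame({'y': y, 'x': x, 'q': q, 'proxy': proxy})

b_omitted = smf.ols('y ~ x', data=df_sim).fit().params['x']        # q left out
b_proxy = smf.ols('y ~ x + proxy', data=df_sim).fit().params['x']  # proxy stands in for q
b_full = smf.ols('y ~ x + q', data=df_sim).fit().params['x']       # infeasible benchmark
print(b_omitted, b_proxy, b_full)
```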
Last active August 30, 2022 14:30
A tutorial on instrumental variables regression using the IV2SLS class of statsmodels
```python
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
from statsmodels.api import add_constant
from statsmodels.sandbox.regression.gmm import IV2SLS

# Load the Panel Study of Income Dynamics (PSID) into a DataFrame
df = pd.read_csv('PSID1976.csv', header=0)
```
Last active October 27, 2022 10:54
A subset of the 2015–2019 American Community Survey (ACS) 5-Year Estimates conducted by the US Census Bureau, used under the following terms of use: https://www.census.gov/data/developers/about/terms-of-service.html
| County | Percent_Households_Below_Poverty_Level | Median_Age | Homeowner_Vacancy_Rate | Percent_Pop_25_And_Over_With_College_Or_Higher_Educ |
|---|---|---|---|---|
| Autauga, Alabama | 14.7 | 38.2 | 1.4 | 26.6 |
| Baldwin, Alabama | 10.5 | 43 | 3.3 | 31.9 |
| Barbour, Alabama | 27.5 | 40.4 | 3.8 | 11.6 |
| Bibb, Alabama | 18.4 | 40.9 | 1.5 | 10.4 |
| Blount, Alabama | 14.2 | 40.7 | 0.7 | 13.1 |
| Bullock, Alabama | 28.2 | 40.2 | 0.2 | 12.1 |
| Butler, Alabama | 20.5 | 40.8 | 3.7 | 16.1 |
| Calhoun, Alabama | 18 | 39.6 | 2.1 | 18.5 |
| Chambers, Alabama | 18.1 | 42 | 2.7 | 13.3 |
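Each County value contains a comma, so the file parses into five columns only if that field is quoted. A minimal sketch, assuming standard CSV double-quote quoting in the file and using the first three rows of the data shown above; the in-memory string stands in for `us_census_bureau_acs_2015_2019_subset.csv`.

```python
import io

import pandas as pd

# First three rows of the data, written as CSV (assumption: County is double-quoted in the real file)
csv_text = """County,Percent_Households_Below_Poverty_Level,Median_Age,Homeowner_Vacancy_Rate,Percent_Pop_25_And_Over_With_College_Or_Higher_Educ
"Autauga, Alabama",14.7,38.2,1.4,26.6
"Baldwin, Alabama",10.5,43,3.3,31.9
"Barbour, Alabama",27.5,40.4,3.8,11.6
"""

df = pd.read_csv(io.StringIO(csv_text), header=0)
print(df.dtypes)
print(df.head())
```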