@sachinsdate
sachinsdate / 90day_RAR_on_assets.csv
Created December 12, 2022 09:57
Trailing 90-day returns of 8 stocks on the NYSE and NASDAQ, and of the corresponding sector indexes. Source: Yahoo Finance, used under its terms of use: https://legal.yahoo.com/us/en/yahoo/terms/otos/index.html
Date,RAR_Energy,RAR_Metals,RAR_Auto,RAR_Technology,RAR_Chevron,RAR_Halliburton,RAR_Alcoa,RAR_Nucor,RAR_USSteel,RAR_Ford,RAR_Tesla,RAR_Google,RAR_Microsoft
2019-05-10,6.5961545570058915,-0.382100259291267,16.980423096067693,20.144324542716525,7.838687140506143,-9.476220040520879,-6.9431669207317,6.42141943672513,-17.767082658022694,29.022405063291146,-25.13538238802106,8.952849356982366,23.35190786030733
2019-05-13,6.077872310603407,-0.5278551837630419,16.595338809034903,21.905609130918425,8.573040434742577,-11.501168785151826,-8.83865472560975,4.248187263317492,-22.706320346320346,27.202982005141386,-26.78069516580104,9.053695816906558,24.28270581842493
2019-05-14,3.459818903497883,-3.7857422081352388,14.098548786527978,18.566912229335955,7.393579678758356,-12.890760028149195,-14.120176429075507,0.13789141713202824,-28.08288102261554,24.362673267326734,-29.245256175442353,2.2745797648289487,19.99829490827037
2019-05-15,2.550591352362299,-4.029390347163423,11.923854856180043,18.67323611276973,6.390994854783631
@sachinsdate
sachinsdate / poisson_regression.py
Last active April 12, 2023 14:20
Poisson Regression model
import pandas as pd
from patsy import dmatrices
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
# Create a pandas DataFrame for the counts data set.
# (infer_datetime_format is deprecated in pandas 2.x; parse_dates alone is enough.)
df = pd.read_csv('nyc_bb_bicyclist_counts.csv', header=0, parse_dates=[0], index_col=[0])
import math
import numpy as np
import statsmodels.api as sm
from statsmodels.base.model import GenericLikelihoodModel
from scipy.stats import poisson
from scipy.stats import binom
from patsy import dmatrices
import statsmodels.graphics.tsaplots as tsa
from matplotlib import pyplot as plt
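To show what fitting a Poisson regression involves, here is a minimal sketch in plain NumPy. It is not the gist's code, which relies on statsmodels and patsy. The helper name `fit_poisson`, the synthetic data, and the choice of Newton-Raphson iterations are all assumptions made for illustration.

```python
import numpy as np

def fit_poisson(X, y, n_iter=25):
    """Maximum-likelihood Poisson regression with a log link (hypothetical helper).

    Assumes the first column of X is a column of ones (the intercept).
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())          # start from the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)           # fitted mean counts
        score = X.T @ (y - mu)          # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])  # Fisher information (negative Hessian)
        beta = beta + np.linalg.solve(info, score)
    return beta

# Synthetic counts, generated only for this example
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=500)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(1.0 + 0.5 * x))

beta_hat = fit_poisson(X, y)
mu_hat = np.exp(X @ beta_hat)
```

At the maximum-likelihood estimate the score is zero, so with an intercept in the model the fitted means sum to the observed total count.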
@sachinsdate
sachinsdate / UMCSENT_PCE.csv
Last active February 26, 2023 18:13
Personal Consumption Expenditures (PCE) combined with the Index of Consumer Sentiment (UMCSENT). PCE source: https://fred.stlouisfed.org/series/PCE (available under public license). Index of Consumer Sentiment source: https://medium.com/r/?url=https%3A%2F%2Fdata.sca.isr.umich.edu%2Fdata-archive%2Fmine.php (available under public license).
DATE UMCSENT UMCSENT_CHG PCE PCE_CHG
01-01-78 83.7 0 1336 0
02-01-78 84.3 0.007168459 1329.5 -0.004865269
03-01-78 78.8 -0.065243179 1355.1 0.019255359
04-01-78 81.6 0.035532995 1377.5 0.016530145
05-01-78 82.9 0.015931373 1396.4 0.013720508
06-01-78 80 -0.034981906 1412 0.011171584
07-01-78 82.4 0.03 1425.8 0.009773371
08-01-78 78.4 -0.048543689 1426.8 0.000701361
09-01-78 80.4 0.025510204 1447 0.014157555
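The `_CHG` columns in the rows above are consistent with a period-over-period fractional change, with the first row set to 0. A minimal pandas sketch that reproduces them from the first few rows follows; the MM-DD-YY date format and the use of `pct_change` are assumptions, not something the gist states.

```python
import io
import pandas as pd

# First rows of the preview, levels only (the change columns are recomputed below)
raw = """DATE UMCSENT PCE
01-01-78 83.7 1336
02-01-78 84.3 1329.5
03-01-78 78.8 1355.1
04-01-78 81.6 1377.5
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Assumed date layout: month-day-two-digit-year
df["DATE"] = pd.to_datetime(df["DATE"], format="%m-%d-%y")

# Fractional change from the previous row; the first row has no predecessor, so use 0
for col in ["UMCSENT", "PCE"]:
    df[col + "_CHG"] = df[col].pct_change().fillna(0)
```

For example, the second UMCSENT_CHG value is (84.3 - 83.7) / 83.7, which matches the 0.007168459 shown in the data.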
@sachinsdate
sachinsdate / f_test.py
Created October 26, 2019 18:14
F-test for regression analysis. An illustrative example
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Create a pandas DataFrame for the DJIA data set.
# (infer_datetime_format is deprecated in pandas 2.x; parse_dates alone is enough.)
df = pd.read_csv('djia.csv', header=0, parse_dates=[0], index_col=[0])
################################
######## THE MEAN MODEL ########
################################
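The idea behind an F-test that compares an intercept-only (mean) model with a richer regression can be sketched as follows. This is a hedged illustration in plain NumPy, not the gist's code: the helper names `rss` and `f_statistic` and the synthetic series are assumptions, and the competing model here simply adds one regressor.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def f_statistic(X_restricted, X_full, y):
    """F statistic for nested OLS models (X_restricted's columns are a subset of X_full's)."""
    n = len(y)
    k1, k2 = X_restricted.shape[1], X_full.shape[1]
    rss1, rss2 = rss(X_restricted, y), rss(X_full, y)
    return ((rss1 - rss2) / (k2 - k1)) / (rss2 / (n - k2))

# Synthetic series: a linear trend plus an alternating disturbance (illustration only)
t = np.arange(10, dtype=float)
y = 2.0 * t + np.array([1, -1, 1, -1, 1, -1, 1, -1, 1, -1], dtype=float)

X_mean = np.ones((10, 1))                   # mean model: intercept only
X_full = np.column_stack([np.ones(10), t])  # intercept plus one regressor

F = f_statistic(X_mean, X_full, y)
```

Under the null hypothesis that the extra regressors add nothing, this statistic follows an F distribution with (k2 - k1, n - k2) degrees of freedom, so a large value favors the richer model.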
@sachinsdate
sachinsdate / system_of_regression_equations.py
Created December 14, 2022 14:03
A tutorial on solving a linear system of regression equations using Generalized Least Squares
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from patsy import dmatrices
from matplotlib import pyplot as plt
import numpy as np
import seaborn as sb
#Create a list of the assets whose capital asset pricing models will make up the
@sachinsdate
sachinsdate / risk_adjusted_return_ds.py
Created November 20, 2022 11:34
A Python script to calculate the 90-day risk adjusted return on assets
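import pandas as pd

# Asset prices indexed by date
df_asset_prices = pd.read_csv('asset_prices.csv', header=0, parse_dates=['Date'], index_col=0)
# Prices from 89 rows earlier, aligned to the current date; rows left with NaN are dropped
df_asset_prices_shifted89 = df_asset_prices.shift(89).dropna()
# Skip the first 89 rows, which have no price 89 rows back
df_asset_prices_trunc89 = df_asset_prices[89:]
# Trailing 90-day return, in percent
df_asset_prices_90day_return = (df_asset_prices_trunc89-df_asset_prices_shifted89)/df_asset_prices_shifted89*100
# 90-day T-bill rate; rows with missing values are dropped
df_DTB3 = pd.read_csv('DTB3.csv', header=0, parse_dates=['DATE'], index_col=0)
df_DTB3 = df_DTB3.dropna()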
@sachinsdate
sachinsdate / 90day_RAR_on_assets.csv
Created November 20, 2022 11:31
90-day risk-adjusted return (RAR) of 8 stocks on the NYSE, along with the RAR of the corresponding S&P industry or sector index. The RAR is calculated by subtracting the rate on the 90-day T-bill from the 90-day return on the corresponding asset.
Date,RAR_Energy,RAR_Metals,RAR_Auto,RAR_Technology,RAR_Chevron,RAR_Halliburton,RAR_Alcoa,RAR_Nucor,RAR_Ford,RAR_Tesla,RAR_Google,RAR_Microsoft
2019-05-10,6.5961545570058915,-0.382100259291267,16.980423096067693,20.144324542716525,7.838687140506143,-9.476220040520879,-6.9431669207317,6.421419436725146,29.022405063291146,-25.13538238802106,8.952849356982366,23.35190786030733
2019-05-13,6.077872310603407,-0.5278551837630419,16.595338809034903,21.905609130918425,8.573040434742577,-11.501168785151815,-8.83865472560975,4.248187263317492,27.202982005141386,-26.78069516580104,9.053695816906558,24.28270581842493
2019-05-14,3.459818903497883,-3.7857422081352388,14.098548786527978,18.566912229335955,7.393579678758356,-12.890760028149195,-14.120176429075507,0.13789141713202824,24.362673267326734,-29.245256175442353,2.2745797648289487,19.99829490827037
2019-05-15,2.550591352362299,-4.029390347163423,11.923854856180043,18.67323611276973,6.3909948547836315,-13.617494795281056,-14.408592540464461,0.30434117238267255,22.55984
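The RAR calculation described for this data set (trailing return minus the T-bill rate on the same date) can be sketched in pandas as below. The function name `risk_adjusted_return`, its `window` parameter, and the sample prices and rates are hypothetical. The sketch also assumes the T-bill rate is already expressed in the same percent units as the return.

```python
import pandas as pd

def risk_adjusted_return(prices, tbill_rate, window=90):
    """Trailing-window percentage return minus the T-bill rate on the same date."""
    # Price (window - 1) rows earlier, aligned to the current row's date
    base = prices.shift(window - 1)
    trailing_return = (prices - base) / base * 100
    # Subtract the rate row by row; dates without a full window or a rate drop out
    rar = trailing_return.sub(tbill_rate, axis=0).dropna()
    return rar.add_prefix('RAR_')

# Made-up prices and rates; a 3-row window keeps the example short
dates = pd.date_range('2022-01-03', periods=5, freq='B')
prices = pd.DataFrame({'Chevron': [100.0, 102.0, 101.0, 105.0, 110.0]}, index=dates)
tbill = pd.Series([0.5, 0.5, 0.6, 0.6, 0.7], index=dates)

rar = risk_adjusted_return(prices, tbill, window=3)
```

With `window=3`, the first usable row compares the third price with the first: (101 - 100) / 100 * 100 = 1.0, minus the 0.6 rate on that date.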
@sachinsdate
sachinsdate / system_of_regression_equations.py
Created November 20, 2022 11:04
A tutorial on how to solve a system of regression equations. The assumption is that the residuals from individual regression models are correlated across different models but only for the same time period. The data set is in the file 90day_RAR_on_assets.csv
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from patsy import dmatrices
from matplotlib import pyplot as plt
import numpy as np
asset_names = ['Chevron', 'Halliburton', 'Alcoa', 'Nucor', 'Ford', 'Tesla', 'Google', 'Microsoft']
#M = number of equations
M = len(asset_names)
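The description's assumption, that residuals are correlated across equations only within the same time period, suggests a first step of fitting each equation separately and estimating the cross-equation residual covariance. The sketch below does that in plain NumPy on synthetic data. It is an illustration under stated assumptions, not the gist's code: the data-generating process, the variable names, and the use of three equations are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
T, M = 200, 3  # time periods and equations (hypothetical sizes)

# Synthetic system: one regressor per equation, errors sharing a common shock
common = rng.normal(size=T)
X_list, y_list = [], []
for j in range(M):
    x = rng.normal(size=T)
    e = 0.8 * common + 0.6 * rng.normal(size=T)
    X_list.append(np.column_stack([np.ones(T), x]))
    y_list.append(0.5 + 1.2 * x + e)

# Step 1: OLS equation by equation, collecting residuals in a T x M matrix
resid = np.empty((T, M))
for j in range(M):
    beta, *_ = np.linalg.lstsq(X_list[j], y_list[j], rcond=None)
    resid[:, j] = y_list[j] - X_list[j] @ beta

# Step 2: same-period covariance and correlation of residuals across equations
sigma = resid.T @ resid / T
scale = np.sqrt(np.diag(sigma))
corr = sigma / np.outer(scale, scale)
```

Nonzero off-diagonal entries in `sigma` are what a Generalized Least Squares step would exploit when the equations are estimated jointly.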
@sachinsdate
sachinsdate / us_census_bureau_acs_2015_2019_subset.csv
Last active October 27, 2022 10:54
A subset of the 2015–2019 American Community Survey (ACS) 5-Year Estimates conducted by the US Census Bureau, used under the following terms of use: https://www.census.gov/data/developers/about/terms-of-service.html
County,Percent_Households_Below_Poverty_Level,Median_Age,Homeowner_Vacancy_Rate,Percent_Pop_25_And_Over_With_College_Or_Higher_Educ
"Autauga, Alabama",14.7,38.2,1.4,26.6
"Baldwin, Alabama",10.5,43,3.3,31.9
"Barbour, Alabama",27.5,40.4,3.8,11.6
"Bibb, Alabama",18.4,40.9,1.5,10.4
"Blount, Alabama",14.2,40.7,0.7,13.1
"Bullock, Alabama",28.2,40.2,0.2,12.1
"Butler, Alabama",20.5,40.8,3.7,16.1
"Calhoun, Alabama",18,39.6,2.1,18.5
"Chambers, Alabama",18.1,42,2.7,13.3