Up to my ears in regression modeling

sachinsdate

@sachinsdate
sachinsdate / automobile_uciml_6vars.csv
Created March 30, 2022 09:50
A 6-variable subset of the Automobiles data set. Source: UCI ML repository: https://archive.ics.uci.edu/ml/datasets/automobile
City_MPG Curb_Weight Vehicle_Volume Num_Cyclinders Vehicle_Price Engine_Size
21 2548 528019.904 4 13495 130
21 2548 528019.904 4 16500 130
19 2823 587592.64 6 16500 152
24 2337 634816.956 4 13950 109
18 2824 636734.832 5 17450 136
19 2507 624189.969 5 15250 136
19 2844 766364.046 5 17710 136
19 2954 766364.046 5 18920 136
17 3086 769115.802 5 23875 131
@sachinsdate
sachinsdate / automobile_uciml_4vars.csv
Created March 24, 2022 13:33
A 4-variable subset of the Automobiles data set. Source: UCI ML repository: https://archive.ics.uci.edu/ml/datasets/automobile
Car_Volume Curb_Weight Engine_Size City_MPG
528019.904 2548 130 21
528019.904 2548 130 21
587592.64 2823 152 19
634816.956 2337 109 24
636734.832 2824 136 18
624189.969 2507 136 19
766364.046 2844 136 19
766364.046 2954 136 19
769115.802 3086 131 17
@sachinsdate
sachinsdate / automobile_uciml_4vars.csv
Created March 20, 2022 17:04
A 4-variable subset of the Automobiles data set. Source: UCI ML repository: https://archive.ics.uci.edu/ml/datasets/automobile
528019.904 2548 130 21
528019.904 2548 130 21
587592.64 2823 152 19
634816.956 2337 109 24
636734.832 2824 136 18
624189.969 2507 136 19
766364.046 2844 136 19
766364.046 2954 136 19
769115.802 3086 131 17
@sachinsdate
sachinsdate / automobile_uciml_3vars.csv
Created March 18, 2022 14:02
A subset of the automobiles data set containing only 3 variables of interest. The original data is available at https://archive.ics.uci.edu/ml/datasets/automobile
Make Curb_Weight Engine_Size City_MPG
alfa-romero 2548 130 21
alfa-romero 2548 130 21
alfa-romero 2823 152 19
audi 2337 109 24
audi 2824 136 18
audi 2507 136 19
audi 2844 136 19
audi 2954 136 19
audi 3086 131 17
import pandas as pd
from patsy import dmatrices
import numpy as np
import scipy.stats
import statsmodels.api as sm
import matplotlib.pyplot as plt
# Read the automobiles dataset into a Pandas DataFrame
df = pd.read_csv('automobile_uciml_3vars.csv', header=0)
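As a rough illustration of how the three variables relate, the sketch below parses the nine preview rows shown above and computes the Pearson correlation of each numeric column with City_MPG. It is not part of the gist: it uses only the standard library, the helper name `pearson` is made up for this example, and it assumes the rows are whitespace-separated as rendered on this page (the actual CSV file may well use commas).

```python
import math

# The nine preview rows shown on this page, whitespace-separated as rendered.
# Assumption: the real CSV may use a different delimiter.
PREVIEW = """\
Make Curb_Weight Engine_Size City_MPG
alfa-romero 2548 130 21
alfa-romero 2548 130 21
alfa-romero 2823 152 19
audi 2337 109 24
audi 2824 136 18
audi 2507 136 19
audi 2844 136 19
audi 2954 136 19
audi 3086 131 17
"""


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Split the header from the data rows and build one dict per row.
header, *lines = PREVIEW.strip().splitlines()
names = header.split()
records = [dict(zip(names, line.split())) for line in lines]

mpg = [float(r["City_MPG"]) for r in records]
for col in ("Curb_Weight", "Engine_Size"):
    values = [float(r[col]) for r in records]
    print(col, "vs City_MPG:", round(pearson(values, mpg), 3))
```

With only nine rows the coefficients are indicative at best; the full data set would be needed for any real conclusion.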
@sachinsdate
sachinsdate / random_effects_regression_model.py
Last active March 22, 2022 12:46
The Random Effects regression model is used to estimate the effect of characteristics of individuals that are intrinsic and unmeasurable.
import math
import pandas as pd
import scipy.stats as st
import statsmodels.api as sm
import statsmodels.formula.api as smf
from matplotlib import pyplot as plt
import seaborn as sns
colors_master = ['blue', 'red', 'orange', 'lime', 'yellow', 'cyan', 'violet', 'yellow',
'sandybrown', 'silver']
@sachinsdate
sachinsdate / fixed_effects_regression_model.py
Created February 2, 2022 06:10
The Fixed Effects regression model is used to estimate the effect of intrinsic characteristics of individuals in a panel data set. This gist builds and trains an FE model on the World Bank dataset available at https://gist.github.com/sachinsdate/c40651e9e4bc13a696780462209f1992.
import pandas as pd
import scipy.stats as st
import statsmodels.api as sm
import statsmodels.formula.api as smf
from matplotlib import pyplot as plt
import seaborn as sns
colors_master = ['blue', 'red', 'orange', 'lime', 'yellow', 'cyan', 'violet', 'yellow',
'sandybrown', 'silver']
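One common way to estimate a fixed effects model is the within transformation: subtract each unit's own mean from its observations, then run OLS on the demeaned data, so that any time-invariant unit characteristic cancels out. The sketch below shows that idea using the standard library only. Because only Belgium's rows of the World Bank panel appear on this page, it borrows the nine automobile rows listed above and treats Make as the unit, purely for illustration. This is not the gist's code (which uses statsmodels), and `within_estimator` is a hypothetical name.

```python
from collections import defaultdict

# (unit, x, y) = (Make, Curb_Weight, City_MPG), copied from the automobile
# preview rows on this page. Assumption: Make stands in for the panel unit.
ROWS = [
    ("alfa-romero", 2548, 21),
    ("alfa-romero", 2548, 21),
    ("alfa-romero", 2823, 19),
    ("audi", 2337, 24),
    ("audi", 2824, 18),
    ("audi", 2507, 19),
    ("audi", 2844, 19),
    ("audi", 2954, 19),
    ("audi", 3086, 17),
]


def within_estimator(rows):
    """Return (slope, unit_effects) from the within (demeaning) transformation."""
    groups = defaultdict(list)
    for unit, x, y in rows:
        groups[unit].append((x, y))

    sxy = sxx = 0.0
    means = {}
    for unit, obs in groups.items():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        means[unit] = (x_bar, y_bar)
        # Accumulate cross-products of the unit-demeaned values.
        for x, y in obs:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2

    slope = sxy / sxx
    # Each unit's effect is whatever intercept makes the line pass through its means.
    effects = {unit: y_bar - slope * x_bar for unit, (x_bar, y_bar) in means.items()}
    return slope, effects


slope, effects = within_estimator(ROWS)
print("within slope:", slope)
for unit, effect in effects.items():
    print(unit, "effect:", effect)
```

The slope is shared by all units while each unit gets its own intercept, which is the defining feature of the FE specification.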
@sachinsdate
sachinsdate / pooled_ols_regression_model.py
Last active December 15, 2023 13:52
A Pooled OLS regression model for panel data sets using Python and statsmodels, along with a detailed analysis of its goodness of fit.
import pandas as pd
import scipy.stats as st
import statsmodels.api as sm
import statsmodels.graphics.tsaplots as tsap
from statsmodels.compat import lzip
from statsmodels.stats.diagnostic import het_white
from matplotlib import pyplot as plt
import seaborn as sns
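Pooled OLS stacks every unit-year row of the panel into one sample and fits a single intercept and slope, ignoring which unit each row came from. A minimal dependency-free sketch follows, using the nine Belgium rows of the World Bank panel listed on this page. The function name `fit_simple_ols` is hypothetical, the gist itself relies on statsmodels, and with only one country available here the pooling is trivial.

```python
# (GCF_GWTH_PCNT, GDP_PCAP_GWTH_PCNT) for Belgium, 1992 to 2000,
# copied from the panel data preview on this page.
BELGIUM = [
    (1.829137475, 1.11956586),
    (-2.956525218, -1.34799971),
    (3.764435394, 2.909318769),
    (4.113740593, 2.170550274),
    (0.415438625, 1.123669018),
    (7.67936209, 3.542789064),
    (1.535928255, 1.744323895),
    (3.811360631, 3.305706514),
    (7.189729001, 3.465452571),
]


def fit_simple_ols(pairs):
    """Closed-form OLS of y on one regressor x plus an intercept (hypothetical helper)."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in pairs]
    # R-squared: share of the variance in y explained by the fitted line.
    sst = sum((y - y_bar) ** 2 for _, y in pairs)
    sse = sum(e * e for e in residuals)
    r_squared = 1.0 - sse / sst
    return intercept, slope, r_squared, residuals


intercept, slope, r_squared, residuals = fit_simple_ols(BELGIUM)
print(f"intercept={intercept:.4f} slope={slope:.4f} R^2={r_squared:.4f}")
```

The returned residuals are the starting point for goodness-of-fit checks such as tests for heteroskedasticity and autocorrelation.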
@sachinsdate
sachinsdate / wb_data_panel_2ind_7units_1992_2014.csv
Created January 23, 2022 11:22
The following panel data set contains the year-over-year per capita GDP percentage growth of seven countries, measured from 1992 through 2014. Along with GDP growth data, the panel also contains Y-o-Y percentage growth in Gross Capital Formation in each country. (Source: World Development Indicators, World Bank, under CC BY 4.0 license.)
COUNTRY YEAR GCF_GWTH_PCNT GDP_PCAP_GWTH_PCNT
Belgium 1992 1.829137475 1.11956586
Belgium 1993 -2.956525218 -1.34799971
Belgium 1994 3.764435394 2.909318769
Belgium 1995 4.113740593 2.170550274
Belgium 1996 0.415438625 1.123669018
Belgium 1997 7.67936209 3.542789064
Belgium 1998 1.535928255 1.744323895
Belgium 1999 3.811360631 3.305706514
Belgium 2000 7.189729001 3.465452571
@sachinsdate
sachinsdate / poisson_hidden_markov_model.py
Created November 26, 2021 14:23
A Python-based implementation of the Poisson Hidden Markov Model and a tutorial on how to build and train it on the US manufacturing strikes data set.
import math
import numpy as np
import statsmodels.api as sm
from statsmodels.base.model import GenericLikelihoodModel
from scipy.stats import poisson
from patsy import dmatrices
import statsmodels.graphics.tsaplots as tsa
from matplotlib import pyplot as plt
from statsmodels.tools.numdiff import approx_hess1, approx_hess2, approx_hess3
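The quantity a Poisson HMM is trained to maximize is the likelihood of the observed counts, which can be computed with the scaled forward algorithm. The sketch below is a simplified, standard-library illustration of that step, not the gist's implementation: it assumes constant per-state Poisson rates (no regression covariates), the function names are hypothetical, and the counts are made-up placeholder values rather than the strikes data.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing count k given rate lam."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_hmm_loglik(counts, rates, trans, init):
    """Log-likelihood of counts under a Poisson HMM via the scaled forward algorithm.

    rates[j]    : Poisson rate of hidden state j
    trans[i][j] : probability of moving from state i to state j
    init[j]     : probability of starting in state j
    """
    n_states = len(rates)
    alpha = [init[j] * poisson_pmf(counts[0], rates[j]) for j in range(n_states)]
    loglik = 0.0
    for t in range(len(counts)):
        if t > 0:
            # Propagate one step through the Markov chain, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states))
                * poisson_pmf(counts[t], rates[j])
                for j in range(n_states)
            ]
        # Rescale to avoid underflow; the logs of the scale factors sum to the log-likelihood.
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


# Hypothetical inputs for illustration only (NOT the strikes data set).
counts = [3, 5, 4, 12, 10, 11, 2]
rates = [3.0, 11.0]                 # assumed low-count and high-count regimes
trans = [[0.9, 0.1], [0.2, 0.8]]    # assumed transition probabilities
init = [0.5, 0.5]                   # assumed initial state distribution

loglik = poisson_hmm_loglik(counts, rates, trans, init)
print("log-likelihood:", loglik)
```

Training would wrap a function like this in an optimizer that searches over the rates and transition probabilities; the gist's imports suggest it does so through statsmodels' `GenericLikelihoodModel`.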