sachinsdate

@sachinsdate
sachinsdate / markov_switching_dynamic_regression.py
Created November 13, 2021 17:35
A tutorial on Markov Switching Dynamic Regression Model using Python and statsmodels
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import statsmodels.api as sm
#Load the PCE and UMCSENT datasets
df = pd.read_csv(filepath_or_buffer='UMCSENT_PCE.csv', header=0, index_col=0,
                 parse_dates=['DATE'])
#Set the index frequency to 'Month-Start'
df = df.asfreq('MS')
@sachinsdate
sachinsdate / survival_analysis.py
Created November 20, 2020 18:37
Survival analysis using Python and Lifelines using the Stanford heart transplant dataset
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.statistics import survival_difference_at_fixed_point_in_time_test
from lifelines import CoxPHFitter
from matplotlib import pyplot as plt
#dataset link
#https://statistics.stanford.edu/research/covariance-analysis-heart-transplant-survival-data
#http://www.stat.rice.edu/~sneeley/STAT553/Datasets/survivaldata.txt
@sachinsdate
sachinsdate / pooled_ols_regression_model.py
Last active December 15, 2023 13:52
A Pooled OLS regression model for panel data sets using Python and statsmodels, along with a detailed analysis of its goodness of fit.
import pandas as pd
import scipy.stats as st
import statsmodels.api as sm
import statsmodels.graphics.tsaplots as tsap
from statsmodels.compat import lzip
from statsmodels.stats.diagnostic import het_white
from matplotlib import pyplot as plt
import seaborn as sns
@sachinsdate
sachinsdate / gls.py
Created October 30, 2022 17:02
A tutorial on fitting a linear model using the Generalized Least Squares estimator
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from patsy import dmatrices
from matplotlib import pyplot as plt
import numpy as np
#Load the US Census Bureau data into a Dataframe
df = pd.read_csv('us_census_bureau_acs_2015_2019_subset.csv', header=0)
ID BIDPREM DOC_NUM FINREST INSTHOLD LEGLREST REALREST REGULATN SIZE TAKEOVER WEEKS_INITIAL_FINAL WHITEKNT SIZESQ NUMBIDS
1 1.190497 78001 0 0.136 1 0 0 0.76676 1 23.571 1 0.588 2
2 1.036 78005 0 0.134 0 0 0 0.162503 1 13.571 0 0.0264 0
3 1.403412 78015 0 0.002 1 0 1 0.120489 1 5 1 0.0145 1
4 1.504455 78016 0 0.181 1 0 0 0.0723 1 7.429 0 0.00523 1
5 1.380736 78028 0 0.329 1 0 1 0.189118 1 8.857 0 0.0358 1
6 1.400069 78031 0 0.188 1 0 0 0.154217 1 6.429 1 0.0238 3
7 1.181691 78033 0 0.319 0 0 1 0.460355 1 13.571 1 0.212 2
8 1.32256 78037 0 0.123 0 0 1 0.276814 1 14.857 0 0.0766 1
9 1.650588 78039 0 0.379 0 0 0 0.22895 1 20.714 0 0.0524 1
@sachinsdate
sachinsdate / negative_binomial_regression.py
Last active October 13, 2023 17:20
Negative Binomial Regression using the GLM class of statsmodels
import pandas as pd
from patsy import dmatrices
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
#create a pandas DataFrame for the counts data set
df = pd.read_csv('nyc_bb_bicyclist_counts.csv', header=0, parse_dates=[0], index_col=[0])
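Negative binomial regression is usually preferred over Poisson regression when the counts are overdispersed, that is, when their variance is much larger than their mean. A minimal sketch of that check follows, using only the nine BB_COUNT values shown in the nyc_bb_bicyclist_counts.csv preview on this page. The variable names are illustrative and not taken from the gist, and nine rows are far too few for a real diagnosis.

```python
import statistics

# BB_COUNT values for 1-Apr-17 through 9-Apr-17, copied from the CSV preview
bb_counts = [606, 2021, 2470, 723, 2807, 461, 1222, 1674, 2375]

mean_count = statistics.mean(bb_counts)
# Sample variance (divides by n - 1)
var_count = statistics.variance(bb_counts)

# A Poisson model assumes variance == mean; a ratio far above 1 hints at overdispersion
dispersion_ratio = var_count / mean_count

print(f"mean={mean_count:.1f}, variance={var_count:.1f}, ratio={dispersion_ratio:.1f}")
```

A ratio well above 1 on the full data set would be one reason to try the negative binomial family instead of Poisson.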
import pandas as pd
from patsy import dmatrices
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
#Create a pandas DataFrame for the counts data set.
df = pd.read_csv('nyc_bb_bicyclist_counts.csv', header=0, parse_dates=[0], index_col=[0])
def nloglikeobs(self, params):
    #Reconstitute the q and beta matrices from the current values of all the params
    self.reconstitute_parameter_matrices(params)
    #Build the regime wise matrix of Poisson means
    self.compute_regime_specific_poisson_means()
    #Build the matrix of Markov transition probabilities by standardizing all the q values to
    # the 0 to 1 range
    self.compute_markov_transition_probabilities()
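The body of compute_markov_transition_probabilities() is not shown in this preview. As a rough illustration of what "standardizing the q values to the 0 to 1 range" can look like, the sketch below maps an unconstrained k x k matrix to a row-stochastic transition matrix with a row-wise softmax. The softmax choice, the function name transition_probabilities, and the example q values are all assumptions made for illustration; the gist may standardize differently.

```python
import math

def transition_probabilities(q):
    """Map a k x k matrix of unconstrained values to probabilities, row by row."""
    probs = []
    for row in q:
        # Subtract the row maximum before exponentiating, for numerical stability
        row_max = max(row)
        exps = [math.exp(v - row_max) for v in row]
        total = sum(exps)
        # Each entry now lies in [0, 1] and each row sums to 1
        probs.append([e / total for e in exps])
    return probs

# Hypothetical q matrix for a 2-regime model (illustrative values only)
q = [[2.0, 0.0],
     [0.0, 1.0]]
P = transition_probabilities(q)
for row in P:
    print([round(p, 4) for p in row])
```

Because the optimizer works on the unconstrained q values, a mapping of this kind keeps every row of the transition matrix a valid probability distribution at each iteration.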
@sachinsdate
sachinsdate / stanford_heart_transplant_dataset_full.csv
Created November 20, 2020 19:04
The Stanford heart transplant data set is taken from https://statistics.stanford.edu/research/covariance-analysis-heart-transplant-survival-data and is available for personal use/research purposes only.
PATIENT_ID,YR_OF_ACCEPTANCE,AGE,SURVIVAL_STATUS,SURVIVAL_TIME,PRIOR_SURGERY,TRANSPLANT_STATUS,WAITING_TIME_FOR_TRANSPLANT,MISMATCH_ON_ALLELES,MISMATCH_ON_ANTIGEN,MISMATCH_SCORE
15,68,53,1,1,0,0,,,,
43,70,43,1,2,0,0,,,,
61,71,52,1,2,0,0,,,,
75,72,52,1,2,0,0,,,,
6,68,54,1,3,0,0,,,,
42,70,36,1,3,0,0,,,,
54,71,47,1,3,0,0,,,,
38,70,41,1,5,0,1,5,3,0,0.87
85,73,47,1,5,0,0,,,,
@sachinsdate
sachinsdate / nyc_bb_bicyclist_counts.csv
Last active June 23, 2023 06:03
Daily total of bike counts conducted on the Brooklyn Bridge from 01 April 2017 to 31 October 2017. Source: NYC Open Data: Bicycle Counts for East River Bridges
Date HIGH_T LOW_T PRECIP BB_COUNT
1-Apr-17 46.00 37.00 0.00 606
2-Apr-17 62.10 41.00 0.00 2021
3-Apr-17 63.00 50.00 0.03 2470
4-Apr-17 51.10 46.00 1.18 723
5-Apr-17 63.00 46.00 0.00 2807
6-Apr-17 48.90 41.00 0.73 461
7-Apr-17 48.00 43.00 0.01 1222
8-Apr-17 55.90 39.90 0.00 1674
9-Apr-17 66.00 45.00 0.00 2375