💭
Up to my ears in regression modeling

sachinsdate

@sachinsdate
sachinsdate / gls.py
Created October 30, 2022 17:02
A tutorial on fitting a linear model using the Generalized Least Squares estimator
View gls.py
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
from patsy import dmatrices
from matplotlib import pyplot as plt
import numpy as np
#Load the US Census Bureau data into a Dataframe
df = pd.read_csv('us_census_bureau_acs_2015_2019_subset.csv', header=0)
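For reference, the Generalized Least Squares estimator named in the gist description has the standard closed form below. The symbols are not taken from the gist and are assumed here for illustration: X is the design matrix, y is the response vector, and Ω is the covariance matrix of the errors, taken to be known (or estimated) and positive definite.

```latex
\hat{\beta}_{GLS} = \left( X^{\top} \Omega^{-1} X \right)^{-1} X^{\top} \Omega^{-1} y
```

When Ω = σ²I, this reduces to the ordinary least squares estimator.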
View takeover_bids_dataset.csv
ID BIDPREM DOC_NUM FINREST INSTHOLD LEGLREST REALREST REGULATN SIZE TAKEOVER WEEKS_INITIAL_FINAL WHITEKNT SIZESQ NUMBIDS
1 1.190497 78001 0 0.136 1 0 0 0.76676 1 23.571 1 0.588 2
2 1.036 78005 0 0.134 0 0 0 0.162503 1 13.571 0 0.0264 0
3 1.403412 78015 0 0.002 1 0 1 0.120489 1 5 1 0.0145 1
4 1.504455 78016 0 0.181 1 0 0 0.0723 1 7.429 0 0.00523 1
5 1.380736 78028 0 0.329 1 0 1 0.189118 1 8.857 0 0.0358 1
6 1.400069 78031 0 0.188 1 0 0 0.154217 1 6.429 1 0.0238 3
7 1.181691 78033 0 0.319 0 0 1 0.460355 1 13.571 1 0.212 2
8 1.32256 78037 0 0.123 0 0 1 0.276814 1 14.857 0 0.0766 1
9 1.650588 78039 0 0.379 0 0 0 0.22895 1 20.714 0 0.0524 1
@sachinsdate
sachinsdate / negative_binomial_regression.py
Last active October 13, 2023 17:20
Negative Binomial Regression using the GLM class of statsmodels
View negative_binomial_regression.py
import pandas as pd
from patsy import dmatrices
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
#create a pandas DataFrame for the counts data set
#(infer_datetime_format is deprecated as of pandas 2.0, so it is omitted here)
df = pd.read_csv('nyc_bb_bicyclist_counts.csv', header=0, parse_dates=[0], index_col=[0])
View generalized_poisson_regression.py
import pandas as pd
from patsy import dmatrices
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
#Create a pandas DataFrame for the counts data set.
#(infer_datetime_format is deprecated as of pandas 2.0, so it is omitted here)
df = pd.read_csv('nyc_bb_bicyclist_counts.csv', header=0, parse_dates=[0], index_col=[0])
View poisson_hmm_nloglikeobs.py
def nloglikeobs(self, params):
    #Reconstitute the q and beta matrices from the current values of all the params
    self.reconstitute_parameter_matrices(params)
    #Build the regime wise matrix of Poisson means
    self.compute_regime_specific_poisson_means()
    #Build the matrix of Markov transition probabilities by standardizing all the q values to
    # the 0 to 1 range
    self.compute_markov_transition_probabilities()
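The comment above says the q values are standardized to the 0 to 1 range, but the preview stops before showing how. One common way to do this is a row-wise softmax, which makes every row of the transition matrix sum to 1. That choice is an assumption here, not something taken from the gist. The names `standardize_rows` and `q` are hypothetical.

```python
import math

def standardize_rows(q):
    """Map each row of unconstrained q values to probabilities that sum to 1.

    Uses a row-wise softmax. This is an assumed technique, not necessarily
    what compute_markov_transition_probabilities() does in the gist.
    """
    p = []
    for row in q:
        m = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        p.append([e / total for e in exps])
    return p

# Hypothetical 2-regime matrix of unconstrained q values
q = [[0.0, 0.0], [2.0, -1.0]]
P = standardize_rows(q)
```

Optimizing over unconstrained q values and mapping them to probabilities afterwards keeps the optimizer free of explicit constraints on the transition matrix.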
@sachinsdate
sachinsdate / stanford_heart_transplant_dataset_full.csv
Created November 20, 2020 19:04
The Stanford heart transplant data set is taken from https://statistics.stanford.edu/research/covariance-analysis-heart-transplant-survival-data and available for personal use/research purposes only.
View stanford_heart_transplant_dataset_full.csv
PATIENT_ID YR_OF_ACCEPTANCE AGE SURVIVAL_STATUS SURVIVAL_TIME PRIOR_SURGERY TRANSPLANT_STATUS WAITING_TIME_FOR_TRANSPLANT MISMATCH_ON_ALLELES MISMATCH_ON_ANTIGEN MISMATCH_SCORE
15 68 53 1 1 0 0
43 70 43 1 2 0 0
61 71 52 1 2 0 0
75 72 52 1 2 0 0
6 68 54 1 3 0 0
42 70 36 1 3 0 0
54 71 47 1 3 0 0
38 70 41 1 5 0 1 5 3 0 0.87
85 73 47 1 5 0 0
@sachinsdate
sachinsdate / nyc_bb_bicyclist_counts.csv
Last active June 23, 2023 06:03
Daily total of bike counts conducted on the Brooklyn Bridge from 01 April 2017 to 31 October 2017. Source: NYC Open Data: Bicycle Counts for East River Bridges
View nyc_bb_bicyclist_counts.csv
Date HIGH_T LOW_T PRECIP BB_COUNT
1-Apr-17 46.00 37.00 0.00 606
2-Apr-17 62.10 41.00 0.00 2021
3-Apr-17 63.00 50.00 0.03 2470
4-Apr-17 51.10 46.00 1.18 723
5-Apr-17 63.00 46.00 0.00 2807
6-Apr-17 48.90 41.00 0.73 461
7-Apr-17 48.00 43.00 0.01 1222
8-Apr-17 55.90 39.90 0.00 1674
9-Apr-17 66.00 45.00 0.00 2375
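The dates in this preview use a day-month-year form such as `1-Apr-17`. As a minimal sketch of how such rows could be parsed, the block below reads the first three rows with the standard library only, so it runs without pandas. The comma delimiter is an assumption, since the preview shows rendered columns, and the `%b` directive expects English month abbreviations under the default locale.

```python
import csv
import io
from datetime import datetime

# First three rows of the preview, written out as comma-separated text.
# The comma delimiter is an assumption; the preview shows rendered columns.
sample = """Date,HIGH_T,LOW_T,PRECIP,BB_COUNT
1-Apr-17,46.00,37.00,0.00,606
2-Apr-17,62.10,41.00,0.00,2021
3-Apr-17,63.00,50.00,0.03,2470
"""

rows = []
for rec in csv.DictReader(io.StringIO(sample)):
    rows.append({
        # %d-%b-%y matches day, abbreviated month name, two-digit year
        "Date": datetime.strptime(rec["Date"], "%d-%b-%y").date(),
        "HIGH_T": float(rec["HIGH_T"]),
        "LOW_T": float(rec["LOW_T"]),
        "PRECIP": float(rec["PRECIP"]),
        "BB_COUNT": int(rec["BB_COUNT"]),
    })
```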
@sachinsdate
sachinsdate / 90day_RAR_on_assets.csv
Created December 12, 2022 09:57
Trailing 90-day returns of 8 stocks on NYSE and NASDAQ, and for the corresponding sector indexes. Source Yahoo Finance under Terms of use: https://legal.yahoo.com/us/en/yahoo/terms/otos/index.html
View 90day_RAR_on_assets.csv
We can make this file beautiful and searchable if this error is corrected: It looks like row 5 should actually have 14 columns, instead of 6. in line 4.
Date,RAR_Energy,RAR_Metals,RAR_Auto,RAR_Technology,RAR_Chevron,RAR_Halliburton,RAR_Alcoa,RAR_Nucor,RAR_USSteel,RAR_Ford,RAR_Tesla,RAR_Google,RAR_Microsoft
2019-05-10,6.5961545570058915,-0.382100259291267,16.980423096067693,20.144324542716525,7.838687140506143,-9.476220040520879,-6.9431669207317,6.42141943672513,-17.767082658022694,29.022405063291146,-25.13538238802106,8.952849356982366,23.35190786030733
2019-05-13,6.077872310603407,-0.5278551837630419,16.595338809034903,21.905609130918425,8.573040434742577,-11.501168785151826,-8.83865472560975,4.248187263317492,-22.706320346320346,27.202982005141386,-26.78069516580104,9.053695816906558,24.28270581842493
2019-05-14,3.459818903497883,-3.7857422081352388,14.098548786527978,18.566912229335955,7.393579678758356,-12.890760028149195,-14.120176429075507,0.13789141713202824,-28.08288102261554,24.362673267326734,-29.245256175442353,2.2745797648289487,19.99829490827037
2019-05-15,2.550591352362299,-4.029390347163423,11.923854856180043,18.67323611276973,6.390994854783631
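The last row in the preview above has 6 fields, while the header has 14. A small check like the following can flag such rows before the file is loaded for modeling. This is a sketch using a made-up four-column sample, not the actual file, and `find_short_rows` is a hypothetical helper name.

```python
import csv
import io

def find_short_rows(text):
    """Return (line_number, field_count) for rows with fewer fields than the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return [(reader.line_num, len(row)) for row in reader if len(row) < len(header)]

# Hypothetical sample: the third line is missing its last field
sample = "Date,A,B,C\n2019-05-10,1.0,2.0,3.0\n2019-05-13,1.5,2.5\n"
short_rows = find_short_rows(sample)
```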
@sachinsdate
sachinsdate / markov_switching_dynamic_regression.py
Created November 13, 2021 17:35
A tutorial on Markov Switching Dynamic Regression Model using Python and statsmodels
View markov_switching_dynamic_regression.py
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import statsmodels.api as sm
#Load the PCE and UMCSENT datasets
#(infer_datetime_format is deprecated as of pandas 2.0, so it is omitted here)
df = pd.read_csv(filepath_or_buffer='UMCSENT_PCE.csv', header=0, index_col=0,
                 parse_dates=['DATE'])
#Set the index frequency to 'Month-Start'
df = df.asfreq('MS')
@sachinsdate
sachinsdate / poisson_regression.py
Last active April 12, 2023 14:20
Poisson Regression model
View poisson_regression.py
import pandas as pd
from patsy import dmatrices
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
#Create a pandas DataFrame for the counts data set.
#(infer_datetime_format is deprecated as of pandas 2.0, so it is omitted here)
df = pd.read_csv('nyc_bb_bicyclist_counts.csv', header=0, parse_dates=[0], index_col=[0])