sachinsdate

💭
Up to my ears in regression modeling
@sachinsdate
sachinsdate / poisson_regression.py
Last active April 12, 2023 14:20
Poisson Regression model
import pandas as pd
from patsy import dmatrices
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
#Create a pandas DataFrame for the counts data set.
df = pd.read_csv('nyc_bb_bicyclist_counts.csv', header=0, infer_datetime_format=True, parse_dates=[0], index_col=[0])
@sachinsdate
sachinsdate / poisson_counts_generator.py
Last active September 19, 2019 10:53
A Python program to generate event counts using a Poisson process
import random
import math
_lambda = 5
_num_total_arrivals = 150
_num_arrivals = 0
_arrival_time = 0
_num_arrivals_in_unit_time = []
_time_tick = 1
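The preview above ends after the parameter setup. The following is a minimal sketch of how such a generator could continue, assuming inter-arrival times are drawn from an exponential distribution by inverse-CDF sampling and arrivals are then binned into unit-time intervals. The function name `generate_counts` and its `seed` parameter are illustrative and are not taken from the gist.

```python
import math
import random


def generate_counts(rate=5, num_total_arrivals=150, time_tick=1, seed=None):
    """Simulate a Poisson process and return the number of arrivals per time tick."""
    rng = random.Random(seed)
    counts = []            # arrivals observed in each completed tick
    arrival_time = 0.0     # running clock
    tick_end = time_tick   # right edge of the tick currently being filled
    in_current_tick = 0

    for _ in range(num_total_arrivals):
        # Inverse-CDF sampling of an Exponential(rate) inter-arrival time.
        # 1 - rng.random() lies in (0, 1], so the log is always defined.
        arrival_time += -math.log(1.0 - rng.random()) / rate

        # Close every tick that ended before this arrival (possibly empty ones).
        while arrival_time > tick_end:
            counts.append(in_current_tick)
            in_current_tick = 0
            tick_end += time_tick
        in_current_tick += 1

    counts.append(in_current_tick)  # last, possibly partial, tick
    return counts


counts = generate_counts(seed=42)
print(counts)
print("mean arrivals per tick:", sum(counts) / len(counts))
```

With a rate of 5, the mean number of arrivals per tick should land near 5, although the final tick is usually only partly filled and pulls the average down slightly.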
@sachinsdate
sachinsdate / negative_binomial_regression.py
Last active October 13, 2023 17:20
Negative Binomial Regression using the GLM class of statsmodels
import pandas as pd
from patsy import dmatrices
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
#create a pandas DataFrame for the counts data set
df = pd.read_csv('nyc_bb_bicyclist_counts.csv', header=0, infer_datetime_format=True, parse_dates=[0], index_col=[0])
@sachinsdate
sachinsdate / f_test.py
Created October 26, 2019 18:14
F-test for regression analysis. An illustrative example
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#Create a pandas DataFrame for the djia data set.
df = pd.read_csv('djia.csv', header=0, infer_datetime_format=True, parse_dates=[0], index_col=[0])
################################
######## THE MEAN MODEL ########
################################
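The preview stops at the mean-model banner, so the competing model is not visible here. As a rough sketch of the F-test itself, the code below compares an intercept-only mean model against a model that regresses the closing price on its one-day lag. The lagged model is an assumption made for illustration, and the data are only the nine closing prices shown in the djia.csv preview in this listing, so the resulting statistic is not meaningful beyond demonstrating the arithmetic.

```python
# First nine closing prices from the djia.csv preview in this listing.
prices = [27269.9707, 27140.98047, 27192.44922, 27221.34961, 27198.01953,
          26864.26953, 26583.41992, 26485.00977, 25717.74023]

# Assumed unrestricted model: price_t = b0 + b1 * price_{t-1}.
y = prices[1:]
x = prices[:-1]
n = len(y)


def mean(values):
    return sum(values) / len(values)


# Restricted (mean) model: predict the sample mean for every observation.
y_bar = mean(y)
rss_mean = sum((yi - y_bar) ** 2 for yi in y)

# Unrestricted model fitted by ordinary least squares (closed form).
x_bar = mean(x)
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
rss_lag = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# F = ((RSS_restricted - RSS_unrestricted) / (k2 - k1)) / (RSS_unrestricted / (n - k2))
k1, k2 = 1, 2  # number of fitted parameters in each model
f_stat = ((rss_mean - rss_lag) / (k2 - k1)) / (rss_lag / (n - k2))
print(f"F({k2 - k1}, {n - k2}) = {f_stat:.3f}")
```

The statistic would then be compared against the F distribution with (k2 - k1, n - k2) degrees of freedom, for example with `scipy.stats.f.sf(f_stat, k2 - k1, n - k2)` if SciPy is available.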
@sachinsdate
sachinsdate / djia.csv
Created October 26, 2019 18:18
Daily closing price of the Dow Jones Industrial Average over a 3 month period
Date,Closing Price
7/24/2019,27269.9707
7/25/2019,27140.98047
7/26/2019,27192.44922
7/29/2019,27221.34961
7/30/2019,27198.01953
7/31/2019,26864.26953
8/1/2019,26583.41992
8/2/2019,26485.00977
8/5/2019,25717.74023
@sachinsdate
sachinsdate / poisson_sim2.py
Created October 27, 2019 17:29
A simulation of the Poisson process
import random
import math
import statistics
import matplotlib.pyplot as plt
_lambda = 5
_num_events = 100
_event_num = []
_inter_event_times = []
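Only the setup of this simulation is visible in the preview. One way to sketch the rest, assuming the goal is to draw exponential inter-event times and accumulate them into event times, is shown below. It uses the standard library's `random.expovariate` rather than whatever sampling method the gist uses, and the plain variable names are illustrative.

```python
import random
import statistics

rate = 5          # events per unit time (matches _lambda in the preview)
num_events = 100  # matches _num_events in the preview
rng = random.Random(0)  # fixed seed so reruns give the same numbers

# random.expovariate(rate) draws an Exponential inter-event time with mean 1/rate.
inter_event_times = [rng.expovariate(rate) for _ in range(num_events)]

# Event times are the running sum of the inter-event times.
event_times = []
clock = 0.0
for gap in inter_event_times:
    clock += gap
    event_times.append(clock)

print("mean inter-event time:", statistics.mean(inter_event_times))
print("theoretical mean     :", 1 / rate)
```

The sample mean of the inter-event times should approach 1/rate as the number of events grows.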
@sachinsdate
sachinsdate / aic.py
Last active March 10, 2020 11:15
Select the best linear regression time series model using AIC score as the criterion by performing a brute force search through a sample space of candidate models
import pandas as pd
from patsy import dmatrices
from collections import OrderedDict
import itertools
import statsmodels.formula.api as smf
import sys
import matplotlib.pyplot as plt
#Read the data set into a pandas DataFrame
df = pd.read_csv('boston_daily_temps_1978_2019.csv', header=0, infer_datetime_format=True, parse_dates=[0], index_col=[0])
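The description mentions a brute-force search over candidate models scored by AIC. The sketch below illustrates that idea under several assumptions: the candidate regressors are lagged copies of the series, the data are a synthetic series rather than the Boston temperatures, and the score is the Gaussian AIC up to an additive constant, `n * ln(RSS / n) + 2k`. The helper `ols_rss` is a hypothetical pure-Python least-squares routine standing in for the statsmodels fit that the gist imports.

```python
import itertools
import math
import random


def ols_rss(columns, y):
    """Fit y on an intercept plus the given columns; return the residual sum of squares."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    # Build the augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda idx: abs(a[idx][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(p):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[c])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(rows, y))


# Synthetic series (an assumption): depends on its own lags 1 and 3 plus noise.
rng = random.Random(0)
series = [0.0, 0.0, 0.0]
for _ in range(300):
    series.append(0.5 * series[-1] + 0.3 * series[-3] + rng.gauss(0, 1))

# Align the response with each lagged copy so every model uses the same rows.
max_lag = 4
y = series[max_lag:]
lags = {k: series[max_lag - k:len(series) - k] for k in range(1, max_lag + 1)}
n = len(y)

# Score every non-empty subset of the lag variables.
scores = {}
for size in range(1, max_lag + 1):
    for combo in itertools.combinations(lags, size):
        rss = ols_rss([lags[k] for k in combo], y)
        num_params = len(combo) + 1  # slopes plus intercept
        scores[combo] = n * math.log(rss / n) + 2 * num_params

best = min(scores, key=scores.get)
print("lowest-AIC lag set:", best, "AIC:", round(scores[best], 2))
```

Because every candidate is fitted on the same `n` rows, dropping the constant term of the log-likelihood does not change the ranking. The number of subsets grows as 2^max_lag, so this exhaustive approach is practical only for a small pool of candidate variables.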
@sachinsdate
sachinsdate / boston_daily_temps_1978_2019.csv
Last active November 8, 2019 17:42
Daily average temperature in Boston, MA from 1978 to 2019
DATE,TAVG
1/1/1978,26.5
1/2/1978,24
1/3/1978,25.5
1/4/1978,23
1/5/1978,35.5
1/6/1978,39.5
1/7/1978,30.5
1/8/1978,39
1/9/1978,38.5
@sachinsdate
sachinsdate / wages_and_salaries_1984_2019_us_bls_CXU900000LB1203M.csv
Created November 16, 2019 13:48
Wages and salaries (series id: CXU900000LB1203M). U.S. Bureau of Labor Statistics
Year,Wages
1984,25088
1985,26611
1986,27005
1987,29103
1988,30168
1989,31922
1990,33183
1991,35576
1992,35679
@sachinsdate
sachinsdate / olsr_for_counts_based_data.py
Created November 30, 2019 15:25
OLS regression model fitted to the bicyclist counts data set
import pandas as pd
from matplotlib import pyplot as plt
#load the data into a pandas data frame and plot the BB_COUNT variable
df = pd.read_csv('nyc_bb_bicyclist_counts.csv', header=0, infer_datetime_format=True, parse_dates=[0], index_col=[0])
fig = plt.figure()
fig.suptitle('Bicyclist counts on the Brooklyn bridge')
plt.xlabel('Date')
plt.ylabel('Count')
actual, = plt.plot(df.index, df['BB_COUNT'], 'go-', label='Count of bicyclists')