
sachinsdate

💭 Up to my ears in regression modeling
@sachinsdate
sachinsdate / poisson_sim2.py
Created October 27, 2019 17:29
A simulation of the Poisson process
import random
import math
import statistics
import matplotlib.pyplot as plt
_lambda = 5 #rate of the Poisson process (events per unit time)
_num_events = 100 #number of events to simulate
_event_num = []
_inter_event_times = []
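The preview cuts off after the setup. A minimal sketch of how such a simulation can proceed (an assumption, not necessarily the gist's actual continuation): draw each inter-event time by inverse-transform sampling and check that the sample mean lands near 1/λ.

```python
import math
import random
import statistics

_lambda = 5            # event rate of the Poisson process
_num_events = 100_000  # large sample so the mean settles near 1/lambda

# Inverse-transform sampling: if U ~ Uniform(0,1), then
# -ln(1 - U)/lambda ~ Exponential(lambda), which is the inter-event
# time distribution of a Poisson process with rate lambda.
inter_event_times = [-math.log(1.0 - random.random()) / _lambda
                     for _ in range(_num_events)]

print(statistics.mean(inter_event_times))  # close to 1/lambda = 0.2
```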
@sachinsdate
sachinsdate / boston_daily_temps_1978_2019.csv
Last active November 8, 2019 17:42
Daily average temperature in Boston, MA from 1978 to 2019
DATE TAVG
1/1/1978 26.5
1/2/1978 24
1/3/1978 25.5
1/4/1978 23
1/5/1978 35.5
1/6/1978 39.5
1/7/1978 30.5
1/8/1978 39
1/9/1978 38.5
@sachinsdate
sachinsdate / wages_and_salaries_1984_2019_us_bls_CXU900000LB1203M.csv
Created November 16, 2019 13:48
Wages and salaries (series id: CXU900000LB1203M). U.S. Bureau of Labor Statistics
Year Wages
1984 25088
1985 26611
1986 27005
1987 29103
1988 30168
1989 31922
1990 33183
1991 35576
1992 35679
@sachinsdate
sachinsdate / olsr_for_counts_based_data.py
Created November 30, 2019 15:25
OLS regression model fitted to the bicyclist counts data set
import pandas as pd
from matplotlib import pyplot as plt
#load the data into a pandas data frame and plot the BB_COUNT variable
df = pd.read_csv('nyc_bb_bicyclist_counts.csv', header=0, infer_datetime_format=True, parse_dates=[0], index_col=[0])
fig = plt.figure()
fig.suptitle('Bicyclist counts on the Brooklyn bridge')
plt.xlabel('Date')
plt.ylabel('Count')
actual, = plt.plot(df.index, df['BB_COUNT'], 'go-', label='Count of bicyclists')
plt.legend(handles=[actual])
plt.show()
@sachinsdate
sachinsdate / monthly_gold_price_index_fred.csv
Last active December 25, 2019 12:50
Monthly Gold Price Index from 2001 to 2011 (Source: US FRED)
DATE Export_Price_Index_of_Gold
2001-01-01 97
2001-02-01 94.8
2001-03-01 93.7
2001-04-01 93.9
2001-05-01 93.1
2001-06-01 97.2
2001-07-01 94.7
2001-08-01 94.4
2001-09-01 95.9
@sachinsdate
sachinsdate / binomial_regression.py
Last active February 25, 2020 14:35
Build, train and test a Binomial Regression model on the Titanic dataset using Python, pandas, and statsmodels
import pandas as pd
#load the data set into a Pandas data frame, and print out the first few rows
df = pd.read_csv('titanic_dataset.csv', header=0)
df.head(10)
#Drop the columns that our model will not use
df = df.drop(['Name','Siblings/Spouses Aboard', 'Parents/Children Aboard', 'Fare'], axis=1)
#print the top 10 rows
df.head(10)
@sachinsdate
sachinsdate / titanic_dataset.csv
Created February 25, 2020 14:39
The Titanic passengers data set
Name,Pclass,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare,Survived
Mr. Owen Harris Braund,3,male,22,1,0,7.25,0
Mrs. John Bradley (Florence Briggs Thayer) Cumings,1,female,38,1,0,71.2833,1
Miss. Laina Heikkinen,3,female,26,0,0,7.925,1
Mrs. Jacques Heath (Lily May Peel) Futrelle,1,female,35,1,0,53.1,1
Mr. William Henry Allen,3,male,35,0,0,8.05,0
Mr. James Moran,3,male,27,0,0,8.4583,0
Mr. Timothy J McCarthy,1,male,54,0,0,51.8625,0
Master. Gosta Leonard Palsson,3,male,2,3,1,21.075,0
Mrs. Oscar W (Elisabeth Vilhelmina Berg) Johnson,3,female,27,0,2,11.1333,1
@sachinsdate
sachinsdate / aic.py
Last active March 10, 2020 11:15
Select the best linear regression time series model by brute-force searching a sample space of candidate models, using the AIC score as the selection criterion
import pandas as pd
from patsy import dmatrices
from collections import OrderedDict
import itertools
import statsmodels.formula.api as smf
import sys
import matplotlib.pyplot as plt
#Read the data set into a pandas DataFrame
df = pd.read_csv('boston_daily_temps_1978_2019.csv', header=0, infer_datetime_format=True, parse_dates=[0], index_col=[0])
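The preview stops after loading the data. A minimal sketch of the brute-force AIC search the description outlines, shown on synthetic regressors rather than the temperature series (the variable names and true model are illustrative assumptions): enumerate every non-empty subset of candidate regressors, fit each with smf.ols, and keep the formula with the lowest AIC.

```python
import itertools
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data: y truly depends on x1 and x2, not x3.
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({'x1': rng.normal(size=n), 'x2': rng.normal(size=n),
                   'x3': rng.normal(size=n)})
df['y'] = 1.0 + 2.0 * df['x1'] - 1.5 * df['x2'] + rng.normal(0, 1, n)

candidates = ['x1', 'x2', 'x3']
best_aic, best_formula = np.inf, None
# Try every non-empty subset of the candidate regressors.
for k in range(1, len(candidates) + 1):
    for subset in itertools.combinations(candidates, k):
        formula = 'y ~ ' + ' + '.join(subset)
        results = smf.ols(formula, data=df).fit()
        if results.aic < best_aic:
            best_aic, best_formula = results.aic, formula
print(best_formula, best_aic)
```

The search is exponential in the number of candidates, which is why the gist's description calls it a brute-force sweep of the sample space.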
LIVE_BAIT,CAMPER,PERSONS,CHILDREN,FISH_COUNT
0,0,1,0,0
1,1,1,0,0
1,0,1,0,0
1,1,2,1,0
1,0,1,0,1
1,1,4,2,0
1,0,3,1,0
1,0,4,3,0
0,1,3,2,0
instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual_user_count,registered_user_count,total_user_count
1,01-01-11,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
2,02-01-11,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
3,03-01-11,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
4,04-01-11,1,0,1,0,2,1,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
5,05-01-11,1,0,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600
6,06-01-11,1,0,1,0,4,1,1,0.204348,0.233209,0.518261,0.0895652,88,1518,1606
7,07-01-11,1,0,1,0,5,1,2,0.196522,0.208839,0.498696,0.168726,148,1362,1510
8,08-01-11,1,0,1,0,6,0,2,0.165,0.162254,0.535833,0.266804,68,891,959
9,09-01-11,1,0,1,0,0,0,1,0.138333,0.116175,0.434167,0.36195,54,768,822