@ifnull
Last active February 10, 2021 06:26
MIT xPRO: DSx Data Science and Big Data Analytics: Making Data-Driven Decisions - Challenger Case Study
import numpy as np
import pandas as pd
import statsmodels.discrete.discrete_model as sm
from matplotlib import pyplot as plt
from patsy import dmatrices
# load the data: X is the temperature, Y is 1 for a failure and 0 otherwise
data = pd.read_csv("challenger-data.csv")
# subsetting data
failures = data.loc[(data.Y == 1)]
no_failures = data.loc[(data.Y == 0)]
# frequencies
failures_freq = failures.X.value_counts()
no_failures_freq = no_failures.X.value_counts()
# plotting
plt.scatter(failures_freq.index, failures_freq, c="red", s=40)
plt.scatter(no_failures_freq.index, np.zeros(len(no_failures_freq)),
            c="blue", s=40)
plt.xlabel("X: Temperature")
plt.ylabel("Number of Failures")
plt.show()
# build the design matrices (patsy adds an intercept column to X)
y, X = dmatrices("Y ~ X", data, return_type="dataframe")
# build the model
logit = sm.Logit(y, X)
result = logit.fit()
# summarize the model
print(result.summary())
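A fitted logit model turns a temperature into a failure probability through the logistic function. The following is a minimal sketch of that step, not part of the original gist: `failure_probability` is a hypothetical helper, and the coefficient values are placeholders chosen for illustration, not the estimates produced by the fit above.

```python
import numpy as np


def failure_probability(temperature, intercept, slope):
    """Logistic function: P(Y=1 | X) = 1 / (1 + exp(-(intercept + slope * X)))."""
    z = intercept + slope * np.asarray(temperature, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))


# Placeholder coefficients (assumed for illustration only, NOT fitted values).
example_intercept = 10.0
example_slope = -0.2

# With a negative slope, the predicted probability falls as temperature rises.
temperatures = np.array([30.0, 50.0, 70.0])
print(failure_probability(temperatures, example_intercept, example_slope))
```

With the model fitted above, the coefficients should be readable as `result.params["Intercept"]` and `result.params["X"]`, since patsy names the design matrix columns after the formula terms; `result.predict` gives the same probabilities directly from a design matrix.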
# pinned dependencies
matplotlib==3.3.4
numpy==1.18.0
pandas==1.2.2
patsy==0.5.1
statsmodels==0.12.2