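df.shape
df.dtypes
df.columns
# `column` is a placeholder for the name of the column being inspected
df[column].value_counts(dropna=False)  # string columns: count of each value, including missing
df[column].describe()  # numeric columns: count, mean, std, min, quartiles, max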
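The same first-look checks can be run end to end on a tiny frame. The team codes and goal counts below are hypothetical stand-ins for the real games data, chosen only so each call has something to summarise.

```python
import pandas as pd

# Hypothetical stand-in for the real games data, with one missing team name.
df = pd.DataFrame({
    "home": ["TOR", "MTL", "TOR", None],
    "home_goals": [3, 1, 4, 2],
})

print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # one dtype per column
print(df.columns)  # the column labels

# String column: how often each value occurs; dropna=False also counts missing values.
counts = df["home"].value_counts(dropna=False)
print(counts)

# Numeric column: count, mean, std, min, quartiles and max.
summary = df["home_goals"].describe()
print(summary)
```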
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.linear_model import Ridge
# load the data.
df = pd.read_csv('hockey_games.csv', skiprows=1, names=['date', 'visitor', 'visitor_goals', 'home', 'home_goals'])
# make the date column into a date format.
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
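To see what `skiprows=1` and `names=` do together, here is a sketch that reads an in-memory CSV instead of `hockey_games.csv`. The rows and the header line (with a repeated `G` column) are hypothetical; the point is that the file's own header is discarded and replaced by the five explicit column names.

```python
import io

import pandas as pd

# Hypothetical CSV in the same five-column layout; its header row is skipped.
raw = io.StringIO(
    "Date,Visitor,G,Home,G\n"
    "2024-01-05,Team A,3,Team B,2\n"
    "2024-01-06,Team C,1,Team A,4\n"
)

df = pd.read_csv(
    raw,
    skiprows=1,  # drop the original header line
    names=["date", "visitor", "visitor_goals", "home", "home_goals"],
)

# Parse the ISO-style date strings into datetime64 values.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
print(df.dtypes)
```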
# home goals minus visitor goals: positive when the home team won
df['goal_difference'] = df['home_goals'] - df['visitor_goals']
# create flags for a home-team win or loss (a tie leaves both flags at 0)
df['home_win'] = np.where(df['goal_difference'] > 0, 1, 0)
df['home_loss'] = np.where(df['goal_difference'] < 0, 1, 0)
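A quick check of the sign convention on a few hypothetical goal differences (home minus visitor), including a tie. A difference of 0 satisfies neither condition, so it is flagged 0 in both columns.

```python
import numpy as np
import pandas as pd

# Hypothetical goal differences (home minus visitor); the third game is a tie.
df = pd.DataFrame({"goal_difference": [2, -1, 0, 3]})

df["home_win"] = np.where(df["goal_difference"] > 0, 1, 0)
df["home_loss"] = np.where(df["goal_difference"] < 0, 1, 0)

# The tie row has home_win == 0 and home_loss == 0.
print(df)
```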
# one 0/1 indicator column per team, for the visiting and home side separately
df_visitor = pd.get_dummies(df['visitor'], dtype=np.int64)
df_home = pd.get_dummies(df['home'], dtype=np.int64)
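On a hypothetical three-game schedule, `pd.get_dummies` produces one column per team name (sorted alphabetically) and exactly one 1 in each row, marking which team was on that side of the game.

```python
import numpy as np
import pandas as pd

# Hypothetical three-game schedule with teams A, B and C.
df = pd.DataFrame({
    "visitor": ["A", "B", "C"],
    "home": ["B", "C", "A"],
})

# One indicator column per team; each row has a single 1.
df_visitor = pd.get_dummies(df["visitor"], dtype=np.int64)
df_home = pd.get_dummies(df["home"], dtype=np.int64)
print(df_visitor)
print(df_home)
```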
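# subtract the visitor dummies from the home dummies: +1 for the home team, -1 for the visitor
# (fill_value=0 stops a team that appears on only one side from producing NaN columns)
df_model = df_home.sub(df_visitor, fill_value=0)
df_model['goal_difference'] = df['goal_difference']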
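Subtraction aligns the two dummy frames by column name. This sketch uses hypothetical games where team "C" only plays at home and team "D" only visits, so each frame lacks a column the other has. Without `fill_value=0` those cells become NaN; with it, every row ends up with +1 for the home team, -1 for the visitor and 0 elsewhere.

```python
import numpy as np
import pandas as pd

# Hypothetical games: "C" never visits and "D" never hosts.
df = pd.DataFrame({
    "visitor": ["A", "D"],
    "home": ["C", "A"],
    "goal_difference": [1, -2],
})

df_visitor = pd.get_dummies(df["visitor"], dtype=np.int64)  # columns A, D
df_home = pd.get_dummies(df["home"], dtype=np.int64)        # columns A, C

# Plain subtraction: columns present in only one frame come out as NaN.
naive = df_home.sub(df_visitor)
print(naive)

# fill_value=0 treats a missing column as all zeros before subtracting.
df_model = df_home.sub(df_visitor, fill_value=0)
df_model["goal_difference"] = df["goal_difference"]
print(df_model)
```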
df_train = df_model # not required but I like to rename my dataframe with the name train.
lr = Ridge(alpha=0.001)
X = df_train.drop(['goal_difference'], axis=1)
y = df_train['goal_difference']
lr.fit(X, y)
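With this +1/-1 design, the fitted model reads as goal_difference ≈ intercept + rating[home] - rating[visitor]. Ridge fits an intercept by default, which then plays the role of a home-ice advantage. Adding the same constant to every rating leaves all predictions unchanged, and the small ridge penalty resolves that ambiguity by centring the ratings around zero. The sketch below checks this on a hypothetical round robin whose margins were generated from ratings A=1, B=0, C=-1 and a home advantage of 1.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

# Hypothetical round robin: every team hosts every other team once.
games = pd.DataFrame({
    "home":    ["A", "B", "B", "C", "A", "C"],
    "visitor": ["B", "A", "C", "B", "C", "A"],
    "goal_difference": [2, 0, 2, 0, 3, -1],
})

# +1 for the home team, -1 for the visitor, as in the main script.
X = pd.get_dummies(games["home"], dtype=np.int64).sub(
    pd.get_dummies(games["visitor"], dtype=np.int64), fill_value=0
)
y = games["goal_difference"]

lr = Ridge(alpha=0.001)  # fit_intercept=True is the default
lr.fit(X, y)

# intercept_: expected home margin between two equally rated teams.
# coef_: one rating per team, in the order of X.columns.
print(lr.intercept_, lr.coef_)
```

Because the penalty is tiny, the recovered ratings are very close to the generating values, just slightly shrunk toward zero.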
df_ratings = pd.DataFrame(data={'team': X.columns, 'rating': lr.coef_})
df_ratings
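The ratings table becomes easier to read once it is sorted, and it can be used directly to estimate a matchup: predicted home margin = intercept + home rating - visitor rating, which is what `lr.predict` computes for a row with +1 for the home team and -1 for the visitor. The ratings and `home_advantage` value below are hypothetical placeholders for `df_ratings` and `lr.intercept_`, and `predict_margin` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical ratings in the same (team, rating) shape as df_ratings.
df_ratings = pd.DataFrame({"team": ["A", "B", "C"], "rating": [0.4, -0.9, 0.5]})
home_advantage = 0.25  # placeholder for lr.intercept_

# Strongest team first.
ranked = df_ratings.sort_values("rating", ascending=False).reset_index(drop=True)
print(ranked)

# Look up ratings by team name.
rating = df_ratings.set_index("team")["rating"]

def predict_margin(home: str, visitor: str) -> float:
    """Expected goal difference (home minus visitor) for one game."""
    return float(home_advantage + rating[home] - rating[visitor])

print(predict_margin("A", "B"))
```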
# import packages
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import matplotlib
plt.style.use('ggplot')
from matplotlib.pyplot import figure
cols = df.columns[:30] # first 30 columns
colours = ['#000099', '#ffff00'] # specify the colours - yellow is missing. blue is not missing.
sns.heatmap(df[cols].isnull(), cmap=sns.color_palette(colours))
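The heatmap shows where the gaps are; a per-column count and percentage gives the same information as numbers, which helps when there are too many rows to read the plot. The frame below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few missing values.
df = pd.DataFrame({
    "home": ["A", None, "C", "A"],
    "home_goals": [3, 2, np.nan, np.nan],
})

# Number of missing values per column, largest first.
missing = df.isnull().sum().sort_values(ascending=False)

# Share of missing values per column, as a percentage.
missing_pct = df.isnull().mean().mul(100).round(1)

print(missing)
print(missing_pct)
```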