df.shape
df.dtypes
df.columns
df[column].value_counts(dropna=False)  # for string columns; dropna=False also counts missing values
df[column].describe()  # for numeric columns
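The last two calls above apply to different column types. As a minimal sketch, the choice could be automated with a small helper; `summarize_column` and the toy frame are hypothetical names invented for this illustration, not part of the original snippets.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def summarize_column(df, column):
    """Return describe() for numeric columns, value_counts() otherwise."""
    if is_numeric_dtype(df[column]):
        return df[column].describe()
    # dropna=False keeps missing values as their own category
    return df[column].value_counts(dropna=False)


# Toy data, invented for illustration only
toy = pd.DataFrame({
    "team": ["A", "B", None, "A"],
    "goals": [1.0, 2.0, 3.0, None],
})
team_summary = summarize_column(toy, "team")    # counts per value, incl. missing
goals_summary = summarize_column(toy, "goals")  # count, mean, std, quartiles
```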
import numpy as np

# create indicator variables for a home-team win or loss (a draw is 0 in both)
df['home_win'] = np.where(df['goal_difference'] > 0, 1, 0)
df['home_loss'] = np.where(df['goal_difference'] < 0, 1, 0)
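A small worked example may help show how the two indicators behave, in particular for draws. The toy frame and the extra `draw` column are assumptions added for illustration; the original snippet only creates `home_win` and `home_loss`.

```python
import numpy as np
import pandas as pd

# Toy results, invented for illustration: a home win, a draw, a home loss
toy = pd.DataFrame({"goal_difference": [2, 0, -1]})
toy["home_win"] = np.where(toy["goal_difference"] > 0, 1, 0)
toy["home_loss"] = np.where(toy["goal_difference"] < 0, 1, 0)

# Hypothetical extra column: a draw is the row where both indicators are 0
toy["draw"] = ((toy["home_win"] == 0) & (toy["home_loss"] == 0)).astype(int)
```

Because the comparisons are strict, a goal difference of 0 is counted as neither a win nor a loss.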
import pandas as pd
# one-hot encode the visiting and home teams as integer (0/1) columns
df_visitor = pd.get_dummies(df['visitor'], dtype=np.int64)
df_home = pd.get_dummies(df['home'], dtype=np.int64)
# pair each team column in X with the corresponding coefficient of the fitted model lr
df_ratings = pd.DataFrame(data={'team': X.columns, 'rating': lr.coef_})
df_ratings
# subtract the visitor dummies from the home dummies (home team = 1, visitor = -1, others = 0)
df_model = df_home.sub(df_visitor)
df_model['goal_difference'] = df['goal_difference']
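The encoding above can be checked on a toy fixture list. This is a sketch under stated assumptions: the three games are invented, and the `reindex` step is an addition not in the original, included because `DataFrame.sub` aligns on column labels and yields NaN for any team that appears on only one side.

```python
import numpy as np
import pandas as pd

# Toy fixtures, invented for illustration.
# Team C never plays at home and team A never plays away.
games = pd.DataFrame({
    "home": ["A", "B", "A"],
    "visitor": ["B", "C", "C"],
    "goal_difference": [1, 2, -1],
})

home = pd.get_dummies(games["home"], dtype=np.int64)
visitor = pd.get_dummies(games["visitor"], dtype=np.int64)

# Align both frames on the full set of teams so missing teams become 0, not NaN
teams = sorted(set(home.columns) | set(visitor.columns))
home = home.reindex(columns=teams, fill_value=0)
visitor = visitor.reindex(columns=teams, fill_value=0)

# +1 for the home team, -1 for the visitor, 0 for every other team
design = home.sub(visitor)
model = design.copy()
model["goal_difference"] = games["goal_difference"]
```

Each row of `design` sums to zero, since every game has exactly one home team and one visitor.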
import seaborn as sns
cols = df.columns[:30]  # first 30 columns
colours = ['#000099', '#ffff00']  # blue = not missing, yellow = missing
sns.heatmap(df[cols].isnull(), cmap=sns.color_palette(colours))
# for a larger dataset where the heatmap takes too long to render,
# print the percentage of missing values per column instead.
for col in df.columns:
    pct_missing = np.mean(df[col].isnull())
    print('{} - {}%'.format(col, round(pct_missing*100)))
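The same percentages can also be computed without an explicit loop. A minimal sketch, assuming a toy frame whose values (and the `full_sq` and `floor` column names) are invented for illustration:

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "life_sq": [30.0, np.nan, np.nan, 45.0],
    "full_sq": [40.0, 50.0, np.nan, 60.0],
    "floor": [1, 2, 3, 4],
})

# isnull() gives a boolean frame; the mean of each column is the fraction missing
pct_missing = toy.isnull().mean().mul(100).round().sort_values(ascending=False)
```

Sorting in descending order puts the columns with the most missing values first, which is convenient when there are many columns.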
# life_sq has a lot of missing values.
# life_sq: living area in square meters, excluding loggias, balconies and other non-residential areas
df['life_sq'].value_counts(dropna=False)
df['life_sq'].describe()
# create an ismissing indicator variable for life_sq.
df['life_sq_ismissing'] = df['life_sq'].isnull()
df['life_sq_ismissing'].value_counts(dropna=False)