# -*- coding: utf-8 -*-
"""
Created on Sat Jun 07 11:11:16 2014
@author: Dipanjan
"""
#import sys
#from lxml import html
import pandas as pd

white_wine = pd.read_csv('winequality-white.csv', sep=';')
red_wine = pd.read_csv('winequality-red.csv', sep=';')
# store wine type as an attribute
red_wine['wine_type'] = 'red'
white_wine['wine_type'] = 'white'
# bucket wine quality scores into qualitative quality labels
red_wine['quality_label'] = red_wine['quality'].apply(lambda value: 'low'
                                                      if value <= 5 else 'medium'
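The labelling expression above breaks off after the 'medium' branch, so the remaining thresholds are not visible here. As a minimal sketch, the same kind of bucketing can be written with `pd.cut`. The upper cut-off of 7 and the 'high' label are assumptions made for illustration, and the small `demo` frame is a hypothetical stand-in for the real wine data.

```python
import pandas as pd

# Hypothetical stand-in for a wine frame; only the 'quality' column matters here.
demo = pd.DataFrame({'quality': [3, 5, 6, 7, 8, 9]})

# Bins are right-inclusive: (-inf, 5] -> low, (5, 7] -> medium, (7, inf) -> high.
# The threshold of 7 and the 'high' label are assumed, not taken from the source.
demo['quality_label'] = pd.cut(demo['quality'],
                               bins=[-float('inf'), 5, 7, float('inf')],
                               labels=['low', 'medium', 'high'])
print(demo)
```

`pd.cut` returns an ordered categorical column, which keeps the labels in low-to-high order when they are later used on a plot axis.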
subset_attributes = ['residual sugar', 'total sulfur dioxide', 'sulphates',
                     'alcohol', 'volatile acidity', 'quality']
rs = red_wine[subset_attributes].describe().round(2)
ws = white_wine[subset_attributes].describe().round(2)
pd.concat([rs, ws], axis=1, keys=['Red Wine Statistics', 'White Wine Statistics'])
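The `keys` argument in the `pd.concat` call is what produces the two-level column header in the combined summary. A small sketch with made-up numbers (the frames `a` and `b` are hypothetical stand-ins for the two wine tables) shows the resulting structure and how one group can be pulled back out:

```python
import pandas as pd

# Made-up values; these are not rows from the wine data sets.
a = pd.DataFrame({'alcohol': [9.4, 9.8, 10.0], 'quality': [5, 5, 6]})
b = pd.DataFrame({'alcohol': [8.8, 9.5, 10.1], 'quality': [6, 6, 7]})

summary = pd.concat([a.describe().round(2), b.describe().round(2)],
                    axis=1, keys=['Red Wine Statistics', 'White Wine Statistics'])

# Columns are now a two-level MultiIndex: (group key, attribute).
print(summary.columns.tolist())

# Selecting a top-level key returns that group's summary on its own.
red_only = summary['Red Wine Statistics']
print(red_only.loc['mean'])
```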
import matplotlib.pyplot as plt

wines.hist(bins=15, color='steelblue', edgecolor='black', linewidth=1.0,
           xlabelsize=8, ylabelsize=8, grid=False)
plt.tight_layout(rect=(0, 0, 1.2, 1.2))
# Histogram
fig = plt.figure(figsize=(6, 4))
title = fig.suptitle("Sulphates Content in Wine", fontsize=14)
fig.subplots_adjust(top=0.85, wspace=0.3)
ax = fig.add_subplot(1, 1, 1)
ax.set_xlabel("Sulphates")
ax.set_ylabel("Frequency")
ax.text(1.2, 800, r'$\mu$=' + str(round(wines['sulphates'].mean(), 2)),
        fontsize=12)
# Bar Plot
fig = plt.figure(figsize=(6, 4))
title = fig.suptitle("Wine Quality Frequency", fontsize=14)
fig.subplots_adjust(top=0.85, wspace=0.3)
ax = fig.add_subplot(1, 1, 1)
ax.set_xlabel("Quality")
ax.set_ylabel("Frequency")
w_q = wines['quality'].value_counts()
w_q = (list(w_q.index), list(w_q.values))
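The snippet stops once the counts have been unpacked into a pair of lists. The following is a minimal sketch of how such counts could be drawn as bars, not the original continuation. It uses made-up scores in a hypothetical `scores` Series and the non-interactive `Agg` backend so that it runs without a display. Sorting by index is an addition here, so the bars appear in score order rather than in order of frequency.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless run, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up quality scores standing in for wines['quality'].
scores = pd.Series([5, 6, 6, 7, 5, 6, 8, 6, 5, 7])

# value_counts() orders by frequency; sort_index() restores score order.
counts = scores.value_counts().sort_index()
labels, heights = list(counts.index), list(counts.values)

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(labels, heights, color='steelblue', edgecolor='black', linewidth=1)
ax.set_xlabel('Quality')
ax.set_ylabel('Frequency')
ax.set_xticks(labels)  # one tick per observed score
```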
# Pair-wise Scatter Plots
import seaborn as sns

cols = ['density', 'residual sugar', 'total sulfur dioxide', 'fixed acidity']
# newer seaborn releases use `height` (formerly `size`) and `fill` (formerly `shade`)
pp = sns.pairplot(wines[cols], height=1.8, aspect=1.8,
                  plot_kws=dict(edgecolor="k", linewidth=0.5),
                  diag_kind="kde", diag_kws=dict(fill=True))
fig = pp.fig
fig.subplots_adjust(top=0.93, wspace=0.3)
t = fig.suptitle('Wine Attributes Pairwise Plots', fontsize=14)
# Scale attribute values so that a few outliers do not dominate
from sklearn.preprocessing import StandardScaler

cols = ['density', 'residual sugar', 'total sulfur dioxide', 'fixed acidity']
subset_df = wines[cols]
ss = StandardScaler()
scaled_df = ss.fit_transform(subset_df)
# keep the original index so the concat below aligns row for row
scaled_df = pd.DataFrame(scaled_df, columns=cols, index=wines.index)
final_df = pd.concat([scaled_df, wines['wine_type']], axis=1)
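`StandardScaler` centres each column on its mean and divides by its population standard deviation. Below is a minimal sketch of the same computation done by hand in pandas on a made-up frame (the values in `df` and its index are hypothetical). It also shows why the index matters when the scaled values are joined back to a label column.

```python
import pandas as pd

# Made-up frame with a non-default index, e.g. after filtering or shuffling rows.
df = pd.DataFrame({'density': [0.99, 1.00, 0.98, 0.97],
                   'wine_type': ['red', 'red', 'white', 'white']},
                  index=[10, 11, 12, 13])
cols = ['density']

# z-score by hand: population standard deviation (ddof=0)
scaled = (df[cols] - df[cols].mean()) / df[cols].std(ddof=0)

# 'scaled' keeps df's index, so the label column lines up row for row
final = pd.concat([scaled, df['wine_type']], axis=1)

# Rebuilding from a bare array resets the index to 0..n-1; the join then
# finds no matching labels and pads both sides with NaN
misaligned = pd.concat([pd.DataFrame(scaled.to_numpy(), columns=cols),
                        df['wine_type']], axis=1)
print(final)
print(misaligned)
```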
# Scatter Plot
plt.scatter(wines['sulphates'], wines['alcohol'],
            alpha=0.4, edgecolors='w')
plt.xlabel('Sulphates')
plt.ylabel('Alcohol')
plt.title('Wine Sulphates - Alcohol Content', y=1.05)
# Joint Plot
# Using subplots or facets along with Bar Plots
fig = plt.figure(figsize=(10, 4))
title = fig.suptitle("Wine Type - Quality", fontsize=14)
fig.subplots_adjust(top=0.85, wspace=0.3)
# red wine - wine quality
ax1 = fig.add_subplot(1, 2, 1)
ax1.set_title("Red Wine")
ax1.set_xlabel("Quality")
ax1.set_ylabel("Frequency")
rw_q = red_wine['quality'].value_counts()