@jefersondaniel
Last active March 1, 2019 16:15
Python Plotting in Practice

Python Plotting

Libs

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
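
The snippets in this gist use a DataFrame named df_train that is never defined here; in the referenced Kaggle notebook it holds the House Prices training data. To try the plots without that file, a stand-in can be sketched as follows. All values are made up, and the helper name make_fake_train is hypothetical; only the column names come from this gist.

```python
import numpy as np
import pandas as pd


def make_fake_train(n=200, seed=0):
    """Build a synthetic stand-in for df_train (values are invented)."""
    rng = np.random.default_rng(seed)
    qual = rng.integers(1, 11, size=n)                       # OverallQual in 1..10
    area = np.maximum(rng.normal(1500, 400, size=n), 400.0)  # GrLivArea, floored at 400
    # Price loosely driven by quality and area, plus noise (assumed relation)
    price = 20000 * qual + 60 * area + rng.normal(0, 20000, size=n)
    return pd.DataFrame({
        'SalePrice': np.maximum(price, 30000.0),
        'OverallQual': qual,
        'GrLivArea': area,
        'GarageCars': rng.integers(0, 4, size=n),
        'TotalBsmtSF': np.maximum(rng.normal(1000, 300, size=n), 0.0),
        'FullBath': rng.integers(1, 4, size=n),
        'YearBuilt': rng.integers(1900, 2011, size=n),
    })


df_train = make_fake_train()
print(df_train.head())
```

With the real data, df_train would instead come from pd.read_csv on the competition's training file.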

Distribution plot

Simple

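# histplot with kde=True replaces distplot, which is deprecated since seaborn 0.11
sns.histplot(df_train['SalePrice'], kde=True, stat='density')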
print("Skewness: %f" % df_train['SalePrice'].skew())
print("Kurtosis: %f" % df_train['SalePrice'].kurt())
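
To help read the two numbers printed above, here is a small sketch on synthetic samples (not the housing data). pandas reports excess kurtosis (Fisher's definition), so a normal sample gives values near 0 for both statistics, while a right-skewed sample gives a clearly positive skewness. The sample sizes and distribution parameters are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Symmetric reference sample and a right-skewed one (both synthetic)
symmetric = pd.Series(rng.normal(size=10_000))
right_skewed = pd.Series(rng.lognormal(sigma=0.5, size=10_000))

for name, s in [("normal", symmetric), ("lognormal", right_skewed)]:
    print("%-9s skewness: %6.3f  kurtosis: %6.3f" % (name, s.skew(), s.kurt()))
```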

Histogram and normal probability plot

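from scipy import stats
# distplot is deprecated since seaborn 0.11; it is kept here for its fit= option
sns.distplot(df_train['SalePrice'], fit=stats.norm)
fig = plt.figure()
res = stats.probplot(df_train['SalePrice'], plot=plt)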
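
As a rough guide to what the probability plot shows: when called without plot=, stats.probplot returns the plotted points and a least-squares line fit, including the correlation coefficient r. The closer r is to 1, the closer the points lie to the straight line expected for normal data. A sketch on synthetic samples (arbitrary sizes and parameters, not the housing data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_sample = rng.normal(size=1000)
skewed_sample = rng.lognormal(sigma=1.0, size=1000)

# Returns ((theoretical quantiles, ordered values), (slope, intercept, r))
(osm, osr), (slope, intercept, r_normal) = stats.probplot(normal_sample)
_, (_, _, r_skewed) = stats.probplot(skewed_sample)

print("r for normal sample: %.3f" % r_normal)
print("r for skewed sample: %.3f" % r_skewed)
```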

Scatter plot

Simple

var = 'GrLivArea'
data = pd.concat([df_train['SalePrice'], df_train[var]], axis=1)
# pandas plotting API
data.plot.scatter(x=var, y='SalePrice', ylim=(0, 800000))
# same data with matplotlib directly
plt.scatter(df_train['GrLivArea'], df_train['SalePrice'])

Paired

sns.set()
cols = ['SalePrice', 'OverallQual', 'GrLivArea', 'GarageCars', 'TotalBsmtSF', 'FullBath', 'YearBuilt']
# `size` was renamed to `height` in seaborn 0.9
sns.pairplot(df_train[cols], height=2.5)
plt.show()

Boxplot

var = 'OverallQual'
data = pd.concat([df_train['SalePrice'], df_train[var]], axis=1)
f, ax = plt.subplots(figsize=(8, 6))
# boxplot returns a matplotlib Axes, so draw on ax and set the limits there
sns.boxplot(x=var, y="SalePrice", data=data, ax=ax)
ax.set_ylim(0, 800000)

Correlation matrix

Simple

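# numeric_only=True skips non-numeric columns; pandas >= 2.0 raises on them otherwise
corrmat = df_train.corr(numeric_only=True)
f, ax = plt.subplots(figsize=(12, 9))
sns.heatmap(corrmat, vmax=.8, square=True, ax=ax)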

Zoomed

k = 10  # number of variables for heatmap
# k columns most positively correlated with SalePrice (SalePrice itself included)
cols = corrmat.nlargest(k, 'SalePrice')['SalePrice'].index
cm = np.corrcoef(df_train[cols].values.T)
sns.set(font_scale=1.25)
hm = sns.heatmap(cm, cbar=True, annot=True, square=True, fmt='.2f',
                 annot_kws={'size': 10},
                 yticklabels=cols.values, xticklabels=cols.values)
plt.show()
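
To make the column selection in the zoomed heatmap easier to follow, here is a sketch on a tiny made-up table (the columns a, b, c and their values are hypothetical). nlargest ranks by the signed correlation, so a strongly negative column such as b is left out.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'SalePrice': [1, 2, 3, 4, 5],
    'a': [2, 4, 6, 8, 10],  # correlation with SalePrice: +1.0
    'b': [5, 3, 4, 1, 2],   # correlation with SalePrice: -0.8
    'c': [1, 3, 2, 5, 4],   # correlation with SalePrice: +0.8
})

corrmat = toy.corr()

# Top 3 rows by their correlation with SalePrice (SalePrice itself is among them)
cols = corrmat.nlargest(3, 'SalePrice')['SalePrice'].index
print(list(cols))

# Recomputing on the selected columns matches the corresponding slice of corrmat
# (true here because the toy data has no missing values)
cm = np.corrcoef(toy[cols].values.T)
print(np.round(cm, 2))
```

If negative correlations should count as well, one option is to rank by magnitude instead, for example with corrmat['SalePrice'].abs().nlargest(k).index.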

References

https://www.kaggle.com/pmarcelino/comprehensive-data-exploration-with-python
