Created
November 2, 2017 18:51
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# A Quick Tour of the Python Visualization Landscape" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import healthcareai\n", | |
"import numpy as np\n", | |
"import pandas as pd\n", | |
"\n", | |
"%matplotlib inline\n", | |
"\n", | |
"diabetes = pd.read_csv('diabetes.csv')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## One Line Data Profiling\n", | |
"\n", | |
"- Using the amazing [pandas_profiling](https://github.com/JosPolfliet/pandas-profiling) package.\n", | |
"- Install: `pip install pandas_profiling`" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"scrolled": false | |
}, | |
"outputs": [], | |
"source": [ | |
"import pandas_profiling\n", | |
"\n", | |
"pandas_profiling.ProfileReport(diabetes)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Real Data Demo\n", | |
"\n", | |
"[Boston](profile%20reports/2017-10-28_commercial_data_profile.html)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Pandas - Exploring Relationships" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"scatter = pd.plotting.scatter_matrix(diabetes, figsize=(20, 20))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Seaborn: Statistical Visualizations" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import matplotlib.pyplot as plt\n", | |
"%matplotlib inline\n", | |
"\n", | |
"import seaborn as sns\n", | |
"\n", | |
"sns.set()\n", | |
"\n", | |
"iris = sns.load_dataset(\"iris\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Variable Relationships\n", | |
"\n", | |
"#### PairPlot" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"sns.pairplot(iris)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### JointPlot" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"sns.jointplot(\n", | |
" data=iris,\n", | |
" x='sepal_width',\n", | |
" y='sepal_length',\n", | |
" height=6)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"sns.jointplot(\n", | |
" data=iris,\n", | |
" x='sepal_width',\n", | |
" y='sepal_length',\n", | |
" height=10,\n", | |
" kind='kde')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Linear Models" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"scrolled": false | |
}, | |
"outputs": [], | |
"source": [ | |
"g = sns.lmplot(\n", | |
" data=iris,\n", | |
" x=\"sepal_length\",\n", | |
" y=\"sepal_width\",\n", | |
" hue=\"species\",\n", | |
" truncate=True,\n", | |
" height=7)\n", | |
"\n", | |
"g.set_axis_labels(\"Sepal length (cm)\", \"Sepal width (cm)\")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Categorical Variables" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"box = sns.boxplot(\n", | |
" data=diabetes,\n", | |
" x='MaritalStatusCD',\n", | |
" y='SystolicBPNBR')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"violin = sns.violinplot(\n", | |
" data=diabetes,\n", | |
" x='MaritalStatusCD',\n", | |
" y='A1CNBR')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### Factors Across Groups" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"sns.set(style=\"whitegrid\")\n", | |
"\n", | |
"g = sns.catplot(x=\"time\", y=\"pulse\", hue=\"kind\", col=\"diet\", data=sns.load_dataset(\"exercise\"),\n", | |
" kind=\"point\", capsize=.2, palette=\"YlGnBu_d\", height=6, aspect=.75)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"\n", | |
"## Yellowbrick\n", | |
"\n", | |
"Yellowbrick is a suite of visual diagnostic tools called “Visualizers” that extend the scikit-learn API to allow human steering of the model selection process.\n", | |
"\n", | |
"- The delightful [yellowbrick](http://www.scikit-yb.org/) library.\n", | |
"- Install: `pip install yellowbrick`\n", | |
"- Conforms to the familiar scikit-learn API" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Feature Relationships" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"data = pd.read_csv('bikeshare.csv')\n", | |
"X = data[[\n", | |
" \"season\", \"month\", \"hour\", \"holiday\", \"weekday\", \"workingday\",\n", | |
" \"weather\", \"temp\", \"feelslike\", \"humidity\", \"windspeed\"\n", | |
"]]\n", | |
"y = data[\"riders\"]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"from yellowbrick.features import Rank2D, ParallelCoordinates\n", | |
"\n", | |
"visualizer = Rank2D(algorithm=\"pearson\")\n", | |
"visualizer.fit_transform(X)\n", | |
"visualizer.show()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Parallel Coordinate Plots" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.1" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
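
As a rough illustration of what the `Rank2D(algorithm="pearson")` cell in the notebook above ranks, the sketch below computes pairwise Pearson correlations in plain Python. It is not Yellowbrick's implementation: the helper names `pearson` and `rank_pairs` are hypothetical, and the toy values are made up for illustration rather than taken from `bikeshare.csv`.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_pairs(columns):
    """Return {(name_a, name_b): r} for every unordered pair of columns."""
    names = list(columns)
    return {
        (a, b): pearson(columns[a], columns[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical toy values (assumption), not read from bikeshare.csv.
toy = {
    "temp": [1.0, 2.0, 3.0, 4.0],
    "feelslike": [2.0, 4.0, 6.0, 8.0],
    "humidity": [4.0, 3.0, 2.0, 1.0],
}
scores = rank_pairs(toy)
```

Each value in `scores` lies between -1 and 1. A heatmap such as the one Rank2D draws shows one such score per feature pair, so strongly related features like `temp` and `feelslike` stand out.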