@Aylr
Created November 2, 2017 18:51
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# A Quick Tour of the Python Visualization Landscape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import healthcareai\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"%matplotlib inline\n",
"\n",
"diabetes = pd.read_csv('diabetes.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## One Line Data Profiling\n",
"\n",
"- Using the amazing [pandas_profiling](https://github.com/JosPolfliet/pandas-profiling) package.\n",
"- Install: `pip install pandas_profiling`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"import pandas_profiling\n",
"\n",
"pandas_profiling.ProfileReport(diabetes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Real Data Demo\n",
"\n",
"[Boston](profile%20reports/2017-10-28_commercial_data_profile.html)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pandas: Exploring Relationships"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"scatter = pd.plotting.scatter_matrix(diabetes, figsize=(20, 20))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Seaborn: Statistical Visualizations"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"import seaborn as sns\n",
"\n",
"sns.set()\n",
"\n",
"iris = sns.load_dataset(\"iris\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Variable Relationships\n",
"\n",
"#### PairPlot"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.pairplot(iris)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### JointPlot"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.jointplot(\n",
" data=iris,\n",
" x='sepal_width',\n",
" y='sepal_length',\n",
" # seaborn >= 0.9 names this argument `height` (older releases used `size`)\n",
" height=6)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.jointplot(\n",
" data=iris,\n",
" x='sepal_width',\n",
" y='sepal_length',\n",
" height=10,\n",
" kind='kde')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Linear Models"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"g = sns.lmplot(\n",
" data=iris,\n",
" x=\"sepal_length\",\n",
" y=\"sepal_width\",\n",
" hue=\"species\",\n",
" truncate=True,\n",
" height=7)\n",
"\n",
"g.set_axis_labels(\"Sepal length (cm)\", \"Sepal width (cm)\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Categorical Variables"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"box = sns.boxplot(\n",
" data=diabetes,\n",
" x='MaritalStatusCD',\n",
" y='SystolicBPNBR')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"violin = sns.violinplot(\n",
" data=diabetes,\n",
" x='MaritalStatusCD',\n",
" y='A1CNBR')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Factors Across Groups"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.set(style=\"whitegrid\")\n",
"\n",
"# catplot(kind=\"point\") is the seaborn >= 0.9 name for the older factorplot\n",
"g = sns.catplot(x=\"time\", y=\"pulse\", hue=\"kind\", col=\"diet\", data=sns.load_dataset(\"exercise\"),\n",
" kind=\"point\", capsize=.2, palette=\"YlGnBu_d\", height=6, aspect=.75)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Yellowbrick\n",
"\n",
"Yellowbrick is a suite of visual diagnostic tools called “Visualizers” that extend the scikit-learn API to allow human steering of the model selection process.\n",
"\n",
"- The delightful [yellowbrick](http://www.scikit-yb.org/) library.\n",
"- Install: `pip install yellowbrick`\n",
"- Conforms to the familiar scikit-learn API"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Feature Relationships"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = pd.read_csv('bikeshare.csv')\n",
"X = data[[\n",
" \"season\", \"month\", \"hour\", \"holiday\", \"weekday\", \"workingday\",\n",
" \"weather\", \"temp\", \"feelslike\", \"humidity\", \"windspeed\"\n",
"]]\n",
"y = data[\"riders\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from yellowbrick.features import Rank2D, ParallelCoordinates\n",
"\n",
"visualizer = Rank2D(algorithm=\"pearson\")\n",
"visualizer.fit_transform(X)\n",
"visualizer.show()  # named poof() before yellowbrick 1.0"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Parallel Coordinate Plots"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}