Aylr/2017-11-02_Lighting_Data_Vizualization.ipynb

## 2017-11-02_Lighting_Data_Vizualization.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A Quick Tour of the Python Visualization Landscape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import healthcareai\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "%matplotlib inline\n",
    "\n",
    "diabetes = pd.read_csv('diabetes.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## One Line Data Profiling\n",
    "\n",
    "- Using the amazing [pandas_profiling](https://github.com/JosPolfliet/pandas-profiling) package.\n",
    "- Install: `pip install pandas_profiling`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "import pandas_profiling\n",
    "\n",
    "pandas_profiling.ProfileReport(diabetes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Real Data Demo\n",
    "\n",
    "[Boston](profile%20reports/2017-10-28_commercial_data_profile.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pandas - Exploring Relationships"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "scatter = pd.plotting.scatter_matrix(diabetes, figsize=(20, 20))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Seaborn: Statistical Visualizations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "\n",
    "import seaborn as sns\n",
    "\n",
    "sns.set()\n",
    "\n",
    "iris = sns.load_dataset(\"iris\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Variable Relationships\n",
    "\n",
    "#### PairPlot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.pairplot(iris)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### JointPlot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# `height` replaces the older `size` argument (renamed in seaborn 0.9)\n",
    "sns.jointplot(\n",
    "    data=iris,\n",
    "    x='sepal_width',\n",
    "    y='sepal_length',\n",
    "    height=6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Same joint plot, drawn as a kernel density estimate\n",
    "sns.jointplot(\n",
    "    data=iris,\n",
    "    x='sepal_width',\n",
    "    y='sepal_length',\n",
    "    height=10,\n",
    "    kind='kde')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Linear Models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# One regression line per species; `height` replaces `size` (seaborn 0.9+)\n",
    "g = sns.lmplot(\n",
    "    data=iris,\n",
    "    x=\"sepal_length\",\n",
    "    y=\"sepal_width\",\n",
    "    hue=\"species\",\n",
    "    truncate=True,\n",
    "    height=7)\n",
    "\n",
    "# The iris measurements are recorded in centimetres\n",
    "g.set_axis_labels(\"Sepal length (cm)\", \"Sepal width (cm)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Categorical Variables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "box = sns.boxplot(\n",
    "    data=diabetes,\n",
    "    x='MaritalStatusCD',\n",
    "    y='SystolicBPNBR')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "violin = sns.violinplot(\n",
    "    data=diabetes,\n",
    "    x='MaritalStatusCD',\n",
    "    y='A1CNBR')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Factors Across Groups"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.set(style=\"whitegrid\")\n",
    "\n",
    "# `factorplot` was renamed `catplot` in seaborn 0.9; kind=\"point\" keeps the original plot type\n",
    "g = sns.catplot(x=\"time\", y=\"pulse\", hue=\"kind\", col=\"diet\", data=sns.load_dataset(\"exercise\"),\n",
    "                kind=\"point\", capsize=.2, palette=\"YlGnBu_d\", height=6, aspect=.75)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "## Yellowbrick\n",
    "\n",
    "Yellowbrick is a suite of visual diagnostic tools called “Visualizers” that extend the Scikit-Learn API to allow human steering of the model selection process.\n",
    "\n",
    "- The delightful [yellowbrick](http://www.scikit-yb.org/) library.\n",
    "- Install: `pip install yellowbrick`\n",
    "- Conforms to the familiar scikit-learn API."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Feature Relationships"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = pd.read_csv('bikeshare.csv')\n",
    "X = data[[\n",
    "    \"season\", \"month\", \"hour\", \"holiday\", \"weekday\", \"workingday\",\n",
    "    \"weather\", \"temp\", \"feelslike\", \"humidity\", \"windspeed\"\n",
    "]]\n",
    "y = data[\"riders\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from yellowbrick.features import Rank2D, ParallelCoordinates\n",
    "\n",
    "# Rank pairs of features by their Pearson correlation\n",
    "visualizer = Rank2D(algorithm=\"pearson\")\n",
    "visualizer.fit_transform(X)\n",
    "# `show()` replaces `poof()` from Yellowbrick 1.0 onward\n",
    "visualizer.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Parallel Coordinate Plots"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}