@fonnesbeck
Created January 18, 2014 19:50
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Using Plotly for Interactive and Collaborative Data Visualization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[Plotly](https://plot.ly) is a collaborative data analysis and graphing platform. Plotly's [Scientific Graphing Libraries](https://plot.ly/api) interface Plotly's online graphing tools with the following scientific computing languages:\n",
"\n",
"* Python\n",
"* R\n",
"* Matlab\n",
"* Julia\n",
"\n",
"You can think of Plotly as \"Graphics as a Service\". It generates interactive, publication-quality plots that can be embedded in the locaiton of your choice. You can style them locally with code or via the online interface; plots can be shared publicly or privately with a url, and your graphs are accessible from anywhere.\n",
"\n",
"You can install Plotly on Python via pip:\n",
"\n",
" pip install plotly\n",
" \n",
"or in R via the `devtools` library:\n",
"\n",
" > install.packages(\"devtools\")\n",
" > library(\"devtools\")\n",
" > install_github(\"R-api\", \"plotly\")"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import plotly"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to use Plotly, you need an account. You can sign up using the API, without visiting the website:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# You should replace these values with your own registration information\n",
"reg = plotly.signup(\"foo12345\", \"fake_email_address@vanderbilt.edu\")"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Thanks for signing up to plotly!\n",
"\n",
"Your username is: foo12345\n",
"\n",
"Your temporary password is: la9di. You use this to log into your plotly account at https://plot.ly/plot.\n",
"\n",
"Your API key is: daia9p1ryi. You use this to access your plotly account through the API.\n",
"\n",
"To get started, initialize a plotly object with your username and api_key, e.g. \n",
">>> py = plotly.plotly('foo12345', 'daia9p1ryi')\n",
"Then, make a graph!\n",
">>> res = py.plot([1,2,3],[4,2,1])\n",
"\n",
">>> print(res['url'])\n",
"\n"
]
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The resulting dict contains information including your user name (`un`) and an API key (`api_key`)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"reg"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 3,
"text": [
"{u'api_key': u'daia9p1ryi',\n",
" u'error': u'',\n",
" u'message': u\"Thanks for signing up to plotly!\\n\\nYour username is: foo12345\\n\\nYour temporary password is: la9di. You use this to log into your plotly account at https://plot.ly/plot.\\n\\nYour API key is: daia9p1ryi. You use this to access your plotly account through the API.\\n\\nTo get started, initialize a plotly object with your username and api_key, e.g. \\n>>> py = plotly.plotly('foo12345', 'daia9p1ryi')\\nThen, make a graph!\\n>>> res = py.plot([1,2,3],[4,2,1])\\n\\n>>> print(res['url'])\\n\",\n",
" u'tmp_pw': u'la9di',\n",
" u'un': u'foo12345'}"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Your login information can be used to generate a `plotly` instance."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ply = plotly.plotly(username_or_email=reg['un'], key=reg['api_key'])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The easiest way to get started is to pass a dict including data and other plotting information to the `iplot` method:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"data = {\n",
" 'x': np.random.randn(1000), \n",
" 'y': np.random.randn(1000),\n",
" \"type\": \"scatter\",\n",
" \"name\": \"Random Numbers\",\n",
" 'mode': 'markers'\n",
"}\n",
"\n",
"ply.iplot(data) "
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"High five! You successfuly sent some data to your account on plotly. View your plot in your browser at https://plot.ly/~foo12345/0 or inside your plot.ly account where it is named 'plot from API'\n"
]
},
{
"html": [
"<iframe height=\"500\" id=\"igraph\" scrolling=\"no\" seamless=\"seamless\" src=\"https://plot.ly/~foo12345/0/600/450\" width=\"650\"></iframe>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
"<IPython.core.display.HTML at 0x10b9019d0>"
]
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since plotly uses lists and dicts to generate plots, one can employ Python idioms like list comprehensions to easily generate more substantial output."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ply.iplot([{'y': np.random.randn(10), 'type':'box'} for i in range(20)],\n",
" layout={'showlegend':False})"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<iframe height=\"500\" id=\"igraph\" scrolling=\"no\" seamless=\"seamless\" src=\"https://plot.ly/~foo12345/1/600/450\" width=\"650\"></iframe>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 8,
"text": [
"<IPython.core.display.HTML at 0x10b3a4850>"
]
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Simple Examples\n",
"\n",
"Let's walk through a few simple plots. Many of these are taken from the [Plot.ly API](https://plot.ly/api/) examples.\n",
"\n",
"We first specify the dataset(s). In this case, we will have two series, each on its own y-axis."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data = [\n",
" {\n",
" \"x\":[1,2,3],\n",
" \"y\":[40,50,60],\n",
" \"name\":\"yaxis data\"\n",
" },\n",
" {\n",
" \"x\":[2,3,4],\n",
" \"y\":[4,5,6],\n",
" \"yaxis\":\"y2\",\n",
" \"name\": \"yaxis2 data\"\n",
" } \n",
"]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A separate dicationary takes care of layout specification:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"layout = {\n",
" \"yaxis\":{\n",
" \"title\": \"yaxis title\", \n",
" }, \n",
"\n",
" \"yaxis2\":{\n",
" \"title\": \"yaxis2 title\",\n",
" \"titlefont\":{\n",
" \"color\":\"rgb(148, 103, 189)\"\n",
" },\n",
" \"tickfont\":{\n",
" \"color\":\"rgb(148, 103, 189)\"\n",
" },\n",
" \"overlaying\":\"y\",\n",
" \"side\":\"right\",\n",
" }, \n",
"\n",
" \"title\": \"Double Y Axis Example\",\n",
"}"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 10
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ply.iplot(data, layout=layout)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<iframe height=\"500\" id=\"igraph\" scrolling=\"no\" seamless=\"seamless\" src=\"https://plot.ly/~foo12345/2/600/450\" width=\"650\"></iframe>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
"<IPython.core.display.HTML at 0x10b3a4490>"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can add as many axes as we deem necessary:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"data = [\n",
" {\n",
" \"x\":[1,2,3],\n",
" \"y\":[4,5,6],\n",
" \"name\":\"yaxis1 data\"\n",
" },\n",
" {\n",
" \"x\":[2,3,4],\n",
" \"y\":[40,50,60],\n",
" \"name\":\"yaxis2 data\",\n",
" \"yaxis\":\"y2\" \n",
" },\n",
" {\n",
" \"x\":[3,4,5],\n",
" \"y\":[400,500,600],\n",
" \"name\":\"yaxis3 data\",\n",
" \"yaxis\":\"y3\"\n",
" }\n",
"]\n",
"\n",
"c = ['#1f77b4', # muted blue\n",
" '#ff7f0e', # safety orange\n",
" '#2ca02c'] # cooked asparagus green\n",
"\n",
"layout = {\n",
" \"width\":800,\n",
" \"xaxis\":{\n",
" \"domain\":[0.3,0.7]\n",
" },\n",
" \"yaxis\":{\n",
" \"title\": \"yaxis title\",\n",
" \"titlefont\":{\n",
" \"color\":c[0]\n",
" },\n",
" \"tickfont\":{\n",
" \"color\":c[0]\n",
" },\n",
" },\n",
" \"yaxis2\":{\n",
" \"overlaying\":\"y\",\n",
" \"side\":\"tight\",\n",
" \"anchor\":\"free\",\n",
" \"position\":0.15,\n",
" \n",
" \"title\": \"yaxis2 title\",\n",
" \"titlefont\":{\n",
" \"color\":c[1]\n",
" },\n",
" \"tickfont\":{\n",
" \"color\":c[1]\n",
" },\n",
" },\n",
" \"yaxis3\":{\n",
" \"overlaying\":\"y\",\n",
" \"side\":\"left\",\n",
" \"anchor\":\"free\",\n",
" \"position\":0,\n",
" \n",
" \"title\": \"yaxis3 title\",\n",
" \"titlefont\":{\n",
" \"color\":c[2]\n",
" },\n",
" \"tickfont\":{\n",
" \"color\":c[2]\n",
" },\n",
" },\n",
" \"title\": \"multiple y-axes example\"\n",
"}\n",
"\n",
"ply.iplot(data, layout=layout)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<iframe height=\"500\" id=\"igraph\" scrolling=\"no\" seamless=\"seamless\" src=\"https://plot.ly/~foo12345/4/600/450\" width=\"650\"></iframe>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 14,
"text": [
"<IPython.core.display.HTML at 0x10b4a1d10>"
]
}
],
"prompt_number": 14
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Subplots\n",
"\n",
"It is similarly straightforward to generate subplots. This is done by allocating a proportion of the x or y axis to the `domain` of each subplot. Here is an example that illustrates the use of subplots and multiple y-axes to compare changes in vessel speeds according to a number of speed enforcement programs. In this case, we place a timeline of the enforcement programs on a smaller subplot above a larger subplot containing the estimates of ship behavior."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"'''\n",
"Created January 14, 2014 by Chris Fonnesbeck\n",
"\n",
"Licensed under a Creative Commons Attribution 4.0 International License.\n",
"'''\n",
"\n",
"import pandas as pd\n",
"\n",
"dates = pd.date_range(start='11/1/2008', end='7/31/2012', freq='M')\n",
"\n",
"c = ['#fbb4ae',\n",
" '#b3cde3',\n",
" '#ccebc5',\n",
" '#decbe4'] \n",
"\n",
"band_width = 15\n",
"opacity = 0.8\n",
"\n",
"# Management interventions in seasonal management areas (SMA)\n",
"SMA = [{'name': 'Intermittant, at-sea radio',\n",
" 'x': pd.date_range(start='2/1/2009', end='5/1/2009', freq='M'),\n",
" 'y': ['USCG']*100,\n",
" 'mode':'lines',\n",
" 'line':{'color': c[0], 'width': band_width},\n",
" \"yaxis\": \"y2\", \n",
" 'showlegend': False, \n",
" 'opacity': opacity\n",
" },\n",
" {'name': 'Intermittant, at-sea radio',\n",
" 'x': pd.date_range(start='1/1/2010', end='7/1/2010', freq='M'),\n",
" 'y': ['USCG']*100,\n",
" 'mode':'lines',\n",
" 'line':{'color': c[0], 'width': band_width},\n",
" \"yaxis\": \"y2\", \n",
" 'showlegend': False, \n",
" 'opacity': opacity\n",
" },\n",
" {'name': 'Intermittant, at-sea radio',\n",
" 'x': pd.date_range(start='11/1/2010', end='6/1/2011', freq='M'),\n",
" 'y': ['USCG']*100,\n",
" 'mode':'lines',\n",
" 'line':{'color': c[0], 'width': band_width},\n",
" \"yaxis\": \"y2\", \n",
" 'showlegend': False, \n",
" 'opacity': opacity\n",
" },\n",
" {'name': 'Intermittant, at-sea radio',\n",
" 'x': pd.date_range(start='1/1/2012', end='3/1/2012', freq='M'),\n",
" 'y': ['USCG']*100,\n",
" 'mode':'lines',\n",
" 'line':{'color': c[0], 'width': band_width},\n",
" \"yaxis\": \"y2\", \n",
" 'showlegend': False, \n",
" 'opacity': opacity\n",
" }\n",
" ,{'name': 'Intermittant, letter',\n",
" 'x': pd.date_range(start='10/1/2009', end='12/31/2009', freq='M'),\n",
" 'y': ['COPPS']*100,\n",
" 'mode':'lines',\n",
" 'line':{'color': c[1], 'width': band_width},\n",
" \"yaxis\": \"y2\", \n",
" 'showlegend': False, \n",
" 'opacity': opacity\n",
" },\n",
" {'name': 'Certified mail, ongoing litigation',\n",
" 'x': pd.date_range(start='11/1/2010', end='8/1/2012', freq='M'),\n",
" 'y': ['NOVA']*100,\n",
" 'mode':'lines',\n",
" 'line':{'color': c[2], 'width': band_width},\n",
" \"yaxis\": \"y2\", \n",
" 'showlegend': False, \n",
" 'opacity': opacity\n",
" },\n",
" {'name': 'E-mail, monthly summaries',\n",
" 'x': pd.date_range(start='12/1/2010', end='8/1/2012', freq='M'),\n",
" 'y': ['WSC']*100,\n",
" 'mode':'lines',\n",
" 'line':{'color': c[3], 'width': band_width},\n",
" \"yaxis\": \"y2\", \n",
" 'showlegend': False, \n",
" 'opacity': opacity\n",
" },\n",
" {'name': 'E-mail, monthly summaries',\n",
" 'x': pd.date_range(start='2/1/2011', end='8/1/2012', freq='M'),\n",
" 'y': ['CSA']*100,\n",
" 'mode':'lines',\n",
" 'line':{'color': c[3], 'width': band_width},\n",
" \"yaxis\": \"y2\", \n",
" 'showlegend': False, \n",
" 'opacity': opacity,\n",
" }\n",
"][::-1]\n",
"\n",
"sma_dates = pd.date_range(start='2/1/2009', end='11/1/2012', freq='12M')\n",
"\n",
"sds = 2\n",
"line_width = 2\n",
"line_color = \"rgb(3,78,123)\"\n",
"\n",
"# Parameter estimates from model\n",
"estimates = [{'x': sma_dates, \n",
" 'y': np.array([0, -0.04, 0.16, -0.67]), \n",
" 'name': 'Passenger',\n",
" 'showlegend': True,\n",
" \"line\":{\"color\": line_color, \n",
" \"width\": line_width, \n",
" \"dash\":\"dashdot\"},\n",
" 'error_y': {'type': 'data', \n",
" 'array': np.array([0, 0.08, 0.08, 0.08]) * sds, \n",
" 'visible': True, \n",
" \"color\": line_color}},\n",
" {'x': sma_dates, \n",
" 'y': np.array([0, -0.26, -0.92, -1.35]), \n",
" 'name': 'Cargo',\n",
" 'showlegend': True,\n",
" \"line\":{\"color\": line_color, \n",
" \"width\": line_width, \n",
" \"dash\":\"dot\"},\n",
" 'error_y': {'type': 'data', \n",
" 'array': np.array([0, 0.02, 0.02, 0.02]) * sds, \n",
" 'visible': True, \n",
" \"color\": line_color}},\n",
" {'x': sma_dates, \n",
" 'y': np.array([0, -0.02, -0.41, -0.62]), \n",
" 'name': 'Tanker',\n",
" 'showlegend': True,\n",
" \"line\":{\"color\": line_color, \n",
" \"width\": line_width, \n",
" \"dash\":\"solid\"},\n",
" 'error_y': {'type': 'data', \n",
" 'array': np.array([0, 0.03, 0.03, 0.03]) * sds, \n",
" 'visible': True, \n",
" \"color\": line_color}},]\n",
"\n",
"\n",
"legendstyle = {\"x\" : 0, \n",
" \"y\" : 0, \n",
" \"bgcolor\" : \"#F0F0F0\",\n",
" \"bordercolor\" : \"#FFFFFF\",}\n",
"\n",
"layout = {\n",
" \"yaxis2\":{'showgrid': False,\n",
" 'zeroline': True,\n",
" 'side': 'right',\n",
" \"showticklabels\" : True,\n",
" 'domain': [2./3., 1],\n",
" 'title': 'Program'\n",
" }, \n",
"\n",
" \"yaxis\":{\"title\": \"Speed change from 1st season\",\n",
" 'showgrid': False,\n",
" 'zeroline': True,\n",
" \"zerolinecolor\" : \"#F0F0F0\",\n",
" \"zerolinewidth\" : 4,\n",
" 'mode': 'markers',\n",
" 'domain': [0, 2./3.]\n",
" }, \n",
"\n",
" \"xaxis\": {'showgrid':False,\n",
" 'zeroline':False, \n",
" 'title': 'Date',\n",
" 'range': [dates[0], dates[-1]]},\n",
"\n",
" \"title\": \"Vessel speed change in response to notification programs\",\n",
" 'showlegend': True,\n",
" \"legend\": legendstyle\n",
"}\n",
"\n",
"\n",
"ply.iplot(SMA + estimates, layout=layout) "
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<iframe height=\"500\" id=\"igraph\" scrolling=\"no\" seamless=\"seamless\" src=\"https://plot.ly/~foo12345/5/600/450\" width=\"650\"></iframe>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 15,
"text": [
"<IPython.core.display.HTML at 0x10a160c90>"
]
}
],
"prompt_number": 15
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Titanic Dataset\n",
"\n",
"We can use Pandas to manage larger datasets, and generate visualizations of it using Plotly. Here is a simple boxplot with data points overplotted."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"titanic = pd.read_csv(\"http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.csv\")"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 16
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"titanic.head()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>pclass</th>\n",
" <th>survived</th>\n",
" <th>name</th>\n",
" <th>sex</th>\n",
" <th>age</th>\n",
" <th>sibsp</th>\n",
" <th>parch</th>\n",
" <th>ticket</th>\n",
" <th>fare</th>\n",
" <th>cabin</th>\n",
" <th>embarked</th>\n",
" <th>boat</th>\n",
" <th>body</th>\n",
" <th>home.dest</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 1</td>\n",
" <td> 1</td>\n",
" <td> Allen, Miss. Elisabeth Walton</td>\n",
" <td> female</td>\n",
" <td> 29.00</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 24160</td>\n",
" <td> 211.3375</td>\n",
" <td> B5</td>\n",
" <td> S</td>\n",
" <td> 2</td>\n",
" <td> NaN</td>\n",
" <td> St Louis, MO</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 1</td>\n",
" <td> 1</td>\n",
" <td> Allison, Master. Hudson Trevor</td>\n",
" <td> male</td>\n",
" <td> 0.92</td>\n",
" <td> 1</td>\n",
" <td> 2</td>\n",
" <td> 113781</td>\n",
" <td> 151.5500</td>\n",
" <td> C22 C26</td>\n",
" <td> S</td>\n",
" <td> 11</td>\n",
" <td> NaN</td>\n",
" <td> Montreal, PQ / Chesterville, ON</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 1</td>\n",
" <td> 0</td>\n",
" <td> Allison, Miss. Helen Loraine</td>\n",
" <td> female</td>\n",
" <td> 2.00</td>\n",
" <td> 1</td>\n",
" <td> 2</td>\n",
" <td> 113781</td>\n",
" <td> 151.5500</td>\n",
" <td> C22 C26</td>\n",
" <td> S</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> Montreal, PQ / Chesterville, ON</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 1</td>\n",
" <td> 0</td>\n",
" <td> Allison, Mr. Hudson Joshua Creighton</td>\n",
" <td> male</td>\n",
" <td> 30.00</td>\n",
" <td> 1</td>\n",
" <td> 2</td>\n",
" <td> 113781</td>\n",
" <td> 151.5500</td>\n",
" <td> C22 C26</td>\n",
" <td> S</td>\n",
" <td> NaN</td>\n",
" <td> 135</td>\n",
" <td> Montreal, PQ / Chesterville, ON</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 1</td>\n",
" <td> 0</td>\n",
" <td> Allison, Mrs. Hudson J C (Bessie Waldo Daniels)</td>\n",
" <td> female</td>\n",
" <td> 25.00</td>\n",
" <td> 1</td>\n",
" <td> 2</td>\n",
" <td> 113781</td>\n",
" <td> 151.5500</td>\n",
" <td> C22 C26</td>\n",
" <td> S</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> Montreal, PQ / Chesterville, ON</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows \u00d7 14 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 17,
"text": [
" pclass survived name sex \\\n",
"0 1 1 Allen, Miss. Elisabeth Walton female \n",
"1 1 1 Allison, Master. Hudson Trevor male \n",
"2 1 0 Allison, Miss. Helen Loraine female \n",
"3 1 0 Allison, Mr. Hudson Joshua Creighton male \n",
"4 1 0 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female \n",
"\n",
" age sibsp parch ticket fare cabin embarked boat body \\\n",
"0 29.00 0 0 24160 211.3375 B5 S 2 NaN \n",
"1 0.92 1 2 113781 151.5500 C22 C26 S 11 NaN \n",
"2 2.00 1 2 113781 151.5500 C22 C26 S NaN NaN \n",
"3 30.00 1 2 113781 151.5500 C22 C26 S NaN 135 \n",
"4 25.00 1 2 113781 151.5500 C22 C26 S NaN NaN \n",
"\n",
" home.dest \n",
"0 St Louis, MO \n",
"1 Montreal, PQ / Chesterville, ON \n",
"2 Montreal, PQ / Chesterville, ON \n",
"3 Montreal, PQ / Chesterville, ON \n",
"4 Montreal, PQ / Chesterville, ON \n",
"\n",
"[5 rows x 14 columns]"
]
}
],
"prompt_number": 17
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"age_by_class = [{'y': data.values, \n",
" 'name': pclass,\n",
" 'type': 'box',\n",
" 'boxpoints': 'all', \n",
" 'jitter': 0.3} for pclass,data in list(titanic.groupby('pclass')['age'])]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 18
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"layout = {'xaxis': {'showgrid':False,'zeroline':False, \n",
" 'title': 'Passenger class'},\n",
" 'yaxis': {'zeroline':False,'gridcolor':'white', 'title': 'Age'},\n",
" 'plot_bgcolor': 'rgb(233,233,233)',\n",
" 'showlegend':False}"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 19
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ply.iplot(age_by_class, layout=layout)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<iframe height=\"500\" id=\"igraph\" scrolling=\"no\" seamless=\"seamless\" src=\"https://plot.ly/~foo12345/6/600/450\" width=\"650\"></iframe>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 20,
"text": [
"<IPython.core.display.HTML at 0x10b4a1110>"
]
}
],
"prompt_number": 20
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotly from R\n",
"\n",
"Along with the Python API, Plotly includes API for several other languages suitable for statistical analysis, such as R and Julia. In the IPython notebook, we can run R code via the [R magic command](http://ipython.org/ipython-doc/dev/config/extensions/rmagic.html), which gives us access to R's rich library of modules without having to open a separate R session."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%load_ext rmagic"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 22
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%R -o hist_url\n",
"\n",
"library(plotly)\n",
"\n",
"py <- plotly('foo12345', key='daia9p1ryi')\n",
"\n",
"# Sample data\n",
"samples <- rnorm(50)\n",
"\n",
"# Normal density\n",
"x_norm <- seq(-5,5,length=100)\n",
"y_norm <- 1./sqrt(2*pi)*exp(-x_norm**2/2.)\n",
"\n",
"# layout\n",
"l <- list(showlegend = FALSE, \n",
" xaxis = list(zeroline = FALSE),\n",
" yaxis = list(zeroline = FALSE))\n",
"\n",
"# Histogram data\n",
"dataHistogram <- list(y = samples, \n",
" type = 'histogramy',\n",
" histnorm = 'probability density')\n",
"\n",
"# Curve data\n",
"dataArea <- list(x = x_norm,\n",
" y = y_norm,\n",
" fill = 'tozeroy')\n",
"\n",
"# Call plotly\n",
"response <- py$plotly(list(dataHistogram, dataArea), kwargs = list(layout = l))\n",
"\n",
"# url and filename\n",
"hist_url <- response$url"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "display_data",
"text": [
"Loading required package: RCurl\n",
"Loading required package: bitops\n",
"Loading required package: RJSONIO\n"
]
}
],
"prompt_number": 29
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To view the resulting plot in IPython notebook, we can use the URL returned by plotly and place it within an HTML object."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"hist_url"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 30,
"text": [
"array(['https://plot.ly/~foo12345/7'], \n",
" dtype='|S27')"
]
}
],
"prompt_number": 30
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from IPython.display import HTML\n",
"HTML('<iframe src={0} width=700 height=500></iframe>'.format(hist_url[0]))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<iframe src=https://plot.ly/~foo12345/7 width=700 height=500></iframe>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 32,
"text": [
"<IPython.core.display.HTML at 0x10e8e0950>"
]
}
],
"prompt_number": 32
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Displaying MCMC output\n",
"\n",
"Running this example requires that PyMC 2.3 be installed.\n",
"\n",
"In Bayesian modeling using Markov chain Monte Carlo (MCMC) methods, we make inferences based on samples drawn from the posterior distribution of unknown variables in our model. Therefore, it is useful to plot these samples to visually assess whether the model has converged.\n",
"\n",
"PyMC includes an example dataset, which is a time series of recorded coal mining\n",
"disasters in the UK from 1851 to 1962."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from pymc.examples import disaster_model\n",
"from pymc import MCMC, graph\n",
"\n",
"M = MCMC(disaster_model)\n",
"\n",
"ply.iplot({'y': disaster_model.disasters_array, \n",
" 'x': range(1851, 1963),\n",
" \"type\": \"scatter\", \n",
" \"mode\": \"lines\", \n",
" \"name\": \"UK coal mining disasters, 1851-1962\"})"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<iframe height=\"500\" id=\"igraph\" scrolling=\"no\" seamless=\"seamless\" src=\"https://plot.ly/~foo12345/11/600/450\" width=\"650\"></iframe>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 47,
"text": [
"<IPython.core.display.HTML at 0x115df2d10>"
]
}
],
"prompt_number": 47
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Occurrences of disasters in the time series is thought to be derived from a\n",
"Poisson process with a large rate parameter in the early part of the time\n",
"series, and from one with a smaller rate in the later part. We are interested\n",
"in locating the change point in the series, which perhaps is related to changes\n",
"in mining safety regulations."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"nchains = 3\n",
"for i in range(nchains):\n",
" M.sample(5000, progress_bar=False)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 41
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two simple visualizations is to look at the **trace**, essentially a time series of the samples, and the histogram of the samples."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"trace = M.early_mean.trace()\n",
"\n",
"data = [{'y': trace}]\n",
"data.append({'y': trace,\n",
" 'xaxis': 'x2',\n",
" 'yaxis': 'y2',\n",
" 'type': 'histogramy'})\n",
"\n",
"layout = {\n",
" \"xaxis\":{\n",
" \"domain\":[0,0.5],\n",
" \"title\": \"Iteration\"\n",
" },\n",
" \"yaxis\":{\n",
" \"title\": \"Value\"\n",
" },\n",
" \"xaxis2\":{\n",
" \"domain\":[0.55,1],\n",
" \"title\": \"Value\"\n",
" },\n",
" \"yaxis2\":{\n",
" \"anchor\":\"x2\",\n",
" \"side\": \"right\",\n",
" \"title\": \"Frequency\"\n",
" },\n",
" \"showlegend\": False,\n",
" \"title\": \"Posterior samples of early Poisson mean\"\n",
"}\n",
"\n",
"ply.iplot(data, layout=layout, width=850,height=400)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<iframe height=\"450\" id=\"igraph\" scrolling=\"no\" seamless=\"seamless\" src=\"https://plot.ly/~foo12345/8/850/400\" width=\"900\"></iframe>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 43,
"text": [
"<IPython.core.display.HTML at 0x1127b3f90>"
]
}
],
"prompt_number": 43
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This shows that the chain is relatively homogeneous (the mean and variance of the samples do not appear to change much over the course of sampling).\n",
"\n",
"However, stronger evidence can be obtained by comparing multiple idependent samples, which is why we ran three chains above. We can use Plotly to compare samples from each of the chains."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"nparams = len(M.stochastics)\n",
"mcmc_data = {'chain {0}'.format(i): {s.__name__:s.trace(chain=i)[:100] for s in M.stochastics} for i in range(nchains)}\n",
"\n",
"attr = mcmc_data['chain 1'].keys()\n",
"colors = {'chain 0': 'rgb(31, 119, 180)', \n",
" 'chain 1': 'rgb(255, 127, 14)',\n",
" 'chain 2': 'rgb(44, 160, 44)'}\n",
"\n",
"data = []\n",
"for i in range(nparams):\n",
" for j in range(nparams):\n",
" for chain in mcmc_data.keys():\n",
" data.append({\"name\": chain, \n",
" \"x\": mcmc_data[chain][attr[i]], \"y\": mcmc_data[chain][attr[j]],\n",
" \"type\":\"scatter\",\"mode\":\"markers\",\n",
" 'marker': {'color': colors[chain], 'opacity':0.2},\n",
" \"xaxis\": \"x\"+(str(i) if i!=0 else ''), \"yaxis\": \"y\"+(str(j) if j!=0 else '')})\n",
"padding = 0.04;\n",
"domains = [[i*padding + i*(1-3*padding)/nparams, i*padding + ((i+1)*(1-3*padding)/nparams)] for i in range(nparams)]\n",
"\n",
"layout = {\n",
" \"xaxis\":{\"domain\":domains[0], \"title\":attr[0], \n",
" 'zeroline':False,'showline':False,'ticks':'', \n",
" 'titlefont':{'color': \"rgb(67, 67, 67)\"},'tickfont':{'color': 'rgb(102,102,102)'}},\n",
" \"yaxis\":{\"domain\":domains[0], \"title\":attr[0], \n",
" 'zeroline':False,'showline':False,'ticks':'', \n",
" 'titlefont':{'color': \"rgb(67, 67, 67)\"},'tickfont':{'color': 'rgb(102,102,102)'}},\n",
" \"xaxis1\":{\"domain\":domains[1], \"title\":attr[1], \n",
" 'zeroline':False,'showline':False,'ticks':'', \n",
" 'titlefont':{'color': \"rgb(67, 67, 67)\"},'tickfont':{'color': 'rgb(102,102,102)'}},\n",
" \"yaxis1\":{\"domain\":domains[1], \"title\":attr[1], \n",
" 'zeroline':False,'showline':False,'ticks':'', \n",
" 'titlefont':{'color': \"rgb(67, 67, 67)\"},'tickfont':{'color': 'rgb(102,102,102)'}},\n",
" \"xaxis2\":{\"domain\":domains[2], \"title\":attr[2], \n",
" 'zeroline':False,'showline':False,'ticks':'', \n",
" 'titlefont':{'color': \"rgb(67, 67, 67)\"},'tickfont':{'color': 'rgb(102,102,102)'}},\n",
" \"yaxis2\":{\"domain\":domains[2], \"title\":attr[2], \n",
" 'zeroline':False,'showline':False,'ticks':'', \n",
" 'titlefont':{'color': \"rgb(67, 67, 67)\"},'tickfont':{'color': 'rgb(102,102,102)'}},\n",
" \n",
" \"showlegend\":False,\n",
" \"title\":\"Posterior samples for coal mining disasters model\",\n",
" \"titlefont\":{'color':'rgb(67,67,67)', 'size': 20}\n",
" }\n",
"\n",
"ply.iplot(data,layout=layout)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<iframe height=\"500\" id=\"igraph\" scrolling=\"no\" seamless=\"seamless\" src=\"https://plot.ly/~foo12345/9/600/450\" width=\"650\"></iframe>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 45,
"text": [
"<IPython.core.display.HTML at 0x114349dd0>"
]
}
],
"prompt_number": 45
}
],
"metadata": {}
}
]
}