@zgulde
Created May 24, 2022 12:59
{
"cells": [
{
"cell_type": "markdown",
"id": "790dd8c6",
"metadata": {},
"source": [
"# Plotly\n",
"\n",
"- plotly is a suite of open source libraries + commercial software\n",
"- libraries for multiple programming languages including python\n",
"- data viz\n",
"- interactive applications\n",
"- commercial offerings focus on interactive applications + hosting\n",
"- high level interface: plotly.express\n",
"- low level object interface: graph_objects\n",
"- uniform API similar to, but not quite the same as, seaborn (tidy data)\n",
"- outputs HTML in a notebook\n",
"\n",
"```shell\n",
"pip install plotly\n",
"```"
]
},
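{
"cell_type": "markdown",
"id": "a7c3e1f0",
"metadata": {},
"source": [
"To make the two interfaces listed above concrete, here is a minimal sketch that builds the same scatter plot twice: once with `plotly.express` and once with `graph_objects`. The toy dataframe and the variable names are made up for illustration.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"import plotly.express as px\n",
"import plotly.graph_objects as go\n",
"\n",
"# Hypothetical toy data, only for illustration\n",
"toy = pd.DataFrame({'x': [1, 2, 3], 'y': [2, 4, 8]})\n",
"\n",
"# High level: one call, column names refer to the dataframe\n",
"fig_px = px.scatter(toy, x='x', y='y')\n",
"\n",
"# Low level: build the trace and the figure by hand\n",
"fig_go = go.Figure(go.Scatter(x=toy['x'], y=toy['y'], mode='markers'))\n",
"\n",
"# Both calls produce a plotly Figure object\n",
"print(type(fig_px).__name__, type(fig_go).__name__)\n",
"```\n",
"\n",
"Either way the result is a `Figure`, so methods such as `update_layout` should behave the same regardless of which interface built it."
]
},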
{
"cell_type": "code",
"execution_count": null,
"id": "0366793e",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import plotly.express as px"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5ef03d63",
"metadata": {},
"outputs": [],
"source": [
"df = px.data.tips()"
]
},
{
"cell_type": "markdown",
"id": "2119e206",
"metadata": {},
"source": [
"## Continuous and Categorical"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9811d0ed",
"metadata": {},
"outputs": [],
"source": [
"px.box(df, y='tip', x='time')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "52e573b8",
"metadata": {},
"outputs": [],
"source": [
"px.violin(df, y='time', x='total_bill')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0face30f",
"metadata": {},
"outputs": [],
"source": [
"# NB we have to aggregate, plotly won't do it for us like seaborn\n",
"tips_by_day = df.groupby('day').tip.mean()\n",
"px.bar(tips_by_day)"
]
},
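{
"cell_type": "markdown",
"id": "b3f1a9c2",
"metadata": {},
"source": [
"The note about aggregating applies to `px.bar`. As a possible alternative, `px.histogram` can aggregate through its `histfunc` argument. The sketch below assumes the same tips dataset and is one option, not a replacement for the explicit `groupby` above.\n",
"\n",
"```python\n",
"import plotly.express as px\n",
"\n",
"tips = px.data.tips()\n",
"\n",
"# px.histogram bins by x and applies histfunc to y, so no groupby is needed\n",
"fig = px.histogram(tips, x='day', y='tip', histfunc='avg')\n",
"fig.update_layout(yaxis_title='Average tip')\n",
"```\n",
"\n",
"The explicit `groupby` keeps the aggregated numbers available as a dataframe, which is handy when you also want to inspect or reuse them."
]
},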
{
"cell_type": "code",
"execution_count": null,
"id": "1e32cb9f",
"metadata": {},
"outputs": [],
"source": [
"tips_by_day_and_time = df.groupby(['day', 'time'], as_index=False).tip.mean()\n",
"px.bar(tips_by_day_and_time, y='tip', x='day', color='time', barmode='group')"
]
},
{
"cell_type": "markdown",
"id": "0b6aba80",
"metadata": {},
"source": [
"### Treemaps\n",
"\n",
"Treemaps are usually only useful for sums, where each rectangle's area shows a category's share of the whole."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "be8f046f",
"metadata": {},
"outputs": [],
"source": [
"px.treemap(df, values='total_bill', path=['day'])"
]
},
{
"cell_type": "markdown",
"id": "765a4cdf",
"metadata": {},
"source": [
"## Heatmaps"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9b3cabc9",
"metadata": {},
"outputs": [],
"source": [
"ctab = pd.crosstab(df.time, df['size'])\n",
"px.imshow(ctab, color_continuous_scale=['white', 'green'], text_auto=True)"
]
},
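{
"cell_type": "code",
"execution_count": null,
"id": "08fc2a3a",
"metadata": {},
"outputs": [],
"source": [
"# Drop the text species column as well: pandas 2.0+ raises if corr() sees non-numeric data\n",
"correlation_table = px.data.iris().drop(columns=['species', 'species_id']).corr()\n",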
"px.imshow(\n",
" correlation_table,\n",
" zmin=-1, zmax=1,\n",
" color_continuous_scale=['red', 'white', 'green'],\n",
" text_auto=True,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "274b41d1",
"metadata": {},
"source": [
"## Continuous and Continuous"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d5b4c381",
"metadata": {},
"outputs": [],
"source": [
"px.scatter(df, y='tip', x='total_bill')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "56b225f1",
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(123)\n",
"\n",
"ts_df = pd.DataFrame({\n",
" 'x': pd.date_range('2022', freq='D', periods=100),\n",
" 'y': np.random.randn(100).cumsum(),\n",
"})\n",
"px.line(ts_df, x='x', y='y')"
]
},
{
"cell_type": "markdown",
"id": "97aa87fe",
"metadata": {},
"source": [
"## Adding Dimensions\n",
"\n",
"- color, symbol, size\n",
"- facet"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "56dfdbc8",
"metadata": {},
"outputs": [],
"source": [
"px.scatter(df, y='tip', x='total_bill', color='time')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4a75663d",
"metadata": {},
"outputs": [],
"source": [
"px.scatter(df, y='tip', x='total_bill', symbol='smoker', size='size')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5a6c9563",
"metadata": {},
"outputs": [],
"source": [
"px.scatter(df, y='tip', x='total_bill', facet_col='day', facet_row='time')"
]
},
{
"cell_type": "markdown",
"id": "5a8aea03",
"metadata": {},
"source": [
"## Customizing Figures"
]
},
{
"cell_type": "markdown",
"id": "62782c2d",
"metadata": {},
"source": [
"### Titles and Labels"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7e3bb199",
"metadata": {},
"outputs": [],
"source": [
"fig = px.scatter(df, y='tip', x='total_bill')\n",
"fig.update_layout(xaxis_title='Total Bill ($)', yaxis_title='Tip Amount ($)', title='Tip vs Total Bill')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "373bfa23",
"metadata": {},
"outputs": [],
"source": [
"# Alternatively...\n",
"fig = px.scatter(df, y='tip', x='total_bill')\n",
"fig.layout.xaxis.title = 'Total Bill ($)'\n",
"fig"
]
},
{
"cell_type": "markdown",
"id": "39748aab",
"metadata": {},
"source": [
"### Horizontal and Vertical Lines"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3ddae59f",
"metadata": {},
"outputs": [],
"source": [
"fig = px.scatter(df, y='tip', x='total_bill')\n",
"fig.add_vline(\n",
" df.total_bill.mean(), line_dash='dot', opacity=.7,\n",
" annotation_text=f'Average Total Bill: ${df.total_bill.mean():.2f}',\n",
" annotation_position='top right'\n",
")\n",
"fig.add_hline(\n",
" df.tip.mean(), line_dash='dot', opacity=.7,\n",
" annotation_text=f'Average Tip: ${df.tip.mean():.2f}',\n",
" annotation_position='bottom right'\n",
")"
]
},
{
"cell_type": "markdown",
"id": "d78c18db",
"metadata": {},
"source": [
"### Axis Ticks"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c302703a",
"metadata": {},
"outputs": [],
"source": [
"fig = px.scatter(df, y='tip', x='total_bill')\n",
"fig.update_layout(\n",
" xaxis_tickmode='array', xaxis_tickvals=[10, 20, 22.5, 40],\n",
" yaxis_tickmode='linear', yaxis_tick0=1, yaxis_dtick=0.5,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "5072c381",
"metadata": {},
"source": [
"### Axis Limits"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c82de6bd",
"metadata": {},
"outputs": [],
"source": [
"fig = px.scatter(df, y='tip', x='total_bill')\n",
"fig.update_layout(xaxis_range=[10, 25], yaxis_range=[0, 8])"
]
},
{
"cell_type": "markdown",
"id": "b72cb384",
"metadata": {},
"source": [
"### Annotations"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "25d6a362",
"metadata": {},
"outputs": [],
"source": [
"fig = px.scatter(df, y='tip', x='total_bill')\n",
"fig.add_annotation(\n",
" x=df.total_bill.max(), y=df.tip.max(),\n",
" ayref='y', ay=9, axref='x', ax=55,\n",
" text='Highest Tip <br />and Total Bill'\n",
")"
]
},
{
"cell_type": "markdown",
"id": "2ab79116",
"metadata": {},
"source": [
"### Hover Text"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db6e39df",
"metadata": {},
"outputs": [],
"source": [
"fig = px.scatter(df, y='tip', x='total_bill', hover_name='time')\n",
"fig"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "596040cb",
"metadata": {},
"outputs": [],
"source": [
"fig = px.scatter(df, y='tip', x='total_bill', hover_data=['day', 'time'])\n",
"fig"
]
},
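{
"cell_type": "markdown",
"id": "2bbac231",
"metadata": {},
"source": [
"## Additional Features\n",
"\n",
"### Saving Figures\n",
"\n",
"You can always just take a screenshot (command + shift + 5 on macOS) or click the \"download plot as png\" button in the figure's toolbar. To save from code, note that `write_image` needs the `kaleido` package installed, while `write_html` has no extra dependency."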
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2682a774",
"metadata": {},
"outputs": [],
"source": [
"fig = px.scatter(df, y='tip', x='total_bill')\n",
"fig.write_image('scatter_tip_total_bill.png')\n",
"fig.write_html('scatter_tip_total_bill.html')"
]
},
{
"cell_type": "markdown",
"id": "6c3b58ea",
"metadata": {},
"source": [
"Note that the HTML file embeds both your data and the plotly.js library, so it can be quite large!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c156bd47",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"png_size = os.path.getsize('scatter_tip_total_bill.png')\n",
"html_size = os.path.getsize('scatter_tip_total_bill.html')\n",
"\n",
"print(f'''\n",
"PNG file size = {png_size / 1024:7.2f} K ({png_size / 1024 / 1024:.2f} M)\n",
"HTML file size = {html_size / 1024:.2f} K ({html_size / 1024 / 1024:.2f} M)\n",
"'''.strip())"
]
},
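{
"cell_type": "markdown",
"id": "c9d2e4b7",
"metadata": {},
"source": [
"If the HTML size is a concern, one option is the `include_plotlyjs` argument of `write_html`. The sketch below writes to a temporary directory (the file names are made up for illustration) and compares the default output with `include_plotlyjs='cdn'`, which loads plotly.js from a CDN instead of embedding it.\n",
"\n",
"```python\n",
"import os\n",
"import tempfile\n",
"\n",
"import plotly.express as px\n",
"\n",
"fig = px.scatter(px.data.tips(), y='tip', x='total_bill')\n",
"\n",
"with tempfile.TemporaryDirectory() as tmp:\n",
"    full_path = os.path.join(tmp, 'full.html')\n",
"    cdn_path = os.path.join(tmp, 'cdn.html')\n",
"    # Default: the whole plotly.js bundle is embedded in the file\n",
"    fig.write_html(full_path)\n",
"    # 'cdn': the file only contains a script tag pointing at the library\n",
"    fig.write_html(cdn_path, include_plotlyjs='cdn')\n",
"    full_size = os.path.getsize(full_path)\n",
"    cdn_size = os.path.getsize(cdn_path)\n",
"\n",
"print(f'embedded: {full_size / 1024:.0f} K, cdn: {cdn_size / 1024:.0f} K')\n",
"```\n",
"\n",
"The trade-off is that the CDN version needs an internet connection to render, so the embedded default is safer for files you plan to open offline."
]
},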
{
"cell_type": "markdown",
"id": "2d8e449b",
"metadata": {},
"source": [
"### Pandas Plotting Backend\n",
"\n",
"Once the pandas plotting backend is set to plotly, any `.plot` call returns a plotly figure instead of a matplotlib one."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9bca141c",
"metadata": {},
"outputs": [],
"source": [
"pd.options.plotting.backend = 'plotly'\n",
"\n",
"df.plot.scatter(y='tip', x='total_bill')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "45fc7d98",
"metadata": {},
"outputs": [],
"source": [
"import pydataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e7ad483",
"metadata": {},
"outputs": [],
"source": [
"pydataset.data().sample(20)"
]
},
{
"cell_type": "markdown",
"id": "3a8cdee3",
"metadata": {},
"source": [
"## Exercise\n",
"\n",
"1. Use the code snippet below to get you started with a dataset of various characteristics of Scooby-Doo episodes:\n",
"\n",
" ```python\n",
" df = pd.read_csv('https://github.com/rfordatascience/tidytuesday/raw/master/data/2021/2021-07-13/scoobydoo.csv')\n",
" ```\n",
"\n",
"1. Do episodes where the monster is an animal or ghost have higher IMDb ratings?\n",
"1. Does the number of \"zoinks\" correlate with the number of \"jinkies\"? Does whether or not the episode contains a door gag affect this?\n",
"1. Does the setting terrain affect the IMDb rating of an episode? What if you take into account whether or not Scrappy-Doo was in the episode?\n",
"1. Does the number of monsters correlate with the number of \"jeepers\"? Does this vary by network?\n",
"1. Use plotly express to continue exploring the Scooby-Doo episode dataset.\n",
"\n",
"---\n",
"\n",
"1. Download the Kickstarter dataset from Kaggle: https://www.kaggle.com/datasets/kemical/kickstarter-projects?select=ks-projects-201801.csv\n",
"1. Visualize the relationship between the goal and pledged amount by category.\n",
"1. Visualize the percentage of successful projects by category. How does the number of backers affect this?\n",
"1. Visualize the number of successful projects over time.\n",
"1. What is the relationship between campaign length (deadline - launch date) and number of backers? How does this vary between successful and failed projects?\n",
"1. Use plotly express to further explore the Kickstarter dataset."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}