Skip to content

Instantly share code, notes, and snippets.

@saxenaiway
Created April 24, 2021 07:55
Show Gist options
  • Save saxenaiway/1217810b86b52bca8dcd217206a8c2de to your computer and use it in GitHub Desktop.
Save saxenaiway/1217810b86b52bca8dcd217206a8c2de to your computer and use it in GitHub Desktop.
Created on Skills Network Labs
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<center>\n",
" <img src=\"https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/labs/Module%204/logo.png\" width=\"300\" alt=\"cognitiveclass.ai logo\" />\n",
"</center>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Basic Plotly Charts\n",
"\n",
"Estimated time needed: 30 minutes\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Objectives\n",
"\n",
"In this lab, you will learn about creating plotly charts using plotly.graph_objects and plotly.express.\n",
"\n",
"Learn more about:\n",
"\n",
"- [Plotly python](https://plotly.com/python?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork-20297740&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)\n",
"- [Plotly Graph Objects](https://plotly.com/python/graph-objects?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork-20297740&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ) \n",
"- [Plotly Express](https://plotly.com/python/plotly-express?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork-20297740&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)\n",
"- Handling data using [Pandas](https://pandas.pydata.org?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork-20297740&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)\n",
"\n",
"We will be using the [airline dataset](https://developer.ibm.com/exchanges/data/all/airline?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork-20297740&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ) from [Data Asset eXchange](https://developer.ibm.com/exchanges/data/).\n",
"\n",
"#### Airline Reporting Carrier On-Time Performance Dataset\n",
"\n",
"The Reporting Carrier On-Time Performance Dataset contains information on approximately 200 million domestic US flights reported to the United States Bureau of Transportation Statistics. The dataset contains basic information about each flight (such as date, time, departure airport, arrival airport) and, if applicable, the amount of time the flight was delayed and information about the reason for the delay. This dataset can be used to predict the likelihood of a flight arriving on time.\n",
"\n",
"Preview data, dataset metadata, and data glossary [here.](https://dax-cdn.cdn.appdomain.cloud/dax-airline/1.0.1/data-preview/index.html)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import required libraries\n",
"import pandas as pd\n",
"import plotly.express as px\n",
"import plotly.graph_objects as go"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Read Data\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Read the airline data into pandas dataframe\n",
"airline_data = pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/airline_data.csv', \n",
" encoding = \"ISO-8859-1\",\n",
" dtype={'Div1Airport': str, 'Div1TailNum': str, \n",
" 'Div2Airport': str, 'Div2TailNum': str})"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Preview the first 5 lines of the loaded data \n",
"airline_data.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Shape of the data\n",
"airline_data.shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Randomly sample 500 data points. Setting the random state to be 42 so that we get same result.\n",
"data = airline_data.sample(n=500, random_state=42)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get the shape of the trimmed data\n",
"data.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Lab structure\n",
"\n",
"#### plotly.graph_objects\n",
"\n",
"1. Review scatter plot creation\n",
"\n",
" Theme: How departure time changes with respect to airport distance\n",
"2. **To do** - Create line plot\n",
"\n",
" Theme: Extract average monthly delay time and see how it changes over the year\n",
"\n",
"#### plotly.express\n",
"\n",
"1. Review bar chart creation\n",
"\n",
" Theme: Extract number of flights from a specific airline that goes to a destination\n",
"2. **To do** - Create bubble chart\n",
"\n",
" Theme: Get number of flights as per reporting airline\n",
"3. **To do** - Create histogram \n",
"\n",
" Theme: Get distribution of arrival delay\n",
"4. Review pie chart\n",
"\n",
" Theme: Proportion of distance group by month (month indicated by numbers)\n",
"5. **To do** - Create sunburst chart\n",
"\n",
" Theme: Hierarchical view in othe order of month and destination state holding value of number of flights\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# plotly.graph_objects¶\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Scatter Plot\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Learn more about usage of scatter plot [here](https://plotly.com/python/line-and-scatter?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork-20297740&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork-20297740&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)\n",
"\n",
"#### Idea: How departure time changes with respect to airport distance\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# First we create a figure using go.Figure and adding trace to it through go.scatter\n",
"fig = go.Figure(data=go.Scatter(x=data['Distance'], y=data['DepTime'], mode='markers', marker=dict(color='red')))\n",
"# Updating layout through `update_layout`. Here we are adding title to the plot and providing title to x and y axis.\n",
"fig.update_layout(title='Distance vs Departure Time', xaxis_title='Distance', yaxis_title='DepTime')\n",
"# Display the figure\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Line Plot\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Learn more about line plot [here](https://plotly.com/python/line-charts?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork-20297740&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork-20297740&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)\n",
"\n",
"#### Idea: Extract average monthly arrival delay time and see how it changes over the year.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Group the data by Month and compute average over arrival delay time.\n",
"line_data = data.groupby('Month')['ArrDelay'].mean().reset_index()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Display the data\n",
"line_data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### To do:\n",
"\n",
"- Create a line plot with x-axis being the month and y-axis being computed average delay time. Update plot title, \n",
" xaxis, and yaxis title.\n",
"- Hint: Scatter and line plot vary by updating mode parameter.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create line plot here\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Double-click **here** for the solution.\n",
"\n",
"<!-- The answer is below:\n",
"\n",
"fig = go.Figure(data=go.Scatter(x=line_data['Month'], y=line_data['ArrDelay'], mode='lines', marker=dict(color='green')))\n",
"fig.update_layout(title='Month vs Average Flight Delay Time', xaxis_title='Month', yaxis_title='ArrDelay')\n",
"fig.show()\n",
"\n",
"-->\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# plotly.express¶\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Bar Chart\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Learn more about bar chart [here](https://plotly.com/python/bar-charts?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork-20297740&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork-20297740&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)\n",
"\n",
"#### Idea: Extract number of flights from a specific airline that goes to a destination\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Group the data by destination state and reporting airline. Compute total number of flights in each combination\n",
"bar_data = data.groupby(['DestState'])['Flights'].sum().reset_index()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Display the data\n",
"bar_data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Use plotly express bar chart function px.bar. Provide input data, x and y axis variable, and title of the chart.\n",
"# This will give total number of flights to the destination state.\n",
"fig = px.bar(bar_data, x=\"DestState\", y=\"Flights\", title='Total number of flights to the destination state split by reporting airline') \n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Bubble Chart\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Learn more about bubble chart [here](https://plotly.com/python/bubble-charts?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork-20297740&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork-20297740&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)\n",
"\n",
"#### Idea: Get number of flights as per reporting airline\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Group the data by reporting airline and get number of flights\n",
"bub_data = data.groupby('Reporting_Airline')['Flights'].sum().reset_index()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"bub_data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**To do**\n",
"\n",
"- Create a bubble chart using the `bub_data` with x-axis being reporting airline and y-axis being flights.\n",
"- Provide title to the chart\n",
"- Update size of the bubble based on the number of flights. Use `size` parameter.\n",
"- Update name of the hover tooltip to `reporting_airline` using `hover_name` parameter.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create bubble chart here\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Double-click **here** for the solution.\n",
"\n",
"<!-- The answer is below:\n",
"\n",
" \n",
"fig = px.scatter(bub_data, x=\"Reporting_Airline\", y=\"Flights\", size=\"Flights\",\n",
" hover_name=\"Reporting_Airline\", title='Reporting Airline vs Number of Flights', size_max=60)\n",
"fig.show()\n",
"\n",
"-->\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Histogram\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Learn more about histogram [here](https://plotly.com/python/histograms?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork-20297740&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork-20297740&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)\n",
"\n",
"#### Idea: Get distribution of arrival delay\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set missing values to 0\n",
"data['ArrDelay'] = data['ArrDelay'].fillna(0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**To do**\n",
"\n",
"- Use px.histogram and pass the dataset.\n",
"- Pass `ArrDelay` to x parameter.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create histogram here\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Double-click **here** for the solution.\n",
"\n",
"<!-- The answer is below:\n",
"\n",
"fig = px.histogram(data, x=\"ArrDelay\")\n",
"fig.show()\n",
"\n",
"-->\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Pie Chart\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Learn more about pie chart [here](https://plotly.com/python/pie-charts?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork-20297740&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork-20297740&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)\n",
"\n",
"#### Idea: Proportion of distance group by month (month indicated by numbers)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Use px.pie function to create the chart. Input dataset. \n",
"# Values parameter will set values associated to the sector. 'Month' feature is passed to it.\n",
"# labels for the sector are passed to the `names` parameter.\n",
"fig = px.pie(data, values='Month', names='DistanceGroup', title='Distance group proportion by month')\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sunburst Charts\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Learn more about sunburst chart [here](https://plotly.com/python/sunburst-charts?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork-20297740&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ&cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork-20297740&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ)\n",
"\n",
"#### Idea: Hierarchical view in othe order of month and destination state holding value of number of flights\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**To do**\n",
"\n",
"- Create sunburst chart using `px.sunburst`.\n",
"- Define hierarchy of sectors from root to leaves in `path` parameter. Here, we go from `Month` to `DestStateName` feature.\n",
"- Set sector values in `values` paramter. Here, we can pass in `Flights` feature. \n",
"- Show the figure.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create sunburst chart here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Double-click **here** for the solution.\n",
"\n",
"<!-- The answer is below:\n",
" \n",
"fig = px.sunburst(data, path=['Month', 'DestStateName'], values='Flights')\n",
"fig.show()\n",
"\n",
"-->\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"Congratulations for completing your first lab. \n",
"\n",
"In this lab, you have learnt how to use `plotly.graph_objects` and `plotly.express` for creating plots and charts. \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Author\n",
"\n",
"[Saishruthi Swaminathan](https://www.linkedin.com/in/saishruthi-swaminathan?cm_mmc=Email_Newsletter-_-Developer_Ed%2BTech-_-WW_WW-_-SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork-20297740&cm_mmca1=000026UJ&cm_mmca2=10006555&cm_mmca3=M12345678&cvosrc=email.Newsletter.M12345678&cvo_campaign=000026UJ) \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Changelog\n",
"\n",
"| Date | Version | Changed by | Change Description |\n",
"| ---------- | ------- | ---------- | ------------------------------------ |\n",
"| 12-18-2020 | 1.0 | Nayef | Added dataset link and upload to Git |\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## <h3 align=\"center\"> © IBM Corporation 2020. All rights reserved. <h3/>\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment