@quarrazzella
Created October 25, 2018 10:46
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<center>\n",
"<img src=\"../../img/ods_stickers.jpg\" />\n",
" \n",
"## [mlcourse.ai](https://mlcourse.ai) – Open Machine Learning Course \n",
"\n",
"Author: [Yury Kashnitskiy](https://yorko.github.io). \n",
"Translated and edited by [Maxim Keremet](https://www.linkedin.com/in/maximkeremet/), [Artem Trunov](https://www.linkedin.com/in/datamove/), and [Aditya Soni](https://www.linkedin.com/in/aditya-soni-0505a9124/). This material is subject to the terms and conditions of the [Creative Commons CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. Free use is permitted for any non-commercial purpose."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# <center>Assignment #2. Fall 2018 <br> Exploratory Data Analysis (EDA) of US flights <br> (using Pandas, Matplotlib & Seaborn)\n",
"\n",
"<img src='../../img/plane_sunset.png' width=50%>\n",
"\n",
"Before working on the assignment, we recommend reviewing the corresponding course material:\n",
" - [Visualization: from Simple Distributions to Dimensionality Reduction](https://mlcourse.ai/notebooks/blob/master/jupyter_english/topic02_visual_data_analysis/topic2_visual_data_analysis.ipynb?flush_cache=true)\n",
" - [Overview of Seaborn, Matplotlib and Plotly libraries](https://mlcourse.ai/notebooks/blob/master/jupyter_english/topic02_visual_data_analysis/topic2_additional_seaborn_matplotlib_plotly.ipynb?flush_cache=true)\n",
" - the first lectures in [this](https://www.youtube.com/watch?v=QKTuw4PNOsU&list=PLVlY_7IJCMJeRfZ68eVfEcu-UcN9BbwiX) YouTube playlist\n",
"\n",
"### Your task is to:\n",
" - write code and perform computations in the cells below\n",
" - choose answers in the [webform](https://docs.google.com/forms/d/1qSTjLAGqsmpFRhacv0vM-CMQSTT_mtOalNXdRTcdtM0/edit)\n",
" - submit answers with **the very same email and name** as in assignment 1. This is part of the assignment: if you don't manage to do so, you won't get credit. If in doubt, you can re-submit the A1 form until the A1 deadline, no problem.\n",
" \n",
"### <center> Deadline for A2: 2018 October 21, 20:59 CET\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"# pip install seaborn \n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Download the data [archive](http://stat-computing.org/dataexpo/2009/2008.csv.bz2) (~114 MB compressed, ~690 MB uncompressed). There is no need to decompress it: pandas can read `.bz2` files on the fly.\n",
"* Place it in the \"../../data\" folder, or change the path below to match your location.\n",
"* The dataset contains information about carriers and flights between US airports in 2008.\n",
"* Column descriptions are available [here](http://www.transtats.bts.gov/Fields.asp?Table_ID=236); this page also explains, for example, the meaning of the flight cancellation codes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Reading data into memory and creating a Pandas _DataFrame_ object**\n",
"\n",
"(This may take a while; please be patient.)\n",
"\n",
"We are not going to read in the whole dataset. To reduce the memory footprint, we load only the needed columns and cast them to suitable data types."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"dtype = {'DayOfWeek': np.uint8, 'DayofMonth': np.uint8, 'Month': np.uint8, 'Cancelled': np.uint8,\n",
" 'Year': np.uint16, 'FlightNum': np.uint16, 'Distance': np.uint16,\n",
" 'UniqueCarrier': str, 'CancellationCode': str, 'Origin': str, 'Dest': str,\n",
" 'ArrDelay': np.float16, 'DepDelay': np.float16, 'CarrierDelay': np.float16,\n",
" 'WeatherDelay': np.float16, 'NASDelay': np.float16, 'SecurityDelay': np.float16,\n",
" 'LateAircraftDelay': np.float16, 'DepTime': np.float16}"
]
},
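{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Added sketch, not part of the original assignment:* the dtypes above trade precision for memory. `uint8` holds 0 to 255 and `uint16` holds 0 to 65535, which covers the calendar, flight-number and distance columns. `float16`, however, has an 11-bit significand, so it represents every integer exactly only up to 2048. Some `DepTime` values reach 2400 and some delays exceed 2048 minutes, and odd values in that range are rounded to an even neighbor. The cell below only inspects NumPy type limits and does not touch `flights_df`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Integer ranges of the unsigned types used in the dtype dict above\n",
"uint8_max = int(np.iinfo(np.uint8).max)    # 255\n",
"uint16_max = int(np.iinfo(np.uint16).max)  # 65535\n",
"\n",
"# float16 stores every integer exactly only up to 2048;\n",
"# between 2048 and 4096 representable values are 2 apart\n",
"exact = float(np.float16(2048))    # 2048.0\n",
"rounded = float(np.float16(2049))  # ties-to-even rounding gives 2048.0\n",
"f16_max = float(np.finfo(np.float16).max)  # 65504.0, largest finite float16\n",
"\n",
"print(uint8_max, uint16_max, exact, rounded, f16_max)"
]
},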
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 26.6 s, sys: 311 ms, total: 26.9 s\n",
"Wall time: 25.6 s\n"
]
}
],
"source": [
"%%time\n",
"# change the path if needed\n",
"path = '../../data/2008.csv.bz2'\n",
"flights_df = pd.read_csv(path, usecols=dtype.keys(), dtype=dtype)"
]
},
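{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Added sketch on toy data, not part of the original assignment:* the loading pattern above (`usecols` plus a `dtype` dict, reading the `.bz2` archive directly) can be tried on a tiny in-memory CSV without the large download. The column names mirror the real file, but the two rows, the `Extra` column and the `toy_*` names are made up. `compression='bz2'` is passed explicitly because a `BytesIO` buffer has no file name from which pandas could infer it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import bz2\n",
"import io\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# A made-up two-row CSV standing in for 2008.csv.bz2\n",
"toy_csv = '''Year,Month,Origin,Dest,Distance,Extra\n",
"2008,1,IAD,TPA,810,x\n",
"2008,1,IND,BWI,515,y\n",
"'''\n",
"toy_bz2 = bz2.compress(toy_csv.encode('utf-8'))\n",
"\n",
"# Same pattern as above: load only the columns listed in the dtype dict\n",
"toy_dtype = {'Month': np.uint8, 'Origin': str, 'Distance': np.uint16}\n",
"toy_df = pd.read_csv(io.BytesIO(toy_bz2), compression='bz2',\n",
"                     usecols=list(toy_dtype), dtype=toy_dtype)\n",
"print(toy_df.dtypes)"
]
},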
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Check the number of rows and columns and print column names.**"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(7009728, 19)\n",
"Index(['Year', 'Month', 'DayofMonth', 'DayOfWeek', 'DepTime', 'UniqueCarrier',\n",
" 'FlightNum', 'ArrDelay', 'DepDelay', 'Origin', 'Dest', 'Distance',\n",
" 'Cancelled', 'CancellationCode', 'CarrierDelay', 'WeatherDelay',\n",
" 'NASDelay', 'SecurityDelay', 'LateAircraftDelay'],\n",
" dtype='object')\n"
]
}
],
"source": [
"print(flights_df.shape)\n",
"print(flights_df.columns)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Print the first 5 rows of the dataset.**"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Year</th>\n",
" <th>Month</th>\n",
" <th>DayofMonth</th>\n",
" <th>DayOfWeek</th>\n",
" <th>DepTime</th>\n",
" <th>UniqueCarrier</th>\n",
" <th>FlightNum</th>\n",
" <th>ArrDelay</th>\n",
" <th>DepDelay</th>\n",
" <th>Origin</th>\n",
" <th>Dest</th>\n",
" <th>Distance</th>\n",
" <th>Cancelled</th>\n",
" <th>CancellationCode</th>\n",
" <th>CarrierDelay</th>\n",
" <th>WeatherDelay</th>\n",
" <th>NASDelay</th>\n",
" <th>SecurityDelay</th>\n",
" <th>LateAircraftDelay</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2008</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>4</td>\n",
" <td>2003.0</td>\n",
" <td>WN</td>\n",
" <td>335</td>\n",
" <td>-14.0</td>\n",
" <td>8.0</td>\n",
" <td>IAD</td>\n",
" <td>TPA</td>\n",
" <td>810</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2008</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>4</td>\n",
" <td>754.0</td>\n",
" <td>WN</td>\n",
" <td>3231</td>\n",
" <td>2.0</td>\n",
" <td>19.0</td>\n",
" <td>IAD</td>\n",
" <td>TPA</td>\n",
" <td>810</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2008</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>4</td>\n",
" <td>628.0</td>\n",
" <td>WN</td>\n",
" <td>448</td>\n",
" <td>14.0</td>\n",
" <td>8.0</td>\n",
" <td>IND</td>\n",
" <td>BWI</td>\n",
" <td>515</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2008</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>4</td>\n",
" <td>926.0</td>\n",
" <td>WN</td>\n",
" <td>1746</td>\n",
" <td>-6.0</td>\n",
" <td>-4.0</td>\n",
" <td>IND</td>\n",
" <td>BWI</td>\n",
" <td>515</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2008</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>4</td>\n",
" <td>1829.0</td>\n",
" <td>WN</td>\n",
" <td>3920</td>\n",
" <td>34.0</td>\n",
" <td>34.0</td>\n",
" <td>IND</td>\n",
" <td>BWI</td>\n",
" <td>515</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>2.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>32.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Year Month DayofMonth DayOfWeek DepTime UniqueCarrier FlightNum \\\n",
"0 2008 1 3 4 2003.0 WN 335 \n",
"1 2008 1 3 4 754.0 WN 3231 \n",
"2 2008 1 3 4 628.0 WN 448 \n",
"3 2008 1 3 4 926.0 WN 1746 \n",
"4 2008 1 3 4 1829.0 WN 3920 \n",
"\n",
" ArrDelay DepDelay Origin Dest Distance Cancelled CancellationCode \\\n",
"0 -14.0 8.0 IAD TPA 810 0 NaN \n",
"1 2.0 19.0 IAD TPA 810 0 NaN \n",
"2 14.0 8.0 IND BWI 515 0 NaN \n",
"3 -6.0 -4.0 IND BWI 515 0 NaN \n",
"4 34.0 34.0 IND BWI 515 0 NaN \n",
"\n",
" CarrierDelay WeatherDelay NASDelay SecurityDelay LateAircraftDelay \n",
"0 NaN NaN NaN NaN NaN \n",
"1 NaN NaN NaN NaN NaN \n",
"2 NaN NaN NaN NaN NaN \n",
"3 NaN NaN NaN NaN NaN \n",
"4 2.0 0.0 0.0 0.0 32.0 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"flights_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Transpose the frame to see all features at once.**"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" <th>4</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Year</th>\n",
" <td>2008</td>\n",
" <td>2008</td>\n",
" <td>2008</td>\n",
" <td>2008</td>\n",
" <td>2008</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Month</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>DayofMonth</th>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>DayOfWeek</th>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>DepTime</th>\n",
" <td>2003</td>\n",
" <td>754</td>\n",
" <td>628</td>\n",
" <td>926</td>\n",
" <td>1829</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UniqueCarrier</th>\n",
" <td>WN</td>\n",
" <td>WN</td>\n",
" <td>WN</td>\n",
" <td>WN</td>\n",
" <td>WN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>FlightNum</th>\n",
" <td>335</td>\n",
" <td>3231</td>\n",
" <td>448</td>\n",
" <td>1746</td>\n",
" <td>3920</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ArrDelay</th>\n",
" <td>-14</td>\n",
" <td>2</td>\n",
" <td>14</td>\n",
" <td>-6</td>\n",
" <td>34</td>\n",
" </tr>\n",
" <tr>\n",
" <th>DepDelay</th>\n",
" <td>8</td>\n",
" <td>19</td>\n",
" <td>8</td>\n",
" <td>-4</td>\n",
" <td>34</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Origin</th>\n",
" <td>IAD</td>\n",
" <td>IAD</td>\n",
" <td>IND</td>\n",
" <td>IND</td>\n",
" <td>IND</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Dest</th>\n",
" <td>TPA</td>\n",
" <td>TPA</td>\n",
" <td>BWI</td>\n",
" <td>BWI</td>\n",
" <td>BWI</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Distance</th>\n",
" <td>810</td>\n",
" <td>810</td>\n",
" <td>515</td>\n",
" <td>515</td>\n",
" <td>515</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Cancelled</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>CancellationCode</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>CarrierDelay</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>WeatherDelay</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>NASDelay</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>SecurityDelay</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>LateAircraftDelay</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>32</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0 1 2 3 4\n",
"Year 2008 2008 2008 2008 2008\n",
"Month 1 1 1 1 1\n",
"DayofMonth 3 3 3 3 3\n",
"DayOfWeek 4 4 4 4 4\n",
"DepTime 2003 754 628 926 1829\n",
"UniqueCarrier WN WN WN WN WN\n",
"FlightNum 335 3231 448 1746 3920\n",
"ArrDelay -14 2 14 -6 34\n",
"DepDelay 8 19 8 -4 34\n",
"Origin IAD IAD IND IND IND\n",
"Dest TPA TPA BWI BWI BWI\n",
"Distance 810 810 515 515 515\n",
"Cancelled 0 0 0 0 0\n",
"CancellationCode NaN NaN NaN NaN NaN\n",
"CarrierDelay NaN NaN NaN NaN 2\n",
"WeatherDelay NaN NaN NaN NaN 0\n",
"NASDelay NaN NaN NaN NaN 0\n",
"SecurityDelay NaN NaN NaN NaN 0\n",
"LateAircraftDelay NaN NaN NaN NaN 32"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"flights_df.head().T"
]
},
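{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Added sketch, not part of the original assignment:* `DepTime` holds the actual local departure time in `hhmm` form, so `2003.0` in the first row means 20:03 and `754.0` means 07:54. Integer division by 100 extracts the hour. The first three values below are copied from the rows above; the `NaN` (no departure, e.g. a cancelled flight) and `2400.0` (the maximum in this data) are added for illustration, and mapping hour 24 to 0 with `% 24` is an assumption about how to treat midnight."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# hhmm departure times; NaN marks a flight that never departed\n",
"dep_time = pd.Series([2003.0, 754.0, 628.0, np.nan, 2400.0])\n",
"\n",
"# // 100 keeps the hour; % 24 maps 2400 (midnight) to 0; NaN stays NaN\n",
"dep_hour = (dep_time // 100) % 24\n",
"print(dep_hour.tolist())"
]
},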
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Examine the data types of all features and the total DataFrame size in memory.**"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 7009728 entries, 0 to 7009727\n",
"Data columns (total 19 columns):\n",
"Year uint16\n",
"Month uint8\n",
"DayofMonth uint8\n",
"DayOfWeek uint8\n",
"DepTime float16\n",
"UniqueCarrier object\n",
"FlightNum uint16\n",
"ArrDelay float16\n",
"DepDelay float16\n",
"Origin object\n",
"Dest object\n",
"Distance uint16\n",
"Cancelled uint8\n",
"CancellationCode object\n",
"CarrierDelay float16\n",
"WeatherDelay float16\n",
"NASDelay float16\n",
"SecurityDelay float16\n",
"LateAircraftDelay float16\n",
"dtypes: float16(8), object(4), uint16(3), uint8(4)\n",
"memory usage: 387.7+ MB\n"
]
}
],
"source": [
"flights_df.info()"
]
},
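{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Added sketch on toy data, not part of the original assignment:* the `+` in `387.7+ MB` signals that pandas did not count the Python string objects held by the four `object` columns. `memory_usage(deep=True)` (or `info(memory_usage='deep')`) measures them too; converting a low-cardinality column such as `UniqueCarrier` to the `category` dtype is a common way to shrink it. The four-row frame below is made up."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"toy = pd.DataFrame({\n",
"    'UniqueCarrier': pd.Series(['WN', 'AA', 'WN', 'WN'], dtype=object),\n",
"    'Distance': np.array([810, 515, 515, 810], dtype=np.uint16),\n",
"})\n",
"\n",
"# Shallow: the object column is counted as pointers only\n",
"shallow_bytes = toy.memory_usage(index=False).sum()\n",
"# Deep: the Python string objects themselves are counted as well\n",
"deep_bytes = toy.memory_usage(index=False, deep=True).sum()\n",
"print(shallow_bytes, deep_bytes)\n",
"\n",
"# category keeps each distinct carrier code once plus small integer codes\n",
"carrier_cat = toy['UniqueCarrier'].astype('category')\n",
"print(carrier_cat.cat.categories.tolist())"
]
},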
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Get basic statistics of each feature.**"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" <th>mean</th>\n",
" <th>std</th>\n",
" <th>min</th>\n",
" <th>25%</th>\n",
" <th>50%</th>\n",
" <th>75%</th>\n",
" <th>max</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Year</th>\n",
" <td>7009728.0</td>\n",
" <td>2008.000000</td>\n",
" <td>0.000000</td>\n",
" <td>2008.0</td>\n",
" <td>2008.0</td>\n",
" <td>2008.0</td>\n",
" <td>2008.0</td>\n",
" <td>2008.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Month</th>\n",
" <td>7009728.0</td>\n",
" <td>6.375130</td>\n",
" <td>3.406737</td>\n",
" <td>1.0</td>\n",
" <td>3.0</td>\n",
" <td>6.0</td>\n",
" <td>9.0</td>\n",
" <td>12.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>DayofMonth</th>\n",
" <td>7009728.0</td>\n",
" <td>15.728015</td>\n",
" <td>8.797068</td>\n",
" <td>1.0</td>\n",
" <td>8.0</td>\n",
" <td>16.0</td>\n",
" <td>23.0</td>\n",
" <td>31.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>DayOfWeek</th>\n",
" <td>7009728.0</td>\n",
" <td>3.924182</td>\n",
" <td>1.988259</td>\n",
" <td>1.0</td>\n",
" <td>2.0</td>\n",
" <td>4.0</td>\n",
" <td>6.0</td>\n",
" <td>7.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>DepTime</th>\n",
" <td>6873482.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1.0</td>\n",
" <td>928.0</td>\n",
" <td>1325.0</td>\n",
" <td>1728.0</td>\n",
" <td>2400.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>FlightNum</th>\n",
" <td>7009728.0</td>\n",
" <td>2224.200105</td>\n",
" <td>1961.715999</td>\n",
" <td>1.0</td>\n",
" <td>622.0</td>\n",
" <td>1571.0</td>\n",
" <td>3518.0</td>\n",
" <td>9743.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ArrDelay</th>\n",
" <td>6855029.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>-519.0</td>\n",
" <td>-10.0</td>\n",
" <td>-2.0</td>\n",
" <td>12.0</td>\n",
" <td>2460.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>DepDelay</th>\n",
" <td>6873482.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>-534.0</td>\n",
" <td>-4.0</td>\n",
" <td>-1.0</td>\n",
" <td>8.0</td>\n",
" <td>2468.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Distance</th>\n",
" <td>7009728.0</td>\n",
" <td>726.387029</td>\n",
" <td>562.101803</td>\n",
" <td>11.0</td>\n",
" <td>325.0</td>\n",
" <td>581.0</td>\n",
" <td>954.0</td>\n",
" <td>4962.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Cancelled</th>\n",
" <td>7009728.0</td>\n",
" <td>0.019606</td>\n",
" <td>0.138643</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>CarrierDelay</th>\n",
" <td>1524735.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>16.0</td>\n",
" <td>2436.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>WeatherDelay</th>\n",
" <td>1524735.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1352.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>NASDelay</th>\n",
" <td>1524735.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>6.0</td>\n",
" <td>21.0</td>\n",
" <td>1357.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>SecurityDelay</th>\n",
" <td>1524735.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>392.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>LateAircraftDelay</th>\n",
" <td>1524735.0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>26.0</td>\n",
" <td>1316.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" count mean std min 25% \\\n",
"Year 7009728.0 2008.000000 0.000000 2008.0 2008.0 \n",
"Month 7009728.0 6.375130 3.406737 1.0 3.0 \n",
"DayofMonth 7009728.0 15.728015 8.797068 1.0 8.0 \n",
"DayOfWeek 7009728.0 3.924182 1.988259 1.0 2.0 \n",
"DepTime 6873482.0 NaN NaN 1.0 928.0 \n",
"FlightNum 7009728.0 2224.200105 1961.715999 1.0 622.0 \n",
"ArrDelay 6855029.0 NaN NaN -519.0 -10.0 \n",
"DepDelay 6873482.0 NaN NaN -534.0 -4.0 \n",
"Distance 7009728.0 726.387029 562.101803 11.0 325.0 \n",
"Cancelled 7009728.0 0.019606 0.138643 0.0 0.0 \n",
"CarrierDelay 1524735.0 NaN NaN 0.0 0.0 \n",
"WeatherDelay 1524735.0 NaN NaN 0.0 0.0 \n",
"NASDelay 1524735.0 NaN NaN 0.0 0.0 \n",
"SecurityDelay 1524735.0 NaN NaN 0.0 0.0 \n",
"LateAircraftDelay 1524735.0 NaN NaN 0.0 0.0 \n",
"\n",
" 50% 75% max \n",
"Year 2008.0 2008.0 2008.0 \n",
"Month 6.0 9.0 12.0 \n",
"DayofMonth 16.0 23.0 31.0 \n",
"DayOfWeek 4.0 6.0 7.0 \n",
"DepTime 1325.0 1728.0 2400.0 \n",
"FlightNum 1571.0 3518.0 9743.0 \n",
"ArrDelay -2.0 12.0 2460.0 \n",
"DepDelay -1.0 8.0 2468.0 \n",
"Distance 581.0 954.0 4962.0 \n",
"Cancelled 0.0 0.0 1.0 \n",
"CarrierDelay 0.0 16.0 2436.0 \n",
"WeatherDelay 0.0 0.0 1352.0 \n",
"NASDelay 6.0 21.0 1357.0 \n",
"SecurityDelay 0.0 0.0 392.0 \n",
"LateAircraftDelay 0.0 26.0 1316.0 "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"flights_df.describe().T"
]
},
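{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Added sketch on toy data, not part of the original assignment:* the `NaN` mean and std in the `float16` columns are most likely an overflow artifact rather than missing data: pandas appears to accumulate these statistics in the column's own dtype, and `float16` cannot exceed 65504. Casting to `float32` before aggregating avoids the problem. The constant column below is made up; its true total (3.3 million) is far outside the `float16` range."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# 2000 made-up departure times of 16:50 stored as float16\n",
"toy = pd.DataFrame({'DepTime': np.full(2000, 1650.0, dtype=np.float16)})\n",
"\n",
"# Upcast before computing statistics so the running sum fits the dtype\n",
"stats32 = toy.astype(np.float32).describe().T\n",
"print(stats32[['count', 'mean', 'std']])"
]
},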
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Count unique carriers and plot the number of flights per carrier:**"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"20"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"flights_df['UniqueCarrier'].nunique()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"flights_df.groupby('UniqueCarrier').size().plot(kind='bar');"
]
},
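{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Added sketch on toy data, not part of the original assignment:* the bar chart above shows absolute flight counts per carrier. To get each carrier's *relative* share, `value_counts(normalize=True)` divides the counts by the total, so the shares sum to 1. The carrier list below is made up."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"toy = pd.DataFrame({'UniqueCarrier': ['WN', 'WN', 'AA', 'WN', 'UA', 'AA']})\n",
"\n",
"# Fraction of all flights operated by each carrier, sorted in descending order\n",
"carrier_share = toy['UniqueCarrier'].value_counts(normalize=True)\n",
"print(carrier_share)\n",
"\n",
"# The same plot call as above would draw these shares on a 0-1 scale:\n",
"# carrier_share.plot(kind='bar');"
]
},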
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**We can also _group by_ one or more categories to calculate various aggregate statistics.**\n",
"\n",
"**For example, let's find the top 3 flight codes (carrier and flight number) with the largest total distance traveled in 2008.**"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"UniqueCarrier FlightNum\n",
"CO 15 1796244.0\n",
" 14 1796244.0\n",
"UA 52 1789722.0\n",
"Name: Distance, dtype: float64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"flights_df.groupby(['UniqueCarrier','FlightNum'])['Distance'].sum().sort_values(ascending=False).iloc[:3]"
]
},
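{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Added sketch on toy data, not part of the original assignment:* `Series.nlargest(3)` is a shorter equivalent of `sort_values(ascending=False).iloc[:3]` and, according to the pandas docs, is usually faster when n is small relative to the Series. With ties (like `CO 15` and `CO 14` above), the order among equal totals may differ. The flights below are made up."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"toy = pd.DataFrame({\n",
"    'UniqueCarrier': ['CO', 'CO', 'UA', 'UA', 'WN'],\n",
"    'FlightNum': [15, 14, 52, 52, 1],\n",
"    'Distance': [4962, 4962, 2465, 2465, 100],\n",
"})\n",
"\n",
"total_distance = toy.groupby(['UniqueCarrier', 'FlightNum'])['Distance'].sum()\n",
"\n",
"# Top 3 (carrier, flight number) pairs by total distance\n",
"top3 = total_distance.nlargest(3)\n",
"print(top3)"
]
},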
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Another way, computing several aggregates at once:**"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead tr th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe thead tr:last-of-type th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th></th>\n",
" <th></th>\n",
" <th colspan=\"3\" halign=\"left\">Distance</th>\n",
" <th>Cancelled</th>\n",
" </tr>\n",
" <tr>\n",
" <th></th>\n",
" <th></th>\n",
" <th>mean</th>\n",
" <th>sum</th>\n",
" <th>count</th>\n",
" <th>sum</th>\n",
" </tr>\n",
" <tr>\n",
" <th>UniqueCarrier</th>\n",
" <th>FlightNum</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"2\" valign=\"top\">CO</th>\n",
" <th>15</th>\n",
" <td>4962.000000</td>\n",
" <td>1796244.0</td>\n",
" <td>362</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>4962.000000</td>\n",
" <td>1796244.0</td>\n",
" <td>362</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UA</th>\n",
" <th>52</th>\n",
" <td>2465.181818</td>\n",
" <td>1789722.0</td>\n",
" <td>726</td>\n",
" <td>8</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Distance Cancelled\n",
" mean sum count sum\n",
"UniqueCarrier FlightNum \n",
"CO 15 4962.000000 1796244.0 362 0\n",
" 14 4962.000000 1796244.0 362 0\n",
"UA 52 2465.181818 1789722.0 726 8"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"flights_df.groupby(['UniqueCarrier','FlightNum'])\\\n",
" .agg({'Distance': ['mean', 'sum', 'count'],\n",
" 'Cancelled': 'sum'})\\\n",
" .sort_values(('Distance', 'sum'), ascending=False)\\\n",
" .iloc[0:3]"
]
},
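{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Added sketch on toy data, not part of the original assignment:* named aggregation (available since pandas 0.25) gives flat, self-describing column names instead of a two-level header, so sorting needs only a plain column name. The output names `dist_mean`, `dist_sum`, `n_flights` and `n_cancelled`, like the rows, are made up for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"toy = pd.DataFrame({\n",
"    'UniqueCarrier': ['CO', 'CO', 'UA', 'UA', 'WN'],\n",
"    'FlightNum': [15, 15, 52, 52, 1],\n",
"    'Distance': [4962, 4962, 2465, 2465, 100],\n",
"    'Cancelled': [0, 0, 1, 0, 0],\n",
"})\n",
"\n",
"# Each keyword argument becomes one output column: (input column, aggregation)\n",
"stats = (toy.groupby(['UniqueCarrier', 'FlightNum'])\n",
"            .agg(dist_mean=('Distance', 'mean'),\n",
"                 dist_sum=('Distance', 'sum'),\n",
"                 n_flights=('Distance', 'count'),\n",
"                 n_cancelled=('Cancelled', 'sum'))\n",
"            .sort_values('dist_sum', ascending=False))\n",
"print(stats)"
]
},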
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Number of flights by days of week and months:**"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>DayOfWeek</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" <th>4</th>\n",
" <th>5</th>\n",
" <th>6</th>\n",
" <th>7</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Month</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>80807</td>\n",
" <td>97298</td>\n",
" <td>100080</td>\n",
" <td>102043</td>\n",
" <td>81940</td>\n",
" <td>67178</td>\n",
" <td>76419</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>81504</td>\n",
" <td>79700</td>\n",
" <td>80587</td>\n",
" <td>82158</td>\n",
" <td>102726</td>\n",
" <td>66462</td>\n",
" <td>76099</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>103210</td>\n",
" <td>81159</td>\n",
" <td>82307</td>\n",
" <td>82831</td>\n",
" <td>82936</td>\n",
" <td>86153</td>\n",
" <td>97494</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>82463</td>\n",
" <td>100785</td>\n",
" <td>102586</td>\n",
" <td>82799</td>\n",
" <td>82964</td>\n",
" <td>68304</td>\n",
" <td>78225</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>80626</td>\n",
" <td>79884</td>\n",
" <td>81264</td>\n",
" <td>102572</td>\n",
" <td>102878</td>\n",
" <td>84493</td>\n",
" <td>74576</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>104168</td>\n",
" <td>82160</td>\n",
" <td>82902</td>\n",
" <td>83617</td>\n",
" <td>83930</td>\n",
" <td>72322</td>\n",
" <td>99566</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>84095</td>\n",
" <td>103429</td>\n",
" <td>103315</td>\n",
" <td>105035</td>\n",
" <td>79349</td>\n",
" <td>72219</td>\n",
" <td>80489</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>82983</td>\n",
" <td>80895</td>\n",
" <td>81773</td>\n",
" <td>82625</td>\n",
" <td>103878</td>\n",
" <td>86155</td>\n",
" <td>93970</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>94300</td>\n",
" <td>91533</td>\n",
" <td>74057</td>\n",
" <td>75589</td>\n",
" <td>75881</td>\n",
" <td>58343</td>\n",
" <td>71205</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>75131</td>\n",
" <td>72195</td>\n",
" <td>91900</td>\n",
" <td>94123</td>\n",
" <td>93894</td>\n",
" <td>58168</td>\n",
" <td>70794</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>74214</td>\n",
" <td>72443</td>\n",
" <td>73653</td>\n",
" <td>68071</td>\n",
" <td>70484</td>\n",
" <td>76031</td>\n",
" <td>88376</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>92700</td>\n",
" <td>90568</td>\n",
" <td>85241</td>\n",
" <td>70761</td>\n",
" <td>74306</td>\n",
" <td>61708</td>\n",
" <td>69674</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"DayOfWeek 1 2 3 4 5 6 7\n",
"Month \n",
"1 80807 97298 100080 102043 81940 67178 76419\n",
"2 81504 79700 80587 82158 102726 66462 76099\n",
"3 103210 81159 82307 82831 82936 86153 97494\n",
"4 82463 100785 102586 82799 82964 68304 78225\n",
"5 80626 79884 81264 102572 102878 84493 74576\n",
"6 104168 82160 82902 83617 83930 72322 99566\n",
"7 84095 103429 103315 105035 79349 72219 80489\n",
"8 82983 80895 81773 82625 103878 86155 93970\n",
"9 94300 91533 74057 75589 75881 58343 71205\n",
"10 75131 72195 91900 94123 93894 58168 70794\n",
"11 74214 72443 73653 68071 70484 76031 88376\n",
"12 92700 90568 85241 70761 74306 61708 69674"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.crosstab(flights_df.Month, flights_df.DayOfWeek)"
]
},
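{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Added sketch on toy data, not part of the original assignment:* `pd.crosstab` counts the rows for every (row value, column value) pair. The same table can be built with `groupby(...).size().unstack(fill_value=0)`, which extends naturally to other aggregations. The small frame below is made up."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"toy = pd.DataFrame({'Month': [1, 1, 2, 2, 2],\n",
"                    'DayOfWeek': [3, 4, 3, 3, 5]})\n",
"\n",
"ct = pd.crosstab(toy.Month, toy.DayOfWeek)\n",
"\n",
"# Count rows per (Month, DayOfWeek) pair, then pivot DayOfWeek into columns;\n",
"# combinations that never occur become 0 instead of NaN\n",
"via_groupby = toy.groupby(['Month', 'DayOfWeek']).size().unstack(fill_value=0)\n",
"\n",
"print(ct)\n",
"print(np.array_equal(ct.to_numpy(), via_groupby.to_numpy()))"
]
},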
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**It can also be handy to color such tables to spot outliers easily:**"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAKQAAAD8CAYAAAD5aA/bAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAACy1JREFUeJzt3VuIXeUZxvH/05kkmpM5qDHNjEYxKCKo6SC1ghdaqdY0SulFrIralFCKxyoSvZGWFnohYi+KJSSmAYNexJRalVo1xrYgaTKjbYzxEFJ1xlMSz1o0GXx7MdsyxshM5/uc/ep6fhBm9nbxrpfk8Vtr7732uxQRmGXxtXY3YDacA2mpOJCWigNpqTiQlooDaak4kJaKA2mpOJCWSud47mz27EOju3t+cZ19/+wtbwbYW6UKDFaqAzB74cIqdXr7PqpSp6vroCp1BgZ690TEYSNtN66B7O6ez4YNW4rrvDJbFbqBl6pUgV2V6gBc9vjjVepo0o4qda699oQqda67Ti+OZjsfsi0VB9JScSAtFQfSUikKpKRzJD0raYek5bWasuYacyAldQC/Bc4FTgAulFTnJZk1VskKeSqwIyJ2RsRe4G7g/DptWVOVBHIe0D/s8UDruU+RtEzSFklb3nhjd8HurAlKAnmgd6c/8wWdiFgRET0R0TN79ohv1FvDlQRyAOge9rgLeKWsHWu6kkBuBhZIOlrSRGAJcG+dtqypxvxZdkQMSroCeBDoAO6IiG3VOrNGKrq4IiIeAB6o1IuZP6mxXBxIS8WBtFTG9QLdzk6YMaO8zmvb6swj+sbsKmWYc/C7dQoBv5s0qVKlf1Sp8uGHVcqMmldIS8WBtFQcSEvFgbRUHEhLxYG0VBxIS8WBtFQcSEvFgbRUHEhLxYG0VBxIS8WBtFQcSEvFgbRUHEhLReN5N9gjpbi+Qp0ff1Cn58mDla70fv/9OnUABitNLH/44Spl/rV0aZU6J0FvRPSMtJ1XSEvFgbRUHEhLxYG0VBxIS6VkpHO3pEclbZe0TdLVNRuzZioZFDAIXBcRfZKmAb2SHoqIpyv1Zg005hUyIl6NiL7W7+8B2znASGez/0eVc0hJ84FTgE016llzFc/2kTQVuAe4JiI+89GHpGXAMoCZpTuzr7zSGydNYCiMayNi/YG2GT70fmrJzqwRSl5lC1gFbI+IW+u1ZE1WskKeDlwCnCnpydaf71bqyxqqZOj93znwvWrMxsyf1FgqDqSl4kBaKuM6Y/zwk07iqg0biuusn1Ln1HVPlSqw7OWXK1WqR0vPrVLn/vsrfaPgvNH9m3mFtFQcSEvFgbRUHEhLxYG0VBxIS8WBtFQcSEvFgbRUHEhLxYG0VBxIS8WBtFQcSEvFgbRUHEhLxYG0VBxIS2Vcv8KwLzp5fd+s4jonPlvnsvr586uUYcWket8GrjTynnjssSp1lt05t0qd0fIKaak4kJaKA2mpOJCWSnEgJXVIekLSfTUasmarsUJezdA4Z7NipQNLu4DzgJV12rGmK10hbwNuAD6u0ItZ0QTdRcCuiOgdYbtlkrZI2vLmm7vHujtriNIJuoslvQDczdAk3Tv332j4jPFZsw4r2J01Qcl9am6MiK6ImA8sATZExMXVOrNG8vuQlkqViysiYiOwsUYtazavkJaKA2mpOJCWigNpqSii0lDzUeiS4ooKdZa/806FKsDbb+eqAzBY55rxU3+ysEqdX26uczX8d6A3InpG2s4rpKXiQFoqDqSl4kBaKg6kpeJAWioOpKXiQFoqDqSl4kBaKg6kpeJAWioOpKXiQFoqDqSl4kBaKg6kpTKuM8aPOP54lq9ZU1zntkMOqdBNvf8b36xUB2Bxb50r+CtdeM6JL1f6RsG80V157hXSUnEgLRUH0lJxIC2V0gm6MyStk/SMpO2STqvVmDVT6avs3wB/jogfSJoITK7QkzXYmAMpaTpwBnAZQETsBf
bWacuaquSQfQywG1jdui3ISklTKvVlDVUSyE5gIXB7RJwCfAAs33+j4TPGd9ccOWJfSSWBHAAGImJT6/E6hgL6KcNnjB82Y0bB7qwJSmaMvwb0Szqu9dRZwNNVurLGKn2VfSWwtvUKeydweXlL1mRFgYyIJ4ERR6yZjZY/qbFUHEhLxYG0VBxIS2VcrxhnwgQ44ojiMj96p85VzNOnVrqJ7Z49deoAt8ypM9O7b9WqKnU+nLe0Sp3R8gppqTiQlooDaak4kJaKA2mpOJCWigNpqTiQlooDaak4kJaKA2mpOJCWigNpqTiQlooDaak4kJaKA2mpjOsV4x9v3cr7Rx1VXGf6ggUVuoEf9jxXpc6xxx5epQ5A58/rXA2vpd+rUqe7u9KM8X7PGLcvIQfSUnEgLRUH0lJxIC2V0qH310raJukpSXdJOqhWY9ZMYw6kpHnAVUBPRJwIdABLajVmzVR6yO4EDpbUydAdGF4pb8marGSC7svALcBLwKvAOxHxl/23Gz5jvN7AEfuqKjlkzwTOB44Gvg5MkXTx/tsNnzF+6Nj7tIYoOWR/G/h3ROyOiH3AeuBbddqypioJ5EvANyVNliSGht5vr9OWNVXJOeQmhm4F0gdsbdVaUakva6jSofc3AzdX6sXMn9RYLg6kpeJAWiqKqHRF8CjMmdMTF120pbjOxo3lvQD07TmySp3/9PdXqQP1VoiDWVSlzsyZf6pS56231BsRI95kyyukpeJAWioOpKXiQFoqDqSl4kBaKg6kpeJAWioOpKXiQFoqDqSl4kBaKg6kpeJAWioOpKXiQFoqDqSl4kBaKuM69H5wEPZUGPDTWanr7/e8VKXOH/oHqtQBmDatq0qdXy2vUoabBn5apY5uH912XiEtFQfSUnEgLRUH0lIZMZCS7pC0S9JTw56bJekhSc+3fs78Ytu0phjNCvl74Jz9nlsOPBIRC4BHWo/Nio0YyIj4K/Dmfk+fD6xp/b4GuKByX9ZQYz2HnBMRrwK0fta7+6Q12hf+omb40PuPPtr9Re/OvuTGGsjXJc0FaP3c9XkbDh96P2nSYWPcnTXFWAN5L3Bp6/dLgT/WaceabjRv+9wFPA4cJ2lA0lLg18DZkp4Hzm49Nis24mUKEXHh5/ynsyr3YuZPaiwXB9JScSAtFQfSUhnXofc9HR2xZcqU8kInn1xeA+D666uUmXXZ4ip1AC6o9CHs6tW/qFOIGyvVmeih9/bl40BaKg6kpeJAWioOpKXiQFoqDqSl4kBaKg6kpeJAWioOpKXiQFoqDqSl4kBaKg6kpeJAWioOpKUyrleMS9oNvDjCZocCFSaRV+N+Rjaano6KiBFHl4xrIEdD0pbRXOo+XtzPyGr25EO2peJAWioZA7mi3Q3sx/2MrFpP6c4hrdkyrpDWYGkCKekcSc9K2iGp7UP0JXVLelTSdknbJF3d7p4AJHVIekLSfQl6mSFpnaRnWn9PpxXXzHDIltQBPMfQrMkBYDNwYUQ83cae5gJzI6JP0jSgF7ignT21+voZ0ANMj4hFbe5lDfC3iFgpaSIwOSLeLqmZZYU8FdgRETsjYi9wN0N3emibiHg1Ivpav78HbAfmtbMnSV3AecDKdvbR6mU6cAawCiAi9paGEfIEch7QP+zxAG3+xx9O0nzgFGBTezvhNuAG4OM29wFwDLAbWN06hVgpqXhwU5ZA6gDPtf9cApA0FbgHuCYi3m1jH4uAXRHR264e9tMJLARuj4hTgA+ocAOtLIEcALqHPe4CXmlTL/8jaQJDYVwbEevb3M7pwGJJLzB0SnOmpDvb2M8AMBARnxw11jEU0CJZArkZWCDp6NbJ8RKG7vTQNpLE0PnR9oi4tZ29AETEjRHRFRHzGfr72RARF7exn9eAfknHtZ46Cyh+wTfi0PvxEBGDkq4AHgQ6gDsiYlub2zoduATYKunJ1nM3RcQDbewpmyuBta1FZCdweWnBFG/7mH0iyyHbDHAgLRkH0lJxIC0VB9
JScSAtFQfSUnEgLZX/AgZzuSHKcDmWAAAAAElFTkSuQmCC\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.imshow(pd.crosstab(flights_df.Month, flights_df.DayOfWeek),\n",
" cmap='seismic', interpolation='none');"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Flight distance histogram:**"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"flights_df.hist('Distance', bins=20);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Plotting the number of flights per day.**"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"flights_df['Date'] = pd.to_datetime(flights_df.rename(columns={'DayofMonth': 'Day'})[['Year', 'Month', 'Day']])"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"num_flights_by_date = flights_df.groupby('Date').size()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"num_flights_by_date.plot();"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Do you see a weekly pattern above? And below, after smoothing with a 7-day rolling mean?**"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"num_flights_by_date.rolling(window=7).mean().plot();"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**1. Find the top 10 carriers in terms of the number of completed flights (_UniqueCarrier_ column).**\n",
"\n",
"**Which of the carriers listed below is _not_ in your top-10 list?**\n",
"- DL\n",
"- AA\n",
"- OO\n",
"- EV"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"# Your code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**2. Plot the distribution of flight cancellation reasons (_CancellationCode_).**\n",
"\n",
"**What is the most frequent reason for flight cancellation? (Use this [link](https://www.transtats.bts.gov/Fields.asp?Table_ID=236) to translate codes into reasons.)**\n",
"- carrier\n",
"- weather conditions\n",
"- National Air System\n",
"- security reasons"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"# Your code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**3. Which route is the most frequent in terms of the number of flights?**\n",
"\n",
"(Take a look at the _'Origin'_ and _'Dest'_ features. Consider _A->B_ and _B->A_ as _different_ routes.)\n",
"\n",
" - New York – Washington\n",
" - San Francisco – Los Angeles\n",
" - San Jose – Dallas\n",
" - New York – San Francisco"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"# Your code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**4. Find the top 5 delayed routes (count how many times flights on each route were delayed on departure). Among all flights on these 5 routes, count those where weather conditions contributed to a delay.**\n",
"\n",
"- 449\n",
"- 539\n",
"- 549\n",
"- 668"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"# Your code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**5. Examine the hourly distribution of departure times. To do so, create a new series from DepTime with missing values removed.**\n",
"\n",
"**Choose all correct statements:**\n",
" - Flights are normally distributed within the time interval [0-23] (search for: normal distribution, bell curve).\n",
" - Flights are uniformly distributed within the time interval [0-23].\n",
" - In the period from midnight to 4 am there are considerably fewer flights than from 7 pm to 8 pm."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"# Your code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**6. Show how the number of flights changes over time (on a daily/weekly/monthly basis) and interpret the findings.**\n",
"\n",
"**Choose all correct statements:**\n",
"- The number of flights on weekends is lower than on weekdays (working days).\n",
"- The lowest number of flights is on Sunday.\n",
"- There are fewer flights in winter than in summer."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"# Your code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**7. Examine how cancellation reasons are distributed over time. Make a bar plot of cancellation reasons aggregated by month.**\n",
"\n",
"**Choose all correct statements:**\n",
"- December has the highest rate of cancellations due to weather.\n",
"- The highest rate of cancellations in September is due to security reasons.\n",
"- April's top cancellation reason is the carrier.\n",
"- Flight cancellations due to the National Air System are more frequent than those due to carriers."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"# Your code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**8. Which month has the greatest number of cancellations due to the carrier?**\n",
"- May\n",
"- January\n",
"- September\n",
"- April"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"# Your code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**9. In the month found in the previous question, identify the carrier with the greatest number of cancellations due to the carrier.**\n",
"\n",
"- 9E\n",
"- EV\n",
"- HA\n",
"- AA"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"# Your code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**10. Examine median arrival and departure delays (in minutes) by carrier. Which carrier has the lowest median delay time for both arrivals and departures? Keep only non-negative delay values ('ArrDelay', 'DepDelay').\n",
"[Boxplots](https://seaborn.pydata.org/generated/seaborn.boxplot.html) can be helpful in this exercise, and it may also be a good idea to remove outliers to get readable plots. You can exclude delay values above the corresponding 95th percentile.**\n",
"\n",
"- EV\n",
"- OO\n",
"- AA\n",
"- AQ"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"# Your code here"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
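The `Date` cell in the notebook builds timestamps with `pd.to_datetime(flights_df.rename(columns={'DayofMonth': 'Day'})[['Year', 'Month', 'Day']])`. `pd.to_datetime` can assemble dates from year/month/day columns, but it only recognises a column named like "day", which is why `DayofMonth` is renamed first. A small sketch of the same two steps (date assembly, then counting flights per day) on a made-up four-row frame; the values are invented and only the column names follow `flights_df`:

```python
import pandas as pd

# Toy stand-in for flights_df: only the date-part columns are mimicked.
toy = pd.DataFrame({
    "Year": [2008, 2008, 2008, 2008],
    "Month": [1, 1, 1, 2],
    "DayofMonth": [1, 1, 2, 29],
})

# Assemble a datetime column from the Year/Month/Day parts.
# DayofMonth is renamed because to_datetime does not recognise that name.
toy["Date"] = pd.to_datetime(
    toy.rename(columns={"DayofMonth": "Day"})[["Year", "Month", "Day"]]
)

# Number of flights per calendar day (what num_flights_by_date holds).
num_flights_by_date = toy.groupby("Date").size()
print(num_flights_by_date)
```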
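The smoothed curve in the notebook comes from `num_flights_by_date.rolling(window=7).mean()`: each day's value is replaced by the mean of that day and the six days before it, so a pattern that repeats every seven days averages out. The sketch below uses invented daily counts (five busy days and two quiet ones per week) to show that effect:

```python
import pandas as pd

# Made-up daily counts with a weekly cycle: five busy days, then two quiet ones.
days = pd.date_range("2008-01-01", periods=14, freq="D")
counts = pd.Series([100, 100, 100, 100, 100, 60, 60] * 2, index=days)

# 7-day rolling mean: every full window contains exactly one copy of the cycle.
smoothed = counts.rolling(window=7).mean()

# The first 6 entries have fewer than 7 observations, so they are NaN.
print(smoothed)
```

Every complete window here contains the whole weekly cycle, so the smoothed values are constant; the first six values are NaN because the window is not full yet. Passing `min_periods=1` to `rolling` would fill them with partial-window means instead.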
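For question 1, one way to rank carriers is to filter to completed flights and call `value_counts()`, which already sorts in descending order. This is a minimal sketch on invented rows with placeholder carrier codes (`C1` to `C4`), and it assumes a flight counts as "completed" when `Cancelled == 0`; whether diverted flights should also be excluded is a choice the question leaves open.

```python
import pandas as pd

# Toy rows; carrier codes and Cancelled flags are invented.
toy = pd.DataFrame({
    "UniqueCarrier": ["C1", "C1", "C1", "C2", "C2", "C3", "C3", "C3", "C4"],
    "Cancelled":     [0,    0,    0,    0,    1,    1,    1,    0,    0],
})

# Assumption: a "completed" flight is one that was not cancelled.
completed = toy[toy["Cancelled"] == 0]

# value_counts sorts counts in descending order, so head(10) is the top 10.
top_carriers = completed["UniqueCarrier"].value_counts().head(10)
print(top_carriers)
```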
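For question 2, the cancellation codes can be translated to readable reasons before counting. The mapping below follows the BTS code table linked in the question (A = carrier, B = weather, C = National Air System, D = security); the toy column is invented, with `None` standing in for flights that were not cancelled.

```python
import pandas as pd

# BTS cancellation codes, as listed in the table linked in question 2.
CODE_TO_REASON = {
    "A": "Carrier",
    "B": "Weather",
    "C": "National Air System",
    "D": "Security",
}

# Toy column: None means the flight was not cancelled.
toy = pd.DataFrame({"CancellationCode": ["A", "B", "B", None, "C", "B", None, "D"]})

# map() turns missing codes into NaN, and value_counts() drops NaN by default,
# so only cancelled flights are counted.
reason_counts = toy["CancellationCode"].map(CODE_TO_REASON).value_counts()
print(reason_counts)
```

For the plot itself, `reason_counts.plot(kind="bar")` draws one bar per reason.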
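For question 3, A->B and B->A are different routes, so the route key has to keep the direction. A sketch that concatenates `Origin` and `Dest` into one string and counts it; the three-letter airport codes are placeholders, not real airports:

```python
import pandas as pd

# Toy origin/destination pairs with placeholder airport codes.
toy = pd.DataFrame({
    "Origin": ["AAA", "BBB", "AAA", "CCC", "AAA"],
    "Dest":   ["BBB", "AAA", "BBB", "DDD", "BBB"],
})

# Directed route key: "AAA->BBB" and "BBB->AAA" are counted separately.
route_counts = (toy["Origin"] + "->" + toy["Dest"]).value_counts()
print(route_counts.head())
```

`toy.groupby(["Origin", "Dest"]).size().sort_values(ascending=False)` gives the same counts with a two-level index instead of a string key.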
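Question 4 has two steps: rank routes by the number of delayed departures, then go back to *all* flights on the top 5 routes and count those with a weather delay. This sketch shows one possible reading on invented data. It assumes "delayed on departure" means `DepDelay > 0` and "weather contributed" means `WeatherDelay > 0`; both thresholds, and the presence of a `WeatherDelay` column, should be checked against your copy of `flights_df`.

```python
import pandas as pd

# Invented flights; WeatherDelay is NaN where no delay breakdown is recorded.
toy = pd.DataFrame({
    "Origin":       ["A", "A", "A", "B", "B", "C", "C", "D", "E", "F", "F"],
    "Dest":         ["X", "X", "X", "Y", "Y", "Z", "Z", "W", "V", "U", "U"],
    "DepDelay":     [10, 5, -3, 20, 0, 15, 30, 1, 2, -5, -1],
    "WeatherDelay": [5, 0, 0, 0, 0, None, 12, 0, 0, 0, 3],
})
toy["Route"] = toy["Origin"] + "->" + toy["Dest"]

# Step 1 (assumption: delayed on departure means DepDelay > 0).
delayed = toy[toy["DepDelay"] > 0]
top_routes = delayed["Route"].value_counts().head(5).index

# Step 2: all flights on those routes, delayed or not, with WeatherDelay > 0.
# NaN > 0 evaluates to False, so missing values are not counted.
on_top_routes = toy[toy["Route"].isin(top_routes)]
weather_count = (on_top_routes["WeatherDelay"] > 0).sum()
print(list(top_routes), weather_count)
```

Route F->U has a weather delay but no delayed departures, so it never enters the top 5 and its weather delay is not counted.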
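For question 5, the hour has to be extracted from `DepTime`. The sketch assumes `DepTime` is stored as an hhmm number (for example 1435.0 for 14:35), as in the BTS on-time data, with NaN for flights that never departed. Integer division by 100 gives the hour, and `% 24` folds the value 2400 (midnight) into hour 0 so the result stays in [0, 23].

```python
import pandas as pd

# Invented departure times in hhmm format; None marks a missing value.
dep_time = pd.Series([5.0, 730.0, 745.0, None, 1435.0, 2359.0, 2400.0])

# Drop missing values, take the hour part, and map 2400 to hour 0.
dep_hour = (dep_time.dropna() // 100 % 24).astype(int)

# Number of departures per hour, ordered by hour.
hourly = dep_hour.value_counts().sort_index()
print(hourly)
```

On the real data, `dep_hour.hist(bins=24)` or `hourly.plot(kind="bar")` shows the shape of the distribution.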
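For question 6, counting flights per `DayOfWeek` and per `Month` is a `groupby(...).size()`. In the BTS data `DayOfWeek` runs from 1 (Monday) to 7 (Sunday). Because a week has five weekdays and only two weekend days, the sketch compares *average* flights per day type rather than raw totals. The toy values are invented.

```python
import pandas as pd

# Toy flights; DayOfWeek uses 1 = Monday ... 7 = Sunday.
toy = pd.DataFrame({
    "Month":     [1, 1, 1, 7, 7, 7, 7, 7],
    "DayOfWeek": [1, 2, 6, 1, 2, 3, 7, 3],
})

by_weekday = toy.groupby("DayOfWeek").size()
by_month = toy.groupby("Month").size()

# Average flights per weekend day vs per working day;
# days with no flights are filled with 0 so they still count in the mean.
weekend_mean = by_weekday.reindex([6, 7], fill_value=0).mean()
weekday_mean = by_weekday.reindex(range(1, 6), fill_value=0).mean()
print(by_weekday, by_month, weekend_mean, weekday_mean, sep="\n")
```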
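For question 7, `pd.crosstab(Month, CancellationCode)` produces a month-by-reason table of counts, which is exactly the input a grouped bar plot needs. A sketch on invented cancellations; on the real data, rows with a missing `CancellationCode` (flights that were not cancelled) are left out of the table.

```python
import pandas as pd

# Invented cancellations with their month and BTS cancellation code.
toy = pd.DataFrame({
    "Month":            [1, 1, 1, 4, 4, 9, 12, 12, 12],
    "CancellationCode": ["B", "B", "A", "A", "C", "D", "B", "B", "A"],
})

# Rows: months, columns: cancellation codes, cells: number of cancellations.
by_month = pd.crosstab(toy["Month"], toy["CancellationCode"])
print(by_month)

# Most frequent code in each month (ties go to the first column).
print(by_month.idxmax(axis=1))
```

`by_month.plot(kind="bar")` then draws one group of bars per month, one bar per code.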
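Questions 8 and 9 chain together: filter to carrier-caused cancellations (code `A` in the BTS table), find the month with the most of them, then rank carriers within that month. A sketch on invented rows with placeholder carrier codes:

```python
import pandas as pd

# Invented cancellations; C1..C4 are placeholder carrier codes.
toy = pd.DataFrame({
    "Month":            [1, 4, 4, 4, 4, 5, 9, 4],
    "UniqueCarrier":    ["C1", "C1", "C2", "C1", "C3", "C2", "C4", "C1"],
    "CancellationCode": ["A", "A", "A", "A", "B", "A", "A", "A"],
})

# Question 8: keep carrier-caused cancellations and find the busiest month.
carrier_cancels = toy[toy["CancellationCode"] == "A"]
worst_month = carrier_cancels["Month"].value_counts().idxmax()

# Question 9: within that month, the carrier with the most such cancellations.
worst_carrier = (
    carrier_cancels.loc[carrier_cancels["Month"] == worst_month, "UniqueCarrier"]
    .value_counts()
    .idxmax()
)
print(worst_month, worst_carrier)
```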
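For question 10, a sketch of the filtering the question describes: keep non-negative delays, drop values above the 95th percentile, then take per-carrier medians. The question does not say whether the percentile is computed per carrier or over all flights; this sketch assumes one cutoff per delay column over all carriers, computed after removing negative values. `median_nonneg_delay` is a hypothetical helper written for this example, and the data is invented.

```python
import pandas as pd

# Invented delays (minutes) for two placeholder carriers.
toy = pd.DataFrame({
    "UniqueCarrier": ["C1"] * 5 + ["C2"] * 5,
    "DepDelay": [0, 5, 10, -3, 400, 1, 2, 3, 8, None],
    "ArrDelay": [1, 3, -2, 7, 350, 0, 5, 9, 11, 12],
})


def median_nonneg_delay(df, col, q=0.95):
    """Hypothetical helper: per-carrier median of col after trimming.

    Keeps rows with 0 <= col <= the q-quantile of the non-negative values
    (NaN rows are dropped because NaN >= 0 is False).
    """
    nonneg = df[df[col] >= 0]
    cutoff = nonneg[col].quantile(q)
    trimmed = nonneg[nonneg[col] <= cutoff]
    return trimmed.groupby("UniqueCarrier")[col].median()


medians = pd.DataFrame({
    "DepDelay": median_nonneg_delay(toy, "DepDelay"),
    "ArrDelay": median_nonneg_delay(toy, "ArrDelay"),
})
print(medians)
```

The outliers (400 and 350 minutes) fall above the cutoff and are removed before the medians are taken. The same trimmed rows can be passed to `sns.boxplot(x="UniqueCarrier", y="DepDelay", data=...)` for the plot.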