@IreneCrisologo
Created December 23, 2021 16:02
Amtrak Delays.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/IreneCrisologo/1f263419263d905e763c029df26ad2ef/amtrak-delays.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"id": "69f8137d",
"metadata": {
"id": "69f8137d"
},
"source": [
"# Are Amtrak trains always late?\n",
"\n",
"Short answer: no!\n",
"\n",
"I took the train from Chicago going to and from New Orleans to attend AGU 2021, and experienced a 6-hour and a 3-hour delay, respectively. Given that there was a tornado that ploughed through the train route, I wasn't too bothered about it. But I've heard several complaints about Amtrak being always late, so I wanted to check the data and see for myself. This analysis is also inspired by a tweet from @douglas_rao https://twitter.com/douglas_rao/status/1469863061494272001\n",
"\n",
"Data from https://juckins.net/amtrak_status/archive/html/home.php\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ccbeab73",
"metadata": {
"id": "ccbeab73"
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import os\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"id": "d5afa0f9",
"metadata": {
"id": "d5afa0f9"
},
"source": [
"## Part 1: Chicago to New Orleans"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9e06d0be",
"metadata": {
"id": "9e06d0be"
},
"outputs": [],
"source": [
"df = pd.read_excel('AmtrakDelays_CHI_NOL.xlsx')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "55f5ceb1",
"metadata": {
"id": "55f5ceb1",
"outputId": "6742dbf2-bea4-4580-fa1b-b4cae3714db1"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Origin Date</th>\n",
" <th>Station</th>\n",
" <th>Sch Ar</th>\n",
" <th>Act Ar</th>\n",
" <th>Comments</th>\n",
" <th>Ar Delay (mins)</th>\n",
" <th>Service Disruption</th>\n",
" <th>Cancellations</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>06/27/2015 (Sa)</td>\n",
" <td>NOL</td>\n",
" <td>06/28/2015 3:32 PM (Su)</td>\n",
" <td>3:12AM</td>\n",
" <td>Ar: 11 hr, 40 min late.</td>\n",
" <td>700</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>06/27/2015 (Sa)</td>\n",
" <td>GWD</td>\n",
" <td>06/28/2015 9:00 AM (Su)</td>\n",
" <td>7:10PM</td>\n",
" <td>Ar: 10 hr, 10 min late | Dp: 10 hr, 17 min late.</td>\n",
" <td>610</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>06/27/2015 (Sa)</td>\n",
" <td>JAN</td>\n",
" <td>06/28/2015 11:12 AM (Su)</td>\n",
" <td>9:11PM</td>\n",
" <td>Ar: 9 hr, 59 min late. | Dp: 10 hr, 41 min late.</td>\n",
" <td>599</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>04/29/2017 (Sa)</td>\n",
" <td>NOL</td>\n",
" <td>04/30/2017 3:32 PM (Su)</td>\n",
" <td>1:19AM</td>\n",
" <td>Ar: 9 hr, 47 min late.</td>\n",
" <td>587</td>\n",
" <td>SD</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>06/27/2015 (Sa)</td>\n",
" <td>MEM</td>\n",
" <td>06/28/2015 6:27 AM (Su)</td>\n",
" <td>4:05PM</td>\n",
" <td>Ar: 9 hr, 38 min late. | Dp: 9 hr, 48 min late.</td>\n",
" <td>578</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Origin Date Station Sch Ar Act Ar \\\n",
"0 06/27/2015 (Sa) NOL 06/28/2015 3:32 PM (Su) 3:12AM \n",
"1 06/27/2015 (Sa) GWD 06/28/2015 9:00 AM (Su) 7:10PM \n",
"2 06/27/2015 (Sa) JAN 06/28/2015 11:12 AM (Su) 9:11PM \n",
"3 04/29/2017 (Sa) NOL 04/30/2017 3:32 PM (Su) 1:19AM \n",
"4 06/27/2015 (Sa) MEM 06/28/2015 6:27 AM (Su) 4:05PM \n",
"\n",
" Comments Ar Delay (mins) \\\n",
"0 Ar: 11 hr, 40 min late. 700 \n",
"1 Ar: 10 hr, 10 min late | Dp: 10 hr, 17 min late. 610 \n",
"2 Ar: 9 hr, 59 min late. | Dp: 10 hr, 41 min late. 599 \n",
"3 Ar: 9 hr, 47 min late. 587 \n",
"4 Ar: 9 hr, 38 min late. | Dp: 9 hr, 48 min late. 578 \n",
"\n",
" Service Disruption Cancellations \n",
"0 NaN NaN \n",
"1 NaN NaN \n",
"2 NaN NaN \n",
"3 SD NaN \n",
"4 NaN NaN "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "cc29268c",
"metadata": {
"id": "cc29268c"
},
"source": [
"Rename column for easier referencing."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ac198ddb",
"metadata": {
"id": "ac198ddb"
},
"outputs": [],
"source": [
"df = df.rename(columns={'Ar Delay (mins)':'Ar_Delay_mins'})"
]
},
{
"cell_type": "markdown",
"id": "4f917269",
"metadata": {
"id": "4f917269"
},
"source": [
"Checking for NaN values."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "93a4272c",
"metadata": {
"id": "93a4272c",
"outputId": "38bf64bb-ddb9-4115-df2f-2a550deb3bc0"
},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['Ar_Delay_mins'].isna().sum()"
]
},
{
"cell_type": "markdown",
"id": "f3f9c05c",
"metadata": {
"id": "f3f9c05c"
},
"source": [
"Counting how many datapoints are recorded for each station."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bd5d3dc7",
"metadata": {
"id": "bd5d3dc7",
"outputId": "9e5b7c13-2959-413a-a630-c952ab3df124"
},
"outputs": [
{
"data": {
"text/plain": [
"NOL 3533\n",
"CDL 2207\n",
"GWD 2195\n",
"MEM 2188\n",
"JAN 2171\n",
"MCB 367\n",
"HAZ 76\n",
"CEN 64\n",
"YAZ 34\n",
"NBN 31\n",
"HMW 30\n",
"BRH 27\n",
"MKS 25\n",
"FTN 25\n",
"EFG 24\n",
"MAT 22\n",
"KKI 21\n",
"CHM 20\n",
"HMD 12\n",
"Name: Station, dtype: int64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['Station'].value_counts()"
]
},
{
"cell_type": "markdown",
"id": "f9c480ef",
"metadata": {
"id": "f9c480ef"
},
"source": [
"Looks like most of the reports come from 5 stations (New Orleans LA, Carbondale IL, Greenwood MS, Memphis TN, Jackson MS), so I'm only going to process those."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3c816ddc",
"metadata": {
"id": "3c816ddc"
},
"outputs": [],
"source": [
"df = df[df['Station'].isin(['NOL','CDL','MEM','GWD','JAN'])]"
]
},
{
"cell_type": "markdown",
"id": "cbf96d04",
"metadata": {
"id": "cbf96d04"
},
"source": [
"Let's look at the histogram of arrival delays in minutes (negative values are early arrivals)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d411fbe",
"metadata": {
"id": "4d411fbe",
"outputId": "8043a1b2-b249-4ed7-e0b9-f70247937197"
},
"outputs": [
{
"data": {
"text/plain": [
"Text(0.0, 1.0, 'Amtrak City of New Orleans (Southbound) is not always late')"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"counts, labels = np.histogram(df['Ar_Delay_mins'],bins=np.arange(-60,600,1))\n",
"fig = plt.figure(figsize=(10,5))\n",
"plt.bar(labels[1:], counts, align='center',width=1)\n",
"plt.xlabel('Delays in minutes')\n",
"plt.ylabel('Frequency')\n",
"plt.title('Amtrak City of New Orleans (Southbound) is not always late',loc='left',fontsize=16)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "57dc400e",
"metadata": {
"id": "57dc400e",
"outputId": "2a6ee686-cb05-468b-f4c5-7eb7a4ff1c85"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Median delay in minutes: 2.0\n",
"Mean delay in minutes: 10.485846754514398\n"
]
}
],
"source": [
"print('Median delay in minutes: ', np.median(df['Ar_Delay_mins']))\n",
"print('Mean delay in minutes: ', np.mean(df['Ar_Delay_mins']))"
]
},
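{
"cell_type": "markdown",
"id": "added-range-note",
"metadata": {
"id": "added-range-note"
},
"source": [
"A side note on the histogram: the bins span -60 to 599 minutes, and `np.histogram` ignores values outside its bin range, so very early or very late arrivals do not show up in the plot. Below is a minimal sketch, assuming `df` still holds the filtered southbound data, that counts how many records this affects. The names `lo`, `hi` and `outside` are only illustrative and not part of the original analysis."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-range-check",
"metadata": {
"id": "added-range-check"
},
"outputs": [],
"source": [
"# Sketch, not part of the original analysis: count records outside the plotted range.\n",
"lo, hi = -60, 599  # outermost bin edges produced by np.arange(-60, 600, 1)\n",
"outside = (df['Ar_Delay_mins'] < lo) | (df['Ar_Delay_mins'] > hi)\n",
"print(f'{outside.sum()} of {len(df)} records ({outside.mean():.2%}) fall outside the plotted range')"
]
},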
{
"cell_type": "markdown",
"id": "09531ff9",
"metadata": {
"id": "09531ff9"
},
"source": [
"Arrival delays at the final destination, Union Passenger Terminal in New Orleans."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "204dff7c",
"metadata": {
"id": "204dff7c"
},
"outputs": [],
"source": [
"df_dest = df.loc[df['Station']=='NOL']"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "92731edf",
"metadata": {
"id": "92731edf",
"outputId": "75c57f6d-95f2-4985-ecf5-8994fc5733a6"
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.legend.Legend at 0x2729b9c8970>"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"counts, labels = np.histogram(df['Ar_Delay_mins'],bins=np.arange(-60,600,1))\n",
"fig = plt.figure(figsize=(10,5))\n",
"plt.bar(labels[1:], counts, align='center',width=1,label='All')\n",
"\n",
"counts_dest, labels_dest = np.histogram(df_dest['Ar_Delay_mins'],bins=np.arange(-60,600,1))\n",
"plt.bar(labels_dest[1:], counts_dest, align='center',width=1,label='New Orleans')\n",
"plt.xlabel('Delays in minutes')\n",
"plt.ylabel('Frequency')\n",
"plt.legend()\n",
"#plt.suptitle('You are more likely to arrive in New Orleans on-time or earlier via Amtrak',fontsize=14,ha='left',x=1)\n",
"#plt.title('The train successfully makes up for time despite delays in other stations',loc='left')"
]
},
{
"cell_type": "markdown",
"id": "4d772e7b",
"metadata": {
"id": "4d772e7b"
},
"source": [
"The histogram shows that despite delays arriving at the other stations along the route, the train is usually able to make up for lost time and arrive in New Orleans on time or earlier than scheduled."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c4c0b8d3",
"metadata": {
"id": "c4c0b8d3",
"outputId": "26ad9cf2-3005-4552-9cad-78d3f97ead38"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Median delay in minutes: -25.0\n",
"Mean delay in minutes: -7.866685536371356\n"
]
}
],
"source": [
"print('Median delay in minutes: ', np.median(df_dest['Ar_Delay_mins']))\n",
"print('Mean delay in minutes: ', np.mean(df_dest['Ar_Delay_mins']))"
]
},
{
"cell_type": "markdown",
"id": "4ce60ba4",
"metadata": {
"id": "4ce60ba4"
},
"source": [
"Earliest arrival"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1439a469",
"metadata": {
"id": "1439a469",
"outputId": "b1377aa5-7f87-4f5a-fc36-a35bb9ab0649"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-107 minutes\n"
]
}
],
"source": [
"print(str(np.min(df_dest['Ar_Delay_mins'])) + ' minutes')"
]
},
{
"cell_type": "markdown",
"id": "fdf77824",
"metadata": {
"id": "fdf77824"
},
"source": [
"Latest arrival"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e80a19c2",
"metadata": {
"id": "e80a19c2",
"outputId": "4100d97d-835e-4b01-95bc-208a3cb65b97"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"700 minutes\n"
]
}
],
"source": [
"print(str(np.max(df_dest['Ar_Delay_mins'])) + ' minutes')"
]
},
{
"cell_type": "markdown",
"id": "984131bc",
"metadata": {
"id": "984131bc"
},
"source": [
"Percentage of on-time arrivals: first across all five stations, then at New Orleans only."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d093b607",
"metadata": {
"id": "d093b607"
},
"outputs": [],
"source": [
"df_ontime30 = df.loc[(df['Ar_Delay_mins']<=30)]\n",
"df_ontime60 = df.loc[(df['Ar_Delay_mins']<=60)]\n",
"df_ontime = df.loc[(df['Ar_Delay_mins']<=0)]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4a9f17ae",
"metadata": {
"id": "4a9f17ae",
"outputId": "65500e6a-0001-4fc6-db02-4e8776f76677"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"88.86% of the time the train arrives within 60 mins of schedule\n",
"78.14% of the time the train arrives within 30 mins of schedule\n",
"47.36% of the time the train arrives on time or earlier\n"
]
}
],
"source": [
"print(str(np.round(len(df_ontime60)*100./len(df),2))+'% of the time the train arrives within 60 mins of schedule')\n",
"print(str(np.round(len(df_ontime30)*100./len(df),2))+'% of the time the train arrives within 30 mins of schedule')\n",
"print(str(np.round(len(df_ontime)*100./len(df),2))+'% of the time the train arrives on time or earlier')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4f46835b",
"metadata": {
"id": "4f46835b",
"outputId": "20ca0cc5-e8ec-499d-be73-9c84e5a63978"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"90.01% of the time the train arrives within 60 mins of schedule\n",
"83.78% of the time the train arrives within 30 mins of schedule\n",
"72.43% of the time the train arrives on time or earlier\n"
]
}
],
"source": [
"df_dest_ontime30 = df_dest.loc[(df_dest['Ar_Delay_mins']<=30)]\n",
"df_dest_ontime60 = df_dest.loc[(df_dest['Ar_Delay_mins']<=60)]\n",
"df_dest_ontime = df_dest.loc[(df_dest['Ar_Delay_mins']<=0)]\n",
"\n",
"print(str(np.round(len(df_dest_ontime60)*100./len(df_dest),2))+'% of the time the train arrives within 60 mins of schedule')\n",
"print(str(np.round(len(df_dest_ontime30)*100./len(df_dest),2))+'% of the time the train arrives within 30 mins of schedule')\n",
"print(str(np.round(len(df_dest_ontime)*100./len(df_dest),2))+'% of the time the train arrives on time or earlier')"
]
},
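{
"cell_type": "markdown",
"id": "added-share-note",
"metadata": {
"id": "added-share-note"
},
"source": [
"The same percentages can be computed without building intermediate DataFrames, since the mean of a boolean mask is the fraction of rows that satisfy it. This is a minimal sketch, assuming `df` and `df_dest` are the southbound frames defined above. The helper name `share_within` is hypothetical and not part of the original analysis."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-share-within",
"metadata": {
"id": "added-share-within"
},
"outputs": [],
"source": [
"# Sketch, not part of the original analysis.\n",
"def share_within(delays, threshold_mins):\n",
"    # Percentage of records whose delay is at most threshold_mins (early arrivals included)\n",
"    return np.round((delays <= threshold_mins).mean() * 100, 2)\n",
"\n",
"for t in (60, 30, 0):\n",
"    print(t, 'min:', share_within(df['Ar_Delay_mins'], t), '% all stations,',\n",
"          share_within(df_dest['Ar_Delay_mins'], t), '% New Orleans')"
]
},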
{
"cell_type": "markdown",
"id": "5a60d3de",
"metadata": {
"id": "5a60d3de"
},
"source": [
"## Part 2: New Orleans to Chicago"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6c9824de",
"metadata": {
"id": "6c9824de",
"outputId": "b802faa8-2869-4291-b97b-3af741cc09b1"
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Origin_date</th>\n",
" <th>Station</th>\n",
" <th>Sch_Ar</th>\n",
" <th>Act_Ar</th>\n",
" <th>Comments</th>\n",
" <th>Ar_Delay_mins</th>\n",
" <th>ServiceDisruption</th>\n",
" <th>Cancellations</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>05/27/2017 (Sa)</td>\n",
" <td>CHI</td>\n",
" <td>05/28/2017 9:00 AM (Su)</td>\n",
" <td>10:39PM</td>\n",
" <td>Ar: 13 hr, 39 min late.</td>\n",
" <td>819</td>\n",
" <td>SD</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>06/23/2009 (Tu)</td>\n",
" <td>CHI</td>\n",
" <td>06/24/2009 9:00 AM (We)</td>\n",
" <td>10:28PM</td>\n",
" <td>Ar: 13 hr and 28 min late.</td>\n",
" <td>808</td>\n",
" <td>SD</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>05/27/2017 (Sa)</td>\n",
" <td>CDL</td>\n",
" <td>05/28/2017 3:11 AM (Su)</td>\n",
" <td>4:27PM</td>\n",
" <td>Ar: 13 hr, 16 min late | Dp: 13 hr, 34 min late.</td>\n",
" <td>796</td>\n",
" <td>SD</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>03/30/2015 (Mo)</td>\n",
" <td>CHI</td>\n",
" <td>03/31/2015 9:00 AM (Tu)</td>\n",
" <td>7:30PM</td>\n",
" <td>Ar: 10 hr, 30 min late.</td>\n",
" <td>630</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>11/20/2020 (Fr)</td>\n",
" <td>CHI</td>\n",
" <td>11/21/2020 9:15 AM (Sa)</td>\n",
" <td>7:17PM</td>\n",
" <td>Ar: 10 hr, 2 min late.</td>\n",
" <td>602</td>\n",
" <td>SD</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Origin_date Station Sch_Ar Act_Ar \\\n",
"0 05/27/2017 (Sa) CHI 05/28/2017 9:00 AM (Su) 10:39PM \n",
"1 06/23/2009 (Tu) CHI 06/24/2009 9:00 AM (We) 10:28PM \n",
"2 05/27/2017 (Sa) CDL 05/28/2017 3:11 AM (Su) 4:27PM \n",
"3 03/30/2015 (Mo) CHI 03/31/2015 9:00 AM (Tu) 7:30PM \n",
"4 11/20/2020 (Fr) CHI 11/21/2020 9:15 AM (Sa) 7:17PM \n",
"\n",
" Comments Ar_Delay_mins \\\n",
"0 Ar: 13 hr, 39 min late. 819 \n",
"1 Ar: 13 hr and 28 min late. 808 \n",
"2 Ar: 13 hr, 16 min late | Dp: 13 hr, 34 min late. 796 \n",
"3 Ar: 10 hr, 30 min late. 630 \n",
"4 Ar: 10 hr, 2 min late. 602 \n",
"\n",
" ServiceDisruption Cancellations \n",
"0 SD NaN \n",
"1 SD NaN \n",
"2 SD NaN \n",
"3 NaN NaN \n",
"4 SD NaN "
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_excel('AmtrakDelays_NOL_CHI.xlsx')\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "8e85dd6a",
"metadata": {
"id": "8e85dd6a"
},
"source": [
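"Rename column for easier referencing (the preview above suggests this file already uses `Ar_Delay_mins`, in which case the rename changes nothing)."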
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "68695463",
"metadata": {
"id": "68695463"
},
"outputs": [],
"source": [
"df = df.rename(columns={'Ar Delay (mins)':'Ar_Delay_mins'})"
]
},
{
"cell_type": "markdown",
"id": "ba297f76",
"metadata": {
"id": "ba297f76"
},
"source": [
"Checking for NaN values."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3aaf54b5",
"metadata": {
"id": "3aaf54b5",
"outputId": "f631fff1-8ebf-476c-e3d3-54b684762a80"
},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['Ar_Delay_mins'].isna().sum()"
]
},
{
"cell_type": "markdown",
"id": "3b433479",
"metadata": {
"id": "3b433479"
},
"source": [
"Counting how many datapoints are recorded for each station."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "adfb7f08",
"metadata": {
"id": "adfb7f08",
"outputId": "c2457b8a-a74d-4049-b98a-cf129df4a89b"
},
"outputs": [
{
"data": {
"text/plain": [
"CHI 4534\n",
"CDL 2265\n",
"MEM 2229\n",
"JAN 2079\n",
"GWD 181\n",
"HMW 58\n",
"MAT 52\n",
"KKI 49\n",
"FTN 46\n",
"CEN 43\n",
"EFG 43\n",
"NBN 42\n",
"CHM 21\n",
"YAZ 17\n",
"MKS 15\n",
"MCB 11\n",
"BRH 10\n",
"HMD 9\n",
"HAZ 9\n",
"NYP 7\n",
"PHL 4\n",
"BAL 1\n",
"NCR 1\n",
"Name: Station, dtype: int64"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['Station'].value_counts()"
]
},
{
"cell_type": "markdown",
"id": "fa8193d4",
"metadata": {
"id": "fa8193d4"
},
"source": [
"Looks like most of the reports come from 4 stations (Chicago IL, Carbondale IL, Memphis TN, Jackson MS), so I'm only going to process those."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f8e9d51f",
"metadata": {
"id": "f8e9d51f"
},
"outputs": [],
"source": [
"df = df[df['Station'].isin(['CHI','CDL','MEM','JAN'])]"
]
},
{
"cell_type": "markdown",
"id": "a045dc54",
"metadata": {
"id": "a045dc54"
},
"source": [
"Let's look at the histogram of arrival delays in minutes (negative values are early arrivals)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9cd60d52",
"metadata": {
"id": "9cd60d52",
"outputId": "216d3aa0-1c83-414a-c1e2-a4e4b2eab0c9"
},
"outputs": [
{
"data": {
"text/plain": [
"Text(0.0, 1.0, 'Amtrak City of New Orleans (Northbound) is not always late')"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
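],
"source": [
"counts, labels = np.histogram(df['Ar_Delay_mins'],bins=np.arange(-60,600,1))\n",
"fig = plt.figure(figsize=(10,5))\n",
"# Delays are whole minutes, so delay d falls in bin [d, d+1): center each bar on the left bin edge\n",
"plt.bar(labels[:-1], counts, align='center',width=1)\n",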
"plt.xlabel('Delays in minutes')\n",
"plt.ylabel('Frequency')\n",
"plt.title('Amtrak City of New Orleans (Northbound) is not always late',loc='left',fontsize=16)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e56d899b",
"metadata": {
"id": "e56d899b",
"outputId": "b888cc1e-0d9f-4baf-81e5-0d7e70aa5260"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Median delay in minutes: 0.0\n",
"Mean delay in minutes: 12.635004951832178\n"
]
}
],
"source": [
"print('Median delay in minutes: ', np.median(df['Ar_Delay_mins']))\n",
"print('Mean delay in minutes: ', np.mean(df['Ar_Delay_mins']))"
]
},
{
"cell_type": "markdown",
"id": "b2174f32",
"metadata": {
"id": "b2174f32"
},
"source": [
"Delays on arrival at the final destination, Chicago Union Station"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "94a4dc76",
"metadata": {
"id": "94a4dc76"
},
"outputs": [],
"source": [
"df_dest = df.loc[df['Station']=='CHI']"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0a0da9c2",
"metadata": {
"id": "0a0da9c2",
"outputId": "728a8094-1f57-44e9-8bbf-0a924eecddfb"
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.legend.Legend at 0x2729eb6fa60>"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"counts, labels = np.histogram(df['Ar_Delay_mins'],bins=np.arange(-60,600,1))\n",
"fig = plt.figure(figsize=(10,5))\n",
"# Delays are whole minutes, so delay d falls in bin [d, d+1): center each bar on the left bin edge\n",
"plt.bar(labels[:-1], counts, align='center',width=1,label='All')\n",
"\n",
"counts_dest, labels_dest = np.histogram(df_dest['Ar_Delay_mins'],bins=np.arange(-60,600,1))\n",
"# df_dest holds arrivals at CHI, so label this series Chicago\n",
"plt.bar(labels_dest[:-1], counts_dest, align='center',width=1,label='Chicago')\n",
"plt.xlabel('Delays in minutes')\n",
"plt.ylabel('Frequency')\n",
"plt.legend()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a560d894",
"metadata": {
"id": "a560d894",
"outputId": "d83a999c-4074-4916-f2e2-9d965fe99e94"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Median delay in minutes: -3.0\n",
"Mean delay in minutes: 12.15968239964711\n"
]
}
],
"source": [
"print('Median delay in minutes: ', np.median(df_dest['Ar_Delay_mins']))\n",
"print('Mean delay in minutes: ', np.mean(df_dest['Ar_Delay_mins']))"
]
},
{
"cell_type": "markdown",
"id": "908c61d7",
"metadata": {
"id": "908c61d7"
},
"source": [
"Earliest arrival"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b281bed3",
"metadata": {
"id": "b281bed3",
"outputId": "fbd33296-5126-460e-a3e5-bbb97e83bfde"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-61 minutes\n"
]
}
],
"source": [
"print(str(np.min(df_dest['Ar_Delay_mins'])) + ' minutes')"
]
},
{
"cell_type": "markdown",
"id": "f09838ff",
"metadata": {
"id": "f09838ff"
},
"source": [
"Latest arrival"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "67bf5a95",
"metadata": {
"id": "67bf5a95",
"outputId": "b769aaf1-6c6c-4367-9717-c6137d82bf5f"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"819 minutes\n"
]
}
],
"source": [
"print(str(np.max(df_dest['Ar_Delay_mins'])) + ' minutes')"
]
},
{
"cell_type": "markdown",
"id": "7697a5d0",
"metadata": {
"id": "7697a5d0"
},
"source": [
"Percentage of arrivals on time"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aef9ad03",
"metadata": {
"id": "aef9ad03"
},
"outputs": [],
"source": [
"df_ontime30 = df.loc[(df['Ar_Delay_mins']<=30)]\n",
"df_ontime60 = df.loc[(df['Ar_Delay_mins']<=60)]\n",
"df_ontime = df.loc[(df['Ar_Delay_mins']<=0)]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "45a64dd4",
"metadata": {
"id": "45a64dd4",
"outputId": "b4d9784f-c4c1-47c3-8e75-225520a7f6a1"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"90.51% of the time the train arrives within 60 mins of schedule\n",
"81.4% of the time the train arrives within 30 mins of schedule\n",
"51.12% of the time the train arrives on time or earlier\n"
]
}
],
"source": [
"print(str(np.round(len(df_ontime60)*100./len(df),2))+'% of the time the train arrives within 60 mins of schedule')\n",
"print(str(np.round(len(df_ontime30)*100./len(df),2))+'% of the time the train arrives within 30 mins of schedule')\n",
"print(str(np.round(len(df_ontime)*100./len(df),2))+'% of the time the train arrives on time or earlier')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0e5f4e15",
"metadata": {
"id": "0e5f4e15",
"outputId": "e4cdb45b-98b7-43d3-f4cf-4038c69a57c4"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"88.8% of the time the train arrives within 60 mins of schedule\n",
"80.88% of the time the train arrives within 30 mins of schedule\n",
"54.34% of the time the train arrives on time or earlier\n"
]
}
],
"source": [
"df_dest_ontime30 = df_dest.loc[(df_dest['Ar_Delay_mins']<=30)]\n",
"df_dest_ontime60 = df_dest.loc[(df_dest['Ar_Delay_mins']<=60)]\n",
"df_dest_ontime = df_dest.loc[(df_dest['Ar_Delay_mins']<=0)]\n",
"\n",
"print(str(np.round(len(df_dest_ontime60)*100./len(df_dest),2))+'% of the time the train arrives within 60 mins of schedule')\n",
"print(str(np.round(len(df_dest_ontime30)*100./len(df_dest),2))+'% of the time the train arrives within 30 mins of schedule')\n",
"print(str(np.round(len(df_dest_ontime)*100./len(df_dest),2))+'% of the time the train arrives on time or earlier')"
]
},
{
"cell_type": "markdown",
"id": "ff1cba0f",
"metadata": {
"id": "ff1cba0f"
},
"source": [
"# Conclusion\n",
"\n",
"My experience of 6-hour and 3-hour delays is not the norm. Based on 10 years of data from the most reported stations along the City of New Orleans route, the trains arrive within 30 minutes of schedule most of the time. About half the time, they arrive on time or even earlier at the end stations."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
},
"colab": {
"name": "Amtrak Delays.ipynb",
"provenance": [],
"include_colab_link": true
}
},
"nbformat": 4,
"nbformat_minor": 5
}
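
The notebook repeats the same three on-time calculations for each data subset, and it reports a median delay at or below zero alongside a mean of about 12 minutes. A small helper can compute all of these in one call. This is only a sketch: the function name `summarize_delays` and the sample values are made up for illustration and are not taken from the Amtrak data. With the notebook's data you would pass something like `df['Ar_Delay_mins'].dropna()` instead.

```python
from statistics import mean, median


def summarize_delays(delays, thresholds=(0, 30, 60)):
    """Return the median, the mean, and the percentage of arrivals
    whose delay is at or below each threshold (in minutes)."""
    values = [d for d in delays if d is not None]
    if not values:
        raise ValueError("no delay values to summarize")
    shares = {
        t: round(100 * sum(d <= t for d in values) / len(values), 2)
        for t in thresholds
    }
    return {"median": median(values), "mean": mean(values), "within": shares}


# Hypothetical delays in minutes (negative means early); not real Amtrak records.
sample_delays = [-5, -3, 0, 2, 10, 45, 360]
summary = summarize_delays(sample_delays)

print("Median delay in minutes:", summary["median"])
print("Mean delay in minutes:", round(summary["mean"], 2))
for threshold, share in summary["within"].items():
    print(f"{share}% of arrivals are at most {threshold} mins behind schedule")
```

In the made-up sample, a single very late arrival pulls the mean far above the median. The same long right tail explains why the notebook sees a mean delay of roughly 12 minutes even though about half of all arrivals are on time or early.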