Skip to content

Instantly share code, notes, and snippets.

@cleishm
Last active December 28, 2015 05:59
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save cleishm/7453537 to your computer and use it in GitHub Desktop.
Save cleishm/7453537 to your computer and use it in GitHub Desktop.

US Flights & Airports: Delays, Cancellations, & Diversions

Inspiration

For any airline carrier, efficiency is key: delayed or cancelled flights and long taxi times often lead to unhappy customers. Flight planning is one the most complex optimization and scheduling problems out there, requiring a deep analysis of flight and airport data.

There is a dataset "Airline On-Time Performance and Causes of Flight Delays" released by the Bureau of Transportation Statistics that contains detailed information on all United States flights since 1987. For demonstrational purposes and for the sake of simplicity, I have only included 30 observations of American Airlines (AA) flights travelling between four (4) major airports in the year 2012.

Included Variables

  • Date of flight

  • Flight number

  • Origin airport

  • Duration of `taxi out' time at origin airport

  • Duration of departure delay at origin airport

  • Destination airport

  • Duration of `taxi in' time at destination airport

  • Duration of arrival delay at destination airport

  • Duration of delay due to carrier

  • Duration of delay due to weather

  • Duration of delay due to NAS (National Aviation System)

  • Duration of delay due to security

  • Duration of delay due to late aircraft

  • Diversion airport

  • Duration of diversion arrival delay

  • Cancelled indicator

  • Cancellation reason

All duration times are in minutes and delay type definitons can be found here.

Included Airports

Name City Abbreviation
Hartsfield-Jackson Atlanta International Airport Atlanta ATL
O'Hare International Airport Chicago ORD
Los Angeles International Airport Los Angeles LAX
Dallas/Fort Worth International Airport Dallas/Fort Worth DFW

Proposed Model

Applications

A database of this nature can answer broad questions such as:

  • What is the average taxi time at each airport for both departures and arrivals?

  • What is the leading cause of departure delays at each airport?

  • How many outbound flights were cancelled at each airport?

Or more specific questions such as:

  • Which flights from Los Angeles (LAX) to Chicago (ORD) were delayed for more than 10 minutes due to late arrivals?

  • How does seasonality affect departure taxi times at Chicago’s O’Hare International Airport (ORD)?

  • What is the standard deviation of arrival taxi times at Dallas/Fort Worth (DFW)?

These are the types of questions airline carriers are asking when attempting to construct efficient flight plans for their customers. After initializing the data (Query 1), we can start answering these questions and drawing insights.

// US Flights & Airports
CREATE (SF :City {name: 'San Francisco'}),
       (Chicago :City {name: 'Chicago'}),
       (LA :City {name: 'Los Angeles'}),
       (NewYork :City {name: 'New York'})

CREATE (SFO :Airport {code:'SFO', name:'San Francisco International Airport'}),
       (ORD :Airport {code:'ORD', name:"O'Hare International Airport"}),
       (LAX :Airport {code:'LAX', name:'Los Angeles International Airport'}),
       (JFK :Airport {code:'JFK', name:'John F. Kennedy International Airport'})

CREATE (SFO)-[:IN]->(SF),
       (ORD)-[:IN]->(Chicago),
       (LAX)-[:IN]->(LA),
       (JFK)-[:IN]->(NewYork)

//Delay & Cancellation Causes
CREATE (Carrier :Cause        {name:'Carrier'}),
       (Weather :Cause        {name:'Weather'}),
       (NAS :Cause            {name:'NAS'}),
       (Security :Cause       {name:'Security'}),
       (LateAircraft :Cause   {name:'Late Aircraft'})

//Flights
CREATE (flight1 :Flight {number:2412, date:'20130812'}),
       (flight2 :Flight {number:2321, date:'20130412'}),
       (flight3 :Flight {number:1890, date:'20130212'}),
       (flight4 :Flight {number:2420, date:'20130912'}),
       (flight5 :Flight {number:1092, date:'20130712'}),
       (flight6 :Flight {number:2497, date:'20130312'}),
       (flight7 :Flight {number:1609, date:'20130512'}),
       (flight8 :Flight {number:557, date:'20130412'}),
       (flight9 :Flight {number:1894, date:'20130912'}),
       (flight10 :Flight {number:1011, date:'20130812'}),
       (flight11 :Flight {number:2324, date:'20130812'}),
       (flight12 :Flight {number:1332, date:'20130912'}),
       (flight13 :Flight {number:2430, date:'20130512'}),
       (flight14 :Flight {number:607, date:'20130912'}),
       (flight15 :Flight {number:2336, date:'20130212'}),
       (flight16 :Flight {number:2222, date:'20131212'}),
       (flight17 :Flight {number:2404, date:'20131212'}),
       (flight18 :Flight {number:2401, date:'20130312'}),
       (flight19 :Flight {number:693, date:'20130312'}),
       (flight20 :Flight {number:565, date:'20130112'}),
       (flight21 :Flight {number:1829, date:'20130912'}),
       (flight22 :Flight {number:2495, date:'20131212'}),
       (flight23 :Flight {number:2327, date:'20130712'}),
       (flight24 :Flight {number:1062, date:'20130912'}),
       (flight25 :Flight {number:1282, date:'20130912'}),
       (flight26 :Flight {number:2450, date:'20131212'}),
       (flight27 :Flight {number:2336, date:'20130112'}),
       (flight28 :Flight {number:2357, date:'20131012'}),
       (flight29 :Flight {number:557, date:'20130612'}),
       (flight30 :Flight {number:2339, date:'20130312'}),
       (flight31 :Flight {number:637, date:'20130312'}),
       (flight32 :Flight {number:7634, date:'20130312'}),
       (flight33 :Flight {number:323, date:'20130312'}),
       (flight34 :Flight {number:4837, date:'20130312'}),
       (flight35 :Flight {number:3283, date:'20130312'}),
       (flight36 :Flight {number:1983, date:'20130312'})

//Origins & Destinations
CREATE (LAX)<-[:ORIGIN]-(flight1)-[:DESTINATION]->(JFK),
    (SFO)<-[:ORIGIN]-(flight2)-[:DESTINATION]->(JFK),
    (SFO)<-[:ORIGIN]-(flight3)-[:DESTINATION]->(ORD),
    (LAX)<-[:ORIGIN {taxi_time:56, dep_delay:32}]-(flight4)-[:DESTINATION]->(JFK),
    (LAX)<-[:ORIGIN {taxi_time:17, dep_delay:1}]-(flight5)-[:DESTINATION]->(ORD),
    (JFK)<-[:ORIGIN {taxi_time:21, dep_delay:43}]-(flight6)-[:DESTINATION]->(LAX),
    (SFO)<-[:ORIGIN {taxi_time:20, dep_delay:0}]-(flight7)-[:DESTINATION {taxi_time:16, arr_delay:0}]->(JFK),
    (ORD)<-[:ORIGIN {taxi_time:8, dep_delay:0}]-(flight8)-[:DESTINATION {taxi_time:5, arr_delay:0}]->(LAX),
    (LAX)<-[:ORIGIN {taxi_time:33, dep_delay:15}]-(flight9)-[:DESTINATION {taxi_time:5, arr_delay:31}]->(ORD),
    (SFO)<-[:ORIGIN {taxi_time:14, dep_delay:0}]-(flight10)-[:DESTINATION {taxi_time:6, arr_delay:0}]->(JFK),
    (JFK)<-[:ORIGIN {taxi_time:11, dep_delay:0}]-(flight11)-[:DESTINATION {taxi_time:6, arr_delay:0}]->(ORD),
    (JFK)<-[:ORIGIN {taxi_time:15, dep_delay:53}]-(flight12)-[:DESTINATION {taxi_time:13, arr_delay:60}]->(SFO),
    (LAX)<-[:ORIGIN {taxi_time:26, dep_delay:0}]-(flight13)-[:DESTINATION {taxi_time:5, arr_delay:0}]->(JFK),
    (ORD)<-[:ORIGIN {taxi_time:17, dep_delay:3}]-(flight14)-[:DESTINATION {taxi_time:7, arr_delay:27}]->(LAX),
    (JFK)<-[:ORIGIN {taxi_time:11, dep_delay:0}]-(flight15)-[:DESTINATION {taxi_time:3, arr_delay:0}]->(ORD),
    (JFK)<-[:ORIGIN {taxi_time:11, dep_delay:0}]-(flight16)-[:DESTINATION {taxi_time:8, arr_delay:0}]->(SFO),
    (LAX)<-[:ORIGIN {taxi_time:20, dep_delay:0}]-(flight17)-[:DESTINATION {taxi_time:6, arr_delay:0}]->(JFK),
    (JFK)<-[:ORIGIN {taxi_time:13, dep_delay:0}]-(flight18)-[:DESTINATION {taxi_time:14, arr_delay:0}]->(LAX),
    (SFO)<-[:ORIGIN {taxi_time:26, dep_delay:0}]-(flight19)-[:DESTINATION {taxi_time:9, arr_delay:0}]->(JFK),
    (SFO)<-[:ORIGIN {taxi_time:15, dep_delay:0}]-(flight20)-[:DESTINATION {taxi_time:4, arr_delay:0}]->(JFK),
    (SFO)<-[:ORIGIN {taxi_time:16, dep_delay:0}]-(flight21)-[:DESTINATION {taxi_time:12, arr_delay:0}]->(JFK),
    (JFK)<-[:ORIGIN {taxi_time:18, dep_delay:22}]-(flight22)-[:DESTINATION {taxi_time:10, arr_delay:37}]->(LAX),
    (ORD)<-[:ORIGIN {taxi_time:15, dep_delay:0}]-(flight23)-[:DESTINATION {taxi_time:6, arr_delay:0}]->(JFK),
    (LAX)<-[:ORIGIN {taxi_time:19, dep_delay:35}]-(flight24)-[:DESTINATION {taxi_time:8, arr_delay:19}]->(ORD),
    (JFK)<-[:ORIGIN {taxi_time:10, dep_delay:0}]-(flight25)-[:DESTINATION {taxi_time:6, arr_delay:0}]->(SFO),
    (LAX)<-[:ORIGIN {taxi_time:12, dep_delay:0}]-(flight26)-[:DESTINATION {taxi_time:8, arr_delay:0}]->(JFK),
    (JFK)<-[:ORIGIN {taxi_time:8, dep_delay:0}]-(flight27)-[:DESTINATION {taxi_time:6, arr_delay:0}]->(ORD),
    (ORD)<-[:ORIGIN {taxi_time:12, dep_delay:0}]-(flight28)-[:DESTINATION {taxi_time:8, arr_delay:0}]->(JFK),
    (ORD)<-[:ORIGIN {taxi_time:11, dep_delay:0}]-(flight29)-[:DESTINATION {taxi_time:6, arr_delay:0}]->(LAX),
    (ORD)<-[:ORIGIN {taxi_time:17, dep_delay:0}]-(flight30)-[:DESTINATION {taxi_time:6, arr_delay:0}]->(JFK),
    (SFO)<-[:ORIGIN]-(flight31)-[:DESTINATION]->(JFK),
    (SFO)<-[:ORIGIN]-(flight32)-[:DESTINATION]->(JFK),
    (SFO)<-[:ORIGIN]-(flight33)-[:DESTINATION]->(JFK),
    (SFO)<-[:ORIGIN]-(flight34)-[:DESTINATION]->(JFK),
    (SFO)<-[:ORIGIN]-(flight35)-[:DESTINATION]->(JFK),
    (SFO)<-[:ORIGIN]-(flight36)-[:DESTINATION]->(JFK)

//Carrier Delays
CREATE (flight22)-[:DELAYED_BY {time:22}]->(Carrier),
    (flight24)-[:DELAYED_BY {time:3}]->(Carrier)

//Weather Delays

//NAS Delays
CREATE (flight9)-[:DELAYED_BY {time:16}]->(NAS),
    (flight12)-[:DELAYED_BY {time:7}]->(NAS),
    (flight14)-[:DELAYED_BY {time:27}]->(NAS),
    (flight22)-[:DELAYED_BY {time:15}]->(NAS)

//Late Aircraft Delays
CREATE (flight9)-[:DELAYED_BY {time:15}]->(LateAircraft),
    (flight12)-[:DELAYED_BY {time:53}]->(LateAircraft),
    (flight24)-[:DELAYED_BY {time:16}]->(LateAircraft)

//Cancellations
CREATE (flight1)-[:CANCELLED_BY]->(Carrier),
    (flight2)-[:CANCELLED_BY]->(NAS),
    (flight3)-[:CANCELLED_BY]->(Weather),
    (flight31)-[:CANCELLED_BY]->(Weather),
    (flight32)-[:CANCELLED_BY]->(Carrier),
    (flight33)-[:CANCELLED_BY]->(Weather),
    (flight34)-[:CANCELLED_BY]->(Weather),
    (flight35)-[:CANCELLED_BY]->(Carrier),
    (flight36)-[:CANCELLED_BY]->(Weather)

//Diversions
CREATE (flight4)-[:DIVERTED_TO {taxi_time:14, div_arr_delay:280}]->(LAX),
    (flight5)-[:DIVERTED_TO {taxi_time:17, div_arr_delay:145}]->(LAX),
    (flight6)-[:DIVERTED_TO {taxi_time:6, div_arr_delay:257}]->(JFK)

What is the average taxi time at each airport for both departures and arrivals?

A flight planner will want to take into consideration how long it takes on average for a plane to travel from its gate to the runway, or vice versa, at a given airport. The consequences for leaving customers sitting on a tarmac for too long can range from a few angry letters to a PR nightmare.

MATCH (a)<-[q:ORIGIN]-(), ()-[r:DESTINATION|DIVERTED_TO]->(a)
WITH a, ROUND(AVG(q.taxi_time)) AS avg1, ROUND(AVG(r.taxi_time)) AS avg2
RETURN a.name AS Airport, avg1 AS `Average Departure Taxi Time`, avg2 AS `Average Arrival Taxi Time`

What is the leading cause of departure delays at each airport?

Are the delays at a given airport mostly out of one’s control (weather delays) or are the delays mostly preventable (carrier delays)? A flight planner would be interested to learn which of these types of delays are most prevalent at each of its airports.

MATCH (a)<-[:ORIGIN]-()-[r:DELAYED_BY]->(d)
WITH a.name AS Airport, d.name AS Delay, SUM(r.time) AS total
ORDER BY Airport, total desc
RETURN Airport, Delay, total as `Total Delay Time`

How many outbound flights were cancelled at each airport?

Cancelled flights are the worst. An analyst will want to know the volume of cancelled flights by airport for the purposes of identifying outliers for further analysis.

MATCH (a)<-[:ORIGIN]-()-[:CANCELLED_BY]->()
WITH a, COUNT(*) AS total
ORDER BY total desc
RETURN a.name AS Airport, total AS `Total Cancelled Flights`

Which flights from Los Angeles (LAX) to Chicago (ORD) were delayed for more than 10 minutes due to late arrivals?

If a flight planner has identified that there are long delay times for flights from LAX to ORD, he or she can investigate the issue on a deeper, increasingly-micro level.

MATCH (a)<-[:ORIGIN]-(f)-[:DESTINATION|DIVERTED_TO]->(b), (f)-[r:DELAYED_BY]->(d)
WHERE a.name="Los Angeles International Airport" AND b.name="O'Hare International Airport"
      AND r.time > 10 AND d.name="Late Aircraft"
WITH f, r.time AS latedelay
RETURN f.flight_number AS Flight, latedelay AS `Delay Time Due to Late Arrival`

How does seasonality affect departure taxi times at Chicago’s O’Hare International Airport (ORD)?

The effects of seasonal travel are huge in the airline industry. People travel more during the summer. Weather delays occur more frequently during the winter. A flight planner should be very aware of any and all seasonal trends.

MATCH (a)<-[r:ORIGIN]-(f)
WHERE a.name="O'Hare International Airport"
WITH f.month AS Month, ROUND(AVG(r.taxi_time)) AS avg
ORDER BY avg desc
RETURN Month, avg AS `Average Departure Taxi Time`

What is the standard deviation of arrival taxi times at Dallas/Fort Worth (DFW)?

The standard deviation of a parameter can tell us about the variability, and thus the reliability, of a sample estimate of an unknown population parameter (\( \mu \)). In the first application, we found that the sample mean of arrival taxi times at DFW is roughly 8 minutes. But what is the variability (or reliability) of this sample estimate?

MATCH (f)-[r:DESTINATION|DIVERTED_TO]->(a)
WHERE a.name="Dallas/Fort Worth International Airport" AND HAS(r.taxi_time)
RETURN STDEV(r.taxi_time) AS `Standard Deviation`, COUNT(*) as N

With this result, let’s summarize what we know. The sample mean of arrival taxi times at DFW is 8 minutes, and we just calculated that the standard deviation is 3.37 minutes. We also asked the query for the number of observations, N.

\(\large \bar x = 8\)

\(\large \sigma = 3.37\)

\(\large N = 12\)

Using these, we can calculate the 95% confidence interval. The 95% confidence interval tells us the range in which we believe the true population parameter value lies, with 95% confidence. The narrower this range, the better; that means we know more about the true population parameter. Calculating this range is simple:

\( \large \bar x \pm 1.96 \cdot \frac{\sigma}{\sqrt N} \)       [1]

Substituting in our values, we can find the 95% confidence interval of \( \mu \):

\( \large 8 \pm 1.96 \cdot \frac{3.37}{\sqrt 12} = [6.09, 9.91] \)

We now say, with 95% confidence, that we believe the true population parameter value, or the true mean of arrival taxi times at DFW, lies between 6.09 and 9.91 minutes. Using standard deviations and confidence intervals, we now know more about the variability and reliability of our sample parameter estimate of 8 minutes calculated earlier.


Visualizations

Flights Between Dallas/Fort Worth (DFW) and Atlanta (ATL)

A simple visualization of flights between DFW and ATL (both ways) shows that one (1) out of the eight (8) flights was delayed. It can be seen that flight 1332 has a DELAYED_BY relationship with both Late Aircraft and NAS, meaning it was delayed by a late-arriving aircraft and the National Aviation System.

It can also be seen that all flights to / from Atlanta originated in / departed to Dallas.

MATCH x = (a)<-[:ORIGIN|DESTINATION|DIVERTED_TO]-()-[:ORIGIN|DESTINATION|DIVERTED_TO]->(b)
WHERE a.name="Dallas/Fort Worth International Airport" AND b.name="Hartsfield-Jackson Atlanta International Airport"
RETURN x

Los Angeles (LAX) Outbound Flights

A quick visualization of LAX outbound flights shows that there are no direct flights from Los Angeles to Atlanta.

MATCH x = (a)<-[:ORIGIN]-()-[:DESTINATION|DIVERTED_TO]->()
WHERE a.name="Los Angeles International Airport"
RETURN x

Console

Enter your own query here.


Moving Forward

I am implementing the full version of this database for a class project, including all variables across all flights dating back to 1987. I plan to analyze carrier performance, specifically with attention to the Hub-And-Spoke versus Point-To-Point models of air travel.

Created by Nicole White:


1. See here for an explanation of 1.96.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment