

Team
Presents...
Through the Master of Applied Data Science program at the
...
Flight Delays and Propagation Effects
A Capstone Project
Problem: Delays are frustrating and costly to passengers, airlines and airports.
We thought...
Wouldn't it be great to predict flight delays?
and wondered...
How do delays spread across flights?
Flight delays aren't a new problem...we wanted to see what researchers had done in this space.
The team reviewed several research papers that used machine learning and network analysis approaches.
These inspired our choice of machine learning algorithms for predicting delays and of network analysis methods for modeling delay propagation.
Next, we reached out to industry experts using qualitative research methods.
We conducted semi-structured interviews with an airline analyst and a former pilot.
We extracted quotes, paraphrased statements and our own insights, then organized them on an Affinity Wall, which surfaced these 3 broad insights...
#1: Carriers and large hub airports share a complex partnership that seeks logistical efficiencies but can amplify delays when they occur.
#2: The FAA / Air Traffic Control emphasize safety. This can lead to more (and in some cases unnecessary) delays.
#3: Weather emerged as a significant challenge affecting both pilots' decision-making and air travel operations. Bad weather is a catalyst for many types of problems.
Given the rich stories provided by our interviewees, the abundance of free data and our own experiences with delays, we decided to focus on modeling weather delays and their propagation effects.
Almost a decade's worth of flight, weather and aircraft data was gathered, cleaned and joined.
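The project does not describe how the three sources were joined. A minimal sketch, assuming each flight record carries its origin airport, scheduled departure time and tail number, that weather observations were keyed by airport code and hour, and that aircraft records were keyed by tail number. All field names, tail numbers and values below are hypothetical.

```python
from datetime import datetime

# Hypothetical field names; the project's actual columns are not given.
def join_sources(flights, weather, aircraft):
    """Attach the origin airport's weather for the scheduled departure hour
    and the aircraft's record to each flight.

    flights:  list of dicts with 'origin', 'sched_dep' (datetime), 'tail_number'
    weather:  dict mapping (airport code, datetime truncated to the hour) -> weather fields
    aircraft: dict mapping tail number -> aircraft fields
    Flights missing either match are dropped, like an inner join.
    """
    joined = []
    for f in flights:
        hour = f["sched_dep"].replace(minute=0, second=0, microsecond=0)
        wx = weather.get((f["origin"], hour))
        ac = aircraft.get(f["tail_number"])
        if wx is None or ac is None:
            continue
        joined.append({**f, **wx, **ac})
    return joined

# Example with made-up records: the 15:10 flight has no weather for its hour.
flights = [
    {"origin": "DFW", "sched_dep": datetime(2019, 1, 5, 14, 35), "tail_number": "N100AA"},
    {"origin": "DFW", "sched_dep": datetime(2019, 1, 5, 15, 10), "tail_number": "N999ZZ"},
]
weather = {("DFW", datetime(2019, 1, 5, 14)): {"air_temp_c": 3.0, "dew_point_c": 1.0}}
aircraft = {"N100AA": {"model": "A321"}, "N999ZZ": {"model": "737-800"}}
rows = join_sources(flights, weather, aircraft)
```

Real weather reports rarely fall exactly on the hour, so a nearest-earlier-observation match (for example pandas `merge_asof`) may fit actual data better than this exact-hour lookup.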
We aggregated delays and looked at airports with the most frequent delays.
We found that 18% of the flights in our dataset experienced a departure delay.
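A sketch of how the overall delay share and the airports with the most delays could be computed. It assumes a departure counts as delayed at 15 or more minutes late (the US DOT convention; the project's own cutoff is not stated) and uses hypothetical field names `origin` and `dep_delay_min`.

```python
from collections import Counter

DELAY_THRESHOLD_MIN = 15  # assumption: 15-minute cutoff; the project's cutoff isn't stated

def summarize_delays(flights, top_n=10):
    """Return (share of flights with a departure delay, top_n origin airports by delay count)."""
    delayed = [f for f in flights if f["dep_delay_min"] >= DELAY_THRESHOLD_MIN]
    share = len(delayed) / len(flights) if flights else 0.0
    by_airport = Counter(f["origin"] for f in delayed)
    return share, by_airport.most_common(top_n)
```

For example, with five made-up flights where two DFW departures and one ORD departure are late, `summarize_delays` reports a 0.6 share and ranks DFW ahead of ORD.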
We also looked at the percentage of delays for the top 100 most flown routes across the country.
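The route-level view can be sketched the same way: count flights per (origin, destination) pair, keep the most-flown routes and compute each one's delayed percentage. The `dest` field and the 15-minute threshold are assumptions, not details from the project.

```python
from collections import defaultdict

def route_delay_pct(flights, top_n=100, threshold_min=15):
    """Percent of delayed departures for the top_n most-flown (origin, dest) routes."""
    counts = defaultdict(lambda: [0, 0])  # route -> [total flights, delayed flights]
    for f in flights:
        c = counts[(f["origin"], f["dest"])]
        c[0] += 1
        c[1] += int(f["dep_delay_min"] >= threshold_min)
    # Rank routes by flight volume, then keep only the busiest top_n.
    busiest = sorted(counts.items(), key=lambda kv: kv[1][0], reverse=True)[:top_n]
    return {route: 100.0 * delayed / total for route, (total, delayed) in busiest}
```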
Dallas/Fort Worth International Airport (DFW) was our focus because of its importance in the domestic US airport network and its high volume of flights (on-time and delayed).
Our Machine Learning Classification Task: predict whether flights leaving DFW would experience a weather delay
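One way to frame this as a binary target, assuming each flight carries a weather-delay field in minutes (hypothetically named `weather_delay_min`, empty when no weather delay was reported): keep only DFW departures and label a flight 1 when any weather delay was recorded.

```python
def build_weather_labels(flights, airport="DFW"):
    """Return (rows, labels) for departures from `airport`; label 1 = weather delay recorded."""
    rows = [f for f in flights if f["origin"] == airport]
    # Treat a missing value (None) the same as zero minutes of weather delay.
    labels = [1 if (f.get("weather_delay_min") or 0) > 0 else 0 for f in rows]
    return rows, labels
```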
We developed 4 classifiers: Logistic Regression, Random Forest, Gradient Boosting Classifier and a Neural Network classifier
We ran a couple of iterations on our models, addressing the class imbalance in weather delays, handling weather variables with low or redundant information and making other improvements.
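The project does not say how the imbalance was handled. One common approach is to weight each class inversely to its frequency, using the same formula as scikit-learn's `class_weight="balanced"` option: `n_samples / (n_classes * count_c)`. A stdlib sketch:

```python
from collections import Counter

def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c), the 'balanced' heuristic."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def sample_weights(labels):
    """Per-row weights, for estimators whose fit() accepts sample_weight."""
    w = balanced_class_weights(labels)
    return [w[y] for y in labels]
```

With these weights each class contributes the same total weight, so the rare weather-delay class is not swamped. In scikit-learn, `GradientBoostingClassifier` has no `class_weight` parameter, but its `fit` accepts `sample_weight`, which is where per-row weights like these would go.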
Our models didn't do an exceptional job of predicting weather-related departure delays...
...however they placed high importance on:
- air temperature
- dew point
- base height of cloud coverage
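How feature importance was measured is not stated (tree ensembles, for instance, report their own impurity-based importances). A model-agnostic alternative is permutation importance: shuffle one feature column and measure how much accuracy drops. The sketch below is a hypothetical stand-in that works with any `predict` function.

```python
import random

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled.

    predict: function mapping a list of feature rows to a list of 0/1 predictions
    X: list of feature rows (lists); y: list of 0/1 labels
    """
    rng = random.Random(seed)

    def accuracy(rows):
        preds = predict(rows)
        return sum(p == t for p, t in zip(preds, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the labels
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(baseline - accuracy(shuffled))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores scores exactly zero, while a feature the model relies on (such as air temperature, if the model leans on it) shows a positive drop.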
Next, we used a Susceptible-Infected-Recovered-Susceptible (SIRS) infection model.
This model simulates how an infection spreads between nodes in a network, how infected nodes recover, and how recovered nodes can become susceptible and be infected again.
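A minimal discrete-time SIRS sketch on a network. The transmission probability `beta`, recovery probability `gamma`, loss-of-immunity probability `xi` and the synchronous update rule are all assumptions; the project does not report its parameters. Read "infected" as "delayed" and "recovered" as "back on schedule".

```python
import random

S, I, R = "S", "I", "R"

def sirs_step(graph, state, beta, gamma, xi, rng):
    """One synchronous SIRS update on an adjacency-list graph.

    graph: dict node -> list of neighbour nodes
    state: dict node -> 'S' | 'I' | 'R'
    beta:  chance an infected neighbour infects a susceptible node this step
    gamma: chance an infected node recovers this step
    xi:    chance a recovered node becomes susceptible again this step
    """
    new_state = {}
    for node, s in state.items():
        if s == S:
            infected_nbrs = sum(state[n] == I for n in graph[node])
            # Each infected neighbour gets an independent chance to transmit.
            p_infect = 1 - (1 - beta) ** infected_nbrs
            new_state[node] = I if rng.random() < p_infect else S
        elif s == I:
            new_state[node] = R if rng.random() < gamma else I
        else:
            new_state[node] = S if rng.random() < xi else R
    return new_state

def simulate_sirs(graph, seeds, beta, gamma, xi, steps, seed=0):
    """Return the list of states, starting with only `seeds` infected."""
    rng = random.Random(seed)
    state = {n: (I if n in seeds else S) for n in graph}
    history = [state]
    for _ in range(steps):
        state = sirs_step(graph, state, beta, gamma, xi, rng)
        history.append(state)
    return history
```

With `beta=1` and `gamma=xi=0`, an infection walks one hop per step along a chain of nodes, which is a quick sanity check of the update rule.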
Each aircraft was tracked via tail number. We treated flights delayed out of DFW as Infected nodes.
Each subsequent flight flown by that aircraft became 'infected' if that flight was delayed.
On average, delays propagated to 55.78 percent of all subsequent flights flown on the same day by the affected tail numbers.
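A sketch of how such a per-day propagation rate might be computed. Assumptions not stated in the project: a flight is "delayed" at 15 or more minutes of departure delay, the first delayed DFW departure of a tail number's day is the initial infection, and the average is an unweighted mean over tail-number days. Field names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

DELAY_MIN = 15  # assumed cutoff for calling a departure "delayed"

def propagation_rates(flights, hub="DFW"):
    """Per (tail_number, date): share of flights after the first delayed hub
    departure that were also delayed."""
    by_tail_day = defaultdict(list)
    for f in flights:
        by_tail_day[(f["tail_number"], f["sched_dep"].date())].append(f)

    rates = {}
    for key, legs in by_tail_day.items():
        legs.sort(key=lambda f: f["sched_dep"])
        # Index of the first delayed departure from the hub: the initially infected flight.
        start = next((i for i, f in enumerate(legs)
                      if f["origin"] == hub and f["dep_delay_min"] >= DELAY_MIN), None)
        if start is None or start == len(legs) - 1:
            continue  # no hub delay, or no later flights to infect
        later = legs[start + 1:]
        infected = sum(f["dep_delay_min"] >= DELAY_MIN for f in later)
        rates[key] = infected / len(later)
    return rates

def average_propagation(flights, hub="DFW"):
    rates = propagation_rates(flights, hub=hub)
    return mean(rates.values()) if rates else 0.0

# Made-up example: N1 infects 1 of 2 later flights, N2 infects 1 of 1, N3 has no DFW delay.
example = [
    {"tail_number": "N1", "origin": "DFW", "sched_dep": datetime(2019, 1, 5, 8), "dep_delay_min": 30},
    {"tail_number": "N1", "origin": "AUS", "sched_dep": datetime(2019, 1, 5, 11), "dep_delay_min": 20},
    {"tail_number": "N1", "origin": "DFW", "sched_dep": datetime(2019, 1, 5, 14), "dep_delay_min": 0},
    {"tail_number": "N2", "origin": "DFW", "sched_dep": datetime(2019, 1, 5, 9), "dep_delay_min": 40},
    {"tail_number": "N2", "origin": "LAX", "sched_dep": datetime(2019, 1, 5, 13), "dep_delay_min": 25},
    {"tail_number": "N3", "origin": "ORD", "sched_dep": datetime(2019, 1, 5, 7), "dep_delay_min": 50},
]
avg = average_propagation(example)
```

Pooling all later flights across tail-number days, instead of averaging per-day rates, would weight busy aircraft more heavily; the project does not say which it used.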
We saw a higher rate of delay propagation, followed by faster recovery, during the winter season.
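A seasonal comparison can be sketched by grouping per-day propagation rates by season. This assumes meteorological seasons (December to February as winter), which the project does not define, and takes hypothetical (month, rate) pairs as input.

```python
from collections import defaultdict
from statistics import mean

# Assumption: meteorological seasons; the project's season boundaries aren't stated.
SEASON_BY_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "fall", 10: "fall", 11: "fall"}

def rate_by_season(month_rates):
    """month_rates: iterable of (month number, propagation rate) -> {season: mean rate}."""
    grouped = defaultdict(list)
    for month, rate in month_rates:
        grouped[SEASON_BY_MONTH[month]].append(rate)
    return {season: mean(rates) for season, rates in grouped.items()}
```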
Broad Impacts
Thanks for flying!