Abstract: Flight delays are one of the major problems facing the airline industry. A study by the Frankfurt-based consulting company Aviation Experts estimated that flight delays cost $25 billion worldwide in 2014. Domestic flight delays also have an indirect negative impact on the US economy, reducing US gross domestic product (GDP) by $4 billion [1]. This project investigates the significant factors responsible for flight delays in 2016. The analysis uses a data set extracted from the Bureau of Transportation Statistics (BTS) [2] that contains one million instances, each with 8 attributes. We describe a predictive modeling engine that uses machine learning techniques and statistical models to identify delays in advance. The data set is cleaned and imputed, and decision trees, random forests, and multiple linear regression are then applied. By identifying the critical parameters responsible for flight delays, we attempt to put forward a solution to the delay losses incurred by the airline industry. Airlines are not the only ones to bear a large cost each year: airport authorities and their operations are also adversely affected, which in turn inconveniences travelers. The predictive models developed in this study can support better management decisions and more effective flight scheduling. In addition, the significant factors highlighted can give insight into the root causes of aircraft delays.
Keywords: Decision Trees; Machine Learning Techniques; Multiple Linear Regression; Predictive Modeling; Random Forest; Flight Delay