Proceedings Article

Cloud based flight delay prediction using logistic regression

Rahul Nigam, K. Govinda
01 Dec 2017
TL;DR: This paper uses logistic regression, a supervised learning method, to predict delays in aircraft departure times, and uses the Microsoft Azure Learning Studio platform to train and test the model in the cloud.
Abstract: In the modern world, airlines play a vital role in transporting people and goods on time. Any delay in these flights can adversely affect the work and business of thousands of people at any given moment. Forecasting these delays is very important during the planning process in commercial airlines. Several techniques have already been proposed for designing models to forecast delays in aircraft departure times. However, because of the continuously increasing complexity of air transportation and the amount of data related to it, designing accurate prediction methods has become very difficult. In this paper we use logistic regression, a supervised learning method, to predict delays in aircraft departure times. We use the Microsoft Azure Learning Studio platform, an integrated development environment for machine learning, to train and test the model in the cloud. We also join weather data such as temperature, humidity, precipitation, and dew point with the airport data to derive more accurate predictions and to determine the effect of weather changes on flight delays. Our method achieved about 80 percent accuracy in predicting whether a given aircraft would be delayed, based on training with past data.
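
The approach described in the abstract (joining weather readings to flight records, then training a logistic regression classifier) can be illustrated with a minimal sketch. This is not the authors' Azure pipeline: the record layout, the `(airport, hour)` join key, the placeholder airport code `XXX`, the synthetic labelling rule, and the learning-rate settings are all assumptions made for illustration, and the model is written in plain Python rather than any cloud service.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to stay numerically stable for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def join_weather(flights, weather):
    """Attach weather readings to each flight via a hypothetical (airport, hour) key."""
    rows = []
    for f in flights:
        w = weather.get((f["airport"], f["hour"]))
        if w is None:
            continue  # drop flights with no matching weather record
        # Crude scaling (assumed): precipitation in mm divided by 10.
        rows.append(([w["precip_mm"] / 10.0, w["humidity"]], f["delayed"]))
    return rows


def train(rows, lr=1.0, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in rows:
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (delayed) when the predicted probability is at least 0.5."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


# Synthetic data (assumption): a flight counts as delayed when precipitation
# exceeds 5 mm. Real flight and weather data would be far noisier.
random.seed(0)
weather = {}
flights = []
for hour in range(200):
    precip = random.uniform(0.0, 10.0)
    humidity = random.uniform(0.2, 1.0)
    weather[("XXX", hour)] = {"precip_mm": precip, "humidity": humidity}
    flights.append({"airport": "XXX", "hour": hour, "delayed": 1 if precip > 5.0 else 0})

rows = join_weather(flights, weather)
train_rows, test_rows = rows[:150], rows[150:]
weights, bias = train(train_rows)
accuracy = sum(predict(weights, bias, x) == y for x, y in test_rows) / len(test_rows)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the synthetic labels follow a clean threshold, the held-out accuracy here says nothing about the roughly 80 percent figure reported in the abstract; the sketch only shows the shape of the join-then-classify workflow.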
Citations
Journal Article
TL;DR: This work proposes a taxonomy of data science techniques used for investigating flight delay studies, and offers a systematic literature review that describes the trends of the field and methods to analyse the applicability of newly proposed methods.

32 citations

Proceedings Article
01 Jun 2022
TL;DR: The paper compiled and structured data to construct a correlation model and built a histogram, taking into account all the statistics and calculations, that shows the predicted number of flight delays or cancellations alongside their actual number.
Abstract: The paper compiled and structured data to construct a correlation model. The reasons for flight delays and cancellations between 2009 and 2019 are examined, based on the statistical electronic source "Bureau of Transportation Statistics". The normalized data are calculated from the primary data. Based on the data collected, a primary histogram was constructed (outliers were not checked). In addition, significant and weak influencing factors were identified in order to construct a correlation table; this table helps to identify an additional factor influencing a flight delay or cancellation. Based on the data from the table, a new histogram was built and a correlation table was constructed, allowing regression analysis to begin. The regression analysis starts with regression statistics and an analysis of variance; the regression itself is then performed and its results are summarized. Based on the summarized results, a histogram was built that takes into account all the statistics and calculations, showing the predicted number of flight delays or cancellations and their actual number.

13 citations

Book Chapter
01 Jul 2021
TL;DR: Wang et al. propose a distributed and improved grasshopper optimization algorithm based on Spark to optimize the parameters of a random forest classification model (SPGOA-RF) for flight delay prediction.
Abstract: Flight delay prediction can improve the quality of airline services and help air traffic control agencies develop more accurate flight plans. This paper proposes a distributed and improved grasshopper optimization algorithm based on Spark to optimize the parameters of a random forest classification model (SPGOA-RF) for flight delay prediction. SPGOA-RF uses an adaptive chaotic descent method, based on Logistic mapping and the Sigmoid curve, to enhance the randomness of the grasshopper optimization algorithm, thereby improving the algorithm's early exploration and later optimization capabilities and accelerating convergence. The improved grasshopper optimization algorithm is used to tune the random forest parameters to obtain a better-performing classification model. In addition, the Spark platform is used to implement distributed training of the grasshopper optimization algorithm, effectively improving its operating efficiency. Simulation results show that, compared with the unoptimized algorithm, SPGOA-RF achieves a flight delay prediction accuracy of 89.17%.

2 citations

DOI
01 Jan 2022
TL;DR: In this article, three machine learning algorithms, namely linear regression, logistic regression, and clustering, were used to predict the temperature of a certain day in the upcoming calendar by date or year.
Abstract: This paper illustrates how different machine learning algorithms can be used to determine weather conditions. Temperature is predicted by training on a pre-existing dataset of daily weather conditions covering around 40 years. The trained model is then tested by estimating the temperature of a given future day, by date or year. The paper compares the results of the different algorithms and determines the most efficient one. The algorithms involved were Linear Regression, Logistic Regression, and Clustering. These three algorithms rely on different mechanisms: prediction based on the mean, prediction based on probability, and grouping based on similar constraints. The model helps to select the most efficient algorithm, the one whose approximate values are closest to the accurate values. Because the techniques in previous analyses are mostly based on mean analysis, their results are only approximate, whereas logistic regression gives either an almost accurate result or a wrong one. We introduce clustering because dates or years can be grouped under a given condition, based on either the temperature of a certain year or the season.
Journal Article
TL;DR: Li et al. propose a flight delay prediction model based on the lightweight network ECA-MobileNetV3 algorithm, which first preprocesses the data with real flight information and weather information.
Abstract: In exploring the flight delay problem, traditional deep learning algorithms suffer from low accuracy and extreme computational complexity; as a result, deep flight delay prediction algorithms are difficult to deploy directly to mobile terminals. In this paper, a flight delay prediction model based on the lightweight network ECA-MobileNetV3 algorithm is proposed. The algorithm first preprocesses the data with real flight information and weather information. Then, in order to increase the accuracy of the model without increasing the computational complexity too much, feature extraction is performed using the lightweight ECA-MobileNetV3 algorithm with the addition of the Efficient Channel Attention mechanism. Finally, the flight delay classification prediction level is output via a Softmax classifier. In experiments on single-airport and airport-cluster datasets, the best accuracy of the ECA-MobileNetV3 algorithm is 98.97% and 96.81%, the number of parameters is 0.33 million and 0.55 million, and the computational volume is 32.80 million and 60.44 million, respectively, all better than the MobileNetV3 algorithm under the same conditions. The improved model achieves a better balance between accuracy and computational complexity, which makes it more suitable for deployment on mobile terminals.
References
Journal Article
TL;DR: The proposed genetic-based expectation-maximization (GA-EM) algorithm for learning Gaussian mixture models from multivariate data is elitist which maintains the monotonic convergence property of the EM algorithm.
Abstract: We propose a genetic-based expectation-maximization (GA-EM) algorithm for learning Gaussian mixture models from multivariate data. This algorithm is capable of selecting the number of components of the model using the minimum description length (MDL) criterion. Our approach benefits from the properties of genetic algorithms (GA) and the EM algorithm by combination of both into a single procedure. The population-based stochastic search of the GA explores the search space more thoroughly than the EM method. Therefore, our algorithm enables escaping from local optimal solutions since the algorithm becomes less sensitive to its initialization. The GA-EM algorithm is elitist which maintains the monotonic convergence property of the EM algorithm. The experiments on simulated and real data show that the GA-EM outperforms the EM method since: (1) we have obtained a better MDL score while using exactly the same termination condition for both algorithms; (2) our approach identifies the number of components which were used to generate the underlying data more often than the EM algorithm.

239 citations


"Cloud based flight delay prediction..." refers to methods in this paper

  • ...By using the principles of Genetic Algorithms [2] the paper also provides an algorithm that takes in account global solution of the EM algorithm to solve the challenge of local solutions in the distribution....

Proceedings Article
01 Oct 2002
TL;DR: This paper analyzes departure and arrival data for ten major airports in the United States that experience large volumes of traffic and significant delays and shows that departure delay is better modeled using a Poisson distribution, while the enroute and arrival delays fit the Normal distribution better.
Abstract: The increase in delays in the National Airspace System (NAS) has been the subject of several studies in recent years. These reports contain delay statistics over the entire NAS, along with some data specific to individual airports, however, a comprehensive characterization and comparison of the delay distributions is absent. Historical delay data for these airports are summarized. The various causal factors related to aircraft, airline operations, change of procedures and traffic volume are also discussed. Motivated by the desire to improve the accuracy of demand prediction in enroute sectors and at airports through probabilistic delay forecasting, this paper analyzes departure and arrival data for ten major airports in the United States that experience large volumes of traffic and significant delays. To enable such an analysis, several data fields for every aircraft departing from or arriving at these ten airports in a 21 day period were extracted from the Post Operations Evaluation Tool (POET) database. Distributions that show the probability of a certain delay time for a given aircraft were created. These delay-time probability density functions were modeled using Normal and Poisson distributions with the mean and standard deviations derived from the raw data. The models were then improved by adjusting the mean and standard deviation values via a least squares method designed to minimize the fit error between the raw distribution and the model. It is shown that departure delay is better modeled using a Poisson distribution, while the enroute and arrival delays fit the Normal distribution better. Finally, correlation between the number of departures, number of arrivals and departure delays is examined from a time-series modeling perspective.

199 citations
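
The comparison described in the abstract above (modeling a delay distribution with Poisson and Normal forms and measuring the squared fit error against the raw distribution) could be sketched as follows. This is only an illustration under stated assumptions: the delay sample is made up, delays are treated as non-negative whole minutes, the parameters come straight from the sample moments, and the least-squares adjustment step that the abstract mentions is not reproduced.

```python
import math
from collections import Counter


def empirical_pmf(delays):
    """Relative frequency of each observed delay value."""
    counts = Counter(delays)
    n = len(delays)
    return {k: c / n for k, c in counts.items()}


def poisson_pmf(k, lam):
    """Poisson probability of exactly k events with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def normal_pdf(x, mu, sigma):
    """Normal density at x with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_error(pmf, model):
    """Sum of squared differences between the raw distribution and a model,
    evaluated only at the delay values that actually occur in the sample."""
    return sum((p - model(k)) ** 2 for k, p in pmf.items())


# Hypothetical delays in whole minutes (not taken from the POET database).
delays = [0, 0, 1, 1, 1, 2, 2, 3, 4, 6]
mean = sum(delays) / len(delays)
std = math.sqrt(sum((d - mean) ** 2 for d in delays) / len(delays))

pmf = empirical_pmf(delays)
poisson_error = fit_error(pmf, lambda k: poisson_pmf(k, mean))
normal_error = fit_error(pmf, lambda k: normal_pdf(k, mean, std))
print(f"Poisson fit error: {poisson_error:.4f}")
print(f"Normal fit error:  {normal_error:.4f}")
```

Whichever error is smaller indicates the better-fitting model for that sample; with a toy sample this small, the outcome carries no weight as evidence for or against the paper's finding.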

Journal Article
TL;DR: This paper argues that when airport capacity is reduced below demand, on-time arrivals can be improved by reducing the number of flights the airline has to service.
Abstract: Airline schedules are based on the carefully-planned use of resources, airports, planes, crews, etc., to provide passengers with on-time arrivals. When airport capacity is reduced below demand, the...

181 citations

Journal Article
TL;DR: A taxonomy is proposed and the initiatives used to address the flight delay prediction problem are summarized, according to scope, data, and computational methods, giving particular attention to an increased usage of machine learning methods.
Abstract: Flight delays hurt airlines, airports, and passengers. Their prediction is crucial during the decision-making process for all players in commercial aviation. Moreover, developing accurate prediction models for flight delays has become cumbersome due to the complexity of the air transportation system, the number of prediction methods, and the deluge of flight data. In this context, this paper presents a thorough literature review of approaches used to build flight delay prediction models from the Data Science perspective. We propose a taxonomy and summarize the initiatives used to address the flight delay prediction problem, according to scope, data, and computational methods, giving particular attention to an increased usage of machine learning methods. We also present a timeline of significant works that depicts relationships between flight delay prediction problems and research trends to address them. Please cite the published version as: L. Carvalho, A. Sternberg, L. Maia Goncalves, A. Beatriz Cruz, J.A. Soares, D. Brandao, D. Carvalho, and E. Ogasawara, 2020, On the relevance of data science for flight delay research: a systematic review, Transport Reviews.

68 citations

Proceedings Article
21 Dec 2008
TL;DR: A new machine-learning-based method for raising alarms about large-scale flight delays is presented; it combines more factors than existing alarm standards and is therefore of more practical value.
Abstract: A new method for raising alarms about large-scale flight delays based on machine learning is presented in this paper. The method first performs unsupervised learning on flight data collected from the airport, which yields a standard for each class of delay. With these delay classes, a supervised learning method can then be applied to the data to build the alarm model. Compared with the current manual alarm standard, this model combines more factors when raising an alarm. Because the current delay standard depends only on the number of flights, which is helpful only in cases of serious delay, the new model is of more practical value than existing ones.

39 citations


"Cloud based flight delay prediction..." refers to methods in this paper

  • ...Wang et al [8] propose a method which utilises artificial neural networks and machine learning to determine an airport’s traffic....