Journal Article

Bus travel time prediction: a log-normal auto-regressive (AR) modelling approach

TL;DR: This study proposes two predictive modelling methodologies using the concepts of time series analysis, namely (a) a classical seasonal AR model with possible integrating non-stationary effects and (b) a linear non-stationary AR approach, a novel technique that exploits the notion of partial correlation for learning from data to predict the arrival time of buses efficiently.
Abstract: Accurate prediction of arrival time of buses is still a challenging problem in dynamically varying traffic conditions especially under Indian traffic conditions. The present study proposes two pred...
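As a rough illustration of what "log-normal AR" in the title can mean, the sketch below fits a first-order autoregression to the logarithm of travel times and maps the forecast back to seconds. It is a minimal sketch under stated assumptions, not the seasonal or non-stationary models proposed in the paper: the AR(1) order, the least-squares fit, the function names `fit_log_ar1` and `predict_next`, and the sample travel times are all hypothetical.

```python
import numpy as np

def fit_log_ar1(travel_times):
    """Least-squares fit of log(x[t+1]) = c + phi * log(x[t])."""
    y = np.log(np.asarray(travel_times, dtype=float))
    phi, c = np.polyfit(y[:-1], y[1:], 1)  # polyfit returns slope, then intercept
    return float(c), float(phi)

def predict_next(last_travel_time, c, phi):
    """One-step forecast mapped back to the original scale.

    Exponentiating the log-scale forecast gives the median (not the mean)
    of the next travel time if the log-scale errors are Gaussian.
    """
    return float(np.exp(c + phi * np.log(last_travel_time)))

# Hypothetical travel times (seconds) of successive trips over one section.
times = [310.0, 295.0, 330.0, 342.0, 318.0, 305.0, 325.0]
c, phi = fit_log_ar1(times)
forecast = predict_next(times[-1], c, phi)
print(f"c={c:.3f}, phi={phi:.3f}, next travel time ~ {forecast:.1f} s")
```

Working on the log scale keeps every forecast positive, which is one common reason for pairing a log-normal assumption with an AR model.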
Citations
Journal Article
16 Dec 2020-PLOS ONE
TL;DR: A desktop-based application that predicts traffic congestion state using Estimated Time of Arrival (ETA) and the results demonstrate that the random forest classification algorithm has the highest prediction accuracy of 92 percent followed by XGBoost and KNN respectively.
Abstract: With the rapid expansion of sensor technologies and wireless network infrastructure, research and development of traffic associated applications, such as real-time traffic maps, on-demand travel route reference and traffic forecasting are gaining much more attention than ever before. In this paper, we elaborate on our traffic prediction application, which is based on traffic data collected through Google Map API. Our application is a desktop-based application that predicts traffic congestion state using Estimated Time of Arrival (ETA). In addition to ETA, the prediction system takes into account various features such as weather, time period, special conditions, holidays, etc. The label of the classifier is identified as one of the five traffic states i.e. smooth, slightly congested, congested, highly congested or blockage. The results demonstrate that the random forest classification algorithm has the highest prediction accuracy of 92 percent followed by XGBoost and KNN respectively.

19 citations

Journal Article
TL;DR: The Random Forest ensemble model constructed based on the bagging method has the best prediction accuracy among the three ensemble models; and in terms of the five sub-models, the prediction accuracy of LR is better than that of the other four models.
Abstract: The prediction of bus single-trip time is essential for passenger travel decision-making and bus scheduling. Since many factors could influence bus operations, the accurate prediction of the bus single-trip time faces a great challenge. Moreover, bus single-trip time has obvious nonlinear and seasonal characteristics. Hence, in order to improve the accuracy of bus single-trip time prediction, five prediction algorithms including LSTM (Long Short-Term Memory), LR (Linear Regression), KNN (K-Nearest Neighbor), XGBoost (Extreme Gradient Boosting), and GRU (Gated Recurrent Unit) are used and examined as the base models, and three ensemble models are further constructed by using various ensemble methods including Random Forest (bagging), AdaBoost (boosting), and Linear Regression (stacking). A data-driven bus single-trip time prediction framework is then proposed, which consists of three phases including traffic data analysis, feature extraction, and ensemble model prediction. Finally, the data features and the proposed ensemble models are analyzed using real-world datasets that are collected from the Beijing Transportation Operations Coordination Center (TOCC). By comparing the prediction results, the following conclusions are drawn: (1) the prediction accuracy of the three ensemble models constructed is better than that of the five sub-models; (2) the Random Forest ensemble model constructed based on the bagging method has the best prediction accuracy among the three ensemble models; and (3) in terms of the five sub-models, the prediction accuracy of LR is better than that of the other four models.

2 citations

Proceedings Article
01 Jul 2020
TL;DR: The central idea of the method is to recast the dynamic prediction problem as a value-function prediction problem under a suitably constructed Markov reward process (MRP) and explore a family of value-function predictors using temporal-difference (TD) learning for bus prediction.
Abstract: Public transport buses suffer travel time uncertainties owing to diverse factors such as dwell times at bus stops, signals, seasonal variations and fluctuating travel demands. Traffic in the developing world in particular is afflicted by additional factors like lack of lane discipline, diverse modes of transport and excess vehicles. On account of these factors, bus travel time prediction continues to remain a demanding problem, especially in developing countries. The current work proposes a method to address bus travel time prediction in real time. The central idea of our method is to recast the dynamic prediction problem as a value-function prediction problem under a suitably constructed Markov reward process (MRP). Once recast as an MRP, we explore a family of value-function predictors using temporal-difference (TD) learning for bus prediction. Existing approaches build supervised models either by (a) training based on travel time targets only between successive bus-stops while keeping the number of models linear in the number of bus-stops or (b) training a single model which predicts between any two bus-stops while ignoring the huge variation in the travel-time targets during training. Our TD-based approach attempts to strike an optimal balance between the above two classes of approaches by training with travel-time targets between any two bus-stops while keeping the number of models (approximately) linear in the number of bus-stops. It also keeps a check on the variation in the travel-time targets. Our extensive experimental results vindicate the efficacy of the proposed method. The method exhibits comparable or superior prediction performance on mid-length and long-length routes compared to the state-of-the-art.
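The idea of treating remaining travel time as a value function can be sketched with tabular TD(0). This is a simplified illustration under stated assumptions, not the MRP or the predictors constructed in the paper: the state is just the stop index, the reward is the observed section travel time, there is no discounting, and the function name `td0_remaining_time` and the simulated section times are hypothetical.

```python
import random

def td0_remaining_time(episodes, n_sections, alpha=0.1):
    """Tabular TD(0) estimate of expected remaining travel time per stop.

    State s is the stop at the start of section s. The reward for leaving
    it is the observed travel time over that section. The terminal stop
    has value 0 and rewards are not discounted.
    """
    values = [0.0] * (n_sections + 1)  # last entry is the terminal stop
    for section_times in episodes:
        for s, reward in enumerate(section_times):
            td_target = reward + values[s + 1]  # bootstrap from the next stop
            values[s] += alpha * (td_target - values[s])
    return values[:-1]

# Hypothetical route with three sections; each trip observes noisy times (seconds).
random.seed(0)
mean_times = [180.0, 240.0, 150.0]
episodes = [[random.gauss(m, 20.0) for m in mean_times] for _ in range(2000)]
estimates = td0_remaining_time(episodes, n_sections=3, alpha=0.05)
print([round(v) for v in estimates])
```

Each stop keeps a single estimate, so the number of learned quantities grows linearly with the number of stops, while every update still carries information about the whole remaining trip through the bootstrapped target.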

2 citations


Cites background from "Bus travel time prediction: a log-n..."

  • ...Given all these various factors, BATP is still an active research problem in general [1] and in particular under mixed traffic conditions [2]....

    [...]

Journal Article
TL;DR: In this paper, the authors proposed a novel Burr mixture autoregressive (BMAR) model for the intermediate-to-long term period of bus section travel time prediction, which is useful for bus service and schedule planning.
Abstract: Travel time is an essential indicator for trip planning, transportation service planning, and operation. This study aims to propose a novel Burr mixture autoregressive (BMAR) model for the intermediate-to-long term period of bus section travel time prediction, which is useful for bus service and schedule planning. The BMAR model exhibits greater flexibility, allowing it to effectively capture the multi-peak and non-peak, non-linear with heteroscedasticity characteristics of travel time. The model is trained and well-validated with 6-month bus section travel time data collected via the automatic vehicle location system. Results show that the BMAR model gives promising results in travel time point and interval prediction, especially for the higher degree of variability and irregular pattern of travel time observed on urban roads and highways. The reliability ratio index derived from the BMAR model could be used to measure the bus service reliability and aid in the bus scheduling of maintenance activities.
Journal Article
TL;DR: In this paper, the authors used Interaction Networks (INs) to model the interactions between transit speed, dwell time, and traffic speed for arrival time prediction, and the proposed method only uses limited historical data and can predict arrival time at stops along a transit route.
Abstract: Accurate transit arrival time prediction is a critical factor for improving the quality of transit services. A significant factor that can affect transit arrival times is the other vehicles traveling on transit routes, as transit arrival time is significantly affected by traffic conditions. However, few previous studies considered traffic condition impacts in transit arrival time prediction due to a lack of real-time traffic data, reducing the accuracy in predictions under varying traffic conditions. To fill this research gap, crowdsourced speed data with wide coverage is utilized to indicate traffic conditions and to predict transit arrival time in conjunction with General Transit Feed Specification (GTFS) data. Interaction Networks (INs) are employed to model the interactions between transit speed, dwell time, and traffic speed for arrival time prediction. The proposed method only uses limited historical data and can predict arrival time at stops along a transit route. Ten days of data were collected from bus route #4 in Tucson, Arizona to evaluate the performance of the proposed method. The evaluation results show the average mean absolute percentage error (MAPE) of predicted arrival time on weekdays and weekends is 13.5% and 14%, respectively, indicating that the proposed method is promising for predicting transit arrival time while considering real-time traffic conditions. Furthermore, six traditional methods are compared with the proposed method, and the comparison results show the proposed method outperforms the other six methods.
References
Book
01 Dec 2010
TL;DR: A guide to using S environments to perform statistical analyses providing both an introduction to the use of S and a course in modern statistical methods.
Abstract: A guide to using S environments to perform statistical analyses providing both an introduction to the use of S and a course in modern statistical methods. The emphasis is on presenting practical problems and full analyses of real data sets.

18,346 citations

Journal Article
TL;DR: It is believed that the use of a subset ARIMA model could increase the accuracy of the short-term forecasting task within time-series models.
Abstract: Traffic volume is one of the fundamental types of data that have been used for the traffic control and planning process. Forecasting needs and efforts for various applications will be increased with the deployment of advanced traffic management systems. With the importance of the short-term traffic forecasting task, numerous techniques have been utilized to improve its accuracy. The use of the subset autoregressive integrated moving average (ARIMA) model for short-term traffic volume forecasting is investigated. A typical time-series modeling procedure was employed for this study. Model identification was carried out with Akaike's information criterion. The conditional maximum likelihood method was used for the parameter estimation process. Two white noise tests were applied for model verification. From the analysis results, four time-series models in different categories were identified and used for the one-step-ahead forecasting task. The performance of each model was evaluated using two statistical error estimates. Results showed that all time-series models performed well with reasonable accuracy. However, it was observed that the subset ARIMA model gave more stable and accurate results than other time-series models, especially a full ARIMA model. It is believed that the use of a subset ARIMA model could increase the accuracy of the short-term forecasting task within time-series models.

447 citations

Journal Article
TL;DR: This paper investigates the roles of partial correlation and conditional correlation as measures of the conditional independence of two random variables, and derives a necessary and sufficient condition for the coincidence of the partial covariance with the conditional covariance.
Abstract: This paper investigates the roles of partial correlation and conditional correlation as measures of the conditional independence of two random variables. It first establishes a sufficient condition for the coincidence of the partial correlation with the conditional correlation. The condition is satisfied not only for multivariate normal but also for elliptical, multivariate hypergeometric, multivariate negative hypergeometric, multinomial and Dirichlet distributions. Such families of distributions are characterized by a semigroup property as a parametric family of distributions. A necessary and sufficient condition for the coincidence of the partial covariance with the conditional covariance is also derived. However, a known family of multivariate distributions which satisfies this condition cannot be found, except for the multivariate normal. The paper also shows that conditional independence has no close ties with zero partial correlation except in the case of the multivariate normal distribution; it has rather close ties to the zero conditional correlation. It shows that the equivalence between zero conditional covariance and conditional independence for normal variables is retained by any monotone transformation of each variable. The results suggest that care must be taken when using such correlations as measures of conditional independence unless the joint distribution is known to be normal. Otherwise a new concept of conditional independence may need to be introduced in place of conditional independence through zero conditional correlation or other statistics.

429 citations


"Bus travel time prediction: a log-n..." refers background in this paper

  • ...Interestingly, for multi-variate Gaussian distributions, it turns out that the conditional independence between A and B given C holds if and only if the associated PC between A and B given C is 0 [78], as stated next....

    [...]
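The Gaussian equivalence quoted above can be checked numerically. The sketch below is a minimal illustration, assuming simulated data in which A and B depend on each other only through C; the function name `partial_correlation`, the sample size, and the data-generating process are hypothetical choices for the example. The partial correlation is read off the inverse covariance (precision) matrix, a standard identity.

```python
import numpy as np

def partial_correlation(a, b, c):
    """Partial correlation of a and b given c, from the precision matrix."""
    data = np.column_stack([a, b, c])
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    return float(-precision[0, 1] / np.sqrt(precision[0, 0] * precision[1, 1]))

# Simulated jointly Gaussian variables: A and B are linked only via C.
rng = np.random.default_rng(0)
n = 20000
c = rng.normal(size=n)
a = c + rng.normal(size=n)
b = c + rng.normal(size=n)

plain = float(np.corrcoef(a, b)[0, 1])   # clearly non-zero marginal correlation
partial = partial_correlation(a, b, c)   # close to zero once C is accounted for
print(f"corr(A, B) = {plain:.3f}, PC(A, B | C) = {partial:.3f}")
```

Here A and B are marginally correlated but conditionally independent given C, so the partial correlation is near zero up to sampling noise.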

Journal Article
TL;DR: Estimation of the mean and standard deviation of a normal distribution from a censored sample has been considered by Sarhan and Greenberg [1], who obtained coefficients for best linear systematic statistics.
Abstract: 0. Summary. Estimators of mean and standard deviation for censored normal samples which are based on linear systematic statistics and which use simple coefficients are almost as efficient as estimators using the best possible coefficients. Estimators are given for samples of size N < 20 for censoring at one extreme and for several types of censoring at both extremes. 1. Introduction. A censored sample is a sample lacking one or more observations at either or both extremes with the number and positions of the missing observations known. Censoring may take place naturally, i.e., an observation has a magnitude known only to be more extreme than the other observations in the sample. Censoring may also be imposed by the experimenter who from past experience knows that extreme observations are so unreliable that their magnitudes should not be used as observed. The experimenter may impose censoring to reduce the duration of an experiment and obtain estimates before the extreme cases are determined. Estimation of the mean and standard deviation of a normal distribution from a sample which is censored has been considered by Sarhan and Greenberg [1], who obtained coefficients for best linear systematic statistics. They also record efficiencies of these estimators compared to the case of no censoring. Winsor [4] and perhaps others have suggested using for the magnitude of an extreme, poorly known, or unknown observation the magnitude of the next largest (or smallest) observation. We shall show that when symmetry is maintained (or proper adjustment is made) this practice results in estimators of the mean whose efficiencies are scarcely distinguishable from those of best linear estimators. For non-symmetrical censoring, it is demonstrated that optimum simple estimators of the mean result from these "Winsorized" estimators.
Also presented are estimators of the standard deviations using one or two ranges (not necessarily symmetrical) which have efficiency .94 or greater when compared with the best linear systematic statistics. The variances of the proposed estimators were computed from an original 21 decimal tabulation of the means, variances and covariances of the order statistics made available by Dan Teichroew. These tables are described in reference [5]. The efficiencies are the ratios of variances of corresponding estimators given by Sarhan and Greenberg [1]. 2. Symmetrical censoring. Estimation of mean. If natural or imposed censoring of the sample results in the same number of observations censored from each extreme of the sample the practice of using for each missing observation the magnitude of its nearest neighbor whose magnitude is known has a minimum
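The symmetric "Winsorized" mean described above can be sketched as follows. This is a minimal illustration of the replacement rule only, assuming the same number `k` of observations is censored at each extreme; the function name `winsorized_mean` and the sample values are hypothetical, and none of the efficiency results from the abstract are reproduced.

```python
def winsorized_mean(values, k):
    """Mean after replacing the k smallest and k largest observations
    with their nearest retained neighbours (symmetric Winsorization)."""
    xs = sorted(values)
    n = len(xs)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < n / 2")
    if k > 0:
        xs[:k] = [xs[k]] * k              # low extremes take the (k+1)-th smallest value
        xs[n - k:] = [xs[n - k - 1]] * k  # high extremes take the (k+1)-th largest value
    return sum(xs) / n

# Hypothetical sample with one unreliable extreme observation.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
print(winsorized_mean(sample, 0))  # ordinary mean
print(winsorized_mean(sample, 1))  # one observation Winsorized at each end
```

With `k = 1` the sample becomes 2, 2, 3, 4, 4, so the extreme value no longer dominates the estimate while the sample size used in the average stays the same.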

365 citations

Journal Article
TL;DR: Two artificial neural networks, trained by link-based and stop-based data, are applied to predict transit arrival times and show that the enhanced ANNs outperform the ones without integration of the adaptive algorithm.
Abstract: Transit operations are interrupted frequently by stochastic variations in traffic and ridership conditions that deteriorate schedule or headway adherence and thus lengthen passenger wait times. Providing passengers with accurate vehicle arrival information through advanced traveler information systems is vital to reducing wait time. Two artificial neural networks (ANNs), trained by link-based and stop-based data, are applied to predict transit arrival times. To improve prediction accuracy, both are integrated with an adaptive algorithm to adapt to the prediction error in real time. The bus arrival times predicted by the ANNs are assessed with the microscopic simulation model CORSIM, which has been calibrated and validated with real-world data collected from route number 39 of the New Jersey Transit Corporation. Results show that the enhanced ANNs outperform the ones without integration of the adaptive algorithm.

348 citations


"Bus travel time prediction: a log-n..." refers background in this paper

  • ...Machine learning techniques such as Artificial Neural Network (ANN) and Support Vector Machine (SVM) are some of the most commonly reported prediction techniques for travel time prediction because of their ability to solve complex relationships [31, 32]....

    [...]