Effects of Clustering Feature Vectors on Bus Travel Time Prediction: A Case Study
TL;DR: In this article, the authors analyzed the use of different feature vectors for clustering and their effect on travel time predictions, and showed that prediction accuracy is highest when only travel times are used as the clustering feature vector.
Abstract: The accuracy of travel time predictions depends on both the inputs provided and the prediction algorithm used. Clustering algorithms can identify patterns in the data, which can improve the inputs to the prediction algorithm. The feature vectors used for clustering strongly affect the clusters formed and, ultimately, the prediction performance. Because clustering is an unsupervised learning technique, the correctness of the clusters formed cannot be evaluated directly. A possible solution is to link the problem to prediction accuracy and choose the feature vector combination that yields the highest prediction accuracy. The present study analyses the use of different feature vectors for clustering and their effect on travel time predictions. Three cases are studied: (1) travel time alone; (2) travel time together with time of the day, section index, and day of the week, all treated as numerical features; and (3) the same features treated as a mix of categorical and numerical features. The effect of using each case as the clustering feature vector on travel time predictions is evaluated. Prediction accuracy is observed to be highest when only travel times are used as the clustering feature vector. The study demonstrates the importance of choosing the correct feature vectors for clustering and its effect on a final application, namely travel time prediction.
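The selection idea described in the abstract (cluster with each candidate feature vector, then keep the one that gives the best downstream prediction accuracy) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the data are synthetic, the record fields (`prev_tt`, `tt`, `hour`, `section`, `dow`) are hypothetical names, plain k-means stands in for whatever clustering algorithm the study used, one-hot encoding stands in for the mixed categorical/numerical case, a per-cluster mean stands in for the prediction algorithm, and MAPE is assumed as the accuracy measure.

```python
import random

def nearest(p, centroids):
    """Index of the centroid closest to point p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with a deterministic start (evenly spaced sorted points)."""
    srt = sorted(points)
    step = len(srt) // k
    centroids = [list(srt[i * step + step // 2]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def evaluate(feature_fn, train, test, k=3):
    """Cluster the training data on feature_fn, predict each test travel time
    as the mean travel time of its nearest cluster, and return the MAPE."""
    feats = [feature_fn(r) for r in train]
    centroids = kmeans(feats, k)
    sums, counts = [0.0] * k, [0] * k
    for r, f in zip(train, feats):
        j = nearest(f, centroids)
        sums[j] += r["tt"]
        counts[j] += 1
    overall = sum(r["tt"] for r in train) / len(train)
    means = [sums[j] / counts[j] if counts[j] else overall for j in range(k)]
    preds = [means[nearest(feature_fn(r), centroids)] for r in test]
    return mape([r["tt"] for r in test], preds)

def make_records(n, seed):
    """Synthetic trips (assumption): peak hours and section index shift travel time."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        hour, section, dow = rng.randrange(6, 22), rng.randrange(10), rng.randrange(7)
        base = 60 + 40 * (hour in (8, 9, 17, 18)) + 5 * section
        records.append({"prev_tt": base + rng.gauss(0, 5),  # last observed travel time
                        "tt": base + rng.gauss(0, 5),       # travel time to predict
                        "hour": hour, "section": section, "dow": dow})
    return records

def one_hot(value, size):
    return [1.0 if value == i else 0.0 for i in range(size)]

# The three cases from the abstract, expressed as feature-vector builders.
candidates = {
    "travel_time_only": lambda r: [r["prev_tt"]],
    "all_numerical": lambda r: [r["prev_tt"], r["hour"], r["section"], r["dow"]],
    "mixed_one_hot": lambda r: ([r["prev_tt"], r["hour"]]
                                + one_hot(r["section"], 10) + one_hot(r["dow"], 7)),
}

train, test = make_records(300, seed=1), make_records(100, seed=2)
scores = {name: evaluate(fn, train, test) for name, fn in candidates.items()}
best = min(scores, key=scores.get)  # lowest MAPE = highest prediction accuracy
for name, score in scores.items():
    print(f"{name}: MAPE = {score:.2f}%")
print("selected feature vector:", best)
```

Which candidate wins on this toy data says nothing about the study's result; the point is only the structure of the comparison, where prediction error on held-out trips serves as the indirect measure of cluster quality.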