Abstract:
In this paper, we propose a citywide and real-time model for estimating the travel time of any path (represented as a sequence of connected road segments) in a city, based on the GPS trajectories of vehicles received in current time slots and over a period of history, as well as map data sources. Though this is a strategically important task in many traffic monitoring and routing systems, the problem has not been well solved yet, given the following three challenges. The first is the data sparsity problem, i.e., many road segments may not be traveled by any GPS-equipped vehicle in the present time slot. In most cases, we cannot find a trajectory exactly traversing a query path either. Second, for the fragments of a path that are covered by trajectories, there are multiple ways of using (or combining) the trajectories to estimate the corresponding travel time. Finding an optimal combination is a challenging problem, subject to a tradeoff between the length of a path and the number of trajectories traversing the path (i.e., its support). Third, we need to instantly answer users' queries, which may concern any part of a given city. This calls for an efficient, scalable and effective solution that enables citywide, real-time travel time estimation. To address these challenges, we model different drivers' travel times on different road segments in different time slots with a three-dimensional tensor. Combined with geospatial, temporal and historical contexts learned from trajectories and map data, we fill in the tensor's missing values through a context-aware tensor decomposition approach. We then devise and prove an objective function to model the aforementioned tradeoff, with which we find the optimal concatenation of trajectories for an estimate through a dynamic programming solution.
In addition, we propose using frequent trajectory patterns (mined from historical trajectories) to scale down the candidates of concatenation and a suffix-tree-based index to manage the trajectories received in the present time slot. We evaluate our method based on extensive experiments, using GPS trajectories generated by more than 32,000 taxis over a period of two months. The results demonstrate the effectiveness, efficiency and scalability of our method beyond baseline approaches.
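The tensor model described in the abstract can be sketched as follows. This is a minimal illustration on hypothetical data, and the imputation used here (segment/slot means) is only a simple stand-in for the paper's context-aware tensor decomposition:

```python
import numpy as np

# Minimal sketch of the tensor model, on hypothetical data:
# axis 0 = drivers, axis 1 = road segments, axis 2 = time slots; entry
# [d, r, t] holds driver d's travel time (seconds) on segment r in slot t.
# NaN marks entries no trajectory covers -- the data-sparsity problem.
n_drivers, n_segments, n_slots = 3, 4, 2
tensor = np.full((n_drivers, n_segments, n_slots), np.nan)

# Hypothetical records mined from trajectories: (driver, segment, slot, secs).
for d, r, t, secs in [(0, 1, 0, 42.0), (1, 1, 0, 38.0), (2, 3, 1, 55.0)]:
    tensor[d, r, t] = secs

# The paper fills the missing entries with a context-aware tensor
# decomposition; as a simple stand-in, impute each missing entry from the
# mean over drivers for that segment/slot, falling back to the segment
# mean and then the global mean.
seg_slot_mean = np.nanmean(tensor, axis=0)      # shape (segments, slots)
seg_mean = np.nanmean(tensor, axis=(0, 2))      # shape (segments,)
global_mean = np.nanmean(tensor)
filled = tensor.copy()
for d in range(n_drivers):
    for r in range(n_segments):
        for t in range(n_slots):
            if np.isnan(filled[d, r, t]):
                for v in (seg_slot_mean[r, t], seg_mean[r], global_mean):
                    if not np.isnan(v):
                        filled[d, r, t] = v
                        break
```

After filling, every (driver, segment, slot) entry carries an estimate, which is what allows a query path in a sparsely covered part of the city to be answered at all.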
TL;DR: The concept of urban computing is introduced, discussing its general framework and key challenges from the perspective of computer sciences, and the typical technologies that are needed in urban computing are summarized into four folds.
TL;DR: A systematic survey on the major research into trajectory data mining, providing a panorama of the field as well as the scope of its research topics, and introduces the methods that transform trajectories into other data formats, such as graphs, matrices, and tensors.
TL;DR: Experimental results showed that the proposed Correlation Matrix kNN (CM-kNN) classification was more accurate and efficient than existing kNN methods in data-mining applications, such as classification, regression, and missing data imputation.
TL;DR: High-level principles of each category of methods are introduced, and examples in which these techniques are used to handle real big data problems are given, to help a wide range of communities find a solution for data fusion in big data projects.
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
TL;DR: This study proposes a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develops an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth.
TL;DR: CarTel has been deployed on six cars, running on a small scale in Boston and Seattle for over a year, and has been used to analyze commute times, analyze metropolitan Wi-Fi deployments, and for automotive diagnostics.
Q1. What have the authors contributed in "Travel time estimation of a path using sparse trajectories" ?
In this paper, the authors propose a citywide and real-time model for estimating the travel time of any path (represented as a sequence of connected road segments) in a city, based on the GPS trajectories of vehicles received in current time slots and over a period of history, as well as map data sources. Though this is a strategically important task in many traffic monitoring and routing systems, the problem has not been well solved yet, given the following three challenges; in most cases, for instance, the authors cannot find a trajectory exactly traversing a query path. The authors then devise and prove an objective function to model the aforementioned tradeoff, with which they find the optimal concatenation of trajectories for an estimate through a dynamic programming solution. In addition, the authors propose using frequent trajectory patterns (mined from historical trajectories) to scale down the candidates of concatenation, and a suffix-tree-based index to manage the trajectories received in the present time slot. The results demonstrate the effectiveness, efficiency and scalability of their method beyond baseline approaches.
Q2. What future works have the authors mentioned in the paper "Travel time estimation of a path using sparse trajectories" ?
In the future, the authors plan to infer the travel time of a path for a particular driver. In addition, the authors would like to study the impact of other factors, such as weather conditions and air quality, on the travel time estimation of a path.
Q3. What is the main reason for the accuracy of the map-matching?
The map-matching of high-sampling-rate trajectories is more accurate than that of low-sampling-rate taxi trajectories, resulting in a more accurate estimation of the ground truth.
Q4. How does the model predict the travel time of a road segment?
When a vehicle passes through, the time interval for crossing two adjacent loop detectors is recorded, based on which the speed of the vehicle is inferred. [9, 14, 16] use various models to estimate the travel speed on an individual road segment based on the sensor readings from loop detectors, and then convert the speed into a travel time. [19] predicts the travel time of a road segment by applying support vector regression to its historical travel times.
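The loop-detector pipeline above can be illustrated with a toy computation (all numbers, spacings and lengths here are hypothetical, purely for illustration):

```python
# Toy illustration of the loop-detector approach: two adjacent detectors
# a known distance apart record the interval a vehicle takes to cross
# from one to the other, from which its speed is inferred.
detector_spacing_km = 0.5      # hypothetical detector spacing
crossing_interval_s = 36.0     # hypothetical recorded interval

speed_kmh = detector_spacing_km / (crossing_interval_s / 3600.0)  # 50 km/h

# Converting the inferred speed into a travel time for a segment of
# hypothetical length 2 km:
segment_length_km = 2.0
travel_time_s = segment_length_km / speed_kmh * 3600.0            # 144 s
```

This also makes the weakness plain: the conversion assumes a constant speed over the whole segment, which is one reason segment-by-segment estimates accumulate error.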
Q5. How many segments are retrieved from the query paths?
The travel times of 58,223 road segments (about 26.8% of the road segments in the query paths) are finally retrieved for constructing the optimal concatenation, i.e., 4.7 road segments per path.
Q6. How do the authors find the optimal concatenation of trajectories?
Using a dynamic programming solution, the authors find the optimal concatenation of trajectories for estimating a path's travel time.
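The shape of that dynamic program can be sketched as follows. The cost function below is a hypothetical stand-in, not the paper's proven objective; it merely encodes the stated tradeoff that longer fragments are preferable but low support is penalized:

```python
def best_concatenation(n_segments, support):
    """Sketch of the concatenation DP. support[(j, i)] is the number of
    trajectories traversing segments j..i-1 of the query path; dp[i] is
    the best score for covering the first i segments."""
    INF = float("inf")
    dp = [INF] * (n_segments + 1)
    choice = [None] * (n_segments + 1)
    dp[0] = 0.0
    for i in range(1, n_segments + 1):
        for j in range(i):
            sup = support.get((j, i), 0)
            if sup == 0:
                continue  # no trajectory covers this fragment
            # Stand-in cost: longer fragments cheaper, low support costlier.
            cost = 1.0 / (i - j) + 1.0 / sup
            if dp[j] + cost < dp[i]:
                dp[i] = dp[j] + cost
                choice[i] = j
    # Recover the chosen fragment boundaries.
    cuts, i = [], n_segments
    while i > 0 and choice[i] is not None:
        cuts.append((choice[i], i))
        i = choice[i]
    return dp[n_segments], list(reversed(cuts))

# Hypothetical 3-segment query path with fragment supports:
score, cuts = best_concatenation(3, {(0, 1): 10, (0, 2): 5,
                                     (1, 3): 1, (2, 3): 2})
# cuts == [(0, 2), (2, 3)]: segments 0-1 are covered by one fragment
# and segment 2 by another.
```

The DP runs in O(n²) over the candidate fragments; the paper's frequent trajectory patterns further scale down which (j, i) fragments are considered at all.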
Q7. What is the way to deal with the weakness of the individual road segment-based methods?
A possible approach to deal with the weakness of the individual road segment-based methods is to estimate the travel time of a path as a whole based on frequent trajectory patterns.
Q8. How long can the authors infer the travel time on each road segment for each particular driver?
In total, the authors can infer the travel time on each road segment for each particular driver within 6.4 minutes when using 25 cores on a server.
Q9. How much time is the average error of the estimated travel time?
Given the queries introduced in Section 5.1.3, on average, the absolute error of the estimated travel time is about 2 minutes per path, which is about 19% of the true travel time.
Q10. Why is the length of a path collected in the study so long?
The major reason is that the paths collected in the study are usually long (8.78 km each on average), for which their model has better accuracy than for shorter paths.
Q11. How do the authors calculate the travel time of a query path?
In the implementation, without an effective indexing structure, the authors would need to scan each trajectory when calculating the travel time of a path based on that trajectory (i.e., Line 11 of Algorithm 2).
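The role of the index can be illustrated with a simplified stand-in for the paper's suffix-tree-based structure (a flat dictionary of sub-paths rather than an actual suffix tree; trajectory IDs and segment names are hypothetical):

```python
from collections import defaultdict

def build_subpath_index(trajectories):
    """Map every contiguous sub-path to the trajectories traversing it,
    so a query fragment is answered by one lookup instead of a scan.
    A suffix tree achieves the same with far less storage."""
    index = defaultdict(list)
    for tid, segs in trajectories.items():
        for i in range(len(segs)):
            for j in range(i + 1, len(segs) + 1):
                index[tuple(segs[i:j])].append(tid)
    return index

# Hypothetical trajectories from the present time slot:
trajs = {"t1": ["r1", "r2", "r3"], "t2": ["r2", "r3"]}
idx = build_subpath_index(trajs)
# idx[("r2", "r3")] -> ["t1", "t2"]
```

This captures the query-time benefit (constant-time fragment lookup) while making clear why the flat version is impractical at city scale: it materializes O(n²) sub-paths per trajectory, which is exactly what a suffix-tree index avoids.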