Trajectory Data Mining: An Overview

doi:10.1145/2743025


Journal Article

Yu Zheng¹

Microsoft¹

12 May 2015 - ACM Transactions on Intelligent Systems and Technology (ACM) - Vol. 6, Iss. 3, Article 29

TL;DR: A systematic survey of the major research on trajectory data mining that provides a panorama of the field and the scope of its research topics, and introduces the methods that transform trajectories into other data formats, such as graphs, matrices, and tensors.


Abstract: The advances in location-acquisition and mobile computing techniques have generated massive spatial trajectory data, which represent the mobility of a diversity of moving objects, such as people, vehicles, and animals. Many techniques have been proposed for processing, managing, and mining trajectory data in the past decade, fostering a broad range of applications. In this article, we conduct a systematic survey on the major research into trajectory data mining, providing a panorama of the field as well as the scope of its research topics. Following a road map from the derivation of trajectory data, to trajectory data preprocessing, to trajectory data management, and to a variety of mining tasks (such as trajectory pattern mining, outlier detection, and trajectory classification), the survey explores the connections, correlations, and differences among these existing techniques. This survey also introduces the methods that transform trajectories into other data formats, such as graphs, matrices, and tensors, to which more data mining and machine learning techniques can be applied. Finally, some public trajectory datasets are presented. This survey can help shape the field of trajectory data mining, providing a quick understanding of this field to the community.


Citations


Journal Article

Big data in tourism research: A literature review

Jingjing Li¹, Lizhi Xu², Ling Tang¹, Shouyang Wang³, Ling Li¹

Beihang University¹, Beijing Union University², Chinese Academy of Sciences³

01 Oct 2018 - Tourism Management

TL;DR: This paper might be the first attempt to present a comprehensive literature review on different types of big data in tourism research, and facilitates a thorough understanding of this sunrise research and offers valuable insights into its future prospects.


585 citations

Journal Article

Human motion trajectory prediction: a survey

Andrey Rudenko¹,², Luigi Palmieri², Michael Herman², Kris M. Kitani³, Dariu M. Gavrila⁴, Kai O. Arras²

Örebro University¹, Bosch², Carnegie Mellon University³, Delft University of Technology⁴

07 Jun 2020 - The International Journal of Robotics Research

TL;DR: As intelligent autonomous systems become more common in human environments, their ability to perceive, understand, and anticipate human behavior becomes increasingly important; this survey discusses that ability in the context of human motion trajectory prediction.


Abstract: With growing numbers of intelligent autonomous systems in human environments, the ability of such systems to perceive, understand, and anticipate human behavior becomes increasingly important. Spec...


547 citations

Journal Article

Methodologies for Cross-Domain Data Fusion: An Overview

Yu Zheng¹

Microsoft¹

01 Mar 2015 - IEEE Transactions on Big Data

TL;DR: High-level principles of each category of methods are introduced, and examples in which these techniques are used to handle real big data problems are given, to help a wide range of communities find a solution for data fusion in big data projects.

Abstract: Traditional data mining usually deals with data from a single domain. In the big data era, we face a diversity of datasets from different sources in different domains. These datasets consist of multiple modalities, each of which has a different representation, distribution, scale, and density. How to unlock the power of knowledge from multiple disparate (but potentially connected) datasets is paramount in big data research, essentially distinguishing big data from traditional data mining tasks. This calls for advanced techniques that can fuse knowledge from various datasets organically in a machine learning and data mining task. This paper summarizes the data fusion methodologies, classifying them into three categories: stage-based, feature level-based, and semantic meaning-based data fusion methods. The last category of data fusion methods is further divided into four groups: multi-view learning-based, similarity-based, probabilistic dependency-based, and transfer learning-based methods. These methods focus on knowledge fusion rather than schema mapping and data merging, significantly distinguishing between cross-domain data fusion and traditional data fusion studied in the database community. This paper not only introduces high-level principles of each category of methods, but also gives examples in which these techniques are used to handle real big data problems. In addition, this paper positions existing works in a framework, exploring the relationship and difference between different data fusion methods. This paper will help a wide range of communities find a solution for data fusion in big data projects.

356 citations

Cites background from "Trajectory Data Mining: An Overview..."

...Index Terms: Big Data, cross-domain data mining, data fusion, multi-modality data representation, deep neural networks, multi-view learning, matrix factorization, probabilistic graphical models, transfer learning, urban computing...
[...]

Journal Article

Activity-Based Human Mobility Patterns Inferred from Mobile Phone Data: A Case Study of Singapore

Shan Jiang¹, Joseph Ferreira¹, Marta C. González¹

Massachusetts Institute of Technology¹

01 Jun 2017 - IEEE Transactions on Big Data

TL;DR: This research provides an innovative data mining framework that synthesizes the state-of-the-art techniques in extracting mobility patterns from raw mobile phone CDR data, and designs a pipeline that can translate the massive and passive mobile phone records into meaningful spatial human mobility patterns readily interpretable for urban and transportation planning purposes.


Abstract: In this study, with Singapore as an example, we demonstrate how we can use mobile phone call detail record (CDR) data, which contains millions of anonymous users, to extract individual mobility networks comparable to the activity-based approach. Such an approach is widely used in the transportation planning practice to develop urban micro simulations of individual daily activities and travel; yet it depends highly on detailed travel survey data to capture individual activity-based behavior. We provide an innovative data mining framework that synthesizes the state-of-the-art techniques in extracting mobility patterns from raw mobile phone CDR data, and design a pipeline that can translate the massive and passive mobile phone records to meaningful spatial human mobility patterns readily interpretable for urban and transportation planning purposes. With growing ubiquitous mobile sensing, and shrinking labor and fiscal resources in the public sector globally, the method presented in this research can be used as a low-cost alternative for transportation and planning agencies to understand the human activity patterns in cities, and provide targeted plans for future sustainable development.


351 citations

Journal Article

Vehicular Social Networks: Enabling Smart Mobility

Zhaolong Ning¹, Feng Xia¹, Noor Ullah¹, Xiangjie Kong¹, Xiping Hu²

Dalian University of Technology¹, Chinese Academy of Sciences²

01 May 2017 - IEEE Communications Magazine

TL;DR: An application scenario on trajectory data-analysis-based traffic anomaly detection for VSNs and several research challenges and open issues are highlighted and discussed.


Abstract: Vehicular transportation is an essential part of modern cities. However, the ever increasing number of road accidents, traffic congestion, and other such issues become obstacles for the realization of smart cities. As the integration of the Internet of Vehicles and social networks, vehicular social networks (VSNs) are promising to solve the above-mentioned problems by enabling smart mobility in modern cities, which are likely to pave the way for sustainable development by promoting transportation efficiency. In this article, the definition of and a brief introduction to VSNs are presented first. Existing supporting communication technologies are then summarized. Furthermore, we introduce an application scenario on trajectory data-analysis-based traffic anomaly detection for VSNs. Finally, several research challenges and open issues are highlighted and discussed.


286 citations

Cites background from "Trajectory Data Mining: An Overview..."

...Table 2 demonstrates the taxonomy of VSN applications, which can be further divided into social-data-driven vehicular networks, social vehicular ad hoc networks (VANETs), and data-driven social networks [4]....
[...]
...However, trajectories of vehicles are not perfectly accurate due to sensor noise and other reasons, for example, false positioning signals received in some urban areas [4]....
[...]


References


Journal Article

Anomaly detection: A survey

Varun Chandola¹, Arindam Banerjee¹, Vipin Kumar¹

University of Minnesota¹

30 Jul 2009 - ACM Computing Surveys

TL;DR: This survey tries to provide a structured and comprehensive overview of the research on anomaly detection by grouping existing techniques into different categories based on the underlying approach adopted by each technique.


Abstract: Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We have grouped existing techniques into different categories based on the underlying approach adopted by each technique. For each category we have identified key assumptions, which are used by the techniques to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and more succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.


9,627 citations

"Trajectory Data Mining: An Overview..." refers background in this paper

...The regions whose log-likelihood ratio statistic value drops in the tail of the χ² distribution are likely to be anomalous [Chandola et al. 2009]....
[...]
...A survey on general anomaly detection methods can be found in [14]....
[...]

Journal Article

Algorithms for the reduction of the number of points required to represent a digitized line or its caricature

David H. Douglas¹, Thomas K. Peucker²

Royal Military College of Canada¹, Simon Fraser University²

01 Dec 1973 - Cartographica: The International Journal for Geographic Information and Geovisualization

TL;DR: In this paper, two algorithms to reduce the number of points required to represent the line and, if desired, produce caricatures are presented and compared with the most promising methods so far suggested.

Abstract: All digitizing methods, as a general rule, record lines with far more data than is necessary for accurate graphic reproduction or for computer analysis. Two algorithms to reduce the number of points required to represent the line and, if desired, produce caricatures, are presented and compared with the most promising methods so far suggested. Line reduction will form a major part of automated generalization.


3,749 citations

"Trajectory Data Mining: An Overview..." refers methods in this paper

...A well-known algorithm, called Douglas-Peucker [Douglas and Peucker 1973], is used to approximate the original trajectory....
[...]
...The solution first identifies key points shaping a trajectory, by using a line simplification algorithm like DP [Douglas and Peucker 1973]....
[...]
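
As a rough illustration of the line simplification these excerpts refer to, the Python sketch below implements the commonly described recursive form of Douglas-Peucker. The function names, the planar (x, y) point format, and the `epsilon` tolerance parameter are assumptions made for this example; they are not taken from the survey or from the 1973 paper, and real GPS data would normally need a geodesic or projected distance instead.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the straight line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # a and b coincide, so fall back to point-to-point distance.
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / math.hypot(dx, dy)


def douglas_peucker(points, epsilon):
    """Return a simplified copy of `points` (hypothetical helper).

    A point is kept only if it lies farther than `epsilon` from the chord
    joining the endpoints of the sub-trajectory currently being examined.
    """
    if len(points) < 3:
        return list(points)

    # Find the interior point farthest from the chord first -> last.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d

    if dmax > epsilon:
        # Keep the farthest point and simplify both halves recursively.
        left = douglas_peucker(points[: index + 1], epsilon)
        right = douglas_peucker(points[index:], epsilon)
        return left[:-1] + right  # drop the duplicated split point

    # Every interior point is within tolerance: keep the endpoints only.
    return [points[0], points[-1]]
```

A larger `epsilon` discards more points; how to choose it for a given trajectory dataset is outside the scope of this sketch.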

Book Chapter

Rakesh Agrawal¹, Christos Faloutsos¹, Arun N. Swami¹

IBM¹

13 Oct 1993

TL;DR: An indexing method for time sequences that processes similarity queries by using R*-trees to index the sequences, with experimental results showing that the method is superior to search based on sequential scanning.

Abstract: We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong. Another important observation is Parseval's theorem, which specifies that the Fourier transform preserves the Euclidean distance in the time or frequency domain. Having thus mapped sequences to a lower-dimensionality space by using only the first few Fourier coefficients, we use R*-trees to index the sequences and efficiently answer similarity queries. We provide experimental results which show that our method is superior to search based on sequential scanning. Our experiments show that a few coefficients (1–3) are adequate to provide good performance. The performance gain of our method increases with the number and length of sequences.
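
The two observations in this abstract can be checked numerically. The Python sketch below is an illustration only: it uses a unitary DFT (scaled by 1/sqrt(n)) so that Parseval's theorem holds exactly, and shows that the distance computed on the first few coefficients never exceeds the true Euclidean distance, which is what makes the reduced features usable as a filter before exact comparison. The function names and the sample sequences are assumptions, and the R*-tree index itself is not implemented here.

```python
import cmath
import math


def dft(x):
    """Unitary discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        / math.sqrt(n)
        for f in range(n)
    ]


def euclidean(u, v):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(u, v)))


def feature(x, k):
    """First k DFT coefficients, used as a low-dimensional feature vector."""
    return dft(x)[:k]


# Two made-up time sequences of equal length.
x = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
y = [0.0, 1.0, 3.0, 5.0, 3.0, 1.0, 0.0, 0.0]

true_distance = euclidean(x, y)                   # time domain
full_distance = euclidean(dft(x), dft(y))         # frequency domain, all coefficients
reduced_distance = euclidean(feature(x, 3), feature(y, 3))  # first 3 coefficients
```

Because `reduced_distance` can only underestimate `true_distance`, a range query answered on the reduced features may return false alarms that need a second exact check, but it will not miss a true match.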


2,082 citations

"Trajectory Data Mining: An Overview..." refers background or methods in this paper

...As the assumption may not hold in reality, Dynamic Time Warping (DTW) distance was proposed to allow “repeating” some points as many times as needed in order to get the best alignment [Agrawal et al. 1993]....
[...]
...KNN queries retrieve the top-K trajectories with the minimum aggregate distance to a few points (entitled the KNN point query [Chen et al. 2010; Tao et al. 2002; Tang et al. 2011]) or a specific trajectory (entitled the KNN trajectory query [Yi et al. 1998; Agrawal et al. 1993])....
[...]
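
To make the idea of "repeating" points concrete, here is a minimal Python sketch of the textbook dynamic-programming form of DTW between two trajectories. The function name, the representation of a trajectory as a list of (x, y) tuples, and the use of planar Euclidean distance between points are assumptions of this example, not details given in the excerpts.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two point sequences.

    cost[i][j] holds the cheapest alignment of the first i points of `a`
    with the first j points of `b`. Moving only in i or only in j matches
    one point against several points of the other sequence, i.e. it
    "repeats" that point.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # requires Python 3.8+
            cost[i][j] = d + min(
                cost[i - 1][j],      # repeat b[j-1]
                cost[i][j - 1],      # repeat a[i-1]
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[n][m]
```

This version runs in O(n·m) time and memory; windowing constraints and lower bounds that are often used to speed it up are left out.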

Proceedings Article

PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth

Jian Pei¹, Jiawei Han¹, Behzad Mortazavi-Asl¹, Helen Pinto¹, Qiming Chen², Umeshwar Dayal², Meichun Hsu²

Simon Fraser University¹, Hewlett-Packard²

02 Apr 2001

TL;DR: This work proposes a novel sequential pattern mining method, called PrefixSpan (i.e., Prefix-projected Sequential pattern mining), which explores prefix-projection in sequential pattern mining, and shows that PrefixSpan outperforms both the Apriori-based GSP algorithm and another recently proposed method, FreeSpan, in mining large sequence databases.

Abstract: Sequential pattern mining is an important data mining problem with broad applications. It is challenging since one may need to examine a combinatorially explosive number of possible subsequence patterns. Most of the previously developed sequential pattern mining methods follow the methodology of Apriori, which may substantially reduce the number of combinations to be examined. However, Apriori still encounters problems when a sequence database is large and/or when sequential patterns to be mined are numerous and/or long. In this paper, we propose a novel sequential pattern mining method, called PrefixSpan (i.e., Prefix-projected Sequential pattern mining), which explores prefix-projection in sequential pattern mining. PrefixSpan mines the complete set of patterns but greatly reduces the efforts of candidate subsequence generation. Moreover, prefix-projection substantially reduces the size of projected databases and leads to efficient processing. Our performance study shows that PrefixSpan outperforms both the Apriori-based GSP algorithm and another recently proposed method, FreeSpan, in mining large sequence databases.


1,975 citations

"Trajectory Data Mining: An Overview..." refers methods in this paper

...After the transformation, we can mine the sequential patterns from these sequences by using existing sequential pattern mining algorithms, such as PrefixSpan [Pei et al. 2011] and CloseSpan [Yan et al. 2003], with time constraints....
[...]
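
For readers unfamiliar with prefix projection, the Python sketch below shows a heavily simplified variant of the idea: each sequence is a list of single items (for example, region identifiers visited in order), and the projected database is stored as (sequence index, start position) pairs. It handles neither itemset elements nor the time constraints mentioned in the excerpt, and the function name, the input format, and the `min_support` parameter are assumptions of this example rather than the paper's interface.

```python
def prefixspan(sequences, min_support):
    """Return (pattern, support) pairs for all frequent sequential patterns.

    A pattern is supported by a sequence if its items appear in that
    sequence in the same order, not necessarily contiguously.
    """
    results = []

    def mine(prefix, projected):
        # Count, once per sequence, the items occurring in the suffixes.
        counts = {}
        for sid, start in projected:
            for item in set(sequences[sid][start:]):
                counts[item] = counts.get(item, 0) + 1

        for item in sorted(counts):
            support = counts[item]
            if support < min_support:
                continue
            pattern = prefix + [item]
            results.append((pattern, support))

            # Project: keep only what follows the first occurrence of item.
            new_projected = []
            for sid, start in projected:
                seq = sequences[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            mine(pattern, new_projected)

    mine([], [(sid, 0) for sid in range(len(sequences))])
    return results
```

For instance, `prefixspan([["A", "B", "C"], ["A", "C"], ["B", "C"], ["A", "B"]], 2)` yields the frequent single regions together with the ordered pairs that at least two sequences share.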

Proceedings Article

Mining interesting locations and travel sequences from GPS trajectories

Yu Zheng¹, Lizhu Zhang¹, Xing Xie¹, Wei-Ying Ma¹

Microsoft¹

20 Apr 2009

TL;DR: This work first models multiple individuals' location histories with a tree-based hierarchical graph (TBHG), and then proposes a HITS (Hypertext Induced Topic Search)-based inference model, which regards an individual's access on a location as a directed link from the user to that location.


Abstract: The increasing availability of GPS-enabled devices is changing the way people interact with the Web, and brings us a large amount of GPS trajectories representing people's location histories. In this paper, based on multiple users' GPS trajectories, we aim to mine interesting locations and classical travel sequences in a given geospatial region. Here, interesting locations mean the culturally important places, such as Tiananmen Square in Beijing, and frequented public areas, like shopping malls and restaurants, etc. Such information can help users understand surrounding locations, and would enable travel recommendation. In this work, we first model multiple individuals' location histories with a tree-based hierarchical graph (TBHG). Second, based on the TBHG, we propose a HITS (Hypertext Induced Topic Search)-based inference model, which regards an individual's access on a location as a directed link from the user to that location. This model infers the interest of a location by taking into account the following three factors. 1) The interest of a location depends on not only the number of users visiting this location but also these users' travel experiences. 2) Users' travel experiences and location interests have a mutual reinforcement relationship. 3) The interest of a location and the travel experience of a user are relative values and are region-related. Third, we mine the classical travel sequences among locations considering the interests of these locations and users' travel experiences. We evaluated our system using a large GPS dataset collected by 107 users over a period of one year in the real world. As a result, our HITS-based inference model outperformed baseline approaches like rank-by-count and rank-by-frequency. Meanwhile, when considering the users' travel experiences and location interests, we achieved a better performance beyond baselines, such as rank-by-count and rank-by-interest, etc.
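
The mutual reinforcement between users' travel experience and location interest can be illustrated with a plain HITS-style iteration, in which users play the role of hubs and locations the role of authorities. The Python sketch below is a simplification made for illustration: it ignores the tree-based hierarchical graph and the region-related scoring described in the abstract, and the function name, the nested-dictionary input, and the use of visit counts as link weights are all assumptions.

```python
import math


def _normalize(scores):
    """Scale a score dictionary to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {key: value / norm for key, value in scores.items()}


def hits_user_location(visits, iterations=50):
    """Return (experience, interest) dictionaries.

    `visits` maps each user to a dict {location: number_of_visits}.
    A location's interest is the weighted sum of the experience of the
    users who visited it; a user's experience is the weighted sum of the
    interest of the locations they visited.
    """
    users = sorted(visits)
    locations = sorted({loc for locs in visits.values() for loc in locs})
    experience = {user: 1.0 for user in users}
    interest = {loc: 1.0 for loc in locations}

    for _ in range(iterations):
        interest = _normalize({
            loc: sum(experience[u] * visits[u].get(loc, 0) for u in users)
            for loc in locations
        })
        experience = _normalize({
            u: sum(interest[loc] * count for loc, count in visits[u].items())
            for u in users
        })
    return experience, interest
```

With this scheme a location visited by a few well-travelled users can outrank one visited by many inexperienced users, which is the behaviour a rank-by-count baseline cannot capture.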


1,903 citations

"Trajectory Data Mining: An Overview..." refers background or methods in this paper

...The noise filtering method, which has been used in T-Drive [Yuan et al. 2010a, 2011a, 2013a] and GeoLife [Zheng et al. 2009a; Zheng et al. 2010] projects, first calculates the travel speed of each point in a trajectory based on the time interval and distance between a point and its successor (we…...
[...]
...The dataset has been used to estimate the similarity between users [Li et al. 2008], which enables friend and location recommendations [Zheng and Xie 2011b; Zheng et al. 2009c]....
[...]
...[155][154] transform users’ GPS trajectory into a user-location matrix, where a row stands for a user and a column denotes a location (such as a cluster shown in Figure 21)....
[...]
...2011; Zheng et al. 2012b] and travel recommendation [Zheng and Xie 2011b; Zheng et al. 2011c; Zheng et al. 2009b]....
[...]
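
The user-location matrix mentioned in the excerpts above can be sketched in a few lines of Python. The excerpt only states that rows stand for users and columns for locations; storing the number of visits in each cell, the function name, and the (user, location) input format are assumptions made for this illustration.

```python
def build_user_location_matrix(visits):
    """Build a dense user-location matrix from (user, location) visit records.

    Returns (users, locations, matrix), where matrix[i][j] is the number
    of times users[i] visited locations[j].
    """
    users = sorted({user for user, _ in visits})
    locations = sorted({location for _, location in visits})
    row = {user: i for i, user in enumerate(users)}
    col = {location: j for j, location in enumerate(locations)}

    matrix = [[0] * len(locations) for _ in users]
    for user, location in visits:
        matrix[row[user]][col[location]] += 1
    return users, locations, matrix


# Hypothetical visit records: each pair is one detected visit.
records = [("u1", "c1"), ("u1", "c2"), ("u2", "c2"), ("u1", "c2")]
users, locations, matrix = build_user_location_matrix(records)
```

Such a matrix is typically sparse in practice, so a real system would more likely use a sparse representation before applying techniques such as collaborative filtering or matrix factorization.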