Journal ArticleDOI

Trajectory Data Mining: An Overview

Yu Zheng
12 May 2015 - ACM Transactions on Intelligent Systems and Technology (ACM) - Vol. 6, Iss. 3, pp 29
TL;DR: A systematic survey on the major research into trajectory data mining, providing a panorama of the field as well as the scope of its research topics, and introduces the methods that transform trajectories into other data formats, such as graphs, matrices, and tensors.
Abstract: The advances in location-acquisition and mobile computing techniques have generated massive spatial trajectory data, which represent the mobility of a diversity of moving objects, such as people, vehicles, and animals. Many techniques have been proposed for processing, managing, and mining trajectory data in the past decade, fostering a broad range of applications. In this article, we conduct a systematic survey on the major research into trajectory data mining, providing a panorama of the field as well as the scope of its research topics. Following a road map from the derivation of trajectory data, to trajectory data preprocessing, to trajectory data management, and to a variety of mining tasks (such as trajectory pattern mining, outlier detection, and trajectory classification), the survey explores the connections, correlations, and differences among these existing techniques. This survey also introduces the methods that transform trajectories into other data formats, such as graphs, matrices, and tensors, to which more data mining and machine learning techniques can be applied. Finally, some public trajectory datasets are presented. This survey can help shape the field of trajectory data mining, providing a quick understanding of this field to the community.


Citations
Journal ArticleDOI
TL;DR: This paper might be the first attempt to present a comprehensive literature review on different types of big data in tourism research, and facilitates a thorough understanding of this sunrise research and offers valuable insights into its future prospects.

585 citations

Journal ArticleDOI
TL;DR: As intelligent autonomous systems become more common in human environments, their ability to perceive, understand, and anticipate human behavior becomes increasingly important.
Abstract: With growing numbers of intelligent autonomous systems in human environments, the ability of such systems to perceive, understand, and anticipate human behavior becomes increasingly important. Spec...

547 citations

Journal ArticleDOI
Yu Zheng
TL;DR: High-level principles of each category of methods are introduced, and examples in which these techniques are used to handle real big data problems are given, to help a wide range of communities find a solution for data fusion in big data projects.
Abstract: Traditional data mining usually deals with data from a single domain. In the big data era, we face a diversity of datasets from different sources in different domains. These datasets consist of multiple modalities, each of which has a different representation, distribution, scale, and density. How to unlock the power of knowledge from multiple disparate (but potentially connected) datasets is paramount in big data research, essentially distinguishing big data from traditional data mining tasks. This calls for advanced techniques that can fuse knowledge from various datasets organically in a machine learning and data mining task. This paper summarizes the data fusion methodologies, classifying them into three categories: stage-based, feature level-based, and semantic meaning-based data fusion methods. The last category of data fusion methods is further divided into four groups: multi-view learning-based, similarity-based, probabilistic dependency-based, and transfer learning-based methods. These methods focus on knowledge fusion rather than schema mapping and data merging, significantly distinguishing between cross-domain data fusion and traditional data fusion studied in the database community. This paper not only introduces high-level principles of each category of methods, but also gives examples in which these techniques are used to handle real big data problems. In addition, this paper positions existing works in a framework, exploring the relationship and difference between different data fusion methods. This paper will help a wide range of communities find a solution for data fusion in big data projects.

356 citations


Cites background from "Trajectory Data Mining: An Overview..."

  • ...Index Terms: Big Data, cross-domain data mining, data fusion, multi-modality data representation, deep neural networks, multi-view learning, matrix factorization, probabilistic graphical models, transfer learning, urban computing...

    [...]

Journal ArticleDOI
TL;DR: This research provides an innovative data mining framework that synthesizes the state-of-the-art techniques in extracting mobility patterns from raw mobile phone CDR data, and designs a pipeline that can translate the massive and passive mobile phone records to meaningful spatial human mobility patterns readily interpretable for urban and transportation planning purposes.
Abstract: In this study, with Singapore as an example, we demonstrate how we can use mobile phone call detail record (CDR) data, which contains millions of anonymous users, to extract individual mobility networks comparable to the activity-based approach. Such an approach is widely used in the transportation planning practice to develop urban micro simulations of individual daily activities and travel; yet it depends highly on detailed travel survey data to capture individual activity-based behavior. We provide an innovative data mining framework that synthesizes the state-of-the-art techniques in extracting mobility patterns from raw mobile phone CDR data, and design a pipeline that can translate the massive and passive mobile phone records to meaningful spatial human mobility patterns readily interpretable for urban and transportation planning purposes. With growing ubiquitous mobile sensing, and shrinking labor and fiscal resources in the public sector globally, the method presented in this research can be used as a low-cost alternative for transportation and planning agencies to understand the human activity patterns in cities, and provide targeted plans for future sustainable development.

351 citations

Journal ArticleDOI
TL;DR: An application scenario on trajectory data-analysis-based traffic anomaly detection for VSNs and several research challenges and open issues are highlighted and discussed.
Abstract: Vehicular transportation is an essential part of modern cities. However, the ever increasing number of road accidents, traffic congestion, and other such issues become obstacles for the realization of smart cities. As the integration of the Internet of Vehicles and social networks, vehicular social networks (VSNs) are promising to solve the above-mentioned problems by enabling smart mobility in modern cities, which are likely to pave the way for sustainable development by promoting transportation efficiency. In this article, the definition of and a brief introduction to VSNs are presented first. Existing supporting communication technologies are then summarized. Furthermore, we introduce an application scenario on trajectory data-analysis-based traffic anomaly detection for VSNs. Finally, several research challenges and open issues are highlighted and discussed.

286 citations


Cites background from "Trajectory Data Mining: An Overview..."

  • ...Table 2 demonstrates the taxonomy of VSN applications, which can be further divided into social-data-driven vehicular networks, social vehicular ad hoc networks (VANETs), and data-driven social networks [4]....

    [...]

  • ...However, trajectories of vehicles are not perfectly accurate due to sensor noise and other reasons, for example, false positioning signals received in some urban areas [4]....

    [...]

References
Journal ArticleDOI
TL;DR: This survey tries to provide a structured and comprehensive overview of the research on anomaly detection by grouping existing techniques into different categories based on the underlying approach adopted by each technique.
Abstract: Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We have grouped existing techniques into different categories based on the underlying approach adopted by each technique. For each category we have identified key assumptions, which are used by the techniques to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and more succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.

9,627 citations


"Trajectory Data Mining: An Overview..." refers background in this paper

  • ...The regions whose log-likelihood ratio statistic value drops in the tail of χ² distribution are likely to be anomalous [Chandola et al. 2009]....

    [...]

  • ...A survey on general anomaly detection methods can be found in [14]....

    [...]

  • ...The regions whose log-likelihood ratio statistic value drops in the tail of χ² distribution are likely to be anomalous [14]....

    [...]
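
The excerpts above flag regions whose log-likelihood ratio statistic falls in the tail of a χ² distribution. The following is a minimal sketch of that idea, not the survey's method: it assumes each region has an observed count and a known expected count, models counts as Poisson, and uses one degree of freedom. The function names, the Poisson model, and the significance level `alpha` are all illustrative assumptions.

```python
import math


def poisson_llr_statistic(observed: int, expected: float) -> float:
    """Return 2 * (ln L(alternative) - ln L(null)) for a single Poisson count.

    Null: rate equals `expected`. Alternative: rate equals `observed` (the MLE).
    The Poisson model is an assumption made for this sketch.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    if observed == 0:
        return 2.0 * expected  # limit of the formula as observed -> 0
    return 2.0 * (observed * math.log(observed / expected) - (observed - expected))


def chi2_tail_df1(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


def anomalous_regions(counts, expected, alpha=0.01):
    """Return regions whose statistic lies in the upper `alpha` tail."""
    flagged = []
    for region, observed in counts.items():
        stat = poisson_llr_statistic(observed, expected[region])
        if chi2_tail_df1(stat) < alpha:
            flagged.append(region)
    return flagged
```

With counts of 100, 160, and 95 against an expected 100 per region, only the region with 160 would be flagged at `alpha=0.01` under these assumptions.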

Journal ArticleDOI
TL;DR: In this paper, two algorithms to reduce the number of points required to represent the line and, if desired, produce caricatures are presented and compared with the most promising methods so far suggested.
Abstract: All digitizing methods, as a general rule, record lines with far more data than is necessary for accurate graphic reproduction or for computer analysis. Two algorithms to reduce the number of points required to represent the line and, if desired, produce caricatures, are presented and compared with the most promising methods so far suggested. Line reduction will form a major part of automated generalization.

3,749 citations


"Trajectory Data Mining: An Overview..." refers methods in this paper

  • ...A well-known algorithm, called Douglas-Peucker [Douglas and Peucker 1973], is used to approximate the original trajectory....

    [...]

  • ...The solution first identifies key points shaping a trajectory, by using a line simplification algorithm like DP [Douglas and Peucker 1973]....

    [...]
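
The excerpts above use Douglas-Peucker (DP) to approximate a trajectory by its key points. A minimal recursive sketch follows, assuming planar (x, y) coordinates and a tolerance `epsilon` in the same units. Measuring distance to the segment rather than to the infinite line is a choice made here, and the function names are illustrative.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b in the plane."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Return a simplified copy of `points` that keeps both end points."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= epsilon:
        # Every intermediate point is close enough to the chord: drop them.
        return [first, last]
    # Otherwise keep the farthest point and simplify both halves.
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right
```

For raw GPS points, the coordinates would first need to be projected to a planar system, or the distance function replaced with a geodesic one.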

Book ChapterDOI
13 Oct 1993
TL;DR: An indexing method for time sequences that uses R*-trees to index the sequences and efficiently answer similarity queries; experimental results show that the method is superior to search based on sequential scanning.
Abstract: We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong. Another important observation is Parseval's theorem, which specifies that the Fourier transform preserves the Euclidean distance in the time or frequency domain. Having thus mapped sequences to a lower-dimensionality space by using only the first few Fourier coefficients, we use R*-trees to index the sequences and efficiently answer similarity queries. We provide experimental results which show that our method is superior to search based on sequential scanning. Our experiments show that a few coefficients (1–3) are adequate to provide good performance. The performance gain of our method increases with the number and length of sequences.
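
The two observations in this abstract can be illustrated with a short sketch. It assumes equal-length real-valued sequences and uses an orthonormal DFT, so that Parseval's theorem gives exact distance preservation. A plain linear scan over the truncated features stands in for the R*-tree, and the function names and the default `k` are illustrative.

```python
import cmath
import math


def dft(seq):
    """Orthonormal DFT (scaled by 1/sqrt(n)), which preserves Euclidean distance."""
    n = len(seq)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x * cmath.exp(-2j * cmath.pi * f * t / n) for t, x in enumerate(seq))
        for f in range(n)
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length real or complex vectors."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))


def range_query(query, database, radius, k=2):
    """Return names of sequences within `radius` of `query`.

    The first k coefficients give a lower bound on the true distance, so
    pruning on them never discards a true match; surviving candidates are
    then verified against the full sequences.
    """
    q_feat = dft(query)[:k]
    hits = []
    for name, seq in database.items():
        if euclidean(q_feat, dft(seq)[:k]) > radius + 1e-9:
            continue  # safely pruned by the lower bound
        if euclidean(query, seq) <= radius:
            hits.append(name)
    return hits
```

In a real index the features would be computed once and stored, rather than recomputed per query as in this sketch.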

2,082 citations


"Trajectory Data Mining: An Overview..." refers background or methods in this paper

  • ...As the assumption may not hold in reality, Dynamic Time Warping (DTW) distance was proposed to allow ‘repeating’ some points as many times as needed in order to get the best alignment [3]....

    [...]

  • ...As the assumption may not hold in reality, Dynamic Time Warping (DTW) distance was proposed to allow “repeating” some points as many times as needed in order to get the best alignment [Agrawal et al. 1993]....

    [...]

  • ...KNN queries retrieve the top K trajectories with the minimum aggregate distance to a few points (entitled the KNN point query [21][94][95]) or a specific trajectory (entitled the KNN trajectory query [117][3])....

    [...]

  • ...KNN queries retrieve the top-K trajectories with the minimum aggregate distance to a few points (entitled the KNN point query [Chen et al. 2010; Tao et al. 2002; Tang et al. 2011]) or a specific trajectory (entitled the KNN trajectory query [Yi et al. 1998; Agrawal et al. 1993])....

    [...]
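
The excerpts above describe DTW as allowing points to be repeated so that two trajectories can be aligned as well as possible. A minimal dynamic-programming sketch follows, assuming trajectories are lists of coordinate tuples of equal dimension and that the point-to-point cost is Euclidean distance. The function name is illustrative, and no warping-window constraint is applied.

```python
import math


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two point sequences.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j]; each step
    may advance in a, in b, or in both, which is what lets a point be
    matched ("repeated") against several points of the other sequence.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # requires Python 3.8+
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

The table has n × m cells, so time and memory grow with the product of the two trajectory lengths. If exactly one input is empty the function returns infinity.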

Proceedings ArticleDOI
02 Apr 2001
TL;DR: This work proposes a novel sequential pattern mining method, called PrefixSpan (i.e., Prefix-projected Sequential pattern mining), which explores prefix-projection in sequential pattern mining, and shows that PrefixSpan outperforms both the Apriori-based GSP algorithm and another recently proposed method, FreeSpan, in mining large sequence databases.
Abstract: Sequential pattern mining is an important data mining problem with broad applications. It is challenging since one may need to examine a combinatorially explosive number of possible subsequence patterns. Most of the previously developed sequential pattern mining methods follow the methodology of Apriori, which may substantially reduce the number of combinations to be examined. However, Apriori still encounters problems when a sequence database is large and/or when sequential patterns to be mined are numerous and/or long. In this paper, we propose a novel sequential pattern mining method, called PrefixSpan (i.e., Prefix-projected Sequential pattern mining), which explores prefix-projection in sequential pattern mining. PrefixSpan mines the complete set of patterns but greatly reduces the efforts of candidate subsequence generation. Moreover, prefix-projection substantially reduces the size of projected databases and leads to efficient processing. Our performance study shows that PrefixSpan outperforms both the Apriori-based GSP algorithm and another recently proposed method, FreeSpan, in mining large sequence databases.
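
Prefix-projection can be sketched compactly for a simplified setting. The code below assumes each sequence is a list of single, sortable items rather than itemsets, and represents a projected database as (sequence index, start offset) pairs instead of copying suffixes. It is an illustration of the idea, not the authors' implementation, and all names are hypothetical.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return (pattern, support) pairs for all frequent sequential patterns."""
    results = []

    def mine(prefix, projected):
        # Count each item at most once per projected suffix.
        counts = defaultdict(int)
        for seq_id, start in projected:
            for item in set(sequences[seq_id][start:]):
                counts[item] += 1
        for item in sorted(counts):
            support = counts[item]
            if support < min_support:
                continue
            pattern = prefix + [item]
            results.append((pattern, support))
            # Project: keep only what follows the first occurrence of item.
            new_projected = []
            for seq_id, start in projected:
                try:
                    pos = sequences[seq_id].index(item, start)
                except ValueError:
                    continue
                new_projected.append((seq_id, pos + 1))
            mine(pattern, new_projected)

    mine([], [(i, 0) for i in range(len(sequences))])
    return results
```

Because each recursive call scans only the projected suffixes, no candidate patterns are generated and then tested, which is the saving the abstract describes.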

1,975 citations


"Trajectory Data Mining: An Overview..." refers methods in this paper

  • ...After the transformation, we can mine the sequential patterns from these sequences by using existing sequential pattern mining algorithms, such as PrefixSpan [Pei et al. 2011] and CloseSpan [Yan et al. 2003], with time constraints....

    [...]

  • ...After the transformation, we can mine the sequential patterns from these sequences by using existing sequential pattern mining algorithms, such as PrefixSpan [80] and CloseSpan [112], with time constraints....

    [...]

Proceedings ArticleDOI
20 Apr 2009
TL;DR: This work first models multiple individuals' location histories with a tree-based hierarchical graph (TBHG), and then proposes a HITS (Hypertext Induced Topic Search)-based inference model, which regards an individual's access on a location as a directed link from the user to that location.
Abstract: The increasing availability of GPS-enabled devices is changing the way people interact with the Web, and brings us a large amount of GPS trajectories representing people's location histories. In this paper, based on multiple users' GPS trajectories, we aim to mine interesting locations and classical travel sequences in a given geospatial region. Here, interesting locations mean the culturally important places, such as Tiananmen Square in Beijing, and frequented public areas, like shopping malls and restaurants, etc. Such information can help users understand surrounding locations, and would enable travel recommendation. In this work, we first model multiple individuals' location histories with a tree-based hierarchical graph (TBHG). Second, based on the TBHG, we propose a HITS (Hypertext Induced Topic Search)-based inference model, which regards an individual's access on a location as a directed link from the user to that location. This model infers the interest of a location by taking into account the following three factors. 1) The interest of a location depends on not only the number of users visiting this location but also these users' travel experiences. 2) Users' travel experiences and location interests have a mutual reinforcement relationship. 3) The interest of a location and the travel experience of a user are relative values and are region-related. Third, we mine the classical travel sequences among locations considering the interests of these locations and users' travel experiences. We evaluated our system using a large GPS dataset collected by 107 users over a period of one year in the real world. As a result, our HITS-based inference model outperformed baseline approaches like rank-by-count and rank-by-frequency. Meanwhile, when considering the users' travel experiences and location interests, we achieved a better performance beyond baselines, such as rank-by-count and rank-by-interest, etc.
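
The mutual reinforcement between users' travel experiences and location interests can be sketched as a HITS-style power iteration, with users acting as hubs and locations as authorities. This is a simplified illustration: it ignores the TBHG hierarchy and the region-related aspect, and it weights links by visit count, which is an assumption of this sketch. The function names and the input layout are hypothetical.

```python
import math


def _normalize(scores):
    """Scale a score dictionary to unit Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {k: v / norm for k, v in scores.items()} if norm > 0 else dict(scores)


def hits_inference(visits, iterations=50):
    """Estimate user experience and location interest.

    `visits` maps (user, location) to a visit count; each entry is treated
    as a directed link from the user to the location.
    """
    users = sorted({u for u, _ in visits})
    locations = sorted({loc for _, loc in visits})
    experience = {u: 1.0 for u in users}
    interest = {loc: 1.0 for loc in locations}
    for _ in range(iterations):
        # A location is interesting if experienced users visit it.
        new_interest = {loc: 0.0 for loc in locations}
        for (u, loc), n in visits.items():
            new_interest[loc] += n * experience[u]
        # A user is experienced if they visit interesting locations.
        new_experience = {u: 0.0 for u in users}
        for (u, loc), n in visits.items():
            new_experience[u] += n * new_interest[loc]
        interest = _normalize(new_interest)
        experience = _normalize(new_experience)
    return experience, interest
```

Unlike rank-by-count, a location visited by a few highly experienced users can outrank one visited by many inexperienced users.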

1,903 citations


"Trajectory Data Mining: An Overview..." refers background or methods in this paper

  • ...The noise filtering method, which has been used in T-Drive [Yuan et al. 2010a, 2011a, 2013a] and GeoLife [Zheng et al. 2009a; Zheng et al. 2010] projects, first calculates the travel speed of each point in a trajectory based on the time interval and distance between a point and its successor (we…...

    [...]

  • ...The dataset has been used to estimate the similarity between users [Li et al. 2008], which enables friend and location recommendations [Zheng and Xie 2011b; Zheng et al. 2009c]....

    [...]

  • ...The dataset has been used to estimate the similarity between users [54], which enables friend and location recommendations [154][155]....

    [...]

  • ...[155][154] transform users’ GPS trajectory into a user-location matrix, where a row stands for a user and a column denotes a location (such as a cluster shown in Figure 21)....

    [...]

  • ...2011; Zheng et al. 2012b] and travel recommendation [Zheng and Xie 2011b; Zheng et al. 2011c; Zheng et al. 2009b]....

    [...]
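
One of the excerpts above describes a user-location matrix in which a row stands for a user and a column denotes a location. A minimal sketch of building such a matrix follows. It assumes the trajectories have already been reduced to (user, location id) visit pairs, for example after clustering stay points, and that each entry stores a visit count. Both the input layout and the choice of visit counts as entry values are assumptions made here.

```python
def user_location_matrix(visits):
    """Build a dense user-by-location visit-count matrix.

    `visits` is an iterable of (user, location_id) pairs, one per visit.
    Returns (users, locations, matrix), where matrix[i][j] is the number of
    visits of users[i] to locations[j].
    """
    visits = list(visits)
    users = sorted({u for u, _ in visits})
    locations = sorted({loc for _, loc in visits})
    row = {u: i for i, u in enumerate(users)}
    col = {loc: j for j, loc in enumerate(locations)}
    matrix = [[0] * len(locations) for _ in users]
    for u, loc in visits:
        matrix[row[u]][col[loc]] += 1
    return users, locations, matrix
```

Real user-location matrices are typically very sparse, so a sparse representation would usually replace the nested lists used here.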