An Improved DBSCAN Algorithm to Detect Stops in Individual Trajectories

Abstract
With the increasing use of mobile GPS (global positioning system) devices, a large volume of trajectory data on users can be produced. In most existing work, trajectories are usually divided into a set of stops and moves. In trajectories, stops represent the most important and meaningful part of the trajectory; there are many data mining methods to extract these locations. DBSCAN (density-based spatial clustering of applications with noise) is a classical density-based algorithm used to find the high-density areas in space, and different derivative methods of this algorithm have been proposed to find the stops in trajectories. However, most of these methods required a manually-set threshold, such as the speed threshold, for each feature variable. In our research, we first defined our new concept of move ability. Second, by introducing the theory of data fields and by taking our new concept of move ability into consideration, we constructed a new, comprehensive, hybrid feature–based, density measurement method which considers temporal and spatial properties. Finally, an improved DBSCAN algorithm was proposed using our new density measurement method. In the Experimental Section, the effectiveness and efficiency of our method is validated against real datasets. When comparing our algorithm with the classical density-based clustering algorithms, our experimental results show the efficiency of the proposed method.


International Journal of
Geo-Information
Article
Ting Luo 1,2, Xinwei Zheng 1, Guangluan Xu 1, Kun Fu 1,* and Wenjuan Ren 1
1 Key Laboratory of Spatial Information Processing and Application System Technology, Institute of
Electronics, Chinese Academy of Sciences, Beijing 100190, China; 15110288226@163.com (T.L.);
zxw_1020@163.com (X.Z.); gluanxu@mail.ie.ac.cn (G.X.); renandliang@sina.com (W.R.)
2 School of Electronic, Electrical and Communication Engineering,
University of Chinese Academy of Sciences, Beijing 100190, China
* Correspondence: kunfuiecas@gmail.com; Tel.: +86-10-5888-7208 (ext. 8931)
Academic Editor: Wolfgang Kainz
Received: 18 December 2016; Accepted: 21 February 2017; Published: 25 February 2017
Keywords: trajectory data; stops and moves; improved DBSCAN algorithm; temporal and
spatial properties
1. Introduction
In recent years, miniaturized GPS (global positioning system) devices have become more widely
used in daily life and large amounts of target trajectory data can be easily recorded. For instance,
people’s daily activity trajectories can be recorded by car GPS equipment and GPS-enabled mobile
phones. A common trajectory of a person’s daily life is illustrated in Figure 1. Useful information
can be extracted from these trajectories and they can be used to benefit daily life. As a result, many
location-based services, such as position-based recommender systems and destination prediction
systems, are receiving increasing attention from both users and developers. The primary concern
of location-based applications is how to understand the semantic meaning of a trajectory, rather than
treating the trajectory merely as a collection of recorded points. The work in Reference [1] proposed a
conceptual model to present trajectories with semantic annotations, allowing one to assign semantic
information, such as moves and stops, to specific parts of trajectories. Stops in trajectories represent
the trajectory segments corresponding to a person’s stay in certain locations. Moves correspond to the
trajectory segments created by the motion of a target between stop locations.
ISPRS Int. J. Geo-Inf. 2017, 6, 63; doi:10.3390/ijgi6030063 www.mdpi.com/journal/ijgi

Figure 1. An example of a trajectory.
Stop locations in a trajectory are an indispensable part of various applications, such as purpose
prediction services, navigation services, and generic or personalized recommendations. In this paper,
the problem of how to extract stop locations from trajectories is called stop detection. In the literature,
many models have been proposed to divide a trajectory into stop parts and move parts. Research on
stop detection can be divided into two categories: static methods and dynamic methods. Important
positions are defined in advance for static techniques [2,3], while no prior knowledge regarding stops
is given for a dynamic approach. Recently, several papers have studied the dynamic solution by
considering different aspects of mobility characteristics, such as velocity characteristics. Typically,
general clustering algorithms, which are able to cluster one’s stop locations by assigning different
constraints to different features, are adopted in the dynamic solution.
In general, most of the existing clustering methods used in stop detection have drawbacks.
First, the values of commonly used characteristics in these clustering methods, such as speed,
fluctuate strongly in real trajectories. We provide a qualitative analysis of the speed feature in
Section 3. Second, in most cases the algorithms must be given manually set parameters for
different features, which is a difficult task for users because of the fluctuations described above.
Finally, most of the clustering-based algorithms take the number of GPS points within a given
distance as a measurement of density. As a result, these methods ignore sequential features and their
results are strongly affected by the distance parameters. The problem becomes worse when multiple
features are considered together, since users have to specify a parameter for each feature.
In this paper, taking the aspects described above into consideration, we constructed a new,
comprehensive, hybrid feature–based density measurement method. In our method, we define a
new concept for move ability and apply data field theory, proposed in Reference [4], to measure the
density around a GPS point; the new concept of move ability is considered by giving density a move
ability–dependent weight. In our work, the density threshold is automatically determined when
calculating core points. After that, we use our density measurement method to improve the original
DBSCAN (density-based spatial clustering of applications with noise) algorithm.
The rest of our paper is organized as follows. Some common stop detection algorithms are
presented in Section 2. In Section 3, we give the definitions of some basic concepts, for example,
detailed definitions of GPS trajectory and stops. After describing our improved DBSCAN algorithm in
detail in Section 4, we validate our method with real datasets both in terms of feasibility and efficiency
by comparing it with four other algorithms in Section 5. We conclude our work in Section 6.
2. Related Works
In this section, we provide a survey of the clustering algorithms described or analyzed in the
literature. Various methods can be used to extract the stop locations in GPS trajectories. In general,
the approaches for stop detection can be summarized into two categories: static methods and dynamic
methods. In static techniques [2,3], important positions, such as gas stations, are defined in advance.
When extracting stops from trajectories, if a target enters a predefined region and the stay duration
exceeds the duration threshold, this previously defined region is regarded as a stop location in the
trajectory. The main drawback of static algorithms is that users need to specify their respective places
of interest. As a result, some interesting and personalized stop locations will not be found if they are
not provided by users beforehand.
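The static rule described above can be sketched as follows. This is only an illustration, not the method of References [2,3]: the function name `static_stops`, the circular region shape, the planar coordinates, and the sample values are all assumptions made for the example.

```python
import math

def static_stops(trajectory, regions, min_duration):
    """Return (region_name, t_enter, t_last_inside) for each sufficiently long stay.

    trajectory: list of (x, y, t) points ordered by time (planar coordinates assumed).
    regions: dict name -> (center_x, center_y, radius); hypothetical predefined places.
    min_duration: duration threshold, in the same unit as t.
    """
    stops = []
    for name, (cx, cy, radius) in regions.items():
        t_enter = None   # time of the first point of the current visit
        t_last = None    # time of the most recent point inside the region
        for x, y, t in trajectory:
            inside = math.hypot(x - cx, y - cy) <= radius
            if inside:
                if t_enter is None:
                    t_enter = t
                t_last = t
            elif t_enter is not None:
                # The target has left the region: keep the visit if it was long enough.
                if t_last - t_enter >= min_duration:
                    stops.append((name, t_enter, t_last))
                t_enter = None
        # Handle a visit that is still open at the end of the trajectory.
        if t_enter is not None and t_last - t_enter >= min_duration:
            stops.append((name, t_enter, t_last))
    return stops

trajectory = [(0, 0, 0), (0, 1, 60), (1, 0, 120), (50, 50, 180),
              (100, 100, 240), (100, 101, 250), (200, 200, 300)]
regions = {"gas_station": (0, 0, 5), "cafe": (100, 100, 5)}
stops = static_stops(trajectory, regions, min_duration=100)
```

Only the long stay at the predefined "gas_station" region is reported here. A place the user did not list in `regions` can never be returned, which is the drawback noted above.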
As for dynamic approaches, no prior knowledge regarding stops is given and personalized
stop locations can be discovered. Several studies have examined the dynamic solution by
considering different aspects of mobility characteristics [5–10]. Considering only the spatial
characteristics, several classical clustering algorithms have been used to extract stops from a trajectory.
A predictive model, based on automatically detected stop positions, is proposed in Reference [11],
and the authors adopted a variation of the traditional K-Means method in order to detect stop
locations. The main issue is the selection of the parameter K and the initial cluster centers,
which directly affect the final results. The DBSCAN [12] algorithm is used in Reference [13] to
extract significant locations. In Reference [14], a modified DBSCAN algorithm, DJ-Cluster (density and
join-based clustering algorithm), is proposed to detect personally meaningful places. These density-based
clustering algorithms can overcome many limitations of the K-Means approach [15]; however, they
only take spatial dimensions into consideration and the temporal sequential features are ignored.
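For orientation, a minimal sketch of classical DBSCAN on planar points is given below. It is a simplified textbook version with a brute-force neighbor search, not the implementation used in References [12–14]; the parameter names `eps` and `min_pts` and the sample data are assumptions for the example.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Brute-force Eps-neighborhood; it includes the point itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be relabeled as a border point
            continue
        cluster += 1                   # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Density here is just the number of points within `eps`, and the order in which the points were recorded plays no role. Two visits to the same place at different times would therefore fall into one cluster, which is the limitation noted above.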
In contrast to the algorithms described above, many studies have taken both the spatial and
temporal characteristics into consideration. Different derivatives of the DBSCAN method that
consider temporal sequential characteristics have been adopted by many researchers in order
to extract stop positions [5,6,14,16,17]. In Reference [5], an improved DBSCAN algorithm with gap
treatment was proposed to detect stop episodes in a trajectory. The CB-SMoT (clustering-based stops
and moves of trajectories) algorithm was proposed in Reference [6] to extract known and unknown
stops. CB-SMoT is a density-based clustering algorithm that considers both speed and spatial
features. Specifically, clusters are generated from trajectory sample points whose speed is lower
than the velocity threshold. In addition, one of the major parameters in Reference [6], namely Eps
(a given distance threshold within which points are regarded as neighbors), is obtained using a
quantile function. As described in Reference [16], the quantile function in Reference [6] does not
always yield an appropriate value for the parameter Eps, making it difficult to determine
an appropriate threshold. The method proposed in Reference [16] improves the
CB-SMoT algorithm by proposing an alternative way of calculating the Eps parameter, but the calculation
remains difficult because it depends on users to distinguish the low-speed part from the high-speed part.
Additionally, some clustering approaches have been proposed that assign different thresholds to
different characteristics [18–21]. In particular, information from satellites is introduced in the TDBC
(a spatio-temporal clustering method used to extract stop points from individual trajectory) algorithm [21].
Additionally, a time-based clustering algorithm was proposed in Reference [18], which requires both a
clustering distance threshold and a time threshold.
The methods mentioned above can achieve good performance in some situations; however,
they also have drawbacks. Most of them require appropriate threshold values to be assigned
to each parameter. When calculating the density of GPS points, most clustering-based
algorithms count the number of GPS points within a given distance, without considering
their sequential characteristics. In this paper, the density of GPS points is calculated using
the adjacent points along the trajectory, rather than all spatially nearby points. First, we define the new
concept of the move ability feature. To the best of the authors’ knowledge, this is the first time the move
ability feature has been proposed for stop detection. After that, by combining the theory of data fields,
proposed in Reference [4], with our new concept of move ability, we construct a new, comprehensive, hybrid
feature–based, density measurement method. In our method, the density threshold is automatically
determined when calculating core points. Finally, we use our density measurement method to improve
the original DBSCAN algorithm.

3. Basic Concepts
In this section, we give the definitions of GPS trajectory, stop, and move, based on the general
definitions in Reference [1]. These definitions will be used in the rest of this paper. They
are tailored to the particular application studied in this paper; for example, altitude is not
considered since altitude varies little within urban regions.
Definition 1. GPS Trajectory: A GPS trajectory is a list of GPS data points
$\{p_0 = (x_0, y_0, t_0), p_1 = (x_1, y_1, t_1), \ldots, p_n = (x_n, y_n, t_n)\}$, where $\forall i \in [1, n]$,
$p_i = (x_i, y_i, t_i)$ and $t_i < t_{i+1}$, and $x_i$, $y_i$ and $t_i$ represent the longitude, latitude, and timestamp, respectively.
Stops represent the significant places of a GPS trajectory where a target has spent at least a minimum
amount of time and where, as a result, the density of GPS points is higher. A move represents the part of
the trajectory between stops and has a lower density of GPS points. In Reference [1], Spaccapietra et al.
defined some of their characteristics.
Definition 2. Stop: A stop is a part of a trajectory with the following features: (i) the user has explicitly
defined this part of the trajectory to represent a stop; (ii) the temporal extent is a non-empty time interval;
(iii) the traveling object does not move as far as the application view of this trajectory is concerned; and (iv) all
stops in the same trajectory are temporally disjoint, i.e., the temporal extents of two stops are always disjoint.
Definition 3. Move: A move is a part of a trajectory, such that: (i) the part is delimited by two extremities that
represent either two consecutive stops, or $t_{begin}$ and the first stop, or the last stop and $t_{end}$, or $[t_{begin}, t_{end}]$ (the
case when a trajectory has no stops); (ii) the temporal extent $[t_{begin}, t_{end}]$ is a non-empty time interval; (iii) the
spatial range of a trajectory for the $[t_{begin}, t_{end}]$ interval is a spatio-temporal line (not a point) defined by the
trajectory, where $t_{begin}$ is the initial point of the trajectory and $t_{end}$ is the final one.
Definition 4. Distance: The distance between two points $\langle p_n, p_m \rangle$ is denoted by:
$$\mathrm{Dist}(p_n, p_m) = 2R \times \arcsin\sqrt{\sin^2\left(\frac{lat_m - lat_n}{2}\right) + \cos(lat_n) \times \cos(lat_m) \times \sin^2\left(\frac{lgt_m - lgt_n}{2}\right)} \qquad (1)$$
where $R$ represents the radius of the Earth ($R = 6371$ km), $lat_n$ and $lat_m$ represent the latitudes of $p_n$ and $p_m$,
respectively; similarly, $lgt_n$ and $lgt_m$ represent the longitudes.
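A minimal sketch of Equation (1) is shown below. It assumes that points are tuples (longitude, latitude, timestamp) as in Definition 1 and that the angles are stored in degrees; the function name `dist` and the clamping of rounding errors are choices made for this illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # R in Equation (1)

def dist(p_n, p_m):
    """Great-circle distance in km between two points (lon, lat, t), angles in degrees."""
    lgt_n, lat_n = math.radians(p_n[0]), math.radians(p_n[1])
    lgt_m, lat_m = math.radians(p_m[0]), math.radians(p_m[1])
    h = (math.sin((lat_m - lat_n) / 2) ** 2
         + math.cos(lat_n) * math.cos(lat_m) * math.sin((lgt_m - lgt_n) / 2) ** 2)
    # Clamp to 1.0 so floating-point rounding cannot push asin out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

# One degree of longitude along the equator.
one_degree_km = dist((0.0, 0.0, 0), (1.0, 0.0, 60))
```

With R = 6371 km, one degree of longitude on the equator comes out at about 111.19 km, which is a quick sanity check for the formula.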
Definition 5. Trajectory curve distance: The curve distance of a sub-trajectory segment $traj_{nm}$, which is
composed of a sequence of points $\{p_n, p_{n+1}, \ldots, p_m\}$, is denoted by:
$$\mathrm{TrajCurveDist}(traj_{nm}) = \sum_{k=n}^{m-1} \mathrm{Dist}(p_k, p_{k+1}) \qquad (2)$$
Definition 6. Trajectory direct distance: The direct distance of a sub-trajectory segment
$traj_{nm} = \{p_n, p_{n+1}, \ldots, p_m\}$ equals the distance between the first point and the last point in the sub-trajectory and
is denoted by:
$$\mathrm{TrajDirectDist}(traj_{nm}) = \mathrm{Dist}(p_n, p_m) \qquad (3)$$
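Equations (2) and (3) can be sketched as two small functions. To keep the example self-contained, a planar Euclidean distance stands in for Dist; this is an assumption of the sketch, and any distance function on two points (such as the one from Equation (1)) could be passed instead. The function names and the sample segment are illustrative.

```python
import math

def euclid(p, q):
    """Planar stand-in for Dist; points are (x, y, t) and t is ignored."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def traj_curve_dist(traj, dist=euclid):
    """Equation (2): sum of the distances between consecutive points."""
    return sum(dist(a, b) for a, b in zip(traj, traj[1:]))

def traj_direct_dist(traj, dist=euclid):
    """Equation (3): distance between the first and the last point."""
    return dist(traj[0], traj[-1])

segment = [(0.0, 0.0, 0), (3.0, 4.0, 10), (3.0, 0.0, 20)]
curve = traj_curve_dist(segment)    # 5 + 4 = 9
direct = traj_direct_dist(segment)  # 3
```

The detour through (3, 4) makes the curve distance three times the direct distance in this toy segment.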
In general, when a target stays at a stop region, the corresponding trajectory direct distance is far
less than the trajectory curve distance. On the contrary, the corresponding trajectory direct distance
would be close to the trajectory curve distance when the target moves between stop regions. Taking
this into consideration, we propose our new concept of move ability.

Definition 7. Move ability: The move ability of a sub-trajectory segment $traj_{nm} = \{p_n, p_{n+1}, \ldots, p_m\}$ is
denoted by:
$$\mathrm{MoveAbility}(traj_{nm}) = \frac{\mathrm{TrajDirectDist}(traj_{nm})}{\mathrm{TrajCurveDist}(traj_{nm})} \qquad (4)$$
Figure 2 illustrates the concept of move ability. In the figure, there are three sub-trajectories, each
of which contains six points. The coordinates of each point represent the longitude
and latitude in the real world. For simplicity, the Euclidean distance is used in this
illustration to calculate the move ability features. These three sub-trajectories represent real trajectories
corresponding to different situations: Figure 2a represents the activity at a stop; Figure 2b represents
the movement on curved roads; Figure 2c represents a linear motion in reality. Comparing the move
ability of each sub-trajectory in Figure 2, the results are consistent with our reasoning, described above.
Figure 2. Examples of move ability. (a) The activity at a stop; (b) movement on curved roads;
(c) linear motion.
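The three situations in Figure 2 can be imitated with a short sketch of Equation (4). The coordinates below are made up for illustration and are not the values plotted in the figure. As in the figure, the Euclidean distance is used. Returning 0.0 when the curve distance is zero is an assumption of this sketch, since Definition 7 does not cover that case.

```python
import math

def move_ability(traj):
    """Equation (4) with Euclidean distance on (x, y) points."""
    curve = sum(math.dist(a, b) for a, b in zip(traj, traj[1:]))
    if curve == 0:
        return 0.0  # assumption: a target that never moves gets move ability 0
    return math.dist(traj[0], traj[-1]) / curve

# Hypothetical six-point sub-trajectories, one per situation.
at_stop = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0), (1, 0)]   # wandering in place
curved = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)]    # road with a bend
straight = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]  # linear motion

ma_stop = move_ability(at_stop)       # 1 / 5
ma_curved = move_ability(curved)      # sqrt(13) / 5
ma_straight = move_ability(straight)  # 5 / 5
```

All three sub-trajectories cover the same curve distance of 5, yet the move ability rises from 0.2 at the stop to about 0.72 on the bend and 1.0 on the straight line.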
Furthermore, we find that our new concept of move ability is more suitable than velocity for
distinguishing move and stop episodes. We made a qualitative comparison between the velocity feature
and move ability. Taking a real track as an example, the velocity curve after Gaussian smoothing is shown
in Figure 3a. The velocity curve shows that the speed of moving objects can vary dramatically and
there are many short, slow-speed segments within high-speed parts, which may be caused by short
decelerations in motion. By comparison, the move ability curve is more stable and more discriminative. The
smoothed move ability curve, using the same Gaussian kernel, is shown in Figure 3b. In particular, a low
value for move ability is only obtained when the target keeps moving around within a certain region,
which is likely to be a stop region. In addition, even a low-speed sub-trajectory may achieve a high
move ability, for example, when a target moves in an approximately linear fashion at a low speed.
This property can help to remove some fake stops, such as short-duration traffic jams.
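The difference between the two features can be made concrete with a toy comparison. The sketch below builds two hypothetical sub-trajectories with the same average speed, one creeping forward in a straight line (as in a traffic jam) and one wandering around a fixed place. The planar coordinates in metres, the timestamps in seconds, and the helper names are assumptions for the example. It does not reproduce the Gaussian smoothing or the real track behind Figure 3.

```python
import math

def curve_dist(traj):
    """Path length over consecutive (x, y, t) points, using planar distance."""
    return sum(math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(traj, traj[1:]))

def mean_speed(traj):
    """Average speed: path length divided by elapsed time."""
    return curve_dist(traj) / (traj[-1][2] - traj[0][2])

def move_ability(traj):
    """Direct distance divided by curve distance, as in Definition 7."""
    direct = math.hypot(traj[-1][0] - traj[0][0], traj[-1][1] - traj[0][1])
    return direct / curve_dist(traj)

# Traffic jam: 1 m forward every 10 s along a straight road.
jam = [(float(i), 0.0, 10.0 * i) for i in range(6)]
# Dwell: the same step length and timing, but circling around one place.
dwell = [(0.0, 0.0, 0.0), (1.0, 0.0, 10.0), (1.0, 1.0, 20.0),
         (0.0, 1.0, 30.0), (0.0, 0.0, 40.0), (1.0, 0.0, 50.0)]

speeds = (mean_speed(jam), mean_speed(dwell))
abilities = (move_ability(jam), move_ability(dwell))
```

Both segments average 0.1 m/s, so a speed threshold alone cannot separate them, whereas the move ability is 1.0 for the jam and 0.2 for the dwell.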

References
Proceedings Article

A density-based algorithm for discovering clusters in large spatial databases with noise

TL;DR: In this paper, a density-based notion of clusters is proposed to discover clusters of arbitrary shape, which can be used for class identification in large spatial databases and is shown to be more efficient than the well-known algorithm CLARANS.
Proceedings Article

A density-based algorithm for discovering clusters in large spatial Databases with Noise

TL;DR: DBSCAN, a new clustering algorithm relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape, is presented which requires only one input parameter and supports the user in determining an appropriate value for it.
Proceedings ArticleDOI

Mining interesting locations and travel sequences from GPS trajectories

TL;DR: This work first model multiple individuals' location histories with a tree-based hierarchical graph (TBHG), and proposes a HITS (Hypertext Induced Topic Search)-based inference model, which regards an individual's access on a location as a directed link from the user to that location.
Journal ArticleDOI

Using GPS to learn significant locations and predict movement across multiple users

TL;DR: This work presents a system that automatically clusters GPS data taken over an extended period of time into meaningful locations at multiple scales and incorporates these locations into a Markov model that can be consulted for use with a variety of applications in both single-user and collaborative scenarios.
Book

Artificial Intelligence with Uncertainty

TL;DR: This book develops a framework that shows how uncertainty in AI expands and generalizes traditional AI, and describes the cloud model, its uncertainties of randomness and fuzziness, and the correlation between them.