An Improved DBSCAN Algorithm to Detect Stops in Individual Trajectories

doi:10.3390/IJGI6030063

International Journal of Geo-Information

Article


Ting Luo 1,2, Xinwei Zheng 1, Guangluan Xu 1, Kun Fu 1,* and Wenjuan Ren 1

1 Key Laboratory of Spatial Information Processing and Application System Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China; 15110288226@163.com (T.L.); zxw_1020@163.com (X.Z.); gluanxu@mail.ie.ac.cn (G.X.); renandliang@sina.com (W.R.)

2 School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100190, China

* Correspondence: kunfuiecas@gmail.com; Tel.: +86-10-5888-7208 (ext. 8931)

Academic Editor: Wolfgang Kainz

Received: 18 December 2016; Accepted: 21 February 2017; Published: 25 February 2017

Abstract: With the increasing use of mobile GPS (global positioning system) devices, a large volume of trajectory data on users can be produced. In most existing work, trajectories are usually divided into a set of stops and moves. Stops represent the most important and meaningful part of a trajectory, and many data mining methods have been developed to extract these locations. DBSCAN (density-based spatial clustering of applications with noise) is a classical density-based algorithm used to find high-density areas in space, and different derivative methods of this algorithm have been proposed to find the stops in trajectories. However, most of these methods require a manually set threshold, such as a speed threshold, for each feature variable. In our research, we first define a new concept, move ability. Second, by introducing the theory of data fields and taking move ability into consideration, we construct a new, comprehensive, hybrid feature-based density measurement method that considers temporal and spatial properties. Finally, an improved DBSCAN algorithm is proposed using this density measurement method. In the experimental section, the effectiveness and efficiency of our method are validated against real datasets. Compared with classical density-based clustering algorithms, our experimental results show the efficiency of the proposed method.

Keywords: trajectory data; stops and moves; improved DBSCAN algorithm; temporal and spatial properties

1. Introduction

In recent years, miniaturized GPS (global positioning system) devices have become more widely used in daily life, and large amounts of target trajectory data can be easily recorded. For instance, people’s daily activity trajectories can be recorded by car GPS equipment and GPS-enabled mobile phones. A common trajectory of a person’s daily life is illustrated in Figure 1. Useful information can be extracted from these trajectories and used to benefit daily life. As a result, many location-based services, such as position-based recommender systems and destination prediction systems, are receiving increasing attention from both users and developers. The primary concern of location-based applications is how to understand the semantic meaning of a trajectory, rather than treating it as a mere sequence of recorded points. The work in Reference [1] proposed a conceptual model to present trajectories with semantic annotations, allowing one to assign semantic information, such as moves and stops, to specific parts of trajectories. Stops in trajectories represent the trajectory segments corresponding to a person’s stay in certain locations. Moves correspond to the trajectory segments created by the motion of a target between stop locations.

ISPRS Int. J. Geo-Inf. 2017, 6, 63; doi:10.3390/ijgi6030063 www.mdpi.com/journal/ijgi


Figure 1. An example of a trajectory.

Stop locations in a trajectory are an indispensable part of various applications, such as purpose prediction services, navigation services, and generic or personalized recommendations. In this paper, the problem of how to extract stop locations from trajectories is called stop detection. In the literature, many models have been proposed to divide a trajectory into stop parts and move parts. Research on stop detection can be divided into two categories: static methods and dynamic methods. Static techniques define important positions in advance [2,3], while dynamic approaches are given no prior knowledge regarding stops. Recently, several papers have studied the dynamic solution by considering different aspects of mobility characteristics, such as velocity. Typically, the dynamic solution adopts general clustering algorithms, which cluster a person’s stop locations by assigning different constraints to different features.

In general, most of the existing clustering methods used in stop detection have drawbacks. First, the values of commonly used features in these clustering methods, such as speed, fluctuate strongly in real trajectories. We provide a qualitative analysis of the speed feature in Section 3. This fluctuation leads to the second drawback: in most cases, the algorithms require manually set parameters for different features, which are difficult for users to choose. Finally, most clustering-based algorithms take the number of GPS points within a given distance as the measure of density. As a result, these methods ignore sequential features, and their results are strongly affected by the distance parameter. The problem worsens when multiple features are considered together, since users have to specify a parameter for each feature.

In this paper, taking the aspects described above into consideration, we construct a new, comprehensive, hybrid feature-based density measurement method. In our method, we define a new concept, move ability, and apply data field theory, proposed in Reference [4], to measure the density around a GPS point; move ability is taken into account by giving the density a move ability-dependent weight. In our work, the density threshold is automatically determined when calculating core points. We then use our density measurement method to improve the original DBSCAN (density-based spatial clustering of applications with noise) algorithm.

The rest of our paper is organized as follows. Some common stop detection algorithms are presented in Section 2. In Section 3, we give the definitions of some basic concepts, for example, detailed definitions of GPS trajectory and stops. After describing our improved DBSCAN algorithm in detail in Section 4, we validate our method with real datasets, both in terms of feasibility and efficiency, by comparing it with four other algorithms in Section 5. We conclude our work in Section 6.

2. Related Works

In this section, we provide a survey of the clustering algorithms described or analyzed in the literature. Various methods can be used to extract the stop locations in GPS trajectories. In general, the approaches for stop detection can be summarized into two categories: static methods and dynamic methods. In static techniques [2,3], important positions, such as gas stations, are defined in advance. When extracting stops from trajectories, if a target enters a predefined region and the stay duration exceeds the duration threshold, this previously defined region is regarded as a stop location in the trajectory. The main drawback of static algorithms is that users need to specify their respective places of interest. As a result, some interesting and personalized stop locations will not be found if they are not provided by users beforehand.
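The static rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure of References [2,3]: regions are modelled as circles in a planar coordinate system, and the names `static_stops`, `regions`, and `min_duration` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]        # (x, y, t); planar metres and seconds (assumed units)
Region = Tuple[str, float, float, float]  # (name, centre_x, centre_y, radius); circular by assumption


def static_stops(traj: List[Point], regions: List[Region], min_duration: float) -> List[str]:
    """Return the predefined regions in which one continuous stay exceeds min_duration."""
    found = []
    for name, cx, cy, radius in regions:
        entered = None  # time the current visit started, or None while outside
        longest = 0.0   # longest continuous stay observed in this region
        for x, y, t in traj:
            if math.hypot(x - cx, y - cy) <= radius:
                if entered is None:
                    entered = t
                longest = max(longest, t - entered)
            else:
                entered = None
        if longest > min_duration:
            found.append(name)
    return found


# Hypothetical data: a long stay near the first region, a brief pass through the second.
regions = [("gas_station", 0.0, 0.0, 10.0), ("mall", 100.0, 0.0, 10.0)]
traj = [(0, 0, 0), (2, 1, 60), (1, 2, 120), (50, 0, 150),
        (100, 0, 180), (101, 0, 200), (150, 0, 230)]
stops = static_stops(traj, regions, min_duration=100)
```

Any place that is missing from `regions` can never be reported, which is the limitation of static methods noted in the text.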

As for dynamic approaches, no prior knowledge regarding stops is given and personalized stop locations can be discovered. Several studies have investigated the dynamic solution by considering different aspects of mobility characteristics [5–10]. Several classical clustering algorithms that consider only spatial characteristics have been used to extract stops from a trajectory. A predictive model, based on automatically detected stop positions, is proposed in Reference [11], where the authors adopted a variation of the traditional K-Means method in order to detect stop locations. The main issue is the selection of the parameter K and the initial cluster centers, which directly affects the final results. The DBSCAN [12] algorithm is used in Reference [13] to extract significant locations. In Reference [14], a modified DBSCAN algorithm, DJ-Cluster (density and join-based clustering algorithm), is proposed to detect personally meaningful places. These density-based clustering algorithms can overcome many limitations of the K-Means approach [15]; however, they only take spatial dimensions into consideration and ignore temporal sequential features.
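To make the last point concrete, the sketch below computes the count-based density that classical DBSCAN relies on, where a point is a core point if at least `min_pts` points (itself included) lie within distance `eps`. It is a minimal illustration on planar coordinates, not the implementation used in References [12–14]; the function names and sample data are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def eps_neighbour_counts(points: List[Point], eps: float) -> List[int]:
    """For each point, count the points (itself included) within distance eps."""
    return [sum(1 for q in points if math.dist(p, q) <= eps) for p in points]


def core_points(points: List[Point], eps: float, min_pts: int) -> List[Point]:
    """Return the points whose eps-neighbourhood holds at least min_pts points."""
    counts = eps_neighbour_counts(points, eps)
    return [p for p, c in zip(points, counts) if c >= min_pts]


# Hypothetical samples: three nearby points and one isolated point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
counts = eps_neighbour_counts(points, eps=1.5)
cores = core_points(points, eps=1.5, min_pts=3)

# Visiting the same locations in the opposite order yields the same core points:
# the measure depends only on where the samples are, not on when they were recorded.
cores_reversed = core_points(points[::-1], eps=1.5, min_pts=3)
```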

In contrast to the algorithms described above, many studies have taken both spatial and temporal characteristics into consideration. Different derivative methods of DBSCAN that consider temporal sequential characteristics have been adopted by many researchers in order to extract stop positions [5,6,14,16,17]. In Reference [5], an improved DBSCAN algorithm with gap treatment was proposed to detect stop episodes in a trajectory. The CB-SMoT (clustering-based stops and moves of trajectories) algorithm was proposed in Reference [6] to extract known and unknown stops. CB-SMoT is a density-based clustering algorithm that considers both speed and spatial features. Specifically, clusters are formed from trajectory sample points whose speed is below a velocity threshold. In addition, one of the major parameters in Reference [6], namely Eps (a distance threshold within which points are regarded as neighbors), is obtained using a quantile function. As described in Reference [16], the quantile function in Reference [6] does not always estimate an appropriate value for Eps, making it difficult to determine an appropriate threshold for the parameter. The method proposed in Reference [16] improves the CB-SMoT algorithm with an alternative way of calculating Eps, but the calculation remains difficult because it depends on users to distinguish the low-speed part from the high-speed part. Additionally, some clustering approaches assign different thresholds to different characteristics [18–21]. In particular, satellite information is introduced in the TDBC (a spatio-temporal clustering method used to extract stop points from individual trajectories) algorithm [21]. A time-based clustering algorithm was proposed in Reference [18], which requires both a clustering distance threshold and a time threshold.

The methods mentioned above can obtain a desirable performance in some situations; however, they also have drawbacks. Most of them need an appropriate threshold value to be assigned for each parameter. When calculating the density of GPS points, most clustering-based algorithms use the number of GPS points within a given distance, without considering their sequential characteristics. In this paper, the density of a GPS point is calculated from its adjacent points along the trajectory rather than from all spatially nearby points. First, we define the new concept of the move ability feature. To the best of the authors’ knowledge, this is the first use of a move ability feature in stop detection. After that, by combining the theory of the data field, proposed in Reference [4], with move ability, we construct a new, comprehensive, hybrid feature-based density measurement method. In our method, the density threshold is automatically determined when calculating core points. Finally, we use our density measurement method to improve the original DBSCAN algorithm.


3. Basic Concepts

In this section, we give the definitions of GPS trajectory, stop, and move, based on the general definitions in Reference [1]. These definitions are used in the rest of this paper and are tailored to the particular application studied here; for example, altitude is not considered, since altitude varies little within urban regions.

Definition 1. GPS Trajectory: A GPS trajectory is a list of GPS data points $\{p_0 = (x_0, y_0, t_0), p_1 = (x_1, y_1, t_1), \ldots, p_n = (x_n, y_n, t_n)\}$, where $\forall i \in [1, n]$, $p_i = (x_i, y_i, t_i)$ and $t_i < t_{i+1}$, and $x_i$, $y_i$ and $t_i$ represent the longitude, latitude, and timestamp, respectively.
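As an illustration, a trajectory in the sense of Definition 1 could be represented as follows. The class name `GpsPoint`, the helper `is_valid_trajectory`, and the choice of seconds for the timestamp are assumptions made for this sketch only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GpsPoint:
    """One sample p_i = (x_i, y_i, t_i) of Definition 1."""
    lon: float  # x_i, longitude in degrees
    lat: float  # y_i, latitude in degrees
    t: float    # t_i, timestamp (seconds here; the unit is an assumption)


def is_valid_trajectory(points: List[GpsPoint]) -> bool:
    """Check that timestamps are strictly increasing, as Definition 1 requires."""
    return all(a.t < b.t for a, b in zip(points, points[1:]))


# Hypothetical two-point trajectory.
traj = [GpsPoint(116.30, 39.98, 0.0), GpsPoint(116.31, 39.98, 5.0)]
```

Keeping the points in timestamp order matters for the later definitions, which all operate on consecutive points of a sub-trajectory.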

Stops represent the significant places of a GPS trajectory where a target has spent at least a minimal amount of time and where, consequently, the density of GPS points is higher. A move represents the trajectory between stops and has a lower density of GPS points. In Reference [1], Spaccapietra defined some of their characteristics.

Definition 2. Stop: A stop is a part of a trajectory with the following features: (i) the user has explicitly defined this part of the trajectory to represent a stop; (ii) the temporal extent is a non-empty time interval; (iii) the traveling object does not move, as far as the application view of this trajectory is concerned; and (iv) all stops in the same trajectory are temporally disjoint, i.e., the temporal extents of two stops are always disjoint.

Definition 3. Move: A move is a part of a trajectory, such that: (i) the part is delimited by two extremities that represent either two consecutive stops, or $t_{begin}$ and the first stop, or the last stop and $t_{end}$, or $[t_{begin}, t_{end}]$ (the case when a trajectory has no stops); (ii) the temporal extent $[t_{begin}, t_{end}]$ is a non-empty time interval; (iii) the spatial range of a trajectory for the $[t_{begin}, t_{end}]$ interval is a spatio-temporal line (not a point) defined by the trajectory, where $t_{begin}$ is the initial point of the trajectory and $t_{end}$ is the final one.

Definition 4. Distance: The distance between two points $\langle p_n, p_m \rangle$ is denoted by:

$$\mathrm{Dist}(p_n, p_m) = 2R \times \arcsin\sqrt{\sin^2\left(\frac{lat_m - lat_n}{2}\right) + \cos(lat_n) \times \cos(lat_m) \times \sin^2\left(\frac{lgt_m - lgt_n}{2}\right)} \tag{1}$$

where $R$ represents the radius of the Earth ($R = 6371$ km), $lat_n$ and $lat_m$ represent the latitudes of $p_n$ and $p_m$, respectively; similarly, $lgt_n$ and $lgt_m$ represent the longitudes.
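Equation (1) is the haversine formula, and a direct transcription might look like the sketch below. It assumes that coordinates are supplied in degrees and converted to radians before the trigonometric functions are applied; the clamp to 1.0 is a numerical safeguard added here and is not part of the definition.

```python
import math

EARTH_RADIUS_KM = 6371.0  # R in Equation (1)


def dist(lat_n: float, lgt_n: float, lat_m: float, lgt_m: float) -> float:
    """Great-circle distance in km between p_n and p_m, with inputs in degrees."""
    phi_n, phi_m = math.radians(lat_n), math.radians(lat_m)
    d_phi = phi_m - phi_n
    d_lam = math.radians(lgt_m - lgt_n)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi_n) * math.cos(phi_m) * math.sin(d_lam / 2) ** 2)
    # Clamp guards against rounding pushing h slightly above 1 (assumption, not in Eq. 1).
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


# One degree of latitude corresponds to R * pi / 180 km.
one_degree_km = dist(0.0, 0.0, 1.0, 0.0)
```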

Definition 5. Trajectory curve distance: The curve distance of a sub-trajectory segment $traj_{nm}$, which is composed of a sequence of points $\{p_n, p_{n+1}, \ldots, p_m\}$, is denoted by:

$$\mathrm{TrajCurveDist}(traj_{nm}) = \sum_{k=n}^{m-1} \mathrm{Dist}(p_k, p_{k+1}) \tag{2}$$

Definition 6. Trajectory direct distance: The direct distance of a sub-trajectory segment $traj_{nm} = \{p_n, p_{n+1}, \ldots, p_m\}$ equals the distance between the first point and the last point in the sub-trajectory and is denoted by:

$$\mathrm{TrajDirectDist}(traj_{nm}) = \mathrm{Dist}(p_n, p_m) \tag{3}$$

In general, when a target stays at a stop region, the corresponding trajectory direct distance is far less than the trajectory curve distance. On the contrary, the corresponding trajectory direct distance would be close to the trajectory curve distance when the target moves between stop regions. Taking this into consideration, we propose our new concept of move ability.


Definition 7. Move ability: The move ability of a sub-trajectory segment $traj_{nm} = \{p_n, p_{n+1}, \ldots, p_m\}$ is denoted by:

$$\mathrm{MoveAbility}(traj_{nm}) = \frac{\mathrm{TrajDirectDist}(traj_{nm})}{\mathrm{TrajCurveDist}(traj_{nm})} \tag{4}$$

Figure 2 illustrates the concept of move ability. In the figure, there are three sub-trajectories, each of which contains six points. The coordinates of each point represent longitude and latitude in the real world. For simplicity, the Euclidean distance is used in this illustration to calculate the move ability features. These three sub-trajectories represent real trajectories corresponding to different situations: Figure 2a represents the activity at a stop; Figure 2b represents the movement on curved roads; Figure 2c represents a linear motion. Comparing the move ability of each sub-trajectory in Figure 2, the results are consistent with our reasoning, described above.
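Definitions 5 to 7 can be sketched as below. Following the illustration above, the Euclidean distance on planar coordinates is used instead of Equation (1). The sample sub-trajectories are hypothetical and are not the ones drawn in Figure 2, and returning 0 when the curve distance is zero is an assumption, since Equation (4) is undefined in that case.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]
DistFn = Callable[[Point, Point], float]


def euclidean(p: Point, q: Point) -> float:
    """Planar stand-in for Dist; swap in a haversine function for real GPS data."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def traj_curve_dist(traj: Sequence[Point], dist: DistFn = euclidean) -> float:
    """Equation (2): sum of distances between consecutive points."""
    return sum(dist(a, b) for a, b in zip(traj, traj[1:]))


def traj_direct_dist(traj: Sequence[Point], dist: DistFn = euclidean) -> float:
    """Equation (3): distance between the first and the last point."""
    return dist(traj[0], traj[-1])


def move_ability(traj: Sequence[Point], dist: DistFn = euclidean) -> float:
    """Equation (4): direct distance divided by curve distance."""
    curve = traj_curve_dist(traj, dist)
    if curve == 0.0:
        return 0.0  # assumption: a target that never moved is treated like a stop
    return traj_direct_dist(traj, dist) / curve


# Hypothetical sub-trajectories.
loop = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]  # wandering and returning
zigzag = [(0, 0), (1, 1), (2, 0)]                # winding path
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]      # linear motion
```

With these inputs the move ability grows from 0 for the closed loop to 1 for the straight segment, matching the ordering argued for in the text.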

Figure 2. Examples of move ability. (a) The activity at a stop; (b) movement on curved roads; (c) linear motion.

Furthermore, we find that move ability is more suitable than velocity for distinguishing move and stop episodes. We made a qualitative comparison between the velocity feature and move ability. Taking a real track as an example, the velocity curve after Gaussian smoothing is shown in Figure 3a. The velocity curve shows that the speed of moving objects can vary dramatically, and there are many short, slow-speed segments within high-speed parts, which may be caused by brief decelerations. By comparison, the move ability curve is more stable and more discriminative. The smoothed move ability curve, using the same Gaussian kernel, is shown in Figure 3b. In particular, a low value of move ability is obtained only when the target keeps moving around within a certain region, which is likely to be a stop region. In addition, even a low-speed sub-trajectory may have a high move ability, for example, when a target moves approximately linearly at low speed. This can help to remove some false stops, such as short-duration traffic jams.
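The contrast between the two features can be reproduced on a toy example. The sketch below uses hypothetical planar coordinates in metres and a fixed 10 s sampling interval, neither of which comes from the paper's datasets: a slow crawl along a straight road and a walk around a closed loop have the same mean speed but opposite move ability.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def path_length(xy: Sequence[Point]) -> float:
    """Sum of Euclidean distances between consecutive samples."""
    return sum(math.dist(a, b) for a, b in zip(xy, xy[1:]))


def mean_speed(xy: Sequence[Point], ts: Sequence[float]) -> float:
    """Path length divided by elapsed time."""
    return path_length(xy) / (ts[-1] - ts[0])


def move_ability(xy: Sequence[Point]) -> float:
    """Direct distance over curve distance; 0 if nothing moved (assumption)."""
    curve = path_length(xy)
    return math.dist(xy[0], xy[-1]) / curve if curve > 0 else 0.0


ts: List[float] = [0, 10, 20, 30, 40]                       # seconds
traffic_jam = [(0, 0), (5, 0), (10, 0), (15, 0), (20, 0)]   # slow but straight
wandering = [(0, 0), (5, 0), (5, 5), (0, 5), (0, 0)]        # same pace, closed loop

jam_speed, jam_ability = mean_speed(traffic_jam, ts), move_ability(traffic_jam)
stop_speed, stop_ability = mean_speed(wandering, ts), move_ability(wandering)
```

A speed threshold alone cannot separate these two segments, whereas move ability assigns the traffic jam the value of a move and the loop the value of a stop.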


