Symbolic representation and retrieval of moving object trajectories

TLDR
This paper proposes a novel representation of trajectories, called movement pattern strings, which convert the trajectories into symbolic representations, and defines a modified frequency distance for frequency vectors obtained from movement pattern strings to reduce the dimensionality and the computation cost.
Abstract
Searching moving object trajectories in video databases has been applied to many fields, such as video data analysis, content-based video retrieval, and video scene classification. In this paper, we propose a novel representation of trajectories, called movement pattern strings, which convert the trajectories into symbolic representations. Movement pattern strings encode both the movement direction and the movement distance information of the trajectories. The distances that are computed in a symbolic space are lower bounds of the distances of original trajectory data, which guarantees that no false dismissals will be introduced using movement pattern strings to retrieve trajectories. In order to improve the retrieval efficiency, we define a modified frequency distance for frequency vectors that are obtained from movement pattern strings to reduce the dimensionality and the computation cost. The experimental results show that using movement pattern strings is almost as effective as using raw trajectories. In addition, the cost of retrieving similar trajectories can be greatly reduced when the modified frequency distance is used as a filter.

Symbolic Representation and Retrieval of
Moving Object Trajectories
Lei Chen, M. Tamer Özsu
University of Waterloo
School of Computer Science
Waterloo, Canada
{l6chen,tozsu}@uwaterloo.ca
Vincent Oria
New Jersey Inst. of Technology
Dept. of Computer Science
Newark, New Jersey, USA
{vincent.oria@njit.edu}
Technical Report CS-2003-30 Sept 2003

Abstract
Similarity-based retrieval of moving object trajectories is useful in
many applications, such as GPS systems and sports and surveillance
video analysis. However, due to sensor failures, errors in detection
techniques, or different sampling rates, noise, local shifts, and
scaling may appear in the trajectory records. Hence, it is difficult to
design a robust and fast similarity measure for similarity-based
retrieval in a large database.
In this paper, normalized edit distance (NED) is proposed to measure
the similarity between two trajectories. We evaluate the efficacy of
NED and compare it with those of Euclidean distance, Dynamic Time
Warping (DTW), and Longest Common Subsequences (LCSS), show-
ing that NED is more robust and accurate for trajectories that contain
noise and local time shifting. Furthermore, in order to improve the
retrieval efficiency, we propose a novel representation of trajectories,
called movement pattern strings, which convert the trajectories into a
symbolic representation. Movement pattern strings encode both the
movement direction and the movement distance information of the
trajectories. The distances that are computed in a symbolic space
are lower bounds of the distances of original trajectory data, which
guarantees that no false dismissals will be introduced using movement
pattern strings to retrieve trajectories. Finally, we define a modified
frequency distance for frequency vectors that are obtained from move-
ment pattern strings to reduce the dimensionality of movement pattern
strings and computation cost of NED. The experimental results show
that the cost of retrieving similar trajectories can be greatly reduced
when the modified frequency distance is used as a filter.
1 Introduction
With the growth of mobile computing and the development of computer vi-
sion techniques, it has become possible to trace the trajectories of moving
objects in real life and in videos. A number of interesting applications have
been developed based on the analysis of trajectories. For example, using a
GPS system, and by mining the trajectories of animals in a large farming
area, it is possible to determine migration patterns of certain groups of ani-
mals. In sports videos, such as hockey, it is quite useful for coaches or sports
researchers to know the movement patterns of top players. In a store surveil-
lance video monitoring system, finding the customers’ movement patterns
may help in the arrangement of merchandise. All of these applications re-
quire the definition of an accurate and robust similarity measure to determine
similarity among trajectories.
The trajectory of a moving object is defined as the successive positions
of the moving object over a period of time. Therefore, trajectories can be
considered as two-dimensional (X-Y plane) or three-dimensional (X-Y-Z
space) time series data. Considerable research has been conducted on similarity-based
retrieval on one dimensional time series data, such as stock or commodity
prices, sales volume, weather data and biomedical measurements [1, 13, 14,
18, 19, 23, 24, 30]. A natural question is: “Can we apply these
techniques for one dimensional time series data to trajectories?” The
answer is, unfortunately, negative; directly applying these techniques will
not give satisfactory results. The reason is that trajectories of moving
objects have their own characteristics, which are briefly introduced in the
next section.
1.1 Characteristics of Trajectories
Compared to one dimensional time series data, trajectories of moving objects
have the following differences:
Trajectories are always two or three dimensional. Since each point of
a trajectory is represented as a vector in two or three dimensions, di-
mensionality reduction techniques for one dimensional time series data,
such as Discrete Fourier Transform (DFT) [1], Discrete Wavelet
Transform (DWT) [19, 23], Singular Value Decomposition (SVD) [13, 18] and
Piece-Wise Aggregate Approximation (PAA) [14, 30], cannot be applied
to trajectories. Naively treating each dimension of the moving object
positions independently, the trajectories can be considered as two or
three one-dimensional time series. However, applying dimensionality
reduction techniques independently on each of the dimensions will
lead to the loss of valuable information on the interdependency among
the dimensions embedded in the positions of a trajectory.
Trajectories may have many outliers. Unlike stock, weather, or com-
modity price data, trajectories of moving objects are captured by record-
ing the positions of the objects from time to time (or tracing the moving
object from frame-to-frame in video data). Therefore, due to sensor
failures or errors in detection techniques, many outliers may appear.
The similarity measures for one dimensional time series data, such as
Euclidean distance [1] and Dynamic Time Warping (DTW) [31], are
very sensitive to noise and cannot be applied to trajectories [26].
Similar movement patterns may appear in different spatial regions of
trajectories. Different sampling rates of tracking and recording devices
combined with different speeds of the moving objects may introduce
various local scaling and shifting factors into trajectories. Several tech-
niques have been proposed to remove the shifting and scaling effects by
introducing shifting and scaling functions [5, 6]. Unfortunately, these
techniques work fine for global shifting and scaling but not for the local
shifting and scaling in movement patterns that appear in the trajecto-
ries.
After reviewing the complex characteristics of trajectory data, a question
that comes to mind is: “Can we find a suitable similarity measure which takes
these characteristics into consideration when we compare trajectories?”
Furthermore, with the proposed similarity measure, “how can we improve the
retrieval efficiency?” We address these two questions in this paper.
1.2 Accurate and Robust Similarity Measures for Trajectories
[Figure 1, panel (a): trajectories T_A and T_B, normalized LCSS(T_A, T_B) = 0.36.
Panel (b): trajectories T_A and T_C, with the gap marked, normalized LCSS(T_A, T_C) = 0.36.]
Figure 1: A comparison of trajectories with the same normalized LCSS but
different gap sizes
Recently, Longest Common Subsequence (LCSS) has been proposed to
measure the similarity between trajectories [26]. Compared to DTW and
Euclidean distance, LCSS allows the matching sequence to stretch and some
elements to be unmatched, which makes it robust to noise [26]. However,
LCSS has difficulties in differentiating the sequences that have the longest
common subsequences of the same length but different sizes of gaps in
between. Figure 1 shows an example of this case (see footnote 1), where the
normalized LCSS score [26] between trajectories T_A and T_B (Figure 1(a))
is the same as that between T_A and T_C (Figure 1(b)). However, by
comparing the three trajectories (the horizontal grey lines are used to show
the common subsequences between two trajectories), it is quite clear that
T_A is more similar to T_B than to T_C.

Footnote 1: The original trajectory data are two dimensional. For clarity,
in the figures, we only show one dimension.
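The gap insensitivity described above can be reproduced on toy data. The following is a minimal sketch, not the formulation of [26]: it assumes one-dimensional sequences, a hypothetical matching threshold `eps`, and normalization by the shorter sequence length. The three toy sequences are invented for illustration and are not the trajectories of Figure 1.

```python
def lcss_length(a, b, eps):
    """Longest common subsequence length of two 1-D sequences.

    Two values are treated as matching when they differ by at most eps.
    """
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= eps:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def normalized_lcss(a, b, eps):
    # Assumption: normalize by the shorter length so the score lies in [0, 1].
    return lcss_length(a, b, eps) / min(len(a), len(b))


# Toy data (invented): t_b has a gap of one element, t_c a gap of four.
t_a = [1.0, 2.0, 3.0, 4.0]
t_b = [1.0, 2.0, 9.0, 3.0, 4.0]
t_c = [1.0, 2.0, 9.0, 9.0, 9.0, 9.0, 3.0, 4.0]

score_ab = normalized_lcss(t_a, t_b, eps=0.5)
score_ac = normalized_lcss(t_a, t_c, eps=0.5)
print(score_ab, score_ac)
```

Both scores come out equal even though the gap in `t_c` is four times as long, which is the weakness the text attributes to LCSS.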
In this paper, we define a distance measure called Normalized Edit
Distance (NED) to measure the similarity between two trajectories. NED is
based on Edit Distance (ED) [20], which is widely used in bioinformatics
and speech recognition to measure the similarity between two strings. In
contrast to LCSS, NED considers the gaps in between subsequences as well
as the subsequences themselves. For example, for the trajectories shown in
Figure 1, the value of NED is 0.7 between T_A and T_B and 0.78 between
T_A and T_C (the detailed definition of NED is given in Section 2), which
conforms to the perceptual similarity that T_A is more similar to T_B
than to T_C.
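To make the contrast with LCSS concrete, here is a small sketch of an edit distance over real-valued sequences. It is not the paper's NED, whose definition is in Section 2 and is not reproduced here. The sketch assumes unit-cost insert, delete, and replace operations, a hypothetical matching threshold `eps`, and normalization by the longer sequence length. The toy data are invented and do not reproduce the 0.7 and 0.78 values quoted for Figure 1.

```python
def edit_distance(a, b, eps):
    """Unit-cost edit distance between two 1-D real-valued sequences.

    Two values count as equal (replace cost 0) when they differ by at most eps.
    """
    n, m = len(a), len(b)
    prev = list(range(m + 1))  # distance from the empty prefix of a
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if abs(a[i - 1] - b[j - 1]) <= eps else 1
            curr[j] = min(prev[j] + 1,         # delete a[i-1]
                          curr[j - 1] + 1,     # insert b[j-1]
                          prev[j - 1] + cost)  # match or replace
        prev = curr
    return prev[m]


def normalized_edit_distance(a, b, eps):
    # Assumption: divide by the longer length so the result lies in [0, 1].
    longest = max(len(a), len(b))
    return edit_distance(a, b, eps) / longest if longest else 0.0


# Toy data (invented): the gap in t_c is longer than the gap in t_b.
t_a = [1.0, 2.0, 3.0, 4.0]
t_b = [1.0, 2.0, 9.0, 3.0, 4.0]
t_c = [1.0, 2.0, 9.0, 9.0, 9.0, 9.0, 3.0, 4.0]

ned_ab = normalized_edit_distance(t_a, t_b, eps=0.5)
ned_ac = normalized_edit_distance(t_a, t_c, eps=0.5)
print(ned_ab, ned_ac)
```

Because every unmatched element in a gap costs one edit operation, the sequence with the longer gap receives the larger distance, unlike a score based only on the length of the common subsequence.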
However, the space and time cost of computing NED is very high, in-
creasing the retrieval cost as a consequence. Since edit distance is originally
defined for strings, it seems possible to convert the real-valued trajectory data
into strings and utilize the well defined algorithms and embedded distance
functions of strings to improve the retrieval efficiency. Thus, we propose a
novel trajectory representation, called movement pattern strings (MPS). An
MPS is derived from a trajectory by quantizing the (movement direction,
distance ratio) space into a set of distinct equal-sized subregions and
representing each subregion by a symbol. Most importantly, the NED that is
computed from two MPSs establishes a lower bound of the NED of the two
original sequences of movement direction and distance pairs, which
guarantees that no false dismissals will be introduced using the symbolic
representation.
Furthermore, we define a modified frequency distance (MFD) between two
frequency vectors (FV) of movement pattern strings to reduce the cost of
CPU time on computing NED of two movement sequences. A normalized
MFD (NMFD) between two FVs is also the lower bound of NED between
two trajectories. Therefore, we can directly use FV as a filter to remove the
false candidates during the retrieval.
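The filtering idea can be illustrated with a generic filter-and-refine range query. The sketch below is not the paper's NMFD filter. As stand-ins, it uses plain Levenshtein distance on ordinary strings as the expensive measure and the absolute length difference as a cheap lower bound. All function names and the toy database are hypothetical.

```python
def levenshtein(s, t):
    """Standard unit-cost edit distance between two strings."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,
                            curr[j - 1] + 1,
                            prev[j - 1] + (cs != ct)))
        prev = curr
    return prev[-1]


def length_lower_bound(s, t):
    # Each insert or delete changes the length by one, so the length
    # difference can never exceed the edit distance.
    return abs(len(s) - len(t))


def range_query(query, database, radius, lower_bound, exact):
    """Return items within radius of query, counting exact computations."""
    results, exact_calls = [], 0
    for candidate in database:
        if lower_bound(query, candidate) > radius:
            continue  # safe to prune: the exact distance is at least this large
        exact_calls += 1
        if exact(query, candidate) <= radius:
            results.append(candidate)
    return results, exact_calls


database = ["abcd", "abce", "abcdefgh", "xyz", "a"]
hits, exact_calls = range_query("abcd", database, 1,
                                length_lower_bound, levenshtein)
brute_force = [c for c in database if levenshtein("abcd", c) <= 1]
print(hits, exact_calls)
```

Since the filter never overestimates the exact distance, the pruned query returns the same answers as a full scan while computing the expensive distance for fewer candidates. This is the sense in which a lower bound introduces no false dismissals.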
1.3 Our Main Contributions
The main contributions of our paper are the following:
1. We define a distance measure, NED, based on ED, to measure the
similarity between two trajectories. NED is more robust than DTW
and Euclidean distance and more accurate than LCSS.
2. We develop a transformation scheme to convert a trajectory into a
symbolic representation, called movement pattern strings, and prove
that the NED that is computed over a symbolic space is the lower

References

Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology (book)
Frequently Asked Questions (19)
Q1. What contributions have the authors mentioned in the paper "Symbolic representation and retrieval of moving object trajectories"?

In this paper, normalized edit distance (NED) is proposed to measure the similarity between two trajectories. The authors evaluate the efficacy of NED and compare it with those of Euclidean distance, Dynamic Time Warping (DTW), and Longest Common Subsequences (LCSS), showing that NED is more robust and accurate for trajectories that contain noise and local time shifting. Furthermore, in order to improve the retrieval efficiency, the authors propose a novel representation of trajectories, called movement pattern strings, which convert the trajectories into a symbolic representation. The distances that are computed in a symbolic space are lower bounds of the distances of original trajectory data, which guarantees that no false dismissals will be introduced using movement pattern strings to retrieve trajectories.

Future work includes the following problems: 1. Finding an embedding method, which keeps both the lower bound property and the temporal order of elements in the strings. 

MPS has quite stable pruning power over trajectory length because the strings maintain the order of the corresponding (movement direction, distance ratio) pairs, and it is able to remove many false candidates because it considers the neighbors of each symbol.

Chen and Chang [4] used wavelet transform to decompose raw object trajectories (position sequences) into components at different scales.

Using MPS as a filter is based on the assumption that the retrieval cost may be reduced due to the smaller size of MPS compared to movement sequences.

As the number of neighbors of each integer point in the frequency space is limited (at most 8), the computation time of Algorithm 3 is still linear.

The authors define NMFD between two frequency vectors and use frequency vectors as filters to save the CPU time spent on computing NED.

The authors find that, for ASL data, LCSS and NED give the best results when ε_dir = 0.167π and ε_dis = 0.1 · σ_max, where σ_max is the maximum value of the movement distance ratio in the data set, which can be obtained when raw trajectories are converted to movement sequences.

Once the authors quantize the (movement direction, distance ratio) space into subregions and derive the movement alphabet A, the authors use Algorithm 1 to map a (movement direction, distance ratio) pair (θ, σ) into a symbol. 
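The mapping from a (movement direction, distance ratio) pair to a symbol can be pictured as a grid lookup. The following is a rough sketch and not the paper's Algorithm 1, which is not reproduced here. It assumes equal-sized cells of hypothetical width `eps_dir` by `eps_dis`, directions measured in radians and wrapped into [0, 2π), and symbols represented as (row, column) index pairs rather than letters of the alphabet A.

```python
import math


def to_symbol(theta, sigma, eps_dir, eps_dis):
    """Map one (direction, distance ratio) pair to the index of its grid cell."""
    theta = theta % (2 * math.pi)  # wrap the direction into [0, 2*pi)
    return (int(theta // eps_dir), int(sigma // eps_dis))


def to_mps(movement_sequence, eps_dir, eps_dis):
    """Convert a movement sequence into a list of symbols, preserving order."""
    return [to_symbol(theta, sigma, eps_dir, eps_dis)
            for theta, sigma in movement_sequence]


# Toy movement sequence (invented values).
movement = [(0.1, 0.05), (0.1, 0.07), (1.7, 0.25), (3.2, 0.25)]
mps = to_mps(movement, eps_dir=math.pi / 4, eps_dis=0.1)

# A slightly negative direction wraps around to the last direction cell.
wrapped = to_symbol(-0.1, 0.05, math.pi / 4, 0.1)
print(mps, wrapped)
```

The resulting symbol sequence has the same length as the movement sequence, so the quantization alone reduces storage per element but not the number of elements to compare.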

The similarity measure that the authors propose takes the longest common subsequences, gap penalties and compared sequence lengths into consideration. 

Given a movement sequence M_A = [(θ_{a,1}, σ_{a,1}), . . . , (θ_{a,n}, σ_{a,n})] of length n and movement pattern alphabet A, a movement pattern string (MPS) is defined as a sequence of symbols: S_{a,1}S_{a,2} . . .

Their experimental results confirm that NED is a suitable and superior similarity measure for trajectory data and that feature vectors with NMFD can effectively reduce the number of false candidates in trajectory retrieval.

The NED between the original movement sequences M_A and M_B is 0, whereas the NED between MPS_A and MPS_B that is computed based on the standard edit distance [20] is 1, which is not a lower bound of 0.

This is because the (movement direction, distance ratio) pairs that are located near the boundary of quantization subregions may be assigned different symbols and require a replace operation that is not needed in the original sequence comparison. 
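The boundary effect can be shown with two pairs that lie close together but on opposite sides of a cell border. This is a minimal sketch under stated assumptions: the cell width in each dimension equals the matching threshold (`eps_dir`, `eps_dis`), symbols are grid indices, and "neighbor" means an adjacent cell. The paper's exact neighbor rule is not quoted here, and the wrap-around of directions at 2π is ignored for brevity.

```python
def cell(value, eps):
    """Index of the equal-sized cell of width eps that contains value."""
    return int(value // eps)


def symbols_equal(c1, c2):
    """Standard edit distance treats two symbols as matching only if identical."""
    return c1 == c2


def symbols_adjacent(c1, c2):
    """Neighbor-aware match: identical cells or cells that touch each other."""
    return all(abs(x - y) <= 1 for x, y in zip(c1, c2))


eps_dir, eps_dis = 0.5, 0.1

# Two invented (direction, distance ratio) pairs on either side of the
# border between direction cells 0 and 1.
p = (0.49, 0.05)
q = (0.51, 0.05)

original_match = (abs(p[0] - q[0]) <= eps_dir and
                  abs(p[1] - q[1]) <= eps_dis)
sym_p = (cell(p[0], eps_dir), cell(p[1], eps_dis))
sym_q = (cell(q[0], eps_dir), cell(q[1], eps_dis))

print(original_match, symbols_equal(sym_p, sym_q), symbols_adjacent(sym_p, sym_q))
```

Under the assumption that the cell width equals the threshold, two values within the threshold fall into the same cell or adjacent cells, so a neighbor-aware comparison never charges a replace operation that the original comparison would not. Exact symbol equality does charge one here, which is how the symbolic distance can exceed the original distance.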

Due to the lower bound property of NED on MPS, clustering on it achieves nearly the same number of correct results as that of clustering on original movement sequences. 

Little and Gu [22] used the path and speed curves to represent the motion trajectories and measured the distance between two trajectories using DTW. 

In terms of total retrieval efficiency, FV is much better than MPS due to the linearity of the computation cost of FV as opposed to quadratic cost for MPS.

Even though the authors reduce the storage requirements by converting movement sequences into movement pattern strings, the cost of computing the NED between two MPSs is still O(n∗m), since the length of a movement sequence and that of its corresponding movement pattern string are the same. 

Let u and v be integer points in s-dimensional space. The frequency distance FD(u, v) between u and v is defined as the minimum number of steps that is required to go from u to v (or equivalently from v to u) by moving to a neighbor point at each step.
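A short sketch may help make this definition concrete. It covers only the basic frequency distance, not the paper's modified version. It assumes the neighborhood induced by single edit operations on the underlying string: one step changes one coordinate by +1 or -1 (insert or delete), or raises one coordinate by 1 while lowering another by 1 (replace). Under that assumption, the minimum number of steps is the larger of the total surplus and the total deficit. The function names and toy strings are hypothetical.

```python
from collections import Counter


def frequency_vector(symbols, alphabet):
    """Count how often each alphabet symbol occurs in a sequence of symbols."""
    counts = Counter(symbols)
    return [counts[a] for a in alphabet]


def frequency_distance(u, v):
    """Minimum number of neighbor steps between two integer points.

    A replace step fixes one unit of surplus and one unit of deficit at once,
    so the step count is max(total surplus, total deficit).
    """
    surplus = sum(x - y for x, y in zip(u, v) if x > y)
    deficit = sum(y - x for x, y in zip(u, v) if y > x)
    return max(surplus, deficit)


alphabet = "abcd"
u = frequency_vector("aabc", alphabet)
v = frequency_vector("abdd", alphabet)
w = frequency_vector("abc", alphabet)

fd_uv = frequency_distance(u, v)
fd_uw = frequency_distance(u, w)
print(u, v, fd_uv, fd_uw)
```

Computing the vectors and the distance takes time linear in the string length and alphabet size, in contrast to the quadratic dynamic program needed for edit distance. Because each edit operation moves the frequency vector by at most one step, this distance cannot exceed the edit distance between the strings.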