Efficient Time Series Matching by Wavelets
Kin-pong Chan and Ada Wai-chee Fu
Department of Computer Science and Engineering
The Chinese University of Hong Kong
Shatin, Hong Kong
{kpchan, adafu}@cse.cuhk.edu.hk
Abstract
Time series stored as feature vectors can be indexed by multi-
dimensional index trees like R-Trees for fast retrieval. Due to
the dimensionality curse problem, transformations are applied to
time series to reduce the number of dimensions of the feature vec-
tors. Different transformations like Discrete Fourier Transform
(DFT), Discrete Wavelet Transform (DWT), Karhunen-Loeve (K-
L) transform or Singular Value Decomposition (SVD) can be ap-
plied. While the use of DFT and K-L transform or SVD have been
studied in the literature, to our knowledge, there is no in-depth
study on the application of DWT. In this paper, we propose to use
Haar Wavelet Transform for time series indexing. The major con-
tributions are: (1) we show that Euclidean distance is preserved
in the Haar transformed domain and no false dismissal will occur,
(2) we show that Haar transform can outperform DFT through
experiments, (3) a new similarity model is suggested to accom-
modate vertical shift of time series, and (4) a two-phase method
is proposed for efficient $n$-nearest neighbor query in time series
databases.
1. Introduction
Time series data are of growing importance in many new
database applications, such as data warehousing and data mining
[3, 8, 2, 12]. A time series (or time sequence) is a sequence of
real numbers, each number representing a value at a time point.
Typical examples include stock prices or currency exchange rates,
biomedical measurements, weather data, etc., collected over
time. Therefore, time series databases supporting fast retrieval of
time series data and similarity queries are desired.
In order to depict the similarity between two time series,
we define a similarity measurement during the matching process.
Given two time series $\vec{x} = (x_0, x_1, \ldots, x_{n-1})$ and
$\vec{y} = (y_0, y_1, \ldots, y_{n-1})$, a standard approach is to
compute the Euclidean distance $D(\vec{x}, \vec{y})$ between time
series $\vec{x}$ and $\vec{y}$:
$$D(\vec{x}, \vec{y}) = \left( \sum_{i=0}^{n-1} |x_i - y_i|^2 \right)^{1/2}$$
By using this similarity model, we can retrieve similar time series
by considering distance $D(\vec{x}, \vec{y})$.
Indexing is used to support efficient retrieval and matching of
time series. Some important factors have to be considered. The
first factor is dimensionality reduction. Many multi-dimensional
indexing methods [13, 7, 5, 20] such as the R-Tree and R*-Tree
[20, 5, 11] scale exponentially for high dimensionalities, eventually
reducing the performance to that of sequential scanning or
worse. Hence, a transformation is applied to map the time sequences
to a new feature space of a lower dimensionality. Next we must
ensure completeness and effectiveness when the number of dimensions
is reduced. To avoid missing any qualifying object, the Euclidean
distance in the reduced $k$-dimensional space should be less
than or equal to the Euclidean distance between the two original
time sequences. Finally, we must also consider the nature of data
series since the effectiveness of power concentration of a particular
transformation depends on the nature of the time series. It
is believed that only brown noise or random walks exist in real
signals. In particular, stock movements and exchange rates can be
modeled successfully as random walks in [10], for which a skewed
energy spectrum can be obtained.
Discrete Fourier Transform (DFT) has been one of the most
commonly used techniques. One problem with DFT is that it
misses the important feature of time localization. Piecewise
Fourier Transform has been proposed to mitigate this problem, but
the size of the pieces leads to other problems. While large pieces
reduce the power of multi-resolution, small pieces are weak
in modeling low frequencies.
Wavelet Transform (WT), or Discrete Wavelet Transform
(DWT) [9, 18] has been found to be effective in replacing DFT
in many applications in computer graphics, image [26], speech [1],
and signal processing [6, 4]. We propose to apply this technique
in time series for dimension reduction and content-based search.
DWT is a discrete version of WT for numerical signal. Although
the potential application of DWT in this problem was pointed out
in [22], no further investigation has been reported to our knowl-
edge. Hence, it is of value to conduct studies and evaluations on
time series retrieval and matching by means of wavelets.
The advantage of using DWT is multi-resolution representation
of signals. It has the time-frequency localization property. Thus,
DWT is able to give locations in both time and frequency. Therefore,
wavelet representations of signals bear more information than
that of DFT, in which only frequencies are considered. While DFT
extracts the lower harmonics which represent the general shape of
a time sequence, DWT encodes a coarser resolution of the original
time sequence with its preceding coefficients. We show that
Euclidean distance is preserved in the Haar transformed domain.
Moreover, we show by experiments that Haar Wavelet Transform
[9], which is a commonly used wavelet transform, can outperform
DFT significantly.
We also suggest a similarity definition to handle the problem of
vertical shifts of time series. Finally we propose an algorithm on
$n$-nearest neighbor query for the proposed wavelet method. The
algorithm makes use of the range query and dynamically adjusts
the range by the property of Euclidean distance preservation of the
wavelet transformation.
2. Related Work
Discrete Fourier Transform (DFT) is often used for dimension
reduction [2, 15] to achieve efficient indexing. An index built by
means of DFT is also called an F-index [2]. Suppose the DFT of
a time sequence $\vec{x}$ is denoted by $\vec{X}$. For many
applications such as stock data, the low frequency components are
located at the preceding coefficients of $\vec{X}$, which represent
the general trend of the time sequence $\vec{x}$. These coefficients
can be indexed in an R-Tree or R*-Tree for fast retrieval. In most
previous works, range querying is considered. A range query (or
epsilon query) evaluation returns sequences with Euclidean distance
within $\epsilon$ from the query point.
Parseval’s Theorem [23] shows that the Euclidean distance between
two signals $\vec{x}$ and $\vec{y}$ in time domain is the same as
their Euclidean distance in frequency domain:
$$\|\vec{x} - \vec{y}\|^2 = \|\vec{X} - \vec{Y}\|^2 \qquad (1)$$
Therefore, F-index may raise false alarms, but guarantees no false
dismissal. After a range query in the F-index, false alarms are
filtered by checking against the query sequence in the original time
domain in a post-processing step. F-index is further generalized
and subsequence matching is proposed in [15]. This is called the
ST-index which permits sequence query of varying length. Each
time sequence is broken up into pieces of subsequences by a sliding
window with a fixed length $w$ for DFT. Feature points in nearby
offsets will form a trail due to the effect of stepwise sliding
window; the minimum bounding rectangle (MBR) of a trail is then
indexed in an R-Tree instead of the feature points themselves.
When a query arrives, all MBRs that intersect the query region are
retrieved and their trails are matched.
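The lower-bounding property behind the F-index can be checked numerically. The following Python sketch is illustrative only and is not taken from [2] or [15]: the helper names `dft` and `dist`, the sample sequences, and the choice of three retained coefficients are assumptions made here, and a unitary scaling of 1/sqrt(n) is assumed for the DFT so that Equation (1) holds with equality.

```python
import cmath
import math

def dft(x):
    """Unitary DFT: the 1/sqrt(n) scaling makes Parseval's relation an equality."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n)) / math.sqrt(n)
        for f in range(n)
    ]

def dist(a, b):
    """Euclidean distance; abs() also handles complex coefficients."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))

# Hypothetical sample sequences of length 8.
x = [9.0, 7.0, 3.0, 5.0, 6.0, 8.0, 4.0, 2.0]
y = [8.0, 6.5, 4.0, 5.5, 5.0, 9.0, 3.0, 2.5]
X, Y = dft(x), dft(y)

time_distance = dist(x, y)            # distance in the time domain
freq_distance = dist(X, Y)            # same value, up to rounding
k = 3                                 # assumed number of leading coefficients kept
index_distance = dist(X[:k], Y[:k])   # distance seen by the index
```

Because `index_distance` sums only a subset of the non-negative terms in `freq_distance`, it can never exceed the true distance, which is why a range query on the reduced features may return false alarms but no false dismissals.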
New similarity models are applied to F-index based time series
matching in [24]. It achieves time warping, moving average,
and reversing by applying transformations to feature points in the
frequency domain. Given a query $\vec{q}$, a new index is built by
applying a transformation to all points in the original index, and
feature points with a distance less than $\epsilon$ from $\vec{q}$
are returned. However, a lot of computations are involved in
building the new index, which has a great impact on the actual
query performance.
In the above works, no efficient method for nearest neighbor
query, which can be more useful than range query, has been pro-
posed.
We shall use Haar wavelet transform and DWT interchangeably
throughout this paper, unless specified particularly.
Another method that has been employed for dimension reduction
is Karhunen-Loeve (K-L) transform [28]. (This method is
also known as Singular Value Decomposition (SVD) [22], and
is called Principal Component Analysis in statistical literature.)
Given a collection of $n$-dimensional points, we project them on a
$k$-dimensional sub-space where $k < n$, maximizing the variances
in the chosen dimensions. The key weakness of K-L transform is
the deterioration of performance upon incremental update of the
index. Therefore, a new projection matrix should be re-calculated
and the index tree has to be re-organized periodically to keep up
the search performance.
2.1. Wavelet Transform
Wavelets are basis functions used in representing data or other
functions. Wavelet algorithms process data at different scales or
resolutions in contrast with DFT where only frequency compo-
nents are considered. The origin of wavelets can be traced to the
work of Karl Weierstrass [27] in 1873. The construction of the
first orthonormal system by Haar [21] is an important milestone.
Haar basis is still a foundation of modern wavelet theory. Another
significant advance is the introduction of a nonorthogonal basis by
Dennis Gabor in 1946 [16]. In this work we shall advocate the use
of the Haar wavelets in the problem of time series retrieval.
3. The Proposed Approach
Following a trend in the disciplines of signal and image pro-
cessing, we propose to study the use of wavelet transformation for
the time series indexing problem. Before we go into the details of
our proposed techniques, we would first like to define the similarity
model used in sequence matching. The first definition is based
on the Euclidean distance $D(\vec{x}, \vec{y})$ between time
sequences $\vec{x}$ and $\vec{y}$.
Definition 1 Given a threshold $\epsilon$, two time sequences
$\vec{x}$ and $\vec{y}$ of equal length $n$ are said to be similar if
$$D(\vec{x}, \vec{y}) = \left( \sum_{i=0}^{n-1} (y_i - x_i)^2 \right)^{1/2} \le \epsilon$$
A shortcoming of Definition 1 is demonstrated in Figure 1.
From human interpretation, $\vec{x}$ and $\vec{y}$ may be quite
similar because $\vec{x}$ can be shifted up vertically to obtain
$\vec{y}$ or vice versa. However, they will be considered not
similar by Definition 1 because errors are accumulated at each pair
of $x_i$ and $y_i$. Therefore, we suggest another similarity model.
Definition 2 Given a threshold $\epsilon$, two time sequences
$\vec{x}$ and $\vec{y}$ of equal length $n$ are said to be v-shift
similar if
$$\left( \sum_{i=0}^{n-1} \big( (y_i - x_i) - (y_A - x_A) \big)^2 \right)^{1/2} \le \epsilon$$
where $x_A = \frac{1}{n} \sum_{i=0}^{n-1} x_i$ and
$y_A = \frac{1}{n} \sum_{i=0}^{n-1} y_i$.
From Definition 2, any two time sequences are said to be v-shift
similar if the Euclidean distance is less than or equal to a threshold
$\epsilon$ neglecting their vertical offsets from x-axis. This definition
can give a better estimation of the similarity between two time
sequences with similar trends running at two completely different
levels.
[Figure 1: plot of two time sequences $\vec{x}$ and $\vec{y}$ against time $t$.]
Figure 1. Example of vertical shifts of time sequences
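The difference between Definition 1 and Definition 2 can be seen on a small example. The sketch below is only an illustration: the function names `euclidean` and `v_shift_distance` and the two sample sequences are assumptions introduced here, not part of the original definitions.

```python
import math

def euclidean(x, y):
    """Distance of Definition 1."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(x, y)))

def v_shift_distance(x, y):
    """Distance of Definition 2: each sequence is compared relative to its own mean."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    return math.sqrt(sum(((b - a) - (y_mean - x_mean)) ** 2 for a, b in zip(x, y)))

# Hypothetical sequences: y is x shifted up by 10.
x = [1.0, 3.0, 2.0, 4.0]
y = [11.0, 13.0, 12.0, 14.0]

plain = euclidean(x, y)            # large, because the offset accumulates at every point
shifted = v_shift_distance(x, y)   # zero, because the shapes are identical
```

With these values the plain Euclidean distance is 20 while the v-shift distance is 0, so the pair would be rejected under Definition 1 for any threshold below 20 but accepted under Definition 2.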
3.1. Haar Wavelets
We want to have a decomposition that is fast to compute and
requires little storage for each sequence. The Haar wavelet is cho-
sen for the following reasons: (1) it allows good approximation
with a subset of coefficients, (2) it can be computed quickly and
easily, requiring linear time in the length of the sequence and simple
coding, and (3) it preserves Euclidean distance (see Section
3.3). The formal definition of Haar wavelets is given in Appendix
A. Concrete mathematical foundations can be found in [9, 19] and
related implementations in [14].
Haar transform can be seen as a series of averaging and differencing
operations on a discrete time function. We compute the average
and difference between every two adjacent values of $f(x)$.
The procedure to find the Haar transform of a discrete function
$f(x)$ = (9 7 3 5) is shown below.

Resolution    Averages     Coefficients
4             (9 7 3 5)
2             (8 4)        (1 -1)
1             (6)          (2)

Resolution 4 is the full resolution of the discrete function $f(x)$.
In resolution 2, (8 4) are obtained by taking the average of (9 7)
and (3 5) at resolution 4 respectively. (1 -1) are the differences
of (9 7) and (3 5) divided by two respectively. This process is
continued until a resolution of 1 is reached. The Haar transform
$H(f(x))$ = (6 2 1 -1) is obtained, which is composed
of the last average value 6 and the coefficients found on the right
most column, 2, 1 and -1. It should be pointed out that the first
coefficient is the overall average value of the whole time sequence,
which is equal to (9 + 7 + 3 + 5)/4 = 6. Different resolutions can be
obtained by adding difference values back to or subtracting
differences from averages. For instance, (8 4) = (6+2 6-2) where 6
and 2 are the first and second coefficient respectively. This process
can be done recursively until the full resolution is reached.
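The averaging and differencing procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation; the function names `haar_transform` and `haar_inverse` are hypothetical, and the unnormalized factor 1/2 of the worked example is assumed.

```python
def haar_transform(values):
    """Unnormalized Haar transform by repeated pairwise averaging and differencing."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    averages = list(values)
    coefficients = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        details = [(a - b) / 2 for a, b in pairs]
        averages = [(a + b) / 2 for a, b in pairs]
        coefficients = details + coefficients  # coarser coefficients go in front
    return averages + coefficients

def haar_inverse(coeffs):
    """Rebuild the sequence by adding and subtracting differences, level by level."""
    averages = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(averages)]
        finer = []
        for a, d in zip(averages, details):
            finer.extend([a + d, a - d])
        averages = finer
        pos += len(details)
    return averages
```

Applied to the example, `haar_transform([9, 7, 3, 5])` gives the coefficients 6, 2, 1, -1, and `haar_inverse` passes through (8 4) on its way back to (9 7 3 5).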
Haar transform can be realized by a series of matrix multiplications
as illustrated in Equation (2). Envisioning the example input
signal $\vec{x}$ as a column vector with length $n = 4$ (footnote 1),
an intermediate transform vector $\vec{w}$ as another column vector
and Haar transform matrix $\mathbf{H}$,
$$\begin{pmatrix} w_0 \\ w_1 \\ w_2 \\ w_3 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1 \end{pmatrix} \begin{pmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix} \qquad (2)$$
The factor 1/2 associated with the Haar transform matrix can be
varied according to different normalization conditions (footnote 2).
After the first multiplication of $\vec{x}$ and $\mathbf{H}$, half of
the Haar transform coefficients can be found, which are $w_1$ and
$w_3$ in $\vec{w}$, interleaving with some intermediate coefficients
$w_0$ and $w_2$. Actually, $w_1$ and $w_3$ are the last two
coefficients of the Haar transform. $w_0$ and $w_2$ are then
extracted from $\vec{w}$ and put into a new column vector
$\vec{x}' = [w_0\ w_2\ 0\ 0]^T$. $\vec{x}'$ is treated as the new
input vector for transformation. This process is done recursively
until one element is left in $\vec{x}'$. In this particular case, the
first two coefficients of the Haar transform can be found in the
second iteration.
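The matrix formulation can be sketched as follows. This is an illustrative reading of Equation (2), not the authors' code: the names `haar_matrix`, `haar_step` and `haar_by_matrix` are hypothetical, the input length is assumed to be a power of 2, and instead of padding the new input vector with zeros the sketch simply shrinks it, which yields the same coefficients.

```python
def haar_matrix(n):
    """n x n matrix whose rows alternate a pairwise sum and a pairwise difference."""
    rows = []
    for p in range(n // 2):
        sum_row = [0] * n
        diff_row = [0] * n
        sum_row[2 * p] = sum_row[2 * p + 1] = 1
        diff_row[2 * p], diff_row[2 * p + 1] = 1, -1
        rows.extend([sum_row, diff_row])
    return rows

def haar_step(x, scale=0.5):
    """One multiplication w = scale * H * x, as in Equation (2)."""
    return [scale * sum(h * v for h, v in zip(row, x)) for row in haar_matrix(len(x))]

def haar_by_matrix(x):
    """Repeat the multiplication on the extracted averages until one element is left."""
    out = [0.0] * len(x)
    current = list(x)
    while len(current) > 1:
        w = haar_step(current)
        half = len(current) // 2
        out[half:2 * half] = w[1::2]  # odd positions: finished coefficients
        current = w[0::2]             # even positions: averages, transformed again
    out[0] = current[0]
    return out
```

For the example signal, the first call to `haar_step` produces (8, 1, 4, -1), that is, the averages 8 and 4 interleaved with the final coefficients 1 and -1; the second iteration then yields 6 and 2.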
The complexity of Haar transform can be evaluated by considering
the number of operations involved in the recursion process.
Lemma 1 Given a time sequence of length $n$ where $n$ is an
integral power of 2, the complexity of Haar transform is $O(n)$.
Proof: There are totally $n$ matrix additions or subtractions in the
first iteration of matrix operation. The size of the input vector is
halved in each iteration onwards. The total number of operations
is formulated as
$$\sum_{i=0}^{\log_2 n - 1} \frac{n}{2^i} = n + \frac{n}{2} + \cdots + 2 = 2(n - 1)$$
which is bounded by $O(n)$. $\Box$
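The operation count in the proof of Lemma 1 can be reproduced with a short loop. The function name `haar_operation_count` is hypothetical, and the sketch assumes, as in the proof, that an input of a given size costs that many additions or subtractions per iteration.

```python
def haar_operation_count(n):
    """Count additions and subtractions over all iterations for input length n."""
    operations = 0
    size = n
    while size > 1:
        operations += size  # size/2 sums plus size/2 differences
        size //= 2          # only the averages are transformed again
    return operations
```

For n = 4 the count is 4 + 2 = 6 = 2(4 - 1), and the same closed form 2(n - 1) holds for every power of 2, which is linear in n.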
3.2. DFT versus Haar Transform
Our motivation for using Haar transform to replace DFT is
based on several pieces of evidence and observations, some of which
are also the reasons why the use of wavelet transforms instead of DFT
is considered in areas of image and signal processing.
The first reason is on the pruning power. The nature of the
Euclidean distance preserved by Haar transform and DFT are
different. In DFT, comparison of two time sequences is based on
their low frequency components, where most energy is presumed
to be concentrated. On the other hand, the comparison of Haar
coefficients is matching a gradually refined resolution of the two
time sequences. From intuition, Euclidean distance can be highly
related to low resolution of signal rather than low frequency
components. This property can give rise to more effective pruning, i.e.
fewer false alarms will appear, which is confirmed by experiments
in Section 5.
Another reason is the complexity consideration. The complexity
of Haar transform is $O(n)$ whilst $O(n \log n)$ computation is
required for Fast Fourier Transform (FFT) [17]. Both impose
restriction on the length of time sequences which must be an
integral power of 2. Although these computations are all involved in
pre-processing stage, the complexity of the transformation can be
a concern especially when the database is large. From our
experiments, the pre-processing time for DFT is about 3 to 4 times longer
than Haar transform.
Footnote 1: As for Fast Fourier Transform, the length of the signal is
restricted to numbers which are power of 2.
Footnote 2: The normalization is described in Section 3.3.
Finally, the proposed method provides a better similarity model.
Apart from Euclidean distance, our model can easily accommodate
v-shift similarity of two time sequences (Definition 2) at a little
more cost. That is, the situation where vertically shifted signals
can match is accommodated. On the other hand, previous study on
F-index did not make use of this similarity model.
Note that similar to DFT, DWT will not require massive index
re-organization because of database updating, which is a major
drawback in using the K-L transform or SVD approach.
3.3. Guarantee of no False Dismissal
For FT and DFT, it is shown by Parseval’s Theorem [23] that
the energy of a signal is conserved in both time and frequency
domains. Parseval’s Theorem also shows that this situation is true for
wavelet transforms. On the other hand, the Euclidean distances of
both time and frequency domains are the same for DFT by Equation
(1). This is a very important property in order that dimension
reduction of sequence data is possible. It guarantees that no
qualified time sequence will be rejected, thus no false dismissal.
However, this property has not been shown for DWT in general,
nor for the Haar wavelets. Here we show such a relationship.
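Lemma 2 Given a sequence $\vec{x} = (x_0, x_1)$ and a sequence
$\vec{y} = (y_0, y_1)$. The Haar transforms of $\vec{x}$ and
$\vec{y}$ are $H(\vec{x}) = \vec{x}' = (x'_0, x'_1)$ and
$H(\vec{y}) = \vec{y}' = (y'_0, y'_1)$ respectively. Lengths of
$\vec{x}$, $\vec{y}$, $\vec{x}'$ and $\vec{y}'$ are all equal to 2.
Then Euclidean distance $D(\vec{x}, \vec{y})$ is $\sqrt{2}$ times of
Euclidean distance $D(\vec{x}', \vec{y}')$, i.e.
$$D(\vec{x}, \vec{y}) = \sqrt{2}\, D(\vec{x}', \vec{y}')$$
Proof: Express $\vec{x}'$ in terms of $\vec{x}$ and $\vec{y}'$ in terms
of $\vec{y}$ by applying Equation (2) accordingly.
$$\vec{x}' = \left( \frac{x_0 + x_1}{2},\ \frac{x_0 - x_1}{2} \right), \qquad \vec{y}' = \left( \frac{y_0 + y_1}{2},\ \frac{y_0 - y_1}{2} \right)$$
Square of Euclidean distance of $\vec{x}'$ and $\vec{y}'$
$$D^2(\vec{x}', \vec{y}') = (x'_0 - y'_0)^2 + (x'_1 - y'_1)^2$$
$$= \left( \frac{(x_0 - y_0) + (x_1 - y_1)}{2} \right)^2 + \left( \frac{(x_0 - y_0) - (x_1 - y_1)}{2} \right)^2$$
$$= \frac{1}{2} \left[ (x_0 - y_0)^2 + (x_1 - y_1)^2 \right]$$
Thus, $D^2(\vec{x}, \vec{y}) = 2\, D^2(\vec{x}', \vec{y}')$, and
$D(\vec{x}, \vec{y}) = \sqrt{2}\, D(\vec{x}', \vec{y}')$. $\Box$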
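Lemma 3 Given two sequences $\vec{x}$ and $\vec{y}$, and the Haar
transforms of $\vec{x}$, $\vec{y}$ are $\vec{x}'$ and $\vec{y}'$
respectively. Lengths of $\vec{x}$, $\vec{y}$, $\vec{x}'$ and
$\vec{y}'$ are all $n$ ($n \ge 2$ and $n$ is a power of 2).
$(\vec{x}' - \vec{y}') = (\delta_0, \delta_1, \delta_2, \ldots, \delta_{n-1})$.
The Euclidean distance $D(\vec{x}, \vec{y}) = S_{\log_2 n}$ can be
expressed in terms of
$(\delta_0, \delta_1, \delta_2, \ldots, \delta_{n-1})$ recursively by
$$S_{i+1} = \sqrt{2 \left( S_i^2 + \delta_{2^i}^2 + \delta_{2^i+1}^2 + \cdots + \delta_{2^{i+1}-1}^2 \right)} \quad \text{for } 0 \le i \le \log_2 n - 1; \quad S_0 = |\delta_0| \qquad (3)$$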
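[Figure 2: binary hierarchy with levels $S_0$, $S_1$, ..., $S_i$,
$S_{i+1}$, ..., $S_{\log n}$. The bottom level $S_{\log n}$ holds
$x_{\log n,0}, x_{\log n,1}, \ldots, x_{\log n,n-1}$; each element
$x_{i,j}$ at level $S_i$, together with the coefficient $d_{2^i+j}$,
is derived from $x_{i+1,2j}$ and $x_{i+1,2j+1}$ at level $S_{i+1}$;
the top level $S_0$ holds $x_{0,0}$, with $d_1$ derived from $x_{1,0}$
and $x_{1,1}$.]
Figure 2. Hierarchy of Haar wavelet transform of sequence $\vec{x}$
of length $n$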
Proof: In Figure 2, the original sequence $\vec{x}$ is represented at
level $\log_2 n$. The values of $x_{i,j}$ and $d_{2^i+j}$ are defined by
$$x_{i,j} = \frac{x_{i+1,2j} + x_{i+1,2j+1}}{2}, \qquad d_{2^i+j} = \frac{x_{i+1,2j} - x_{i+1,2j+1}}{2}$$
The Haar transform of $\vec{x}$, $H(\vec{x})$, is represented by
$(x_{0,0}, d_1, \ldots, d_{2^i+j}, d_{2^i+j+1}, \ldots, d_{n-1})$.
A similar hierarchy exists for another sequence $\vec{y}$, with
elements $y_{i,j}$ and coefficients $d'_{2^i+j}$. Denote
$\delta_0 = x_{0,0} - y_{0,0}$ and $\delta_m = d_m - d'_m$, that is,
$d_m$ of sequence $\vec{x}$ minus $d_m$ of sequence $\vec{y}$, where
$1 \le m \le n - 1$.
We can treat the elements at each horizontal level of the hierarchy
to be a data sequence. Hence the sequence at level $i$ contains
data $x_{i,0}, x_{i,1}, \ldots, x_{i,2^i-1}$. Let us define $S_i$ to be
$$S_i = \sqrt{\sum_{j=0}^{2^i - 1} (x_{i,j} - y_{i,j})^2}$$
$S_i$ can be seen as the Euclidean distance between the data
sequences at level $i$ ($0 \le i \le \log_2 n$) in the hierarchies for
$\vec{x}$ and $\vec{y}$. Also, $S_{\log_2 n}$ is the Euclidean distance
between the given time series.
Next we prove the following statement:
$$S_{i+1} = \sqrt{2 \left( S_i^2 + \delta_{2^i}^2 + \delta_{2^i+1}^2 + \cdots + \delta_{2^{i+1}-1}^2 \right)} \quad \text{for } 0 \le i \le \log_2 n - 1; \quad S_0 = |\delta_0| \qquad (4)$$
The base case is shown true by Lemma 2 when $i = 0$.
$$S_1 = \sqrt{2 \left( S_0^2 + \delta_1^2 \right)}$$
We next prove the case for $i = k > 0$. We first note that in the
given hierarchy, for a pair of adjacent elements at a level $> 0$ of
the form $x_{i+1,2j}$, $x_{i+1,2j+1}$, we have the following relation
$$(x_{i+1,2j} - y_{i+1,2j})^2 + (x_{i+1,2j+1} - y_{i+1,2j+1})^2 = 2 \left[ (x_{i,j} - y_{i,j})^2 + (d_{2^i+j} - d'_{2^i+j})^2 \right] \qquad (5)$$
where $d'_{2^i+j}$ is the element in the hierarchy for $\vec{y}$
corresponding to $d_{2^i+j}$. This can be shown by repeating the
proof in Lemma 2, replacing $\vec{x}$ by $(x_{i+1,2j}, x_{i+1,2j+1})$,
$\vec{y}$ by $(y_{i+1,2j}, y_{i+1,2j+1})$, $\vec{x}'$ by
$(x_{i,j}, d_{2^i+j})$, and $\vec{y}'$ by $(y_{i,j}, d'_{2^i+j})$.
Note that $(d_{2^i+j} - d'_{2^i+j})^2 = \delta_{2^i+j}^2$.
For $i = k$,
$$S_{k+1} = \sqrt{\sum_{j=0}^{2^{k+1}-1} (x_{k+1,j} - y_{k+1,j})^2}$$
$$= \left( (x_{k+1,0} - y_{k+1,0})^2 + (x_{k+1,1} - y_{k+1,1})^2 + \cdots + (x_{k+1,2^{k+1}-1} - y_{k+1,2^{k+1}-1})^2 \right)^{1/2}$$
By Equation (5), we have
$$S_{k+1} = \left( 2 \left[ (x_{k,0} - y_{k,0})^2 + \delta_{2^k}^2 \right] + 2 \left[ (x_{k,1} - y_{k,1})^2 + \delta_{2^k+1}^2 \right] + \cdots + 2 \left[ (x_{k,2^k-1} - y_{k,2^k-1})^2 + \delta_{2^{k+1}-1}^2 \right] \right)^{1/2}$$
$$= \left( 2 \left[ (x_{k,0} - y_{k,0})^2 + (x_{k,1} - y_{k,1})^2 + \cdots + (x_{k,2^k-1} - y_{k,2^k-1})^2 + \delta_{2^k}^2 + \delta_{2^k+1}^2 + \cdots + \delta_{2^{k+1}-1}^2 \right] \right)^{1/2}$$
Finally by definition of $S_k$,
$$S_{k+1} = \sqrt{2 \left( S_k^2 + \delta_{2^k}^2 + \delta_{2^k+1}^2 + \cdots + \delta_{2^{k+1}-1}^2 \right)}$$
which completes the proof. $\Box$
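The recursion of Lemma 3 can be verified numerically on a small pair of sequences. The sketch below is an illustration under stated assumptions: the function names `haar_transform` and `distance_from_haar` and the sample sequences are hypothetical, the unnormalized factor 1/2 is used, and the list `delta` holds the coefficient-wise difference of the two Haar transforms, with `delta[0]` being the difference of the overall averages.

```python
import math

def haar_transform(values):
    """Unnormalized Haar transform (factor 1/2), length assumed to be a power of 2."""
    averages = list(values)
    coefficients = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        details = [(a - b) / 2 for a, b in pairs]
        averages = [(a + b) / 2 for a, b in pairs]
        coefficients = details + coefficients
    return averages + coefficients

def distance_from_haar(delta):
    """Apply S_{i+1} = sqrt(2 (S_i^2 + sum of squared differences at level i))."""
    s = abs(delta[0])                       # S_0
    start = 1
    while start < len(delta):
        band = delta[start:2 * start]       # indices 2^i .. 2^(i+1) - 1
        s = math.sqrt(2 * (s * s + sum(d * d for d in band)))
        start *= 2
    return s

# Hypothetical sample sequences of length 4.
x = [9.0, 7.0, 3.0, 5.0]
y = [1.0, 2.0, 3.0, 4.0]
delta = [a - b for a, b in zip(haar_transform(x), haar_transform(y))]
from_coefficients = distance_from_haar(delta)
direct = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

Here `delta` is (3.5, 3, 1.5, -0.5), the recursion gives S_1 = sqrt(42.5) and S_2 = sqrt(90), and the direct time-domain distance is also sqrt(64 + 25 + 0 + 1) = sqrt(90).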
The expression of the Euclidean distance between time sequences
in terms of their Haar coefficients is not sufficient for
proper use in multi-dimensional index trees unless Euclidean
distance is preserved in both Haar and time domains, as for DFT in
(1). This can be achieved by a normalization step which replaces
the scaling factor in Equation (2) from $1/2$ to $1/\sqrt{2}$ in the
Haar transformation. After the normalization step, Euclidean distance
between sequences in Haar domain will be equivalent to
$S_{\log_2 n}$ in Equation (3). The preservation of Euclidean distance
of Haar transform ensures the completeness of feature extraction as in
DFT.
If only the first $h$ dimensions ($1 \le h \le n$) of Haar transform
are used in calculation of Euclidean distance in Equation (3), then
the remaining coefficients in the Haar transformed sequences are
replaced by 0’s. This replacement runs from the $(h+1)$-th to the
$n$-th coefficients in the transformed sequences.
Lemma 4 If the first $h$ ($1 \le h \le n$) dimensions of Haar
transform are used, no false dismissal will occur for range queries.
Proof: Considering the inequality in Definition 1 and Lemma 3,
$$D(\vec{x}, \vec{y}) = S_{\log_2 n} \le \epsilon \qquad (6)$$
Using the first $h$ dimensions as index, the value of the coefficient
difference $\delta_i$ in Equation (3) will become zero for $i \ge h$.
Since every term under the square root in Equation (3) is
non-negative, the Euclidean distance between two sequences computed
from the first $h$ coefficients is $\le S_{\log_2 n} \le \epsilon$.
This completes the proof. $\Box$
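The effect of the normalization step and of Lemma 4 can be illustrated together. The following sketch is an assumption-laden example rather than the authors' code: `haar_normalized` and `euclidean` are hypothetical names, the sample sequences are made up, and the normalized transform is taken to use the factor 1/sqrt(2) at every level.

```python
import math

def haar_normalized(values):
    """Haar transform with scaling factor 1/sqrt(2); length must be a power of 2."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    root2 = math.sqrt(2)
    averages = list(values)
    coefficients = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        details = [(a - b) / root2 for a, b in pairs]
        averages = [(a + b) / root2 for a, b in pairs]
        coefficients = details + coefficients
    return averages + coefficients

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Hypothetical sample sequences of length 8.
x = [9.0, 7.0, 3.0, 5.0, 6.0, 8.0, 4.0, 2.0]
y = [8.0, 6.5, 4.0, 5.5, 5.0, 9.0, 3.0, 2.5]
hx, hy = haar_normalized(x), haar_normalized(y)

true_distance = euclidean(x, y)
# Distance computed from the first h coefficients only, for h = 1 .. n.
prefix_distances = [euclidean(hx[:h], hy[:h]) for h in range(1, len(x) + 1)]
```

With all n coefficients the distance in the Haar domain equals the time-domain distance (up to rounding), and every shorter prefix gives a value that is no larger, so a range query with threshold epsilon on the prefix cannot dismiss a sequence whose true distance is within epsilon.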
4. The Overall Strategy
In this section, we present the overall strategy of our time se-
ries matching method and propose our own method for nearest
neighbor query. Before querying is performed, we shall do some
pre-processing to extract the feature vectors with reduced dimen-
sionality, and to build the index. After the index is built, content-
based search can be performed for two types of querying: range
querying and $n$-nearest-neighbors querying.
4.1. Pre-processing
Step 1 - Similarity Model Selection: According to their applications,
users may choose to use either the simple Euclidean distance
(Definition 1) or the v-shift similarity (Definition 2) as their sim-
ilarity measurements. For Definition 1, Haar transform is applied
to time series. For Definition 2, Haar transform is applied to time
series, but the first Haar coefficient will not be used in indexing, as
there is no need to match their average values.
Step 2 - Index Construction: Given a database of time series of
varying length, we pre-process the time series as follows. We
obtain the $w$-point Haar transform by applying Equation (2) with the
normalization factor to each subsequence obtained by a sliding window
of size $w$ over each sequence in the database. An index structure
such as an R-Tree is built, using the first $h$ Haar coefficients,
where $h$ is an optimal value found by experiments based on the
number of page accesses. This is because of a trade off between
post-processing cost and index dimension.
4.2. Range Query
After we have built the index, we can carry out range query or
nearest neighbor query evaluation. For range queries, two steps
are involved:
1. Similar sequences with distances $\le \epsilon$ from the query are
looked up in the index and returned.
2. A post processing step is applied on these sequences to find
the true distances in time domain to remove all false alarms.
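The pre-processing and the two range query steps above can be sketched end to end. This is a simplified illustration under explicit assumptions: a plain Python list scanned linearly stands in for the R-Tree, the names `build_index` and `range_query`, the sample database, the window size 4, the number of kept coefficients 2 and the threshold 1.0 are all hypothetical, and only Definition 1 is handled.

```python
import math

def haar_normalized(values):
    """Haar transform with scaling factor 1/sqrt(2); length assumed a power of 2."""
    root2 = math.sqrt(2)
    averages = list(values)
    coefficients = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        details = [(a - b) / root2 for a, b in pairs]
        averages = [(a + b) / root2 for a, b in pairs]
        coefficients = details + coefficients
    return averages + coefficients

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def build_index(series, window, h):
    """Map every sliding-window subsequence to its first h Haar coefficients."""
    entries = []
    for sid, s in enumerate(series):
        for off in range(len(s) - window + 1):
            entries.append((haar_normalized(s[off:off + window])[:h], sid, off))
    return entries

def range_query(series, entries, query, eps, h):
    """Step 1: filter in the feature space. Step 2: remove false alarms."""
    qf = haar_normalized(query)[:h]
    candidates = [(sid, off) for feat, sid, off in entries if euclidean(feat, qf) <= eps]
    w = len(query)
    answers = [(sid, off) for sid, off in candidates
               if euclidean(series[sid][off:off + w], query) <= eps]
    return candidates, answers

# Hypothetical database of two sequences and a query of window length 4.
series = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
          [5.0, 5.0, 5.0, 5.0, 9.0, 7.0, 3.0, 5.0, 2.0, 2.0]]
query = [9.0, 7.0, 3.0, 5.0]
entries = build_index(series, window=4, h=2)
candidates, answers = range_query(series, entries, query, eps=1.0, h=2)
```

In this example the only true answer is the subsequence of the second sequence starting at offset 4; any other entries in `candidates` are false alarms that the post-processing step discards. Keeping more coefficients shrinks the candidate set at the price of a higher index dimension, which is the trade off mentioned in Step 2.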
4.3. Nearest Neighbor Query
For nearest neighbor query, we propose a two-phase evaluation
as follows.
Phase 1
In the first phase, $n$ nearest neighbors of query $\vec{q}$ are found
in the R-Tree index using the algorithm in [25]. The Euclidean
distances $D$ in time domain (full dimension) are
computed between the query sequence and all $n$ nearest
neighbors obtained, which are $D(\vec{q}, \vec{x}_i)$, where
$\vec{x}_i$ denotes the nearest neighbor $i$ ($1 \le i \le n$), with
$\vec{x}_n$ farthest from the query $\vec{q}$.
Phase 2
A range query evaluation is then performed on the same index
by setting $\epsilon = D(\vec{q}, \vec{x}_n)$ initially. During the search,
Footnote 3: Using Definition 2, one dimension can be saved in the index tree.
