On The Marriage of Lp-norms and Edit Distance
Lei Chen
School of Computer Science
University of Waterloo
l6chen@uwaterloo.ca
Raymond Ng
Department of Computer Science
University of British Columbia
rng@cs.ubc.ca
Abstract
Existing studies on time series are based on
two categories of distance functions. The first
category consists of the Lp-norms. They are
metric distance functions but cannot support
local time shifting. The second category con-
sists of distance functions which are capable
of handling local time shifting but are non-
metric. The first contribution of this paper
is the proposal of a new distance function,
which we call ERP (“Edit distance with Real
Penalty”). Representing a marriage of L1-
norm and the edit distance, ERP can support
local time shifting, and is a metric.
The second contribution of the paper is the de-
velopment of pruning strategies for large time
series databases. Given that ERP is a met-
ric, one way to prune is to apply the trian-
gle inequality. Another way to prune is to
develop a lower bound on the ERP distance.
We propose such a lower bound, which has
the nice computational property that it can
be efficiently indexed with a standard B+-
tree. Moreover, we show that these two ways
of pruning can be used simultaneously for
ERP distances. Specifically, the false posi-
tives obtained from the B+-tree can be further
minimized by applying the triangle inequal-
ity. Based on extensive experimentation with
existing benchmarks and techniques, we show
that this combination delivers superb pruning
power and search time performance, and dom-
inates all existing strategies.
Permission to copy without fee all or part of this material is
granted provided that the copies are not made or distributed for
direct commercial advantage, the VLDB copyright notice and
the title of the publication and its date appear, and notice is
given that copying is by permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires a fee
and/or special permission from the Endowment.
Proceedings of the 30th VLDB Conference,
Toronto, Canada, 2004
1 Introduction
Many applications require the retrieval of similar time series. Examples include financial data analysis and market prediction [1, 2, 10], moving object trajectory determination [6] and music retrieval [31]. Studies in this area revolve around two key issues: the choice of a distance function (similarity model), and the mechanism to improve retrieval efficiency.
Concerning the first issue, many distance functions have been considered, including Lp-norms [1, 10], dynamic time warping (DTW) [30, 18, 14], longest common subsequence (LCSS) [4, 25] and edit distance on real sequence (EDR) [6]. Lp-norms are easy to compute. However, they cannot handle local time shifting, which is essential for time series similarity matching. DTW, LCSS and EDR have been proposed precisely to deal with local time shifting. However, they are non-metric distance functions.
This leads to the second issue of improving retrieval
efficiency. Specifically, non-metric distance functions
complicate matters, as the violation of the triangle in-
equality renders most indexing structures inapplica-
ble. To this end, studies on this topic propose various
lower bounds on the actual distance to guarantee no
false dismissals [30, 18, 14, 31]. However, those lower
bounds can admit a high percentage of false positives.
In this paper, we consider both issues and explore
the following questions.
Is there a way to combine Lp-norms and the other distance functions so that we can get the best of both worlds, namely being able to support local time shifting and being a metric distance function?
With such a metric distance function, we can apply the triangle inequality for pruning, but can we develop a lower bound for the distance function? If so, is lower bounding more efficient than applying the triangle inequality? Or, is it possible to do both?
Our contributions are as follows:
We propose in Section 3 a distance function which we call Edit distance with Real Penalty (ERP). It can be viewed as a variant of the L1-norm, except that it can support local time shifting. It can also be viewed as a variant of EDR and DTW, except that it is a metric distance function. We present benchmark results showing that this distance function is natural for time series data.
We propose in Section 4 a new lower bound for ERP, which can be efficiently indexed with a standard B+-tree. Given that ERP is a metric distance function, we can also apply the triangle inequality. We present benchmark results in Section 5 comparing the efficiency of lower bounding versus applying the triangle inequality.
Last but not least, we develop in Section 4 a k-nearest neighbor (k-NN) algorithm that applies both lower bounding and the triangle inequality. We give extensive experimental results in Section 5 showing that this algorithm gets the best of both paradigms, delivers superb retrieval efficiency and dominates all existing strategies.
2 Related Work
Many studies on similarity-based retrieval of time se-
ries were conducted in the past decade. The pioneering
work by Agrawal et al. [1] used Euclidean distance to
measure similarity. Discrete Fourier Transform (DFT)
was used as a dimensionality reduction technique for
time series data, and an R-tree was used as the index
structure. Faloutsos et al. [10] extended this work to
allow subsequence matching and proposed the GEMINI framework for indexing time series. The key is the use of a lower bound on the true distance to guarantee no false dismissals when the index is used as a filter.
Subsequent work has focused on two main aspects: new dimensionality reduction techniques (assuming that the Euclidean distance is the similarity measure), and new approaches for measuring the similarity between two time series. Examples of dimensionality reduction techniques include Singular Value Decomposition [19], Discrete Wavelet Transform [20, 22], Piecewise Aggregate Approximation [15, 29], and Adaptive Piecewise Constant Approximation [14].
The motivation for seeking new similarity measures is that the Euclidean distance is very weak at handling noise and local time shifting. Berndt and Clifford [3] introduced DTW to allow a time series to be
“stretched” to provide a better match with another
time series. Das et al. [9] and Vlachos et al. [25] ap-
plied the LCSS measure to time series matching. Chen
et al. [6] applied EDR to trajectory data retrieval and
proposed a dimensionality reduction technique via a
symbolic representation of trajectories. However, none
of DTW, LCSS and EDR is a metric distance function
for time series.
Most of the approaches on indexing time series follow the GEMINI framework. However, if the distance measure is a metric, then existing indexing structures proposed for metrics may be applicable. Examples include the MVP-tree [5], the M-tree [8], the Sa-tree [21], and the OMNI-family of access methods [11]. A survey of metric space indexing is given in [7]. In our experiments, we pick M-trees and OMNI-sequential as the strawman structures for comparison; MVP-trees and Sa-trees are not compared because they are main memory resident structures. The other access methods of the OMNI-family are not used because the dimensionality of OMNI-coordinates is high (e.g., 20), which may lead to the dimensionality curse [28]. In general, a common strategy to apply the triangle inequality for pruning is to use a set of reference points (time series in this case). Different studies propose different ways to choose the reference points. In our experiments, we compare our strategies in selecting reference points with the HF algorithm of the OMNI-family.

Symbol             Meaning
S                  a time series $[s_1, \ldots, s_n]$
Rest(S)            $[s_2, \ldots, s_n]$
$dist(s_i, r_i)$   the distance between two elements
$\tilde{S}$        S after being aligned with another series
DLB                a lower bound of the distance

Figure 1: Meanings of Symbols Used
3 Edit Distance With Real Penalty
3.1 Reviewing Existing Distance Functions
A time series S is defined as a sequence of real values, with each value $s_i$ sampled at a specific time, i.e., $S = [s_1, s_2, \ldots, s_n]$. The length of S is n, and the n values are referred to as the n elements. This sequence is called the raw representation of the time series. Given S, we can normalize it using its mean ($\mu$) and standard deviation ($\sigma$) [13]: $Norm(S) = [\frac{s_1 - \mu}{\sigma}, \frac{s_2 - \mu}{\sigma}, \ldots, \frac{s_n - \mu}{\sigma}]$. Normalization is recommended so that the distance between two time series is invariant to amplitude scaling and (global) shifting of the time series. Throughout this paper, we use S to denote Norm(S) for simplicity, even though all the results developed below apply to the raw representation as well. Figure 1 summarizes the main symbols used in this paper.
Given two time series R and S of the same length n, the L1-norm distance between R and S is $\sum_{i=1}^{n} dist(r_i, s_i) = \sum_{i=1}^{n} |r_i - s_i|$. This distance function satisfies the triangle inequality and is a metric. The problem in using the L1-norm for time series is that it requires the time series to be of the same length and does not support local time shifting.
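To make these preliminaries concrete, here is a minimal Python sketch of normalization and the L1-norm distance (the function names norm and l1_dist are ours, and plain lists of floats are assumed; this is an illustration, not the paper's code):

```python
import math

def norm(s):
    # Normalize a series by its mean and standard deviation (assumes sigma > 0).
    mu = sum(s) / len(s)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in s) / len(s))
    return [(x - mu) / sigma for x in s]

def l1_dist(r, s):
    # L1-norm distance; both series must have the same length.
    assert len(r) == len(s)
    return sum(abs(ri - si) for ri, si in zip(r, s))
```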
To cope with local time shifting, one can borrow
ideas from the domain of strings. A string is a se-
quence of elements, each of which is a symbol in an
alphabet. Two strings, possibly of different lengths,
are aligned so that they become identical with the
smallest number of added, deleted or changed sym-
bols. Among these three operations, deletion can be
treated as adding a symbol in the other string. Hereafter, we refer to an added symbol as a gap element. This distance is called the string edit distance. The cost/distance of introducing a gap element is set to 1:

$$dist(r_i, s_i) = \begin{cases}
0 & \text{if } r_i = s_i \\
1 & \text{if } r_i \text{ or } s_i \text{ is a gap} \\
1 & \text{otherwise}
\end{cases} \quad (1)$$

In the above formula, we highlight the second case to indicate that if a gap is introduced in the alignment, the cost is 1. The string edit distance satisfies the triangle inequality and is a metric [27].
To generalize from strings to time series, the complication is that the elements $r_i$ and $s_i$ are not symbols, but real values. For most applications, strict equality would not make sense as, for instance, the pair $r_i = 1, s_i = 2$ should be considered more similar than the pair $r_i = 1, s_i = 10000$. To take the real values into account, one way is to relax equality to be within a certain tolerance $\delta$:

$$dist_{edr}(r_i, s_i) = \begin{cases}
0 & \text{if } |r_i - s_i| \le \delta \\
1 & \text{if } r_i \text{ or } s_i \text{ is a gap} \\
1 & \text{otherwise}
\end{cases} \quad (2)$$

This is a simple generalization of Formula (1).
Based on Formula (2) on individual elements and gaps, the edit distance between two time sequences R and S, of length m and n respectively, is defined in [6] as Formula (3) in Figure 2. Here $r_1$ and Rest(R) denote the first element and the remaining sequence of R, respectively. Notice that, given Formula (2), the last case in Formula (3) can be simplified to $\min\{EDR(Rest(R), Rest(S)) + 1,\ EDR(Rest(R), S) + 1,\ EDR(R, Rest(S)) + 1\}$. Local time shifting is essentially implemented by a dynamic-programming style minimization of the above three possibilities.
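As an illustration of Formula (3), a bottom-up dynamic-programming sketch in Python might look as follows (the function name edr and the full table layout are our choices, not the paper's implementation):

```python
def edr(r, s, delta):
    # Edit Distance on Real sequence (Formula 3), computed bottom-up.
    m, n = len(r), len(s)
    # dp[i][j] = EDR between the first i elements of r and the first j elements of s.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i              # i elements of r matched against gaps
    for j in range(n + 1):
        dp[0][j] = j              # j elements of s matched against gaps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            subcost = 0 if abs(r[i - 1] - s[j - 1]) <= delta else 1
            dp[i][j] = min(dp[i - 1][j - 1] + subcost,   # match or change
                           dp[i - 1][j] + 1,             # gap in s
                           dp[i][j - 1] + 1)             # gap in r
    return dp[m][n]
```

On the example discussed next (Q = [0], R = [1, 2], S = [2, 3, 3], delta = 1), this sketch returns EDR(Q, R) = 1, EDR(R, S) = 1 and EDR(Q, S) = 3.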
While EDR can handle local time shifting, it no longer satisfies the triangle inequality. The problem arises precisely from relaxing equality, i.e., $|r_i - s_i| \le \delta$. More specifically, for three elements $q_i, r_i, s_i$, we can have $|q_i - r_i| \le \delta$ and $|r_i - s_i| \le \delta$, but $|q_i - s_i| > \delta$.
To illustrate, let us consider a very simple example of three time series: Q = [0], R = [1, 2] and S = [2, 3, 3]. Let $\delta = 1$. To best match R, Q is aligned to be $\tilde{Q} = [0, -]$, where the symbol "-" denotes a gap. (There may exist many alternative ways to align sequences to get their best match. We only show one of the possible alignments for simplicity.) Thus, EDR(Q, R) = 0 + 1 = 1. Similarly, to best match S, R is aligned to be $\tilde{R} = [1, 2, -]$, giving rise to EDR(R, S) = 1. Finally, to best match S, Q is aligned to be $\tilde{Q} = [0, -, -]$, leading to EDR(Q, S) = 3 > EDR(Q, R) + EDR(R, S) = 1 + 1 = 2!
DTW differs from EDR in two key ways, summarized in the following formula:

$$dist_{dtw}(r_i, s_i) = \begin{cases}
|r_i - s_i| & \text{if } r_i, s_i \text{ are not gaps} \\
|r_i - s_{i-1}| & \text{if } s_i \text{ is a gap} \\
|s_i - r_{i-1}| & \text{if } r_i \text{ is a gap}
\end{cases} \quad (6)$$
First, unlike EDR, DTW does not use a $\delta$ threshold to relax equality; the actual L1-norm is used. Second, unlike EDR, there is no explicit gap concept introduced in its original definition [3]. We treat the elements replicated during the process of aligning two sequences as the gaps of DTW. Therefore, the cost of a gap is not set to 1 as in EDR; it amounts to replicating the previous element, based on which the L1-norm is computed. Based on the above formula, the dynamic time warping distance between two time series, denoted as DTW(R, S), is defined formally as Formula (4) in Figure 2. The last case in the formula deals with the possibilities of replicating either $s_{i-1}$ or $r_{i-1}$.
Let us repeat the previous example with DTW: Q = [0], R = [1, 2] and S = [2, 3, 3]. To best match R, Q is aligned to be $\tilde{Q} = [0, -] = [0, 0]$. Thus, DTW(Q, R) = 1 + 2 = 3. Similarly, to best match S, R is aligned to be $\tilde{R} = [1, 2, -] = [1, 2, 2]$, giving rise to DTW(R, S) = 3. Finally, to best match S, Q is aligned to be $\tilde{Q} = [0, -, -] = [0, 0, 0]$, leading to DTW(Q, S) = 8 > DTW(Q, R) + DTW(R, S) = 3 + 3 = 6.
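For comparison, here is a corresponding sketch of Formula (4), with gaps realized implicitly by the dp[i-1][j] and dp[i][j-1] transitions that repeat the previous element (the function name dtw is ours):

```python
def dtw(r, s):
    # Dynamic Time Warping with an L1 ground distance (Formula 4).
    m, n = len(r), len(s)
    INF = float("inf")
    dp = [[INF] * (n + 1) for _ in range(m + 1)]
    dp[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(r[i - 1] - s[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j - 1],   # advance in both series
                                  dp[i - 1][j],       # repeat s[j-1] (gap in s)
                                  dp[i][j - 1])       # repeat r[i-1] (gap in r)
    return dp[m][n]
```

On the same example, this sketch yields DTW(Q, R) = 3, DTW(R, S) = 3 and DTW(Q, S) = 8, as above.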
It has been shown in [24] that for speech applica-
tions, DTW “loosely” satisfies the triangle inequality.
We verified this observation with the 24 benchmark
data sets used in [14, 31]. It appears that this obser-
vation is not true in general, as on average nearly 30%
of all the triplets do not satisfy the triangle inequality.
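The fraction of violating triplets can be estimated directly. One way to count, using the dtw sketch above on a small list of series (the function triangle_violation_rate is our own, and treating a triplet as violating if any ordering of it violates the inequality is our interpretation):

```python
from itertools import combinations

def triangle_violation_rate(series, dist):
    # Fraction of unordered triplets for which some ordering violates
    # dist(a, c) <= dist(a, b) + dist(b, c).
    triplets = list(combinations(series, 3))
    violated = 0
    for a, b, c in triplets:
        d_ab, d_bc, d_ac = dist(a, b), dist(b, c), dist(a, c)
        if d_ac > d_ab + d_bc or d_ab > d_ac + d_bc or d_bc > d_ab + d_ac:
            violated += 1
    return violated / len(triplets) if triplets else 0.0

# Example: triangle_violation_rate(list_of_series, dtw)
```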
3.2 ERP and its Properties
The key reason why DTW does not satisfy the triangle inequality is that, when a gap needs to be added, it replicates the previous element. Thus, as shown in the second and third cases of Formula (6), the difference between an element and a gap varies according to $r_{i-1}$ or $s_{i-1}$. Contrast this situation with EDR, which makes every such difference a constant 1 (second case in Formula (2)). On the other hand, the problem for EDR lies in its use of a $\delta$ tolerance. DTW does not have this problem because it uses the L1-norm between two non-gap elements.
We propose ERP such that it uses a real penalty between two non-gap elements, but a constant value for computing the distance for gaps. Thus, ERP uses the following distance formula:

$$dist_{erp}(r_i, s_i) = \begin{cases}
|r_i - s_i| & \text{if } r_i, s_i \text{ are not gaps} \\
|r_i - g| & \text{if } s_i \text{ is a gap} \\
|s_i - g| & \text{if } r_i \text{ is a gap}
\end{cases} \quad (7)$$

where g is a constant value. Based on Formula (7), we define the ERP distance between two time series, denoted as ERP(R, S), as Formula (5) in Figure 2.

$$EDR(R, S) = \begin{cases}
n & \text{if } m = 0 \\
m & \text{if } n = 0 \\
EDR(Rest(R), Rest(S)) & \text{if } dist_{edr}(r_1, s_1) = 0 \\
\min\{EDR(Rest(R), Rest(S)) + dist_{edr}(r_1, s_1),\ EDR(Rest(R), S) + dist_{edr}(r_1, gap),\ EDR(R, Rest(S)) + dist_{edr}(gap, s_1)\} & \text{otherwise}
\end{cases} \quad (3)$$

$$DTW(R, S) = \begin{cases}
0 & \text{if } m = n = 0 \\
\infty & \text{if } m = 0 \text{ or } n = 0 \\
dist_{dtw}(r_1, s_1) + \min\{DTW(Rest(R), Rest(S)),\ DTW(Rest(R), S),\ DTW(R, Rest(S))\} & \text{otherwise}
\end{cases} \quad (4)$$

$$ERP(R, S) = \begin{cases}
\sum_{i=1}^{n} |s_i - g| & \text{if } m = 0 \\
\sum_{i=1}^{m} |r_i - g| & \text{if } n = 0 \\
\min\{ERP(Rest(R), Rest(S)) + dist_{erp}(r_1, s_1),\ ERP(Rest(R), S) + dist_{erp}(r_1, gap),\ ERP(R, Rest(S)) + dist_{erp}(s_1, gap)\} & \text{otherwise}
\end{cases} \quad (5)$$

Figure 2: Comparing the Distance Functions

A careful comparison of the formulas reveals that ERP can be seen as a combination of the L1-norm and EDR. ERP differs from EDR in avoiding the $\delta$ tolerance. On the other hand, ERP differs from DTW in not replicating the previous elements. The following lemma shows that for any fixed constant g, the triangle inequality is satisfied.
Lemma 1 For any three elements $q_i, r_i, s_i$, any of which can be a gap element, it is necessary that $dist(q_i, s_i) \le dist(q_i, r_i) + dist(r_i, s_i)$ based on Formula (7).

Theorem 1 Let Q, R, S be three time series of arbitrary length. Then it is necessary that $ERP(Q, S) \le ERP(Q, R) + ERP(R, S)$.
The proof of this theorem is a consequence of
Lemma 1 and the proof of the result by Waterman et
al. [27] on string edit distance. The Waterman proof
essentially shows that defining the distance between
two strings based on their best alignment in a dynamic
programming style preserves the triangle inequality, as
long as the underlying distance function also satisfies
the triangle inequality. The latter requirement is guar-
anteed by Lemma 1. Due to lack of space, we omit a
detailed proof.
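Formula (5) admits the same dynamic-programming treatment as EDR and DTW. Below is a minimal Python sketch, with the function name erp chosen by us and the gap value g defaulting to 0 (a choice justified in Section 3.2.1 below):

```python
def erp(r, s, g=0.0):
    # Edit distance with Real Penalty (Formula 5), computed bottom-up.
    m, n = len(r), len(s)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + abs(r[i - 1] - g)   # r matched entirely against gaps
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + abs(s[j - 1] - g)   # s matched entirely against gaps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(dp[i - 1][j - 1] + abs(r[i - 1] - s[j - 1]),  # no gap
                           dp[i - 1][j] + abs(r[i - 1] - g),             # gap in s
                           dp[i][j - 1] + abs(s[j - 1] - g))             # gap in r
    return dp[m][n]
```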
3.2.1 Picking a Value for g
A natural question to ask here is: what is an appropriate value of g? The above lemma says that any value of g, as long as it is fixed, satisfies the triangle inequality. We pick g = 0 for two reasons. First, g = 0 admits an intuitive geometric interpretation. Consider plotting the time series with the x-axis representing (equally-spaced) time points and the y-axis representing the values of the elements. In this case, the x-axis corresponds to g = 0. Thus, the distance between two time series R, S corresponds to the difference between the area under R and the area under S.
Second, to best match R, S is aligned to form $\tilde{S}$ with the addition of gap elements. However, since the gap elements are of value g = 0, it is easy to see that $\sum_i \tilde{s}_i = \sum_j s_j$, making the area under S and that under $\tilde{S}$ the same. The following lemma states this property. In the next section, we will see the computational significance of this lemma.

Lemma 2 Let R, S be two time series. By setting g = 0 in Formula (7), $\sum_i \tilde{s}_i = \sum_j s_j$, where S is aligned to form $\tilde{S}$ to match R.
Let us repeat the previous example with ERP: Q = [0], R = [1, 2] and S = [2, 3, 3]. To best match R, Q is aligned to be $\tilde{Q} = [0, 0]$. Thus, ERP(Q, R) = 1 + 2 = 3. Similarly, to best match S, R is aligned to be $\tilde{R} = [1, 2, 0]$, giving rise to ERP(R, S) = 5. Finally, to best match S, Q is aligned to be $\tilde{Q} = [0, 0, 0]$, leading to ERP(Q, S) = 8 $\le$ ERP(Q, R) + ERP(R, S) = 3 + 5 = 8, satisfying the triangle inequality.
To see how local time shifting works for ERP, let us change Q to [3] instead. Then ERP(Q, R) = 1 + 1 = 2, as $\tilde{Q} = [0, 3]$. Similarly, ERP(Q, S) = 2 + 3 = 5, as $\tilde{Q} = [0, 3, 0]$. The triangle inequality is satisfied as expected.
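Using the erp sketch from above, these numbers can be reproduced directly:

```python
# Reproducing the worked examples (g = 0):
print(erp([0], [1, 2]), erp([1, 2], [2, 3, 3]), erp([0], [2, 3, 3]))  # 3.0 5.0 8.0
print(erp([3], [1, 2]), erp([3], [2, 3, 3]))                          # 2.0 5.0
```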
Notice that none of the results in this section is restricted to the L1-norm. That is, if we use another Lp-norm to replace the L1-norm in Formula (7), the lemma and the theorem remain valid. For the rest of the paper, we continue with the L1-norm for simplicity.
3.3 On the Naturalness of ERP
Even though ERP is a metric distance function, it is
a valid question to ask whether ERP is “natural” for
time series. In general, whether a distance function is
natural mainly depends on the application semantics.
Nonetheless, we show two experiments below suggest-
ing that ERP appears to be at least as natural as the
existing distance functions.
The first experiment is a simple sanity check. We first generated a simple time series Q, shown in Figure 3. Then we generated 5 other time series ($T_1$ to $T_5$) by adding time shifting or noise data at one or two positions of Q, as shown in Figure 3. For example, $T_1$ was generated by shifting the sequence values of Q to the left starting from position 4, and $T_2$ was derived from Q by introducing noise at position 4. Finally, we used the L1-norm, DTW, EDR, ERP and LCSS to rank the five time series relative to Q. The rankings are listed left to right, with the leftmost being the most similar to Q. The rankings are as follows:
Figure 3: Subjective Evaluation of Distance Functions
L1-norm: $T_1, T_4, T_5, T_3, T_2$
LCSS: $T_1, \{T_2, T_3, T_4\}, T_5$
EDR: $T_1, \{T_2, T_3\}, T_4, T_5$
DTW: $T_1, T_4, T_3, T_5, T_2$
ERP: $T_1, T_2, T_4, T_5, T_3$
As shown by the above results, the L1-norm is sensitive to noise, as $T_2$ is considered the worst match. LCSS focuses only on the matched parts and ignores all the unmatched portions. As such, it gives $T_2, T_3, T_4$ the same rank, and considers $T_5$ the worst match. EDR gives $T_2, T_3$ the same rank, higher than $T_4$. DTW gives $T_3$ a higher rank than $T_5$. Finally, ERP gives a ranked list different from all the others. Notice that the point here is not that ERP is the most natural. Rather, the point is that ERP appears to be no worse, if not better, than the existing distance functions.
In the second experiment, we turn to a more objective evaluation. Recently, Keogh et al. [17] have proposed using classification on labelled data to evaluate the efficacy of a distance function on time series. Specifically, each time series is assigned a class label. Then the "leave one out" prediction mechanism is applied to each time series in turn. That is, the class label of the chosen time series is predicted to be the class label of its nearest neighbour, defined based on the given distance function. If the prediction is correct, then it is a hit; otherwise, it is a miss. The classification error rate is defined as the ratio of the number of misses to the total number of time series. In the table below, we show the average classification error rate using three benchmarks: the Cylinder-Bell-Funnel (CBFtr) data [12, 14], the ASL data [25] and the "cameramouse" (CM) data [25]. (All can be downloaded from http://db.uwaterloo.ca/l6chen/testdata.) Compared to the standard CBF data [17], temporal shifting is introduced in the CBFtr data set. The CBFtr data set is a 3-class problem. The ASL data set, from the UCI KDD archive, consists of signs from the Australian Sign Language; it is a 10-class problem. The "cameramouse" data set contains 15 trajectories of 5 classes (words), 3 for each word. As shown in the table below, for all three data sets ERP performs (one of) the best, showing that it is not dominated by the other well-known alternatives.
Avg. Error Rate   L1     DTW    LCSS   EDR    ERP
CBFtr             0.03   0.01   0.01   0.01   0.01
ASL               0.16   0.10   0.11   0.11   0.09
CM                0.40   0.00   0.06   0.00   0.00
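For illustration, the "leave one out" 1-NN error rate described above can be computed with a short sketch like the following (the function name loo_error_rate and the (series, label) pair format are our assumptions, not the benchmark code used in the paper):

```python
def loo_error_rate(dataset, dist):
    # dataset: list of (time_series, class_label) pairs.
    # Predict each series' label from its nearest neighbour among the rest.
    misses = 0
    for i, (s, label) in enumerate(dataset):
        best_d, best_label = float("inf"), None
        for j, (t, t_label) in enumerate(dataset):
            if i == j:
                continue
            d = dist(s, t)
            if d < best_d:
                best_d, best_label = d, t_label
        if best_label != label:
            misses += 1
    return misses / len(dataset)

# Example: loo_error_rate(labelled_series, erp), where labelled_series is hypothetical.
```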
4 Indexing for ERP
Recall from Figure 2 that ERP can be seen as a vari-
ant of EDR and DTW. In particular, they share the
same computational behavior. Thus, like EDR and
DTW, it takes O(mn) time to compute ERP (Q, S)
for time series Q, S of length m, n respectively. For
large time series databases, it is important that for a
given query Q, we try to minimize the computation of
the true distance between Q and S for all series S in
the database. The topic explored here is indexing for
k-NN queries. An extension to range queries is rather
straightforward; we omit details for brevity.
Given that ERP is a metric distance function, one
obvious way to prune is to apply the triangle inequal-
ity. In Section 4.1, we present an algorithm to do just
that. Metric or not, another common way to prune
is to apply the GEMINI framework that is, using
lower bounds to guarantee no false negatives. Specif-
ically, even though DTW is not a metric, three lower
bounds have been proposed [30, 18, 14]. In Section
4.2.1, we show how to adapt these lower bounds for
ERP. In Section 4.2.2, we propose a new lower bound
for ERP. The beauty of this lower bound is that it can
be indexed by a simple B+-tree.
4.1 Pruning by the Triangle Inequality
The procedure TrianglePruning, shown in Figure 4, gives a skeleton of how the triangle inequality is applied. The array procArray stores the true ERP distances computed so far. That is, if $\{R_1, \ldots, R_u\}$ is the set of time series for which $ERP(Q, R_i)$ has been computed, the distance $ERP(Q, R_i)$ is recorded in procArray. Thus, for the time series S currently being evaluated, the triangle inequality ensures that $ERP(Q, S) \ge ERP(Q, R_i) - ERP(R_i, S)$ for all $1 \le i \le u$. Thus, it is necessary that $ERP(Q, S) \ge \max_{1 \le i \le u}\{ERP(Q, R_i) - ERP(R_i, S)\}$. This is implemented in lines 2 to 4. If this distance maxPruneDist is already worse than the current k-NN distance stored in result, then S can be skipped entirely. Otherwise, the true distance ERP(Q, S) is computed, and procArray is updated to include S. Finally, the result array is updated, if necessary, to reflect the current k-NN neighbours and distances in sorted order.
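A simplified rendering of this pruning logic as a sequential k-NN scan follows; it tracks the description above but is not the paper's Figure 4 pseudocode (the names knn_triangle_scan and ref_dists, a map of precomputed pairwise ERP distances between stored series, are ours):

```python
import heapq

def knn_triangle_scan(query, database, k, dist, ref_dists):
    # database: list of time series; ref_dists[(i, j)] = dist(database[i], database[j]),
    # precomputed for the pairs available (e.g., against chosen reference series).
    proc = {}        # index -> true distance to the query (the procArray)
    heap = []        # max-heap via negation: current k nearest as (-distance, index)
    for j, s in enumerate(database):
        # Lower-bound dist(query, s) via the triangle inequality (cf. lines 2 to 4).
        max_prune = 0.0
        for i, d_qi in proc.items():
            d_ij = ref_dists.get((i, j), ref_dists.get((j, i)))
            if d_ij is not None:
                max_prune = max(max_prune, d_qi - d_ij)
        if len(heap) == k and max_prune >= -heap[0][0]:
            continue                     # cannot improve the current k-NN: skip s
        d = dist(query, s)               # expensive true distance computation
        proc[j] = d
        heapq.heappush(heap, (-d, j))
        if len(heap) > k:
            heapq.heappop(heap)
    return sorted((-nd, j) for nd, j in heap)

# Example: knn_triangle_scan(q, db, 5, erp, precomputed_erp_distances)
```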
The algorithm given in Figure 5 shows how the
result and procArray should be initialized when the
procedure TrianglePruning is called repeatedly in line
4. Line 3 of the algorithm represents a simple sequen-
tial scan of all the time series in the database. Note
that we are not saying that a sequential scan should be
used. We include it for two reasons. The first reason
is to show how the procedure TrianglePruning can be