Showing papers on "Dynamic time warping" published in 2002


Proceedings ArticleDOI
26 Feb 2002
TL;DR: This work formalizes non-metric similarity functions based on the longest common subsequence (LCSS), which are very robust to noise and furthermore provide an intuitive notion of similarity between trajectories by giving more weight to similar portions of the sequences.
Abstract: We investigate techniques for analysis and retrieval of object trajectories in two or three dimensional space. Such data usually contain a large amount of noise, which has caused previously used metrics to fail. Therefore, we formalize non-metric similarity functions based on the longest common subsequence (LCSS), which are very robust to noise and furthermore provide an intuitive notion of similarity between trajectories by giving more weight to similar portions of the sequences. Stretching of sequences in time is allowed, as well as global translation of the sequences in space. Efficient approximate algorithms that compute these similarity measures are also provided. We compare these new methods to the widely used Euclidean and time warping distance functions (for real and synthetic data) and show the superiority of our approach, especially in the strong presence of noise. We prove a weaker version of the triangle inequality and employ it in an indexing structure to answer nearest neighbor queries. Finally, we present experimental results that validate the accuracy and efficiency of our approach.

1,504 citations
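The LCSS idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm: the parameter names `eps` (spatial matching tolerance) and `delta` (how far apart in time two matched samples may be) are assumptions introduced here, and the global translation and the approximate algorithms mentioned in the abstract are not covered.

```python
def lcss_similarity(a, b, eps, delta):
    """Longest common subsequence length of a and b, divided by the shorter length.

    a, b  : sequences of equal-dimension points, e.g. [(x, y), ...]
    eps   : hypothetical spatial tolerance; points match if every coordinate differs by less than eps
    delta : hypothetical time tolerance; samples i and j may match only if |i - j| <= delta
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return 0.0
    # table[i][j] holds the LCSS length of the first i points of a and the first j points of b.
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            near_in_space = all(abs(p - q) < eps for p, q in zip(a[i - 1], b[j - 1]))
            near_in_time = abs(i - j) <= delta
            if near_in_space and near_in_time:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m] / min(n, m)


clean = [(0, 0), (1, 1), (2, 2), (3, 3)]
noisy = [(0, 0), (1, 1), (50, 50), (2, 2), (3, 3)]  # one outlier inserted
print(lcss_similarity(clean, noisy, eps=0.5, delta=2))  # 1.0: the outlier is skipped
```

Because unmatched samples are simply skipped rather than forced into an alignment, a single outlier does not lower the score in this example, which is the robustness to noise the abstract refers to.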


Book ChapterDOI
01 Jan 2002
TL;DR: Dynamic time warping (DTW) is a much more robust distance measure for time series, allowing similar shapes to match even if they are out of phase in the time axis, but does not obey the triangular inequality and, thus, has resisted attempts at exact indexing.
Abstract: Publisher summary: The indexing of very large time series databases has attracted much research interest in the database community in recent years. The vast majority of work in this area has focused on indexing under the Euclidean distance metric or some variation thereof. However, it has been forcefully shown that the Euclidean distance is a very brittle distance measure. Dynamic time warping (DTW) is a much more robust distance measure for time series, allowing similar shapes to match even if they are out of phase in the time axis. Because of this flexibility, DTW is widely used in science, medicine, industry, and finance. Unfortunately, however, DTW does not obey the triangular inequality and, thus, has resisted attempts at exact indexing. Instead, many researchers have introduced approximate indexing techniques, or abandoned the idea of indexing and concentrated on speeding up sequential search.

1,033 citations
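Two claims in the abstract above (DTW matches shapes that are out of phase, and DTW does not obey the triangular inequality) can be checked with a small sketch. It assumes the textbook dynamic-programming formulation with an absolute-difference cost and no warping-window constraint; the function names and toy sequences are illustrative and not taken from the chapter.

```python
import math


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW with absolute-difference local cost."""
    n, m = len(a), len(b)
    # acc[i][j] is the cheapest cost of aligning the first i samples of a with the first j of b.
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def euclidean_distance(a, b):
    """Point-by-point Euclidean distance for equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# The same bump, shifted by one sample: Euclidean distance is 2.0, DTW is 0.0.
bump = [0, 0, 1, 2, 1, 0]
shifted = [0, 1, 2, 1, 0, 0]
print(euclidean_distance(bump, shifted), dtw_distance(bump, shifted))

# A triangle-inequality violation: DTW(x, z) = 8.0 exceeds DTW(x, y) + DTW(y, z) = 6.0.
x, y, z = [0], [1, 2], [2, 3, 3]
print(dtw_distance(x, z), dtw_distance(x, y) + dtw_distance(y, z))
```

The second example is why index structures that prune with the triangular inequality cannot be applied to DTW directly.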


Proceedings Article
20 Aug 2002
TL;DR: This work introduces a novel technique for the exact indexing of Dynamic Time Warping (DTW), proves that the method guarantees no false dismissals, and demonstrates its superiority over competing approaches in a large set of time series indexing experiments.
Abstract: The problem of indexing time series has attracted much research interest in the database community. Most algorithms used to index time series utilize the Euclidean distance or some variation thereof. However, it has been forcefully shown that the Euclidean distance is a very brittle distance measure. Dynamic Time Warping (DTW) is a much more robust distance measure for time series, allowing similar shapes to match even if they are out of phase in the time axis. Because of this flexibility, DTW is widely used in science, medicine, industry and finance. Unfortunately, however, DTW does not obey the triangular inequality, and thus has resisted attempts at exact indexing. Instead, many researchers have introduced approximate indexing techniques, or abandoned the idea of indexing and concentrated on speeding up sequential search. In this work we introduce a novel technique for the exact indexing of DTW. We prove that our method guarantees no false dismissals and we demonstrate its vast superiority over all competing approaches in the largest and most comprehensive set of time series indexing experiments ever undertaken.

668 citations


Proceedings ArticleDOI
06 Aug 2002
TL;DR: A novel classification approach for online handwriting recognition is described that combines dynamic time warping (DTW) and support vector machines (SVMs) by establishing a new SVM kernel, which directly addresses the problem of discrimination by creating class boundaries and thus is less sensitive to modeling assumptions.
Abstract: In this paper we describe a novel classification approach for online handwriting recognition. The technique combines dynamic time warping (DTW) and support vector machines (SVMs) by establishing a new SVM kernel. We call this kernel Gaussian DTW (GDTW) kernel. This kernel approach has a main advantage over common HMM techniques. It does not assume a model for the generative class conditional densities. Instead, it directly addresses the problem of discrimination by creating class boundaries and thus is less sensitive to modeling assumptions. By incorporating DTW in the kernel function, general classification problems with variable-sized sequential data can be handled. In this respect the proposed method can be straightforwardly applied to all classification problems, where DTW gives a reasonable distance measure, e.g., speech recognition or genome processing. We show experiments with this kernel approach on the UNIPEN handwriting data, achieving results comparable to an HMM-based technique.

377 citations
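One plausible reading of "incorporating DTW in the kernel function" in the abstract above is to replace the squared Euclidean distance of a Gaussian kernel with a DTW distance. The sketch below takes that as an assumption; the exact kernel definition, any path-length normalization, and the value of the hypothetical parameter `gamma` are not given in the abstract.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW with absolute-difference local cost (an assumed choice)."""
    n, m = len(a), len(b)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def gdtw_kernel(a, b, gamma=0.1):
    """Gaussian-style kernel on a DTW distance; gamma is a hypothetical width parameter."""
    return math.exp(-gamma * dtw_distance(a, b))


def gram_matrix(sequences, gamma=0.1):
    """Pairwise kernel values for variable-length sequences."""
    return [[gdtw_kernel(s, t, gamma) for t in sequences] for s in sequences]


# Three toy sequences of different lengths, standing in for pen trajectories.
strokes = [[0, 1, 2, 1], [0, 1, 1, 2, 1, 0], [5, 5, 4]]
for row in gram_matrix(strokes):
    print([round(value, 3) for value in row])
```

A matrix like this could be handed to an SVM implementation that accepts precomputed kernels. Note that a kernel built on DTW is not guaranteed to be positive semi-definite, so this should be read as a sketch of the idea rather than a drop-in component.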


Proceedings Article
01 Jan 2002
TL;DR: Almost all algorithms that operate on time series data need to compute the similarity between them, and Euclidean distance, or some extension or modification thereof, is typically used.
Abstract: Time series are a ubiquitous form of data occurring in virtually every scientific discipline and business application. There has been much recent work on adapting data mining algorithms to time series databases. For example, Das et al. attempt to show how association rules can be learned from time series [7]. Debregeas and Hebrail [8] demonstrate a technique for scaling up time series clustering algorithms to massive datasets. Keogh and Pazzani introduced a new, scalable time series classification algorithm [16]. Almost all algorithms that operate on time series data need to compute the similarity between them. Euclidean distance, or some extension or modification thereof, is typically used. However as we will demonstrate in Section 2.1, Euclidean distance can be an extremely brittle distance measure.

285 citations


Journal ArticleDOI
TL;DR: A time warping algorithm for alignment of LC-MS data in the chromatographic direction has been examined and with moderate time shifts present in the data, pre-processing with this algorithm yields approximately trilinear data for which reasonable models can be made.

226 citations


Journal ArticleDOI
TL;DR: Two techniques for alignment of profiles, namely dynamic time warping (DTW) and correlation optimized warping (COW), were tested and compared, with attention focused on chromatographic and spectroscopic profiles.

212 citations


Book ChapterDOI
28 May 2002
TL;DR: This work analyzes walking people using a gait sequence representation that bypasses the need for frame-to-frame tracking of body parts and finds that the frieze groups of the gait patterns and their canonical tiles enable us to estimate viewing direction of human walking videos.
Abstract: We analyze walking people using a gait sequence representation that bypasses the need for frame-to-frame tracking of body parts. The gait representation maps a video sequence of silhouettes into a pair of two-dimensional spatio-temporal patterns that are near-periodic along the time axis. Mathematically, such patterns are called "frieze" patterns and associated symmetry groups "frieze groups". With the help of a walking humanoid avatar, we explore variation in gait frieze patterns with respect to viewing angle, and find that the frieze groups of the gait patterns and their canonical tiles enable us to estimate viewing direction of human walking videos. In addition, analysis of periodic patterns allows us to determine the dynamic time warping and affine scaling that aligns two gait sequences from similar viewpoints. We also show how gait alignment can be used to perform human identification and model-based body part segmentation.

204 citations


Proceedings Article
01 Jan 2002
TL;DR: A system is described which measures the similarity of two arbitrary rhythmic patterns; in simulations, it behaved consistently by assigning high similarity measures to similar musical rhythms, even when performed using different sound sets.
Abstract: A system is described which measures the similarity of two arbitrary rhythmic patterns. The patterns are represented as acoustic signals, and are not assumed to have been performed with similar sound sets. Two novel methods are presented that constitute the algorithmic core of the system. First, a probabilistic musical meter estimation process is described, which segments a continuous musical signal into patterns. As a side-product, the method outputs tatum, tactus (beat), and measure lengths. A subsequent process performs the actual similarity measurements. Acoustic features are extracted which model the fluctuation of loudness and brightness within the pattern, and dynamic time warping is then applied to align the patterns to be compared. In simulations, the system behaved consistently by assigning high similarity measures to similar musical rhythms, even when performed using different sound sets.

135 citations


Proceedings ArticleDOI
02 Sep 2002
TL;DR: This work proposes the use of non-metric distance functions based on the longest common subsequence (LCSS), in conjunction with a sigmoidal matching function for similarity analysis of spatio-temporal trajectories for mobile objects.
Abstract: We investigate techniques for similarity analysis of spatio-temporal trajectories for mobile objects. Such data may contain a large number of outliers, which degrade the performance of Euclidean and time warping distance. Therefore, we propose the use of non-metric distance functions based on the longest common subsequence (LCSS), in conjunction with a sigmoidal matching function. Finally, we compare these new methods to various L_p norms and also to time warping distance (for real and synthetic data) and present experimental results that validate the accuracy and efficiency of our approach, especially in the presence of noise.

95 citations


Proceedings ArticleDOI
07 Aug 2002
TL;DR: An electrocardiogram (ECG) frame classification technique is presented, realized by dynamic time warping (DTW) matching, which has been used successfully in speech recognition; DTW is applied to ECG frames because ECG and speech signals have similar nonstationary characteristics.
Abstract: Presents an electrocardiogram (ECG) frame classification technique realized by a dynamic time warping (DTW) matching technique, which has been used successfully in speech recognition. We use the DTW to classify ECG frames because ECG and speech signals have similar nonstationary characteristics. The DTW mapping function is obtained by searching the frame from its end to start. A threshold is set up for the DTW matching residual, either to classify an ECG frame or to add a new class. Classification and establishment of a template set are carried out simultaneously. A frame is classified into a category with a minimal residual and satisfying a threshold requirement. A classification residual of 1.33% is achieved by the DTW for a 10-minute ECG recording.
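The threshold rule described in the abstract above (assign a frame to the template with the minimal residual if it satisfies the threshold, otherwise open a new class) can be sketched as follows. The residual here is taken to be a plain, unnormalized DTW distance and the threshold value is arbitrary; both are assumptions, since the abstract does not define them.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW with absolute-difference local cost (an assumed residual)."""
    n, m = len(a), len(b)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def classify_frames(frames, threshold):
    """Label frames and grow the template set in one pass over the data."""
    templates = []  # one representative frame per class
    labels = []
    for frame in frames:
        residuals = [dtw_distance(frame, t) for t in templates]
        # Index of the closest template, or None while the template set is empty.
        best = min(range(len(templates)), key=residuals.__getitem__, default=None)
        if best is not None and residuals[best] <= threshold:
            labels.append(best)
        else:
            templates.append(frame)  # no template is close enough: add a new class
            labels.append(len(templates) - 1)
    return labels, templates


frames = [[0, 1, 0], [0, 0, 1, 0], [5, 5, 5]]
labels, templates = classify_frames(frames, threshold=1.0)
print(labels, len(templates))  # [0, 0, 1] 2
```

The second frame is a time-stretched copy of the first, so its DTW residual is zero and it joins class 0, while the third frame is far from every template and starts a new class.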

Journal ArticleDOI
TL;DR: An isolated word speech recognition system based on dynamic time warping (DTW) has been developed, and its performance is evaluated using 12 words of the Lithuanian language pronounced ten times by ten speakers.
Abstract: An isolated word speech recognition system based on dynamic time warping (DTW) has been developed. Speaker adaptation is performed using speaker recognition techniques. Vector quantization is used to create reference templates for speaker recognition. Linear predictive coding (LPC) parameters are used as features for recognition. Performance is evaluated using 12 words of the Lithuanian language pronounced ten times by ten speakers.

Patent
Ajit V. Rao
27 Jun 2002
TL;DR: A signal modification technique for compact voice coding uses a continuous time warp contour; the algorithm evaluates only a subset of possible contours within a sub-range of the range of possible contours, models their relative correlation strengths as points on a polynomial trace, and calculates the optimum warp contour by maximizing the modeling function.
Abstract: A signal modification technique facilitates compact voice coding by employing a continuous, rather than piece-wise continuous, time warp contour to modify an original residual signal to match an idealized contour, avoiding edge effects caused by prior art techniques. Warping is executed using a continuous warp contour lacking spatial discontinuities which does not invert or overly distend the positions of adjacent end points in adjacent frames. The linear shift implemented by the warp contour is derived via quadratic approximation or other method, to reduce the complexity of coding to allow for practical and economical implementation. In particular, the algorithm for determining the warp contour uses only a subset of possible contours contained within a sub-range of the range of possible contours. The relative correlation strengths from these contours are modeled as points on a polynomial trace and the optimum warp contour is calculated by maximizing the modeling function.

Journal ArticleDOI
TL;DR: This paper explores the use of NFL for speaker identification in terms of limited data and examines how the NFL performs in such a vexing problem of various mismatches between training and test, and proposes an alternative method for similarity measure.

Book ChapterDOI
01 Jun 2002
TL;DR: A complete scheme for face recognition is presented, based on salient feature extraction in challenging conditions and performed without an a priori or learned model, which makes face recognition robust to low frequency variations as well as to high frequency variations.
Abstract: The utility of face recognition for multimedia indexing is enhanced by using accurate detection and alignment of salient invariant face features. Face recognition can be performed using template matching or a feature-based approach, but both these methods suffer from occlusion and require an a priori model for extracting information. To avoid these drawbacks, we present in this paper a complete scheme for face recognition based on salient feature extraction in challenging conditions, which is performed without an a priori or learned model. These features are used in a matching process that overcomes occlusion effects and facial expressions using dynamic space warping, which aligns each feature in the query image, if possible, with its corresponding feature in the gallery set. Thus, we make face recognition robust to low frequency variations (like the presence of occlusion, etc.) as well as to high frequency variations (like expression, gender, etc.). A maximum likelihood scheme is used to make the recognition process more precise, as is shown in the experiments.

Book ChapterDOI
03 Jun 2002
TL;DR: This paper constructs motion models to extract features of given motions more easily and proposes a measure of discrepancy between motions that shows how similar two motions are, normalizes the length of motions, and reduces the high dimensionality of the motion data, so that clustering can take place in a dimensionally reduced space.
Abstract: This paper concerns an essential, practical problem in the automatic animation of human-like figures with the support of information technologies connected with the motion capture domain. The main problem we want to solve is to partition a set of primitive motions into appropriate groups according to the similarity between motions. Up to now, experiments with systems of this kind have proved not quite adequate to the needs. In this situation, we were faced with the necessity of creating new methods to support the process of managing motion data. We construct motion models to extract features of given motions more easily. Using these models, we propose a measure of discrepancy between motions. It shows how similar two motions are to each other, normalizes the length of motions, and reduces the high dimensionality of the motion data, so that clustering can take place in a dimensionally reduced space.

Proceedings ArticleDOI
11 Aug 2002
TL;DR: It is claimed that a good set of prototypes can be formed from the combined results of different clustering algorithms; however, the number of clusters cannot be determined automatically, and some human intervention is required.
Abstract: This work reports experiments with four hierarchical clustering algorithms and two clustering indices for online handwritten character recognition. The main motivation of the work is to develop an automatic method for finding a set of prototypical characters which would represent well the different writing styles present in a large international database. One of the major obstacles in achieving this goal is the uneven representation of different writing styles in the database. On the basis of the results of the experiments, we claim that a good set of prototypes can be formed from the combined results of different clustering algorithms. However, the number of clusters cannot be determined automatically; some human intervention is required.

01 Jan 2002
TL;DR: Soundspotter is presented, which allows the user to select a specific passage within an audio file and retrieve perceptually similar passages, and comprises several alternative retrieval algorithms, including dynamic time warping and trajectory matching based on a self-organizing map.
Abstract: We present the audio retrieval system “Soundspotter,” which allows the user to select a specific passage within an audio file and retrieve perceptually similar passages. The system extracts frame-based features from the sound signal and performs pattern matching on the resulting sequences of feature vectors. Finally, an adjustable number of best matches is returned, ranked by their similarity to the reference passage. Soundspotter comprises several alternative retrieval algorithms, including dynamic time warping and trajectory matching based on a self-organizing map. We explain the algorithms and report initial results of a comparative evaluation.

Journal ArticleDOI
TL;DR: The new case-based reasoning algorithm with dynamic time warping as the measure of similarity allows extension of the use of automatic laboratory alerting systems to conditions in which abnormal laboratory results are the norm and critical states can be detected only by recognition of pathological changes over time.

Journal ArticleDOI
TL;DR: A neural network based statistical appearance model of the lips is presented that classifies pixels as belonging to the lips, skin, or inner mouth classes; lip-shape estimates mapped from blue marked-up image sequences reduce the parameter space dimensionality in the red-hue energy minimization, thus yielding better contour shape and location estimates.
Abstract: We aim at modeling the appearance of the lower face region to assist visual feature extraction for audio-visual speech processing applications. In this paper, we present a neural network based statistical appearance model of the lips which classifies pixels as belonging to the lips, skin, or inner mouth classes. This model requires labeled examples to be trained, and we propose to label images automatically by employing a lip-shape model and a red-hue energy function. To improve the performance of lip-tracking, we propose to use blue marked-up image sequences of the same subject uttering the identical sentences as natural nonmarked-up ones. The easily extracted lip shapes from blue images are then mapped to the natural ones using acoustic information. The lip-shape estimates obtained simplify lip-tracking on the natural images, as they reduce the parameter space dimensionality in the red-hue energy minimization, thus yielding better contour shape and location estimates. We applied the proposed method to a small audio-visual database of three subjects, achieving errors in pixel classification around 6%, compared to 3% for hand-placed contours and 20% for filtered red-hue.

Journal ArticleDOI
TL;DR: The results of the simulations showed that the adaptation strategies are able to improve the system's recognition rate and the prototype inactivation methods do reduce the harmful effects of erroneous learning samples.

Book ChapterDOI
01 Jan 2002
TL;DR: This paper analyses the different techniques used for speech recognition, identifies those that can be used for non-speech sound recognition, benchmarks these techniques, and determines which technique is better suited for non-speech sound recognition.
Abstract: This paper discusses the use of speech recognition techniques in non-speech sound recognition. It analyses the different techniques used for speech recognition and identifies those that can be used for non-speech sound recognition. It then performs benchmarks on these techniques and determines which technique is better suited for non-speech sound recognition. As a comparison, it also gives results for the use of learning vector quantization (LVQ) and artificial neural network (ANN) techniques in speech recognition.

Journal Article
TL;DR: This paper proposes an effective and efficient approach for shape-based retrieval of subsequences, built on a similarity model that supports various combinations of transformations such as shifting, scaling, moving average, and time warping.
Abstract: This paper deals with the problem of shape-based retrieval in time-series databases. The shape-based retrieval is defined as the operation that searches for the (sub)sequences whose shapes are similar to that of a given query sequence regardless of their actual element values. In this paper, we propose an effective and efficient approach for shape-based retrieval of subsequences. We first introduce a new similarity model for shape-based retrieval that supports various combinations of transformations such as shifting, scaling, moving average, and time warping. For efficient processing of the shape-based retrieval based on the similarity model, we also propose the indexing and query processing methods. To verify the superiority of our approach, we perform extensive experiments with the real-world S&P 500 stock data. The results reveal that our approach successfully finds all the subsequences that have the shapes similar to that of the query sequence, and also achieves significant speedup up to around 66 times compared with the sequential scan method.

Proceedings ArticleDOI
11 Mar 2002
TL;DR: This paper introduces a new similarity model for shape-based retrieval that supports various combinations of transformations such as shifting, scaling, moving average, and time warping and proposes the indexing and query processing methods.
Abstract: This paper deals with the problem of shape-based retrieval in time-series databases. The shape-based retrieval is defined as the operation that searches for the (sub)sequences whose shapes are similar to that of a given query sequence. In this paper, we propose an effective and efficient approach for shape-based retrieval of subsequences. We first introduce a new similarity model for shape-based retrieval that supports various combinations of transformations such as shifting, scaling, moving average, and time warping. For efficient processing of the shape-based retrieval, we also propose the indexing and query processing methods. To verify the superiority of our approach, we perform extensive experiments with the real-world S&P 500 stock data. The results reveal that our approach successfully finds all the subsequences that have the shapes similar to that of the query sequence, and also achieves significant speedup over the sequential scan method.

Dissertation
01 Jan 2002
TL;DR: This manuscript presents research on model-based parameter extraction from video sequences for automatic speechreading in natural, weakly constrained conditions and describes the proposed a posteriori lip shape and appearance models learnt from corpora.
Abstract: In this manuscript, we present our research on model-based parameter extraction from video sequences for automatic speechreading in natural, weakly constrained conditions. More precisely, we describe the a posteriori lip shape and appearance models learnt from corpora that we propose. To be trained, these models require that lips can be located easily on images, which is not the case on natural images. As manually labelling images is time-consuming, and hardly possible on a large corpus, we propose to use automatic methods instead, through the use of make-up and the bimodality of speech. First, we defined a shape model for the lips containing two polygons: one for the outer lip contour and the other for the inner lip contour. According to an in-depth bibliographical study, this model makes it possible to extract most lipreading information. To train this model statistically, we use video sequences where the speakers wear blue lipstick on their lips, which enables easy boundary extraction. We learn the mean shape and the main deformations. Next, we studied statistical appearance models, which can only be trained on natural images. On these images, automatic lip location without external constraints is still unsolved. To label lips automatically, we use two repetitions of the same sentence by the same subject, with and without blue make-up: once again, the blue sequence enables easy lip location, and dynamic time warping (DTW) makes it possible to estimate lip shape on natural images using the shapes extracted from blue images. The appearance model obtained is very similar to the one obtained when training the same initial model with hand-labeled images and is better than other models relying on hue. Moreover, the model we built can be adapted to any subject.

Book ChapterDOI
TL;DR: A complete automatic speech segmentation technique has been studied in order to eliminate the need for manually segmented sentences; its usefulness is that manually segmented data is not needed in order to train acoustic models.
Abstract: A complete automatic speech segmentation technique has been studied in order to eliminate the need for manually segmented sentences. The goal is to fix the phoneme boundaries using only the speech waveform and the phonetic sequence of the sentences. The phonetic boundaries are established using a Dynamic Time Warping algorithm that uses the a posteriori probabilities of each phonetic unit given the acoustic frame. These a posteriori probabilities are calculated by combining the probabilities of acoustic classes, which are obtained from a clustering procedure on the feature space, and the conditional probabilities of each acoustic class with respect to each phonetic unit. The usefulness of the approach presented here is that manually segmented data is not needed in order to train acoustic models. The results of the obtained segmentation are similar to those obtained using the HTK toolkit with the "flat-start" option activated. Finally, results using Artificial Neural Networks and manually segmented data are also reported for comparison purposes.

Journal ArticleDOI
TL;DR: A DTW-based statistical model is proposed to explore the subspace structures of speaker feature space for feature evaluation, dimension reduction and inter-class information discovery in pattern space to demonstrate its usefulness in isolated digits speaker identification.

Proceedings Article
07 Aug 2002
TL;DR: In this article, a generative mixture model is proposed to align and cluster sets of multidimensional curves measured on a discrete time grid, which allows both local nonlinear time warping and global linear shifts of the observed curves in both time and measurement spaces relative to the mean curves within the clusters.
Abstract: In this paper we present a family of models and learning algorithms that can simultaneously align and cluster sets of multidimensional curves measured on a discrete time grid. Our approach is based on a generative mixture model that allows both local nonlinear time warping and global linear shifts of the observed curves in both time and measurement spaces relative to the mean curves within the clusters. The resulting model can be viewed as a form of Bayesian network with a special temporal structure. The Expectation-Maximization (EM) algorithm is used to simultaneously recover both the curve models for each cluster, and the most likely alignments and cluster membership for each curve. We evaluate the methodology on two real-world data sets, and show that the Bayesian network models provide systematic improvements in predictive power over more conventional clustering approaches.

Book ChapterDOI
28 May 2002
TL;DR: The graph partitioning method, and in particular the Normalised Cut model originally introduced for static image segmentation, is extended to unsupervised clustering of temporal trajectories with fully automated model order selection.
Abstract: We present a novel approach for automatically learning models of temporal trajectories extracted from video data. Instead of using a representation of linearly time-normalised vectors of fixed length, our approach makes use of Dynamic Time Warp distance as a similarity measure to capture the underlying ordered structure of variable-length temporal data while removing the non-linear warping of the time scale. We reformulate the structure learning problem as an optimal graph partitioning of the dataset to solely exploit Dynamic Time Warp similarity weights without the need for intermediate cluster centroid representations. We extend the graph partitioning method, and in particular the Normalised Cut model originally introduced for static image segmentation, to unsupervised clustering of temporal trajectories with fully automated model order selection. By computing hierarchical average Dynamic Time Warp for each cluster, we learn warp-free trajectory models and recover the time warp profiles and structural variance in the data. We demonstrate the approach on modelling trajectories of continuous hand-gestures and moving objects in an indoor environment.

Patent
18 Dec 2002
TL;DR: The Gaussian Dynamic Time Warping model provides a hierarchical statistical model for representing an acoustic pattern, which is useful in speech processing applications, particularly word and speaker recognition.
Abstract: The Gaussian Dynamic Time Warping model provides a hierarchical statistical model for representing an acoustic pattern. The first layer of the model represents the general acoustic space; the second layer represents each speaker space and the third layer represents the temporal structure information contained in each enrollment speech utterance, based on equally-spaced time intervals. These three layers are hierarchically developed: the second layer is derived from the first, and the third layer is derived from the second. The model is useful in speech processing applications, particularly in applications such as word and speaker recognition, using a spotting recognition mode.