scispace - formally typeset
Search or ask a question
Topic

Dynamic time warping

About: Dynamic time warping is a research topic. Over the lifetime, 6013 publications have been published within this topic receiving 133130 citations.


Papers
More filters
Journal ArticleDOI
TL;DR: In this article, Hidden Markov Models (HMMs) are used to organize observed activity into meaningful states by minimizing the entropy of the joint distribution of the HMMs' internal state machine.
Abstract: Hidden Markov models (HMMs) have become the workhorses of the monitoring and event recognition literature because they bring to time-series analysis the utility of density estimation and the convenience of dynamic time warping. Once trained, the internals of these models are considered opaque; there is no effort to interpret the hidden states. We show that by minimizing the entropy of the joint distribution, an HMM's internal state machine can be made to organize observed activity into meaningful states. This has uses in video monitoring and annotation, low bit-rate coding of scene activity, and detection of anomalous behavior. We demonstrate with models of office activity and outdoor traffic, showing how the framework learns principal modes of activity and patterns of activity change. We then show how this framework can be adapted to infer hidden state from extremely ambiguous images, in particular, inferring 3D body orientation and pose from sequences of low-resolution silhouettes.

361 citations

01 Jan 2004
TL;DR: The Dynamic Time Warping distance measure is a technique that has long been known in speech recognition community and has been applied to a variety of problems in various disciplines, particularly in the last three years.
Abstract: The Dynamic Time Warping (DTW) distance measure is a technique that has long been known in speech recognition community. It allows a non-linear mapping of one signal to another by minimizing the distance between the two. A decade ago, DTW was introduced into Data Mining community as a utility for various tasks for time series problems including classification, clustering, and anomaly detection. The technique has flourished, particularly in the last three years, and has been applied to a variety of problems in various disciplines. In spite of DTW’s great success, there are still several persistent “myths” about it. These myths have caused confusion and led to much wasted research effort. In this work, we will dispel these myths with the most comprehensive set of time series experiments ever conducted.

354 citations

Proceedings ArticleDOI
01 Dec 2009
TL;DR: An unsupervised learning framework is presented to address the problem of detecting spoken keywords by using segmental dynamic time warping to compare the Gaussian posteriorgrams between keyword samples and test utterances and obtaining the keyword detection result.
Abstract: In this paper, we present an unsupervised learning framework to address the problem of detecting spoken keywords. Without any transcription information, a Gaussian Mixture Model is trained to label speech frames with a Gaussian posteriorgram. Given one or more spoken examples of a keyword, we use segmental dynamic time warping to compare the Gaussian posteriorgrams between keyword samples and test utterances. The keyword detection result is then obtained by ranking the distortion scores of all the test utterances. We examine the TIMIT corpus as a development set to tune the parameters in our system, and the MIT Lecture corpus for more substantial evaluation. The results demonstrate the viability and effectiveness of our unsupervised learning framework on the keyword spotting task.

350 citations

Journal ArticleDOI
TL;DR: It is shown that, based on a set of assumptions about the distributions of the distances, the warping algorithm that minimizes the overall probability of making a word error is the modified time Warping algorithm with unconstrained endpoints.
Abstract: The technique of dynamic time warping for time registration of a reference and test utterance has found widespread use in the areas of speaker verification and discrete word recognition. As originally proposed, the algorithm placed strong constraints on the possible set of dynamic paths-namely it was assumed that the initial and final frames of both the test and reference utterances were in exact time synchrony. Because of inherent practical difficulties with satisfying the assumptions under which the above constraints are valid, we have considered some modifications to the dynamic time warping algorithm. In particular, an algorithm in which an uncertainty exists in the registration both for initial and final frames was studied. Another modification constrains the dynamic path to follow (within a given range) the path which is locally optimum at each frame. This modification tends to work well when the location of the final frame of the test utterance is significantly in error due to breath noise, etc. To test the different time warping algorithms a set of ten isolated words spoken by 100 speakers was used. Probability density functions of the distances from each of the 100 versions of a word to a reference version of the word were estimated for each of three dynamic warping algorithms. From these data, it is shown that, based on a set of assumptions about the distributions of the distances, the warping algorithm that minimizes the overall probability of making a word error is the modified time warping algorithm with unconstrained endpoints. A discussion of this key result along with some ideas on where the other modifications would be most useful is included.

349 citations

Proceedings ArticleDOI
Li Lee1, Richard Rose1
07 May 1996
TL;DR: An efficient means for estimating a linear frequency Warping factor and a simple mechanism for implementing frequency warping by modifying the filter-bank in mel-frequency cepstrum feature analysis are presented.
Abstract: In an effort to reduce the degradation in speech recognition performance caused by variation in vocal tract shape among speakers, a frequency warping approach to speaker normalization is investigated. A set of low complexity, maximum likelihood based frequency warping procedures have been applied to speaker normalization for a telephone based connected digit recognition task. This paper presents an efficient means for estimating a linear frequency warping factor and a simple mechanism for implementing frequency warping by modifying the filter-bank in mel-frequency cepstrum feature analysis. An experimental study comparing these techniques to other well-known techniques for reducing variability is described. The results showed that frequency warping was consistently able to reduce word error rate by 20% even for very short utterances.

344 citations


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
91% related
Convolutional neural network
74.7K papers, 2M citations
87% related
Deep learning
79.8K papers, 2.1M citations
87% related
Image segmentation
79.6K papers, 1.8M citations
86% related
Artificial neural network
207K papers, 4.5M citations
84% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
2023236
2022471
2021341
2020416
2019420
2018377