Topic
Dynamic time warping
About: Dynamic time warping is a research topic. Over the lifetime, 6013 publications have been published within this topic receiving 133130 citations.
Papers published on a yearly basis
Papers
More filters
••
TL;DR: In this article, Hidden Markov Models (HMMs) are used to organize observed activity into meaningful states by minimizing the entropy of the joint distribution of the HMMs' internal state machine.
Abstract: Hidden Markov models (HMMs) have become the workhorses of the monitoring and event recognition literature because they bring to time-series analysis the utility of density estimation and the convenience of dynamic time warping. Once trained, the internals of these models are considered opaque; there is no effort to interpret the hidden states. We show that by minimizing the entropy of the joint distribution, an HMM's internal state machine can be made to organize observed activity into meaningful states. This has uses in video monitoring and annotation, low bit-rate coding of scene activity, and detection of anomalous behavior. We demonstrate with models of office activity and outdoor traffic, showing how the framework learns principal modes of activity and patterns of activity change. We then show how this framework can be adapted to infer hidden state from extremely ambiguous images, in particular, inferring 3D body orientation and pose from sequences of low-resolution silhouettes.
361 citations
01 Jan 2004
TL;DR: The Dynamic Time Warping distance measure is a technique that has long been known in speech recognition community and has been applied to a variety of problems in various disciplines, particularly in the last three years.
Abstract: The Dynamic Time Warping (DTW) distance measure is a technique that has long been known in speech recognition community. It allows a non-linear mapping of one signal to another by minimizing the distance between the two. A decade ago, DTW was introduced into Data Mining community as a utility for various tasks for time series problems including classification, clustering, and anomaly detection. The technique has flourished, particularly in the last three years, and has been applied to a variety of problems in various disciplines. In spite of DTW’s great success, there are still several persistent “myths” about it. These myths have caused confusion and led to much wasted research effort. In this work, we will dispel these myths with the most comprehensive set of time series experiments ever conducted.
354 citations
••
01 Dec 2009TL;DR: An unsupervised learning framework is presented to address the problem of detecting spoken keywords by using segmental dynamic time warping to compare the Gaussian posteriorgrams between keyword samples and test utterances and obtaining the keyword detection result.
Abstract: In this paper, we present an unsupervised learning framework to address the problem of detecting spoken keywords. Without any transcription information, a Gaussian Mixture Model is trained to label speech frames with a Gaussian posteriorgram. Given one or more spoken examples of a keyword, we use segmental dynamic time warping to compare the Gaussian posteriorgrams between keyword samples and test utterances. The keyword detection result is then obtained by ranking the distortion scores of all the test utterances. We examine the TIMIT corpus as a development set to tune the parameters in our system, and the MIT Lecture corpus for more substantial evaluation. The results demonstrate the viability and effectiveness of our unsupervised learning framework on the keyword spotting task.
350 citations
••
TL;DR: It is shown that, based on a set of assumptions about the distributions of the distances, the warping algorithm that minimizes the overall probability of making a word error is the modified time Warping algorithm with unconstrained endpoints.
Abstract: The technique of dynamic time warping for time registration of a reference and test utterance has found widespread use in the areas of speaker verification and discrete word recognition. As originally proposed, the algorithm placed strong constraints on the possible set of dynamic paths-namely it was assumed that the initial and final frames of both the test and reference utterances were in exact time synchrony. Because of inherent practical difficulties with satisfying the assumptions under which the above constraints are valid, we have considered some modifications to the dynamic time warping algorithm. In particular, an algorithm in which an uncertainty exists in the registration both for initial and final frames was studied. Another modification constrains the dynamic path to follow (within a given range) the path which is locally optimum at each frame. This modification tends to work well when the location of the final frame of the test utterance is significantly in error due to breath noise, etc. To test the different time warping algorithms a set of ten isolated words spoken by 100 speakers was used. Probability density functions of the distances from each of the 100 versions of a word to a reference version of the word were estimated for each of three dynamic warping algorithms. From these data, it is shown that, based on a set of assumptions about the distributions of the distances, the warping algorithm that minimizes the overall probability of making a word error is the modified time warping algorithm with unconstrained endpoints. A discussion of this key result along with some ideas on where the other modifications would be most useful is included.
349 citations
••
07 May 1996TL;DR: An efficient means for estimating a linear frequency Warping factor and a simple mechanism for implementing frequency warping by modifying the filter-bank in mel-frequency cepstrum feature analysis are presented.
Abstract: In an effort to reduce the degradation in speech recognition performance caused by variation in vocal tract shape among speakers, a frequency warping approach to speaker normalization is investigated. A set of low complexity, maximum likelihood based frequency warping procedures have been applied to speaker normalization for a telephone based connected digit recognition task. This paper presents an efficient means for estimating a linear frequency warping factor and a simple mechanism for implementing frequency warping by modifying the filter-bank in mel-frequency cepstrum feature analysis. An experimental study comparing these techniques to other well-known techniques for reducing variability is described. The results showed that frequency warping was consistently able to reduce word error rate by 20% even for very short utterances.
344 citations