
Dynamic time warping

About: Dynamic time warping is a research topic. Over its lifetime, 6,013 publications have been published within this topic, receiving 133,130 citations.


Papers
Journal Article
TL;DR: The Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ) algorithms are constructed in this work for variable-length and warped feature sequences and good results have been obtained in speaker-independent speech recognition.
Abstract: The Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ) algorithms are constructed in this work for variable-length and warped feature sequences. The novelty is to associate an entire feature vector sequence, instead of a single feature vector, as a model with each SOM node. Dynamic time warping is used to obtain time-normalized distances between sequences with different lengths. Starting with random initialization, ordered feature sequence maps then ensue, and Learning Vector Quantization can be used to fine tune the prototype sequences for optimal class separation. The resulting SOM models, the prototype sequences, can then be used for the recognition as well as synthesis of patterns. Good results have been obtained in speaker-independent speech recognition.

170 citations
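
The time-normalized distance mentioned in the abstract above can be illustrated with a minimal dynamic-programming sketch. This is not the authors' implementation: it assumes scalar-valued sequences (the paper works with feature vectors), uses absolute difference as the local cost, and normalizes by the sum of the two lengths, which is one common convention among several. The function names are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    Assumption: a and b are non-empty sequences of numbers and the
    local cost is the absolute difference between two samples.
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats a sample
                cost[i][j - 1],      # b advances, a repeats a sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def dtw_normalized(a, b):
    """Length-normalized variant (assumed convention: divide by n + m)."""
    return dtw_distance(a, b) / (len(a) + len(b))
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero because the repeated sample is absorbed by the warping path, even though the two sequences differ in length.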

Posted Content
TL;DR: This paper incorporates the Lie group structure into a deep network architecture to learn more appropriate Lie group features for skeleton-based action recognition, and designs rotation mapping layers that transform the input Lie group features into desirable ones that are better aligned in the temporal domain.
Abstract: In recent years, skeleton-based action recognition has become a popular 3D classification problem. State-of-the-art methods typically first represent each motion sequence as a high-dimensional trajectory on a Lie group with an additional dynamic time warping, and then shallowly learn favorable Lie group features. In this paper we incorporate the Lie group structure into a deep network architecture to learn more appropriate Lie group features for 3D action recognition. Within the network structure, we design rotation mapping layers to transform the input Lie group features into desirable ones, which are aligned better in the temporal domain. To reduce the high feature dimensionality, the architecture is equipped with rotation pooling layers for the elements on the Lie group. Furthermore, we propose a logarithm mapping layer to map the resulting manifold data into a tangent space that facilitates the application of regular output layers for the final classification. Evaluations of the proposed network for standard 3D human action recognition datasets clearly demonstrate its superiority over existing shallow Lie group feature learning methods as well as most conventional deep learning methods.

170 citations
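
The logarithm mapping mentioned in the abstract above sends a point on the manifold to a flat tangent space where ordinary layers can operate. As a rough illustration only, and not the paper's layer, the sketch below computes the standard logarithm map of a single 3x3 rotation matrix to its axis-angle vector. The function names are hypothetical, matrices are plain nested lists, and rotations with an angle close to pi are not handled.

```python
import math


def rot_z(angle):
    """Rotation matrix about the z axis (helper for the example)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def so3_log(R):
    """Map a 3x3 rotation matrix to its axis-angle vector.

    Assumption: the rotation angle is well below pi; near pi the
    formula divides by a value close to zero and is unstable.
    """
    trace = R[0][0] + R[1][1] + R[2][2]
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    theta = math.acos(cos_theta)
    if theta < 1e-8:
        # Treated as the identity rotation: zero tangent vector.
        return [0.0, 0.0, 0.0]
    scale = theta / (2.0 * math.sin(theta))
    return [
        scale * (R[2][1] - R[1][2]),
        scale * (R[0][2] - R[2][0]),
        scale * (R[1][0] - R[0][1]),
    ]
```

Calling `so3_log(rot_z(math.pi / 2))` returns a vector along the z axis whose length is the rotation angle.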

Proceedings Article
16 May 2016
TL;DR: Through extensive experimentation on 72 datasets, it is demonstrated that the simple collective formed by including all classifiers in one ensemble is significantly more accurate than any of its components and any other previously published TSC algorithm.
Abstract: We have proposed an ensemble scheme for time series classification (TSC) based on constructing classifiers on different data representations. The standard baseline algorithms used in TSC research are 1-NN with Euclidean distance and/or Dynamic Time Warping. We have conclusively shown that COTE significantly outperforms both of these approaches, and that it is significantly better than all of the competing algorithms that have been proposed in the literature. We believe the results we present represent a new state of the art in TSC that new algorithms should be compared to in terms of accuracy.

169 citations
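
The 1-NN with Dynamic Time Warping baseline named in the abstract above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the COTE ensemble and not the authors' code: series are lists of numbers, the local cost is absolute difference, no warping window is applied, and the function names and example data are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Unconstrained DTW distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def predict_1nn(train, query, dist=dtw_distance):
    """Return the label of the training series closest to the query.

    train is a list of (series, label) pairs; series may differ in length.
    """
    _, label = min(train, key=lambda pair: dist(pair[0], query))
    return label


# Hypothetical toy data: one "peak" shape and one "valley" shape.
train = [
    ([0, 1, 2, 1, 0], "peak"),
    ([2, 1, 0, 1, 2], "valley"),
]
# A stretched peak still matches the peak prototype under DTW.
print(predict_1nn(train, [0, 0, 1, 2, 2, 1, 0]))
```

Each prediction compares the query against every training series, so the cost grows with both the training set size and the series lengths, which is the efficiency concern often raised about this baseline.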

Proceedings Article
03 Mar 2016
TL;DR: In this article, a parallel version of Word2Vec is proposed, which offers the vector representations of fixed dimensionality for variable-length audio segments, with very attractive real world applications such as query-by-example Spoken Term Detection (STD).
Abstract: The vector representations of fixed dimensionality for words (in text) offered by Word2Vec have been shown to be very useful in many application scenarios, in particular due to the semantic information they carry. This paper proposes a parallel version, the Audio Word2Vec. It offers the vector representations of fixed dimensionality for variable-length audio segments. These vector representations are shown to describe the sequential phonetic structures of the audio segments to a good degree, with very attractive real world applications such as query-by-example Spoken Term Detection (STD). In this STD application, the proposed approach significantly outperformed the conventional Dynamic Time Warping (DTW) based approaches at significantly lower computation requirements. We propose unsupervised learning of Audio Word2Vec from audio data without human annotation using Sequence-to-sequence Autoencoder (SA). SA consists of two RNNs equipped with Long Short-Term Memory (LSTM) units: the first RNN (encoder) maps the input audio sequence into a vector representation of fixed dimensionality, and the second RNN (decoder) maps the representation back to the input audio sequence. The two RNNs are jointly trained by minimizing the reconstruction error. Denoising Sequence-to-sequence Autoencoder (DSA) is further proposed, offering more robust learning.

169 citations

Journal Article
TL;DR: In this paper, the multi-channels deep convolutional neural networks (MC-DCNN) framework is proposed for multivariate time series classification; it first learns features from the individual univariate time series in each channel, and then combines information from all channels into a feature representation at the final layer.
Abstract: Time series classification is relevant to many different domains, such as health informatics, finance, and bioinformatics. Due to its broad applications, researchers have developed many algorithms for this kind of task, e.g., multivariate time series classification. Among the classification algorithms, k-nearest neighbor (k-NN) classification (particularly 1-NN) combined with dynamic time warping (DTW) achieves state-of-the-art performance. The deficiency is that when the data set grows large, 1-NN with DTW becomes very time-consuming. In contrast to 1-NN with DTW, feature-based classification methods are more efficient but less effective, since their performance usually depends on the quality of hand-crafted features. In this paper, we aim to improve the performance of traditional feature-based approaches through feature learning techniques. Specifically, we propose a novel deep learning framework, multi-channels deep convolutional neural networks (MC-DCNN), for multivariate time series classification. This model first learns features from the individual univariate time series in each channel, and then combines information from all channels into a feature representation at the final layer. The learnt features are then fed into a multilayer perceptron (MLP) for classification. Finally, extensive experiments on real-world data sets show that our model is not only more efficient than the state of the art but also competitive in accuracy. This study suggests that feature learning is worth investigating for the problem of time series classification.

168 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 91% related
Convolutional neural network: 74.7K papers, 2M citations, 87% related
Deep learning: 79.8K papers, 2.1M citations, 87% related
Image segmentation: 79.6K papers, 1.8M citations, 86% related
Artificial neural network: 207K papers, 4.5M citations, 84% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    236
2022    471
2021    341
2020    416
2019    420
2018    377