scispace - formally typeset
Search or ask a question
Topic

Dynamic time warping

About: Dynamic time warping is a research topic. Over the lifetime, 6013 publications have been published within this topic receiving 133130 citations.


Papers
More filters
Journal ArticleDOI
TL;DR: Two methods, which are based on full factorial experiment design and optimal orthogonal experiment design, are proposed for selecting discriminative features among candidates to improve the robustness of on-line handwritten signatures.

44 citations

Posted Content
TL;DR: In this paper, a discriminative embedding model based on recurrent neural networks (RNNs) was proposed for word discrimination tasks, which can outperform dynamic time warping on query-by-example search.
Abstract: Acoustic word embeddings --- fixed-dimensional vector representations of variable-length spoken word segments --- have begun to be considered for tasks such as speech recognition and query-by-example search. Such embeddings can be learned discriminatively so that they are similar for speech segments corresponding to the same word, while being dissimilar for segments corresponding to different words. Recent work has found that acoustic word embeddings can outperform dynamic time warping on query-by-example search and related word discrimination tasks. However, the space of embedding models and training approaches is still relatively unexplored. In this paper we present new discriminative embedding models based on recurrent neural networks (RNNs). We consider training losses that have been successful in prior work, in particular a cross entropy loss for word classification and a contrastive loss that explicitly aims to separate same-word and different-word pairs in a "Siamese network" training setting. We find that both classifier-based and Siamese RNN embeddings improve over previously reported results on a word discrimination task, with Siamese RNNs outperforming classification models. In addition, we present analyses of the learned embeddings and the effects of variables such as dimensionality and network structure.

44 citations

Journal ArticleDOI
TL;DR: Time warping and motion vector blending at the juncture of two divisemes and the algorithm to search the optimal concatenated visible speech are developed to provide the final concatenative motion sequence.
Abstract: We present a technique for accurate automatic visible speech synthesis from textual input. When provided with a speech waveform and the text of a spoken sentence, the system produces accurate visible speech synchronized with the audio signal. To develop the system, we collected motion capture data from a speaker's face during production of a set of words containing all diviseme sequences in English. The motion capture points from the speaker's face are retargeted to the vertices of the polygons of a 3D face model. When synthesizing a new utterance, the system locates the required sequence of divisemes, shrinks or expands each diviseme based on the desired phoneme segment durations in the target utterance, then moves the polygons in the regions of the lips and lower face to correspond to the spatial coordinates of the motion capture data. The motion mapping is realized by a key-shape mapping function learned by a set of viseme examples in the source and target faces. A well-posed numerical algorithm estimates the shape blending coefficients. Time warping and motion vector blending at the juncture of two divisemes and the algorithm to search the optimal concatenated visible speech are also developed to provide the final concatenative motion sequence. Copyright © 2004 John Wiley & Sons, Ltd.

44 citations

Journal ArticleDOI
TL;DR: Empirical evidence of loose satisfaction of these properties with real speech will be presented, allowing the assumption of a “loose metric space” structure in the set of parametric representations of words in a given vocabulary.

44 citations

Book ChapterDOI
18 Nov 2007
TL;DR: This paper presents an efficient indexing and retrieval scheme for searching in document image databases that achieves high precision and recall, using a large image corpus consisting of seven Kalidasa's books in the Telugu language.
Abstract: This paper presents an efficient indexing and retrieval scheme for searching in document image databases. In many non-European languages, optical character recognizers are not very accurate. Word spotting - word image matching - may instead be used to retrieve word images in response to a word image query. The approaches used for word spotting so far, dynamic time warping and/or nearest neighbor search, tend to be slow. Here indexing is done using locality sensitive hashing (LSH) - a technique which computes multiple hashes - using word image features computed at word level. Efficiency and scalability is achieved by content-sensitive hashing implemented through approximate nearest neighbor computation. We demonstrate that the technique achieves high precision and recall (in the 90% range), using a large image corpus consisting of seven Kalidasa's (a well known Indian poet of antiquity) books in the Telugu language. The accuracy is comparable to using dynamic time warping and nearest neighbor search while the speed is orders of magnitude better - 20000 word images can be searched in milliseconds.

44 citations


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
91% related
Convolutional neural network
74.7K papers, 2M citations
87% related
Deep learning
79.8K papers, 2.1M citations
87% related
Image segmentation
79.6K papers, 1.8M citations
86% related
Artificial neural network
207K papers, 4.5M citations
84% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
2023236
2022471
2021341
2020416
2019420
2018377