Showing papers on "Dynamic time warping" published in 1979


Journal ArticleDOI
H. Sakoe
TL;DR: A general principle of connected word recognition is given, based on pattern matching between unknown continuous speech and artificially synthesized connected reference patterns; computation time and memory requirement are both proved to be within reasonable limits.
Abstract: This paper reports a pattern matching approach to connected word recognition. First, a general principle of connected word recognition is given based on pattern matching between unknown continuous speech and artificially synthesized connected reference patterns. Time-normalization capability is allowed by use of dynamic programming-based time-warping technique (DP-matching). Then, it is shown that the matching process is efficiently carried out by breaking it down into two steps. The derived algorithm is extensively subjected to recognition experiments. It is shown in a talker-adapted recognition experiment that digit data (one to four digits) connectedly spoken by five persons are recognized with as high as 99.6 percent accuracy. Computation time and memory requirement are both proved to be within reasonable limits.

289 citations
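
The DP-matching named in the abstract above can be illustrated with a minimal sketch of dynamic-programming time warping between two isolated sequences. This is not the two-step connected-word algorithm of the paper; the function name `dtw_distance`, the scalar features, and the absolute-difference local distance are assumptions chosen for illustration.

```python
def dtw_distance(a, b):
    """Cumulative DP time-warping distance between two 1-D sequences.

    Illustrative only: scalar frames and |x - y| as the local distance
    are assumptions, not the features used in the paper.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Allowed predecessors: vertical, horizontal, diagonal step
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


# A repeated frame is absorbed by the warp, so the distance stays zero.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

The table has (n + 1) × (m + 1) cells, so time and memory both grow with the product of the two sequence lengths.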


Journal ArticleDOI
TL;DR: A speaker-independent isolated word recognition system is described which is based on the use of multiple templates for each word in the vocabulary, and shows error rates that are comparable to, or better than, those obtained with speaker-trained isolated word recognition systems.
Abstract: A speaker-independent isolated word recognition system is described which is based on the use of multiple templates for each word in the vocabulary. The word templates are obtained from a statistical clustering analysis of a large database consisting of 100 replications of each word (i.e., once by each of 100 talkers). The recognition system, which accepts telephone quality speech input, is based on an LPC analysis of the unknown word, dynamic time warping of each reference template to the unknown word (using the Itakura LPC distance measure), and the application of a K-nearest neighbor (KNN) decision rule. Results for several test sets of data are presented. They show error rates that are comparable to, or better than, those obtained with speaker-trained isolated word recognition systems.

245 citations
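
The pipeline in the abstract above (several templates per word, time warping against each template, then a K-nearest neighbor decision) might be sketched as follows. This is one plausible reading of a KNN rule, averaging the K smallest distances per word and picking the lowest average; the names `dtw` and `knn_recognize`, the scalar features, and the absolute-difference distance (standing in for the Itakura LPC distance) are all assumptions.

```python
def dtw(a, b):
    """Plain DP time-warping distance; |x - y| is a stand-in local distance."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def knn_recognize(unknown, templates, k=2):
    """Pick the word whose k closest templates are nearest on average.

    templates: dict mapping word -> non-empty list of template sequences.
    Returns (best_word, per_word_scores).
    """
    scores = {}
    for word, refs in templates.items():
        nearest = sorted(dtw(unknown, r) for r in refs)[:k]
        scores[word] = sum(nearest) / len(nearest)
    return min(scores, key=scores.get), scores


# Hypothetical toy templates, two per word.
templates = {
    "one": [[1, 2, 3], [1, 3, 3]],
    "two": [[5, 5, 1], [4, 5, 1]],
}
word, scores = knn_recognize([1, 2, 2, 3], templates, k=2)
print(word, scores)
```

Averaging over the K nearest templates, rather than taking only the single best match, makes the decision less sensitive to one outlying template.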


Proceedings ArticleDOI
01 Apr 1979
TL;DR: A speaker-independent isolated word recognition system is proposed, based on the use of multiple templates for each word in the vocabulary; the templates are obtained from a statistical clustering analysis of a large data base consisting of 100 replications of each word (i.e., once by each of 100 talkers).
Abstract: A speaker independent, isolated word recognition system is proposed which is based on the use of multiple templates for each word in the vocabulary. The word templates are obtained from a statistical clustering analysis of a large data base consisting of 100 replications of each word (i.e. once by each of 100 talkers). The recognition system, which uses telephone recordings, is based on an LPC analysis of the unknown word, dynamic time warping of each reference template to the unknown word (using the Itakura LPC distance measure), and the application of a K-nearest neighbor (KNN) decision rule to lower the probability of error. Results are presented on two test sets of data which show error rates that are comparable to, or better than, those obtained with speaker trained, isolated word recognition systems.

120 citations



Journal ArticleDOI
TL;DR: The effects of variations in time warping algorithms (global path constraints, local continuity constraints, and distance weighting and normalization) are studied on a realistic speech data base; the performance index is based on speed of operation, memory requirements, and recognition accuracy of the algorithm.
Abstract: The technique of dynamic programming for time registration of a reference and a test utterance has found widespread use in the area of discrete word recognition. Recently a number of variations on the basic time warping algorithms have been proposed by Sakoe and Chiba, and Rabiner, Rosenberg, and Levinson. These algorithms all assume the test input is an isolated word whose endpoints are known (at least approximately). The major difference in the methods are the global path constraints (i.e., the region of possible paths), the local continuity constraints on the path, and the distance weighting and normalization used to give the overall minimum distance. The purpose of this investigation is to study the effects of such variations on the performance of different algorithms for a realistic speech data base. The performance index is based on speed of operation, memory requirements, and recognition accuracy of the algorithm. Preliminary results indicate, in most cases, only small differences in performance among the various methods.

14 citations
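
To make the notion of a global path constraint in the abstract above concrete, here is a minimal sketch of time warping restricted to a band around the diagonal (a window of the kind associated with Sakoe and Chiba). The function name `dtw_banded`, the `window` parameter, and the absolute-difference local distance are assumptions; the specific local continuity constraints and weightings compared in the paper are not reproduced.

```python
def dtw_banded(a, b, window):
    """DP time warping where cell (i, j) is allowed only if |i - j| <= window.

    The window is widened to at least |len(a) - len(b)| so that the end
    point (n, m) is always reachable.
    """
    n, m = len(a), len(b)
    w = max(window, abs(n - m))
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only visit cells inside the band; the rest stay at infinity.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


a = [1, 0, 0, 0, 0]
b = [1, 1, 1, 1, 0]
print(dtw_banded(a, b, 0))  # diagonal only: no warping allowed
print(dtw_banded(a, b, 4))  # band covers the whole grid
```

A narrow band visits fewer cells, which reduces computation, but it can exclude the path that an unconstrained search would find, as the two calls above show.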


Journal ArticleDOI
TL;DR: It is shown that a first-in first-out (FIFO) assumption for channels that produce time warping, or delay modulation, in signals passing through them is compelling on physical grounds and vastly simplifies ensuing analysis.
Abstract: Channels (i.e., operators) are studied that produce time warping, or delay modulation, in signals passing through them, and many interesting properties of these channels are developed. It is shown that a first-in first-out (FIFO) assumption for such channels is compelling on physical grounds and vastly simplifies ensuing analysis. Two descriptions of the channel, the "send-delay" and "receive-delay" functions, are compared, and it is shown that one is precisely the shape needed to equalize or unwarp signals warped by the other. A series expansion for time-warped signals is developed, and the unitary nature of the warp operators is exploited to generate rich sets of orthonormal signals. The random time-warp channel is then analyzed, and certain statistics such as the autocorrelation function of the output signals are developed, along with conditions on their stationarity. Finally, optimum linear filters for extracting a signal from a noisy and time-warped version are derived and compared with some previous results.

7 citations
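
The relation between the two delay descriptions in the abstract above can be written out as a short worked derivation. The notation is assumed here, not taken from the paper: s(t) is read as the delay experienced by a sample sent at time t, r(t') as the delay of the sample received at time t', x as the input, and y as the output. The FIFO assumption is read as requiring t + s(t) to be strictly increasing, so that the map from send time to arrival time can be inverted.

```latex
% Assumed notation (not from the paper): s = send-delay, r = receive-delay.
\begin{align*}
t' &= t + s(t)
   && \text{a sample sent at } t \text{ arrives at } t' \\
r(t') &= s(t)
   && \text{the same delay, indexed by arrival time} \\
y(t') &= x\bigl(t' - r(t')\bigr)
   && \text{warped channel output} \\
z(t) &= y\bigl(t + s(t)\bigr)
      = x\bigl(t + s(t) - r(t + s(t))\bigr)
      = x(t)
   && \text{resampling with } s \text{ undoes the warp by } r
\end{align*}
```

Under these assumptions, resampling the output with the send-delay function recovers the input, which is one way to read the statement that one function has the shape needed to unwarp signals warped by the other.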


Proceedings ArticleDOI
01 Apr 1979
TL;DR: Restricted to vowel spectra, speaker adaptation by spectral amplitude weighting and by spectral shifting is investigated; a special method makes it possible to adapt test spectra class-specifically.
Abstract: An automatic speech recognition system based on the reference set of a single speaker can be extended for use by several speakers by applying appropriate preprocessing transformations. These transformations adapt the incoming patterns of a new speaker to the patterns of the reference set. Under the present restriction to vowel spectra adaptation methods by spectral amplitude weighting and by spectral shifting are investigated. By a special method it was enabled to adapt test spectra class specifically.

4 citations


Journal ArticleDOI
TL;DR: For automatic speaker verification using telephone speech, a set of time functions obtained from acoustic analysis of a fixed, sentence-long utterance is expanded by orthogonal polynomial representations and compared with stored reference functions.
Abstract: This paper describes new techniques for automatic speaker verification using telephone speech. The operation of the system is based on a set of functions of time obtained from acoustic analysis of a fixed, sentence‐long utterance. These time functions are expanded by orthogonal polynomial representations and compared with stored reference functions. After dynamic time warping, a decision is made to accept or reject an identity claim. Three sets of experimental utterances were used for the evaluation of the system. The first and second sets each comprises 50 utterances by 10 customers each and a single utterance by 40 imposters recorded over a conventional telephone connection. The third set comprises 26 utterances by 21 customers each and a single utterance by 55 imposters recorded over a high quality microphone. The first and third sets were uttered by male speakers, whereas the second set was uttered by female speakers. Reference functions and decision thresholds were updated for each customer. The eval...

2 citations