scispace - formally typeset

Showing papers on "Dynamic time warping" published in 1981


Journal Article
TL;DR: In this paper, cepstrum coefficients obtained by LPC analysis of a fixed, sentence-long utterance are extracted successively throughout the utterance to form time functions, and frequency response distortions introduced by transmission systems are removed.
Abstract: This paper describes new techniques for automatic speaker verification using telephone speech. The operation of the system is based on a set of functions of time obtained from acoustic analysis of a fixed, sentence-long utterance. Cepstrum coefficients are extracted by means of LPC analysis successively throughout an utterance to form time functions, and frequency response distortions introduced by transmission systems are removed. The time functions are expanded by orthogonal polynomial representations and, after a feature selection procedure, brought into time registration with stored reference functions to calculate the overall distance. This is accomplished by a new time warping method using a dynamic programming technique. A decision is made to accept or reject an identity claim, based on the overall distance. Reference functions and decision thresholds are updated for each customer. Several sets of experimental utterances were used for the evaluation of the system, which include male and female utterances recorded over a conventional telephone connection. Male utterances processed by ADPCM and LPC coding systems were used together with unprocessed utterances. Results of the experiment indicate that verification error rate of one percent or less can be obtained even if the reference and test utterances are subjected to different transmission conditions.

1,187 citations
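The decision step in the abstract above (time registration against a stored reference by dynamic programming, then accept or reject on the overall distance) can be sketched as follows. This is a minimal illustration, not the paper's method: frames are assumed to be plain feature vectors compared with a Euclidean local distance, the orthogonal-polynomial expansion, feature selection, and per-customer threshold updating are omitted, and `dtw_distance`, `verify`, and `threshold` are hypothetical names.

```python
import math
from typing import Sequence

Frame = Sequence[float]

def dtw_distance(test: Sequence[Frame], ref: Sequence[Frame]) -> float:
    """Accumulated DTW distance between two frame sequences, divided by n + m."""
    n, m = len(test), len(ref)
    inf = float("inf")
    # D[i][j]: least accumulated local distance aligning test[0..i] with ref[0..j]
    D = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(test[i], ref[j])  # Euclidean local distance (assumption)
            if i == 0 and j == 0:
                D[i][j] = d
                continue
            D[i][j] = d + min(
                D[i - 1][j] if i > 0 else inf,                # test advances
                D[i][j - 1] if j > 0 else inf,                # reference advances
                D[i - 1][j - 1] if i > 0 and j > 0 else inf,  # both advance
            )
    # Simple length normalization so one threshold can serve utterances of any length.
    return D[n - 1][m - 1] / (n + m)

def verify(test: Sequence[Frame], ref: Sequence[Frame], threshold: float) -> bool:
    """Accept the identity claim if the warped distance does not exceed the threshold."""
    return dtw_distance(test, ref) <= threshold
```

Because the path may repeat frames, a time-stretched copy of the reference (for example `[[0.0], [1.0], [1.0], [2.0]]` against `[[0.0], [1.0], [2.0]]`) still scores a distance of `0.0`.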


Journal Article
TL;DR: The theoretical differences and similarities among the various algorithms for automatic connected-word recognition are discussed and an experimental comparison shows that for typical applications, the level-building algorithm performs better than either the two-level DP matching or the sampling algorithm.
Abstract: Several different algorithms have been proposed for time registering a test pattern and a concatenated (isolated word) sequence of reference patterns for automatic connected-word recognition. These algorithms include the two-level dynamic programming algorithm, the sampling approach, and the level-building approach. In this paper, we discuss the theoretical differences and similarities among the various algorithms. An experimental comparison of these algorithms for a connected-digit recognition task is also given. The comparison shows that for typical applications, the level-building algorithm performs better than either the two-level DP matching or the sampling algorithm.

458 citations


Journal Article
TL;DR: The resulting algorithm is shown to be significantly more efficient than the one recently proposed by Sakoe for connected word recognition, while maintaining the same accuracy in estimating the best possible matching string.
Abstract: Dynamic time warping has been shown to be an effective method of handling variations in the time scale of polysyllabic words spoken in isolation. This class of techniques has recently been applied to connected word recognition with high degrees of success. In this paper a level building technique is proposed for optimally time aligning a sequence of connected words with a sequence of isolated word reference patterns. The resulting algorithm, which has been found to be a special case of an algorithm previously described by Bahl and Jelinek, is shown to be significantly more efficient than the one recently proposed by Sakoe for connected word recognition, while maintaining the same accuracy in estimating the best possible matching string. An analysis of the level building method shows that it can be obtained as a modification to the Sakoe method by reversing the order of minimizations in the two-pass technique with some subsequent processing. This level building algorithm has a number of implementation parameters that can be used to control the efficiency of the method, as well as its accuracy. The nature of these parameters is discussed in this paper. In a companion paper we discuss the application of this level building time warping method to a connected digit recognition problem.

288 citations
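A compact sketch of the level-building idea described above: at each level (word position), every reference pattern is warped against the test pattern starting from the best end points of the previous level, and a backtracking pass over the stored word boundaries recovers the best string. Assumptions: Euclidean frame distance, unconstrained local paths, no distance normalization, and no grammar. The function and parameter names are hypothetical, and the implementation parameters that the published algorithm uses to trade accuracy for efficiency are not modeled.

```python
import math
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]

def level_building(test: Sequence[Frame], refs: Dict[str, Sequence[Frame]],
                   max_levels: int) -> Tuple[List[str], float]:
    inf = float("inf")
    T = len(test)
    # prev[b]: best distance of any word string covering test frames 0..b-1.
    prev = [inf] * (T + 1)
    prev[0] = 0.0
    back: List[List[Tuple[str, int]]] = []  # back[level][b] = (word, start boundary)
    finals: List[Tuple[float, int]] = []    # (distance, level) for strings covering all T frames
    for level in range(max_levels):
        cur = [inf] * (T + 1)
        cur_back: List[Tuple[str, int]] = [("", -1)] * (T + 1)
        for word, ref in refs.items():
            M = len(ref)
            D = [[inf] * M for _ in range(T)]  # accumulated distance for this word
            S = [[-1] * M for _ in range(T)]   # test frame where this word started
            for t in range(T):
                for j in range(M):
                    cands = []
                    if j == 0 and prev[t] < inf:
                        cands.append((prev[t], t))  # word begins at frame t
                    if t > 0:
                        cands.append((D[t - 1][j], S[t - 1][j]))  # test advances only
                        if j > 0:
                            cands.append((D[t - 1][j - 1], S[t - 1][j - 1]))  # both advance
                    if j > 0:
                        cands.append((D[t][j - 1], S[t][j - 1]))  # reference advances only
                    best, start = min(cands, default=(inf, -1))
                    D[t][j] = best + math.dist(test[t], ref[j])
                    S[t][j] = start
            # For every end frame, keep the best word finishing there at this level.
            for t in range(T):
                if D[t][M - 1] < cur[t + 1]:
                    cur[t + 1] = D[t][M - 1]
                    cur_back[t + 1] = (word, S[t][M - 1])
        back.append(cur_back)
        if cur[T] < inf:
            finals.append((cur[T], level))
        prev = cur
    if not finals:
        return [], inf
    dist, last = min(finals)  # ties go to the string with fewer words
    # Backtrack from the final frame through the stored word boundaries.
    words, b = [], T
    for level in range(last, -1, -1):
        word, b = back[level][b]
        words.append(word)
    return words[::-1], dist
```

For example, with toy one-dimensional templates `a = [[0.0], [0.0]]` and `b = [[5.0], [5.0]]`, the test pattern `[[0.0], [0.0], [5.0], [5.0], [5.0], [0.0]]` decodes to `["a", "b", "a"]` with distance `0.0`.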


Journal Article
TL;DR: A novel method for recognizing a string of connected digits based upon the use of a recently proposed level-building dynamic time warping (DTW) algorithm that attempts to build up the string, level-by-level, by comparing portions of the test string to isolated digit reference patterns.
Abstract: In this paper we present a novel method for recognizing a string of connected digits based upon the use of a recently proposed level-building dynamic time warping (DTW) algorithm. The recognition system attempts to build up the string, level-by-level (i.e., digit-by-digit), by comparing portions of the test string to isolated digit reference patterns. A backtracking procedure is used to find the "best" string (i.e., minimum accumulated distance) as well as a set of reasonable alternative candidates. The system was tested on a number of talkers speaking variable-length digit strings (from two to five digits) over dialed-up telephone lines. String error rates of 4.8 percent and 4.6 percent were obtained for speaker-trained and speaker-independent systems, respectively. Word error rates of 0.7 percent (for speaker-trained tests) and 0.9 percent (for speaker-independent tests) were obtained. The digit reference templates were obtained from autocorrelation averaging of a pair of isolated word templates for each digit of the speaker-trained system, and from a clustering analysis of isolated words for the speaker-independent system.

113 citations


Patent
20 Mar 1981
TL;DR: In this article, an orthogonal array of interconnected cells adapted for dynamic programming and for extending data and control information in a generally left-to-right direction as well as in a bottom-to-top direction is presented.
Abstract: Known signal processors for matching signal patterns commonly compare an unknown signal with one of a set of reference signals. Various comparison techniques are known. One comparison technique for solving a parenthesization problem includes an orthogonal array of interconnected cells which are adapted for dynamic programming and for extending data and control information in a generally left-to-right direction as well as in a bottom-to-top direction. For solving a pattern matching problem, known arrangements for extending control information in a generally left-to-right or bottom-to-top direction do not appear to be satisfactory. The disclosed signal processor for matching signal patterns and for dynamically time warping an unknown input signal with a reference input signal generates a measure of the correspondence between the input signals. In generating the correspondence measure, the processor includes an arrangement for controlling all processor cells on a predetermined diagonal of the array of cells. Thereby, all cells coupled to the diagonal can operate in parallel to improve the efficiency of the signal processor. The processor also includes an arrangement for controlling all processor cells on each diagonal of the array of cells. As a result, not only can all cells on each diagonal operate in parallel, but each of the plurality of diagonals can also operate in parallel for processing the same or different sets of input signals. Thereby, a still further increase in the efficiency of the signal processor is obtained.

57 citations
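Why cells on one diagonal can work in parallel follows from the data dependencies of the DTW recurrence: cell (i, j) reads only (i-1, j), (i, j-1) and (i-1, j-1), which all lie on the two preceding anti-diagonals i + j = k - 1 and k - 2. The sequential sketch below visits cells in that anti-diagonal (wavefront) order to make the schedule explicit. Reading the patent's "diagonal" as the anti-diagonal is an interpretation, scalar features with an absolute-difference distance are used for brevity, and `antidiagonal` and `dtw_wavefront` are hypothetical names.

```python
from typing import List, Sequence, Tuple

def antidiagonal(n: int, m: int, k: int) -> List[Tuple[int, int]]:
    """Cells (i, j) of an n-by-m grid with i + j == k."""
    return [(i, k - i) for i in range(max(0, k - m + 1), min(n, k + 1))]

def dtw_wavefront(x: Sequence[float], y: Sequence[float]) -> float:
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * m for _ in range(n)]
    for k in range(n + m - 1):
        # Every cell on diagonal k depends only on diagonals k-1 and k-2,
        # so a hardware array could evaluate this inner loop in parallel.
        for i, j in antidiagonal(n, m, k):
            d = abs(x[i] - y[j])
            if i == 0 and j == 0:
                D[i][j] = d
                continue
            D[i][j] = d + min(
                D[i - 1][j] if i > 0 else inf,
                D[i][j - 1] if j > 0 else inf,
                D[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
    return D[n - 1][m - 1]
```

The result is identical to the usual row-by-row evaluation; only the visiting order changes. For instance, `dtw_wavefront([1, 2, 3], [2, 2, 4])` returns `2`.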


Proceedings Article
01 Apr 1981
TL;DR: This algorithm is shown to be significantly more efficient than the one proposed by Sakoe while solving the exact same problem and the effectiveness of the proposed algorithm for connected digit recognition is experimentally verified.
Abstract: The technique of dynamic time warping has proven itself reliable and robust for a wide variety of isolated word recognition tasks. Recently extensions of the algorithm have been investigated for application to the problem of connected word recognition. In this paper a level building technique is proposed for optimally aligning a test pattern, consisting of a sequence of connected words, with a sequence of isolated word reference patterns. This algorithm is shown to be significantly more efficient than the one proposed by Sakoe while solving the exact same problem. Implementation parameters for the level building algorithm are presented and the effectiveness of the proposed algorithm for connected digit recognition is experimentally verified.

31 citations


Patent
TL;DR: In this patent, the reference candidate series of overlap-words is transformed under dynamic time warping so as to time-match the utterance series of overlap-words, e.g., words whose first phoneme is the end phoneme of the preceding word in a string of words.
Abstract: Recognition of continuous speech by comparison with prestored isolated words may be confused by the merging together of spoken adjacent words (coarticulation). Improved recognition is attained by generating overlap-words, e.g., words whose first phoneme is the end phoneme of the preceding word in a string of words. The reference candidate series of overlap-words is transformed under dynamic time warping so as to time-match the utterance series of overlap-words.

30 citations


Journal Article
TL;DR: It is shown that, based on results from some simple word spotting and connected word recognition experiments, the local minimum method performs considerably better than the fixed-range method.
Abstract: Several variations on algorithms for dynamic time warping for speech processing applications have been proposed. This paper compares two of these algorithms, the fixed-range method and the local minimum method. We show that, based on results from some simple word spotting and connected word recognition experiments, the local minimum method performs considerably better than the fixed-range method. We offer explanations of this behavior and describe techniques for optimizing the parameters of the local minimum algorithm for both word spotting and connected word recognition.

20 citations


Proceedings Article
01 Apr 1981
TL;DR: This paper describes a CMOS integrated array processor for computing the dynamic time warp algorithm which allows many popular variations including LPC and frequency domain representations of speech.
Abstract: Dynamic time warping is an established technique for time alignment and comparison of speech segments in speech recognition. This paper describes a CMOS integrated array processor for computing the dynamic time warp algorithm. It allows many popular variations including LPC and frequency domain representations of speech. High speed is obtained by extensive pipelining, parallel computation, and simultaneous matching of multiple patterns. A realistic application using 40 nine-component LPC vectors per word permits 10,000 word comparisons per second or, equivalently, real time recognition of a 10,000 word vocabulary.

18 citations
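As a rough scale check on the throughput figure above, assume each word comparison evaluates the full grid of frame pairs between a 40-frame test word and a 40-frame reference, and that each local distance costs about one multiply-accumulate per LPC component. Neither assumption is stated in the abstract; warping constraints on the path would reduce the cell count.

```latex
40 \times 40 = 1600 \ \frac{\text{local distances}}{\text{comparison}}, \qquad
1600 \times 10^{4}\ \frac{\text{comparisons}}{\text{s}} = 1.6 \times 10^{7}\ \frac{\text{local distances}}{\text{s}}, \qquad
1.6 \times 10^{7} \times 9 \approx 1.44 \times 10^{8}\ \frac{\text{multiply-accumulates}}{\text{s}}
```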


01 Jun 1981
TL;DR: In this paper, the effects of two major design choices on the performance of an isolated word speech recognition system are examined in detail, including the choice of a warping algorithm among the Itakura asymmetric, the Sakoe and Chiba symmetric, and the Sakoe and Chiba asymmetric.
Abstract: In this paper, the effects of two major design choices on the performance of an isolated word speech recognition system are examined in detail. They are: 1) the choice of a warping algorithm among the Itakura asymmetric, the Sakoe and Chiba symmetric, and the Sakoe and Chiba asymmetric, and 2) the size of the warping window to reduce computation time. Two vocabularies were used: the digits (zero, one,..., nine) and a highly confusable subset of the alphabet (b, c, d, e, g, p, t, v, z). The Itakura asymmetric warping algorithm appears to be slightly better than the other two for the confusable vocabulary. We discuss the reasons why the performance of the algorithms is vocabulary dependent. Finally, for the data used in our experiments, a warping window of about 100 ms appears to be optimal.

13 citations
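Two of the design choices compared above can be sketched together: the Sakoe and Chiba symmetric recurrence (diagonal steps weighted by 2, total divided by the sum of the two lengths) and a warping window that restricts the path to |i - j| <= r. The paper states the window in milliseconds; converting about 100 ms into r needs the frame period, which the abstract does not give, so something like r = 10 at an assumed 10 ms frame step is only an illustration. The Itakura asymmetric variant, with its slope constraints, is not shown, and `dtw_symmetric_band` is a hypothetical name.

```python
import math
from typing import Sequence

Frame = Sequence[float]

def dtw_symmetric_band(x: Sequence[Frame], y: Sequence[Frame], r: int) -> float:
    """Symmetric DTW (diagonal step weighted 2) restricted to |i - j| <= r."""
    n, m = len(x), len(y)
    inf = float("inf")
    if abs(n - m) > r:
        return inf  # the end point lies outside the window
    g = [[inf] * m for _ in range(n)]
    for i in range(n):
        # Only cells inside the window are computed; the rest stay at infinity.
        for j in range(max(0, i - r), min(m, i + r + 1)):
            d = math.dist(x[i], y[j])
            if i == 0 and j == 0:
                g[i][j] = 2 * d
                continue
            g[i][j] = min(
                g[i - 1][j] + d if i > 0 else inf,                     # vertical step
                g[i][j - 1] + d if j > 0 else inf,                     # horizontal step
                g[i - 1][j - 1] + 2 * d if i > 0 and j > 0 else inf,   # diagonal step
            )
    return g[n - 1][m - 1] / (n + m)
```

Narrowing r cuts the number of cells from n·m to roughly n·(2r + 1), which is the computation saving the window is meant to buy; too small a window can exclude the correct alignment.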


Proceedings Article
01 Apr 1981
TL;DR: The results of this investigation clearly show that Markel's technique is superior for applications using very short speech segments for both the speaker models and the recognition trials.
Abstract: This paper describes the design and implementation of a real-time speaker recognition system. The system performs text-independent, closed-set speaker recognition with up to 30 talkers in real time. In addition, the reference speech used to characterize the 30 talkers can be extracted from as little as 10 seconds of speech from each talker, and the actual recognition performed with less than one minute of speech from the unknown talker. Two speaker recognition algorithms previously developed by Markel and Pfeifer were investigated for use in the real-time system. The results of this investigation clearly show that Markel's technique is superior for applications using very short speech segments for both the speaker models and the recognition trials. Markel's technique was implemented in real time in a high-speed programmable signal processor. A test of this implementation with a set of 30 male speakers resulted in recognition accuracies of 93-100% for models generated with only 10 seconds of speech, and recognition trials using only 10 seconds of unknown speech.

Proceedings Article
01 Jan 1981
TL;DR: In this paper, the authors describe a system for connected word recognition in which a sentence in a formal language, uttered without pauses between words, is recognized by finding the grammatically well formed sequence of isolated word templates to which its distance is least.
Abstract: In this paper we describe a system for connected word recognition in which a sentence in a formal language, uttered without pauses between words, is recognized by finding the grammatically well formed sequence of isolated word templates to which its distance is least. This is accomplished by means of a single monolithic algorithm in which temporal registration, segmentation and grammatical analysis are performed simultaneously. The algorithm is a syntax-directed version of the level building dynamic time warping algorithm of Myers and Rabiner. A test was conducted on a total of 208 sentences comprising 1781 words and spoken by two male and two female speakers. The sentences were composed from a 127 word vocabulary according to a moderately complex grammar and semantic structure appropriate to an airline information and reservation task. Test results revealed a 13% sentence error rate and a 6% word error rate.

Proceedings Article
Hermann Ney
01 Apr 1981
TL;DR: A speaker recognition system is investigated which operates on telephone speech and performs speech analysis by means of the clipped autocorrelation function, and the time warping method based on dynamic programming is used to bring sample utterances into time registration with reference utterances.
Abstract: A speaker recognition system is investigated which operates on telephone speech and performs speech analysis by means of the clipped autocorrelation function. The advantages of the clipped autocorrelation function are its simple computation and its reduced dynamic variability as compared to the standard autocorrelation function. Utterances are represented by time contours of clipped autocorrelation coefficients. The time warping method based on dynamic programming is used to bring sample utterances into time registration with reference utterances. Different methods of preprocessing the time contours are studied with respect to speaker discrimination. For cooperative speakers, verification error rates of 3% and less than 2% were obtained using speaker independent and speaker individual thresholds, respectively.

Journal Article
TL;DR: An architecture that exploits the capabilities of custom MOS‐LSI designs to implement a complete speech recognition system that would operate in real time using dynamic time warping, yet would require only 4–5 integrated circuits for a moderate vocabulary.
Abstract: In the past several years, a number of very accurate, isolated word speech recognition systems based on dynamic programming techniques have been designed and tested. However, as these techniques are computationally intensive, commercial systems using dynamic time warping have been costly. We have designed an architecture which exploits the capabilities of custom MOS‐LSI designs to implement a complete speech recognition system. This system would operate in real time using dynamic time warping, yet it would only require 4–5 integrated circuits for a moderate (50–200 word) vocabulary. This system is designed to be expandable so that larger vocabularies can be used by including additional IC's in parallel with the others. The integrated circuits which are required are two custom‐designed chips, a memory IC, and a low‐performance microcomputer for overall control. The custom chips include a front end processor for spectral analysis (currently a switched‐capacitor filter bank) with an endpoint detector, and an...

Journal Article
TL;DR: In this article, a linguistically based set of duration rules was developed to predict syllable duration as a function of syllable stress level and the position of the syllable within the word.
Abstract: It has previously been demonstrated that reliable, speaker‐trained, isolated word recognition on a 1109‐word Basic English vocabulary can be performed using word templates formed by concatenation of elements from a corpus of demisyllables. Since a dynamic time warping (DTW) algorithm is used to align test and reference patterns, small to moderate differences in duration between test and reference words present no major problem in performing the time alignment. However, improved results (i.e., smaller word distances) are obtained from the DTW algorithm if the syllables of the test and reference words are properly aligned prior to dynamic time warping. In our earlier experiments, each concatenated reference word was linearly prenormalized to the duration of that word in the test set, but this procedure is clearly not applicable for continuous speech recognition. We have now developed a linguistically based set of duration rules which we apply to the demisyllables during the word‐creation process (i.e., before DTW), which predict syllable duration as a function of syllable stress level and the position of the syllable within the word. Using the automatic duration rules, we have achieved recognition accuracies comparable to those based on known word durations.
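The linear prenormalization that the duration rules replace can be sketched as uniform resampling of a reference word to the frame count of the test word. Linear interpolation between neighboring frames is an assumption here (simple frame repetition or decimation would also count as linear normalization), and `linear_normalize` is a hypothetical name.

```python
from typing import List, Sequence

def linear_normalize(template: Sequence[Sequence[float]], target_len: int) -> List[List[float]]:
    """Linearly stretch or compress a frame sequence to target_len frames."""
    m = len(template)
    if m == 1 or target_len == 1:
        return [list(template[0]) for _ in range(target_len)]
    out = []
    for k in range(target_len):
        pos = k * (m - 1) / (target_len - 1)  # position on the template's time axis
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        # Interpolate each feature component between the two nearest frames.
        out.append([(1 - frac) * a + frac * b for a, b in zip(template[lo], template[hi])])
    return out
```

This needs the test word's duration in advance, which is exactly why the abstract notes that it does not carry over to continuous speech.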