Showing papers on "Dynamic time warping" published in 1981

Journal Article•DOI•

Cepstral analysis technique for automatic speaker verification

01 Apr 1981-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: Cepstrum coefficients are extracted by LPC analysis successively throughout a fixed, sentence-long telephone utterance to form time functions, frequency response distortions introduced by transmission systems are removed, and a dynamic programming time warping method brings the functions into time registration with stored references for speaker verification.

Abstract: This paper describes new techniques for automatic speaker verification using telephone speech. The operation of the system is based on a set of functions of time obtained from acoustic analysis of a fixed, sentence-long utterance. Cepstrum coefficients are extracted by means of LPC analysis successively throughout an utterance to form time functions, and frequency response distortions introduced by transmission systems are removed. The time functions are expanded by orthogonal polynomial representations and, after a feature selection procedure, brought into time registration with stored reference functions to calculate the overall distance. This is accomplished by a new time warping method using a dynamic programming technique. A decision is made to accept or reject an identity claim, based on the overall distance. Reference functions and decision thresholds are updated for each customer. Several sets of experimental utterances were used for the evaluation of the system, which include male and female utterances recorded over a conventional telephone connection. Male utterances processed by ADPCM and LPC coding systems were used together with unprocessed utterances. Results of the experiment indicate that verification error rate of one percent or less can be obtained even if the reference and test utterances are subjected to different transmission conditions.
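The abstract does not say how the transmission distortions are removed. One common approach, assumed here purely for illustration, is to subtract the time average of each cepstral coefficient: a fixed channel multiplies the spectrum, which becomes a constant additive vector in the cepstral domain. The function name and the toy data below are hypothetical.

```python
def subtract_cepstral_mean(frames):
    """Remove the time-averaged cepstrum from every frame.

    frames: list of equal-length lists, one cepstral vector per analysis frame.
    A constant offset shared by all frames (the assumed channel effect)
    is cancelled by subtracting the per-coefficient mean.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    return [[f[k] - mean[k] for k in range(dim)] for f in frames]


# Toy data (hypothetical): three frames of two cepstral coefficients each.
clean = [[1.0, 0.5], [2.0, -0.5], [3.0, 0.0]]
channel = [0.25, -1.0]  # hypothetical constant channel offset
shifted = [[c + o for c, o in zip(f, channel)] for f in clean]

normalized_clean = subtract_cepstral_mean(clean)
normalized_shifted = subtract_cepstral_mean(shifted)
```

In this sketch both inputs normalize to the same time functions, which is the property that would make reference and test utterances comparable across different transmission conditions.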

1,187 citations

Journal Article•DOI•

A comparative study of several dynamic time-warping algorithms for connected-word recognition

C. S. Myers, Lawrence R. Rabiner

01 Sep 1981-Bell System Technical Journal

TL;DR: The theoretical differences and similarities among the various algorithms for automatic connected-word recognition are discussed and an experimental comparison shows that for typical applications, the level-building algorithm performs better than either the two-level DP matching or the sampling algorithm.

Abstract: Several different algorithms have been proposed for time registering a test pattern and a concatenated (isolated word) sequence of reference patterns for automatic connected-word recognition. These algorithms include the two-level, dynamic programming algorithm, the sampling approach and the level-building approach. In this paper, we discuss the theoretical differences and similarities among the various algorithms. An experimental comparison of these algorithms for a connected-digit recognition task is also given. The comparison shows that for typical applications, the level-building algorithm performs better than either the two-level DP matching or the sampling algorithm.

458 citations

Journal Article•DOI•

A level building dynamic time warping algorithm for connected word recognition

C. Myers¹, Lawrence R. Rabiner²•Institutions (2)

Massachusetts Institute of Technology¹, Bell Labs²

01 Apr 1981-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: The resulting algorithm is shown to be significantly more efficient than the one recently proposed by Sakoe for connected word recognition, while maintaining the same accuracy in estimating the best possible matching string.

Abstract: Dynamic time warping has been shown to be an effective method of handling variations in the time scale of polysyllabic words spoken in isolation. This class of techniques has recently been applied to connected word recognition with high degrees of success. In this paper a level building technique is proposed for optimally time aligning a sequence of connected words with a sequence of isolated word reference patterns. The resulting algorithm, which has been found to be a special case of an algorithm previously described by Bahl and Jelinek, is shown to be significantly more efficient than the one recently proposed by Sakoe for connected word recognition, while maintaining the same accuracy in estimating the best possible matching string. An analysis of the level building method shows that it can be obtained as a modification to the Sakoe method by reversing the order of minimizations in the two-pass technique with some subsequent processing. This level building algorithm has a number of implementation parameters that can be used to control the efficiency of the method, as well as its accuracy. The nature of these parameters is discussed in this paper. In a companion paper we discuss the application of this level building time warping method to a connected digit recognition problem.
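For orientation, the isolated-word alignment mentioned in the first sentence of this abstract can be sketched as a plain dynamic programming recurrence. This is a minimal textbook form with symmetric unit steps and scalar features, not the level building algorithm of the paper; the function name and the default distance are assumptions.

```python
def dtw_distance(test, ref, dist=lambda a, b: abs(a - b)):
    """Accumulated distance of the best monotonic alignment of two sequences."""
    n, m = len(test), len(ref)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j]: best accumulated distance aligning test[:i] with ref[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(test[i - 1], ref[j - 1])
            # extend the cheapest of the three allowed predecessor cells
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# A reference spoken more slowly (one frame repeated) still matches exactly.
same_word = dtw_distance([1, 2, 3], [1, 2, 2, 3])
other_word = dtw_distance([0, 0], [1])
```

An isolated-word recognizer built on this sketch would compute the distance to each stored template and pick the smallest; connected-word methods such as level building extend the same recurrence across word boundaries.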

288 citations

Journal Article•DOI•

Connected digit recognition using a level-building DTW algorithm

C. S. Myers¹, Lawrence R. Rabiner•Institutions (1)

Bell Labs¹

01 Jun 1981-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: A novel method for recognizing a string of connected digits based upon the use of a recently proposed level-building dynamic time warping (DTW) algorithm that attempts to build up the string, level-by-level, by comparing portions of the test string to isolated digit reference patterns.

Abstract: In this paper we present a novel method for recognizing a string of connected digits based upon the use of a recently proposed level-building dynamic time warping (DTW) algorithm. The recognition system attempts to build up the string, level-by-level (i.e., digit-by-digit), by comparing portions of the test string to isolated digit reference patterns. A backtracking procedure is used to find the "best" string (i.e., minimum accumulated distance) as well as a set of reasonable alternative candidates. The system was tested on a number of talkers speaking variable length digit strings (from two to five digits) over dialed up telephone lines. String error rates of 4.8 percent and 4.6 percent were obtained for speaker-trained and speaker-independent systems. Word error rates of 0.7 percent (for speaker-trained tests) and 0.9 percent (for speaker-independent tests) were obtained. The digit reference templates were obtained from autocorrelation averaging of a pair of isolated word templates for each digit of the speaker-trained system, and from a clustering analysis of isolated words for the speaker-independent system.

113 citations

Patent•

Time warp signal recognition processor for matching signal patterns

Bryan D. Ackland¹, David J. Burr¹, Neil Weste¹•Institutions (1)

Bell Labs¹

20 Mar 1981

TL;DR: A signal processor for dynamic time warping is disclosed in which an orthogonal array of interconnected dynamic programming cells is controlled along its diagonals, so that all cells on a diagonal, and several diagonals at once, can operate in parallel.

Abstract: Known signal processors for matching signal patterns commonly compare an unknown signal with one of a set of reference signals. Various comparison techniques are known. One comparison technique for solving a parenthesization problem includes an orthogonal array of interconnected cells which are adapted for dynamic programming and for extending data and control information in a generally left-to-right direction as well as in a bottom-to-top direction. For solving a pattern matching problem, known arrangements for extending control information in a generally left-to-right or bottom-to-top direction do not appear to be satisfactory. The disclosed signal processor for matching signal patterns and for dynamically time warping an unknown input signal with a reference input signal generates a measure of the correspondence between the input signals. In generating the correspondence measure, the processor includes an arrangement for controlling all processor cells on a predetermined diagonal of the array of cells. Thereby all cells coupled to the diagonal can operate in parallel to increase and improve the efficiency of the signal processor. The processor also includes an arrangement for controlling all processor cells on each diagonal of the array of cells. As a result, not only can all cells on each diagonal operate in parallel but also each of the plurality of diagonals can operate in parallel for processing the same or different sets of input signals. Thereby, a still further increase in the efficiency of the signal processor obtains.
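Why a diagonal is the natural unit of parallelism can be seen in a software sketch. Assuming the diagonals are the anti-diagonals of the cost grid (cells with equal i + j, which the abstract does not state explicitly), every cell on one diagonal depends only on the two previous diagonals, so all of them could be updated at once. The code below is a sequential illustration with hypothetical names and scalar features, not the patented circuit.

```python
def dtw_by_antidiagonals(test, ref):
    """DTW accumulated distance, visiting the grid one anti-diagonal at a time."""
    n, m = len(test), len(ref)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    for k in range(n + m - 1):  # k = i + j identifies one anti-diagonal
        # Cells on diagonal k read only diagonals k-1 and k-2, so this inner
        # loop has no internal dependencies and could run in parallel.
        for i in range(max(0, k - m + 1), min(n, k + 1)):
            j = k - i
            d = abs(test[i] - ref[j])
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(
                    cost[i - 1][j] if i > 0 else inf,
                    cost[i][j - 1] if j > 0 else inf,
                    cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
                )
            cost[i][j] = d + best
    return cost[n - 1][m - 1]


wavefront_same = dtw_by_antidiagonals([1, 2, 3], [1, 2, 2, 3])
wavefront_other = dtw_by_antidiagonals([0, 0], [1])
```

The result is the same as a row-by-row evaluation; only the visiting order changes, which is what lets an array of cells work on a whole diagonal in one step.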

57 citations

Proceedings Article•DOI•

Connected word recognition using a level building dynamic time warping algorithm

C. S. Myers¹, Lawrence R. Rabiner•Institutions (1)

Bell Labs¹

01 Apr 1981

TL;DR: This algorithm is shown to be significantly more efficient than the one proposed by Sakoe while solving the exact same problem and the effectiveness of the proposed algorithm for connected digit recognition is experimentally verified.

Abstract: The technique of dynamic time warping has proven itself reliable and robust for a wide variety of isolated word recognition tasks. Recently extensions of the algorithm have been investigated for application to the problem of connected word recognition. In this paper a level building technique is proposed for optimally aligning a test pattern, consisting of a sequence of connected words, with a sequence of isolated word reference patterns. This algorithm is shown to be significantly more efficient than the one proposed by Sakoe while solving the exact same problem. Implementation parameters for the level building algorithm are presented and the effectiveness of the proposed algorithm for connected digit recognition is experimentally verified.

31 citations

Patent•DOI•

Continuous speech recognition system

Frank Christopher Pirz¹, Lawrence R. Rabiner¹•Institutions (1)

Bell Labs¹

01 Apr 1981-Journal of the Acoustical Society of America

TL;DR: Continuous speech recognition is improved by generating overlap-words (e.g., words whose first phoneme is the end phoneme of the preceding word in the string) and using dynamic time warping to time-match the reference candidate series of overlap-words to the utterance series of overlap-words.

Abstract: Recognition of continuous speech by comparison with prestored isolated words may be confused by the merging together of spoken adjacent words (coarticulation). Improved recognition is attained by generating overlap-words, e.g., words whose first phoneme is the end phoneme of the preceding word in a string of words. The reference candidate series of overlap-words is transformed under dynamic time warping so as to time-match the utterance series of overlap-words.

30 citations

Journal Article•DOI•

On the use of dynamic time warping for word spotting and connected word recognition

C. S. Myers, Lawrence R. Rabiner, Aaron E. Rosenberg

01 Mar 1981-Bell System Technical Journal

TL;DR: It is shown that, based on results from some simple word spotting and connected word recognition experiments, the local minimum method performs considerably better than the fixed-range method.

Abstract: Several variations on algorithms for dynamic time warping for speech processing applications have been proposed. This paper compares two of these algorithms, the fixed-range method and the local minimum method. We show that, based on results from some simple word spotting and connected word recognition experiments, the local minimum method performs considerably better than the fixed-range method. We describe explanations of this behavior and techniques for optimizing the parameters of the local minimum algorithm for both word spotting and connected word recognition.

20 citations

Proceedings Article•DOI•

A high speed array computer for dynamic time warping

D. Burr¹, Bryan D. Ackland¹, Neil Weste¹•Institutions (1)

Bell Labs¹

01 Apr 1981

TL;DR: This paper describes a CMOS integrated array processor for computing the dynamic time warp algorithm which allows many popular variations including LPC and frequency domain representations of speech.

Abstract: Dynamic time warping is an established technique for time alignment and comparison of speech segments in speech recognition. This paper describes a CMOS integrated array processor for computing the dynamic time warp algorithm. It allows many popular variations including LPC and frequency domain representations of speech. High speed is obtained by extensive pipelining, parallel computation, and simultaneous matching of multiple patterns. A realistic application using 40 nine-component LPC vectors per word permits 10,000 word comparisons per second or, equivalently, real time recognition of a 10,000 word vocabulary.

18 citations

Comparative study of nonlinear time warping techniques in isolated word speech recognition systems

Alex Waibel¹, B. Yegnanarayana•Institutions (1)

Carnegie Mellon University¹

01 Jun 1981

TL;DR: In this paper, two major design choices for an isolated word speech recognition system are examined in detail: the choice of warping algorithm (Itakura asymmetric, Sakoe and Chiba symmetric, or Sakoe and Chiba asymmetric) and the size of the warping window used to reduce computation time.

Abstract: In this paper, the effects of two major design choices on the performance of an isolated word speech recognition system are examined in detail. They are: 1) the choice of a warping algorithm among the Itakura asymmetric, the Sakoe and Chiba symmetric, and the Sakoe and Chiba asymmetric, and 2) the size of the warping window to reduce computation time. Two vocabularies were used: the digits (zero, one,..., nine) and a highly confusable subset of the alphabet (b, c, d, e, g, p, t, v, z). The Itakura asymmetric warping algorithm appears to be slightly better than the other two for the confusable vocabulary. We discuss the reasons why the performance of the algorithms is vocabulary dependent. Finally, for the data used in our experiments, a warping window of about 100 ms appears to be optimal.
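The effect of a warping window can be illustrated with a band-limited variant of the basic recurrence: only cells with |i - j| no larger than the window are evaluated, which cuts computation and forbids extreme warps. This is a generic symmetric sketch, not any of the three algorithms compared in the paper; the function names and the 10 ms frame period used to convert the reported 100 ms window into frames are assumptions.

```python
def window_in_frames(window_ms, frame_ms):
    """Convert a window given in milliseconds to a whole number of frames."""
    return int(round(window_ms / frame_ms))


def dtw_windowed(test, ref, window):
    """DTW restricted to cells with |i - j| <= window (measured in frames)."""
    n, m = len(test), len(ref)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # The band must at least cover the length difference, or the end
    # point (n, m) would be unreachable.
    window = max(window, abs(n - m))
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            d = abs(test[i - 1] - ref[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


frames = window_in_frames(100, 10)  # hypothetical 10 ms frame period
tight = dtw_windowed([0, 0, 0, 1], [0, 1, 1, 1], 0)  # diagonal path only
loose = dtw_windowed([0, 0, 0, 1], [0, 1, 1, 1], 3)  # full grid for length 4
```

In this toy case the narrow band forces a worse alignment than the wide one, which is the trade-off between computation time and accuracy that the window size controls.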

13 citations

Proceedings Article•DOI•

A realtime implementation of a text independent speaker recognition system

E. Wrench

01 Apr 1981

TL;DR: The results of this investigation clearly show that Markel's technique is superior for applications using very short speech segments for both the speaker models and the recognition trials.

Abstract: This paper describes the design and implementation of a realtime speaker recognition system. The system performs text independent, closed set speaker recognition with up to 30 talkers in realtime. In addition, the reference speech used to characterize the 30 talkers can be extracted from as little as 10 seconds of speech from each talker, and the actual recognition performed with less than one minute of speech from the unknown talker. Two speaker recognition algorithms previously developed by Markel and Pfeifer were investigated for use in the realtime system. The results of this investigation clearly show that Markel's technique is superior for applications using very short speech segments for both the speaker models and the recognition trials. Markel's technique was implemented in realtime in a high speed programmable signal processor. A test of this implementation with a set of 30 male speakers resulted in recognition accuracies of 93-100% for models generated with only 10 seconds of speech, and recognition trials using only 10 seconds of unknown speech.

Proceedings Article•DOI•

Connected word recognition using a syntax-directed dynamic programming temporal alignment procedure

C. S. Myers¹, Stephen E. Levinson•Institutions (1)

Bell Labs¹

01 Jan 1981

TL;DR: In this paper, the authors describe a system for connected word recognition in which a sentence in a formal language, uttered without pauses between words, is recognized by finding the grammatically well formed sequence of isolated word templates to which its distance is least.

Abstract: In this paper we describe a system for connected word recognition in which a sentence in a formal language, uttered without pauses between words, is recognized by finding the grammatically well formed sequence of isolated word templates to which its distance is least. This is accomplished by means of a single monolithic algorithm in which temporal registration, segmentation and grammatical analysis are performed simultaneously. The algorithm is a syntax-directed version of the level building dynamic time warping algorithm of Myers and Rabiner. A test was conducted on a total of 208 sentences comprising 1781 words and spoken by two male and two female speakers. The sentences were composed from a 127 word vocabulary according to a moderately complex grammar and semantic structure appropriate to an airline information and reservation task. Test results revealed a 13% sentence error rate and a 6% word error rate.

Proceedings Article•DOI•

Telephone-line speaker recognition using clipped autocorrelation analysis

Hermann Ney¹•Institutions (1)

Philips¹

01 Apr 1981

TL;DR: A speaker recognition system is investigated which operates on telephone speech and performs speech analysis by means of the clipped autocorrelation function, and the time warping method based on dynamic programming is used to bring sample utterances into time registration with reference utterances.

Abstract: A speaker recognition system is investigated which operates on telephone speech and performs speech analysis by means of the clipped autocorrelation function. The advantages of the clipped autocorrelation function are its simple computation and its reduced dynamic variability as compared to the standard autocorrelation function. Utterances are represented by time contours of clipped autocorrelation coefficients. The time warping method based on dynamic programming is used to bring sample utterances into time registration with reference utterances. Different methods of preprocessing the time contours are studied with respect to speaker discrimination. For cooperative speakers, verification error rates of 3% and less than 2% were obtained using speaker independent and speaker individual thresholds, respectively.

Journal Article•DOI•

An architecture of an MOS‐LSI speech recognition system using dynamic programming

H. Murveit, M. Lowy, R. W. Brodersen

01 May 1981-Journal of the Acoustical Society of America

TL;DR: An architecture which exploits the capabilities of custom MOS‐LSI designs to implement a complete speech recognition system which would operate in real time using dynamic time warping, yet it would only require 4–5 integrated circuits for a moderate vocabulary.

Abstract: In the past several years, a number of very accurate, isolated word speech recognition systems based on dynamic programming techniques have been designed and tested. However, as these techniques are computationally intensive, commercial systems using dynamic time warping have been costly. We have designed an architecture which exploits the capabilities of custom MOS‐LSI designs to implement a complete speech recognition system. This system would operate in real time using dynamic time warping, yet it would only require 4–5 integrated circuits for a moderate (50–200 word) vocabulary. This system is designed to be expandable so that larger vocabularies can be used by including additional IC's in parallel with the others. The integrated circuits which are required are two custom‐designed chips, a memory IC, and a low‐performance microcomputer for overall control. The custom chips include a front end processor for spectral analysis (currently a switched‐capacitor filter bank) with an endpoint detector, and an...

Journal Article•DOI•

Automatic word duration rules for demisyllable‐based isolated word recognition

D. Kahn, A. E. Rosenberg

01 Nov 1981-Journal of the Acoustical Society of America

TL;DR: A linguistically based set of duration rules was developed to predict syllable duration as a function of syllable stress level and the position of the syllable within the word.

Abstract: It has previously been demonstrated that reliable, speaker‐trained, isolated word recognition on a 1109‐word Basic English vocabulary can be performed using word templates formed by concatenation of elements from a corpus of demisyllables. Since a dynamic time warping (DTW) algorithm is used to align test and reference patterns, small to moderate differences in duration between test and reference words present no major problem in performing the time alignment. However, improved results (i.e., smaller word distances) are obtained from the DTW algorithm if the syllables of the test and reference words are properly aligned prior to dynamic time warping. In our earlier experiments, each concatenated reference word was linearly prenormalized to the duration of that word in the test set, but this procedure is clearly not applicable for continuous speech recognition. We have now developed a linguistically based set of duration rules which we apply to the demisyllables during the word‐creation process (i.e., before DTW), which predict syllable duration as a function of syllable stress level and the position of the syllable within the word. Using the automatic duration rules, we have achieved recognition accuracies comparable to those based on known word durations.
