
Showing papers on "Dynamic time warping" published in 1984


Book Chapter
01 Jan 1984
TL;DR: Two programs are provided, one that generates LPC and autocorrelation coefficients from the speech utterances and the other that, using dynamic programming, compares the test utterance with the reference utterances and finds the best match.
Abstract: Two programs are provided, one that generates LPC and autocorrelation coefficients from the speech utterances and the other that, using dynamic programming, compares the test utterance with the reference utterances and finds the best match. The method used is Constrained Endpoint with a 2-to-1 range of slope.
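The matching step described above can be sketched as follows. This is a minimal illustration, not the chapter's program: the function names `dtw_slope_constrained` and `recognize`, the scalar features, and the absolute-difference local distance are all assumptions, and the 2-to-1 slope range is approximated with the step set (1,1), (2,1), (1,2).

```python
import math

def dtw_slope_constrained(test, ref, dist=lambda a, b: abs(a - b)):
    """Accumulated distance of the best path from (0, 0) to (n-1, m-1).

    Both endpoints are fixed (constrained endpoints). The allowed steps
    (1, 1), (2, 1) and (1, 2) keep the local slope between 1/2 and 2.
    Returns math.inf when no admissible path exists.
    """
    n, m = len(test), len(ref)
    if n == 0 or m == 0:
        return math.inf
    D = [[math.inf] * m for _ in range(n)]
    D[0][0] = dist(test[0], ref[0])
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = math.inf
            for di, dj in ((1, 1), (2, 1), (1, 2)):
                pi, pj = i - di, j - dj
                if pi >= 0 and pj >= 0:
                    best = min(best, D[pi][pj])
            if best < math.inf:
                D[i][j] = best + dist(test[i], ref[j])
    return D[n - 1][m - 1]

def recognize(test, references):
    """Return the label of the reference with the smallest distance."""
    return min(references,
               key=lambda label: dtw_slope_constrained(test, references[label]))
```

With this step set, a test utterance more than about twice as long (or less than half as long) as a reference cannot be aligned to it at all, which is the practical effect of a slope constraint.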

412 citations


Journal Article
Hermann Ney
TL;DR: The algorithm to be developed is essentially identical to one presented by Vintsyuk and later by Bridle and Brown, but the notation and the presentation have been clarified and the computational expenditure per word is independent of the number of words in the input string.
Abstract: This paper is of tutorial nature and describes a one-stage dynamic programming algorithm for the problem of connected word recognition. The algorithm to be developed is essentially identical to one presented by Vintsyuk [1] and later by Bridle and Brown [2]; but the notation and the presentation have been clarified. The derivation used for optimally time synchronizing a test pattern, consisting of a sequence of connected words, is straightforward and simple in comparison with other approaches decomposing the pattern matching problem into several levels. The approach presented relies basically on parameterizing the time warping path by a single index and on exploiting certain path constraints both in the word interior and at the word boundaries. The resulting algorithm turns out to be significantly more efficient than those proposed by Sakoe [3] as well as Myers and Rabiner [4], while providing the same accuracy in estimating the best possible matching string. Its most important feature is that the computational expenditure per word is independent of the number of words in the input string. Thus, it is well suited for recognizing comparatively long word sequences and for real-time operation. Furthermore, there is no need to specify the maximum number of words in the input string. The practical implementation of the algorithm is discussed; it requires no heuristic rules and no overhead. The algorithm can be modified to deal with syntactic constraints in terms of a finite state syntax.
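A frame-synchronous recursion of the kind the abstract describes might look like the sketch below. It is an illustration under assumptions, not the paper's formulation: the function name `one_stage_dp`, the scalar features, the within-word predecessors (stay, advance one, skip one template frame), and the choice to carry the word history along with each cell instead of using backpointers are all simplifications made here.

```python
import math

def one_stage_dp(frames, templates, dist=lambda a, b: abs(a - b)):
    """Match an input sequence against concatenations of word templates.

    frames: input feature sequence.
    templates: non-empty dict mapping label -> non-empty template sequence.
    Returns (total_distance, list_of_labels) for the best concatenation.
    """
    # cost[k][j]: best score ending at template frame j of word k;
    # hist[k][j]: the word sequence hypothesised along that path.
    cost = {k: [math.inf] * len(t) for k, t in templates.items()}
    hist = {k: [()] * len(t) for k, t in templates.items()}
    best_end = (0.0, ())  # virtual word end before the first input frame
    for x in frames:
        new_cost = {k: [math.inf] * len(t) for k, t in templates.items()}
        new_hist = {k: [()] * len(t) for k, t in templates.items()}
        for k, tmpl in templates.items():
            for j in range(len(tmpl)):
                # Within-word predecessors at the previous input frame.
                cands = [(cost[k][p], hist[k][p])
                         for p in (j, j - 1, j - 2) if p >= 0]
                if j == 0:
                    # Word boundary: enter from the best word end so far.
                    cands.append((best_end[0], best_end[1] + (k,)))
                c, h = min(cands, key=lambda ch: ch[0])
                if c < math.inf:
                    new_cost[k][j] = c + dist(x, tmpl[j])
                    new_hist[k][j] = h
        cost, hist = new_cost, new_hist
        best_end = min(((cost[k][-1], hist[k][-1]) for k in templates),
                       key=lambda ch: ch[0])
    return best_end[0], list(best_end[1])
```

Each input frame touches every template frame once, so the work per frame depends on the total template length and not on how many words the input turns out to contain; no maximum word count is passed in.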

364 citations


Journal Article
Biing-Hwang Juang
TL;DR: A unified theoretical view of the Dynamic Time Warping (DTW) and the Hidden Markov Model (HMM) techniques for speech recognition problems is given and offers insights into the effectiveness of the probabilistic models in speech recognition applications.
Abstract: This paper gives a unified theoretical view of the Dynamic Time Warping (DTW) and the Hidden Markov Model (HMM) techniques for speech recognition problems. The application of hidden Markov models in speech recognition is discussed. We show that the conventional dynamic time-warping algorithm with Linear Predictive (LP) signal modeling and distortion measurements can be formulated in a strictly statistical framework. It is further shown that the DTW/LP method is implicitly associated with a specific class of Markov models and is equivalent to the probability maximization procedures for Gaussian autoregressive multivariate probabilistic functions of the underlying Markov model. This unified view offers insights into the effectiveness of the probabilistic models in speech recognition applications.
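The flavour of this correspondence can be seen in a simplified worked equation. The setting below is an assumption chosen for brevity (unit-variance Gaussian emissions with state means \mu_j in d dimensions, and a_{q_0 q_1} standing for the initial-state probability); the paper itself treats Gaussian autoregressive densities with LP distortion measures.

```latex
% Simplified illustration only: unit-variance Gaussian emissions (assumed).
\begin{align*}
b_j(x) &= (2\pi)^{-d/2}\exp\!\Big(-\tfrac12\lVert x-\mu_j\rVert^2\Big), \\
\log P(x_{1:T}, q_{1:T})
  &= \sum_{t=1}^{T}\Big[\log a_{q_{t-1}q_t} + \log b_{q_t}(x_t)\Big] \\
  &= -\sum_{t=1}^{T}\Big[\tfrac12\lVert x_t-\mu_{q_t}\rVert^2
       - \log a_{q_{t-1}q_t}\Big] - \tfrac{Td}{2}\log 2\pi .
\end{align*}
```

Under these assumptions, maximizing the joint likelihood over state sequences q is the same as minimizing an accumulated local distance plus transition penalties, which has the form of a dynamic time warping recursion.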

165 citations


Journal Article
TL;DR: This paper discusses reduced arrays which allow a continuum of tradeoffs between speed and circuit complexity, and permit design of fixed-size systems for problems which are unbounded in size.
Abstract: In a previous paper an array architecture was revealed for real-time dynamic time warping. An integrated processor was designed and built for use in such an array. This paper discusses reduced arrays which allow a continuum of tradeoffs between speed and circuit complexity. Reduced arrays permit design of fixed-size systems for problems which are unbounded in size.

22 citations


Proceedings Article
01 Mar 1984
TL;DR: This paper presents a single chip that is capable of performing the dynamic time warp processing necessary for recognizing 1000 words in real time.
Abstract: Dynamic time warping is considered a superior way to perform time alignment in speech recognition [1]. Unfortunately, dynamic programming algorithms require too much computation for conventional computer architectures to handle and still provide good response time with 1000 reference words. This paper presents a single chip that is capable of performing the dynamic time warp processing necessary for recognizing 1000 words in real time.

18 citations


Journal Article
TL;DR: The augmented continuous DP matching algorithm obtains only a near-optimal solution for the recognition principle based on pattern matching, however, it is computationally more efficient and does not require much memory storage.

15 citations


Journal Article
TL;DR: A new approach to the design of IWSR systems involves a dynamic matching strategy based on the nature of the input speech segment, called signal-dependent matching, which is significantly better than the standard dynamic time warping matching algorithm for confusable as well as nonconfusable vocabulary.

14 citations


Proceedings Article
01 Mar 1984
TL;DR: Special-purpose hardware for calculating dynamic time-warp distances has been designed and tested utilizing TTL technology, and timing simulations indicate that the DTW time of 1 ms implemented at the board level can also be met on the integrated circuit.
Abstract: Special purpose hardware for calculating dynamic time warp distances has been designed and tested utilizing TTL technology. The "Dynamic Time Warp Processor" (DTWP) performs all of the necessary arithmetic and decision making operations for selecting a word from a given vocabulary based on log likelihood distance measurements. The speed limitation in previously designed hardware was due to programmed decision making (often referred to as combinatorics). The combinatorics have been implemented in hardware in such a way that the decisions are made in the time of several gate delays rather than the time of several program cycles. Thus, a dynamic time warp (DTW) is performed on typical 40 frame templates in less than one millisecond. The DTWP serves as a slave to a 16-bit microcomputer. It performs all of the computation and control necessary for pattern classification. A board level implementation is now operational. VLSI implementation of the processor is currently in progress. All logic has been designed in 2.5µm CMOS polycells and has been simulated on the MOTIS timing simulator. The timing simulations indicate that the DTW time of 1 msec implemented at the board level can also be met on the integrated circuit.

10 citations


Proceedings Article
01 Mar 1984
TL;DR: A systolic array wafer scale architecture is presented which exploits the parallelism and local interconnect properties of this computation.
Abstract: Current state-of-the-art speech recognition systems are based on dynamic time warping (DTW) techniques in which the dominant computational task is input/reference word template matching and input/reference word non-linear time registration. A systolic array wafer scale architecture is presented which exploits the parallelism and local interconnect properties of this computation. The array executes either isolated or connected word recognition using either LPC or spectrally based templates. Restructurable VLSI (RVLSI) technology is being used to implement such an array comprised of 65 bit-serial arithmetic processing elements on a monolithic silicon 3" wafer. Speech recognition systems based on the RVLSI circuits are projected to be capable of supporting real-time vocabularies as large as 12,000 words.

8 citations


Proceedings Article
01 Mar 1984
TL;DR: Dynamic Time Warping is implemented using an array of identical processing elements designed to compute a local distance and update a global measure of dissimilarity.
Abstract: Dynamic Time Warping is implemented using an array of identical processing elements. Each processing element is designed to compute a local distance and update a global measure of dissimilarity. It is made up of 1900 transistors using a 2.5 micron NMOS technology. 25 processing elements and their local interconnections fit within 35 mm² of silicon that can be packaged in a standard 40-pin package. A single chip can handle 300 words in real time. An array of 22 chips will recognize within 200 ms a syllable-size pattern from a set of 6000. Various applications are taken up.
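Why an array of identical elements suits this computation can be illustrated in software. The sketch below is an assumption-laden analogy and not the chip's design: the name `dtw_wavefront`, the scalar features, and the basic step set (1,0), (0,1), (1,1) are chosen here. It sweeps the DTW grid one anti-diagonal at a time; every cell on a diagonal depends only on earlier diagonals, so in hardware those cells could be updated simultaneously by separate processing elements.

```python
import math

def dtw_wavefront(test, ref, dist=lambda a, b: abs(a - b)):
    """DTW distance computed by anti-diagonal sweeps (non-empty inputs assumed)."""
    n, m = len(test), len(ref)
    D = [[math.inf] * m for _ in range(n)]
    for s in range(n + m - 1):  # one anti-diagonal per "clock tick"
        # Cells (i, s - i) on this diagonal are mutually independent.
        for i in range(max(0, s - m + 1), min(n, s + 1)):
            j = s - i
            if i == 0 and j == 0:
                prev = 0.0
            else:
                prev = min(
                    D[i - 1][j] if i > 0 else math.inf,
                    D[i][j - 1] if j > 0 else math.inf,
                    D[i - 1][j - 1] if i > 0 and j > 0 else math.inf,
                )
            # Local distance plus update of the accumulated dissimilarity.
            D[i][j] = prev + dist(test[i], ref[j])
    return D[n - 1][m - 1]
```

The sequential inner loop stands in for what would be parallel elements; the full grid finishes in n + m - 1 sweeps instead of n * m sequential cell updates.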

8 citations


Journal Article
TL;DR: Special-purpose hardware for calculating dynamic time-warp distances has been designed and tested utilizing TTL technology, and timing simulations indicate that the DTW time of 1 ms implemented at the board level can also be met on the integrated circuit.
Abstract: Special-purpose hardware for calculating dynamic time-warp distances has been designed and tested utilizing TTL technology. The Dynamic Time-Warp Processor (DTWP) performs all of the necessary arithmetic and decision-making operations for selecting a word from a given vocabulary based on log likelihood distance measurements. The speed limitation in previously designed hardware was due to programmed decision making (often referred to as combinatorics). The combinatorics have been implemented in hardware in such a way that the decisions are made in the time of several gate delays rather than the time of several program cycles. Thus, a dynamic time warp (DTW) is performed on typical 40-frame templates in less than one millisecond. The DTWP serves as a slave to a 16-bit microcomputer. It performs all of the computation and control necessary for pattern classification, and is now operating on the board level. The processor is now being implemented for very large-scale integration. All logic has been designed in 2.5 μm Complementary Metal-Oxide Semiconductor polycells and has been simulated on the Metal-Oxide Semiconductor Timing Simulator (MOTIS). The timing simulations indicate that the DTW time of 1 ms implemented at the board level can also be met on the integrated circuit.

Journal Article
TL;DR: This talk presents results of a series of speaker independent, isolated word recognition tests using a 10‐word digits vocabulary, and shows that the information in the prosodic energy contour complements the segmental information of the LPC spectrum, thereby providing small but consistent improvements in performance for small word vocabularies.
Abstract: The technique of vector quantization has been widely applied in the area of speech coding and has recently been introduced into the area of speech recognition. For the conventional statistical pattern recognition word recognizer using LPC feature sets as the analysis frames, the use of vector quantization leads to a large reduction in computation for the dynamic time warping pattern matching, and a concomitant small increase in average word error rate. A second technique that has been recommended for improving the performance of isolated word recognizers is the addition of temporal energy information into the distance metric for comparing frames of speech. It has been shown that the information in the prosodic energy contour complements the segmental information of the LPC spectrum, thereby providing small but consistent improvements in performance for small word vocabularies. In this talk we present results of a series of speaker independent, isolated word recognition tests using a 10‐word digits vocabu...
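One way such a reduction in computation can arise is sketched below, under assumptions that are not taken from the talk: reference frames are stored as codebook indices, each test frame is compared against the codebook once, and the DTW then reads its local distances from that table. The names `quantize`, `codebook_distances`, and `dtw_from_table`, the scalar features, and the basic step set are hypothetical.

```python
import math

def quantize(frames, codebook, dist):
    """Replace each frame by the index of its nearest codeword."""
    return [min(range(len(codebook)), key=lambda c: dist(f, codebook[c]))
            for f in frames]

def codebook_distances(test, codebook, dist):
    """table[i][c] = distance from test frame i to codeword c, computed once."""
    return [[dist(x, c) for c in codebook] for x in test]

def dtw_from_table(table, ref_indices):
    """Plain DTW with steps (1,0), (0,1), (1,1); local distances are lookups."""
    n, m = len(table), len(ref_indices)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = table[i - 1][ref_indices[j - 1]]
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

The table is shared by every reference template, so the number of expensive frame-to-frame distance evaluations per test frame is bounded by the codebook size and not by the total number of reference frames. The quantization error in the stored references is the likely source of the small accuracy loss the abstract mentions.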

Proceedings Article
19 Mar 1984
TL;DR: A new method to compensate for endpoint detection errors is proposed and an improved method is compared with two existing methods on the alphadigit vocabulary.
Abstract: Inaccurate detection of the endpoints of the test and reference patterns is a major source of errors in discrete utterance recognition by dynamic time warping. If the vocabulary contains similar sounding words whose differences are at their beginnings or ends as in the alphadigit vocabulary, the error rate may greatly increase due to endpoint detection errors. Several methods to improve the recognition accuracy by relaxing or adjusting the endpoints have been suggested. They, however, do not work well in all cases and actually the error rate may increase. We propose a new method to compensate for endpoint detection errors and compare our improved method with two existing methods on the alphadigit vocabulary.

Proceedings Article
J. Ackenhusen
01 Mar 1984
TL;DR: The architecture of a single board processor for executing a variety of frame-by-frame connected pattern matching techniques which use dynamic time warping is described, and the connected dynamic time warp processor (CDTWP) will operate with existing hardware presently used for isolated word recognition.
Abstract: The architecture of a single board processor for executing a variety of frame-by-frame connected pattern matching techniques which use dynamic time warping is described. The connected dynamic time warp processor (CDTWP) will operate with existing hardware presently used for isolated word recognition. The CDTWP receives input in the form of a sequence of LPC-based feature vectors calculated from a spoken string of connected words. Each input vector, presented at a period of 15 msec, is compared with each frame of every reference template and the results are used to continuously update a hypothesized concatenation of reference templates that best matches the input. This comparison and update operation is completed for a given test frame before the next test frame arrives 15 msec later, and as a result, the CDTWP may be used to recognize the earlier portions of a connected string before the later portions have been spoken. The CDTWP is an experimental tool to examine a class of connected word recognition algorithms and architectures in real time. As a result, it is designed to be programmable and to fit within an existing word recognition system. The programmability and single board constraints limit the total number of reference frames to 455 (about 11 word templates). Current design work on a processor for isolated word dynamic time warping, the DTWP, suggests that a second generation CDTWP may handle 6000 reference frames (150 templates) by the use of existing nonprogrammable processing elements and addition of more template memory.

Proceedings Article
01 Mar 1984
TL;DR: A maximum likelihood (ML) classifier for discriminating between nonstationary Gaussian time series can be implemented by correlating the data spectrogram with templates that are constructed from ensemble average reference spectrograms, which implies that peak-oriented models of random processes are suboptimum for ML classification under high SNR conditions.
Abstract: A maximum likelihood (ML) classifier for discriminating between nonstationary Gaussian time series can be implemented by correlating the data spectrogram with templates that are constructed from ensemble average reference spectrograms. The time window used to synthesize the spectrograms must have a duration that is longer than the decorrelation time of the data in the neighborhood of the window. If the data time series exhibits significant nonstationarity within this decorrelation time, Karhunen-Loeve (K-L) basis functions should ideally be used to construct a generalized spectrogram, rather than using a standard spectrogram constructed with the usual sinusoidal basis functions. Utilization of a standard spectrogram imposes forced, pseudo-stationarity by approximating the autocovariance function of the data by the short-time autocorrelation function. This forced stationarity is routinely used to obtain linear prediction coefficients (LPC). When signal to interference ratio (SNR) is large, the templates that are used to classify a data spectrogram are sensitive to differences in the locations of nulls or zeroes in the expected signal spectrograms from different data classes. This null sensitivity seems to imply that peak-oriented models of random processes, e.g., the all-pole representation that is associated with LPC, are suboptimum for ML classification under high SNR conditions. Compensation for time warping is especially necessary if window durations are data dependent. Spectrogram implementation of the ML classifier yields a new similarity index for time warp compensation.

Journal Article
TL;DR: It is found that a simple rule that reduces the length of rhyme (final) demisyllables in nonword-final stressed syllables to approximately half their isolated-syllable duration provides recognition accuracy as high as that attained through use of complex, highly context-sensitive rules.
Abstract: In a recently proposed approach to isolated‐word recognition, word reference templates are constructed from a universal set of demisyllable units by concatenating the appropriate demisyllables for each vocabulary item. A dynamic time warping (DTW) algorithm is used to align test and reference patterns optimally. Nevertheless some sort of syllable duration preadjustment is necessary because of the large potential difference in duration between isolated and in‐context syllables. We have found that a simple rule that reduces the length of rhyme (final) demisyllables in nonword‐final stressed syllables to approximately half their isolated‐syllable duration provides recognition accuracy as high as that attained through use of complex, highly context‐sensitive rules. In addition to its practical application, this result can be regarded as a further demonstration of the power of DTW. We have also investigated the requirements for parameter smoothing at demisyllable boundaries. We find that an optimal window duration for smoothing is about 60–90 ms, but that failure to smooth reduces recognition accuracy only about 2% in an 1109 word test set; that linear and parabolic smoothing are equally effective; and that it does not appear that recognition accuracy can be improved by smoothing in certain phonetic contexts only. Taken together, these results can be viewed as confirming the suitability of the demisyllable as the basic unit in recognition systems.

Journal Article
TL;DR: An analysis of recognition errors in a speaker‐independent digit recognition experiment has demonstrated that the temporal energy contour of the test tokens is an important feature for recognition and perception.
Abstract: An analysis of recognition errors in a speaker‐independent digit recognition experiment has demonstrated that the temporal energy contour of the test tokens is an important feature for recognition and perception. This finding supports the conclusions of other recent studies [Rabiner et al., Proc. ICASSP 17.1.1 (1984)], [Rabiner et al., J. Acoust. Soc. Am. Suppl. 1 75, S93 (1984)]. A conventional LPC analysis with a likelihood ratio distance measure and dynamic time warping was used. This recognizer ignores the temporal energy contour. To help us understand the nature of the recognition errors, we played the test tokens through an LPC synthesizer after time alignment with the correct and with the successful (but incorrect) reference templates. Examples on tape demonstrate that the percept of either the correct or of the successful reference word can be evoked by replacing the input energy contour with the reference energy contour.

Book Chapter
01 Jan 1984
TL;DR: When a problem can be divided into independent subproblems (cf problem reduction, AND/OR trees) concurrent solution of them is possible and may be advantageous, and Kornfeld’s ETHER language permits experimentation with concurrency in heuristic search.
Abstract: When a problem can be divided into independent subproblems (cf. problem reduction, AND/OR trees) concurrent solution of them is possible and may be advantageous. For example, Kornfeld’s ETHER language permits experimentation with concurrency in heuristic search, and Smith has implemented a contract-net system motivated by the metaphor of manager-contractor linkage.