
Showing papers on "Dynamic time warping" published in 1992


Proceedings ArticleDOI
23 Mar 1992
TL;DR: The authors extend the dynamic time warping algorithm, widely used in automatic speech recognition (ASR), to a dynamic plane warping (DPW) algorithm, for application in the field of optical character recognition (OCR) or similar applications.
Abstract: The authors extend the dynamic time warping (DTW) algorithm, widely used in automatic speech recognition (ASR), to a dynamic plane warping (DPW) algorithm, for application in the field of optical character recognition (OCR) or similar applications. Although direct application of the optimality principle reduces the computational complexity somewhat, the DPW (or image alignment) problem is exponential in the dimensions of the image. It is shown that by applying constraints to the image alignment problem, e.g., limiting the class of possible distortions, one can reduce the computational complexity dramatically, and find the optimal solution to the constrained problem in linear time. A statistical model, the planar hidden Markov model (PHMM), describing statistical properties of images is proposed. The PHMM approach was evaluated using a set of isolated handwritten digits. An overall digit recognition accuracy of 95% was achieved. It is expected that the advantage of this approach will be even more significant for harder tasks, such as cursive-writing recognition and spotting.

162 citations


Journal ArticleDOI
Kuldip K. Paliwal1
TL;DR: In [2,3], Furui investigated the use of temporal derivatives of cepstral coefficients and energy as recognition features in a dynamic time warping-based isolated word recognizer and showed how the recognition performance improves with the inclusion of first derivatives in the feature set.

46 citations
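
The temporal derivatives mentioned in the entry above are often estimated with a short regression over neighbouring frames. The following is a minimal sketch of that idea, not the formula used in the cited work: the function name `delta_features`, the window length of 2, and the edge handling (repeating the first and last frame) are all assumptions made for illustration.

```python
def delta_features(frames, window=2):
    """Estimate first-order temporal derivatives of a feature sequence.

    frames: list of feature vectors (lists of floats), one per time frame.
    window: number of frames on each side used in the regression (assumed).
    Frames beyond the edges are replaced by the first/last frame (assumed).
    """
    if not frames:
        return []
    n_frames = len(frames)
    dim = len(frames[0])
    # Normalizer of the regression estimate: 2 * sum of n^2 for n = 1..window
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(n_frames):
        acc = [0.0] * dim
        for n in range(1, window + 1):
            ahead = frames[min(t + n, n_frames - 1)]
            behind = frames[max(t - n, 0)]
            for k in range(dim):
                acc[k] += n * (ahead[k] - behind[k])
        deltas.append([v / denom for v in acc])
    return deltas
```

A recognizer that uses both static and dynamic features would typically concatenate each frame with its delta vector before computing local distances.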


Journal ArticleDOI
TL;DR: In this study, a new method was developed for analyzing waveform perturbations of voice, and noise components of voice were calculated from the discrepancies between wavelets after they had been optimally aligned in time.
Abstract: The harmonics‐to‐noise ratio (HNR) has been widely accepted for quantifying the irregular or noise component of voice. HNR, however, is usually inflated by cycle‐to‐cycle variations of fundamental frequency period because zero padding is used for time normalization of the wavelet. In this study, a new method was developed for analyzing waveform perturbations of voice. In this method, noise components of voice were calculated from the discrepancies between wavelets after they had been optimally aligned in time. The optimal time normalization of wavelets was accomplished using procedures of dynamic time warping (DTW). This method was evaluated using both synthetic and natural voices, and significant reductions in noise were obtained. The harmonics‐to‐noise ratio obtained using DTW for time normalization was also shown to be independent of fundamental frequency perturbations.

41 citations


PatentDOI
TL;DR: Speech recognition is carried out by performing a first analysis of a speech signal using a Hidden Semi Markov Model and an asymmetric time warping algorithm and a second analysis using Multi-Layer Perceptron techniques in conjunction with a neural net.
Abstract: Speech recognition is carried out by performing a first analysis of a speech signal using a Hidden Semi Markov Model and an asymmetric time warping algorithm. A second analysis is also performed using Multi-Layer Perceptron techniques in conjunction with a neural net. The first analysis is used by the second to identify word boundaries. Where the first analysis provides an indication of the word spoken above a certain level of confidence, an output representative of the word spoken may be generated solely in response to the first analysis, the second analysis being utilized when the level of confidence falls. The output controls a function of an aircraft and provides feedback to the speaker of the words spoken.

36 citations


Proceedings ArticleDOI
23 Mar 1992
TL;DR: The authors extend LVQ into a prototype-based classifier appropriate for the classification of various long speech units, and their results reveal clear gains in performance as a result of using PBMEC.
Abstract: It has since been shown that learning vector quantisation (LVQ) is a special case of a more general method, generalized probabilistic descent (GPD), for gradient descent on a rigorously defined classification loss measure that closely reflects the misclassification rate. The authors extend LVQ into a prototype-based classifier appropriate for the classification of various long speech units. For word recognition, a dynamic time warping procedure is integrated into the GPD learning procedure. The resulting minimum error classifier (MEC) is no longer a purely LVQ-like method, and it is called the prototype-based minimum error classifier (PBMEC). Results for the difficult Bell Labs E-set task as well as for speaker-dependent isolated word recognition for a vocabulary of 5240 words are presented. They reveal clear gains in performance as a result of using PBMEC.

35 citations


Proceedings ArticleDOI
23 Mar 1992
TL;DR: A novel keyword-spotting system that combines both neural network and dynamic programming techniques is presented, which makes use of the strengths of time delay neural networks (TDNNs), which include strong generalization ability, potential for parallel implementations, robustness to noise, and time shift invariant learning.
Abstract: A novel keyword-spotting system that combines both neural network and dynamic programming techniques is presented. This system makes use of the strengths of time delay neural networks (TDNNs), which include strong generalization ability, potential for parallel implementations, robustness to noise, and time shift invariant learning. Dynamic programming models are used by this system because they have the useful capability of time warping input speech patterns. This system was trained and tested on the Stonehenge Road Rally database, which is a 20-keyword-vocabulary, speaker-independent, continuous-speech corpus. Currently, this system performs at a figure of merit (FOM) rate of 82.5%. FOM is the detection rate averaged from 0 to 10 false alarms per keyword hour. This measure is explained in detail.

35 citations
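
The figure of merit described in the entry above can be illustrated with a simplified calculation. This sketch is an assumption-laden approximation rather than the paper's exact procedure: it handles a single keyword, samples the detection rate at each of the first `10 * hours` false alarms without interpolation, and the function name `figure_of_merit` and its parameters are hypothetical.

```python
def figure_of_merit(detections, n_true, hours, max_fa_per_hour=10):
    """Simplified FOM for one keyword (illustrative assumption, no interpolation).

    detections: list of (score, is_hit) putative detections.
    n_true: number of true occurrences of the keyword in the test speech.
    hours: duration of the test speech in hours.
    """
    n_points = int(round(max_fa_per_hour * hours))
    if n_points < 1 or n_true < 1:
        raise ValueError("need at least one operating point and one true keyword")
    # Walk down the detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    hits = 0
    rates = []
    for _score, is_hit in ranked:
        if is_hit:
            hits += 1
        else:
            # Detection rate reached just before this false alarm.
            rates.append(hits / n_true)
            if len(rates) == n_points:
                break
    # If the list ran out of false alarms, later points keep the final rate.
    while len(rates) < n_points:
        rates.append(hits / n_true)
    return sum(rates) / n_points
```

A multi-keyword score would typically average this quantity over the keyword vocabulary.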


Proceedings ArticleDOI
23 Mar 1992
TL;DR: The authors present a neural network which is trained on word examples to perform the wordspotting task and has multiple recurrent connections with time delay to account for temporal dynamics.
Abstract: The authors present a neural network which is trained on word examples to perform the wordspotting task. This network has multiple recurrent connections with time delay to account for temporal dynamics. A single network may be trained to recognize one word or many words. A hybrid wordspotter is evaluated in which a conventional wordspotter (based on dynamic time warping word matching) is used to screen incoming speech for potential keywords which are then passed to the network for the final accept/reject decision. Initial tests on a standard wordspotting test corpus resulted in improved keyword recognition at false alarm rates above zero.

31 citations


Journal ArticleDOI
TL;DR: Experimental evaluation results in tasks of classifying syllables and phonemes clearly demonstrate GPD's superiority, and it is shown that the design algorithm appraised in this paper can be considered a new version of learning vector quantization that incorporates dynamic programming.
Abstract: Although many pattern classifiers based on artificial neural networks have been vigorously studied, they are still inadequate from the viewpoint of classifying dynamic (variable- and unspecified-duration) speech patterns. To cope with this problem, the generalized probabilistic descent method (GPD) has recently been proposed. GPD not only allows one to train a discriminative system to classify dynamic patterns, but also possesses a remarkable advantage, namely guaranteeing the learning optimality (in the sense of a probabilistic descent search). A practical implementation of this theory, however, remains to be evaluated. In this light, we particularly focus on evaluating GPD in designing a widely used speech recognizer based on dynamic time warping distance measurement. We also show that the design algorithm appraised in this paper can be considered a new version of learning vector quantization that incorporates dynamic programming. Experimental evaluation results in tasks of classifying syllables and phonemes clearly demonstrate GPD's superiority.

22 citations


Proceedings ArticleDOI
23 Mar 1992
TL;DR: A generalized probabilistic descent method (GPD) is evaluated in designing a speech recognizer incorporated with the dynamic time warping methodology, and results clearly demonstrate that GPD can be a viable candidate for a method to realize a high-performance speech recognizer.
Abstract: Although several kinds of discriminative training methods based on artificial neural networks have been vigorously tested, the pursuit of highly capable classification of variable-duration speech patterns has been unsatisfactory. In this light, the authors evaluate a generalized probabilistic descent method (GPD) in designing a speech recognizer incorporated with the dynamic time warping methodology. The algorithm can be viewed as generalized learning vector quantization suited to the dynamic programming-based time warping. Experiments were conducted on two tasks: English syllable classification and Japanese phoneme classification. Results clearly demonstrate that GPD can be a viable candidate for a method to realize a high-performance speech recognizer.

22 citations


Proceedings Article
30 Nov 1992
TL;DR: Analysis has shown that TWINN completely removes time warping and is able to handle difficult classification problems, and has certain advantages over the currently available sequential processing schemes.
Abstract: We propose a model of Time Warping Invariant Neural Networks (TWINN) to handle time-warped continuous signals. Although TWINN is a simple modification of the well-known recurrent neural network, analysis has shown that TWINN completely removes time warping and is able to handle difficult classification problems. It is also shown that TWINN has certain advantages over the currently available sequential processing schemes: Dynamic Programming (DP) [1], Hidden Markov Model (HMM) [2], Time Delayed Neural Networks (TDNN) [3] and Neural Network Finite Automata (NNFA) [4]. We also analyzed the time continuity employed in TWINN and pointed out that this kind of structure can memorize a longer input history than Neural Network Finite Automata (NNFA). This may help to explain the well-accepted fact that for learning grammatical inference with NNFA one had to start with very short strings in the training set. The numerical example we used is a trajectory classification problem. This problem, which features variable sampling rates, internal states, continuous dynamics, heavily time-warped data and deformed phase space trajectories, is shown to be difficult for other schemes. With TWINN this problem has been learned in 100 iterations. As a benchmark we also trained a TDNN on exactly the same problem, and it completely failed, as expected.

21 citations


Proceedings ArticleDOI
23 Mar 1992
TL;DR: An algorithm for inferring correspondences between letters and phonemes from a large set of word spellings and their associated phonemic forms is described, which uses delimiting and dynamic time warping to derive correspondences.
Abstract: An algorithm for inferring correspondences between letters and phonemes from a large set of word spellings and their associated phonemic forms is described. The algorithm uses two techniques to infer correspondences: delimiting and dynamic time warping (DTW). The first technique delimits the part of the word spelling and pronunciation that cannot be aligned with the existing set of correspondences. The second technique derives correspondences from the delimited part of that word. The inferred correspondences are evaluated in terms of translation performance tested with unseen words, proper names and novel words. The translation performance is compared with that obtained using the manually derived correspondences as the benchmark. Nonparametric statistical tests are used to establish whether the performances of inferred correspondences are significantly different from the manually derived correspondences.

Proceedings Article
30 Nov 1992
TL;DR: A model of visual word recognition that accounts for several aspects of the temporal processing of sequences of briefly presented words, based on dynamic time warping and multidimensional scaling is described.
Abstract: We describe a model of visual word recognition that accounts for several aspects of the temporal processing of sequences of briefly presented words. The model utilizes a new representation for written words, based on dynamic time warping and multidimensional scaling. The visual input passes through cascaded perceptual, comparison, and detection stages. We describe how these dynamical processes can account for several aspects of word recognition, including repetition priming and repetition blindness.

Proceedings ArticleDOI
07 Jun 1992
TL;DR: The TWRNN has several advantages over such schemes as dynamic programming, hidden Markov models, time-delayed neural networks, and neural network finite automata for trajectory classification, and is shown to have built-in time warping ability.
Abstract: The authors propose a model of a time warping recurrent neural network (TWRNN) to handle temporal pattern classification where severely time warped and deformed data may occur. This model is shown to have built-in time warping ability. The authors analyze the properties of TWRNN and show that for trajectory classification it has several advantages over such schemes as dynamic programming, hidden Markov models, time-delayed neural networks, and neural network finite automata. A numerical example of trajectory classification is presented. This problem, which features variable sampling rates, internal states, continuous dynamics, heavily time-warped data, and deformed phase space trajectories, is shown to be difficult for the other schemes. The TWRNN learned it easily. The authors also attempted the same problem with a TDNN, which failed.

Proceedings ArticleDOI
07 Jun 1992
TL;DR: A dynamic time warping based speech recognition system with neural network trained templates is proposed and it is demonstrated through experiments that the discriminative training algorithm is far superior to the nondiscriminative one, providing both smaller recognition error rate and greater discrimination power.
Abstract: A dynamic time warping based speech recognition system with neural network trained templates is proposed. The algorithm for training the templates is derived based on minimizing classification error of the speech classifier. A speaker-independent isolated digit recognition experiment is conducted and achieves a 0.89% average recognition error rate with only one template for each digit, indicating that the derived templates are able to capture the speaker-invariant features of speech signals. Both nondiscriminative and discriminative versions of the neural net template training algorithm are considered. The former is based on maximum likelihood estimation. The latter is based on minimizing classification error. It is demonstrated through experiments that the discriminative training algorithm is far superior to the nondiscriminative one, providing both smaller recognition error rate and greater discrimination power. Experiments using different feature representation schemes are considered. It is demonstrated that the combination of the feature vector and the delta feature vector yields the best recognition result.

Journal ArticleDOI
TL;DR: A diagnostic technique has been developed to analyze process parameters and observables that change over time; it uses dynamic time warping (DTW) to transform the input signal into symbolic data, and knowledge-based diagnosis is performed on the symbolic data to determine malfunctions.
Abstract: Detecting manufacturing problems as soon as they occur is important for efficient manufacturing in today's factories. Many of these problems could be minimized by installing diagnostic systems to monitor manufacturing steps. A diagnostic technique has been developed to analyze process parameters and observables that change over time. Process parameters control the operation of equipment, and observables are attributes of a partially completed product. The technique uses a specified digital signal processing algorithm known as dynamic time warping (DTW) to transform the input signal into symbolic data. Knowledge-based diagnosis is performed on the symbolic data to determine malfunctions. A detailed description of the DTW algorithm and knowledge-based analysis is presented. Two different applications, one in the glass industry and another in the semiconductor industry, are discussed to illustrate the general use of this technique.

Journal ArticleDOI
TL;DR: This paper investigates text-independent speaker verification, which involves the determination of whether or not a test utterance belongs to a specific reference speaker and the required information stored in the templates is different in this case.

Proceedings ArticleDOI
07 Jun 1992
TL;DR: The authors present an algorithm for isolated-word recognition that takes into consideration the duration variability of the different utterances of the same word and shows that all these words could be recognized.
Abstract: The authors present an algorithm for isolated-word recognition that takes into consideration the duration variability of the different utterances of the same word. The algorithm is based on extracting acoustical features from the speech signal and using them as the input to a sequence of multilayer perceptron neural networks. The networks were implemented as predictors for the speech samples for a certain duration of time. The networks were trained by a combination of the back-propagation and the dynamic time warping (DTW) techniques. The DTW technique was implemented to normalize the duration variability. The networks were trained to recognize the correct words and to reject the wrong words. The training set consisted of ten words, each uttered seven times by three different speakers. The test set consisted of three utterances of each of the ten words. The results show that all these words could be recognized.

Journal ArticleDOI
TL;DR: A speaker-dependent, isolated-word speech recognition system is presented which is based on the use of the fast Fourier transform for extracting features from the speech input and compares them against previously stored word templates using dynamic time warping to identify the uttered word.
Abstract: A speaker-dependent, isolated-word speech recognition system is presented which is based on the use of the fast Fourier transform for extracting features from the speech input. The algorithm then normalizes those features and compares them against previously stored word templates using dynamic time warping in order to identify the uttered word. The system has been successfully implemented and provided good results when tested using a small dictionary.
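
The template-matching step described in the entry above can be sketched as follows. This is a minimal illustration of classic DTW with a Euclidean local distance and the basic three-way step pattern, not the cited system: the names `dtw_distance` and `recognize`, the choice of local distance, and the absence of path constraints or length normalization are all assumptions.

```python
import math

def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean local distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b frame repeated
                cost[i][j - 1],      # b advances, a frame repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))
```

Here `templates` is assumed to be a dictionary mapping each word to one stored feature sequence; a practical system would usually keep several templates per word and normalize the cost by path length.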

Proceedings ArticleDOI
30 Aug 1992
TL;DR: The special purpose architecture is used to perform the band matrix multiplication in order to compute the local distance metric based on Itakura's log likelihood distance.
Abstract: Describes an area and time efficient systolic array architecture for computations in Dynamic Time Warping (DTW). The special purpose architecture is used to perform the band matrix multiplication in order to compute the local distance metric based on Itakura's log likelihood distance. The time complexity of the algorithm is O(nk), where n and k are the number of elements in a row of the first and second input matrices, respectively. The number of processors is equal to the bandwidth w of the output band matrix. The speedup of the parallel algorithm compared to the sequential algorithm is wz, where z is the number of multiplier stages within a PE. The parallel algorithm can be implemented as a single VLSI chip.

Journal ArticleDOI
TL;DR: The proposed Time-Warping Neural Network (TWNN) demonstrates a higher phoneme recognition accuracy than a baseline recognizer composed of time-delay neural networks with a linear time alignment mechanism.
Abstract: This paper proposes a novel neural network architecture for phoneme-based speech recognition. The new architecture is composed of five time-warping sub-networks and an output layer which integrates the sub-networks. Each time-warping sub-network has a different time-warping function embedded between the input layer and the first hidden layer. A time-warping sub-network recognizes the input speech warping the time axis using its time-warping function. The network is called the Time-Warping Neural Network (TWNN). The purpose of this network is to cope with the temporal variability of acoustic-phonetic features. The TWNN demonstrates a higher phoneme recognition accuracy than a baseline recognizer composed of time-delay neural networks with a linear time alignment mechanism.

Book ChapterDOI
01 Jan 1992
TL;DR: Automatic training procedures are developed to obtain the model or models for a certain type of linguistic unit, under the framework of a Distance-based approach, and some preliminary experimental results for single-speaker and multi-speaker tasks are reported.
Abstract: Automatic training procedures are developed to obtain the model or models for a certain type of linguistic unit, under the framework of a Distance-based approach. The chosen units are phonetic-units and the models are templates. In a first approach, one prototype (centroid) per phonetic-unit is obtained through an iterative process and by using Dynamic Time Warping techniques. A refinement is performed through a Clustering procedure that obtains several prototypes per phonetic-unit. Another refinement, which is based on Multiedit Condensing techniques, is also proposed. Some preliminary experimental results for single-speaker and multi-speaker tasks are reported.

Proceedings ArticleDOI
26 Oct 1992
TL;DR: The gamma memory, a recursive linear structure, is presented as a generalization of the tapped delay line or the context memory units to construct nonuniform time warping scales that may be useful in speech recognition.
Abstract: A framework for designing and characterizing short-term memory structures for neural networks is presented. The gamma memory, a recursive linear structure, is presented as a generalization of the tapped delay line or the context memory units. The gamma memory principle can be enhanced to construct nonuniform time warping scales that may be useful in speech recognition.

Book ChapterDOI
01 Jan 1992
TL;DR: The method attempts to address the shortcomings of traditional time alignment approaches, commonly based on dynamic programming algorithms, by employing the branch and bound search algorithm coupled with the Mahalanobis distance measure as the matching criterion.
Abstract: In this paper, a new method for dynamic time alignment of speech waveforms is introduced. The method attempts to address the shortcomings of traditional time alignment approaches, commonly based on dynamic programming algorithms. Such methods, usually called dynamic time warping (DTW) algorithms, make the assumption that the samples of the speech waveform under consideration are statistically independent. The proposed method makes no such assumption. Instead, the method is based on models of speech entities with Gaussian distributions and general covariance matrices. These ideas are implemented by employing the branch and bound search algorithm [1] coupled with the Mahalanobis distance measure as the matching criterion. Hence, the new method attempts to utilise more discriminatory information than is presently incorporated. Preliminary results on a spoken letter recognition problem are reported validating the approach.

Proceedings ArticleDOI
07 Jun 1992
TL;DR: A dynamic time warping algorithm using the Hopfield neural network to achieve an optimum match between a reference and a test signal is described.
Abstract: A dynamic time warping (DTW) algorithm using the Hopfield neural network is described. A DTW energy function is constructed to achieve an optimum match between a reference and a test signal and mapped to the network's Lyapunov function to determine the connection weights and the biases for the neurons. The experimental results verify that the Hopfield network can be effectively used to solve this optimization problem.

Proceedings ArticleDOI
11 Sep 1992
TL;DR: A new, unsupervised speaker adaptation scheme which requires no prior training phase is proposed, which improves the recognition rate as more speech data becomes available, making it most suitable for real-time implementation.
Abstract: A speaker-independent speech recognition system is desirable in many applications where speaker-specific data does not exist. If speaker-independent data is available, the system could be adapted to the specific speaker, thereby reducing the recognition error rate. A new, unsupervised speaker adaptation scheme which requires no prior training phase is proposed. The algorithm improves the recognition rate as more speech data becomes available, making it most suitable for real-time implementation. In the tests conducted, this algorithm yields an improvement of almost 50% in the recognition error rate.

Book ChapterDOI
01 Jan 1992
TL;DR: A dynamic time warping algorithm is used to match the original and the resampled speech signals and the results showed only a slight decrease in performance when using the new labelings.
Abstract: In this paper a method is described to generate automatically the labels for a new speech database from an existing manually labeled speech database. This becomes necessary when new standards are introduced and the speech signals have to be resampled. A dynamic time warping algorithm is used to match the original and the resampled speech signals. The comparison is carried out on mel based features. To improve computation time the search space for the DTW algorithm is restricted. Several experiments were carried out with a normal density Bayes classifier to check the quality of the new labelings. The results showed only a slight decrease in performance when using the new labelings.
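
The label-transfer idea in the entry above can be sketched with a band-restricted DTW followed by a lookup along the warping path. The abstract does not say how the search space is restricted, so the diagonal band used here is an assumption, as are the names `banded_dtw_path` and `transfer_labels` and the scalar absolute-difference local distance, which stands in for a distance on mel-based feature vectors.

```python
def banded_dtw_path(a, b, band):
    """Warping path between scalar sequences a and b, searched only
    within `band` frames of the (length-scaled) diagonal."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        center = i * m / n  # where the diagonal sits for row i
        lo = max(1, int(center - band))
        hi = min(m, int(center + band))
        for j in range(lo, hi + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    if cost[n][m] == inf:
        raise ValueError("band too narrow to reach the end point")
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path

def transfer_labels(boundaries, path):
    """Map frame indices in the first sequence to the first aligned
    frame index in the second sequence."""
    first_match = {}
    for i, j in path:
        first_match.setdefault(i, j)
    return [first_match[k] for k in boundaries]
```

Restricting each row to a band reduces the number of cells evaluated from n·m to roughly n·(2·band + 1), at the risk of missing alignments that stray far from the diagonal.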