
Showing papers on "Dynamic time warping" published in 1985


Journal ArticleDOI
TL;DR: This study compared several different spectral distortion measures including the Itakura-Saito distortion measure, the log likelihood ratio and weighted slope metric distortion measures, and two proposed perceptually based distortion measures in terms of their effects on the performance of a standard dynamic time warping (DTW) based, isolated word, speech recognizer.

66 citations


Proceedings ArticleDOI
26 Apr 1985
TL;DR: This study compared several different spectral distortion measures including the Itakura-Saito (IS), the log likelihood ratio (LLR), the likelihood ratio (LR), the cepstral (CEP), and two perceptually based distortion measures, the weighted likelihood ratio (WLR) and the weighted slope metric (WSM) in terms of their effects on the performance of a standard dynamic time warping (DTW) based, isolated word, speech recognizer.
Abstract: In this study we compared several different spectral distortion measures including the Itakura-Saito (IS), the log likelihood ratio (LLR), the likelihood ratio (LR), the cepstral (CEP), and two perceptually based distortion measures, the weighted likelihood ratio (WLR) and the weighted slope metric (WSM) distortion measures, in terms of their effects on the performance of a standard dynamic time warping (DTW) based, isolated word, speech recognizer. Two modifications of the basic forms of each measure were also investigated, namely a Bark-scale frequency warping and the incorporation of suprasegmental energy information. All distortion measures and their modifications were tested on an alpha-digit vocabulary, 4-talker, telephone recording data base. The results can be summarized as: (1) All LPC-based distortion measures performed reasonably well. The LLR and WSM distortion measures gave the highest recognition accuracy, while the IS distortion measure gave the lowest score; (2) Whereas the addition of suprasegmental energy information helped the recognition performance, the use of gain and absolute loudness degraded the performance; (3) Bark-scale frequency warping did not perform as well as its unwarped counterpart; (4) The WLR distortion measure did not perform as well as its unweighted counterpart.

65 citations


Journal ArticleDOI
TL;DR: Empirical evidence of loose satisfaction of these properties with real speech will be presented, allowing the assumption of a “loose metric space” structure in the set of parametric representations of words in a given vocabulary.

44 citations


Journal ArticleDOI
TL;DR: The feasibility of using dynamic time-warping to cluster EEG waveforms was studied and it was revealed that DTW-based clustering could distinguish between waves only slightly different in frequency, amplitude, peak location, or initial phase.
Abstract: The feasibility of using dynamic time-warping (DTW) to cluster EEG waveforms was studied. DTW compresses and extends the time axes of pairs of digitized waveforms to reduce the effects of minor differences in shape due to noise and normal, random shape fluctuations. The sum of the absolute amplitude differences that remain after time-warping can be used as a similarity index in a clustering procedure. Experiments with simulated data revealed that DTW-based clustering could distinguish between waves only slightly different in frequency, amplitude, peak location, or initial phase. DTW clustering was also applied to sharp waves and spikes taken from actual EEG data and compared with an approach based on features extracted from the waveforms, and one based on computing the peak-aligned difference between waveforms. The results indicated that the DTW approach yielded more homogeneous clusters than the other two methods.

40 citations
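
As a rough illustration of the similarity index described in the abstract above, the sketch below aligns two digitized waveforms by dynamic programming and returns the sum of absolute amplitude differences along the best alignment. The function name `dtw_distance`, the unconstrained step pattern, and the toy waveforms are assumptions made for illustration; the path constraints used in the paper are not stated here.

```python
def dtw_distance(a, b):
    """Sum of absolute amplitude differences along the cheapest warping path.

    Assumption: the simplest step pattern (match, stretch a, stretch b)
    with no slope or window constraints.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest accumulated difference aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] repeats against b[j-1]
                cost[i][j - 1],      # b[j-1] repeats against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


spike = [0, 1, 2, 1, 0]
stretched = [0, 1, 1, 2, 1, 0]      # same shape, slightly longer
print(dtw_distance(spike, stretched))  # 0.0
```

A value like this can serve as the pairwise dissimilarity fed to a clustering procedure; waveforms that differ only by a local stretch score near zero.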


Journal ArticleDOI
TL;DR: The results show that the proposed preprocessor has the capability of reducing computation for recognition by up to an order of magnitude, while maintaining the same performance as that obtained using a DTW comparison without the preprocessor.
Abstract: In this paper, we propose a speaker-independent isolated word recognition system whose performance is comparable to that of a conventional isolated word recognizer, but whose computation is greatly reduced. The structure of the proposed recognizer consists of a word-based vector quantization (VQ) preprocessor, followed by a conventional DTW postprocessor. The purpose of the preprocessor is essentially to eliminate from further consideration all words in the vocabulary which are unlikely recognition candidates. In some cases, the preprocessor will be able to eliminate all word candidates except one; for such cases, there is no further processing required for word recognition. In all other cases (i.e., when more than one word candidate is passed on), a dynamic time warping (DTW) processor is used to resolve finer acoustical distinctions among the remaining word candidates. The performance of this type of recognizer (i.e., using a word-based preprocessor and a standard DTW comparison to make finer distinctions) is affected by a number of factors involved with the details of exactly how the system is implemented, e.g., the distortion measure used in the preprocessor and in the DTW comparison, the size of the VQ codebook for each vocabulary word, the decision thresholds of the preprocessor, etc. Several of these factors were studied experimentally using testing databases consisting of isolated digits and words from a vocabulary of 129 airline terms. The results show that the proposed preprocessor has the capability of reducing computation for recognition by up to an order of magnitude, while maintaining the same performance as that obtained using a DTW comparison without the preprocessor. A somewhat smaller reduction in memory over the straight DTW implementation is also obtained in the proposed approach.

32 citations
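
The two-stage structure described in the abstract above (a per-word VQ preprocessor that prunes candidates, then DTW on whatever remains) might look roughly like the sketch below. Everything specific in it is an assumption for illustration: the squared Euclidean distance, the `margin` pruning threshold, the length-normalized DTW score, and all function names. The paper's actual distortion measures and decision thresholds are not reproduced here.

```python
import math


def sq_dist(x, y):
    """Squared Euclidean distance between two feature vectors (assumed measure)."""
    return sum((p - q) ** 2 for p, q in zip(x, y))


def vq_distortion(frames, codebook):
    """Average distance from each input frame to its nearest codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)


def dtw_score(frames, template):
    """Length-normalized DTW distance between an utterance and a template."""
    n, m = len(frames), len(template)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sq_dist(frames[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def recognize(frames, codebooks, templates, margin):
    """Return (word, dtw_was_needed).

    Stage 1 keeps only words whose VQ distortion is within `margin` of the
    best one; stage 2 runs DTW only if more than one candidate survives.
    """
    scores = {w: vq_distortion(frames, cb) for w, cb in codebooks.items()}
    best = min(scores.values())
    candidates = [w for w, s in scores.items() if s <= best + margin]
    if len(candidates) == 1:
        return candidates[0], False
    return min(candidates, key=lambda w: dtw_score(frames, templates[w])), True


# Toy vocabulary: "a" and "b" use the same sounds in a different order,
# so the order-blind VQ stage cannot separate them, while "c" is far away.
codebooks = {"a": [(0,), (1,)], "b": [(0,), (1,)], "c": [(10,)]}
templates = {"a": [(0,), (1,)], "b": [(1,), (0,)], "c": [(10,), (10,)]}
print(recognize([(0,), (0,), (1,), (1,)], codebooks, templates, margin=0.5))
print(recognize([(10,), (10,)], codebooks, templates, margin=0.5))
```

In the first call the VQ stage leaves two candidates and DTW breaks the tie; in the second the preprocessor alone decides, which is where the computational saving comes from.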


Proceedings ArticleDOI
01 Apr 1985
TL;DR: The speech recognition accuracy of this method in recognizing non-training voice data was 95.8% with automatic segmentation, and the category of the nearest reference pattern is taken as the result.
Abstract: This paper describes a recognition method, a reference pattern generation method, and an evaluation of speaker-independent recognition for telephone speech response systems. The input utterance is analyzed by 19-channel BPFs. The power and vocal cord source characteristics are normalized. Time normalization is realized by linearly compressing or expanding the utterance to 32 frames. The speech pattern undergoes pattern matching with male and female reference patterns, and the category of the nearest reference pattern is taken as the result. It is necessary to optimize the reference patterns so that the speech can be correctly recognized in spite of differences in formant frequencies and slight segmentation errors. To optimize the reference patterns, the recognition of the training patterns and updating of the reference patterns are repeated. A total of 256 male and female reference patterns were generated. The speech recognition accuracy of this method in recognizing non-training voice data was 95.8% with automatic segmentation.

21 citations
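
Unlike DTW, the time normalization in the abstract above is linear: every utterance is compressed or expanded to a fixed 32 frames before matching. A minimal sketch of that idea follows, assuming linear interpolation between neighboring frames (the paper may instead pick or average frames) and using the hypothetical name `linear_time_normalize`.

```python
def linear_time_normalize(frames, target_len=32):
    """Linearly compress or expand a list of feature frames to target_len frames.

    Assumption: output frame k is linearly interpolated between the two input
    frames nearest to the evenly spaced position k * (n - 1) / (target_len - 1).
    """
    if not frames or target_len < 2:
        raise ValueError("need at least one input frame and target_len >= 2")
    n = len(frames)
    out = []
    for k in range(target_len):
        pos = k * (n - 1) / (target_len - 1)   # fractional input position
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo                           # weight of the later frame
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


# A 3-frame utterance with 2 features per frame, stretched to 32 frames.
utterance = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
normalized = linear_time_normalize(utterance)
print(len(normalized), normalized[0], normalized[-1])
```

After this step every pattern has the same length, so it can be compared frame by frame with each reference pattern and assigned the category of the nearest one.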


Proceedings ArticleDOI
19 Dec 1985
TL;DR: A speech recognition time warping algorithm is adapted to picture analysis to recognize patterns despite variations in scale and orientation so that objects may be recognized regardless of whether they are embedded in other parts or they are distorted.
Abstract: The aim of this study is to adapt a speech recognition time warping algorithm to picture analysis. Our goal is to recognize patterns despite variations in scale and orientation. We may recognize objects regardless of whether they are embedded in other parts or they are distorted. The programs input real pictures, extract the contours and then encode and compare them to a pattern dictionary. The computer time is particularly short for such a recognition process.

11 citations


Journal ArticleDOI
TL;DR: Two modifications of dynamic time warping methods for discrete utterance recognition are proposed, which compensate for inaccurate endpoint detection and emphasize the differentiating regions of similar sounding utterances.
Abstract: Two modifications of dynamic time warping methods for discrete utterance recognition are proposed. They compensate for inaccurate endpoint detection and emphasize the differentiating regions of similar sounding utterances. The methods proposed are shown to give an increased recognition accuracy on a difficult vocabulary containing many similar sounding words.

9 citations


Proceedings ArticleDOI
26 Apr 1985
TL;DR: This paper examines the use of a single instruction stream - multiple data stream (SIMD) parallel architecture to reduce the computation time for a dynamic time warping template matching based isolated word recognition system.
Abstract: As the complexity of speech recognition systems increases, conventional computers are unable to perform all the operations quickly enough to process the speech input in real time. This paper examines the use of a single instruction stream - multiple data stream (SIMD) parallel architecture to reduce the computation time for a dynamic time warping template matching based isolated word recognition system. Each of the components of the recognition system was written as an SIMD parallel algorithm. The component SIMD algorithms and the complete word recognition system were simulated. The simulations showed that with 100 processing elements, each an 8 MHz MC68000, the speech system can perform isolated word recognition over a large vocabulary in real time. This real-time speech system used a 20 kHz sampling rate, 16 bits per sample, 100 samples per frame, 8 LPC coefficients per frame, 16 bits per LPC coefficient, and a 1,000 word vocabulary.

8 citations
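
The system parameters quoted in the abstract above imply a few derived rates that make the real-time requirement concrete. The short calculation below works them out; it assumes non-overlapping frames (the abstract does not say whether frames overlap), and the variable names are illustrative.

```python
# Parameters taken from the abstract.
SAMPLE_RATE_HZ = 20_000
BITS_PER_SAMPLE = 16
SAMPLES_PER_FRAME = 100
LPC_COEFFS_PER_FRAME = 8
BITS_PER_LPC_COEFF = 16

# Derived quantities (assumption: frames do not overlap).
frames_per_second = SAMPLE_RATE_HZ / SAMPLES_PER_FRAME               # 200 frames/s
frame_duration_ms = 1000 / frames_per_second                         # 5 ms per frame
input_bits_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE             # raw speech rate
feature_bits_per_frame = LPC_COEFFS_PER_FRAME * BITS_PER_LPC_COEFF   # LPC data per frame
feature_bits_per_second = feature_bits_per_frame * frames_per_second

print(frames_per_second, frame_duration_ms)
print(input_bits_per_second, feature_bits_per_frame, feature_bits_per_second)
```

Under that assumption, front-end processing for each frame has to finish within about 5 ms to keep pace with the input.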


Proceedings ArticleDOI
26 Apr 1985
TL;DR: A custom designed NMOS VLSI circuit for speech recognition that calculates the distance between a trial utterance and reference templates stored in an attached 32 kilobyte dynamic memory for a vocabulary of up to 200 words is described.
Abstract: A custom designed NMOS VLSI circuit for speech recognition is described. It calculates the distance between a trial utterance and reference templates stored in an attached 32 kilobyte dynamic memory for a vocabulary of up to 200 words. Time warping based on a dynamic-programming minimum distance is employed. Connected words are processed on a continuous real-time basis while isolated word recognition proceeds after the utterance is complete. Multiple stages of pipelining are used in a unique microprogrammed architecture so that template memory bandwidth is fully utilized in the data-intensive distance calculation. Two separate arithmetic logic units (ALUs) are used, one for numerical data and one for template memory address generation. The basic microinstruction cycle time is 200 nanoseconds. Typical recognition time for a 40 isolated word vocabulary is less than 200 milliseconds.

7 citations


Journal ArticleDOI
J. Ackenhusen, Y. H. Oh
TL;DR: A single-chip implementation of Linear Predictive Coding (LPC)-based feature measurement for speech recognition, called the FXDSP, has been developed by programming the AT&T DSP20™ programmable Digital Signal Processor and has been verified by both numerical simulation and system use.
Abstract: A single-chip implementation of Linear Predictive Coding (LPC)-based feature measurement for speech recognition, called the Feature Extracting Digital Signal Processor (FXDSP), has been developed by programming the AT&T DSP20™ programmable Digital Signal Processor (DSP) and has been verified by both numerical simulation and system use. For identical input, the recognition distance between floating point simulation and the DSP implementation was found to be negligibly small when compared with distances for word matches. The feature-measurement technique is identical to that used in numerical simulations of LPC-based isolated- and connected-word recognition using combinations of dynamic time warping, vector quantization, and hidden Markov modeling. As a result, the FXDSP represents a single-chip common building block for real-time implementation of most speech recognition techniques under investigation at AT&T Bell Laboratories. The FXDSP performs eighth-order LPC analysis on speech received from a standard CODEC. In every frame period (15 ms) it produces a feature vector consisting of the log energy, nine amplitude-normalized autocorrelation coefficients, and nine LPC-based test-pattern coefficients. The feature-measurement program requires 1023 locations of the 1024 available in on-chip program ROM, 211 of 256 available RAM locations, and 75 percent of available real time.
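
The abstract above mentions eighth-order LPC analysis and amplitude-normalized autocorrelation coefficients. One textbook route from a speech frame to LPC coefficients is the autocorrelation method solved with the Levinson-Durbin recursion, sketched below. Whether the FXDSP uses this exact recursion is an assumption; the function names and the sign convention A(z) = 1 + a1 z^-1 + ... are likewise chosen for illustration only.

```python
def normalized_autocorrelation(x, order=8):
    """Autocorrelation lags 0..order of one frame, divided by lag 0."""
    r = [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(order + 1)]
    if r[0] == 0:
        raise ValueError("silent frame: zero energy")
    return [v / r[0] for v in r]


def levinson_durbin(r, order=8):
    """Solve the LPC normal equations for autocorrelation lags r[0..order].

    Returns (a, err), where a = [1, a1, ..., a_order] are the coefficients of
    the prediction-error filter and err is the residual prediction error.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient for stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


# Sanity check on a first-order process whose autocorrelation is 0.5 ** k:
# only the first predictor coefficient should be non-zero.
r = [0.5 ** k for k in range(9)]
coeffs, residual = levinson_durbin(r)
print(coeffs[:3], residual)
```

With order 8, `normalized_autocorrelation` returns nine values (lags 0 through 8), which matches the count of nine autocorrelation coefficients given in the abstract.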

Proceedings ArticleDOI
A. Smith, J. Denenberg, T. Slack, C. Tan, R. Wohlford
01 Apr 1985
TL;DR: An Experimental Learning Element for learning and recognizing sequential patterns is being developed as an adaptable pattern classifier of a larger learning system, and its performance is compared with that of a Dynamic Time Warp (DTW) based speech recognition system on the task of connected digit recognition.
Abstract: An Experimental Learning Element (ELE) for learning and recognizing sequential patterns is being developed as an adaptable pattern classifier of a larger learning system. Once external patterns are converted into a linear sequence of named objects, the ELE can build models that associate input object sequences with expected output state sequences. The ELE has been successfully demonstrated in learning and recognizing hand-printed characters. This paper describes the ELE and compares its performance with a Dynamic Time Warp (DTW) based speech recognition system on the task of connected digit recognition. If permitted to continually learn, the ELE reaches the same performance level as the DTW-CSR on the same quantized speech test data.

Journal ArticleDOI
TL;DR: It was found that the DTW approach resulted in more homogeneous clusters than the other two approaches, which clearly indicates the feasibility of applying this new method to waveform clustering.

Journal ArticleDOI
TL;DR: A dynamic time warping algorithm for recognizing isolated-word sentences read with pauses between the words is described and a new method for averaging several training patterns to give a reference template is proposed.
Abstract: A dynamic time warping algorithm for recognizing isolated-word sentences read with pauses between the words is described. Also a new method for averaging several training patterns to give a reference template is proposed. The methods are tested on a 2000-word office correspondence vocabulary.