Showing papers on "Dynamic time warping" published in 1985

Journal Article•DOI•

Comparative study of several distortion measures for speech recognition

N. Nocerino¹, Frank Kao-Ping Soong¹, Lawrence R. Rabiner¹, Dennis H. Klatt¹•Institutions (1)

01 Dec 1985-Speech Communication

TL;DR: This study compared several different spectral distortion measures, including the Itakura-Saito distortion measure, the log likelihood ratio and weighted slope metric distortion measures, and two proposed perceptually based distortion measures, in terms of their effects on the performance of a standard dynamic time warping (DTW) based, isolated word, speech recognizer.

66 citations

Proceedings Article•DOI•

Comparative study of several distortion measures for speech recognition

N. Nocerino¹, Frank Kao-Ping Soong, Lawrence R. Rabiner, Dennis H. Klatt•Institutions (1)

Bell Labs¹

26 Apr 1985

TL;DR: This study compared several different spectral distortion measures including the Itakura-Saito (IS), the log likelihood ratio (LLR), the likelihood ratio (LR), the cepstral (CEP), and two perceptually based distortion measures, the weighted likelihood ratio (WLR) and the weighted slope metric (WSM), in terms of their effects on the performance of a standard dynamic time warping (DTW) based, isolated word, speech recognizer.

Abstract: In this study we compared several different spectral distortion measures including the Itakura-Saito (IS), the log likelihood ratio (LLR), the likelihood ratio (LR), the cepstral (CEP), and two perceptually based distortion measures, the weighted likelihood ratio (WLR) and the weighted slope metric (WSM) distortion measures, in terms of their effects on the performance of a standard dynamic time warping (DTW) based, isolated word, speech recognizer. Two modifications of the basic forms of each measure were also investigated, namely a Bark-scale frequency warping and the incorporation of suprasegmental energy information. All distortion measures and their modifications were tested on an alpha-digit vocabulary, 4-talker, telephone recording data base. The results can be summarized as: (1) All LPC-based distortion measures performed reasonably well. The LLR and WSM distortion measures gave the highest recognition accuracy, while the IS distortion measure gave the lowest score; (2) Whereas the addition of suprasegmental energy information helped the recognition performance, the use of gain and absolute loudness degraded the performance; (3) Bark-scale frequency warping did not perform as well as its unwarped counterpart; (4) The WLR distortion measure did not perform as well as its unweighted counterpart.

65 citations

Journal Article•DOI•

Is the DTW “distance” really a metric? An algorithm reducing the number of DTW comparisons in isolated word recognition

Enrique Vidal Ruiz¹, Francisco Casacuberta Nolla¹, Héctor Rulot Segovia¹•Institutions (1)

University of Valencia¹

01 Dec 1985-Speech Communication

TL;DR: Empirical evidence that the DTW “distance” loosely satisfies the metric properties on real speech is presented, allowing the assumption of a “loose metric space” structure in the set of parametric representations of words in a given vocabulary.
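The question in this paper's title can be made concrete with a small experiment. The sketch below is not the authors' procedure: it uses a textbook DTW with an absolute-difference local cost on scalar sequences (both assumptions, as is the helper name `triangle_violation_rate`) and reports the fraction of ordered triples that break the triangle inequality.

```python
from itertools import permutations


def dtw(a, b):
    """Textbook DTW cost between two scalar sequences (absolute-difference local cost)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def triangle_violation_rate(seqs):
    """Fraction of ordered triples (x, y, z) with d(x, z) > d(x, y) + d(y, z)."""
    idx = range(len(seqs))
    d = {(i, j): dtw(seqs[i], seqs[j]) for i in idx for j in idx}
    triples = list(permutations(idx, 3))
    bad = sum(1 for i, j, k in triples if d[i, k] > d[i, j] + d[j, k] + 1e-12)
    return bad / len(triples)


# Hypothetical toy data: d(x, z) = 8 but d(x, y) + d(y, z) = 3 + 3 = 6.
x, y, z = [0], [1, 2], [2, 3, 3]
rate = triangle_violation_rate([x, y, z])
```

A rate near zero on real word templates would be the kind of "loose" satisfaction the summary refers to; the toy data here is chosen to show that violations are possible in principle.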

44 citations

Journal Article•DOI•

EEG waveform analysis by means of dynamic time-warping.

H.-C Huang¹, Ben H. Jansen¹•Institutions (1)

University of Houston¹

01 Sep 1985-International Journal of Bio-medical Computing

TL;DR: The feasibility of using dynamic time-warping to cluster EEG waveforms was studied and it was revealed that DTW based clustering could distinguish between waves only slightly different in frequency, amplitude, peak location, or initial phase.

Abstract: The feasibility of using dynamic time-warping (DTW) to cluster EEG waveforms was studied. DTW compresses and extends the time axes of pairs of digitized waveforms to reduce the effects of minor differences in shape due to noise and normal, random shape fluctuations. The sum of the absolute amplitude differences that remain after time-warping can be used as a similarity index in a clustering procedure. Experiments with simulated data revealed that DTW based clustering could distinguish between waves only slightly different in frequency, amplitude, peak location, or initial phase. DTW clustering was also applied to sharp waves and spikes taken from actual EEG data and compared with an approach based on features extracted from the waveforms, and one based on computing the peak-aligned difference between waveforms. The results indicated that the DTW approach yielded more homogeneous clusters than the other two methods.
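The similarity index described in this abstract, the sum of absolute amplitude differences left after time-warping, can be sketched with a standard dynamic-programming recurrence. This is a minimal illustration, not the paper's implementation: the unconstrained step pattern and the function name `dtw_distance` are assumptions.

```python
def dtw_distance(a, b):
    """Sum of absolute amplitude differences along the cheapest warping path.

    `a` and `b` are sequences of digitized amplitudes; they may differ in length.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated difference aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = abs(a[i - 1] - b[j - 1])
            # Allowed moves: stretch a, stretch b, or advance both.
            cost[i][j] = diff + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Two waves with the same shape but a stretched plateau align with no residual.
same_shape = dtw_distance([0, 1, 2, 1, 0], [0, 1, 1, 2, 1, 0])
# A constant amplitude offset cannot be warped away.
offset = dtw_distance([0, 0], [1, 1])
```

The resulting pairwise values could then be fed to any clustering routine that accepts a precomputed dissimilarity matrix.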

40 citations

Journal Article•DOI•

A vector-quantization-based preprocessor for speaker-independent isolated word recognition

Kuk-Chin Pan¹, F. Soong², Lawrence R. Rabiner²•Institutions (2)

Hewlett-Packard¹, AT&T²

01 Jun 1985-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: The results show that the proposed preprocessor has the capability of reducing computation for recognition by up to an order of magnitude, while maintaining the same performance as that obtained using a DTW comparison without the pre-processor.

Abstract: In this paper, we propose a speaker-independent isolated word recognition system whose performance is comparable to that of a conventional isolated word recognizer, but whose computation is greatly reduced. The structure of the proposed recognizer consists of a word-based vector quantization (VQ) preprocessor, followed by a conventional DTW postprocessor. The purpose of the preprocessor is essentially to eliminate from further consideration all words in the vocabulary which are unlikely recognition candidates. In some cases, the preprocessor will be able to eliminate all word candidates except one; for such cases, there is no further processing required for word recognition. In all other cases (i.e., when more than one word candidate is passed on), a dynamic time warping (DTW) processor is used to resolve finer acoustical distinctions among the remaining word candidates. The performance of this type of recognizer (i.e., using a word-based preprocessor and a standard DTW comparison to make finer distinctions) is affected by a number of factors involved with the details of exactly how the system is implemented, e.g., the distortion measure used in the preprocessor and in the DTW comparison, the size of the VQ codebook for each vocabulary word, the decision thresholds of the preprocessor, etc. Several of these factors were studied experimentally using testing databases consisting of isolated digits and words from a vocabulary of 129 airline terms. The results show that the proposed preprocessor has the capability of reducing computation for recognition by up to an order of magnitude, while maintaining the same performance as that obtained using a DTW comparison without the pre-processor. A somewhat smaller reduction in memory over the straight DTW implementation is also obtained in the proposed approach.
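The two-stage structure described in this abstract (a per-word VQ preprocessor that prunes candidates, then DTW on whatever survives) can be sketched as follows. The details are assumptions made for illustration, not the paper's choices: squared Euclidean distance stands in for the distortion measure, the pruning rule is a fixed `margin` above the best VQ score, and all function names are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors (assumed measure)."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def vq_distortion(frames, codebook):
    """Average distance from each input frame to its nearest codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)


def dtw(a, b):
    """Textbook DTW cost between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = sq_dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(frames, codebooks, templates, margin=0.5):
    """Return (word, number of DTW comparisons that were needed)."""
    # Stage 1: the VQ preprocessor scores every word without time alignment.
    scores = {w: vq_distortion(frames, cb) for w, cb in codebooks.items()}
    best = min(scores.values())
    candidates = [w for w, s in scores.items() if s <= best + margin]
    if len(candidates) == 1:
        return candidates[0], 0  # preprocessor alone decides; no DTW run
    # Stage 2: DTW resolves the words the preprocessor could not separate.
    dtw_scores = {w: dtw(frames, templates[w]) for w in candidates}
    return min(dtw_scores, key=dtw_scores.get), len(candidates)


# Hypothetical one-dimensional "features": words a and b share codewords
# but differ in temporal order, so only DTW can tell them apart.
codebooks = {"a": [[0.0], [1.0]], "b": [[1.0], [0.0]], "c": [[10.0]]}
templates = {"a": [[0.0], [0.0], [1.0]], "b": [[1.0], [1.0], [0.0]], "c": [[10.0], [10.0]]}
result_ab = recognize([[0.0], [1.0], [1.0]], codebooks, templates)
result_c = recognize([[10.0], [10.0]], codebooks, templates)
```

Because a VQ codebook ignores the order of frames, it is cheap but blind to temporal structure, which is why the second stage is still needed for confusable words.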

32 citations

Proceedings Article•DOI•

Speaker independent telephone speech recognition

H. Iizuka

01 Apr 1985

TL;DR: The input speech pattern is matched against male and female reference patterns, and the category of the nearest reference pattern is taken as the result; recognition accuracy on non-training voice data was 95.8% with automatic segmentation.

Abstract: This paper describes a recognition method, a reference pattern generation method, and an evaluation of speaker-independent recognition for telephone speech response systems. The input utterance is analyzed by 19-channel BPFs. The power and vocal cord source characteristics are normalized. Time normalization is realized by linearly compressing or expanding the utterance to 32 frames. The speech pattern undergoes pattern matching with male and female reference patterns, and the category of the nearest reference pattern is taken as the result. It is necessary to optimize the reference patterns so that the speech can be correctly recognized despite differences in formant frequencies and slight segmentation errors. To optimize the reference patterns, the recognition of the training patterns and the updating of the reference patterns are repeated. A total of 256 male and female reference patterns were generated. The speech recognition accuracy of this method in recognizing non-training voice data was 95.8% with automatic segmentation.
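Unlike DTW, the time normalization in this abstract is linear: every utterance is compressed or expanded to 32 frames. A minimal sketch follows, assuming linear interpolation between neighbouring frames (the abstract does not say how the resampling is done) and a hypothetical function name `normalize_length`.

```python
def normalize_length(frames, target=32):
    """Linearly resample a list of feature vectors to exactly `target` frames."""
    n = len(frames)
    if n == 0:
        raise ValueError("need at least one frame")
    if n == 1 or target == 1:
        return [list(frames[0]) for _ in range(target)]
    out = []
    for k in range(target):
        # Map output index k onto a fractional position in the input sequence.
        pos = k * (n - 1) / (target - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * x + w * y for x, y in zip(frames[lo], frames[hi])])
    return out


# Hypothetical input: 50 frames of 19 band-pass filter outputs, compressed to 32.
utterance = [[float(i)] * 19 for i in range(50)]
fixed = normalize_length(utterance)
```

With every pattern at the same length, matching against a reference reduces to a frame-by-frame distance with no alignment search.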

21 citations

Proceedings Article•DOI•

Pattern Recognition Through Dynamic Programming

B. Burg, Ph. Missakian, B. Zavidovique

19 Dec 1985

TL;DR: A speech recognition time warping algorithm is adapted to picture analysis to recognize patterns despite variations in scale and orientation so that objects may be recognized regardless of whether they are embedded in other parts or they are distorted.

Abstract: The aim of this study is to adapt a speech recognition time warping algorithm to picture analysis. Our goal is to recognize patterns despite variations in scale and orientation. We may recognize objects regardless of whether they are embedded in other parts or they are distorted. The programs input real pictures, extract the contours and then encode and compare them to a pattern dictionary. The computer time is particularly short for such a recognition process.

11 citations

Journal Article•DOI•

Improved dynamic time warping methods for discrete utterance recognition

S. Haltsonen¹•Institutions (1)

Helsinki University of Technology¹

01 Apr 1985-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: Two modifications of dynamic time warping methods for discrete utterance recognition are proposed, which compensate for inaccurate endpoint detection and emphasize the differentiating regions of similar sounding utterances.

Abstract: Two modifications of dynamic time warping methods for discrete utterance recognition are proposed. They compensate for inaccurate endpoint detection and emphasize the differentiating regions of similar sounding utterances. The methods proposed are shown to give an increased recognition accuracy on a difficult vocabulary containing many similar sounding words.

9 citations

Proceedings Article•DOI•

Simulation of a highly parallel system for word recognition

M. Yoder¹, Leah H. Jamieson¹•Institutions (1)

Purdue University¹

26 Apr 1985

TL;DR: This paper examines the use of a single instruction stream - multiple data stream (SIMD) parallel architecture to reduce the computation time for a dynamic time warping template matching based isolated word recognition system.

Abstract: As the complexity of speech recognition systems increases, conventional computers are unable to perform all the operations quickly enough to process the speech input in real time. This paper examines the use of a single instruction stream - multiple data stream (SIMD) parallel architecture to reduce the computation time for a dynamic time warping template matching based isolated word recognition system. Each of the components of the recognition system was written as an SIMD parallel algorithm. The component SIMD algorithms and the complete word recognition system were simulated. The simulations showed that with 100 processing elements, each an 8 MHz MC68000, the speech system can perform isolated word recognition over a large vocabulary in real time. This real-time speech system used a 20 kHz sampling rate, 16 bits per sample, 100 samples per frame, 8 LPC coefficients per frame, 16 bits per LPC coefficient, and a 1,000 word vocabulary.
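The figures quoted in this abstract imply a frame rate and data rates that the real-time system has to sustain. The short calculation below derives them, assuming non-overlapping frames (the abstract does not say whether frames overlap); the function name `frame_budget` is hypothetical.

```python
SAMPLE_RATE_HZ = 20_000
BITS_PER_SAMPLE = 16
SAMPLES_PER_FRAME = 100
LPC_COEFFS_PER_FRAME = 8
BITS_PER_COEFF = 16


def frame_budget():
    """Derive frame rate and bit rates from the quoted system parameters."""
    frames_per_second = SAMPLE_RATE_HZ / SAMPLES_PER_FRAME  # assumes no overlap
    return {
        "frames_per_second": frames_per_second,
        "ms_per_frame": 1000 / frames_per_second,
        "raw_bits_per_second": SAMPLE_RATE_HZ * BITS_PER_SAMPLE,
        "feature_bits_per_second": frames_per_second * LPC_COEFFS_PER_FRAME * BITS_PER_COEFF,
    }


budget = frame_budget()
```

Under that assumption the system has 5 ms to process each frame, and LPC analysis reduces 320,000 bits per second of raw samples to 25,600 bits per second of features.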

8 citations

Proceedings Article•DOI•

A VLSI dynamic time warp processor for connected and isolated word speech recognition

R. Owen

26 Apr 1985

TL;DR: A custom designed NMOS VLSI circuit for speech recognition that calculates the distance between a trial utterance and reference templates stored in an attached 32 kilobyte dynamic memory for a vocabulary of up to 200 words is described.

Abstract: A custom designed NMOS VLSI circuit for speech recognition is described. It calculates the distance between a trial utterance and reference templates stored in an attached 32 kilobyte dynamic memory for a vocabulary of up to 200 words. Time warping based on a dynamic-programming minimum distance is employed. Connected words are processed on a continuous real-time basis while isolated word recognition proceeds after the utterance is complete. Multiple stages of pipelining are used in a unique microprogrammed architecture so that template memory bandwidth is fully utilized in the data-intensive distance calculation. Two separate arithmetic logic units (ALUs) are used, one for numerical data and one for template memory address generation. The basic microinstruction cycle time is 200 nanoseconds. Typical recognition time for a 40 isolated word vocabulary is less than 200 milliseconds.

7 citations

Journal Article•DOI•

Single-chip implementation of feature measurement for LPC-based speech recognition

J. Ackenhusen¹, Y. H. Oh¹•Institutions (1)

Bell Labs¹

01 Oct 1985-AT&T Technical Journal

TL;DR: A single-chip implementation of Linear Predictive Coding (LPC)-based feature measurement for speech recognition, called the FXDSP, has been developed by programming the AT&T DSP20™ programmable Digital Signal Processor and has been verified by both numerical simulation and system use.

Abstract: A single-chip implementation of Linear Predictive Coding (LPC)-based feature measurement for speech recognition, called the Feature Extracting Digital Signal Processor (FXDSP), has been developed by programming the AT&T DSP20™ programmable Digital Signal Processor (DSP) and has been verified by both numerical simulation and system use. For identical input, the recognition distance between floating point simulation and the DSP implementation was found to be negligibly small when compared with distances for word matches. The feature-measurement technique is identical to that used in numerical simulations of LPC-based isolated- and connected-word recognition using combinations of dynamic time warping, vector quantization, and hidden Markov modeling. As a result, the FXDSP represents a single-chip common building block for real-time implementation of most speech recognition techniques under investigation at AT&T Bell Laboratories. The FXDSP performs eighth-order LPC analysis on speech received from a standard CODEC. In every frame period (15 ms) it produces a feature vector consisting of the log energy, nine amplitude-normalized autocorrelation coefficients, and nine LPC-based test-pattern coefficients. The feature-measurement program requires 1023 locations of the 1024 available in on-chip program ROM, 211 of 256 available RAM locations, and 75 percent of available real time.

Proceedings Article•DOI•

Application of a sequential pattern learning system to connected speech recognition

A. Smith¹, J. Denenberg, T. Slack, C. Tan, R. Wohlford, +1 more•Institutions (1)

Advanced Technology Center¹

01 Apr 1985

TL;DR: An Experimental Learning Element for learning and recognizing sequential patterns is being developed as an adaptable pattern classifier of a larger learning system, and its performance is compared with that of a Dynamic Time Warp (DTW) based speech recognition system on the task of connected digit recognition.

Abstract: An Experimental Learning Element (ELE) for learning and recognizing sequential patterns is being developed as an adaptable pattern classifier of a larger learning system. Once external patterns are converted into a linear sequence of named objects, the ELE can build models that associate input object sequences with expected output state sequences. The ELE has been successfully demonstrated in learning and recognizing hand-printed characters. This paper describes the ELE and compares its performance with a Dynamic Time Warp (DTW) based speech recognition system on the task of connected digit recognition. If permitted to learn continually, the ELE reaches the same performance level as the DTW-CSR on the same quantized speech test data.

Journal Article•DOI•

Automated morphological analysis by means of dynamic time-warping.

Ben H. Jansen¹, H.-C Huang¹•Institutions (1)

University of Houston¹

01 Mar 1985-Electroencephalography and Clinical Neurophysiology

TL;DR: It was found that the DTW approach resulted in more homogeneous clusters than the other two approaches, which clearly indicates the feasibility of applying this new method to waveform clustering.

Journal Article•DOI•

Recognition of isolated-word sentences from a large vocabulary using dynamic time warping methods

S. Haltsonen

01 Aug 1985-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: A dynamic time warping algorithm for recognizing isolated-word sentences read with pauses between the words is described and a new method for averaging several training patterns to give a reference template is proposed.

Abstract: A dynamic time warping algorithm for recognizing isolated-word sentences read with pauses between the words is described. Also a new method for averaging several training patterns to give a reference template is proposed. The methods are tested on a 2000-word office correspondence vocabulary.
