
Showing papers on "Dynamic time warping published in 1991"


Proceedings ArticleDOI
30 Sep 1991
TL;DR: The author proposes a time-warping neural network (TWNN) for phoneme-based speech recognition that demonstrates higher phoneme recognition accuracy than a baseline recognizer based on conventional feedforward neural networks and linear time alignment.
Abstract: The author proposes a time-warping neural network (TWNN) for phoneme-based speech recognition. The TWNN is designed to accept phonemes with arbitrary duration, whereas conventional phoneme recognition networks have a fixed-length input window. The purpose of this network is to cope with not only variability of phoneme duration but also time warping within a phoneme. The proposed network is composed of several time-warping units, each of which has a time-warping function. The TWNN is characterized by time-warping functions embedded between the input layer and the first hidden layer in the network. The proposed network demonstrates higher phoneme recognition accuracy than a baseline recognizer based on conventional feedforward neural networks and linear time alignment. The recognition accuracy is even higher than that achieved with discrete hidden Markov models.

165 citations


Proceedings ArticleDOI
30 Sep 1991
TL;DR: A family of new discriminative training algorithms can be rigorously formulated for various kinds of classifier frameworks, including the popular dynamic time warping (DTW) and hidden Markov model (HMM).
Abstract: The authors developed a generalized probabilistic descent (GPD) method by extending the classical theory on adaptive training by Amari (1967). Their generalization makes it possible to treat dynamic patterns (of variable duration or dimension) such as speech, as well as static patterns (of fixed duration or dimension), in pattern classification problems. The key ideas of the GPD formulation include the embedding of time normalization and the incorporation of smooth classification error functions into the gradient search optimization objectives. As a result, a family of new discriminative training algorithms can be rigorously formulated for various kinds of classifier frameworks, including the popular dynamic time warping (DTW) and hidden Markov model (HMM). Experimental results are also provided to show the superiority of this new family of GPD-based, adaptive training algorithms for speech recognition.

159 citations


Proceedings ArticleDOI
18 Nov 1991
TL;DR: The author investigates a feedforward neural network that can accept phonemes of arbitrary duration while coping with nonlinear time warping; it demonstrated higher phoneme recognition accuracy than a baseline recognizer based on conventional feedforward neural networks.
Abstract: The author investigates a feedforward neural network that can accept phonemes with an arbitrary duration, coping with nonlinear time warping. The time-warping neural network is characterized by the time-warping functions embedded between the input layer and the first hidden layer in the network. The input layer accesses three different time points. The accessing points are determined by the time-warping functions. The input spectrum sequence itself is not warped, but the accessing-point sequence is warped. The advantage of this network architecture is that the input layer can access the original spectrum sequence. The proposed network demonstrated higher phoneme recognition accuracy than the baseline recognizer based on conventional feedforward neural networks. The recognition accuracy was even higher than that achieved with discrete hidden Markov models.

113 citations


Journal ArticleDOI
TL;DR: The authors present two algorithms for performing shape matching on ice floe boundaries in SAR (synthetic aperture radar) images: the first uses normalized correlation to match the psi-s curves, while the second uses dynamic programming to compute an elastic match that better accommodates ice floe deformation.
Abstract: The authors present two algorithms for performing shape matching on ice floe boundaries in SAR (synthetic aperture radar) images. These algorithms quickly produce a set of ice motion and rotation vectors that can be used to guide a pixel value correlator. The algorithms match a shape descriptor known as the psi-s curve. The first algorithm uses normalized correlation to match the psi-s curves, while the second uses dynamic programming to compute an elastic match that better accommodates ice floe deformation. Some empirical data on the performance of the algorithms on Seasat SAR images are presented.
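As background on the shape descriptor matched above: a psi-s curve plots the cumulative tangent (turning) angle psi of a boundary against arc length s. A minimal Python sketch of computing it for a polygonal boundary follows; the function name and the simple polygon assumption are illustrative, not taken from the paper.

```python
import math

# Hypothetical sketch: compute the psi-s curve of a closed polygonal
# boundary given as (x, y) points, with the start point repeated at the end.
def psi_s_curve(points):
    """Return a list of (s, psi) samples: arc length vs. cumulative turning angle."""
    curve, s, psi = [], 0.0, 0.0
    prev_angle = None
    for p, q in zip(points, points[1:]):
        dx, dy = q[0] - p[0], q[1] - p[1]
        angle = math.atan2(dy, dx)
        if prev_angle is not None:
            turn = angle - prev_angle
            # unwrap so each turning angle lies in (-pi, pi]
            while turn > math.pi:
                turn -= 2 * math.pi
            while turn <= -math.pi:
                turn += 2 * math.pi
            psi += turn
        prev_angle = angle
        s += math.hypot(dx, dy)
        curve.append((s, psi))
    return curve
```

Two such curves can then be compared by normalized correlation or, as in the second algorithm, elastically aligned with dynamic programming.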

92 citations


Book ChapterDOI
01 Jan 1991
TL;DR: This type of mean field network (MFN) with tied weights that is capable of approximating the recognizer for a hidden markov model (HMM) is presented as a way of allowing more powerful representations without abandoning the automatic parameter estimation procedures.
Abstract: Neural networks can be used to discriminate between very similar phonemes, and they can handle the variability in time of occurrence by using a time-delay architecture followed by temporal integration (Lang, Hinton and Waibel, 1990). So far, however, neural networks have been less successful at handling longer duration events that require something equivalent to “time warping” in order to match stored knowledge to the data. We present a type of mean field network (MFN) with tied weights that is capable of approximating the recognizer for a hidden Markov model (HMM). In the process of settling to a stable state, the MFN finds a blend of likely ways of generating the input string given its internal model of the probabilities of transitions between hidden states and the probabilities of input symbols given a hidden state. This blend is a heuristic approximation to the full set of path probabilities that is implicitly represented by an HMM recognizer. The learning algorithm for the MFN is less efficient than for an HMM of the same size. However, the MFN is capable of using distributed representations of the hidden state, and this can make it exponentially more efficient than an HMM when modelling strings produced by a generator that itself has componential states. We view this type of MFN as a way of allowing more powerful representations without abandoning the automatic parameter estimation procedures that have allowed relatively simple models like HMMs to outperform complex AI representations on real tasks.

40 citations


Journal ArticleDOI
TL;DR: These signal-processor-intensive transform and graph-search-based pattern-matching techniques are reviewed and currently achievable recognition accuracies are reported.
Abstract: Transformation of a segment of acoustic signal, by processing into a vectorial representation such as the spectrum, can permit the identification of the constituent phonemes within spoken speech. Subsequent comparison against a previously stored representation using techniques such as dynamic time warping or hidden Markov modelling then permits a speech recognition operation to be accomplished. These signal-processor-intensive transform and graph-search-based pattern-matching techniques are reviewed and currently achievable recognition accuracies are reported.
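The dynamic time warping technique reviewed above is, at its core, a dynamic-programming alignment of two feature sequences. A generic textbook sketch in Python, using a scalar distance for illustration (this is the standard formulation, not any specific recognizer from the review):

```python
# Classic dynamic time warping: minimal cumulative distance between two
# sequences over all monotonic alignments (match, insertion, deletion steps).
def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Return the minimal cumulative alignment cost between sequences a and b."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            # extend the cheapest of the three predecessor cells
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]
```

In a recognizer, `a` and `b` would be sequences of spectral vectors with a vector distance, and the word template with the lowest alignment cost wins.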

35 citations


Proceedings ArticleDOI
14 Apr 1991
TL;DR: It is demonstrated that while the static feature gives the best individual performance, multiple linear combinations of feature sets based on regression analysis can reduce error rates.
Abstract: The performance of dynamic features in automatic speaker recognition is examined. Second- and third-order regression analysis examining the performance of the associated feature sets independently, in combination, and in the presence of noise is included. It is shown that each regression order has a clear optimum. These optima are independent of the analysis order of the static feature from which the dynamic features are derived, and insensitive to low-level noise added to the test speech. It is also demonstrated that while the static feature gives the best individual performance, multiple linear combinations of feature sets based on regression analysis can reduce error rates.

33 citations


Proceedings ArticleDOI
04 Nov 1991
TL;DR: It is concluded that not only has the VQ technique reduced the amount of computation and storage, but it has also created new ideas for solving various problems in speech/speaker recognition.
Abstract: The author reviews major methods of applying the vector quantization (VQ) technique to speech and speaker recognition. These include speech recognition based on the combination of VQ and the DTW/HMM (dynamic time warping/hidden Markov model) technique, VQ-distortion-based recognition, learning VQ algorithms, speaker adaptation by VQ-codebook mapping, and VQ-distortion-based speaker recognition. It is concluded that not only has the VQ technique reduced the amount of computation and storage, but it has also created new ideas for solving various problems in speech/speaker recognition.

30 citations


Proceedings ArticleDOI
14 Apr 1991
TL;DR: An attempt was made to enhance the performance of a DTW (dynamic time warping) speech recognizer by preprocessing speech parameters using a neural network transformation, using a multilayer perceptron trained with speech utterances of a single speaker.
Abstract: An attempt was made to enhance the performance of a DTW (dynamic time warping) speech recognizer by preprocessing speech parameters with a neural network transformation. A multilayer perceptron trained with speech utterances of a single speaker has been used in front of a DTW recognizer. Results show an improvement of about 15% in the recognition rate in all cases, even with a speaker that was not used for training. If the network is not completely speaker-independent, a dynamic adaptation to the speaker could be performed.

21 citations


Proceedings ArticleDOI
14 Apr 1991
TL;DR: The authors describe several experiments in which the use of artificial neural networks for the continuous speech speaker-independent keyword recognition problem was investigated and methodologies for reducing a primary keyword spotting system's susceptibility to false alarms while maintaining recognition accuracy are discussed.
Abstract: The authors describe several experiments in which the use of artificial neural networks (ANNs) for the continuous speech speaker-independent keyword recognition problem was investigated. They discuss methodologies for reducing a primary keyword spotting system's susceptibility to false alarms while maintaining recognition accuracy. The keyword spotter uses a conventional dynamic time warping algorithm to detect the start- and end-point of each potential keyword. The ANNs serve as a secondary processing stage for this segmented utterance. The ANNs attempt to classify this utterance by formulating the recognition problem as a pattern matching problem. In the hybrid network experiments, the utterance was processed into features derived from the activation at the hidden layer of a back-propagation trained network. Hybrid representations were grouped with two other feature representations in a multiple neural network system. A recognition accuracy of 78% on the Stonehenge X database was obtained while rejecting 72% of the false alarms which were detected by the primary keyword spotting system.

19 citations


Proceedings ArticleDOI
19 Feb 1991
TL;DR: In this paper, the DARPA Resource Management task is used as the domain to investigate the performance of speaker-adaptive speech recognition, and preliminary results indicate that speaker differences can be well minimized.
Abstract: A speaker-independent system is desirable in many applications where speaker-specific data do not exist. However, if speaker-dependent data are available, the system can be adapted to the specific speaker so that the error rate is significantly reduced. In this paper, the DARPA Resource Management task is used as the domain to investigate the performance of speaker-adaptive speech recognition. Since adaptation is based on speaker-independent systems with only limited adaptation data, a good adaptation algorithm should be consistent with the speaker-independent parameter estimation criterion, and should adapt those parameters that are less sensitive to the limited training data. Two parameter sets, the codebook mean vector and the output distribution, are regarded as the most important. They are modified within the framework of the maximum likelihood estimation criterion according to the characteristics of each speaker. In order to estimate these parameters reliably, output distributions are shared with each other if they exhibit a certain acoustic similarity. In addition to modifying these parameters, speaker normalization with neural networks is also studied, in the hope that acoustic data normalization will not only rapidly adapt the system but also enhance the robustness of speaker-independent speech recognition. Preliminary results indicate that speaker differences can be well minimized. In comparison with speaker-independent speech recognition, the error rate has been reduced from 4.3% to 3.1% using only parameter adaptation techniques, with 40 adaptation sentences for each speaker. When the number of speaker adaptation sentences is comparable to that of speaker-dependent training, speaker-adaptive recognition works better than the best speaker-dependent recognition results on the same test set, which indicates the robustness of speaker-adaptive speech recognition.

Journal ArticleDOI
TL;DR: Bioacoustics researchers can use a computer as a powerful tool to measure, classify, compare and synthesise sounds, but variations in both the time and frequency dimensions of a sound are a problem.
Abstract: Bioacoustics researchers can use a computer as a powerful tool to measure, classify, compare and synthesise sounds. Vocalisations on tape are commonly converted to a digital format suitable for a computer by using an analogue-to-digital converter and then a Fourier transformation. Alternatively, sonagrams can be measured, for example by using a digitising pad or an image analysis system. Correlations and indices of similarity have been used to compare sounds, but variations in both the time and frequency dimensions of a sound are a problem. A solution may be the use of pattern recognition methods such as elastic matching and time warping. These methods are briefly described and assessed.

Proceedings ArticleDOI
H. Hackbarth1, J. Mantel1
08 Jul 1991
TL;DR: Under the criterion of high recognition rates along with very short reaction time, modular subnet assemblies provide for successful recognition of 100 words and are also recommended for speaker-independent classification.
Abstract: A modular neural structure is presented which is dedicated to the recognition of larger vocabularies. It contains several so-called scaly subnets, assembled into a compound network by neural glue elements. This architecture and the corresponding training scheme have been investigated for various network parameters during speaker-dependent 100-word recognition. Important simulation results were compared with a multilayer perceptron showing scaly input-to-hidden connections and with standard dynamic time warping. Under the criterion of high recognition rates along with very short reaction time, modular subnet assemblies provide for successful recognition of 100 words. They are also recommended for speaker-independent classification.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: Experimental results showed that the combined method is more effective than the conventional VQ (vector quantization) HMM and that utilizing the score of the classifier on the local features is significant.
Abstract: The authors present a novel phoneme recognition method which combines two stochastic methods: discriminant analysis and the hidden Markov model (HMM) method. The HMM is powerful in time-warping and in capturing the global dynamic features, but its discriminating ability is not sufficient. The approach used is to construct the HMM with a phonetic element classifier front-end. Each phonetic element belongs to one phoneme and represents a local pattern of the phoneme. The classifier is a modified version of discriminant analysis, that is, a combination of Bayes classifiers. It extracts optimal features to separate the phonetic elements and consequently contributes to separating the HMMs of the phonemes. Furthermore, the score of the classifier is combined with that of the HMM. Since the classifier is based on a statistical method, combination of the scores is straightforward both in theory and in practice. Experimental results showed that the combined method is more effective than the conventional VQ (vector quantization) HMM and that utilizing the score of the classifier on the local features is significant.

Proceedings Article
02 Dec 1991
TL;DR: A time-warping neuron is defined that extends the operation of the formal neuron of a back-propagation network by warping the input pattern to match it optimally to its weights.
Abstract: Recently, much interest has been generated regarding speech recognition systems based on Hidden Markov Models (HMMs) and neural network (NN) hybrids. Such systems attempt to combine the best features of both models: the temporal structure of HMMs and the discriminative power of neural networks. In this work we define a time-warping (TW) neuron that extends the operation of the formal neuron of a back-propagation network by warping the input pattern to match it optimally to its weights. We show that a single-layer network of TW neurons is equivalent to a Gaussian density HMM-based recognition system, and we propose to improve the discriminative power of this system by using back-propagation discriminative training, and/or by generalizing the structure of the recognizer to a multi-layered net. The performance of the proposed network was evaluated on a highly confusable, isolated-word, multi-speaker recognition task. The results indicate that not only does the recognition performance improve, but the separation between classes is also enhanced, allowing us to set up a rejection criterion to improve the confidence of the system.
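The core idea of a time-warping neuron, as described above, is to replace the neuron's fixed dot product with the best score over all monotonic alignments of the input frames to the weight frames. A speculative scalar sketch follows; the per-frame product similarity and the unconstrained alignment steps are illustrative assumptions, not the paper's Gaussian-density formulation.

```python
# Illustrative "time-warping neuron": dynamic programming finds the
# alignment of input frames to weight frames that maximizes the summed
# per-frame similarity, instead of a rigid position-by-position dot product.
def tw_neuron_activation(inputs, weights):
    """Best cumulative similarity over all monotonic alignments (higher is better)."""
    n, m = len(inputs), len(weights)
    NEG = float("-inf")
    D = [[NEG] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = inputs[i - 1] * weights[j - 1]  # assumed per-frame similarity
            # extend the best of the three predecessor alignments
            D[i][j] = sim + max(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]
```

With a negated distance as the similarity, maximizing this score is the mirror image of minimizing a DTW cost, which is what links such a layer to an HMM-style recognizer.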

Journal ArticleDOI
TL;DR: Comparisons to MLP, dynamic time warping algorithm, and nearest neighbor classifier showed satisfactory improvements in recognition accuracy for confusing/composite patterns.

Journal ArticleDOI
TL;DR: This paper proposes a large-vocabulary speech recognition system using a phoneme spotting method based on a time-delay neural network (TDNN) and a predictive LR parser; time alignment between the predicted phonemes and the TDNN phoneme spotting results is realized using a DTW (dynamic time warping) method.
Abstract: This paper proposes a large-vocabulary speech recognition system using a phoneme spotting method based on a time-delay neural network (TDNN) and a predictive LR parser. This is the first attempt to recognize large-vocabulary speech using neural networks. The prediction of phonemes in words is performed by the predictive LR parser. Time alignment between the phonemes predicted by the LR parser and the phoneme spotting results from the TDNN is realized using a DTW (dynamic time warping) method. Speaker-dependent recognition of a 5240-word vocabulary using 2620 test words uttered by a male announcer resulted in a rate of 92.6 percent for the top choice, and rates of 97.6 and 99.1 percent for the second and fifth choices, respectively.

Journal ArticleDOI
TL;DR: Combining the PE-LSI architecture with the proposed array architecture for highly parallel dynamic time warping (DTW) processing, a real-time continuous speech recognition system based on continuous dynamic programming matching using the SPLIT method for a 1000-word vocabulary can be constructed using a ring array processor consisting of 30 PEs.
Abstract: A speech recognition processor CMOS LSI was developed as the processing element (PE) of a ring array processor previously proposed by the authors as an architecture to carry out highly parallel recognition processing with array size flexibility. There are three key features of the LSI: (1) a highly parallel I/O structure of triple buffers with cyclical-mode transition control methods to solve the serious problem of inter-PE data transfer overhead versus the array processing; (2) a control structure with two direct memory access (DMA) controllers to realize inter-PE data I/O processing and intra-PE processing in parallel; and (3) pipelined recognition processing at a high execution rate realized by a pipelined structure and a balanced clock distribution design technique. These effective designs for the PE LSI allow high-speed recognition processing without any inter-PE data transfer overhead in the ring array processor. Combining the PE-LSI architecture with the proposed array architecture for highly parallel dynamic time warping (DTW) processing, a real-time continuous speech recognition system based on continuous dynamic programming matching using the SPLIT method for a 1000-word vocabulary can be constructed using a ring array processor consisting of 30 PEs.

Journal ArticleDOI
TL;DR: Comparisons with K-means and DTW algorithms show that the integration of the splitting LVQ and LVQ2 algorithms makes this system well suited to speaker-independent isolated word recognition.
Abstract: A speaker-independent isolated word recognizer is proposed. It is obtained by concatenating a Bayesian neural network and a Hopfield time-alignment network. In this system, the Bayesian network outputs the a posteriori probability for each speech frame, and the Hopfield network is then concatenated for time warping. A proposed splitting Learning Vector Quantization (LVQ) algorithm derived from the LBG clustering algorithm and the Kohonen LVQ algorithm is first used to train the Bayesian network. The LVQ2 algorithm is subsequently adopted as a final refinement step. A continuous mixture of Gaussian densities for each frame and multiple templates for each word are employed to characterize each word pattern. Experimental evaluation of this system with four templates/word and five mixtures/frame, using databases of 53 speakers (28 male, 25 female) and isolated words (10 digits and 30 city names), gave average recognition accuracies of 97.3% in the speaker-trained mode and 95.7% in the speaker-independent mode. Comparisons with K-means and DTW algorithms show that the integration of the splitting LVQ and LVQ2 algorithms makes this system well suited to speaker-independent isolated word recognition. A cookbook approach for the determination of parameters in the Hopfield time-alignment network is also described.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: A novel approach for speech signal analysis has been developed that incorporates both steady-state and dynamic spectral features into a unified model that has been successfully applied in automatic speech recognition contexts and does not require frame-based optimal search algorithms.
Abstract: A novel approach for speech signal analysis has been developed that incorporates both steady-state and dynamic spectral features into a unified model. This model has been successfully applied in automatic speech recognition contexts and does not require frame-based optimal search algorithms. The model decomposes an utterance into a chain of acoustic subwords and simultaneously generates a mathematical description of instantaneous acoustic-phonetic features and dynamic transitions. The algorithm was tested using a speaker-dependent limited vocabulary recognition task and achieved higher recognition rates than both vector quantization and hidden Markov models.

Proceedings ArticleDOI
G. Bendelac1, I.D. Shallom1
19 May 1991
TL;DR: The authors describe a chip set that implements voice dialing for cellular telephones using isolated-word speaker-dependent speech-recognition algorithms and incorporates speech synthesis for guiding, prompting, and verification by vocal feedback.
Abstract: The authors describe a chip set that implements voice dialing for cellular telephones using isolated-word speaker-dependent speech-recognition algorithms. High recognition rates of over 98% in noisy car cabins are achieved using noise-immune techniques for preprocessing, feature extraction, template training, and template comparison. A unique word endpoint detector is realized with a modified dynamic time warping algorithm and a noise-robust voice-activated switch. The chip set supports up to two users and a vocabulary of 32 words. It incorporates speech synthesis for guiding, prompting, and verification by vocal feedback, and allows dialing digit by digit or speed dialing of up to 10 stored numbers invoked with user-defined phrases.

Proceedings ArticleDOI
05 Mar 1991
TL;DR: This paper describes a chip set, the DSPG5006, that implements digital signal processing algorithms for voice dialing in cellular telephones that supports an isolated word, speaker dependent, speech recognition system built around a modified dynamic time warping algorithm.
Abstract: This paper describes a chip set, the DSPG5006, that implements digital signal processing algorithms for voice dialing in cellular telephones. The problem of achieving high performance speech recognition in noisy car cabins is addressed by using noise immune techniques for preprocessing, feature extraction, template training and template comparison. The chip set supports an isolated word, speaker dependent, speech recognition system built around a modified dynamic time warping algorithm. A speech specific voice activated switch is used in conjunction with a simplified endpoint detector. A two-stage training mode permits building of compact and reliable templates that are compared with incoming words in the recognition mode. Recognition rates are greater than 98% in car cabins, and over 99% in the laboratory. The chip set can be configured to support speech synthesis, so that the user is guided in a friendly manner throughout the use of the system for both prompting and verification.

Proceedings ArticleDOI
18 Nov 1991
TL;DR: A hybrid model of the neural network and the DTW (dynamic time warping) algorithm, where the activation values of the ordered states of a pattern class are summed up to evaluate the likelihood that an input pattern belongs to the class.
Abstract: The authors propose a hybrid model of the neural network and the DTW (dynamic time warping) algorithm. The model is basically a state transition system. Each state of the model has a neural network which is activated for some portion of a sequential speech pattern, for example, the first consonantal part of a monosyllable. States of the model are partially ordered corresponding to classes of sequential patterns. Based on the DTW algorithm, the activation values of the ordered states of a pattern class are summed up to evaluate the likelihood that an input pattern belongs to the class. This model has been applied to the discrimination among monosyllables such as /ba/, /da/, and /ga/.

Journal ArticleDOI
Ken-ichi Iso1, Takao Watanabe1
TL;DR: This paper proposes a speech recognition system based on pattern prediction using neural networks; an iterative training algorithm combining dynamic programming and error backpropagation is proposed, together with a proof of convergence.
Abstract: This paper proposes a speech recognition system based on pattern prediction using neural networks. In the proposed system, an independent nonlinear predictor composed of a series of multilayer perceptrons (MLPs) is prepared for each class to be recognized. The temporal structure of the speech pattern, especially the temporal correlation structure of the feature vector sequence, is represented by the nonlinear mapping between the input and the output, and is utilized as an important feature in recognition. The variation of the temporal structure of the speech pattern, due to differences between speakers and fluctuations in the utterance, is normalized by dynamic programming. As the training algorithm to determine the MLP parameters composing each predictor, an iterative algorithm combining dynamic programming and error backpropagation is proposed, together with a proof of convergence. A speaker-independent isolated digit recognition experiment was executed to examine the basic operation of the proposed system. The parameters are estimated satisfactorily even from a small number of training data, and a high recognition performance is achieved.

Journal ArticleDOI
TL;DR: A self-organizing neural network is presented which automatically learns the number and type of spectral features from speech examples and the “strength” of the presence of the learned features is registered by the network to effect recognition of further speech presentations.
Abstract: A self-organizing neural network is presented which automatically learns the number and type of spectral features from speech examples. The learning algorithm is analyzed with respect to its convergence and stability properties. The “strength” of the presence of the learned features is registered by the network to effect recognition of further speech presentations. The network consists of two layers of feature detectors, each layer of which is self-organized, and the outputs of the second layer are time-aligned in the present design using dynamic time warping. The significance of the two-layer structure, as well as general architectural advantages of the network, are discussed. Results of experiments involving various isolated word recognition tasks, including single and multi-speaker training and recognition, and the recognition of speech of a nonverbal individual, are reported.

Journal ArticleDOI
TL;DR: Overall accuracy of 96% has been obtained for speaker independent recognition of a small vocabulary and the simplicity of the algorithm enables a low-cost real-time implementation of the recognizer.

Proceedings ArticleDOI
30 Sep 1991
TL;DR: By introducing the space-perturbance arrangement, the SPTDNN is robust to both temporal and dynamic acoustic variance of speech features, and is thus a promising approach to speaker-independent and/or noisy speech recognition.
Abstract: The authors present a space-perturbance time-delay neural network (SPTDNN), which is a generalization of the time-delay neural network (TDNN) approach. It is shown that by introducing the space-perturbance arrangement, the SPTDNN is robust to both temporal and dynamic acoustic variance of speech features, and is thus a promising approach to speaker-independent and/or noisy speech recognition. The authors introduce the architecture, learning algorithm, and theoretical evaluation of the SPTDNN, along with experimental results. Experimental comparisons show that the SPTDNN improves upon the TDNN for both speaker-dependent and speaker-independent phoneme recognition, as well as noisy phoneme recognition.

Journal Article
TL;DR: It is proved that the recognition rate of DTW using the modified dynamic averaging method is the best, at 97.6 percent.
Abstract: This paper is a study on speaker-independent isolated word recognition; we propose a DTW speech recognition system that uses a modified dynamic averaging method to construct the reference patterns. Fifty-seven city names are selected as the recognition vocabulary, and 2nd-order LPC cepstrum coefficients are used as the feature parameters. Besides the recognition experiment using the modified dynamic averaging method for the reference patterns, we perform recognition experiments using the causal method, the dynamic averaging method, the linear averaging method, and the clustering method on the same data under the same conditions for comparison. The experimental results show that the recognition rate of DTW using the modified dynamic averaging method is the best, at 97.6 percent.
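Reference-pattern averaging along a DTW alignment, in the general spirit of the dynamic averaging methods compared above, can be sketched as follows. The scalar frame distance and the simple averaging rule are illustrative assumptions; the paper's "modified" method is not reproduced here.

```python
# Sketch: build an averaged DTW reference template by aligning a training
# utterance to the current template and averaging frames paired by the warp path.
def dtw_path(a, b):
    """Backtrack the minimal-cost DTW warp path between sequences a and b."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = abs(a[i - 1] - b[j - 1]) + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # walk back from (n, m) choosing the cheapest predecessor cell
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = min((D[i - 1][j - 1], i - 1, j - 1),
                   (D[i - 1][j], i - 1, j),
                   (D[i][j - 1], i, j - 1))
        i, j = step[1], step[2]
    return list(reversed(path))

def average_template(template, utterance):
    """Average the utterance into the template along the warp path."""
    sums = [0.0] * len(template)
    counts = [0] * len(template)
    for i, j in dtw_path(template, utterance):
        sums[i] += utterance[j]
        counts[i] += 1
    return [(t + s / c) / 2 for t, s, c in zip(template, sums, counts)]
```

Repeating this over several training utterances yields a single compact reference pattern per word, which is what makes such averaging methods competitive with multi-template or clustering approaches.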

Book ChapterDOI
17 Sep 1991
TL;DR: An architecture based on neural modules of the Adaptive Resonance Theory (ART) is proposed, for recognizing handwritten symbols employed in an on-line mathematical editor, thus defining a run-on time discrete symbol as a sequence of strokes.
Abstract: An architecture based on neural modules of the Adaptive Resonance Theory (ART) is proposed, for recognizing handwritten symbols employed in an on-line mathematical editor. The dynamic information generated during the handwriting process is used by the system, thus defining a run-on time discrete symbol as a sequence of strokes. An ART2 module is used to classify each individual stroke, while a Recurrent Competitive Field (RCF) is employed in order to classify the sequence of the strokes. ARTMAP modules are also proposed for the association of the different versions of strokes and symbols. Preliminary results of the application are very encouraging.

Journal Article
TL;DR: A transplantation system that can be used to interchange selected prosodic features among repetitions of the same utterance by the same speaker is presented, and applications to the study of a number of important problems in text-to-speech synthesis and speech perception are suggested.
Abstract: In this paper we present a transplantation system that can be used to interchange selected prosodic features among repetitions of the same utterance by the same speaker. The system is based on the time-domain pitch-synchronous overlap-and-add (TD-PSOLA) algorithm, which allows high-quality prosodic modifications of speech, and uses dynamic time warping (DTW) for proper time alignment of transplanted features. The quality of the system is evaluated in relation to the underlying TD-PSOLA algorithm. Applications to the study of a number of important problems in text-to-speech synthesis and speech perception are suggested.