
Showing papers on "Dynamic time warping published in 1991"


Proceedings ArticleDOI
30 Sep 1991
TL;DR: The author proposes a time-warping neural network (TWNN) for phoneme-based speech recognition that demonstrates higher phoneme recognition accuracy than a baseline recognizer based on conventional feedforward neural networks and linear time alignment.
Abstract: The author proposes a time-warping neural network (TWNN) for phoneme-based speech recognition. The TWNN is designed to accept phonemes with arbitrary duration, whereas conventional phoneme recognition networks have a fixed-length input window. The purpose of this network is to cope with not only variability of phoneme duration but also time warping within a phoneme. The proposed network is composed of several time-warping units, each of which has a time-warping function. The TWNN is characterized by time-warping functions embedded between the input layer and the first hidden layer in the network. The proposed network demonstrates higher phoneme recognition accuracy than a baseline recognizer based on conventional feedforward neural networks and linear time alignment. The recognition accuracy is even higher than that achieved with discrete hidden Markov models.

165 citations


Proceedings ArticleDOI
30 Sep 1991
TL;DR: A family of new discriminative training algorithms can be rigorously formulated for various kinds of classifier frameworks, including the popular dynamic time warping (DTW) and hidden Markov model (HMM).
Abstract: The authors developed a generalized probabilistic descent (GPD) method by extending the classical theory on adaptive training by Amari (1967). Their generalization makes it possible to treat dynamic patterns (of variable duration or dimension) such as speech, as well as static patterns (of fixed duration or dimension), in pattern classification problems. The key ideas of the GPD formulation include the embedding of time normalization and the incorporation of smooth classification error functions into the gradient search optimization objectives. As a result, a family of new discriminative training algorithms can be rigorously formulated for various kinds of classifier frameworks, including the popular dynamic time warping (DTW) and hidden Markov model (HMM). Experimental results are also provided to show the superiority of this new family of GPD-based, adaptive training algorithms for speech recognition.

159 citations


Proceedings ArticleDOI
18 Nov 1991
TL;DR: The author investigates a feedforward neural network that can accept phonemes of arbitrary duration while coping with nonlinear time warping; it demonstrated higher phoneme recognition accuracy than a baseline recognizer based on conventional feedforward neural networks.
Abstract: The author investigates a feedforward neural network that can accept phonemes with an arbitrary duration, coping with nonlinear time warping. The time-warping neural network is characterized by the time-warping functions embedded between the input layer and the first hidden layer in the network. The input layer accesses three different time points. The accessing points are determined by the time-warping functions. The input spectrum sequence itself is not warped, but the accessing-point sequence is warped. The advantage of this network architecture is that the input layer can access the original spectrum sequence. The proposed network demonstrated higher phoneme recognition accuracy than the baseline recognizer based on conventional feedforward neural networks. The recognition accuracy was even higher than that achieved with discrete hidden Markov models.

113 citations


Journal ArticleDOI
TL;DR: The authors present two algorithms for performing shape matching on ice floe boundaries in SAR (synthetic aperture radar) images: the first uses normalized correlation to match the psi-s curves, while the second uses dynamic programming to compute an elastic match that better accommodates ice floe deformation.
Abstract: The authors present two algorithms for performing shape matching on ice floe boundaries in SAR (synthetic aperture radar) images. These algorithms quickly produce a set of ice motion and rotation vectors that can be used to guide a pixel value correlator. The algorithms match a shape descriptor known as the psi-s curve. The first algorithm uses normalized correlation to match the psi-s curves, while the second uses dynamic programming to compute an elastic match that better accommodates ice floe deformation. Some empirical data on the performance of the algorithms on Seasat SAR images are presented.
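As background on the shape descriptor matched above: a psi-s curve plots the cumulative tangent (turning) angle psi of a boundary against arc length s. A minimal Python sketch of computing it for a polygonal boundary follows; the function name and the simple polygon assumption are illustrative, not taken from the paper.

```python
import math

# Hypothetical sketch: compute the psi-s curve of a closed polygonal
# boundary given as (x, y) points, with the start point repeated at the end.
def psi_s_curve(points):
    """Return a list of (s, psi) samples: arc length vs. cumulative turning angle."""
    curve, s, psi = [], 0.0, 0.0
    prev_angle = None
    for p, q in zip(points, points[1:]):
        dx, dy = q[0] - p[0], q[1] - p[1]
        angle = math.atan2(dy, dx)
        if prev_angle is not None:
            turn = angle - prev_angle
            # unwrap so each turning angle lies in (-pi, pi]
            while turn > math.pi:
                turn -= 2 * math.pi
            while turn <= -math.pi:
                turn += 2 * math.pi
            psi += turn
        prev_angle = angle
        s += math.hypot(dx, dy)
        curve.append((s, psi))
    return curve
```

Two such curves can then be compared by normalized correlation or, as in the second algorithm, elastically aligned with dynamic programming.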

92 citations


Book ChapterDOI
01 Jan 1991
TL;DR: This type of mean field network (MFN) with tied weights that is capable of approximating the recognizer for a hidden markov model (HMM) is presented as a way of allowing more powerful representations without abandoning the automatic parameter estimation procedures.
Abstract: Neural networks can be used to discriminate between very similar phonemes, and they can handle the variability in time of occurrence by using a time-delay architecture followed by temporal integration (Lang, Hinton and Waibel, 1990). So far, however, neural networks have been less successful at handling longer duration events that require something equivalent to “time warping” in order to match stored knowledge to the data. We present a type of mean field network (MFN) with tied weights that is capable of approximating the recognizer for a hidden Markov model (HMM). In the process of settling to a stable state, the MFN finds a blend of likely ways of generating the input string given its internal model of the probabilities of transitions between hidden states and the probabilities of input symbols given a hidden state. This blend is a heuristic approximation to the full set of path probabilities that is implicitly represented by an HMM recognizer. The learning algorithm for the MFN is less efficient than for an HMM of the same size. However, the MFN is capable of using distributed representations of the hidden state, and this can make it exponentially more efficient than an HMM when modelling strings produced by a generator that itself has componential states. We view this type of MFN as a way of allowing more powerful representations without abandoning the automatic parameter estimation procedures that have allowed relatively simple models like HMMs to outperform complex AI representations on real tasks.

40 citations


Journal ArticleDOI
TL;DR: These signal-processor-intensive transform and graph-search-based pattern-matching techniques are reviewed and currently achievable recognition accuracies are reported.
Abstract: Transformation of a segment of acoustic signal, by processing into a vectorial representation such as the spectrum, can permit the identification of the constituent phonemes within spoken speech. Subsequent comparison against a previously stored representation using techniques such as dynamic time warping or hidden Markov modelling then permits a speech recognition operation to be accomplished. These signal-processor-intensive transform and graph-search-based pattern-matching techniques are reviewed and currently achievable recognition accuracies are reported.
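The dynamic time warping technique reviewed above is, at its core, a dynamic-programming alignment of two feature sequences. A generic textbook sketch in Python, using a scalar distance for illustration (this is the standard formulation, not any specific recognizer from the review):

```python
# Classic dynamic time warping: minimal cumulative distance between two
# sequences over all monotonic alignments (match, insertion, deletion steps).
def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Return the minimal cumulative alignment cost between sequences a and b."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            # extend the cheapest of the three predecessor cells
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]
```

In a recognizer, `a` and `b` would be sequences of spectral vectors with a vector distance, and the word template with the lowest alignment cost wins.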

35 citations


Proceedings ArticleDOI
14 Apr 1991
TL;DR: It is demonstrated that while the static feature gives the best individual performance, multiple linear combinations of feature sets based on regression analysis can reduce error rates.
Abstract: The performance of dynamic features in automatic speaker recognition is examined. Second- and third-order regression analysis examining the performance of the associated feature sets independently, in combination, and in the presence of noise is included. It is shown that each regression order has a clear optimum. These optima are independent of the analysis order of the static feature from which the dynamic features are derived, and insensitive to low-level noise added to the test speech. It is also demonstrated that while the static feature gives the best individual performance, multiple linear combinations of feature sets based on regression analysis can reduce error rates.

33 citations


Proceedings ArticleDOI
04 Nov 1991
TL;DR: It is concluded that not only has the VQ technique reduced the amount of computation and storage, but it has also created new ideas for solving various problems in speech/speaker recognition.
Abstract: The author reviews major methods of applying the vector quantization (VQ) technique to speech and speaker recognition. These include speech recognition based on the combination of VQ and the DTW/HMM (dynamic time warping/hidden Markov model) technique, VQ-distortion-based recognition, learning VQ algorithms, speaker adaptation by VQ-codebook mapping, and VQ-distortion-based speaker recognition. It is concluded that not only has the VQ technique reduced the amount of computation and storage, but it has also created new ideas for solving various problems in speech/speaker recognition.

30 citations


Proceedings ArticleDOI
14 Apr 1991
TL;DR: An attempt was made to enhance the performance of a DTW (dynamic time warping) speech recognizer by preprocessing speech parameters using a neural network transformation, using a multilayer perceptron trained with speech utterances of a single speaker.
Abstract: An attempt was made to enhance the performance of a DTW (dynamic time warping) speech recognizer by preprocessing speech parameters with a neural network transformation. A multilayer perceptron trained with speech utterances of a single speaker has been used in front of a DTW recognizer. Results show an improvement of about 15% in the recognition rate in all cases, even with a speaker that was not used for training. If the network is not completely speaker-independent, a dynamic adaptation to the speaker could be performed.

21 citations


Proceedings ArticleDOI
14 Apr 1991
TL;DR: The authors describe several experiments in which the use of artificial neural networks for the continuous speech speaker-independent keyword recognition problem was investigated and methodologies for reducing a primary keyword spotting system's susceptibility to false alarms while maintaining recognition accuracy are discussed.
Abstract: The authors describe several experiments in which the use of artificial neural networks (ANNs) for the continuous speech speaker-independent keyword recognition problem was investigated. They discuss methodologies for reducing a primary keyword spotting system's susceptibility to false alarms while maintaining recognition accuracy. The keyword spotter uses a conventional dynamic time warping algorithm to detect the start- and end-point of each potential keyword. The ANNs serve as a secondary processing stage for this segmented utterance. The ANNs attempt to classify this utterance by formulating the recognition problem as a pattern matching problem. In the hybrid network experiments, the utterance was processed into features derived from the activation at the hidden layer of a back-propagation trained network. Hybrid representations were grouped with two other feature representations in a multiple neural network system. A recognition accuracy of 78% on the Stonehenge X database was obtained while rejecting 72% of the false alarms which were detected by the primary keyword spotting system.

19 citations


Proceedings ArticleDOI
19 Feb 1991
TL;DR: In this paper, the DARPA Resource Management task is used as the domain to investigate the performance of speaker-adaptive speech recognition, and preliminary results indicate that speaker differences can be well minimized.
Abstract: A speaker-independent system is desirable in many applications where speaker-specific data do not exist. However, if speaker-dependent data are available, the system can be adapted to the specific speaker so that the error rate is significantly reduced. In this paper, the DARPA Resource Management task is used as the domain to investigate the performance of speaker-adaptive speech recognition. Since adaptation is based on speaker-independent systems with only limited adaptation data, a good adaptation algorithm should be consistent with the speaker-independent parameter estimation criterion, and should adapt those parameters that are less sensitive to the limited training data. Two parameter sets, the codebook mean vector and the output distribution, are regarded as the most important. They are modified within the framework of the maximum likelihood estimation criterion according to the characteristics of each speaker. In order to estimate these parameters reliably, output distributions are shared with each other if they exhibit a certain acoustic similarity. In addition to modifying these parameters, speaker normalization with neural networks is also studied, in the hope that acoustic data normalization will not only rapidly adapt the system but also enhance the robustness of speaker-independent speech recognition. Preliminary results indicate that speaker differences can be well minimized. In comparison with speaker-independent speech recognition, the error rate has been reduced from 4.3% to 3.1% using only parameter adaptation techniques, with 40 adaptation sentences for each speaker. When the number of speaker adaptation sentences is comparable to that of speaker-dependent training, speaker-adaptive recognition works better than the best speaker-dependent recognition results on the same test set, which indicates the robustness of speaker-adaptive speech recognition.

Journal ArticleDOI
TL;DR: Bioacoustics researchers can use a computer as a powerful tool to measure, classify, compare and synthesise sounds, but variations in both the time and frequency dimensions of a sound are a problem.
Abstract: Bioacoustics researchers can use a computer as a powerful tool to measure, classify, compare and synthesise sounds. Vocalisations on tape are commonly converted to a digital format suitable for a computer by using an analogue-to-digital converter and then a Fourier transformation. Alternatively, sonagrams can be measured, for example by using a digitising pad or an image analysis system. Correlations and indices of similarity have been used to compare sounds, but variations in both the time and frequency dimensions of a sound are a problem. A solution may be the use of pattern recognition methods such as elastic matching and time warping. These methods are briefly described and assessed.

Proceedings ArticleDOI
H. Hackbarth1, J. Mantel1
08 Jul 1991
TL;DR: Under the criterion of high recognition rates along with very short reaction time, modular subnet assemblies provide for successful recognition of 100 words and are also recommended for speaker-independent classification.
Abstract: A modular neural structure is presented which is dedicated to the recognition of larger vocabularies. It contains several so-called scaly subnets, assembled into a compound network by neural glue elements. This architecture and the corresponding training scheme have been investigated for various network parameters during speaker-dependent 100-word recognition. Important simulation results were compared with a multilayer perceptron showing scaly input-to-hidden connections and with standard dynamic time warping. Under the criterion of high recognition rates along with very short reaction time, modular subnet assemblies provide for successful recognition of 100 words. They are also recommended for speaker-independent classification.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: Experimental results showed that the combined method is more effective than the conventional VQ (vector quantization) HMM and that utilizing the score of the classifier on the local features is significant.
Abstract: The authors present a novel phoneme recognition method which combines two stochastic methods: discriminant analysis and the hidden Markov model (HMM) method. The HMM is powerful in time-warping and in capturing the global dynamic features, but its discriminating ability is not sufficient. The approach used is to construct the HMM with a phonetic element classifier front-end. Each phonetic element belongs to one phoneme and represents a local pattern of the phoneme. The classifier is a modified version of discriminant analysis, that is, a combination of Bayes classifiers. It extracts optimal features to separate the phonetic elements and consequently contributes to separating the HMMs of the phonemes. Furthermore, the score of the classifier is combined with that of the HMM. Since the classifier is based on a statistical method, combination of the scores is straightforward both in theory and in practice. Experimental results showed that the combined method is more effective than the conventional VQ (vector quantization) HMM and that utilizing the score of the classifier on the local features is significant.

Proceedings Article
02 Dec 1991
TL;DR: A time-warping neuron is defined that extends the operation of the formal neuron of a back-propagation network by warping the input pattern to match it optimally to its weights.
Abstract: Recently, much interest has been generated regarding speech recognition systems based on Hidden Markov Models (HMMs) and neural network (NN) hybrids. Such systems attempt to combine the best features of both models: the temporal structure of HMMs and the discriminative power of neural networks. In this work we define a time-warping (TW) neuron that extends the operation of the formal neuron of a back-propagation network by warping the input pattern to match it optimally to its weights. We show that a single-layer network of TW neurons is equivalent to a Gaussian density HMM-based recognition system, and we propose to improve the discriminative power of this system by using back-propagation discriminative training, and/or by generalizing the structure of the recognizer to a multi-layered net. The performance of the proposed network was evaluated on a highly confusable, isolated-word, multi-speaker recognition task. The results indicate that not only does the recognition performance improve, but the separation between classes is also enhanced, allowing us to set up a rejection criterion to improve the confidence of the system.
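The core idea of a time-warping neuron, as described above, is to replace the neuron's fixed dot product with the best score over all monotonic alignments of the input frames to the weight frames. A speculative scalar sketch follows; the per-frame product similarity and the unconstrained alignment steps are illustrative assumptions, not the paper's Gaussian-density formulation.

```python
# Illustrative "time-warping neuron": dynamic programming finds the
# alignment of input frames to weight frames that maximizes the summed
# per-frame similarity, instead of a rigid position-by-position dot product.
def tw_neuron_activation(inputs, weights):
    """Best cumulative similarity over all monotonic alignments (higher is better)."""
    n, m = len(inputs), len(weights)
    NEG = float("-inf")
    D = [[NEG] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = inputs[i - 1] * weights[j - 1]  # assumed per-frame similarity
            # extend the best of the three predecessor alignments
            D[i][j] = sim + max(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]
```

With a negated distance as the similarity, maximizing this score is the mirror image of minimizing a DTW cost, which is what links such a layer to an HMM-style recognizer.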

Journal ArticleDOI
TL;DR: Comparisons to MLP, dynamic time warping algorithm, and nearest neighbor classifier showed satisfactory improvements in recognition accuracy for confusing/composite patterns.

Journal ArticleDOI
TL;DR: This paper proposes a large-vocabulary speech recognition system using a phoneme spotting method based on a time-delay neural network (TDNN) and a predictive LR parser; time alignment between the predicted phonemes and the TDNN phoneme spotting results is realized using a DTW (dynamic time warping) method.
Abstract: This paper proposes a large-vocabulary speech recognition system using a phoneme spotting method based on a time-delay neural network (TDNN) and a predictive LR parser. This is the first attempt to recognize large-vocabulary speech using neural networks. The prediction of phonemes in words is performed by the predictive LR parser. Time alignment between the phonemes predicted by the LR parser and the phoneme spotting results from the TDNN is realized using a DTW (dynamic time warping) method. Speaker-dependent recognition of a 5240-word vocabulary using 2620 test words uttered by a male announcer resulted in a rate of 92.6 percent for the top choice, and rates of 97.6 and 99.1 percent for the second and fifth choices, respectively.

Journal ArticleDOI
TL;DR: Combining the PE-LSI architecture with the proposed array architecture for highly parallel dynamic time warping (DTW) processing, a real-time continuous speech recognition system based on continuous dynamic programming matching using the SPLIT method for a 1000-word vocabulary can be constructed using a ring array processor consisting of 30 PEs.
Abstract: A speech recognition processor CMOS LSI was developed as the processing element (PE) of a ring array processor previously proposed by the authors as an architecture to carry out highly parallel recognition processing with array size flexibility. There are three key features of the LSI: (1) a highly parallel I/O structure of triple buffers with cyclical-mode transition control methods to solve the serious problem of inter-PE data transfer overhead versus the array processing; (2) a control structure with two direct memory access (DMA) controllers to realize inter-PE data I/O processing and intra-PE processing in parallel; and (3) pipelined recognition processing at a high execution rate realized by a pipelined structure and a balanced clock distribution design technique. These effective designs for the PE LSI allow high-speed recognition processing without any inter-PE data transfer overhead in the ring array processor. Combining the PE-LSI architecture with the proposed array architecture for highly parallel dynamic time warping (DTW) processing, a real-time continuous speech recognition system based on continuous dynamic programming matching using the SPLIT method for a 1000-word vocabulary can be constructed using a ring array processor consisting of 30 PEs.

Journal ArticleDOI
TL;DR: Comparisons with K-means and DTW algorithms show that the integration of the splitting LVQ and LVQ2 algorithms makes this system well suited to speaker-independent isolated word recognition.
Abstract: A speaker-independent isolated word recognizer is proposed. It is obtained by concatenating a Bayesian neural network and a Hopfield time-alignment network. In this system, the Bayesian network outputs the a posteriori probability for each speech frame, and the Hopfield network is then concatenated for time warping. A proposed splitting Learning Vector Quantization (LVQ) algorithm derived from the LBG clustering algorithm and the Kohonen LVQ algorithm is first used to train the Bayesian network. The LVQ2 algorithm is subsequently adopted as a final refinement step. A continuous mixture of Gaussian densities for each frame and multiple templates for each word are employed to characterize each word pattern. Experimental evaluation of this system with four templates/word and five mixtures/frame, using databases of 53 speakers (28 male, 25 female) and isolated words (10 digits and 30 city names), gave average recognition accuracies of 97.3% in the speaker-trained mode and 95.7% in the speaker-independent mode. Comparisons with K-means and DTW algorithms show that the integration of the splitting LVQ and LVQ2 algorithms makes this system well suited to speaker-independent isolated word recognition. A cookbook approach for the determination of parameters in the Hopfield time-alignment network is also described.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: A novel approach for speech signal analysis has been developed that incorporates both steady-state and dynamic spectral features into a unified model that has been successfully applied in automatic speech recognition contexts and does not require frame-based optimal search algorithms.
Abstract: A novel approach for speech signal analysis has been developed that incorporates both steady-state and dynamic spectral features into a unified model. This model has been successfully applied in automatic speech recognition contexts and does not require frame-based optimal search algorithms. The model decomposes an utterance into a chain of acoustic subwords and simultaneously generates a mathematical description of instantaneous acoustic-phonetic features and dynamic transitions. The algorithm was tested using a speaker-dependent limited vocabulary recognition task and achieved higher recognition rates than both vector quantization and hidden Markov models.

Proceedings ArticleDOI
G. Bendelac1, I.D. Shallom1
19 May 1991
TL;DR: The authors describe a chip set that implements voice dialing for cellular telephones using isolated-word speaker-dependent speech-recognition algorithms and incorporates speech synthesis for guiding, prompting, and verification by vocal feedback.
Abstract: The authors describe a chip set that implements voice dialing for cellular telephones using isolated-word speaker-dependent speech-recognition algorithms. High recognition rates of over 98% in noisy car cabins are achieved using noise-immune techniques for preprocessing, feature extraction, template training, and template comparison. A unique word endpoint detector is realized with a modified dynamic time warping algorithm and a noise-robust voice-activated switch. The chip set supports up to two users and a vocabulary of 32 words. It incorporates speech synthesis for guiding, prompting, and verification by vocal feedback, and allows dialing digit by digit or speed dialing of up to 10 stored numbers invoked with user-defined phrases.

Proceedings ArticleDOI
05 Mar 1991
TL;DR: This paper describes a chip set, the DSPG5006, that implements digital signal processing algorithms for voice dialing in cellular telephones that supports an isolated word, speaker dependent, speech recognition system built around a modified dynamic time warping algorithm.
Abstract: This paper describes a chip set, the DSPG5006, that implements digital signal processing algorithms for voice dialing in cellular telephones. The problem of achieving high performance speech recognition in noisy car cabins is addressed by using noise immune techniques for preprocessing, feature extraction, template training and template comparison. The chip set supports an isolated word, speaker dependent, speech recognition system built around a modified dynamic time warping algorithm. A speech specific voice activated switch is used in conjunction with a simplified endpoint detector. A two-stage training mode permits building of compact and reliable templates that are compared with incoming words in the recognition mode. Recognition rates are greater than 98% in car cabins, and over 99% in the laboratory. The chip set can be configured to support speech synthesis, so that the user is guided in a friendly manner throughout the use of the system for both prompting and verification.

Proceedings ArticleDOI
18 Nov 1991
TL;DR: A hybrid model of the neural network and the DTW (dynamic time warping) algorithm, where the activation values of the ordered states of a pattern class are summed up to evaluate the likelihood that an input pattern belongs to the class.
Abstract: The authors propose a hybrid model of the neural network and the DTW (dynamic time warping) algorithm. The model is basically a state transition system. Each state of the model has a neural network which is activated for some portion of a sequential speech pattern, for example, the first consonantal part of a monosyllable. States of the model are partially ordered corresponding to classes of sequential patterns. Based on the DTW algorithm, the activation values of the ordered states of a pattern class are summed up to evaluate the likelihood that an input pattern belongs to the class. This model has been applied to the discrimination among monosyllables such as /ba/, /da/, and /ga/.

Journal ArticleDOI
Ken-ichi Iso1, Takao Watanabe1
TL;DR: This paper proposes a speech recognition system based on pattern prediction using neural networks; an iterative training algorithm combining dynamic programming and error backpropagation is proposed, together with a proof of convergence.
Abstract: This paper proposes a speech recognition system based on pattern prediction using neural networks. In the proposed system, an independent nonlinear predictor composed of a series of multilayer perceptrons (MLPs) is prepared for each class to be recognized. The temporal structure of the speech pattern, especially the temporal correlation structure of the feature vector sequence, is represented by the nonlinear mapping between the input and the output, and is utilized as an important feature in recognition. The variation of the temporal structure of the speech pattern, due to differences between speakers and fluctuations in the utterance, is normalized by dynamic programming. As the training algorithm to determine the MLP parameters composing each predictor, an iterative algorithm combining dynamic programming and error backpropagation is proposed, together with a proof of convergence. A speaker-independent isolated digit recognition experiment was executed to examine the basic operation of the proposed system. The parameters are estimated satisfactorily even from a small number of training data, and a high recognition performance is achieved.

Journal ArticleDOI
TL;DR: A self-organizing neural network is presented which automatically learns the number and type of spectral features from speech examples and the “strength” of the presence of the learned features is registered by the network to effect recognition of further speech presentations.
Abstract: A self-organizing neural network is presented which automatically learns the number and type of spectral features from speech examples. The learning algorithm is analyzed with respect to its convergence and stability properties. The “strength” of the presence of the learned features is registered by the network to effect recognition of further speech presentations. The network consists of two layers of feature detectors, each layer of which is self-organized, and the outputs of the second layer are time-aligned in the present design using dynamic time warping. The significance of the two-layer structure, as well as general architectural advantages of the network, are discussed. Results of experiments involving various isolated word recognition tasks, including single and multi-speaker training and recognition, and the recognition of speech of a nonverbal individual, are reported.

Journal ArticleDOI
TL;DR: Overall accuracy of 96% has been obtained for speaker independent recognition of a small vocabulary and the simplicity of the algorithm enables a low-cost real-time implementation of the recognizer.

Proceedings ArticleDOI
30 Sep 1991
TL;DR: By introducing the space-perturbance arrangement, the SPTDNN is robust to both temporal and dynamic acoustic variance of speech features, and is thus a promising approach to speaker-independent and/or noisy speech recognition.
Abstract: The authors present a space-perturbance time-delay neural network (SPTDNN), which is a generalization of the time-delay neural network (TDNN) approach. It is shown that by introducing the space-perturbance arrangement, the SPTDNN is robust to both temporal and dynamic acoustic variance of speech features, and is thus a promising approach to speaker-independent and/or noisy speech recognition. The authors introduce the architecture, learning algorithm, and theoretical evaluation of the SPTDNN, along with experimental results. Experimental comparisons show that the SPTDNN improves upon the TDNN for both speaker-dependent and speaker-independent phoneme recognition, as well as noisy phoneme recognition.

Journal Article
TL;DR: It is proved that the recognition rate of DTW using the modified dynamic averaging method is the best, at 97.6 percent.
Abstract: This paper is a study on speaker-independent isolated word recognition; we propose a DTW speech recognition system that uses a modified dynamic averaging method to construct the reference patterns. Fifty-seven city names are selected as the recognition vocabulary, and 2nd-order LPC cepstrum coefficients are used as the feature parameters. Besides the recognition experiment using the modified dynamic averaging method for the reference patterns, we perform recognition experiments using the causal method, the dynamic averaging method, the linear averaging method, and the clustering method on the same data under the same conditions for comparison. The experimental results show that the recognition rate of DTW using the modified dynamic averaging method is the best, at 97.6 percent.
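Reference-pattern averaging along a DTW alignment, in the general spirit of the dynamic averaging methods compared above, can be sketched as follows. The scalar frame distance and the simple averaging rule are illustrative assumptions; the paper's "modified" method is not reproduced here.

```python
# Sketch: build an averaged DTW reference template by aligning a training
# utterance to the current template and averaging frames paired by the warp path.
def dtw_path(a, b):
    """Backtrack the minimal-cost DTW warp path between sequences a and b."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = abs(a[i - 1] - b[j - 1]) + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # walk back from (n, m) choosing the cheapest predecessor cell
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = min((D[i - 1][j - 1], i - 1, j - 1),
                   (D[i - 1][j], i - 1, j),
                   (D[i][j - 1], i, j - 1))
        i, j = step[1], step[2]
    return list(reversed(path))

def average_template(template, utterance):
    """Average the utterance into the template along the warp path."""
    sums = [0.0] * len(template)
    counts = [0] * len(template)
    for i, j in dtw_path(template, utterance):
        sums[i] += utterance[j]
        counts[i] += 1
    return [(t + s / c) / 2 for t, s, c in zip(template, sums, counts)]
```

Repeating this over several training utterances yields a single compact reference pattern per word, which is what makes such averaging methods competitive with multi-template or clustering approaches.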

Book ChapterDOI
17 Sep 1991
TL;DR: An architecture based on neural modules of the Adaptive Resonance Theory (ART) is proposed, for recognizing handwritten symbols employed in an on-line mathematical editor, thus defining a run-on time discrete symbol as a sequence of strokes.
Abstract: An architecture based on neural modules of the Adaptive Resonance Theory (ART) is proposed, for recognizing handwritten symbols employed in an on-line mathematical editor. The dynamic information generated during the handwriting process is used by the system, thus defining a run-on time discrete symbol as a sequence of strokes. An ART2 module is used to classify each individual stroke, while a Recurrent Competitive Field (RCF) is employed in order to classify the sequence of the strokes. ARTMAP modules are also proposed for the association of the different versions of strokes and symbols. Preliminary results of the application are very encouraging.

Journal Article
TL;DR: A transplantation system that can be used to interchange selected prosodic features among repetitions of the same utterance by the same speaker is presented, and applications to the study of a number of important problems in text-to-speech synthesis and speech perception are suggested.
Abstract: In this paper we present a transplantation system that can be used to interchange selected prosodic features among repetitions of the same utterance by the same speaker. The system is based on the time-domain pitch-synchronous overlap-and-add (TD-PSOLA) algorithm, which allows high-quality prosodic modifications of speech, and uses dynamic time warping (DTW) for proper time alignment of transplanted features. The quality of the system is evaluated in relation to the underlying TD-PSOLA algorithm. Applications to the study of a number of important problems in text-to-speech synthesis and speech perception are suggested.