
Showing papers on "Hidden Markov model published in 1991"


Journal ArticleDOI
TL;DR: The role of statistical methods in this powerful technology as applied to speech recognition is addressed and a range of theoretical and practical issues that are as yet unsolved in terms of their importance and their effect on performance for different system implementations are discussed.
Abstract: The use of hidden Markov models for speech recognition has become predominant in the last several years, as evidenced by the number of published papers and talks at major speech conferences. The reasons this method has become so popular are the inherent statistical (mathematically precise) framework; the ease and availability of training algorithms for estimating the parameters of the models from finite training sets of speech data; the flexibility of the resulting recognition system in which one can easily change the size, type, or architecture of the models to suit particular words, sounds, and so forth; and the ease of implementation of the overall recognition system. In this expository article, we address the role of statistical methods in this powerful technology as applied to speech recognition and discuss a range of theoretical and practical issues that are as yet unsolved in terms of their importance and their effect on performance for different system implementations.
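The training and scoring machinery surveyed here rests on a small set of dynamic-programming recursions. As a rough illustration only (not code from the article, with toy numbers of my own), a minimal forward-algorithm sketch for scoring an observation sequence against a discrete HMM might look like this:

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """Likelihood P(obs | model) for a discrete HMM.

    pi  : (N,)   initial state probabilities
    A   : (N, N) transition probabilities, A[i, j] = P(state j | state i)
    B   : (N, M) emission probabilities, B[i, k] = P(symbol k | state i)
    obs : sequence of observed symbol indices
    """
    alpha = pi * B[:, obs[0]]           # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # induction step
    return alpha.sum()                  # termination

# toy example: 2 states, 3 output symbols
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])
print(forward_likelihood(pi, A, B, [0, 1, 2, 1]))
```

In practice the recursion is run with scaling factors or in the log domain to avoid numerical underflow on long utterances.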

1,480 citations


Book
08 Apr 1991
TL;DR: In this book, the authors develop a unified theory of hidden Markov models, covering vector quantization, mixture densities, continuous and semi-continuous models, and their use in speech recognition, with experimental examples.
Abstract: Vector quantisation and mixture densities; hidden Markov models and basic algorithms; continuous hidden Markov models; a unified theory with semi-continuous models; using hidden Markov models for speech recognition; experimental examples.
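One of the book's central objects is the semi-continuous (tied-mixture) output density, in which all states share a single codebook of Gaussians and differ only in their mixture weights. A hedged sketch of that computation, with function and variable names of my own choosing:

```python
import numpy as np
from scipy.stats import multivariate_normal

def semicontinuous_output_prob(x, weights_j, codebook_means, codebook_covs):
    """Semi-continuous HMM output probability for one state j.

    Every state shares the same codebook of Gaussian densities; state j
    only stores its own mixture weights (weights_j sums to 1).
    """
    densities = np.array([
        multivariate_normal.pdf(x, mean=m, cov=c)
        for m, c in zip(codebook_means, codebook_covs)
    ])
    return float(weights_j @ densities)

# toy usage: 2-D frames, a shared codebook of 3 Gaussians
means = [np.zeros(2), np.ones(2), np.array([2.0, 0.0])]
covs = [np.eye(2)] * 3
print(semicontinuous_output_prob(np.array([0.5, 0.5]),
                                 np.array([0.6, 0.3, 0.1]), means, covs))
```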

768 citations


Journal ArticleDOI
TL;DR: A speaker adaptation procedure which is easily integrated into the segmental k-means training procedure for obtaining adaptive estimates of the CDHMM parameters is presented and shows that much better performance is achieved when two or more training tokens are used for speaker adaptation.
Abstract: For a speech-recognition system based on continuous-density hidden Markov models (CDHMM), speaker adaptation of the parameters of CDHMM is formulated as a Bayesian learning procedure. A speaker adaptation procedure which is easily integrated into the segmental k-means training procedure for obtaining adaptive estimates of the CDHMM parameters is presented. Some results for adapting both the mean and the diagonal covariance matrix of the Gaussian state observation densities of a CDHMM are reported. The results from tests on a 39-word English alpha-digit vocabulary in isolated word mode indicate that the speaker adaptation procedure achieves the same level of performance as that of a speaker-independent system, when one training token from each word is used to perform speaker adaptation. The results also show that much better performance is achieved when two or more training tokens are used for speaker adaptation. When compared with the speaker-dependent system, it is found that the performance of speaker adaptation is always equal to or better than that of speaker-dependent training using the same amount of training data.
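The Bayesian adaptation of a Gaussian mean has a particularly simple closed form when a conjugate prior is used. The sketch below is a generic MAP mean update, not the paper's exact procedure; the weight tau and the function name are my own illustrations:

```python
import numpy as np

def map_adapt_mean(prior_mean, adaptation_frames, tau=10.0):
    """MAP (Bayesian) adaptation of a Gaussian state mean (a sketch).

    prior_mean        : speaker-independent mean vector (the prior)
    adaptation_frames : (n, d) speaker-specific frames aligned to the state
    tau               : prior weight; larger tau trusts the prior more

    With a conjugate Gaussian prior on the mean, the MAP estimate is a
    count-weighted interpolation of the prior mean and the sample mean.
    """
    n = len(adaptation_frames)
    if n == 0:
        return prior_mean
    sample_mean = np.mean(adaptation_frames, axis=0)
    return (tau * prior_mean + n * sample_mean) / (tau + n)
```

As the amount of adaptation data n grows, the estimate moves smoothly from the speaker-independent prior toward the speaker-specific sample mean.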

299 citations


PatentDOI
Lynn D. Wilcox, Marcia A. Bush
TL;DR: The wordspotter is intended for interactive applications, such as the editing of voice mail or mixed-media documents, and for keyword indexing in single-speaker audio or video recordings.
Abstract: A technique for wordspotting based on hidden Markov models (HMM's). The technique allows a speaker to specify keywords dynamically and to train the associated HMM's via a single repetition of a keyword. Non-keyword speech is modeled using an HMM trained from a prerecorded sample of continuous speech. The wordspotter is intended for interactive applications, such as the editing of voice mail or mixed-media documents, and for keyword indexing in single-speaker audio or video recordings.

265 citations


PatentDOI
TL;DR: A voice log-in system is based on a person's spoken name input only, using speaker-dependent acoustic name recognition models in performing speaker-independent name recognition.
Abstract: A voice log-in system is based on a person's spoken name input only, using speaker-dependent acoustic name recognition models in performing speaker-independent name recognition. In an enrollment phase, a dual pass endpointing procedure defines both the person's full name (broad endpoints) and the component names separated by pauses (precise endpoints). An HMM (hidden Markov model) recognition model generator generates a corresponding HMM name recognition model, modified by the insertion of additional skip transitions for the pauses between component names. In a recognition/update phase, a spoken-name speech signal is input to an HMM name recognition engine which performs speaker-independent name recognition; the modified HMM name recognition model permits the name recognition operation to accommodate pauses of variable duration between component names.
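To make the skip-transition idea concrete, the hypothetical sketch below redirects part of the probability mass entering an optional pause state directly to its successor in a left-right model. The specific fraction and the function name are illustrative assumptions, not the patent's construction:

```python
import numpy as np

def add_pause_skips(A, pause_states):
    """Insert skip transitions around optional pause states in a
    left-right HMM transition matrix (row-stochastic A).

    For every transition i -> p into a pause state p, part of its
    probability is redirected straight to p's successor, so a name
    spoken without a pause can bypass the pause state entirely.
    """
    A = A.copy()
    for p in pause_states:
        succ = p + 1                    # next state in the left-right chain
        if succ >= A.shape[0]:
            continue
        for i in range(A.shape[0]):
            if A[i, p] > 0 and i != p:
                skip = 0.5 * A[i, p]    # illustrative 50/50 split
                A[i, p] -= skip
                A[i, succ] += skip
    return A
```

Because probability mass is only moved within each row, the rows remain properly normalized.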

217 citations


Journal ArticleDOI
Y. He, A. Kundu
TL;DR: The authors present a planar shape recognition approach based on the hidden Markov model and autoregressive parameters that segments closed shapes to make classifications at a finer level and does not have to be trained again when a new class of shapes is added.
Abstract: The authors present a planar shape recognition approach based on the hidden Markov model and autoregressive parameters. This approach segments closed shapes to make classifications at a finer level. The algorithm can tolerate a lot of shape contour perturbation and a moderate amount of occlusion. An orientation scheme is described to make the overall classification insensitive to shape orientation. Excellent recognition results have been reported. A distinct advantage of the approach is that the classifier does not have to be trained again when a new class of shapes is added. >

187 citations


Journal ArticleDOI
01 Jun 1991
TL;DR: Models and control strategies for dynamic obstacle avoidance in visual guidance of mobile robots and a stochastic motion-control algorithm based on a hidden Markov model are presented, which simplifies the control process of robot motion.
Abstract: Models and control strategies for dynamic obstacle avoidance in visual guidance of mobile robots are presented. Characteristics that distinguish the visual computation and motion control requirements in dynamic environments from that in static environments are discussed. Objectives of the vision and motion planning are formulated, such as finding a collision-free trajectory that takes account of any possible motions of obstacles in the local environments. Such a trajectory should be consistent with a global goal or plan of the motion and the robot should move at as high a speed as possible, subject to its kinematic constraints. A stochastic motion-control algorithm based on a hidden Markov model is developed. Obstacle motion prediction applies a probabilistic evaluation scheme. Motion planning of the robot implements a trajectory-guided parallel-search strategy in accordance with the obstacle motion prediction models. The approach simplifies the control process of robot motion. >

174 citations


Journal ArticleDOI
TL;DR: In this article, a family of multivariate models for the occurrence/nonoccurrence of precipitation at N sites is constructed by assuming a different joint probability of events at the sites for each of a number of unobservable climate states.
Abstract: A family of multivariate models for the occurrence/nonoccurrence of precipitation at N sites is constructed by assuming a different joint probability of events at the sites for each of a number of unobservable climate states. The climate process is assumed to follow a Markov chain. Simple formulae for first- and second-order parameter functions are derived, and used to find starting values for a numerical maximization of the likelihood. The method is illustrated by applying it to data for one site in Washington and to data for a network in the Great Plains.
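Under the hidden climate-state assumption, the first- and second-order functions mentioned above take a simple form. As a hedged sketch (the notation is mine, not the paper's): with $\pi_s$ the stationary probability of climate state $s$, $\gamma_{ss'}$ its transition probability, and $p_{i|s}$ the probability of precipitation at site $i$ given state $s$,

$$P\big(Y_i(t)=1\big) = \sum_{s} \pi_s \, p_{i|s}, \qquad P\big(Y_i(t)=1,\, Y_i(t+1)=1\big) = \sum_{s,s'} \pi_s \, \gamma_{ss'} \, p_{i|s} \, p_{i|s'} .$$

Matching such moments to their empirical counterparts gives the starting values for the numerical likelihood maximization described in the abstract.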

173 citations


Proceedings ArticleDOI
19 Feb 1991
TL;DR: DECIPHER as discussed by the authors is a speaker-independent continuous speech recognition system based on hidden Markov model (HMM) technology, which is used in SRI's Air Travel Information Systems (ATIS) and Resource Management systems.
Abstract: This paper describes improvements to DECIPHER, the speech recognition component in SRI's Air Travel Information Systems (ATIS) and Resource Management systems. DECIPHER is a speaker-independent continuous speech recognition system based on hidden Markov model (HMM) technology. We show significant performance improvements in DECIPHER due to (1) the addition of tied-mixture HMM modeling, (2) rejection of out-of-vocabulary speech and background noise while continuing to recognize speech, (3) adapting to the current speaker, and (4) the implementation of N-gram statistical grammars with DECIPHER. Finally we describe our performance in the February 1991 DARPA Resource Management evaluation (4.8 percent word error) and in the February 1991 DARPA-ATIS speech and SLS evaluations (95 sentences correct, 15 wrong of 140). We show that, for the ATIS evaluation, a well-conceived system integration can be relatively robust to speech recognition errors and to linguistic variability and errors.

172 citations


Journal ArticleDOI
TL;DR: A speaker-independent phoneme and word recognition system based on a recurrent error propagation network is trained on the TIMIT database; analysis of the phoneme recognition results shows that information available from bigram and durational constraints is adequately handled within the network, allowing for efficient parsing of the network output.

170 citations


Proceedings ArticleDOI
30 Sep 1991
TL;DR: A family of new discriminative training algorithms can be rigorously formulated for various kinds of classifier frameworks, including the popular dynamic time warping (DTW) and hidden Markov model (HMM).
Abstract: The authors developed a generalized probabilistic descent (GPD) method by extending the classical theory on adaptive training by Amari (1967). Their generalization makes it possible to treat dynamic patterns (of a variable duration or dimension) such as speech as well as static patterns (of a fixed duration or dimension), for pattern classification problems. The key ideas of GPD formulations include the embedding of time normalization and the incorporation of smooth classification error functions into the gradient search optimization objectives. As a result, a family of new discriminative training algorithms can be rigorously formulated for various kinds of classifier frameworks, including the popular dynamic time warping (DTW) and hidden Markov model (HMM). Experimental results are also provided to show the superiority of this new family of GPD-based, adaptive training algorithms for speech recognition. >
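The core of the GPD formulation is a differentiable stand-in for the counting error: a misclassification measure built from the discriminant scores, passed through a sigmoid. The sketch below shows such a loss and its gradient with respect to the scores; the parameter names (eta, xi), the exact functional form, and the toy example are my own assumptions, and a real system would chain this gradient into DTW template or HMM parameter updates:

```python
import numpy as np

def smoothed_error_loss(scores, correct_idx, eta=1.0, xi=1.0):
    """Smoothed classification error in the spirit of GPD training (a sketch).

    scores      : discriminant function values g_k for each class
    correct_idx : index of the true class
    eta         : softness of the 'best competitor' log-sum-exp
    xi          : steepness of the sigmoid loss
    Returns the loss and its gradient with respect to the scores.
    """
    g = np.asarray(scores, dtype=float)
    mask = np.ones_like(g, dtype=bool)
    mask[correct_idx] = False
    # misclassification measure: competitor "soft max" minus the true score
    lse = np.log(np.mean(np.exp(eta * g[mask]))) / eta
    d = lse - g[correct_idx]
    loss = 1.0 / (1.0 + np.exp(-xi * d))        # sigmoid loss in (0, 1)
    dloss_dd = xi * loss * (1.0 - loss)         # chain rule through the sigmoid
    grad = np.zeros_like(g)
    w = np.exp(eta * g[mask])
    grad[mask] = dloss_dd * w / w.sum()
    grad[correct_idx] = -dloss_dd
    return loss, grad

# toy usage: true class 0 is narrowly beaten by class 2
loss, grad = smoothed_error_loss([2.0, 0.5, 2.2], correct_idx=0)
print(loss, grad)   # descending along -grad raises g_0 and lowers g_2
```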

Journal ArticleDOI
15 Dec 1991
TL;DR: In this paper, a class of methods with a Monte Carlo flavour is presented for parameter estimation from noisy versions of realizations of Markov models, and their performance on simple examples suggests that they should be valuable, practically feasible procedures in the context of a range of problems.
Abstract: Parameter estimation from noisy versions of realizations of Markov models is extremely difficult in all but very simple examples. The paper identifies these difficulties, reviews ways of coping with them in practice, and discusses in detail a class of methods with a Monte Carlo flavour. Their performance on simple examples suggests that they should be valuable, practically feasible procedures in the context of a range of otherwise intractable problems. An illustration is provided based on satellite data.

PatentDOI
TL;DR: A flexible vocabulary speech recognition system is provided for recognizing speech transmitted via the public switched telephone network, and the phonemes are modelled as hidden Markov models.
Abstract: A flexible vocabulary speech recognition system is provided for recognizing speech transmitted via the public switched telephone network. The flexible vocabulary recognition (FVR) system is a phoneme based system. The phonemes are modelled as hidden Markov models. The vocabulary is represented as concatenated phoneme models. The phoneme models are trained using Viterbi training enhanced by: substituting the covariance matrix of given phonemes by others, applying energy level thresholds and voiced, unvoiced, silence labelling constraints during Viterbi training. Specific vocabulary members, such as digits, are represented by allophone models. A* searching of the lexical network is facilitated by providing a reduced network which provides estimate scores used to evaluate the recognition path through the lexical network. Joint recognition and rejection of out-of-vocabulary words are provided by using both cepstrum and LSP parameter vectors.

Patent
21 Mar 1991
TL;DR: In this patent, the capacity for discriminating between models is taken into consideration so as to allow a high level of recognition accuracy to be obtained; a probability of a vector sequence appearing from HMMs is computed with respect to an input vector and continuous mixture density HMMs.
Abstract: Disclosed is a hidden Markov model (HMM) training apparatus in which a capacity for discriminating between models is taken into consideration so as to allow a high level of recognition accuracy to be obtained. A probability of a vector sequence appearing from HMMs is computed with respect to an input vector and continuous mixture density HMMs. Through this computation, the nearest different-category HMM, with which the maximum probability is obtained and which belongs to a category different from that of a training vector sequence of a known category, is selected. The respective central vectors of continuous densities constituting the output probability densities of the same-category HMM belonging to the same category as that of the training vector sequence and the nearest different-category HMM are moved on the basis of the vector sequence.
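A toy rendering of that corrective movement (not the patented procedure itself, and with a step size chosen purely for illustration) might be:

```python
import numpy as np

def corrective_update(same_mean, rival_mean, frame, step=0.05):
    """Move the same-category mean toward a training frame and push the
    nearest different-category (rival) mean away from it (a sketch of
    the corrective idea only)."""
    same_mean = same_mean + step * (frame - same_mean)
    rival_mean = rival_mean - step * (frame - rival_mean)
    return same_mean, rival_mean
```

Repeating such updates over the training vector sequence sharpens the separation between the correct model and its closest competitor, which is the discriminative effect the patent is after.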

Journal ArticleDOI
N.Z. Tisby
TL;DR: The results show that even with a short sequence of only four isolated digits, a speaker can be verified with an average equal-error rate of less than 3 %, and the small improvement over the vector quantization approach indicates the weakness of the Markovian transition probabilities for characterizing speaker-dependent transitional information.
Abstract: Linear predictive hidden Markov models have proved to be efficient for statistically modeling speech signals. The possible application of such models to statistical characterization of the speaker himself is described and evaluated. The results show that even with a short sequence of only four isolated digits, a speaker can be verified with an average equal-error rate of less than 3 %. These results are slightly better than the results obtained using speaker-dependent vector quantizers, with comparable numbers of spectral vectors. The small improvement over the vector quantization approach indicates the weakness of the Markovian transition probabilities for characterizing speaker-dependent transitional information. >

Journal ArticleDOI
TL;DR: The model uses the Hidden Markov Model (stochastic functions of Markov nets; HMM) to describe the task structure, the operator or intelligent controller's goal structure, and the sensor signals such as forces and torques arising from interaction with the environment.
Abstract: A new model is developed for prediction and analysis of sensor information recorded during robotic performance of tasks by telemanipulation. The model uses the Hidden Markov Model (stochastic functions of Markov nets; HMM) to describe the task structure, the operator or intelligent controller's goal structure, and the sensor signals such as forces and torques arising from interaction with the environment. The Markov process portion encodes the task sequence/subgoal structure, and the observation densities associated with each subgoal state encode the expected sensor signals associated with carrying out that subgoal. Methodology is described for construction of the model parameters based on engineering knowledge of the task. The Viterbi algorithm is used for model based analysis of force signals measured during experimental teleoperation and achieves excellent segmentation of the data into subgoal phases. The Baum-Welch algorithm is used to identify the most likely HMM from a given experiment. The HMM achieves a structured, knowledge-based model with explicit uncertainties and mature, optimal identification algorithms.
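The segmentation step described above is standard Viterbi decoding: each time frame of the force/torque signal is assigned to the subgoal state on the single best path. A hedged, generic sketch (log domain, names mine), not the paper's implementation:

```python
import numpy as np

def viterbi_path(log_pi, log_A, log_B_obs):
    """Most likely state (subgoal) sequence for an HMM.

    log_pi    : (N,)   log initial probabilities
    log_A     : (N, N) log transition probabilities
    log_B_obs : (T, N) log likelihood of each observation under each state
                (e.g. subgoal-specific force/torque densities evaluated
                on the measured signal)
    """
    T, N = log_B_obs.shape
    delta = log_pi + log_B_obs[0]
    backptr = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A          # (from state, to state)
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B_obs[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                # trace back the best path
        path.append(int(backptr[t][path[-1]]))
    return path[::-1]
```

The returned state sequence directly gives the segmentation of the recording into subgoal phases.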

Proceedings ArticleDOI
18 Nov 1991
TL;DR: The author investigates a feedforward neural network that can accept phonemes with an arbitrary duration, coping with nonlinear time warping; it demonstrated higher phoneme recognition accuracy than the baseline recognizer based on conventional feedforward neural networks.
Abstract: The author investigates a feedforward neural network that can accept phonemes with an arbitrary duration, coping with nonlinear time warping. The time-warping neural network is characterized by the time-warping functions embedded between the input layer and the first hidden layer in the network. The input layer accesses three different time points. The accessing points are determined by the time-warping functions. The input spectrum sequence itself is not warped, but the accessing-point sequence is warped. The advantage of this network architecture is that the input layer can access the original spectrum sequence. The proposed network demonstrated higher phoneme recognition accuracy than the baseline recognizer based on conventional feedforward neural networks. The recognition accuracy was even higher than that achieved with discrete hidden Markov models.

Proceedings ArticleDOI
19 Feb 1991
TL;DR: A general formalism for integrating two or more speech recognition technologies, which could be developed at different research sites using different recognition strategies, and results in a large reduction in computation for word recognition using the stochastic segment model.
Abstract: This paper describes a general formalism for integrating two or more speech recognition technologies, which could be developed at different research sites using different recognition strategies. In this formalism, one system uses the N-best search strategy to generate a list of candidate sentences; the list is rescored by other systems; and the different scores are combined to optimize performance. Specifically, we report on combining the BU system based on stochastic segment models and the BBN system based on hidden Markov models. In addition to facilitating integration of different systems, the N-best approach results in a large reduction in computation for word recognition using the stochastic segment model.
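A minimal sketch of the N-best rescoring formalism, assuming the simplest score combination (a weighted sum of per-system log scores, with weights tuned on held-out data); the function names are illustrative, not taken from either system:

```python
def combine_nbest(nbest, scorers, weights):
    """Rescore an N-best list and pick the best hypothesis.

    nbest   : list of candidate sentences from the first-pass system
    scorers : functions mapping a sentence to a (log) score,
              e.g. [hmm_score, segment_model_score, lm_score]
    weights : one combination weight per scorer (tuned on held-out data)
    """
    def combined(sentence):
        return sum(w * s(sentence) for w, s in zip(weights, scorers))
    return max(nbest, key=combined)
```

The computational saving comes from the fact that the expensive second-pass models only score a short candidate list rather than searching the full recognition lattice.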

Proceedings ArticleDOI
Jay G. Wilpon, L.G. Miller, P. Modi
14 Apr 1991
TL;DR: A hidden Markov model based key wordspotting algorithm developed previously can recognize key words from a predefined vocabulary list spoken in an unconstrained fashion and improvements in the feature analysis and modeling techniques used to train the system are explored.
Abstract: A hidden Markov model based key wordspotting algorithm developed previously can recognize key words from a predefined vocabulary list spoken in an unconstrained fashion. Improvements in the feature analysis used to represent the speech signal and modeling techniques used to train the system are explored. The authors discuss several task domain issues which influence evaluation criteria. They present results from extensive evaluations on three speaker independent databases: the 20 word vocabulary Stonehenge Road Rally database, distributed by the National Security Agency, a five word vocabulary used to automate operator-assisted calls, and a three word Spanish vocabulary that is currently being tested in Spain's telephone network. Currently, recognition accuracies range from 99.9% on the Spanish database to 74% (with 8.8 FA/H/W) on the Stonehenge task. >

Proceedings ArticleDOI
Bernard Merialdo
14 Apr 1991
TL;DR: Experiments show that the best training is obtained by using as much tagged text as is available, and that maximum likelihood training may improve the accuracy of the tagging.
Abstract: Experiments on the use of a probabilistic model to tag English text, that is, to assign to each word the correct tag (part of speech) in the context of the sentence, are presented. A simple triclass Markov model is used, and the best way to estimate the parameters of this model, depending on the kind and amount of training data that is provided, is found. Two approaches are compared: using text that has been tagged by hand and computing relative frequency counts, and using text without tags and training the model as a hidden Markov process, according to a maximum likelihood principle. Experiments show that the best training is obtained by using as much tagged text as is available; maximum likelihood training may improve the accuracy of the tagging.
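The tagged-text approach amounts to relative-frequency estimation of transition and emission probabilities. The sketch below uses a bigram (biclass) model rather than the paper's triclass model, purely to keep the illustration short; the names and the toy sentence are mine:

```python
from collections import Counter, defaultdict

def train_tagger(tagged_sentences):
    """Relative-frequency estimates for a simple HMM tagger.

    tagged_sentences : iterable of [(word, tag), ...] lists
    Returns tag-transition and word-emission probability tables.
    """
    trans = defaultdict(Counter)   # counts for P(tag_t | tag_{t-1})
    emit = defaultdict(Counter)    # counts for P(word | tag)
    for sent in tagged_sentences:
        prev = "<s>"
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    def normalize(table):
        return {k: {x: c / sum(v.values()) for x, c in v.items()}
                for k, v in table.items()}
    return normalize(trans), normalize(emit)

trans, emit = train_tagger([[("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")]])
print(trans["DET"], emit["NOUN"])
```

The untagged alternative the paper compares against would instead start from some initial parameters and reestimate them with Baum-Welch (maximum likelihood) training on raw text.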

Journal ArticleDOI
TL;DR: Simulations show that in some cases, it is possible to avoid data association and directly compute the maximum a posteriori mixed track.
Abstract: The authors consider the application of hidden Markov models (HMMs) to the problem of multitarget tracking-specifically, to the problem of tracking multiple frequency lines. The idea of a mixed track is introduced, a multitrack Viterbi algorithm is described and a detailed analysis of the underlying Markov model is presented. Simulations show that in some cases, it is possible to avoid data association and directly compute the maximum a posteriori mixed track. Some practical aspects of the algorithm are discussed and simulation results, presented. >

Proceedings ArticleDOI
14 Apr 1991
TL;DR: A speaker verification system using connected word verification phrases has been implemented and studied; the system has been evaluated on a 20-speaker telephone database of connected digit utterances.
Abstract: A speaker verification system using connected word verification phrases has been implemented and studied. Verification utterances are represented as concatenated speaker-dependent whole-word hidden Markov models (HMMs). Verification phrases are specified as strings of words drawn from a small fixed vocabulary, such as the digits. Phrases can either be individualized or randomized for greater security. Training techniques to create speaker-dependent models for verification are used in which initial word models are created by bootstrapping from existing speaker-independent models. The system has been evaluated on a 20-speaker telephone database of connected digit utterances. Using approximately 66 s of connected digit training utterances per speaker, the verification equal-error rate is approximately 3.5% for 1.1 s test utterances and 0.3% for 4.4 s test utterances. In comparison, the performance of a template-based system using the same amount of training data is 6.7% and 1.5%, respectively.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: A corrective MMIE training algorithm is introduced, which, when applied to the TI/NIST connected digit database, has made it possible to reduce the string error rate by close to 50%.
Abstract: Recently, Gopalakrishnan et al. (1989) introduced a reestimation formula for discrete HMMs (hidden Markov models) which applies to rational objective functions like the MMIE (maximum mutual information estimation) criterion. The authors analyze the formula and show how its convergence rate can be substantially improved. They introduce a corrective MMIE training algorithm, which, when applied to the TI/NIST connected digit database, has made it possible to reduce the string error rate by close to 50%. Gopalakrishnan's result is extended to the continuous case by proposing a new formula for estimating the mean and variance parameters of diagonal Gaussian densities.
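For reference, the reestimation formula of Gopalakrishnan et al. for discrete parameters $\theta_{ij}$ (with $\sum_j \theta_{ij} = 1$) has, up to notation, the form

$$\hat{\theta}_{ij} = \frac{\theta_{ij}\left(\frac{\partial F}{\partial \theta_{ij}} + C\right)}{\sum_{k} \theta_{ik}\left(\frac{\partial F}{\partial \theta_{ik}} + C\right)},$$

where $F$ is the rational objective (here the MMIE criterion) and $C$ is a sufficiently large constant. The size of $C$ governs the effective step length of each reestimation, which is where analyses of the convergence rate, such as the one in this paper, come in.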

Proceedings ArticleDOI
19 Feb 1991
TL;DR: An investigation into the use of Bayesian learning of the parameters of a multivariate Gaussian mixture density has been carried out and preliminary results applying to HMM parameter smoothing, speaker adaptation, and speaker clustering are given.
Abstract: An investigation into the use of Bayesian learning of the parameters of a multivariate Gaussian mixture density has been carried out. In a continuous density hidden Markov model (CDHMM) framework, Bayesian learning serves as a unified approach for parameter smoothing, speaker adaptation, speaker clustering, and corrective training. The goal of this study is to enhance model robustness in a CDHMM-based speech recognition system so as to improve performance. Our approach is to use Bayesian learning to incorporate prior knowledge into the CDHMM training process in the form of prior densities of the HMM parameters. The theoretical basis for this procedure is presented and preliminary results applying to HMM parameter smoothing, speaker adaptation, and speaker clustering are given. Performance improvements were observed on tests using the DARPA RM task. For speaker adaptation, under a supervised learning mode with 2 minutes of speaker-specific training data, a 31% reduction in word error rate was obtained compared to speaker-independent results. Using Bayesian learning for HMM parameter smoothing and sex-dependent modeling, a 21% error reduction was observed on the FEB91 test.

Proceedings ArticleDOI
23 Sep 1991
TL;DR: Experimental results indicate that in the 1-D case, the mean field theory approach provides results comparable to those obtained by Baum's algorithm, which is known to be optimal.
Abstract: In many signal processing and pattern recognition applications, the hidden data are modeled as Markov processes, and the main difficulty of using the expectation-maximisation (EM) algorithm for these applications is the calculation of the conditional expectations of the hidden Markov processes. It is shown how the mean field theory from statistical mechanics can be used to calculate the conditional expectations for these problems efficiently. The efficacy of the mean field theory approach is demonstrated on parameter estimation for one-dimensional mixture data and two-dimensional unsupervised stochastic model-based image segmentation. Experimental results indicate that in the 1-D case, the mean field theory approach provides results comparable to those obtained by Baum's (1987) algorithm, which is known to be optimal. In the 2-D case, where Baum's algorithm can no longer be used, the mean field theory provides good parameter estimates and image segmentation for both synthetic and real-world images.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: The authors present a large vocabulary, continuous speech recognition system based on linked predictive neural networks (LPNNs), which achieves 95%, 58%, and 39% word accuracy on tasks with perplexity 7, 111, and 402, respectively, outperforming several simple HMMs that have been tested.
Abstract: The authors present a large vocabulary, continuous speech recognition system based on linked predictive neural networks (LPNNs). The system uses neural networks as predictors of speech frames, yielding distortion measures which can be used by the one-stage DTW algorithm to perform continuous speech recognition. The system currently achieves 95%, 58%, and 39% word accuracy on tasks with perplexity 7, 111, and 402, respectively, outperforming several simple HMMs that have been tested. It was also found that the accuracy and speed of the LPNN can be slightly improved by the judicious use of hidden control inputs. The strengths and weaknesses of the predictive approach are discussed. >

Journal ArticleDOI
TL;DR: This work proposes these augmented HMMs as a theory of adaptive skill acquisition and generation, and gives an example, the what-where-AHMM, which creates a hybrid skill from separate skills based on object location and object identity.
Abstract: Advances in technology and in active vision research allow and encourage sequential visual information acquisition. Hidden Markov models (HMMs) can represent probabilistic sequences and probabilistic graph structures: here we explore their use in controlling the acquisition of visual information. We include a brief tutorial with two examples: (1) use input sequences to derive an aspect graph and (2) similarly derive a finite state machine for control of visual processing.

Journal ArticleDOI
TL;DR: The model assumes that the observed spectral data were generated by a Gaussian source; however, an analysis of the data shows that the spectra for most of the phonemes are not normally distributed and that an alternative representation would be beneficial.
Abstract: The techniques used to develop an acoustic-phonetic hidden Markov model, the problems associated with representing the whole acoustic-phonetic structure, the characteristics of the model, and how it performs as a phonetic decoder for recognition of fluent speech are discussed. The continuous variable duration model was trained using 450 sentences of fluent speech, each of which was spoken by a single speaker, and segmented and labeled using a fixed number of phonemes, each of which has a direct correspondence to the states of the matrix. The inherent variability of each phoneme is modeled as the observable random process of the Markov chain, while the phonotactic model of the unobservable phonetic sequence is represented by the state transition matrix of the hidden Markov model. The model assumes that the observed spectral data were generated by a Gaussian source. However, an analysis of the data shows that the spectra for most of the phonemes are not normally distributed and that an alternative representation would be beneficial.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: A hidden Markov model (HMM)-based approach to mechanical system monitoring is presented and it is shown to be useful for machining applications with the associated problems of tool wear detection and prediction.
Abstract: A hidden Markov model (HMM)-based approach to mechanical system monitoring is presented. The resulting system is shown to be useful for machining applications with the associated problems of tool wear detection and prediction. The approach is based on continuous density, left-right HMMs that closely match the one-way, fresh-to-worn transition process of machining tools. The Baum-Welch iterative training procedure is modified to incorporate prior knowledge of the transitions between tool wear states. Results presented demonstrate that a multisensor HMM-based system is an effective approach for tool wear detection and prediction. >
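The "one-way, fresh-to-worn" structure mentioned above is exactly what a left-right transition matrix encodes: a wear state can only be held or advanced, never revisited. Below is a hedged sketch of such an initialization (the state count and self-loop probability are arbitrary illustrations, not values from the paper); since Baum-Welch reestimation never turns a zero transition probability into a nonzero one, this structure is preserved during training:

```python
import numpy as np

def left_right_transitions(n_states, stay_prob=0.9):
    """Left-right transition matrix for tool-wear states (a sketch).

    Wear only progresses forward, so the matrix is upper bidiagonal:
    each state either stays put or advances to the next wear state.
    """
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = stay_prob
        A[i, i + 1] = 1.0 - stay_prob
    A[-1, -1] = 1.0                     # fully worn is an absorbing state
    return A

print(left_right_transitions(4))
```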

Proceedings ArticleDOI
11 Jun 1991
TL;DR: The authors summarize a speaker adaptation algorithm based on codebook mapping from one speaker to a standard speaker, developed to be useful in various kinds of speech recognition systems such as hidden-Markov-model-based, feature-based, and neural-network-based systems.
Abstract: The authors summarize a speaker adaptation algorithm based on codebook mapping from one speaker to a standard speaker. This algorithm has been developed to be useful in various kinds of speech recognition systems such as hidden-Markov-model-based, feature-based, and neural-network-based systems. The codebook mapping speaker adaptation algorithm has been much improved by introducing several ideas based on fuzzy vector quantization. This fuzzy codebook mapping algorithm is also applicable to voice conversion between arbitrary speakers. >
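Fuzzy vector quantization replaces the hard nearest-codeword assignment with graded memberships. The sketch below uses the standard fuzzy c-means membership formula as an assumption about what "fuzzy vector quantization" means here; it is not taken from the paper, and the fuzziness value is arbitrary:

```python
import numpy as np

def fuzzy_memberships(x, codebook, fuzziness=1.5):
    """Fuzzy VQ memberships of a spectral frame x to every codeword.

    Instead of mapping x to its single nearest codeword (hard VQ), every
    codeword receives a membership in [0, 1]; a mapped spectrum can then
    be formed as a membership-weighted sum of target-speaker codewords.
    """
    d = np.linalg.norm(codebook - x, axis=1) ** 2   # squared distances
    d = np.maximum(d, 1e-12)                        # avoid division by zero
    inv = d ** (-1.0 / (fuzziness - 1.0))
    return inv / inv.sum()

# toy usage: 3 codewords in a 2-D feature space
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
print(fuzzy_memberships(np.array([0.9, 0.8]), codebook))
```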