
Showing papers on "Hidden Markov model published in 1991"


Journal ArticleDOI
TL;DR: The role of statistical methods in this powerful technology as applied to speech recognition is addressed and a range of theoretical and practical issues that are as yet unsolved in terms of their importance and their effect on performance for different system implementations are discussed.
Abstract: The use of hidden Markov models for speech recognition has become predominant in the last several years, as evidenced by the number of published papers and talks at major speech conferences. The reasons this method has become so popular are the inherent statistical (mathematically precise) framework; the ease and availability of training algorithms for estimating the parameters of the models from finite training sets of speech data; the flexibility of the resulting recognition system in which one can easily change the size, type, or architecture of the models to suit particular words, sounds, and so forth; and the ease of implementation of the overall recognition system. In this expository article, we address the role of statistical methods in this powerful technology as applied to speech recognition and discuss a range of theoretical and practical issues that are as yet unsolved in terms of their importance and their effect on performance for different system implementations.
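The training and scoring machinery surveyed here rests on a small set of dynamic-programming recursions. As a rough illustration only (not code from the article, with toy numbers of my own), a minimal forward-algorithm sketch for scoring an observation sequence against a discrete HMM might look like this:

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """Likelihood P(obs | model) for a discrete HMM.

    pi  : (N,)   initial state probabilities
    A   : (N, N) transition probabilities, A[i, j] = P(state j | state i)
    B   : (N, M) emission probabilities, B[i, k] = P(symbol k | state i)
    obs : sequence of observed symbol indices
    """
    alpha = pi * B[:, obs[0]]           # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # induction step
    return alpha.sum()                  # termination

# toy example: 2 states, 3 output symbols
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])
print(forward_likelihood(pi, A, B, [0, 1, 2, 1]))
```

In practice the recursion is run with scaling factors or in the log domain to avoid numerical underflow on long utterances.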

1,480 citations


Book
08 Apr 1991
TL;DR: In this book, the authors develop a unified theory of hidden Markov models, covering vector quantization, mixture densities, continuous and semi-continuous models, and their use in speech recognition, with experimental examples.
Abstract: Vector quantisation and mixture densities; hidden Markov models and basic algorithms; continuous hidden Markov models; a unified theory with semi-continuous models; using hidden Markov models for speech recognition; experimental examples.
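One of the book's central objects is the semi-continuous (tied-mixture) output density, in which all states share a single codebook of Gaussians and differ only in their mixture weights. A hedged sketch of that computation, with function and variable names of my own choosing:

```python
import numpy as np
from scipy.stats import multivariate_normal

def semicontinuous_output_prob(x, weights_j, codebook_means, codebook_covs):
    """Semi-continuous HMM output probability for one state j.

    Every state shares the same codebook of Gaussian densities; state j
    only stores its own mixture weights (weights_j sums to 1).
    """
    densities = np.array([
        multivariate_normal.pdf(x, mean=m, cov=c)
        for m, c in zip(codebook_means, codebook_covs)
    ])
    return float(weights_j @ densities)

# toy usage: 2-D frames, a shared codebook of 3 Gaussians
means = [np.zeros(2), np.ones(2), np.array([2.0, 0.0])]
covs = [np.eye(2)] * 3
print(semicontinuous_output_prob(np.array([0.5, 0.5]),
                                 np.array([0.6, 0.3, 0.1]), means, covs))
```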

768 citations


Journal ArticleDOI
TL;DR: A speaker adaptation procedure which is easily integrated into the segmental k-means training procedure for obtaining adaptive estimates of the CDHMM parameters is presented and shows that much better performance is achieved when two or more training tokens are used for speaker adaptation.
Abstract: For a speech-recognition system based on continuous-density hidden Markov models (CDHMM), speaker adaptation of the parameters of CDHMM is formulated as a Bayesian learning procedure. A speaker adaptation procedure which is easily integrated into the segmental k-means training procedure for obtaining adaptive estimates of the CDHMM parameters is presented. Some results for adapting both the mean and the diagonal covariance matrix of the Gaussian state observation densities of a CDHMM are reported. The results from tests on a 39-word English alpha-digit vocabulary in isolated word mode indicate that the speaker adaptation procedure achieves the same level of performance as that of a speaker-independent system, when one training token from each word is used to perform speaker adaptation. The results also show that much better performance is achieved when two or more training tokens are used for speaker adaptation. When compared with the speaker-dependent system, it is found that the performance of speaker adaptation is always equal to or better than that of speaker-dependent training using the same amount of training data.
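The Bayesian adaptation of a Gaussian mean has a particularly simple closed form when a conjugate prior is used. The sketch below is a generic MAP mean update, not the paper's exact procedure; the weight tau and the function name are my own illustrations:

```python
import numpy as np

def map_adapt_mean(prior_mean, adaptation_frames, tau=10.0):
    """MAP (Bayesian) adaptation of a Gaussian state mean (a sketch).

    prior_mean        : speaker-independent mean vector (the prior)
    adaptation_frames : (n, d) speaker-specific frames aligned to the state
    tau               : prior weight; larger tau trusts the prior more

    With a conjugate Gaussian prior on the mean, the MAP estimate is a
    count-weighted interpolation of the prior mean and the sample mean.
    """
    n = len(adaptation_frames)
    if n == 0:
        return prior_mean
    sample_mean = np.mean(adaptation_frames, axis=0)
    return (tau * prior_mean + n * sample_mean) / (tau + n)
```

As the amount of adaptation data n grows, the estimate moves smoothly from the speaker-independent prior toward the speaker-specific sample mean.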

299 citations


PatentDOI
Lynn D. Wilcox, Marcia A. Bush
TL;DR: The wordspotter is intended for interactive applications, such as the editing of voice mail or mixed-media documents, and for keyword indexing in single-speaker audio or video recordings.
Abstract: A technique for wordspotting based on hidden Markov models (HMM's). The technique allows a speaker to specify keywords dynamically and to train the associated HMM's via a single repetition of a keyword. Non-keyword speech is modeled using an HMM trained from a prerecorded sample of continuous speech. The wordspotter is intended for interactive applications, such as the editing of voice mail or mixed-media documents, and for keyword indexing in single-speaker audio or video recordings.

265 citations


PatentDOI
TL;DR: A voice log-in system is based on a person's spoken name input only, using speaker-dependent acoustic name recognition models in performing speaker-independent name recognition.
Abstract: A voice log-in system is based on a person's spoken name input only, using speaker-dependent acoustic name recognition models in performing speaker-independent name recognition. In an enrollment phase, a dual pass endpointing procedure defines both the person's full name (broad endpoints) and the component names separated by pauses (precise endpoints). An HMM (hidden Markov model) recognition model generator generates a corresponding HMM name recognition model, modified by the insertion of additional skip transitions for the pauses between component names. In a recognition/update phase, a spoken-name speech signal is input to an HMM name recognition engine which performs speaker-independent name recognition; the modified HMM name recognition model permits the name recognition operation to accommodate pauses of variable duration between component names.
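To make the skip-transition idea concrete, the hypothetical sketch below redirects part of the probability mass entering an optional pause state directly to its successor in a left-right model. The specific fraction and the function name are illustrative assumptions, not the patent's construction:

```python
import numpy as np

def add_pause_skips(A, pause_states):
    """Insert skip transitions around optional pause states in a
    left-right HMM transition matrix (row-stochastic A).

    For every transition i -> p into a pause state p, part of its
    probability is redirected straight to p's successor, so a name
    spoken without a pause can bypass the pause state entirely.
    """
    A = A.copy()
    for p in pause_states:
        succ = p + 1                    # next state in the left-right chain
        if succ >= A.shape[0]:
            continue
        for i in range(A.shape[0]):
            if A[i, p] > 0 and i != p:
                skip = 0.5 * A[i, p]    # illustrative 50/50 split
                A[i, p] -= skip
                A[i, succ] += skip
    return A
```

Because probability mass is only moved within each row, the rows remain properly normalized.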

217 citations


Journal ArticleDOI
Y. He, A. Kundu
TL;DR: The authors present a planar shape recognition approach based on the hidden Markov model and autoregressive parameters that segments closed shapes to make classifications at a finer level and does not have to be trained again when a new class of shapes is added.
Abstract: The authors present a planar shape recognition approach based on the hidden Markov model and autoregressive parameters. This approach segments closed shapes to make classifications at a finer level. The algorithm can tolerate a lot of shape contour perturbation and a moderate amount of occlusion. An orientation scheme is described to make the overall classification insensitive to shape orientation. Excellent recognition results have been reported. A distinct advantage of the approach is that the classifier does not have to be trained again when a new class of shapes is added. >

187 citations


Journal ArticleDOI
01 Jun 1991
TL;DR: Models and control strategies for dynamic obstacle avoidance in visual guidance of mobile robots and a stochastic motion-control algorithm based on a hidden Markov model are presented, which simplifies the control process of robot motion.
Abstract: Models and control strategies for dynamic obstacle avoidance in visual guidance of mobile robots are presented. Characteristics that distinguish the visual computation and motion control requirements in dynamic environments from that in static environments are discussed. Objectives of the vision and motion planning are formulated, such as finding a collision-free trajectory that takes account of any possible motions of obstacles in the local environments. Such a trajectory should be consistent with a global goal or plan of the motion and the robot should move at as high a speed as possible, subject to its kinematic constraints. A stochastic motion-control algorithm based on a hidden Markov model is developed. Obstacle motion prediction applies a probabilistic evaluation scheme. Motion planning of the robot implements a trajectory-guided parallel-search strategy in accordance with the obstacle motion prediction models. The approach simplifies the control process of robot motion. >

174 citations


Journal ArticleDOI
TL;DR: In this article, a family of multivariate models for the occurrence/nonoccurrence of precipitation at N sites is constructed by assuming a different joint probability of events at the sites for each of a number of unobservable climate states.
Abstract: A family of multivariate models for the occurrence/nonoccurrence of precipitation at N sites is constructed by assuming a different joint probability of events at the sites for each of a number of unobservable climate states. The climate process is assumed to follow a Markov chain. Simple formulae for first- and second-order parameter functions are derived, and used to find starting values for a numerical maximization of the likelihood. The method is illustrated by applying it to data for one site in Washington and to data for a network in the Great Plains.
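Under the hidden climate-state assumption, the first- and second-order functions mentioned above take a simple form. As a hedged sketch (the notation is mine, not the paper's): with $\pi_s$ the stationary probability of climate state $s$, $\gamma_{ss'}$ its transition probability, and $p_{i|s}$ the probability of precipitation at site $i$ given state $s$,

$$P\big(Y_i(t)=1\big) = \sum_{s} \pi_s \, p_{i|s}, \qquad P\big(Y_i(t)=1,\, Y_i(t+1)=1\big) = \sum_{s,s'} \pi_s \, \gamma_{ss'} \, p_{i|s} \, p_{i|s'} .$$

Matching such moments to their empirical counterparts gives the starting values for the numerical likelihood maximization described in the abstract.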

173 citations


Proceedings ArticleDOI
19 Feb 1991
TL;DR: DECIPHER as discussed by the authors is a speaker-independent continuous speech recognition system based on hidden Markov model (HMM) technology, which is used in SRI's Air Travel Information Systems (ATIS) and Resource Management systems.
Abstract: This paper describes improvements to DECIPHER, the speech recognition component in SRI's Air Travel Information Systems (ATIS) and Resource Management systems. DECIPHER is a speaker-independent continuous speech recognition system based on hidden Markov model (HMM) technology. We show significant performance improvements in DECIPHER due to (1) the addition of tied-mixture HMM modeling, (2) rejection of out-of-vocabulary speech and background noise while continuing to recognize speech, (3) adapting to the current speaker, and (4) the implementation of N-gram statistical grammars with DECIPHER. Finally we describe our performance in the February 1991 DARPA Resource Management evaluation (4.8 percent word error) and in the February 1991 DARPA-ATIS speech and SLS evaluations (95 sentences correct, 15 wrong of 140). We show that, for the ATIS evaluation, a well-conceived system integration can be relatively robust to speech recognition errors and to linguistic variability and errors.

172 citations


Journal ArticleDOI
TL;DR: A speaker-independent phoneme and word recognition system based on a recurrent error propagation network is trained on the TIMIT database; analysis of the phoneme recognition results shows that information available from bigram and durational constraints is adequately handled within the network, allowing for efficient parsing of the network output.

170 citations


Proceedings ArticleDOI
30 Sep 1991
TL;DR: A family of new discriminative training algorithms can be rigorously formulated for various kinds of classifier frameworks, including the popular dynamic time warping (DTW) and hidden Markov model (HMM).
Abstract: The authors developed a generalized probabilistic descent (GPD) method by extending the classical theory on adaptive training by Amari (1967). Their generalization makes it possible to treat dynamic patterns (of a variable duration or dimension) such as speech as well as static patterns (of a fixed duration or dimension), for pattern classification problems. The key ideas of GPD formulations include the embedding of time normalization and the incorporation of smooth classification error functions into the gradient search optimization objectives. As a result, a family of new discriminative training algorithms can be rigorously formulated for various kinds of classifier frameworks, including the popular dynamic time warping (DTW) and hidden Markov model (HMM). Experimental results are also provided to show the superiority of this new family of GPD-based, adaptive training algorithms for speech recognition. >
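The core of the GPD formulation is a differentiable stand-in for the counting error: a misclassification measure built from the discriminant scores, passed through a sigmoid. The sketch below shows such a loss and its gradient with respect to the scores; the parameter names (eta, xi), the exact functional form, and the toy example are my own assumptions, and a real system would chain this gradient into DTW template or HMM parameter updates:

```python
import numpy as np

def smoothed_error_loss(scores, correct_idx, eta=1.0, xi=1.0):
    """Smoothed classification error in the spirit of GPD training (a sketch).

    scores      : discriminant function values g_k for each class
    correct_idx : index of the true class
    eta         : softness of the 'best competitor' log-sum-exp
    xi          : steepness of the sigmoid loss
    Returns the loss and its gradient with respect to the scores.
    """
    g = np.asarray(scores, dtype=float)
    mask = np.ones_like(g, dtype=bool)
    mask[correct_idx] = False
    # misclassification measure: competitor "soft max" minus the true score
    lse = np.log(np.mean(np.exp(eta * g[mask]))) / eta
    d = lse - g[correct_idx]
    loss = 1.0 / (1.0 + np.exp(-xi * d))        # sigmoid loss in (0, 1)
    dloss_dd = xi * loss * (1.0 - loss)         # chain rule through the sigmoid
    grad = np.zeros_like(g)
    w = np.exp(eta * g[mask])
    grad[mask] = dloss_dd * w / w.sum()
    grad[correct_idx] = -dloss_dd
    return loss, grad

# toy usage: true class 0 is narrowly beaten by class 2
loss, grad = smoothed_error_loss([2.0, 0.5, 2.2], correct_idx=0)
print(loss, grad)   # descending along -grad raises g_0 and lowers g_2
```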

Journal ArticleDOI
15 Dec 1991
TL;DR: In this paper, a class of methods with a Monte Carlo flavour is presented for parameter estimation from noisy versions of realizations of Markov models, and their performance on simple examples suggests that they should be valuable, practically feasible procedures in the context of a range of problems.
Abstract: Parameter estimation from noisy versions of realizations of Markov models is extremely difficult in all but very simple examples. The paper identifies these difficulties, reviews ways of coping with them in practice, and discusses in detail a class of methods with a Monte Carlo flavour. Their performance on simple examples suggests that they should be valuable, practically feasible procedures in the context of a range of otherwise intractable problems. An illustration is provided based on satellite data.

PatentDOI
TL;DR: A flexible vocabulary speech recognition system is provided for recognizing speech transmitted via the public switched telephone network, and the phonemes are modelled as hidden Markov models.
Abstract: A flexible vocabulary speech recognition system is provided for recognizing speech transmitted via the public switched telephone network. The flexible vocabulary recognition (FVR) system is a phoneme based system. The phonemes are modelled as hidden Markov models. The vocabulary is represented as concatenated phoneme models. The phoneme models are trained using Viterbi training enhanced by: substituting the covariance matrix of given phonemes by others, applying energy level thresholds and voiced, unvoiced, silence labelling constraints during Viterbi training. Specific vocabulary members, such as digits, are represented by allophone models. A* searching of the lexical network is facilitated by providing a reduced network which provides estimate scores used to evaluate the recognition path through the lexical network. Joint recognition and rejection of out-of-vocabulary words are provided by using both cepstrum and LSP parameter vectors.

Patent
21 Mar 1991
TL;DR: In this patent, the capacity for discriminating between models is taken into consideration so as to allow a high level of recognition accuracy to be obtained; a probability of a vector sequence appearing from HMMs is computed with respect to an input vector and continuous mixture density HMMs.
Abstract: Disclosed is a hidden Markov model (HMM) training apparatus in which a capacity for discriminating between models is taken into consideration so as to allow a high level of recognition accuracy to be obtained. A probability of a vector sequence appearing from HMMs is computed with respect to an input vector and continuous mixture density HMMs. Through this computation, the nearest different-category HMM, with which the maximum probability is obtained and which belongs to a category different from that of a training vector sequence of a known category, is selected. The respective central vectors of continuous densities constituting the output probability densities of the same-category HMM belonging to the same category as that of the training vector sequence and the nearest different-category HMM are moved on the basis of the vector sequence.
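A toy rendering of that corrective movement (not the patented procedure itself, and with a step size chosen purely for illustration) might be:

```python
import numpy as np

def corrective_update(same_mean, rival_mean, frame, step=0.05):
    """Move the same-category mean toward a training frame and push the
    nearest different-category (rival) mean away from it (a sketch of
    the corrective idea only)."""
    same_mean = same_mean + step * (frame - same_mean)
    rival_mean = rival_mean - step * (frame - rival_mean)
    return same_mean, rival_mean
```

Repeating such updates over the training vector sequence sharpens the separation between the correct model and its closest competitor, which is the discriminative effect the patent is after.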

Journal ArticleDOI
N.Z. Tisby
TL;DR: The results show that even with a short sequence of only four isolated digits, a speaker can be verified with an average equal-error rate of less than 3 %, and the small improvement over the vector quantization approach indicates the weakness of the Markovian transition probabilities for characterizing speaker-dependent transitional information.
Abstract: Linear predictive hidden Markov models have proved to be efficient for statistically modeling speech signals. The possible application of such models to statistical characterization of the speaker himself is described and evaluated. The results show that even with a short sequence of only four isolated digits, a speaker can be verified with an average equal-error rate of less than 3 %. These results are slightly better than the results obtained using speaker-dependent vector quantizers, with comparable numbers of spectral vectors. The small improvement over the vector quantization approach indicates the weakness of the Markovian transition probabilities for characterizing speaker-dependent transitional information. >

Journal ArticleDOI
TL;DR: The model uses the Hidden Markov Model (stochastic functions of Markov nets; HMM) to describe the task structure, the operator or intelligent controller's goal structure, and the sensor signals such as forces and torques arising from interaction with the environment.
Abstract: A new model is developed for prediction and analysis of sensor information recorded during robotic performance of tasks by telemanipulation. The model uses the Hidden Markov Model (stochastic functions of Markov nets; HMM) to describe the task structure, the operator or intelligent controller's goal structure, and the sensor signals such as forces and torques arising from interaction with the environment. The Markov process portion encodes the task sequence/subgoal structure, and the observation densities associated with each subgoal state encode the expected sensor signals associated with carrying out that subgoal. Methodology is described for construction of the model parameters based on engineering knowledge of the task. The Viterbi algorithm is used for model based analysis of force signals measured during experimental teleoperation and achieves excellent segmentation of the data into subgoal phases. The Baum-Welch algorithm is used to identify the most likely HMM from a given experiment. The HMM achieves a structured, knowledge-based model with explicit uncertainties and mature, optimal identification algorithms.
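The segmentation step described above is standard Viterbi decoding: each time frame of the force/torque signal is assigned to the subgoal state on the single best path. A hedged, generic sketch (log domain, names mine), not the paper's implementation:

```python
import numpy as np

def viterbi_path(log_pi, log_A, log_B_obs):
    """Most likely state (subgoal) sequence for an HMM.

    log_pi    : (N,)   log initial probabilities
    log_A     : (N, N) log transition probabilities
    log_B_obs : (T, N) log likelihood of each observation under each state
                (e.g. subgoal-specific force/torque densities evaluated
                on the measured signal)
    """
    T, N = log_B_obs.shape
    delta = log_pi + log_B_obs[0]
    backptr = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A          # (from state, to state)
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B_obs[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                # trace back the best path
        path.append(int(backptr[t][path[-1]]))
    return path[::-1]
```

The returned state sequence directly gives the segmentation of the recording into subgoal phases.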

Proceedings ArticleDOI
18 Nov 1991
TL;DR: The author investigates a feedforward neural network that can accept phonemes with an arbitrary duration, coping with nonlinear time warping; it demonstrated higher phoneme recognition accuracy than the baseline recognizer based on conventional feedforward neural networks.
Abstract: The author investigates a feedforward neural network that can accept phonemes with an arbitrary duration, coping with nonlinear time warping. The time-warping neural network is characterized by the time-warping functions embedded between the input layer and the first hidden layer in the network. The input layer accesses three different time points. The accessing points are determined by the time-warping functions. The input spectrum sequence itself is not warped, but the accessing-point sequence is warped. The advantage of this network architecture is that the input layer can access the original spectrum sequence. The proposed network demonstrated higher phoneme recognition accuracy than the baseline recognizer based on conventional feedforward neural networks. The recognition accuracy was even higher than that achieved with discrete hidden Markov models.

Proceedings ArticleDOI
19 Feb 1991
TL;DR: A general formalism for integrating two or more speech recognition technologies, which could be developed at different research sites using different recognition strategies, and results in a large reduction in computation for word recognition using the stochastic segment model.
Abstract: This paper describes a general formalism for integrating two or more speech recognition technologies, which could be developed at different research sites using different recognition strategies. In this formalism, one system uses the N-best search strategy to generate a list of candidate sentences; the list is rescored by other systems; and the different scores are combined to optimize performance. Specifically, we report on combining the BU system based on stochastic segment models and the BBN system based on hidden Markov models. In addition to facilitating integration of different systems, the N-best approach results in a large reduction in computation for word recognition using the stochastic segment model.
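A minimal sketch of the N-best rescoring formalism, assuming the simplest score combination (a weighted sum of per-system log scores, with weights tuned on held-out data); the function names are illustrative, not taken from either system:

```python
def combine_nbest(nbest, scorers, weights):
    """Rescore an N-best list and pick the best hypothesis.

    nbest   : list of candidate sentences from the first-pass system
    scorers : functions mapping a sentence to a (log) score,
              e.g. [hmm_score, segment_model_score, lm_score]
    weights : one combination weight per scorer (tuned on held-out data)
    """
    def combined(sentence):
        return sum(w * s(sentence) for w, s in zip(weights, scorers))
    return max(nbest, key=combined)
```

The computational saving comes from the fact that the expensive second-pass models only score a short candidate list rather than searching the full recognition lattice.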

Proceedings ArticleDOI
Jay G. Wilpon, L.G. Miller, P. Modi
14 Apr 1991
TL;DR: A hidden Markov model based key wordspotting algorithm developed previously can recognize key words from a predefined vocabulary list spoken in an unconstrained fashion and improvements in the feature analysis and modeling techniques used to train the system are explored.
Abstract: A hidden Markov model based key wordspotting algorithm developed previously can recognize key words from a predefined vocabulary list spoken in an unconstrained fashion. Improvements in the feature analysis used to represent the speech signal and modeling techniques used to train the system are explored. The authors discuss several task domain issues which influence evaluation criteria. They present results from extensive evaluations on three speaker independent databases: the 20 word vocabulary Stonehenge Road Rally database, distributed by the National Security Agency, a five word vocabulary used to automate operator-assisted calls, and a three word Spanish vocabulary that is currently being tested in Spain's telephone network. Currently, recognition accuracies range from 99.9% on the Spanish database to 74% (with 8.8 FA/H/W) on the Stonehenge task. >

Proceedings ArticleDOI
Bernard Merialdo
14 Apr 1991
TL;DR: Experiments show that the best training is obtained by using as much tagged text as is available, and that maximum likelihood training may improve the accuracy of the tagging.
Abstract: Experiments on the use of a probabilistic model to tag English text, that is, to assign to each word the correct tag (part of speech) in the context of the sentence, are presented. A simple triclass Markov model is used, and the best way to estimate the parameters of this model, depending on the kind and amount of training data that is provided, is found. Two approaches are compared: using text that has been tagged by hand and computing relative frequency counts, and using text without tags and training the model as a hidden Markov process, according to a maximum likelihood principle. Experiments show that the best training is obtained by using as much tagged text as is available; maximum likelihood training may improve the accuracy of the tagging.
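The tagged-text approach amounts to relative-frequency estimation of transition and emission probabilities. The sketch below uses a bigram (biclass) model rather than the paper's triclass model, purely to keep the illustration short; the names and the toy sentence are mine:

```python
from collections import Counter, defaultdict

def train_tagger(tagged_sentences):
    """Relative-frequency estimates for a simple HMM tagger.

    tagged_sentences : iterable of [(word, tag), ...] lists
    Returns tag-transition and word-emission probability tables.
    """
    trans = defaultdict(Counter)   # counts for P(tag_t | tag_{t-1})
    emit = defaultdict(Counter)    # counts for P(word | tag)
    for sent in tagged_sentences:
        prev = "<s>"
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    def normalize(table):
        return {k: {x: c / sum(v.values()) for x, c in v.items()}
                for k, v in table.items()}
    return normalize(trans), normalize(emit)

trans, emit = train_tagger([[("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")]])
print(trans["DET"], emit["NOUN"])
```

The untagged alternative the paper compares against would instead start from some initial parameters and reestimate them with Baum-Welch (maximum likelihood) training on raw text.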

Journal ArticleDOI
TL;DR: Simulations show that in some cases, it is possible to avoid data association and directly compute the maximum a posteriori mixed track.
Abstract: The authors consider the application of hidden Markov models (HMMs) to the problem of multitarget tracking-specifically, to the problem of tracking multiple frequency lines. The idea of a mixed track is introduced, a multitrack Viterbi algorithm is described and a detailed analysis of the underlying Markov model is presented. Simulations show that in some cases, it is possible to avoid data association and directly compute the maximum a posteriori mixed track. Some practical aspects of the algorithm are discussed and simulation results, presented. >

Proceedings ArticleDOI
14 Apr 1991
TL;DR: A speaker verification system using connected word verification phrases has been implemented and studied; the system has been evaluated on a 20-speaker telephone database of connected digit utterances.
Abstract: A speaker verification system using connected word verification phrases has been implemented and studied. Verification utterances are represented as concatenated speaker-dependent whole-word hidden Markov models (HMMs). Verification phrases are specified as strings of words drawn from a small fixed vocabulary, such as the digits. Phrases can either be individualized or randomized for greater security. Training techniques to create speaker-dependent models for verification are used in which initial word models are created by bootstrapping from existing speaker-independent models. The system has been evaluated on a 20-speaker telephone database of connected digit utterances. Using approximately 66 s of connected digit training utterances per speaker, the verification equal-error rate is approximately 3.5% for 1.1 s test utterances and 0.3% for 4.4 s test utterances. In comparison, the performance of a template-based system using the same amount of training data is 6.7% and 1.5%, respectively.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: A corrective MMIE training algorithm is introduced, which, when applied to the TI/NIST connected digit database, has made it possible to reduce the string error rate by close to 50%.
Abstract: Recently, Gopalakrishnan et al. (1989) introduced a reestimation formula for discrete HMMs (hidden Markov models) which applies to rational objective functions like the MMIE (maximum mutual information estimation) criterion. The authors analyze the formula and show how its convergence rate can be substantially improved. They introduce a corrective MMIE training algorithm, which, when applied to the TI/NIST connected digit database, has made it possible to reduce the string error rate by close to 50%. Gopalakrishnan's result is extended to the continuous case by proposing a new formula for estimating the mean and variance parameters of diagonal Gaussian densities.
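For reference, the reestimation formula of Gopalakrishnan et al. for discrete parameters $\theta_{ij}$ (with $\sum_j \theta_{ij} = 1$) has, up to notation, the form

$$\hat{\theta}_{ij} = \frac{\theta_{ij}\left(\frac{\partial F}{\partial \theta_{ij}} + C\right)}{\sum_{k} \theta_{ik}\left(\frac{\partial F}{\partial \theta_{ik}} + C\right)},$$

where $F$ is the rational objective (here the MMIE criterion) and $C$ is a sufficiently large constant. The size of $C$ governs the effective step length of each reestimation, which is where analyses of the convergence rate, such as the one in this paper, come in.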

Proceedings ArticleDOI
19 Feb 1991
TL;DR: An investigation into the use of Bayesian learning of the parameters of a multivariate Gaussian mixture density has been carried out and preliminary results applying to HMM parameter smoothing, speaker adaptation, and speaker clustering are given.
Abstract: An investigation into the use of Bayesian learning of the parameters of a multivariate Gaussian mixture density has been carried out. In a continuous density hidden Markov model (CDHMM) framework, Bayesian learning serves as a unified approach for parameter smoothing, speaker adaptation, speaker clustering, and corrective training. The goal of this study is to enhance model robustness in a CDHMM-based speech recognition system so as to improve performance. Our approach is to use Bayesian learning to incorporate prior knowledge into the CDHMM training process in the form of prior densities of the HMM parameters. The theoretical basis for this procedure is presented and preliminary results applying to HMM parameter smoothing, speaker adaptation, and speaker clustering are given. Performance improvements were observed on tests using the DARPA RM task. For speaker adaptation, under a supervised learning mode with 2 minutes of speaker-specific training data, a 31% reduction in word error rate was obtained compared to speaker-independent results. Using Bayesian learning for HMM parameter smoothing and sex-dependent modeling, a 21% error reduction was observed on the FEB91 test.

Proceedings ArticleDOI
23 Sep 1991
TL;DR: Experimental results indicate that in the 1-D case, the mean field theory approach provides results comparable to those obtained by Baum's algorithm, which is known to be optimal.
Abstract: In many signal processing and pattern recognition applications, the hidden data are modeled as Markov processes, and the main difficulty of using the expectation-maximisation (EM) algorithm for these applications is the calculation of the conditional expectations of the hidden Markov processes. It is shown how the mean field theory from statistical mechanics can be used to calculate the conditional expectations for these problems efficiently. The efficacy of the mean field theory approach is demonstrated on parameter estimation for one-dimensional mixture data and two-dimensional unsupervised stochastic model-based image segmentation. Experimental results indicate that in the 1-D case, the mean field theory approach provides results comparable to those obtained by Baum's (1987) algorithm, which is known to be optimal. In the 2-D case, where Baum's algorithm can no longer be used, the mean field theory provides good parameter estimates and image segmentation for both synthetic and real-world images.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: The authors present a large vocabulary, continuous speech recognition system based on linked predictive neural networks (LPNNs), which achieves 95%, 58%, and 39% word accuracy on tasks with perplexity 7, 111, and 402, respectively, outperforming several simple HMMs that have been tested.
Abstract: The authors present a large vocabulary, continuous speech recognition system based on linked predictive neural networks (LPNNs). The system uses neural networks as predictors of speech frames, yielding distortion measures which can be used by the one-stage DTW algorithm to perform continuous speech recognition. The system currently achieves 95%, 58%, and 39% word accuracy on tasks with perplexity 7, 111, and 402, respectively, outperforming several simple HMMs that have been tested. It was also found that the accuracy and speed of the LPNN can be slightly improved by the judicious use of hidden control inputs. The strengths and weaknesses of the predictive approach are discussed. >

Journal ArticleDOI
TL;DR: This work proposes these augmented HMMs as a theory of adaptive skill acquisition and generation, and gives an example, the what-where-AHMM, which creates a hybrid skill from separate skills based on object location and object identity.
Abstract: Advances in technology and in active vision research allow and encourage sequential visual information acquisition. Hidden Markov models (HMMs) can represent probabilistic sequences and probabilistic graph structures: here we explore their use in controlling the acquisition of visual information. We include a brief tutorial with two examples: (1) use input sequences to derive an aspect graph and (2) similarly derive a finite state machine for control of visual processing.

Journal ArticleDOI
TL;DR: The model assumes that the observed spectral data were generated by a Gaussian source; however, an analysis of the data shows that the spectra for most of the phonemes are not normally distributed and that an alternative representation would be beneficial.
Abstract: The techniques used to develop an acoustic-phonetic hidden Markov model, the problems associated with representing the whole acoustic-phonetic structure, the characteristics of the model, and how it performs as a phonetic decoder for recognition of fluent speech are discussed. The continuous variable duration model was trained using 450 sentences of fluent speech, each of which was spoken by a single speaker, and segmented and labeled using a fixed number of phonemes, each of which has a direct correspondence to the states of the matrix. The inherent variability of each phoneme is modeled as the observable random process of the Markov chain, while the phonotactic model of the unobservable phonetic sequence is represented by the state transition matrix of the hidden Markov model. The model assumes that the observed spectral data were generated by a Gaussian source. However, an analysis of the data shows that the spectra for most of the phonemes are not normally distributed and that an alternative representation would be beneficial.

Proceedings ArticleDOI
14 Apr 1991
TL;DR: A hidden Markov model (HMM)-based approach to mechanical system monitoring is presented and it is shown to be useful for machining applications with the associated problems of tool wear detection and prediction.
Abstract: A hidden Markov model (HMM)-based approach to mechanical system monitoring is presented. The resulting system is shown to be useful for machining applications with the associated problems of tool wear detection and prediction. The approach is based on continuous density, left-right HMMs that closely match the one-way, fresh-to-worn transition process of machining tools. The Baum-Welch iterative training procedure is modified to incorporate prior knowledge of the transitions between tool wear states. Results presented demonstrate that a multisensor HMM-based system is an effective approach for tool wear detection and prediction. >
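The "one-way, fresh-to-worn" structure mentioned above is exactly what a left-right transition matrix encodes: a wear state can only be held or advanced, never revisited. Below is a hedged sketch of such an initialization (the state count and self-loop probability are arbitrary illustrations, not values from the paper); since Baum-Welch reestimation never turns a zero transition probability into a nonzero one, this structure is preserved during training:

```python
import numpy as np

def left_right_transitions(n_states, stay_prob=0.9):
    """Left-right transition matrix for tool-wear states (a sketch).

    Wear only progresses forward, so the matrix is upper bidiagonal:
    each state either stays put or advances to the next wear state.
    """
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = stay_prob
        A[i, i + 1] = 1.0 - stay_prob
    A[-1, -1] = 1.0                     # fully worn is an absorbing state
    return A

print(left_right_transitions(4))
```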

Proceedings ArticleDOI
11 Jun 1991
TL;DR: The authors summarize a speaker adaptation algorithm based on codebook mapping from one speaker to a standard speaker, developed to be useful in various kinds of speech recognition systems such as hidden-Markov-model-based, feature-based, and neural-network-based systems.
Abstract: The authors summarize a speaker adaptation algorithm based on codebook mapping from one speaker to a standard speaker. This algorithm has been developed to be useful in various kinds of speech recognition systems such as hidden-Markov-model-based, feature-based, and neural-network-based systems. The codebook mapping speaker adaptation algorithm has been much improved by introducing several ideas based on fuzzy vector quantization. This fuzzy codebook mapping algorithm is also applicable to voice conversion between arbitrary speakers. >
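Fuzzy vector quantization replaces the hard nearest-codeword assignment with graded memberships. The sketch below uses the standard fuzzy c-means membership formula as an assumption about what "fuzzy vector quantization" means here; it is not taken from the paper, and the fuzziness value is arbitrary:

```python
import numpy as np

def fuzzy_memberships(x, codebook, fuzziness=1.5):
    """Fuzzy VQ memberships of a spectral frame x to every codeword.

    Instead of mapping x to its single nearest codeword (hard VQ), every
    codeword receives a membership in [0, 1]; a mapped spectrum can then
    be formed as a membership-weighted sum of target-speaker codewords.
    """
    d = np.linalg.norm(codebook - x, axis=1) ** 2   # squared distances
    d = np.maximum(d, 1e-12)                        # avoid division by zero
    inv = d ** (-1.0 / (fuzziness - 1.0))
    return inv / inv.sum()

# toy usage: 3 codewords in a 2-D feature space
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
print(fuzzy_memberships(np.array([0.9, 0.8]), codebook))
```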