
Showing papers on "Hidden Markov model published in 1989"


Journal ArticleDOI
Lawrence R. Rabiner
01 Feb 1989
TL;DR: In this paper, the author provides an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and gives practical details on methods of implementing the theory, along with a description of selected applications of HMMs to distinct problems in speech recognition.
Abstract: This tutorial provides an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and gives practical details on methods of implementation of the theory along with a description of selected applications of the theory to distinct problems in speech recognition. Results from a number of original sources are combined to provide a single source of acquiring the background required to pursue this area of research further. The author first reviews the theory of discrete Markov chains and shows how the concept of hidden states, where the observation is a probabilistic function of the state, can be used effectively. The theory is illustrated with two simple examples, namely coin tossing and the classic balls-in-urns system. Three fundamental problems of HMMs are noted and several practical techniques for solving these problems are given. The various types of HMMs that have been studied, including ergodic as well as left-right models, are described.

21,819 citations
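
The three fundamental problems the tutorial identifies are evaluation (scoring an observation sequence against a model), decoding, and training. As a hedged illustration of the evaluation problem, here is a minimal forward-algorithm sketch; the variable names A (transitions), B (emissions), and pi (initial distribution) are the conventional ones, and the numbers are toy values, not from the paper.

```python
import numpy as np

def forward(A, B, pi, obs):
    """Return P(obs | model) via the forward recursion."""
    alpha = pi * B[:, obs[0]]                  # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]          # induction over time
    return alpha.sum()                         # termination

# Toy two-state example (illustrative numbers only)
A = np.array([[0.7, 0.3], [0.4, 0.6]])        # state-transition probabilities
B = np.array([[0.9, 0.1], [0.2, 0.8]])        # emission probabilities
pi = np.array([0.5, 0.5])                      # initial state distribution
print(forward(A, B, pi, [0, 1, 0]))            # likelihood of the sequence
```

In practice the recursion is scaled or run in log space to avoid underflow on long sequences.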


Journal ArticleDOI
TL;DR: In this article, the authors presented a time-delay neural network (TDNN) approach to phoneme recognition, which is characterized by two important properties: (1) using a three-layer arrangement of simple computing units, a hierarchy can be constructed that allows for the formation of arbitrary nonlinear decision surfaces, which the TDNN learns automatically using error backpropagation; and (2) the time-delay arrangement enables the network to discover acoustic-phonetic features and the temporal relationships between them independently of position in time, so that they are not blurred by temporal shifts in the input.
Abstract: The authors present a time-delay neural network (TDNN) approach to phoneme recognition which is characterized by two important properties: (1) using a three-layer arrangement of simple computing units, a hierarchy can be constructed that allows for the formation of arbitrary nonlinear decision surfaces, which the TDNN learns automatically using error backpropagation; and (2) the time-delay arrangement enables the network to discover acoustic-phonetic features and the temporal relationships between them independently of position in time and therefore not blurred by temporal shifts in the input. As a recognition task, the speaker-dependent recognition of the phonemes B, D, and G in varying phonetic contexts was chosen. For comparison, several discrete hidden Markov models (HMM) were trained to perform the same task. Performance evaluation over 1946 testing tokens from three speakers showed that the TDNN achieves a recognition rate of 98.5% correct while the rate obtained by the best of the HMMs was only 93.7%.

2,319 citations
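
A time-delay layer amounts to applying one shared set of weights to a sliding window of frames, which is what makes the learned features insensitive to temporal shifts. A minimal sketch under assumed shapes and a tanh nonlinearity (all details here are illustrative, not taken from the paper):

```python
import numpy as np

def time_delay_layer(x, W, b, delay=2):
    """x: (T, F) input frames; W: (delay+1, F, H) shared weights; b: (H,)."""
    out = np.zeros((x.shape[0] - delay, W.shape[2]))
    for t in range(out.shape[0]):
        window = x[t : t + delay + 1]                         # local context
        out[t] = np.tanh(np.einsum("df,dfh->h", window, W) + b)
    return out

x = np.random.randn(15, 16)           # 15 frames of 16 spectral coefficients
W = 0.1 * np.random.randn(3, 16, 8)   # the same weights reused at every shift
print(time_delay_layer(x, W, np.zeros(8)).shape)   # (13, 8)
```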


Journal ArticleDOI
TL;DR: The authors introduce the co-occurrence smoothing algorithm, which enables accurate recognition even with very limited training data; because the results were evaluated on a standard database, they can be used as benchmarks to evaluate future systems.
Abstract: Hidden Markov modeling is extended to speaker-independent phone recognition. Using multiple codebooks of various linear-predictive-coding (LPC) parameters and discrete hidden Markov models (HMMs), the authors obtain a speaker-independent phone recognition accuracy of 58.8-73.8% on the TIMIT database, depending on the type of acoustic and language models used. In comparison, the performance of expert spectrogram readers is only 69% without use of higher-level knowledge. The authors introduce the co-occurrence smoothing algorithm, which enables accurate recognition even with very limited training data. Since the results were evaluated on a standard database, they can be used as benchmarks to evaluate future systems.

895 citations
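
The idea behind co-occurrence smoothing is that codewords frequently emitted by the same states should lend probability mass to one another, so codewords unseen in sparse training data still receive sensible probabilities. A hedged sketch of one plausible construction; the paper's exact estimator may differ:

```python
import numpy as np

def cooccurrence_smooth(B, lam=0.5):
    """B: (num_states, num_codewords) discrete HMM output distributions."""
    mass = np.maximum(B.sum(axis=0), 1e-12)   # overall mass per codeword
    C = (B.T @ B) / mass[:, None]             # codeword co-occurrence matrix
    C /= C.sum(axis=1, keepdims=True)         # rows normalized to 1
    smoothed = lam * B + (1 - lam) * (B @ C)  # redistribute and interpolate
    return smoothed / smoothed.sum(axis=1, keepdims=True)
```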


Journal ArticleDOI
TL;DR: The DNA sequence is viewed as a stochastic process with local compositional properties determined by the states of a hidden Markov chain, a discrete-state, discrete-outcome version of a general model for non-stationary time series proposed by Kitagawa (1987).

425 citations


Proceedings Article
01 Jan 1989
TL;DR: It is shown that once the output layer of a multilayer perceptron is modified to provide mathematically correct probability distributions, and the usual squared error criterion is replaced with a probability-based score, the result is equivalent to Maximum Mutual Information training.
Abstract: One of the attractions of neural network approaches to pattern recognition is the use of a discrimination-based training method. We show that once we have modified the output layer of a multilayer perceptron to provide mathematically correct probability distributions, and replaced the usual squared error criterion with a probability-based score, the result is equivalent to Maximum Mutual Information training, which has been used successfully to improve the performance of hidden Markov models for speech recognition. If the network is specially constructed to perform the recognition computations of a given kind of stochastic model based classifier then we obtain a method for discrimination-based training of the parameters of the models. Examples include an HMM-based word discriminator, which we call an 'Alphanet'.

422 citations
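
The claimed equivalence can be stated compactly. With softmax outputs and a log-probability score, the network's training criterion matches the Maximum Mutual Information objective used for HMMs; a hedged restatement in my own notation, not the paper's:

```latex
% Softmax outputs and the probability-based score:
y_c(x) = \frac{e^{a_c(x)}}{\sum_{c'} e^{a_{c'}(x)}}, \qquad
J = -\sum_n \log y_{c_n}(x_n).
% Identifying a_c(x) with \log p(x \mid c) + \log P(c) turns J into the
% negated MMI criterion:
J = -\sum_n \log
    \frac{p(x_n \mid c_n)\, P(c_n)}{\sum_{c'} p(x_n \mid c')\, P(c')}.
```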


Proceedings ArticleDOI
23 May 1989
TL;DR: A word-spotting system using Gaussian hidden Markov models is presented and it is observed that performance can be greatly affected by the choice of features used, the covariance structure of the Gaussian models, and transformations based on energy and feature distributions.
Abstract: A word-spotting system using Gaussian hidden Markov models is presented. Several aspects of this problem are investigated. Specifically, results are reported on the use of various signal processing and feature transformation techniques. The authors have observed that performance can be greatly affected by the choice of features used, the covariance structure of the Gaussian models, and transformations based on energy and feature distributions. Due to the open-set nature of the problem, the specific techniques for modeling out-of-vocabulary speech and the choice of scoring metric can have a significant effect on performance.

280 citations


Journal ArticleDOI
TL;DR: The authors introduce a novel approach to modeling variable-duration phonemes, called the stochastic segment model, which allows the incorporation into the fixed-length resampled segment Y of acoustic-phonetic features derived from the original variable-length segment X, in addition to the usual spectral features used in hidden Markov modeling and dynamic time warping approaches to speech recognition.
Abstract: The authors introduce a novel approach to modeling variable-duration phonemes, called the stochastic segment model. A phoneme X is observed as a variable-length sequence of frames, where each frame is represented by a parameter vector and the length of the sequence is random. The stochastic segment model consists of (1) a time warping of the variable-length segment X into a fixed-length segment Y called a resampled segment and (2) a joint density function of the parameters of X which in this study is a Gaussian density. The segment model represents spectral/temporal structure over the entire phoneme. The model also allows the incorporation in Y of acoustic-phonetic features derived from X, in addition to the usual spectral features that have been used in hidden Markov modeling and dynamic time warping approaches to speech recognition. The authors describe the stochastic segment model, the recognition algorithm, and an iterative training algorithm for estimating segment models from continuous speech. They present several results using segment models in two speaker-dependent recognition tasks and compare the performance of the stochastic segment model to the performance of the hidden Markov models.

208 citations
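
A hedged sketch of the two steps the abstract describes: linearly resample a variable-length segment to fixed length, then score the result under a joint Gaussian. The resampling rule and the covariance handling are simplifying assumptions of this sketch:

```python
import numpy as np
from scipy.stats import multivariate_normal

def resample(X, m=5):
    """Warp a (T, F) variable-length segment X to an (m, F) segment Y."""
    T = X.shape[0]
    idx = np.minimum((np.arange(m) * T) // m, T - 1)   # linear time warp
    return X[idx]

def segment_log_likelihood(X, mean, cov, m=5):
    """Score a segment under one joint Gaussian over the m stacked frames."""
    y = resample(X, m).ravel()                # (m * F,) resampled segment
    return multivariate_normal.logpdf(y, mean=mean, cov=cov)
```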


Journal ArticleDOI
TL;DR: An enhanced analysis feature set consisting of both instantaneous and transitional spectral information is used, and the hidden-Markov-model (HMM)-based connected-digit recognizer is tested in speaker-trained, multispeaker, and speaker-independent modes.
Abstract: The authors use an enhanced analysis feature set consisting of both instantaneous and transitional spectral information and test the hidden-Markov-model (HMM)-based connected-digit recognizer in speaker-trained, multispeaker, and speaker-independent modes. For the evaluation, both a 50-talker connected-digit database recorded over local, dialed-up telephone lines, and the Texas Instruments, 225-adult-talker, connected-digits database are used. Using these databases, the performance achieved was 0.35, 1.65, and 1.75% string error rates for known-length strings, for speaker-trained, multispeaker, and speaker-independent modes, respectively, and 0.78, 2.85, and 2.94% string error rates for unknown-length strings of up to seven digits in length for the three modes. Several experiments were carried out to determine the best set of conditions (e.g., training, recognition, parameters, etc.) for recognition of digits. The results and the interpretation of these experiments are described.

205 citations


PatentDOI
TL;DR: A method of inputting Chinese characters into a computer directly from Mandarin speech, which recognizes a series of monosyllables by separately recognizing the syllables and the Mandarin tones with hidden Markov models and assembling the recognized parts into the monosyllable.
Abstract: A method of inputting Chinese characters into a computer directly from Mandarin speech which recognizes a series of monosyllables by separately recognizing syllables and Mandarin tones, and assembling the recognized parts to recognize the monosyllable using hidden Markov models. The recognized monosyllable is used by a Markov Chinese language model in a linguistic decoder section to determine the corresponding Chinese character. A Mandarin dictation machine which uses the above method employs a speech input device to receive the Mandarin speech and digitizes it so a personal computer can further process that information. A pitch frequency detector, a voice-signal preprocessing unit, a hidden Markov model processor, and a training facility are all attached to the personal computer to perform their associated functions of the method above.

196 citations


Journal ArticleDOI
A. Nadas, David Nahamoo, Michael Picheny
TL;DR: A probabilistic mixture model is described for a frame (the short-term spectrum) of speech to be used in speech recognition, and each component is regarded as a prototype for the labeling phase of a hidden Markov model based speech recognition system.
Abstract: A probabilistic mixture model is described for a frame (the short-term spectrum) of speech to be used in speech recognition. Each component of the mixture is regarded as a prototype for the labeling phase of a hidden Markov model based speech recognition system. Since the ambient noise during recognition can differ from that present in the training data, the model is designed for convenient updating in changing noise. Based on the observation that the energy in a frequency band is at any fixed time dominated either by signal energy or by noise energy, the energy is modeled as the larger of the separate energies of signal and noise in the band. Statistical algorithms are given for training this as a hidden-variables model. The hidden variables are the prototype identities and the separate signal and noise components. Speech recognition experiments that successfully utilize this model are described.

194 citations
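
The max assumption has a convenient closed form. If the observed band energy Y is the larger of independent signal and noise energies S and N, then (standard probability, consistent with the abstract, in my notation):

```latex
Y = \max(S, N), \qquad
F_Y(y) = F_S(y)\, F_N(y), \qquad
f_Y(y) = f_S(y)\, F_N(y) + F_S(y)\, f_N(y).
% The noise terms F_N, f_N can thus be re-estimated as the ambient noise
% changes, leaving the signal model untouched.
```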


Journal ArticleDOI
TL;DR: A maximum-a-posteriori approach for enhancing speech signals which have been degraded by statistically independent additive noise is proposed, based on statistical modeling of the clean speech signal and the noise process using long training sequences from the two processes.
Abstract: A maximum-a-posteriori approach for enhancing speech signals which have been degraded by statistically independent additive noise is proposed. The approach is based on statistical modeling of the clean speech signal and the noise process using long training sequences from the two processes. Hidden Markov models (HMMs) with mixtures of Gaussian autoregressive (AR) output probability distributions (PDs) are used to model the clean speech signal. The model for the noise process depends on its nature. The parameter set of the HMM is estimated using the Baum or the EM (expectation-maximization) algorithm. The noisy speech is enhanced by reestimating the clean speech waveform using the EM algorithm. Efficient approximations of the training and enhancement procedures are examined. This results in the segmental k-means approach for hidden Markov modeling, in which the state sequence and the parameter set of the model are alternately estimated. Similarly, the enhancement is done by alternate estimation of the state and observation sequences. An approximate improvement of 4.0-6.0 dB in signal-to-noise ratio (SNR) is achieved at 10-dB input SNR.
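
A hedged sketch of the segmental k-means alternation the abstract describes, reduced to unit-variance Gaussian states for brevity: fix the parameters and pick the best state path by Viterbi, then fix the path and re-estimate the per-state means. All modeling details here are simplifying assumptions.

```python
import numpy as np

def viterbi(log_pi, logA, logB):
    """logB: (T, N) per-frame state log-likelihoods; returns the best path."""
    T, N = logB.shape
    delta = log_pi + logB[0]
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + logA         # (from-state, to-state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

def segmental_kmeans(x, means, log_pi, logA, iters=5):
    """Alternate Viterbi segmentation and per-state mean re-estimation."""
    for _ in range(iters):
        logB = -0.5 * ((x[:, None, :] - means[None]) ** 2).sum(-1)
        path = viterbi(log_pi, logA, logB)
        for s in range(len(means)):
            if np.any(path == s):              # update states that got frames
                means[s] = x[path == s].mean(axis=0)
    return means, path
```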

Proceedings ArticleDOI
23 May 1989
TL;DR: In this paper, an alternative approach to speaker adaptation for a large-vocabulary hidden-Markov-model-based speech recognition system is described, based on the use of a stochastic model representing the different properties of the new speaker and an old speaker for which the full training set of 20 minutes is available.
Abstract: An alternative approach to speaker adaptation for a large-vocabulary hidden-Markov-model-based speech recognition system is described. The goal of this investigation was to train the IBM speech recognition system with only five minutes of speech data from a new speaker instead of the usual 20 minutes without the recognition rate dropping by more than 1-2%. The approach is based on the use of a stochastic model representing the different properties of the new speaker and an old speaker for which the full training set of 20 minutes is available. It is called a speaker Markov model. It is shown how the parameters of such a model can be derived and how it can be used for transforming the training set of the old speaker in order to use it in addition to the short training set of the new speaker. The adaptation algorithm was tested with 12 speakers. The average recognition rate dropped from 96.4% to 95.2% for a 5000-word vocabulary task. The decoding time increased by a factor of 1.35; this factor is often 3-5 if other adaptation algorithms are used.

Journal ArticleDOI
TL;DR: This paper first derives instantaneous and cumulative measures of Markov and Markov reward model behavior, and then compares the complexity of several competing algorithms for the computation of these measures.

Journal ArticleDOI
TL;DR: In this work, the handwritten word recognition problem is modeled in the framework of hidden Markov models (HMMs), and the Viterbi algorithm is used to recognize the sequence of letters constituting the word.

Proceedings ArticleDOI
Philip A. Chou
01 Nov 1989
TL;DR: This work proposes using two-dimensional stochastic context-free grammars for image recognition, in a manner analogous to using hidden Markov models for speech recognition, and demonstrates the value of the approach in a system that recognizes printed, noisy equations.
Abstract: We propose using two-dimensional stochastic context-free grammars for image recognition, in a manner analogous to using hidden Markov models for speech recognition. The value of the approach is demonstrated in a system that recognizes printed, noisy equations. The system uses a two-dimensional probabilistic version of the Cocke-Younger-Kasami parsing algorithm to find the most likely parse of the observed image, and then traverses the corresponding parse tree in accordance with translation formats associated with each production rule, to produce eqn | troff commands for the imaged equation. In addition, it uses two-dimensional versions of the Inside/Outside and Baum re-estimation algorithms for learning the parameters of the grammar from a training set of examples. Parsing the image of a simple noisy equation currently takes about one second of CPU time on an Alliant FX/80.
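
The parser generalizes probabilistic CYK to two dimensions. As a hedged, one-dimensional illustration of the underlying dynamic program, here is max-probability CYK for a grammar in Chomsky normal form; the grammar encoding is an assumption of this sketch:

```python
import numpy as np
from collections import defaultdict

def cyk_best_logprob(tokens, lexical, binary, start="S"):
    """lexical: {(A, word): p} unary rules; binary: {(A, B, C): p} rules
    A -> B C. Returns the max log-probability of deriving tokens from start."""
    n = len(tokens)
    best = defaultdict(lambda: -np.inf)        # (i, j, nonterminal) -> score
    for i, w in enumerate(tokens):
        for (A, word), p in lexical.items():
            if word == w:
                best[i, i + 1, A] = max(best[i, i + 1, A], np.log(p))
    for span in range(2, n + 1):               # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for (A, B, C), p in binary.items():
                    s = np.log(p) + best[i, k, B] + best[k, j, C]
                    best[i, j, A] = max(best[i, j, A], s)
    return best[0, n, start]
```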

Proceedings ArticleDOI
Renals, Rohwer
01 Jan 1989
TL;DR: The application of a radial basis functions network to a static speech pattern classification problem is described and recognition results compare well with those obtained using backpropagation and a vector-quantized hidden Markov model on the same problem.
Abstract: The application of a radial basis functions network to a static speech pattern classification problem is described. The radial basis functions network offers training times two to three orders of magnitude faster than backpropagation, when training networks of similar power and generality. Recognition results compare well with those obtained using backpropagation and a vector-quantized hidden Markov model on the same problem. A computationally efficient method of exactly solving linear networks in a noniterative fashion is also described. The method was applied to classification of vowels into 20 classes using three different types of input analysis and varying numbers of radial basis functions. The three types of input vectors consisted of linear-prediction-coding cepstral coefficients; formant tracks with frequency, amplitude, and bandwidth information; and bark-scaled formant tracks. All input analyses were supplemented with duration information. The best test results were obtained using the cepstral coefficients and 170 or more radial basis functions.
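
The "exactly solving linear networks in a noniterative fashion" step is ordinary least squares: once the basis centers are fixed, only the linear output weights are free. A minimal sketch assuming Gaussian basis functions and given centers (both assumptions of this illustration):

```python
import numpy as np

def rbf_features(X, centers, width):
    """Gaussian basis activations, shape (num_samples, num_centers)."""
    d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def train_output_weights(X, Y, centers, width):
    """Closed-form solve of Phi @ W ~= Y for one-hot class targets Y."""
    Phi = rbf_features(X, centers, width)
    W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return W

# Usage: scores = rbf_features(X_test, centers, width) @ W; argmax per row.
```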

Journal ArticleDOI
TL;DR: A description is given of an implementation of a novel frame-synchronous network search algorithm, inherently based on hidden Markov model (HMM) representations, for recognizing continuous speech as a connected sequence of words according to a specified grammar.
Abstract: A description is given of an implementation of a novel frame-synchronous network search algorithm for recognizing continuous speech as a connected sequence of words according to a specified grammar. The algorithm, which has all the features of earlier methods, is inherently based on hidden Markov model (HMM) representations and is described in an easily understood, easily programmable manner. The new features of the algorithm include the capability of recording and determining (unique) word sequences corresponding to the several best paths to each grammar node, and the capability of efficiently incorporating a range of word and state duration scoring techniques directly into the forward search of the algorithm, thereby eliminating the need for a postprocessor as in previous implementations. It is also simple and straightforward to incorporate deterministic word transition rules and statistical constraints (probabilities) from a language model into the forward search of the algorithm.

Proceedings ArticleDOI
T.H. Applebaum, Brian Hanson
23 May 1989
TL;DR: Results confirm that corrective training can improve on the recognition rate achieved by maximum-likelihood training; however, the algorithm is sensitive to the selection of parameters.
Abstract: Corrective training is a recently proposed method of improving hidden Markov model parameters. Corrective training and related algorithms are applied to the domain of small-vocabulary, speaker-independent recognition. The contribution of each parameter of the algorithm is examined. Results confirm that corrective training can improve on the recognition rate achieved by maximum-likelihood training. However, the algorithm is sensitive to selection of parameters. A heuristic quantity is proposed to monitor the progress of the corrective training algorithm, and this quantity is used to adapt a parameter of corrective training. An alternative training algorithm is discussed and compared to corrective training. It yielded open test recognition rates comparable to those of maximum-likelihood training, but inferior to those of corrective training.

Proceedings ArticleDOI
23 May 1989
TL;DR: The authors have developed a splitting procedure which initializes each new cluster (statistical model) by splitting off all tokens in the training set that were poorly represented by the current set of models; this procedure gives excellent recognition performance in connected-word tasks.
Abstract: The authors describe an HMM (hidden Markov model) clustering procedure and discuss its application to connected-word systems and to large-vocabulary recognition based on phonelike units. It is shown that the conventional approach of maximizing likelihood is easily implemented but does not work well in practice, as it tends to give improved models of tokens for which the initial model was generally quite good, but does not improve tokens which are poorly represented by the initial model. The authors have developed a splitting procedure which initializes each new cluster (statistical model) by splitting off all tokens in the training set which were poorly represented by the current set of models. This procedure is highly efficient and gives excellent recognition performance in connected-word tasks. In particular, for speaker-independent connected-digit recognition, using two HMM-clustered models, the recognition performance is as good as or better than previous results using 4-6 models/digit obtained from template-based clustering.
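
A hedged sketch of the splitting criterion described above: score every training token under the current models and seed a new cluster from the tokens whose best score falls below a threshold. The scoring interface is an assumption of this sketch:

```python
import numpy as np

def split_poorly_modeled(tokens, models, score, threshold):
    """tokens: observation sequences; score(model, token) -> log-likelihood.
    Returns (well-modeled tokens, tokens that seed the new cluster)."""
    best = np.array([max(score(m, tok) for m in models) for tok in tokens])
    kept = [tok for tok, b in zip(tokens, best) if b >= threshold]
    new_cluster = [tok for tok, b in zip(tokens, best) if b < threshold]
    return kept, new_cluster
```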

Proceedings ArticleDOI
23 May 1989
TL;DR: The authors present the results of speaker-verification technology development for use over long-distance telephone lines, comparing a template-based dynamic-time-warping algorithm with a hidden-Markov-modeling algorithm and applying discriminant analysis techniques that improve the discrimination between true speakers and imposters.
Abstract: The authors present the results of speaker-verification technology development for use over long-distance telephone lines. A description is given of two large speech databases that were collected to support the development of new speaker verification algorithms. Also discussed are the results of discriminant analysis techniques which improve the discrimination between true speakers and imposters. A comparison is made of the performance of two speaker-verification algorithms, one using template-based dynamic time warping, and the other, hidden Markov modeling.

Journal ArticleDOI
Hervé Bourlard, C. Wellekens
TL;DR: A phoneme-based real-task application makes use of a particular MLP, based on the NETtalk architecture, and shows how the nonlinear discriminant functions and the consideration of the temporal context dependence of the acoustic vectors are useful for phonetic speech labeling.

Proceedings ArticleDOI
23 May 1989
TL;DR: An automatic speaker adaptation method for speech recognition is proposed in which a small amount of training material of unspecified text can be used; results of recognition experiments indicate that the proposed adaptation method is highly effective.
Abstract: An automatic speaker adaptation method is proposed for speech recognition in which a small amount of training material of unspecified text can be used. This method is easily applicable to vector-quantization-based speech recognition systems where each word is represented as multiple sequences of codebook entries. In the adaptation algorithm, either the codebook is modified for each new speaker or input speech spectra are adapted to the codebook, thereby using codebook sequences universally for all speakers. The important feature of this algorithm is that a set of spectra in training frames and the codebook entries are clustered hierarchically. Based on the deviation vectors between centroids of the training frame clusters and the corresponding codebook clusters, adaptation is performed hierarchically from small to large numbers of clusters. Results of recognition experiments indicate that the proposed adaptation method is highly effective. Possible variations using this method are presented.

Journal ArticleDOI
TL;DR: The MDI approach is shown to be a descent algorithm for the discrimination information measure, and its local convergence is proved.
Abstract: An iterative approach for minimum-discrimination-information (MDI) hidden Markov modeling of information sources is proposed. The approach is developed for sources characterized by a given set of partial covariance matrices and for hidden Markov models (HMMs) with Gaussian autoregressive output probability distributions (PDs). The approach aims at estimating the HMM which yields the MDI with respect to all sources that could have produced the given set of partial covariance matrices. Each iteration of the MDI algorithm generates a new HMM as follows. First, a PD for the source is estimated by minimizing the discrimination information measure with respect to the old model over all PDs which satisfy the given set of partial covariance matrices. Then a new model that decreases the discrimination information measure between the estimated PD of the source and the PD of the old model is developed. The problem of estimating the PD of the source is formulated as a standard constrained minimization problem in the Euclidean space. The estimation of a new model given the PD of the source is done by a procedure that generalizes the Baum algorithm. The MDI approach is shown to be a descent algorithm for the discrimination information measure, and its local convergence is proved.

Book ChapterDOI
19 Jun 1989
TL;DR: The hidden Markov model is used for prediction and analysis of sensor information recorded during robotic performance of tasks by telemanipulation, achieving a structured, knowledge-based model with explicit uncertainties and mature optimal identification algorithms.
Abstract: A new model is developed for prediction and analysis of sensor information recorded during robotic performance of tasks by telemanipulation. The model uses the Hidden Markov Model (Stochastic functions of Markov Nets) to describe the task structure, the operator or intelligent controller's goal structure, and the sensor signals such as forces and torques arising from interaction with the environment. The Markov process portion encodes the task sequence / sub-goal structure, and the observation densities associated with each sub-goal state encode the expected sensor signals associated with carrying out that sub-goal. Methodology is described for construction of the model parameters based on engineering knowledge of the task. The Viterbi algorithm is used for model based analysis of force signals measured during experimental teleoperation and achieves excellent segmentation of the data into sub-goal phases. The Hidden Markov Model achieves a structured, knowledge based model with explicit uncertainties and mature optimal identification algorithms.
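
Encoding a known sub-goal sequence from engineering knowledge of the task naturally yields a left-to-right transition structure. A hedged sketch; the dwell probability is an illustrative value, not from the paper:

```python
import numpy as np

def left_to_right_transitions(n_subgoals, stay=0.9):
    """Each sub-goal state self-loops or advances to the next sub-goal."""
    A = np.eye(n_subgoals) * stay
    for s in range(n_subgoals - 1):
        A[s, s + 1] = 1.0 - stay               # advance to the next sub-goal
    A[-1, -1] = 1.0                            # the final sub-goal absorbs
    return A

print(left_to_right_transitions(4))
```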

Proceedings ArticleDOI
Jerome R. Bellegarda, David Nahamoo
23 May 1989
TL;DR: A class of very general hidden Markov models is considered which can accommodate sequences of information-bearing acoustic feature vectors lying either in a discrete or in a continuous space.
Abstract: The acoustic modeling problem in automatic speech recognition is examined with the specific goal of unifying discrete and continuous parameter approaches. The authors consider a class of very general hidden Markov models which can accommodate sequences of information-bearing acoustic feature vectors lying either in a discrete or in a continuous space. More generally, the new class allows one to represent the prototypes in an assumption-limited, yet convenient, way, as (tied) mixtures of simple multivariate densities. Speech recognition experiments, reported for a large (5000-word) vocabulary office correspondence task, demonstrate some of the benefits associated with this technique.
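
The unifying view can be written down directly: every state's output distribution is a mixture over a single shared (tied) pool of component densities, and only the mixture weights are state-specific. In my notation, not necessarily the paper's:

```latex
b_j(x) = \sum_{k=1}^{K} c_{jk}\, \mathcal{N}(x;\, \mu_k, \Sigma_k),
\qquad \sum_{k} c_{jk} = 1.
% The K densities are shared across all states j; a discrete HMM is the
% limiting case in which each component degenerates to an indicator of a
% codebook entry, so c_{jk} plays the role of a discrete output probability.
```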

Proceedings ArticleDOI
23 May 1989
TL;DR: The Lincoln stress-resistant HMM (hidden Markov model) CSR has been extended to large-vocabulary continuous speech for both speaker-dependent (SD) and speaker-independent (SI) tasks.
Abstract: The Lincoln stress-resistant HMM (hidden Markov model) CSR has been extended to large-vocabulary continuous speech for both speaker-dependent (SD) and speaker-independent (SI) tasks. Performance on the DARPA resource management task (991-word vocabulary, perplexity 60 word-pair grammar) is 3.5% word error rate for SD training of word-context-dependent triphone models and 12.6% word error rate for SI training of (word-context-free) tied-mixture triphone models.

Proceedings ArticleDOI
23 May 1989
TL;DR: A continuous-speech recognition method that uses an accurate and efficient parsing mechanism, an LR parser, and drives HMM (hidden Markov model) modules directly without any intervening structures such as a phoneme lattice is proposed.
Abstract: The authors propose a continuous-speech recognition method that uses an accurate and efficient parsing mechanism, an LR parser, and drives HMM (hidden Markov model) modules directly without any intervening structures such as a phoneme lattice. The method was tested in Japanese phrase recognition experiments. Two grammars were prepared: a general Japanese grammar and a task-specific grammar. The phrase recognition rate with the general grammar was 72% for top candidates and 95% for the five best candidates. With the task-specific grammar, the recognition rates were 80% and 99%, respectively.

Proceedings ArticleDOI
23 May 1989
TL;DR: It is concluded that speech and linguistic knowledge sources can be used to improve the performance of HMM-based speech recognition systems provided that care is taken to incorporate these knowledge sources appropriately.
Abstract: A speaker-independent, continuous-speech, large-vocabulary speech recognition system, DECIPHER, has been developed. It provides state-of-the-art performance on the DARPA standard speaker-independent resource management training and testing materials. The approach is to integrate speech and linguistic knowledge into the HMM (hidden Markov model) framework. Performance improvements arising from detailed phonological modeling and from the incorporation of cross-word coarticulatory constraints are described. It is concluded that speech and linguistic knowledge sources can be used to improve the performance of HMM-based speech recognition systems provided that care is taken to incorporate these knowledge sources appropriately.

Proceedings Article
01 Jan 1989
TL;DR: It is shown here that word recognition performance for a simple discrete density HMM system appears to be somewhat better when MLP methods are used to estimate the emission probabilities.
Abstract: We are developing a phoneme based, speaker-dependent continuous speech recognition system embedding a Multilayer Perceptron (MLP) (i.e., a feedforward Artificial Neural Network), into a Hidden Markov Model (HMM) approach. In [Bourlard & Wellekens], it was shown that MLPs were approximating Maximum a Posteriori (MAP) probabilities and could thus be embedded as an emission probability estimator in HMMs. By using contextual information from a sliding window on the input frames, we have been able to improve frame or phoneme classification performance over the corresponding performance for Simple Maximum Likelihood (ML) or even MAP probabilities that are estimated without the benefit of context. However, recognition of words in continuous speech was not so simply improved by the use of an MLP, and several modifications of the original scheme were necessary for getting acceptable performance. It is shown here that word recognition performance for a simple discrete density HMM system appears to be somewhat better when MLP methods are used to estimate the emission probabilities.
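
One standard way to embed MLP outputs in an HMM (Bayes' rule; not necessarily this paper's exact scheme) is to divide each state posterior by the state prior, which yields a likelihood scaled by the state-independent factor p(x):

```python
import numpy as np

def scaled_log_likelihoods(log_posteriors, log_priors):
    """log_posteriors: (T, N) MLP outputs log P(state | frame);
    log_priors: (N,) log P(state). Returns (T, N) emission scores
    proportional to log p(frame | state), usable in Viterbi decoding."""
    return log_posteriors - log_priors[None, :]
```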

Proceedings ArticleDOI
23 May 1989
TL;DR: An analog of the Baum-Eagon inequality for rational functions makes it possible to use an E-M (expectation-maximization) algorithm for maximizing these functions.
Abstract: The well-known Baum-Eagon (1967) inequality provides an effective iterative scheme for finding a local maximum for homogeneous polynomials with positive coefficients over a domain of probability values. However, in a large class of statistical problems, such as those arising in speech recognition based on hidden Markov models, it was found that estimation of parameters via some other criteria that use conditional likelihood, mutual information, or the recently introduced H-criteria can give better results than maximum-likelihood estimation. These problems require finding maxima for rational functions over domains of probability values, and an analog of the Baum-Eagon inequality for rational functions makes it possible to use an E-M (expectation-maximization) algorithm for maximizing these functions. The authors describe this extension.
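
The extension can be sketched as follows; this is a hedged paraphrase in my own notation, not the authors' exact statement. For a rational criterion R = N/D over probability parameters, a Baum-Eagon style growth transform is applied to an auxiliary polynomial:

```latex
% For R(\theta) = N(\theta)/D(\theta) and current point \theta_0, define
P_{\theta_0}(\theta) = N(\theta) - R(\theta_0)\, D(\theta).
% Then, for a sufficiently large constant C > 0, the update
\theta_{ij} \leftarrow
  \frac{\theta_{ij}\left(\partial P_{\theta_0}/\partial \theta_{ij}
        \big|_{\theta_0} + C\right)}
       {\sum_{j'} \theta_{ij'}\left(\partial P_{\theta_0}/\partial
        \theta_{ij'}\big|_{\theta_0} + C\right)}
% guarantees R(\theta) \ge R(\theta_0), enabling E-M style iteration on
% conditional-likelihood, mutual-information, and related criteria.
```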