
Showing papers on "Hidden Markov model published in 1992"


Proceedings ArticleDOI
15 Jun 1992
TL;DR: The recognition rate is improved by increasing the number of people used to generate the training data, indicating the possibility of establishing a person-independent action recognizer.
Abstract: A human action recognition method based on a hidden Markov model (HMM) is proposed. It is a feature-based bottom-up approach that is characterized by its learning capability and time-scale invariability. To apply HMMs, one set of time-sequential images is transformed into an image feature vector sequence, and the sequence is converted into a symbol sequence by vector quantization. In learning human action categories, the parameters of the HMMs, one per category, are optimized so as to best describe the training sequences from the category. To recognize an observed sequence, the HMM which best matches the sequence is chosen. Experimental results for real time-sequential images of sports scenes show recognition rates higher than 90%. The recognition rate is improved by increasing the number of people used to generate the training data, indicating the possibility of establishing a person-independent action recognizer.
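
As a concrete illustration of the recognition step described above (per-category HMMs scored against a vector-quantized symbol sequence, best-scoring model wins), here is a minimal sketch using the standard scaled forward algorithm. Function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Scaled forward algorithm: log-likelihood of a discrete symbol
    sequence under an HMM (pi: initial probs, A: transitions, B: emissions)."""
    alpha = pi * B[:, obs[0]]
    loglik = 0.0
    for t in range(1, len(obs)):
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha = (alpha / scale) @ A * B[:, obs[t]]
    return loglik + np.log(alpha.sum())

def recognize(symbol_seq, category_hmms):
    """Pick the action category whose HMM best explains the VQ symbol sequence."""
    return max(category_hmms,
               key=lambda c: forward_loglik(symbol_seq, *category_hmms[c]))
```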

1,477 citations


Proceedings ArticleDOI
31 Mar 1992
TL;DR: An implementation of a part-of-speech tagger based on a hidden Markov model that enables robust and accurate tagging with few resource requirements; accuracy exceeds 96%.
Abstract: We present an implementation of a part-of-speech tagger based on a hidden Markov model. The methodology enables robust and accurate tagging with few resource requirements. Only a lexicon and some unlabeled training text are required. Accuracy exceeds 96%. We describe implementation strategies and optimizations which result in high-speed operation. Three applications for tagging are described: phrase recognition; word sense disambiguation; and grammatical function assignment.
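
Tagging with such a model amounts to finding the most likely tag sequence for the observed words, typically via Viterbi decoding. Below is a minimal log-space sketch; the variable names (pi, A, B) are illustrative and not drawn from the paper.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely tag sequence for a sequence of word indices, in log space
    (pi: initial tag probs, A: tag-to-tag transitions, B: tag-to-word emissions)."""
    T, N = len(obs), len(pi)
    logpi, logA, logB = np.log(pi), np.log(A), np.log(B)
    delta = logpi + logB[:, obs[0]]          # best score ending in each tag
    back = np.zeros((T, N), dtype=int)       # backpointers
    for t in range(1, T):
        scores = delta[:, None] + logA       # score of every tag -> tag move
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, obs[t]]
    tags = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # trace back the best path
        tags.append(int(back[t, tags[-1]]))
    return tags[::-1]
```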

737 citations


Journal ArticleDOI
Julian M. Kupiec1
TL;DR: A system for part-of-speech tagging is described, based on a hidden Markov model which can be trained using a corpus of untagged text, which results in a model that correctly tags approximately 96% of the text.

467 citations


Journal ArticleDOI
TL;DR: In this paper, the consistency of a sequence of maximum likelihood estimators is proved and the conclusion of the Shannon-McMillan-Breiman theorem on entropy convergence is established for hidden Markov models.

455 citations


Journal ArticleDOI
Philip Lockwood1, J. Boudy1
01 Jun 1992
TL;DR: The performance of an HMM-based recogniser rises from 56% (no compensation) to 98% after speech enhancement and the lower limit of applicability of the projection (low SNR values) can be loosened after combination with NSS.
Abstract: Achieving reliable performance for a speech recogniser is an important challenge, especially in the context of mobile telephony applications where the user can access telephone functions through voice. The breakthrough of such a technology is appealing, since the driver can concentrate completely and safely on his task while composing and conversing in a “full” hands-free mode. This paper addresses the problem of speaker-dependent discrete utterance recognition in noise. Special reference is made to the mismatch effects due to the fact that training and testing are made in different environments. A novel technique for noise compensation is proposed: nonlinear spectral subtraction (NSS). Robust variance estimates and robust pdf evaluations (projection) are also introduced and combined with NSS into the HMM framework. We show that the lower limit of applicability of the projection (low SNR values) can be loosened after combination with NSS. Experimental results are reported. The performance of an HMM-based recogniser rises from 56% (no compensation) to 98% after speech enhancement. More than 3300 utterances have been used to evaluate the systems (three databases, two European languages). This result is achieved by the use of robust training/recognition schemes and by preprocessing the noisy speech by NSS.
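
For orientation, plain (linear) magnitude-domain spectral subtraction looks like the sketch below; the paper's nonlinear variant (NSS) makes the subtraction factor depend nonlinearly on the local SNR, which this sketch does not attempt to reproduce. Parameter values are illustrative.

```python
import numpy as np

def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, floor=0.02):
    """Linear magnitude-domain spectral subtraction with a spectral floor:
    subtract a noise-spectrum estimate and clamp what would go negative.
    NSS replaces the fixed alpha with an SNR-dependent, nonlinear factor."""
    cleaned = noisy_mag - alpha * noise_mag
    return np.maximum(cleaned, floor * noisy_mag)
```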

383 citations


Journal ArticleDOI
TL;DR: The efficacy of the mean field theory approach is demonstrated on parameter estimation for one-dimensional mixture data and on two-dimensional unsupervised stochastic model-based image segmentation, yielding good parameter estimates and segmentations for both synthetic and real-world images.
Abstract: In many signal processing and pattern recognition applications, the hidden data are modeled as Markov processes, and the main difficulty of using the expectation-maximization (EM) algorithm for these applications is the calculation of the conditional expectations of the hidden Markov processes. It is shown how the mean field theory from statistical mechanics can be used to calculate the conditional expectations for these problems efficiently. The efficacy of the mean field theory approach is demonstrated on parameter estimation for one-dimensional mixture data and two-dimensional unsupervised stochastic model-based image segmentation. Experimental results indicate that in the 1-D case, the mean field theory approach provides results comparable to those obtained by Baum's (1987) algorithm, which is known to be optimal. In the 2-D case, where Baum's algorithm can no longer be used, the mean field theory provides good parameter estimates and image segmentation for both synthetic and real-world images.
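
The exact E-step that the mean field approximation replaces is easiest to see in the independent-label case, i.e. ordinary EM for a mixture; a minimal 1-D Gaussian-mixture sketch follows (illustrative, not the authors' code). Once the hidden labels are Markov-dependent, the responsibilities below are no longer tractable in closed form, which is where mean field theory enters.

```python
import numpy as np

def em_gmm_1d(x, K, iters=50, seed=0):
    """Plain EM for a 1-D Gaussian mixture with independent hidden labels."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, K, replace=False)
    var = np.full(K, x.var())
    w = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: reestimate weights, means and variances
        nk = r.sum(axis=0)
        w, mu = nk / len(x), (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var
```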

322 citations


Proceedings Article
30 Nov 1992
TL;DR: The algorithm is compared with the Baum-Welch method of estimating fixed-size models, and it is found that it can induce minimal HMMs from data in cases where fixed estimation does not converge or requires redundant parameters to converge.
Abstract: This paper describes a technique for learning both the number of states and the topology of Hidden Markov Models from examples. The induction process starts with the most specific model consistent with the training data and generalizes by successively merging states. Both the choice of states to merge and the stopping criterion are guided by the Bayesian posterior probability. We compare our algorithm with the Baum-Welch method of estimating fixed-size models, and find that it can induce minimal HMMs from data in cases where fixed estimation does not converge or requires redundant parameters to converge.
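
A single state-merging step of the kind described above can be sketched as follows: pool the emission distributions of the two states (weighted by occupancy counts), redirect incoming transitions, and combine outgoing transitions. This is purely illustrative and does not reproduce the paper's Bayesian scoring of candidate merges.

```python
import numpy as np

def merge_states(pi, A, B, counts, i, j):
    """Merge state j into state i of a discrete HMM (pi: initial probs,
    A: transitions, B: emission rows, counts: state occupancy counts)."""
    pi, A, B = pi.copy(), A.copy(), B.copy()
    wi, wj = counts[i], counts[j]
    A[:, i] += A[:, j]                            # incoming mass goes to the merged state
    A[i] = (wi * A[i] + wj * A[j]) / (wi + wj)    # pooled outgoing distribution
    B[i] = (wi * B[i] + wj * B[j]) / (wi + wj)    # pooled emission distribution
    pi[i] += pi[j]
    keep = [k for k in range(len(pi)) if k != j]
    return pi[keep], A[np.ix_(keep, keep)], B[keep]
```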

306 citations


PatentDOI
TL;DR: In this article, a linguistic model of words uses a state transition diagram (10) connecting states (12, 14, 16, 18, 20) with arcs (22, 24, 26, 28, 30, 32) that represent phoneme transitions within a word.
Abstract: A linguistic model of words uses a state transition diagram (10). The diagram connects states (12, 14, 16, 18, 20) with arcs (22, 24, 26, 28, 30, 32) that represent phoneme transitions within a word. Different pronunciations of the same word may be modelled using alternate transitions (26 or 28 and 30 or 32) from single states (16 and 18, respectively). This effectively models alternate pronunciations using similar phonemes such as substituting a "D" (28) phoneme for the "T" (26) phoneme in the word "WATER" (10). Such models could be extended for longer words or linked together for multiple words.
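
A toy version of the pronunciation graph described in this patent abstract, with alternate arcs leaving a single state for the "T"/"D" variation in "WATER"; the data structure and phoneme labels are illustrative.

```python
# Toy pronunciation graph: states are positions in the word, arcs carry
# phonemes, and alternate arcs from one state model alternate pronunciations.
water = {
    0: [("W", 1)],
    1: [("AO", 2)],
    2: [("T", 3), ("D", 3)],   # alternate transitions from a single state
    3: [("ER", 4)],
    4: [],                     # final state
}

def pronunciations(graph, state=0, prefix=()):
    """Enumerate every phoneme string the graph accepts."""
    if not graph[state]:
        yield prefix
    for phone, nxt in graph[state]:
        yield from pronunciations(graph, nxt, prefix + (phone,))

print(list(pronunciations(water)))  # [('W', 'AO', 'T', 'ER'), ('W', 'AO', 'D', 'ER')]
```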

248 citations


Journal ArticleDOI
TL;DR: In the approach described, the ANN outputs constitute the sequence of observation vectors for the HMM, and an algorithm is proposed for global optimization of all the parameters.
Abstract: The integration of multilayered and recurrent artificial neural networks (ANNs) with hidden Markov models (HMMs) is addressed. ANNs are suitable for approximating functions that compute new acoustic parameters, whereas HMMs have been proven successful at modeling the temporal structure of the speech signal. In the approach described, the ANN outputs constitute the sequence of observation vectors for the HMM. An algorithm is proposed for global optimization of all the parameters. Results on speaker-independent recognition experiments using this integrated ANN-HMM system on the TIMIT continuous speech database are reported.

234 citations


Journal ArticleDOI
TL;DR: This paper concerns the use and implementation of maximum-penalized-likelihood procedures for choosing the number of mixing components and estimating the parameters in independent and Markov-dependent mixture models.
Abstract: This paper concerns the use and implementation of maximum-penalized-likelihood procedures for choosing the number of mixing components and estimating the parameters in independent and Markov-dependent mixture models. Computation of the estimates is achieved via algorithms for the automatic generation of starting values for the EM algorithm. Computation of the information matrix is also discussed. Poisson mixture models are applied to a sequence of counts of movements by a fetal lamb in utero obtained by ultrasound. The resulting estimates are seen to provide plausible mechanisms for the physiological process. The analysis of count data that are overdispersed relative to the Poisson distribution (i.e., variance > mean) has received considerable recent attention. Such data might arise in a clinical study in which overdispersion is caused by unexplained or random subject effects. Alternatively, we might observe a time series of counts in which temporal patterns in the data suggest that a Poisson model and its implied randomness are inappropriate. This paper is motivated by analysis of a time series of overdispersed count data generated in a study of central nervous system development in fetal lambs. Our data set consists of observed movement counts in 240 consecutive 5-second intervals obtained from a single animal. In analysing these data, we focus on the use of Poisson mixture models assuming independent observations and also Markov-dependent mixture models (or hidden Markov models). These models assume that the counts follow independent Poisson distributions conditional on the rates, which are generated from a mixing distribution either independently or with Markov dependence. We believe finite mixture models are particularly attractive because they provide plausible explanations for variation in the data. This paper will emphasize the following issues concerning estimation, inference, and application of mixture models: (i) choosing the number of model components; (ii) applying the EM algorithm to obtain parameter estimates; (iii) generating sufficiently many starting values to identify a global maximum of the likelihood; (iv) avoiding numerical instability.
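
Issues (ii) and (iii) above, fitting the mixture by EM and restarting from many initial values to chase a global maximum, can be sketched for an independent Poisson mixture as follows. This is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def poisson_mixture_em(counts, K=2, starts=20, iters=200, seed=0):
    """EM for an independent K-component Poisson mixture, restarted from
    several random initial values and keeping the best solution found."""
    rng = np.random.default_rng(seed)
    x = np.asarray(counts, dtype=float)
    best_ll, best = -np.inf, None
    for _ in range(starts):
        lam = (x.mean() + 0.1) * rng.uniform(0.3, 1.7, K)   # random start
        w = np.full(K, 1.0 / K)
        for _ in range(iters):
            # E-step: responsibilities (the common x! term cancels and is omitted)
            logp = np.log(w) + x[:, None] * np.log(lam) - lam
            logp -= logp.max(axis=1, keepdims=True)
            r = np.exp(logp)
            r /= r.sum(axis=1, keepdims=True)
            # M-step: mixing weights and Poisson rates
            nk = r.sum(axis=0)
            w, lam = nk / len(x), (r * x[:, None]).sum(axis=0) / nk
        # log-likelihood up to the constant x! term, comparable across starts
        ll = np.sum(np.log((w * np.exp(x[:, None] * np.log(lam) - lam)).sum(axis=1)))
        if ll > best_ll:
            best_ll, best = ll, (w, lam)
    return best
```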

223 citations


Proceedings ArticleDOI
23 Mar 1992
TL;DR: The distortion-intersection measure (DIM), which was introduced as a VQ-distortion measure to increase the robustness against utterance variations, is effective and the speaker identification rates using a continuous ergodic HMM are strongly correlated with the total number of mixtures irrespective of the number of states.
Abstract: A VQ (vector quantization)-distortion-based speaker recognition method and discrete/continuous ergodic HMM (hidden Markov model)-based ones are compared, especially from the viewpoint of robustness against utterance variations. It is shown that a continuous ergodic HMM is far superior to a discrete ergodic HMM. It is also shown that the information on transitions between different states is ineffective for text-independent speaker recognition. Therefore, the speaker identification rates using a continuous ergodic HMM are strongly correlated with the total number of mixtures irrespective of the number of states. It is also found that, for continuous ergodic HMM-based speaker recognition, the distortion-intersection measure (DIM), which was introduced as a VQ-distortion measure to increase the robustness against utterance variations, is effective.
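
The VQ-distortion side of the comparison reduces to scoring an utterance by its average distortion against each speaker's codebook and picking the smallest. A minimal sketch follows; names and the plain squared-error distortion are illustrative, and this does not implement the DIM variant itself.

```python
import numpy as np

def avg_vq_distortion(frames, codebook):
    """Average squared-error VQ distortion of an utterance (frames: T x D)
    against one speaker's codebook (C x D); the claimed speaker is the one
    with the smallest value."""
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()
```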

PatentDOI
Jie Yi1
TL;DR: Hidden Markov models of a target vocabulary are created by concatenating triphone, diphone, and phoneme models trained on a small vocabulary, using triphone models when available, diphone models when triphone models are not available, and phoneme models when neither triphone nor diphone models are available.
Abstract: A speech recognition system starts by training hidden Markov models for all triphones, diphones, and phonemes occurring in a small training vocabulary. Hidden Markov models of a target vocabulary are created by concatenating the triphone, diphone, and phoneme models, using triphone models if available, diphone HMMs when triphone models are not available, and phoneme models when neither triphone nor diphone models are available. Utterances from the target vocabulary are recognized by choosing a model with maximum probability of reproducing quantized utterance features.

Journal ArticleDOI
Yariv Ephraim1
TL;DR: A Bayesian estimation approach for enhancing speech signals which have been degraded by statistically independent additive noise is motivated and developed, and minimum mean square error (MMSE) and maximum a posteriori (MAP) signal estimators are developed using hidden Markov models for the clean signal and the noise process.
Abstract: A Bayesian estimation approach for enhancing speech signals which have been degraded by statistically independent additive noise is motivated and developed. In particular, minimum mean square error (MMSE) and maximum a posteriori (MAP) signal estimators are developed using hidden Markov models (HMMs) for the clean signal and the noise process. It is shown that the MMSE estimator comprises a weighted sum of conditional mean estimators for the composite states of the noisy signal, where the weights equal the posterior probabilities of the composite states given the noisy signal. The estimation of several spectral functionals of the clean signal such as the sample spectrum and the complex exponential of the phase is also considered. A gain-adapted MAP estimator is developed using the expectation-maximization algorithm. The theoretical performance of the MMSE estimator is discussed, and convergence of the MAP estimator is proved. Both the MMSE and MAP estimators are tested in enhancing speech signals degraded by white Gaussian noise at input signal-to-noise ratios from 5 to 20 dB.
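
The structure of the MMSE estimator stated above, a posterior-weighted sum of per-composite-state conditional mean estimators, is a one-liner once the posteriors and conditional means are available; the shapes and names below are illustrative.

```python
import numpy as np

def mmse_estimate(state_posteriors, conditional_means):
    """HMM-based MMSE enhancement of one frame: the weighted sum of the
    conditional mean estimators for the composite (clean state, noise state)
    pairs, weighted by their posteriors given the noisy signal.
    state_posteriors: (S,), conditional_means: (S, D) -> returns (D,)."""
    return state_posteriors @ conditional_means
```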

Proceedings ArticleDOI
23 Mar 1992
TL;DR: A novel training algorithm, segmental GPD (generalized probabilistic descent) training, for a hidden Markov model (HMM)-based speech recognizer using Viterbi decoding is proposed, based on the principle of minimum recognition error rate in which segmentation and discriminative training are jointly optimized.
Abstract: A novel training algorithm, segmental GPD (generalized probabilistic descent) training, for a hidden Markov model (HMM)-based speech recognizer using Viterbi decoding is proposed. This algorithm is based on the principle of minimum recognition error rate in which segmentation and discriminative training are jointly optimized. Various issues related to the special structure of HMM in segmental GPD training are studied. The authors tested this algorithm on two speaker-independent recognition tasks. The first experiment involves English E-set. Segmental GPD training was directly applied to HMM generated from nonoptimal uniform segmentation. A recognition rate of 88.7% was achieved on English E-set with whole word HMM. The second experiment involves the connected digits TI-database. Segmental GPD training was applied to HMM which were already trained using conventional training methods. A string recognition rate of 98.8% was achieved on 10-state word based HMM through segmental GPD training.
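
The heart of GPD-style minimum-error training is a smoothed classification-error loss: a misclassification measure comparing the correct-class score against a soft maximum over competitors, passed through a sigmoid so it can be minimized by gradient descent. A sketch of that loss (constants and names illustrative, not the paper's exact formulation):

```python
import numpy as np

def mce_loss(class_scores, correct, eta=1.0, gamma=2.0):
    """Smoothed minimum-classification-error loss: the correct-class score is
    compared against a soft maximum over competitors and the difference is
    squashed by a sigmoid, giving a differentiable stand-in for the error count."""
    others = np.delete(class_scores, correct)
    anti = np.log(np.mean(np.exp(eta * others))) / eta   # soft max over competitors
    d = anti - class_scores[correct]                     # misclassification measure
    return 1.0 / (1.0 + np.exp(-gamma * d))              # smoothed 0/1 loss
```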

Proceedings ArticleDOI
23 Mar 1992
TL;DR: The author addresses the problem of automatic speech recognition in the presence of interfering noise by decomposing the contaminated speech signal using a generalization of standard hidden Markov modeling, while utilizing a compact and effective parametrization of the speech signal.
Abstract: The author addresses the problem of automatic speech recognition in the presence of interfering noise. The novel approach described decomposes the contaminated speech signal using a generalization of standard hidden Markov modeling, while utilizing a compact and effective parametrization of the speech signal. The technique is compared to some existing noise compensation techniques, using data recorded in noise, and is found to have improved performance compared to existing model decomposition techniques. Performance is comparable to existing noise subtraction techniques, but the technique is applicable to a wider range of noise environments and is not dependent on an accurate endpointing of the speech.

Proceedings ArticleDOI
23 Aug 1992
TL;DR: An algorithm for computing the probability of a sentence generated by a SLTAG and an inside-outside-like iterative algorithm for estimating the parameters of a SLTAG are presented, and preliminary experiments showing some of the advantages of SLTAG over stochastic context-free grammars are reported.
Abstract: The notion of stochastic lexicalized tree-adjoining grammar (SLTAG) is formally defined. The parameters of a SLTAG correspond to the probability of combining two structures, each one associated with a word. The characteristics of SLTAG are unique and novel since it is lexically sensitive (as N-gram models or hidden Markov models) and yet hierarchical (as stochastic context-free grammars). Then, two basic algorithms for SLTAG are introduced: an algorithm for computing the probability of a sentence generated by a SLTAG and an inside-outside-like iterative algorithm for estimating the parameters of a SLTAG given a training corpus. Finally, we show how SLTAG can be used to define a lexicalized version of stochastic context-free grammars, and we report preliminary experiments showing some of the advantages of SLTAG over stochastic context-free grammars.

Proceedings ArticleDOI
23 Mar 1992
TL;DR: The authors extend the dynamic time warping algorithm, widely used in automatic speech recognition (ASR), to a dynamic plane warping (DPW) algorithm, for application in the field of optical character recognition (OCR) or similar applications.
Abstract: The authors extend the dynamic time warping (DTW) algorithm, widely used in automatic speech recognition (ASR), to a dynamic plane warping (DPW) algorithm, for application in the field of optical character recognition (OCR) or similar applications. Although direct application of the optimality principle reduced the computational complexity somewhat, the DPW (or image alignment) problem is exponential in the dimensions of the image. It is shown that by applying constraints to the image alignment problem, e.g., limiting the class of possible distortions, one can reduce the computational complexity dramatically, and find the optimal solution to the constrained problem in linear time. A statistical model, the planar hidden Markov model (PHMM), describing statistical properties of images is proposed. The PHMM approach was evaluated using a set of isolated handwritten digits. An overall digit recognition accuracy of 95% was achieved. It is expected that the advantage of this approach will be even more significant for harder tasks, such as cursive-writing recognition and spotting.
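
For reference, the 1-D dynamic time warping that the paper generalizes to dynamic plane warping is the classic quadratic-time recursion below (an illustrative sketch).

```python
import numpy as np

def dtw_cost(a, b):
    """Classic 1-D dynamic time warping cost between two feature sequences,
    filled by the usual dynamic-programming recursion."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```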

Yves Normandin1
01 Jan 1992
TL;DR: This work argues that the maximum mutual information estimation (MMIE) formulation for training is more appropriate vis-a-vis maximum likelihood estimation (MLE) for reducing the error rate, and proposes reestimation formulas for the case of diagonal Gaussian densities, experimentally demonstrates their convergence properties, and integrates them into the training algorithm.
Abstract: Hidden Markov Models (HMMs) are one of the most powerful speech recognition tools available today. Even so, the inadequacies of HMMs as a "correct" modeling framework for speech are well known. In that context, we argue that the maximum mutual information estimation (MMIE) formulation for training is more appropriate vis-a-vis maximum likelihood estimation (MLE) for reducing the error rate. We also show how MMIE paves the way for new training possibilities. We introduce Corrective MMIE training, a very efficient new training algorithm which uses a modified version of a discrete reestimation formula recently proposed by Gopalakrishnan et al. We propose reestimation formulas for the case of diagonal Gaussian densities, experimentally demonstrate their convergence properties, and integrate them into our training algorithm. In a connected digit recognition task, MMIE consistently improves the recognition performance of our recognizer.
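
The MMIE criterion contrasted with MLE above is, per utterance, the log-posterior of the correct word: the joint log-score of the correct word minus the log-sum over all competing words (MLE keeps only the first term). A minimal sketch with illustrative names:

```python
import numpy as np

def mmie_objective(logliks, log_priors, correct):
    """Per-utterance MMIE criterion: joint log-score of the correct word minus
    the log-sum over all competing words; MLE would keep only the first term."""
    joint = logliks + log_priors                 # log P(O|w) + log P(w) for each w
    return joint[correct] - np.logaddexp.reduce(joint)
```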

Proceedings ArticleDOI
23 Mar 1992
TL;DR: The authors propose an algorithm, successive state splitting (SSS), for simultaneously finding an optimal set of phoneme context classes, an optimal topology, and optimal parameters for hidden Markov models (HMMs) commonly using a maximum likelihood criterion.
Abstract: The authors propose an algorithm, successive state splitting (SSS), for simultaneously finding an optimal set of phoneme context classes, an optimal topology, and optimal parameters for hidden Markov models (HMMs) commonly using a maximum likelihood criterion. With this algorithm, a hidden Markov network (HM-Net), which is an efficient representation of phoneme-context-dependent HMMs, can be generated automatically. The authors implemented this algorithm, and tested it on the recognition of six Japanese consonants (/b/, /d/, /g/, /m/, /n/ and /N/). The HM-Net gave better recognition results with a lower number of total output probability density distributions than conventional phoneme-context-independent mixture Gaussian density HMMs.

Patent
Klaus Zuenkler1
04 Sep 1992
TL;DR: In this article, a method for recognizing patterns in time-variant measurement signals is specified which permits an improved discrimination between such signals by reclassifying in pairs, the discrimination-relevant features being examined separately in a second step after the main classification.
Abstract: In automatic speech recognition, confusion easily arises between phonetically similar words (for example, the German words "zwei" and "drei") in the case of previous recognition systems. Confusion of words which differ only in a single phoneme (for example, the German words "dem" and "den") occurs particularly easily with these recognition systems. In order to solve this problem, a method for recognizing patterns in time-variant measurement signals is specified which permits an improved discrimination between such signals by reclassifying in pairs. This method combines the Viterbi decoding algorithm with the method of hidden Markov models, the discrimination-relevant features being examined separately in a second step after the main classification. In this case, different components of feature vectors are weighted differently, and, by contrast with known approaches, these weightings are performed in a theoretically based way. The method is suitable, inter alia, for improving speech-recognizing systems.

Proceedings ArticleDOI
23 Mar 1992
TL;DR: A method of manipulating sets of hidden Markov models (HMMs) by applying various kinds of parameter tying operations is described, the aim being to synthesize compact and robust context dependent models.
Abstract: A method of manipulating sets of hidden Markov models (HMMs) by applying various kinds of parameter tying operations is described, the aim being to synthesize compact and robust context dependent models. The method is illustrated via an experiment to build a set of generalized triphone models for the TIMIT database in which triphones are constructed by joining together left and right dependent biphones. Although simple, the method results in good performance and avoids the need to train large numbers of triphones. The use of tying to increase model robustness is also investigated. Tying the center states within triphones of the same phoneme class and tying variances within states is beneficial, but larger-scale tying of variances leads to degraded performance.
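
One of the tying operations described, tying the center states of all triphones of the same base phoneme, can be sketched as below; the model representation and the "l-c+r" triphone naming are assumptions made for illustration, not the paper's notation.

```python
def tie_centre_states(triphone_models):
    """Tie the centre state of every triphone of the same base phoneme to one
    shared state object. Models are assumed to be dicts of name -> list of
    state objects, with names in an 'l-c+r' style (both are assumptions)."""
    shared = {}
    for name, states in triphone_models.items():
        base = name.split("-")[1].split("+")[0]   # 'k-ae+t' -> 'ae'
        centre = len(states) // 2
        shared.setdefault(base, states[centre])
        states[centre] = shared[base]             # all 'ae' triphones now share it
    return triphone_models
```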

Journal ArticleDOI
TL;DR: Analysis of the structure of some small complete genomes and a human genome segment using a hidden Markov chain model identifies a variety of discrete compositional domains, and their correlations with genome function are explored.

Proceedings ArticleDOI
23 Feb 1992
TL;DR: This paper reports recent efforts to apply the speaker-independent SPHINX-II system to the DARPA Wall Street Journal continuous speech recognition task, which includes sex-dependent, semi-continuous, shared-distribution hidden Markov models and left context dependent between-word triphones.
Abstract: This paper reports recent efforts to apply the speaker-independent SPHINX-II system to the DARPA Wall Street Journal continuous speech recognition task. In SPHINX-II, we incorporated additional dynamic and speaker-normalized features, replaced discrete models with sex-dependent semi-continuous hidden Markov models, augmented within-word triphones with between-word triphones, and extended generalized triphone models to shared-distribution models. The configuration of SPHINX-II being used for this task includes sex-dependent, semi-continuous, shared-distribution hidden Markov models and left context dependent between-word triphones. In applying our technology to this task we addressed issues that were not previously of concern owing to the (relatively) small size of the Resource Management task.

Journal ArticleDOI
Li Deng1
TL;DR: The trended HMM as discussed by the authors is a more faithful and structured representation of many classes of speech sounds whose production involves strong articulatory dynamics, and is expected to be a more suitable model for use in speech processing applications.

Journal ArticleDOI
01 Jun 1992
TL;DR: The approach is to use Bayesian learning to incorporate prior knowledge into the training process in the form of prior densities of the HMM parameters to enhance model robustness in a CDHMM-based speech recognition system.
Abstract: An investigation into the use of Bayesian learning of the parameters of a multivariate Gaussian mixture density has been carried out. In a framework of continuous density hidden Markov model (CDHMM), Bayesian learning serves as a unified approach for parameter smoothing, speaker adaptation, speaker clustering and corrective training. The goal is to enhance model robustness in a CDHMM-based speech recognition system so as to improve performance. Our approach is to use Bayesian learning to incorporate prior knowledge into the training process in the form of prior densities of the HMM parameters. The theoretical basis for this procedure is presented and results applying it to parameter smoothing, speaker adaptation, speaker clustering and corrective training are given.

Proceedings ArticleDOI
23 Mar 1992
TL;DR: The state in phonetic hidden Markov models is treated as the basic subphonetic unit, the senone, and preliminary senone modeling results are reported that significantly reduce the word error rate for speaker-independent continuous speech recognition.
Abstract: There will never be sufficient training data to model all the various acoustic-phonetic phenomena. How to capture important clues and estimate those needed parameters reliably is one of the central issues in speech recognition. Successful examples include subword models, fenones and many other smoothing techniques. In comparison with subword models, subphonetic modeling may provide a finer level of details. The authors propose to model subphonetic events with Markov states and treat the state in phonetic hidden Markov models as the basic subphonetic unit, the senone. Senones generalize fenones in several ways. A word model is a concatenation of senones and senones can be shared across different word models. Senone models not only allow parameter sharing, but also enable pronunciation optimization. The authors report preliminary senone modeling results, which have significantly reduced the word error rate for speaker-independent continuous speech recognition.

Journal ArticleDOI
TL;DR: Speaker-dependent phoneme recognition experiments were conducted using variants of the semicontinuous hidden Markov model (SCHMM) with explicit state duration modeling, and results clearly demonstrated that the SCHMM with state duration offers significantly improved phoneme classification accuracy.
Abstract: Speaker-dependent phoneme recognition experiments were conducted using variants of the semicontinuous hidden Markov model (SCHMM) with explicit state duration modeling. Results clearly demonstrated that the SCHMM with state duration offers significantly improved phoneme classification accuracy compared to both the discrete HMM and the continuous HMM; the error rate was reduced by more than 30% and 20%, respectively. The use of a limited number of mixture densities significantly reduced the amount of computation. Explicit state duration modeling further reduced the error rate.

Proceedings ArticleDOI
P. Ramesh1, Jay G. Wilpon1
23 Mar 1992
TL;DR: The authors present a way of modeling state durations in HMM using time-dependent state transitions and a suboptimal implementation of this scheme that requires no more computation than the traditional HMM but reduces recognition error rates by 14-25%.
Abstract: Hidden Markov modeling (HMM) techniques have been used successfully for connected speech recognition in the last several years. In the traditional HMM algorithms, the probability of duration of a state decreases exponentially with time, which is not appropriate for representing the temporal structure of speech. Non-parametric modeling of duration using semi-Markov chains does accomplish the task with a large increase in the computational complexity. Applying a postprocessing state duration penalty after Viterbi decoding adds very little computation but does not affect the forward recognition path. The authors present a way of modeling state durations in HMM using time-dependent state transitions. This inhomogeneous HMM (IHMM) does increase the computation by a small amount but reduces recognition error rates by 14-25%. Also, a suboptimal implementation of this scheme that requires no more computation than the traditional HMM is presented which also has reduced errors by 14-22% on a variety of databases.
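
The post-processing duration penalty mentioned above (applied after Viterbi decoding, so it adds almost no computation but cannot influence the forward search) can be sketched as follows; names are illustrative.

```python
def duration_penalised_score(path_loglik, state_durations, duration_logpmf):
    """Add a per-state log duration probability to a decoded path's score after
    Viterbi decoding. state_durations is a list of (state, duration) pairs along
    the path; duration_logpmf maps a state to a callable returning log P(duration)."""
    penalty = sum(duration_logpmf[s](d) for s, d in state_durations)
    return path_loglik + penalty
```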

Proceedings ArticleDOI
23 Feb 1992
TL;DR: Because of its adaptive nature, Bayesian learning serves as a unified approach for the following four speech recognition applications, namely parameter smoothing, speaker adaptation, speaker group modeling and corrective training.
Abstract: We discuss maximum a posteriori estimation of continuous density hidden Markov models (CDHMM). The classical MLE reestimation algorithms, namely the forward-backward algorithm and the segmental k-means algorithm, are expanded and reestimation formulas are given for HMM with Gaussian mixture observation densities. Because of its adaptive nature, Bayesian learning serves as a unified approach for the following four speech recognition applications, namely parameter smoothing, speaker adaptation, speaker group modeling and corrective training. New experimental results on all four applications are provided to show the effectiveness of the MAP estimation approach.
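
The flavour of MAP estimation being described is easiest to see for a single Gaussian mean with a conjugate Gaussian prior, where the estimate interpolates between the prior mean and the sample mean and the prior acts as tau pseudo-observations. A sketch (illustrative, not the paper's reestimation formulas):

```python
import numpy as np

def map_gaussian_mean(data, prior_mean, tau):
    """MAP estimate of a Gaussian mean under a conjugate Gaussian prior: an
    interpolation between the prior mean and the sample mean in which the
    prior counts as tau pseudo-observations (more data -> less smoothing)."""
    x = np.asarray(data, dtype=float)
    n = len(x)
    return (tau * prior_mean + n * x.mean()) / (tau + n)
```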

Proceedings ArticleDOI
23 Mar 1992
TL;DR: It is shown how, without any simplifying assumptions, one can estimate likelihoods for context-dependent phonetic models with nets that are not substantially larger than context-independent MLPs.
Abstract: A series of theoretical and experimental results have suggested that multilayer perceptrons (MLPs) are an effective family of algorithms for the smooth estimate of highly dimensioned probability density functions that are useful in continuous speech recognition. All of these systems have exclusively used context-independent phonetic models, in the sense that the probabilities or costs are estimated for simple speech units such as phonemes or words, rather than biphones or triphones. Numerous conventional systems based on hidden Markov models (HMMs) have been reported that use triphone or triphone like context-dependent models. In one case the outputs of many context-dependent MLPs (one per context class) were used to help choose the best sentence from the N best sentences as determined by a context-dependent HMM system. It is shown how, without any simplifying assumptions, one can estimate likelihoods for context-dependent phonetic models with nets that are not substantially larger than context-independent MLPs. >