
Showing papers on "Hidden Markov model published in 1992"


Proceedings ArticleDOI
15 Jun 1992
TL;DR: The recognition rate is improved by increasing the number of people used to generate the training data, indicating the possibility of establishing a person-independent action recognizer.
Abstract: A human action recognition method based on a hidden Markov model (HMM) is proposed. It is a feature-based bottom-up approach that is characterized by its learning capability and time-scale invariability. To apply HMMs, one set of time-sequential images is transformed into an image feature vector sequence, and the sequence is converted into a symbol sequence by vector quantization. In learning human action categories, the parameters of the HMMs, one per category, are optimized so as to best describe the training sequences from the category. To recognize an observed sequence, the HMM which best matches the sequence is chosen. Experimental results for real time-sequential images of sports scenes show recognition rates higher than 90%. The recognition rate is improved by increasing the number of people used to generate the training data, indicating the possibility of establishing a person-independent action recognizer.
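
As a concrete illustration of the recognition step described above (per-category HMMs scored against a vector-quantized symbol sequence, best-scoring model wins), here is a minimal sketch using the standard scaled forward algorithm. Function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Scaled forward algorithm: log-likelihood of a discrete symbol
    sequence under an HMM (pi: initial probs, A: transitions, B: emissions)."""
    alpha = pi * B[:, obs[0]]
    loglik = 0.0
    for t in range(1, len(obs)):
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha = (alpha / scale) @ A * B[:, obs[t]]
    return loglik + np.log(alpha.sum())

def recognize(symbol_seq, category_hmms):
    """Pick the action category whose HMM best explains the VQ symbol sequence."""
    return max(category_hmms,
               key=lambda c: forward_loglik(symbol_seq, *category_hmms[c]))
```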

1,477 citations


Proceedings ArticleDOI
31 Mar 1992
TL;DR: An implementation of a part-of-speech tagger based on a hidden Markov model that enables robust and accurate tagging with few resource requirements; accuracy exceeds 96%.
Abstract: We present an implementation of a part-of-speech tagger based on a hidden Markov model. The methodology enables robust and accurate tagging with few resource requirements. Only a lexicon and some unlabeled training text are required. Accuracy exceeds 96%. We describe implementation strategies and optimizations which result in high-speed operation. Three applications for tagging are described: phrase recognition; word sense disambiguation; and grammatical function assignment.
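
Tagging with such a model amounts to finding the most likely tag sequence for the observed words, typically via Viterbi decoding. Below is a minimal log-space sketch; the variable names (pi, A, B) are illustrative and not drawn from the paper.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely tag sequence for a sequence of word indices, in log space
    (pi: initial tag probs, A: tag-to-tag transitions, B: tag-to-word emissions)."""
    T, N = len(obs), len(pi)
    logpi, logA, logB = np.log(pi), np.log(A), np.log(B)
    delta = logpi + logB[:, obs[0]]          # best score ending in each tag
    back = np.zeros((T, N), dtype=int)       # backpointers
    for t in range(1, T):
        scores = delta[:, None] + logA       # score of every tag -> tag move
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, obs[t]]
    tags = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # trace back the best path
        tags.append(int(back[t, tags[-1]]))
    return tags[::-1]
```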

737 citations


Journal ArticleDOI
Julian M. Kupiec1
TL;DR: A system for part-of-speech tagging is described, based on a hidden Markov model which can be trained using a corpus of untagged text, which results in a model that correctly tags approximately 96% of the text.

467 citations


Journal ArticleDOI
TL;DR: In this paper, the consistency of a sequence of maximum likelihood estimators is proved and the conclusion of the Shannon-McMillan-Breiman theorem on entropy convergence is established for hidden Markov models.

455 citations


Journal ArticleDOI
Philip Lockwood1, J. Boudy1
01 Jun 1992
TL;DR: The performance of an HMM-based recogniser rises from 56% (no compensation) to 98% after speech enhancement and the lower limit of applicability of the projection (low SNR values) can be loosened after combination with NSS.
Abstract: Achieving reliable performance for a speech recogniser is an important challenge, especially in the context of mobile telephony applications where the user can access telephone functions through voice. The breakthrough of such a technology is appealing, since the driver can concentrate completely and safely on his task while composing and conversing in a “full” hands-free mode. This paper addresses the problem of speaker-dependent discrete utterance recognition in noise. Special reference is made to the mismatch effects due to the fact that training and testing are made in different environments. A novel technique for noise compensation is proposed: nonlinear spectral subtraction (NSS). Robust variance estimates and robust pdf evaluations (projection) are also introduced and combined with NSS into the HMM framework. We show that the lower limit of applicability of the projection (low SNR values) can be loosened after combination with NSS. Experimental results are reported. The performance of an HMM-based recogniser rises from 56% (no compensation) to 98% after speech enhancement. More than 3300 utterances have been used to evaluate the systems (three databases, two European languages). This result is achieved by the use of robust training/recognition schemes and by preprocessing the noisy speech by NSS.
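
For orientation, plain (linear) magnitude-domain spectral subtraction looks like the sketch below; the paper's nonlinear variant (NSS) makes the subtraction factor depend nonlinearly on the local SNR, which this sketch does not attempt to reproduce. Parameter values are illustrative.

```python
import numpy as np

def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, floor=0.02):
    """Linear magnitude-domain spectral subtraction with a spectral floor:
    subtract a noise-spectrum estimate and clamp what would go negative.
    NSS replaces the fixed alpha with an SNR-dependent, nonlinear factor."""
    cleaned = noisy_mag - alpha * noise_mag
    return np.maximum(cleaned, floor * noisy_mag)
```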

383 citations


Journal ArticleDOI
TL;DR: The efficacy of the mean field theory approach is demonstrated on parameter estimation for one-dimensional mixture data and on two-dimensional unsupervised stochastic model-based image segmentation, yielding good parameter estimates and segmentations for both synthetic and real-world images.
Abstract: In many signal processing and pattern recognition applications, the hidden data are modeled as Markov processes, and the main difficulty of using the expectation-maximization (EM) algorithm for these applications is the calculation of the conditional expectations of the hidden Markov processes. It is shown how the mean field theory from statistical mechanics can be used to calculate the conditional expectations for these problems efficiently. The efficacy of the mean field theory approach is demonstrated on parameter estimation for one-dimensional mixture data and two-dimensional unsupervised stochastic model-based image segmentation. Experimental results indicate that in the 1-D case, the mean field theory approach provides results comparable to those obtained by Baum's (1987) algorithm, which is known to be optimal. In the 2-D case, where Baum's algorithm can no longer be used, the mean field theory provides good parameter estimates and image segmentation for both synthetic and real-world images.
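
The exact E-step that the mean field approximation replaces is easiest to see in the independent-label case, i.e. ordinary EM for a mixture; a minimal 1-D Gaussian-mixture sketch follows (illustrative, not the authors' code). Once the hidden labels are Markov-dependent, the responsibilities below are no longer tractable in closed form, which is where mean field theory enters.

```python
import numpy as np

def em_gmm_1d(x, K, iters=50, seed=0):
    """Plain EM for a 1-D Gaussian mixture with independent hidden labels."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, K, replace=False)
    var = np.full(K, x.var())
    w = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: reestimate weights, means and variances
        nk = r.sum(axis=0)
        w, mu = nk / len(x), (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var
```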

322 citations


Proceedings Article
30 Nov 1992
TL;DR: The algorithm is compared with the Baum-Welch method of estimating fixed-size models, and it is found that it can induce minimal HMMs from data in cases where fixed estimation does not converge or requires redundant parameters to converge.
Abstract: This paper describes a technique for learning both the number of states and the topology of Hidden Markov Models from examples. The induction process starts with the most specific model consistent with the training data and generalizes by successively merging states. Both the choice of states to merge and the stopping criterion are guided by the Bayesian posterior probability. We compare our algorithm with the Baum-Welch method of estimating fixed-size models, and find that it can induce minimal HMMs from data in cases where fixed estimation does not converge or requires redundant parameters to converge.
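
A single state-merging step of the kind described above can be sketched as follows: pool the emission distributions of the two states (weighted by occupancy counts), redirect incoming transitions, and combine outgoing transitions. This is purely illustrative and does not reproduce the paper's Bayesian scoring of candidate merges.

```python
import numpy as np

def merge_states(pi, A, B, counts, i, j):
    """Merge state j into state i of a discrete HMM (pi: initial probs,
    A: transitions, B: emission rows, counts: state occupancy counts)."""
    pi, A, B = pi.copy(), A.copy(), B.copy()
    wi, wj = counts[i], counts[j]
    A[:, i] += A[:, j]                            # incoming mass goes to the merged state
    A[i] = (wi * A[i] + wj * A[j]) / (wi + wj)    # pooled outgoing distribution
    B[i] = (wi * B[i] + wj * B[j]) / (wi + wj)    # pooled emission distribution
    pi[i] += pi[j]
    keep = [k for k in range(len(pi)) if k != j]
    return pi[keep], A[np.ix_(keep, keep)], B[keep]
```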

306 citations


PatentDOI
TL;DR: In this article, a linguistic model of words uses a state transition diagram (10) connecting states (12, 14, 16, 18, 20) with arcs (22, 24, 26, 28, 30, 32) that represent phoneme transitions within a word.
Abstract: A linguistic model of words uses a state transition diagram (10). The diagram connects states (12, 14, 16, 18, 20) with arcs (22, 24, 26, 28, 30, 32) that represent phoneme transitions within a word. Different pronunciations of the same word may be modelled using alternate transitions (26 or 28 and 30 or 32) from single states (16 and 18, respectively). This effectively models alternate pronunciations using similar phonemes such as substituting a "D" (28) phoneme for the "T" (26) phoneme in the word "WATER" (10). Such models could be extended for longer words or linked together for multiple words.
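
A toy version of the pronunciation graph described in this patent abstract, with alternate arcs leaving a single state for the "T"/"D" variation in "WATER"; the data structure and phoneme labels are illustrative.

```python
# Toy pronunciation graph: states are positions in the word, arcs carry
# phonemes, and alternate arcs from one state model alternate pronunciations.
water = {
    0: [("W", 1)],
    1: [("AO", 2)],
    2: [("T", 3), ("D", 3)],   # alternate transitions from a single state
    3: [("ER", 4)],
    4: [],                     # final state
}

def pronunciations(graph, state=0, prefix=()):
    """Enumerate every phoneme string the graph accepts."""
    if not graph[state]:
        yield prefix
    for phone, nxt in graph[state]:
        yield from pronunciations(graph, nxt, prefix + (phone,))

print(list(pronunciations(water)))  # [('W', 'AO', 'T', 'ER'), ('W', 'AO', 'D', 'ER')]
```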

248 citations


Journal ArticleDOI
TL;DR: In the approach described, the ANN outputs constitute the sequence of observation vectors for the HMM, and an algorithm is proposed for global optimization of all the parameters.
Abstract: The integration of multilayered and recurrent artificial neural networks (ANNs) with hidden Markov models (HMMs) is addressed. ANNs are suitable for approximating functions that compute new acoustic parameters, whereas HMMs have been proven successful at modeling the temporal structure of the speech signal. In the approach described, the ANN outputs constitute the sequence of observation vectors for the HMM. An algorithm is proposed for global optimization of all the parameters. Results on speaker-independent recognition experiments using this integrated ANN-HMM system on the TIMIT continuous speech database are reported.

234 citations


Journal ArticleDOI
TL;DR: This paper concerns the use and implementation of maximum-penalized-likelihood procedures for choosing the number of mixing components and estimating the parameters in independent and Markov-dependent mixture models.
Abstract: This paper concerns the use and implementation of maximum-penalized-likelihood procedures for choosing the number of mixing components and estimating the parameters in independent and Markov-dependent mixture models. Computation of the estimates is achieved via algorithms for the automatic generation of starting values for the EM algorithm. Computation of the information matrix is also discussed. Poisson mixture models are applied to a sequence of counts of movements by a fetal lamb in utero obtained by ultrasound. The resulting estimates are seen to provide plausible mechanisms for the physiological process. The analysis of count data that are overdispersed relative to the Poisson distribution (i.e., variance > mean) has received considerable recent attention. Such data might arise in a clinical study in which overdispersion is caused by unexplained or random subject effects. Alternatively, we might observe a time series of counts in which temporal patterns in the data suggest that a Poisson model and its implied randomness are inappropriate. This paper is motivated by analysis of a time series of overdispersed count data generated in a study of central nervous system development in fetal lambs. Our data set consists of observed movement counts in 240 consecutive 5-second intervals obtained from a single animal. In analysing these data, we focus on the use of Poisson mixture models assuming independent observations and also Markov-dependent mixture models (or hidden Markov models). These models assume that the counts follow independent Poisson distributions conditional on the rates, which are generated from a mixing distribution either independently or with Markov dependence. We believe finite mixture models are particularly attractive because they provide plausible explanations for variation in the data. This paper will emphasize the following issues concerning estimation, inference, and application of mixture models: (i) choosing the number of model components; (ii) applying the EM algorithm to obtain parameter estimates; (iii) generating sufficiently many starting values to identify a global maximum of the likelihood; (iv) avoiding numerical instability.
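
Issues (ii) and (iii) above, fitting the mixture by EM and restarting from many initial values to chase a global maximum, can be sketched for an independent Poisson mixture as follows. This is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def poisson_mixture_em(counts, K=2, starts=20, iters=200, seed=0):
    """EM for an independent K-component Poisson mixture, restarted from
    several random initial values and keeping the best solution found."""
    rng = np.random.default_rng(seed)
    x = np.asarray(counts, dtype=float)
    best_ll, best = -np.inf, None
    for _ in range(starts):
        lam = (x.mean() + 0.1) * rng.uniform(0.3, 1.7, K)   # random start
        w = np.full(K, 1.0 / K)
        for _ in range(iters):
            # E-step: responsibilities (the common x! term cancels and is omitted)
            logp = np.log(w) + x[:, None] * np.log(lam) - lam
            logp -= logp.max(axis=1, keepdims=True)
            r = np.exp(logp)
            r /= r.sum(axis=1, keepdims=True)
            # M-step: mixing weights and Poisson rates
            nk = r.sum(axis=0)
            w, lam = nk / len(x), (r * x[:, None]).sum(axis=0) / nk
        # log-likelihood up to the constant x! term, comparable across starts
        ll = np.sum(np.log((w * np.exp(x[:, None] * np.log(lam) - lam)).sum(axis=1)))
        if ll > best_ll:
            best_ll, best = ll, (w, lam)
    return best
```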

223 citations


Proceedings ArticleDOI
23 Mar 1992
TL;DR: The distortion-intersection measure (DIM), which was introduced as a VQ-distortion measure to increase the robustness against utterance variations, is effective and the speaker identification rates using a continuous ergodic HMM are strongly correlated with the total number of mixtures irrespective of the number of states.
Abstract: A VQ (vector quantization)-distortion-based speaker recognition method and discrete/continuous ergodic HMM (hidden Markov model)-based ones are compared, especially from the viewpoint of robustness against utterance variations. It is shown that a continuous ergodic HMM is far superior to a discrete ergodic HMM. It is also shown that the information on transitions between different states is ineffective for text-independent speaker recognition. Therefore, the speaker identification rates using a continuous ergodic HMM are strongly correlated with the total number of mixtures irrespective of the number of states. It is also found that, for continuous ergodic HMM-based speaker recognition, the distortion-intersection measure (DIM), which was introduced as a VQ-distortion measure to increase the robustness against utterance variations, is effective.
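
The VQ-distortion side of the comparison reduces to scoring an utterance by its average distortion against each speaker's codebook and picking the smallest. A minimal sketch follows; names and the plain squared-error distortion are illustrative, and this does not implement the DIM variant itself.

```python
import numpy as np

def avg_vq_distortion(frames, codebook):
    """Average squared-error VQ distortion of an utterance (frames: T x D)
    against one speaker's codebook (C x D); the claimed speaker is the one
    with the smallest value."""
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()
```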

PatentDOI
Jie Yi1
TL;DR: Hidden Markov models of a target vocabulary are created by concatenating triphone, diphone, and phoneme models trained on a small vocabulary, using triphone models when available, diphone models when triphone models are not available, and phoneme models when neither triphone nor diphone models are available.
Abstract: A speech recognition system starts by training hidden Markov models for all triphones, diphones, and phonemes occurring in a small training vocabulary. Hidden Markov models of a target vocabulary are created by concatenating the triphone, diphone, and phoneme models, using triphone models if available, diphone HMMs when triphone models are not available, and phoneme models when neither triphone nor diphone models are available. Utterances from the target vocabulary are recognized by choosing a model with maximum probability of reproducing quantized utterance features.

Journal ArticleDOI
Yariv Ephraim1
TL;DR: A Bayesian estimation approach for enhancing speech signals which have been degraded by statistically independent additive noise is motivated and developed, and minimum mean square error (MMSE) and maximum a posteriori (MAP) signal estimators are developed using hidden Markov models for the clean signal and the noise process.
Abstract: A Bayesian estimation approach for enhancing speech signals which have been degraded by statistically independent additive noise is motivated and developed. In particular, minimum mean square error (MMSE) and maximum a posteriori (MAP) signal estimators are developed using hidden Markov models (HMMs) for the clean signal and the noise process. It is shown that the MMSE estimator comprises a weighted sum of conditional mean estimators for the composite states of the noisy signal, where the weights equal the posterior probabilities of the composite states given the noisy signal. The estimation of several spectral functionals of the clean signal such as the sample spectrum and the complex exponential of the phase is also considered. A gain-adapted MAP estimator is developed using the expectation-maximization algorithm. The theoretical performance of the MMSE estimator is discussed, and convergence of the MAP estimator is proved. Both the MMSE and MAP estimators are tested in enhancing speech signals degraded by white Gaussian noise at input signal-to-noise ratios from 5 to 20 dB.
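
The structure of the MMSE estimator stated above, a posterior-weighted sum of per-composite-state conditional mean estimators, is a one-liner once the posteriors and conditional means are available; the shapes and names below are illustrative.

```python
import numpy as np

def mmse_estimate(state_posteriors, conditional_means):
    """HMM-based MMSE enhancement of one frame: the weighted sum of the
    conditional mean estimators for the composite (clean state, noise state)
    pairs, weighted by their posteriors given the noisy signal.
    state_posteriors: (S,), conditional_means: (S, D) -> returns (D,)."""
    return state_posteriors @ conditional_means
```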

Proceedings ArticleDOI
23 Mar 1992
TL;DR: A novel training algorithm, segmental GPD (generalized probabilistic descent) training, for a hidden Markov model (HMM)-based speech recognizer using Viterbi decoding is proposed, based on the principle of minimum recognition error rate in which segmentation and discriminative training are jointly optimized.
Abstract: A novel training algorithm, segmental GPD (generalized probabilistic descent) training, for a hidden Markov model (HMM)-based speech recognizer using Viterbi decoding is proposed. This algorithm is based on the principle of minimum recognition error rate in which segmentation and discriminative training are jointly optimized. Various issues related to the special structure of HMM in segmental GPD training are studied. The authors tested this algorithm on two speaker-independent recognition tasks. The first experiment involves English E-set. Segmental GPD training was directly applied to HMM generated from nonoptimal uniform segmentation. A recognition rate of 88.7% was achieved on English E-set with whole word HMM. The second experiment involves the connected digits TI-database. Segmental GPD training was applied to HMM which were already trained using conventional training methods. A string recognition rate of 98.8% was achieved on 10-state word based HMM through segmental GPD training.
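
The heart of GPD-style minimum-error training is a smoothed classification-error loss: a misclassification measure comparing the correct-class score against a soft maximum over competitors, passed through a sigmoid so it can be minimized by gradient descent. A sketch of that loss (constants and names illustrative, not the paper's exact formulation):

```python
import numpy as np

def mce_loss(class_scores, correct, eta=1.0, gamma=2.0):
    """Smoothed minimum-classification-error loss: the correct-class score is
    compared against a soft maximum over competitors and the difference is
    squashed by a sigmoid, giving a differentiable stand-in for the error count."""
    others = np.delete(class_scores, correct)
    anti = np.log(np.mean(np.exp(eta * others))) / eta   # soft max over competitors
    d = anti - class_scores[correct]                     # misclassification measure
    return 1.0 / (1.0 + np.exp(-gamma * d))              # smoothed 0/1 loss
```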

Proceedings ArticleDOI
23 Mar 1992
TL;DR: The author addresses the problem of automatic speech recognition in the presence of interfering noise by decomposing the contaminated speech signal using a generalization of standard hidden Markov modeling, while utilizing a compact and effective parametrization of the speech signal.
Abstract: The author addresses the problem of automatic speech recognition in the presence of interfering noise. The novel approach described decomposes the contaminated speech signal using a generalization of standard hidden Markov modeling, while utilizing a compact and effective parametrization of the speech signal. The technique is compared to some existing noise compensation techniques, using data recorded in noise, and is found to have improved performance compared to existing model decomposition techniques. Performance is comparable to existing noise subtraction techniques, but the technique is applicable to a wider range of noise environments and is not dependent on an accurate endpointing of the speech.

Proceedings ArticleDOI
23 Aug 1992
TL;DR: An algorithm for computing the probability of a sentence generated by a SLTAG and an inside-outside-like iterative algorithm for estimating the parameters of a SLTAG are presented, and preliminary experiments showing some of the advantages of SLTAG over stochastic context-free grammars are reported.
Abstract: The notion of stochastic lexicalized tree-adjoining grammar (SLTAG) is formally defined. The parameters of a SLTAG correspond to the probability of combining two structures, each one associated with a word. The characteristics of SLTAG are unique and novel since it is lexically sensitive (as N-gram models or hidden Markov models) and yet hierarchical (as stochastic context-free grammars). Then, two basic algorithms for SLTAG are introduced: an algorithm for computing the probability of a sentence generated by a SLTAG and an inside-outside-like iterative algorithm for estimating the parameters of a SLTAG given a training corpus. Finally, we show how SLTAG can be used to define a lexicalized version of stochastic context-free grammars, and we report preliminary experiments showing some of the advantages of SLTAG over stochastic context-free grammars.

Proceedings ArticleDOI
23 Mar 1992
TL;DR: The authors extend the dynamic time warping algorithm, widely used in automatic speech recognition (ASR), to a dynamic plane warping (DPW) algorithm, for application in the field of optical character recognition (OCR) or similar applications.
Abstract: The authors extend the dynamic time warping (DTW) algorithm, widely used in automatic speech recognition (ASR), to a dynamic plane warping (DPW) algorithm, for application in the field of optical character recognition (OCR) or similar applications. Although direct application of the optimality principle reduced the computational complexity somewhat, the DPW (or image alignment) problem is exponential in the dimensions of the image. It is shown that by applying constraints to the image alignment problem, e.g., limiting the class of possible distortions, one can reduce the computational complexity dramatically, and find the optimal solution to the constrained problem in linear time. A statistical model, the planar hidden Markov model (PHMM), describing statistical properties of images is proposed. The PHMM approach was evaluated using a set of isolated handwritten digits. An overall digit recognition accuracy of 95% was achieved. It is expected that the advantage of this approach will be even more significant for harder tasks, such as cursive-writing recognition and spotting.
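
For reference, the 1-D dynamic time warping that the paper generalizes to dynamic plane warping is the classic quadratic-time recursion below (an illustrative sketch).

```python
import numpy as np

def dtw_cost(a, b):
    """Classic 1-D dynamic time warping cost between two feature sequences,
    filled by the usual dynamic-programming recursion."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```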

Yves Normandin1
01 Jan 1992
TL;DR: This work argues that the maximum mutual information estimation (MMIE) formulation for training is more appropriate vis-a-vis maximum likelihood estimation (MLE) for reducing the error rate, and proposes reestimation formulas for the case of diagonal Gaussian densities, experimentally demonstrates their convergence properties, and integrates them into the training algorithm.
Abstract: Hidden Markov Models (HMMs) are one of the most powerful speech recognition tools available today. Even so, the inadequacies of HMMs as a "correct" modeling framework for speech are well known. In that context, we argue that the maximum mutual information estimation (MMIE) formulation for training is more appropriate vis-a-vis maximum likelihood estimation (MLE) for reducing the error rate. We also show how MMIE paves the way for new training possibilities. We introduce Corrective MMIE training, a very efficient new training algorithm which uses a modified version of a discrete reestimation formula recently proposed by Gopalakrishnan et al. We propose reestimation formulas for the case of diagonal Gaussian densities, experimentally demonstrate their convergence properties, and integrate them into our training algorithm. In a connected digit recognition task, MMIE consistently improves the recognition performance of our recognizer.
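
The MMIE criterion contrasted with MLE above is, per utterance, the log-posterior of the correct word: the joint log-score of the correct word minus the log-sum over all competing words (MLE keeps only the first term). A minimal sketch with illustrative names:

```python
import numpy as np

def mmie_objective(logliks, log_priors, correct):
    """Per-utterance MMIE criterion: joint log-score of the correct word minus
    the log-sum over all competing words; MLE would keep only the first term."""
    joint = logliks + log_priors                 # log P(O|w) + log P(w) for each w
    return joint[correct] - np.logaddexp.reduce(joint)
```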

Proceedings ArticleDOI
23 Mar 1992
TL;DR: The authors propose an algorithm, successive state splitting (SSS), for simultaneously finding an optimal set of phoneme context classes, an optimal topology, and optimal parameters for hidden Markov models (HMMs) commonly using a maximum likelihood criterion.
Abstract: The authors propose an algorithm, successive state splitting (SSS), for simultaneously finding an optimal set of phoneme context classes, an optimal topology, and optimal parameters for hidden Markov models (HMMs) commonly using a maximum likelihood criterion. With this algorithm, a hidden Markov network (HM-Net), which is an efficient representation of phoneme-context-dependent HMMs, can be generated automatically. The authors implemented this algorithm, and tested it on the recognition of six Japanese consonants (/b/, /d/, /g/, /m/, /n/ and /N/). The HM-Net gave better recognition results with a lower number of total output probability density distributions than conventional phoneme-context-independent mixture Gaussian density HMMs.

Patent
Klaus Zuenkler1
04 Sep 1992
TL;DR: In this article, a method for recognizing patterns in time-variant measurement signals is specified which permits an improved discrimination between such signals by reclassifying in pairs, the discrimination-relevant features being examined separately in a second step after the main classification.
Abstract: In automatic speech recognition, confusion easily arises between phonetically similar words (for example, the German words "zwei" and "drei") in the case of previous recognition systems. Confusion of words which differ only in a single phoneme (for example, the German words "dem" and "den") occurs particularly easily with these recognition systems. In order to solve this problem, a method for recognizing patterns in time-variant measurement signals is specified which permits an improved discrimination between such signals by reclassifying in pairs. This method combines the Viterbi decoding algorithm with the method of hidden Markov models, the discrimination-relevant features being examined separately in a second step after the main classification. In this case, different components of feature vectors are weighted differently, and, by contrast with known approaches, these weightings are performed in a theoretically based way. The method is suitable, inter alia, for improving speech-recognizing systems.

Proceedings ArticleDOI
23 Mar 1992
TL;DR: A method of manipulating sets of hidden Markov models (HMMs) by applying various kinds of parameter tying operations is described, the aim being to synthesize compact and robust context dependent models.
Abstract: A method of manipulating sets of hidden Markov models (HMMs) by applying various kinds of parameter tying operations is described, the aim being to synthesize compact and robust context dependent models. The method is illustrated via an experiment to build a set of generalized triphone models for the TIMIT database in which triphones are constructed by joining together left and right dependent biphones. Although simple, the method results in good performance and avoids the need to train large numbers of triphones. The use of tying to increase model robustness is also investigated. Tying the center states within triphones of the same phoneme class and tying variances within states is beneficial, but larger-scale tying of variances leads to degraded performance.
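
One of the tying operations described, tying the center states of all triphones of the same base phoneme, can be sketched as below; the model representation and the "l-c+r" triphone naming are assumptions made for illustration, not the paper's notation.

```python
def tie_centre_states(triphone_models):
    """Tie the centre state of every triphone of the same base phoneme to one
    shared state object. Models are assumed to be dicts of name -> list of
    state objects, with names in an 'l-c+r' style (both are assumptions)."""
    shared = {}
    for name, states in triphone_models.items():
        base = name.split("-")[1].split("+")[0]   # 'k-ae+t' -> 'ae'
        centre = len(states) // 2
        shared.setdefault(base, states[centre])
        states[centre] = shared[base]             # all 'ae' triphones now share it
    return triphone_models
```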

Journal ArticleDOI
TL;DR: Analysis of the structure of some small complete genomes and a human genome segment using a hidden Markov chain model identifies a variety of discrete compositional domains, and their correlations with genome function are explored.

Proceedings ArticleDOI
23 Feb 1992
TL;DR: This paper reports recent efforts to apply the speaker-independent SPHINX-II system to the DARPA Wall Street Journal continuous speech recognition task, which includes sex-dependent, semi-continuous, shared-distribution hidden Markov models and left context dependent between-word triphones.
Abstract: This paper reports recent efforts to apply the speaker-independent SPHINX-II system to the DARPA Wall Street Journal continuous speech recognition task. In SPHINX-II, we incorporated additional dynamic and speaker-normalized features, replaced discrete models with sex-dependent semi-continuous hidden Markov models, augmented within-word triphones with between-word triphones, and extended generalized triphone models to shared-distribution models. The configuration of SPHINX-II being used for this task includes sex-dependent, semi-continuous, shared-distribution hidden Markov models and left context dependent between-word triphones. In applying our technology to this task we addressed issues that were not previously of concern owing to the (relatively) small size of the Resource Management task.

Journal ArticleDOI
Li Deng1
TL;DR: The trended HMM as discussed by the authors is a more faithful and structured representation of many classes of speech sounds whose production involves strong articulatory dynamics, and is expected to be a more suitable model for use in speech processing applications.

Journal ArticleDOI
01 Jun 1992
TL;DR: The approach is to use Bayesian learning to incorporate prior knowledge into the training process in the form of prior densities of the HMM parameters to enhance model robustness in a CDHMM-based speech recognition system.
Abstract: An investigation into the use of Bayesian learning of the parameters of a multivariate Gaussian mixture density has been carried out. In a framework of continuous density hidden Markov model (CDHMM), Bayesian learning serves as a unified approach for parameter smoothing, speaker adaptation, speaker clustering and corrective training. The goal is to enhance model robustness in a CDHMM-based speech recognition system so as to improve performance. Our approach is to use Bayesian learning to incorporate prior knowledge into the training process in the form of prior densities of the HMM parameters. The theoretical basis for this procedure is presented and results applying it to parameter smoothing, speaker adaptation, speaker clustering and corrective training are given.

Proceedings ArticleDOI
23 Mar 1992
TL;DR: The state in phonetic hidden Markov models is treated as the basic subphonetic unit, the senone, and preliminary senone modeling results are reported that significantly reduce the word error rate for speaker-independent continuous speech recognition.
Abstract: There will never be sufficient training data to model all the various acoustic-phonetic phenomena. How to capture important clues and estimate those needed parameters reliably is one of the central issues in speech recognition. Successful examples include subword models, fenones and many other smoothing techniques. In comparison with subword models, subphonetic modeling may provide a finer level of details. The authors propose to model subphonetic events with Markov states and treat the state in phonetic hidden Markov models as the basic subphonetic unit, the senone. Senones generalize fenones in several ways. A word model is a concatenation of senones and senones can be shared across different word models. Senone models not only allow parameter sharing, but also enable pronunciation optimization. The authors report preliminary senone modeling results, which have significantly reduced the word error rate for speaker-independent continuous speech recognition.

Journal ArticleDOI
TL;DR: Speaker-dependent phoneme recognition experiments were conducted using variants of the semicontinuous hidden Markov model (SCHMM) with explicit state duration modeling, and results clearly demonstrated that the SCHMM with state duration offers significantly improved phoneme classification accuracy.
Abstract: Speaker-dependent phoneme recognition experiments were conducted using variants of the semicontinuous hidden Markov model (SCHMM) with explicit state duration modeling. Results clearly demonstrated that the SCHMM with state duration offers significantly improved phoneme classification accuracy compared to both the discrete HMM and the continuous HMM; the error rate was reduced by more than 30% and 20%, respectively. The use of a limited number of mixture densities significantly reduced the amount of computation. Explicit state duration modeling further reduced the error rate.

Proceedings ArticleDOI
P. Ramesh1, Jay G. Wilpon1
23 Mar 1992
TL;DR: The authors present a way of modeling state durations in HMM using time-dependent state transitions and a suboptimal implementation of this scheme that requires no more computation than the traditional HMM but reduces recognition error rates by 14-25%.
Abstract: Hidden Markov modeling (HMM) techniques have been used successfully for connected speech recognition in the last several years. In the traditional HMM algorithms, the probability of duration of a state decreases exponentially with time, which is not appropriate for representing the temporal structure of speech. Non-parametric modeling of duration using semi-Markov chains does accomplish the task with a large increase in the computational complexity. Applying a postprocessing state duration penalty after Viterbi decoding adds very little computation but does not affect the forward recognition path. The authors present a way of modeling state durations in HMM using time-dependent state transitions. This inhomogeneous HMM (IHMM) does increase the computation by a small amount but reduces recognition error rates by 14-25%. Also, a suboptimal implementation of this scheme that requires no more computation than the traditional HMM is presented which also has reduced errors by 14-22% on a variety of databases.
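
The post-processing duration penalty mentioned above (applied after Viterbi decoding, so it adds almost no computation but cannot influence the forward search) can be sketched as follows; names are illustrative.

```python
def duration_penalised_score(path_loglik, state_durations, duration_logpmf):
    """Add a per-state log duration probability to a decoded path's score after
    Viterbi decoding. state_durations is a list of (state, duration) pairs along
    the path; duration_logpmf maps a state to a callable returning log P(duration)."""
    penalty = sum(duration_logpmf[s](d) for s, d in state_durations)
    return path_loglik + penalty
```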

Proceedings ArticleDOI
23 Feb 1992
TL;DR: Because of its adaptive nature, Bayesian learning serves as a unified approach for the following four speech recognition applications, namely parameter smoothing, speaker adaptation, speaker group modeling and corrective training.
Abstract: We discuss maximum a posteriori estimation of continuous density hidden Markov models (CDHMM). The classical MLE reestimation algorithms, namely the forward-backward algorithm and the segmental k-means algorithm, are expanded and reestimation formulas are given for HMM with Gaussian mixture observation densities. Because of its adaptive nature, Bayesian learning serves as a unified approach for the following four speech recognition applications, namely parameter smoothing, speaker adaptation, speaker group modeling and corrective training. New experimental results on all four applications are provided to show the effectiveness of the MAP estimation approach.
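
The flavour of MAP estimation being described is easiest to see for a single Gaussian mean with a conjugate Gaussian prior, where the estimate interpolates between the prior mean and the sample mean and the prior acts as tau pseudo-observations. A sketch (illustrative, not the paper's reestimation formulas):

```python
import numpy as np

def map_gaussian_mean(data, prior_mean, tau):
    """MAP estimate of a Gaussian mean under a conjugate Gaussian prior: an
    interpolation between the prior mean and the sample mean in which the
    prior counts as tau pseudo-observations (more data -> less smoothing)."""
    x = np.asarray(data, dtype=float)
    n = len(x)
    return (tau * prior_mean + n * x.mean()) / (tau + n)
```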

Proceedings ArticleDOI
23 Mar 1992
TL;DR: It is shown how, without any simplifying assumptions, one can estimate likelihoods for context-dependent phonetic models with nets that are not substantially larger than context-independent MLPs.
Abstract: A series of theoretical and experimental results have suggested that multilayer perceptrons (MLPs) are an effective family of algorithms for the smooth estimate of highly dimensioned probability density functions that are useful in continuous speech recognition. All of these systems have exclusively used context-independent phonetic models, in the sense that the probabilities or costs are estimated for simple speech units such as phonemes or words, rather than biphones or triphones. Numerous conventional systems based on hidden Markov models (HMMs) have been reported that use triphone or triphone like context-dependent models. In one case the outputs of many context-dependent MLPs (one per context class) were used to help choose the best sentence from the N best sentences as determined by a context-dependent HMM system. It is shown how, without any simplifying assumptions, one can estimate likelihoods for context-dependent phonetic models with nets that are not substantially larger than context-independent MLPs. >