
Showing papers on "Hidden Markov model published in 1994"


Proceedings ArticleDOI
05 Dec 1994
TL;DR: This paper presents a set of experimental results in which various HMM parameterisations are analysed, confirming that stochastic modelling can be used successfully to encode feature information.
Abstract: Recent work on face identification using continuous density Hidden Markov Models (HMMs) has shown that stochastic modelling can be used successfully to encode feature information. When frontal images of faces are sampled using top-bottom scanning, there is a natural order in which the features appear and this can be conveniently modelled using a top-bottom HMM. However, a top-bottom HMM is characterised by different parameters, the choice of which has so far been based on subjective intuition. This paper presents a set of experimental results in which various HMM parameterisations are analysed.

2,677 citations


Journal ArticleDOI
TL;DR: A framework for maximum a posteriori (MAP) estimation of hidden Markov models (HMM) is presented, and Bayesian learning is shown to serve as a unified approach for a wide range of speech recognition applications.
Abstract: In this paper, a framework for maximum a posteriori (MAP) estimation of hidden Markov models (HMM) is presented. Three key issues of MAP estimation, namely, the choice of prior distribution family, the specification of the parameters of prior densities, and the evaluation of the MAP estimates, are addressed. Using HMM's with Gaussian mixture state observation densities as an example, it is assumed that the prior densities for the HMM parameters can be adequately represented as a product of Dirichlet and normal-Wishart densities. The classical maximum likelihood estimation algorithms, namely, the forward-backward algorithm and the segmental k-means algorithm, are expanded, and MAP estimation formulas are developed. Prior density estimation issues are discussed for two classes of applications - parameter smoothing and model adaptation - and some experimental results are given illustrating the practical interest of this approach. Because of its adaptive nature, Bayesian learning is shown to serve as a unified approach for a wide range of speech recognition applications.

2,430 citations
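
To make the MAP formulas concrete: with a normal prior on a Gaussian state mean (the normal component of the normal-Wishart prior), the MAP estimate interpolates between the prior mean and the forward-backward sufficient statistics. A minimal sketch assuming a known variance; the variable names and scalar prior weight are mine, not the paper's notation:

```python
import numpy as np

def map_gaussian_mean(gamma, x, mu0, tau):
    """MAP estimate of a Gaussian state mean under a normal prior.

    gamma : (T,) state occupancy probabilities from forward-backward
    x     : (T, d) observation vectors
    mu0   : (d,) prior mean
    tau   : prior weight (pseudo-count); tau -> 0 recovers the ML estimate
    """
    occ = gamma.sum()                       # expected occupancy of the state
    weighted = (gamma[:, None] * x).sum(0)  # first-order sufficient statistic
    return (tau * mu0 + weighted) / (tau + occ)

# Example: 100 frames of 13-dim features with uniform occupancy 0.5
rng = np.random.default_rng(0)
mu = map_gaussian_mean(np.full(100, 0.5), rng.normal(size=(100, 13)),
                       mu0=np.zeros(13), tau=10.0)
```

With little adaptation data the estimate stays near the prior mean (parameter smoothing); with ample data it approaches the ML estimate (model adaptation).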


Book
16 Dec 1994
TL;DR: This book develops hidden Markov model processing, covering discrete- and continuous-time HMM estimation, recursive filtering, two-dimensional estimation with hidden Markov random fields, and HMM optimal control.
Abstract: Contents: Hidden Markov Model Processing; Discrete-Time HMM Estimation; Discrete States and Discrete Observations; Continuous-Range Observations; Continuous-Range States and Observations; A General Recursive Filter; Practical Recursive Filters; Continuous-Time HMM Estimation; Discrete-Range States and Observations; Markov Chains in Brownian Motion; Two-Dimensional HMM Estimation; Hidden Markov Random Fields; HMM Optimal Control; Discrete-Time HMM Control; Risk-Sensitive Control of HMM; Continuous-Time HMM Control.

1,415 citations


Proceedings ArticleDOI
08 Mar 1994
TL;DR: This paper describes a method of creating a tied-state continuous speech recognition system using a phonetic decision tree, which is shown to lead to similar recognition performance to that obtained using an earlier data-driven approach but to have the additional advantage of providing a mapping for unseen triphones.
Abstract: The key problem to be faced when building a HMM-based continuous speech recogniser is maintaining the balance between model complexity and available training data. For large vocabulary systems requiring cross-word context dependent modelling, this is particularly acute since many such contexts will never occur in the training data. This paper describes a method of creating a tied-state continuous speech recognition system using a phonetic decision tree. This tree-based clustering is shown to lead to similar recognition performance to that obtained using an earlier data-driven approach but to have the additional advantage of providing a mapping for unseen triphones. State-tying is also compared with traditional model-based tying and shown to be clearly superior. Experimental results are presented for both the Resource Management and Wall Street Journal tasks.

781 citations
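
The heart of such tree-based clustering is a greedy split: at each tree node, choose the phonetic question that maximizes the log-likelihood gain from dividing the pooled states into "yes" and "no" subsets. A sketch under a diagonal-covariance, single-Gaussian approximation; the per-state statistics layout (occupancy plus first- and second-order moments) is my choice, not necessarily the paper's:

```python
import numpy as np

def pool_loglik(states):
    """Approximate log-likelihood of modelling a set of triphone states with
    one shared diagonal Gaussian, from occupancy and moment statistics."""
    occ = sum(s["occ"] for s in states)
    m1 = sum(s["m1"] for s in states)   # per-state sum of gamma * x
    m2 = sum(s["m2"] for s in states)   # per-state sum of gamma * x**2
    mean = m1 / occ
    var = np.maximum(m2 / occ - mean ** 2, 1e-8)  # floor for stability
    return -0.5 * occ * np.sum(np.log(2 * np.pi * var) + 1.0)

def best_split(states, questions):
    """Pick the question (name, predicate-on-context) with the largest
    likelihood gain over leaving the states pooled at this node."""
    base = pool_loglik(states)
    best_name, best_gain = None, 0.0
    for name, pred in questions:
        yes = [s for s in states if pred(s["context"])]
        no = [s for s in states if not pred(s["context"])]
        if yes and no:
            gain = pool_loglik(yes) + pool_loglik(no) - base
            if gain > best_gain:
                best_name, best_gain = name, gain
    return best_name, best_gain
```

Splitting stops when the best gain falls below a threshold, and unseen triphones are then mapped by answering the same questions down the tree.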


Journal Article
TL;DR: Experiments show that the best training is obtained by using as much tagged text as possible, and that Maximum Likelihood training, the procedure routinely used to estimate hidden Markov model parameters from training data, will not necessarily improve the tagging accuracy.

Abstract: In this paper we present some experiments on the use of a probabilistic model to tag English text, i.e. to assign to each word the correct tag (part of speech) in the context of the sentence. The main novelty of these experiments is the use of untagged text in the training of the model. We have used a simple triclass Markov model and are looking for the best way to estimate the parameters of this model, depending on the kind and amount of training data provided. Two approaches in particular are compared and combined: using text that has been tagged by hand and computing relative frequency counts, and using text without tags and training the model as a hidden Markov process, according to a Maximum Likelihood principle. Experiments show that the best training is obtained by using as much tagged text as possible. They also show that Maximum Likelihood training, the procedure that is routinely used to estimate hidden Markov model parameters from training data, will not necessarily improve the tagging accuracy. In fact, it will generally degrade this accuracy, except when only a limited amount of hand-tagged text is available.

586 citations
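
The hand-tagged branch of that comparison, relative-frequency estimation of a triclass (tag-trigram) model, is straightforward to write down. A sketch with my own names; smoothing and the unsupervised Baum-Welch branch are omitted:

```python
from collections import Counter

def train_relative_frequency(tagged_sentences):
    """Relative-frequency estimates for a triclass tagger: transition
    p(t3 | t1, t2) and emission p(w | t) from hand-tagged sentences,
    where each sentence is a list of (word, tag) pairs."""
    trans, trans_ctx = Counter(), Counter()
    emit, tag_count = Counter(), Counter()
    for sent in tagged_sentences:
        tags = ["<s>", "<s>"] + [t for _, t in sent]  # pad with start tags
        for t1, t2, t3 in zip(tags, tags[1:], tags[2:]):
            trans[(t1, t2, t3)] += 1
            trans_ctx[(t1, t2)] += 1
        for w, t in sent:
            emit[(w, t)] += 1
            tag_count[t] += 1
    p_trans = {k: v / trans_ctx[k[:2]] for k, v in trans.items()}
    p_emit = {k: v / tag_count[k[1]] for k, v in emit.items()}
    return p_trans, p_emit
```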


Journal ArticleDOI
TL;DR: Recognition results are presented for the DARPA TIMIT and Resource Management tasks, and it is concluded that recurrent nets are competitive with traditional means for performing phone probability estimation.
Abstract: This paper presents an application of recurrent networks for phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context information is discussed; a role for which the recurrent net appears suitable. An overview of early developments of recurrent nets for phone recognition is given along with the more recent improvements that include their integration with Markov models. Recognition results are presented for the DARPA TIMIT and Resource Management tasks, and it is concluded that recurrent nets are competitive with traditional means for performing phone probability estimation.

497 citations


Journal ArticleDOI
TL;DR: A smooth and convergent algorithm is introduced to iteratively adapt the transition and emission parameters of the models from the examples in a given family, yielding an effective multiple-alignment algorithm which requires O(KN²) operations, linear in the number of sequences.
Abstract: Hidden Markov model (HMM) techniques are used to model families of biological sequences. A smooth and convergent algorithm is introduced to iteratively adapt the transition and emission parameters of the models from the examples in a given family. The HMM approach is applied to three protein families: globins, immunoglobulins, and kinases. In all cases, the models derived capture the important statistical characteristics of the family and can be used for a number of tasks, including multiple alignments, motif detection, and classification. For K sequences of average length N, this approach yields an effective multiple-alignment algorithm which requires O(KN²) operations, linear in the number of sequences.

475 citations
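
Both training and classification here rest on the standard forward recursion for the sequence likelihood. A generic log-space sketch (not the authors' protein-specific architecture); the dense recursion below costs O(TS²) per sequence, while the sparse left-to-right transitions of a linear model reduce it to the O(N²) per sequence implied by the O(KN²) figure above:

```python
import numpy as np
from scipy.special import logsumexp

def forward_loglik(log_pi, log_A, log_B):
    """Log-likelihood of one observation sequence under an HMM.

    log_pi : (S,)   log initial-state distribution
    log_A  : (S, S) log transition matrix
    log_B  : (T, S) per-position log emission scores
    """
    alpha = log_pi + log_B[0]
    for t in range(1, log_B.shape[0]):
        # marginalize over the previous state in log space
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_B[t]
    return float(logsumexp(alpha))
```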


Proceedings Article
01 Jan 1994
TL;DR: A recurrent architecture with a modular structure is introduced; it has similarities to hidden Markov models but supports a recurrent-network processing style and allows the supervised learning paradigm to be exploited while using maximum likelihood estimation.
Abstract: We introduce a recurrent architecture having a modular structure and we formulate a training procedure based on the EM algorithm. The resulting model has similarities to hidden Markov models, but supports a recurrent-network processing style and makes it possible to exploit the supervised learning paradigm while using maximum likelihood estimation.

344 citations


Journal ArticleDOI
TL;DR: It is described how two-dimensional face images can be converted into one-dimensional sequences to allow similar techniques to be applied, and how an HMM can be used to automatically segment face images and extract features that can be used for identification.

343 citations


Book ChapterDOI
07 May 1994
TL;DR: A unified approach to Markov random field (MRF) modeling in low- and high-level computer vision is presented, made possible by a recent advance in MRF modeling for high-level object recognition.
Abstract: A variety of computer vision problems can be optimally posed as Bayesian labeling in which the solution of a problem is defined as the maximum a posteriori (MAP) probability estimate of the true labeling. The posterior probability is usually derived from a prior model and a likelihood model. The latter relates to how data is observed and is problem domain dependent. The former depends on how various prior constraints are expressed. Markov Random Field Models (MRF) theory is a tool to encode contextual constraints into the prior probability. This paper presents a unified approach for MRF modeling in low and high level computer vision. The unification is made possible due to a recent advance in MRF modeling for high level object recognition. Such unification provides a systematic approach for vision modeling based on sound mathematical principles.

284 citations


Journal ArticleDOI
TL;DR: It is shown that a connectionist component improves a state-of-the-art HMM system through a statistical interpretation of connectionist networks as probability estimators.
Abstract: The authors are concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system. This is achieved through a statistical interpretation of connectionist networks as probability estimators. They review the basis of HMM speech recognition and point out the possible benefits of incorporating connectionist networks. Issues necessary to the construction of a connectionist HMM recognition system are discussed, including choice of connectionist probability estimator. They describe the performance of such a system using a multilayer perceptron probability estimator evaluated on the speaker-independent DARPA Resource Management database. In conclusion, they show that a connectionist component improves a state-of-the-art HMM system.
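
A key detail in such hybrid systems is converting network outputs into HMM emission scores: by Bayes' rule, dividing each state posterior by the state prior gives the likelihood scaled by the frame probability, which is constant across states during decoding. A one-function sketch of this standard conversion (names are mine):

```python
import numpy as np

def scaled_log_likelihoods(log_posteriors, log_priors):
    """p(x | q) / p(x) = p(q | x) / p(q): subtract log state priors from
    the network's log posteriors to obtain HMM emission scores.

    log_posteriors : (T, S) frame-wise log p(state | frame) from the net
    log_priors     : (S,)   log relative frequencies of the states
    """
    return log_posteriors - log_priors[None, :]
```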

PatentDOI
TL;DR: The invention provides a method of large vocabulary speech recognition that employs a single tree-structured phonetic hidden Markov model (HMM) at each frame of a time-synchronous process, and phonetic context information is exploited, even before the complete context of a phoneme is known.
Abstract: The invention provides a method of large vocabulary speech recognition that employs a single tree-structured phonetic hidden Markov model (HMM) at each frame of a time-synchronous process. A grammar probability is utilized upon recognition of each phoneme of a word, before recognition of the entire word is complete. Thus, grammar probabilities are exploited as early as possible during recognition of a word. At each frame of the recognition process, a grammar probability is determined for the transition from the most likely preceding grammar state to a set of words that share at least one common phoneme. The grammar probability is combined with accumulating phonetic evidence to provide a measure of the likelihood that a state in the HMM will lead to the word most likely to have been spoken. In a preferred embodiment, phonetic context information is exploited, even before the complete context of a phoneme is known. Instead of an exact triphone model, wherein the phonemes previous and subsequent to a phoneme are considered, a composite triphone model is used that exploits partial phonetic context information to provide a phonetic model that is more accurate than a phonetic model that ignores context. In another preferred embodiment, the single phonetic tree method is used as the forward pass of a forward/backward recognition process, wherein the backward pass employs a recognition process other than the single phonetic tree method.

Journal ArticleDOI
Shyh-Shiaw Kuo, O. E. Agazzi
TL;DR: An algorithm for robust machine recognition of keywords embedded in a poorly printed document is presented, where two statistical models, called pseudo 2-D hidden Markov models, are created for representing the actual keyword and all the other extraneous words, respectively.
Abstract: An algorithm for robust machine recognition of keywords embedded in a poorly printed document is presented. For each keyword, two statistical models, called pseudo 2-D hidden Markov models, are created for representing the actual keyword and all the other extraneous words, respectively. Dynamic programming is then used for matching an unknown input word with the two models and for making a maximum likelihood decision. Although the models are pseudo 2-D in the sense that they are not fully connected 2-D networks, they are shown to be general enough in characterizing printed words efficiently. These models facilitate a nice "elastic matching" property in both horizontal and vertical directions, which makes the recognizer not only independent of size and slant but also tolerant of highly deformed and noisy words. The system is evaluated on a synthetically created database that contains about 26,000 words. Currently, the authors achieve a recognition accuracy of 99% when words in testing and training sets are of the same font size, and 96% when they are in different sizes. In the latter case, the conventional 1-D HMM achieves only a 70% accuracy rate.

Journal ArticleDOI
TL;DR: A complete scheme for totally unconstrained handwritten word recognition based on a single contextual hidden Markov model type stochastic network is presented, which includes a morphology and heuristics based segmentation algorithm, a training algorithm that can adapt itself with the changing dictionary.
Abstract: Because of large variations involved in handwritten words, the recognition problem is very difficult. Hidden Markov models (HMM) have been widely and successfully used in speech processing and recognition. Recently HMM has also been used with some success in recognizing handwritten words with presegmented letters. In this paper, a complete scheme for totally unconstrained handwritten word recognition based on a single contextual hidden Markov model type stochastic network is presented. Our scheme includes a morphology and heuristics based segmentation algorithm, a training algorithm that can adapt itself with the changing dictionary, and a modified Viterbi algorithm which searches for the (l+1)th globally best path based on the previous l best paths. Detailed experiments are carried out and successful recognition results are reported.

Posted Content
TL;DR: In this paper, the authors describe a framework for inducing probabilistic grammars from corpora of positive samples, where samples are incorporated by adding ad-hoc rules to a working grammar; subsequently, elements of the model (such as states or nonterminals) are merged to achieve generalization and a more compact representation.
Abstract: We describe a framework for inducing probabilistic grammars from corpora of positive samples. First, samples are incorporated by adding ad-hoc rules to a working grammar; subsequently, elements of the model (such as states or nonterminals) are merged to achieve generalization and a more compact representation. The choice of what to merge and when to stop is governed by the Bayesian posterior probability of the grammar given the data, which formalizes a trade-off between a close fit to the data and a default preference for simpler models ('Occam's Razor'). The general scheme is illustrated using three types of probabilistic grammars: Hidden Markov models, class-based n-grams, and stochastic context-free grammars.
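
The induction loop itself is a greedy search over candidate merges scored by the Bayesian posterior. A structural sketch with the grammar representation left abstract; the callback names are mine, not the authors':

```python
def bayesian_model_merging(model, data, merge_candidates, log_posterior):
    """Greedily apply the merge that most improves log P(model | data),
    i.e. log prior + log likelihood; stop when no candidate improves it.

    merge_candidates : fn(model) -> iterable of merged variants
    log_posterior    : fn(model, data) -> float
    """
    score = log_posterior(model, data)
    while True:
        scored = [(log_posterior(m, data), m) for m in merge_candidates(model)]
        if not scored:
            return model
        best_score, best_model = max(scored, key=lambda sm: sm[0])
        if best_score <= score:      # the simplicity prior says: stop
            return model
        model, score = best_model, best_score
```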

Journal ArticleDOI
Gary E. Kopec, Philip A. Chou
TL;DR: The proposed approach is illustrated on the problem of decoding scanned telephone yellow pages to extract names and numbers from the listings, constructing a finite-state model for yellow page columns and decoding with a Viterbi-like dynamic programming algorithm.
Abstract: Document image decoding (DID) is a communication theory approach to document image recognition. In DID, a document recognition problem is viewed as consisting of three elements: an image generator, a noisy channel and an image decoder. A document image generator is a Markov source (stochastic finite-state automaton) that combines a message source with an imager. The message source produces a string of symbols, or text, that contains the information to be transmitted. The imager is modeled as a finite-state transducer that converts the 1D message string into an ideal 2D bitmap. The channel transforms the ideal image into a noisy observed image. The decoder estimates the message, given the observed image, by finding the a posteriori most probable path through the combined source and channel models using a Viterbi-like dynamic programming algorithm. The proposed approach is illustrated on the problem of decoding scanned telephone yellow pages to extract names and numbers from the listings. A finite-state model for yellow page columns was constructed and used to decode a database of scanned column images containing about 1100 individual listings.
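
The decoder's dynamic program is essentially the Viterbi algorithm run over the combined source and channel models. A generic log-space sketch of that recursion, with the image-specific scoring abstracted into per-step emission scores:

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Most probable state path through an HMM / Markov source.

    log_pi : (S,) log initial scores, log_A : (S, S) log transitions,
    log_B  : (T, S) per-step log emission scores."""
    T, S = log_B.shape
    back = np.zeros((T, S), dtype=int)
    delta = log_pi + log_B[0]
    for t in range(1, T):
        scores = delta[:, None] + log_A      # (from_state, to_state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # trace the best path backwards
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```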

Journal ArticleDOI
Yunxin Zhao
TL;DR: Experiments of speaker adaptation on the TIMIT database using short calibration speech have shown significant performance improvement over the baseline speaker-independent continuous speech recognition, where the recognition system uses Gaussian mixture density based hidden Markov models of phone units.
Abstract: A new speaker adaptation technique is proposed for improving speaker-independent continuous speech recognition based on a decomposition of spectral variation sources. In this technique, the spectral variations are separated into two categories, one acoustic and the other phone-specific, where each variation source is modeled by a linear transformation system. The technique consists of two sequential steps: first, acoustic normalization is performed, and second, phone model parameters are adapted. Experiments of speaker adaptation on the TIMIT database using short calibration speech (5 s per speaker) have shown significant performance improvement over the baseline speaker-independent continuous speech recognition, where the recognition system uses Gaussian mixture density based hidden Markov models of phone units. For a vocabulary size of 853 and test set perplexity of 104, the recognition word accuracy has been improved from 86.9% for the baseline system to 90.5% after adaptation, corresponding to an error reduction of 27.5%. On a more difficult test set that contains an additional variation source due to recording channel mismatch, a more significant performance improvement has been obtained: for the same vocabulary and a test set perplexity of 101, the recognition word accuracy has been improved from 65.4% for the baseline to 86.0% after adaptation, corresponding to an error reduction of 59.5%.
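
Each variation source above is modeled by a linear transformation system; the simplest concrete instance is an affine map fitted by least squares between paired feature frames. A sketch that assumes paired adaptation data and illustrates the idea rather than the author's exact estimator:

```python
import numpy as np

def estimate_affine_transform(X, Y):
    """Least-squares affine map Y ~ X @ W + b between paired frames,
    e.g. new-speaker frames X and reference-space frames Y, both (T, d)."""
    Xa = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
    sol, *_ = np.linalg.lstsq(Xa, Y, rcond=None)
    return sol[:-1], sol[-1]                    # weight matrix W, bias b

# Normalization then maps each incoming frame x to x @ W + b.
```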

Journal ArticleDOI
TL;DR: A rotation and gray-scale-transform invariant texture recognition scheme is presented, combining a quadrature mirror filter (QMF) bank with hidden Markov models (HMMs) to capture the trend of changes caused by rotation.
Abstract: In this correspondence, we have presented a rotation and gray scale transform invariant texture recognition scheme using the combination of quadrature mirror filter (QMF) bank and hidden Markov model (HMM). In the first stage, the QMF bank is used as the wavelet transform to decompose the texture image into subbands. The gray scale transform invariant features derived from the statistics based on first-order distribution of gray levels are then extracted from each subband image. In the second stage, the sequence of subbands is modeled as a hidden Markov model (HMM), and one HMM is designed for each class of textures. The HMM is used to exploit the dependence among these subbands, and is able to capture the trend of changes caused by rotation. During recognition, the unknown texture is matched against all the models. The best matched model identifies the texture class. Up to 93.33% classification accuracy is reported.
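
Recognition is maximum-likelihood selection: the unknown texture's subband feature sequence is scored against one HMM per class and the best-scoring class wins. A sketch assuming each model object exposes a log-likelihood scorer such as the forward recursion:

```python
def classify_texture(features, class_models):
    """Return the texture class whose HMM scores the subband feature
    sequence highest; class_models maps class name -> model with .loglik()."""
    return max(class_models, key=lambda name: class_models[name].loglik(features))
```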

Journal ArticleDOI
TL;DR: This new algorithm identifies islands of reliability (essentially the portion of speech contained between the first and the last vowel) using time and frequency-based features and then applies a noise adaptive procedure to refine the boundaries.
Abstract: The authors address the problem of automatic word boundary detection in quiet and in the presence of noise. Attention has been given to automatic word boundary detection for both additive noise and noise-induced changes in the talker's speech production (Lombard reflex). After a comparison of several automatic word boundary detection algorithms in different noisy-Lombard conditions, they propose a new algorithm that is robust in the presence of noise. This new algorithm identifies islands of reliability (essentially the portion of speech contained between the first and the last vowel) using time- and frequency-based features and then, after a noise classification, applies a noise adaptive procedure to refine the boundaries. It is shown that this new algorithm outperforms the commonly used algorithm developed by Lamel et al. (1981) and several other recently developed methods. They evaluated the average recognition error rate due to word boundary detection in an HMM-based recognition system across several signal-to-noise ratios and noise conditions. The recognition error rate decreased to about 20% compared to an average of approximately 50% obtained with a modified version of the Lamel et al. algorithm.

Journal ArticleDOI
TL;DR: The authors develop an efficient dynamic programming technique which includes the state sojourn time as an optimization variable, in conjunction with a state-dependent orthogonal polynomial regression method, for estimating the model parameters.
Abstract: Proposes, implements, and evaluates a class of nonstationary-state hidden Markov models (HMMs) having each state associated with a distinct polynomial regression function of time plus white Gaussian noise. The model represents the transitional acoustic trajectories of speech in a parametric manner, and includes the standard stationary-state HMM as a special, degenerated case. The authors develop an efficient dynamic programming technique which includes the state sojourn time as an optimization variable, in conjunction with a state-dependent orthogonal polynomial regression method, for estimating the model parameters. Experiments on fitting models to speech data and on limited-vocabulary speech recognition demonstrate consistent superiority of these nonstationary-state HMMs over the traditional stationary-state HMMs.
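
In such a nonstationary-state HMM the emission mean follows a polynomial in the time spent in the state, rather than staying constant. A sketch of the state-conditional log-density with diagonal covariance; the coefficient layout is my assumption:

```python
import numpy as np

def trended_state_logpdf(x, t, coeffs, var):
    """Log-density of frame x at sojourn time t in a nonstationary state:
    x ~ N(sum_r coeffs[r] * t**r, var), a polynomial trajectory plus
    white Gaussian noise.

    x : (d,) frame, coeffs : (R+1, d) regression coefficients, var : (d,)
    """
    mean = sum(c * t ** r for r, c in enumerate(coeffs))
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
```

Setting R = 0 recovers the standard stationary-state HMM as the degenerate case mentioned above.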

Proceedings Article
01 Jan 1994
TL;DR: The results indicate that simple hidden Markov models may be used to successfully recognize relatively unprocessed image sequences, and the system achieved performance levels equivalent to untrained humans when asked to recognize the first four English digits.
Abstract: This paper presents ongoing work on a speaker independent visual speech recognition system. The work presented here builds on previous research efforts in this area and explores the potential use of simple hidden Markov models for limited vocabulary, speaker independent visual speech recognition. The task at hand is recognition of the first four English digits, a task with possible applications in car-phone dialing. The images were modeled as mixtures of independent Gaussian distributions, and the temporal dependencies were captured with standard left-to-right hidden Markov models. The results indicate that simple hidden Markov models may be used to successfully recognize relatively unprocessed image sequences. The system achieved performance levels equivalent to untrained humans when asked to recognize the first four English digits.

Proceedings ArticleDOI
13 Oct 1994
TL;DR: Two experiments designed to determine how much manual training information is needed for part-of-speech tagging by hidden Markov model suggest that initial biasing of either lexical or transition probabilities is essential to achieve good accuracy, and reveal three distinct patterns of Baum-Welch re-estimation.
Abstract: In part-of-speech tagging by hidden Markov model, a statistical model is used to assign grammatical categories to words in a text. Early work in the field relied on a corpus which had been tagged by a human annotator to train the model. More recently, Cutting et al. (1992) suggest that training can be achieved with a minimal lexicon and a limited amount of a priori information about probabilities, by using Baum-Welch re-estimation to automatically refine the model. In this paper, I report two experiments designed to determine how much manual training information is needed. The first experiment suggests that initial biasing of either lexical or transition probabilities is essential to achieve a good accuracy. The second experiment reveals that there are three distinct patterns of Baum-Welch re-estimation. In two of the patterns, the re-estimation ultimately reduces the accuracy of the tagging rather than improving it. The pattern which is applicable can be predicted from the quality of the initial model and the similarity between the tagged training corpus (if any) and the corpus to be tagged. Heuristics for deciding how to use re-estimation in an effective manner are given. The conclusions are broadly in agreement with those of Merialdo (1994), but give greater detail about the contributions of different parts of the model.
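
For reference, one Baum-Welch (EM) re-estimation pass over the transition probabilities, the step whose repeated application the paper finds can degrade tagging accuracy, looks as follows. A bare-bones log-space sketch for a single observation sequence; lexical (emission) updates are analogous:

```python
import numpy as np
from scipy.special import logsumexp

def baum_welch_transitions(log_pi, log_A, log_B):
    """One EM update of the transition matrix from per-frame log emission
    scores log_B (T, S); returns the re-estimated log transition matrix."""
    T, S = log_B.shape
    alpha = np.empty((T, S)); beta = np.empty((T, S))
    alpha[0] = log_pi + log_B[0]
    for t in range(1, T):
        alpha[t] = logsumexp(alpha[t-1][:, None] + log_A, axis=0) + log_B[t]
    beta[-1] = 0.0
    for t in range(T - 2, -1, -1):
        beta[t] = logsumexp(log_A + (log_B[t+1] + beta[t+1])[None, :], axis=1)
    # expected transition counts xi_t(i, j), normalized per time slice
    xi = alpha[:-1, :, None] + log_A[None] + (log_B[1:] + beta[1:])[:, None, :]
    counts = np.exp(xi - logsumexp(xi, axis=(1, 2), keepdims=True)).sum(axis=0)
    return np.log(counts / counts.sum(axis=1, keepdims=True))
```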

Proceedings ArticleDOI
19 Apr 1994
TL;DR: A new method of creating speaker-specific phoneme models is proposed; it uses speaker-independent phoneme models consisting of tied-mixture HMMs and adapts the feature space of the tied mixtures to that of the speaker through phoneme-dependent/independent iterative training.
Abstract: Speaker adaptation methods for tied-mixture-based phoneme models are investigated for text-prompted speaker recognition. For this type of speaker recognition, speaker-specific phoneme models are essential for verifying both the key text and the speaker. This paper proposes a new method of creating speaker-specific phoneme models. This uses speaker-independent (universal) phoneme models consisting of tied-mixture HMMs and adapts the feature space of the tied-mixtures to that of the speaker through phoneme-dependent/independent iterative training. Therefore, it can adapt models of phonemes that have a small amount of training data to the speaker. The proposed method was tested using 15 speakers' voices recorded over 10 months and achieved a speaker and text verification rate of 99.4% even when both the voices of different speakers and different texts uttered by the true speaker were to be rejected.

Posted Content
TL;DR: A new technique for inducing the structure of Hidden Markov Models from data, based on the general 'model merging' strategy, is described, along with how the algorithm was incorporated in an operational speech understanding system, where it was combined with neural network acoustic likelihood estimators to improve performance over single-pronunciation word models.
Abstract: This report describes a new technique for inducing the structure of Hidden Markov Models from data which is based on the general 'model merging' strategy (Omohundro 1992). The process begins with a maximum likelihood HMM that directly encodes the training data. Successively more general models are produced by merging HMM states. A Bayesian posterior probability criterion is used to determine which states to merge and when to stop generalizing. The procedure may be considered a heuristic search for the HMM structure with the highest posterior probability. We discuss a variety of possible priors for HMMs, as well as a number of approximations which improve the computational efficiency of the algorithm. We studied three applications to evaluate the procedure. The first compares the merging algorithm with the standard Baum-Welch approach in inducing simple finite-state languages from small, positive-only training samples. We found that the merging procedure is more robust and accurate, particularly with a small amount of training data. The second application uses labelled speech data from the TIMIT database to build compact, multiple-pronunciation word models that can be used in speech recognition. Finally, we describe how the algorithm was incorporated in an operational speech understanding system, where it is combined with neural network acoustic likelihood estimators to improve performance over single-pronunciation word models.

PatentDOI
TL;DR: A finite state grammar set corresponding to the range of word sequence patterns in the lesson is employed as a constraint on a hidden Markov model (HMM) search apparatus in an HMM speech recognizer.
Abstract: Spoken-language instruction method and apparatus employ context-based speech recognition for instruction and evaluation. A finite state grammar set (113) corresponding to the range of word sequence patterns in the lesson is employed as a constraint on a hidden Markov model (HMM) search apparatus in an HMM speech recognizer (112). The invention includes a system with an interactive decision mechanism which employs at least three levels of error tolerance to simulate a natural level of patience in human-based interactive instruction. A linguistically-sensitive utterance endpoint detector is provided for judging termination of a spoken utterance to simulate human turn-taking in conversational speech.


Journal ArticleDOI
TL;DR: A simple learning algorithm for Hidden Markov Models (HMMs) is presented together with a number of variations, proved to be exact or approximate gradient optimization algorithms with respect to likelihood, log-likelihood, or cross-entropy functions, and as such are usually convergent.
Abstract: A simple learning algorithm for Hidden Markov Models (HMMs) is presented together with a number of variations. Unlike other classical algorithms such as the Baum-Welch algorithm, the algorithms described are smooth and can be used on-line (after each example presentation) or in batch mode, with or without the usual Viterbi most likely path approximation. The algorithms have simple expressions that result from using a normalized-exponential representation for the HMM parameters. All the algorithms presented are proved to be exact or approximate gradient optimization algorithms with respect to likelihood, log-likelihood, or cross-entropy functions, and as such are usually convergent. These algorithms can also be cast in the more general EM (Expectation-Maximization) framework where they can be viewed as exact or approximate GEM (Generalized Expectation-Maximization) algorithms. The mathematical properties of the algorithms are derived in the appendix.
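
The normalized-exponential representation writes each transition row as a softmax over unconstrained weights, so a plain gradient step keeps the parameters valid probabilities; that is what makes the updates smooth. A minimal batch-form sketch of one such update (the paper's on-line variants and emission updates follow the same pattern):

```python
import numpy as np

def softmax(w):
    e = np.exp(w - w.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def transition_gradient_step(W, grad_A, lr=0.1):
    """One smooth update of unconstrained weights W with A = softmax(W)
    row-wise; grad_A is dL/dA from forward-backward statistics.
    Chain rule: dL/dW_ij = A_ij * (grad_A_ij - sum_k A_ik * grad_A_ik)."""
    A = softmax(W)
    inner = (A * grad_A).sum(axis=1, keepdims=True)
    return W + lr * A * (grad_A - inner)
```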

Journal ArticleDOI
TL;DR: A scheme using a two-layer network is proposed to cope with the difficulty resulting from the declination effect on the F0 contour of declarative sentential utterances, and the coarticulation effect from neighboring syllables is considered in a scheme using a context-dependent model.
Abstract: In this paper, several tone recognition schemes for continuous Mandarin speech are discussed. First, an SCHMM is used to model the acoustic features of a syllable for tone discrimination. Parameters extracted from the F0 and energy contours of the syllable by discrete Legendre orthonormal transform are used as the recognition features. Then, a scheme using a two-layer network is proposed to cope with the difficulty resulting from the declination effect on the F0 contour of the declarative sentential utterance. The declination effect is modeled by a sentence-level HMM on the upper layer and the acoustic features of each tone are modeled by a state-dependent SCHMM on the lower layer. Lastly, the coarticulation effect coming from neighboring syllables is considered in a scheme using a context-dependent model. Performance of these recognition schemes was examined by simulations. A recognition rate of 86.34% was achieved.

Journal ArticleDOI
19 Apr 1994
TL;DR: In this article, a time delay neural network with local connections and shared weights is used to estimate a posteriori probabilities for characters in a word and a hidden Markov model segments the word into characters, which optimizes the global word score, taking a dictionary into account.
Abstract: Presents a writer independent system for on-line handwriting recognition which can handle both cursive script and hand-print. The pen trajectory is recorded by a touch sensitive pad, such as those used by note-pad computers. The input to the system contains the pen trajectory information, encoded as a time-ordered sequence of feature vectors. Features include X and Y coordinates, pen-lifts, speed, direction and curvature of the pen trajectory. A time delay neural network with local connections and shared weights is used to estimate a posteriori probabilities for characters in a word. A hidden Markov model segments the word into characters in a way which optimizes the global word score, taking a dictionary into account. A geometrical normalization scheme and a fast but efficient dictionary search are also presented. Trained on 20,000 unconstrained cursive words from 59 writers and using a 25,000-word dictionary, the authors reached an 89% character and 80% word recognition rate on test data from a disjoint set of writers.