
Showing papers on "Hidden Markov model" published in 1990


Journal ArticleDOI
TL;DR: A translation-invariant back-propagation network is described that performs better than a sophisticated continuous acoustic parameter hidden Markov model on a noisy, 100-speaker confusable vocabulary isolated word recognition task.

635 citations


Proceedings ArticleDOI
03 Apr 1990
TL;DR: A technique of signal decomposition using hidden Markov models is described that provides an optimal method of decomposing simultaneous processes and has wide implications for signal separation in general and improved speech modeling in particular.
Abstract: The problem of automatic speech recognition in the presence of interfering signals and noise with statistical characteristics ranging from stationary to fast changing and impulsive is discussed. A technique of signal decomposition using hidden Markov models is described. This is a generalization of conventional hidden Markov modeling that provides an optimal method of decomposing simultaneous processes. The technique exploits the ability of hidden Markov models to model dynamically varying signals in order to accommodate concurrent processes, including interfering signals as complex as speech. This form of signal decomposition has wide implications for signal separation in general and improved speech modeling in particular. The application of decomposition to the problem of recognition of speech contaminated with noise is emphasized.

530 citations
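The decomposition idea above amounts to a search over the joint state space of two HMMs. Below is a minimal, illustrative sketch, not the authors' system: it assumes scalar observations formed by adding two Gaussian-emitting sources (so the joint emission is Gaussian with summed means and variances) and a uniform initial state distribution; all names are hypothetical.

```python
import numpy as np

def joint_viterbi(x, A1, mu1, var1, A2, mu2, var2):
    """Viterbi over the Cartesian product of two HMMs' states.

    Assumes x[t] = s1[t] + s2[t] with Gaussian per-state emissions,
    so the joint emission has mean mu1[i] + mu2[j] and variance
    var1[i] + var2[j]. mu*, var*: (N,) arrays; A*: (N, N) transitions.
    """
    N1, N2, T = len(mu1), len(mu2), len(x)
    logA1, logA2 = np.log(A1), np.log(A2)
    mu = mu1[:, None] + mu2[None, :]           # (N1, N2) joint means
    var = var1[:, None] + var2[None, :]
    def loglik(xt):
        return -0.5 * (np.log(2 * np.pi * var) + (xt - mu) ** 2 / var)
    delta = loglik(x[0]) - np.log(N1 * N2)     # uniform initial state
    psi = np.zeros((T, N1, N2, 2), dtype=int)
    for t in range(1, T):
        # transition score factorizes: logA1[i', i] + logA2[j', j]
        cand = (delta[:, None, :, None] + logA1[:, :, None, None]
                + logA2[None, None, :, :])     # axes (i', i, j', j)
        cand = cand.transpose(1, 3, 0, 2)      # axes (i, j, i', j')
        flat = cand.reshape(N1, N2, -1)
        best = flat.argmax(axis=2)
        psi[t, :, :, 0], psi[t, :, :, 1] = np.unravel_index(best, (N1, N2))
        delta = flat.max(axis=2) + loglik(x[t])
    # backtrack the best joint path; each entry is a (state1, state2) pair
    i, j = np.unravel_index(delta.argmax(), (N1, N2))
    path = [(i, j)]
    for t in range(T - 1, 0, -1):
        i, j = psi[t, i, j]
        path.append((i, j))
    return path[::-1]
```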


Journal ArticleDOI
TL;DR: A new approach to ECG arrhythmia analysis is described, based on hidden Markov modeling (HMM), a technique successfully used since the mid 1970s to model speech waveforms for automatic speech recognition.
Abstract: A new approach to ECG arrhythmia analysis is described. It is based on hidden Markov modeling (HMM), a technique successfully used since the mid 1970s to model speech waveforms for automatic speech recognition. Many ventricular arrhythmias can be classified by detecting and analyzing QRS complexes and determining R-R intervals. Classification of supraventricular arrhythmias, however, often requires detection of the P wave in addition to the QRS complex. The HMM approach combines structural and statistical knowledge of the ECG signal in a single parametric model. Model parameters are estimated from training data using an iterative, maximum-likelihood reestimation algorithm. Initial results suggest that this approach can provide improved supraventricular arrhythmia analysis through accurate representation of the entire beat, including the P wave.

527 citations
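For the discrete-emission case, the iterative maximum-likelihood reestimation the authors refer to is the Baum-Welch algorithm. Here is a compact single-iteration sketch with the standard scaling, assuming vector-quantized (integer-symbol) observations; this is generic HMM machinery, not the paper's ECG-specific model.

```python
import numpy as np

def baum_welch_step(obs, A, B, pi):
    """One ML reestimation step for a discrete-emission HMM.

    obs: (T,) integer symbols; A: (N, N) transitions; B: (N, M)
    emission probabilities; pi: (N,) initial distribution."""
    obs = np.asarray(obs)
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N)); beta = np.zeros((T, N)); c = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):                        # scaled forward pass
        alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta[-1] = 1.0
    for t in range(T-2, -1, -1):                 # scaled backward pass
        beta[t] = A @ (B[:, obs[t+1]] * beta[t+1]) / c[t+1]
    gamma = alpha * beta                         # state posteriors
    xi = np.zeros((N, N))                        # expected transition counts
    for t in range(T-1):
        xi += alpha[t][:, None] * A * (B[:, obs[t+1]] * beta[t+1])[None, :] / c[t+1]
    A_new = xi / xi.sum(axis=1, keepdims=True)
    B_new = np.vstack([gamma[obs == k].sum(axis=0)
                       for k in range(B.shape[1])]).T
    B_new /= B_new.sum(axis=1, keepdims=True)
    return A_new, B_new, gamma[0]
```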


Proceedings ArticleDOI
03 Apr 1990
TL;DR: A speaker-independent hidden Markov model (HMM) keyword recognizer (KWR) based on a continuous-speech-recognition model is presented and techniques for dealing with nonkeyword speech and linear channel effects are discussed.
Abstract: A speaker-independent hidden Markov model (HMM) keyword recognizer (KWR) based on a continuous-speech-recognition model is presented. The baseline keyword recognition system is described, and techniques for dealing with nonkeyword speech and linear channel effects are discussed. The training of acoustic models to provide an explicit representation of nonvocabulary speech is investigated. A likelihood ratio scoring procedure is used to account for sources of variability affecting keyword likelihood scores. An acoustic class-dependent spectral normalization procedure is used to provide explicit compensation for linear channel effects. Keyword recognition results for a standard conversational speech task with a 20-keyword vocabulary reach 82% probability of detection at a false alarm rate of 12 false alarms per keyword per hour.

498 citations
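In its simplest form, likelihood ratio scoring compares a keyword model's score against a filler/background model's score over the same segment. A toy sketch (the paper's actual procedure models more sources of variability, and these names are illustrative):

```python
def putative_hit(log_lik_keyword, log_lik_filler, n_frames, threshold=0.0):
    """Duration-normalized log likelihood ratio test for a hypothesized
    keyword segment. `threshold` trades detections against false alarms."""
    score = (log_lik_keyword - log_lik_filler) / n_frames
    return score, score > threshold
```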


Journal ArticleDOI
TL;DR: SPHINX is a system that demonstrates the feasibility of accurate, large-vocabulary, speaker-independent, continuous speech recognition, based on discrete hidden Markov models with LPC- (linear-predictive-coding) derived parameters.
Abstract: A description is given of SPHINX, a system that demonstrates the feasibility of accurate, large-vocabulary, speaker-independent, continuous speech recognition. SPHINX is based on discrete hidden Markov models (HMMs) with LPC- (linear-predictive-coding) derived parameters. To provide speaker independence, knowledge was added to these HMMs in several ways: multiple codebooks of fixed-width parameters, and an enhanced recognizer with carefully designed models and word-duration modeling. To deal with coarticulation in continuous speech, yet still adequately represent a large vocabulary, two new subword speech units are introduced: function-word-dependent phone models and generalized triphone models. With grammars of perplexity 997, 60, and 20, SPHINX attained word accuracies of 71, 94, and 96%, respectively, on a 997-word task.

487 citations


Journal ArticleDOI
TL;DR: The authors discuss and document a parameter estimation algorithm for data sequence modeling involving hidden Markov models that uses the state-optimized joint likelihood for the observation data and the underlying Markovian state sequence as the objective function for estimation.
Abstract: The authors discuss and document a parameter estimation algorithm for data sequence modeling involving hidden Markov models. The algorithm, called the segmental K-means method, uses the state-optimized joint likelihood for the observation data and the underlying Markovian state sequence as the objective function for estimation. The authors prove the convergence of the algorithm and compare it with the traditional Baum-Welch reestimation method. They also point out the increased flexibility this algorithm offers in the general speech modeling framework.

473 citations
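A minimal sketch of the segmental K-means loop, under assumed scalar observations, Gaussian emissions with one shared variance, and a flat initial state distribution; the paper treats the general case and proves convergence:

```python
import numpy as np

def viterbi(x, logA, means, var):
    """MAP state path for a scalar-Gaussian HMM (uniform initial state)."""
    N, T = len(means), len(x)
    ll = -0.5 * (np.log(2*np.pi*var) + (x[:, None] - means)**2 / var)
    delta, psi = ll[0] - np.log(N), np.zeros((T, N), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + logA
        psi[t], delta = cand.argmax(axis=0), cand.max(axis=0) + ll[t]
    path = [int(delta.argmax())]
    for t in range(T-1, 0, -1):
        path.append(psi[t, path[-1]])
    return np.array(path[::-1])

def segmental_kmeans(x, A, means, var, n_iter=10):
    """Alternate a state-optimized (Viterbi) alignment with ML updates
    of the emission and transition parameters given that alignment."""
    logA = np.log(A)
    for _ in range(n_iter):
        path = viterbi(x, logA, means, var)
        for j in range(len(means)):                  # per-state means
            seg = x[path == j]
            if len(seg):
                means[j] = seg.mean()
        var = max(np.mean((x - means[path])**2), 1e-6)
        counts = np.zeros_like(A)                    # transition counts
        for t in range(len(path) - 1):
            counts[path[t], path[t+1]] += 1
        logA = np.log((counts + 1e-3)
                      / (counts + 1e-3).sum(axis=1, keepdims=True))
    return means, var, np.exp(logA)
```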


Journal ArticleDOI
TL;DR: The modifications made to a connected word speech recognition algorithm based on hidden Markov models which allow it to recognize words from a predefined vocabulary list spoken in an unconstrained fashion are described.
Abstract: The modifications made to a connected word speech recognition algorithm based on hidden Markov models (HMMs) which allow it to recognize words from a predefined vocabulary list spoken in an unconstrained fashion are described. The novelty of this approach is that statistical models of both the actual vocabulary word and the extraneous speech and background are created. An HMM-based connected word recognition system is then used to find the best sequence of background, extraneous speech, and vocabulary word models for matching the actual input. Word recognition accuracies of 99.3% on purely isolated speech (i.e., only vocabulary items and background noise were present), and 95.1% when the vocabulary word was embedded in unconstrained extraneous speech, were obtained for the five-word vocabulary using the proposed recognition algorithm.

472 citations


Journal ArticleDOI
Hervé Bourlard, C. Wellekens
TL;DR: It is shown theoretically and experimentally that the outputs of the MLP approximate the probability distribution over output classes conditioned on the input, i.e. the maximum a posteriori probabilities.
Abstract: The statistical use of a particular classic form of a connectionist system, the multilayer perceptron (MLP), is described in the context of the recognition of continuous speech. A discriminant hidden Markov model (HMM) is defined, and it is shown how a particular MLP with contextual and extra feedback input units can be considered as a general form of such a Markov model. A link between these discriminant HMMs, trained with the Viterbi algorithm, and any other approach based on least-mean-square minimization of an error function (LMSE) is established. It is shown theoretically and experimentally that the outputs of the MLP (when trained with the LMSE or the entropy criterion) approximate the probability distribution over output classes conditioned on the input, i.e. the maximum a posteriori probabilities. Results of a series of speech recognition experiments are reported. The possibility of embedding the MLP into an HMM is described. Relations with other recurrent networks are also explained.

400 citations
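The practical consequence of this result is widely used: if a trained MLP outputs approximations of p(q|x), dividing by the class priors p(q) yields quantities proportional to the likelihoods p(x|q) that an HMM needs as emission scores (Bayes' rule, with p(x) common to all states). A one-line sketch:

```python
import numpy as np

def scaled_log_likelihoods(mlp_log_posteriors, class_log_priors):
    """Convert frame-level MLP log posteriors log p(q|x) into scaled
    log likelihoods log p(x|q) + const by subtracting the class log
    priors. Shapes: (T, Q) and (Q,). Illustrative names, standard trick."""
    return mlp_log_posteriors - class_log_priors
```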


Journal ArticleDOI
TL;DR: The use of hidden Markov models (HMMs) in continuous speech recognition is reviewed and a unified view is offered in which both linguistic decoding and acoustic matching are integrated into a single, optimal network search framework.
Abstract: The use of hidden Markov models (HMMs) in continuous speech recognition is reviewed. Markov models are presented as a generalization of their predecessor technology, dynamic programming. A unified view is offered in which both linguistic decoding and acoustic matching are integrated into a single, optimal network search framework. Advances in recognition architectures are discussed. The fundamentals of Viterbi beam search, the dominant search algorithm used today in speech recognition, are presented. Approaches to estimating the probabilities associated with an HMM are examined. The HMM-supervised training paradigm is examined. Several examples of successful HMM-based speech recognition systems are reviewed.

321 citations
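A minimal sketch of time-synchronous Viterbi decoding with beam pruning over a single HMM, assuming precomputed frame log-likelihoods; a real decoder layers a lexicon and language model on this skeleton:

```python
import numpy as np

def viterbi_beam(obs_loglik, logA, log_init, beam=10.0):
    """Time-synchronous Viterbi search with beam pruning.

    obs_loglik: (T, N) per-frame, per-state log-likelihoods;
    logA: (N, N) log transitions; log_init: (N,) log initial probs.
    States whose path score falls more than `beam` below the current
    best are deactivated before the next frame."""
    T, N = obs_loglik.shape
    score = log_init + obs_loglik[0]
    back = np.zeros((T, N), dtype=int)
    active = score >= score.max() - beam
    for t in range(1, T):
        prev = np.where(active, score, -np.inf)   # pruned states drop out
        cand = prev[:, None] + logA               # (from, to)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + obs_loglik[t]
        active = score >= score.max() - beam      # prune against the best
    s = int(score.argmax())
    path = [s]
    for t in range(T - 1, 0, -1):                 # backtrack
        s = int(back[t, s])
        path.append(s)
    return path[::-1]
```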


Journal ArticleDOI
Jerome R. Bellegarda, David Nahamoo
TL;DR: A class of very general hidden Markov models which can accommodate feature vector sequences lying either in a discrete or in a continuous space is considered; the new class allows one to represent the prototypes in an assumption-limited, yet convenient way, as tied mixtures of simple multivariate densities.
Abstract: The acoustic-modeling problem in automatic speech recognition is examined with the goal of unifying discrete and continuous parameter approaches. To model a sequence of information-bearing acoustic feature vectors which has been extracted from the speech waveform via some appropriate front-end signal processing, a speech recognizer basically faces two alternatives: (1) assign a multivariate probability distribution directly to the stream of vectors, or (2) use a time-synchronous labeling acoustic processor to perform vector quantization on this stream, and assign a multinomial probability distribution to the output of the vector quantizer. With a few exceptions, these two methods have traditionally been given separate treatment. A class of very general hidden Markov models which can accommodate feature vector sequences lying either in a discrete or in a continuous space is considered; the new class allows one to represent the prototypes in an assumption-limited, yet convenient way, as tied mixtures of simple multivariate densities. Speech recognition experiments, reported for two (5000- and 20000-word vocabulary) office correspondence tasks, demonstrate some of the benefits associated with this technique.

285 citations
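The tied-mixture emission model has a simple form: one codebook of K Gaussian densities is shared by all states, and each state contributes only its own mixture weights. A sketch with assumed diagonal covariances (the names here are illustrative):

```python
import numpy as np

def tied_mixture_loglik(x, codebook_means, codebook_vars, mix_weights):
    """Per-state emission log-likelihoods under tied mixtures.

    x: (D,) feature vector; codebook_means/codebook_vars: (K, D);
    mix_weights: (N, K) per-state weights over the shared codebook.
    Returns an (N,) array of log sum_k c[j,k] * N(x; mu_k, diag(var_k))."""
    comp = -0.5 * np.sum(np.log(2*np.pi*codebook_vars)
                         + (x - codebook_means)**2 / codebook_vars, axis=1)
    m = comp.max()                                   # stable log-sum-exp
    return m + np.log(mix_weights @ np.exp(comp - m))
```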


Book ChapterDOI
TL;DR: Two new context-dependent phonetic units are introduced: function-word-dependent phone models, which focus on the most difficult subvocabulary; and generalized triphones, which combine similar triphones on the basis of an information-theoretic measure.
Abstract: Context-dependent phone models are applied to speaker-independent continuous speech recognition and shown to be effective in this domain. Several previously proposed context-dependent models are evaluated, and two new context-dependent phonetic units are introduced: function-word-dependent phone models, which focus on the most difficult subvocabulary; and generalized triphones, which combine similar triphones on the basis of an information-theoretic measure. The subword clustering procedure used for generalized triphones can find the optimal number of models, given a fixed amount of training data. It is shown that context-dependent modeling reduces the error rate by as much as 60%.
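A sketch of an information-theoretic merging step in the spirit of generalized triphones, under the assumption that each triphone model is summarized by its discrete output-symbol counts: merging two models costs the increase in count-weighted entropy, and a greedy loop merges the cheapest pair until the desired number of models remains.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def merge_loss(count_a, count_b):
    """Increase in count-weighted entropy from pooling two discrete
    output distributions (always >= 0 by concavity of entropy)."""
    na, nb = count_a.sum(), count_b.sum()
    merged = count_a + count_b
    return ((na + nb) * entropy(merged / (na + nb))
            - na * entropy(count_a / na) - nb * entropy(count_b / nb))

def greedy_cluster(counts, n_models):
    """Greedily merge the cheapest pair of models until n_models remain.
    counts: list of per-triphone symbol-count vectors."""
    clusters = [c.astype(float) for c in counts]
    while len(clusters) > n_models:
        pairs = [(merge_loss(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```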

Journal ArticleDOI
TL;DR: In this article, a semi-continuous hidden Markov model is proposed in which the continuous output probability density functions share a mixture Gaussian density codebook; it can be considered as a special form of continuous mixture HMM.

Book ChapterDOI
01 Oct 1990
TL;DR: In this article, an abstract stochastic algorithm for combinatorial optimization problems is proposed, which generalizes and unifies genetic algorithms and simulated annealing, such that any GA or SA algorithm at hand is an instance of the abstract algorithm.
Abstract: In this paper we take a step towards a concise theory of genetic algorithms (GAs) and simulated annealing (SA). First, we set up an abstract stochastic algorithm for treating combinatorial optimization problems. This algorithm generalizes and unifies genetic algorithms and simulated annealing, such that any GA or SA algorithm at hand is an instance of our abstract algorithm. Second, we define the evolution belonging to the abstract algorithm as a Markov chain and find conditions implying that the evolution finds an optimum with probability 1. The results obtained can be applied when designing the components of a genetic algorithm.

PatentDOI
TL;DR: A method for efficient pruning is proposed that reduces central processing unit (CPU) loading during real-time speech recognition; the cost of propagating the current time-index along the best path is alleviated by referring the backpointer of a state within a model to its start state.
Abstract: A method for efficient pruning which reduces central processing unit (CPU) loading during real-time speech recognition. The CPU uses a predetermined threshold to discard information that is not useful or necessary. Useful information is stored in an available scoring buffer slot; a slot is said to be available if its last-time field does not equal the current time-index. To prevent pruning of a slot in the best path, the current time-index has to be propagated to all slots in the best path. This back-propagation increases CPU loading and is alleviated by referring the backpointer of a state within a model to its start state.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: A phoneme based, speaker-dependent continuous-speech recognition system embedding a multilayer perceptron (MLP) into a hidden Markov model (HMM) approach is described, which appears to be somewhat better when MLP methods are used to estimate the probabilities.
Abstract: A phoneme based, speaker-dependent continuous-speech recognition system embedding a multilayer perceptron (MLP) (i.e. a feedforward artificial neural network) into a hidden Markov model (HMM) approach is described. Contextual information from a sliding window on the input frames is used to improve frame or phoneme classification performance over the corresponding performance for simple maximum-likelihood probabilities, or even maximum a posteriori (MAP) probabilities which are estimated without the benefit of context. Performance for a simple discrete density HMM system appears to be somewhat better when MLP methods are used to estimate the probabilities.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: An approach to implementing spoken language systems that takes full advantage of syntactic and semantic constraints provided by a natural language processing component in the speech understanding task and provides a tractable search space is discussed.
Abstract: An approach to implementing spoken language systems is discussed. This approach takes full advantage of syntactic and semantic constraints provided by a natural language processing component in the speech understanding task and provides a tractable search space. The results indicate that the approach is a promising one for large-vocabulary spoken language systems. Parse times within a factor of 20 of real time are achieved for high-perplexity syntactic grammars with resulting hidden Markov model recognition computational requirements (2500 active words/frame) that are well within the capability of high-speed multiprocessor computers or special-purpose speech recognition hardware.

Journal ArticleDOI
TL;DR: In this article, the authors model small single-channel ion currents as a first-order, finite-state, discrete-time Markov process embedded in background noise; the processing can detect signals that do not conform to a first-order Markov model, but the method is less accurate when the background noise is not white.
Abstract: Techniques for extracting small, single channel ion currents from background noise are described and tested. It is assumed that single channel currents are generated by a first-order, finite-state, discrete-time, Markov process to which is added 'white' background noise from the recording apparatus (electrode, amplifiers, etc.). Given the observations and the statistics of the background noise, the techniques described here yield a posteriori estimates of the most likely signal statistics, including the Markov model state transition probabilities, duration (open- and closed-time) probabilities, histograms, signal levels, and the most likely state sequence. Using variations of several algorithms previously developed for solving digital estimation problems, we have demonstrated that: (1) artificial, small, first-order, finite-state, Markov model signals embedded in simulated noise can be extracted with a high degree of accuracy, (2) processing can detect signals that do not conform to a first-order Markov model but the method is less accurate when the background noise is not white, and (3) the techniques can be used to extract from the baseline noise single channel currents in neuronal membranes. Some studies have been included to test the validity of assuming a first-order Markov model for biological signals. This method can be used to obtain directly from digitized data, channel characteristics such as amplitude distributions, transition matrices and open- and closed-time durations.
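One of the duration statistics here follows directly from the first-order Markov assumption: dwell times in a state are geometrically distributed. A small sketch recovering an open- or closed-time distribution from an estimated self-transition probability:

```python
import numpy as np

def dwell_time_pmf(self_transition_prob, max_d=50):
    """Dwell-time distribution implied by a first-order Markov state:
    geometric, P(d) = (1 - a) * a**(d - 1), where a is the state's
    self-transition probability per sample. Illustrative sketch."""
    d = np.arange(1, max_d + 1)
    a = self_transition_prob
    return (1 - a) * a ** (d - 1)
```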

Proceedings ArticleDOI
03 Apr 1990
TL;DR: A technique for using the speech of multiple reference speakers as a basis for speaker adaptation in large-vocabulary continuous-speech recognition is introduced, and the usual probabilistic spectrum transformation can be applied to the reference HMM to model a new speaker.
Abstract: A technique for using the speech of multiple reference speakers as a basis for speaker adaptation in large-vocabulary continuous-speech recognition is introduced. In contrast to other methods that use a pooled reference model, this technique normalizes the training speech from multiple reference speakers to a single common feature space before pooling it. The normalized and pooled speech is then treated as if it came from a single reference speaker for training the reference hidden Markov model (HMM). The usual probabilistic spectrum transformation can be applied to the reference HMM to model a new speaker. Preliminary experimental results are reported from applying this approach to over 100 reference speakers from the speaker-independent portion of the DARPA 1000-Word Resource Management Database.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: A successfully implemented real-time Mandarin dictation machine which recognizes Mandarin speech with unlimited texts and very large vocabulary for the input of Chinese characters to computers is described.
Abstract: A successfully implemented real-time Mandarin dictation machine which recognizes Mandarin speech with unlimited texts and very large vocabulary for the input of Chinese characters to computers is described. Isolated syllables including the tones are first recognized using specially trained hidden Markov models with special feature parameters. The exact characters are then identified from the syllables using a Markov Chinese language model. The real-time implementation is on an IBM PC/AT, connected to a set of special hardware boards on which ten TMS 320C25 chips operate in parallel. It takes only 0.45 s to dictate a character.
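The character-identification stage can be sketched as dynamic programming over the homophone candidates of each recognized syllable under a bigram (first-order Markov) language model. The interfaces below (a homophones dict and a bigram_logp function) are hypothetical:

```python
def decode_characters(syllables, homophones, bigram_logp):
    """Most likely character sequence for a syllable sequence.

    homophones: dict mapping syllable -> list of candidate characters;
    bigram_logp(prev_char, char) -> float log probability."""
    prev = {c: 0.0 for c in homophones[syllables[0]]}
    back = []
    for syl in syllables[1:]:
        cur, bp = {}, {}
        for c in homophones[syl]:
            # best predecessor under the bigram model
            p, score = max(((q, s + bigram_logp(q, c))
                            for q, s in prev.items()),
                           key=lambda t: t[1])
            cur[c], bp[c] = score, p
        back.append(bp)
        prev = cur
    c = max(prev, key=prev.get)                 # backtrack from the best end
    out = [c]
    for bp in reversed(back):
        c = bp[c]
        out.append(c)
    return out[::-1]
```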

Journal ArticleDOI
TL;DR: One of the large vocabulary speech-recognition systems which is being investigated at AT&T Bell Laboratories is described, and the techniques used to provide the acoustic models of the sub-word units (both context-independent and context-dependent units) are discussed.

Proceedings ArticleDOI
01 Jul 1990
TL;DR: A rigorous performance criterion for training algorithms for probabilistic automata (PAs) and hidden Markov models (HMMs), used extensively for speech recognition, is introduced and the complexity of the training problem as a computational problem is analyzed.
Abstract: We introduce a rigorous performance criterion for training algorithms for probabilistic automata (PAs) and hidden Markov models (HMMs), used extensively for speech recognition, and analyze the complexity of the training problem as a computational problem. The PA training problem is the problem of approximating an arbitrary, unknown source distribution by distributions generated by a PA. We investigate the following question about this important, well-studied problem: Does there exist an efficient training algorithm such that the trained PAs provably converge to a model close to an optimum one with high confidence, after only a feasibly small set of training data? We model this problem in the framework of computational learning theory and analyze the sample as well as computational complexity. We show that the number of examples required for training PAs is moderate: except for some log factors, the number of examples is linear in the number of transition probabilities to be trained and a low-degree polynomial in the example length and in the parameters quantifying the accuracy and confidence. Computationally, however, training PAs is quite demanding: PAs with a fixed number of states are trainable in time polynomial in the accuracy and confidence parameters and the example length, but not in the alphabet size unless RP = NP. The latter result is shown via a strong non-approximability result for the single-string maximum-likelihood model problem for 2-state PAs, which is of independent interest.

Journal ArticleDOI
TL;DR: Frequency cells comprising a subset, or gate, of the spectral bins from fast Fourier transform (FFT) processing are identified with the states of the hidden Markov chain and analyzed in terms of physically meaningful quantities.
Abstract: Frequency cells comprising a subset, or gate, of the spectral bins from fast Fourier transform (FFT) processing are identified with the states of the hidden Markov chain. An additional zero state is included to allow for the possibility of track initiation and termination. Analytic expressions for the basic parameters of the hidden Markov model (HMM) are obtained in terms of physically meaningful quantities, and optimization of the HMM tracker is discussed. A measurement sequence based on a simple threshold detector forms the input to the tracker. The outputs of the HMM tracker are a discrete Viterbi track, a gate occupancy probability function, and a continuous mean cell occupancy track. The latter provides an estimate of the mean signal frequency as a function of time. The performance of the HMM tracker is evaluated for two sets of simulated data. The HMM tracker is compared to earlier, related trackers, and possible extensions are discussed.

Journal ArticleDOI
TL;DR: A hidden Markov model isolated word recogniser using full likelihood scoring for each word model can be treated as a recurrent ‘neural’ network and can use back-propagation of partial derivatives to hill-climb on a measure of discriminability between words.

Journal ArticleDOI
TL;DR: A new type of Markov model developed to account for the correlations between successive frames of a speech signal that performs better than the standard multivariate Gaussian HMM (hidden Markov models) when it is incorporated into a large-vocabulary isolated-word recognizer.
Abstract: The authors describe a new type of Markov model developed to account for the correlations between successive frames of a speech signal. The idea is to treat the sequence of frames as a nonstationary autoregressive process whose parameters are controlled by a hidden Markov chain. It is shown that this type of model performs better than the standard multivariate Gaussian HMM (hidden Markov model) when it is incorporated into a large-vocabulary isolated-word recognizer.
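A scalar-feature sketch of such an emission model: within a state, each frame is linearly predicted from the p previous frames with state-specific autoregressive coefficients, and the Gaussian residual supplies the likelihood. The parameterization below is an assumption for illustration; the paper works with vector-valued frames:

```python
import numpy as np

def ar_state_loglik(frames, ar_coefs, noise_var):
    """Log-likelihood of a frame sequence under one AR-HMM state.

    frames: (T,) scalar sequence; ar_coefs: (p,) coefficients, with
    ar_coefs[0] weighting frame t-1; noise_var: residual variance."""
    p = len(ar_coefs)
    ll = 0.0
    for t in range(p, len(frames)):
        pred = np.dot(ar_coefs, frames[t-1::-1][:p])  # frames t-1 ... t-p
        resid = frames[t] - pred
        ll += -0.5 * (np.log(2*np.pi*noise_var) + resid**2 / noise_var)
    return ll
```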

Proceedings ArticleDOI
03 Apr 1990
TL;DR: A hybrid method for continuous-speech recognition which combines hidden Markov models (HMMs) and a connectionist technique called connectionist Viterbi training (CVT) is presented and can be run iteratively and applied to large-vocabulary recognition tasks.
Abstract: A hybrid method for continuous-speech recognition which combines hidden Markov models (HMMs) and a connectionist technique called connectionist Viterbi training (CVT) is presented. CVT can be run iteratively and can be applied to large-vocabulary recognition tasks. Successful completion of training the connectionist component of the system, despite the large network size and volume of training data, depends largely on several measures taken to reduce learning time. The system is trained and tested on the TI/NBS speaker-independent continuous-digits database. Performance on test data for unknown-length strings is 98.5% word accuracy and 95.0% string accuracy. Several improvements to the current system are expected to increase these accuracies significantly.

Proceedings ArticleDOI
Esther Levin
03 Apr 1990
TL;DR: The network architecture proposed, the hidden control neural network (HCNN), combines nonlinear prediction of conventional neural networks with hidden Markov modeling and is trained using an algorithm that is based on back-propagation and segmentation algorithms for estimating the unknown control together with the network's parameters.
Abstract: Neural networks are used to model nonlinear and time-varying systems. The proposed model attempts to cope with time-varying systems by adding an undetermined control input which modulates the mapping implemented by the network. The network architecture proposed, the hidden control neural network (HCNN), combines nonlinear prediction of conventional neural networks with hidden Markov modeling. This network is trained using an algorithm that is based on back-propagation and segmentation algorithms for estimating the unknown control together with the network's parameters. The HCNN approach is evaluated on multispeaker recognition of connected digits, yielding a word accuracy of 99.3%.

Journal ArticleDOI
TL;DR: This paper introduces Hidden Markov Modelling techniques, analyzes the reason for their success, and describes some improvements to the standard HMM used in SPHINX.

PatentDOI
TL;DR: Speaker-independent recognition of small vocabularies, spoken over the long-distance telephone network, is achieved using two types of models, one type for defined vocabulary words (e.g., collect, calling-card, person, third-number, and operator) and one type for extraneous input, which ranges from non-speech sounds to groups of non-vocabulary words (e.g., "I want to make a collect call please").
Abstract: Speaker-independent recognition of small vocabularies, spoken over the long-distance telephone network, is achieved using two types of models, one type for defined vocabulary words (e.g., collect, calling-card, person, third-number, and operator), and one type for extraneous input, which ranges from non-speech sounds to groups of non-vocabulary words (e.g., "I want to make a collect call please"). For this type of keyword spotting, modifications are made to a connected word speech recognition algorithm based on state-transitional (hidden Markov) models which allow it to recognize words from a pre-defined vocabulary list spoken in an unconstrained fashion. Statistical models of both the actual vocabulary words and the extraneous speech and background noises are created. A syntax-driven connected word recognition system is then used to find the best sequence of extraneous input and vocabulary word models for matching the actual input speech.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: An application of discriminative training methods, maximum mutual information (MMI) training, to large-vocabulary continuous speech recognition, and an algorithm is developed for efficient MMI estimation of HMM parameters, including exponential codebook coefficients.
Abstract: An application of discriminative training methods, maximum mutual information (MMI) training, to large-vocabulary continuous speech recognition is described. An algorithm is developed for efficient MMI estimation of HMM parameters, including exponential codebook coefficients, which cannot be estimated using maximum likelihood (ML) methods. Continuous speech recognition performance of the BYBLOS system on the DARPA 1000-word resource management speech corpus is presented.
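The MMI criterion being maximized can be stated compactly: for each training utterance, the log posterior of the correct model, i.e. the joint log score of the correct class minus the log sum over all competing classes. A sketch of the objective only (the paper's contribution is an efficient estimation algorithm, not reproduced here; the names below are illustrative):

```python
import numpy as np

def mmi_objective(log_liks, log_priors, correct):
    """MMI criterion for a batch of utterances.

    log_liks: (R, W) per-utterance, per-class acoustic log-likelihoods;
    log_priors: (W,) class log priors; correct: (R,) true class indices.
    Returns sum over utterances of log P(correct class | O)."""
    joint = log_liks + log_priors                 # log P(O|w) + log P(w)
    denom = np.logaddexp.reduce(joint, axis=1)    # log sum over classes
    num = joint[np.arange(len(correct)), correct]
    return np.sum(num - denom)
```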

Proceedings ArticleDOI
03 Apr 1990
TL;DR: An architecture for a neural network that implements a hidden Markov model (HMM) that suggests integrating signal preprocessing (such as vector quantization) with the classifier and a probabilistic interpretation is given for a network with negative, and even complex-valued, parameters.
Abstract: An architecture for a neural network that implements a hidden Markov model (HMM) is presented. This HMM net suggests integrating signal preprocessing (such as vector quantization) with the classifier. A minimum mean-squared-error training criterion for the HMM/neural net is presented and compared to maximum-likelihood and maximum-mutual-information criteria. The HMM forward-backward algorithm is shown to be the same as the neural net backpropagation algorithm. The implications of probability constraints on the HMM parameters are discussed. Relaxing these constraints allows negative probabilities, equivalent to inhibitory connections. A probabilistic interpretation is given for a network with negative, and even complex-valued, parameters.