
Showing papers on "Hidden Markov model" published in 1983


Journal ArticleDOI
TL;DR: This paper presents an approach to speaker-independent, isolated word recognition in which the well-known techniques of vector quantization and hidden Markov modeling are combined with a linear predictive coding analysis front end in the framework of a standard statistical pattern recognition model.
Abstract: In this paper we present an approach to speaker-independent, isolated word recognition in which the well-known techniques of vector quantization and hidden Markov modeling are combined with a linear predictive coding analysis front end. This is done in the framework of a standard statistical pattern recognition model. Both the vector quantizer and the hidden Markov models need to be trained for the vocabulary being recognized. Such training results in a distinct hidden Markov model for each word of the vocabulary. Classification consists of computing the probability of generating the test word with each word model and choosing the word model that gives the highest probability. There are several factors, in both the vector quantizer and the hidden Markov modeling, that affect the performance of the overall word recognition system, including the size of the vector quantizer, the structure of the hidden Markov model, the ways of handling insufficient training data, etc. The effects, on recognition accuracy, of many of these factors are discussed in this paper. The entire recognizer (training and testing) has been evaluated on a 10-word digits vocabulary. For training, a set of 100 talkers spoke each of the digits one time. For testing, an independent set of 100 tokens of each of the digits was obtained. The overall recognition accuracy was found to be 96.5 percent for the 100-talker test set. These results are comparable to those obtained in earlier work, using a dynamic time-warping recognition algorithm with multiple templates per digit. It is also shown that the computation and storage requirements of the new recognizer were an order of magnitude less than that required for a conventional pattern recognition system using linear prediction with dynamic time warping.
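The classification rule described above (score the test word against every word model and pick the most probable one) can be sketched with a log-domain forward algorithm. The function names and toy model layout below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def forward_log_likelihood(obs, log_pi, log_A, log_B):
    """Log-likelihood of a discrete observation sequence under one HMM,
    computed with the forward algorithm in the log domain."""
    alpha = log_pi + log_B[:, obs[0]]          # initialize with first symbol
    for o in obs[1:]:
        # alpha[j] = logsum_i(alpha[i] + log_A[i, j]) + log_B[j, o]
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return float(np.logaddexp.reduce(alpha))

def classify(obs, word_models):
    """Choose the word whose HMM assigns the highest likelihood."""
    return max(word_models, key=lambda w: forward_log_likelihood(obs, *word_models[w]))
```

Each vocabulary word contributes one entry to `word_models`, mirroring the per-word training described in the abstract.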

337 citations


Journal ArticleDOI
G. Langdon, Jorma Rissanen
TL;DR: A one-pass compression scheme which presumes no statistical properties of the data being compressed, and adaptively selects a subset of first-order Markov contexts, based on an estimate of the candidate context's popularity.
Abstract: We describe a one-pass compression scheme which presumes no statistical properties of the data being compressed. The model structure adaptively selects a subset of first-order Markov contexts, based on an estimate of the candidate context's popularity. The probability distributions for the unselected (lumped) first-order contexts are made the same, reducing cost over a full first-order Markov model. Symbol repetitions are handled in special second-order Markov contexts. The statistics for each symbol are adaptively determined by an extension of earlier work.
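As a rough illustration of the context-selection idea (not the authors' one-pass adaptive coding scheme), one can count successor symbols per first-order context and lump rare contexts into a single shared distribution. The threshold and names here are invented for the sketch:

```python
from collections import Counter, defaultdict

def build_contexts(data, threshold=8):
    """Count successor symbols per first-order context; contexts seen fewer
    than `threshold` times are lumped into one shared distribution."""
    per_ctx = defaultdict(Counter)
    for prev, cur in zip(data, data[1:]):
        per_ctx[prev][cur] += 1
    selected, lumped = {}, Counter()
    for ctx, counts in per_ctx.items():
        if sum(counts.values()) >= threshold:
            selected[ctx] = counts          # popular context keeps its own statistics
        else:
            lumped.update(counts)           # rare contexts share one model
    return selected, lumped
```

In the paper the selection is adaptive during a single pass; this batch version only conveys the popularity criterion.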

37 citations


Proceedings ArticleDOI
01 Apr 1983
TL;DR: The experiments reported here are the first in which a direct comparison is made between two conceptually different methods of treating the non-stationarity problem in speech recognition by implicitly dividing the speech signal into quasi-stationary intervals.
Abstract: A method for speaker independent isolated digit recognition based on modeling entire words as discrete probabilistic functions of a Markov chain is described. Training is a three part process comprising conventional methods of linear prediction coding (LPC) and vector quantization of the LPCs followed by an algorithm for estimating the parameters of a hidden Markov process. Recognition utilizes linear prediction and vector quantization steps prior to maximum likelihood classification based on the Viterbi algorithm. Vector quantization is performed by a K-means algorithm which finds a codebook of 64 prototypical vectors that minimize the distortion measure (Itakura distance) over the training set. After training based on a 1,000 token set, recognition experiments were conducted on a separate 1,000 token test set obtained from the same talkers. In this test a 3.5% error rate was observed which is comparable to that measured in an identical test of an LPC/DTW (dynamic time warping) system. The computational demand for recognition under the new system is reduced by a factor of approximately 10 in both time and memory compared to that of the LPC/DTW system. It is also of interest that the classification errors made by the two systems are virtually disjoint; thus the possibility exists to obtain error rates near 1% by a combination of the methods. In describing our experiments we discuss several issues of theoretical importance, namely: 1) Alternatives to the Baum-Welch algorithm for model parameter estimation, e.g., Lagrangian techniques; 2) Model combining techniques by means of a bipartite graph matching algorithm providing improved model stability; 3) Methods for treating the finite training data problem by modifications to both the Baum-Welch algorithm and Lagrangian techniques; and 4) Use of non-ergodic Markov chains for isolated word recognition. 
We note that the experiments reported here are the first in which a direct comparison is made between two conceptually different (i.e. parametric and non-parametric) methods of treating the non-stationarity problem in speech recognition by implicitly dividing the speech signal into quasi-stationary intervals.
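The maximum-likelihood classification step rests on the Viterbi algorithm; a minimal log-domain version for discrete HMMs (a toy interface, not the paper's implementation) might look like:

```python
import numpy as np

def viterbi_log(obs, log_pi, log_A, log_B):
    """Most likely state path, and its log-probability, for a discrete HMM."""
    N, T = log_A.shape[0], len(obs)
    delta = log_pi + log_B[:, obs[0]]
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A      # scores[i, j]: best path ending i -> j
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(N)] + log_B[:, obs[t]]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):            # backtrack through the pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(np.max(delta))
```

A left-to-right (non-ergodic) transition matrix, as advocated in point 4 of the abstract, simply constrains `log_A` so that backward transitions carry negligible probability.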

36 citations


Proceedings ArticleDOI
01 Apr 1983
TL;DR: It is demonstrated that by using Bayesian techniques, prior knowledge derived from speaker-independent data can be combined with speaker-dependent training data to improve system performance.
Abstract: In order to achieve state-of-the-art performance in a speaker-dependent speech recognition task, it is necessary to collect a large number of acoustic data samples during the training process. Providing these samples to the system can be a long and tedious process for users. One way to attack this problem is to make use of extra information from a data bank representing a large population of speakers. In this paper we demonstrate that by using Bayesian techniques, prior knowledge derived from speaker-independent data can be combined with speaker-dependent training data to improve system performance.
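One simple way to realize this combination is Dirichlet shrinkage: turn the speaker-independent counts into pseudo-counts and add the speaker-dependent counts. This is a generic sketch, not necessarily the paper's formulation, and `tau` is an invented prior-weight knob:

```python
import numpy as np

def combine_counts(si_counts, sd_counts, tau=5.0):
    """Posterior-mean estimate of a discrete distribution: a Dirichlet prior
    built from speaker-independent (SI) counts, updated with the sparse
    speaker-dependent (SD) counts. `tau` weights the prior."""
    si = np.asarray(si_counts, float)
    sd = np.asarray(sd_counts, float)
    prior = tau * si / si.sum()        # pseudo-counts from the speaker pool
    post = prior + sd                  # posterior Dirichlet parameters
    return post / post.sum()           # posterior mean
```

With no speaker-dependent data the estimate falls back on the pooled statistics; as speaker-dependent counts accumulate, they dominate.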

35 citations


Journal ArticleDOI
A. Nadas
TL;DR: In this paper, the authors consider the problem of estimating the parameters of the distribution of a probabilistic function of a Markov chain (a "hidden Markov model" or "Markov source model").
Abstract: The objects listed in the title have proven to be useful and practical modeling tools in continuous speech recognition work and elsewhere. Nevertheless, there are natural and simple situations in which the forward-backward algorithm will be inadequate for its intended purpose of finding useful maximum likelihood estimates of the parameters of the distribution of a probabilistic function of a Markov chain (a "hidden Markov model" or "Markov source model"). We observe some difficulties that arise in the case of common (e.g., Gaussian) families of conditional distributions for the observables. These difficulties are due not to the algorithm itself, but to modeling assumptions which introduce singularities into the likelihood function. We also comment on the fact that the parameters of a hidden Markov model cannot, in general, be determined, even if the distribution of the observables is completely known. We close with remarks about some effects of these modeling and estimating difficulties on practical speech recognition, and about the role of initial statistics.
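The singularity Nadas points to is easy to reproduce numerically: center a Gaussian observation density on a single training point and shrink its variance, and the log-likelihood grows without bound. This is a toy demonstration of the general phenomenon, not the paper's own example:

```python
import numpy as np

def gaussian_loglik(x, mu, sigma):
    """Total log density of N(mu, sigma^2) over the points in x."""
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                        - (x - mu) ** 2 / (2 * sigma**2)))

x = np.array([0.0])                      # a single training observation
# With mu fixed at the data point, shrinking sigma inflates the likelihood:
lls = [gaussian_loglik(x, mu=0.0, sigma=s) for s in (1.0, 0.1, 1e-3)]
```

No maximum-likelihood estimate exists at such points, which is why the difficulty lies in the modeling assumptions rather than in the forward-backward algorithm itself.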

17 citations


Journal ArticleDOI
TL;DR: The authors extend their investigations to the recognition of isolated words from a medium-size vocabulary (129 words), as used in the Bell Laboratories airline reservation and information system, and find that recognition accuracy is indeed a function of the HMM parameters and that a vector quantizer which uses energy information gives better performance.
Abstract: Recent work at Bell Laboratories has shown how the theories of LPC Vector Quantization (VQ) and hidden Markov modeling (HMM) can be applied to the recognition of isolated word vocabularies. Our first experiments with HMM based recognizers were restricted to a vocabulary of the ten digits. For this simple vocabulary we found that a high performance recognizer (word accuracy on the order of 97%) could be implemented, and that the performance was, for the most part, insensitive to parameters of both the Markov model and the vector quantizer. In this talk we extend our investigations to the recognition of isolated words from a medium-size vocabulary (129 words), as used in the Bell Laboratories airline reservation and information system. For this moderately complex vocabulary we have found that recognition accuracy is indeed a function of the HMM parameters (i.e., the number of states and the number of symbols in the vector quantizer). We have also found that a vector quantizer which uses energy information gives better performance than a conventional LPC shape vector quantizer of the same size (i.e., number of codebook entries).
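The vector quantization step itself is nearest-neighbor lookup in a codebook. The sketch below uses Euclidean distance as a stand-in for the LPC-specific distortion measures (such as the Itakura distance) and ignores the energy augmentation the abstract describes:

```python
import numpy as np

def quantize(features, codebook):
    """Index of the nearest codebook entry for each feature vector.
    Euclidean distance stands in for the LPC distortion measure here."""
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(d, axis=1)
```

Appending a (suitably scaled) frame-energy term to each feature vector, as the paper proposes, requires no change to this lookup, only a larger codebook dimension.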

7 citations


Book ChapterDOI
S. Kusuoka
01 Jan 1983

4 citations


Patent
12 Oct 1983
TL;DR: A speech recognizer includes a plurality of stored constrained hidden Markov model reference templates and a set of stored signals representative of prescribed acoustic features of the said plurality of reference patterns.
Abstract: A speech recognizer includes a plurality of stored constrained hidden Markov model reference templates and a set of stored signals representative of prescribed acoustic features of the said plurality of reference patterns. The Markov model template includes a set of N state signals. The number of states is preselected to be independent of the reference pattern acoustic features and preferably substantially smaller than the number of acoustic feature frames of the reference patterns. An input utterance is analyzed to form a sequence of said prescribed feature signals representative of the utterance. The utterance representative prescribed feature signal sequence is combined with the N state constrained hidden Markov model template signals to form a signal representative of the probability of the utterance being each reference pattern. The input speech pattern is identified as one of the reference patterns responsive to the probability representative signals.