
Showing papers on "Hidden Markov model published in 1987"


ReportDOI
01 Jan 1987
TL;DR: This thesis is primarily concerned with the use of hidden Markov models to model sequences of feature vectors which lie in a continuous space such as R^N, and explores the trade-off between packing a lot of information into such sequences and being able to model them accurately.
Abstract: This thesis examines the acoustic-modeling problem in automatic speech recognition from an information-theoretic point of view. This problem is to design a speech-recognition system which can extract from the speech waveform as much information as possible about the corresponding word sequence. The information extraction process is broken down into two steps: a signal processing step which converts a speech waveform into a sequence of information-bearing acoustic feature vectors, and a step which models such a sequence. This thesis is primarily concerned with the use of hidden Markov models to model sequences of feature vectors which lie in a continuous space such as R^N. It explores the trade-off between packing a lot of information into such sequences and being able to model them accurately. The difficulty of developing accurate models of continuous parameter sequences is addressed by investigating a method of parameter estimation which is specifically designed to cope with inaccurate modeling assumptions.

266 citations
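
To make the continuous-density modelling concrete, here is a minimal numpy sketch of the kind of Gaussian emission density such an HMM assigns to a feature vector in R^N; the dimensions and parameter values are illustrative, not taken from the thesis.

```python
import numpy as np

def gaussian_log_density(x, mean, cov):
    """Log-density of one acoustic feature vector under a Gaussian
    emission model attached to a single HMM state."""
    n = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)          # log|cov|, numerically stable
    mahal = diff @ np.linalg.solve(cov, diff)   # Mahalanobis distance term
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + mahal)

# Score one 12-dimensional frame (e.g. cepstral coefficients) against a state.
rng = np.random.default_rng(0)
x = rng.standard_normal(12)
print(gaussian_log_density(x, np.zeros(12), np.eye(12)))
```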


Proceedings ArticleDOI
06 Apr 1987
TL;DR: BYBLOS, as discussed by the authors, is the BBN continuous speech recognition system; it integrates acoustic, phonetic, lexical, and linguistic knowledge sources to achieve high recognition performance using hidden Markov models (HMMs).
Abstract: In this paper, we describe BYBLOS, the BBN continuous speech recognition system. The system, designed for large vocabulary applications, integrates acoustic, phonetic, lexical, and linguistic knowledge sources to achieve high recognition performance. The basic approach, as described in previous papers [1, 2], makes extensive use of robust context-dependent models of phonetic coarticulation using Hidden Markov Models (HMM). We describe the components of the BYBLOS system, including: signal processing frontend, dictionary, phonetic model training system, word model generator, grammar and decoder. In recognition experiments, we demonstrate consistently high word recognition performance on continuous speech across: speakers, task domains, and grammars of varying complexity. In speaker-dependent mode, where 15 minutes of speech is required for training to a speaker, 98.5% word accuracy has been achieved in continuous speech for a 350-word task, using grammars with perplexity ranging from 30 to 60. With only 15 seconds of training speech we demonstrate performance of 97% using a grammar.

175 citations


Proceedings ArticleDOI
C. Wellekens1
06 Apr 1987
TL;DR: Hidden Markov models are generalized by defining a new emission probability which takes the correlation between successive feature vectors into account, and estimation formulas for iterative learning under both Viterbi and maximum-likelihood criteria are presented.
Abstract: The hidden Markov models are generalized by defining a new emission probability which takes the correlation between successive feature vectors into account. Estimation formulas for iterative learning under both the Viterbi and maximum-likelihood criteria are presented.

148 citations
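
One simple way to realize an emission probability that conditions on the previous frame is a first-order (autoregressive) Gaussian, sketched below; this illustrates the idea of inter-frame correlation, not Wellekens' exact parameterization.

```python
import numpy as np

def conditional_emission_logprob(x_t, x_prev, mean, A, cov):
    """Log-probability of frame x_t given the previous frame x_prev:
    the emission mean is shifted by a linear regression on x_prev, so
    successive feature vectors are no longer treated as independent."""
    cond_mean = mean + A @ (x_prev - mean)      # A: per-state regression matrix
    diff = x_t - cond_mean
    n = x_t.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    mahal = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + mahal)
```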


Proceedings ArticleDOI
06 Apr 1987
TL;DR: A new way of using vector quantization is proposed for improving recognition performance of a 60,000-word-vocabulary, speaker-trained, isolated word recognizer that takes a phonemic Markov model approach to speech recognition.
Abstract: This paper proposes a new way of using vector quantization for improving recognition performance for a 60,000 word vocabulary speaker-trained isolated word recognizer using a phonemic Markov model approach to speech recognition. We show that we can effectively increase the codebook size by dividing the feature vector into two vectors of lower dimensionality, and then quantizing and training each vector separately. For a small codebook size, integration of the results of the two parameter vectors provides significant improvement in recognition performance as compared to the quantizing and training of the entire feature set together. Even for a codebook size as small as 64, the results obtained when using the new quantization procedure are quite close to those obtained when using Gaussian distribution of the parameter vectors.

89 citations
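
The split-codebook idea can be sketched in a few lines: quantize the two halves of each feature vector against separate codebooks, so a pair of small codebooks acts like one much larger one. The sizes and dimensions below are illustrative, not the paper's.

```python
import numpy as np

def quantize(v, codebook):
    """Index of the nearest codeword under Euclidean distance."""
    return int(np.argmin(np.linalg.norm(codebook - v, axis=1)))

# Two 64-entry codebooks over 6-dimensional half-vectors behave like a
# single product codebook with 64 * 64 = 4096 effective entries.
rng = np.random.default_rng(1)
codebook_a = rng.standard_normal((64, 6))
codebook_b = rng.standard_normal((64, 6))
frame = rng.standard_normal(12)
label_pair = (quantize(frame[:6], codebook_a), quantize(frame[6:], codebook_b))
print(label_pair)
```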


Proceedings ArticleDOI
06 Apr 1987
TL;DR: The results indicate that if sufficient training material is available, the best performance is obtained with the Ferguson model, but that with smaller training sets Poisson HSMMs or type B ESHMMs are more robust models.
Abstract: This paper presents an experimental evaluation of two extensions to the standard HMM (hidden Markov model) formalism: hidden semi-Markov models (HSMMs) and expanded-state HMMs (ESHMMs). These extensions permit improved duration modelling, and experimental results are presented which show that they can consistently lead to improved performance. The results indicate that if sufficient training material is available, the best performance is obtained with the Ferguson model, but that with smaller training sets Poisson HSMMs or type B ESHMMs are more robust models.

88 citations
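
For reference, a Poisson HSMM replaces the implicit geometric state duration of a standard HMM with an explicit duration distribution. A minimal sketch follows, using the common convention that durations start at one frame; the paper's exact parameterization may differ.

```python
import numpy as np

def poisson_duration_logprob(d, lam):
    """Log-probability of occupying a state for d >= 1 frames under a
    (shifted) Poisson duration model with rate parameter lam."""
    k = d - 1                                    # shift so d = 1 maps to k = 0
    log_k_factorial = sum(np.log(i) for i in range(1, k + 1))
    return k * np.log(lam) - lam - log_k_factorial

print(poisson_duration_logprob(5, 4.0))          # e.g. a 5-frame occupancy
```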


Proceedings ArticleDOI
01 Apr 1987
TL;DR: A new algorithm is introduced that transforms hidden Markov models of speech derived from one "prototype" speaker so that they model the speech of a new speaker in the form of a probabilistic spectral mapping.
Abstract: This paper deals with rapid speaker adaptation for speech recognition. We introduce a new algorithm that transforms hidden Markov models of speech derived from one "prototype" speaker so that they model the speech of a new speaker. Speaker normalization is accomplished by a probabilistic spectral mapping from one speaker to another. For a 350-word task with a grammar, and using only 15 seconds of speech for normalization, the recognition accuracy is 97% averaged over 6 speakers. This accuracy would normally require over 5 minutes of speaker-dependent training. We derive the probabilistic spectral transformation of HMMs, describe an algorithm to estimate the transformation, and present recognition results.

76 citations
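
The core of such a transformation can be sketched as a stochastic matrix applied to the prototype speaker's discrete emission distributions; the paper's estimation procedure is more involved, and the state and codebook sizes here are made up.

```python
import numpy as np

# b_proto[j, k]: P(spectral code k | state j) for the prototype speaker.
# T[k, l]: estimated P(new speaker produces code l | prototype code k).
# Applying the row-stochastic map T adapts every state's emission distribution.
rng = np.random.default_rng(2)
b_proto = rng.dirichlet(np.ones(256), size=10)   # 10 states, 256 codes
T = rng.dirichlet(np.ones(256), size=256)        # probabilistic spectral map
b_new = b_proto @ T                              # adapted emission probs
assert np.allclose(b_new.sum(axis=1), 1.0)       # rows remain distributions
```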


Proceedings ArticleDOI
06 Apr 1987
TL;DR: An effort was made to make a Hidden Markov Model Isolated Word Recognizer (IWR) tolerant to speech changes caused by speaker stress.
Abstract: Most current speech recognition systems are sensitive to variations in speaker style. The following is the result of an effort to make a Hidden Markov Model (HMM) Isolated Word Recognizer (IWR) tolerant to speech changes caused by speaker stress. More than an order-of-magnitude reduction of the error rate was achieved for a 105-word simulated-stress database, and a 0% error rate was achieved for the TI 20 isolated-word database.

55 citations


Journal ArticleDOI
TL;DR: An endpoint detection algorithm is presented which is based on hidden Markov model (HMM) technology and explicitly determines a set of speech endpoints based on the output of a Viterbi decoding algorithm.

43 citations
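
A two-state (silence/speech) Viterbi decode over frame-level log-likelihoods is the essential mechanism here; below is a generic sketch, which does not reproduce the paper's model topology or features.

```python
import numpy as np

def viterbi(log_a, log_b):
    """Most likely state sequence.  log_a: (S, S) transition log-probs;
    log_b: (T, S) per-frame emission log-likelihoods.  A flat initial
    distribution is assumed and therefore omitted."""
    T, S = log_b.shape
    delta = np.empty((T, S))
    psi = np.zeros((T, S), dtype=int)
    delta[0] = log_b[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_a   # scores[i, j]: i -> j
        psi[t] = scores.argmax(axis=0)           # best predecessor per state
        delta[t] = scores.max(axis=0) + log_b[t]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                # backtrack
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# With state 0 = silence and state 1 = speech, the detected endpoints are
# simply the first and last frames decoded as state 1.
```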


Proceedings ArticleDOI
01 Apr 1987
TL;DR: An approach to automatic speech recognition is described which attempts to link together ideas from pattern recognition such as dynamic time warping and hidden Markov modeling, with ideas from linguistically motivated approaches.
Abstract: An approach to automatic speech recognition is described which attempts to link together ideas from pattern recognition, such as dynamic time warping and hidden Markov modeling, with ideas from linguistically motivated approaches. In this approach, the basic sub-word units are defined acoustically, but not necessarily phonetically. An algorithm was developed which automatically decomposed speech into multiple sub-word segments, based solely upon strict acoustic criteria, without any reference to linguistic content. By repeating this procedure on a large corpus of speech data we obtained an extensive pool of unlabeled sub-word speech segments. Then, using well-defined clustering techniques, a small set of representative acoustic sub-word units (i.e., an inventory of units) was created. This process is fast, easy to use, and requires no human intervention. The interpretation of these sub-word units, in a linguistic sense, in the context of word decoding is an important issue which must be addressed for them to be useful in a large vocabulary system. We have not yet addressed this issue; instead, a couple of simple experiments were performed to determine whether these acoustic sub-word units had any potential value for speech recognition. For these experiments we used a connected digits database from a single female talker. A 25-unit codebook of acoustic segments was created from about 1600 segments drawn from 100 connected digit strings. A simple isolated digit recognition system, designed using the statistics of the codewords in the acoustic sub-word unit codebook, had a recognition accuracy of 100%. In another experiment, a connected digit recognition system was created with representative digit templates formed by concatenating the sub-word units in an appropriate manner. The system had a string recognition accuracy of 96%.

41 citations


Proceedings ArticleDOI
01 Apr 1987
TL;DR: This paper investigates the use of a fuzzy vector quantizer (FVQ) as the front end for a hidden Markov modeling (HMM) scheme for isolated word recognition and sees that the FVQ front end significantly reduces the amount of data needed to train the HMM algorithm.
Abstract: This paper investigates the use of a fuzzy vector quantizer (FVQ) as the front end for a hidden Markov modeling (HMM) scheme for isolated word recognition. Unlike a standard vector quantizer, which generates the index of the single codeword that best matches an input vector, an FVQ generates a vector whose components represent the degree to which each codeword matches the input vector. The HMM algorithm is generalized to accommodate the FVQ output. This approach is tested on a database of isolated words from a single male speaker. It is seen that the FVQ front end significantly reduces the amount of data needed to train the HMM algorithm.

36 citations
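
The fuzzy front end can be sketched with the standard fuzzy-c-means membership weighting, in which each frame yields a full membership vector rather than a single index; the fuzziness exponent m is a free parameter, and the paper's exact weighting may differ.

```python
import numpy as np

def fvq_memberships(x, codebook, m=2.0, eps=1e-12):
    """Fuzzy-VQ output for one frame: a membership per codeword that
    decays with squared distance, normalized to sum to one."""
    d2 = np.sum((codebook - x) ** 2, axis=1) + eps
    w = d2 ** (-1.0 / (m - 1.0))
    return w / w.sum()

rng = np.random.default_rng(3)
codebook = rng.standard_normal((64, 10))
u = fvq_memberships(rng.standard_normal(10), codebook)
print(u.sum(), u.argmax())   # memberships sum to 1; argmax = nearest codeword
```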


PatentDOI
TL;DR: In this paper, a speech recognition system and technique of the acoustic/phonetic type is made speaker-independent and capable of continuous speech recognition during fluent discourse by a combination of techniques which include, inter alia, using a so-called continuously-variable-duration hidden Markov model in identifying word segments, and developing proposed phonetic sequences by a durationally-responsive recursion before any lexical access is attempted.
Abstract: A speech recognition system and technique of the acoustic/phonetic type is made speaker-independent and capable of continuous speech recognition during fluent discourse by a combination of techniques which include, inter alia, using a so-called continuously-variable-duration hidden Markov model in identifying word segments, i.e., phonetic units, and developing proposed phonetic sequences by a durationally-responsive recursion before any lexical access is attempted. Lexical access is facilitated by the phonetic transcriptions provided by the durationally-responsive recursion; and the resulting array of word candidates facilitates the subsequent alignment of the word candidates with the acoustic feature signals. A separate step is used for aligning the members of the candidate word arrays with the acoustic feature signals representative of the corresponding portion of the utterance. Any residual word selection ambiguities are then more readily resolved, regardless of the ultimate sentence selection technique employed.

Proceedings ArticleDOI
A.-M. Derouault1
06 Apr 1987
TL;DR: This paper shows that both an analysis of the errors made by the recognizer and linguistic facts about phonetic context influence suggest a method for choosing context-dependent models, which limits the growth in the number of phoneme models while still accounting for the most important coarticulation effects.
Abstract: One approach to large vocabulary speech recognition is to build phonetic Markov models and to concatenate them to obtain word models. In previous work, we designed a recognizer based on 40 phonetic Markov machines, which accepts a 10,000-word vocabulary [3], and recently a 200,000-word vocabulary [5]. Since there is one machine per phoneme, these models obviously do not account for coarticulatory effects, which may lead to recognition errors. In this paper, we improve the phonetic models by using general principles about coarticulation effects on automatic phoneme recognition. We show that both an analysis of the errors made by the recognizer and linguistic facts about phonetic context influence suggest a method for choosing context-dependent models. This method limits the growth in the number of phoneme models while still accounting for the most important coarticulation effects. We present our experiments with a system applying these principles to a set of models for French. With this new system including context-dependent machines, the phoneme recognition rate goes from 82.2% to 85.3%, and the word error rate with a 10,000-word dictionary is decreased from 11.2% to 9.8%.

Proceedings ArticleDOI
06 Apr 1987
TL;DR: A new iterative approach for hidden Markov modeling of information sources which aims at minimizing the discrimination information (or the cross-entropy) between the source and the model is proposed.
Abstract: A new iterative approach for hidden Markov modeling of information sources, which aims at minimizing the discrimination information (or the cross-entropy) between the source and the model, is proposed. This approach does not require the commonly used assumption that the source to be modeled is a hidden Markov process. The algorithm is started from the model estimated by the traditional maximum likelihood (ML) approach and alternately decreases the discrimination information over all probability distributions of the source which agree with the given measurements and all hidden Markov models. The proposed procedure generalizes the Baum algorithm for ML hidden Markov modeling. The procedure is shown to be a descent algorithm for the discrimination information measure and its local convergence is proved.
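
In symbols, the quantity being descended is the discrimination information between the source and the model; a schematic rendering of the alternating minimization follows (notation ours, not the paper's).

```latex
% Discrimination information between a source density p and an HMM p_lambda:
D(P \,\|\, P_\lambda) = \int p(y)\,\log\frac{p(y)}{p_\lambda(y)}\,dy,
\qquad
\hat{\lambda} = \arg\min_{\lambda}\; \min_{P \in \mathcal{P}} D(P \,\|\, P_\lambda)
% where \mathcal{P} is the set of source distributions consistent with the
% given measurements; the algorithm alternates the two minimizations,
% starting from the maximum-likelihood model.
```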

Journal ArticleDOI
TL;DR: The trade-off between packing information into sequences of feature vectors and being able to model them accurately is explored, and a method of parameter estimation which is designed to cope with inaccurate modeling assumptions is investigated.

Journal ArticleDOI
Lawrence R. Rabiner1, Jay G. Wilpon1
TL;DR: Algorithms based on both template matching (via dynamic time warping (DTW) procedures) and hidden Markov models (HMMs) have been developed which yield high accuracy on several standard vocabularies, including the 10 digits and the set of 26 letters of the English alphabet.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: A probabilistic approach to Chinese four-tone recognition using the well-known technique of hidden Markov models, trained with Baum's forward-backward algorithm on artificial (simulated) training sequences.
Abstract: In this paper, we present a probabilistic approach to Chinese four-tone recognition in which the well-known technique of the hidden Markov model is used. For each tone, a distinct hidden Markov model (HMM) is produced using Baum's forward-backward algorithm on artificial (simulated) training sequences. Classification is made by computing the probability of generating the test utterance with each tone model and choosing as the recognized tone the one corresponding to the model with the highest probability score. The recognition accuracies were found to be 98% for 35 Chinese phonetic alphabets pronounced by standard Chinese speakers and 96% for Chinese digits pronounced by our research group.
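
The scoring step described above is the classic scaled forward pass; here is a compact numpy sketch for a discrete-observation HMM. Classification then simply picks the tone model with the highest score.

```python
import numpy as np

def forward_loglik(pi, A, B, obs):
    """log P(obs | model) for a discrete HMM via the scaled forward pass.
    pi: (S,) initial probs; A: (S, S) transitions; B: (S, K) emission
    probs; obs: sequence of symbol indices."""
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    alpha /= c                                   # scale to avoid underflow
    loglik = np.log(c)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()
        alpha /= c
        loglik += np.log(c)
    return loglik

# Recognition over four tone models (pi, A, B): choose the arg-max of
# forward_loglik(pi, A, B, obs) across the models.
```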

Book ChapterDOI
01 Jan 1987
TL;DR: A unified treatment of the labeling and learning problems for the so-called hidden Markov chain model currently used in many speech recognition systems and the hidden Pickard random field image model, formulated in terms of Baum's classical forward-backward recurrence formulae.
Abstract: The paper outlines a unified treatment of the labeling and learning problems for the so-called hidden Markov chain model currently used in many speech recognition systems and the hidden Pickard random field image model (a small but interesting, causal sub-class of hidden Markov random field models). In both cases, labeling techniques are formulated in terms of Baum’s classical forward-backward recurrence formulae, and learning is accomplished by a specialization of the EM algorithm for mixture identification. Experimental results demonstrate that the approach is subjectively relevant to the image restoration and segmentation problems.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: A two-stage isolated word speech recognition system that uses a Hidden Markov Model (HMM) recognizer in the first stage and a discriminant analysis system in the second stage, reducing the overall error rate by more than a factor of two.
Abstract: This paper describes a two-stage isolated word speech recognition system that uses a Hidden Markov Model (HMM) recognizer in the first stage and a discriminant analysis system in the second stage. During recognition, when the first-stage recognizer is unable to clearly differentiate between acoustically similar words such as "go" and "no", the second-stage discriminator is used. The second-stage system focuses on those parts of the unknown token which are most effective at discriminating the confused words. The system was tested on a 35-word, 10,710-token stressed-speech isolated-word database created at Lincoln Laboratory. Adding the second-stage discriminating system produced the best results to date on this database, reducing the overall error rate by more than a factor of two.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: This paper presents a study of talker-stress-induced intraword variability, and an algorithm that compensates for the systematic changes observed, based on hidden Markov models trained with speech tokens in various talking styles.
Abstract: Automatic speech recognition algorithms generally rely on the assumption that, for the distance measure used, intraword variabilities are smaller than interword variabilities, so that appropriate separation in the measurement space is possible. As evidenced by degradation of recognition performance, the validity of such an assumption decreases from simple tasks to complex tasks, from cooperative talkers to casual talkers, and from laboratory talking environments to practical talking environments. This paper presents a study of talker-stress-induced intraword variability, and an algorithm that compensates for the systematic changes observed. The study is based on hidden Markov models trained with speech tokens in various talking styles. The talking styles include normal speech, fast speech, loud speech, soft speech, and talking with noise injected through earphones; the styles are designed to simulate speech produced under real stressful conditions. Cepstral coefficients are used as the parameters in the hidden Markov models. The stress compensation algorithm compensates for the variations in the cepstral coefficients in a hypothesis-driven manner. The functional form of the compensation is shown to correspond to the equalization of spectral tilts. Preliminary experiments indicate that a substantial reduction in recognition error rate can be achieved with relatively little increase in computation and storage requirements.

Proceedings ArticleDOI
06 Apr 1987
TL;DR: The stochastic segment model, the recognition algorithm, and the iterative training algorithm for estimating segment models from continuous speech, including speaker-dependent continuous speech recognition, are described.
Abstract: Developing accurate and robust phonetic models for the different speech sounds is a major challenge for high performance continuous speech recognition. In this paper, we introduce a new approach, called the stochastic segment model, for modelling a variable-length phonetic segment X, an L-long sequence of feature vectors. The stochastic segment model consists of 1) time-warping the variable-length segment X into a fixed-length segment Y called a resampled segment, and 2) a joint density function of the parameters of the resampled segment Y, which in this work is assumed Gaussian. In this paper, we describe the stochastic segment model, the recognition algorithm, and the iterative training algorithm for estimating segment models from continuous speech. For speaker-dependent continuous speech recognition, the segment model reduces the word error rate by one third over a hidden Markov phonetic model.
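
A minimal sketch of the two stages named in the abstract: linear interpolation as the time-warp, and a full-covariance Gaussian over the flattened resampled segment. The warping choice and all parameter shapes are assumptions; the paper's training procedure is not reproduced.

```python
import numpy as np

def resample_segment(X, m):
    """Linearly time-warp a variable-length segment X (L x d feature
    frames) into a fixed-length resampled segment Y (m x d)."""
    L = X.shape[0]
    idx = np.linspace(0.0, L - 1.0, m)
    lo = np.floor(idx).astype(int)
    hi = np.minimum(lo + 1, L - 1)
    frac = (idx - lo)[:, None]
    return (1.0 - frac) * X[lo] + frac * X[hi]

def segment_logprob(X, m, mean, cov):
    """Score a segment: resample to m frames, flatten, and evaluate a
    joint Gaussian density over the m * d resulting parameters."""
    y = resample_segment(X, m).ravel() - mean
    _, logdet = np.linalg.slogdet(cov)
    mahal = y @ np.linalg.solve(cov, y)
    return -0.5 * (y.size * np.log(2.0 * np.pi) + logdet + mahal)
```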

Journal ArticleDOI
TL;DR: The recognition results based on these models clearly show the ability of Hidden Markov Models to model some aspects of the underlying prosodic structure.


Proceedings ArticleDOI
Stephen E. Levinson1
01 Jan 1987
TL;DR: An experimental continuous speech recognition system comprising procedures for acoustic/phonetic classification, lexical access, and sentence retrieval; in an experimental evaluation, the parameters of an acoustic/phonetic model were estimated from fluent utterances of 37 seven-digit numbers.
Abstract: This paper describes an experimental continuous speech recognition system comprising procedures for acoustic/phonetic classification, lexical access and sentence retrieval. Speech is assumed to be composed of a small number of phonetic units which may be identified with the states of a hidden Markov model. The acoustic correlates of the phonetic units are then characterized by the observable Gaussian process associated with the corresponding state of the underlying Markov chain. Once the parameters of such a model are determined, a phonetic transcription of an utterance can be obtained by means of a Viterbi-like algorithm. Given a lexicon in which each entry is orthographically represented in terms of the chosen phonetic units, a word lattice is produced by a lexical access procedure. Lexical items whose orthography matches subsequences of the phonetic transcription are sought by means of a hash coding technique, and their likelihoods are computed directly from the corresponding interval of acoustic measurements. The recognition process is completed by recovering from the word lattice the string of words of maximum likelihood conditioned on the measurements. The desired string is derived by a best-first search algorithm. In an experimental evaluation of the system, the parameters of an acoustic/phonetic model were estimated from fluent utterances of 37 seven-digit numbers. A digit recognition rate of 96% was then observed on an independent test set of 59 utterances of the same form from the same speaker. Half of the observed errors resulted from insertions, while deletions and substitutions accounted equally for the other half.

17 Dec 1987
TL;DR: The results show that the recognition accuracy obtained using the multi-layer perceptron is comparable with that from using hidden Markov modelling.
Abstract: The multi-layer perceptron is investigated as a new approach to the automatic recognition of spoken isolated digits. The choice of the parameters for the multi-layer perceptron is discussed and experimental results are reported. A comparison is made with established techniques such as dynamic time-warping and hidden Markov modelling applied to the same data. The results, for this particular task, show that the recognition accuracy obtained using the multi-layer perceptron is comparable with that from using hidden Markov modelling.

Proceedings ArticleDOI
Masafumi Nishimura1, K. Toshioka
01 Apr 1987
TL;DR: A new vector quantization (VQ; so-called labeling) method for a speech recognition system based on hidden Markov models (HMMs), which generates multiple labels at each frame while keeping a conventional HMM formulation.
Abstract: This paper describes a new vector quantization (VQ; so-called labeling) method for a speech recognition system based on hidden Markov models (HMMs). To improve the VQ accuracy in a simple manner, "multi-labeling", which generates multiple labels at each frame, was introduced while keeping a conventional HMM formulation. Furthermore, in order to represent characteristics of speech accurately and effectively, "multi-dimensional labeling" was also introduced, which quantizes multiple features such as spectral dynamics and spectrum independently. This labeling method was tested on an isolated word recognition task using 150 confusable Japanese words. The recognition error rate was roughly halved, or better, compared with the conventional method.
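
Multi-labeling can be sketched as emitting the k nearest codewords per frame together with distance-derived weights; the inverse-distance weighting below is an assumption for illustration, not necessarily the paper's scheme.

```python
import numpy as np

def multi_label(x, codebook, k=3):
    """Return the indices of the k nearest codewords for frame x, plus
    normalized weights, instead of a single hard VQ label."""
    d = np.linalg.norm(codebook - x, axis=1)
    nearest = np.argsort(d)[:k]
    w = 1.0 / (d[nearest] + 1e-12)               # closer codewords weigh more
    return nearest, w / w.sum()

rng = np.random.default_rng(4)
labels, weights = multi_label(rng.standard_normal(12),
                              rng.standard_normal((128, 12)))
print(labels, weights)
```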


Proceedings ArticleDOI
06 Apr 1987
TL;DR: This work investigates methods based on the definition of a similarity measure between hidden Markov models of phonemes, and on the automatic identification of broad phonetic classes via clustering algorithms, to create classes of equivalence among words by means of a phoneme classification.
Abstract: The development of large-dictionary speech recognition systems requires techniques aimed at limiting the search for the correct word to as small a subset of the vocabulary as possible. One approach to this problem is to create classes of equivalence among words by means of a phoneme classification. We investigate methods based on the definition of a similarity measure between hidden Markov models of phonemes, and on the automatic identification of broad phonetic classes via clustering algorithms. We discuss the resulting classifications and their use in a real-time speech recognition system with a 3000-word dictionary for Italian; results are compared to those achieved by knowledge-based classifications.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: A programmable VLSI processor is described for efficiently computing a variety of kernel operations for speech recognition, which include dynamic programming for isolated and connected word recognition using both the template matching approach and the Hidden Markov Model approach.
Abstract: A programmable VLSI processor is described for efficiently computing a variety of kernel operations for speech recognition. These operations include dynamic programming for isolated and connected word recognition using both the template matching approach and the Hidden Markov Model (HMM) approach, dynamic programming for natural language models, and metric computations for vector quantization and distance measurement. As well as being able to efficiently compute a wide class of speech processing operations, the architecture is useful in other areas such as image processing. Working chips have been produced using 1.5 µm CMOS design rules that combine both custom and standard cell approaches.


Proceedings ArticleDOI
01 Apr 1987
TL;DR: Recognition experiments indicate that the performance of the weighted cepstral distance with vector quantized spectral data is considerably different from that previously reported for unquantized data.
Abstract: This paper extends the use of weighted cepstral distance measures to speaker-independent word recognizers based on vector quantization. Recognition results were obtained for two recognition methods: dynamic time-warping of vector codes and hidden Markov modeling. The experiments were carried out on a vocabulary of the ten digits and the word "oh". Two kinds of spectral analysis were considered: LPC, and a recently proposed, low-dimensional, perceptually based representation (PLP). The effects of analysis order and varying degrees of quantization in the spectral representation were also considered. Recognition experiments indicate that the performance of the weighted cepstral distance with vector-quantized spectral data is considerably different from that previously reported for unquantized data. Comparison of recognition rates shows wide variations due to interaction of the distance measure with the analysis technique and with vector quantization. The best recognition scores were obtained by the combination of weighted cepstral distance and low-order PLP analysis. This combination maintained good recognition rates down to very low (16 or 8 codes) codebook sizes.
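
For concreteness, a weighted cepstral distance has the general form below; the default index weighting is one common choice and is used here only as an illustration, not as the paper's specific weights.

```python
import numpy as np

def weighted_cepstral_distance(c1, c2, weights=None):
    """Weighted Euclidean distance between two cepstral vectors.
    The default index weighting (w_n = n) is illustrative only."""
    c1, c2 = np.asarray(c1, float), np.asarray(c2, float)
    if weights is None:
        weights = np.arange(1, c1.size + 1)      # assumed weighting scheme
    return float(np.sqrt(np.sum(weights * (c1 - c2) ** 2)))

print(weighted_cepstral_distance([1.0, 0.5, 0.2], [0.8, 0.4, 0.1]))
```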