
Showing papers on "Hidden Markov model published in 1987"


ReportDOI
01 Jan 1987
TL;DR: This thesis is primarily concerned with the use of hidden Markov models to model sequences of feature vectors which lie in a continuous space such as R^N, and explores the trade-off between packing a lot of information into such sequences and being able to model them accurately.
Abstract: This thesis examines the acoustic-modeling problem in automatic speech recognition from an information-theoretic point of view. This problem is to design a speech-recognition system which can extract from the speech waveform as much information as possible about the corresponding word sequence. The information extraction process is broken down into two steps: a signal processing step which converts a speech waveform into a sequence of information-bearing acoustic feature vectors, and a step which models such a sequence. This thesis is primarily concerned with the use of hidden Markov models to model sequences of feature vectors which lie in a continuous space such as R^N. It explores the trade-off between packing a lot of information into such sequences and being able to model them accurately. The difficulty of developing accurate models of continuous parameter sequences is addressed by investigating a method of parameter estimation which is specifically designed to cope with inaccurate modeling assumptions.

266 citations
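
To make the continuous-density modelling concrete, here is a minimal numpy sketch of the kind of Gaussian emission density such an HMM assigns to a feature vector in R^N; the dimensions and parameter values are illustrative, not taken from the thesis.

```python
import numpy as np

def gaussian_log_density(x, mean, cov):
    """Log-density of one acoustic feature vector under a Gaussian
    emission model attached to a single HMM state."""
    n = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)          # log|cov|, numerically stable
    mahal = diff @ np.linalg.solve(cov, diff)   # Mahalanobis distance term
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + mahal)

# Score one 12-dimensional frame (e.g. cepstral coefficients) against a state.
rng = np.random.default_rng(0)
x = rng.standard_normal(12)
print(gaussian_log_density(x, np.zeros(12), np.eye(12)))
```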


Proceedings ArticleDOI
06 Apr 1987
TL;DR: BYBLOS, as discussed by the authors, is the BBN continuous speech recognition system; it integrates acoustic, phonetic, lexical, and linguistic knowledge sources to achieve high recognition performance using hidden Markov models (HMMs).
Abstract: In this paper, we describe BYBLOS, the BBN continuous speech recognition system. The system, designed for large vocabulary applications, integrates acoustic, phonetic, lexical, and linguistic knowledge sources to achieve high recognition performance. The basic approach, as described in previous papers [1, 2], makes extensive use of robust context-dependent models of phonetic coarticulation using Hidden Markov Models (HMM). We describe the components of the BYBLOS system, including: signal processing frontend, dictionary, phonetic model training system, word model generator, grammar and decoder. In recognition experiments, we demonstrate consistently high word recognition performance on continuous speech across: speakers, task domains, and grammars of varying complexity. In speaker-dependent mode, where 15 minutes of speech is required for training to a speaker, 98.5% word accuracy has been achieved in continuous speech for a 350-word task, using grammars with perplexity ranging from 30 to 60. With only 15 seconds of training speech we demonstrate performance of 97% using a grammar.

175 citations


Proceedings ArticleDOI
C. Wellekens1
06 Apr 1987
TL;DR: Hidden Markov models are generalized by defining a new emission probability which takes the correlation between successive feature vectors into account, and estimation formulas for iterative learning under both Viterbi and maximum-likelihood criteria are presented.
Abstract: The hidden Markov models are generalized by defining a new emission probability which takes the correlation between successive feature vectors into account. Estimation formulas for iterative learning under both the Viterbi and maximum-likelihood criteria are presented.

148 citations
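
One simple way to realize an emission probability that conditions on the previous frame is a first-order (autoregressive) Gaussian, sketched below; this illustrates the idea of inter-frame correlation, not Wellekens' exact parameterization.

```python
import numpy as np

def conditional_emission_logprob(x_t, x_prev, mean, A, cov):
    """Log-probability of frame x_t given the previous frame x_prev:
    the emission mean is shifted by a linear regression on x_prev, so
    successive feature vectors are no longer treated as independent."""
    cond_mean = mean + A @ (x_prev - mean)      # A: per-state regression matrix
    diff = x_t - cond_mean
    n = x_t.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    mahal = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + mahal)
```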


Proceedings ArticleDOI
06 Apr 1987
TL;DR: A new way of using vector quantization is proposed for improving recognition performance of a 60,000-word-vocabulary, speaker-trained, isolated word recognizer that takes a phonemic Markov model approach to speech recognition.
Abstract: This paper proposes a new way of using vector quantization for improving recognition performance for a 60,000 word vocabulary speaker-trained isolated word recognizer using a phonemic Markov model approach to speech recognition. We show that we can effectively increase the codebook size by dividing the feature vector into two vectors of lower dimensionality, and then quantizing and training each vector separately. For a small codebook size, integration of the results of the two parameter vectors provides significant improvement in recognition performance as compared to the quantizing and training of the entire feature set together. Even for a codebook size as small as 64, the results obtained when using the new quantization procedure are quite close to those obtained when using Gaussian distribution of the parameter vectors.

89 citations
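
The split-codebook idea can be sketched in a few lines: quantize the two halves of each feature vector against separate codebooks, so a pair of small codebooks acts like one much larger one. The sizes and dimensions below are illustrative, not the paper's.

```python
import numpy as np

def quantize(v, codebook):
    """Index of the nearest codeword under Euclidean distance."""
    return int(np.argmin(np.linalg.norm(codebook - v, axis=1)))

# Two 64-entry codebooks over 6-dimensional half-vectors behave like a
# single product codebook with 64 * 64 = 4096 effective entries.
rng = np.random.default_rng(1)
codebook_a = rng.standard_normal((64, 6))
codebook_b = rng.standard_normal((64, 6))
frame = rng.standard_normal(12)
label_pair = (quantize(frame[:6], codebook_a), quantize(frame[6:], codebook_b))
print(label_pair)
```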


Proceedings ArticleDOI
06 Apr 1987
TL;DR: The results indicate that if sufficient training material is available, the best performance is obtained with the Ferguson model, but that with smaller training sets Poisson HSMMs or type B ESHMMs are more robust models.
Abstract: This paper presents an experimental evaluation of two extensions to the standard HMM (hidden Markov model) formalism: hidden semi-Markov models (HSMMs) and expanded-state HMMs (ESHMMs). These extensions permit improved duration modelling, and experimental results are presented which show that they can consistently lead to improved performance. The results indicate that if sufficient training material is available, the best performance is obtained with the Ferguson model, but that with smaller training sets Poisson HSMMs or type B ESHMMs are more robust models.

88 citations
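
For reference, a Poisson HSMM replaces the implicit geometric state duration of a standard HMM with an explicit duration distribution. A minimal sketch follows, using the common convention that durations start at one frame; the paper's exact parameterization may differ.

```python
import numpy as np

def poisson_duration_logprob(d, lam):
    """Log-probability of occupying a state for d >= 1 frames under a
    (shifted) Poisson duration model with rate parameter lam."""
    k = d - 1                                    # shift so d = 1 maps to k = 0
    log_k_factorial = sum(np.log(i) for i in range(1, k + 1))
    return k * np.log(lam) - lam - log_k_factorial

print(poisson_duration_logprob(5, 4.0))          # e.g. a 5-frame occupancy
```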


Proceedings ArticleDOI
01 Apr 1987
TL;DR: A new algorithm is introduced that transforms hidden Markov models of speech derived from one "prototype" speaker so that they model the speech of a new speaker in the form of a probabilistic spectral mapping.
Abstract: This paper deals with rapid speaker adaptation for speech recognition. We introduce a new algorithm that transforms hidden Markov models of speech derived from one "prototype" speaker so that they model the speech of a new speaker. Speaker normalization is accomplished by a probabilistic spectral mapping from one speaker to another. For a 350-word task with a grammar, and using only 15 seconds of speech for normalization, the recognition accuracy is 97% averaged over 6 speakers. This accuracy would normally require over 5 minutes of speaker-dependent training. We derive the probabilistic spectral transformation of HMMs, describe an algorithm to estimate the transformation, and present recognition results.

76 citations
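
The core of such a transformation can be sketched as a stochastic matrix applied to the prototype speaker's discrete emission distributions; the paper's estimation procedure is more involved, and the state and codebook sizes here are made up.

```python
import numpy as np

# b_proto[j, k]: P(spectral code k | state j) for the prototype speaker.
# T[k, l]: estimated P(new speaker produces code l | prototype code k).
# Applying the row-stochastic map T adapts every state's emission distribution.
rng = np.random.default_rng(2)
b_proto = rng.dirichlet(np.ones(256), size=10)   # 10 states, 256 codes
T = rng.dirichlet(np.ones(256), size=256)        # probabilistic spectral map
b_new = b_proto @ T                              # adapted emission probs
assert np.allclose(b_new.sum(axis=1), 1.0)       # rows remain distributions
```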


Proceedings ArticleDOI
06 Apr 1987
TL;DR: An effort was made to make a Hidden Markov Model Isolated Word Recognizer (IWR) tolerant to speech changes caused by speaker stress.
Abstract: Most current speech recognition systems are sensitive to variations in speaker style. The following is the result of an effort to make a Hidden Markov Model (HMM) Isolated Word Recognizer (IWR) tolerant to speech changes caused by speaker stress. More than an order-of-magnitude reduction of the error rate was achieved for a 105-word simulated-stress database, and a 0% error rate was achieved for the TI 20 isolated-word database.

55 citations


Journal ArticleDOI
TL;DR: An endpoint detection algorithm is presented which is based on hidden Markov model (HMM) technology and explicitly determines a set of speech endpoints based on the output of a Viterbi decoding algorithm.

43 citations
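
A two-state (silence/speech) Viterbi decode over frame-level log-likelihoods is the essential mechanism here; below is a generic sketch, which does not reproduce the paper's model topology or features.

```python
import numpy as np

def viterbi(log_a, log_b):
    """Most likely state sequence.  log_a: (S, S) transition log-probs;
    log_b: (T, S) per-frame emission log-likelihoods.  A flat initial
    distribution is assumed and therefore omitted."""
    T, S = log_b.shape
    delta = np.empty((T, S))
    psi = np.zeros((T, S), dtype=int)
    delta[0] = log_b[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_a   # scores[i, j]: i -> j
        psi[t] = scores.argmax(axis=0)           # best predecessor per state
        delta[t] = scores.max(axis=0) + log_b[t]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                # backtrack
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# With state 0 = silence and state 1 = speech, the detected endpoints are
# simply the first and last frames decoded as state 1.
```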


Proceedings ArticleDOI
01 Apr 1987
TL;DR: An approach to automatic speech recognition is described which attempts to link together ideas from pattern recognition such as dynamic time warping and hidden Markov modeling, with ideas from linguistically motivated approaches.
Abstract: An approach to automatic speech recognition is described which attempts to link together ideas from pattern recognition, such as dynamic time warping and hidden Markov modeling, with ideas from linguistically motivated approaches. In this approach, the basic sub-word units are defined acoustically, but not necessarily phonetically. An algorithm was developed which automatically decomposed speech into multiple sub-word segments, based solely upon strict acoustic criteria, without any reference to linguistic content. By repeating this procedure on a large corpus of speech data we obtained an extensive pool of unlabeled sub-word speech segments. Then, using well-defined clustering techniques, a small set of representative acoustic sub-word units (i.e., an inventory of units) was created. This process is fast, easy to use, and requires no human intervention. The interpretation of these sub-word units, in a linguistic sense, in the context of word decoding is an important issue which must be addressed for them to be useful in a large vocabulary system. We have not yet addressed this issue; instead, a couple of simple experiments were performed to determine whether these acoustic sub-word units had any potential value for speech recognition. For these experiments we used a connected digits database from a single female talker. A 25-unit codebook of acoustic segments was created from about 1600 segments drawn from 100 connected digit strings. A simple isolated digit recognition system, designed using the statistics of the codewords in the acoustic sub-word unit codebook, had a recognition accuracy of 100%. In another experiment, a connected digit recognition system was created with representative digit templates formed by concatenating the sub-word units in an appropriate manner. The system had a string recognition accuracy of 96%.

41 citations


Proceedings ArticleDOI
01 Apr 1987
TL;DR: This paper investigates the use of a fuzzy vector quantizer (FVQ) as the front end for a hidden Markov modeling (HMM) scheme for isolated word recognition and sees that the FVQ front end significantly reduces the amount of data needed to train the HMM algorithm.
Abstract: This paper investigates the use of a fuzzy vector quantizer (FVQ) as the front end for a hidden Markov modeling (HMM) scheme for isolated word recognition. Unlike a standard vector quantizer, which generates the index of the single codeword that best matches an input vector, an FVQ generates a vector whose components represent the degree to which each codeword matches the input vector. The HMM algorithm is generalized to accommodate the FVQ output. This approach is tested on a database of isolated words from a single male speaker. It is seen that the FVQ front end significantly reduces the amount of data needed to train the HMM algorithm.

36 citations
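
The fuzzy front end can be sketched with the standard fuzzy-c-means membership weighting, in which each frame yields a full membership vector rather than a single index; the fuzziness exponent m is a free parameter, and the paper's exact weighting may differ.

```python
import numpy as np

def fvq_memberships(x, codebook, m=2.0, eps=1e-12):
    """Fuzzy-VQ output for one frame: a membership per codeword that
    decays with squared distance, normalized to sum to one."""
    d2 = np.sum((codebook - x) ** 2, axis=1) + eps
    w = d2 ** (-1.0 / (m - 1.0))
    return w / w.sum()

rng = np.random.default_rng(3)
codebook = rng.standard_normal((64, 10))
u = fvq_memberships(rng.standard_normal(10), codebook)
print(u.sum(), u.argmax())   # memberships sum to 1; argmax = nearest codeword
```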


PatentDOI
TL;DR: In this paper, a speech recognition system and technique of the acoustic/phonetic type is made speaker-independent and capable of continuous speech recognition during fluent discourse by a combination of techniques which include, inter alia, using a so-called continuously-variable-duration hidden Markov model in identifying word segments, and developing proposed phonetic sequences by a durationally-responsive recursion before any lexical access is attempted.
Abstract: A speech recognition system and technique of the acoustic/phonetic type is made speaker-independent and capable of continuous speech recognition during fluent discourse by a combination of techniques which include, inter alia, using a so-called continuously-variable-duration hidden Markov model in identifying word segments, i.e., phonetic units, and developing proposed phonetic sequences by a durationally-responsive recursion before any lexical access is attempted. Lexical access is facilitated by the phonetic transcriptions provided by the durationally-responsive recursion; and the resulting array of word candidates facilitates the subsequent alignment of the word candidates with the acoustic feature signals. A separate step is used for aligning the members of the candidate word arrays with the acoustic feature signals representative of the corresponding portion of the utterance. Any residual word selection ambiguities are then more readily resolved, regardless of the ultimate sentence selection technique employed.

Proceedings ArticleDOI
A.-M. Derouault1
06 Apr 1987
TL;DR: This paper shows that both an analysis of the errors made by the recognizer and linguistic facts about phonetic context influence suggest a method for choosing context-dependent models, which limits the growth in the number of phoneme models while still accounting for the most important coarticulation effects.
Abstract: One approach to large vocabulary speech recognition is to build phonetic Markov models and to concatenate them to obtain word models. In previous work, we designed a recognizer based on 40 phonetic Markov machines, which accepts a 10,000-word vocabulary [3], and recently a 200,000-word vocabulary [5]. Since there is one machine per phoneme, these models obviously do not account for coarticulatory effects, which may lead to recognition errors. In this paper, we improve the phonetic models by using general principles about coarticulation effects on automatic phoneme recognition. We show that both an analysis of the errors made by the recognizer and linguistic facts about phonetic context influence suggest a method for choosing context-dependent models. This method limits the growth in the number of phoneme models while still accounting for the most important coarticulation effects. We present our experiments with a system applying these principles to a set of models for French. With this new system including context-dependent machines, the phoneme recognition rate goes from 82.2% to 85.3%, and the word error rate with a 10,000-word dictionary is decreased from 11.2% to 9.8%.

Proceedings ArticleDOI
06 Apr 1987
TL;DR: A new iterative approach for hidden Markov modeling of information sources which aims at minimizing the discrimination information (or the cross-entropy) between the source and the model is proposed.
Abstract: A new iterative approach for hidden Markov modeling of information sources, which aims at minimizing the discrimination information (or the cross-entropy) between the source and the model, is proposed. This approach does not require the commonly used assumption that the source to be modeled is a hidden Markov process. The algorithm is started from the model estimated by the traditional maximum likelihood (ML) approach and alternately decreases the discrimination information over all probability distributions of the source which agree with the given measurements and all hidden Markov models. The proposed procedure generalizes the Baum algorithm for ML hidden Markov modeling. The procedure is shown to be a descent algorithm for the discrimination information measure and its local convergence is proved.
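
In symbols, the quantity being descended is the discrimination information between the source and the model; a schematic rendering of the alternating minimization follows (notation ours, not the paper's).

```latex
% Discrimination information between a source density p and an HMM p_lambda:
D(P \,\|\, P_\lambda) = \int p(y)\,\log\frac{p(y)}{p_\lambda(y)}\,dy,
\qquad
\hat{\lambda} = \arg\min_{\lambda}\; \min_{P \in \mathcal{P}} D(P \,\|\, P_\lambda)
% where \mathcal{P} is the set of source distributions consistent with the
% given measurements; the algorithm alternates the two minimizations,
% starting from the maximum-likelihood model.
```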

Journal ArticleDOI
TL;DR: The trade-off between packing information into sequences of feature vectors and being able to model them accurately is explored, and a method of parameter estimation which is designed to cope with inaccurate modeling assumptions is investigated.

Journal ArticleDOI
Lawrence R. Rabiner1, Jay G. Wilpon1
TL;DR: Algorithms based on both template matching (via dynamic time warping (DTW) procedures) and hidden Markov models (HMMs) have been developed which yield high accuracy on several standard vocabularies, including the 10 digits and the set of 26 letters of the English alphabet.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: A probabilistic approach to Chinese four-tone recognition using the well-known technique of hidden Markov models, trained with Baum's forward-backward algorithm on artificial (simulated) training sequences.
Abstract: In this paper, we present a probabilistic approach to Chinese four-tone recognition in which the well-known technique of the hidden Markov model is used. For each tone, a distinct hidden Markov model (HMM) is produced using Baum's forward-backward algorithm on artificial (simulated) training sequences. Classification is made by computing the probability of generating the test utterance with each tone model and choosing as the recognized tone the one corresponding to the model with the highest probability score. The recognition accuracies were found to be 98% for 35 Chinese phonetic alphabets pronounced by standard Chinese speakers and 96% for Chinese digits pronounced by our research group.
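
The scoring step described above is the classic scaled forward pass; here is a compact numpy sketch for a discrete-observation HMM. Classification then simply picks the tone model with the highest score.

```python
import numpy as np

def forward_loglik(pi, A, B, obs):
    """log P(obs | model) for a discrete HMM via the scaled forward pass.
    pi: (S,) initial probs; A: (S, S) transitions; B: (S, K) emission
    probs; obs: sequence of symbol indices."""
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    alpha /= c                                   # scale to avoid underflow
    loglik = np.log(c)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()
        alpha /= c
        loglik += np.log(c)
    return loglik

# Recognition over four tone models (pi, A, B): choose the arg-max of
# forward_loglik(pi, A, B, obs) across the models.
```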

Book ChapterDOI
01 Jan 1987
TL;DR: A unified treatment of the labeling and learning problems for the so-called hidden Markov chain model currently used in many speech recognition systems and the hidden Pickard random field image model, formulated in terms of Baum's classical forward-backward recurrence formulae.
Abstract: The paper outlines a unified treatment of the labeling and learning problems for the so-called hidden Markov chain model currently used in many speech recognition systems and the hidden Pickard random field image model (a small but interesting, causal sub-class of hidden Markov random field models). In both cases, labeling techniques are formulated in terms of Baum’s classical forward-backward recurrence formulae, and learning is accomplished by a specialization of the EM algorithm for mixture identification. Experimental results demonstrate that the approach is subjectively relevant to the image restoration and segmentation problems.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: A two-stage isolated word speech recognition system that uses a Hidden Markov Model (HMM) recognizer in the first stage and a discriminant analysis system in the second stage, reducing the overall error rate by more than a factor of two.
Abstract: This paper describes a two-stage isolated word speech recognition system that uses a Hidden Markov Model (HMM) recognizer in the first stage and a discriminant analysis system in the second stage. During recognition, when the first-stage recognizer is unable to clearly differentiate between acoustically similar words such as "go" and "no", the second-stage discriminator is used. The second-stage system focuses on those parts of the unknown token which are most effective at discriminating the confused words. The system was tested on a 35-word, 10,710-token stressed-speech isolated-word database created at Lincoln Laboratory. Adding the second-stage discriminating system produced the best results to date on this database, reducing the overall error rate by more than a factor of two.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: This paper presents a study of talker-stress-induced intraword variability, and an algorithm that compensates for the systematic changes observed, based on hidden Markov models trained with speech tokens in various talking styles.
Abstract: Automatic speech recognition algorithms generally rely on the assumption that, for the distance measure used, intraword variabilities are smaller than interword variabilities, so that appropriate separation in the measurement space is possible. As evidenced by degradation of recognition performance, the validity of such an assumption decreases from simple tasks to complex tasks, from cooperative talkers to casual talkers, and from laboratory talking environments to practical talking environments. This paper presents a study of talker-stress-induced intraword variability, and an algorithm that compensates for the systematic changes observed. The study is based on hidden Markov models trained with speech tokens in various talking styles. The talking styles include normal speech, fast speech, loud speech, soft speech, and talking with noise injected through earphones; the styles are designed to simulate speech produced under real stressful conditions. Cepstral coefficients are used as the parameters in the hidden Markov models. The stress compensation algorithm compensates for the variations in the cepstral coefficients in a hypothesis-driven manner. The functional form of the compensation is shown to correspond to the equalization of spectral tilts. Preliminary experiments indicate that a substantial reduction in recognition error rate can be achieved with relatively little increase in computation and storage requirements.

Proceedings ArticleDOI
06 Apr 1987
TL;DR: The stochastic segment model, the recognition algorithm, and the iterative training algorithm for estimating segment models from continuous speech, including speaker-dependent continuous speech recognition, are described.
Abstract: Developing accurate and robust phonetic models for the different speech sounds is a major challenge for high performance continuous speech recognition. In this paper, we introduce a new approach, called the stochastic segment model, for modelling a variable-length phonetic segment X, an L-long sequence of feature vectors. The stochastic segment model consists of 1) time-warping the variable-length segment X into a fixed-length segment Y called a resampled segment, and 2) a joint density function of the parameters of the resampled segment Y, which in this work is assumed Gaussian. In this paper, we describe the stochastic segment model, the recognition algorithm, and the iterative training algorithm for estimating segment models from continuous speech. For speaker-dependent continuous speech recognition, the segment model reduces the word error rate by one third over a hidden Markov phonetic model.
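
A minimal sketch of the two stages named in the abstract: linear interpolation as the time-warp, and a full-covariance Gaussian over the flattened resampled segment. The warping choice and all parameter shapes are assumptions; the paper's training procedure is not reproduced.

```python
import numpy as np

def resample_segment(X, m):
    """Linearly time-warp a variable-length segment X (L x d feature
    frames) into a fixed-length resampled segment Y (m x d)."""
    L = X.shape[0]
    idx = np.linspace(0.0, L - 1.0, m)
    lo = np.floor(idx).astype(int)
    hi = np.minimum(lo + 1, L - 1)
    frac = (idx - lo)[:, None]
    return (1.0 - frac) * X[lo] + frac * X[hi]

def segment_logprob(X, m, mean, cov):
    """Score a segment: resample to m frames, flatten, and evaluate a
    joint Gaussian density over the m * d resulting parameters."""
    y = resample_segment(X, m).ravel() - mean
    _, logdet = np.linalg.slogdet(cov)
    mahal = y @ np.linalg.solve(cov, y)
    return -0.5 * (y.size * np.log(2.0 * np.pi) + logdet + mahal)
```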

Journal ArticleDOI
TL;DR: The recognition results based on these models clearly show the ability of Hidden Markov Models to model some aspects of the underlying prosodic structure.


Proceedings ArticleDOI
Stephen E. Levinson1
01 Jan 1987
TL;DR: An experimental continuous speech recognition system comprising procedures for acoustic/phonetic classification, lexical access, and sentence retrieval; in an experimental evaluation, the parameters of an acoustic/phonetic model were estimated from fluent utterances of 37 seven-digit numbers.
Abstract: This paper describes an experimental continuous speech recognition system comprising procedures for acoustic/phonetic classification, lexical access and sentence retrieval. Speech is assumed to be composed of a small number of phonetic units which may be identified with the states of a hidden Markov model. The acoustic correlates of the phonetic units are then characterized by the observable Gaussian process associated with the corresponding state of the underlying Markov chain. Once the parameters of such a model are determined, a phonetic transcription of an utterance can be obtained by means of a Viterbi-like algorithm. Given a lexicon in which each entry is orthographically represented in terms of the chosen phonetic units, a word lattice is produced by a lexical access procedure. Lexical items whose orthography matches subsequences of the phonetic transcription are sought by means of a hash coding technique, and their likelihoods are computed directly from the corresponding interval of acoustic measurements. The recognition process is completed by recovering from the word lattice the string of words of maximum likelihood conditioned on the measurements. The desired string is derived by a best-first search algorithm. In an experimental evaluation of the system, the parameters of an acoustic/phonetic model were estimated from fluent utterances of 37 seven-digit numbers. A digit recognition rate of 96% was then observed on an independent test set of 59 utterances of the same form from the same speaker. Half of the observed errors resulted from insertions, while deletions and substitutions accounted equally for the other half.

17 Dec 1987
TL;DR: The results show that the recognition accuracy obtained using the multi-layer perceptron is comparable with that from using hidden Markov modelling.
Abstract: The multi-layer perceptron is investigated as a new approach to the automatic recognition of spoken isolated digits. The choice of the parameters for the multi-layer perceptron is discussed and experimental results are reported. A comparison is made with established techniques such as dynamic time-warping and hidden Markov modelling applied to the same data. The results, for this particular task, show that the recognition accuracy obtained using the multi-layer perceptron is comparable with that from using hidden Markov modelling.

Proceedings ArticleDOI
Masafumi Nishimura1, K. Toshioka
01 Apr 1987
TL;DR: A new vector quantization (VQ; so-called labeling) method for a speech recognition system based on hidden Markov models (HMMs), which generates multiple labels at each frame while keeping a conventional HMM formulation.
Abstract: This paper describes a new vector quantization (VQ; so-called labeling) method for a speech recognition system based on hidden Markov models (HMMs). To improve the VQ accuracy in a simple manner, "multi-labeling", which generates multiple labels at each frame, was introduced while keeping a conventional HMM formulation. Furthermore, in order to represent characteristics of speech accurately and effectively, "multi-dimensional labeling" was also introduced, which quantizes multiple features such as spectral dynamics and spectrum independently. This labeling method was tested on an isolated word recognition task using 150 confusable Japanese words. The recognition error rate was roughly halved, or better, compared with the conventional method.
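
Multi-labeling can be sketched as emitting the k nearest codewords per frame together with distance-derived weights; the inverse-distance weighting below is an assumption for illustration, not necessarily the paper's scheme.

```python
import numpy as np

def multi_label(x, codebook, k=3):
    """Return the indices of the k nearest codewords for frame x, plus
    normalized weights, instead of a single hard VQ label."""
    d = np.linalg.norm(codebook - x, axis=1)
    nearest = np.argsort(d)[:k]
    w = 1.0 / (d[nearest] + 1e-12)               # closer codewords weigh more
    return nearest, w / w.sum()

rng = np.random.default_rng(4)
labels, weights = multi_label(rng.standard_normal(12),
                              rng.standard_normal((128, 12)))
print(labels, weights)
```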


Proceedings ArticleDOI
06 Apr 1987
TL;DR: This work investigates methods based on the definition of a similarity measure between hidden Markov models of phonemes, and on the automatic identification of broad phonetic classes via clustering algorithms, to create classes of equivalence among words by means of a phoneme classification.
Abstract: The development of large-dictionary speech recognition systems requires techniques aimed at limiting the search for the correct word to as small a subset of the vocabulary as possible. One approach to this problem is to create classes of equivalence among words by means of a phoneme classification. We investigate methods based on the definition of a similarity measure between hidden Markov models of phonemes, and on the automatic identification of broad phonetic classes via clustering algorithms. We discuss the resulting classifications and their use in a real-time speech recognition system with a 3000-word dictionary for Italian; results are compared to those achieved by knowledge-based classifications.

Proceedings ArticleDOI
01 Apr 1987
TL;DR: A programmable VLSI processor is described for efficiently computing a variety of kernel operations for speech recognition, which include dynamic programming for isolated and connected word recognition using both the template matching approach and the Hidden Markov Model approach.
Abstract: A programmable VLSI processor is described for efficiently computing a variety of kernel operations for speech recognition. These operations include dynamic programming for isolated and connected word recognition using both the template matching approach and the Hidden Markov Model (HMM) approach, dynamic programming for natural language models, and metric computations for vector quantization and distance measurement. As well as being able to efficiently compute a wide class of speech processing operations, the architecture is useful in other areas such as image processing. Working chips have been produced using 1.5 µm CMOS design rules that combine both custom and standard cell approaches.


Proceedings ArticleDOI
01 Apr 1987
TL;DR: Recognition experiments indicate that the performance of the weighted cepstral distance with vector quantized spectral data is considerably different from that previously reported for unquantized data.
Abstract: This paper extends the use of weighted cepstral distance measures to speaker-independent word recognizers based on vector quantization. Recognition results were obtained for two recognition methods: dynamic time-warping of vector codes and hidden Markov modeling. The experiments were carried out on a vocabulary of the ten digits and the word "oh". Two kinds of spectral analysis were considered: LPC, and a recently proposed, low-dimensional, perceptually based representation (PLP). The effects of analysis order and varying degrees of quantization in the spectral representation were also considered. Recognition experiments indicate that the performance of the weighted cepstral distance with vector-quantized spectral data is considerably different from that previously reported for unquantized data. Comparison of recognition rates shows wide variations due to interaction of the distance measure with the analysis technique and with vector quantization. The best recognition scores were obtained by the combination of weighted cepstral distance and low-order PLP analysis. This combination maintained good recognition rates down to very low (16 or 8 codes) codebook sizes.
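
For concreteness, a weighted cepstral distance has the general form below; the default index weighting is one common choice and is used here only as an illustration, not as the paper's specific weights.

```python
import numpy as np

def weighted_cepstral_distance(c1, c2, weights=None):
    """Weighted Euclidean distance between two cepstral vectors.
    The default index weighting (w_n = n) is illustrative only."""
    c1, c2 = np.asarray(c1, float), np.asarray(c2, float)
    if weights is None:
        weights = np.arange(1, c1.size + 1)      # assumed weighting scheme
    return float(np.sqrt(np.sum(weights * (c1 - c2) ** 2)))

print(weighted_cepstral_distance([1.0, 0.5, 0.2], [0.8, 0.4, 0.1]))
```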