Showing papers on "Hidden Markov model published in 1988"

PDF

Open Access

Proceedings Article•DOI•

[...]

A. Poritz¹•Institutions (1)

11 Apr 1988

TL;DR: The main tool in hidden Markov modeling is the Baum-Welch algorithm for maximum likelihood estimation of the model parameters, which is discussed both from an intuitive point of view as an exercise in the art of counting and from a formalpoint of view via the information-theoretic Q-function.

...read moreread less

Abstract: Hidden Markov modeling is a probabilistic technique for the study of time series. Hidden Markov theory permits modeling with any of the classical probability distributions. The costs of implementation are linear in the length of data. Models can be nested to reflect hierarchical sources of knowledge. These and other desirable features have made hidden Markov methods increasingly attractive for problems in language, speech and signal processing. The basic ideas are introduced by elementary examples in the spirit of the Polya urn models. The main tool in hidden Markov modeling is the Baum-Welch (or forward-backward) algorithm for maximum likelihood estimation of the model parameters. This iterative algorithm is discussed both from an intuitive point of view as an exercise in the art of counting and from a formal point of view via the information-theoretic Q-function. Selected examples drawn from the literature illustrate how the Baum-Welch technique places a rich variety of computational models at the disposal of the researcher. >

...read moreread less

276 citations

Proceedings Article•DOI•

Acoustic Markov models used in the Tangora speech recognition system

[...]

Lalit R. Bahl¹, Peter Fitzhugh Brown¹, P.V. de Souza¹, Michael Picheny¹•Institutions (1)

IBM¹

11 Apr 1988

TL;DR: An automatic technique for constructing Markov word models is described and results are included of experiments with speaker-dependent and speaker-independent models on several isolated-word recognition tasks.

...read moreread less

Abstract: The Speech Recognition Group at IBM Research has developed a real-time, isolated-word speech recognizer called Tangora, which accepts natural English sentences drawn from a vocabulary of 20000 words. Despite its large vocabulary, the Tangora recognizer requires only about 20 minutes of speech from each new user for training purposes. The accuracy of the system and its ease of training are largely attributable to the use of hidden Markov models in its acoustic match component. An automatic technique for constructing Markov word models is described and results are included of experiments with speaker-dependent and speaker-independent models on several isolated-word recognition tasks. >

...read moreread less

245 citations

Proceedings Article•DOI•

Speech recognition with continuous-parameter hidden Markov models

[...]

Lalit R. Bahl¹, Peter Fitzhugh Brown¹, P.V. de Souza¹, Robert Leroy Mercer¹•Institutions (1)

IBM¹

11 Apr 1988

TL;DR: The authors explore the trade-off between packing information into sequences of feature vectors and being able to model them accurately and investigate a method of parameter estimation which is designed to cope with inaccurate modeling assumptions.

...read moreread less

Abstract: The acoustic-modelling problem in automatic speech recognition is examined from an information theoretic point of view. This problem is to design a speech-recognition system which can extract from the speech waveform as much information as possible about the corresponding word sequence. The information extraction process is factored into two steps: a signal-processing step which converts a speech waveform into a sequence of informative acoustic feature vectors, and a step which models such a sequence. The authors are primarily concerned with the use of hidden Markov models to model sequences of feature vectors which lie in a continuous space. They explore the trade-off between packing information into such sequences and being able to model them accurately. The difficulty of developing accurate models of continuous-parameter sequences is addressed by investigating a method of parameter estimation which is designed to cope with inaccurate modeling assumptions. >

...read moreread less

207 citations

Journal Article•DOI•

Hidden Markov model for Mandarin lexical tone recognition

[...]

Wu-Ji Yang¹, J.-C. Lee, Yueh-chin Chang, H.-C. Wang•Institutions (1)

National Tsing Hua University¹

01 Jul 1988-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: A case of lexical tone recognition for Mandarin speech is discussed using a combination of vector quantization and hidden Markov modelling techniques to convert the observation sequence into a symbol sequence for Hidden Markov modeling.

...read moreread less

Abstract: A case of lexical tone recognition for Mandarin speech is discussed using a combination of vector quantization and hidden Markov modelling techniques. The observation sequence was a sequence of vectorized parameters consisting of a logarithmic pitch interval and its first derivative. The vector quantization was applied to convert the observation sequence into a symbol sequence for Hidden Markov modeling. The speech database was provided by seven male and seven female college students, with each pronouncing 72 isolated monosyllabic utterances. A probabilistic model for each of the four tones was generated. A series of tonal recognition tests were then conducted to evaluate the effects of pitch reference base, codebook size, and tonal model topology. Future consideration of Mandarin speech recognition is also discussed. >

...read moreread less

191 citations

Proceedings Article•DOI•

Phoneme recognition: neural networks vs. hidden Markov models vs. hidden Markov models

[...]

Alex Waibel, Toshiyuki Hanazawa, Geoffrey E. Hinton¹, Kiyohiro Shikano, Kevin J. Lang² - Show less +1 more•Institutions (2)

University of Toronto¹, Carnegie Mellon University²

11 Apr 1988

TL;DR: A time-delay neural network for phoneme recognition that was able to invent without human interference meaningful linguistic abstractions in time and frequency such as formant tracking and segmentation and does not rely on precise alignment or segmentation of the input.

...read moreread less

Abstract: A time-delay neural network (TDNN) for phoneme recognition is discussed. By the use of two hidden layers in addition to an input and output layer it is capable of representing complex nonlinear decision surfaces. Three important properties of the TDNNs have been observed. First, it was able to invent without human interference meaningful linguistic abstractions in time and frequency such as formant tracking and segmentation. Second, it has learned to form alternate representations linking different acoustic events with the same higher level concept. In this fashion it can implement trading relations between lower level acoustic events leading to robust recognition performance despite considerable variability in the input speech. Third, the network is translation-invariant and does not rely on precise alignment or segmentation of the input. The TDNNs performance is compared with the best of hidden Markov models (HMMs) on a speaker-dependent phoneme-recognition task. The TDNN achieved a recognition of 98.5% compared to 93.7% for the HMM, i.e., a fourfold reduction in error. >

...read moreread less

166 citations

Patent•

Method and apparatus for capturing information in drawing or writing

[...]

Colin Sefton Hilton

18 Jan 1988

TL;DR: In this article, the authors used the Baum-Welch algorithm to calculate the probability that the vertical and horizontal components of the pen tip and the contacts between pen tip with the writing area could have been produced from the hidden Markov model.

...read moreread less

Abstract: A signature to be verified is written on an area which carries horizontal and vertical lines using a pen. As the lines are crossed signals representative of the vertical and horizontal components of the pen tip velocity and whether the tip is in contact with the area are passed to a computer. A hidden Markov model derived from vertical and horizontal velocities and a "contact" signal occurring as a number of authentic signatures are written is stored by the computer. A forward pass of the Baum-Welch algorithm is used to calculate the probability that the vertical and horizontal components of the pen tip and the contacts between pen tip and the writing area could have been produced from the hidden Markov model. This probability is used to decide whether the signature is authentic. The hidden Markov model stored by the computer is derived from an initial model based on pen tip velocities and contacts occurring in an authentic signature, and re-estimation carried out using forward and backward passes of the Baum-Welch algorithm and velocities and contacts from further authentic signatures.

...read moreread less

164 citations

Proceedings Article•DOI•

A segment model based approach to speech recognition

[...]

Chin-Hui Lee¹, F.K. Soong¹, Bing-Hwang Juang¹•Institutions (1)

Bell Labs¹

11 Apr 1988

TL;DR: The proposed segment model was tested on a speaker-trained, isolated word, speech recognition task with a vocabulary of 1109 basic English words and the average word recognition accuracy was 85% and increased to 96% and 98% for the top 3 and top 5 candidates, respectively.

...read moreread less

Abstract: Proposes a global acoustic segment model for characterizing fundamental speech sound units and their interactions based upon a general framework of hidden Markov models (HMM). Each segment model represents a class of acoustically similar sounds. The intra-segment variability of each sound class is modeled by an HMM, and the sound-to-sound transition rules are characterized by a probabilistic intersegment transition matrix. An acoustically-derived lexicon is used to construct word models based upon subword segment models. The proposed segment model was tested on a speaker-trained, isolated word, speech recognition task with a vocabulary of 1109 basic English words. In the current study, only 128 segment models were used, and recognition was performed by optimally aligning the test utterance with all acoustic lexicon entries using a maximum likelihood Viterbi decoding algorithm. Based upon a database of three male speakers, the average word recognition accuracy for the top candidate was 85% and increased to 96% and 98% for the top 3 and top 5 candidates, respectively. >

...read moreread less

132 citations

Proceedings Article•DOI•

A new algorithm for the estimation of hidden Markov model parameters

[...]

Lalit R. Bahl¹, Peter Fitzhugh Brown¹, P.V. de Souza¹, Robert Leroy Mercer¹•Institutions (1)

IBM¹

11 Apr 1988

TL;DR: The authors argue that maximum-likelihood estimation of the parameters does not lead to values which maximize recognition accuracy and describe an alternative estimation procedure called corrective training which is aimed at minimizing the number of recognition errors.

...read moreread less

Abstract: Discusses the problem of estimating the parameter values of hidden Markov word models for speech recognition. The authors argue that maximum-likelihood estimation of the parameters does not lead to values which maximize recognition accuracy and describe an alternative estimation procedure called corrective training which is aimed at minimizing the number of recognition errors. Corrective training is similar to a well-known error-correcting training procedure for linear classifiers and works by iteratively adjusting the parameter values so as to make correct words more probable and incorrect words less probable. There are also strong parallels between corrective training and maximum mutual information estimation. They do not prove that the corrective training algorithm converges, but experimental evidence suggests that it does, and that it leads to significantly fewer recognition errors than maximum likelihood estimation. >

...read moreread less

129 citations

Proceedings Article•DOI•

Large-vocabulary speaker-independent continuous speech recognition using HMM

[...]

Kai-Fu Lee¹, Hsiao-Wuen Hon¹•Institutions (1)

Carnegie Mellon University¹

11 Apr 1988

TL;DR: On a 997-word task using a bigram grammar, SPHINX achieved a word accuracy of 93%.

...read moreread less

Abstract: SPHINX, the first large-vocabulary speaker-independent continuous-speech recognizer is described. SPHINX is a hidden-Markov-model (HMM)-based recognizer using multiple codebooks of various LPC-derived features. Two types of HMMs are used in SPHINX: context-independent phone models and function-word-dependent phone models. On a 997-word task using a bigram grammar, SPHINX achieved a word accuracy of 93%. This demonstrates the feasibility of speaker-independent continuous-speech recognition, and the appropriateness of hidden Markov models for such a task. >

...read moreread less

127 citations

Proceedings Article•

Phoneme recognition: neural networks vs. hidden Markov models.

[...]

Alex Waibel, Toshiyuki Hanazawa, Geoffrey E. Hinton, Kiyohiro Shikano, Kevin J. Lang - Show less +1 more

01 Jan 1988

127 citations

Journal Article•DOI•

Statistical model-based speech enhancement systems

[...]

Yariv Ephraim

01 Nov 1988-Journal of the Acoustical Society of America

TL;DR: This paper deals with the problem of enhancing speech signals that have been degraded by statistically independent quasistationary noise and proposes maximum likelihood estimation solutions that are based upon the E–M algorithm and its derivatives.

...read moreread less

Abstract: This paper deals with the problem of enhancing speech signals that have been degraded by statistically independent quasistationary noise. The estimation of the clean speech waveform, and of the parameters of autoregressive (AR) models for the clean speech, given the noisy speech, is considered. The two problems are demonstrated to be closely related in the sense that a good solution to one of them can be used for achieving a satisfactory solution for the other. The difficulties in solving these estimation problems are mainly due to the lack of explicit knowledge of the statistics of the clean speech signal and of the noise process. Maximum likelihood estimation solutions that are based upon the E–M algorithm and its derivatives are proposed. For estimating the speech waveform, the statistics of the clean speech signal and of the noise process are first estimated by training a pair of Gaussian AR hidden Markov models, one for the clean speech and the other for the noise, using long training sequences from the two sources. Then, the speech waveform is reestimated by applying the E–M algorithm to the estimated statistics. An approximation to the E–M algorithm is interpreted as being an iterative procedure in which Wiener filtering and AR modeling are alternatively applied. The different algorithms considered here will be compared and demonstrated.

...read moreread less

Journal Article•DOI•

Cepstral domain talker stress compensation for robust speech recognition

[...]

Y. Chen¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Apr 1988-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: A study of talker-stress-induced intraword variability and an algorithm that compensates for the systematic changes observed are presented and the functional form of the compensation is shown to correspond to the equalization of spectral tilts.

...read moreread less

Abstract: A study of talker-stress-induced intraword variability and an algorithm that compensates for the systematic changes observed are presented. The study is based on hidden Markov models trained by speech tokens spoken in various talking styles. The talking styles include normal speech, fast speech, loud speech, soft speech, and taking with noise injected through earphones; the styles are designed to simulate speech produced under real stressful conditions. Cepstral coefficients are used as the parameters in the hidden Markov models. The stress compensation algorithm compensates for the variations in the cepstral coefficients in a hypothesis-driven manner. The functional form of the compensation is shown to correspond to the equalization of spectral tilts. Substantial reduction of error rates has been achieved when the cepstral domain compensation techniques were tested on the simulated-stress speech database. The hypothesis-driven compensation technique reduced the average error rate from 13.9% to 6.2%. When a more sophisticated recognizer was used, it reduced the error rate from 2.5% to 1.9%. >

...read moreread less

Proceedings Article•DOI•

High performance connected digit recognition, using hidden Markov models

[...]

Lawrence R. Rabiner¹, Jay G. Wilpon¹, F.K. Soong¹•Institutions (1)

Bell Labs¹

11 Apr 1988

TL;DR: An enhanced analysis feature set consisting of both instantaneous and transitional spectral information is used and the hidden-Markov-model-based connected digit recognizer is tested in speaker-trained, multispeaker, and speaker-independent modes.

...read moreread less

Abstract: Algorithms for connected-word recognition based on whole-word reference patterns have become increasingly sophisticated and have been shown capable of achieving high recognition performance for small or syntax-constrained moderate-size vocabularies in a speaker-trained mode. An enhanced analysis feature set consisting of both instantaneous and transitional spectral information is used and the hidden-Markov-model-based connected digit recognizer is tested in speaker-trained, multispeaker, and speaker-independent modes. The performance achieved was 0.35, 1.65 and 1.75% string error rates, respectively, for known length strings and 0.78, 2.85 and 2.94% string error rates, respectively, for unknown length strings. >

...read moreread less

Proceedings Article•DOI•

Recognition of handwritten word: first and second order hidden Markov model based approach

[...]

A. Kundu¹, Yang He¹, Paramvir Bahl¹•Institutions (1)

State University of New York System¹

05 Jun 1988

TL;DR: The handwritten word recognition problem is modeled in the framework of the hidden Markov model (HMM) and the Viterbi algorithm is used to recognize the sequence of letters consisting the word.

...read moreread less

Abstract: The handwritten word recognition problem is modeled in the framework of the hidden Markov model (HMM). The states of HMM are identified with the letters of the alphabet. The optimum symbols are then generated experimentally using 15 different features. Both the first- and second-order HMMs are proposed for the recognition tasks. Using the existing statistical knowledge of English, the calculation scheme of the model parameters are immensely simplified. Once the model is established, the Viterbi algorithm is used to recognize the sequence of letters consisting the word. Some experimental results are also provided indicating the success of the scheme. >

...read moreread less

Journal Article•DOI•

Statistical discrimination of fractal and Markov models of single-channel gating.

[...]

S.J. Korn¹, R. Horn¹•Institutions (1)

Roche Institute of Molecular Biology¹

01 Nov 1988-Biophysical Journal

TL;DR: A more detailed analysis showed that the Markov model was not significantly better than the fractal model for the corneal endothelium channels, and the inability to discriminate the models definitively in this case was shown to be due in part to the small size of the data set.

...read moreread less

Proceedings Article•DOI•

Speech recognition using noise-adaptive prototypes

[...]

A. Nadas¹, David Nahamoo¹, Michael Picheny¹•Institutions (1)

IBM¹

11 Apr 1988

TL;DR: A probabilistic mixture model is described for a frame (the short-term spectrum) of each component of each to be used in speech recognition, which model the energy as the larger of the separate energies of signal and noise in the band.

...read moreread less

Abstract: A probabilistic mixture model is described for a frame (the short-term spectrum) of each to be used in speech recognition. Each component of the mixture is regarded as a prototype for the labeling phase of a hidden Markov model based speech recognition system. Since the ambient noise during recognition can differ from the ambient noise present in the training data, the model is designed for convenient updating in changing noise. Based on the observation that the energy in a frequency band is at any fixed time dominated either by signal energy or by noise energy, the authors model the energy as the larger of the separate energies of signal and noise in the band. Statistical algorithms are given for training this as a hidden variables model. The hidden variables are the prototype identities and the separate signal and noise components. A series of speech recognition experiments that successfully utilize this model is also discussed. >

...read moreread less

Proceedings Article•DOI•

Recognition of handwritten script: a hidden Markov model based approach

[...]

A. Kundu¹, P. Bahl¹•Institutions (1)

State University of New York System¹

11 Apr 1988

TL;DR: The handwritten script recognition problem is modeled in the framework of the hidden Markov model, and the Viterbi algorithm is proposed to recognize the single best optimal state sequence, i.e. sequence of letters comprising the word.

...read moreread less

Abstract: The handwritten script recognition problem is modeled in the framework of the hidden Markov model. For English text, which is the focus of the present research, the states can be identified with the letters of the alphabet, and the optimum symbols can be generated. In order to do so, a quantitative definition of symbols, in terms of features, is required. Fourteen features (some old, some new) are proposed for this task. Using the existing statistical knowledge about the English language, the calculation of the model parameters is immensely simplified. Once the model is established, the Viterbi algorithm is proposed to recognize the single best optimal state sequence, i.e. sequence of letters comprising the word. The modification of the recognition algorithm to accommodate context information is also discussed. Some experimental results are provided indicating the success of the new scheme. >

...read moreread less

Proceedings Article•DOI•

Speech recognition and the frequency of recently used words: a modified Markov model for natural language

[...]

Roland Kuhn¹•Institutions (1)

McGill University¹

22 Aug 1988

TL;DR: A modification of the Markov approach, which assigns higher probabilities to recently used words, is proposed and tested against a pure Markov model.

...read moreread less

Abstract: Speech recognition systems incorporate a language model which, at each stage of the recognition task, assigns a probability of occurrence to each word in the vocabulary. A class of Markov language models identified by Jelinek has achieved considerable success in this domain. A modification of the Markov approach, which assigns higher probabilities to recently used words, is proposed and tested against a pure Markov model. Parameter calculation and comparison of the two models both involve use of the LOB Corpus of tagged modern English.

...read moreread less

Proceedings Article•DOI•

Phonetic recognition using hidden Markov models and maximum mutual information training

[...]

Bernard Merialdo¹•Institutions (1)

IBM¹

11 Apr 1988

TL;DR: The application of maximum-mutual-information (MMI) training to hidden Markov models (HMMs) is studied for phonetic recognition and shows that the phonetic error rate decreases significantly when MMI training is used, as compared with ML training.

...read moreread less

Abstract: The application of maximum-mutual-information (MMI) training to hidden Markov models (HMMs) is studied for phonetic recognition. MMI training has been proposed as an alternative to standard maximum-likelihood (ML) training. In practice, MMI training performs better (produces models that are more accurate) than ML training. The fundamental notions of HMM, ML and MMI training are reviewed, and it is shown how MMI training can be applied easily to the case of phonetic models and phonetic recognition. Some computational heuristics are proposed to implement these computations practically. Some experiments (training and recognition) are detailed that show that the phonetic error rate decreases significantly when MMI training is used, as compared with ML training. >

...read moreread less

Journal Article•DOI•

Fast search strategy in a large vocabulary word recognizer

[...]

Vishwa Gupta, Matthew Lennig, Paul Mermelstein

01 Dec 1988-Journal of the Acoustical Society of America

TL;DR: A fast search algorithm is presented for generating word hypotheses for a 75 000‐word vocabulary, speaker‐trained, isolated word recognizer as the first pass of a total recognition system generating a small number of hypotheses with rough likelihood estimates, to be followed by more detailed hypothesis evaluation.

...read moreread less

Abstract: In this article, a fast search algorithm is presented for generating word hypotheses for a 75 000‐word vocabulary, speaker‐trained, isolated word recognizer The algorithm is envisioned as the first pass of a total recognition system generating a small number of hypotheses with rough likelihood estimates, to be followed by more detailed hypothesis evaluation The possible word choices are restricted by estimating the number of syllables in the unknown word using a hidden Markov model (HMM) for syllables A heuristic search algorithm then searches through a sequence of syllable networks to find the most likely word candidates Arcs in the syllable network correspond to phonemes The assumption that the likelihoods of these phoneme arcs are independent of the phonetic context allows us to convert the search through a large tree into a search through a much smaller network or graph The computational requirements are reduced by roughly a factor of 70 compared to estimating the exact likelihood scores for the

...read moreread less

Proceedings Article•DOI•

Phoneme modelling using continuous mixture densities

[...]

Hermann Ney¹, A. Noll¹•Institutions (1)

Philips¹

11 Apr 1988

TL;DR: It is shown that the advantage of continuous mixture densities is the ability to lead to parameter estimates that are accurate and at the same time robust with respect to the limited amount of training data.

...read moreread less

Abstract: Deals with the use of continuous mixture densities for phenome modelling in large vocabulary continuous speech recognition. The concept of continuous mixture densities is applied to the emission probability density functions of hidden Markov models for phonemes in order to take into account phonetic-context dependencies. It is shown that the advantage of continuous mixture densities is the ability to lead to parameter estimates that are accurate and at the same time robust with respect to the limited amount of training data. Training and recognition algorithms for mixture densities in the framework of phoneme modelling are described. Recognition results for a 917-word task, requiring only 7 min of speech for training and an overlap of 43 words between training vocabulary and test vocabulary, are presented. >

...read moreread less

Proceedings Article•DOI•

Extended Viterbi algorithm for second order hidden Markov process

[...]

Yang He¹•Institutions (1)

State University of New York System¹

14 Nov 1988

TL;DR: An extended Viterbi algorithm is presented that gives a maximum a posteriori estimation of the second-order hidden Markov process that is compared with those of the original first-order one.

...read moreread less

Abstract: An extended Viterbi algorithm is presented that gives a maximum a posteriori estimation of the second-order hidden Markov process. The advantage of the second-order model and the complexity of the extended algorithm are compared with those of the original first-order one. The method used to develop the extended algorithm can also be used to extend the Viterbi algorithm further to any higher order. >

...read moreread less

Phoneme Recognition: Neural Networks vs

[...]

Alex Waibel, Hanazawa G. Hinton, IC Shikano Ic

01 Jan 1988

TL;DR: It is shown that the TDNN "invented" well-known acoustic-phonetic features and the temporal relationships between them are indeendent of position in time and hence not blurred by temporal shifts in the input.

...read moreread less

Abstract: neme recognition which is characterized by two important properties: 1.) Using a 3 layer arrangement of simple computing units, it can represent arbitrary nonlinear decision surfaces. The TDNN learns these decision surfaces automatically using error back-propagatioii[l]. 2.) he time-delay arrangement enables the network to discover acoustichonetic features and the temporal relationships between them indeendent of position in time and hence not blurred by temporal shifts in the input. For comparison, several discrete Hidden Markov Models (HMM) were trained to perform the same task, i.e., the speakerdependent recognition of the phonemes "B", "D", and "G" extracted We show that the TDNN "invented" well-known acoustic-phonetic

...read moreread less

Proceedings Article•DOI•

Text-dependent speaker identification using circular hidden Markov models

[...]

Y.-C. Zheng¹, Baozong Yuan¹•Institutions (1)

Beijing Jiaotong University¹

11 Apr 1988

TL;DR: For each person, a distinct reference CHMM is produced using the Baum forward-and-backward algorithm and classification is carried out by selecting the model with the highest probability as the speaker identification system output.

...read moreread less

Abstract: A brief overview of hidden Markov models is given. The properties of circular hidden Markov models and their application to speaker recognition are discussed. For each person, a distinct reference CHMM is produced using the Baum forward-and-backward algorithm. Classification is carried out by selecting the model with the highest probability as the speaker identification system output. Preliminary testing on a set of ten speakers indicates a performance of about 94% speaker recognition accuracy. >

...read moreread less

Proceedings Article•DOI•

Obtaining candidate words by polling in a large vocabulary speech recognition system

[...]

R.L. Bahl¹, Raimo Bakis¹, P.V. de Souza¹, Robert Leroy Mercer¹•Institutions (1)

IBM¹

11 Apr 1988

TL;DR: Polling is advocated, in which each label produced by the vector quantizer casts a varying, real-valed vote for each word in the vocabulary, and Expressions are derived for these votes under the assumption that for any given word, the observed label frequencies have Poisson distributions.

...read moreread less

Abstract: Considers the problem of rapidly obtaining a short list of candidate words for more detailed inspection in a large vocabulary, vector-quantizing speech recognition system. An approach called polling is advocated, in which each label produced by the vector quantizer casts a varying, real-valed vote for each word in the vocabulary. The words receiving the highest votes are placed on a short list to be matched in detail at a later stage of processing. Expressions are derived for these votes under the assumption that for any given word, the observed label frequencies have Poisson distributions. Although the method is more general, particular attention is paid to the implementation of polling in speech recognition systems which use hidden Markov models during the acoustic match computation. Results are presented of experiments with speaker-dependent and speaker-independent Markov models on two different isolated word recognition tasks. >

...read moreread less

Proceedings Article•

Links Between Markov Models and Multilayer Perceptrons

[...]

Hervé Bourlard¹, C. Wellekens¹•Institutions (1)

Philips¹

01 Jan 1988

TL;DR: In this paper, a discriminant hidden Markov model is defined and it is shown how a particular multilayer perceptron with contextual and extra feedback input units can be considered as a general form of such Markov models.

...read moreread less

Abstract: Hidden Markov models are widely used for automatic speech recognition. They inherently incorporate the sequential character of the speech signal and are statistically trained. However, the apriori choice of the model topology limits their flexibility. Another drawback of these models is their weak discriminating power. Multilayer perceptrons are now promising tools in the connectionist approach for classification problems and have already been successfully tested on speech recognition problems. However, the sequential nature of the speech signal remains difficult to handle in that kind of machine. In this paper, a discriminant hidden Markov model is defined and it is shown how a particular multilayer perceptron with contextual and extra feedback input units can be considered as a general form of such Markov models.

...read moreread less

Proceedings Article•DOI•

Noise compensation algorithms for use with hidden Markov model based speech recognition

[...]

Andrew Varga, Roger K. Moore, John Scott Bridle, K. M. Ponting, M.J. Russel - Show less +1 more

11 Apr 1988

TL;DR: The principles of recognition in noise are discussed from an implementation point of view and it is shown how the three techniques can be viewed as variations on a single theme.

...read moreread less

Abstract: A preliminary theoretical and experimental examination is made of three noise compensation techniques. The three techniques are those due to: Klatt (1976); Bridle et al (1984); and Holmes & Sedgwick (1986). The first two of these techniques have been re-interpreted for use within a hidden Markov model based recogniser. A description is given of how this was done, together with a discussion on some implementation considerations. Experimental results are given for the performance of the algorithms at various signal-to-noise ratios. The principles of recognition in noise are discussed from an implementation point of view and it is shown how the three techniques can be viewed as variations on a single theme. >

...read moreread less

Proceedings Article•DOI•

A neural net approach to speech recognition

[...]

Weiyu Huang¹, Richard P. Lippmann¹, B. Gold¹•Institutions (1)

Massachusetts Institute of Technology¹

11 Apr 1988

TL;DR: The Viterbi net as mentioned in this paper is a neural network implementation of the hidden Markov models (HMMs) used very effectively in recognition systems based on Hidden Markov Models (HMM).

...read moreread less

Abstract: Artificial neural networks are of interest because algorithms used in many speech recognizers can be implemented using highly parallel neural net architectures and because new parallel algorithms are being development that are inspired by biological nervous systems. Some neural net approaches are resented for the problem of static pattern classification and time alignment. For static pattern classification, multi-layer perceptron classifiers trained with back propagation can form arbitrary decision regions, are robust, and train rapidly for convex decision regions. For time alignment, the Viterbi net is a neural net implementation of the Viterbi decoder used very effectively in recognition systems based on hidden Markov models (HMMs). >

...read moreread less

Journal Article•DOI•

Speaker-Independent English Consonant and Japanese Word Recognition by a Stochastic Dynamic Time Warping Method

[...]

Seiichi Nakagawa¹, Hirobumi Nakanishi¹•Institutions (1)

Toyohashi University of Technology¹

01 Jan 1988-Iete Journal of Research

TL;DR: A stochastic dynamic time warping method for speaker- independent recognition is proposed and some considerations are described on speaker-independent consonant recognition and word recognition on a large vocabulary size.

...read moreread less

Abstract: In this paper, a stochastic dynamic time warping method for speaker-independent recognition is proposed and some considerations are described on speaker-independent consonant recognition and word recognition on a large vocabulary size. In this method, conditional probabilities were used instead of local distances in a standard dynamic time warping method, and transition probabilities instead of path costs. This is related to both the standard DTW method and the hidden Markov model. In word recognition, the whole word templates are constructed by the concatenation of syllable templates, which are taken from spoken words. And, we got the reference patterns from 216 words uttered by 30 male speakers and recognized the other 200 words uttered by the other 10 speakers. The standard dynamic time warping method for speaker-independent recognition on 200 words gave the average word recognition rate of 89.3%. The stochastic dynamic time warping method we proposed here improved the recognition rate to 92.9%.

...read moreread less

Proceedings Article•DOI•

On the application of hidden Markov models for enhancing noisy speech

[...]

Yariv Ephraim¹, David Malah², Biing-Hwang Juang³•Institutions (3)

Bell Labs¹, Technion – Israel Institute of Technology², AT&T³

11 Apr 1988

TL;DR: An algorithm is proposed for enhancing noisy speech which has been degraded by statistically independent additive noise by first estimating the most probable sequence of AR models for the speech signal using the Viterbi algorithm, and then applying theseAR models for constructing a sequence of Wiener filters which are used to enhance the noisy speech.

...read moreread less

Abstract: An algorithm is proposed for enhancing noisy speech which has been degraded by statistically independent additive noise. The algorithm is based on modeling the clean speech as a hidden Markov process with mixtures of Gaussian autoregressive (AR) output processes and modeling the noise as a sequence of stationary, statistically independent, Gaussian AR vectors. The parameter sets of the models are estimated using training sequences from the clean speech and the noise process. The parameter set of the hidden Markov model is estimated by the segmental k-means algorithm. Given the estimated models, the enhancement of the noisy speech is done by alternate maximization of the likelihood function of the noisy speech, one over all sequences of states and mixture components assuming that the clean speech signal is given, and then over all vectors of the original speech using the resulting most probable sequence of states and mixture components. This alternating maximization is equivalent to first estimating the most probable sequence of AR models for the speech signal using the Viterbi algorithm, and then applying these AR models for constructing a sequence of Wiener filters which are used to enhance the noisy speech. >

...read moreread less