
Showing papers on "Hidden Markov model" published in 1990


Journal ArticleDOI
TL;DR: A translation-invariant back-propagation network is described that performs better than a sophisticated continuous acoustic parameter hidden Markov model on a noisy, 100-speaker confusable vocabulary isolated word recognition task.

635 citations


Proceedings ArticleDOI
03 Apr 1990
TL;DR: A technique of signal decomposition using hidden Markov models is described that provides an optimal method of decomposing simultaneous processes and has wide implications for signal separation in general and improved speech modeling in particular.
Abstract: The problem of automatic speech recognition in the presence of interfering signals and noise with statistical characteristics ranging from stationary to fast changing and impulsive is discussed. A technique of signal decomposition using hidden Markov models is described. This is a generalization of conventional hidden Markov modeling that provides an optimal method of decomposing simultaneous processes. The technique exploits the ability of hidden Markov models to model dynamically varying signals in order to accommodate concurrent processes, including interfering signals as complex as speech. This form of signal decomposition has wide implications for signal separation in general and improved speech modeling in particular. The application of decomposition to the problem of recognition of speech contaminated with noise is emphasized.

530 citations
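The decomposition idea above amounts to a search over the joint state space of two HMMs. Below is a minimal, illustrative sketch, not the authors' system: it assumes scalar observations formed by adding two Gaussian-emitting sources (so the joint emission is Gaussian with summed means and variances) and a uniform initial state distribution; all names are hypothetical.

```python
import numpy as np

def joint_viterbi(x, A1, mu1, var1, A2, mu2, var2):
    """Viterbi over the Cartesian product of two HMMs' states.

    Assumes x[t] = s1[t] + s2[t] with Gaussian per-state emissions,
    so the joint emission has mean mu1[i] + mu2[j] and variance
    var1[i] + var2[j]. mu*, var*: (N,) arrays; A*: (N, N) transitions.
    """
    N1, N2, T = len(mu1), len(mu2), len(x)
    logA1, logA2 = np.log(A1), np.log(A2)
    mu = mu1[:, None] + mu2[None, :]           # (N1, N2) joint means
    var = var1[:, None] + var2[None, :]
    def loglik(xt):
        return -0.5 * (np.log(2 * np.pi * var) + (xt - mu) ** 2 / var)
    delta = loglik(x[0]) - np.log(N1 * N2)     # uniform initial state
    psi = np.zeros((T, N1, N2, 2), dtype=int)
    for t in range(1, T):
        # transition score factorizes: logA1[i', i] + logA2[j', j]
        cand = (delta[:, None, :, None] + logA1[:, :, None, None]
                + logA2[None, None, :, :])     # axes (i', i, j', j)
        cand = cand.transpose(1, 3, 0, 2)      # axes (i, j, i', j')
        flat = cand.reshape(N1, N2, -1)
        best = flat.argmax(axis=2)
        psi[t, :, :, 0], psi[t, :, :, 1] = np.unravel_index(best, (N1, N2))
        delta = flat.max(axis=2) + loglik(x[t])
    # backtrack the best joint path; each entry is a (state1, state2) pair
    i, j = np.unravel_index(delta.argmax(), (N1, N2))
    path = [(i, j)]
    for t in range(T - 1, 0, -1):
        i, j = psi[t, i, j]
        path.append((i, j))
    return path[::-1]
```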


Journal ArticleDOI
TL;DR: A new approach to ECG arrhythmia analysis is described, based on hidden Markov modeling (HMM), a technique successfully used since the mid 1970s to model speech waveforms for automatic speech recognition.
Abstract: A new approach to ECG arrhythmia analysis is described. It is based on hidden Markov modeling (HMM), a technique successfully used since the mid 1970s to model speech waveforms for automatic speech recognition. Many ventricular arrhythmias can be classified by detecting and analyzing QRS complexes and determining R-R intervals. Classification of supraventricular arrhythmias, however, often requires detection of the P wave in addition to the QRS complex. The HMM approach combines structural and statistical knowledge of the ECG signal in a single parametric model. Model parameters are estimated from training data using an iterative, maximum-likelihood reestimation algorithm. Initial results suggest that this approach can provide improved supraventricular arrhythmia analysis through accurate representation of the entire beat, including the P wave.

527 citations
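For the discrete-emission case, the iterative maximum-likelihood reestimation the authors refer to is the Baum-Welch algorithm. Here is a compact single-iteration sketch with the standard scaling, assuming vector-quantized (integer-symbol) observations; this is generic HMM machinery, not the paper's ECG-specific model.

```python
import numpy as np

def baum_welch_step(obs, A, B, pi):
    """One ML reestimation step for a discrete-emission HMM.

    obs: (T,) integer symbols; A: (N, N) transitions; B: (N, M)
    emission probabilities; pi: (N,) initial distribution."""
    obs = np.asarray(obs)
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N)); beta = np.zeros((T, N)); c = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):                        # scaled forward pass
        alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta[-1] = 1.0
    for t in range(T-2, -1, -1):                 # scaled backward pass
        beta[t] = A @ (B[:, obs[t+1]] * beta[t+1]) / c[t+1]
    gamma = alpha * beta                         # state posteriors
    xi = np.zeros((N, N))                        # expected transition counts
    for t in range(T-1):
        xi += alpha[t][:, None] * A * (B[:, obs[t+1]] * beta[t+1])[None, :] / c[t+1]
    A_new = xi / xi.sum(axis=1, keepdims=True)
    B_new = np.vstack([gamma[obs == k].sum(axis=0)
                       for k in range(B.shape[1])]).T
    B_new /= B_new.sum(axis=1, keepdims=True)
    return A_new, B_new, gamma[0]
```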


Proceedings ArticleDOI
03 Apr 1990
TL;DR: A speaker-independent hidden Markov model (HMM) keyword recognizer (KWR) based on a continuous-speech-recognition model is presented and techniques for dealing with nonkeyword speech and linear channel effects are discussed.
Abstract: A speaker-independent hidden Markov model (HMM) keyword recognizer (KWR) based on a continuous-speech-recognition model is presented. The baseline keyword recognition system is described, and techniques for dealing with nonkeyword speech and linear channel effects are discussed. The training of acoustic models to provide an explicit representation of nonvocabulary speech is investigated. A likelihood ratio scoring procedure is used to account for sources of variability affecting keyword likelihood scores. An acoustic class-dependent spectral normalization procedure is used to provide explicit compensation for linear channel effects. Keyword recognition results for a standard conversational speech task with a 20-keyword vocabulary reach 82% probability of detection at a false alarm rate of 12 false alarms per keyword per hour.

498 citations
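In its simplest form, likelihood ratio scoring compares a keyword model's score against a filler/background model's score over the same segment. A toy sketch (the paper's actual procedure models more sources of variability, and these names are illustrative):

```python
def putative_hit(log_lik_keyword, log_lik_filler, n_frames, threshold=0.0):
    """Duration-normalized log likelihood ratio test for a hypothesized
    keyword segment. `threshold` trades detections against false alarms."""
    score = (log_lik_keyword - log_lik_filler) / n_frames
    return score, score > threshold
```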


Journal ArticleDOI
TL;DR: SPHINX is a system that demonstrates the feasibility of accurate, large-vocabulary, speaker-independent, continuous speech recognition, based on discrete hidden Markov models with LPC- (linear-predictive-coding) derived parameters.
Abstract: A description is given of SPHINX, a system that demonstrates the feasibility of accurate, large-vocabulary, speaker-independent, continuous speech recognition. SPHINX is based on discrete hidden Markov models (HMMs) with LPC- (linear-predictive-coding) derived parameters. To provide speaker independence, knowledge was added to these HMMs in several ways: multiple codebooks of fixed-width parameters, and an enhanced recognizer with carefully designed models and word-duration modeling. To deal with coarticulation in continuous speech, yet still adequately represent a large vocabulary, two new subword speech units are introduced: function-word-dependent phone models and generalized triphone models. With grammars of perplexity 997, 60, and 20, SPHINX attained word accuracies of 71, 94, and 96%, respectively, on a 997-word task.

487 citations


Journal ArticleDOI
TL;DR: The authors discuss and document a parameter estimation algorithm for data sequence modeling involving hidden Markov models that uses the state-optimized joint likelihood for the observation data and the underlying Markovian state sequence as the objective function for estimation.
Abstract: The authors discuss and document a parameter estimation algorithm for data sequence modeling involving hidden Markov models. The algorithm, called the segmental K-means method, uses the state-optimized joint likelihood for the observation data and the underlying Markovian state sequence as the objective function for estimation. The authors prove the convergence of the algorithm and compare it with the traditional Baum-Welch reestimation method. They also point out the increased flexibility this algorithm offers in the general speech modeling framework.

473 citations
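A minimal sketch of the segmental K-means loop, under assumed scalar observations, Gaussian emissions with one shared variance, and a flat initial state distribution; the paper treats the general case and proves convergence:

```python
import numpy as np

def viterbi(x, logA, means, var):
    """MAP state path for a scalar-Gaussian HMM (uniform initial state)."""
    N, T = len(means), len(x)
    ll = -0.5 * (np.log(2*np.pi*var) + (x[:, None] - means)**2 / var)
    delta, psi = ll[0] - np.log(N), np.zeros((T, N), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + logA
        psi[t], delta = cand.argmax(axis=0), cand.max(axis=0) + ll[t]
    path = [int(delta.argmax())]
    for t in range(T-1, 0, -1):
        path.append(psi[t, path[-1]])
    return np.array(path[::-1])

def segmental_kmeans(x, A, means, var, n_iter=10):
    """Alternate a state-optimized (Viterbi) alignment with ML updates
    of the emission and transition parameters given that alignment."""
    logA = np.log(A)
    for _ in range(n_iter):
        path = viterbi(x, logA, means, var)
        for j in range(len(means)):                  # per-state means
            seg = x[path == j]
            if len(seg):
                means[j] = seg.mean()
        var = max(np.mean((x - means[path])**2), 1e-6)
        counts = np.zeros_like(A)                    # transition counts
        for t in range(len(path) - 1):
            counts[path[t], path[t+1]] += 1
        logA = np.log((counts + 1e-3)
                      / (counts + 1e-3).sum(axis=1, keepdims=True))
    return means, var, np.exp(logA)
```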


Journal ArticleDOI
TL;DR: The modifications made to a connected word speech recognition algorithm based on hidden Markov models which allow it to recognize words from a predefined vocabulary list spoken in an unconstrained fashion are described.
Abstract: The modifications made to a connected word speech recognition algorithm based on hidden Markov models (HMMs) which allow it to recognize words from a predefined vocabulary list spoken in an unconstrained fashion are described. The novelty of this approach is that statistical models of both the actual vocabulary word and the extraneous speech and background are created. An HMM-based connected word recognition system is then used to find the best sequence of background, extraneous speech, and vocabulary word models for matching the actual input. Word recognition accuracies of 99.3% on purely isolated speech (i.e., only vocabulary items and background noise were present), and 95.1% when the vocabulary word was embedded in unconstrained extraneous speech, were obtained for the five-word vocabulary using the proposed recognition algorithm.

472 citations


Journal ArticleDOI
Hervé Bourlard, C. Wellekens
TL;DR: It is shown theoretically and experimentally that the outputs of the MLP approximate the probability distribution over output classes conditioned on the input, i.e. the maximum a posteriori probabilities.
Abstract: The statistical use of a particular classic form of a connectionist system, the multilayer perceptron (MLP), is described in the context of the recognition of continuous speech. A discriminant hidden Markov model (HMM) is defined, and it is shown how a particular MLP with contextual and extra feedback input units can be considered as a general form of such a Markov model. A link between these discriminant HMMs, trained with the Viterbi algorithm, and any other approach based on least-mean-square minimization of an error function (LMSE) is established. It is shown theoretically and experimentally that the outputs of the MLP (when trained with the LMSE or the entropy criterion) approximate the probability distribution over output classes conditioned on the input, i.e. the maximum a posteriori probabilities. Results of a series of speech recognition experiments are reported. The possibility of embedding the MLP into an HMM is described. Relations with other recurrent networks are also explained.

400 citations
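The practical consequence of this result is widely used: if a trained MLP outputs approximations of p(q|x), dividing by the class priors p(q) yields quantities proportional to the likelihoods p(x|q) that an HMM needs as emission scores (Bayes' rule, with p(x) common to all states). A one-line sketch:

```python
import numpy as np

def scaled_log_likelihoods(mlp_log_posteriors, class_log_priors):
    """Convert frame-level MLP log posteriors log p(q|x) into scaled
    log likelihoods log p(x|q) + const by subtracting the class log
    priors. Shapes: (T, Q) and (Q,). Illustrative names, standard trick."""
    return mlp_log_posteriors - class_log_priors
```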


Journal ArticleDOI
TL;DR: The use of hidden Markov models (HMMs) in continuous speech recognition is reviewed and a unified view is offered in which both linguistic decoding and acoustic matching are integrated into a single, optimal network search framework.
Abstract: The use of hidden Markov models (HMMs) in continuous speech recognition is reviewed. Markov models are presented as a generalization of their predecessor technology, dynamic programming. A unified view is offered in which both linguistic decoding and acoustic matching are integrated into a single, optimal network search framework. Advances in recognition architectures are discussed. The fundamentals of Viterbi beam search, the dominant search algorithm used today in speech recognition, are presented. Approaches to estimating the probabilities associated with an HMM are examined. The HMM-supervised training paradigm is examined. Several examples of successful HMM-based speech recognition systems are reviewed.

321 citations
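A minimal sketch of time-synchronous Viterbi decoding with beam pruning over a single HMM, assuming precomputed frame log-likelihoods; a real decoder layers a lexicon and language model on this skeleton:

```python
import numpy as np

def viterbi_beam(obs_loglik, logA, log_init, beam=10.0):
    """Time-synchronous Viterbi search with beam pruning.

    obs_loglik: (T, N) per-frame, per-state log-likelihoods;
    logA: (N, N) log transitions; log_init: (N,) log initial probs.
    States whose path score falls more than `beam` below the current
    best are deactivated before the next frame."""
    T, N = obs_loglik.shape
    score = log_init + obs_loglik[0]
    back = np.zeros((T, N), dtype=int)
    active = score >= score.max() - beam
    for t in range(1, T):
        prev = np.where(active, score, -np.inf)   # pruned states drop out
        cand = prev[:, None] + logA               # (from, to)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + obs_loglik[t]
        active = score >= score.max() - beam      # prune against the best
    s = int(score.argmax())
    path = [s]
    for t in range(T - 1, 0, -1):                 # backtrack
        s = int(back[t, s])
        path.append(s)
    return path[::-1]
```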


Journal ArticleDOI
Jerome R. Bellegarda, David Nahamoo
TL;DR: A class of very general hidden Markov models which can accommodate feature vector sequences lying either in a discrete or in a continuous space is considered; the new class allows one to represent the prototypes in an assumption-limited, yet convenient way, as tied mixtures of simple multivariate densities.
Abstract: The acoustic-modeling problem in automatic speech recognition is examined with the goal of unifying discrete and continuous parameter approaches. To model a sequence of information-bearing acoustic feature vectors which has been extracted from the speech waveform via some appropriate front-end signal processing, a speech recognizer basically faces two alternatives: (1) assign a multivariate probability distribution directly to the stream of vectors, or (2) use a time-synchronous labeling acoustic processor to perform vector quantization on this stream, and assign a multinomial probability distribution to the output of the vector quantizer. With a few exceptions, these two methods have traditionally been given separate treatment. A class of very general hidden Markov models which can accommodate feature vector sequences lying either in a discrete or in a continuous space is considered; the new class allows one to represent the prototypes in an assumption-limited, yet convenient way, as tied mixtures of simple multivariate densities. Speech recognition experiments, reported for two (5000- and 20000-word vocabulary) office correspondence tasks, demonstrate some of the benefits associated with this technique.

285 citations
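The tied-mixture emission model has a simple form: one codebook of K Gaussian densities is shared by all states, and each state contributes only its own mixture weights. A sketch with assumed diagonal covariances (the names here are illustrative):

```python
import numpy as np

def tied_mixture_loglik(x, codebook_means, codebook_vars, mix_weights):
    """Per-state emission log-likelihoods under tied mixtures.

    x: (D,) feature vector; codebook_means/codebook_vars: (K, D);
    mix_weights: (N, K) per-state weights over the shared codebook.
    Returns an (N,) array of log sum_k c[j,k] * N(x; mu_k, diag(var_k))."""
    comp = -0.5 * np.sum(np.log(2*np.pi*codebook_vars)
                         + (x - codebook_means)**2 / codebook_vars, axis=1)
    m = comp.max()                                   # stable log-sum-exp
    return m + np.log(mix_weights @ np.exp(comp - m))
```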


Book ChapterDOI
TL;DR: Two new context-dependent phonetic units are introduced: function-word-dependent phone models, which focus on the most difficult subvocabulary; and generalized triphones, which combine similar triphones on the basis of an information-theoretic measure.
Abstract: Context-dependent phone models are applied to speaker-independent continuous speech recognition and shown to be effective in this domain. Several previously proposed context-dependent models are evaluated, and two new context-dependent phonetic units are introduced: function-word-dependent phone models, which focus on the most difficult subvocabulary; and generalized triphones, which combine similar triphones on the basis of an information-theoretic measure. The subword clustering procedure used for generalized triphones can find the optimal number of models, given a fixed amount of training data. It is shown that context-dependent modeling reduces the error rate by as much as 60%.
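A sketch of an information-theoretic merging step in the spirit of generalized triphones, under the assumption that each triphone model is summarized by its discrete output-symbol counts: merging two models costs the increase in count-weighted entropy, and a greedy loop merges the cheapest pair until the desired number of models remains.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def merge_loss(count_a, count_b):
    """Increase in count-weighted entropy from pooling two discrete
    output distributions (always >= 0 by concavity of entropy)."""
    na, nb = count_a.sum(), count_b.sum()
    merged = count_a + count_b
    return ((na + nb) * entropy(merged / (na + nb))
            - na * entropy(count_a / na) - nb * entropy(count_b / nb))

def greedy_cluster(counts, n_models):
    """Greedily merge the cheapest pair of models until n_models remain.
    counts: list of per-triphone symbol-count vectors."""
    clusters = [c.astype(float) for c in counts]
    while len(clusters) > n_models:
        pairs = [(merge_loss(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```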

Journal ArticleDOI
TL;DR: In this article, a semi-continuous hidden Markov model is proposed in which the continuous output probability density functions share a mixture Gaussian density codebook; it can be considered as a special form of continuous mixture HMM.

Book ChapterDOI
01 Oct 1990
TL;DR: In this article, an abstract stochastic algorithm for combinatorial optimization problems is proposed, which generalizes and unifies genetic algorithms and simulated annealing, such that any GA or SA algorithm at hand is an instance of the abstract algorithm.
Abstract: In this paper we take a step towards a concise theory of genetic algorithms (GAs) and simulated annealing (SA). First, we set up an abstract stochastic algorithm for treating combinatorial optimization problems. This algorithm generalizes and unifies genetic algorithms and simulated annealing, such that any GA or SA algorithm at hand is an instance of our abstract algorithm. Second, we define the evolution belonging to the abstract algorithm as a Markov chain and find conditions implying that the evolution finds an optimum with probability 1. The results obtained can be applied when designing the components of a genetic algorithm.

PatentDOI
TL;DR: A method for efficient pruning is proposed that reduces central processing unit (CPU) loading during real-time speech recognition; the cost of propagating the current time-index along the best path is alleviated by referring the backpointer of a state within a model to its start state.
Abstract: A method for efficient pruning which reduces central processing unit (CPU) loading during real-time speech recognition. The CPU uses a predetermined threshold to discard information that is not useful or necessary. Useful information is stored in an available scoring buffer slot; a slot is said to be available if its last-time field does not equal the current time-index. To prevent pruning of a slot in the best path, the current time-index has to be propagated to all slots in the best path. This back-propagation increases CPU loading and is alleviated by referring the backpointer of a state within a model to its start state.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: A phoneme based, speaker-dependent continuous-speech recognition system embedding a multilayer perceptron (MLP) into a hidden Markov model (HMM) approach is described, which appears to be somewhat better when MLP methods are used to estimate the probabilities.
Abstract: A phoneme based, speaker-dependent continuous-speech recognition system embedding a multilayer perceptron (MLP) (i.e. a feedforward artificial neural network) into a hidden Markov model (HMM) approach is described. Contextual information from a sliding window on the input frames is used to improve frame or phoneme classification performance over the corresponding performance for simple maximum-likelihood probabilities, or even maximum a posteriori (MAP) probabilities which are estimated without the benefit of context. Performance for a simple discrete density HMM system appears to be somewhat better when MLP methods are used to estimate the probabilities.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: An approach to implementing spoken language systems that takes full advantage of syntactic and semantic constraints provided by a natural language processing component in the speech understanding task and provides a tractable search space is discussed.
Abstract: An approach to implementing spoken language systems is discussed. This approach takes full advantage of syntactic and semantic constraints provided by a natural language processing component in the speech understanding task and provides a tractable search space. The results indicate that the approach is a promising one for large-vocabulary spoken language systems. Parse times within a factor of 20 of real time are achieved for high-perplexity syntactic grammars with resulting hidden Markov model recognition computational requirements (2500 active words/frame) that are well within the capability of high-speed multiprocessor computers or special-purpose speech recognition hardware.

Journal ArticleDOI
TL;DR: In this article, the authors model small single-channel ion currents as a first-order, finite-state, discrete-time Markov process embedded in background noise; the processing can detect signals that do not conform to a first-order Markov model, but the method is less accurate when the background noise is not white.
Abstract: Techniques for extracting small, single channel ion currents from background noise are described and tested. It is assumed that single channel currents are generated by a first-order, finite-state, discrete-time, Markov process to which is added 'white' background noise from the recording apparatus (electrode, amplifiers, etc.). Given the observations and the statistics of the background noise, the techniques described here yield a posteriori estimates of the most likely signal statistics, including the Markov model state transition probabilities, duration (open- and closed-time) probabilities, histograms, signal levels, and the most likely state sequence. Using variations of several algorithms previously developed for solving digital estimation problems, we have demonstrated that: (1) artificial, small, first-order, finite-state, Markov model signals embedded in simulated noise can be extracted with a high degree of accuracy, (2) processing can detect signals that do not conform to a first-order Markov model but the method is less accurate when the background noise is not white, and (3) the techniques can be used to extract from the baseline noise single channel currents in neuronal membranes. Some studies have been included to test the validity of assuming a first-order Markov model for biological signals. This method can be used to obtain directly from digitized data, channel characteristics such as amplitude distributions, transition matrices and open- and closed-time durations.
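One of the duration statistics here follows directly from the first-order Markov assumption: dwell times in a state are geometrically distributed. A small sketch recovering an open- or closed-time distribution from an estimated self-transition probability:

```python
import numpy as np

def dwell_time_pmf(self_transition_prob, max_d=50):
    """Dwell-time distribution implied by a first-order Markov state:
    geometric, P(d) = (1 - a) * a**(d - 1), where a is the state's
    self-transition probability per sample. Illustrative sketch."""
    d = np.arange(1, max_d + 1)
    a = self_transition_prob
    return (1 - a) * a ** (d - 1)
```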

Proceedings ArticleDOI
03 Apr 1990
TL;DR: A technique for using the speech of multiple reference speakers as a basis for speaker adaptation in large-vocabulary continuous-speech recognition is introduced, and the usual probabilistic spectrum transformation can be applied to the reference HMM to model a new speaker.
Abstract: A technique for using the speech of multiple reference speakers as a basis for speaker adaptation in large-vocabulary continuous-speech recognition is introduced. In contrast to other methods that use a pooled reference model, this technique normalizes the training speech from multiple reference speakers to a single common feature space before pooling it. The normalized and pooled speech is then treated as if it came from a single reference speaker for training the reference hidden Markov model (HMM). The usual probabilistic spectrum transformation can be applied to the reference HMM to model a new speaker. Preliminary experimental results are reported from applying this approach to over 100 reference speakers from the speaker-independent portion of the DARPA 1000-Word Resource Management Database.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: A successfully implemented real-time Mandarin dictation machine which recognizes Mandarin speech with unlimited texts and very large vocabulary for the input of Chinese characters to computers is described.
Abstract: A successfully implemented real-time Mandarin dictation machine which recognizes Mandarin speech with unlimited texts and very large vocabulary for the input of Chinese characters to computers is described. Isolated syllables including the tones are first recognized using specially trained hidden Markov models with special feature parameters. The exact characters are then identified from the syllables using a Markov Chinese language model. The real-time implementation is on an IBM PC/AT, connected to a set of special hardware boards on which ten TMS 320C25 chips operate in parallel. It takes only 0.45 s to dictate a character.
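The character-identification stage can be sketched as dynamic programming over the homophone candidates of each recognized syllable under a bigram (first-order Markov) language model. The interfaces below (a homophones dict and a bigram_logp function) are hypothetical:

```python
def decode_characters(syllables, homophones, bigram_logp):
    """Most likely character sequence for a syllable sequence.

    homophones: dict mapping syllable -> list of candidate characters;
    bigram_logp(prev_char, char) -> float log probability."""
    prev = {c: 0.0 for c in homophones[syllables[0]]}
    back = []
    for syl in syllables[1:]:
        cur, bp = {}, {}
        for c in homophones[syl]:
            # best predecessor under the bigram model
            p, score = max(((q, s + bigram_logp(q, c))
                            for q, s in prev.items()),
                           key=lambda t: t[1])
            cur[c], bp[c] = score, p
        back.append(bp)
        prev = cur
    c = max(prev, key=prev.get)                 # backtrack from the best end
    out = [c]
    for bp in reversed(back):
        c = bp[c]
        out.append(c)
    return out[::-1]
```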

Journal ArticleDOI
TL;DR: One of the large vocabulary speech-recognition systems which is being investigated at AT&T Bell Laboratories is described, and the techniques used to provide the acoustic models of the sub-word units (both context-independent and context-dependent units) are discussed.

Proceedings ArticleDOI
01 Jul 1990
TL;DR: A rigorous performance criterion for training algorithms for probabilistic automata (PAs) and hidden Markov models (HMMs), used extensively for speech recognition, is introduced and the complexity of the training problem as a computational problem is analyzed.
Abstract: We introduce a rigorous performance criterion for training algorithms for probabilistic automata (PAs) and hidden Markov models (HMMs), used extensively for speech recognition, and analyze the complexity of the training problem as a computational problem. The PA training problem is the problem of approximating an arbitrary, unknown source distribution by distributions generated by a PA. We investigate the following question about this important, well-studied problem: Does there exist an efficient training algorithm such that the trained PAs provably converge to a model close to an optimum one with high confidence, after only a feasibly small set of training data? We model this problem in the framework of computational learning theory and analyze the sample as well as computational complexity. We show that the number of examples required for training PAs is moderate: except for some log factors, the number of examples is linear in the number of transition probabilities to be trained and a low-degree polynomial in the example length and in the parameters quantifying the accuracy and confidence. Computationally, however, training PAs is quite demanding: PAs with a fixed number of states are trainable in time polynomial in the accuracy and confidence parameters and the example length, but not in the alphabet size unless RP = NP. The latter result is shown via a strong non-approximability result for the single-string maximum-likelihood model problem for 2-state PAs, which is of independent interest.

Journal ArticleDOI
TL;DR: Frequency cells comprising a subset, or gate, of the spectral bins from fast Fourier transform (FFT) processing are identified with the states of the hidden Markov chain and analyzed in terms of physically meaningful quantities.
Abstract: Frequency cells comprising a subset, or gate, of the spectral bins from fast Fourier transform (FFT) processing are identified with the states of the hidden Markov chain. An additional zero state is included to allow for the possibility of track initiation and termination. Analytic expressions for the basic parameters of the hidden Markov model (HMM) are obtained in terms of physically meaningful quantities, and optimization of the HMM tracker is discussed. A measurement sequence based on a simple threshold detector forms the input to the tracker. The outputs of the HMM tracker are a discrete Viterbi track, a gate occupancy probability function, and a continuous mean cell occupancy track. The latter provides an estimate of the mean signal frequency as a function of time. The performance of the HMM tracker is evaluated for two sets of simulated data. The HMM tracker is compared to earlier, related trackers, and possible extensions are discussed.

Journal ArticleDOI
TL;DR: A hidden Markov model isolated word recogniser using full likelihood scoring for each word model can be treated as a recurrent ‘neural’ network and can use back-propagation of partial derivatives to hill-climb on a measure of discriminability between words.

Journal ArticleDOI
TL;DR: A new type of Markov model developed to account for the correlations between successive frames of a speech signal that performs better than the standard multivariate Gaussian HMM (hidden Markov models) when it is incorporated into a large-vocabulary isolated-word recognizer.
Abstract: The authors describe a new type of Markov model developed to account for the correlations between successive frames of a speech signal. The idea is to treat the sequence of frames as a nonstationary autoregressive process whose parameters are controlled by a hidden Markov chain. It is shown that this type of model performs better than the standard multivariate Gaussian HMM (hidden Markov model) when it is incorporated into a large-vocabulary isolated-word recognizer.
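A scalar-feature sketch of such an emission model: within a state, each frame is linearly predicted from the p previous frames with state-specific autoregressive coefficients, and the Gaussian residual supplies the likelihood. The parameterization below is an assumption for illustration; the paper works with vector-valued frames:

```python
import numpy as np

def ar_state_loglik(frames, ar_coefs, noise_var):
    """Log-likelihood of a frame sequence under one AR-HMM state.

    frames: (T,) scalar sequence; ar_coefs: (p,) coefficients, with
    ar_coefs[0] weighting frame t-1; noise_var: residual variance."""
    p = len(ar_coefs)
    ll = 0.0
    for t in range(p, len(frames)):
        pred = np.dot(ar_coefs, frames[t-1::-1][:p])  # frames t-1 ... t-p
        resid = frames[t] - pred
        ll += -0.5 * (np.log(2*np.pi*noise_var) + resid**2 / noise_var)
    return ll
```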

Proceedings ArticleDOI
03 Apr 1990
TL;DR: A hybrid method for continuous-speech recognition which combines hidden Markov models (HMMs) and a connectionist technique called connectionist Viterbi training (CVT) is presented and can be run iteratively and applied to large-vocabulary recognition tasks.
Abstract: A hybrid method for continuous-speech recognition which combines hidden Markov models (HMMs) and a connectionist technique called connectionist Viterbi training (CVT) is presented. CVT can be run iteratively and can be applied to large-vocabulary recognition tasks. Successful completion of training the connectionist component of the system, despite the large network size and volume of training data, depends largely on several measures taken to reduce learning time. The system is trained and tested on the TI/NBS speaker-independent continuous-digits database. Performance on test data for unknown-length strings is 98.5% word accuracy and 95.0% string accuracy. Several improvements to the current system are expected to increase these accuracies significantly.

Proceedings ArticleDOI
Esther Levin
03 Apr 1990
TL;DR: The network architecture proposed, the hidden control neural network (HCNN), combines nonlinear prediction of conventional neural networks with hidden Markov modeling and is trained using an algorithm that is based on back-propagation and segmentation algorithms for estimating the unknown control together with the network's parameters.
Abstract: Neural networks are used to model nonlinear and time-varying systems. The proposed model attempts to cope with time-varying systems by adding an undetermined control input which modulates the mapping implemented by the network. The network architecture proposed, the hidden control neural network (HCNN), combines nonlinear prediction of conventional neural networks with hidden Markov modeling. This network is trained using an algorithm that is based on back-propagation and segmentation algorithms for estimating the unknown control together with the network's parameters. The HCNN approach is evaluated on multispeaker recognition of connected digits, yielding a word accuracy of 99.3%.

Journal ArticleDOI
TL;DR: This paper introduces Hidden Markov Modelling techniques, analyzes the reason for their success, and describes some improvements to the standard HMM used in SPHINX.

PatentDOI
TL;DR: Speaker-independent recognition of small vocabularies, spoken over the long-distance telephone network, is achieved using two types of models, one type for defined vocabulary words (e.g., collect, calling-card, person, third-number, and operator) and one type for extraneous input, which ranges from non-speech sounds to groups of non-vocabulary words (e.g., "I want to make a collect call please").
Abstract: Speaker-independent recognition of small vocabularies, spoken over the long-distance telephone network, is achieved using two types of models, one type for defined vocabulary words (e.g., collect, calling-card, person, third-number, and operator), and one type for extraneous input, which ranges from non-speech sounds to groups of non-vocabulary words (e.g., "I want to make a collect call please"). For this type of keyword spotting, modifications are made to a connected word speech recognition algorithm based on state-transitional (hidden Markov) models which allow it to recognize words from a pre-defined vocabulary list spoken in an unconstrained fashion. Statistical models of both the actual vocabulary words and the extraneous speech and background noises are created. A syntax-driven connected word recognition system is then used to find the best sequence of extraneous input and vocabulary word models for matching the actual input speech.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: An application of discriminative training methods, maximum mutual information (MMI) training, to large-vocabulary continuous speech recognition, and an algorithm is developed for efficient MMI estimation of HMM parameters, including exponential codebook coefficients.
Abstract: An application of discriminative training methods, maximum mutual information (MMI) training, to large-vocabulary continuous speech recognition is described. An algorithm is developed for efficient MMI estimation of HMM parameters, including exponential codebook coefficients, which cannot be estimated using maximum likelihood (ML) methods. Continuous speech recognition performance of the BYBLOS system on the DARPA 1000-word resource management speech corpus is presented.
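The MMI criterion being maximized can be stated compactly: for each training utterance, the log posterior of the correct model, i.e. the joint log score of the correct class minus the log sum over all competing classes. A sketch of the objective only (the paper's contribution is an efficient estimation algorithm, not reproduced here; the names below are illustrative):

```python
import numpy as np

def mmi_objective(log_liks, log_priors, correct):
    """MMI criterion for a batch of utterances.

    log_liks: (R, W) per-utterance, per-class acoustic log-likelihoods;
    log_priors: (W,) class log priors; correct: (R,) true class indices.
    Returns sum over utterances of log P(correct class | O)."""
    joint = log_liks + log_priors                 # log P(O|w) + log P(w)
    denom = np.logaddexp.reduce(joint, axis=1)    # log sum over classes
    num = joint[np.arange(len(correct)), correct]
    return np.sum(num - denom)
```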

Proceedings ArticleDOI
03 Apr 1990
TL;DR: An architecture for a neural network that implements a hidden Markov model (HMM) that suggests integrating signal preprocessing (such as vector quantization) with the classifier and a probabilistic interpretation is given for a network with negative, and even complex-valued, parameters.
Abstract: An architecture for a neural network that implements a hidden Markov model (HMM) is presented. This HMM net suggests integrating signal preprocessing (such as vector quantization) with the classifier. A minimum mean-squared-error training criterion for the HMM/neural net is presented and compared to maximum-likelihood and maximum-mutual-information criteria. The HMM forward-backward algorithm is shown to be the same as the neural net backpropagation algorithm. The implications of probability constraints on the HMM parameters are discussed. Relaxing these constraints allows negative probabilities, equivalent to inhibitory connections. A probabilistic interpretation is given for a network with negative, and even complex-valued, parameters.