
Showing papers on "Hidden Markov model" published in 1986


Journal ArticleDOI
TL;DR: The purpose of this tutorial paper is to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.
Abstract: The basic theory of Markov chains has been known to mathematicians and engineers for close to 80 years, but it is only in the past decade that it has been applied explicitly to problems in speech processing. One of the major reasons why speech models, based on Markov chains, have not been developed until recently was the lack of a method for optimizing the parameters of the Markov model to match observed signal patterns. Such a method was proposed in the late 1960's and was immediately applied to speech processing in several research institutions. Continued refinements in the theory and implementation of Markov modelling techniques have greatly enhanced the method, leading to a wide range of applications of these models. It is the purpose of this tutorial paper to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.

4,546 citations
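
The tutorial's central computation is evaluating the probability of an observation sequence under a model. A minimal sketch of the forward recursion for a discrete-output HMM, with per-frame rescaling to avoid numerical underflow (the toy model values are illustrative, not from the paper):

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """log P(obs | model) for a discrete-output HMM.

    pi  : (N,) initial state probabilities
    A   : (N, N) transitions, A[i, j] = P(state j at t+1 | state i at t)
    B   : (N, M) emissions,   B[i, k] = P(symbol k | state i)
    obs : sequence of integer symbols in [0, M)
    """
    alpha = pi * B[:, obs[0]]            # alpha_1(i) = pi_i * b_i(o_1)
    c = alpha.sum(); alpha /= c          # rescale; log P accumulates log c_t
    log_p = np.log(c)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # induction: sum over all state paths
        c = alpha.sum(); alpha /= c
        log_p += np.log(c)
    return log_p

# Toy 2-state, 3-symbol model (illustrative values only).
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3], [0.2, 0.8]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(forward_log_likelihood(pi, A, B, [0, 1, 2, 2, 1]))
```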


Proceedings ArticleDOI
07 Apr 1986
TL;DR: A method for estimating the parameters of hidden Markov models of speech is described and recognition results are presented comparing this method with maximum likelihood estimation.
Abstract: A method for estimating the parameters of hidden Markov models of speech is described. Parameter values are chosen to maximize the mutual information between an acoustic observation sequence and the corresponding word sequence. Recognition results are presented comparing this method with maximum likelihood estimation.

921 citations
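
The contrast between the two criteria can be written compactly. A sketch in the form usually quoted for maximum mutual information training; the notation here is generic and may differ from the paper's:

```latex
% Maximum likelihood fits the model of the spoken word sequence W alone;
% MMI also pushes down the scores of competing word sequences W'.
\theta_{\mathrm{ML}}  = \arg\max_{\theta}\ \log P_{\theta}(O \mid W)
\qquad
\theta_{\mathrm{MMI}} = \arg\max_{\theta}\
  \log \frac{P_{\theta}(O \mid W)\,P(W)}
            {\sum_{W'} P_{\theta}(O \mid W')\,P(W')}
```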


Journal ArticleDOI
Stephen E. Levinson1
TL;DR: The solution proposed here is to replace the probability distributions of duration with continuous probability density functions to form a continuously variable duration hidden Markov model (CVDHMM); the gamma distribution is ideally suited to specification of the durational density.

512 citations


Journal ArticleDOI
TL;DR: To use probabilistic functions of a Markov chain to model certain parameterizations of the speech signal, an estimation technique of Liporace is extended to the cases of multivariate mixtures, such as Gaussian sums, and products of mixtures.
Abstract: To use probabilistic functions of a Markov chain to model certain parameterizations of the speech signal, we extend an estimation technique of Liporace to the cases of multivariate mixtures, such as Gaussian sums, and products of mixtures. We also show how these problems relate to Liporace's original framework.

244 citations
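
A sketch of the kind of output density the extended reestimation covers: a state's emission probability as a weighted sum of multivariate Gaussians (a Gaussian sum). Values are illustrative; the reestimation formulae themselves are the paper's contribution and are not reproduced here.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_sum_density(o, weights, means, covs):
    """b(o) = sum_m c_m N(o; mu_m, Sigma_m), with sum_m c_m = 1."""
    return sum(c * multivariate_normal.pdf(o, mean=mu, cov=S)
               for c, mu, S in zip(weights, means, covs))

# Toy two-component mixture over 2-D feature vectors.
weights = [0.3, 0.7]
means   = [np.zeros(2), np.array([1.0, -1.0])]
covs    = [np.eye(2), 0.5 * np.eye(2)]
print(gaussian_sum_density(np.array([0.5, 0.0]), weights, means, covs))
```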


Proceedings ArticleDOI
07 Apr 1986
TL;DR: Results are given which show that HMMs provide a versatile pattern matching tool suitable for some image processing tasks as well as speech processing problems.
Abstract: A handwritten script recognition system is presented which uses Hidden Markov Models (HMM), a technique widely used in speech recognition. The script is encoded as templates in the form of a sequence of quantised inclination angles of short, equal-length vectors, together with some additional features. An HMM is created for each written word from a set of training data. Incoming templates are recognised by calculating which model has the highest probability of producing that template. The task chosen to test the system is that of handwritten word recognition, where the words are digits written by one person. Results are given which show that HMMs provide a versatile pattern-matching tool suitable for some image processing tasks as well as speech processing problems.

124 citations


Proceedings ArticleDOI
Stephen E. Levinson1
01 Dec 1986
TL;DR: The solution proposed here is to replace the probability distributions of duration with continuous probability density functions to form a continuously variable duration hidden Markov model (CVDHMM); the gamma distribution is ideally suited to specification of the durational density.
Abstract: During the past decade, the applicability of hidden Markov models (HMM) to various facets of speech analysis had been demonstrated in several different experiments. These investigations all rest on the assumption that speech is a quasi-stationary process whose stationary intervals can be identified with the occupancy of a single state of an appropriate HMM. In the traditional form of the HMM, the probability of duration of a state decreases exponentially with time. This behavior does not provide an adequate representation of the temporal structure of speech. The solution proposed here is to replace the probability distributions of duration with continuous probability density functions to form a continuously variable duration hidden Markov model (CVDHMM). The gamma distribution is ideally suited to specification of the durational density since it is one-sided and has only two parameters which, together, define both mean and variance. The main result is a derivation and proof of convergence of reestimation formulae for all the parameters of the CVDHMM. It is interesting to note that if the state durations are gamma distributed, one of the formulae is nonalgebraic but, fortuitously, has properties such that it is easily and rapidly solved numerically to any desired degree of accuracy. Other results are presented including the performance of the formulae on simulated data.

88 citations
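
A small numerical sketch of the durational point made in the abstract: the implicit geometric duration law of a standard HMM decays from its very first value, while a gamma density can place its mode away from d = 1. Parameter values are illustrative, not from the paper.

```python
import numpy as np
from scipy.stats import gamma

d = np.arange(1, 41)

# Standard HMM with self-loop probability a: P(d) = (1 - a) * a**(d - 1),
# an exponential decay whose mode is always at d = 1.
a = 0.9
p_geometric = (1 - a) * a**(d - 1)

# CVDHMM duration density: gamma with shape nu and rate eta,
# mean nu/eta and variance nu/eta**2 (two parameters, as noted above).
nu, eta = 4.0, 0.5
p_gamma = gamma.pdf(d, a=nu, scale=1.0 / eta)

print(d[np.argmax(p_geometric)], d[np.argmax(p_gamma)])  # mode 1 vs. a later mode
```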


Journal ArticleDOI
TL;DR: This paper describes an approach to formant tracking based on hidden Markov models and vector quantization of LPC spectra that has been evaluated using portions of the Texas Instruments multidialect connected digits database.
Abstract: This paper describes an approach to formant tracking based on hidden Markov models and vector quantization of LPC spectra. Two general classes of models are developed, differing in whether formants are tracked singly or jointly. The states of a single-formant model are scalar values corresponding to possible formant frequencies. The states of a multiformant model are frequency vectors defining possible formant configurations. Formant detection and estimation are performed simultaneously using the forward-backward algorithm. Model parameters are estimated from handmarked formant tracks. The models have been evaluated using portions of the Texas Instruments multidialect connected digits database. The most accurate configurations exhibited root-mean-square estimation errors of about 70, 95, and 140 Hz for F1, F2, and F3, respectively.

83 citations
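
A sketch of the decoding idea under simplifying assumptions: states are candidate formant frequencies on a grid, transitions favour small frequency jumps, and forward-backward yields per-frame state posteriors from which a track can be read off. How the per-frame observation likelihoods are obtained from the VQ'd LPC spectra is the substance of the paper and is treated as an input here.

```python
import numpy as np

def formant_posteriors(obs_lik, A):
    """Forward-backward state posteriors for a single-formant model.

    obs_lik : (T, N) per-frame likelihood of each candidate frequency state
    A       : (N, N) transition matrix favouring small frequency jumps
    """
    T, N = obs_lik.shape
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = obs_lik[0] / N           # uniform prior over candidate frequencies
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * obs_lik[t]
        alpha[t] /= alpha[t].sum()      # per-frame scaling
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (obs_lik[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()        # scaling washes out after renormalization
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

# A track could then be read off as the posterior-weighted mean frequency:
# track = formant_posteriors(obs_lik, A) @ freq_grid
```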


Proceedings ArticleDOI
01 Apr 1986
TL;DR: This paper describes the results of the work in designing a system for large-vocabulary word recognition of continuous speech, and generalizes the use of context-dependent Hidden Markov Models of phonemes to take into account word-dependent coarticulatory effects.
Abstract: This paper describes the results of our work in designing a system for large-vocabulary word recognition of continuous speech. We generalize the use of context-dependent Hidden Markov Models (HMM) of phonemes to take into account word-dependent coarticulatory effects. Robustness is assured by smoothing the detailed word-dependent models with less detailed but more robust models. We describe training and recognition algorithms for HMMs of phonemes-in-context. On a task with a 334-word vocabulary and no grammar (i.e., a branching factor of 334), in speaker-dependent mode, we show an average reduction in word error rate from 24% using context-independent phoneme models, to 10% when using robust context-dependent phoneme models.

59 citations
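
The smoothing mentioned in the abstract can be pictured as interpolation between a detailed but sparsely trained distribution and a coarser, robust one. A minimal sketch; the interpolation weight and how it is chosen (in practice it would depend on training-data counts) are illustrative assumptions, not the paper's scheme.

```python
import numpy as np

def smooth(p_word_dependent, p_context_independent, lam):
    """Interpolate a detailed model with a robust one; lam in [0, 1]
    would grow with the amount of data behind the detailed model."""
    return lam * p_word_dependent + (1.0 - lam) * p_context_independent

p_detail = np.array([0.7, 0.3, 0.0])   # sharp, but trained on few samples
p_robust = np.array([0.4, 0.4, 0.2])   # flatter, trained on much more data
print(smooth(p_detail, p_robust, lam=0.6))
```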


Journal ArticleDOI
TL;DR: High-quality speech synthesis is used to demonstrate the power of the HMM in preserving the naturalness of the intonational meaning, conveyed by the variation of fundamental frequency and duration.
Abstract: A novel technique is introduced for characterizing prosodic structure and is used for speech synthesis. The mechanism consists of modeling a set of observations as a probabilistic function of a hidden Markov chain. It uses mixtures of Gaussian continuous probability density functions to represent the essential, perceptually relevant structure of intonation by observing movements of fundamental frequency in monosyllabic words of varying phonetic structure. High-quality speech synthesis, using multipulse excitation, is used to demonstrate the power of the HMM in preserving the naturalness of the intonational meaning, conveyed by the variation of fundamental frequency and duration. The fundamental frequency contours are synthesized from the models using a random number generator, and are imposed on a synthesized prototype word which had the intonation of a low fall. The resulting monosyllabic words with imposed synthesized fundamental frequency contours show a high level of naturalness and are found to be perceptually indistinguishable from the original recordings with the same intonation. The results clearly show the high potential of hidden Markov models as a mechanism for the representation of prosodic structure by naturally capturing its essentials.

43 citations
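
A sketch of the generation step described above, simplified to one Gaussian per state (the paper uses Gaussian mixtures): run the hidden chain and draw each frame's fundamental frequency from the current state's density. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_f0_contour(pi, A, means, stds, n_frames):
    """Draw an F0 contour (in Hz) from a Gaussian-output HMM."""
    state = rng.choice(len(pi), p=pi)
    f0 = np.empty(n_frames)
    for t in range(n_frames):
        f0[t] = rng.normal(means[state], stds[state])
        state = rng.choice(len(pi), p=A[state])
    return f0

pi    = np.array([1.0, 0.0, 0.0])
A     = np.array([[0.8, 0.2, 0.0],      # left-to-right chain
                  [0.0, 0.8, 0.2],
                  [0.0, 0.0, 1.0]])
means = np.array([120.0, 180.0, 100.0])  # a rise-fall shape (illustrative)
stds  = np.array([5.0, 8.0, 5.0])
print(sample_f0_contour(pi, A, means, stds, 30).round(1))
```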


Journal ArticleDOI
TL;DR: A unified system for automatically recognizing fluently spoken digit strings based on whole-word reference units is presented, which can use either hidden Markov model (HMM) technology or template-based technology and contains features from both approaches.

43 citations


Proceedings ArticleDOI
07 Apr 1986
TL;DR: In a series of experiments on isolated-word recognition, hidden Markov models with multivariate Gaussian output densities were applied; the best models, obtained with time offsets of 75 or 90 ms, improved on previous algorithms.
Abstract: Hidden Markov modeling has become an increasingly popular technique in automatic speech recognition. Recently, attention has been focused on the application of these models to talker-independent, isolated-word recognition. Initial results using models with discrete output densities for isolated-digit recognition were later improved using models based on continuous output densities. In a series of experiments on isolated-word recognition, we applied hidden Markov models with multivariate Gaussian output densities to the problem. Speech data were represented by feature vectors consisting of eight log area ratios and the log LPC error. A weak measure of vocal-tract dynamics was included in the observations by appending to the feature vector observed at time t the vector observed at time t-δ, for some fixed offset δ. The best models were obtained with offsets of 75 or 90 ms. When a comparison is made on a common database, the resulting error rate of 0.2% for isolated-digit recognition improves on previous algorithms.
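
The feature construction described above is a one-liner. A sketch, with the offset expressed in frames (the mapping from 75-90 ms to a frame count depends on the analysis frame step, which the abstract does not state):

```python
import numpy as np

def append_offset(X, delta):
    """Append to the frame at time t the frame observed at t - delta,
    a weak measure of vocal-tract dynamics. X: (T, d) -> (T - delta, 2d)."""
    return np.hstack([X[delta:], X[:-delta]])

# 9 features per frame: e.g. eight log area ratios plus the log LPC error.
X = np.random.default_rng(1).normal(size=(100, 9))
print(append_offset(X, delta=6).shape)   # (94, 18)
```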

Proceedings ArticleDOI
01 Apr 1986
TL;DR: The definition of this phonetic unit set is presented, along with experimental comparisons with classical diphones and with phoneme-like units; the performance was qualitatively evaluated using the segmentation of the training database provided by the Viterbi algorithm in a forced recognition task.
Abstract: This paper describes the design of a phonetic unit set for recognition of continuous speech where each unit is represented by a Hidden Markov Model. Starting from a unit set definition like classical diphones, many variations were made in order to improve recognition performance and reduce storage requirements. The definition of this unit set is presented, along with experimental comparisons with classical diphones and with phoneme-like units. The performance was qualitatively evaluated using the segmentation of the training database provided by the Viterbi algorithm in a forced recognition task. Classical recognition experiments have also been carried out using different "difficult vocabularies" as test material.

Proceedings ArticleDOI
01 Apr 1986
TL;DR: A signal modeling technique based upon finite mixture autoregressive probabilistic functions of Markov chains is developed; the signal modeling methodology is discussed and experimental results on speaker-independent recognition of isolated digits are given.
Abstract: In this paper a signal modeling technique based upon finite mixture autoregressive probabilistic functions of Markov chains is developed and applied to the problem of speech recognition, particularly speaker-independent recognition of isolated digits. Two types of mixture probability densities are investigated: finite mixtures of Gaussian autoregressive densities (GAM) and nearest-neighbor partitioned finite mixtures of Gaussian autoregressive densities (PGAM). In the former (GAM), the observation density in each Markov state is simply a (stochastically constrained) weighted sum of Gaussian autoregressive densities, while in the latter (PGAM) it involves nearest-neighbor decoding which, in effect, defines a set of partitions on the observation space. In this paper we discuss the signal modeling methodology and give experimental results on speaker-independent recognition of isolated digits.

Proceedings ArticleDOI
07 Apr 1986
TL;DR: A Markov model system is proposed in which symbols are replaced by spectral lines generated sequentially over the frequency domain; this drastically reduces the number of states in the Markov chain, and the use of continuous parameters eliminates quantization error completely.
Abstract: In most existing automatic speech recognition systems that make use of Markov models, the outputs of the Markov chain are strings whose symbols belong to a finite alphabet and are generated sequentially over the time domain. We propose a Markov model system in which symbols are replaced by spectral lines generated sequentially over the frequency domain. Each spectral line is represented by a continuous distribution of parameters. Switching from the time domain to the frequency domain drastically reduces the number of states in the Markov chain, and the use of continuous parameters eliminates quantization error completely. An application is presented with experimental results in a multi-speaker environment.

Proceedings ArticleDOI
Serge Soudoplatoff1
07 Apr 1986
TL;DR: The results showed that one can decrease the error rate by switching from a simple labelling scheme to this continuous-parameter model; results of an application of this model to a 5000-word speech recognition system are presented.
Abstract: This paper presents how to avoid the labelling stage of a speech recognition strategy based on hidden Markov models while keeping a stochastic formulation. After briefly recalling how a Markov model can be used for speech recognition, we propose another formulation in which the labels are suppressed, dealing only with continuous parameters. The notion of a speech generator is then introduced, and the formulas for training as well as decoding are rewritten. This new formulation requires that the probability densities p(x | G), where G is a generator and x an acoustic vector, be estimated. We explain our choice of non-parametric methods, using Parzen estimators. Those estimators require a kernel function, which we choose in a simple manner, and a value for the radius of the kernel, which is the key problem. A statistical solution, an information-theoretic solution, and an original topological solution are presented in turn; the last is retained. We finally present the results of an application of this model to a 5000-word speech recognition system. The results show that one can decrease the error rate by switching from a simple labelling scheme to this continuous-parameter model.
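
A sketch of the density estimate at the heart of the approach: a Parzen estimator of p(x | G) with a Gaussian kernel. Choosing the kernel radius h is exactly the key problem the paper discusses; here it is simply a parameter.

```python
import numpy as np

def parzen_density(x, samples, h):
    """Parzen estimate of p(x | G) from acoustic vectors emitted by generator G.

    x       : (d,) query vector
    samples : (N, d) training vectors for G
    h       : kernel radius (the paper's central design question)
    """
    d = samples.shape[1]
    sq = np.sum((samples - x) ** 2, axis=1)
    kernel = np.exp(-sq / (2.0 * h * h)) / (2.0 * np.pi * h * h) ** (d / 2.0)
    return kernel.mean()   # average of Gaussian bumps centred on the samples

rng = np.random.default_rng(2)
samples = rng.normal(size=(500, 3))
print(parzen_density(np.zeros(3), samples, h=0.5))
```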

Proceedings ArticleDOI
01 Apr 1986
TL;DR: This paper uses an extension of the well-known hidden Markov models in order to model more accurately the properties of the phonetic labeling stage and presents experimental results which were computed speaker-independently.
Abstract: This paper addresses the problem of generating word hypotheses in continuous German speech. It uses an extension of the well-known hidden Markov models in order to model more accurately the properties of the phonetic labeling stage. A powerful scoring function is derived. Experimental results are presented which were computed speaker-independently.

Proceedings ArticleDOI
01 Apr 1986
TL;DR: A new type of very low bit rate speech coder based on a global Discrete Hidden Markov Model (DHMM) of continuous speech for a single speaker is presented here.
Abstract: A new type of very low bit rate speech coder based on a global Discrete Hidden Markov Model (DHMM) of continuous speech for a single speaker is presented here. Several important issues of the training, coding, and decoding procedures are discussed for a 64-state, 1024-observation model. Such a framework is useful in reducing the redundancy in a 10-bit classical Vector Quantizer (VQ), and could lead to a DHMM coder with a bit rate comparable to that of a Segment Vocoder (SV) or a Matrix Quantizer (MQ). This is achieved not only by modelling the long term non-stationarity and the inter-frame time dependencies of the speech, but also by efficiently representing a different kind of information such as vocal tract structure and linguistic patterns.

Proceedings ArticleDOI
K. Sugawara1, M. Nishimura, A. Kuroda
01 Apr 1986
TL;DR: The adaptation method proposed in this paper uses the intermediate results of the last training iteration of an HMM (hidden Markov model) to reduce recognition errors; for different speakers, only a slight improvement was obtained.
Abstract: During the training process, parameters of an HMM (hidden Markov model) are calculated iteratively using the Forward-Backward algorithm. The adaptation method we propose in this paper uses the intermediate results of the last iteration. The amount of storage needed to keep these intermediate results is very small (typically 1/400) compared with that of the entire parameter set. The confidence measures of the initial training and the adaptive training can be reflected in the coefficients used to calculate the new parameters. Experiments were done on (A) the same speaker, with several months between training and adaptive training/decoding, and (B) different speakers. In the case of the same speaker, recognition errors were reduced by 1/2 to 2/3 compared with the non-adaptation case. However, for different speakers, only a slight improvement was obtained.
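
The arithmetic of such a scheme can be sketched as merging saved accumulator totals with new ones before renormalizing. The weights standing in for the paper's "confidence measure" of initial and adaptive training are illustrative names, not the paper's notation.

```python
import numpy as np

def adapt_transitions(counts_old, counts_new, w_old, w_new):
    """Merge Forward-Backward accumulators kept from the last iteration of
    initial training with accumulators gathered from adaptation data,
    then renormalize rows into transition probabilities."""
    merged = w_old * counts_old + w_new * counts_new
    return merged / merged.sum(axis=1, keepdims=True)

counts_old = np.array([[80.0, 20.0], [10.0, 90.0]])  # saved, compact summary
counts_new = np.array([[5.0, 5.0], [2.0, 8.0]])      # from a little adaptation speech
print(adapt_transitions(counts_old, counts_new, w_old=1.0, w_new=4.0))
```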

Patent
13 Aug 1986
TL;DR: In this article, the authors proposed to reduce the number of probability functions determined and stored by assigning each such function to a reduced number of different states used in the Markov models.
Abstract: Speech recognisers which employ hidden Markov models using continuous probability functions require a large number of calculations to be carried out in a short time in order to give real-time recognition. In addition a large amount of electronic storage is also required. The present invention reduces these problems by reducing the number of probability functions determined and stored, by assigning each such function to a reduced number of different states used in the Markov models. In recognising words a minimum distance is computed for each model according to the Viterbi algorithm (operations 45 to 47) but since only a relatively small number of probability functions and states are stored the number of calculations required per unit time is also reduced. Fewer probability functions are stored and thus the amount of storage required is not as great as would otherwise be required. Further, in order to reduce costs and increase the speed of recognition, a specially constructed Viterbi engine is used in determining the probabilities that sounds observed represent various states of the models. Methods of deriving the required number of probability functions and states are also described.
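
The storage and computation saving can be sketched as a small pool of output densities shared by many states: per-frame likelihoods are computed once per pooled density rather than once per state. Names and values below are illustrative, not from the patent.

```python
import numpy as np

def gaussian_pdf(mu, var):
    return lambda x: np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

density_pool = [gaussian_pdf(0.0, 1.0), gaussian_pdf(2.0, 0.5)]  # few stored functions
state_to_density = [0, 0, 1, 1, 0]                               # many states -> few densities

def frame_likelihoods(frame):
    """Evaluate each pooled density once, then fan out to all states."""
    pooled = [b(frame) for b in density_pool]
    return [pooled[k] for k in state_to_density]

print(np.round(frame_likelihoods(0.7), 4))
```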

Proceedings ArticleDOI
01 Apr 1986
TL;DR: This paper investigates the problem of defining an optimal classification for a given speech decoder, so that broad phonetic classes are recognized as accurately as possible from the speech signal.
Abstract: An approach for supporting large vocabulary in speech recognition is to use broad phonetic classes to reduce the search to a subset of the dictionary. In this paper, we investigate the problem of defining an optimal classification for a given speech decoder, so that these broad phonetic classes are recognized as accurately as possible from the speech signal. More precisely, given Hidden Markov Models of phonemes, we define a similarity measure of the phonetic machines, and use a standard classification algorithm to find the optimal classification. Three measures are proposed, and compared with manual classifications.


Proceedings ArticleDOI
Osaaki Watanuki1, T. Kaneko
01 Apr 1986
TL;DR: This method is applied to the recognition of a 32-Japanese-word vocabulary, and achieves a recognition accuracy comparable to or better than that of conventional approaches.
Abstract: In this paper, a simple and fast method for speaker-independent isolated word recognition is presented. This method is regarded as a simplification of the approach based on the Hidden Markov Model (HMM). In the proposed method, all training and decoding data are transformed into label strings by vector quantization. By segmenting the label strings of utterances into N pieces of equal duration, label histograms are computed in the training mode. In recognition, the label string of an input word is also divided into N equal segments, and the likelihood is computed with the corresponding histogram. It will be shown that the computational cost of this method is relatively low. This method is applied to the recognition of a 32-Japanese-word vocabulary, and achieves a recognition accuracy comparable to or better than that of conventional approaches.
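
A sketch of the training and recognition arithmetic described above, under the assumption (labelled as such) of a simple additive smoothing floor when normalizing the word histograms:

```python
import numpy as np

def segment_histograms(labels, n_segments, codebook_size):
    """Split a VQ label string into n_segments equal-duration pieces and
    count label occurrences in each: returns (n_segments, codebook_size)."""
    parts = np.array_split(np.asarray(labels), n_segments)
    return np.array([np.bincount(p, minlength=codebook_size) for p in parts])

def log_likelihood(token_labels, word_hist, eps=1e-2):
    """Score an input token against a word's accumulated histograms;
    eps is an illustrative smoothing floor, not from the paper."""
    tok = segment_histograms(token_labels, word_hist.shape[0], word_hist.shape[1])
    p = (word_hist + eps) / (word_hist + eps).sum(axis=1, keepdims=True)
    return float(np.sum(tok * np.log(p)))

# Toy example: a word model "trained" on one token, scored on another.
train = [3, 3, 1, 1, 1, 0, 0, 2, 2, 2, 2, 3]
hist = segment_histograms(train, n_segments=4, codebook_size=4)
print(log_likelihood([3, 3, 1, 1, 0, 0, 2, 2, 2, 3], hist))
```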

Proceedings ArticleDOI
01 Apr 1986
TL;DR: An approach to isolated and connected word recognition using a dynamic time warping algorithm formulated as a hidden Markov model; the approximate (Viterbi) algorithm has almost the same performance as the exact algorithm.
Abstract: In this paper, we present an approach to isolated and connected word recognition using a dynamic time warping algorithm formulated as a hidden Markov model. The classification consists of computing the a posteriori probability for each word model and choosing the word model that gives the highest probability. The probability is calculated in two different ways: one is the exact algorithm and the other is the approximate (Viterbi) algorithm. In our system, an input speech signal is first recognized as a string of monosyllables by the syllable-based O(n) DP matching. Second, the recognized string is matched with a monosyllable string of each lexical model, and the word or word sequence with the highest probability is recognized as the input speech by using O(n) DP matching based on a hidden Markov model. Reference patterns consist of 68 monosyllables, and test patterns consist of 90 isolated words and of two- and three-word connected sequences. We conclude from the results of the experiments that: (1) the results using 3 candidates per segment are much better than those using only the best candidate; (2) the approximate algorithm has almost the same performance as the exact algorithm; (3) the extended algorithm for connected word recognition works well.
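
The two scoring rules compared in the paper differ in one operation: the exact score sums over all state paths, while the Viterbi score keeps only the best. A minimal unscaled sketch, suitable only for short toy sequences (illustrative model values):

```python
import numpy as np

def exact_and_viterbi(pi, A, B, obs):
    alpha = pi * B[:, obs[0]]   # exact: forward recursion (sum over paths)
    delta = pi * B[:, obs[0]]   # Viterbi: best single path (max over paths)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        delta = (delta[:, None] * A).max(axis=0) * B[:, o]
    return alpha.sum(), delta.max()

pi = np.array([0.5, 0.5])
A  = np.array([[0.9, 0.1], [0.1, 0.9]])
B  = np.array([[0.8, 0.2], [0.3, 0.7]])
p_exact, p_viterbi = exact_and_viterbi(pi, A, B, [0, 0, 1, 1])
print(p_exact, p_viterbi)   # the Viterbi score lower-bounds the exact score
```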


Journal ArticleDOI
TL;DR: The two most prominent algorithms, dynamic time-warping and hidden Markov modelling, are described and compared and particular attention is given to the role of dynamic programming in either approach.
Abstract: This article describes the methods which form the basis of contemporary automatic speech recognition systems. The two most prominent algorithms, dynamic time-warping and hidden Markov modelling, are described and compared. Particular attention is given to the role of dynamic programming in either approach.
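
For the first of the two algorithms, a compact sketch of the dynamic-programming recursion that both approaches share in spirit: classic DTW alignment of two feature sequences with Euclidean local distances and symmetric (insertion/deletion/match) steps.

```python
import numpy as np

def dtw_distance(x, y):
    """Dynamic time-warping distance between sequences x: (T1, d) and y: (T2, d)."""
    T1, T2 = len(x), len(y)
    D = np.full((T1 + 1, T2 + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])   # local distance
            D[i, j] = cost + min(D[i - 1, j],            # insertion
                                 D[i, j - 1],            # deletion
                                 D[i - 1, j - 1])        # match
    return D[T1, T2]

rng = np.random.default_rng(3)
a, b = rng.normal(size=(20, 5)), rng.normal(size=(24, 5))
print(dtw_distance(a, b))
```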

Proceedings ArticleDOI
C. Wellekens1
07 Apr 1986
TL;DR: A connected speech recognition method based on the Baum forward-backward algorithm is presented; segmentation of the test sentence uses the probability that an acoustic vector lies at the boundary between two speech subunit models.
Abstract: A connected speech recognition method based on the Baum forward-backward algorithm is presented. The segmentation of the test sentence uses the probability that an acoustic vector lies at the boundary between two speech subunit models (hidden Markov models). The labelling rests on the highest probability that a vector has been emitted from the last state of a subunit model. Results are presented for word and phoneme recognition.

Journal ArticleDOI
TL;DR: A computer program for Markov chain analysis is presented and discussed and tests hypotheses about the goodness of fit of first- and second-order Markov models.
Abstract: A computer program for Markov chain analysis is presented and discussed. The program is written in the language of the Statistical Analysis System (SAS) but detailed knowledge of SAS is not required for its use. The program tests hypotheses about the goodness of fit of first- and second-order Markov models. It also tests if transition probabilities are homogeneous between the first and the second half of each sequence.
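
The first-order goodness-of-fit test can be sketched outside SAS as a chi-square test of independence on the transition count matrix: rejecting independence supports first-order dependence. (The program described above also covers second-order models and homogeneity between sequence halves, which this sketch omits.)

```python
import numpy as np
from scipy.stats import chi2_contingency

seq = [0, 1, 1, 2, 0, 1, 2, 2, 1, 0, 1, 1, 2, 0, 0, 1, 2, 1, 0, 2]  # toy data
n = 3
counts = np.zeros((n, n))
for a, b in zip(seq[:-1], seq[1:]):
    counts[a, b] += 1                       # transition count matrix

chi2, p, dof, _ = chi2_contingency(counts)  # H0: successive symbols are independent
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
```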

Proceedings ArticleDOI
01 Apr 1986
TL;DR: This paper describes a family of formant trackers based on hidden Markov models and vector quantization of LPC spectra, differing in whether formants are tracked singly or jointly.
Abstract: This paper describes a family of formant trackers based on hidden Markov models and vector quantization of LPC spectra. Two general classes of models are presented, differing in whether formants are tracked singly or jointly. The states of a single-formant model are scalar values corresponding to possible formant frequencies. The states of a multi-formant model are frequency vectors defining possible formant configurations. Formant detection and estimation are performed simultaneously using the forward-backward algorithm. Model parameters are estimated from hand-marked formant tracks. The models have been evaluated using portions of the Texas Instruments multi-dialect connected digits database. The most accurate configurations exhibited root-mean-square estimation errors of about 70 Hz, 95 Hz, and 140 Hz for F1, F2, and F3, respectively.

Proceedings ArticleDOI
01 Apr 1986
TL;DR: This work looks at the problem at two levels, the first at the sub-word level to find significant segment labels and the second at the grammar level in an attempt to deduce the grammatical units of a given vocabulary from the emission probabilities of a Hidden Markov Model.
Abstract: There has been much work in using Hidden Markov Models to model different types of linguistically defined units such as words, syllables and phonetic-type units. Here we look at the problem from the other direction and try to use the states obtained from a Markov model to find our own linguistic units. We look at the problem at two levels, the first at the sub-word level to find significant segment labels and the second at the grammar level in an attempt to deduce the grammatical units of a given vocabulary from the emission probabilities of a Hidden Markov Model.

Proceedings ArticleDOI
A. Tassy1, L. Miclet
07 Apr 1986
TL;DR: This paper presents a speaker-independent digit recognition system that combines word-based VQ with HMM, the cost of which is low enough to be implemented on a single signal processor available today.
Abstract: Vector Quantization has recently been used in the realization of a speaker-independent digit recognizer, based solely on the spectral content of the speech signal. On the other hand, the Hidden Markov Models proved their ability in modelling temporal distortions between different utterances of a word pronounced by several speakers. In terms of recognition rate, HMMs are as efficient as conventional DTW matching, but they need less computation and memory. This paper presents a speaker-independent digit recognition system that combines word-based VQ with HMM, the cost of which is low enough to be implemented on a single signal processor available today. It is the first result of a cooperation project between ENST and the MATRA company, financially supported by the French government. The proposed recognizer is structured in two parts. First, a VQ-preprocessor, with one vector codebook per vocabulary word, performs a coding of the short-time spectrum of the speech signal and realizes an initial sorting. Then HMMs are used to take the final recognition decision.