A method for the construction of acoustic Markov models for words

doi:10.1109/89.242490

Journal Article•DOI•

Lalit R. Bahl¹, Peter Fitzhugh Brown¹, P.V. de Souza¹, Robert Leroy Mercer¹, Michael Picheny¹

IBM¹

01 Oct 1993-IEEE Transactions on Speech and Audio Processing (IEEE)-Vol. 1, Iss: 4, pp 443-452

TL;DR: A method for combining phonetic and fenonic models is presented and results of experiments with speaker-dependent and speaker-independent models on several isolated-word recognition tasks are reported.

Abstract: A technique for constructing Markov models for the acoustic representation of words is described. Word models are constructed from models of subword units called fenones. Fenones represent very short speech events and are obtained automatically through the use of a vector quantizer. The fenonic baseform for a word (i.e., the sequence of fenones used to represent the word) is derived automatically from one or more utterances of that word. Since the word models are all composed from a small inventory of subword models, training for large-vocabulary speech recognition systems can be accomplished with a small training script. A method for combining phonetic and fenonic models is presented. Results of experiments with speaker-dependent and speaker-independent models on several isolated-word recognition tasks are reported. The results are compared with those for phonetics-based Markov models and template-based dynamic programming (DP) matching.
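
As a rough illustration of how a vector quantizer can turn speech into a string of short-event labels, the sketch below assigns each acoustic frame the index of its nearest codeword. The codebook, the two-dimensional feature vectors, and the function name `label_frames` are hypothetical; the abstract does not give the paper's features, codebook size, or the procedure that derives a fenonic baseform from such labels.

```python
from math import dist  # Euclidean distance, Python 3.8+


def label_frames(frames, codebook):
    """Return, for each frame, the index of the nearest codeword."""
    labels = []
    for frame in frames:
        nearest = min(range(len(codebook)), key=lambda k: dist(frame, codebook[k]))
        labels.append(nearest)
    return labels


# Toy data (assumed for illustration only): 3 codewords, 4 frames.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
frames = [(0.1, 0.1), (0.9, 0.2), (0.8, -0.1), (0.1, 0.7)]

labels = label_frames(frames, codebook)
print(labels)
```

Each label here stands for one short stretch of signal, which is the sense in which the abstract says fenones represent very short speech events.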

Citations

Patent•

Database annotation and retrieval

Jason Peter Andrew Charlesworth, Philip N. Garner

28 Sep 2001

TL;DR: A data structure for annotating data files within a database is provided; it comprises a phoneme and word lattice that allows quick and efficient searching of the data files in response to a user's input query.

Abstract: A data structure is provided for annotating data files within a database. The annotation data comprises a phoneme and word lattice which allows the quick and efficient searching of data files within the database in response to a user's input query. The structure of the annotation data is such that it allows the input query to be made by voice and can be used for annotating various kinds of data files, such as audio data files, video data files, multimedia data files etc. The annotation data may be generated from the data files themselves or may be input by the user either from a voiced input or from a typed input.

314 citations

Posted Content•

Speech Recognition by Machine, A Review

M. A. Anusuya, S. K. Katti

13 Jan 2010-arXiv: Computation and Language

TL;DR: The objective of this review paper is to summarize and compare some of the well-known methods used in the various stages of a speech recognition system and to identify research topics and applications which are at the forefront of this exciting and challenging field.

Abstract: This paper presents a brief survey on Automatic Speech Recognition and discusses the major themes and advances made in the past 60 years of research, so as to provide a technological perspective and an appreciation of the fundamental progress that has been accomplished in this important area of speech communication. After years of research and development the accuracy of automatic speech recognition remains one of the important research challenges (e.g., variations of the context, speakers, and environment). The design of a speech recognition system requires careful attention to the following issues: definition of various types of speech classes, speech representation, feature extraction techniques, speech classifiers, database and performance evaluation. The problems that exist in ASR and the various techniques proposed by research workers to solve them are presented in chronological order. The authors hope that this work will be a contribution to the area of speech recognition. The objective of this review paper is to summarize and compare some of the well-known methods used in the various stages of a speech recognition system and to identify research topics and applications which are at the forefront of this exciting and challenging field.

211 citations

Patent•

Language recognition using a similarity measure

Philip N. Garner¹, Jason Peter Andrew Charlesworth¹, Asako Higuchi¹

Canon Inc.¹

25 Oct 2000

TL;DR: A dynamic programming technique for matching two sequences of phonemes, both of which may be generated from text or speech, is described. Scoring uses phoneme confusion, insertion, and deletion scores obtained in advance in a training session and, if appropriate, confidence data generated by a recognition system when the sequences are generated from speech.

Abstract: A dynamic programming technique is provided for matching two sequences of phonemes both of which may be generated from text or speech. The scoring of the dynamic programming matching technique uses phoneme confusion scores, phoneme insertion scores and phoneme deletion scores which are obtained in advance in a training session and, if appropriate, confidence data generated by a recognition system if the sequences are generated from speech.
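
The kind of matching described above can be sketched as a weighted edit-distance recursion over two phoneme strings. This is a generic illustration, not the patent's scoring: the function names, the lower-is-better cost convention, and the toy confusion values are all assumptions, whereas the patent obtains its scores in a training session.

```python
def align_cost(seq_a, seq_b, sub_cost, ins_cost, del_cost):
    """Minimum total cost of aligning seq_a to seq_b by dynamic programming."""
    n, m = len(seq_a), len(seq_b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # delete every symbol of seq_a
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, m + 1):          # insert every symbol of seq_b
        d[0][j] = d[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + sub_cost(seq_a[i - 1], seq_b[j - 1]),  # match/confuse
                d[i - 1][j] + del_cost,                                  # deletion
                d[i][j - 1] + ins_cost,                                  # insertion
            )
    return d[n][m]


def toy_confusion(p, q):
    """Hypothetical confusion cost: identical is free, t/d is a cheap confusion."""
    if p == q:
        return 0.0
    if {p, q} == {"t", "d"}:
        return 0.5
    return 1.0


cost = align_cost(["k", "ae", "t"], ["k", "ae", "d", "s"], toy_confusion, 1.0, 1.0)
print(cost)
```

In this toy case the cheapest alignment confuses `t` with `d` and inserts `s`.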

205 citations

Journal Article•DOI•

Multonic Markov word models for large vocabulary continuous speech recognition

Lalit R. Bahl¹, Jerome R. Bellegarda¹, P.V. de Souza¹, Ponani S. Gopalakrishnan¹, David Nahamoo¹, Michael Picheny¹

IBM¹

01 Jul 1993-IEEE Transactions on Speech and Audio Processing

TL;DR: A new class of hidden Markov models is proposed for the acoustic representation of words in an automatic speech recognition system. The models are more flexible than previously reported fenone-based word models, which leads to an improved capability of modeling variations in pronunciation.

Abstract: A new class of hidden Markov models is proposed for the acoustic representation of words in an automatic speech recognition system. The models, built from combinations of acoustically based sub-word units called fenones, are derived automatically from one or more sample utterances of a word. Because they are more flexible than previously reported fenone-based word models, they lead to an improved capability of modeling variations in pronunciation. They are therefore particularly useful in the recognition of continuous speech. In addition, their construction is relatively simple, because it can be done using the well-known forward-backward algorithm for parameter estimation of hidden Markov models. Appropriate reestimation formulas are derived for this purpose. Experimental results obtained on a 5000-word vocabulary natural language continuous speech recognition task are presented to illustrate the enhanced power of discrimination of the new models.
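
For readers unfamiliar with the forward-backward algorithm mentioned above, the sketch below shows only its forward half for a discrete-output hidden Markov model: it sums over all state paths to get the likelihood of a label sequence. The two-state model and the name `forward_likelihood` are hypothetical, and the paper's own reestimation formulas are not reproduced here.

```python
def forward_likelihood(obs, init, trans, emit):
    """Likelihood of a non-empty observation sequence under a discrete HMM.

    init[s]     : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability that state s emits symbol o
    """
    n_states = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][o]
            for s in range(n_states)
        ]
    return sum(alpha)


# Toy left-to-right model (assumed): state 0 emits symbol 0, state 1 emits symbol 1.
init = [1.0, 0.0]
trans = [[0.5, 0.5], [0.0, 1.0]]
emit = [[1.0, 0.0], [0.0, 1.0]]

likelihood = forward_likelihood([0, 1], init, trans, emit)
print(likelihood)
```

Parameter estimation would combine these forward quantities with a matching backward pass to accumulate expected transition and emission counts.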

170 citations

Patent•DOI•

Speech processing system

Jebu Jacob Rajan¹

Canon Inc.¹

30 May 2001-Journal of the Acoustical Society of America

TL;DR: A system for allowing a user to add word models to a speech recognition system is described. The user inputs a number of renditions of a new word, and the system generates from these a sequence of phonemes representative of the new word.

Abstract: A system is provided for allowing a user to add word models to a speech recognition system. In particular, the system allows a user to input a number of renditions of the new word and generates from these a sequence of phonemes representative of the new word. This representative sequence of phonemes is stored in a word-to-phoneme dictionary together with the typed version of the word for subsequent use by the speech recognition system.

166 citations

References

Journal Article•DOI•

Dynamic programming algorithm optimization for spoken word recognition

H. Sakoe¹, S. Chiba¹

NEC¹

01 Feb 1978-IEEE Transactions on Acoustics, Speech, and Signal Processing

TL;DR: This paper reports on an optimum dynamic programming (DP) based time-normalization algorithm for spoken word recognition, in which the warping function slope is restricted so as to improve discrimination between words in different categories.

Abstract: This paper reports on an optimum dynamic programming (DP) based time-normalization algorithm for spoken word recognition. First, a general principle of time-normalization is given using a time-warping function. Then, two time-normalized distance definitions, called symmetric and asymmetric forms, are derived from the principle. These two forms are compared with each other through theoretical discussions and experimental studies. The superiority of the symmetric form algorithm is established. A new technique, called slope constraint, is successfully introduced, in which the warping function slope is restricted so as to improve discrimination between words in different categories. The effective slope constraint characteristic is qualitatively analyzed, and the optimum slope constraint condition is determined through experiments. The optimized algorithm is then extensively subjected to experimental comparison with various DP algorithms previously applied to spoken word recognition by different research groups. The experiment shows that the present algorithm gives no more than about two-thirds of the errors of even the best conventional algorithm.
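
A minimal sketch of a symmetric-form time-normalized distance is given below, assuming scalar features and an absolute-difference local distance (both are simplifications chosen for illustration). Diagonal steps are weighted 2 and horizontal or vertical steps 1, and the total is normalized by the sum of the two lengths. The slope constraint that the abstract highlights is deliberately left out; only an optional adjustment window `window` on |i - j| is included, and the function name is hypothetical.

```python
def dtw_symmetric(a, b, window=None):
    """Symmetric-form DP time-warping distance between two non-empty sequences."""
    len_a, len_b = len(a), len(b)
    inf = float("inf")
    g = [[inf] * len_b for _ in range(len_a)]
    for i in range(len_a):
        for j in range(len_b):
            if window is not None and abs(i - j) > window:
                continue  # outside the adjustment window
            d = abs(a[i] - b[j])  # local distance (assumed scalar features)
            if i == 0 and j == 0:
                g[i][j] = 2 * d
                continue
            best = inf
            if j > 0:
                best = min(best, g[i][j - 1] + d)          # horizontal step, weight 1
            if i > 0 and j > 0:
                best = min(best, g[i - 1][j - 1] + 2 * d)  # diagonal step, weight 2
            if i > 0:
                best = min(best, g[i - 1][j] + d)          # vertical step, weight 1
            g[i][j] = best
    return g[len_a - 1][len_b - 1] / (len_a + len_b)


same_shape = dtw_symmetric([1, 2, 3], [1, 2, 2, 3])
offset = dtw_symmetric([0, 0], [1, 1])
print(same_shape, offset)
```

If the window is narrower than the length difference of the two sequences, the end point is unreachable and the function returns infinity.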

5,906 citations

Journal Article•DOI•

Statistical Inference for Probabilistic Functions of Finite State Markov Chains

Leonard E. Baum, Ted Petrie

01 Dec 1966-Annals of Mathematical Statistics

2,919 citations

Journal Article•

Vector quantization

Robert M. Gray¹

Stanford University¹

01 Apr 1984-IEEE ASSP Magazine

TL;DR: During the past few years several design algorithms have been developed for a variety of vector quantizers and the performance of these codes has been studied for speech waveforms, speech linear predictive parameter vectors, images, and several simulated random processes.

Abstract: A vector quantizer is a system for mapping a sequence of continuous or discrete vectors into a digital sequence suitable for communication over or storage in a digital channel. The goal of such a system is data compression: to reduce the bit rate so as to minimize communication channel capacity or digital storage memory requirements while maintaining the necessary fidelity of the data. The mapping for each vector may or may not have memory in the sense of depending on past actions of the coder, just as in well established scalar techniques such as PCM, which has no memory, and predictive quantization, which does. Even though information theory implies that one can always obtain better performance by coding vectors instead of scalars, scalar quantizers have remained by far the most common data compression system because of their simplicity and good performance when the communication rate is sufficiently large. In addition, relatively few design techniques have existed for vector quantizers. During the past few years several design algorithms have been developed for a variety of vector quantizers and the performance of these codes has been studied for speech waveforms, speech linear predictive parameter vectors, images, and several simulated random processes. It is the purpose of this article to survey some of these design techniques and their applications.

2,743 citations

An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process

L. Baum

01 Jan 1972

1,783 citations

Journal Article•DOI•

A Maximum Likelihood Approach to Continuous Speech Recognition

Lalit R. Bahl¹, Frederick Jelinek¹, Robert Leroy Mercer¹

IBM¹

01 Feb 1983-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper describes a number of statistical models for use in speech recognition, with special attention to determining the parameters for such models from sparse data, and describes two decoding methods appropriate for constrained artificial languages and one appropriate for more realistic decoding tasks.

Abstract: Speech recognition is formulated as a problem of maximum likelihood decoding. This formulation requires statistical models of the speech production process. In this paper, we describe a number of statistical models for use in speech recognition. We give special attention to determining the parameters for such models from sparse data. We also describe two decoding methods, one appropriate for constrained artificial languages and one appropriate for more realistic decoding tasks. To illustrate the usefulness of the methods described, we review a number of decoding results that have been obtained with them.
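
The decoding formulation mentioned above is commonly written as follows. The symbols are notation assumed here and do not appear in the abstract: W is a candidate word sequence, A is the observed acoustic evidence, P(W) is a language model, and P(A | W) is an acoustic model.

```latex
\hat{W}
  = \arg\max_{W} P(W \mid A)
  = \arg\max_{W} \frac{P(W)\, P(A \mid W)}{P(A)}
  = \arg\max_{W} P(W)\, P(A \mid W)
```

The last step holds because P(A) does not depend on W, so the decoder needs only the two statistical models and a search over W.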

1,637 citations