scispace - formally typeset
Journal ArticleDOI

A Maximum Likelihood Approach to Continuous Speech Recognition

TL;DR: This paper describes a number of statistical models for use in speech recognition, with special attention to determining the parameters of such models from sparse data, and describes two decoding methods, one appropriate for constrained artificial languages and one for more realistic decoding tasks.
Abstract: Speech recognition is formulated as a problem of maximum likelihood decoding. This formulation requires statistical models of the speech production process. In this paper, we describe a number of statistical models for use in speech recognition. We give special attention to determining the parameters for such models from sparse data. We also describe two decoding methods, one appropriate for constrained artificial languages and one appropriate for more realistic decoding tasks. To illustrate the usefulness of the methods described, we review a number of decoding results that have been obtained with them.
Citations
Journal ArticleDOI
Lawrence R. Rabiner
01 Feb 1989
TL;DR: In this paper, the author provides an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and gives practical details on methods of implementation of the theory, along with a description of selected applications of HMMs to distinct problems in speech recognition.
Abstract: This tutorial provides an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and gives practical details on methods of implementation of the theory along with a description of selected applications of the theory to distinct problems in speech recognition. Results from a number of original sources are combined to provide a single source of acquiring the background required to pursue this area of research further. The author first reviews the theory of discrete Markov chains and shows how the concept of hidden states, where the observation is a probabilistic function of the state, can be used effectively. The theory is illustrated with two simple examples, namely coin-tossing, and the classic balls-in-urns system. Three fundamental problems of HMMs are noted and several practical techniques for solving these problems are given. The various types of HMMs that have been studied, including ergodic as well as left-right models, are described.

21,819 citations
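One of the three fundamental HMM problems the tutorial names is computing the likelihood of an observation sequence, solved by the forward algorithm. A minimal sketch in the spirit of the tutorial's coin-tossing example, with two hidden coins (one fair, one biased); all probabilities here are illustrative, not from the paper:

```python
# Forward algorithm for a discrete HMM: two hidden coins, observations
# are heads (H) / tails (T). Numbers below are made up for illustration.

states = ["fair", "biased"]
pi = {"fair": 0.5, "biased": 0.5}                # initial distribution
A = {"fair":   {"fair": 0.9, "biased": 0.1},     # transition probabilities
     "biased": {"fair": 0.1, "biased": 0.9}}
B = {"fair":   {"H": 0.5, "T": 0.5},             # emission probabilities
     "biased": {"H": 0.8, "T": 0.2}}

def forward(obs):
    """Return P(obs) under the HMM by summing over all hidden paths."""
    alpha = {s: pi[s] * B[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[r] * A[r][s] for r in states) * B[s][o]
                 for s in states}
    return sum(alpha.values())

print(round(forward("HHTH"), 4))
```

The recursion runs in O(T · N²) time rather than the O(N^T) of naive path enumeration, which is what makes likelihood evaluation practical.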

Journal ArticleDOI
TL;DR: This work addresses the problem of predicting a word from previous words in a sample of text and discusses n-gram models based on classes of words, finding that these models are able to extract classes that have the flavor of either syntactically based groupings or semantically based groupings, depending on the nature of the underlying statistics.
Abstract: We address the problem of predicting a word from previous words in a sample of text. In particular, we discuss n-gram models based on classes of words. We also discuss several statistical algorithms for assigning words to classes based on the frequency of their co-occurrence with other words. We find that we are able to extract classes that have the flavor of either syntactically based groupings or semantically based groupings, depending on the nature of the underlying statistics.

3,336 citations
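The class-based n-gram idea described in the abstract factors the bigram probability through word classes: P(w_i | w_{i-1}) ≈ P(w_i | class(w_i)) · P(class(w_i) | class(w_{i-1})), so the model needs far fewer parameters than a word-level bigram. A minimal sketch with a made-up class assignment and hypothetical probability estimates:

```python
# Class-based bigram model: words map to classes, and the bigram
# probability factors through the classes. All numbers are illustrative.

word_class = {"the": "DET", "a": "DET", "dog": "N", "cat": "N",
              "runs": "V", "sleeps": "V"}

# P(class' | class), hypothetical estimates
class_bigram = {("DET", "N"): 0.9, ("N", "V"): 0.8, ("DET", "V"): 0.05,
                ("N", "N"): 0.1, ("V", "DET"): 0.5}

# P(word | class), hypothetical estimates
word_given_class = {"the": 0.7, "a": 0.3, "dog": 0.5, "cat": 0.5,
                    "runs": 0.6, "sleeps": 0.4}

def bigram_prob(prev, word):
    """P(word | prev) under the class-based factorization."""
    c_prev, c = word_class[prev], word_class[word]
    return class_bigram.get((c_prev, c), 0.0) * word_given_class[word]

print(bigram_prob("the", "dog"))  # 0.9 * 0.5 = 0.45
```

With C classes and V words the model stores O(C² + V) parameters instead of O(V²), which is the point of the class-based approach when data are sparse.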


Cites methods from "A Maximum Likelihood Approach to Co..."

  • ...Figure 1 shows a model that has long been used in automatic speech recognition ( Bahl, Jelinek, and Mercer 1983 ) and has recently been proposed for machine translation (Brown et al. 1990) and for automatic spelling correction (Mays, Damerau, and Mercer 1990)....


  • ...…in the second, on entropy and mutual information. 2 Language Models. Figure 1 shows a model that has long been used in automatic speech recognition [Bahl et al., 1983] and has recently been proposed for machine translation [Brown et al., 1990] and for automatic spelling correction [Mays et al., …]...


  • ...[Bahl et al., 1983] Bahl, L. R., Jelinek, F., and Mercer, R. L. (1983)....


Book
01 Dec 1999
TL;DR: It is now clear that HAL's creator, Arthur C. Clarke, was a little optimistic in predicting when an artificial agent such as HAL would be available, as discussed by the authors.
Abstract: …is one of the most recognizable characters in 20th century cinema. HAL is an artificial agent capable of such advanced language behavior as speaking and understanding English, and at a crucial moment in the plot, even reading lips. It is now clear that HAL's creator, Arthur C. Clarke, was a little optimistic in predicting when an artificial agent such as HAL would be available. But just how far off was he? What would it take to create at least the language-related parts of HAL? We call programs like HAL that converse with humans in natural…

3,077 citations

Journal ArticleDOI
TL;DR: This work surveys the most widely used algorithms for smoothing n-gram language models, presents an extensive empirical comparison of several of these smoothing techniques, including those described by Jelinek and Mercer (1980), and introduces methodologies for analyzing smoothing algorithm efficacy in detail.

1,948 citations
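Among the smoothing methods this survey compares is Jelinek-Mercer style linear interpolation, in which a sparse higher-order estimate is mixed with a more robust lower-order one. A minimal sketch of interpolated bigram smoothing with a fixed mixing weight (in practice the weights are estimated on held-out data); the toy corpus is made up:

```python
from collections import Counter

# Jelinek-Mercer style interpolation of a bigram model with a unigram
# model: P(w | v) = lam * P_ML(w | v) + (1 - lam) * P_ML(w).
# lam is fixed here for illustration; in practice it is tuned on
# held-out data.

corpus = "the cat sat on the mat the cat ran".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
N = len(corpus)

def p_interp(prev, word, lam=0.7):
    p_uni = unigrams[word] / N
    p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
    return lam * p_bi + (1 - lam) * p_uni

# "cat" follows "the" in 2 of the 3 occurrences of "the"
print(round(p_interp("the", "cat"), 4))
```

Because the unigram term is never zero for any word seen in training, the interpolated model assigns nonzero probability to bigrams the training data lacks, which is the core purpose of smoothing.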

Journal ArticleDOI
TL;DR: A statistical approach to machine translation is presented, its application to translation from French to English is described, and preliminary results are given.
Abstract: In this paper, we present a statistical approach to machine translation. We describe the application of our approach to translation from French to English and give preliminary results.

1,860 citations

References
Book
21 Oct 1957
TL;DR: The more the authors study the information processing aspects of the mind, the more perplexed and impressed they become, and it will be a very long time before they understand these processes sufficiently to reproduce them.
Abstract: From the Publisher: An introduction to the mathematical theory of multistage decision processes, this text takes a functional equation approach to the discovery of optimum policies. Written by a leading developer of such policies, it presents a series of methods, uniqueness and existence theorems, and examples for solving the relevant equations. The text examines existence and uniqueness theorems, the optimal inventory equation, bottleneck problems in multistage production processes, a new formalism in the calculus of variation, strategies behind multistage games, and Markovian decision processes. Each chapter concludes with a problem set that Eric V. Denardo of Yale University, in his informative new introduction, calls a rich lode of applications and research topics. 1957 edition. 37 figures.

14,187 citations
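The functional-equation approach the abstract describes writes the value of a state as the best immediate cost plus the value of the resulting state, and solves that recursion. A minimal sketch on a toy multistage shortest-path problem; the graph and costs are invented for illustration:

```python
import functools

# Dynamic programming via the functional equation
#   V(s) = min over actions of [ cost(s, s') + V(s') ],
# here a toy multistage shortest-path problem. Graph is made up.

edges = {  # state -> {next_state: cost}
    "A": {"B": 2, "C": 5},
    "B": {"C": 1, "D": 7},
    "C": {"D": 3},
    "D": {},  # terminal state
}

@functools.lru_cache(maxsize=None)
def value(state):
    """Optimal cost-to-go from state, by the functional equation."""
    if not edges[state]:
        return 0
    return min(cost + value(nxt) for nxt, cost in edges[state].items())

print(value("A"))  # cheapest total cost from A to the terminal state
```

Memoizing `value` means each state is solved once, which is the computational payoff of the dynamic programming formulation over enumerating all paths.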

Journal ArticleDOI
01 Mar 1973
TL;DR: This paper gives a tutorial exposition of the Viterbi algorithm and of how it is implemented and analyzed, and increasing use of the algorithm in a widening variety of areas is foreseen.
Abstract: The Viterbi algorithm (VA) is a recursive optimal solution to the problem of estimating the state sequence of a discrete-time finite-state Markov process observed in memoryless noise. Many problems in areas such as digital communications can be cast in this form. This paper gives a tutorial exposition of the algorithm and of how it is implemented and analyzed. Applications to date are reviewed. Increasing use of the algorithm in a widening variety of areas is foreseen.

5,995 citations
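The recursion the abstract describes can be sketched directly: the Viterbi algorithm keeps, for each state, the probability of the best path ending there, plus backpointers to recover the path. A minimal two-state example with illustrative numbers (not from the paper):

```python
# Viterbi algorithm: most probable state sequence of a finite-state
# Markov chain observed in memoryless noise. Model numbers are made up.

states = ["s0", "s1"]
pi = {"s0": 0.6, "s1": 0.4}
A = {"s0": {"s0": 0.7, "s1": 0.3}, "s1": {"s0": 0.4, "s1": 0.6}}
B = {"s0": {"x": 0.5, "y": 0.5}, "s1": {"x": 0.1, "y": 0.9}}

def viterbi(obs):
    """Return the most probable hidden state path for obs."""
    delta = {s: pi[s] * B[s][obs[0]] for s in states}  # best-path probs
    back = []                                          # backpointers
    for o in obs[1:]:
        prev = {s: max(states, key=lambda r: delta[r] * A[r][s])
                for s in states}
        delta = {s: delta[prev[s]] * A[prev[s]][s] * B[s][o]
                 for s in states}
        back.append(prev)
    # Trace the best path backwards from the most likely final state.
    path = [max(states, key=lambda s: delta[s])]
    for prev in reversed(back):
        path.append(prev[path[-1]])
    return list(reversed(path))

print(viterbi("xyy"))
```

Like the forward algorithm, it runs in O(T · N²) time; the only change is replacing the sum over predecessor states with a max plus a backpointer.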

Journal ArticleDOI
TL;DR: A new method of estimating the entropy and redundancy of a language is described, which exploits the knowledge of the language statistics possessed by those who speak the language, and depends on experimental results in prediction of the next letter when the preceding text is known.
Abstract: A new method of estimating the entropy and redundancy of a language is described. This method exploits the knowledge of the language statistics possessed by those who speak the language, and depends on experimental results in prediction of the next letter when the preceding text is known. Results of experiments in prediction are given, and some properties of an ideal predictor are developed.

2,556 citations
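Shannon's method estimates entropy from human next-letter predictions; a simple statistical analogue, useful as a point of comparison, is the zeroth-order (unigram) letter entropy of a text sample, which upper-bounds the true per-letter rate. A minimal sketch (the sample sentence is arbitrary):

```python
import math
from collections import Counter

# Zeroth-order letter entropy: bits per letter under an i.i.d. letter
# model, H = -sum_c p(c) * log2 p(c). This ignores all context, so it
# upper-bounds the entropy rate Shannon estimated via human prediction.

def letter_entropy(text):
    """Bits per letter of `text` under an i.i.d. letter model."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    n = len(letters)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

sample = "the quick brown fox jumps over the lazy dog"
print(round(letter_entropy(sample), 3))
```

Conditioning on preceding letters (higher-order models, or Shannon's human predictors) only lowers the estimate, which is why the sequence of n-gram entropies decreases toward the true rate.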

Book
01 Jan 1968
TL;DR: Contents: 1. Linguistics: the scientific study of language; 2. The structure of language; 3. The sounds of language; 4. Grammar: general principles; 5. Grammatical units; 6. Grammatical structure; 7. Semantic structure; Notes and references.
Abstract: This is a comprehensive introduction to theoretical linguistics. It presupposes no previous knowledge, and terms are defined as they are introduced, but it gives a rigorous and technical treatment of a wide range of topics and brings the reader to an advanced level of understanding.

1,740 citations