A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains
01 Feb 1970-Annals of Mathematical Statistics (Institute of Mathematical Statistics)-Vol. 41, Iss: 1, pp 164-171
About: This article is published in Annals of Mathematical Statistics. The article was published on 1970-02-01 and is currently open access. It has received 4618 citations to date. The article focuses on the topics: Examples of Markov chains & Markov chain.
Citations
01 Feb 1989
TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory, along with a description of selected applications of HMMs to distinct problems in speech recognition.
Abstract: This tutorial provides an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and gives practical details on methods of implementation of the theory along with a description of selected applications of the theory to distinct problems in speech recognition. Results from a number of original sources are combined to provide a single source of acquiring the background required to pursue further this area of research. The author first reviews the theory of discrete Markov chains and shows how the concept of hidden states, where the observation is a probabilistic function of the state, can be used effectively. The theory is illustrated with two simple examples, namely coin-tossing, and the classic balls-in-urns system. Three fundamental problems of HMMs are noted and several practical techniques for solving these problems are given. The various types of HMMs that have been studied, including ergodic as well as left-right models, are described.
21,819 citations
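The first of the tutorial's three fundamental problems is evaluation: computing the probability of an observation sequence given the model. A minimal sketch of the forward algorithm for that problem, with an illustrative two-state, two-symbol model (all numbers below are assumptions for demonstration, not taken from the tutorial):

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """Return P(obs | model) by summing the final forward variables.

    pi  : (N,) initial state distribution
    A   : (N, N) transition matrix, A[i, j] = P(state j | state i)
    B   : (N, M) emission matrix,  B[i, k] = P(symbol k | state i)
    obs : sequence of observed symbol indices
    """
    alpha = pi * B[:, obs[0]]            # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # induction step
    return alpha.sum()                   # termination

# Illustrative two-state, two-symbol model (assumed numbers).
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.5, 0.5],
               [0.1, 0.9]])
print(forward_likelihood(pi, A, B, [0, 1, 0]))
```

The recursion costs O(T·N²), versus O(T·N^T) for summing over all state paths directly.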
TL;DR: The purpose of this tutorial paper is to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.
Abstract: The basic theory of Markov chains has been known to mathematicians and engineers for close to 80 years, but it is only in the past decade that it has been applied explicitly to problems in speech processing. One of the major reasons why speech models, based on Markov chains, have not been developed until recently was the lack of a method for optimizing the parameters of the Markov model to match observed signal patterns. Such a method was proposed in the late 1960's and was immediately applied to speech processing in several research institutions. Continued refinements in the theory and implementation of Markov modelling techniques have greatly enhanced the method, leading to a wide range of applications of these models. It is the purpose of this tutorial paper to give an introduction to the theory of Markov models, and to illustrate how they have been applied to problems in speech recognition.
4,546 citations
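The parameter-optimization method referred to here is the Baum-Welch (EM) re-estimation procedure. A minimal sketch of one re-estimation pass for an illustrative two-state, two-symbol HMM (all numbers assumed); the observation likelihood never decreases under the update, which is the monotonicity property established in the 1970 paper:

```python
import numpy as np

def forward(pi, A, B, obs):
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(A, B, obs):
    T, N = len(obs), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def baum_welch_step(pi, A, B, obs):
    """One EM re-estimation pass; returns updated (pi, A, B)."""
    T, N = len(obs), len(pi)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    like = alpha[-1].sum()
    gamma = alpha * beta / like                  # P(state i at time t | obs)
    xi = np.zeros((T - 1, N, N))                 # P(i at t, j at t+1 | obs)
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :] / like
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    obs_arr = np.asarray(obs)
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[obs_arr == k].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B

# Illustrative starting model and observation sequence (assumed).
pi = np.array([0.5, 0.5])
A  = np.array([[0.6, 0.4], [0.3, 0.7]])
B  = np.array([[0.7, 0.3], [0.2, 0.8]])
obs = [0, 1, 1, 0, 1]
before = forward(pi, A, B, obs)[-1].sum()
pi2, A2, B2 = baum_welch_step(pi, A, B, obs)
after = forward(pi2, A2, B2, obs)[-1].sum()
print(before, after)  # likelihood is non-decreasing (EM monotonicity)
```

No numerical scaling is used here, so this sketch is only suitable for short sequences.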
TL;DR: In this paper, the authors assume that the local characteristics of the true scene can be represented by a nondegenerate Markov random field (MRF) and propose a simple, iterative method of reconstruction that avoids the enormous computational burden of estimation under standard criteria.
Abstract: A continuous two-dimensional region is partitioned into a fine rectangular array of sites or "pixels", each pixel having a particular "colour" belonging to a prescribed finite set. The true colouring of the region is unknown but, associated with each pixel, there is a possibly multivariate record which conveys imperfect information about its colour according to a known statistical model. The aim is to reconstruct the true scene, with the additional knowledge that pixels close together tend to have the same or similar colours. In this paper, it is assumed that the local characteristics of the true scene can be represented by a nondegenerate Markov random field. Such information can be combined with the records by Bayes' theorem and the true scene can be estimated according to standard criteria. However, the computational burden is enormous and the reconstruction may reflect undesirable large-scale properties of the random field. Thus, a simple, iterative method of reconstruction is proposed, which does not depend on these large-scale characteristics. The method is illustrated by computer simulations in which the original scene is not directly related to the assumed random field. Some complications, including parameter estimation, are discussed. Potential applications are mentioned briefly.
4,490 citations
Cites background from "A Maximization Technique Occurring ..."
…regard the y as mixture data with the complication that the underlying (unobservable) classification variables making up x are not independent; see Baum et al. (1970) for a version of EM for the one-dimensional case, in which the components of x follow a Markov chain, and Geman and McClure…
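The simple, iterative method this paper proposes is iterated conditional modes (ICM): each pixel is repeatedly set to the colour maximizing its conditional posterior given the record and its neighbours. A minimal sketch on a toy 8×8 binary scene; the noise level and smoothing weight are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# True binary scene: left half colour 0, right half colour 1 (toy example).
true = np.zeros((8, 8), dtype=int)
true[:, 4:] = 1

# Noisy records: flip each pixel independently with probability 0.2.
record = np.where(rng.random(true.shape) < 0.2, 1 - true, true)

def icm(record, beta=1.5, flip_prob=0.2, sweeps=5):
    """ICM-style reconstruction with an Ising-type neighbour term.

    Each update maximizes: log-likelihood of the record at that pixel
    plus beta times the number of agreeing 4-neighbours.
    """
    x = record.copy()
    H, W = x.shape
    log_lik = np.log([1 - flip_prob, flip_prob])   # [agree, disagree]
    for _ in range(sweeps):
        for i in range(H):
            for j in range(W):
                best, best_score = x[i, j], -np.inf
                for c in (0, 1):
                    score = log_lik[int(c != record[i, j])]
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < H and 0 <= nj < W:
                            score += beta * (x[ni, nj] == c)
                    if score > best_score:
                        best, best_score = c, score
                x[i, j] = best
    return x

restored = icm(record)
print((record != true).sum(), "->", (restored != true).sum())
```

Each single-pixel update can only increase the global posterior score, so ICM converges to a local maximum rather than the global MAP estimate, which is exactly the trade-off the paper discusses.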
TL;DR: This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models (Bayesian networks and Markov random fields), and describes a general framework for generating variational transformations based on convex duality.
Abstract: This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models (Bayesian networks and Markov random fields). We present a number of examples of graphical models, including the QMR-DT database, the sigmoid belief network, the Boltzmann machine, and several variants of hidden Markov models, in which it is infeasible to run exact inference algorithms. We then introduce variational methods, which exploit laws of large numbers to transform the original graphical model into a simplified graphical model in which inference is efficient. Inference in the simplified model provides bounds on probabilities of interest in the original model. We describe a general framework for generating variational transformations based on convex duality. Finally we return to the examples and demonstrate how variational algorithms can be formulated in each case.
4,093 citations
Cites background from "A Maximization Technique Occurring ..."
...The cost is that we have obtained a free parameter λ that must be set, once for each x....
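The bounds such variational transformations provide rest on a simple inequality: by Jensen's inequality, log p(x) = log Σ_z p(x, z) ≥ Σ_z q(z) log(p(x, z)/q(z)) for any distribution q over the latent variable z, with equality when q is the exact posterior. A minimal numeric sketch with illustrative joint probabilities (the numbers are assumptions, not from the paper):

```python
import numpy as np

# Toy model: latent z in {0, 1}; joint probabilities p(x, z) for the
# observed x are fixed illustrative numbers.
p_xz = np.array([0.15, 0.05])          # p(x, z=0), p(x, z=1)
log_evidence = np.log(p_xz.sum())      # exact log p(x)

def elbo(q):
    """Variational lower bound sum_z q(z) [log p(x,z) - log q(z)]."""
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * (np.log(p_xz) - np.log(q))))

print(log_evidence, elbo([0.5, 0.5]), elbo(p_xz / p_xz.sum()))
# The bound is tight when q equals the exact posterior p(z | x).
```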
References
2,919 citations
TL;DR: In this note, for a polynomial P with nonnegative coefficients, homogeneous of degree d in its variables, a growth transformation 𝔗 on a product of probability simplices is defined, and it is shown that P(𝔗(x)) > P(x) unless 𝔗(x) = x.
Abstract: 1. Summary. The object of this note is to prove the theorem below and sketch two applications, one to statistical estimation for (probabilistic) functions of Markov processes [1] and one to Blakley's model for ecology [4]. 2. Result. THEOREM. Let P(x) = P({x_ij}) be a polynomial with nonnegative coefficients homogeneous of degree d in its variables {x_ij}. Let x = {x_ij} be any point of the domain D: x_ij ≥ 0, Σ_{j=1}^{q_i} x_ij = 1, i = 1, …, p, j = 1, …, q_i. For x = {x_ij} ∈ D let 𝔗(x) = 𝔗{x_ij} denote the point of D whose i, j coordinate is

𝔗(x)_ij = x_ij (∂P/∂x_ij)(x) / Σ_{j=1}^{q_i} x_ij (∂P/∂x_ij)(x).

Then P(𝔗(x)) > P(x) unless 𝔗(x) = x. Notation. μ will denote a doubly indexed array of nonnegative integers: μ = {μ_ij}, i = 1, …
1,145 citations
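The theorem in this entry (the Baum-Eagon inequality) can be checked numerically. A minimal sketch with an illustrative polynomial P(x1, x2) = x1²·x2, homogeneous of degree 3 with nonnegative coefficients, on the simplex x1 + x2 = 1 (a single index row, p = 1):

```python
import numpy as np

def P(x):
    # Illustrative polynomial: homogeneous of degree 3, nonnegative coefficients.
    return x[0] ** 2 * x[1]

def grad_P(x):
    return np.array([2 * x[0] * x[1], x[0] ** 2])

def growth_transform(x):
    """T(x)_j = x_j (dP/dx_j) / sum_k x_k (dP/dx_k), the paper's mapping."""
    g = x * grad_P(x)
    return g / g.sum()

x = np.array([0.5, 0.5])
for _ in range(5):
    x_new = growth_transform(x)
    assert P(x_new) >= P(x)   # P(T(x)) > P(x) unless T(x) = x
    x = x_new
print(x, P(x))  # iterates move toward the maximizer (2/3, 1/3)
```

This is the mechanism behind the 1970 paper's re-estimation formulas: the HMM likelihood is such a polynomial in the transition and emission probabilities, so iterating the transform can only increase it.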
TL;DR: The Gamma function, as discussed by the authors, is essentially a generalized factorial function, with many further applications, e.g., as part of probability distributions.
Abstract: In what follows, we introduce the classical Gamma function in Sect. 2.1. It is essentially understood to be a generalized factorial. However, there are many further applications, e.g., as part of probability distributions (see, e.g., Evans et al. 2000). The main properties of the Gamma function are explained in this chapter (for a more detailed discussion the reader is referred to, e.g., Artin (1964), Lebedev (1973), Muller (1998), Nielsen (1906), and Whittaker and Watson (1948) and the references therein).
267 citations
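As a quick illustration of the "generalized factorial" property: Γ(n) = (n-1)! for positive integers n, together with the classic half-integer value Γ(1/2) = √π.

```python
import math

# Gamma extends the factorial: Gamma(n) = (n-1)! for positive integers.
for n in range(1, 8):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

# Classic non-integer value: Gamma(1/2) = sqrt(pi).
print(math.gamma(0.5), math.sqrt(math.pi))  # both approximately 1.7724538509
```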