Showing papers by "Lawrence K. Saul published in 1999"


Journal ArticleDOI
TL;DR: This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models (Bayesian networks and Markov random fields), and describes a general framework for generating variational transformations based on convex duality.
Abstract: This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models (Bayesian networks and Markov random fields). We present a number of examples of graphical models, including the QMR-DT database, the sigmoid belief network, the Boltzmann machine, and several variants of hidden Markov models, in which it is infeasible to run exact inference algorithms. We then introduce variational methods, which exploit laws of large numbers to transform the original graphical model into a simplified graphical model in which inference is efficient. Inference in the simplified model provides bounds on probabilities of interest in the original model. We describe a general framework for generating variational transformations based on convex duality. Finally we return to the examples and demonstrate how variational algorithms can be formulated in each case.
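
To make the convex-duality idea concrete, here is a minimal Python sketch (not taken from the paper) of the standard dual bound log(x) ≤ λx − log λ − 1, the kind of variational transformation the abstract refers to; the choice of the log function and the variable names are illustrative assumptions.

```python
import numpy as np

# Variational upper bound on log(x) from convex duality:
#   log(x) = min_lambda [ lambda * x - log(lambda) - 1 ],
# so for any fixed lambda > 0,  log(x) <= lambda * x - log(lambda) - 1.
# Bounds of this form let an intractable term in a graphical model be
# replaced by a tractable (here, linear) surrogate in x.

def log_upper_bound(x, lam):
    """Upper bound on log(x) indexed by the variational parameter lam."""
    return lam * x - np.log(lam) - 1.0

x = 3.0
for lam in (0.1, 1.0 / x, 1.0):      # lam = 1/x makes the bound tight
    print(f"lam={lam:.3f}  bound={log_upper_bound(x, lam):.4f}  log(x)={np.log(x):.4f}")
```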

4,093 citations


Journal ArticleDOI
TL;DR: A set of generalized Baum-Welch updates is derived for factorial hidden Markov models whose transition matrices are parameterized as a convex combination, or mixture, of simpler dynamical models.
Abstract: We study Markov models whose state spaces arise from the Cartesian product of two or more discrete random variables. We show how to parameterize the transition matrices of these models as a convex combination—or mixture—of simpler dynamical models. The parameters in these models admit a simple probabilistic interpretation and can be fitted iteratively by an Expectation-Maximization (EM) procedure. We derive a set of generalized Baum-Welch updates for factorial hidden Markov models that make use of this parameterization. We also describe a simple iterative procedure for approximately computing the statistics of the hidden states. Throughout, we give examples where mixed memory models provide a useful representation of complex stochastic processes.
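
As an illustration of the mixture parameterization described above, the following Python sketch assembles each component chain's transition probabilities as a convex combination of simpler first-order transition matrices. The specific form P(i'_nu | i_1, ..., i_k) = sum_mu psi[nu, mu] * a[nu, mu](i'_nu | i_mu), along with the symbols psi and a, is one reading of the abstract and an assumption here, not code from the paper.

```python
import numpy as np

# Mixed-memory sketch: each component chain nu predicts its next state
# from a mixture over which previous chain mu it conditions on,
#   P(i'_nu | i_1, ..., i_k) = sum_mu psi[nu, mu] * a[nu, mu][i_mu, i'_nu],
# where psi[nu] are mixture weights and each a[nu, mu] is an ordinary
# stochastic matrix over the n states.

rng = np.random.default_rng(0)
k, n = 2, 3                                      # k component chains, n states each

psi = rng.dirichlet(np.ones(k), size=k)          # psi[nu] sums to 1 over mu
a = rng.dirichlet(np.ones(n), size=(k, k, n))    # a[nu, mu, i] is a distribution over next states

def component_transition(nu, prev_states):
    """Distribution over the next state of chain nu given all previous states."""
    dist = np.zeros(n)
    for mu in range(k):
        dist += psi[nu, mu] * a[nu, mu, prev_states[mu]]
    return dist

prev = (0, 2)
for nu in range(k):
    p = component_transition(nu, prev)
    print(f"chain {nu}: next-state distribution {np.round(p, 3)} (sums to {p.sum():.3f})")
```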

170 citations


Book ChapterDOI
01 Feb 1999
TL;DR: In this article, a learning algorithm for unsupervised neural networks, based on ideas from statistical mechanics, is introduced; it is derived from a mean field approximation for large, layered sigmoid belief networks.
Abstract: We introduce a learning algorithm for unsupervised neural networks based on ideas from statistical mechanics. The algorithm is derived from a mean field approximation for large, layered sigmoid belief networks. We show how to (approximately) infer the statistics of these networks without resort to sampling. This is done by solving the mean field equations, which relate the statistics of each unit to those of its Markov blanket. Using these statistics as target values, the weights in the network are adapted by a local delta rule. We evaluate the strengths and weaknesses of these networks for problems in statistical pattern recognition.
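
The Python sketch below shows a naive mean-field fixed-point iteration in which each binary unit's mean is updated from the means of its Markov blanket. It is only meant to convey the general scheme; the paper's mean field equations for layered sigmoid belief networks involve additional variational parameters, and the weight matrix and biases here are random placeholders.

```python
import numpy as np

# Naive mean-field fixed point for a network of binary (0/1) units:
#   mu_i <- sigmoid( sum_j W[i, j] * mu_j + b[i] ),
# i.e. each unit's mean is determined by the means of its Markov blanket.
# This illustrates the general scheme only; the paper's updates for
# sigmoid belief networks are more involved.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_field(W, b, n_iter=50):
    mu = np.full(len(b), 0.5)        # start all unit means at 0.5
    for _ in range(n_iter):
        mu = sigmoid(W @ mu + b)     # in practice the updates are often damped
    return mu

rng = np.random.default_rng(1)
n = 5
W = rng.normal(scale=0.5, size=(n, n))
W = (W + W.T) / 2.0                  # symmetric couplings
np.fill_diagonal(W, 0.0)             # no self-coupling
b = rng.normal(size=n)

print("mean-field unit statistics:", np.round(mean_field(W, b), 3))
```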

28 citations


Proceedings Article
01 Jan 1999
TL;DR: A statistical model for automatic speech recognition that relates variations in speaking rate to nonlinear warpings of time is proposed, and it is shown that Markov processes on curves yield lower word error rates than comparably trained hidden Markov models.
Abstract: We propose a statistical model for automatic speech recognition that relates variations in speaking rate to nonlinear warpings of time. The model describes a discrete random variable, s(t), that evolves as a function of the arc length traversed along a curve parameterized by x(t). Since arc length does not depend on the rate at which a curve is traversed, this evolution gives rise to a family of Markov processes whose predictions, Pr[s|x], are invariant to nonlinear warpings of time. We describe the use of such models, known as Markov processes on curves (MPCs), for automatic speech recognition, where x are acoustic feature trajectories and s are phonetic transcriptions. On two tasks, recognizing New Jersey town names and connected alpha-digits, we find that MPCs yield lower word error rates than comparably trained hidden Markov models.
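
The Python sketch below illustrates the arc-length invariance that underlies MPCs: the length of a trajectory is unchanged under a monotone warping of time. The toy two-dimensional curve and the cubic warp are assumptions for illustration and stand in for the acoustic feature trajectories x(t) used in the paper.

```python
import numpy as np

# Arc length of a trajectory x(t) does not depend on the rate at which
# the curve is traversed, so quantities tied to arc length are invariant
# to nonlinear warpings of time. The two lengths printed below agree up
# to discretization error.

def arc_length(x):
    """Total arc length of a polyline given as a (T, d) array of samples."""
    return np.sum(np.linalg.norm(np.diff(x, axis=0), axis=1))

def curve(t):
    # toy 2-d trajectory standing in for an acoustic feature path
    return np.stack([np.cos(2 * np.pi * t), np.sin(4 * np.pi * t)], axis=1)

t = np.linspace(0.0, 1.0, 2001)
warped_t = t ** 3                     # monotone nonlinear time warping

print("original length:", round(arc_length(curve(t)), 4))
print("warped length:  ", round(arc_length(curve(warped_t)), 4))
```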

1 citation