Showing papers by "Lawrence K. Saul published in 1999"


Journal ArticleDOI
TL;DR: This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models (Bayesian networks and Markov random fields), and describes a general framework for generating variational transformations based on convex duality.
Abstract: This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models (Bayesian networks and Markov random fields). We present a number of examples of graphical models, including the QMR-DT database, the sigmoid belief network, the Boltzmann machine, and several variants of hidden Markov models, in which it is infeasible to run exact inference algorithms. We then introduce variational methods, which exploit laws of large numbers to transform the original graphical model into a simplified graphical model in which inference is efficient. Inference in the simplified model provides bounds on probabilities of interest in the original model. We describe a general framework for generating variational transformations based on convex duality. Finally we return to the examples and demonstrate how variational algorithms can be formulated in each case.
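
To make the convex-duality idea concrete, here is a minimal Python sketch (not taken from the paper) of the standard dual bound log(x) ≤ λx − log λ − 1, the kind of variational transformation the abstract refers to; the choice of the log function and the variable names are illustrative assumptions.

```python
import numpy as np

# Variational upper bound on log(x) from convex duality:
#   log(x) = min_lambda [ lambda * x - log(lambda) - 1 ],
# so for any fixed lambda > 0,  log(x) <= lambda * x - log(lambda) - 1.
# Bounds of this form let an intractable term in a graphical model be
# replaced by a tractable (here, linear) surrogate in x.

def log_upper_bound(x, lam):
    """Upper bound on log(x) indexed by the variational parameter lam."""
    return lam * x - np.log(lam) - 1.0

x = 3.0
for lam in (0.1, 1.0 / x, 1.0):      # lam = 1/x makes the bound tight
    print(f"lam={lam:.3f}  bound={log_upper_bound(x, lam):.4f}  log(x)={np.log(x):.4f}")
```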

4,093 citations


Journal ArticleDOI
TL;DR: A set of generalized Baum-Welch updates is derived for factorial hidden Markov models whose transition matrices are parameterized as a convex combination, or mixture, of simpler dynamical models.
Abstract: We study Markov models whose state spaces arise from the Cartesian product of two or more discrete random variables. We show how to parameterize the transition matrices of these models as a convex combination—or mixture—of simpler dynamical models. The parameters in these models admit a simple probabilistic interpretation and can be fitted iteratively by an Expectation-Maximization (EM) procedure. We derive a set of generalized Baum-Welch updates for factorial hidden Markov models that make use of this parameterization. We also describe a simple iterative procedure for approximately computing the statistics of the hidden states. Throughout, we give examples where mixed memory models provide a useful representation of complex stochastic processes.
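
As an illustration of the mixture parameterization described above, the following Python sketch assembles each component chain's transition probabilities as a convex combination of simpler first-order transition matrices. The specific form P(i'_nu | i_1, ..., i_k) = sum_mu psi[nu, mu] * a[nu, mu](i'_nu | i_mu), along with the symbols psi and a, is one reading of the abstract and an assumption here, not code from the paper.

```python
import numpy as np

# Mixed-memory sketch: each component chain nu predicts its next state
# from a mixture over which previous chain mu it conditions on,
#   P(i'_nu | i_1, ..., i_k) = sum_mu psi[nu, mu] * a[nu, mu][i_mu, i'_nu],
# where psi[nu] are mixture weights and each a[nu, mu] is an ordinary
# stochastic matrix over the n states.

rng = np.random.default_rng(0)
k, n = 2, 3                                      # k component chains, n states each

psi = rng.dirichlet(np.ones(k), size=k)          # psi[nu] sums to 1 over mu
a = rng.dirichlet(np.ones(n), size=(k, k, n))    # a[nu, mu, i] is a distribution over next states

def component_transition(nu, prev_states):
    """Distribution over the next state of chain nu given all previous states."""
    dist = np.zeros(n)
    for mu in range(k):
        dist += psi[nu, mu] * a[nu, mu, prev_states[mu]]
    return dist

prev = (0, 2)
for nu in range(k):
    p = component_transition(nu, prev)
    print(f"chain {nu}: next-state distribution {np.round(p, 3)} (sums to {p.sum():.3f})")
```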

170 citations


Book ChapterDOI
01 Feb 1999
TL;DR: In this article, a learning algorithm for unsupervised neural networks, based on ideas from statistical mechanics, is introduced; it is derived from a mean field approximation for large, layered sigmoid belief networks.
Abstract: We introduce a learning algorithm for unsupervised neural networks based on ideas from statistical mechanics. The algorithm is derived from a mean field approximation for large, layered sigmoid belief networks. We show how to (approximately) infer the statistics of these networks without resort to sampling. This is done by solving the mean field equations, which relate the statistics of each unit to those of its Markov blanket. Using these statistics as target values, the weights in the network are adapted by a local delta rule. We evaluate the strengths and weaknesses of these networks for problems in statistical pattern recognition.
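
The Python sketch below shows a naive mean-field fixed-point iteration in which each binary unit's mean is updated from the means of its Markov blanket. It is only meant to convey the general scheme; the paper's mean field equations for layered sigmoid belief networks involve additional variational parameters, and the weight matrix and biases here are random placeholders.

```python
import numpy as np

# Naive mean-field fixed point for a network of binary (0/1) units:
#   mu_i <- sigmoid( sum_j W[i, j] * mu_j + b[i] ),
# i.e. each unit's mean is determined by the means of its Markov blanket.
# This illustrates the general scheme only; the paper's updates for
# sigmoid belief networks are more involved.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_field(W, b, n_iter=50):
    mu = np.full(len(b), 0.5)        # start all unit means at 0.5
    for _ in range(n_iter):
        mu = sigmoid(W @ mu + b)     # in practice the updates are often damped
    return mu

rng = np.random.default_rng(1)
n = 5
W = rng.normal(scale=0.5, size=(n, n))
W = (W + W.T) / 2.0                  # symmetric couplings
np.fill_diagonal(W, 0.0)             # no self-coupling
b = rng.normal(size=n)

print("mean-field unit statistics:", np.round(mean_field(W, b), 3))
```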

28 citations


Proceedings Article
01 Jan 1999
TL;DR: A statistical model for automatic speech recognition that relates variations in speaking rate to nonlinear warpings of time is proposed, and it is shown that Markov processes on curves yield lower word error rates than comparably trained hidden Markov models.
Abstract: We propose a statistical model for automatic speech recognition that relates variations in speaking rate to nonlinear warpings of time. The model describes a discrete random variable, s(t), that evolves as a function of the arc length traversed along a curve parameterized by x(t). Since arc length does not depend on the rate at which a curve is traversed, this evolution gives rise to a family of Markov processes whose predictions, Pr[s|x], are invariant to nonlinear warpings of time. We describe the use of such models, known as Markov processes on curves (MPCs), for automatic speech recognition, where x are acoustic feature trajectories and s are phonetic transcriptions. On two tasks, recognizing New Jersey town names and connected alpha-digits, we find that MPCs yield lower word error rates than comparably trained hidden Markov models.
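
The Python sketch below illustrates the arc-length invariance that underlies MPCs: the length of a trajectory is unchanged under a monotone warping of time. The toy two-dimensional curve and the cubic warp are assumptions for illustration and stand in for the acoustic feature trajectories x(t) used in the paper.

```python
import numpy as np

# Arc length of a trajectory x(t) does not depend on the rate at which
# the curve is traversed, so quantities tied to arc length are invariant
# to nonlinear warpings of time. The two lengths printed below agree up
# to discretization error.

def arc_length(x):
    """Total arc length of a polyline given as a (T, d) array of samples."""
    return np.sum(np.linalg.norm(np.diff(x, axis=0), axis=1))

def curve(t):
    # toy 2-d trajectory standing in for an acoustic feature path
    return np.stack([np.cos(2 * np.pi * t), np.sin(4 * np.pi * t)], axis=1)

t = np.linspace(0.0, 1.0, 2001)
warped_t = t ** 3                     # monotone nonlinear time warping

print("original length:", round(arc_length(curve(t)), 4))
print("warped length:  ", round(arc_length(curve(warped_t)), 4))
```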

1 citation