
Showing papers by "Michael I. Jordan published in 1999"


Journal ArticleDOI
TL;DR: This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models (Bayesian networks and Markov random fields), and describes a general framework for generating variational transformations based on convex duality.
Abstract: This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models (Bayesian networks and Markov random fields). We present a number of examples of graphical models, including the QMR-DT database, the sigmoid belief network, the Boltzmann machine, and several variants of hidden Markov models, in which it is infeasible to run exact inference algorithms. We then introduce variational methods, which exploit laws of large numbers to transform the original graphical model into a simplified graphical model in which inference is efficient. Inference in the simplified model provides bounds on probabilities of interest in the original model. We describe a general framework for generating variational transformations based on convex duality. Finally we return to the examples and demonstrate how variational algorithms can be formulated in each case.

4,093 citations
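
For orientation, the convex-duality construction that this tutorial uses to generate variational transformations can be illustrated with its standard textbook example, the variational bound on the logarithm (the choice of the logarithm here is purely illustrative):

    \log x \;=\; \min_{\lambda > 0} \bigl\{ \lambda x - \log \lambda - 1 \bigr\},
    \qquad\text{hence}\qquad
    \log x \;\le\; \lambda x - \log \lambda - 1 \quad \text{for all } \lambda > 0,

with equality at \lambda = 1/x. Fixing a single value of \lambda replaces the logarithm by a linear function of x, and optimizing over \lambda recovers the exact value; variational inference applies transformations of this kind to the intractable terms of a joint distribution so that the simplified model supports efficient inference.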


BookDOI
01 Feb 1999
TL;DR: This paper presents an introduction to inference for Bayesian networks and a view of the EM algorithm that justifies incremental, sparse and other variants, as well as an information-theoretic analysis of hard and soft assignment methods for clustering.
Abstract: Part 1 Inference: introduction to inference for Bayesian networks, Robert Cowell; advanced inference in Bayesian networks, Robert Cowell; inference in Bayesian networks using nested junction trees, Uffe Kjoerulff; bucket elimination - a unifying framework for probabilistic inference, R. Dechter; an introduction to variational methods for graphical models, Michael I. Jordan et al; improving the mean field approximation via the use of mixture distributions, Tommi S. Jaakkola and Michael I. Jordan; introduction to Monte Carlo methods, D.J.C. MacKay; suppressing random walks in Markov chain Monte Carlo using ordered overrelaxation, Radford M. Neal. Part 2 Independence: chain graphs and symmetric associations, Thomas S. Richardson; the multiinformation function as a tool for measuring stochastic dependence, M. Studeny and J. Vejnarova. Part 3 Foundations for learning: a tutorial on learning with Bayesian networks, David Heckerman; a view of the EM algorithm that justifies incremental, sparse and other variants, Radford M. Neal and Geoffrey E. Hinton. Part 4 Learning from data: latent variable models, Christopher M. Bishop; stochastic algorithms for exploratory data analysis - data clustering and data visualization, Joachim M. Buhmann; learning Bayesian networks with local structure, Nir Friedman and Moises Goldszmidt; asymptotic model selection for directed networks with hidden variables, Dan Geiger et al; a hierarchical community of experts, Geoffrey E. Hinton et al; an information-theoretic analysis of hard and soft assignment methods for clustering, Michael J. Kearns et al; learning hybrid Bayesian networks from data, Stefano Monti and Gregory F. Cooper; a mean field learning algorithm for unsupervised neural networks, Lawrence Saul and Michael Jordan; edge exclusion tests for graphical Gaussian models, Peter W.F. Smith and Joe Whittaker; hepatitis B - a case study in MCMC, D.J. Spiegelhalter et al; prediction with Gaussian processes - from linear regression to linear prediction and beyond, C.K.I. Williams.

1,885 citations


Book
01 Aug 1999
TL;DR: This book gives a thorough and rigorous mathematical treatment of the underlying ideas, structures, and algorithms of probabilistic expert systems, emphasizing those cases in which exact answers are obtainable.
Abstract: From the Publisher: Probabilistic expert systems are graphical networks that support the modelling of uncertainty and decisions in large complex domains, while retaining ease of calculation. Building on original research by the authors over a number of years, this book gives a thorough and rigorous mathematical treatment of the underlying ideas, structures, and algorithms, emphasizing those cases in which exact answers are obtainable. The book will be of interest to researchers and graduate students in artificial intelligence who desire an understanding of the mathematical and statistical basis of probabilistic expert systems, and to students and research workers in statistics wanting an introduction to this fascinating and rapidly developing field. The careful attention to detail will also make this work an important reference source for all those involved in the theory and applications of probabilistic expert systems.

1,684 citations


Proceedings Article
30 Jul 1999
TL;DR: This paper compares the marginals computed using loopy propagation to the exact ones in four Bayesian network architectures, including two real-world networks: ALARM and QMR, and finds that the loopy beliefs often converge and when they do, they give a good approximation to the correct marginals.
Abstract: Recently, researchers have demonstrated that "loopy belief propagation" -- the use of Pearl's polytree algorithm in a Bayesian network with loops -- can perform well in the context of error-correcting codes. The most dramatic instance of this is the near Shannon-limit performance of "Turbo Codes" -- codes whose decoding algorithm is equivalent to loopy belief propagation in a chain-structured Bayesian network. In this paper we ask: is there something special about the error-correcting code context, or does loopy propagation work as an approximate inference scheme in a more general setting? We compare the marginals computed using loopy propagation to the exact ones in four Bayesian network architectures, including two real-world networks: ALARM and QMR. We find that the loopy beliefs often converge and when they do, they give a good approximation to the correct marginals. However, on the QMR network, the loopy beliefs oscillated and had no obvious relationship to the correct posteriors. We present some initial investigations into the cause of these oscillations, and show that some simple methods of preventing them lead to the wrong results.

1,532 citations
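
To make the procedure concrete, here is a minimal sketch of loopy belief propagation (synchronous sum-product message passing) on a pairwise Markov random field containing a single loop. The four-node graph, the potentials, and all variable names below are illustrative assumptions, not the networks studied in the paper:

import numpy as np

# Four binary nodes arranged in a single loop: 0-1-2-3-0.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n_nodes, n_states = 4, 2

# Unary potentials (one row per node) and a shared pairwise potential
# that mildly favours agreement between neighbours.
unary = np.array([[0.7, 0.3],
                  [0.4, 0.6],
                  [0.5, 0.5],
                  [0.8, 0.2]])
pairwise = np.array([[1.2, 0.8],
                     [0.8, 1.2]])

# Messages are indexed by directed edge (i -> j); initialise them uniformly.
neighbours = {i: [] for i in range(n_nodes)}
for i, j in edges:
    neighbours[i].append(j)
    neighbours[j].append(i)
messages = {(i, j): np.ones(n_states) / n_states
            for i in neighbours for j in neighbours[i]}

for sweep in range(50):
    new_messages = {}
    for (i, j) in messages:
        # Product of node i's unary potential and its incoming messages,
        # excluding the message that came from j.
        incoming = unary[i].copy()
        for k in neighbours[i]:
            if k != j:
                incoming *= messages[(k, i)]
        m = pairwise.T @ incoming           # sum over the states of node i
        new_messages[(i, j)] = m / m.sum()  # normalise for numerical stability
    messages = new_messages

# Approximate marginals (beliefs): unary potential times all incoming messages.
for i in range(n_nodes):
    b = unary[i].copy()
    for k in neighbours[i]:
        b *= messages[(k, i)]
    print(f"node {i}: belief {b / b.sum()}")

On a graph with loops these beliefs are only approximations to the true marginals, which is exactly the gap the paper measures on ALARM and QMR.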


Book
11 Jun 1999
TL;DR: This monograph provides a thorough and coherent introduction to the mathematical properties of feedforward neural networks and to the computationally intensive methodology that has enabled their highly successful application to complex problems of pattern classification, forecasting, regression, and nonlinear systems modeling.
Abstract: From the Publisher: This monograph provides a thorough and coherent introduction to the mathematical properties of feedforward neural networks and to the computationally intensive methodology that has enabled their highly successful application to complex problems of pattern classification, forecasting, regression, and nonlinear systems modeling.

386 citations


Journal ArticleDOI
TL;DR: A set of generalized Baum-Welch updates is derived for factorial hidden Markov models whose transition matrices are parameterized as a convex combination—or mixture—of simpler dynamical models.
Abstract: We study Markov models whose state spaces arise from the Cartesian product of two or more discrete random variables. We show how to parameterize the transition matrices of these models as a convex combination—or mixture—of simpler dynamical models. The parameters in these models admit a simple probabilistic interpretation and can be fitted iteratively by an Expectation-Maximization (EM) procedure. We derive a set of generalized Baum-Welch updates for factorial hidden Markov models that make use of this parameterization. We also describe a simple iterative procedure for approximately computing the statistics of the hidden states. Throughout, we give examples where mixed memory models provide a useful representation of complex stochastic processes.

170 citations
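
As a toy illustration of the mixture parameterization and EM fit described above, the sketch below estimates only the mixing weights of a convex combination of two fixed transition matrices from a short synthetic state sequence. The component matrices, the sequence, and all names are made-up assumptions; the full model in the paper also re-estimates the component dynamics:

import numpy as np

# Two simple "base" dynamical models over 3 states.
A = np.array([[[0.8, 0.1, 0.1],    # component 0: sticky dynamics
               [0.1, 0.8, 0.1],
               [0.1, 0.1, 0.8]],
              [[0.1, 0.8, 0.1],    # component 1: cyclic dynamics
               [0.1, 0.1, 0.8],
               [0.8, 0.1, 0.1]]])
w = np.array([0.5, 0.5])           # mixture weights to be estimated

seq = [0, 0, 1, 2, 0, 0, 0, 1, 2, 0, 1, 1, 2, 0, 0]
transitions = list(zip(seq[:-1], seq[1:]))

for iteration in range(20):
    # E-step: responsibility of each component for each observed transition.
    lik = np.array([[w[m] * A[m, i, j] for m in range(2)]
                    for i, j in transitions])
    resp = lik / lik.sum(axis=1, keepdims=True)
    # M-step: the mixing weights are the average responsibilities.
    w = resp.mean(axis=0)

print("estimated mixture weights:", w)

The effective transition matrix is then the convex combination sum_m w[m] * A[m], so the number of free parameters grows only linearly in the number of components rather than with the full product state space.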


Journal ArticleDOI
TL;DR: This work describes a variational approximation method for efficient inference in large-scale probabilistic models and evaluates the algorithm on a large set of diagnostic test cases, comparing the algorithm to a state-of-the-art stochastic sampling method.
Abstract: We describe a variational approximation method for efficient inference in large-scale probabilistic models. Variational methods are deterministic procedures that provide approximations to marginal and conditional probabilities of interest. They provide alternatives to approximate inference methods based on stochastic sampling or search. We describe a variational approach to the problem of diagnostic inference in the "Quick Medical Reference" (QMR) network. The QMR network is a large-scale probabilistic graphical model built on statistical and expert knowledge. Exact probabilistic inference is infeasible in this model for all but a small set of cases. We evaluate our variational inference algorithm on a large set of diagnostic test cases, comparing the algorithm to a state-of-the-art stochastic sampling method.

170 citations
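
The QMR network couples diseases and findings through noisy-OR conditional probabilities. The snippet below sketches that likelihood for a single finding, with made-up parameters; it is only meant to show why negative findings can be handled exactly while positive findings couple the diseases and motivate the variational approximation:

import numpy as np

theta0 = 0.05                      # leak term
theta = np.array([0.9, 0.4, 1.5])  # per-disease activation strengths (illustrative)
d = np.array([1, 0, 1])            # a hypothetical disease configuration

p_f_off = np.exp(-theta0 - theta @ d)   # noisy-OR: P(finding absent | d)
p_f_on = 1.0 - p_f_off                  # P(finding present | d)
print(p_f_off, p_f_on)

# For a negative finding the likelihood exp(-theta0) * prod_j exp(-theta_j * d_j)
# factorises over diseases and can be absorbed into the priors exactly; it is the
# 1 - exp(...) term for positive findings that couples the diseases and makes exact
# inference intractable -- this is the term the variational method approximates.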



Journal ArticleDOI
TL;DR: The data support the hypothesis that unconstrained motions, unlike compliant motions, are not programmed to follow a straight line path in the extrinsic space and provide a theoretical frame of reference within which some apparently contradictory results in the movement generation literature may be explained.
Abstract: Two main questions were addressed in the present study. First, does the existence of kinematic regularities in the extrinsic space represent a general rule? Second, can the existence of extrinsic regularities be related to specific experimental situations implying, for instance, the generation of compliant motion (i.e. a motion constrained by external contact)? To address these two questions we studied the spatio-temporal characteristics of unconstrained and compliant movements. Five major differences were observed between these two types of movement: (1) the movement latency and movement duration were significantly longer in the compliant than in the unconstrained condition; (2) whereas the hand path was curved and variable according to movement direction for the unconstrained movements, it was straight and invariant for the compliant movements; (3) whereas the movement end-point distribution was roughly circular for the unconstrained movements, it was consistently elongated and typically oriented in the...

42 citations


Book ChapterDOI
01 Feb 1999
TL;DR: In this article, a learning algorithm for unsupervised neural networks based on ideas from statistical mechanics is introduced, which is derived from a mean field approximation for large, layered sigmoid belief networks.
Abstract: We introduce a learning algorithm for unsupervised neural networks based on ideas from statistical mechanics. The algorithm is derived from a mean field approximation for large, layered sigmoid belief networks. We show how to (approximately) infer the statistics of these networks without resort to sampling. This is done by solving the mean field equations, which relate the statistics of each unit to those of its Markov blanket. Using these statistics as target values, the weights in the network are adapted by a local delta rule. We evaluate the strengths and weaknesses of these networks for problems in statistical pattern recognition.

28 citations
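
To give a flavour of the second step described in the abstract, the sketch below applies a local delta rule to one layer of top-down weights, treating precomputed mean-field statistics as targets. The mean-field solver itself and the exact form of the mean field equations are omitted, and all dimensions, values, and names are illustrative assumptions rather than the paper's algorithm:

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_parents, n_children = 8, 5
W = 0.1 * rng.standard_normal((n_children, n_parents))  # top-down weights
b = np.zeros(n_children)

# Pretend these mean values came from solving the mean field equations,
# which relate each unit's statistic to those of its Markov blanket.
mu_parents = rng.uniform(0.1, 0.9, size=n_parents)
mu_children = rng.uniform(0.1, 0.9, size=n_children)

# Local delta rule: each child's mean statistic serves as the target for the
# sigmoid prediction computed from its parents' mean statistics.
lr = 0.1
for _ in range(100):
    pred = sigmoid(W @ mu_parents + b)
    err = mu_children - pred
    W += lr * np.outer(err, mu_parents)
    b += lr * err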


Proceedings Article
29 Nov 1999
TL;DR: This work presents a class of approximate inference algorithms for graphical models of the QMR-DT type, and gives convergence rates for these algorithms and for the Jaakkola and Jordan (1999) algorithm, and verifies theoretical predictions empirically.
Abstract: We present a class of approximate inference algorithms for graphical models of the QMR-DT type. We give convergence rates for these algorithms and for the Jaakkola and Jordan (1999) algorithm, and verify these theoretical predictions empirically. We also present empirical results on the difficult QMR-DT network problem, obtaining performance of the new algorithms roughly comparable to the Jaakkola and Jordan algorithm.