Proceedings Article

Loopy belief propagation for approximate inference: an empirical study

30 Jul 1999, pp. 467-475
TL;DR: This paper compares the marginals computed using loopy propagation to the exact ones in four Bayesian network architectures, including two real-world networks: ALARM and QMR, and finds that the loopy beliefs often converge and when they do, they give a good approximation to the correct marginals.
Abstract: Recently, researchers have demonstrated that "loopy belief propagation" -- the use of Pearl's polytree algorithm in a Bayesian network with loops -- can perform well in the context of error-correcting codes. The most dramatic instance of this is the near Shannon-limit performance of "Turbo Codes" -- codes whose decoding algorithm is equivalent to loopy belief propagation in a chain-structured Bayesian network. In this paper we ask: is there something special about the error-correcting code context, or does loopy propagation work as an approximate inference scheme in a more general setting? We compare the marginals computed using loopy propagation to the exact ones in four Bayesian network architectures, including two real-world networks: ALARM and QMR. We find that the loopy beliefs often converge and when they do, they give a good approximation to the correct marginals. However, on the QMR network, the loopy beliefs oscillated and had no obvious relationship to the correct posteriors. We present some initial investigations into the cause of these oscillations, and show that some simple methods of preventing them lead to the wrong results.
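
To make the setup concrete, here is a minimal sketch (illustrative only, not the authors' code) of loopy belief propagation on a tiny pairwise model with a single loop, compared against exact marginals obtained by brute-force enumeration. The graph, potentials, and function names are made up for illustration; the actual experiments use the ALARM and QMR networks described above.

import itertools
import numpy as np

nodes = [0, 1, 2]                         # three binary variables in one loop
edges = [(0, 1), (1, 2), (2, 0)]
K = 2

rng = np.random.default_rng(0)
node_pot = {i: rng.uniform(0.5, 1.5, size=K) for i in nodes}
edge_pot = {e: rng.uniform(0.5, 1.5, size=(K, K)) for e in edges}

neighbors = {i: [] for i in nodes}
for (i, j) in edges:
    neighbors[i].append(j)
    neighbors[j].append(i)

def pairwise(i, j):
    # Edge potential viewed as psi[x_i, x_j], whichever way the edge is stored.
    return edge_pot[(i, j)] if (i, j) in edge_pot else edge_pot[(j, i)].T

def exact_marginals():
    # Brute-force enumeration of the joint; only feasible for tiny models.
    probs = np.zeros((K,) * len(nodes))
    for x in itertools.product(range(K), repeat=len(nodes)):
        p = np.prod([node_pot[i][x[i]] for i in nodes])
        for (i, j) in edges:
            p *= edge_pot[(i, j)][x[i], x[j]]
        probs[x] = p
    probs /= probs.sum()
    return [probs.sum(axis=tuple(k for k in nodes if k != i)) for i in nodes]

def loopy_bp(max_iters=100, tol=1e-8):
    # Parallel ("flooding") message updates; returns beliefs and a convergence flag.
    msgs = {(i, j): np.ones(K) / K for i in nodes for j in neighbors[i]}
    for _ in range(max_iters):
        new = {}
        for (i, j) in msgs:
            prod = node_pot[i].copy()
            for k in neighbors[i]:
                if k != j:
                    prod *= msgs[(k, i)]          # all incoming messages except from j
            m = pairwise(i, j).T @ prod           # sum over x_i
            new[(i, j)] = m / m.sum()
        delta = max(np.abs(new[e] - msgs[e]).max() for e in msgs)
        msgs = new
        if delta < tol:
            break
    beliefs = []
    for i in nodes:
        b = node_pot[i].copy()
        for k in neighbors[i]:
            b *= msgs[(k, i)]
        beliefs.append(b / b.sum())
    return beliefs, delta < tol

exact = exact_marginals()
approx, converged = loopy_bp()
for i in nodes:
    print(f"node {i}: exact {exact[i].round(4)}  loopy {approx[i].round(4)}  converged={converged}")

On small graphs like this the messages typically converge and the beliefs are close to the exact marginals; monitoring delta across iterations is also how oscillation of the kind reported for QMR would show up.
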
Citations
01 Jan 2002
TL;DR: This thesis will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
Abstract: Dynamic Bayesian Networks: Representation, Inference and Learning by Kevin Patrick Murphy Doctor of Philosophy in Computer Science University of California, Berkeley Professor Stuart Russell, Chair Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and bio-sequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data. In particular, the main novel technical contributions of this thesis are as follows: a way of representing Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T^3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of applying Rao-Blackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
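
To make the factored-state idea above concrete, a small illustrative sketch (mine, not the thesis'): a DBN over D binary state variables specifies its transition model one variable at a time, while the equivalent flattened HMM needs a full 2^D-by-2^D transition matrix. The CPD shapes below are assumptions chosen for brevity; a real DBN would typically condition each variable only on a small parent set.

import itertools
import numpy as np

D = 3
rng = np.random.default_rng(0)

# One CPD per next-slice variable: P(X_i(t) = 1 | joint state at t-1).
cpds = [rng.uniform(0.1, 0.9, size=2 ** D) for _ in range(D)]

def flatten_to_hmm(cpds):
    # Expand the factored transition model into the 2^D x 2^D matrix an HMM would store.
    states = list(itertools.product([0, 1], repeat=D))
    T = np.zeros((len(states), len(states)))
    for s_idx in range(len(states)):
        for t_idx, t in enumerate(states):
            p = 1.0
            for i in range(D):
                p1 = cpds[i][s_idx]
                p *= p1 if t[i] == 1 else 1.0 - p1
            T[s_idx, t_idx] = p
    return T

T = flatten_to_hmm(cpds)
print("factored parameters:", sum(c.size for c in cpds))     # D * 2^D numbers
print("flat HMM parameters:", T.size)                        # (2^D)^2 numbers
print("rows sum to 1:", np.allclose(T.sum(axis=1), 1.0))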

2,757 citations

Journal ArticleDOI
TL;DR: This work explains how to obtain region-based free energy approximations that improve the Bethe approximation, and corresponding generalized belief propagation (GBP) algorithms, and describes empirical results showing that GBP can significantly outperform BP.
Abstract: Important inference problems in statistical physics, computer vision, error-correcting coding theory, and artificial intelligence can all be reformulated as the computation of marginal probabilities on factor graphs. The belief propagation (BP) algorithm is an efficient way to solve these problems that is exact when the factor graph is a tree, but only approximate when the factor graph has cycles. We show that BP fixed points correspond to the stationary points of the Bethe approximation of the free energy for a factor graph. We explain how to obtain region-based free energy approximations that improve the Bethe approximation, and corresponding generalized belief propagation (GBP) algorithms. We emphasize the conditions a free energy approximation must satisfy in order to be a "valid" or "maxent-normal" approximation. We describe the relationship between four different methods that can be used to generate valid approximations: the "Bethe method", the "junction graph method", the "cluster variation method", and the "region graph method". Finally, we explain how to tell whether a region-based approximation, and its corresponding GBP algorithm, is likely to be accurate, and describe empirical results showing that GBP can significantly outperform BP.
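
For reference, a standard statement of the Bethe free energy mentioned above (generic notation that may differ in details from the article): with factor beliefs b_a, variable beliefs b_i, factors f_a, and d_i the number of factors containing variable i,

F_{\mathrm{Bethe}}
  = \sum_{a}\sum_{\mathbf{x}_a} b_a(\mathbf{x}_a)\,
        \ln\frac{b_a(\mathbf{x}_a)}{f_a(\mathbf{x}_a)}
  \;-\; \sum_{i}(d_i - 1)\sum_{x_i} b_i(x_i)\,\ln b_i(x_i),
\qquad\text{subject to}\quad
  b_i(x_i) = \sum_{\mathbf{x}_a \setminus x_i} b_a(\mathbf{x}_a)
  \ \text{ for all } a \ni i,
\qquad \sum_{x_i} b_i(x_i) = 1 .

Interior stationary points of this constrained functional coincide with BP fixed points; region-based approximations replace the factor and variable terms with terms over larger regions, giving the GBP family.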

1,827 citations


Cites background from "Loopy belief propagation for approx..."

  • ...Nevertheless, in such cases there are no guarantees, and sometimes the results are quite poor, or the algorithm fails to give any result at all because it does not converge [14]....

Book
01 Jan 2003
TL;DR: It is shown that BP can only converge to a fixed point that is also a stationary point of the Bethe approximation to the free energy, which enables connections to be made with variational approaches to approximate inference.
Abstract: "Inference" problems arise in statistical physics, computer vision, error-correcting coding theory, and AI. We explain the principles behind the belief propagation (BP) algorithm, which is an efficient way to solve inference problems based on passing local messages. We develop a unified approach, with examples, notation, and graphical models borrowed from the relevant disciplines. We explain the close connection between the BP algorithm and the Bethe approximation of statistical physics. In particular, we show that BP can only converge to a fixed point that is also a stationary point of the Bethe approximation to the free energy. This result helps explain the successes of the BP algorithm and enables connections to be made with variational approaches to approximate inference. The connection of BP with the Bethe approximation also suggests a way to construct new message-passing algorithms based on improvements to Bethe's approximation introduced by Kikuchi and others. The new generalized belief propagation (GBP) algorithms are significantly more accurate than ordinary BP for some problems. We illustrate how to construct GBP algorithms with a detailed example.

1,627 citations

Proceedings Article
02 Aug 2001
TL;DR: Expectation Propagation approximates the belief states by only retaining expectations, such as mean and variance, and iterates until these expectations are consistent throughout the network, which makes it applicable to hybrid networks with discrete and continuous nodes.
Abstract: This paper presents a new deterministic approximation technique in Bayesian networks. This method, "Expectation Propagation," unifies two previous techniques: assumed-density filtering, an extension of the Kalman filter, and loopy belief propagation, an extension of belief propagation in Bayesian networks. Loopy belief propagation, because it propagates exact belief states, is useful for a limited class of belief networks, such as those which are purely discrete. Expectation Propagation approximates the belief states by only retaining expectations, such as mean and variance, and iterates until these expectations are consistent throughout the network. This makes it applicable to hybrid networks with discrete and continuous nodes. Experiments with Gaussian mixture models show Expectation Propagation to be convincingly better than methods with similar computational cost: Laplace's method, variational Bayes, and Monte Carlo. Expectation Propagation also provides an efficient algorithm for training Bayes point machine classifiers.
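
A minimal sketch of the moment-matching operation at the heart of the method described above (illustrative only, not Minka's implementation): the Gaussian "cavity" distribution is multiplied by one exact, non-Gaussian factor, and the resulting tilted distribution is projected back onto a Gaussian by matching its mean and variance. The grid-based integration and the mixture factor are assumptions made to keep the example self-contained; the surrounding EP update and message bookkeeping are omitted.

import numpy as np

def project_to_gaussian(cavity_mean, cavity_var, factor, num_points=20001):
    # Mean and variance of cavity_Gaussian(x) * factor(x), computed on a dense grid.
    s = np.sqrt(cavity_var)
    grid = np.linspace(cavity_mean - 10 * s, cavity_mean + 10 * s, num_points)
    cavity = np.exp(-0.5 * (grid - cavity_mean) ** 2 / cavity_var)
    weights = cavity * factor(grid)
    weights /= weights.sum()                       # normalize the tilted distribution on the grid
    mean = np.sum(grid * weights)                  # first moment
    var = np.sum((grid - mean) ** 2 * weights)     # second central moment
    return mean, var

# Illustrative non-Gaussian likelihood factor: a two-component Gaussian mixture.
factor = lambda x: 0.7 * np.exp(-0.5 * (x - 1.0) ** 2) + 0.3 * np.exp(-0.125 * (x + 2.0) ** 2)
m, v = project_to_gaussian(cavity_mean=0.0, cavity_var=4.0, factor=factor)
print(f"moment-matched Gaussian: mean {m:.3f}, variance {v:.3f}")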

1,514 citations


Cites background from "Loopy belief propagation for approx..."


  • ...In belief networks with loops it is known that approximate marginal distributions can be obtained by iterating the belief propagation recursions, a process known as loopy belief propagation (Frey & MacKay, 1997; Murphy et al., 1999)....

Posted Content
TL;DR: This work develops an approach to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process, then learns a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data.
Abstract: A central problem in machine learning involves modeling complex data-sets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable. Here, we develop an approach that simultaneously achieves both flexibility and tractability. The essential idea, inspired by non-equilibrium statistical physics, is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. We then learn a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data. This approach allows us to rapidly learn, sample from, and evaluate probabilities in deep generative models with thousands of layers or time steps, as well as to compute conditional and posterior probabilities under the learned model. We additionally release an open source reference implementation of the algorithm.
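
A minimal sketch of the forward ("structure-destroying") half of the process described above, under toy assumptions (one-dimensional bimodal data, a linear noise schedule): repeated small Gaussian perturbations drive the data distribution toward an isotropic Gaussian. The learned reverse process, which is the paper's actual contribution, is not shown.

import numpy as np

rng = np.random.default_rng(0)
x0 = rng.choice([-2.0, 2.0], size=1000) + 0.1 * rng.standard_normal(1000)   # toy bimodal data

T = 200
betas = np.linspace(1e-4, 0.05, T)            # per-step noise variances (illustrative schedule)

x = x0.copy()
for beta in betas:
    # q(x_t | x_{t-1}) = N(sqrt(1 - beta) * x_{t-1}, beta)
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)

print("initial mean/std:", round(float(x0.mean()), 3), round(float(x0.std()), 3))
print("final   mean/std:", round(float(x.mean()), 3), round(float(x.std()), 3))   # close to N(0, 1)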

1,481 citations

References
Book
01 Jan 1988
TL;DR: Probabilistic Reasoning in Intelligent Systems as mentioned in this paper is a complete and accessible account of the theoretical foundations and computational methods that underlie plausible reasoning under uncertainty, and provides a coherent explication of probability as a language for reasoning with partial belief.
Abstract: From the Publisher: Probabilistic Reasoning in Intelligent Systems is a complete and accessible account of the theoretical foundations and computational methods that underlie plausible reasoning under uncertainty. The author provides a coherent explication of probability as a language for reasoning with partial belief and offers a unifying perspective on other AI approaches to uncertainty, such as the Dempster-Shafer formalism, truth maintenance systems, and nonmonotonic logic. The author distinguishes syntactic and semantic approaches to uncertainty—and offers techniques, based on belief networks, that provide a mechanism for making semantics-based systems operational. Specifically, network-propagation techniques serve as a mechanism for combining the theoretical coherence of probability theory with modern demands of reasoning-systems technology: modular declarative inputs, conceptually meaningful inferences, and parallel distributed computation. Application areas include diagnosis, forecasting, image interpretation, multi-sensor fusion, decision support systems, plan recognition, planning, speech recognition—in short, almost every task requiring that conclusions be drawn from uncertain clues and incomplete information. Probabilistic Reasoning in Intelligent Systems will be of special interest to scholars and researchers in AI, decision theory, statistics, logic, philosophy, cognitive psychology, and the management sciences. Professionals in the areas of knowledge-based systems, operations research, engineering, and statistics will find theoretical and computational tools of immediate practical use. The book can also be used as an excellent text for graduate-level courses in AI, operations research, or applied probability.

15,671 citations

Proceedings Article
01 Jan 1993

7,742 citations

Journal ArticleDOI
TL;DR: In this article, it was shown that probabilistic inference using belief networks is NP-hard, and that it therefore seems unlikely that an exact algorithm can be developed to perform inference efficiently over all classes of belief networks, so research should be directed toward the design of efficient special-case, average-case, and approximation algorithms.

1,877 citations

Journal ArticleDOI
TL;DR: It is shown that Pearl's algorithm can be used to routinely derive previously known iterative, but suboptimal, decoding algorithms for a number of other error-control systems, including Gallager's low-density parity-check codes, serially concatenated codes, and product codes.
Abstract: We describe the close connection between the now celebrated iterative turbo decoding algorithm of Berrou et al. (1993) and an algorithm that has been well known in the artificial intelligence community for a decade, but which is relatively unknown to information theorists: Pearl's (1982) belief propagation algorithm. We see that if Pearl's algorithm is applied to the "belief network" of a parallel concatenation of two or more codes, the turbo decoding algorithm immediately results. Unfortunately, however, this belief diagram has loops, and Pearl only proved that his algorithm works when there are no loops, so an explanation of the experimental performance of turbo decoding is still lacking. However, we also show that Pearl's algorithm can be used to routinely derive previously known iterative, but suboptimal, decoding algorithms for a number of other error-control systems, including Gallager's (1962) low-density parity-check codes, serially concatenated codes, and product codes. Thus, belief propagation provides a very attractive general methodology for devising low-complexity iterative decoding algorithms for hybrid coded systems.

989 citations


"Loopy belief propagation for approx..." refers background in this paper

  • ...Yet McEliece et al. conjectured that the performance of loopy belief propagation on the Turbo code structure was a special case of a more general phenomenon: We believe there are general undiscovered theorems about the performance of belief propagation on loopy DAGs....

  • ...These theorems, which may have nothing directly to do with coding or decoding will show that in some sense belief propagation "converges with high probability to a near-optimum value" of the desired belief on a class of loopy DAGs [10]....

  • ...years" [11] and have recently been shown [9, 10] to utilize an algorithm equivalent to belief propagation in a network with loops....


Book ChapterDOI
01 Jan 1989
TL;DR: Two algorithms were applied to this belief network: a message-passing algorithm by Pearl for probability updating in multiply connected networks using the method of conditioning and the Lauritzen-Spiegelhalter algorithm for local probability computations on graphical structures.
Abstract: ALARM (A Logical Alarm Reduction Mechanism) is a diagnostic application used to explore probabilistic reasoning techniques in belief networks. ALARM implements an alarm message system for patient monitoring; it calculates probabilities for a differential diagnosis based on available evidence. The medical knowledge is encoded in a graphical structure connecting 8 diagnoses, 16 findings and 13 intermediate variables. Two algorithms were applied to this belief network: (1) a message-passing algorithm by Pearl for probability updating in multiply connected networks using the method of conditioning; and (2) the Lauritzen-Spiegelhalter algorithm for local probability computations on graphical structures. The characteristics of both algorithms are analyzed and their specific applications and time complexities are shown.

835 citations