IEEE Signal Processing Mag., Jan. 2004
An Introduction to Factor Graphs
Hans-Andrea Loeliger
ETH Zürich

H.-A. Loeliger is with the Dept. of Information Technology and Electrical Engineering, ISI-ITET,
ETH Zürich, CH-8092 Zürich, Switzerland. Email: loeliger@isi.ee.ethz.ch.
Abstract—A large variety of algorithms in coding, signal processing, and artificial
intelligence may be viewed as instances of the summary-product algorithm (or
belief/probability propagation algorithm), which operates by message passing in a
graphical model. Specific instances of such algorithms include Kalman filtering and
smoothing, the forward-backward algorithm for hidden Markov models, probability
propagation in Bayesian networks, and decoding algorithms for error correcting codes
such as the Viterbi algorithm, the BCJR algorithm, and the iterative decoding of turbo
codes, low-density parity check codes, and similar codes. New algorithms for complex
detection and estimation problems can also be derived as instances of the
summary-product algorithm. In this paper, we give an introduction to this unified
perspective in terms of (Forney-style) factor graphs.
1 Introduction
Engineers have always liked graphical models such as circuit diagrams, signal flow graphs,
trellis diagrams, and a variety of block diagrams. In artificial intelligence, statistics, and
neural networks, stochastic models are often formulated as Bayesian networks or Markov
random fields. In coding theory, the iterative decoding of turbo codes and similar codes
may also be understood in terms of a graphical model of the code.
Graphical models are often associated with particular algorithms. For example, the
Viterbi decoding algorithm is naturally described by means of a trellis diagram, and
estimation problems in Markov random fields are often solved by Gibbs sampling.
This paper is an introduction to factor graphs and to the associated summary propagation
algorithms, which operate by passing “messages” (“summaries”) along the edges
of the graph. The origins of factor graphs lie in coding theory, but they offer an attractive
notation for a wide variety of signal processing problems. In particular, a large number of
practical algorithms for a wide variety of detection and estimation problems can be derived
as summary propagation algorithms. The algorithms derived in this way often include
the best previously known algorithms as special cases or as obvious approximations.
The two main summary propagation algorithms are the sum-product (or belief propaga-
tion or probability propagation) algorithm and the max-product (or min-sum) algorithm,
both of which have a long history. In the context of error correcting codes, the sum-
product algorithm was invented by Gallager [17] as a decoding algorithm for low-density
parity check (LDPC) codes; it is still the standard decoding algorithm for such codes.
However, the full potential of LDPC codes was not yet realized at that time. Tanner [41]
explicitly introduced graphs to describe LDPC codes, generalized them (by replacing the
parity checks with more general component codes), and introduced the min-sum algorithm.
Both the sum-product and the max-product algorithm also have another root in coding,
viz. the BCJR algorithm [5] and the Viterbi algorithm [10], which both operate on
a trellis. Before the invention of turbo coding, the Viterbi algorithm used to be the
workhorse of many practical coding schemes. The BCJR algorithm, despite its equally
fundamental character, was not widely used; it therefore lingered in obscurity and was
independently re-invented several times.
The full power of iterative decoding was only realized by the breakthrough invention
of turbo coding by Berrou et al. [6], which was followed by the rediscovery of LDPC
codes [33]. Wiberg et al. [45], [46] observed that the decoding of turbo codes and LDPC
codes as well as the Viterbi and BCJR algorithms are instances of one single algorithm,
which operates by message passing in a generalized Tanner graph. From this perspective,
new applications, such as iterative decoding for channels with memory, also became
obvious. The later introduction of factor graphs [15], [24] may be viewed as a further
elaboration of the ideas by Wiberg et al. In the present paper, we will use Forney-style
factor graphs, which were introduced in [13] (and there called “normal graphs”).
Meanwhile, the work of Pearl and others [38], [49], [50], [26] on probability propagation
(or belief propagation) in Bayesian networks had attracted much attention in
artificial intelligence and statistics. It was therefore exciting when, in the wake of turbo
coding, probability propagation and the sum-product algorithm were found to be the same
thing [14], [4]. In particular, the example of iterative decoding proved that probability
propagation, which had been used only for cycle-free graphs, could be used also for graphs
with cycles.
In signal processing, both hidden-Markov models (with the associated forward-backward
algorithm) and Kalman filtering (especially in the form of the RLS algorithm) have long
been serving as workhorses in a variety of applications, and it had gradually become
apparent that these two techniques are really the same abstract idea in two specific
embodiments. Today, these important algorithms may be seen as just two other instances
of the sum-product (probability propagation) algorithm. In fact, it was shown in [24] (see
also [4]) that even fast Fourier transform (FFT) algorithms may be viewed as instances
of the sum-product algorithm.
Graphical models such as factor graphs support a general trend in signal processing
from sequential processing to iterative processing. In communications, for example,
the advent of turbo coding has completely changed the design of receivers; formerly
sequentially arranged subtasks such as synchronization, equalization, and decoding are
now designed to interact via multiple feedback loops. Another example of this trend is
“factorial hidden Markov models” [18], where the state space of traditional hidden Markov
models is split into the product of several state spaces. Again, virtually all such signal
processing schemes are examples of summary propagation and may be systematically
derived from suitable factor graphs.
The literature on graphical models and their applications is vast. The references mentioned
in this paper are a somewhat arbitrary sample, very much biased by the author’s
personal perspective and interests. Some excellent papers on iterative coding and
communications are contained in the special issues [1], [2], [3]; beautiful introductions
to codes on graphs and the corresponding algorithms are also given in [11], [12], [25].
Much of the literature on graphical models appears under the umbrella of neural
networks, cf. [22]. A much-expected survey on graphical models other than factor graphs
is the book by Jordan [23].
This paper is structured as follows. In Section 2, we introduce factor graphs. (In the
main ideas, we will follow [24] and [13], but we will also adopt some details of notation
from [27] and [42].) The use of such graphs for error correcting codes is described in
Section 3. In Section 4.1, the pivotal issue of eliminating internal variables from a model
is considered. The summary-product algorithm is introduced in Section 4.2. The wide
area of signal processing by message passing is briefly addressed in Sections 4.3 and 4.4.
Some further topics, ranging from convergence issues to analog realizations of the sum-
product algorithm, are briefly touched in Section 5, and some conclusions are offered in
Section 6.
2 Factor Graphs
As mentioned, we will use Forney-style factor graphs (FFGs) rather than the original
factor graphs of [24] (cf. the box on page 9). An FFG is a diagram as in Fig. 1 that
represents the factorization of a function of several variables. Assume, for example, that
some function f(u, w, x, y, z) can be factored as
f(u, w, x, y, z) = f_1(u, w, x) f_2(x, y, z) f_3(z). (1)
This factorization is expressed by the FFG shown in Fig. 1. In general, an FFG consists
of nodes, edges, and “half edges” (which are connected only to one node), and the FFG
is defined by the following rules:
- There is a (unique) node for every factor.
- There is a (unique) edge or half edge for every variable.
- The node representing some factor g is connected with the edge (or half edge)
  representing some variable x if and only if g is a function of x.
Implicit in this definition is the assumption that no variable appears in more than two
factors. We will see below how this seemingly severe restriction is easily circumvented.
The factors are sometimes called local functions and their product is called the global
function. In (1), the global function is f, and f_1, f_2, f_3 are the local functions.
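
As a concrete illustration of these definitions, the factorization (1) can be represented
by a small data structure. The following Python sketch is purely illustrative (the
particular local functions are made-up assumptions, not from the paper): each factor
becomes a node that records the variables, i.e. edges, it is connected to, and the global
function is evaluated as the product of the local functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class FactorNode:
    """One node of the FFG: a local function and the variables (edges) it touches."""
    name: str
    variables: List[str]        # each variable is an edge or half edge of the graph
    fn: Callable[..., float]    # the local function

# The factorization f(u, w, x, y, z) = f_1(u, w, x) f_2(x, y, z) f_3(z) of Eq. (1),
# with made-up local functions over binary variables (illustrative assumptions).
f1 = FactorNode("f1", ["u", "w", "x"], lambda u, w, x: 1.0 if x == (u ^ w) else 0.0)
f2 = FactorNode("f2", ["x", "y", "z"], lambda x, y, z: 1.0 if z == (x & y) else 0.0)
f3 = FactorNode("f3", ["z"], lambda z: 0.9 if z == 0 else 0.1)
ffg = [f1, f2, f3]

def global_function(config: Dict[str, int]) -> float:
    """The global function f: the product of all local functions."""
    value = 1.0
    for node in ffg:
        value *= node.fn(*(config[v] for v in node.variables))
    return value

# A configuration assigns a value to every variable; it is valid iff f is nonzero.
print(global_function({"u": 1, "w": 0, "x": 1, "y": 1, "z": 1}))  # 0.1 (valid)
print(global_function({"u": 1, "w": 0, "x": 0, "y": 1, "z": 1}))  # 0.0 (invalid)
```

Note that, as the definition requires, no variable appears in more than two factors: x
and z are ordinary edges connecting two nodes, while u, w, and y are half edges.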
A configuration is a particular assignment of values to all variables. The configuration
space Ω is the set of all configurations; it is the domain of the global function f. For
example, if all variables in Fig. 1 are binary, the configuration space Ω is the set {0, 1}^5
of all binary 5-tuples; if all variables in Fig. 1 are real, the configuration space is R^5.

Figure 1: A (Forney-style) factor graph (FFG).

Figure 2: FFG of a Markov chain.
We will primarily consider the case where f is a function from Ω to R+, the set of
nonnegative real numbers. In this case, a configuration ω will be called valid if f(ω) ≠ 0.
In every fixed configuration ω ∈ Ω, every variable has some definite value. We may
therefore consider also the variables in a factor graph as functions with domain Ω.
Mimicking the standard notation for random variables, we will denote such functions by
capital letters. E.g., if x takes values in some set 𝒳, we will write

X : Ω → 𝒳 : ω ↦ x = X(ω). (2)
A main application of factor graphs is probabilistic models. (In this case, the sample
space can usually be identified with the configuration space Ω.) For example, let X, Y,
and Z be random variables that form a Markov chain. Then their joint probability density
(or their joint probability mass function) p_XYZ(x, y, z) can be written as

p_XYZ(x, y, z) = p_X(x) p_{Y|X}(y|x) p_{Z|Y}(z|y). (3)
This factorization is expressed by the FFG of Fig. 2.
If the edge Y is removed from Fig. 2, the remaining graph consists of two unconnected
components, which corresponds to the Markov property
p(x, z|y) = p(x|y)p(z|y). (4)
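
This conditional independence is easy to check numerically on a small example. The
following Python sketch (the probability tables are made up for illustration and are
not values from the paper) builds the joint distribution as the product (3) and verifies
the Markov property (4):

```python
import itertools

# Made-up conditional probability tables for binary X, Y, Z (illustrative only).
pX = {0: 0.6, 1: 0.4}
pY_X = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # pY_X[(y, x)] = p(y|x)
pZ_Y = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # pZ_Y[(z, y)] = p(z|y)

# Joint distribution built as the product of factors, exactly as in Eq. (3).
pXYZ = {(x, y, z): pX[x] * pY_X[(y, x)] * pZ_Y[(z, y)]
        for x, y, z in itertools.product((0, 1), repeat=3)}

# Verify the Markov property (4): p(x, z|y) = p(x|y) p(z|y) for all x, y, z.
for y in (0, 1):
    pY = sum(pXYZ[(x, y, z)] for x, z in itertools.product((0, 1), repeat=2))
    for x, z in itertools.product((0, 1), repeat=2):
        p_xz_y = pXYZ[(x, y, z)] / pY                        # p(x, z|y)
        p_x_y = sum(pXYZ[(x, y, zz)] for zz in (0, 1)) / pY  # p(x|y)
        p_z_y = sum(pXYZ[(xx, y, z)] for xx in (0, 1)) / pY  # p(z|y)
        assert abs(p_xz_y - p_x_y * p_z_y) < 1e-12
print("Markov property (4) holds for this example.")
```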
In general, it is easy to prove the following:
Cut-Set Independence Theorem: Assume that an FFG represents the joint probability
distribution (or the joint probability density) of several random variables. Assume
further that the edges corresponding to some variables Y_1, ..., Y_n form a cut-set of
the graph (i.e., removing these edges cuts the graph into two unconnected components).
In this case, conditioned on Y_1 = y_1, ..., Y_n = y_n (for any fixed y_1, ..., y_n),
every random variable (or every set of random variables) in one component of the graph
is independent of every random variable (or every set of random variables) in the other
component.

This fact may be viewed as the “easy” direction of the Hammersley-Clifford Theorem for
Markov random fields [47, Ch. 3].

Figure 3: A block diagram.

Figure 4: Branching point (left) becomes an equality constraint node (right).
A deterministic block diagram may also be viewed as a factor graph. Consider, for
example, the block diagram of Fig. 3, which expresses the two equations
X = g(U, W ) (5)
Z = h(X, Y ). (6)
In the factor graph interpretation, the function block X = g(U, W) in the block diagram
is interpreted as representing the factor δ(x − g(u, w)), where δ(·) is the Kronecker
delta function if X is a discrete variable or the Dirac delta if X is a continuous
variable. (The distinction between these two cases is usually obvious in concrete
examples.) Considered as a factor graph, Fig. 3 thus expresses the factorization

f(u, w, x, y, z) = δ(x − g(u, w)) · δ(z − h(x, y)). (7)
Note that this function is nonzero (i.e., the configuration is valid) if and only if the
configuration is consistent with both (5) and (6).
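
For discrete variables, this deterministic construction is easy to express in code. In
the following sketch, g and h are arbitrary stand-ins for the blocks of Fig. 3
(illustrative assumptions, not functions from the paper); the global function (7) is
nonzero exactly on the configurations that satisfy (5) and (6):

```python
def kron_delta(a, b):
    """Kronecker delta: 1.0 if a == b, else 0.0."""
    return 1.0 if a == b else 0.0

# Arbitrary stand-ins for the function blocks g and h of Fig. 3 (illustrative only).
def g(u, w):
    return u + w

def h(x, y):
    return x * y

def f(u, w, x, y, z):
    """The global function (7): a product of two deterministic factors."""
    return kron_delta(x, g(u, w)) * kron_delta(z, h(x, y))

# Valid configuration: consistent with both X = g(U, W) and Z = h(X, Y).
print(f(1, 2, 3, 4, 12))  # 1.0, since 3 = 1 + 2 and 12 = 3 * 4
# Invalid configuration: x does not equal g(u, w), so the global function vanishes.
print(f(1, 2, 5, 4, 20))  # 0.0
```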
As in this example, it is often convenient to draw a factor graph with arrows on the
edges (cf. Figures 6 and 7).
A block diagram usually also contains branching points as shown in Fig. 4 (left). In
the corresponding FFG, such branching points become factor nodes on their own, as is
shown in Fig. 4 (right).
