IEEE Signal Processing Mag., Jan. 2004
An Introduction to Factor Graphs
Hans-Andrea Loeliger
ETH Zürich

H.-A. Loeliger is with the Dept. of Information Technology and Electrical Engineering, ISI-ITET,
ETH Zürich, CH-8092 Zürich, Switzerland. Email: loeliger@isi.ee.ethz.ch.
Abstract—A large variety of algorithms in coding, signal processing, and artificial
intelligence may be viewed as instances of the summary-product algorithm (or
belief/probability propagation algorithm), which operates by message passing in a
graphical model. Specific instances of such algorithms include Kalman filtering and
smoothing, the forward-backward algorithm for hidden Markov models, probability
propagation in Bayesian networks, and decoding algorithms for error correcting codes
such as the Viterbi algorithm, the BCJR algorithm, and the iterative decoding of turbo
codes, low-density parity check codes, and similar codes. New algorithms for complex
detection and estimation problems can also be derived as instances of the
summary-product algorithm. In this paper, we give an introduction to this unified
perspective in terms of (Forney-style) factor graphs.
1 Introduction
Engineers have always liked graphical models such as circuit diagrams, signal flow graphs,
trellis diagrams, and a variety of block diagrams. In artificial intelligence, statistics, and
neural networks, stochastic models are often formulated as Bayesian networks or Markov
random fields. In coding theory, the iterative decoding of turbo codes and similar codes
may also be understood in terms of a graphical model of the code.
Graphical models are often associated with particular algorithms. For example, the
Viterbi decoding algorithm is naturally described by means of a trellis diagram, and
estimation problems in Markov random fields are often solved by Gibbs sampling.
This paper is an introduction to factor graphs and to the associated summary propagation
algorithms, which operate by passing “messages” (“summaries”) along the edges
of the graph. The origins of factor graphs lie in coding theory, but they offer an attractive
notation for a wide variety of signal processing problems. In particular, a large number of
practical algorithms for a wide variety of detection and estimation problems can be derived
as summary propagation algorithms. The algorithms derived in this way often include
the best previously known algorithms as special cases or as obvious approximations.
The two main summary propagation algorithms are the sum-product (or belief propaga-
tion or probability propagation) algorithm and the max-product (or min-sum) algorithm,
both of which have a long history. In the context of error correcting codes, the sum-
product algorithm was invented by Gallager [17] as a decoding algorithm for low-density
parity check (LDPC) codes; it is still the standard decoding algorithm for such codes.
However, the full potential of LDPC codes was not yet realized at that time. Tanner [41]
explicitly introduced graphs to describe LDPC codes, generalized them (by replacing the
parity checks with more general component codes), and introduced the min-sum algorithm.
Both the sum-product and the max-product algorithm also have another root in coding,
viz. the BCJR algorithm [5] and the Viterbi algorithm [10], which both operate on
a trellis. Before the invention of turbo coding, the Viterbi algorithm used to be the
workhorse of many practical coding schemes. The BCJR algorithm, despite its equally
fundamental character, was not widely used; it therefore lingered in obscurity and was
independently re-invented several times.
The full power of iterative decoding was only realized by the breakthrough invention
of turbo coding by Berrou et al. [6], which was followed by the rediscovery of LDPC
codes [33]. Wiberg et al. [45], [46] observed that the decoding of turbo codes and LDPC
codes as well as the Viterbi and BCJR algorithms are instances of one single algorithm,
which operates by message passing in a generalized Tanner graph. From this perspective,
new applications, such as iterative decoding for channels with memory, also became
obvious. The later introduction of factor graphs [15], [24] may be viewed as a further
elaboration of the ideas by Wiberg et al. In the present paper, we will use Forney-style
factor graphs, which were introduced in [13] (and there called “normal graphs”).
Meanwhile, the work of Pearl and others [38], [49], [50], [26] on probability propagation
(or belief propagation) in Bayesian networks had attracted much attention in
artificial intelligence and statistics. It was therefore exciting when, in the wake of turbo
coding, probability propagation and the sum-product algorithm were found to be the same
thing [14], [4]. In particular, the example of iterative decoding proved that probability
propagation, which had been used only for cycle-free graphs, could be used also for graphs
with cycles.
In signal processing, both hidden-Markov models (with the associated forward-backward
algorithm) and Kalman filtering (especially in the form of the RLS algorithm) have long
been serving as workhorses in a variety of applications, and it had gradually become
apparent that these two techniques are really the same abstract idea in two specific
embodiments. Today, these important algorithms may be seen as just two other instances
of the sum-product (probability propagation) algorithm. In fact, it was shown in [24] (see
also [4]) that even fast Fourier transform (FFT) algorithms may be viewed as instances
of the sum-product algorithm.
Graphical models such as factor graphs support a general trend in signal processing
from sequential processing to iterative processing. In communications, for example,
the advent of turbo coding has completely changed the design of receivers; formerly
sequentially arranged subtasks such as synchronization, equalization, and decoding are
now designed to interact via multiple feedback loops. Another example of this trend is
“factorial hidden Markov models” [18], where the state space of traditional hidden Markov
models is split into the product of several state spaces. Again, virtually all such signal
processing schemes are examples of summary propagation and may be systematically
derived from suitable factor graphs.
The literature on graphical models and their applications is vast. The references mentioned
in this paper are a somewhat arbitrary sample, very much biased by the author’s
personal perspective and interests. Some excellent papers on iterative coding and
communications are contained in the special issues [1], [2], [3]; beautiful introductions
to codes on graphs and the corresponding algorithms are also given in [11], [12], [25].
Much of the literature on graphical models appears under the umbrella of neural
networks, cf. [22]. A much-expected survey on graphical models other than factor graphs
is the book by Jordan [23].
This paper is structured as follows. In Section 2, we introduce factor graphs. (In the
main ideas, we will follow [24] and [13], but we will also adopt some details of notation
from [27] and [42].) The use of such graphs for error correcting codes is described in
Section 3. In Section 4.1, the pivotal issue of eliminating internal variables from a model
is considered. The summary-product algorithm is introduced in Section 4.2. The wide
area of signal processing by message passing is briefly addressed in Sections 4.3 and 4.4.
Some further topics, ranging from convergence issues to analog realizations of the sum-
product algorithm, are briefly touched in Section 5, and some conclusions are offered in
Section 6.
2 Factor Graphs
As mentioned, we will use Forney-style factor graphs (FFGs) rather than the original
factor graphs of [24] (cf. the box on page 9). An FFG is a diagram as in Fig. 1 that
represents the factorization of a function of several variables. Assume, for example, that
some function f(u, w, x, y, z) can be factored as
f(u, w, x, y, z) = f_1(u, w, x) f_2(x, y, z) f_3(z). (1)
This factorization is expressed by the FFG shown in Fig. 1. In general, an FFG consists
of nodes, edges, and “half edges” (which are connected only to one node), and the FFG
is defined by the following rules:
- There is a (unique) node for every factor.
- There is a (unique) edge or half edge for every variable.
- The node representing some factor g is connected with the edge (or half edge)
  representing some variable x if and only if g is a function of x.
Implicit in this definition is the assumption that no variable appears in more than two
factors. We will see below how this seemingly severe restriction is easily circumvented.
The factors are sometimes called local functions and their product is called the global
function. In (1), the global function is f, and f_1, f_2, f_3 are the local functions.
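
As a concrete illustration of these definitions, the factorization (1) can be represented
by a small data structure. The following Python sketch is purely illustrative (the
particular local functions are made-up assumptions, not from the paper): each factor
becomes a node that records the variables, i.e. edges, it is connected to, and the global
function is evaluated as the product of the local functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class FactorNode:
    """One node of the FFG: a local function and the variables (edges) it touches."""
    name: str
    variables: List[str]        # each variable is an edge or half edge of the graph
    fn: Callable[..., float]    # the local function

# The factorization f(u, w, x, y, z) = f_1(u, w, x) f_2(x, y, z) f_3(z) of Eq. (1),
# with made-up local functions over binary variables (illustrative assumptions).
f1 = FactorNode("f1", ["u", "w", "x"], lambda u, w, x: 1.0 if x == (u ^ w) else 0.0)
f2 = FactorNode("f2", ["x", "y", "z"], lambda x, y, z: 1.0 if z == (x & y) else 0.0)
f3 = FactorNode("f3", ["z"], lambda z: 0.9 if z == 0 else 0.1)
ffg = [f1, f2, f3]

def global_function(config: Dict[str, int]) -> float:
    """The global function f: the product of all local functions."""
    value = 1.0
    for node in ffg:
        value *= node.fn(*(config[v] for v in node.variables))
    return value

# A configuration assigns a value to every variable; it is valid iff f is nonzero.
print(global_function({"u": 1, "w": 0, "x": 1, "y": 1, "z": 1}))  # 0.1 (valid)
print(global_function({"u": 1, "w": 0, "x": 0, "y": 1, "z": 1}))  # 0.0 (invalid)
```

Note that, as the definition requires, no variable appears in more than two factors: x
and z are ordinary edges connecting two nodes, while u, w, and y are half edges.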
A configuration is a particular assignment of values to all variables. The configuration
space Ω is the set of all configurations; it is the domain of the global function f. For
example, if all variables in Fig. 1 are binary, the configuration space Ω is the set {0, 1}^5
of all binary 5-tuples; if all variables in Fig. 1 are real, the configuration space is R^5.

Figure 1: A (Forney-style) factor graph (FFG).

Figure 2: FFG of a Markov chain.
We will primarily consider the case where f is a function from Ω to R+, the set of
nonnegative real numbers. In this case, a configuration ω will be called valid if f(ω) ≠ 0.
In every fixed configuration ω ∈ Ω, every variable has some definite value. We may
therefore consider also the variables in a factor graph as functions with domain Ω.
Mimicking the standard notation for random variables, we will denote such functions by
capital letters. E.g., if x takes values in some set 𝒳, we will write

X : Ω → 𝒳 : ω ↦ x = X(ω). (2)
A main application of factor graphs is probabilistic models. (In this case, the sample
space can usually be identified with the configuration space Ω.) For example, let X, Y,
and Z be random variables that form a Markov chain. Then their joint probability density
(or their joint probability mass function) p_XYZ(x, y, z) can be written as

p_XYZ(x, y, z) = p_X(x) p_{Y|X}(y|x) p_{Z|Y}(z|y). (3)
This factorization is expressed by the FFG of Fig. 2.
If the edge Y is removed from Fig. 2, the remaining graph consists of two unconnected
components, which corresponds to the Markov property
p(x, z|y) = p(x|y)p(z|y). (4)
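
This conditional independence is easy to check numerically on a small example. The
following Python sketch (the probability tables are made up for illustration and are
not values from the paper) builds the joint distribution as the product (3) and verifies
the Markov property (4):

```python
import itertools

# Made-up conditional probability tables for binary X, Y, Z (illustrative only).
pX = {0: 0.6, 1: 0.4}
pY_X = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # pY_X[(y, x)] = p(y|x)
pZ_Y = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # pZ_Y[(z, y)] = p(z|y)

# Joint distribution built as the product of factors, exactly as in Eq. (3).
pXYZ = {(x, y, z): pX[x] * pY_X[(y, x)] * pZ_Y[(z, y)]
        for x, y, z in itertools.product((0, 1), repeat=3)}

# Verify the Markov property (4): p(x, z|y) = p(x|y) p(z|y) for all x, y, z.
for y in (0, 1):
    pY = sum(pXYZ[(x, y, z)] for x, z in itertools.product((0, 1), repeat=2))
    for x, z in itertools.product((0, 1), repeat=2):
        p_xz_y = pXYZ[(x, y, z)] / pY                        # p(x, z|y)
        p_x_y = sum(pXYZ[(x, y, zz)] for zz in (0, 1)) / pY  # p(x|y)
        p_z_y = sum(pXYZ[(xx, y, z)] for xx in (0, 1)) / pY  # p(z|y)
        assert abs(p_xz_y - p_x_y * p_z_y) < 1e-12
print("Markov property (4) holds for this example.")
```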
In general, it is easy to prove the following:
Cut-Set Independence Theorem: Assume that an FFG represents the joint probability
distribution (or the joint probability density) of several random variables. Assume
further that the edges corresponding to some variables Y_1, ..., Y_n form a cut-set of
the graph (i.e., removing these edges cuts the graph into two unconnected components).
In this case, conditioned on Y_1 = y_1, ..., Y_n = y_n (for any fixed y_1, ..., y_n),
every random variable (or every set of random variables) in one component of the graph
is independent of every random variable (or every set of random variables) in the other
component.

This fact may be viewed as the “easy” direction of the Hammersley-Clifford Theorem for
Markov random fields [47, Ch. 3].

Figure 3: A block diagram.

Figure 4: Branching point (left) becomes an equality constraint node (right).
A deterministic block diagram may also be viewed as a factor graph. Consider, for
example, the block diagram of Fig. 3, which expresses the two equations
X = g(U, W ) (5)
Z = h(X, Y ). (6)
In the factor graph interpretation, the function block X = g(U, W) in the block diagram
is interpreted as representing the factor δ(x − g(u, w)), where δ(·) is the Kronecker
delta function if X is a discrete variable or the Dirac delta if X is a continuous
variable. (The distinction between these two cases is usually obvious in concrete
examples.) Considered as a factor graph, Fig. 3 thus expresses the factorization

f(u, w, x, y, z) = δ(x − g(u, w)) · δ(z − h(x, y)). (7)
Note that this function is nonzero (i.e., the configuration is valid) if and only if the
configuration is consistent with both (5) and (6).
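
For discrete variables, this deterministic construction is easy to express in code. In
the following sketch, g and h are arbitrary stand-ins for the blocks of Fig. 3
(illustrative assumptions, not functions from the paper); the global function (7) is
nonzero exactly on the configurations that satisfy (5) and (6):

```python
def kron_delta(a, b):
    """Kronecker delta: 1.0 if a == b, else 0.0."""
    return 1.0 if a == b else 0.0

# Arbitrary stand-ins for the function blocks g and h of Fig. 3 (illustrative only).
def g(u, w):
    return u + w

def h(x, y):
    return x * y

def f(u, w, x, y, z):
    """The global function (7): a product of two deterministic factors."""
    return kron_delta(x, g(u, w)) * kron_delta(z, h(x, y))

# Valid configuration: consistent with both X = g(U, W) and Z = h(X, Y).
print(f(1, 2, 3, 4, 12))  # 1.0, since 3 = 1 + 2 and 12 = 3 * 4
# Invalid configuration: x does not equal g(u, w), so the global function vanishes.
print(f(1, 2, 5, 4, 20))  # 0.0
```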
As in this example, it is often convenient to draw a factor graph with arrows on the
edges (cf. Figures 6 and 7).
A block diagram usually also contains branching points as shown in Fig. 4 (left). In
the corresponding FFG, such branching points become factor nodes on their own, as is
shown in Fig. 4 (right).
