EUROPHYSICS LETTERS 1 December 1998
Europhys. Lett., 44 (5), pp. 668-674 (1998)

Belief propagation vs. TAP for decoding corrupted messages

Y. Kabashima^1 (*) and D. Saad^2 (**)
^1 Department of Computational Intelligence and Systems Science,
Tokyo Institute of Technology - Yokohama 2268502, Japan
^2 The Neural Computing Research Group, Aston University, Birmingham B4 7ET, UK

(received 5 February 1998; accepted in final form 7 October 1998)

PACS. 89.70+c - Information science.
PACS. 89.90+n - Other areas of general interest to physicists.
PACS. 02.50-r - Probability theory, stochastic processes, and statistics.
Abstract. We employ two different methods, based on belief propagation and TAP, for
decoding corrupted messages encoded by employing Sourlas’s method, where the code word
comprises products of K bits selected randomly from the original message. We show that the
equations obtained by the two approaches are similar and provide the same solution as the one
obtained by the replica approach in some cases (K = 2). However, we also show that for K ≥ 3
and unbiased messages the iterative solution is sensitive to the initial conditions and is likely to
provide erroneous solutions; and that it is generally beneficial to use Nishimori’s temperature,
especially in the case of biased messages.
Belief networks [1], also termed Bayesian networks, and influence diagrams are diagram-
matic representations of joint probability distributions over a set of variables. The set of
variables is usually represented by the vertices of a graph, while arcs between vertices represent probabilistic dependences between variables. Belief propagation provides a convenient mathematical tool for iteratively calculating joint probability distributions of variables, and has been used in a variety of cases to assess conditional probabilities and interdependences
between variables in complex systems. One of the most recent uses of belief propagation is in
the field of error-correcting codes, especially for decoding corrupted messages [2] (for a review
of graphical models and their use in the context of error-correcting codes see [3]).
Error-correcting codes provide a mechanism for retrieving the original message after cor-
ruption due to noise during transmission. A new family of error-correcting codes, based on
insights gained from statistical mechanics, has recently been suggested by Sourlas [4]. These
codes can be mapped onto the many-body Ising spin problem and can thus be analysed using
methods adopted from statistical physics [5-9].
In this letter we will examine the similarities and differences between the belief propagation
(BP) and TAP approaches, used as decoders in the context of error-correcting codes. We will
then employ these approaches to examine a few specific cases and compare the results to the
solutions obtained using the replica method [8]. This will enable us to draw some conclusions
on the efficacy of the TAP/BP approach in the context of error-correcting codes.
(*) E-mail: kaba@dis.titech.ac.jp
(**) E-mail: saadd@aston.ac.uk

© EDP Sciences

In a general scenario, a message represented by an N-dimensional binary vector ξ is encoded by a vector $J^0$ which is then transmitted through a noisy channel with some flipping probability p per bit. The received message J is then decoded to retrieve the original message. The family of codes, suggested by Sourlas [4], is based on an encoded message of the form $J^{0}_{i_1,i_2\ldots i_K} = \xi_{i_1}\xi_{i_2}\ldots\xi_{i_K}$, taking the product of K message sites. The original message is then retrieved by exploring the ground state of the related Hamiltonian
$$H = -\sum_{\langle i_1,i_2\ldots i_K\rangle} A_{\langle i_1,i_2\ldots i_K\rangle}\, J_{\langle i_1,i_2\ldots i_K\rangle}\, S_{i_1} S_{i_2}\ldots S_{i_K} \;-\; \frac{F}{\beta}\sum_{k} S_{k}\,, \qquad (1)$$
where S is the N-dimensional vector of binary dynamical variables and A is a sparse tensor with C unit elements per index (setting the rest of the elements to zero), used for constructing the code word $J^0$ by selecting K message sites per code-word bit. The last term on the right is required in the case of sparse or biased messages and will require assigning a certain value to the additive field $F/\beta$. Codes of K = 2 and K → ∞ have been analysed [4, 5] in the case of extensive connectivity with $C = \binom{N-1}{K-1}$ and code rate R = K/C → 0, corresponding to the SK [10] and Random Energy [11] models, respectively; the intensive case with finite and infinite K, which is of greater practical significance (R ≠ 0) and which we will consider here, has only recently been analysed [8].
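To fix ideas, the construction above can be sketched numerically. This is our illustration, not code from the paper: the sizes are toy ones, and the sparse tensor A is drawn as random K-tuples, so each message site appears in C code-word bits only on average, whereas the paper fixes exactly C unit elements per index.

```python
import numpy as np

rng = np.random.default_rng(0)

N, K, C = 12, 2, 4            # toy sizes; the experiments below use N = 10^4
L = N * C // K                # number of code-word bits (non-zero elements of A)

# Random sparse construction: each code-word bit multiplies K distinct
# message sites; each site then appears in C code-word bits on average.
tuples = [rng.choice(N, size=K, replace=False) for _ in range(L)]

xi = rng.choice([-1, 1], size=N)                 # original message (±1 bits)
J0 = np.array([xi[t].prod() for t in tuples])    # code word J^0

p = 0.1                                          # channel flip rate
flips = rng.random(L) < p
J = np.where(flips, -J0, J0)                     # received, corrupted code word
```

The code rate here is R = K/C = 1/2, matching the intensive regime (R ≠ 0) considered in the text.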
We will now present two approaches for decoding the corrupted received message based on
the Bayesian framework and on a statistical-mechanics analysis; the two approaches stem from
the same probabilistic framework and can be easily linked [4].
Decoding the received message J in the Bayesian framework can be carried out by calculating the marginal posterior probability

$$P(S_l\,|\,J) = \mathrm{Tr}_{\{S_{k\neq l}\}} P(S\,|\,J) \propto \mathrm{Tr}_{\{S_{k\neq l}\}} \prod_{\mu} P(J_\mu\,|\,S)\, P_0(S)$$

for each spin site l, where µ runs over the message components and $P_0(S)$ represents the prior; note the similarities to the statistical-mechanics formulation, as the logarithms of the likelihood and prior terms are directly related to the first and second components of the Hamiltonian (eq. (1)), respectively. Knowing the posterior, one can calculate the typical retrieved message elements and their alignment with ±1, which corresponds to the Bayes-optimal decoding; however, this turns out to be rather difficult in general and we therefore resort to the methods of belief propagation, aimed at providing a good approximation to the marginal posterior. This approach, which is quite similar to the practical approach employed in the case of Gallager codes [2], assumes a two-layer system corresponding to the elements of the corrupted message J and the dynamical variables S, respectively, and focuses on the calculation of conditional probabilities between elements from the two layers when some elements of the system are set to specific values or removed. Through this process one defines sets of conditional probabilities relating elements in the two layers (following the general framework of [1] or the more specific treatments of refs. [2, 3]):
$$q^{x}_{\mu l} = P(S_l = x\,|\,\{J_{\nu\neq\mu}\})\,,$$
$$r^{x}_{\mu l} = P(J_\mu\,|\,S_l = x, \{J_{\nu\neq\mu}\}) = \mathrm{Tr}_{\{S_{k\neq l}\}}\, P(J_\mu\,|\,S_l = x, \{S_{k\neq l}\})\, P(\{S_{k\neq l}\}\,|\,\{J_{\nu\neq\mu}\})\,, \qquad (2)$$

where the index µ represents an element of the multidimensional tensor J which is connected to the corresponding index of S (l in the first equation), i.e. for which the corresponding element $A_{\langle i_1,\ldots,l,\ldots i_K\rangle}$ is non-zero; the notation $\{S_{k\neq l}\}$ refers to all elements of the vector S, excluding the l-th element, which are connected to the corresponding index of J (µ in this case for the second equation); the index x can take the values ±1. The conditional probabilities $q^{x}_{\mu l}$ and $r^{x}_{\mu l}$ will enable us, through recursive calculations, to obtain an approximated expression for the posterior.

670 EUROPHYSICS LETTERS
Employing Bayes rule and the assumptions that the dependence of $S_l$ on an element $J_\nu$ is factorizable and vice versa (which are quite reasonable as variables from the same layer are not expected to be directly dependent):

$$P(S_{l_1}, S_{l_2}\ldots S_{l_K}\,|\,\{J_{\nu\neq\mu}\}) = \prod_{k=1}^{K} P(S_{l_k}\,|\,\{J_{\nu\neq\mu}\}) \quad\text{and}$$
$$P(\{J_{\nu\neq\mu}\}\,|\,S_l = x) = \prod_{\nu\neq\mu} P(J_\nu\,|\,S_l = x, \{J_{\sigma\neq\nu}\})\,, \qquad (3)$$

one can write a set of coupled equations for $q^{\pm 1}_{\mu l}$ and $r^{\pm 1}_{\mu l}$ of the form

$$q^{x}_{\mu l} = a_{\mu l}\, p^{x}_{l} \prod_{\nu\neq\mu} r^{x}_{\nu l}\,, \qquad
r^{x}_{\mu l} = \mathrm{Tr}_{\{S_{k\neq l}\}}\, P(J_\mu\,|\,S_l = x, \{S_{k\neq l}\}) \prod_{k\neq l} q^{S_k}_{\mu k}\,, \qquad (4)$$

where $a_{\mu l}$ is a normalising factor such that $q^{1}_{\mu l} + q^{-1}_{\mu l} = 1$ and $p^{x}_{l} = P(S_l = x)$ are our prior beliefs in the value of the source bits $S_l$.
This set of equations can be solved iteratively [2] by updating a closed coupled set of difference equations for $\delta q_{\mu l} = q^{1}_{\mu l} - q^{-1}_{\mu l}$ and $\delta r_{\mu l} = r^{1}_{\mu l} - r^{-1}_{\mu l}$, derived for this specific model, making use of the fact that the variables $r^{x}_{\mu l}$, and subsequently the variables $q^{x}_{\mu l}$, can be calculated by exploiting the relation $r^{\pm 1}_{\mu l} = (1 \pm \delta r_{\mu l})/2$ and eqs. (4). At each iteration we can also calculate the pseudo-posterior probabilities $q^{x}_{l} = a_{l}\, p^{x}_{l} \prod_{\nu} r^{x}_{\nu l}$, where $a_{l}$ are normalising factors, to determine the current typical value of $S_l$ and consequently the decoded message.
Three points are worth noting. Firstly, the iterative solution makes use of the normalisation $r^{1}_{\mu l} + r^{-1}_{\mu l} = 1$, which is not derived from the basic probability rules and makes implicit assumptions about the probabilities of obtaining $S_l = \pm 1$ for all elements l. Secondly, the iterative solution would have provided the true posterior probabilities $q^{x}_{l}$ if the graph connecting the message J and the encoded bits S had been free of cycles, i.e. a tree with no recurrent dependences among the variables. The fact that the framework does provide adequate practical solutions has only recently been explained [12]. Thirdly, it is important to consider the complexity of this decoding scheme as it is of significant practical relevance. Such an analysis has been carried out in ref. [2], resulting in O(K/R) operations per decoded bit with a prefactor which depends on the number of iterations required and is typically around 100, which clearly renders this decoding scheme practical.
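The complexity claim above can be made concrete with a back-of-envelope calculation (our arithmetic, using the figures quoted in the text and the experimental parameters used later):

```python
# O(K/R) operations per decoded bit per iteration, with a prefactor of
# roughly 100 iterations, as quoted in the text.
K = 2            # bits multiplied per code-word element
R = 0.5          # code rate K/C used in the experiments below
iterations = 100
ops_per_bit = (K / R) * iterations
# => about 400 elementary operations per decoded bit
```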
We will now turn to an alternative approach, showing that for this particular problem it is possible to obtain a similar set of equations from the corresponding statistical-mechanics framework based on the Bethe approximation [13] or the TAP approach [14] to diluted systems (1). In this approach we assign a Boltzmann weight to each set comprising an encoded message bit $J_\mu$ and a dynamical vector S,

$$w_B(J_\mu\,|\,S) = e^{\beta g(J_\mu|S)}\,, \qquad (5)$$

such that the first term of the system's Hamiltonian (eq. (1)) can be rewritten as $-\sum_{\mu=1}^{L} g(J_\mu|S)$, where the index µ = 1, ..., L runs over the L non-zero sites in the multidimensional tensor A

(1) Note that the terminology in the case of diluted systems is slightly vague as an expansion with respect to the large Onsager fields is meaningless; here we follow the conventional terminology for the Bethe approximation when applied to disordered systems subject to mean-field-type random interactions.

(which multiplies J). We will now employ two straightforward assumptions to obtain a set of coupled equations for the mean field $q^{S_l}_{\mu l} = P(S_l\,|\,\{J_{\nu\neq\mu}\})$, which may be identified as the same variable as in the belief network framework (eq. (2)), and the effective Boltzmann weight $w_{\rm eff}(J_\mu\,|\,S_l, \{J_{\nu\neq\mu}\})$:

1) we assume a mean-field behaviour for the dependence of the dynamical variables S on a certain realization of the message sites J, i.e. the dependence is factorizable and may be replaced by a product of mean fields;

2) Boltzmann weights for a specific site $S_l$ are factorizable with respect to the message sites $J_\mu$.

One may argue that these assumptions will provide a reasonable approximation due to the lack of direct dependence between elements of S and similarly between elements of J (2). The resulting set of equations is of the form

$$w_{\rm eff}(J_\mu\,|\,S_l, \{J_{\nu\neq\mu}\}) \propto \mathrm{Tr}_{\{S_{k\neq l}\}}\, w_B(J_\mu\,|\,S) \prod_{k\neq l} q^{S_k}_{\mu k}\,, \qquad
q^{S_l}_{\mu l} \propto \tilde a_{\mu l}\, p^{S_l}_{l} \prod_{\nu\neq\mu} w_{\rm eff}(J_\nu\,|\,S_l, \{J_{\sigma\neq\nu}\})\,, \qquad (6)$$

where $\tilde a_{\mu l}$ is a normalisation factor and $p^{S_l}_{l}$ represents our prior knowledge of the source's bias.
Replacing the effective Boltzmann weight by a normalised field, which may be identified as the variable $r^{S_l}_{\mu l}$ in the belief network framework (eq. (2)), we obtain

$$r^{S_l}_{\mu l} = P(S_l\,|\,J_\mu, \{J_{\nu\neq\mu}\}) = a_{\mu l}\, w_{\rm eff}(J_\mu\,|\,S_l, \{J_{\nu\neq\mu}\})\,, \qquad (7)$$

i.e. a set of equations equivalent to eqs. (4). The explicit expressions of the normalisation coefficients, $a_{\mu l}$ and $\tilde a_{\mu l}$, are
$$a^{-1}_{\mu l} = \mathrm{Tr}_{\{S\}}\, w_B(J_\mu\,|\,S) \prod_{k\neq l} q^{S_k}_{\mu k} \qquad\text{and}\qquad \tilde a^{-1}_{\mu l} = \mathrm{Tr}_{\{S_l\}}\, p^{S_l}_{l} \prod_{\nu\neq\mu} r^{S_l}_{\nu l}\,. \qquad (8)$$
The somewhat arbitrary use of the differences $\delta q_{\mu l} = \langle S_l\rangle_{q}$ and $\delta r_{\mu l} = \langle S_l\rangle_{r}$ in the BP approach becomes clear from the TAP formulation, where they represent the expectation values of the dynamical variables with respect to the fields. The statistical-mechanics formulation also provides a partial answer to the successful use of the BP methods in loopy systems, as we consider a finite number of steps on an infinite lattice [15]. However, it does not provide an explanation in the case of small loopy systems, which should be examined using other methods.
The formulation so far has been rather general and enabled us to show the similarity between the sets of iterative equations obtained by the BP and TAP approaches. We will now make use of this set of equations to study the efficacy and usefulness of these methods for the problem at hand, i.e. decoding corrupted messages encoded using Sourlas's code. In this case we can make use of the explicit expression for the function g (from eq. (1)) to derive the relation between $q^{S_l}_{\mu l}$, $r^{S_l}_{\mu l}$, $\delta q_{\mu l}$ and $\delta r_{\mu l}$,

$$q^{S_l}_{\mu l} = \tfrac{1}{2}\,(1 + \delta q_{\mu l}\, S_l) \quad\text{and}\quad r^{S_l}_{\mu l} = \tfrac{1}{2}\,(1 + \delta r_{\mu l}\, S_l)\,, \qquad (9)$$
(2) Obviously, the TAP approach is an approximation in this case and these assumptions will be validated later on by comparing the solutions to those obtained by a different method.

as well as an explicit expression for $w_B(J_\mu\,|\,S)$,

$$w_B(J_\mu\,|\,S) = \tfrac{1}{2}\cosh\beta J_\mu \Big(1 + \tanh\beta J_\mu \prod_{l\in\mathcal{L}(\mu)} S_l\Big)\,, \qquad (10)$$

where $\mathcal{L}(\mu)$ is the set of all sites of S connected to $J_\mu$, i.e. for which the corresponding element of the tensor A is non-zero. The explicit form of the equations for $\delta q_{\mu l}$ and $\delta r_{\mu l}$ becomes
$$\delta r_{\mu l} = \tanh\beta J_\mu \prod_{k\in\mathcal{L}(\mu)\setminus l} \delta q_{\mu k}\,, \qquad
\delta q_{\mu l} = \tanh\Big[\sum_{\nu\in\mathcal{M}(l)\setminus\mu} \tanh^{-1}\delta r_{\nu l} \;+\; F\Big]\,, \qquad (11)$$

where $\mathcal{M}(l)$ is the set of all indices of the tensor J which are connected to the vector site l; the external field F, which previously appeared in the last term of eq. (1), is directly related to our prior belief of the message bias

$$p^{S_l}_{l} = \tfrac{1}{2}\,(1 + \tanh F\, S_l)\,. \qquad (12)$$
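As a quick consistency check (our addition, not part of the original derivation): because $J_\mu \prod_{l\in\mathcal{L}(\mu)} S_l = \pm 1$, the exponential Boltzmann weight of eq. (5) reduces to the tanh form of eq. (10),

```latex
e^{\beta J_\mu \prod_{l\in\mathcal{L}(\mu)} S_l}
  = \cosh\beta J_\mu + \sinh\beta J_\mu \prod_{l\in\mathcal{L}(\mu)} S_l
  = \cosh\beta J_\mu \Big(1 + \tanh\beta J_\mu \prod_{l\in\mathcal{L}(\mu)} S_l\Big),
```

and averaging the product of spins over the factorised messages, with $\langle S_k\rangle = \delta q_{\mu k}$, replaces $\prod_{k\in\mathcal{L}(\mu)\setminus l} S_k$ by $\prod_{k\in\mathcal{L}(\mu)\setminus l} \delta q_{\mu k}$, which reproduces the first of eqs. (11).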
We will now employ eqs. (11) and the explicit expressions obtained above, making use of the differences $\delta q_{\mu l}$ and $\delta r_{\mu l}$, to obtain values of $q^{\pm 1}_{\mu l}$ and $r^{\pm 1}_{\mu l}$. After these differences are determined, the (approximated) marginal posterior $q^{S_l}_{l} = (1 + \delta q_{l}\, S_l)/2$ can be calculated,

$$\delta q_{l} = \tanh\Big[\sum_{\mu\in\mathcal{M}(l)} \tanh^{-1}\delta r_{\mu l} \;+\; F\Big]\,, \qquad (13)$$

providing the Bayes-optimal decoding $\xi^{B}_{l} = \mathrm{sign}\,\langle S_l\rangle_{T} = \mathrm{sign}(\delta q_{l})$. The magnetisation $M = (1/N)\sum_{i=1}^{N} \xi_i\, \xi^{B}_{i}$ serves as our performance measure.
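To make eqs. (11)-(13) concrete, here is a minimal numerical sketch in Python/NumPy. It is our illustration, not the authors' code, and it makes several assumptions: toy sizes (not the 10^4-bit messages of the experiments below), a sparse tensor A drawn as random K-tuples (each site appears in C code-word bits only on average), a fixed count of 50 iterations, and β set by the standard Nishimori condition tanh β = 1 - 2p for flip rate p.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy instance of Sourlas's code (illustrative sizes only)
N, K, C, p = 200, 2, 4, 0.1
F = 0.0                                   # unbiased message => no prior field
beta = np.arctanh(1.0 - 2.0 * p)          # Nishimori's temperature for flip rate p
L = N * C // K                            # number of code-word bits
tuples = np.array([rng.choice(N, size=K, replace=False) for _ in range(L)])
xi = rng.choice([-1, 1], size=N)          # source message
J = np.array([xi[t].prod() for t in tuples], dtype=float)
J[rng.random(L) < p] *= -1.0              # binary symmetric channel

dq = rng.uniform(-0.1, 0.1, size=(L, K))  # delta q_{mu l}, one value per edge
for _ in range(50):
    # eq. (11a): delta r_{mu l} = tanh(beta J_mu) prod_{k in L(mu)\l} delta q_{mu k}
    dr = np.empty_like(dq)
    for a in range(K):
        others = [k for k in range(K) if k != a]
        dr[:, a] = np.tanh(beta * J) * dq[:, others].prod(axis=1)
    u = np.arctanh(dr)                    # |dr| < tanh(beta) < 1, so this is safe
    # eq. (11b): delta q_{mu l} = tanh( sum_{nu in M(l)\mu} atanh(delta r_{nu l}) + F )
    h = np.zeros(N)                       # total field on each message site
    np.add.at(h, tuples.ravel(), u.ravel())
    dq = np.tanh(h[tuples] - u + F)       # cavity: subtract the edge's own term

# eq. (13): pseudo-posterior differences, decoded bits and magnetisation
dq_site = np.tanh(h + F)
xi_B = np.sign(dq_site)
M = float((xi * xi_B).mean())
```

The overlap M computed at the end is the performance measure defined in the text; for K ≥ 3 the paper shows the outcome depends strongly on the initial conditions, which here enter through the random initialisation of dq.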
We obtained numerical solutions for the cases K = 2, 5, corruption rate 0 ≤ p ≤ 0.5, two bias values (0.1, 0.5) and several temperatures, as shown in fig. 1, which will be compared to previously obtained solutions [8] using the replica method. The latter have been obtained by replica-symmetric and one-step replica-symmetry-breaking calculations of the system's free energy for the ferromagnetic and paramagnetic phases and the spin-glass phase, respectively (expecting strong replica symmetry breaking only in the latter), following the work of Sherrington and Wong [15]; saddle-point equations have been solved both analytically and numerically by employing Monte Carlo techniques.

In the experiments, connectivity is set as C = 4, 10 for K = 2, 5, respectively, which provides the same code rate R = 1/2 for both cases. For each run, 20000-bit code words J are generated from a 10000-bit message ξ using a fixed random sparse tensor A. The noise-corrupted code word J was decoded according to eqs. (11) and (13) to retrieve the original message ξ. Numerical solutions of 10 individual runs [16], for each value of the flip rate p starting from different initial conditions, obtained for the case K = 2, different biases ($f = p^{1}_{l} = 0.1, 0.5$, the probability of a +1 bit in the original message ξ) and temperatures ($T = 0.26, T_{\rm n}$) are shown in fig. 1(a). The choice of T = 0.26, rather than T = 0, for representing solutions at low temperatures is in order to avoid computational difficulties. We obtain good agreement between the TAP/BP solutions and the theoretical values obtained using the methods of [8] (diamond symbols and dashed line, respectively). The results for biased patterns at T = 0.26, presented in the form of mean values and standard deviation, show a suboptimal improvement in performance, as expected. Obtaining solutions under similar conditions but at Nishimori's
