EUROPHYSICS LETTERS 1 December 1998
Europhys. Lett., 44 (5), pp. 668-674 (1998)

Belief propagation vs. TAP for decoding corrupted messages

Y. Kabashima^1 (*) and D. Saad^2 (**)
^1 Department of Computational Intelligence and Systems Science,
Tokyo Institute of Technology - Yokohama 2268502, Japan
^2 The Neural Computing Research Group, Aston University, Birmingham B4 7ET, UK

(received 5 February 1998; accepted in final form 7 October 1998)

PACS. 89.70+c - Information science.
PACS. 89.90+n - Other areas of general interest to physicists.
PACS. 02.50-r - Probability theory, stochastic processes, and statistics.
Abstract. We employ two different methods, based on belief propagation and TAP, for
decoding corrupted messages encoded by employing Sourlas’s method, where the code word
comprises products of K bits selected randomly from the original message. We show that the
equations obtained by the two approaches are similar and provide the same solution as the one
obtained by the replica approach in some cases (K = 2). However, we also show that for K ≥ 3
and unbiased messages the iterative solution is sensitive to the initial conditions and is likely to
provide erroneous solutions; and that it is generally beneficial to use Nishimori’s temperature,
especially in the case of biased messages.
Belief networks [1], also termed Bayesian networks, and influence diagrams are diagram-
matic representations of joint probability distributions over a set of variables. The set of
variables is usually represented by the vertices of a graph, while arcs between vertices represent probabilistic dependences between variables. Belief propagation provides a convenient mathematical tool for iteratively calculating joint probability distributions of variables, and has been used in a variety of cases to assess conditional probabilities and interdependences
between variables in complex systems. One of the most recent uses of belief propagation is in
the field of error-correcting codes, especially for decoding corrupted messages [2] (for a review
of graphical models and their use in the context of error-correcting codes see [3]).
Error-correcting codes provide a mechanism for retrieving the original message after cor-
ruption due to noise during transmission. A new family of error-correcting codes, based on
insights gained from statistical mechanics, has recently been suggested by Sourlas [4]. These
codes can be mapped onto the many-body Ising spin problem and can thus be analysed using
methods adopted from statistical physics [5-9].
In this letter we will examine the similarities and differences between the belief propagation
(BP) and TAP approaches, used as decoders in the context of error-correcting codes. We will
then employ these approaches to examine a few specific cases and compare the results to the
solutions obtained using the replica method [8]. This will enable us to draw some conclusions
on the efficacy of the TAP/BP approach in the context of error-correcting codes.
(*) E-mail: kaba@dis.titech.ac.jp
(**) E-mail: saadd@aston.ac.uk

© EDP Sciences

In a general scenario, a message represented by an N-dimensional binary vector ξ is encoded by a vector $J^0$ which is then transmitted through a noisy channel with some flipping probability p per bit. The received message J is then decoded to retrieve the original message. The family of codes, suggested by Sourlas [4], is based on an encoded message of the form $J^{0}_{i_1,i_2\ldots i_K} = \xi_{i_1}\xi_{i_2}\ldots\xi_{i_K}$, taking the product of K message sites. The original message is then retrieved by exploring the ground state of the related Hamiltonian
$$H = -\sum_{\langle i_1,i_2\ldots i_K\rangle} A_{\langle i_1,i_2\ldots i_K\rangle}\, J_{\langle i_1,i_2\ldots i_K\rangle}\, S_{i_1} S_{i_2}\ldots S_{i_K} \;-\; \frac{F}{\beta}\sum_{k} S_{k}\,, \qquad (1)$$
where S is the N-dimensional vector of binary dynamical variables and A is a sparse tensor with C unit elements per index (setting the rest of the elements to zero), used for constructing the code word $J^0$ by selecting K message sites per code-word bit. The last term on the right is required in the case of sparse or biased messages and will require assigning a certain value to the additive field $F/\beta$. Codes of K = 2 and K → ∞ have been analysed [4, 5] in the case of extensive connectivity with $C = \binom{N-1}{K-1}$ and code rate R = K/C → 0, corresponding to the SK [10] and Random Energy [11] models, respectively; the intensive case with finite and infinite K, which is of greater practical significance (R ≠ 0) and which we will consider here, has only recently been analysed [8].
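To fix ideas, the construction above can be sketched numerically. This is our illustration, not code from the paper: the sizes are toy ones, and the sparse tensor A is drawn as random K-tuples, so each message site appears in C code-word bits only on average, whereas the paper fixes exactly C unit elements per index.

```python
import numpy as np

rng = np.random.default_rng(0)

N, K, C = 12, 2, 4            # toy sizes; the experiments below use N = 10^4
L = N * C // K                # number of code-word bits (non-zero elements of A)

# Random sparse construction: each code-word bit multiplies K distinct
# message sites; each site then appears in C code-word bits on average.
tuples = [rng.choice(N, size=K, replace=False) for _ in range(L)]

xi = rng.choice([-1, 1], size=N)                 # original message (±1 bits)
J0 = np.array([xi[t].prod() for t in tuples])    # code word J^0

p = 0.1                                          # channel flip rate
flips = rng.random(L) < p
J = np.where(flips, -J0, J0)                     # received, corrupted code word
```

The code rate here is R = K/C = 1/2, matching the intensive regime (R ≠ 0) considered in the text.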
We will now present two approaches for decoding the corrupted received message based on
the Bayesian framework and on a statistical-mechanics analysis; the two approaches stem from
the same probabilistic framework and can be easily linked [4].
Decoding the received message J in the Bayesian framework can be carried out by calculating the marginal posterior probability

$$P(S_l\,|\,J) = \mathrm{Tr}_{\{S_{k\neq l}\}} P(S\,|\,J) \propto \mathrm{Tr}_{\{S_{k\neq l}\}} \prod_{\mu} P(J_\mu\,|\,S)\, P_0(S)$$

for each spin site l, where µ runs over the message components and $P_0(S)$ represents the prior; note the similarities to the statistical-mechanics formulation, as the logarithms of the likelihood and prior terms are directly related to the first and second components of the Hamiltonian (eq. (1)), respectively. Knowing the posterior, one can calculate the typical retrieved message elements and their alignment with ±1, which corresponds to the Bayes-optimal decoding; however, this turns out to be rather difficult in general and we therefore resort to the methods of belief propagation, aimed at providing a good approximation to the marginal posterior. This approach, which is quite similar to the practical approach employed in the case of Gallager codes [2], assumes a two-layer system corresponding to the elements of the corrupted message J and the dynamical variables S, respectively, and focuses on the calculation of conditional probabilities between elements from the two layers when some elements of the system are set to specific values or removed. Through this process one defines sets of conditional probabilities relating elements in the two layers (following the general framework of [1] or the more specific treatments of refs. [2, 3]):
$$q^{x}_{\mu l} = P(S_l = x\,|\,\{J_{\nu\neq\mu}\})\,,$$
$$r^{x}_{\mu l} = P(J_\mu\,|\,S_l = x, \{J_{\nu\neq\mu}\}) = \mathrm{Tr}_{\{S_{k\neq l}\}}\, P(J_\mu\,|\,S_l = x, \{S_{k\neq l}\})\, P(\{S_{k\neq l}\}\,|\,\{J_{\nu\neq\mu}\})\,, \qquad (2)$$

where the index µ represents an element of the multidimensional tensor J which is connected to the corresponding index of S (l in the first equation), i.e. for which the corresponding element $A_{\langle i_1,\ldots,l,\ldots i_K\rangle}$ is non-zero; the notation $\{S_{k\neq l}\}$ refers to all elements of the vector S, excluding the l-th element, which are connected to the corresponding index of J (µ in this case for the second equation); the index x can take the values ±1. The conditional probabilities $q^{x}_{\mu l}$ and $r^{x}_{\mu l}$ will enable us, through recursive calculations, to obtain an approximated expression for the posterior.

670 EUROPHYSICS LETTERS
Employing Bayes rule and the assumptions that the dependence of $S_l$ on an element $J_\nu$ is factorizable and vice versa (which are quite reasonable as variables from the same layer are not expected to be directly dependent):

$$P(S_{l_1}, S_{l_2}\ldots S_{l_K}\,|\,\{J_{\nu\neq\mu}\}) = \prod_{k=1}^{K} P(S_{l_k}\,|\,\{J_{\nu\neq\mu}\}) \quad\text{and}$$
$$P(\{J_{\nu\neq\mu}\}\,|\,S_l = x) = \prod_{\nu\neq\mu} P(J_\nu\,|\,S_l = x, \{J_{\sigma\neq\nu}\})\,, \qquad (3)$$

one can write a set of coupled equations for $q^{\pm 1}_{\mu l}$ and $r^{\pm 1}_{\mu l}$ of the form

$$q^{x}_{\mu l} = a_{\mu l}\, p^{x}_{l} \prod_{\nu\neq\mu} r^{x}_{\nu l}\,, \qquad
r^{x}_{\mu l} = \mathrm{Tr}_{\{S_{k\neq l}\}}\, P(J_\mu\,|\,S_l = x, \{S_{k\neq l}\}) \prod_{k\neq l} q^{S_k}_{\mu k}\,, \qquad (4)$$

where $a_{\mu l}$ is a normalising factor such that $q^{1}_{\mu l} + q^{-1}_{\mu l} = 1$ and $p^{x}_{l} = P(S_l = x)$ are our prior beliefs in the value of the source bits $S_l$.
This set of equations can be solved iteratively [2] by updating a closed coupled set of difference equations for $\delta q_{\mu l} = q^{1}_{\mu l} - q^{-1}_{\mu l}$ and $\delta r_{\mu l} = r^{1}_{\mu l} - r^{-1}_{\mu l}$, derived for this specific model, making use of the fact that the variables $r^{x}_{\mu l}$, and subsequently the variables $q^{x}_{\mu l}$, can be calculated by exploiting the relation $r^{\pm 1}_{\mu l} = (1 \pm \delta r_{\mu l})/2$ and eqs. (4). At each iteration we can also calculate the pseudo-posterior probabilities $q^{x}_{l} = a_{l}\, p^{x}_{l} \prod_{\nu} r^{x}_{\nu l}$, where $a_{l}$ are normalising factors, to determine the current typical value of $S_l$ and consequently the decoded message.
Three points are worth noting. Firstly, the iterative solution makes use of the normalisation $r^{1}_{\mu l} + r^{-1}_{\mu l} = 1$, which is not derived from the basic probability rules and makes implicit assumptions about the probabilities of obtaining $S_l = \pm 1$ for all elements l. Secondly, the iterative solution would have provided the true posterior probabilities $q^{x}_{l}$ if the graph connecting the message J and the encoded bits S had been free of cycles, i.e. a tree with no recurrent dependences among the variables. The fact that the framework does provide adequate practical solutions has only recently been explained [12]. Thirdly, it is important to consider the complexity of this decoding scheme as it is of significant practical relevance. Such an analysis has been carried out in ref. [2], resulting in O(K/R) operations per decoded bit with a prefactor which depends on the number of iterations required and is typically around 100, which clearly renders this decoding scheme practical.
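The complexity claim above can be made concrete with a back-of-envelope calculation (our arithmetic, using the figures quoted in the text and the experimental parameters used later):

```python
# O(K/R) operations per decoded bit per iteration, with a prefactor of
# roughly 100 iterations, as quoted in the text.
K = 2            # bits multiplied per code-word element
R = 0.5          # code rate K/C used in the experiments below
iterations = 100
ops_per_bit = (K / R) * iterations
# => about 400 elementary operations per decoded bit
```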
We will now turn to an alternative approach, showing that for this particular problem it is possible to obtain a similar set of equations from the corresponding statistical-mechanics framework based on the Bethe approximation [13] or the TAP approach [14] to diluted systems (1). In this approach we assign a Boltzmann weight to each set comprising an encoded message bit $J_\mu$ and a dynamical vector S,

$$w_B(J_\mu\,|\,S) = e^{\beta g(J_\mu|S)}\,, \qquad (5)$$

such that the first term of the system's Hamiltonian (eq. (1)) can be rewritten as $-\sum_{\mu=1}^{L} g(J_\mu|S)$, where the index µ = 1, ..., L runs over the L non-zero sites in the multidimensional tensor A

(1) Note that the terminology in the case of diluted systems is slightly vague as an expansion with respect to the large Onsager fields is meaningless; here we follow the conventional terminology for the Bethe approximation when applied to disordered systems subject to mean-field-type random interactions.

(which multiplies J). We will now employ two straightforward assumptions to obtain a set of coupled equations for the mean field $q^{S_l}_{\mu l} = P(S_l\,|\,\{J_{\nu\neq\mu}\})$, which may be identified as the same variable as in the belief network framework (eq. (2)), and the effective Boltzmann weight $w_{\rm eff}(J_\mu\,|\,S_l, \{J_{\nu\neq\mu}\})$:

1) we assume a mean-field behaviour for the dependence of the dynamical variables S on a certain realization of the message sites J, i.e. the dependence is factorizable and may be replaced by a product of mean fields;

2) Boltzmann weights for a specific site $S_l$ are factorizable with respect to the message sites $J_\mu$.

One may argue that these assumptions will provide a reasonable approximation due to the lack of direct dependence between elements of S and similarly between elements of J (2). The resulting set of equations is of the form

$$w_{\rm eff}(J_\mu\,|\,S_l, \{J_{\nu\neq\mu}\}) \propto \mathrm{Tr}_{\{S_{k\neq l}\}}\, w_B(J_\mu\,|\,S) \prod_{k\neq l} q^{S_k}_{\mu k}\,, \qquad
q^{S_l}_{\mu l} \propto \tilde a_{\mu l}\, p^{S_l}_{l} \prod_{\nu\neq\mu} w_{\rm eff}(J_\nu\,|\,S_l, \{J_{\sigma\neq\nu}\})\,, \qquad (6)$$

where $\tilde a_{\mu l}$ is a normalisation factor and $p^{S_l}_{l}$ represents our prior knowledge of the source's bias.
Replacing the effective Boltzmann weight by a normalised field, which may be identified as the variable $r^{S_l}_{\mu l}$ in the belief network framework (eq. (2)), we obtain

$$r^{S_l}_{\mu l} = P(S_l\,|\,J_\mu, \{J_{\nu\neq\mu}\}) = a_{\mu l}\, w_{\rm eff}(J_\mu\,|\,S_l, \{J_{\nu\neq\mu}\})\,, \qquad (7)$$

i.e. a set of equations equivalent to eqs. (4). The explicit expressions of the normalisation coefficients, $a_{\mu l}$ and $\tilde a_{\mu l}$, are
$$a^{-1}_{\mu l} = \mathrm{Tr}_{\{S\}}\, w_B(J_\mu\,|\,S) \prod_{k\neq l} q^{S_k}_{\mu k} \qquad\text{and}\qquad \tilde a^{-1}_{\mu l} = \mathrm{Tr}_{\{S_l\}}\, p^{S_l}_{l} \prod_{\nu\neq\mu} r^{S_l}_{\nu l}\,. \qquad (8)$$
The somewhat arbitrary use of the differences $\delta q_{\mu l} = \langle S_l\rangle_{q}$ and $\delta r_{\mu l} = \langle S_l\rangle_{r}$ in the BP approach becomes clear from the TAP formulation, where they represent the expectation values of the dynamical variables with respect to the fields. The statistical-mechanics formulation also provides a partial answer to the successful use of the BP methods in loopy systems, as we consider a finite number of steps on an infinite lattice [15]. However, it does not provide an explanation in the case of small loopy systems, which should be examined using other methods.
The formulation so far has been rather general and enabled us to show the similarity between the sets of iterative equations obtained by the BP and TAP approaches. We will now make use of this set of equations to study the efficacy and usefulness of these methods for the problem at hand, i.e. decoding corrupted messages encoded using Sourlas's code. In this case we can make use of the explicit expression for the function g (from eq. (1)) to derive the relation between $q^{S_l}_{\mu l}$, $r^{S_l}_{\mu l}$, $\delta q_{\mu l}$ and $\delta r_{\mu l}$,

$$q^{S_l}_{\mu l} = \tfrac{1}{2}\,(1 + \delta q_{\mu l}\, S_l) \quad\text{and}\quad r^{S_l}_{\mu l} = \tfrac{1}{2}\,(1 + \delta r_{\mu l}\, S_l)\,, \qquad (9)$$
(2) Obviously, the TAP approach is an approximation in this case and these assumptions will be validated later on by comparing the solutions to those obtained by a different method.

as well as an explicit expression for $w_B(J_\mu\,|\,S)$,

$$w_B(J_\mu\,|\,S) = \tfrac{1}{2}\cosh\beta J_\mu \Big(1 + \tanh\beta J_\mu \prod_{l\in\mathcal{L}(\mu)} S_l\Big)\,, \qquad (10)$$

where $\mathcal{L}(\mu)$ is the set of all sites of S connected to $J_\mu$, i.e. for which the corresponding element of the tensor A is non-zero. The explicit form of the equations for $\delta q_{\mu l}$ and $\delta r_{\mu l}$ becomes
$$\delta r_{\mu l} = \tanh\beta J_\mu \prod_{k\in\mathcal{L}(\mu)\setminus l} \delta q_{\mu k}\,, \qquad
\delta q_{\mu l} = \tanh\Big[\sum_{\nu\in\mathcal{M}(l)\setminus\mu} \tanh^{-1}\delta r_{\nu l} \;+\; F\Big]\,, \qquad (11)$$

where $\mathcal{M}(l)$ is the set of all indices of the tensor J which are connected to the vector site l; the external field F, which previously appeared in the last term of eq. (1), is directly related to our prior belief of the message bias

$$p^{S_l}_{l} = \tfrac{1}{2}\,(1 + \tanh F\, S_l)\,. \qquad (12)$$
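As a quick consistency check (our addition, not part of the original derivation): because $J_\mu \prod_{l\in\mathcal{L}(\mu)} S_l = \pm 1$, the exponential Boltzmann weight of eq. (5) reduces to the tanh form of eq. (10),

```latex
e^{\beta J_\mu \prod_{l\in\mathcal{L}(\mu)} S_l}
  = \cosh\beta J_\mu + \sinh\beta J_\mu \prod_{l\in\mathcal{L}(\mu)} S_l
  = \cosh\beta J_\mu \Big(1 + \tanh\beta J_\mu \prod_{l\in\mathcal{L}(\mu)} S_l\Big),
```

and averaging the product of spins over the factorised messages, with $\langle S_k\rangle = \delta q_{\mu k}$, replaces $\prod_{k\in\mathcal{L}(\mu)\setminus l} S_k$ by $\prod_{k\in\mathcal{L}(\mu)\setminus l} \delta q_{\mu k}$, which reproduces the first of eqs. (11).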
We will now employ eqs. (11) and the explicit expressions obtained above, making use of the differences $\delta q_{\mu l}$ and $\delta r_{\mu l}$, to obtain values of $q^{\pm 1}_{\mu l}$ and $r^{\pm 1}_{\mu l}$. After these differences are determined, the (approximated) marginal posterior $q^{S_l}_{l} = (1 + \delta q_{l}\, S_l)/2$ can be calculated,

$$\delta q_{l} = \tanh\Big[\sum_{\mu\in\mathcal{M}(l)} \tanh^{-1}\delta r_{\mu l} \;+\; F\Big]\,, \qquad (13)$$

providing the Bayes-optimal decoding $\xi^{B}_{l} = \mathrm{sign}\,\langle S_l\rangle_{T} = \mathrm{sign}(\delta q_{l})$. The magnetisation $M = (1/N)\sum_{i=1}^{N} \xi_i\, \xi^{B}_{i}$ serves as our performance measure.
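To make eqs. (11)-(13) concrete, here is a minimal numerical sketch in Python/NumPy. It is our illustration, not the authors' code, and it makes several assumptions: toy sizes (not the 10^4-bit messages of the experiments below), a sparse tensor A drawn as random K-tuples (each site appears in C code-word bits only on average), a fixed count of 50 iterations, and β set by the standard Nishimori condition tanh β = 1 - 2p for flip rate p.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy instance of Sourlas's code (illustrative sizes only)
N, K, C, p = 200, 2, 4, 0.1
F = 0.0                                   # unbiased message => no prior field
beta = np.arctanh(1.0 - 2.0 * p)          # Nishimori's temperature for flip rate p
L = N * C // K                            # number of code-word bits
tuples = np.array([rng.choice(N, size=K, replace=False) for _ in range(L)])
xi = rng.choice([-1, 1], size=N)          # source message
J = np.array([xi[t].prod() for t in tuples], dtype=float)
J[rng.random(L) < p] *= -1.0              # binary symmetric channel

dq = rng.uniform(-0.1, 0.1, size=(L, K))  # delta q_{mu l}, one value per edge
for _ in range(50):
    # eq. (11a): delta r_{mu l} = tanh(beta J_mu) prod_{k in L(mu)\l} delta q_{mu k}
    dr = np.empty_like(dq)
    for a in range(K):
        others = [k for k in range(K) if k != a]
        dr[:, a] = np.tanh(beta * J) * dq[:, others].prod(axis=1)
    u = np.arctanh(dr)                    # |dr| < tanh(beta) < 1, so this is safe
    # eq. (11b): delta q_{mu l} = tanh( sum_{nu in M(l)\mu} atanh(delta r_{nu l}) + F )
    h = np.zeros(N)                       # total field on each message site
    np.add.at(h, tuples.ravel(), u.ravel())
    dq = np.tanh(h[tuples] - u + F)       # cavity: subtract the edge's own term

# eq. (13): pseudo-posterior differences, decoded bits and magnetisation
dq_site = np.tanh(h + F)
xi_B = np.sign(dq_site)
M = float((xi * xi_B).mean())
```

The overlap M computed at the end is the performance measure defined in the text; for K ≥ 3 the paper shows the outcome depends strongly on the initial conditions, which here enter through the random initialisation of dq.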
We obtained numerical solutions for the cases K = 2, 5, corruption rate 0 ≤ p ≤ 0.5, two bias values (0.1, 0.5) and several temperatures, as shown in fig. 1, which will be compared to previously obtained solutions [8] using the replica method. The latter have been obtained by replica-symmetric and one-step replica-symmetry-breaking calculations of the system's free energy for the ferromagnetic and paramagnetic phases and the spin-glass phase, respectively (expecting strong replica symmetry breaking only in the latter), following the work of Sherrington and Wong [15]; saddle-point equations have been solved both analytically and numerically by employing Monte Carlo techniques.

In the experiments, connectivity is set as C = 4, 10 for K = 2, 5, respectively, which provides the same code rate R = 1/2 for both cases. For each run, 20000-bit code words J are generated from a 10000-bit message ξ using a fixed random sparse tensor A. The noise-corrupted code word J was decoded according to eqs. (11) and (13) to retrieve the original message ξ. Numerical solutions of 10 individual runs [16], for each value of the flip rate p starting from different initial conditions, obtained for the case K = 2, different biases ($f = p^{1}_{l} = 0.1, 0.5$, the probability of a +1 bit in the original message ξ) and temperatures ($T = 0.26, T_{\rm n}$) are shown in fig. 1(a). The choice of T = 0.26, rather than T = 0, for representing solutions at low temperatures is in order to avoid computational difficulties. We obtain good agreement between the TAP/BP solutions and the theoretical values obtained using the methods of [8] (diamond symbols and dashed line, respectively). The results for biased patterns at T = 0.26, presented in the form of mean values and standard deviation, show a suboptimal improvement in performance, as expected. Obtaining solutions under similar conditions but at Nishimori's
