
# A Flexible LDPC/Turbo Decoder Architecture

01 Jul 2011 · Vol. 64, Iss. 1, pp. 1–16

TL;DR: A unified message passing algorithm for LDPC and Turbo codes is proposed, a flexible soft-input soft-output (SISO) module for LDPC/Turbo decoding is introduced, and an area-efficient flexible SISO decoder architecture supporting LDPC/Turbo decoding is presented.

Abstract: Low-density parity-check (LDPC) codes and convolutional Turbo codes are two of the most powerful error correcting codes that are widely used in modern communication systems. In a multi-mode baseband receiver, both LDPC and Turbo decoders may be required. However, the different decoding approaches for LDPC and Turbo codes usually lead to different hardware architectures. In this paper we propose a unified message passing algorithm for LDPC and Turbo codes and introduce a flexible soft-input soft-output (SISO) module to handle LDPC/Turbo decoding. We employ the trellis-based maximum a posteriori (MAP) algorithm as a bridge between LDPC and Turbo codes decoding. We view the LDPC code as a concatenation of n super-codes where each super-code has a simpler trellis structure so that the MAP algorithm can be easily applied to it. We propose a flexible functional unit (FFU) for MAP processing of LDPC and Turbo codes with a low hardware overhead (about 15% area and timing overhead). Based on the FFU, we propose an area-efficient flexible SISO decoder architecture to support LDPC/Turbo codes decoding. Multiple such SISO modules can be embedded into a parallel decoder for higher decoding throughput. As a case study, a flexible LDPC/Turbo decoder has been synthesized on a TSMC 90 nm CMOS technology with a core area of 3.2 mm2. The decoder can support IEEE 802.16e LDPC codes, IEEE 802.11n LDPC codes, and 3GPP LTE Turbo codes. Running at 500 MHz clock frequency, the decoder can sustain up to 600 Mbps LDPC decoding or 450 Mbps Turbo decoding.

Topics: Turbo code (71%), Turbo equalizer (67%)

### 1 Introduction

• Practical wireless communication channels are inherently “noisy” due to the impairments caused by channel distortions and multipath effects.
• They can both be represented as codes on graphs which define the constraints satisfied by codewords.
• Third, the authors propose a flexible SISO decoder hardware architecture based on the FFU.
• Section 2 reviews the super-code based decoding algorithm for LDPC codes.

### 2 Review of Super-code Based Decoding Algorithm for LDPC Codes

• Naturally, the Turbo decoding procedure can be partitioned into two phases, where each phase corresponds to one super-code processing.
• Similarly, LDPC codes can also be partitioned into super-codes for efficient processing as previously mentioned in Section 1.
• Before proceeding with a discussion of the proposed flexible decoder architecture, it is desirable to review the super-code based LDPC decoding scheme in this section.

### 3 Flexible SISO Module

• The authors propose a flexible soft-input soft-output (SISO) module, named the Flex-SISO module, to decode LDPC and Turbo codes.
• To reduce complexity, the MAP algorithm is usually calculated in the log domain [31].
• For LDPC codes, a Flex-SISO module is used to decode a super-code.
• The soft input values λi(u) are the outputs from the previous Flex-SISO module, or other previous modules if necessary.
• First, the authors decompose a QC-LDPC code into multiple super-codes, where each layer of the parity check matrix defines a super-code.

### 3.3.1 Review of the Traditional Turbo Decoder Structure

• The traditional Turbo decoding procedure with two SISO decoders is shown in Fig. 7.
• The definitions of the symbols in the figure are as follows.
• The channel LLR values for u_k and p_k^(i) are denoted as λ_c(u_k) and λ_c(p_k^(i)), respectively.
• In the first half iteration, SISO decoder 1 computes the extrinsic value λ_e^1(u_k) and passes it to SISO decoder 2.
• The computation is repeated in each iteration.

### 3.3.2 Modified Turbo Decoder Structure Using Flex-SISO Modules

• In order to use the proposed Flex-SISO module for Turbo decoding, the authors modify the traditional Turbo decoder structure.
• Then it removes the old extrinsic value λ_e^1(u_k;old) from the soft input LLR λ_i^1(u_k) to form a temporary message λ_t^1(u_k) as follows (for brevity, the authors drop the superscript “1” in the following equations): λ_t(u_k) = λ_i(u_k) − λ_e(u_k;old).
• The computation is repeated in each half-iteration until the iteration converges.
• Figure 9 shows an iterative Turbo decoder architecture based on the Flex-SISO module.
• The memory organizations are similar, but with a variety of sizes depending on the codeword length.
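The modified flow described in these bullets can be sketched in software. The sketch below is an illustrative toy, not the paper's hardware: APP LLRs (rather than extrinsic LLRs) are exchanged between the two half-iterations, each half first subtracts its own stored old extrinsic value, and the `siso` callable is a dummy stand-in for the actual MAP computation.

```python
import numpy as np

def turbo_iteration(lam_app, lam_e, perm, siso):
    """One full Turbo iteration = two Flex-SISO half-iterations.

    lam_app: APP LLRs exchanged between halves (channel LLRs initially).
    lam_e:   list of two arrays, each half's stored old extrinsic values.
    perm:    interleaver permutation (half 2 works in interleaved order).
    siso:    placeholder for the MAP computation of one constituent code.
    """
    for half in (0, 1):
        x = lam_app[perm] if half == 1 else lam_app   # (de)interleave input
        lam_t = x - lam_e[half]                       # remove own old extrinsic
        lam_e[half] = siso(lam_t)                     # new extrinsic via MAP
        out = lam_t + lam_e[half]                     # updated APP LLRs
        lam_app = out[np.argsort(perm)] if half == 1 else out
    return lam_app
```

With a dummy `siso` such as `lambda t: 0.1 * t`, one iteration simply scales the APP values, which is enough to check the bookkeeping of the subtract/interleave/deinterleave steps.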

### 4 Design of a Flexible Functional Unit

• The MAP processor is the main processing unit in both LDPC and Turbo decoders, as depicted in Fig. 6 and Fig. 9. Figure 13 shows a MAP processor structure to decode the single parity check code.
• Thus, the same look-up table configuration can be applied to the Turbo ACSA unit.
• To support both LDPC and Turbo codes with minimum hardware overhead, the authors propose a flexible functional unit (FFU) which is depicted in Fig. 15.
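The ACSA units referenced above evaluate the Jacobian logarithm max*(a, b) = max(a, b) + log(1 + e^−|a−b|), whose correction term is what the look-up table stores. A minimal sketch follows; the 8-entry table with 0.5 spacing is an illustrative assumption, not the paper's actual LUT configuration.

```python
import math

def max_star(a, b):
    """Exact Jacobian logarithm: max*(a, b) = log(e^a + e^b)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

# LUT approximation of the correction term, as typically used in
# ACSA hardware. Table size and spacing here are assumptions.
_LUT_STEP = 0.5
_LUT = [math.log1p(math.exp(-i * _LUT_STEP)) for i in range(8)]

def max_star_lut(a, b):
    idx = min(int(abs(a - b) / _LUT_STEP), len(_LUT) - 1)
    return max(a, b) + _LUT[idx]
```

The LUT version trades a small approximation error for a much cheaper datapath, which is why the same table configuration can be shared between the LDPC and Turbo ACSA units.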

### 5 Design of A Flexible SISO Decoder

• Built on top of the FFU arithmetic unit, the authors introduce a flexible SISO decoder architecture to handle LDPC and Turbo codes.
• The boundary β metrics are initialized from an NII buffer (not shown in Fig. 19).
• The decoder first computes λ_t(u) based on Eq. 5. Prior to decoding, the α and β metrics are initialized to the maximum value.
• While the β unit and the extrinsic-1 unit are working on the first data stream, the α unit can work on the second stream which leads to a pipelined implementation.
• In a parallel processing environment, multiple SISO decoders can be used to increase the throughput.
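The α and β units mentioned above implement the forward and backward recursions of the MAP algorithm. Below is a minimal log-domain sketch for the two-state single parity check trellis of Section 2; it is a toy software model using the exact max* (not the hardware-approximated one) and ignores the pipelining schedule.

```python
import math

def max_star(a, b):
    """Jacobian logarithm: log(e^a + e^b)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

NEG = -1e9  # stand-in for -infinity (forbidden states)

def map_spc(llr):
    """Forward-backward MAP on the 2-state trellis of a single parity
    check code: state = running parity, trellis starts and ends in 0.
    Input: channel LLRs log(P(b=0)/P(b=1)). Output: APP LLRs."""
    n = len(llr)
    g = [(0.0, -l) for l in llr]          # branch metrics for bit 0 / bit 1
    alpha = [[NEG, NEG] for _ in range(n + 1)]
    beta = [[NEG, NEG] for _ in range(n + 1)]
    alpha[0][0] = 0.0                      # start in state 0
    beta[n][0] = 0.0                       # end in state 0
    for k in range(n):                     # forward recursion (alpha unit)
        for s in (0, 1):
            alpha[k + 1][s] = max_star(alpha[k][s] + g[k][0],
                                       alpha[k][s ^ 1] + g[k][1])
    for k in range(n - 1, -1, -1):         # backward recursion (beta unit)
        for s in (0, 1):
            beta[k][s] = max_star(beta[k + 1][s] + g[k][0],
                                  beta[k + 1][s ^ 1] + g[k][1])
    app = []
    for k in range(n):                     # combine alpha, branch, beta
        m0 = max_star(alpha[k][0] + beta[k + 1][0],
                      alpha[k][1] + beta[k + 1][1]) + g[k][0]
        m1 = max_star(alpha[k][0] + beta[k + 1][1],
                      alpha[k][1] + beta[k + 1][0]) + g[k][1]
        app.append(m0 - m1)
    return app
```

For a two-bit parity check the bits must be equal, so each bit's APP LLR is its own channel LLR plus the other bit's LLR, which gives a quick sanity check on the recursions.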

### 6 Parallel Decoder Architecture Using Multiple Flex-SISO Decoder Cores

• For high throughput applications, it is necessary to use multiple SISO decoders working in parallel to increase the decoding speed.
• For parallel Turbo decoding, multiple SISO decoders can be employed by dividing a codeword block into several sub-blocks and then each sub-block is processed separately by a dedicated SISO decoder [7, 20, 30, 41, 42].
• APP memory is used to store the initial and updated LLR values.
• Turbo parity memory is used to store the channel LLR values for each parity bit in a Turbo codeword.
• As a case study, the authors have designed a high-throughput, flexible LDPC/Turbo decoder to support the following three codes: 1) 802.16e WiMAX LDPC code, 2) 802.11n WLAN LDPC code, and 3) 3GPP-LTE Turbo code.

### 8 Conclusion

• The authors present a flexible decoder architecture to support LDPC and Turbo codes.
• To increase decoding throughput, the authors propose a parallel LDPC/Turbo decoder using multiple Flex-SISO cores.
• The proposed architecture can significantly reduce the cost of a multi-mode receiver.


J Sign Process Syst (2011) 64:1–16
DOI 10.1007/s11265-010-0477-6
A Flexible LDPC/Turbo Decoder Architecture
Yang Sun · Joseph R. Cavallaro
Received: 21 November 2009 / Revised: 11 March 2010 / Accepted: 12 March 2010 / Published online: 9 April 2010
Y. Sun · J. R. Cavallaro
Department of Electrical and Computer Engineering, Rice University, 6100 Main Street, Houston, TX 77005, USA
e-mail: ysun@rice.edu
J. R. Cavallaro
e-mail: cavallar@rice.edu
Keywords SISO decoder · LDPC decoder · Turbo decoder · Error correcting codes · MAP algorithm · Reconfigurable architecture
1 Introduction
Practical wireless communication channels are inher-
ently “noisy” due to the impairments caused by channel
distortions and multipath effect. Error correcting codes
are widely used to increase the bandwidth and energy
efficiency of wireless communication systems. As a core
technology in wireless communications, forward error
correction (FEC) coding has migrated from basic con-
volutional/block codes to more powerful Turbo codes
and LDPC codes. Turbo codes, introduced by Berrou
et al. in 1993 [4], have been employed in 3G and
beyond 3G wireless systems, such as UMTS/WCDMA
and 3GPP Long-Term Evolution (LTE) systems. As a
candidate 4G coding scheme, LDPC codes, which were
introduced by Gallager in 1963 [13], have recently received
significant attention in coding theory and have been adopted
in wireless systems such as the IEEE 802.16e WiMAX system
and the IEEE 802.11n WLAN system. In future 4G networks,
internetworking and roaming between different networks
would require a multi-standard FEC decoder. Since Turbo
codes and LDPC codes are widely used in many different
3G/4G systems, it is important to design a configurable
decoder to support multiple FEC coding schemes.

In the literature, many efficient LDPC decoder VLSI
architectures have been studied [6, 9, 12, 14, 18, 24, 27,
29, 35, 37, 39, 45, 47]. Turbo decoder VLSI architec-
tures have also been extensively investigated by many
researchers [5, 8, 20, 21, 25, 30, 33, 41, 44]. However,
designing a flexible decoder to support both LDPC
and Turbo codes still remains very challenging. In this
paper, we aim to provide an alternative to dedicated
silicon that reduces the cost of supporting both LDPC
and Turbo codes: we propose a flexible decoder architecture
to meet the needs of a multi-standard FEC decoder.
From the theoretical point of view, there are some
similarities between LDPC and Turbo codes. They can
both be represented as codes on graphs which define
the constraints satisfied by codewords. Both families
of codes are decoded in an iterative manner by em-
ploying the sum-product algorithm or belief propa-
gation algorithm. For example, MacKay has related
these two codes by treating a Turbo code as a low-
density parity-check code [23]. On the other hand, a
few other researchers have tried to treat an LDPC code
as a Turbo code and apply a turbo-like message passing
algorithm to LDPC codes. For example, Mansour and
Shanbhag [24] introduce an efficient turbo message
passing algorithm for architecture-aware LDPC codes.
Hocevar [18] proposes a layered decoding algorithm
which treats the parity check matrix as horizontal lay-
ers and passes the soft information between layers to
improve the performance. Zhu and Chakrabarti [50]
study the super-code based LDPC construction and
decoding. Zhang and Fossorier [46] suggest a shuffled
belief propagation algorithm to achieve a faster decod-
ing speed. Lu and Moura [22] propose to partition the
Tanner graph into several trees and apply the turbo-like
decoding algorithm in each tree for faster convergence
rate. Dai et al. [12] introduce a turbo-sum-product
hybrid decoding algorithm for quasi-cyclic (QC) LDPC
codes by splitting the parity check matrix into two sub-
matrices where the information is exchanged.
In our early work [38], we have proposed a super-
code based decoding algorithm for LDPC codes. In
this paper, we extend this algorithm and present a
more generic message passing algorithm for LDPC
and Turbo decoding, and then exploit the architecture
commonalities between LDPC and Turbo decoders.
We create a connection between LDPC and Turbo
codes by applying a super-code based decoding algo-
rithm, where a code is divided into multiple super-codes
and then the decoding operation is performed by iter-
atively exchanging the soft information between super-
codes. In LDPC decoding, we treat an LDPC code
as a concatenation of n super-codes, where each super-
code has a simpler trellis structure so that the maxi-
mum a posteriori (MAP) algorithm can be efficiently
performed. In the Turbo decoding, we modify the tradi-
tional message passing flow so that the proposed super-
code based decoding scheme works for Turbo codes as
well.
Contributions of this paper are as follows. First, we
introduce a flexible soft-input soft-output (Flex-SISO)
module for LDPC and Turbo codes decoding. Sec-
ond, we introduce an area-efficient flexible functional
unit (FFU) for implementing the MAP algorithm in
hardware. Third, we propose a flexible SISO decoder
hardware architecture based on the FFU. Finally, we
show how to enable parallel decoding by using multiple
such Flex-SISO decoders.
The remainder of the paper is organized as follows.
Section 2 reviews the super-code based decoding al-
gorithm for LDPC codes. Section 3 presents a Flex-
SISO module for LDPC/Turbo decoding. Section 4
introduces a flexible functional unit (FFU) for LDPC
and Turbo decoding. Based on the FFU, Section 5
describes a dual-mode Flex-SISO decoder architecture.
Section 6 presents a parallel decoder architecture us-
ing multiple Flex-SISO cores. Section 7 compares our
flexible decoder with existing decoders in the literature.
Finally, Section 8 concludes the paper.
2 Review of Super-code Based Decoding Algorithm
for LDPC Codes
By definition, a Turbo code is a parallel concatenation
of two super-codes, where each super-code is a con-
stituent convolutional code. Naturally, the Turbo decoding
procedure can be partitioned into two phases where
each phase corresponds to one super-code processing.
Similarly, LDPC codes can also be partitioned into
super-codes for efficient processing as previously men-
tioned in Section 1. Before proceeding with a discussion
of the proposed flexible decoder architecture, it is de-
sirable to review the super-code based LDPC decoding
scheme in this section.
2.1 Trellis Structure for LDPC Codes
A binary LDPC code is a linear block code specified by
a very sparse binary M × N parity check matrix:
H · x^T = 0,    (1)

where x is a codeword (x ∈ C) and H can be viewed
as a bipartite graph where each column and row in
H represent a variable node and a check node, re-
spectively. Each element of the parity check matrix is

Figure 1 Trellis representation for LDPC codes where a two-state trellis diagram is associated with each check node. (Degree-i variable nodes and degree-j check nodes are joined by an interconnect network Π; each check node enforces x_1 + x_2 + … + x_j = 0.)
either a zero or a one, where nonzero elements are
typically placed at random positions to achieve good
performance. The number of nonzero elements in each
row or each column of the parity check matrix is called
check node degree or variable node degree. A regular
LDPC code has the same check node and variable node
degrees, whereas an irregular LDPC code has different
check node and variable node degrees.
The full trellis structure of an LDPC code is enor-
mously large, and it is impractical to apply the MAP
algorithm on the full trellis. Alternatively, an
(N, N − M) LDPC code can be viewed as M parallel
concatenated single parity check codes. Although the
performance of a single parity check code is poor, when
many of them are sparsely connected they become a
very strong code. Figure 1 shows a trellis representation
for LDPC codes where a single parity check code is
considered as a low-weight two-state trellis, starting at
state 0 and ending at state 0.
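The parity check constraint of Eq. 1, and the view of each row as one single parity check, can be illustrated with a small sketch; the 4 × 6 matrix below is a hypothetical toy, not a code from the paper.

```python
import numpy as np

# Hypothetical toy 4 x 6 sparse parity check matrix H. Each row is one
# single parity check: the bits in the positions where the row is 1
# must XOR to zero.
H = np.array([
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 1, 0, 1, 1],
])

def is_codeword(H, x):
    """Eq. 1: x is a codeword iff H . x^T = 0 over GF(2)."""
    return not np.any((H @ x) % 2)

x = np.array([1, 1, 0, 0, 1, 1])
print(is_codeword(H, x))   # True: every row checks to even parity
```

Flipping any single bit of `x` violates at least one row, which is exactly the event the iterative decoder tries to detect and correct.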
2.2 Layered Message Passing Algorithm for LDPC
Codes
The main idea behind the layered LDPC decoding is
essentially the Turbo message passing algorithm [24].
It has been shown that the layered message passing
Figure 2 Dividing a factor graph into sub-graphs: an original factor graph with check nodes c1–c4 and variable nodes v1–v6 is split into sub factor graph 1 (c1, c3) and sub factor graph 2 (c2, c4).
Figure 3 A block-structured parity check matrix, where each block row (or layer) defines a super-code (Super-code 1, Super-code 2, …, Super-code n). Each sub-matrix of the parity check matrix is either a zero matrix or a z × z cyclically shifted identity matrix.
algorithm can achieve a faster convergence rate than
the standard two-phase message-passing algorithm for
structured LDPC codes [18, 24]. To be more general,
we can divide the factor graph of an LDPC code into
several sub-graphs [38], as illustrated in Fig. 2. Each
sub-graph corresponds to a super-code. If we require that
each sub-graph be loop-free, then each super-code has a
simpler trellis structure so that the MAP algorithm can
be efficiently performed.
As a special example, the block-structured Quasi-
Cyclic (QC) LDPC codes used in many practical com-
munication systems such as 802.16e and 802.11n can be
easily decomposed into several super-codes. As shown
in Fig. 3, a block structured parity check matrix can
be viewed as a 2-D array of square sub-matrices. Each
sub-matrix is either a zero matrix or a z-by-z cyclically
shifted identity matrix I_z(x) with a random shift value x.
The parity check matrix can be viewed as a concate-
nation of n super-codes where each block row or layer
defines a super-code. In the layered message passing
algorithm, soft information generated by one super-
code can be used immediately by the following super-
codes which leads to a faster convergence rate [24].
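The block-structured decomposition of Fig. 3 can be sketched as follows; the base matrix and z = 4 are illustrative assumptions, not parameters from the 802.16e or 802.11n standards.

```python
import numpy as np

# Assumed toy parameters: z = 4, and a base matrix whose entries are
# -1 (z x z zero matrix) or a shift value x (identity cyclically
# shifted by x). Each block row ("layer") is one super-code of
# z independent single parity checks.
z = 4
base = np.array([
    [0, 2, -1, 1],    # layer 0 -> super-code 1
    [-1, 3, 0, -1],   # layer 1 -> super-code 2
])

def expand(base, z):
    """Expand a base matrix of shift values into the full binary H."""
    M, N = base.shape
    H = np.zeros((M * z, N * z), dtype=int)
    I = np.eye(z, dtype=int)
    for m in range(M):
        for n in range(N):
            if base[m, n] >= 0:
                H[m*z:(m+1)*z, n*z:(n+1)*z] = np.roll(I, base[m, n], axis=1)
    return H

H = expand(base, z)
layers = [H[l*z:(l+1)*z, :] for l in range(base.shape[0])]  # one per super-code
```

Running the layered schedule then simply means sweeping `layers` in order, with each layer's extrinsic updates immediately visible to the next, which is the source of the faster convergence noted above.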
3 Flexible SISO Module
In this section, we propose a flexible soft-input soft-
output (SISO) module, named Flex-SISO module, to
decode LDPC and Turbo codes. The SISO module is
based on the MAP algorithm [3]. To reduce complexity,
the MAP algorithm is usually calculated in the log do-
main [31]. In this paper, we assume the MAP algorithm
is always calculated in the log domain.
The decoding algorithm underlying the Flex-SISO
module works for codes that have trellis representations.
For LDPC codes, a Flex-SISO module is used

Figure 4 Flex-SISO module: the soft input values λ_i(u) for information bits and the channel values λ_c(p) for parity bits enter the module; the APP values λ_o(u) for information bits are output; the new and old extrinsic values λ_e(u;new) and λ_e(u;old) are kept in a local memory.
to decode a super-code. For Turbo codes, a Flex-SISO
module is used to decode a component convolutional
code. An iteration performed by the Flex-SISO module is
called a sub-iteration, and thus one full iteration contains
n sub-iterations.
3.1 Flex-SISO Module
Figure 4 depicts the proposed Flex-SISO module. The
output of the Flex-SISO module is the a posteriori
probability (APP) log-likelihood ratio (LLR) values,
denoted as λ_o(u), for information bits. It should be
noted that the Flex-SISO module exchanges the soft
values λ_o(u) instead of the extrinsic values in the iter-
ative decoding process. The extrinsic values, denoted
as λ_e(u), are stored in a local memory of the Flex-
SISO module. To distinguish the extrinsic values gen-
erated at different sub-iterations, we use λ_e(u;old) and
λ_e(u;new) to represent the extrinsic values generated in
the previous sub-iteration and the current sub-iteration,
respectively. The soft input values λ_i(u) are the out-
puts from the previous Flex-SISO module, or other
previous modules if necessary. Another input to the
Flex-SISO module is the channel values for parity bits,
denoted as λ_c(p), if available. For LDPC codes, we do
not distinguish information and parity bits, and all the
codeword bits are treated as information bits. However,
in the case of Turbo codes, we treat information and
parity bits separately. Thus the input port λ_c(p) will not
be used when decoding LDPC codes. At each sub-
iteration, the old extrinsic values, denoted as λ_e(u;old),
are subtracted from the soft input values λ_i(u) to avoid
positive feedback.
A generic description of the message passing algo-
rithm is as follows. Multiple Flex-SISO modules are
connected in series to form an iterative decoder. First,
the Flex-SISO module receives the soft values λ_i(u)
from upstream Flex-SISO modules and the channel
values (for parity bits) λ_c(p) if available. The λ_i(u) can
be thought of as the sum of the channel value λ_c(u)
(for information bit) and all the extrinsic values λ_e(u)
previously generated by all the super-codes:

λ_i(u) = λ_c(u) + Σ λ_e(u).    (2)
Note that prior to the iterative decoding, λ_i(u) should
be initialized with λ_c(u). Next, the old extrinsic value
λ_e(u;old) generated by this Flex-SISO module in the
previous iteration is subtracted from λ_i(u) as follows:

λ_t(u) = λ_i(u) − λ_e(u;old).    (3)
Then, the new extrinsic value λ_e(u;new) can be com-
puted using the MAP algorithm based on λ_t(u), and
λ_c(p) if available. Finally, the APP value is updated as

λ_o(u) = λ_i(u) − λ_e(u;old) + λ_e(u;new).    (4)

Then this updated APP value is passed to the down-
stream Flex-SISO modules. This computation repeats
in each sub-iteration.
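One Flex-SISO sub-iteration, i.e. Eqs. (2)–(4), can be sketched as below; `map_extrinsic` is a placeholder for the trellis-based MAP computation, not the paper's MAP processor.

```python
import numpy as np

def flex_siso_step(lam_i, lam_e_old, map_extrinsic):
    """One Flex-SISO sub-iteration (Eqs. 2-4).

    lam_i: soft input LLRs (APP values from the upstream module;
           initialized with the channel values before iteration 1).
    lam_e_old: this module's extrinsic values from the previous
           sub-iteration (initialized to 0).
    map_extrinsic: callable computing new extrinsic values with the
           MAP algorithm on this super-code's trellis (placeholder).
    """
    lam_t = lam_i - lam_e_old            # Eq. 3: remove own old extrinsic
    lam_e_new = map_extrinsic(lam_t)     # MAP over the super-code trellis
    lam_o = lam_t + lam_e_new            # Eq. 4: updated APP values
    return lam_o, lam_e_new              # APP passed on; extrinsic stored

# Toy usage with a dummy MAP stand-in that scales its input:
lam_c = np.array([1.2, -0.7, 2.0])       # channel LLRs initialize lam_i
lam_o, lam_e = flex_siso_step(lam_c, np.zeros(3), lambda t: 0.5 * t)
```

Note that the module hands λ_o(u) downstream and keeps λ_e(u;new) locally, which is exactly the exchange discipline that lets the same module serve both LDPC and Turbo modes.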
3.2 Flex-SISO Module to Decode LDPC Codes
In this section, we show how to use the Flex-SISO
module to decode LDPC codes. Because QC-LDPC
codes are widely used in many practical systems, we
will primarily focus on the QC-LDPC codes. First,
we decompose a QC-LDPC code into multiple super-
codes, where each layer of the parity check matrix
defines a super-code. After the layered decomposition,
each super-code comprises z independent two-state sin-
gle parity check codes. Figure 5 shows the super-code
based, or layered, LDPC decoder architecture using the
Flex-SISO modules. The decoder parallelism at each
Flex-SISO module is at the level of the sub-matrix size
z, because these z single parity codes have no data
dependency and can thus be processed simultaneously.
This architecture differs from the regular two-phase
LDPC decoder in that a code is partitioned into multiple
sections, and each section is processed by the same
processor. The convergence rate can be twice as fast
as that of a regular decoder [18].
Figure 5 LDPC decoding using Flex-SISO modules, where an LDPC code is decomposed into n super-codes, and n Flex-SISO modules (Flex-SISO 1 … Flex-SISO n, each with a local extrinsic memory) are connected in series to decode.

Figure 6 LDPC decoder architecture based on the Flex-SISO module: an APP memory feeds λ_i(u) to a subtractor that removes λ_e(u;old) from the extrinsic memory, producing λ_t(u) for the LDPC MAP processor (with λ_c(p) = 0); the processor returns λ_e(u;new) and the updated λ_o(u).
Since the data flow is the same between different
sub-iterations, one physical Flex-SISO module is in-
stantiated, and it is re-used at each sub-iteration, which
leads to a partial-parallel decoder architecture. Figure 6
shows an iterative LDPC decoder hardware architec-
ture based on the Flex-SISO module. The structure
comprises an APP memory to store the soft APP val-
ues, an extrinsic memory to store the extrinsic values,
and a MAP processor to implement the MAP algorithm
for z single parity check codes. Prior to the iterative
decoding process, the APP memory is initialized with
channel values λ_c(u), and the extrinsic memory is
initialized with 0.
The decoding flow is summarized as follows. It
should be noted that the parity bits are treated as
information bits for the decoding of LDPC codes. We
use the symbol u_k to represent the k-th data bit in the
codeword. For check node m, we use the symbol u_{m,k}
to denote the k-th codeword bit (or variable node) that
is connected to this check node m. To remove corre-
lations between iterations, the old extrinsic message
is subtracted from the soft input message to create a
temporary message λ_t as follows

λ_t(u_{m,k}) = λ_i(u_k) − λ_e(u_{m,k};old),    (5)

where λ_i(u_k) is the soft input log likelihood ratio (LLR)
and λ_e(u_{m,k};old) is the old extrinsic value generated by
this MAP processor in the previous iteration. Then the
new extrinsic value can be computed as:

λ_e(u_{m,k};new) = ⊞_{j: j≠k} λ_t(u_{m,j}),    (6)

where the ⊞ operation is associative and commutative,
and is defined as [15]

λ(u_1) ⊞ λ(u_2) = log( (1 + e^{λ(u_1)} e^{λ(u_2)}) / (e^{λ(u_1)} + e^{λ(u_2)}) ).    (7)

Finally, the new APP value is updated as:

λ_o(u_k) = λ_t(u_{m,k}) + λ_e(u_{m,k};new).    (8)

For each sub-iteration l, Eqs. (5)–(8) can be executed
in parallel for check nodes m = lz to lz + z − 1 because
there is no data dependency between them.
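Equations (5)–(8) for a single check node can be sketched directly; this is an illustrative software model using the exact ⊞ of Eq. 7, not a hardware approximation.

```python
import numpy as np

def boxplus(a, b):
    """Eq. 7: a [+] b = log((1 + e^(a+b)) / (e^a + e^b))."""
    return np.log1p(np.exp(a + b)) - np.log(np.exp(a) + np.exp(b))

def check_node_update(lam_i, lam_e_old):
    """One super-code constituent (one check node), Eqs. 5, 6, 8.

    lam_i[k]: soft input LLR for the k-th connected bit.
    lam_e_old[k]: this check's old extrinsic value for that bit.
    Returns (lam_o, lam_e_new)."""
    lam_t = lam_i - lam_e_old                        # Eq. 5
    d = len(lam_t)
    lam_e_new = np.empty(d)
    for k in range(d):                               # Eq. 6: [+] over j != k
        acc = None
        for j in range(d):
            if j == k:
                continue
            acc = lam_t[j] if acc is None else boxplus(acc, lam_t[j])
        lam_e_new[k] = acc
    lam_o = lam_t + lam_e_new                        # Eq. 8
    return lam_o, lam_e_new
```

A quick sanity check: the extrinsic value for bit k carries the sign of the product of the other bits' signs, and its magnitude never exceeds the smallest of their magnitudes, as expected of the ⊞ operation.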
3.3 Flex-SISO Module to Decode Turbo Codes
In this section, we show how to use the Flex-SISO mod-
ule to decode Turbo codes. A Turbo code can be nat-
urally partitioned into two super-codes, or constituent
codes. In a traditional Turbo decoder, where the extrin-
sic messages are exchanged between two super-codes,
the Flex-SISO module can not be directly applied,
because the Flex-SISO module requires the APP val-
ues, rather than the extrinsic values, being exchanged
between super-codes. In this section, we made a small
modification to the traditional Turbo decoding flow so
that the APP values are exchanged in the decoding
procedure.
3.3.1 Review of the Traditional Turbo Decoder
Structure
The traditional Turbo decoding procedure with two
SISO decoders is shown in Fig. 7. The definitions of
the symbols in the figure are as follows. The informa-
tion bit and the parity bits at time k are denoted as
u_k and (p_k^{(1)}, p_k^{(2)}, ..., p_k^{(n)}), respectively, with
u_k, p_k^{(i)} ∈ {0, 1}. The channel LLR values for u_k and
p_k^{(i)} are denoted as λ_c(u_k) and λ_c(p_k^{(i)}), respectively.
The a priori LLR, the extrinsic LLR, and the APP LLR for u_k are
denoted as λ_a(u_k), λ_e(u_k), and λ_o(u_k), respectively.
Figure 7 Traditional Turbo decoding procedure using two SISO decoders, where the extrinsic LLR values λ_e^1(u) and λ_e^2(u) are exchanged between the two SISO decoders through the interleaver Π: SISO 1 takes λ_c(u), λ_c(p1), and the a priori LLR λ_a^1(u); SISO 2 takes λ_c(p2) and λ_a^2(u); and each outputs APP LLRs λ_o^1(u) and λ_o^2(u).

##### Citations
More filters

Journal ArticleDOI
TL;DR: This work concentrates on the design of a reconfigurable architecture for both turbo and LDPC codes decoding, tackling the reconfiguration issue and introducing a formal and systematic treatment that was not previously addressed.
Abstract: Flexible and reconfigurable architectures have gained wide popularity in the communications field. In particular, reconfigurable architectures for the physical layer are an attractive solution not only to switch among different coding modes but also to achieve interoperability. This work concentrates on the design of a reconfigurable architecture for both turbo and LDPC codes decoding. The novel contributions of this paper are: i) tackling the reconfiguration issue introducing a formal and systematic treatment that, to the best of our knowledge, was not previously addressed and ii) proposing a reconfigurable NoC-based turbo/LDPC decoder architecture and showing that wide flexibility can be achieved with a small complexity overhead. Obtained results show that dynamic switching between most of considered communication standards is possible without pausing the decoding activity. Moreover, post-layout results show that tailoring the proposed architecture to the WiMAX standard leads to an area occupation of 2.75 mm2 and a power consumption of 101.5 mW in the worst case.

53 citations

### Cites background or methods from "A Flexible LDPC/Turbo Decoder Archi..."

• ...Flexible decoders available in the literature [9]–[13], [16], [17], [19], [20], though supporting a wide range of codes, do not address the reconfiguration issue....

[...]

• ...Sun and Cavallaro describe in [13] a decoder working with 3GPP-LTE turbo codes and WiMAX and WiFi LDPC codes....

[...]

Proceedings ArticleDOI
14 Mar 2011
TL;DR: A multi-core architecture which supports convolutional codes, binary/duo-binary turbo codes, and LDPC codes, based on Application Specific Instruction-set Processors (ASIP) and avoids the use of dedicated interleave/deinterleave address lookup memories is presented.
Abstract: In order to address the large variety of channel coding options specified in existing and future digital communication standards, there is an increasing need for flexible solutions. This paper presents a multi-core architecture which supports convolutional codes, binary/duo-binary turbo codes, and LDPC codes. The proposed architecture is based on Application Specific Instruction-set Processors (ASIP) and avoids the use of dedicated interleave/deinterleave address lookup memories. Each ASIP consists of two datapaths one optimized for turbo and the other for LDPC mode, while efficiently sharing memories and communication resources. The logic synthesis results yields an overall area of 2.6mm2 using 90nm technology. Payload throughputs of up to 312Mbps in LDPC mode and of 173Mbps in Turbo mode are possible at 520MHz, fairing better than existing solutions.

35 citations

### Cites methods from "A Flexible LDPC/Turbo Decoder Archi..."

• ...A high throughput of 257Mbps is achieved for LDPC mode while a limited throughput of 37.2Mbps in DBTC and 18.6Mbps in SBTC modes are achieved at 400MHz....

[...]

Journal ArticleDOI
TL;DR: This work proposes dynamic multi-frame processing schedule which efficiently utilizes the layered-LDPC decoding with minimum pipeline stages and efficient comparison techniques for both column and row layered schedule and rejection-based high-speed circuits to compute the two minimum values from multiple inputs required for row layered processing of hardware-friendly min-sum decoding algorithm.
Abstract: This paper presents architecture of block-level-parallel layered decoder for irregular LDPC code. It can be reconfigured to support various block lengths and code rates of IEEE 802.11n (WiFi) wireless-communication standard. We have proposed efficient comparison techniques for both column and row layered schedule and rejection-based high-speed circuits to compute the two minimum values from multiple inputs required for row layered processing of hardware-friendly min-sum decoding algorithm. The results show good speed with lower area as compared to state-of-the-art circuits. Additionally, this work proposes dynamic multi-frame processing schedule which efficiently utilizes the layered-LDPC decoding with minimum pipeline stages. The suggested LDPC-decoder architecture has been synthesized and post-layout simulated in 90 nm-CMOS process. This decoder occupies 5.19 ${\rm mm}^{2}$ area and supports multiple code rates like 1/2, 2/3, 3/4 & 5/6 as well as block-lengths of 648, 1296 & 1944. At a clock frequency of 336 MHz, the proposed LDPC-decoder has achieved better throughput of 5.13 Gbps and energy efficiency of 0.01 nJ/bits/iterations, as compared to the similar state-of-the-art works.

33 citations

### Cites background from "A Flexible LDPC/Turbo Decoder Archi..."

• ...While Sun and Cavallardo [15] have designed single architecture to process both LDPC and turbo codes by proposing a unified algorithm....

[...]

• ...Multi-mode reconfigurable architectures in [14] and [15] have the flexibility to switch between LDPC and turbo decoding-process....


Patent
08 Sep 2010
Abstract: A configurable Turbo-LDPC decoder comprising: a set of P > 1 soft-input soft-output decoding units (DP_0 to DP_(P-1); DP_i) for iteratively decoding both Turbo- and LDPC-encoded input data, each of said decoding units having first (I1_i) and second (I2_i) input ports and first (O1_i) and second (O2_i) output ports for intermediate data; first and second memories (M1, M2) for storing said intermediate data, each of said first and second memories comprising P independently readable and writable memory blocks having respective input and output ports; and a configurable switching network (SN) for connecting the first input and output ports of said decoding units to the output and input ports of said first memory, and the second input and output ports of said decoding units to the output and input ports of said second memory.

22 citations

Proceedings ArticleDOI
12 Mar 2012
TL;DR: This contribution focuses on one of the most important baseband processing units in wireless receivers, the forward error correction unit, and proposes a Network-on-Chip (NoC) based approach to the design of multi-standard decoders.
Abstract: The current convergence process in wireless technologies demands strong efforts in conceiving highly flexible and interoperable equipment. This contribution focuses on one of the most important baseband processing units in wireless receivers, the forward error correction unit, and proposes a Network-on-Chip (NoC) based approach to the design of multi-standard decoders. High-level modeling is exploited to drive the NoC optimization for a given set of both turbo and Low-Density Parity-Check (LDPC) codes to be supported. Moreover, synthesis results prove that the proposed approach can offer a fully compliant WiMAX decoder, supporting the whole set of turbo and LDPC codes with higher throughput and an occupied area comparable to or lower than previously reported flexible implementations. In particular, the mentioned design case achieves a worst-case throughput higher than 70 Mb/s at an area cost of 3.17 mm² in a 90 nm CMOS technology.

20 citations

### Cites methods from "A Flexible LDPC/Turbo Decoder Archi..."

• ...The architecture for WiMAX/WiFi LDPC codes and 3GPP-LTE turbo code presented in [8] runs at 500 MHz and achieves the highest throughput among compared architectures with the same complexity as our architecture....


##### References

Book
01 Jan 1963
TL;DR: A simple but nonoptimum decoding scheme operating directly from the channel a posteriori probabilities is described and the probability of error using this decoder on a binary symmetric channel is shown to decrease at least exponentially with a root of the block length.
Abstract: A low-density parity-check code is a code specified by a parity-check matrix with the following properties: each column contains a small fixed number j ≥ 3 of 1's and each row contains a small fixed number k > j of 1's. The typical minimum distance of these codes increases linearly with block length for a fixed rate and fixed j. When used with maximum likelihood decoding on a sufficiently quiet binary-input symmetric channel, the typical probability of decoding error decreases exponentially with block length for a fixed rate and fixed j. A simple but nonoptimum decoding scheme operating directly from the channel a posteriori probabilities is described. Both the equipment complexity and the data-handling capacity in bits per second of this decoder increase approximately linearly with block length. For j > 3 and a sufficiently low rate, the probability of error using this decoder on a binary symmetric channel is shown to decrease at least exponentially with a root of the block length. Some experimental results show that the actual probability of decoding error is much smaller than this theoretical bound.

10,950 citations

### "A Flexible LDPC/Turbo Decoder Archi..." refers methods in this paper

• ...As a candidate for 4G coding scheme, LDPC codes, which were introduced by Gallager in 1963 [13], have recently received significant attention in coding theory and have been adopted by some advanced wireless systems such as the IEEE 802.16e WiMAX system and the IEEE 802.11n WLAN system....


Proceedings Article
01 Jan 1993

7,737 citations

Journal Article

6,582 citations

Proceedings ArticleDOI
23 May 1993
Abstract: A new class of convolutional codes called turbo-codes, whose performances in terms of bit error rate (BER) are close to the Shannon limit, is discussed. The turbo-code encoder is built using a parallel concatenation of two recursive systematic convolutional codes, and the associated decoder, using a feedback decoding rule, is implemented as P pipelined identical elementary decoders.

5,895 citations

Journal ArticleDOI
J. Raviv
TL;DR: The general problem of estimating the a posteriori probabilities of the states and transitions of a Markov source observed through a discrete memoryless channel is considered and an optimal decoding algorithm is derived.
Abstract: The general problem of estimating the a posteriori probabilities of the states and transitions of a Markov source observed through a discrete memoryless channel is considered. The decoding of linear block and convolutional codes to minimize symbol error probability is shown to be a special case of this problem. An optimal decoding algorithm is derived.
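The optimal algorithm derived in this reference is the forward-backward (BCJR) procedure, which the "Flexible LDPC/Turbo Decoder" paper uses as its MAP bridge between the two code families. A minimal probability-domain sketch for a toy Markov source observed through a memoryless channel (all variable names and the example parameters are illustrative):

```python
def forward_backward(pi, A, B, obs):
    """Posterior state probabilities P(state_t | obs) for a Markov source
    observed through a discrete memoryless channel.

    pi[s]   : prior probability of starting in state s
    A[s][t] : transition probability from state s to state t
    B[s][o] : probability of observing symbol o while in state s
    obs     : list of observed channel symbols
    """
    n, S = len(obs), len(pi)
    # Forward pass: alpha[t][s] proportional to P(obs[0..t], state_t = s).
    alpha = [[0.0] * S for _ in range(n)]
    for s in range(S):
        alpha[0][s] = pi[s] * B[s][obs[0]]
    for t in range(1, n):
        for s in range(S):
            alpha[t][s] = B[s][obs[t]] * sum(
                alpha[t - 1][r] * A[r][s] for r in range(S))
    # Backward pass: beta[t][s] proportional to P(obs[t+1..] | state_t = s).
    beta = [[1.0] * S for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for s in range(S):
            beta[t][s] = sum(
                A[s][r] * B[r][obs[t + 1]] * beta[t + 1][r] for r in range(S))
    # Combine and normalize at each time step.
    post = []
    for t in range(n):
        w = [alpha[t][s] * beta[t][s] for s in range(S)]
        z = sum(w)
        post.append([x / z for x in w])
    return post


# Two-state source, slightly noisy channel, two matching observations:
# the posterior strongly favors state 0 at t = 0.
post = forward_backward(
    pi=[0.5, 0.5],
    A=[[0.9, 0.1], [0.1, 0.9]],
    B=[[0.8, 0.2], [0.2, 0.8]],
    obs=[0, 0])
print(post[0])
```

Practical trellis decoders run this recursion in the log domain (log-MAP or max-log-MAP) to avoid underflow and multiplications.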

4,737 citations