TL;DR: A unified message passing algorithm for LDPC and Turbo codes is proposed, together with a flexible soft-input soft-output (SISO) module that handles both LDPC and Turbo decoding and an area-efficient flexible SISO decoder architecture built on it to support LDPC/Turbo decoding.
Abstract: Low-density parity-check (LDPC) codes and convolutional Turbo codes are two of the most powerful error-correcting codes that are widely used in modern communication systems. In a multi-mode baseband receiver, both LDPC and Turbo decoders may be required. However, the different decoding approaches for LDPC and Turbo codes usually lead to different hardware architectures. In this paper we propose a unified message passing algorithm for LDPC and Turbo codes and introduce a flexible soft-input soft-output (SISO) module to handle LDPC/Turbo decoding. We employ the trellis-based maximum a posteriori (MAP) algorithm as a bridge between LDPC and Turbo decoding. We view the LDPC code as a concatenation of n super-codes where each super-code has a simpler trellis structure so that the MAP algorithm can be easily applied to it. We propose a flexible functional unit (FFU) for MAP processing of LDPC and Turbo codes with a low hardware overhead (about 15% area and timing overhead). Based on the FFU, we propose an area-efficient flexible SISO decoder architecture to support LDPC/Turbo decoding. Multiple such SISO modules can be embedded into a parallel decoder for higher decoding throughput. As a case study, a flexible LDPC/Turbo decoder has been synthesized in a TSMC 90 nm CMOS technology with a core area of 3.2 mm². The decoder can support IEEE 802.16e LDPC codes, IEEE 802.11n LDPC codes, and 3GPP LTE Turbo codes. Running at a 500 MHz clock frequency, the decoder can sustain up to 600 Mbps LDPC decoding or 450 Mbps Turbo decoding.
Practical wireless communication channels are inherently “noisy” due to impairments caused by channel distortion and multipath effects.
Both LDPC and Turbo codes can be represented as codes on graphs, which define the constraints that codewords must satisfy.
Third, the authors propose a flexible SISO decoder hardware architecture based on the FFU.
Section 2 reviews the super-code based decoding algorithm for LDPC codes.
2 Review of Super-code Based Decoding Algorithm for LDPC Codes
Naturally, the Turbo decoding procedure can be partitioned into two phases, each of which corresponds to processing one super-code.
Similarly, LDPC codes can also be partitioned into super-codes for efficient processing as previously mentioned in Section 1.
Before proceeding with a discussion of the proposed flexible decoder architecture, it is desirable to review the super-code based LDPC decoding scheme in this section.
3 Flexible SISO Module
The authors propose a flexible soft-input soft-output (SISO) module, named the Flex-SISO module, to decode LDPC and Turbo codes.
To reduce complexity, the MAP algorithm is usually calculated in the log domain [31].
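In the log domain, sums of probabilities in the MAP recursions become the Jacobian logarithm, usually written max*. The Python sketch below shows the exact operator, the max-log approximation that drops the correction term, and a table-based variant of the kind hardware uses. The function names and the 8-entry, step-0.5 correction table are hypothetical illustration choices, not parameters taken from the paper.

```python
import math

def max_star(a: float, b: float) -> float:
    """Exact Jacobian logarithm: ln(e^a + e^b) = max(a, b) + ln(1 + e^-|a-b|)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def max_log(a: float, b: float) -> float:
    """Max-log approximation: keep only the max, drop the correction term."""
    return max(a, b)

# Hypothetical look-up table for the correction term ln(1 + e^-d),
# sampled at d = 0, 0.5, ..., 3.5; beyond the table the correction is taken as 0.
LUT_STEP = 0.5
LUT = [math.log1p(math.exp(-i * LUT_STEP)) for i in range(8)]

def max_star_lut(a: float, b: float) -> float:
    """Hardware-style max*: max plus a table-quantized correction term."""
    idx = int(abs(a - b) / LUT_STEP)
    corr = LUT[idx] if idx < len(LUT) else 0.0
    return max(a, b) + corr
```

For example, `max_star(1.0, 2.0)` returns about 2.313, that is ln(e + e²), whereas `max_log(1.0, 2.0)` returns 2.0; the table version trades a small approximation error for a cheap lookup.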
For LDPC codes, a Flex-SISO module is used to decode one super-code.
The soft input values λi(u) are the outputs from the previous Flex-SISO module, or other previous modules if necessary.
First, the authors decompose a QC-LDPC code into multiple super-codes, where each layer of the parity check matrix defines one super-code.
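The layer-as-super-code idea can be sketched as layered decoding: each block row of the expanded QC matrix is processed as one super-code, and its updated LLRs are used immediately by the next layer. This Python sketch is illustrative only. It uses the min-sum check update (which is what max-log-MAP reduces to on a single parity check code) rather than the paper's trellis-based MAP datapath, and `expand_qc`, `layered_min_sum`, `hard_decision`, and the toy base matrix are hypothetical.

```python
from typing import List

def expand_qc(base: List[List[int]], z: int) -> List[List[int]]:
    """Expand a base matrix of circulant shifts (-1 = all-zero block) into H."""
    H = []
    for brow in base:
        for r in range(z):
            row = []
            for s in brow:
                block = [0] * z
                if s >= 0:
                    block[(r + s) % z] = 1  # identity cyclically shifted by s
                row.extend(block)
            H.append(row)
    return H

def layered_min_sum(H: List[List[int]], llr: List[float], z: int, iters: int) -> List[float]:
    """Layered decoding: each group of z rows (one layer) is one super-code.

    Assumes every row of H has degree >= 2.
    """
    rows = [[j for j, h in enumerate(r) if h] for r in H]
    R = [[0.0] * len(cols) for cols in rows]  # old check-to-variable (extrinsic) messages
    L = list(llr)                             # running APP LLRs
    for _ in range(iters):
        for first in range(0, len(H), z):     # one layer = one super-code
            for i in range(first, first + z):
                cols = rows[i]
                t = [L[j] - R[i][e] for e, j in enumerate(cols)]  # remove old extrinsic
                for e, j in enumerate(cols):
                    others = t[:e] + t[e + 1:]
                    sign = -1.0 if sum(v < 0 for v in others) % 2 else 1.0
                    R[i][e] = sign * min(abs(v) for v in others)  # min-sum update
                    L[j] = t[e] + R[i][e]                         # add new extrinsic
    return L

def hard_decision(L: List[float]) -> List[int]:
    """LLR convention ln P(0)/P(1): non-negative LLR -> bit 0."""
    return [0 if x >= 0 else 1 for x in L]
```

For instance, with `H = expand_qc([[0, 0, 0], [0, 1, 2]], 3)` and the all-zero codeword received with one weak wrong-sign LLR (`-0.5` at position 0, `2.0` elsewhere), one iteration of `layered_min_sum` already returns all-positive LLRs. Because the circulant blocks in a layer never share a column, the rows inside a layer could equally be processed in parallel.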
3.3.1 Review of the Traditional Turbo Decoder Structure
The traditional Turbo decoding procedure with two SISO decoders is shown in Fig. 7.
The definitions of the symbols in the figure are as follows.
The channel LLR values for $u_k$ and $p^{(i)}_k$ are denoted as $\lambda_c(u_k)$ and $\lambda_c(p^{(i)}_k)$, respectively.
In the first half-iteration, SISO decoder 1 computes the extrinsic value $\lambda^1_e(u_k)$ and passes it to SISO decoder 2.
The computation is repeated in each iteration.
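What each SISO decoder computes can be sketched as a max-log-MAP (BCJR) pass over the constituent code's trellis: a forward α recursion, a backward β recursion, and an extrinsic output from which the channel and a priori systematic terms are removed. This Python sketch assumes LLRs defined as ln P(0)/P(1) and uses a hypothetical 2-state accumulator code (next state and parity both equal u XOR s) in place of the 8-state 3GPP-LTE constituent code; `step`, `encode`, and `max_log_map` are illustrative names.

```python
import math
from typing import List, Tuple

NEG_INF = -math.inf
NUM_STATES = 2

def step(s: int, u: int) -> Tuple[int, int]:
    """Hypothetical 2-state RSC (accumulator): returns (next_state, parity_bit)."""
    a = u ^ s
    return a, a

def encode(bits: List[int]) -> List[int]:
    s, parity = 0, []
    for u in bits:
        s, p = step(s, u)
        parity.append(p)
    return parity

def bipolar(b: int) -> int:
    return 1 - 2 * b  # bit 0 -> +1, bit 1 -> -1

def max_log_map(lc_sys: List[float], lc_par: List[float], la: List[float]) -> List[float]:
    """Max-log-MAP SISO: returns extrinsic LLRs of the systematic bits."""
    K = len(lc_sys)

    def gamma(k: int, s: int, u: int) -> float:
        _, p = step(s, u)
        return 0.5 * (bipolar(u) * (lc_sys[k] + la[k]) + bipolar(p) * lc_par[k])

    alpha = [[NEG_INF] * NUM_STATES for _ in range(K + 1)]
    alpha[0][0] = 0.0  # encoder starts in state 0
    for k in range(K):  # forward recursion (alpha unit)
        for s in range(NUM_STATES):
            for u in (0, 1):
                ns, _ = step(s, u)
                alpha[k + 1][ns] = max(alpha[k + 1][ns], alpha[k][s] + gamma(k, s, u))

    beta = [[NEG_INF] * NUM_STATES for _ in range(K + 1)]
    beta[K] = [0.0] * NUM_STATES  # unterminated trellis: any final state allowed
    for k in range(K - 1, -1, -1):  # backward recursion (beta unit)
        for s in range(NUM_STATES):
            for u in (0, 1):
                ns, _ = step(s, u)
                beta[k][s] = max(beta[k][s], gamma(k, s, u) + beta[k + 1][ns])

    ext = []
    for k in range(K):  # extrinsic unit
        best = [NEG_INF, NEG_INF]
        for s in range(NUM_STATES):
            for u in (0, 1):
                ns, _ = step(s, u)
                best[u] = max(best[u], alpha[k][s] + gamma(k, s, u) + beta[k + 1][ns])
        ext.append(best[0] - best[1] - lc_sys[k] - la[k])  # strip systematic + a priori
    return ext
```

In a full Turbo decoder, the extrinsic list from one such call would be interleaved and passed as `la` to the other constituent decoder; that exchange is the per-iteration loop the text describes.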
3.3.2 Modified Turbo Decoder Structure Using Flex-SISO Modules
In order to use the proposed Flex-SISO module for Turbo decoding, the authors modify the traditional Turbo decoder structure.
Flex-SISO module 1 then removes the old extrinsic value $\lambda^1_e(u_k;\mathrm{old})$ from the soft input LLR $\lambda^1_i(u_k)$ to form a temporary message $\lambda^1_t(u_k)$ as follows (for brevity, the authors drop the superscript "1" in the following equations):

$$\lambda_t(u_k) = \lambda_i(u_k) - \lambda_e(u_k;\mathrm{old}).$$
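The message flow around one Flex-SISO module can be sketched independently of the code type: subtract the old extrinsic value, run a MAP kernel on the temporary message, and keep the new extrinsic value for the next pass. In this Python sketch the soft output is assumed to be λt + λe(new), as in layered LDPC decoding; `flex_siso_update` and the min-sum single-parity-check kernel `spc_min_sum` are hypothetical stand-ins, and a Turbo-mode kernel would be a trellis MAP that additionally takes the channel parity LLRs (for example bound in with `functools.partial`).

```python
from typing import Callable, List, Tuple

# A MAP kernel maps temporary LLRs to new extrinsic LLRs for one super-code.
MapKernel = Callable[[List[float]], List[float]]

def spc_min_sum(t: List[float]) -> List[float]:
    """Hypothetical LDPC kernel: max-log (min-sum) extrinsic of one parity check."""
    out = []
    for e in range(len(t)):
        others = t[:e] + t[e + 1:]
        sign = -1.0 if sum(v < 0 for v in others) % 2 else 1.0
        out.append(sign * min(abs(v) for v in others))
    return out

def flex_siso_update(soft_in: List[float], ext_old: List[float],
                     kernel: MapKernel) -> Tuple[List[float], List[float]]:
    """One Flex-SISO pass; returns (soft_out, ext_new)."""
    temp = [li - le for li, le in zip(soft_in, ext_old)]  # lambda_t = lambda_i - lambda_e(old)
    ext_new = kernel(temp)                                 # MAP on the super-code
    soft_out = [t + e for t, e in zip(temp, ext_new)]      # assumed lambda_o = lambda_t + lambda_e(new)
    return soft_out, ext_new
```

Feeding a module's own output and extrinsic values back into it reproduces the same result, which is a quick sanity check that the old extrinsic value is removed before the kernel runs.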
The computation is repeated in each half-iteration until the decoding converges.
Figure 9 shows an iterative Turbo decoder architecture based on the Flex-SISO module.
The memory organizations are similar, but their sizes vary with the codeword length.
4 Design of a Flexible Functional Unit
The MAP processor is the main processing unit in both LDPC and Turbo decoders, as depicted in Fig. 6 and Fig. 9. Figure 13 shows a MAP processor structure to decode the single parity check code.
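A single parity check code has a two-state trellis whose state is the running parity, so the same α, β, and extrinsic recursions used for Turbo codes also give the exact LDPC check-node update. The Python sketch below (hypothetical names; exact max* via log-sum-exp rather than a table-based approximation) runs forward-backward MAP on that trellis and compares it with the closed-form tanh rule.

```python
import math
from typing import List

NEG_INF = -math.inf

def log_add(a: float, b: float) -> float:
    """Exact max*: ln(e^a + e^b), safe when either argument is -inf."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def spc_map_extrinsic(llr: List[float]) -> List[float]:
    """Forward-backward MAP on the 2-state trellis of a single parity check code.

    State = running parity; LLR convention ln P(0)/P(1); bit u at position k
    contributes the branch metric -u * llr[k].
    """
    n = len(llr)
    alpha = [[NEG_INF, NEG_INF] for _ in range(n + 1)]
    alpha[0][0] = 0.0  # running parity starts at 0
    for k in range(n):
        for s in (0, 1):
            for u in (0, 1):
                ns = s ^ u
                alpha[k + 1][ns] = log_add(alpha[k + 1][ns], alpha[k][s] - u * llr[k])
    beta = [[NEG_INF, NEG_INF] for _ in range(n + 1)]
    beta[n][0] = 0.0  # valid codewords end with even parity
    for k in range(n - 1, -1, -1):
        for s in (0, 1):
            for u in (0, 1):
                beta[k][s] = log_add(beta[k][s], beta[k + 1][s ^ u] - u * llr[k])
    ext = []
    for k in range(n):  # extrinsic: bit k's own branch metric is omitted
        m0 = log_add(alpha[k][0] + beta[k + 1][0], alpha[k][1] + beta[k + 1][1])
        m1 = log_add(alpha[k][0] + beta[k + 1][1], alpha[k][1] + beta[k + 1][0])
        ext.append(m0 - m1)
    return ext

def spc_tanh_rule(llr: List[float]) -> List[float]:
    """Closed-form reference: 2 * atanh(prod_{j != k} tanh(llr[j] / 2))."""
    out = []
    for k in range(len(llr)):
        p = 1.0
        for j, l in enumerate(llr):
            if j != k:
                p *= math.tanh(l / 2)
        out.append(2 * math.atanh(p))
    return out
```

Replacing `log_add` with a plain `max` turns the trellis result into the min-sum update, which is one way to see why a single max*-style arithmetic unit can serve both code families.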
Thus, the same look-up table configuration can be applied to the Turbo ACSA unit.
To support both LDPC and Turbo codes with minimum hardware overhead, the authors propose a flexible functional unit (FFU) which is depicted in Fig. 15.
5 Design of A Flexible SISO Decoder
Built on top of the FFU arithmetic unit, the authors introduce a flexible SISO decoder architecture to handle LDPC and Turbo codes.
The boundary β metrics are initialized from an NII buffer (not shown in Fig. 19).
The decoder first computes λt(u) based on Eq. 5. Prior to decoding, the α and β metrics are initialized to the maximum value.
While the β unit and the extrinsic-1 unit are working on the first data stream, the α unit can work on the second stream which leads to a pipelined implementation.
In a parallel processing environment, multiple SISO decoders can be used to increase the throughput.
6 Parallel Decoder Architecture Using Multiple Flex-SISO Decoder Cores
For high throughput applications, it is necessary to use multiple SISO decoders working in parallel to increase the decoding speed.
For parallel Turbo decoding, multiple SISO decoders can be employed by dividing a codeword block into several sub-blocks and then each sub-block is processed separately by a dedicated SISO decoder [7, 20, 30, 41, 42].
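The sub-block division itself is index bookkeeping. This Python sketch assumes, for simplicity, that P divides the block length K. The optional `guard` window (extra trellis steps used to warm up α and β at sub-block boundaries) is a common technique included here as an assumption, not necessarily the scheme used in the paper; the source instead mentions initializing boundary β metrics from an NII buffer, which corresponds to `guard=0` here. `SubBlock` and `partition_subblocks` are hypothetical names.

```python
from typing import List, NamedTuple

class SubBlock(NamedTuple):
    start: int       # first trellis step owned by this SISO decoder
    end: int         # one past the last owned step
    warm_start: int  # alpha recursion starts here (guard steps before `start`)
    warm_end: int    # beta recursion runs backward from here (guard steps after `end`)

def partition_subblocks(K: int, P: int, guard: int = 0) -> List[SubBlock]:
    """Split a length-K block into P equal sub-blocks, one per SISO decoder."""
    if K % P != 0:
        raise ValueError("this sketch assumes P divides K")
    size = K // P
    blocks = []
    for i in range(P):
        start, end = i * size, (i + 1) * size
        blocks.append(SubBlock(start, end, max(0, start - guard), min(K, end + guard)))
    return blocks
```

For example, `partition_subblocks(6144, 8, guard=32)` gives eight 768-step sub-blocks; the first one warms up over steps 0 to 799 and the last over steps 5344 to 6143.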
APP memory is used to store the initial and updated LLR values.
Turbo parity memory is used to store the channel LLR values for each parity bit in a Turbo codeword.
As a case study, the authors have designed a high-throughput, flexible LDPC/Turbo decoder to support the following three codes: 1) 802.16e WiMAX LDPC code, 2) 802.11n WLAN LDPC code, and 3) 3GPP-LTE Turbo code.
7 Related Work and Architecture Comparison
Multi-mode Turbo decoders are an increasingly important component in mobile wireless devices.
In [34], a multiprocessor system-on-chip architecture is described for LDPC and Turbo code decoding.
Table 7 shows the architecture comparison and tradeoff analysis of these decoders.
Each approach offers different benefits in terms of flexibility.
The authors' focus is on achieving the highest throughput for both LDPC and Turbo codes.
8 Conclusion
The authors present a flexible decoder architecture to support LDPC and Turbo codes.
To increase decoding throughput, the authors propose a parallel LDPC/Turbo decoder using multiple Flex-SISO cores.
The proposed architecture can significantly reduce the cost of a multi-mode receiver.
TL;DR: This work concentrates on the design of a reconfigurable architecture for both turbo and LDPC codes decoding, tackling the reconfiguration issue and introducing a formal and systematic treatment that was not previously addressed.
Abstract: Flexible and reconfigurable architectures have gained wide popularity in the communications field. In particular, reconfigurable architectures for the physical layer are an attractive solution not only to switch among different coding modes but also to achieve interoperability. This work concentrates on the design of a reconfigurable architecture for both turbo and LDPC codes decoding. The novel contributions of this paper are: i) tackling the reconfiguration issue by introducing a formal and systematic treatment that, to the best of our knowledge, was not previously addressed and ii) proposing a reconfigurable NoC-based turbo/LDPC decoder architecture and showing that wide flexibility can be achieved with a small complexity overhead. The obtained results show that dynamic switching between most of the considered communication standards is possible without pausing the decoding activity. Moreover, post-layout results show that tailoring the proposed architecture to the WiMAX standard leads to an area occupation of 2.75 mm² and a worst-case power consumption of 101.5 mW.
57 citations
Cites background or methods from "A Flexible LDPC/Turbo Decoder Archi..."
...Flexible decoders available in the literature [9]–[13], [16], [17], [19], [20], though supporting a wide range of codes, do not address the reconfiguration issue....
[...]
...Sun and Cavallaro describe in [13] a decoder working with 3GPP-LTE turbo codes and WiMAX and WiFi LDPC codes....
TL;DR: This work proposes a dynamic multi-frame processing schedule that makes efficient use of layered LDPC decoding with a minimum number of pipeline stages, together with efficient comparison techniques for both column and row layered schedules and rejection-based high-speed circuits that compute the two minimum values from multiple inputs, as required for row-layered processing with the hardware-friendly min-sum decoding algorithm.
Abstract: This paper presents the architecture of a block-level-parallel layered decoder for irregular LDPC codes. It can be reconfigured to support the various block lengths and code rates of the IEEE 802.11n (WiFi) wireless-communication standard. We have proposed efficient comparison techniques for both column and row layered schedules, and rejection-based high-speed circuits to compute the two minimum values from multiple inputs, as required for row-layered processing of the hardware-friendly min-sum decoding algorithm. The results show good speed with lower area compared to state-of-the-art circuits. Additionally, this work proposes a dynamic multi-frame processing schedule that efficiently utilizes layered LDPC decoding with minimum pipeline stages. The suggested LDPC decoder architecture has been synthesized and post-layout simulated in a 90 nm CMOS process. This decoder occupies an area of 5.19 mm² and supports multiple code rates (1/2, 2/3, 3/4 and 5/6) as well as block lengths of 648, 1296 and 1944. At a clock frequency of 336 MHz, the proposed LDPC decoder achieves a throughput of 5.13 Gbps and an energy efficiency of 0.01 nJ/bit/iteration, better than similar state-of-the-art works.
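The two-minimum computation mentioned in this abstract is what lets a min-sum check node give every edge one of just two magnitudes: the second minimum on the edge that holds the overall minimum, and the first minimum everywhere else. A plain sequential Python sketch (not the rejection-based circuit the abstract describes; `two_min` and `min_sum_magnitudes` are hypothetical names):

```python
import math
from typing import List, Tuple

def two_min(values: List[float]) -> Tuple[float, float, int]:
    """One pass over |values|: returns (min1, min2, index of min1)."""
    min1, min2, idx1 = math.inf, math.inf, -1
    for i, v in enumerate(values):
        a = abs(v)
        if a < min1:
            min1, min2, idx1 = a, min1, i  # old min1 becomes min2
        elif a < min2:
            min2 = a
    return min1, min2, idx1

def min_sum_magnitudes(values: List[float]) -> List[float]:
    """Each edge gets the min over the other edges: min2 on the argmin edge, min1 elsewhere."""
    min1, min2, idx1 = two_min(values)
    return [min2 if i == idx1 else min1 for i in range(len(values))]
```

Storing only (min1, min2, idx1) plus the signs, instead of one magnitude per edge, is the usual reason this step dominates row-layered min-sum hardware.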
42 citations
Cites background from "A Flexible LDPC/Turbo Decoder Archi..."
...While Sun and Cavallaro [15] have designed a single architecture to process both LDPC and turbo codes by proposing a unified algorithm....
[...]
...Multi-mode reconfigurable architectures in [14] and [15] have the flexibility to switch between LDPC and turbo decoding-process....
TL;DR: A multi-core architecture which supports convolutional codes, binary/duo-binary turbo codes, and LDPC codes, based on Application Specific Instruction-set Processors (ASIP) and avoids the use of dedicated interleave/deinterleave address lookup memories is presented.
Abstract: In order to address the large variety of channel coding options specified in existing and future digital communication standards, there is an increasing need for flexible solutions. This paper presents a multi-core architecture which supports convolutional codes, binary/duo-binary turbo codes, and LDPC codes. The proposed architecture is based on Application Specific Instruction-set Processors (ASIP) and avoids the use of dedicated interleave/deinterleave address lookup memories. Each ASIP consists of two datapaths, one optimized for turbo mode and the other for LDPC mode, which efficiently share memories and communication resources. The logic synthesis results yield an overall area of 2.6 mm² in 90 nm technology. Payload throughputs of up to 312 Mbps in LDPC mode and 173 Mbps in Turbo mode are possible at 520 MHz, faring better than existing solutions.
36 citations
Cites methods from "A Flexible LDPC/Turbo Decoder Archi..."
...A high throughput of 257Mbps is achieved for LDPC mode while a limited throughput of 37.2Mbps in DBTC and 18.6Mbps in SBTC modes are achieved at 400MHz....
TL;DR: A configurable Turbo-LDPC decoder is presented in this article, in which a set of $P > 1$ soft-input soft-output decoding units ($\mathrm{DP}_0$ to $\mathrm{DP}_{P-1}$; $\mathrm{DP}_i$) is used for iteratively decoding both Turbo- and LDPC-encoded input data.
Abstract: A configurable Turbo-LDPC decoder comprising:
A set of $P > 1$ soft-input soft-output decoding units ($\mathrm{DP}_0$ to $\mathrm{DP}_{P-1}$; $\mathrm{DP}_i$) for iteratively decoding both Turbo- and LDPC-encoded input data, each of said decoding units having first ($I1_i$) and second ($I2_i$) input ports and first ($O1_i$) and second ($O2_i$) output ports for intermediate data;
First and second memories ($M_1$, $M_2$) for storing said intermediate data, each of said first and second memories comprising $P$ independently readable and writable memory blocks having respective input and output ports; and
A configurable switching network (SN) for connecting the first input and output ports of said decoding units to the output and input ports of said first memory, and the second input and output ports of said decoding units to the output and input ports of said second memory.
TL;DR: This contribution focuses on one of the most important baseband processing units in wireless receivers, the forward error correction unit, and proposes a Network-on-Chip (NoC) based approach to the design of multi-standard decoders.
Abstract: The current convergence process in wireless technologies demands strong efforts in conceiving highly flexible and interoperable equipment. This contribution focuses on one of the most important baseband processing units in wireless receivers, the forward error correction unit, and proposes a Network-on-Chip (NoC) based approach to the design of multi-standard decoders. High-level modeling is exploited to drive the NoC optimization for a given set of turbo and Low-Density-Parity-Check (LDPC) codes to be supported. Moreover, synthesis results prove that the proposed approach can offer a fully compliant WiMAX decoder, supporting the whole set of turbo and LDPC codes with higher throughput and an occupied area comparable to or lower than previously reported flexible implementations. In particular, the mentioned design case achieves a worst-case throughput higher than 70 Mb/s at an area cost of 3.17 mm² in a 90 nm CMOS technology.
20 citations
Cites methods from "A Flexible LDPC/Turbo Decoder Archi..."
...The architecture for WiMAX/WiFi LDPC codes and 3GPP-LTE turbo code presented in [8] runs at 500 MHz and achieves the highest throughput among compared architectures with the same complexity as our architecture....
TL;DR: A simple but nonoptimum decoding scheme operating directly from the channel a posteriori probabilities is described and the probability of error using this decoder on a binary symmetric channel is shown to decrease at least exponentially with a root of the block length.
Abstract: A low-density parity-check code is a code specified by a parity-check matrix with the following properties: each column contains a small fixed number $j \geq 3$ of 1's and each row contains a small fixed number $k > j$ of 1's. The typical minimum distance of these codes increases linearly with block length for a fixed rate and fixed $j$. When used with maximum likelihood decoding on a sufficiently quiet binary-input symmetric channel, the typical probability of decoding error decreases exponentially with block length for a fixed rate and fixed $j$. A simple but nonoptimum decoding scheme operating directly from the channel a posteriori probabilities is described. Both the equipment complexity and the data-handling capacity in bits per second of this decoder increase approximately linearly with block length. For $j > 3$ and a sufficiently low rate, the probability of error using this decoder on a binary symmetric channel is shown to decrease at least exponentially with a root of the block length. Some experimental results show that the actual probability of decoding error is much smaller than this theoretical bound.
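Counting the 1's of an $m \times n$ parity-check matrix once by columns and once by rows gives the design rate of such a regular $(j, k)$ code; the bound holds with equality when the $m$ rows are linearly independent. For example, $j = 3$ and $k = 6$ give a rate of at least 1/2.

$$
nj = mk \;\Rightarrow\; m = \frac{nj}{k}, \qquad R \ge \frac{n - m}{n} = 1 - \frac{j}{k}.
$$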
11,592 citations
"A Flexible LDPC/Turbo Decoder Archi..." refers methods in this paper
...As a candidate for 4G coding scheme, LDPC codes, which were introduced by Gallager in 1963 [ 13 ], have recently received significant attention in coding theory and have been adopted by some advanced wireless systems such as IEEE 802.16e WiMAX system and IEEE 802.11n WLAN system....
TL;DR: In this article, a new class of convolutional codes called turbo-codes, whose performances in terms of bit error rate (BER) are close to the Shannon limit, is discussed.
Abstract: A new class of convolutional codes called turbo-codes, whose performances in terms of bit error rate (BER) are close to the Shannon limit, is discussed. The turbo-code encoder is built using a parallel concatenation of two recursive systematic convolutional codes, and the associated decoder, using a feedback decoding rule, is implemented as P pipelined identical elementary decoders.
TL;DR: The general problem of estimating the a posteriori probabilities of the states and transitions of a Markov source observed through a discrete memoryless channel is considered and an optimal decoding algorithm is derived.
Abstract: The general problem of estimating the a posteriori probabilities of the states and transitions of a Markov source observed through a discrete memoryless channel is considered. The decoding of linear block and convolutional codes to minimize symbol error probability is shown to be a special case of this problem. An optimal decoding algorithm is derived.