
Journal ArticleDOI

Unequal Error Protection of Memories in LDPC Decoders

01 Oct 2015-IEEE Transactions on Computers (IEEE)-Vol. 64, Iss: 10, pp 2981-2993

TL;DR: The devised UEP method is divided into four adjustable levels, each offering a different degree of protection; it achieves an unmatched level of protection from errors at a small complexity and energy cost.

Abstract: Memories are one of the most critical components of many systems: due to exposure to energetic particles, fabrication defects and aging, they are subject to various kinds of permanent and transient errors. In this scenario, Unequal error protection (UEP) techniques have been proposed in the past to encode stored information, making it possible to detect and possibly recover from errors during load operations, while offering different levels of protection to partitions of codewords according to their importance. Low-density parity-check (LDPC) codes are used in many communication standards to encode the transmitted information: at reception, LDPC decoders heavily rely on memories to store and correct the received information. To ensure efficient and reliable decoding of information, the need to protect the memories used in LDPC decoders is of primary importance. In this paper we present a study on how to efficiently design UEP techniques for LDPC decoder memories. The devised UEP method is divided into four adjustable levels, each one offering a different degree of protection. The full UEP, along with simplified versions, has been implemented within an existing decoder and its area occupation and power consumption evaluated. Comparison with the literature on the subject shows an unmatched level of protection from errors at a small complexity and energy cost.

Summary (5 min read)

Introduction

  • Memories are particularly critical devices that are subject to various types of faults.
  • This paper focuses on providing error resilience to LDPC decoders, ensuring correct functionality even under permanent and transient memory error conditions that current decoders cannot tolerate.
  • As the authors show throughout the paper, the proposed techniques are particularly suitable for LDPC decoders and for applications that use narrow memories with frequent accesses and complex address patterns.
  • Comparison with the state of the art is performed in Section VIII and conclusions are drawn in Section IX.

II. LDPC DECODING

  • LDPC codes are characterized by a binary parity check matrix H [3] with M rows and N columns.
  • In the following the authors focus on the layered scheduling technique, which has been shown to perform better, nearly doubling the convergence speed of the decoding process with respect to two-phase scheduling.
  • Let us denote with λ[c] the LLR of symbol c; the bit LLR λk[c] is initialized, for column k in H, to the corresponding received soft value.
  • Out of the several approximations present in the literature, the authors have considered the Self-Corrected-Min-Sum (SCMS) [14] as it combines easiness of implementation with negligible BER performance losses.
  • Note that while Rlk and Qlk[c] are updated only once per iteration, and are thus endowed with the iteration index i, λk[c] is updated multiple times during each iteration; the superscripts "old" and "new" are consequently used to differentiate the values before and after each update.
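The check-node and self-correction steps described above can be sketched as follows: a minimal floating-point model of the min-sum check-node update together with the SCMS erasure rule, not the paper's fixed-point implementation.

```python
def minsum_check_update(q_msgs):
    """Min-sum check-node update (sketch): each output magnitude is the
    minimum over the *other* incoming magnitudes, and each output sign
    is the product of the other incoming signs."""
    total_sign = 1
    for q in q_msgs:
        total_sign *= -1 if q < 0 else 1
    mags = [abs(q) for q in q_msgs]
    min1 = min(mags)
    idx1 = mags.index(min1)
    rest = mags[:idx1] + mags[idx1 + 1:]
    min2 = min(rest) if rest else min1
    out = []
    for i, q in enumerate(q_msgs):
        sign = total_sign * (-1 if q < 0 else 1)   # product of the other signs
        mag = min2 if i == idx1 else min1          # exclude own magnitude
        out.append(sign * mag)
    return out

def scms_self_correction(q_new, q_old):
    """SCMS rule (sketch): a variable-to-check message whose sign flipped
    since the previous iteration is erased (set to zero) instead of being
    propagated, filtering out unreliable messages."""
    if q_old != 0 and (q_new > 0) != (q_old > 0):
        return 0.0
    return q_new
```

For example, `minsum_check_update([2.0, -3.0, 1.0])` returns `[-1.0, 1.0, -2.0]`: each output takes the minimum of the other two magnitudes, with the sign given by the product of the other two signs.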

III. PREVIOUS WORK

  • Several memory-protection techniques and algorithms have been presented in the specialized literature over the years.
  • The concept of unequal error protection of memories has been proposed for the first time in [7]: codewords are subdivided in slots, to which different degrees of protection are applied.
  • From a practical point of view, it has since been applied effectively to wireless transmission and storage of images, where a certain degree of unreliability can be tolerated [9], [20], [21].
  • The work in [26] builds and updates a list of cells that are probably faulty.
  • The work presented in [6] applies separate protection techniques to the different functional blocks of an existing LDPC decoder architecture, according to their level of exposure to failures and importance for the correct operations of the system.

IV. ERROR ANALYSIS

  • Almost all practical implementations of LDPC decoders make use of memories to store LLRs between iterations or, in the case of multi-core decoders, to exchange information between processing elements.
  • In the following subsections the authors analyze the effect of errors on different bits of the metrics stored in LDPC decoder memories, and how they influence the decoding performance as different parameters vary.
  • Fig. 2 plots the FER for three different error bits and three Imax values.
  • While the effect of errors is almost the same with Imax=10 and Imax=15, the degradation caused by erroneous bits is more pronounced when Imax=20: a larger gap is observed between the "no error" curve and the MSB7 curve with respect to the two other cases, while the MSB3 curve is proportionally shifted.
  • Fig. 4 shows that rate and error injection variations do not scale proportionally, as already observed with changes in code size: high-rate codes are more sensitive to faults.
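The error-injection experiments of this section can be modeled minimally as independent bit flips in the stored LLR words. A sketch, assuming unsigned fixed-point words and a per-bit flip probability P(e); the function name and interface are illustrative:

```python
import random

def inject_bit_errors(llr_words, bit_pos, p_err, seed=0):
    """Transient-error model (sketch): flip bit `bit_pos` of each stored
    LLR word independently with probability p_err, mimicking the P(e)
    used in the paper's FER-degradation experiments."""
    rng = random.Random(seed)
    return [w ^ (1 << bit_pos) if rng.random() < p_err else w
            for w in llr_words]
```

Sweeping `bit_pos` from the LSB up to the sign bit, while measuring FER, reproduces the kind of sensitivity analysis described above.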

V. UNEQUAL ERROR PROTECTION

  • The analysis on the impact of the different memory errors carried out in Section IV highlighted that not all errors on the LLR bits have the same influence on the FER of LDPC decoders.
  • The choice of the number and type of error protection techniques is subject to a suitable tradeoff.
  • On one side, the UEP must be granted sufficient granularity to act effectively upon errors with different impacts on the decoding capability.
  • On the other side, as this number should be kept small to save area, execution time and complexity of the decoder, bits with similar significance should be protected with the same technique.
  • After having analyzed different tradeoff alternatives, the authors have opted for a UEP subdivided into four levels of possible error protection.

A. Level 1 - towards full recovery

  • The highest level of protection is applied to bits whose reliability is mandatory for a correct decoding, i.e. the sign bit and possibly the magnitude MSBs.
  • Errors on sign bits and on bits representing a large part of the total dynamic will consequently have catastrophic effects on the decoding; sign changes and sudden increments or decrements in LLRs may cause an avalanche of metrics to evolve towards misleading directions.
  • To provide a high level of reliability and recovery, the choice has been that bits falling within Level 1 are tripled during write operations; at load time, a majority voter selects the most probable value.
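The tripling-plus-voting scheme can be sketched as below, with bit lists used for clarity; a hardware implementation would of course operate on replicated memory cells rather than Python lists.

```python
def level1_write(bits):
    """Level 1 (sketch): store three identical copies of the protected
    bits (sign and, possibly, magnitude MSBs)."""
    return [list(bits), list(bits), list(bits)]

def level1_read(copies):
    """Majority voter at load time: a single corrupted copy of any
    protected bit is outvoted by the two intact copies."""
    return [1 if copies[0][i] + copies[1][i] + copies[2][i] >= 2 else 0
            for i in range(len(copies[0]))]
```

Flipping one copy of one bit still reads back correctly: after `copies = level1_write([1, 0])` and `copies[1][0] = 0`, `level1_read(copies)` still yields `[1, 0]`.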

B. Level 2 - tentative recovery from critical errors

  • An extensive simulation campaign has been performed to observe the characteristics of λk^old[c] and λk^new[c] that are most sensitive to memory errors and that can lead to unsuccessful decoding.
  • The authors have chosen to add a parity bit to the Level 2 bits, and in case of discrepancy during load operations the pattern recognition system is activated.
  • If λk^new[c] matches the critical bit pattern, recovery is possible by observing how λk^new[c] varies with changes of the Level 2 bits in λk^old[c].
  • The distinctive bit pattern is dependent on the total quantization of the LLRs and on the position of the wrong bit, while the Level 2 bits must be chosen with care to obtain the maximum effectiveness: thus, every case must be analyzed separately.
  • Level 2 cannot give the same level of protection as Level 1, but it identifies and recovers a very high percentage of the errors that have been observed to be the main cause of LDPC decoder performance loss.
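The parity check that triggers the Level 2 machinery can be sketched as below. Since the pattern-recognition and recovery logic depends on the LLR quantization and on which bits are assigned to Level 2, only the detection step is shown; the bit positions are illustrative assumptions, not the paper's exact partition.

```python
def parity(bits):
    """Even parity over a list of bits."""
    p = 0
    for b in bits:
        p ^= b
    return p

def level2_detect(word_bits, stored_parity, level2_positions):
    """Level 2 detection (sketch): recompute parity over the Level 2 bit
    positions at load time; a mismatch flags the word and activates the
    pattern-recognition system described in the paper."""
    return parity([word_bits[i] for i in level2_positions]) != stored_parity
```

A flagged word is then compared against the critical bit pattern; only when the pattern matches does the recovery step attempt to restore the Level 2 bits.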

C. Level 3 - bounding of error impact

  • The authors have chosen to apply the same concept to the third level of protection, designed for bits of medium-to-low significance.
  • A parity bit is added to the protected bits during write operations.
  • When the LLR is loaded, parity is recomputed and in case of discrepancy the contribution of all the bits falling within Level 3 is nulled or reduced.
  • Level 3 protection does not allow to recover from errors, but reduces their impact by decreasing the LLR magnitude, that in turn induces a conservative behavior in the decoder.
  • This method cannot be applied to bits expressing large percentages of the total dynamic, since the change in LLR magnitude would be too large and would itself cause errors.
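The Level 3 bounding behavior can be sketched as follows, with the contribution of the flagged bits nulled on a parity mismatch; the choice between nulling and merely reducing, and the exact bit partition, are design parameters assumed here for illustration.

```python
def level3_load(level3_bits, stored_parity):
    """Level 3 (sketch): recompute parity over the medium/low-significance
    bits at load time; on a mismatch, null their contribution so the LLR
    magnitude shrinks and the decoder behaves conservatively."""
    p = 0
    for b in level3_bits:
        p ^= b
    if p != stored_parity:
        return [0] * len(level3_bits)   # bound the error's impact
    return level3_bits
```

Unlike Levels 1 and 2, nothing is recovered: a detected error simply lowers the magnitude, which the iterative decoder tolerates well.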

D. Level 4 - no protection zone

  • As shown in Section IV, errors on the least significant bits seldom affect the overall decoding performance.
  • For Level 4, the choice for this set of low-importance bits has been to leave them unprotected, since occasional errors on them are unlikely to cause any significant degradation.

E. UEP - full design

  • The partition of memory bits among the four levels of the proposed UEP has been carried out through extensive simulations.
  • As mentioned in Section V-B, the pattern recognition and error recovery involved in Level 2 protection must be evaluated according to the LLR quantization and to the number and position of bits assigned to Level 2.
  • This pattern is observed either in case a single error is introduced in MSB2 or MSB3 of λk^old[c], or in case both MSB2 and MSB3 are incorrect: this characteristic is exploited to recover from the error.
  • This does not mean that these bits do not contribute to the decoding, but, as observed in the authors' tests, occasional error events on these bits do not affect the overall performance.
  • The additional memory cost of the complete UEP is 57.1%, i.e. the same as applying Level 1 to MSB1-2, but in this case shielding five bits from errors instead of two: the impact on the decoder architecture and the additional logic required are discussed in Section VI.
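One reading consistent with the quoted figure, assuming 7-bit LLRs: tripling the sign bit costs two extra copies, and one parity bit each for Levels 2 and 3 adds two more, for 4 extra bits per word; tripling MSB1-2 also costs 4 extra bits, hence the identical 57.1% overhead. This breakdown is an assumption, not stated explicitly in the summary.

```python
llr_bits = 7                     # assumed LLR quantization
level1_extra = 2                 # sign bit tripled: two extra copies
parity_extra = 2                 # one parity bit each for Levels 2 and 3
overhead = (level1_extra + parity_extra) / llr_bits
print(round(100 * overhead, 1))  # same 4/7 raw cost as tripling MSB1-2
```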

F. Remarks on Burst Errors

  • With current integration levels, the problem of burst or multi-cell errors has gathered increasing interest [17], [32].
  • It is possible to greatly limit the impact of burst errors by scrambling the bits of LLRs before storage, and rearranging them at load time.
  • Scrambling is a technique widely used in communications and data storage, where the probability of burst errors is larger than the probability of single errors, as it allows, under certain conditions, avoiding long error-correcting codes to recover from burst errors [32].
  • By interleaving the bits belonging to the same level with those from other levels, multiple errors are spread over the different protection techniques and can still be handled.
  • In fact, the sparse structure of the parity check matrix acts as an interleaver and does not require loading consecutive LLRs.
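The scrambling idea can be sketched with a simple bit permutation applied before storage and inverted at load time; the permutation used in the example is an illustrative choice that separates bits of the same protection level, not the paper's layout.

```python
def scramble(bits, perm):
    """Permute bits before storage so that physically adjacent cells hold
    bits from different protection levels; a burst then damages at most
    one bit per level, which each level can still handle."""
    return [bits[i] for i in perm]

def descramble(stored, perm):
    """Invert the permutation at load time, restoring the original order."""
    out = [0] * len(stored)
    for pos, i in enumerate(perm):
        out[i] = stored[pos]
    return out
```

The round trip is lossless: `descramble(scramble(bits, perm), perm)` always returns the original `bits`, so the protection levels see exactly the data they wrote.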

G. Additional schemes

  • Such a high degree of confidence is not always necessary.
  • This is achieved by including only some of the previous protection levels, and has the advantage of guaranteeing the required lower level of protection at a reduced overhead.
  • Table I summarizes the previously designed UEP case study together with the two new ones, with details being given on the protection of λk[c].
  • UEPfull refers to the case study detailed previously in this section.
  • The second scheme UEPsim1 applies Level 2 protection to MSB1 and MSB2, while MSB3-7 are left in Level 4.

A. Architecture

  • The full UEP architecture is shown in Fig. 7; UEPsim1 and UEPsim2 can be derived from it by considering only the employed levels.
  • Three datapaths are necessary to implement the Level 2 operations: the parity comparison is performed on the Level 2 bits, and if an error occurred, each datapath receives a different version of the LLR (λk[c]1, λk[c]2 and λk[c]3).
  • This version of the A implementation has been named AREF and has been used as a reference in the following comparisons.
  • The difference in power consumption increments (24.9% and 23.3%) is even smaller, since the Level 1 memory bits in Asim2 contribute to a higher percentage of the total power consumption.
  • To prove that the devised UEP does not influence the performance of the decoder in terms of throughput and maximum frequency, Table III reports the delay introduced by each UEP level and by the whole architecture for different target frequencies in 90 nm CMOS technology.

VII. UEP PERFORMANCE

  • This section presents the performance evaluation of the proposed UEP under the same conditions of Section IV and Section V, showing the impact of each level of UEP as described in UEPfull.
  • The decoders in [31] and [30] present similarities with many other decoders in the state of the art (serial core, min-sum-based layered decoding, partial parallelism, shared or dedicated memories, either high-throughput or flexible design).
  • A relevant improvement can be noticed for all the AFPI values except the largest (i.e. AFPI=248.9), while no degradation is observed for the smallest AFPI=0.09 case.
  • The complete UEPfull has been employed to obtain the curves in Fig. 12.
  • Let us now evaluate another characteristic of the proposed protection technique: stuck-at bit errors.
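The permanent-fault experiments can be modeled with a simple stuck-at cell. A sketch, assuming unsigned fixed-point LLR words; the helper name is illustrative:

```python
def stuck_at(word, bit_pos, stuck_value):
    """Permanent-fault model (sketch): the memory cell at bit_pos always
    reads back stuck_value (0 or 1), regardless of what was written."""
    if stuck_value:
        return word | (1 << bit_pos)    # stuck-at-1: force the bit high
    return word & ~(1 << bit_pos)       # stuck-at-0: force the bit low
```

Applying this transform to every load from the affected address reproduces a permanent fault, as opposed to the independent per-access flips of the transient model.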

VIII. COMPARISON

  • The resilient LDPC decoder designed in [6] protects both memories and logic from errors.
  • A dedicated 9-bit RAM is used to store λk[c] values, and it is protected with MSB1 tripling (+22% memory increment), while the initial LLRs received from the channel are stored in a 6-bit RAM and protected with MSB1 duplication and puncturing in case of discrepancy (+23% memory increment).
  • On the other hand, the proposed UEP targets much more degraded environments, since total error protection is achieved in the presence of AFPI four orders of magnitude greater than those in [6]: moreover, this work tackles permanent errors as well, together with burst errors, both neglected in [6].
  • The work in [32] can potentially reach performance similar to this work’s, but at a much higher complexity cost.
  • The statistical error correction scheme devised in [25] is built around a concept different from the proposed UEP: voltage overscaling is introduced in the decoder to save energy, and the performance loss brought by the timing errors caused by this technique must be compensated.

IX. CONCLUSION

  • This paper proposes a novel Unequal Error Protection technique for memories used in LDPC decoders.
  • It is divided into four levels that can be adjusted and applied according to the decoder parameters and the desired degree of protection.
  • A complete design is presented, together with results for two other alternative schemes, showing a high level of resilience to transient and permanent errors, both single-bit and multi-bit.
  • The design of a hardware architecture implementing the UEP is proposed and applied to an existing LDPC decoder to evaluate area and power consumption overheads.
  • Comparison with the state of the art shows superior error resiliency even at comparable complexity overheads.


Citations


Journal ArticleDOI
TL;DR: The Adaptive Coding for approximate Computing (ACOCO) framework is presented, which provides an analysis-guided design methodology to develop adaptive codes for different computations on the data read from faulty memories.
Abstract: With scaling of process technologies and increase in process variations, embedded memories will be inherently unreliable. Approximate computing is a new class of techniques that relax the accuracy requirement of computing systems. In this paper, we present the Adaptive Coding for approximate Computing (ACOCO) framework, which provides us with an analysis-guided design methodology to develop adaptive codes for different computations on the data read from faulty memories. In ACOCO, we first compress the data by introducing distortion in the source encoder, and then add redundant bits to protect the data against memory errors in the channel encoder. We are thus able to protect the data against memory errors without additional memory overhead so that the coded data have the same bit-length as the uncoded data. We design the source encoder by first specifying a cost function measuring the effect of the data compression on the system output, and then design the source code according to this cost function. We develop adaptive codes for two types of systems under ACOCO. The first type of systems we consider, which includes many machine learning and graph-based inference systems, is the systems dominated by product operations. We evaluate the cost function statistics for the proposed adaptive codes, and demonstrate its effectiveness via two application examples: max-product image denoising and naive Bayesian classification. Next, we consider another type of systems: iterative decoders with min operation and sign-bit decision, which are widely applied in wireless communication systems. We develop an adaptive coding scheme for the min-sum decoder subject to memory errors. A density evolution analysis and simulations on finite length codes both demonstrate that the decoder with our adaptive code achieves a residual error rate that is on the order of the square of the residual error rate achieved by the nominal min-sum decoder.

15 citations


Cites background from "Unequal Error Protection of Memorie..."

  • ...Important topics including density evolution, equivalent noise modeling, and unequal error protection were studied in [4], [17], [38]....



Proceedings ArticleDOI
09 Nov 2015
TL;DR: Simulation results of the proposed low-power LDPC decoder technique demonstrate that, by deliberately adjusting the scaled supply voltage to memory bits in different memory locations, the memory power consumption as well as the overall energy consumption of the LDPC decoder can be significantly reduced with negligible performance loss.
Abstract: This paper presents a low-power LDPC decoder design by exploiting inherent memory error statistics due to voltage scaling. By analyzing the error sensitivity to the decoding performance at different memory bits and memory locations in the LDPC decoder, the scaled supply voltage is applied to memory bits with high algorithmic error-tolerance capability to reduce the memory power consumption while mitigating the impact on decoding performance. We also discuss how to improve the tolerance to memory errors by increasing the number of iterations in LDPC decoders, and investigate the energy overheads and the decoding throughput loss due to extra iterations. Simulation results of the proposed low-power LDPC decoder technique demonstrate that, by deliberately adjusting the scaled supply voltage to memory bits in different memory locations, the memory power consumption as well as the overall energy consumption of the LDPC decoder can be significantly reduced with negligible performance loss.

4 citations


Journal ArticleDOI
TL;DR: Two solutions are presented to protect a nano/pico satellite onboard computer memory prototype with 16-bit data words against SEFIs and SEUs, and the approach to provide the error-correction capabilities is based on orthogonal Latin square codes.
Abstract: Plans to launch miniaturized satellite missions have been increasing in the last few years. Space missions like these are exposed to radiation, which is a cause of errors in electronic systems. Memories are one of the electronic systems that need special attention. Radiation effects in memories include single-event upsets (SEUs), multiple cell upsets, and single-event functional interrupts (SEFIs). In this paper, two solutions are presented to protect a nano/pico satellite onboard computer memory prototype with 16-bit data words against SEFIs and SEUs. The prototype uses a 32-bit memory interface, organized in two 16-bit chips. The approach to provide the error-correction capabilities is based on orthogonal Latin square codes. The results show that the area and delay introduced by the solutions are acceptable. When a SEFI affects one of the chips, the solutions are able to recover the information using the remaining data.

3 citations


References

Book
01 Jan 1963
TL;DR: A simple but nonoptimum decoding scheme operating directly from the channel a posteriori probabilities is described and the probability of error using this decoder on a binary symmetric channel is shown to decrease at least exponentially with a root of the block length.
Abstract: A low-density parity-check code is a code specified by a parity-check matrix with the following properties: each column contains a small fixed number j ≥ 3 of 1's and each row contains a small fixed number k > j of 1's. The typical minimum distance of these codes increases linearly with block length for a fixed rate and fixed j. When used with maximum likelihood decoding on a sufficiently quiet binary-input symmetric channel, the typical probability of decoding error decreases exponentially with block length for a fixed rate and fixed j. A simple but nonoptimum decoding scheme operating directly from the channel a posteriori probabilities is described. Both the equipment complexity and the data-handling capacity in bits per second of this decoder increase approximately linearly with block length. For j > 3 and a sufficiently low rate, the probability of error using this decoder on a binary symmetric channel is shown to decrease at least exponentially with a root of the block length. Some experimental results show that the actual probability of decoding error is much smaller than this theoretical bound.

10,950 citations


"Unequal Error Protection of Memorie..." refers background in this paper

  • ...Low Density Parity Check (LDPC) codes [3] are block error correcting codes employed in a variety of communication...


  • ...LDPC codes are characterized by a binary parity check matrix H [3] with M rows and N columns....




Journal ArticleDOI
TL;DR: A general method of constructing error correcting binary group codes is obtained and an example is worked out to illustrate the method of construction.
Abstract: A general method of constructing error correcting binary group codes is obtained. A binary group code with n places, k of which are information places is called an (n,k) code. An explicit method of constructing t-error correcting (n,k) codes is given for n = 2m−1 and k = 2m−1−R(m,t) ≧ 2m−1−mt where R(m,t) is a function of m and t which cannot exceed mt. An example is worked out to illustrate the method of construction.

1,131 citations


"Unequal Error Protection of Memorie..." refers methods in this paper

  • ...Many types of codes have been experimented with, from simple Hamming and BCH [16] codes to more complex turbo codes and LDPC codes themselves [17], [18], [19]....



Journal ArticleDOI
TL;DR: Two simplified versions of the belief propagation algorithm for fast iterative decoding of low-density parity check codes on the additive white Gaussian noise channel are proposed, which greatly simplifies the decoding complexity of belief propagation.
Abstract: Two simplified versions of the belief propagation algorithm for fast iterative decoding of low-density parity check codes on the additive white Gaussian noise channel are proposed. Both versions are implemented with real additions only, which greatly simplifies the decoding complexity of belief propagation in which products of probabilities have to be computed. Also, these two algorithms do not require any knowledge about the channel characteristics. Both algorithms yield a good performance-complexity trade-off and can be efficiently implemented in software as well as in hardware, with possibly quantized received values.

953 citations


"Unequal Error Protection of Memorie..." refers methods in this paper

  • ...5 shows the FER degradation due to P(e) = 0.0005 for a code decoded with both NMS and SCMS, and the inherent resilience of SCMS can be easily noted....


  • ...The decoding performance of the SCMS approximation is intrinsically more resistant to hardware errors than more common approximations of the BP algorithm like the Normalized-Min-Sum (NMS) [29]....



Proceedings ArticleDOI
D.E. Hocevar1
06 Dec 2004
TL;DR: The previously devised irregular partitioned permutation LDPC codes have a construction that easily accommodates a layered decoding and it is shown that the decoding performance is improved by a factor of two in the number of iterations required.
Abstract: We apply layered belief propagation decoding to our previously devised irregular partitioned permutation LDPC codes. These codes have a construction that easily accommodates a layered decoding, and we show that the decoding performance is improved by a factor of two in the number of iterations required. We show how our previous flexible decoding architecture can be adapted to facilitate layered decoding. This results in a significant reduction in the number of memory bits and memory instances required, in the range of 45-50%. The faster decoding speed means the decoder logic can also be reduced by nearly 50% to achieve the same throughput and error performance. In total, the overall decoder architecture can be reduced by nearly 50%.

586 citations


"Unequal Error Protection of Memorie..." refers background in this paper

  • ...In layered decoders the H matrix is subdivided into sets of consecutive, non-communicating parity-check constraints, called layers: these can be decoded in sequence, with the extrinsic information being propagated from one layer to the following ones [12]....



Frequently Asked Questions (1)
Q1. What have the authors contributed in "Unequal error protection of memories in ldpc decoders" ?

Memories are one of the most critical components of many systems: due to exposure to energetic particles, fabrication defects and aging they are subject to various kinds of permanent and transient errors. In this paper the authors present a study on how to efficiently design UEP techniques for LDPC decoder memories.