
Showing papers on "Error detection and correction published in 2009"


Journal ArticleDOI
TL;DR: A comprehensive and self-contained simplified review of the quantum computing scheme of Raussendorf et al., featuring a two-dimensional nearest-neighbor coupled lattice of qubits, a threshold error rate approaching 1%, and low-overhead long-range logical gates.
Abstract: We present a comprehensive and self-contained simplified review of the quantum computing scheme of Raussendorf et al. [Phys. Rev. Lett. 98, 190504 (2007); N. J. Phys. 9, 199 (2007)], which features a two-dimensional nearest-neighbor coupled lattice of qubits, a threshold error rate approaching 1%, natural asymmetric and adjustable strength error correction, and low overhead arbitrarily long-range logical gates. These features make it one of the best and most practical quantum computing schemes devised to date. We restrict the discussion to direct manipulation of the surface code using the stabilizer formalism, both of which we also briefly review, to make the scheme accessible to a broad audience.
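For readers meeting the stabilizer formalism here for the first time, the surface-code checks the review manipulates can be written compactly; this is conventional textbook notation, not quoted from the paper:

```latex
% Surface-code stabilizers in conventional notation: data qubits sit on the
% edges of a 2-D lattice, and one repeatedly measures, for every vertex v
% and plaquette p,
\begin{equation}
  A_v = \prod_{i \in \mathrm{star}(v)} X_i , \qquad
  B_p = \prod_{i \in \partial p} Z_i .
\end{equation}
% All A_v and B_p commute, so the measurements can be repeated round after
% round; errors are diagnosed from the pattern of outcomes that flip sign
% between consecutive rounds (the error syndrome).
```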

441 citations


Journal ArticleDOI
TL;DR: Network coding schemes are proposed to reduce broadcast retransmissions from one sender to multiple receivers; both simulations and theoretical analysis of bandwidth efficiency confirm their advantages over ARQ schemes.
Abstract: Traditional approaches to reliably transmit information over an error-prone network employ either forward error correction (FEC) or retransmission techniques. In this paper, we propose some network coding schemes to reduce the number of broadcast transmissions from one sender to multiple receivers. The main idea is to allow the sender to combine and retransmit the lost packets in a certain way so that with one transmission, multiple receivers are able to recover their own lost packets. For comparison, we derive a few theoretical results on the bandwidth efficiency of the proposed network coding and traditional automatic repeat-request (ARQ) schemes. Both simulations and theoretical analysis confirm the advantages of the proposed network coding schemes over the ARQ ones.
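A minimal sketch of the combining idea in the simplest case the abstract describes (two receivers, each missing a different equal-length packet); the paper's schemes generalize well beyond this:

```python
# Minimal sketch of XOR-combined retransmission (illustrative only; the
# paper's network coding schemes are more general). Assumes two receivers,
# each missing a different packet, and equal-length packets.

def xor_packets(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

p1, p2 = b"PACKET-ONE", b"PACKET-TWO"   # originally broadcast packets

# Receiver R1 lost p1 (but has p2); receiver R2 lost p2 (but has p1).
# One combined retransmission serves both receivers at once:
combined = xor_packets(p1, p2)

assert xor_packets(combined, p2) == p1  # R1 recovers its lost packet
assert xor_packets(combined, p1) == p2  # R2 recovers its lost packet
```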

436 citations




Journal ArticleDOI
TL;DR: In this article, the basic aspects of quantum error correction and fault-tolerant quantum computation are summarized, not as a detailed guide but rather as a basic introduction.
Abstract: Quantum error correction (QEC) and fault-tolerant quantum computation represent one of the most vital theoretical aspects of quantum information processing. It was well known from the early developments of this exciting field that the fragility of coherent quantum systems would be a catastrophic obstacle to the development of large-scale quantum computers. The introduction of quantum error correction in 1995 showed that active techniques could be employed to mitigate this fatal problem. However, quantum error correction and fault-tolerant computation now constitute a much larger field, and many new codes, techniques, and methodologies have been developed to implement error correction for large-scale quantum algorithms. In response, we have attempted to summarize the basic aspects of quantum error correction and fault-tolerance, not as a detailed guide, but rather as a basic introduction. Development in this area has been so pronounced that many in the field of quantum information, specifically researchers who are new to quantum information or people focused on the many other important issues in quantum computation, have found it difficult to keep up with the general formalisms and methodologies employed in this area. Rather than introducing these concepts from a rigorous mathematical and computer science framework, we instead examine error correction and fault-tolerance largely through detailed examples, which are more relevant to experimentalists today and in the near future.
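In the spirit of the article's example-driven approach, the usual entry point is the three-qubit bit-flip code; this is a textbook example, not reproduced from the paper itself:

```latex
% Three-qubit bit-flip code, the standard first example of active QEC.
% A logical qubit is spread over three physical qubits:
\begin{gather}
  |\psi_L\rangle \;=\; \alpha\,|000\rangle + \beta\,|111\rangle, \\
  S_1 = Z_1 Z_2, \qquad S_2 = Z_2 Z_3 .
\end{gather}
% Measuring the two commuting stabilizers S_1, S_2 yields a syndrome of
% +/-1 values that locates any single bit flip without disturbing the
% encoded superposition; e.g. syndrome (-1, +1) says qubit 1 flipped,
% so applying X_1 restores the state.
```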

233 citations


Journal ArticleDOI
TL;DR: An LDPC-coded turbo-equalizer is described as a candidate for dealing simultaneously with fiber nonlinearities, PMD, and residual chromatic dispersion, and it is shown how to combine multilevel modulation and channel coding optimally by using coded modulation.
Abstract: Codes on graphs of interest for next generation forward error correction (FEC) in high-speed optical networks, namely turbo codes and low-density parity-check (LDPC) codes, are described in this invited paper. We describe both binary and nonbinary LDPC codes, their design, and decoding. We also discuss an FPGA implementation of decoders for binary LDPC codes. We then explain how to combine multilevel modulation and channel coding optimally by using coded modulation. Also, we describe an LDPC-coded turbo-equalizer as a candidate for dealing simultaneously with fiber nonlinearities, PMD, and residual chromatic dispersion.

217 citations


Patent
03 Feb 2009
TL;DR: In this article, a method for operating a memory, which includes analog memory cells, includes encoding data with an Error Correction Code (ECC) that is representable by a plurality of equations.
Abstract: A method for operating a memory, which includes analog memory cells, includes encoding data with an Error Correction Code (ECC) that is representable by a plurality of equations. The encoded data is stored in a group of the analog memory cells by writing respective input storage values to the memory cells in the group. Multiple sets of output storage values are read from the memory cells in the group using one or more different, respective read parameters for each set. Numbers of the equations, which are satisfied by the respective sets of the output storage values, are determined. A preferred setting of the read parameters is identified responsively to the respective numbers of the satisfied equations. The memory is operated on using the preferred setting of the read parameters.
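A hedged sketch of the selection rule the claim describes; `H` (the ECC's parity-check matrix) and `read_with_params` are hypothetical stand-ins for the patent's machinery:

```python
import numpy as np

# Sketch of the read-parameter selection rule: read the cell group under
# several candidate read parameters, count how many of the ECC's
# parity-check equations each resulting bit vector satisfies, and keep the
# parameters satisfying the most. H and read_with_params are assumptions.

def satisfied_equations(H: np.ndarray, bits: np.ndarray) -> int:
    """Number of rows of the GF(2) parity-check matrix H satisfied by bits."""
    return int(np.sum(H.dot(bits) % 2 == 0))

def preferred_read_params(H, candidates, read_with_params):
    # read_with_params(p) -> hard-decision bit vector for parameter set p
    return max(candidates,
               key=lambda p: satisfied_equations(H, read_with_params(p)))
```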

196 citations


Journal ArticleDOI
TL;DR: A novel approach to providing unequal error protection (UEP) using rateless codes over erasure channels, named Expanding Window Fountain (EWF) codes, is developed and discussed; the windowing approach yields better UEP performance than the weighted approach, as confirmed both theoretically and experimentally.
Abstract: A novel approach to provide unequal error protection (UEP) using rateless codes over erasure channels, named Expanding Window Fountain (EWF) codes, is developed and discussed. EWF codes use a windowing technique rather than a weighted (non-uniform) selection of input symbols to achieve the UEP property. The windowing approach introduces additional parameters into the UEP rateless code design, making it more general and flexible than the weighted approach. Furthermore, the windowing approach provides better UEP performance, which is confirmed both theoretically and experimentally.
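A minimal sketch of the windowed selection step, under assumed placeholder distributions (the paper's actual window and degree distributions are the design parameters it optimizes):

```python
import random

# Minimal EWF encoding sketch (illustrative assumptions throughout).
# Window i covers the first sizes[i] input symbols, sizes[0] < sizes[1] < ...,
# with window 0 holding the most important data. Choosing small windows more
# often is what buys the unequal error protection.

def ewf_encoded_symbol(data, sizes, window_probs, degree_probs):
    w = random.choices(range(len(sizes)), weights=window_probs)[0]
    window = data[:sizes[w]]                   # expanding window w
    d = random.choices(range(1, len(degree_probs) + 1),
                       weights=degree_probs)[0]
    d = min(d, len(window))
    chosen = random.sample(range(len(window)), d)
    out = 0
    for i in chosen:                           # XOR d symbols drawn uniformly
        out ^= window[i]                       # *within the window*, not over
    return out, chosen                         # all inputs as in weighted LT
```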

185 citations


Journal ArticleDOI
TL;DR: It is shown that a more involved equalization algorithm achieves excellent bit-error-rate performance even when error-correcting codes designed for the Gaussian-noise-limited channel are employed, and thus does not require a complete redesign of the coding scheme.
Abstract: We investigate the spectral efficiency, achievable by a low-complexity symbol-by-symbol receiver, when linear modulations based on the superposition of uniformly time- and frequency-shifted replicas of a base pulse are employed. Although orthogonal signaling with Gaussian inputs achieves capacity on the additive white Gaussian noise channel, we show that, when finite-order constellations are employed, by giving up the orthogonality condition (thus accepting interference among adjacent signals) we can considerably improve the performance, even when a symbol-by-symbol receiver is used. We also optimize the spacing between adjacent signals to maximize the achievable spectral efficiency. Moreover, we propose a more involved transmission scheme, consisting of the superposition of two independent signals with suitable power allocation and a two-stage receiver, showing that it allows a further increase of the spectral efficiency. Finally, we show that a more involved equalization algorithm, based on soft interference cancellation, makes it possible to achieve excellent bit-error-rate performance even when error-correcting codes designed for the Gaussian-noise-limited channel are employed, and thus does not require a complete redesign of the coding scheme.
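The central trade-off can be stated compactly; the notation below is mine, not the paper's:

```latex
% With symbol spacing T and carrier spacing F both reduced below the
% orthogonality limit, the spectral efficiency achievable by the
% symbol-by-symbol receiver is
\begin{equation}
  \eta(T, F) \;=\; \frac{I(T, F)}{T\,F} \quad [\text{bit/s/Hz}],
\end{equation}
% where I(T,F) is the achievable rate per symbol, which accounts for the
% interference that the non-orthogonal packing creates. Shrinking T and F
% lowers I but can still raise \eta, and optimizing the spacings for this
% reason is the paper's central idea.
```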

182 citations


Journal ArticleDOI
TL;DR: SHREC, a new algorithm for correcting errors in short-read data that uses a generalized suffix trie on the read data as the underlying data structure, achieves an error correction accuracy of over 80% for simulated data and over 88% for real data.
Abstract: Motivation: Second-generation sequencing technologies produce a massive amount of short reads in a single experiment. However, sequencing errors can cause major problems when using this approach for de novo sequencing applications. Moreover, existing error correction methods have been designed and optimized for shotgun sequencing. Therefore, there is an urgent need for the design of fast and accurate computational methods and tools for error correction of large amounts of short read data. Results: We present SHREC, a new algorithm for correcting errors in short-read data that uses a generalized suffix trie on the read data as the underlying data structure. Our results show that the method can identify erroneous reads with sensitivity and specificity of over 99% and 96% for simulated data with error rates of up to 3% as well as for real data. Furthermore, it achieves an error correction accuracy of over 80% for simulated data and over 88% for real data. These results are clearly superior to previously published approaches. SHREC is available as an efficient open-source Java implementation that allows processing of 10 million short reads on a standard workstation. Availability: SHREC source code in Java is freely available at http://www.informatik.uni-kiel.de/~jasc/Shrec/ Contact: jasc@informatik.uni-kiel.de
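SHREC itself builds a generalized suffix trie; the much-simplified k-mer-frequency stand-in below only conveys the underlying intuition (substrings containing a sequencing error are rare across the read set). All thresholds are arbitrary and the helper names are mine:

```python
from collections import Counter
from itertools import product

# Simplified stand-in for the SHREC idea, NOT the SHREC algorithm: a
# low-frequency k-mer is suspicious and may be corrected to a high-frequency
# neighbor differing in one base. Thresholds weak/strong are arbitrary.

def kmer_counts(reads, k):
    counts = Counter()
    for r in reads:
        for i in range(len(r) - k + 1):
            counts[r[i:i + k]] += 1
    return counts

def correct_read(read, counts, k, weak=2, strong=5):
    read = list(read)
    for i in range(len(read) - k + 1):
        kmer = "".join(read[i:i + k])
        if counts[kmer] >= weak:
            continue                        # looks trustworthy, skip
        for j, base in product(range(k), "ACGT"):
            cand = kmer[:j] + base + kmer[j + 1:]
            if counts[cand] >= strong:      # strong one-mismatch neighbor
                read[i + j] = base          # correct the suspected error
                break
    return "".join(read)
```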

173 citations


Journal ArticleDOI
TL;DR: A cross-layer methodology for the analysis of error control schemes in WSNs is presented such that the effects of multi-hop routing and the broadcast nature of the wireless channel are investigated.
Abstract: Error control is of significant importance for Wireless Sensor Networks (WSNs) because of their severe energy constraints and the low-power communication requirements. In this paper, a cross-layer methodology for the analysis of error control schemes in WSNs is presented such that the effects of multi-hop routing and the broadcast nature of the wireless channel are investigated. More specifically, the cross-layer effects of routing, medium access, and physical layers are considered. This analysis enables a comprehensive comparison of forward error correction (FEC) codes, automatic repeat request (ARQ), and hybrid ARQ schemes in WSNs. The validation results show that the developed framework closely follows simulation results. Hybrid ARQ and FEC schemes improve the error resiliency of communication compared to ARQ. In a multi-hop network, this improvement can be exploited by constructing longer hops (hop length extension), which can be achieved through channel-aware routing protocols, or by reducing the transmit power (transmit power control). The results of our analysis reveal that for hybrid ARQ schemes and certain FEC codes, the hop length extension decreases both the energy consumption and the end-to-end latency subject to a target packet error rate (PER) compared to ARQ. This decrease in end-to-end latency is crucial for delay-sensitive, real-time applications, where both hybrid ARQ and FEC codes are strong candidates. We also show that the advantages of FEC codes are even more pronounced as the network density increases. On the other hand, transmit power control results in significant savings in energy consumption at the cost of increased latency for certain FEC codes. The results of our analysis also indicate the cases where ARQ outperforms FEC codes for various end-to-end distance and target PER values.
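A back-of-envelope version of the ARQ-versus-FEC accounting (my simplification, not the paper's cross-layer framework):

```python
# Back-of-envelope model only: with ARQ, a hop with packet error rate
# per_hop needs 1/(1 - per_hop) transmissions on average; FEC lowers
# per_hop at the cost of sending n/k redundant bits per packet.

def arq_tx_per_hop(per_hop: float) -> float:
    return 1.0 / (1.0 - per_hop)            # expected (re)transmissions

def relative_energy(per_arq, per_fec, fec_overhead):
    """Energy of FEC relative to ARQ for one delivered packet (same radio)."""
    e_arq = arq_tx_per_hop(per_arq)                  # plain packets, retries
    e_fec = arq_tx_per_hop(per_fec) * fec_overhead   # longer coded packets
    return e_fec / e_arq

# Example: FEC drops PER from 0.2 to 0.01 with a rate-1/2 code (overhead 2.0):
# relative_energy(0.2, 0.01, 2.0) -> ~1.62, i.e. FEC costs more energy on a
# single hop here, but it can win overall if the longer hops it enables
# remove whole hops from the route (the paper's "hop length extension").
```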

Journal ArticleDOI
TL;DR: In this article, the problem of error correction in coherent and non-coherent network coding is considered under an adversarial model, and it is shown that universal network error correcting codes achieving the Singleton bound can be easily constructed and efficiently decoded.
Abstract: The problem of error correction in both coherent and noncoherent network coding is considered under an adversarial model. For coherent network coding, where knowledge of the network topology and network code is assumed at the source and destination nodes, the error correction capability of an (outer) code is succinctly described by the rank metric; as a consequence, it is shown that universal network error correcting codes achieving the Singleton bound can be easily constructed and efficiently decoded. For noncoherent network coding, where knowledge of the network topology and network code is not assumed, the error correction capability of a (subspace) code is given exactly by a new metric, called the injection metric, which is closely related to, but different from, the subspace metric of Kötter and Kschischang. In particular, in the case of a non-constant-dimension code, the decoder associated with the injection metric is shown to correct more errors than a minimum-subspace-distance decoder. All of these results are based on a general approach to adversarial error correction, which could be useful for other adversarial channels beyond network coding.
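For reference, the two metrics being compared can be written as follows (my notation; see the paper for the precise decoding guarantees):

```latex
% Subspace metric (Koetter--Kschischang) versus the injection metric, for
% subspaces U, V:
\begin{align}
  d_S(U, V) &= \dim(U + V) - \dim(U \cap V)
             = \dim U + \dim V - 2\dim(U \cap V), \\
  d_I(U, V) &= \max\{\dim U, \dim V\} - \dim(U \cap V)
             = \tfrac{1}{2}\, d_S(U, V)
             + \tfrac{1}{2}\,\bigl|\dim U - \dim V\bigr| .
\end{align}
% On constant-dimension codes the two are equivalent (d_I = d_S/2); on
% non-constant-dimension codes they differ, which is how an injection-metric
% decoder can correct more errors than a minimum-subspace-distance decoder.
```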

Journal ArticleDOI
TL;DR: It is demonstrated that the design flexibility and UEP performance make EWF codes ideally suited for real-time scalable video multicast, i.e., EWF codes offer a number of design parameters to be "tuned" at the server side to meet the different reception criteria of heterogeneous receivers.
Abstract: Fountain codes were introduced as an efficient and universal forward error correction (FEC) solution for data multicast over lossy packet networks. They have recently been proposed for large scale multimedia content delivery in practical multimedia distribution systems. However, standard fountain codes, such as LT or Raptor codes, are not designed to meet unequal error protection (UEP) requirements typical in real-time scalable video multicast applications. In this paper, we propose recently introduced UEP expanding window fountain (EWF) codes as a flexible and efficient solution for real-time scalable video multicast. We demonstrate that the design flexibility and UEP performance make EWF codes ideally suited for this scenario, i.e., EWF codes offer a number of design parameters to be "tuned" at the server side to meet the different reception criteria of heterogeneous receivers. The performance analysis using both analytical results and simulation experiments of H.264 scalable video coding (SVC) multicast to heterogeneous receiver classes confirms the flexibility and efficiency of the proposed EWF-based FEC solution.

Journal ArticleDOI
TL;DR: A general theory for 1:N and M:1 dimension changing mappings is presented, and two examples for a Gaussian source and channel are provided where both a 2:1 bandwidth-reducing and a 1:2 bandwidth-expanding mapping are optimized.
Abstract: This paper deals with lossy joint source-channel coding for transmitting memoryless sources over AWGN channels. The scheme is based on the geometrical interpretation of communication by Kotel'nikov and Shannon, where amplitude-continuous, time-discrete source samples are mapped directly onto the channel using curves or planes. The source and channel spaces can have different dimensions, thereby achieving either compression or error control, depending on whether the source bandwidth is smaller or larger than the channel bandwidth. We present a general theory for 1:N and M:1 dimension-changing mappings, and provide two examples for a Gaussian source and channel where we optimize both a 2:1 bandwidth-reducing and a 1:2 bandwidth-expanding mapping. Both examples show high spectral efficiency and provide both graceful degradation and improvement for imperfect channel state information at the transmitter.
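The classic concrete instance of a 2:1 curve in this literature is the Archimedean spiral; the paper optimizes its own mappings, so the sketch below only fixes ideas:

```latex
% Classic 2:1 bandwidth-reducing example (illustrative, not the paper's
% optimized mapping): the source pair (s_1, s_2) is quantized to the
% nearest point of an Archimedean double spiral
\begin{equation}
  \mathbf{b}(\theta) \;=\; \frac{\Delta}{\pi}\,\theta
  \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix},
\end{equation}
% and (a suitably stretched version of) the parameter \theta is sent as the
% single channel symbol. The arm spacing \Delta trades weak-noise distortion
% against anomalous (threshold) errors, mirroring the graceful-degradation
% behavior the abstract mentions.
```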

Journal ArticleDOI
TL;DR: It is shown in this paper that once the authors identify the trapping sets of an LDPC code of interest, a sum-product algorithm (SPA) decoder can be custom-designed to yield floors that are orders of magnitude lower than those of the conventional SPA decoder.
Abstract: One of the most significant impediments to the use of LDPC codes in many communication and storage systems is the error-rate floor phenomenon associated with their iterative decoders. The error floor has been attributed to certain subgraphs of an LDPC code's Tanner graph induced by so-called trapping sets. We show in this paper that once we identify the trapping sets of an LDPC code of interest, a sum-product algorithm (SPA) decoder can be custom-designed to yield floors that are orders of magnitude lower than floors of the conventional SPA decoder. We present three classes of such decoders: (1) a bi-mode decoder, (2) a bit-pinning decoder which utilizes one or more outer algebraic codes, and (3) three generalized-LDPC decoders. We demonstrate the effectiveness of these decoders for two codes, the rate-1/2 (2640,1320) Margulis code which is notorious for its floors and a rate-0.3 (640,192) quasi-cyclic code which has been devised for this study. Although the paper focuses on these two codes, the decoder design techniques presented are fully generalizable to any LDPC code.

Journal ArticleDOI
TL;DR: The behavior of iteratively decoded low-density parity-check (LDPC) codes over the binary erasure channel in the so-called "waterfall region" is investigated and shows that the performance curves follow a very basic scaling law.
Abstract: We investigate the behavior of iteratively decoded low-density parity-check (LDPC) codes over the binary erasure channel in the so-called "waterfall region." We show that the performance curves in this region follow a simple scaling law. We conjecture that essentially the same scaling behavior applies in a much more general setting and we provide some empirical evidence to support this conjecture. The scaling law, together with the error floor expressions developed previously, can be used for a fast finite-length optimization.
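The flavor of the scaling law, in my notation (the paper also develops a refined form with a finite-size shift of the threshold):

```latex
% Waterfall scaling law, schematic form: for blocklength n drawn from an
% ensemble with BP threshold \epsilon^*, used on a BEC with erasure
% probability \epsilon, the block error probability behaves like
\begin{equation}
  P_B(n, \epsilon) \;\approx\;
  Q\!\left( \frac{\sqrt{n}\,\bigl(\epsilon^* - \epsilon\bigr)}{\alpha} \right),
\end{equation}
% where \alpha is a constant computable from the degree distribution and Q
% is the Gaussian tail function; combined with error-floor expressions,
% this is what enables the fast finite-length optimization mentioned above.
```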

Journal ArticleDOI
TL;DR: This work derives several lower and upper bounds on the size of codes for rank modulation, and shows the existence of codes whose size is within a constant factor of the sphere packing bound for any fixed number of errors.
Abstract: Codes for rank modulation have been recently proposed as a means of protecting flash memory devices from errors. We study basic coding theoretic problems for such codes, representing them as subsets of the set of permutations of n elements equipped with the Kendall tau distance. We derive several lower and upper bounds on the size of codes. These bounds enable us to establish the exact scaling of the size of optimal codes for large values of n. We also show the existence of codes whose size is within a constant factor of the sphere packing bound for any fixed number of errors.
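A small sketch of the distance in question (the standard definition; the code constructions and bounds are the paper's contribution):

```python
from itertools import combinations

# Kendall tau distance between two permutations of {0,...,n-1}: the number
# of pairwise order inversions, equivalently the minimum number of adjacent
# transpositions turning one into the other. In rank modulation, a small
# charge-level disturbance perturbs the permutation by a few adjacent
# transpositions, so correcting t such errors requires minimum Kendall tau
# distance 2t + 1.

def kendall_tau(p, q):
    pos = {v: i for i, v in enumerate(q)}     # position of each value in q
    r = [pos[v] for v in p]                   # p re-expressed in q's order
    return sum(1 for i, j in combinations(range(len(r)), 2) if r[i] > r[j])

assert kendall_tau([0, 1, 2], [0, 1, 2]) == 0
assert kendall_tau([0, 1, 2], [1, 0, 2]) == 1   # one adjacent transposition
```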

Journal ArticleDOI
TL;DR: The proposed model provides failure probabilities that probabilistically demonstrate the benefits of various interleaving scheme selections for memories with SEC, and it successfully shows the difference in failure probability for different choices of interleaving schemes.
Abstract: The significance of multiple cell upsets (MCUs) is revealed by sharing the soft-error test results in three major technologies, 90 nm, 65 nm, and 45 nm. The effectiveness of single-bit error correction (SEC) codes can be maximized in mitigating MCU errors when used together with the interleaving structure in memory designs. The model proposed in this paper provides failure probabilities to probabilistically demonstrate the benefits of various interleaving scheme selections for memories with SEC. Grouped events such as MCUs are taken into account in the proposed model by using the compound Poisson process. As a result of the proposed model, designers can perform predictive analysis of their design choices of interleaving schemes. The model successfully showed the difference in failure probability for different choices of interleaving schemes. The model behaved as the upper bound for failure probability when compared to the neutron test data with the 45-nm static-random-access memory (SRAM) design.
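A toy Monte Carlo of the effect being modeled (the paper's model is analytical, built on a compound Poisson process; the parameters and the contiguous-MCU assumption below are mine):

```python
import random
from collections import Counter

# Toy model: a multiple-cell upset (MCU) defeats a single-error-correcting
# (SEC) code only if two of its cells fall in the same logical word. With
# interleaving degree I, bit i of a physical row belongs to word i % I, so
# adjacent cells land in different words. MCUs here are 1-D contiguous runs,
# a simplification of real (often 2-D) MCU shapes.

def sec_failure_prob(row_len=256, interleave=2, mcu_span=4, trials=100_000):
    fails = 0
    for _ in range(trials):
        start = random.randrange(row_len - mcu_span + 1)
        words = Counter((start + k) % interleave for k in range(mcu_span))
        if max(words.values()) >= 2:      # >= 2 upsets in one SEC word
            fails += 1
    return fails / trials

# sec_failure_prob(interleave=2, mcu_span=2) -> 0.0 (SEC always survives)
# sec_failure_prob(interleave=2, mcu_span=4) -> 1.0 (interleaving too shallow)
```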

Journal ArticleDOI
TL;DR: Numerical simulations demonstrate that D-RLS can outperform existing approaches in terms of estimation performance and noise resilience, while it has the potential of performing efficient tracking.
Abstract: Recursive least-squares (RLS) schemes are of paramount importance for reducing complexity and memory requirements in estimating stationary signals as well as for tracking nonstationary processes, especially when the state and/or data model are not available and fast convergence rates are at a premium. To this end, a fully distributed (D-) RLS algorithm is developed for use by wireless sensor networks (WSNs) whereby sensors exchange messages with one-hop neighbors to consent on the network-wide estimates adaptively. The WSNs considered here do not necessarily possess a Hamiltonian cycle, while the inter-sensor links are challenged by communication noise. The novel algorithm is obtained after judiciously reformulating the exponentially-weighted least-squares cost into a separable form, which is then optimized via the alternating-direction method of multipliers. If powerful error control codes are utilized and communication noise is not an issue, D-RLS is modified to reduce communication overhead when compared to existing noise-unaware alternatives. Numerical simulations demonstrate that D-RLS can outperform existing approaches in terms of estimation performance and noise resilience, while it has the potential of performing efficient tracking.
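For orientation, the centralized exponentially-weighted RLS recursion that D-RLS decomposes across sensors (standard equations, stated in my notation):

```latex
% Centralized exponentially-weighted RLS, per time step t, with forgetting
% factor \lambda, regressor x_t, and observation d_t:
\begin{align}
  k_t &= \frac{P_{t-1}\, x_t}{\lambda + x_t^{\top} P_{t-1}\, x_t}, \\
  \hat{w}_t &= \hat{w}_{t-1} + k_t \left( d_t - x_t^{\top} \hat{w}_{t-1} \right), \\
  P_t &= \lambda^{-1}\left( P_{t-1} - k_t\, x_t^{\top} P_{t-1} \right).
\end{align}
% D-RLS reformulates the corresponding exponentially-weighted LS cost into
% a separable form and solves it with the alternating-direction method of
% multipliers, so each sensor only exchanges messages with one-hop neighbors.
```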

Patent
Simon Litsyn1, Idan Alrod1, Eran Sharon1, Mark Murin1, Menachem Lasser1 
11 Mar 2009
TL;DR: In this paper, data that are stored in cells of a multi-bit-per-cell memory, according to a systematic or non-systematic ECC, are read and corrected.
Abstract: Data that are stored in cells of a multi-bit-per-cell memory, according to a systematic or non-systematic ECC, are read and corrected (systematic ECC) or recovered (non-systematic ECC) in accordance with estimated probabilities that one or more of the read bits are erroneous. In one method of the present invention, the estimates are a priori. In another method of the present invention, the estimates are based only on aspects of the read bits that include significances or bit pages of the read bits. In a third method of the present invention, the estimates are based only on values of the read bits. Not all the estimates are equal.

Journal ArticleDOI
TL;DR: This research investigates pronunciation errors frequently made by foreigners learning Dutch as a second language and compares four types of classifiers that can be used to detect erroneous pronunciations of these phones.

Patent
Axel Lakus-Becker1
17 Nov 2009
TL;DR: In this article, a computer implemented method of storing pixel data corresponding to a pixel is disclosed, where a first and a second set of pixel data are determined for the pixel and parity bits for the first and second sets are generated, using error correction.
Abstract: A computer implemented method of storing pixel data corresponding to a pixel is disclosed. A first and a second set of pixel data is determined for the pixel. Parity bits for the first set of pixel data are generated, using error correction. An encoded version of the first set of pixel data including the parity bits is stored. An encoded version of the second set of pixel data is stored, using lossless data compression, for use in decoding the first set of pixel data.

Patent
12 Mar 2009
TL;DR: In this article, a data storage device is disclosed that receives a read command from a host, wherein the read command comprises a read logical block address (LBA_R), and a target data sector is read in response to the LBA_R to generate a read signal.
Abstract: A data storage device is disclosed that receives a read command from a host, wherein the read command comprises a read logical block address (LBA_R). A target data sector is read in response to the LBA_R to generate a read signal. The read signal is processed to detect user data and redundancy data using a soft-output detector that outputs quality metrics for the user data and redundancy data. A high quality metric is assigned to the LBA_R, and errors are corrected in the user data using an error correction code (ECC) decoder in response to the quality metrics output by the soft-output detector and the quality metrics assigned to the LBA_R.

Journal ArticleDOI
TL;DR: This paper presents a high-throughput decoder architecture for generic quasi-cyclic low-density parity-check (QC-LDPC) codes, in which an approximate layered decoding approach is explored to reduce the critical path of the layered LDPC decoder.
Abstract: This paper presents a high-throughput decoder architecture for generic quasi-cyclic low-density parity-check (QC-LDPC) codes. Various optimizations are employed to increase the clock speed. A row permutation scheme is proposed to significantly simplify the implementation of the shuffle network in the LDPC decoder. An approximate layered decoding approach is explored to reduce the critical path of the layered LDPC decoder. The computation core is further optimized to reduce the computation delay. It is estimated that 4.7 Gb/s decoding throughput can be achieved at 15 iterations using the current technology.

Patent
18 May 2009
TL;DR: In this article, an apparatus, system, and method are disclosed for detecting and replacing failed data storage, where an ECC module determines, using an error correcting code (ECC), if one or more errors exist in tested data and if the errors are correctable using the ECC.
Abstract: An apparatus, system, and method are disclosed for detecting and replacing failed data storage. A read module reads data from an array of memory devices. The array includes two or more memory devices and one or more extra memory devices storing parity information from the memory devices. An ECC module determines, using an error correcting code ("ECC"), if one or more errors exist in tested data and if the errors are correctable using the ECC. The tested data includes data read by the read module. An isolation module selects a memory device in response to the ECC module determining that errors exist in the data read by the read module and that the errors are uncorrectable using the ECC. The isolation module also replaces data read from the selected memory device with replacement data and available data, wherein the tested data includes the available data combined with the replacement data.
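A minimal sketch of the replacement step for the simplest case, a single XOR-parity device (the patent's array and ECC arrangements are more general):

```python
from functools import reduce

# With one extra device holding the XOR (parity) of the data devices, data
# from a device isolated as failed can be rebuilt from the survivors and
# then re-checked with the ECC, as the abstract describes.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def rebuild_failed(devices: list, parity: bytes, failed_idx: int) -> bytes:
    survivors = [d for i, d in enumerate(devices) if i != failed_idx]
    return reduce(xor_bytes, survivors, parity)

data = [b"\x01\x02", b"\x0f\x0f", b"\xaa\x55"]
parity = reduce(xor_bytes, data, b"\x00\x00")
assert rebuild_failed(data, parity, 1) == b"\x0f\x0f"  # device 1 recovered
```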

Journal ArticleDOI
TL;DR: This paper presents a new formulation of the ternary ECOC distance and the error-correcting capabilities in the ternary ECOC framework, stresses how to design coding matrices preventing codification ambiguity, and proposes a new sparse random coding matrix with ternary distance maximization.

Journal ArticleDOI
TL;DR: It is demonstrated that the proposed codes outperform other existing coding schemes in making NOC fabrics reliable and energy efficient, with lower latency.
Abstract: Network-on-chip (NOC) is emerging as a revolutionary methodology to integrate numerous intellectual property blocks in a single die. It is the packet switching-based communications backbone that interconnects the components on multicore system-on-chip (SoC). A major challenge that NOC design is expected to face is related to the intrinsic unreliability of the interconnect infrastructure under technology limitations. By incorporating error control coding schemes along the interconnects, NOC architectures are able to provide correct functionality in the presence of different sources of transient noise and yet have lower overall energy dissipation. In this paper, designs of novel joint crosstalk avoidance and triple-error-correction/quadruple-error-detection codes are proposed, and their performance is evaluated in different NOC fabrics. It is demonstrated that the proposed codes outperform other existing coding schemes in making NOC fabrics reliable and energy efficient, with lower latency.

Journal ArticleDOI
TL;DR: In this article, the authors describe a parallel-serial decoder architecture that can be used to map any low-density parity-check (LDPC) code with such a structure to a hardware emulation platform.
Abstract: Many classes of high-performance low-density parity-check (LDPC) codes are based on parity check matrices composed of permutation submatrices. We describe the design of a parallel-serial decoder architecture that can be used to map any LDPC code with such a structure to a hardware emulation platform. High-throughput emulation allows for the exploration of the low bit-error rate (BER) region and provides statistics of the error traces, which illuminate the causes of the error floors of the (2048, 1723) Reed-Solomon-based LDPC (RS-LDPC) code and the (2209, 1978) array-based LDPC code. Two classes of error events are observed: oscillatory behavior and convergence to a class of non-codewords, termed absorbing sets. The influence of absorbing sets can be exacerbated by message quantization and decoder implementation. In particular, quantization and the log-tanh function approximation in sum-product decoders strongly affect which absorbing sets dominate in the error-floor region. We show that conventional sum-product decoder implementations of the (2209, 1978) array-based LDPC code allow low-weight absorbing sets to have a strong effect, and, as a result, elevate the error floor. Dually-quantized sum-product decoders and approximate sum-product decoders alleviate the effects of low-weight absorbing sets, thereby lowering the error floor.

Proceedings ArticleDOI
12 Dec 2009
TL;DR: In this article, the authors propose mSWAT, the first work to apply symptom-based detection and diagnosis to faults in multicore architectures running multithreaded software, using a lightweight isolated deterministic replay to diagnose the faulty core.
Abstract: Continued technology scaling is resulting in systems with billions of devices. Unfortunately, these devices are prone to failures from various sources, resulting in even commodity systems being affected by the growing reliability threat. Thus, traditional solutions involving high redundancy or piecemeal solutions targeting specific failure modes will no longer be viable owing to their high overheads. Recent reliability solutions have explored using low-cost monitors that watch for anomalous software behavior as a symptom of hardware faults. We previously proposed the SWAT system that uses such low-cost detectors to detect hardware faults, and a higher-cost mechanism for diagnosis. However, all of the prior work in this context, including SWAT, assumes single-threaded applications and has not been demonstrated for multithreaded applications running on multicore systems. This paper presents mSWAT, the first work to apply symptom-based detection and diagnosis for faults in multicore architectures running multithreaded software. For detection, we extend the symptom-based detectors in SWAT and show that they result in a very low Silent Data Corruption (SDC) rate for both permanent and transient hardware faults. For diagnosis, the multicore environment poses significant new challenges. First, the deterministic replay required for SWAT's single-threaded diagnosis incurs higher overheads for multithreaded workloads. Second, the fault may propagate to fault-free cores, resulting in symptoms from fault-free cores and no available known-good core, breaking fundamental assumptions of SWAT's diagnosis algorithm. We propose a novel permanent fault diagnosis algorithm for multithreaded applications running on multicore systems that uses a lightweight isolated deterministic replay to diagnose the faulty core with no prior knowledge of a known good core. Our results show that this technique successfully diagnoses over 95% of the detected permanent faults while incurring low hardware overheads. mSWAT thus offers an affordable solution to protect future multicore systems from hardware faults.

Journal ArticleDOI
TL;DR: This paper identifies and defines a new class of error-correcting codes whose redundancy makes the design of fault-secure detectors (FSD) particularly simple and quantifies the importance of protecting encoder and decoder circuitry against transient errors.
Abstract: Memory cells have been protected from soft errors for more than a decade; due to the increase in soft error rate in logic circuits, the encoder and decoder circuitry around the memory blocks have become susceptible to soft errors as well and must also be protected. We introduce a new approach to design fault-secure encoder and decoder circuitry for memory designs. The key novel contribution of this paper is identifying and defining a new class of error-correcting codes whose redundancy makes the design of fault-secure detectors (FSD) particularly simple. We further quantify the importance of protecting encoder and decoder circuitry against transient errors, illustrating a scenario where the system failure rate (FIT) is dominated by the failure rate of the encoder and decoder. We prove that Euclidean geometry low-density parity-check (EG-LDPC) codes have the fault-secure detector capability. Using some of the smaller EG-LDPC codes, we can tolerate bit or nanowire defect rates of 10% and fault rates of 10^-18 upsets/device/cycle, achieving a FIT rate at or below one for the entire memory system and a memory density of 10^11 bit/cm^2 with a nanowire pitch of 10 nm for memory blocks of 10 Mb or larger. Larger EG-LDPC codes can achieve even higher reliability and lower area overhead.