
Showing papers by "Fabrizio Lombardi published in 2019"


Journal ArticleDOI
TL;DR: A novel approximate multiplier with low power consumption and a short critical path is proposed for high-performance DSP applications; it leverages a newly designed approximate adder that limits carry propagation to the nearest neighbors for fast partial product accumulation.
Abstract: Approximate circuits have been considered for applications that can tolerate some loss of accuracy with improved performance and/or energy efficiency. Multipliers are key arithmetic circuits in many of these applications including digital signal processing (DSP). In this paper, a novel approximate multiplier with low power consumption and a short critical path is proposed for high-performance DSP applications. This multiplier leverages a newly designed approximate adder that limits its carry propagation to the nearest neighbors for fast partial product accumulation. Different levels of accuracy can be achieved by using either OR gates or the proposed approximate adder in a configurable error recovery circuit. The approximate multipliers using these two error reduction strategies are referred to as AM1 and AM2, respectively. Both AM1 and AM2 have a low mean error distance, i.e., most of the errors are not significant in magnitude. Compared with a Wallace multiplier optimized for speed, an $8\times 8$ AM1 using four most significant bits for error reduction shows a 60% reduction in delay (when optimized for delay) and a 42% reduction in power dissipation (when optimized for area). In a $16\times 16$ design, half of the least significant partial products are truncated for AM1 and AM2, which are thus denoted as TAM1 and TAM2, respectively. Compared with the Wallace multiplier, TAM1 and TAM2 save from 50% to 66% in power, when optimized for area. Compared with existing approximate multipliers, AM1, AM2, TAM1, and TAM2 show significant advantages in accuracy with a low power-delay product. AM2 has better accuracy than AM1 but with a longer delay and higher power consumption. Image processing applications, including image sharpening and smoothing, are considered to show the quality of the approximate multipliers in error-tolerant applications. By utilizing an appropriate error recovery scheme, the proposed approximate multipliers achieve similar processing accuracy as exact multipliers, but with significant improvements in power.
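As an illustration of the carry-limiting idea, the behavioural Python sketch below adds two operands while letting each bit see only the carry generated by its immediate neighbor; the function name, bit width, and exhaustive error sweep are illustrative assumptions, not the paper's circuit or benchmarks.

```python
def nearest_neighbor_add(a: int, b: int, width: int = 8) -> int:
    """Carry propagation limited to the nearest neighbor: bit i receives
    only the carry *generated* at bit i-1, so no long carry chain forms."""
    result = 0
    for i in range(width + 1):            # +1 keeps the carry-out bit
        ai, bi = (a >> i) & 1, (b >> i) & 1
        cin = ((a >> (i - 1)) & 1) & ((b >> (i - 1)) & 1) if i else 0
        result |= (ai ^ bi ^ cin) << i
    return result

# Errors occur only when a carry would need to travel two or more positions,
# which is why the mean error distance stays low.
errs = [abs((a + b) - nearest_neighbor_add(a, b))
        for a in range(256) for b in range(256)]
print("mean error distance:", sum(errs) / len(errs))
```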

71 citations


Journal ArticleDOI
TL;DR: The proposed approximate RB multiplier designs are compared with previous approximate Booth multipliers; the results show that the approximate RB multipliers are better than approximate NB Booth multipliers, especially when the word size is large.
Abstract: As technology scaling is reaching its limits, new approaches have been proposed for computational efficiency. Approximate computing is a promising technique for high-performance and low-power circuits used in error-tolerant applications. Among approximate circuits, approximate arithmetic designs have attracted significant research interest. In this paper, the design of approximate redundant binary (RB) multipliers is studied. Two approximate Booth encoders and two RB 4:2 compressors based on RB (full and half) adders are proposed for the RB multipliers. The approximate design of the RB-Normal Binary (NB) converter in the RB multiplier is also studied by considering the error characteristics of both the approximate Booth encoders and the RB compressors. Both approximate and exact regular partial product arrays are used in the approximate RB multipliers to meet different accuracy requirements. Error analysis and hardware simulation results are provided. The proposed approximate RB multipliers are compared with previous approximate Booth multipliers; the results show that the approximate RB multipliers are better than approximate NB Booth multipliers, especially when the word size is large. Case studies of error-resilient applications are also presented to show the validity of the proposed designs.
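For readers unfamiliar with the RB number system, the sketch below shows why the RB-NB converter is essentially one subtraction; the digit-vector representation is a common convention, not the paper's hardware.

```python
def rb_to_nb(digits):
    """RB-NB conversion sketch: a redundant binary number with digits in
    {-1, 0, 1} (LSB first) equals its positive-digit word minus its
    negative-digit word, i.e. a single normal-binary subtraction."""
    plus = sum(1 << i for i, d in enumerate(digits) if d == 1)
    minus = sum(1 << i for i, d in enumerate(digits) if d == -1)
    return plus - minus

# Example: the RB digit vector [1, -1, 0, 1] encodes 1 - 2 + 8 = 7.
assert rb_to_nb([1, -1, 0, 1]) == 7
```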

59 citations


Journal ArticleDOI
TL;DR: This paper presents different approximate designs for computing the FFT, where the tradeoff between accuracy and performance is achieved by adjusting the word length in each computational stage; two algorithms for word length modification under a specific error margin are proposed.
Abstract: This paper presents different approximate designs for computing the FFT. The tradeoff between accuracy and performance is achieved by adjusting the word length in each computational stage. Two algorithms for word length modification under a specific error margin are proposed. The first algorithm targets an approximate FFT for an area-limited design compared to the conventional fixed design; the second algorithm targets performance, so it achieves a higher operating frequency. Both of the proposed algorithms show that an efficient balance between hardware utilization and performance is possible at stage-level. The proposed approximate FFT designs are implemented on FPGA; experimental results show that hardware utilization using the first approximate algorithm is reduced by nearly 40% or more. The second algorithm increases the performance of the designs by over 20%. Fine granularity design is also investigated, where the FPGA resources for a 256-point FFT computation can be further reduced by nearly 10% compared to a coarse design. Finally, the proposed approximate designs are applied to a feature extraction module in an isolated word recognition system; the numbers of LUTs and FFs for the Mel frequency cepstrum coefficients (MFCC) extraction module are decreased by up to 47.2% and 39.0%, respectively, with a power reduction of up to 27.0% at a loss in accuracy of less than 2%.
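The stage-level word-length idea can be prototyped in a few lines of Python; the greedy search below is one plausible reading of such an algorithm (function names, bit bounds, and the error margin are assumptions), not a reproduction of the paper's two algorithms.

```python
import numpy as np

def quantize(x, frac_bits):
    """Fixed-point truncation of a complex array to 'frac_bits' fractional bits."""
    s = 2.0 ** frac_bits
    return np.floor(x.real * s) / s + 1j * np.floor(x.imag * s) / s

def staged_fft(x, stage_bits):
    """Radix-2 DIT FFT whose data are re-quantized after every butterfly
    stage to the word length assigned to that stage."""
    n = len(x)
    stages = int(np.log2(n))
    rev = [int(format(i, f"0{stages}b")[::-1], 2) for i in range(n)]
    a = np.asarray(x, dtype=complex)[rev]
    for s in range(stages):
        half = 1 << s
        w = np.exp(-2j * np.pi * np.arange(half) / (2 * half))
        for start in range(0, n, 2 * half):
            u = a[start:start + half].copy()
            t = w * a[start + half:start + 2 * half]
            a[start:start + half] = u + t
            a[start + half:start + 2 * half] = u - t
        a = quantize(a, stage_bits[s])    # stage-level word-length control
    return a

def shorten_stages(x, max_bits=16, min_bits=4, margin=1e-2):
    """Greedy search: shrink each stage's word length while the worst-case
    relative error against the exact FFT stays within the margin."""
    ref = np.fft.fft(x)
    bits = [max_bits] * int(np.log2(len(x)))
    for s in range(len(bits)):
        while bits[s] > min_bits:
            bits[s] -= 1
            err = np.max(np.abs(staged_fft(x, bits) - ref)) / np.max(np.abs(ref))
            if err > margin:
                bits[s] += 1              # undo the step that broke the margin
                break
    return bits

x = np.cos(2 * np.pi * 5 * np.arange(64) / 64)
print(shorten_stages(x))                  # shorter words where the budget allows
```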

42 citations


Journal ArticleDOI
TL;DR: The proposed XRRO and XRBR PUFs are very efficient designs with good uniqueness and reliability, and require only 12.5% of the hardware resources of previous bistable ring PUFs and reconfigurable RO PUFs, respectively, to generate a 1-bit response.
Abstract: With the rapid development of the Internet of Things (IoT), security has attracted considerable interest. Conventional security solutions that have been proposed for the Internet based on classical cryptography cannot be applied to IoT nodes as they are typically resource-constrained. A physical unclonable function (PUF) is a hardware-based security primitive and can be used to generate a key online or uniquely identify an integrated circuit (IC) by extracting its internal random differences using so-called challenge-response pairs (CRPs). It is regarded as a promising low-cost solution for IoT security. A logic reconfigurable PUF (RPUF) is highly efficient in terms of hardware cost. This article first presents a new classification for RPUFs, namely circuit-based RPUF (C-RPUF) and algorithm-based RPUF (A-RPUF); two Exclusive OR (XOR)-based RPUF circuits (an XOR-based reconfigurable bistable ring PUF (XRBR PUF) and an XOR-based reconfigurable ring oscillator PUF (XRRO PUF)) are proposed. Both the XRBR and XRRO PUFs are implemented on Xilinx Spartan-6 field-programmable gate arrays (FPGAs). The implementation results are compared with previous PUF designs and show good uniqueness and reliability. Compared to conventional PUF designs, the most significant advantage of the proposed designs is that they are highly efficient in terms of hardware cost. Moreover, the XRRO PUF is the most efficient design when compared with previous RPUFs. Also, both the proposed XRRO and XRBR PUFs require only 12.5% of the hardware resources of previous bistable ring PUFs and reconfigurable RO PUFs, respectively, to generate a 1-bit response. This confirms that the proposed XRBR and XRRO PUFs are very efficient designs with good uniqueness and reliability.
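The toy model below captures the generic behaviour of RO-based PUFs that underlies designs like the XRRO: random per-stage process variation plus a challenge-selected ring configuration yields a device-unique response bit. All parameters (stage count, variation spread) are illustrative assumptions, not the paper's circuit.

```python
import random

def make_ro_stages(n_pairs=64, seed=7):
    """Toy RO-PUF model: every oscillator stage gets a random
    process-variation delay (mean 1.0, assumed 2% spread)."""
    rng = random.Random(seed)
    return [(rng.gauss(1.0, 0.02), rng.gauss(1.0, 0.02)) for _ in range(n_pairs)]

def response_bit(stages, challenge):
    """Each challenge bit reconfigures which stage of a pair joins ring A or
    ring B; the response compares the two accumulated ring delays."""
    da = sum(pair[c] for pair, c in zip(stages, challenge))
    db = sum(pair[1 - c] for pair, c in zip(stages, challenge))
    return int(da < db)       # the faster ring oscillates at a higher frequency

chip = make_ro_stages()                       # one simulated device
rng = random.Random(1)
challenge = [rng.randrange(2) for _ in range(64)]
print(response_bit(chip, challenge))          # one bit of the CRP response
```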

41 citations


Journal ArticleDOI
TL;DR: Initial tests show that attacking the proposed FF-APUF design requires more effort from the adversary than a conventional APUF design, and the empirical min-entropy of the FF-APUF design across different devices is shown to be more than twice that of the conventional APUF design.
Abstract: The PUF is a physical security primitive that permits the extraction of intrinsic digital identifiers from electronic devices. Due to its low-cost nature, the PUF is a promising candidate for providing security in lightweight IoT devices. The Arbiter PUF (APUF) has been widely studied in the technical literature. However, it often suffers from disadvantages such as poor uniqueness and reliability, particularly when implemented on FPGAs due to features such as physical layout restrictions. To address these problems, a new design known as the FF-APUF has been proposed; it offers a compact architecture combined with good uniqueness and reliability, and it is suitable for FPGA implementation. Many PUF designs have been shown to be vulnerable to machine learning (ML)-based modeling attacks. In this paper, it is initially shown that the FF-APUF design requires more effort from the adversary to attack than a conventional APUF design. A comprehensive analysis of the experimental results for the FF-APUF design is also presented. An improved APUF design with a balanced arbiter and an FF-APUF design are proposed and implemented on the Xilinx Artix-7 FPGA at 28 nm technology. The experimental min-entropy of the FF-APUF design across different devices is more than twice that of a conventional APUF design.
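The standard additive-delay model below is what makes a conventional APUF learnable: the response is linear in a challenge-derived parity feature vector, so a modeling attack reduces to fitting a hyperplane. Feed-forward loops of the FF-APUF kind break this linearity, which is consistent with the harder attacks reported here. Weights and sizes are illustrative.

```python
import numpy as np

def apuf_response(w, challenge):
    """Additive-delay model of an n-stage arbiter PUF: the response is the
    sign of w . phi, where phi is the parity feature vector of the challenge.
    Because this is linear, fitting w from CRPs is a simple ML problem."""
    n = len(challenge)
    phi = np.ones(n + 1)
    for i in range(n - 1, -1, -1):
        phi[i] = phi[i + 1] * (1 - 2 * challenge[i])
    return int(w @ phi > 0)

rng = np.random.default_rng(0)
w = rng.normal(size=65)              # device-specific stage delay differences
c = rng.integers(0, 2, size=64)      # a 64-bit challenge
print(apuf_response(w, c))
```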

40 citations


Journal ArticleDOI
TL;DR: The radix-8 Booth algorithm is used to reduce the number of partial products in the DA architecture, although no multiplication is explicitly performed, and the proposed design achieves 45%–61% lower EPO compared with the DLMS design.
Abstract: In this paper, a fixed-point finite impulse response adaptive filter is proposed using approximate distributed arithmetic (DA) circuits. In this design, the radix-8 Booth algorithm is used to reduce the number of partial products in the DA architecture, although no multiplication is explicitly performed. In addition, the partial products are approximately generated by truncating the input data with an error compensation. To further reduce hardware costs, an approximate Wallace tree is considered for the accumulation of partial products. As a result, the delay, area, and power consumption of the proposed design are significantly reduced. The application of system identification using a 48-tap bandpass filter and a 103-tap high-pass filter shows that the approximate design achieves an accuracy similar to its accurate counterpart. Compared with the state-of-the-art adaptive filter using bit-level pruning in the adder tree (referred to as the delayed least mean square (DLMS) design), it has a lower steady-state mean squared error and a smaller normalized misalignment. Synthesis results show that the proposed design attains on average a 55% reduction in energy per operation (EPO) and a $3.2\times $ higher throughput per area compared with an accurate design. Moreover, the proposed design achieves 45%–61% lower EPO compared with the DLMS design. A saccadic system using the proposed approximate adaptive filter-based cerebellar model achieves a similar retinal slip as using an accurate filter. These results are promising for the large-scale integration of approximate circuits into high-performance and energy-efficient systems for error-resilient applications.
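Radix-8 Booth recoding, the partial product reduction used here, can be checked numerically; the sketch below (with an assumed 16-bit width) shows a two's complement multiplier collapsing to about width/3 signed digits.

```python
def booth_radix8_digits(m: int, width: int = 16):
    """Radix-8 Booth recoding: each overlapping 4-bit group of a two's
    complement multiplier yields one digit in {-4,...,4}, so a 16-bit
    operand produces 6 signed digits instead of 16 partial products."""
    def bit(j):
        if j < 0:
            return 0
        return (m >> min(j, width - 1)) & 1   # sign-extend past the MSB
    return [-4 * bit(i + 2) + 2 * bit(i + 1) + bit(i) + bit(i - 1)
            for i in range(0, width, 3)]

m = -23
digits = booth_radix8_digits(m)
assert sum(d * 8 ** k for k, d in enumerate(digits)) == m   # exact recoding
print(digits)
```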

38 citations


Journal ArticleDOI
TL;DR: In this SC-RNN, a hybrid structure is developed by utilizing SC designs and binary circuits to improve the hardware efficiency without significant loss of accuracy; the design achieves a higher noise tolerance compared to binary implementations.
Abstract: Recurrent neural networks (RNNs) are widely used to solve a large class of recognition problems, including prediction, machine translation, and speech recognition. The hardware implementation of RNNs is, however, challenging due to the high area and energy consumption of these networks. Recently, stochastic computing (SC) has been considered for implementing neural networks and reducing the hardware consumption. In this paper, we propose an energy-efficient and noise-tolerant long short-term memory-based RNN using SC. In this SC-RNN, a hybrid structure is developed by utilizing SC designs and binary circuits to improve the hardware efficiency without significant loss of accuracy. The area and energy consumption of the proposed design are between 1.6%–2.3% and 6.5%–11.2%, respectively, of a 32-bit floating-point (FP) implementation. The SC-RNN requires significantly smaller area and lower energy consumption in most cases compared to an 8-bit fixed point implementation. The proposed design achieves a higher noise tolerance compared to binary implementations. The inference accuracy is from 10% to 13% higher than an FP design when the noise level is high in the computation process.
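The appeal of SC for such datapaths is that, in unipolar coding, a multiplier is a single AND gate; the toy below illustrates this principle (stream length and seed are arbitrary), not the paper's LSTM architecture.

```python
import random

def to_stream(p, n, rng):
    """Unipolar SC encoding: a length-n bitstream whose mean approximates p."""
    return [int(rng.random() < p) for _ in range(n)]

def sc_multiply(p, q, n=4096, seed=0):
    """In unipolar SC, a single AND gate multiplies two independent streams;
    this hardware cheapness is what hybrid SC designs exploit."""
    rng = random.Random(seed)
    a, b = to_stream(p, n, rng), to_stream(q, n, rng)
    return sum(x & y for x, y in zip(a, b)) / n

print(sc_multiply(0.5, 0.6))   # close to 0.30, with stochastic noise
```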

23 citations


Journal ArticleDOI
TL;DR: The proposed designs outperform other approximate designs in image processing applications including change detection (for the divider), envelope detection ( for the SQR circuit) and image reconstruction (for both designs).
Abstract: In this paper, an adaptive approximation approach is proposed for the design of a divider and a square root (SQR) circuit. In this design, the division/SQR is computed by using a reduced-width divider/SQR circuit and a shifter by adaptively pruning some insignificant input bits. Specifically, for a $2n/n$ division, $2k$ and $k$ ($k …
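From the abstract's description, the mechanism can be sketched as follows for division; the value of k and the test inputs are hypothetical, and the paper's circuit-level details (rounding, error compensation) are not modeled.

```python
def approx_divide(a: int, b: int, k: int = 8) -> int:
    """Adaptive-pruning sketch for a 2n/n division: keep the 2k MSBs of the
    dividend and the k MSBs of the divisor (pruning insignificant bits via
    shifts), divide at reduced width, then shift the quotient back."""
    sa = max(a.bit_length() - 2 * k, 0)
    sb = max(b.bit_length() - k, 0)
    q = (a >> sa) // (b >> sb)            # reduced-width division
    shift = sa - sb
    return q << shift if shift >= 0 else q >> -shift

print(approx_divide(1_000_000, 37), 1_000_000 // 37)   # 27024 vs 27027
```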

20 citations


Book ChapterDOI
01 Jan 2019
TL;DR: In this chapter, a classification is presented for the current designs of approximate arithmetic circuits, including adders, multipliers, and dividers; a comparative evaluation of their error and circuit characteristics is performed to understand the features of the various designs.
Abstract: Arithmetic circuits are important computing modules in a processor. They play a key role in the performance and the energy consumption of many image processing applications. In this chapter, a classification is presented for the current designs of approximate arithmetic circuits including adders, multipliers, and dividers. To understand the features of various designs, a comparative evaluation of their error and circuit characteristics is performed. The accuracy of approximate arithmetic circuits is evaluated by carrying out Monte Carlo simulations. The circuit measurements are assessed by synthesizing approximate designs in an STM CMOS 28 nm process. The simulation and synthesis results show the trade-offs of approximate arithmetic circuits between accuracy and hardware efficiency.
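The error metrics used in such evaluations are easy to reproduce; the snippet below runs a Monte Carlo estimate of MED, NMED, MRED, and error rate on a stand-in truncated adder (the design under test, sample count, and width are assumptions, not the chapter's benchmark set).

```python
import random

def truncated_add(a, b, t=4):
    """Stand-in approximate design: the t least significant bits are dropped."""
    mask = ~((1 << t) - 1)
    return (a & mask) + (b & mask)

def monte_carlo_metrics(width=16, samples=100_000, seed=1):
    rng = random.Random(seed)
    ed_sum = red_sum = errors = 0
    for _ in range(samples):
        a, b = rng.getrandbits(width), rng.getrandbits(width)
        exact, approx = a + b, truncated_add(a, b)
        ed = abs(exact - approx)            # error distance of this sample
        ed_sum += ed
        red_sum += ed / exact if exact else 0
        errors += ed != 0
    med = ed_sum / samples
    return {"MED": med,
            "NMED": med / (2 ** (width + 1) - 2),   # normalize by max output
            "MRED": red_sum / samples,
            "ER": errors / samples}

print(monte_carlo_metrics())
```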

19 citations


Journal ArticleDOI
TL;DR: A deterministic low-complexity approximate DCT technique that accurately configures the size of the transform matrix according to the number of retained coefficients in the zigzag scanning process; it shows superior performance compared to previous ADCT techniques under different metrics.
Abstract: The approximate (multiplier-less) two-dimensional discrete cosine transform (DCT) is a widely adopted technique for image/video compression. This paper proposes a deterministic low-complexity approximate DCT technique that accurately configures the size of the transform matrix ( ${T}$ ) according to the number of retained coefficients in the zigzag scanning process. This is achieved by establishing the relationship between the number of retained coefficients and the number of rows of the ${T}$ matrix. The proposed technique, referred to as the zigzag low-complexity approximate DCT (ZLCADCT), when compared with the approximate DCT (ADCT), decreases the number of addition operations and the energy consumption while retaining the PSNR of the compressed image. In addition, the ZLCADCT eliminates the zigzag scanning process used in the ADCT. Moreover, to characterize the deterministic operation of the ZLCADCT, a detailed mathematical model is provided. A hardware platform based on FPGAs is then utilized to experimentally assess and compare the proposed technique; being modular, deterministic, low-latency, and scalable, the proposed technique can be implemented upon any change in the number of retained coefficients by realizing only a partial reconfiguration of the FPGA resources for the additional required hardware. The extensive simulation and experimental results show superior performance compared to previous ADCT techniques under different metrics.
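The row-count relationship the ZLCADCT exploits can be made concrete: the sketch below derives, for an assumed 8x8 block, how many rows of T are touched by the first retained zigzag coefficients. It is an illustration of the stated relationship, not the paper's model.

```python
def zigzag(n=8):
    """(row, col) positions of an n x n block in JPEG zigzag scan order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else -p[0]))

def rows_of_T_needed(retained, n=8):
    """Keeping only the first 'retained' zigzag coefficients bounds the
    highest row index of the transform matrix T that is ever used."""
    return 1 + max(r for r, _ in zigzag(n)[:retained])

print(rows_of_T_needed(10))   # the first 10 coefficients touch only 4 rows of T
```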

17 citations


Proceedings ArticleDOI
13 May 2019
TL;DR: Approximate adders and multipliers are evaluated and compared for a better understanding of their characteristics when the implementations are optimized for performance or power.
Abstract: Taking advantage of the error resilience in many applications as well as the perceptual limitations of humans, numerous approximate arithmetic circuits have been proposed that trade off accuracy for higher speed or lower power in emerging applications that exploit approximate computing. However, characterizing the various approximate designs for a specific application under certain performance constraints becomes a new challenge. In this paper, approximate adders and multipliers are evaluated and compared for a better understanding of their characteristics when the implementations are optimized for performance or power. Although simple truncation can effectively reduce the hardware of an arithmetic circuit, it is shown that some other designs perform better in speed, power and power-delay product. For instance, many approximate adders have a higher performance than a truncated adder. A truncated multiplier is faster but consumes a higher power than most approximate designs for achieving a similar mean error magnitude. The logarithmic multipliers are very fast and power-efficient at a lower accuracy. Approximate multipliers can also be generated by an automated process to be very efficient while ensuring a sufficiently high accuracy.
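As one concrete member of the logarithmic family discussed above, Mitchell's classic multiplier is sketched below; it is a textbook design used here for illustration, not necessarily one of the specific circuits the paper evaluates.

```python
def mitchell_multiply(a: int, b: int) -> int:
    """Mitchell's logarithmic multiplier: approximate log2 of each operand
    piecewise-linearly, add the two logs, then take a piecewise-linear
    antilog. No partial products are generated at all."""
    if a == 0 or b == 0:
        return 0
    ka, kb = a.bit_length() - 1, b.bit_length() - 1
    xa = (a - (1 << ka)) / (1 << ka)      # mantissa fraction in [0, 1)
    xb = (b - (1 << kb)) / (1 << kb)
    s = ka + kb + xa + xb                 # approximate log2 of the product
    k = int(s)
    return int((1 << k) * (1 + s - k))    # piecewise-linear antilog

print(mitchell_multiply(100, 200), 100 * 200)   # 18432 vs 20000 (~8% low)
```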

Journal ArticleDOI
TL;DR: First, it is shown that the properties of signed integer multiplication (two's complement format) can be used to make RPR more efficient, and its principles are extended to the MAC operation by proposing RPR implementations that improve the error correction capabilities with a limited impact on circuit overhead.
Abstract: Multiply and Accumulate (MAC) is one of the most common operations in modern computing systems. It is used, for example, in matrix multiplication and in new computational environments such as those executed on neural networks for deep machine learning. MAC is also used in critical systems that must operate reliably, such as object recognition for vehicles. Therefore, MAC implementations must be able to cope with errors that may be caused, for example, by radiation. A common scheme to deal with soft errors in arithmetic circuits is the use of Reduced Precision Redundancy (RPR). RPR, instead of replicating the entire circuit, uses reduced precision copies, which significantly reduces the overhead while still being able to correct the largest errors. This paper considers the implementation of RPR Multiply and Accumulate circuits. First, it is shown that the properties of signed integer multiplication (two's complement format) can be used to make RPR more efficient. Then its principles are extended to the MAC operation by proposing RPR implementations that improve the error correction capabilities with a limited impact on circuit overhead. The proposed schemes have been implemented and tested. The results show that they can significantly reduce the Mean Square Error (MSE) at the output when the circuit is affected by a soft error, and the implementation overhead of the proposed schemes is extremely low.
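The core RPR decision can be sketched compactly: compare the full-precision result against a reduced-precision copy and fall back to the copy when they diverge beyond the truncation bound. The widths and injected error below are illustrative assumptions; the paper's MAC-specific optimizations are not modeled.

```python
def rp_copy(x, width=16, rp_bits=8):
    """Reduced-precision copy: the (width - rp_bits) LSBs are truncated."""
    t = width - rp_bits
    return (x >> t) << t

def rpr_output(fp_result, rp_result, width=16, rp_bits=8):
    """RPR decision: if the full-precision unit strays from the RP copy by
    more than the truncation bound, a soft error is assumed to have hit the
    FP unit and the RP value is output instead."""
    bound = (1 << (width - rp_bits)) - 1  # largest value the dropped LSBs can hold
    return fp_result if abs(fp_result - rp_result) <= bound else rp_result

exact = 41_813
corrupted = exact ^ (1 << 14)                    # injected single-bit soft error
print(rpr_output(corrupted, rp_copy(exact)))     # falls back to the RP copy
```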

Journal ArticleDOI
TL;DR: In this article, the authors proposed the design, fabrication, experimental implementation, and equivalent circuit model of a magneto-electric (ME) sensing device for determining the magnitude and direction of low-frequency ac magnetic fields.
Abstract: We propose the design, fabrication, experimental implementation, and equivalent circuit model of a magneto-electric (ME) sensing device for determining the magnitude and direction of low-frequency ac magnetic fields. The device consists of two capacitive ME thin-film sensors fabricated by means of the laser ablation technique inside a pulsed laser deposition (PLD) chamber. The films are grown on a 500-nm thick indium tin oxide layer on $1.5\times 1.5$ cm$^2$ silicon substrates using two deposition methods, namely, single target (ST) and multiple target (MT). The proposed setup is used to detect ac magnetic fields generated from a solenoid coil located at a distance of 10 mm. The experimental results demonstrate that the proposed magnetic field sensing approach using the MT method can establish the magnitude and direction of external magnetic fields with detection errors below 8%. Based on the experimental observations, we also establish a mathematical expression describing the direct ME coupling effect observed in M-type strontium hexaferrite thin films. Subsequently, we develop an electrical circuit model that can accurately predict device behavior using circuit simulations. The differential output voltages at different field strengths are predicted for both ST and MT films by simulating this equivalent circuit using the LTspice software from Linear Technology. The simulation results are in good agreement with the experimental data both qualitatively and quantitatively, and the average difference between the two ranges from 5% to 19% in the worst case.

Journal ArticleDOI
TL;DR: Results show that OBP reduces the encoding and error detection circuitry complexity and delay, while TBP additionally reduces the number of parity bits for some configurations; therefore, OBP and TBP can be efficient alternatives for detection of limited magnitude errors in MLC memories that use a binary encoding of levels to bits.
Abstract: Emerging memory technologies rely on Multilevel Cells (MLC) to achieve high density; the use of multiple levels per cell allows storage of multiple bits, but it also reduces the margins and makes it error prone. Error control codes (including error correction and detection codes) can be used to protect MLC memories from errors; however, most existing coding schemes have been designed for traditional binary memories (so storing a single bit). In MLC memories, errors cause a change from a level to an adjacent level or to the next one (depending on the employed technology), so they are often referred to as limited magnitude errors. For a binary coding of levels to bits, these limited magnitude errors can corrupt several bits making traditional coding schemes inefficient. In this paper, error detection of MLC memories is considered when a binary encoding of levels to bits is used and two new schemes are proposed: One-Bit Parity (OBP) and Two-Bit Parity (TBP). The first scheme targets errors of magnitude-1 for detection using a single parity bit that checks only one bit per cell. The second scheme detects both magnitude-1 and -2 errors using only two parity bits. Both schemes are compared to existing alternatives, namely Gray coding combined with a single parity bit (GP) for OBP and Interleaved Parity (IP) for TBP. The results show that OBP reduces the encoding and error detection circuitry complexity and delay, while TBP additionally reduces the number of parity bits for some configurations. Therefore, OBP and TBP can be efficient alternatives for detection of limited magnitude errors in MLC memories that use a binary encoding of levels to bits.
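The OBP idea for magnitude-1 errors follows from a simple fact: with a binary level-to-bits mapping, a level change of ±1 always flips the cell's LSB. The sketch below (cell values are arbitrary) demonstrates detection with a single parity bit; it is an illustration of the stated principle, not the paper's circuitry.

```python
def obp_parity(levels):
    """OBP sketch: with a binary level-to-bits mapping, a magnitude-1 level
    change always flips a cell's LSB, so one parity bit computed over the
    LSB of every cell detects any single magnitude-1 error."""
    p = 0
    for level in levels:
        p ^= level & 1          # only one bit per cell is checked
    return p

cells = [3, 1, 0, 2]            # stored levels of four 2-bit cells
stored_parity = obp_parity(cells)
cells[2] += 1                   # magnitude-1 error in one cell
assert obp_parity(cells) != stored_parity   # the mismatch flags the error
```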

Proceedings ArticleDOI
17 Jul 2019
TL;DR: A new metric referred to as the influence factor is defined and used to assess the approximate radix-4 Booth algorithm for different sizes of multipliers, and it is shown that the proposed designs offer superior performance.
Abstract: Approximate computing at the nanoscale provides a new approach for low power design for error-tolerant applications. Many emerging nanotechnologies are based on majority logic (ML), and therefore the 3-input majority gate has been used as the basic building block in digital circuit design. In this paper, we consider the design of approximate radix-4 Booth multipliers based on ML. In particular, an approximate partial product encoder and an approximate correction term encoder are proposed. A new metric referred to as the influence factor is defined and used to assess the approximate radix-4 Booth algorithm for different sizes of multipliers. The proposed designs are evaluated using hardware metrics as well as error metrics. It is shown that they offer superior performance. Image processing is also presented as a case study of error-tolerant applications to show the validity of the proposed designs.
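For context, the snippet below shows the 3-input majority gate and the standard exact full adder built from three of them plus two inverters, the kind of ML building block the approximate Booth designs start from; it is not the paper's approximate encoder.

```python
def maj(a, b, c):
    """3-input majority gate, the primitive of many emerging nanotechnologies."""
    return (a & b) | (b & c) | (a & c)

def ml_full_adder(a, b, cin):
    """Standard exact full adder in majority logic: three majority gates
    plus two inverters."""
    cout = maj(a, b, cin)
    s = maj(1 - cout, maj(a, b, 1 - cin), cin)
    return s, cout

# Exhaustive check against the Boolean full adder.
for v in range(8):
    a, b, cin = (v >> 2) & 1, (v >> 1) & 1, v & 1
    assert ml_full_adder(a, b, cin) == ((a + b + cin) & 1, (a + b + cin) >> 1)
```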

Proceedings ArticleDOI
01 Oct 2019
TL;DR: Results show that the proposed architecture for 8-bit matrix multiplication with an approximation factor $\alpha=7$ has lower power consumption than existing inexact designs found in the technical literature with comparable NMED.
Abstract: Matrix multiplication (MM) is a basic operation for many Digital Signal Processing applications. A Systolic Array (SA) is often considered one of the most favorable architectures to achieve high performance for matrix multiplication. In this paper, the design exploration for an approximate SA is pursued; three design schemes are proposed by introducing approximation in multiple sub-modules. An approximation factor $\alpha$ is introduced; it is related to the inexact columns in the SA to explore the accuracy-efficiency trade-off present in the proposed designs. In the evaluation, an 8-bit input operand matrix multiplication is considered; the Synopsys Design Compiler at the 45nm technology node is used to establish hardware-related metrics. The Error Rate (ER), Normalized Mean Error Distance (NMED) and Mean Relative Error Distance (MRED) are used as figures of merit for error analysis. Results show that the proposed architecture for 8-bit matrix multiplication with an approximation factor $\alpha=7$ has lower power consumption than existing inexact designs found in the technical literature with comparable NMED. In addition, a power-delay product vs NMED analysis shows that the proposed designs have a lower PDP and are thus applicable to low-power applications. The practicality of the proposed architecture is established by computing the Discrete Cosine Transform.
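A behavioural (not cycle-accurate) reading of the column-wise approximation is sketched below: PEs producing the first alpha output columns use an inexact multiplier. The truncated multiplier is a stand-in assumption; the paper's sub-module approximations differ.

```python
def exact_mult(x, y):
    return x * y

def trunc_mult(x, y, t=4):
    """Stand-in inexact PE multiplier: operands lose their t LSBs."""
    return ((x >> t) * (y >> t)) << (2 * t)

def approx_systolic_mm(A, B, alpha):
    """Behavioural model: PEs producing the first 'alpha' output columns use
    the inexact multiplier; the remaining columns stay exact."""
    n = len(A)
    return [[sum((trunc_mult if j < alpha else exact_mult)(A[i][k], B[k][j])
                 for k in range(n))
             for j in range(n)] for i in range(n)]

A = [[(3 * i + j + 1) * 17 for j in range(4)] for i in range(4)]
B = [[(2 * i + j + 1) * 23 for j in range(4)] for i in range(4)]
print(approx_systolic_mm(A, B, alpha=2))   # left columns approximate, right exact
```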

Journal ArticleDOI
TL;DR: A novel N-modular Reduced Precision Redundancy (NRPR) scheme is proposed with a simple comparison-based approach, and a probabilistic analysis is pursued to determine the conditions under which RPR data is provided as output; it is shown that this probability is very small.
Abstract: Information is an integral part of the correct and reliable operation of today's computing systems. Data either stored or provided as input to computation processing modules must be tolerant to many externally and internally induced destructive phenomena, such as soft errors and faults, often of a transient nature but also in large numbers, thus causing catastrophic system failures. Together with error tolerance, reliable operation must be provided by reducing the large overheads often encountered at system-level when employing redundancy. While information-based techniques can also be used in some of these schemes, the complexity and limited capabilities for implementing high order correction functions for decoding limit their application due to poor performance; therefore, N Modular Redundancy (NMR) is often employed. In NMR the correct output is given by majority voting among the N input copies of data. Reduced Precision Redundancy (RPR) has been advocated to reduce the redundancy, mostly for the case of N = 3; in a 3RPR scheme, one full precision (FP) input is needed while two inputs require reduced precision (RP) (usually by truncating some of the least significant bits (LSBs) in the input data). However, its decision logic is more complex than a 3MR scheme. This paper proposes a novel NRPR scheme with a simple comparison-based approach; the realistic case of N = 5 is considered as an example to explain the proposed scheme in detail; different arrangements for the redundancy (with three or four FP data copies) are considered. In addition to the design of the decision circuit, a probabilistic analysis is also pursued to determine the conditions by which RPR data is provided as output; it is shown that this probability is very small. Different applications of the proposed NRPR system are presented; in these applications, data is used either as memory output and/or for computing the discrete cosine transform. In both cases, the proposed 5RPR scheme shows considerable advantages in terms of redundancy management and reliable image processing.
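A comparison-based decision of the kind described can be sketched for the three-FP-copy arrangement; the selection logic, widths, and example values below are illustrative assumptions rather than the paper's decision circuit.

```python
def rpr5_output(fp_copies, rp_copies, lsb_bits=8):
    """Comparison-based 5RPR sketch (three FP plus two RP copies): any two
    agreeing FP copies win; only when all FP copies pairwise disagree (a
    very low-probability multi-error event) is an RP copy given as output."""
    for i in range(len(fp_copies)):
        for j in range(i + 1, len(fp_copies)):
            if fp_copies[i] == fp_copies[j]:
                return fp_copies[i]
    return rp_copies[0] << lsb_bits        # re-align the truncated RP copy

value = 0xBEEF
fp = [value, value ^ (1 << 12), value]     # one FP copy hit by a soft error
rp = [value >> 8, value >> 8]
print(hex(rpr5_output(fp, rp)))            # 0xbeef: the FP majority prevails
```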

Journal ArticleDOI
TL;DR: The peak signal-to-noise ratio (PSNR) results for the addition of two images show that the inexact full adder achieves a higher output image quality than the exact circuit when the frequency is scaled up.
Abstract: This study presents frequency upscaling as a technique for developing error-resilient arithmetic designs in approximate computing, whereby the input signal frequency of the circuit is upscaled beyond its largest operating value, generating errors in the arithmetic operation while speeding up the computational throughput. This study initially presents the mathematical modelling of frequency upscaling for both exact and inexact full adders. An exhaustive simulation and evaluation of 4- and 8-bit subtraction, followed by the addition of two images and an approximate discrete cosine transform (DCT), is pursued using exact and inexact circuits when subjected to the proposed technique. The results estimated using the proposed model show good agreement with the simulation results. The normalised mean error distance of subtraction using an inexact circuit is close to the exact value for different technology nodes. The peak signal-to-noise ratio (PSNR) results for the addition of two images show that the inexact full adder achieves a higher output image quality than the exact circuit when the frequency is scaled up. Also, in an approximate DCT, the input frequency of an inexact full adder can be scaled up significantly higher than an exact full adder without a significant decrease in PSNR value.
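One way to model frequency upscaling in software is to cap how far a carry can ripple before the outputs are latched; the sketch below uses that abstraction (the 'settled' depth stands in for the ratio of clock period to critical path), not the paper's transistor-level timing model.

```python
def upscaled_add(a, b, settled, width=8):
    """Behavioural sketch of frequency upscaling on a ripple-carry adder: the
    shortened clock period lets each carry ripple through at most 'settled'
    stages before the outputs are latched; longer chains are read as 0."""
    out = 0
    for i in range(width + 1):
        carry = 0
        for j in range(max(0, i - settled), i):   # only the last stages settle
            aj, bj = (a >> j) & 1, (b >> j) & 1
            carry = (aj & bj) | ((aj ^ bj) & carry)
        out |= (((a >> i) & 1) ^ ((b >> i) & 1) ^ carry) << i
    return out

# A 3-stage carry chain fails when only 2 stages settle, but not at 3.
print(upscaled_add(0b0111, 0b0001, settled=2))   # 0 (timing error)
print(upscaled_add(0b0111, 0b0001, settled=3))   # 8 (correct)
```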

Journal ArticleDOI
TL;DR: The application of the proposed MLGs to design fast decoders for one-step ML decodable (OS-MLD) codes is presented; the results show that the proposedMLGs are very efficient circuits for this coding application.
Abstract: The majority logic (ML) gate (MLG) is required in fast decoder implementations to protect memories from transient soft errors. In this paper, a novel MLG design is proposed; it consists of a pMOS pull-up network, an nMOS pull-down network, and an inverter. The proposed design is applicable to an arbitrary number of inputs $\gamma $ (operating as a mirror circuit when $\gamma $ is odd). The proposed designs require only a small number of transistors; when simulated, they offer improved metrics such as reductions in delay, area, and power dissipation compared with existing designs found in the technical literature. When the combined power-delay-area product (PDAP) is considered, the advantages of the proposed designs are pronounced. The application of the proposed MLGs to design fast decoders for one-step ML decodable (OS-MLD) codes is also presented; the results show that the proposed MLGs are very efficient circuits for this coding application.
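The $\gamma$-input MLG and its decoding role can each be captured in a couple of lines; the orthogonal check sums below are illustrative inputs, not a specific code from the paper.

```python
def mlg(inputs):
    """Gamma-input majority logic gate (gamma odd): outputs 1 iff more than
    half of the inputs are 1."""
    return int(sum(inputs) > len(inputs) // 2)

def osmld_correct_bit(received_bit, orthogonal_checks):
    """One-step ML decoding: flip a received bit when the majority of the
    parity-check sums orthogonal on it signal an error."""
    return received_bit ^ mlg(orthogonal_checks)

print(osmld_correct_bit(0, [1, 1, 0, 1, 0]))   # 3 of 5 checks fail -> corrected to 1
```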

Journal ArticleDOI
TL;DR: A new method to construct Double Error Correction (DEC) OS-MLD codes is presented; it provides codes that require fewer parity check bits than existing codes such as Orthogonal Latin Square (OLS) codes. The generalization of the proposed scheme to codes with larger error correction capabilities is also discussed.
Abstract: Error Correction Codes (ECCs) are commonly used to protect memories against soft errors with an impact on memory area and delay. For large memories, the area overhead is mostly due to the additional cells needed to store the parity check bits. In terms of delay, the overhead is mostly needed to detect and correct errors when the data is read from the memory. Most ECCs that can correct more than one error have a complex decoding process and so are limited in high speed memory applications. One exception is One Step Majority Logic Decodable (OS-MLD) codes, for which decoding can be done in parallel at high speed. Unfortunately, only a few OS-MLD codes exist, providing a limited choice in terms of block sizes, error correction capabilities, and code rate. Therefore, there is considerable interest in novel constructions of OS-MLD codes to provide additional choices for protecting memories. In this paper, a new method to construct Double Error Correction (DEC) OS-MLD codes is presented. This method is based on the use of parity check matrices in which two bits have at most two parity check equations in common; the proposed method provides codes that require a smaller number of parity check bits than existing codes like Orthogonal Latin Square (OLS) codes. The drawback of the proposed Two Bit Overlap (TBO) codes is that they require slightly more complex decoding than OLS codes. Therefore, they provide an intermediate solution between OLS and non-OS-MLD codes in terms of decoding delay and number of parity check bits. The proposed TBO codes have been implemented for some block sizes and compared to both OLS and BCH codes to illustrate the trade-off in delay and memory overhead. Finally, this paper discusses the generalization of the proposed scheme to codes with larger error correction capabilities.
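The structural property behind the Two Bit Overlap construction is easy to verify for a given parity check matrix; the checker below uses a small toy matrix for illustration, not a real DEC OS-MLD code from the paper.

```python
from itertools import combinations

def max_column_overlap(H):
    """Largest number of parity checks shared by any two columns of H; the
    proposed construction keeps this at two (OLS codes keep it at one)."""
    cols = list(zip(*H))
    return max(sum(x & y for x, y in zip(u, v))
               for u, v in combinations(cols, 2))

H = [[1, 1, 0, 0],      # a 4 x 4 toy matrix, not a real DEC OS-MLD code
     [1, 1, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 1]]
print(max_column_overlap(H))   # 2
```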

Proceedings ArticleDOI
01 Oct 2019
TL;DR: Several Spotty codes that can correct double bit errors in 4-bit cells are designed and evaluated; they require fewer parity bits than existing Spotty codes or symbol-based codes such as Hong-Patel codes, which reduces the size of the memory while keeping encoding and decoding complexity similar to existing alternative codes.
Abstract: Non-volatile emerging Multilevel Cell (MLC) memories (such as magneto electric, magnetic resistive, memristor-based and phase change memories) are attractive to increase density. A key advantage of these memories is that they can store several bits per cell by using different levels. This however reduces the margins against noise and other effects and can lead to larger error rates. Errors in MLC memories are usually limited to magnitude-2 levels, and thus corrupt one or two bits per cell when using a Gray mapping from levels to bits. This enables the use of codes that can correct those error patterns in a memory cell instead of codes that correct all possible patterns in the cell, thus reducing complexity and cost. In this paper, the case of a 64 data bit memory built using memory cells that can store four bits and suffer up to double bit errors per cell is considered. Several (72, 64) Spotty codes that can correct double bit errors in 4-bit cells are designed and evaluated. The new codes require fewer parity bits than existing Spotty codes or symbol-based codes such as Hong-Patel codes. Therefore, they reduce the size of the memory while having encoding and decoding complexity similar to existing alternative codes.
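The limited-magnitude error model motivating these codes can be verified directly: under a Gray level-to-bits mapping, magnitude-1 and magnitude-2 level errors flip at most one and two bits, respectively, of a 4-bit cell. The snippet below demonstrates this; it illustrates the error model, not the code construction.

```python
def gray(n):
    """Binary-reflected Gray code of level n."""
    return n ^ (n >> 1)

def bits_flipped(level, magnitude):
    """Number of cell bits corrupted when a stored level moves by
    'magnitude' adjacent levels under a Gray level-to-bits mapping."""
    return bin(gray(level) ^ gray(level + magnitude)).count("1")

print([bits_flipped(lv, 1) for lv in range(15)])   # always 1 bit
print([bits_flipped(lv, 2) for lv in range(14)])   # at most 2 bits
```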

Journal ArticleDOI
TL;DR: This paper presents a new write latency reduction scheme for a Phase Change Memory (PCM) made of Multi-Level Cells (MLCs) that improves over an existing scheme found in the technical literature and known as CABS.
Abstract: This paper presents a new write latency reduction scheme for a Phase Change Memory (PCM) made of Multi-Level Cells (MLCs). This scheme improves over an existing scheme found in the technical literature and known as CABS. The proposed scheme is based on the utilization of a new coding arrangement for the selection of candidate codewords. The code relies on the two-step feature found in the write operation of a MLC PCM and avoids the symbol that incurs the largest latency at a higher rate than CABS. A detailed simulation-based evaluation and comparison are also pursued; the proposed scheme accomplishes improvements in write latency (for parallel writing) as well as coding rate (16/17 for the proposed scheme versus 16/18 for CABS for 16 symbols or a 32-bit word). As the proposed scheme utilizes novel selection criteria for the candidates, the design of the required circuitry (encoder and decoder) has also been changed with respect to CABS; in terms of hardware, the areas of the encoder and decoder for the proposed scheme are reduced by 73 and 56 percent, respectively, compared with CABS.
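The general flavour of such latency-aware coding (choose, per word, a representation with fewer worst-latency symbols and record the choice in a flag) can be sketched as below; the symbol values, transform, and resulting rate are hypothetical and do not reproduce the paper's 16/17-rate code.

```python
WORST = 0b11                       # hypothetical highest-latency MLC symbol

def count_worst(word, n_sym=16):
    return sum(((word >> (2 * i)) & 0b11) == WORST for i in range(n_sym))

def encode(word, n_sym=16):
    """Pick whichever of the data word or a cheap symbol-wise transform of it
    contains fewer worst-case symbols, and record the choice in a flag bit."""
    alt = word ^ int("01" * n_sym, 2)  # XOR moves every 0b11 symbol off WORST
    return (alt, 1) if count_worst(alt, n_sym) < count_worst(word, n_sym) else (word, 0)

def decode(word, flag, n_sym=16):
    return word ^ int("01" * n_sym, 2) if flag else word

data = 0xFFFF_0000                 # eight worst-latency symbols in the top half
coded, flag = encode(data)
assert decode(coded, flag) == data
print(count_worst(data), "->", count_worst(coded))   # 8 -> 0
```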