Improving cache lifetime reliability at ultra-low voltages
Citations
Understanding Reduced-Voltage Operation in Modern DRAM Devices: Experimental Characterization, Analysis, and Mechanisms
Energy-efficient cache design using variable-strength error-correcting codes
Fault-tolerant iterative methods via selective reliability.
Dynamic reduction of voltage margins by leveraging on-chip ECC in Itanium II processors
Archipelago: A polymorphic cache design for enabling robust near-threshold operation
References
Fundamentals of Modern VLSI Devices
Error Control Coding
Modeling the effect of technology trends on the soft error rate of combinational logic
On a class of error correcting binary group codes
The impact of intrinsic device fluctuations on CMOS SRAM cell stability
Frequently Asked Questions (20)
Q2. What contributions have the authors mentioned in the paper "Improving cache lifetime reliability at ultra-low voltages"?
In this paper, the authors propose a novel adaptive technique to improve cache lifetime reliability and enable low voltage operation. Furthermore, MS-ECC's design can allow the operating system to adaptively change the cache size and ECC capability to adjust to system operating conditions.
Q3. What is the purpose of reducing Vccmin in the context of memory failures?
Reducing Vccmin in the context of memory failures is important for enabling ultra-low power modes that are more energy-efficient.
Q4. What is the way to improve cache reliability?
One solution to improve cache reliability is to implement true column and/or row redundancy by adding multiple spare rows and/or columns to the cache array [24].
Q5. What is the primary concern of the low-voltage mode?
As the low-voltage mode is normally used when the processor load is low, energy efficiency, rather than performance, is the primary concern.
Q6. How many errors are required to be fixed per segment?
Their evaluation shows that enabling ultra-low voltage operation requires three or more errors to be fixed per segment, even for small segment sizes.
Q7. How do the authors determine the probability of an erratic bit failure?
Since cell stability and oxide strength play a role in both erratic bit failures and persistent failures, the authors expect that the probability of an erratic bit failure will be proportional to the probability of a voltage-dependent, persistent failure.
Q8. Why do erratic cells appear as normal cells?
Due to their random nature, erratic cells may escape standard testing and appear as normal cells, but may cause bit failures later.
Q9. What is the critical path for the decoder?
The critical path for the decoder is ceil(log2(m)) levels of 2-input XOR, one level of 2:1 MUX, plus a (2t+1)-input majority function.
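The critical-path expression above can be evaluated with a short Python sketch. The function name and the returned field names are hypothetical, and `m` and `t` are simply the two parameters that appear in the formula; the values m = 8 and t = 4 in the usage note are illustrative only and are not taken from the answer above.

```python
def decoder_critical_path(m: int, t: int) -> dict:
    """Count the logic stages named in the critical-path formula.

    m and t are the parameters of the formula; this sketch does not
    assume anything else about the code construction.
    """
    if m < 1 or t < 0:
        raise ValueError("m must be >= 1 and t must be >= 0")
    # (m - 1).bit_length() equals ceil(log2(m)) for every integer m >= 1,
    # and avoids floating-point rounding.
    xor_levels = (m - 1).bit_length()
    return {
        "xor2_levels": xor_levels,      # levels of 2-input XOR
        "mux2_levels": 1,               # one level of 2:1 MUX
        "majority_inputs": 2 * t + 1,   # fan-in of the majority function
    }
```

For example, `decoder_critical_path(8, 4)` gives three XOR levels, one MUX level, and a 9-input majority function.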
Q10. How is the probability of erratic failures sensitive to supply voltage?
Since the authors model erratic failures as a fixed proportion of persistent failures, the probability of erratic failures is also sensitive to supply voltage.
Q11. How many errors can be corrected in low voltage mode?
To enable ultra-low voltage operation, MS-ECC needs to use an error correction code whose complexity scales well with the number of error corrections.
Q12. What is the role of erratic bit failures in the future?
Erratic bit failures have played a key role in setting Vccmin in the past, and they are likely to re-emerge as a reliability concern in the future [1, 6].
Q13. Why is the MS-ECC technique so important?
Because the size of each segment is smaller than the entire cache line, the latency and complexity for MS-ECC are significantly less than for conventional (un-segmented) ECC.
Q14. What is the minimum voltage for a die to operate reliably?
These variations restrict voltage scaling to a minimum value, often called Vccmin (or Vmin), which is the minimum supply voltage for a die to operate reliably.
Q15. How much increase is the soft error rate for a given voltage?
While previous measurements show that soft error rates increase exponentially with reduction in supply voltage, the rate of increase is limited to 2.5x-3x for every 500 mV decrease in supply voltage.
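As a rough illustration of the scaling quoted above, the following Python sketch assumes a purely exponential dependence in which the soft error rate is multiplied by a fixed factor (2.5 to 3) for each 500 mV of voltage reduction. The function name and the interpolation between 500 mV steps are assumptions made here, not the authors' model.

```python
def ser_multiplier(voltage_drop_mv: float, factor_per_500mv: float) -> float:
    """Relative soft error rate after lowering the supply by voltage_drop_mv.

    Assumes the rate grows by factor_per_500mv for every 500 mV decrease
    (exponential in the voltage drop). A result of 1.0 means no change.
    """
    if voltage_drop_mv < 0 or factor_per_500mv <= 0:
        raise ValueError("expected a non-negative drop and a positive factor")
    return factor_per_500mv ** (voltage_drop_mv / 500.0)
```

Under this assumption, a 250 mV drop with a factor of 3 raises the soft error rate by about 1.7x, which stays well below the growth one would see from a steeper exponential.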
Q16. How many errors can be corrected for each 64-bit segment?
This mechanism can correct 1-4 errors for each 64-bit segment, where a higher correction capability increases reliability at the expense of sacrificing a bigger percentage of the cache size.
Q17. How many random bits can a cache handle?
It cannot deal with thousands of randomly distributed cache bits in large caches which would become defective in the low voltage mode due to high cell failure rates.
Q18. How many bits can be corrected in a 64-bit segment?
This is intuitive since BFXECC can only correct one erratic bit or soft error in every 512 bits, while MS-ECC can correct up to four such errors in each 64-bit segment if that segment contains no persistent failures.
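The intuition above can be checked numerically with a minimal Python sketch. It assumes independent bit errors with a hypothetical per-bit probability `p`, a 512-bit line, and no persistent failures in any segment; the function names are illustrative, and this is not the authors' failure model.

```python
from math import comb


def prob_at_most(n: int, p: float, t: int) -> float:
    """P(at most t of n independent bits are in error), binomial CDF."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t + 1))


def line_ok_single_code(p: float, line_bits: int = 512, t: int = 1) -> float:
    """One code over the whole line that corrects up to t errors."""
    return prob_at_most(line_bits, p, t)


def line_ok_segmented(p: float, line_bits: int = 512,
                      segment_bits: int = 64, t: int = 4) -> float:
    """Independent codes per segment, each correcting up to t errors."""
    segments = line_bits // segment_bits
    return prob_at_most(segment_bits, p, t) ** segments
```

With a hypothetical `p = 1e-3`, `line_ok_single_code(p)` is roughly 0.906, while `line_ok_segmented(p)` is very close to 1, which matches the comparison in the answer.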
Q19. How can the authors reduce the number of written-back lines?
While the authors can decrease the number of written-back lines by either controlling the placement of dirty lines or choosing ECC ways dynamically, the authors leave such optimizations to future work.
Q20. How does the model for erratic bit failures work?
To account for the impact of FIT rate on Vccmin, the authors use a comprehensive model for cache failures that includes the impact of voltage-dependent, persistent failures on yield loss as well as the impact of soft error rates (SER) and erratic bits on FIT rate.