High-reliability fault tolerant digital systems in nanometric technologies: Characterization and design methodologies
Citations
Low-cost checkpointing in automotive safety-relevant systems
Pulse-length determination techniques in the rectangular single event transient fault model
Evaluating CLB designs under multiple SETs in SRAM-based FPGAs
Two soft-error mitigation techniques for functional units of DSP processors
Modelling and mitigation of soft-errors in CMOS processors
References
Computer Architecture: A Quantitative Approach
Content-addressable memory (CAM) circuits and architectures: a tutorial and survey
Trends and challenges in VLSI circuit reliability
Single event upset at ground level
Test Generation for Microprocessors
Frequently Asked Questions (19)
Q2. What future work is mentioned in the paper "High-reliability fault tolerant digital systems in nanometric technologies: characterization and design methodologies"?
These approaches, although developed within the project scenario, have been validated and evaluated independently but not integrated within a single framework; such an integration and an overall validation of the complete methodology are considered future work.
Q3. What is the meaning of Bloom filter?
Since the Bloom filter uses standard memories to store its data, the authors assume that the filter memory is protected by Error Correcting Codes.
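For context, a Bloom filter is a probabilistic set-membership structure: several hash functions each set one bit in a bit array, and a query reports "possibly present" only when all of those bits are set. The sketch below is a minimal illustration, not the implementation from the paper; the class name, the array size, the number of hashes, and the SHA-256-based hashing are all assumptions, and the ECC protection of the filter memory is not modeled.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; sizes and hashing scheme are illustrative."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(item))
```

In a fault-free filter, a stored item is never reported as absent. A soft error that clears one of its bits would break that guarantee, which is one reason to protect the underlying memory.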
Q4. What is the significance of the effects that could reduce the reliability of circuits?
While shrinking the minimum dimensions of integrated circuits down to tens of nanometers allows the integration of millions of gates on a single chip, it also increases the importance of effects that could reduce the reliability of circuits.
Q5. What are the main problems of the project?
Within the project, the authors considered the following faults:
• resistive bridgings and opens in local and global interconnects;
• transient faults due to energetic particles hitting the considered circuit;
• inductive and capacitive crosstalk among interconnects, considering both the interconnects within each block composing the SoPC and the interconnects among the different blocks;
• signal integrity issues due to power supply noise (mainly due to the simultaneous switching of high capacitive bus line drivers) and to degradation phenomena (mainly due to Negative Bias Temperature Instability, NBTI);
• problems due to clock faults, considering both the issues within the blocks composing the SoPC and the ones in the communication among the different blocks.
Q6. What is the second avenue of attack?
The second avenue of attack concerned Very Long Instruction Word (VLIW) processors, which are now commonly used in several signal processing applications, especially when power and performance constraints must be considered at the same time.
Q7. What is the role of the reconfiguration controller?
In particular, the reconfiguration controller is in charge of monitoring the error signals made available by the applied hardening techniques; when a problem is detected, it triggers a reconfiguration that recovers by re-programming the board with the original configuration.
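The monitor-and-recover behavior described above can be sketched as a simple polling loop. This is a software illustration only, under the assumption that error signals can be sampled once per cycle; `run_controller`, `read_error_signals`, and `reprogram` are hypothetical names, and the actual controller in the project is not specified at this level of detail.

```python
def run_controller(read_error_signals, reprogram, max_cycles):
    """Poll error signals; reload the original configuration on error.

    read_error_signals: hypothetical callable returning a list of booleans,
        one per hardened block (True = error detected).
    reprogram: hypothetical callable that reloads the original configuration.
    Returns the number of reconfigurations triggered.
    """
    reconfigurations = 0
    for _ in range(max_cycles):
        if any(read_error_signals()):
            reprogram()
            reconfigurations += 1
    return reconfigurations


# Simulated run: the second sample reports an error in one block.
samples = iter([[False, False], [False, True], [False, False]])
reloads = []
count = run_controller(lambda: next(samples), lambda: reloads.append("reload"), 3)
```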
Q8. What is the main issue to be addressed in the definition of methods and techniques to harden systems?
One of the main issues to be addressed in the definition of methods and techniques to harden systems implemented on SoPC is the identification of a set of fault models, for both permanent and transient faults, to be used as a starting point for the selection/development of proper fault tolerant techniques able to guarantee the desired level of reliability.
978-1-4673-3044-2/12/$31.00 © 2012 IEEE
Q9. What are the main effects of the reduced integration step?
In particular, the reduced integration step, the reduced supply voltage that lowers the noise immunity, the growing power needs, the eventual integration of both digital and analog circuits on the same chip, and the sharp growth in radiation sensitivity [1], [2], [3] require an accurate evaluation of the possible reliability reduction due to the occurrence of:
• permanent faults due to the aging of device materials [4], the interruption of metal interconnections due to electromigration [5], or the cracking of the transistor insulation oxide [6];
• transient faults, known as Single Event Effects (SEE), which are much more likely than in the past due to the reduced transistor sizes [3]: in particular, new technology devices are more prone to crosstalk in the interconnects and to radiation effects.
Q10. What is the sensitivity of multilevel cell flash memories?
MLC Flash memories are sensitive to alpha particles, whereas SLC devices do not show any sensitivity down to a feature size of 34 nm.
Q11. What is the main goal of the research?
The overall target of the research is the development of a design methodology for highly reliable systems realized on reconfigurable platforms based on a System-on-Programmable Chip (SoPC), as discussed in the next section.
Q12. What is the role of the CAM in modern microprocessors?
These structures play an important role in modern microprocessors [24], and the percentage of chip area devoted to their implementation is increasing; consequently, it is more likely that an error occurs in these structures.
Q13. What is the LET for retention errors?
The cross section for retention errors follows a Weibull curve: compared to HI upsets, retention errors have a threshold LET more than 10 times higher and a saturation cross section about two orders of magnitude smaller.
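For readers unfamiliar with the term, cross section versus LET data are commonly fitted with a four-parameter Weibull function: zero below a threshold LET, then rising toward a saturation value. The sketch below shows that functional form only; the function name and parameter names are assumptions, and no parameter values from the paper are implied.

```python
import math


def weibull_cross_section(let, sigma_sat, let_threshold, width, shape):
    """Four-parameter Weibull cross section as a function of LET.

    Returns 0 at or below the threshold LET and approaches sigma_sat
    (the saturation cross section) at high LET.
    """
    if let <= let_threshold:
        return 0.0
    x = (let - let_threshold) / width
    return sigma_sat * (1.0 - math.exp(-(x ** shape)))
```

In these terms, the statement above says that the retention-error curve has a `let_threshold` more than 10 times higher and a `sigma_sat` about two orders of magnitude smaller than the curve for HI upsets.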
Q14. What is the main advantage of the VLIW architecture?
Their regular architecture allows automating the generation of effective test code, once the architecture of the processor is known, thus making functional test practical for real applications [21], [22].
Q15. What is the advantage of the design for testability approach?
A further advantage of this approach lies in the fact that the test is performed at the same speed as normal operation, thus offering better defect coverage than other solutions.
Q16. What is the main contribution of the paper?
This paper reports the main contribution of a project devoted to the definition of techniques to design and evaluate fault tolerant systems implemented using the SoPC paradigm, suitable for mission- and safety-critical application environments.
Q17. What is the agreement between experimental data and simulations?
The best agreement between experimental data and simulations is obtained when energy deposition in the FG is considered, with strong implications for error rate calculations.
Q18. What is the first method used to detect and correct errors in CAM?
The first method combines the use of parity bits with a duplication and comparison scheme to detect and correct errors in CAM [25].
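One plausible way to combine parity with duplication and comparison is sketched below: each word is stored twice, each copy carries its own parity bit, a mismatch between the copies flags an error, and the copy whose parity still checks is used to repair the other. This is an illustrative reading under those assumptions, not the scheme of [25]; `parity` and `ProtectedEntry` are hypothetical names, and the match logic of a real CAM is not modeled.

```python
def parity(word):
    """Even-parity bit of a non-negative integer."""
    return bin(word).count("1") % 2


class ProtectedEntry:
    """One stored word kept in two copies, each with its own parity bit."""

    def __init__(self, word):
        self.copies = [[word, parity(word)], [word, parity(word)]]

    def read(self):
        """Return (word, status): 'ok', 'corrected' or 'uncorrectable'."""
        (w0, p0), (w1, p1) = self.copies
        ok0, ok1 = parity(w0) == p0, parity(w1) == p1
        if w0 == w1 and ok0 and ok1:
            return w0, "ok"
        if ok0 != ok1:
            # Exactly one copy passes its parity check: repair the other.
            good = self.copies[0] if ok0 else self.copies[1]
            self.copies = [list(good), list(good)]
            return good[0], "corrected"
        return None, "uncorrectable"
```

With this arrangement a single bit flip in either copy is detected by the comparison and located by the parity check, while a double flip within one copy passes parity and can only be reported as uncorrectable.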
Q19. What is the trend of embedding more complex microprocessors in SoPC?
The trend of embedding more complex microprocessors in SoPC has been confirmed in recent years by the announcement of novel devices from Xilinx and Altera with embedded Intel or ARM microprocessors.