Journal Article
Transient fault tolerance in digital systems
TL;DR: This framework provides a basis for understanding transient fault problems in digital systems and can be helpful in selecting optimum techniques to mask or eliminate transient fault effects in developed systems.
Abstract:
Shielding systems effectively from transient faults (fault avoidance) is hard, so other means must be employed to assure an appropriate level of transient fault tolerance, that is, insensitivity to transient faults. These means are based on fault masking and fault recovery. Having analyzed the problem, the author identifies critical design points and outlines practical solutions built on efficient on-line detectors, which detect errors during system operation, and on error handling procedures. The resulting framework provides a basis for understanding transient fault problems in digital systems and can help in selecting optimum techniques to mask or eliminate transient fault effects in developed systems.
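As a rough illustration of the two ideas the abstract names, the sketch below shows fault masking as a majority vote over replicated results and fault recovery as re-execution guarded by an on-line check. It is not taken from the article: the names `majority_vote`, `run_with_retry`, `DetectedError`, and the `check` callback are hypothetical, and the detector is assumed to be a simple predicate on the result.

```python
from collections import Counter
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class DetectedError(Exception):
    """Raised when the (hypothetical) on-line detector cannot accept any result."""


def majority_vote(results: Sequence[T]) -> T:
    """Fault masking: return the value produced by a strict majority of replicas."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise DetectedError("no majority among replica results")
    return value


def run_with_retry(operation: Callable[[], T],
                   check: Callable[[T], bool],
                   max_attempts: int = 3) -> T:
    """Fault recovery: re-execute until the detector accepts the result."""
    for _ in range(max_attempts):
        result = operation()
        if check(result):
            return result
    raise DetectedError(f"result still rejected after {max_attempts} attempts")


# One of three replicas is hit by a transient fault; the vote masks it.
masked = majority_vote([42, 42, 7])

# The first execution is corrupted; the retry succeeds because the fault is transient.
attempts = iter([-1, 42])
recovered = run_with_retry(lambda: next(attempts), check=lambda r: r >= 0)
```

Masking hides the fault at the cost of redundant work on every operation, whereas recovery pays only when the detector fires, so the choice between them depends on how cheap and how thorough the on-line check can be made.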
Citations
Journal Article
Fault and Error Tolerance in Neural Networks: A Review
TL;DR: A survey of fault tolerance in neural networks, mainly focusing on well-established passive techniques that exploit and improve, by design, this potential but limited intrinsic property of neural models, particularly feedforward neural networks.
Proceedings Article
ICR: in-cache replication for enhancing data cache reliability
TL;DR: This paper proposes a novel solution to this problem by allowing in-cache replication, wherein reliability can be enhanced without excessively slowing down cache accesses or requiring significant area cost increases.
Journal Article
Threshold-based mechanisms to discriminate transient from intermittent faults
TL;DR: A class of count-and-threshold mechanisms, collectively named α-count, which are able to discriminate between transient faults and intermittent faults in computing systems and adopt a mathematically defined structure, which is simple enough to analyze by standard tools.
Journal Article
Design Optimization of Time- and Cost-Constrained Fault-Tolerant Embedded Systems With Checkpointing and Replication
TL;DR: This work uses checkpointing with rollback recovery and active replication for tolerating transient faults, and presents several design optimization approaches which are able to find fault-tolerant implementations given a limited amount of resources.
Proceedings Article
Area efficient architectures for information integrity in cache memories
Seongwoo Kim, Arun K. Somani
TL;DR: This work focuses on transient fault tolerance in primary cache memories and develops new architectural solutions, to maximize fault coverage when the budgeted silicon area is not sufficient for the conventional configuration of an error checking code.
References
Book
Testing Semiconductor Memories: Theory and Practice
TL;DR: Contents: memory modeling; functional testing (reduced functional RAM chip model, functional RAM chip testing, functional ROM chip testing, functional memory array testing, functional memory board testing); electrical testing (parametric testing, dynamic testing); on-chip testing; conclusions; address line scrambling; various proofs; software package.
Journal Article
Concurrent error detection using watchdog processors-a survey
A. Mahmood, Edward J. McCluskey
TL;DR: It is shown that a large number of errors can be detected by monitoring control-flow and memory-access behavior, and two techniques for control-flow checking are discussed and compared with current error-detection techniques.
Proceedings Article
Evaluation of error detection schemes using fault injection by heavy-ion radiation
TL;DR: Several concurrent error detection schemes suitable for a watchdog processor were evaluated by fault injection, and soft errors were induced into an MC6809E microprocessor by heavy-ion radiation from a Californium-252 source to characterize the errors and determine coverage and latency for the various error detection schemes.