New Techniques for Improving the Performance of the Lockstep Architecture for SEEs Mitigation in FPGA Embedded Processors
TLDR
A non-invasive approach for implementing fault-tolerant systems based on COTS processors embedded in FPGAs, using lockstep in conjunction with checkpoint and rollback recovery, is presented.
Abstract
The growing availability of embedded processors inside FPGAs provides unprecedented flexibility for system designers. The use of such devices for space or mission-critical applications, however, is being delayed by the lack of effective low-cost techniques to mitigate radiation-induced errors. This paper presents a non-invasive approach for implementing fault-tolerant systems based on COTS processors embedded in FPGAs, using lockstep in conjunction with checkpoint and rollback recovery. The proposed approach does not require modifications to the processor architecture or to the application software. The experimental validation of this approach through fault injection is described and the corresponding results are discussed. The addition of a write history table is then proposed and evaluated as a means to reduce the performance overhead imposed by previous implementations.
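The mechanism named in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the `Core` class, the `run_lockstep` function, the toy instruction format, and the bit-flip fault model are all hypothetical. The write history is assumed here to log the previous value of each memory location written since the last checkpoint, so that a rollback undoes only those writes; the table in the paper may be organized differently.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Toy processor model: a small register file plus a program counter."""
    regs: list = field(default_factory=lambda: [0, 0])
    pc: int = 0

    def step(self, program):
        """Execute one instruction; return the (address, value) it writes."""
        addr, inc = program[self.pc]
        self.regs[0] += inc          # accumulate into register 0
        self.pc += 1
        return addr, self.regs[0]


def run_lockstep(program, checkpoint_every=2, fault_at=None):
    """Run two cores in lockstep with checkpoint and rollback recovery.

    fault_at: optional step count at which one bit of core B is flipped once
    (a hypothetical stand-in for a radiation-induced upset).
    Returns (memory, number_of_rollbacks).
    """
    a, b = Core(), Core()
    memory = {}
    saved = (list(a.regs), a.pc)     # last checkpoint of the processor context
    history = []                     # assumed write history: (addr, old value)
    rollbacks, steps = 0, 0

    while a.pc < len(program):
        steps += 1
        if steps == fault_at:
            b.regs[0] ^= 0x4         # inject a single-bit upset into core B
        out_a, out_b = a.step(program), b.step(program)

        if out_a != out_b:           # checker: the two cores disagree
            for addr, old in reversed(history):   # undo writes since checkpoint
                if old is None:
                    memory.pop(addr, None)
                else:
                    memory[addr] = old
            history.clear()
            for core in (a, b):      # restore both cores from the checkpoint
                core.regs, core.pc = list(saved[0]), saved[1]
            rollbacks += 1
            continue

        addr, value = out_a          # outputs agree: commit the write
        history.append((addr, memory.get(addr)))
        memory[addr] = value

        if a.pc % checkpoint_every == 0:   # periodic checkpoint
            saved = (list(a.regs), a.pc)
            history.clear()

    return memory, rollbacks
```

For example, with `prog = [(0, 1), (1, 2), (0, 3), (2, 4)]`, calling `run_lockstep(prog, fault_at=4)` triggers one rollback and still ends with the same memory contents as the fault-free run. Logging old values keeps the rollback cost proportional to the number of writes since the last checkpoint instead of the size of memory, which is the kind of overhead reduction the abstract attributes to the write history table.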
Citations
Journal Article
Fault-tolerant computer system design
TL;DR: Fault-Tolerant Computer System Design by Dhiraj K. Pradhan examines the design of fault-tolerant systems.
Journal Article
Low-overhead fault-tolerance technique for a dynamically reconfigurable softcore processor
TL;DR: A new Enhanced Lockstep scheme built using a pair of MicroBlaze cores is proposed and implemented on a Xilinx Virtex-5 FPGA; it can mitigate radiation-induced temporary faults (single-event upsets, SEUs) at moderate cost and requires significantly shorter error recovery time.
Proceedings Article
Scrubbing-based SEU mitigation approach for Systems-on-Programmable-Chips
Aitzan Sari, Mihalis Psarakis +1 more
TL;DR: A constraint-driven re-placement method is proposed to reduce the number of sensitive configuration frames and, consequently, the scrubbing time. A low-cost SEU mitigation approach for SoPCs is also presented, which uses configuration memory scan and scrubbing as fault detection and fault repair mechanisms, combined with checkpointing and rollback for fault recovery.
Proceedings Article
Fault tolerant FPGA processor based on runtime reconfigurable modules
Mihalis Psarakis, A. Apostolakis +1 more
TL;DR: This paper partitions the processor core into reconfigurable modules and duplicates these modules to implement a concurrent error detection mechanism. It also generates precompiled configurations that include spare resources and are used to repair the defective module at runtime.
Proceedings Article
Combining checkpointing and scrubbing in FPGA-based real-time systems
TL;DR: This paper calculates the checkpoint frequencies that guarantee the execution of the tasks within their deadlines in the presence of transient faults, and proposes a selective scrubbing approach that reduces the scrubbing time and makes the fault-tolerant execution of tasks with tight deadlines feasible.
References
Journal Article
Soft errors in advanced computer systems
TL;DR: This article comprehensively analyzes soft-error sensitivity in modern systems and shows it to be application dependent.
Book
Fault-tolerant computer system design
TL;DR: This new edition specifically deals with this dynamically changing computing environment, incorporating new topics such as fault-tolerance in multiprocessor and distributed systems.
Journal Article
Concurrent error detection using watchdog processors-a survey
A. Mahmood, Edward J. McCluskey +1 more
TL;DR: It is shown that a large number of errors can be detected by monitoring the control flow and memory-access behavior. Two techniques for control-flow checking are discussed and compared with current error-detection techniques.
Journal Article
Control-flow checking by software signatures
TL;DR: A pure software method that checks the control flow of a program using assigned signatures. It can be used even when the operating system does not support multitasking, and it can increase error detection coverage for control-flow errors by an order of magnitude.
Journal Article
Fault-tolerant computer system design
TL;DR: Fault-Tolerant Computer System Design by Dhiraj K. Pradhan examines the design of fault-tolerant systems.