Proceedings Article

Spatial avoidance of hardware faults using FPGA partial reconfiguration of tile-based soft processors

06 Mar 2010, pp. 1-11
TL;DR: This paper presents the design of a many-core computer architecture with fault detection and recovery using partial reconfiguration of an FPGA, which has the advantage of recovering from faults in both the circuit fabric and the configuration RAM of an FPGA in addition to spatially avoiding permanently damaged regions of the chip.
Abstract: This paper presents the design of a many-core computer architecture with fault detection and recovery using partial reconfiguration of an FPGA. The FPGA fabric is partitioned into tiles that contain homogeneous soft processors. At any given time, three processors are configured in triple modular redundancy to detect faults. Spare processors are brought online to replace faulted tiles in real time. A recovery procedure involving partial reconfiguration is used to repair faulted tiles. This approach has the advantage of recovering from faults in both the circuit fabric and the configuration RAM of an FPGA, in addition to spatially avoiding permanently damaged regions of the chip.
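
The detect, replace, and repair cycle described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the names `majority_vote`, `TileManager`, `step`, and `repair` are hypothetical, tile outputs are modeled as plain Python values, and the partial reconfiguration step is reduced to a placeholder that returns a tile to the spare pool.

```python
from collections import Counter


def majority_vote(outputs):
    """Vote over three tile outputs.

    Returns (voted_value, indices_of_disagreeing_outputs).
    Raises RuntimeError when no two outputs agree.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: more than one tile disagrees")
    faulty = [i for i, out in enumerate(outputs) if out != value]
    return value, faulty


class TileManager:
    """Hypothetical bookkeeping for active, spare, and faulted tiles."""

    def __init__(self, num_tiles):
        self.active = [0, 1, 2]                  # three tiles in the TMR group
        self.spares = list(range(3, num_tiles))  # remaining tiles are spares
        self.needs_repair = []                   # tiles awaiting repair

    def step(self, tile_outputs):
        """tile_outputs maps tile id -> output for this cycle."""
        outputs = [tile_outputs[t] for t in self.active]
        value, faulty = majority_vote(outputs)
        for idx in faulty:
            bad = self.active[idx]
            if bad not in self.needs_repair:
                self.needs_repair.append(bad)
            if self.spares:
                # Bring a spare online in place of the faulted tile.
                self.active[idx] = self.spares.pop(0)
        return value

    def repair(self, tile):
        """Placeholder for partial reconfiguration of one tile (assumed to succeed)."""
        self.needs_repair.remove(tile)
        if tile not in self.active:
            self.spares.append(tile)


manager = TileManager(num_tiles=5)
voted = manager.step({0: 7, 1: 9, 2: 7, 3: 7, 4: 7})  # tile 1 disagrees
print(voted, manager.active, manager.needs_repair)
```

In this model a repaired tile rejoins the spare pool rather than displacing its replacement, which is one possible policy; the abstract does not say which policy the authors use.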
Citations
Proceedings Article
25 Jun 2012
TL;DR: This paper presents a novel high performance and fault-tolerant ICAP controller which can operate at a high speed and recover from emerging faults, and demonstrates the use of Triple Modular Redundancy (TMR) in some of the ICAP controller components which have the ability to reconfigure the rest of the ICAP controller when faults are detected.
Abstract: Dynamic Partial Reconfiguration (DPR) is an important feature of modern FPGAs as it allows for better exploitation of FPGA resources over time and space. The Internal Configuration Access Port (ICAP) enables DPR from within an FPGA chip, leading to the possibility of fully autonomous FPGA-based systems. This paper presents a novel high performance and fault-tolerant ICAP controller which can operate at a high speed and recover from emerging faults. Test results showed that our ICAP controller is 25 times faster than the Xilinx XPS_HWICAP IP core. We demonstrate the use of Triple Modular Redundancy (TMR) in some of the ICAP controller components which have the ability to reconfigure the rest of the ICAP controller when faults are detected. This method is shown to have a 49% smaller area footprint compared to traditional full TMR.

28 citations


Cites background from "Spatial avoidance of hardware fault..."

  • ...In [8], [9] and [10], the authors demonstrated how current TMR techniques can be improved by reconfiguring faulty modules to keep the system free of faults....

    [...]

Proceedings Article
21 Feb 2016
TL;DR: This paper investigates the improvements in reliability of a LEON3 soft processor operating on a SRAM-based FPGA when using triple-modular redundancy and other processor-specific mitigation techniques and demonstrates an average improvement of 10×.
Abstract: Processors are an essential component in most satellite payload electronics and handle a variety of functions including command handling and data processing. There is growing interest in implementing soft processors on commercial FPGAs within satellites. Commercial FPGAs offer reconfigurability, large logic density, and I/O bandwidth; however, they are sensitive to ionizing radiation and systems developed for space must implement single-event upset mitigation to operate reliably. This paper investigates the improvements in reliability of a LEON3 soft processor operating on a SRAM-based FPGA when using triple-modular redundancy and other processor-specific mitigation techniques. The improvements in reliability provided by these techniques are validated with both fault injection and heavy ion radiation tests. The fault injection experiments indicate an improvement of 51× and the radiation testing results demonstrate an average improvement of 10×. Orbit failure rate estimations were computed and suggest that the TMR LEON3 processor has a mean-time to failure of over 76 years in a geosynchronous orbit.

26 citations


Cites methods from "Spatial avoidance of hardware fault..."

  • ...A triple modular small Xilinx PicoBlaze soft processor was tested on the Xilinx Virtex 5 architecture using a tile-based approach and resynchronization through partial reconfiguration [12]....

    [...]

Journal Article
TL;DR: This paper introduces a computer architecture for static random-access-memory-based field-programmable gate arrays that resists failures caused by ionizing radiation.
Abstract: Radiation-tolerant computing is of great importance to the aerospace community because future missions demand more computational power. Of special interest to the aerospace community are flight computers implemented on static random-access-memory-based field-programmable gate arrays. Such computer systems allow the in-flight reconfiguration of hardware that enables the practical deployment of truly reconfigurable computers. However, commercial static random-access-memory-based field-programmable gate arrays are uniquely susceptible to ionizing radiation. This paper introduces a computer architecture for static random-access-memory-based field-programmable gate arrays that resists failures caused by ionizing radiation. The approach extends the widely accepted fault mitigation practice of triple modular redundancy and configuration memory scrubbing by adding spare circuitry and environmental awareness through an ionizing radiation sensor. This paper describes the design of the system in addition to a theore...

13 citations


Cites methods from "Spatial avoidance of hardware fault..."

  • ...Previous work at Montana State University has also employed TMR and scrubbing techniques [13]....

    [...]

Proceedings Article
26 Oct 2015
TL;DR: This paper proposes the use of Triple Modular Redundancy at the controller level, and calculates system reliability using Markov models to quantitatively show the advantage of the proposed technique in terms of extended lifetime.
Abstract: Fault-tolerance is becoming an essential feature in the design of Networked Control Systems (NCSs). Furthermore, Sensor-to-Actuator (S2A) architectures have shown some advantages over conventional In-Loop architectures. This paper focuses on fault-tolerant controllers in the context of S2A systems. It proposes the use of Triple Modular Redundancy at the controller level. The fault-tolerant controller will be hosted in an FPGA that has a spare location. The voter in this TMR scheme is fault-secure to guarantee that the controllers never produce an undetected incorrect control action. Finally, system reliability is calculated using Markov models to quantitatively show, via case studies, the advantage of the proposed technique in terms of extended lifetime.
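
To give a sense of the kind of quantity such a Markov analysis produces, the sketch below evaluates the textbook reliability of a plain TMR group against a single (simplex) module. It is not the cited paper's model: it assumes a perfect voter, a constant per-module failure rate, no repair, and no spare, and the function names and the example rate are hypothetical. Under those assumptions the three-state chain (three good, two good, failed) has the closed-form solution R_TMR(t) = 3R^2 - 2R^3 with R = exp(-lambda * t).

```python
import math


def simplex_reliability(t, failure_rate):
    """Reliability of one module with constant failure rate (exponential model)."""
    return math.exp(-failure_rate * t)


def tmr_reliability(t, failure_rate):
    """Reliability of 2-out-of-3 TMR with a perfect voter, no repair, no spare."""
    r = simplex_reliability(t, failure_rate)
    return 3 * r**2 - 2 * r**3


# Example with an assumed failure rate of 1e-4 per hour (illustrative only).
rate = 1e-4
for hours in (100, 1000, 10000):
    print(hours, simplex_reliability(hours, rate), tmr_reliability(hours, rate))
```

Plain TMR beats a single module only while R > 0.5; past that point the redundant group is less reliable than simplex, which is one reason spares and repair are added in schemes like the one described above.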

13 citations


Cites background or methods from "Spatial avoidance of hardware fault..."

  • ...Another technique was developed in [33] where an FPGA is divided into partially reconfigurable homogeneous tiles each of which contains a copy of the soft core processor....

    [...]

  • ...Also, one of the main differences between the proposed technique and those in [31, 33] is that the system does not stop operation during recovery which is essential in real-time systems such as the one targeted in this research....

    [...]

Dissertation
28 Apr 2011
TL;DR: It was possible to show that the proposed fault tolerant scheme based on information redundancy leads to a better implementation and provides better SEU resistance than the traditional Triple Modular Redundancy (TMR).
Abstract: Spacecrafts are extensively used by public and private sectors to support a variety of services. Considering the cost and the strategic importance of these spacecrafts, there has been an increasing demand to utilize strong cryptographic primitives to assure their security. Moreover, it is of utmost importance to consider fault tolerance in their designs due to the harsh environment found in space, while keeping low area and power consumption. The problem of recovering spacecrafts from failures or attacks, and bringing them back to an operational and safe state is crucial for reliability. Despite the recent interest in incorporating on-board security, there is limited research in this area. This research proposes a trusted hardware module approach for recovering the spacecrafts subsystems and their cryptographic capabilities after an attack or a major failure has happened. The proposed fault tolerant trusted modules are capable of performing platform restoration as well as recovering the cryptographic capabilities of the spacecraft. This research also proposes efficient fault tolerant architectures for the secure hash (SHA-2) and message authentication code (HMAC) algorithms. The proposed architectures are the first in the literature to detect and correct errors by using Hamming codes to protect the main registers. Furthermore, a quantitative analysis of the probability of failure of the proposed fault tolerance mechanisms is introduced. Based upon an extensive set of experimental results along with probability of failure analysis, it was possible to show that the proposed fault tolerant scheme based on information redundancy leads to a better implementation and provides better SEU resistance than the traditional Triple Modular Redundancy (TMR). The fault tolerant cryptographic primitives introduced in this research are of crucial importance for the implementation of on-board security in spacecrafts.
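
The abstract mentions Hamming codes protecting the main registers but does not state the code parameters or register widths. As a minimal illustration of the underlying idea only, the sketch below uses the standard Hamming(7,4) code to correct a single flipped bit in a 4-bit value; the function names are hypothetical and this is not the dissertation's architecture.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Correct up to one flipped bit.

    Returns (data_bits, error_position), where error_position is the
    1-based position of the corrected bit, or 0 if no error was detected.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # syndrome read as a binary position
    if position:
        c[position - 1] ^= 1         # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], position


stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1                       # simulate a single-event upset on one bit
print(hamming74_correct(stored))
```

Unlike TMR, which triplicates the register, this form of information redundancy adds three check bits per four data bits, which is the kind of area trade-off the abstract alludes to.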

7 citations


Cites methods from "Spatial avoidance of hardware fault..."

  • ...An approach to optimize the reconfiguration of the FPGA is proposed in [100]....

    [...]

References
Book
29 Jul 1993
TL;DR: This book examines the response of materials and devices to radiation in space radiation environments, including metal-oxide semiconductor (MOS) devices, diodes, solar cells, and opto-electronics.
Abstract: Introduction; 1: Radiation environments; 2: The response of materials and devices to radiation; 3: Metal-oxide semiconductor (MOS) devices; 4: Discrete bipolar transistors; 5: Diodes, solar cells, and opto-electronics; 6: Power devices; 7: Optical media; 8: Other components; 9: Polymers and other organics; 10: The interaction of space radiation with shielding materials; 11: Computer methods for particle transport; 12: Radiation testing; 13: Radiation-hardening of parts; 14: Equipment hardening and hardness assurance; 15: Conclusions; Appendices: A Useful general and geophysical data; B Useful radiation data; C Useful data on materials; D Radiation response data for electronic components; E Depth-dose curves for representative satellite orbits; F Degradation in polymers; Index

685 citations


"Spatial avoidance of hardware fault..." refers background in this paper

  • ...When cosmic particles (typically heavy ions and protons) strike integrated circuits (IC), fault conditions called Single Event Effects (SEE) can occur [1]....

    [...]

  • ...This phenomenon permanently degrades the transistor and can result in threshold shifts, increased device leakage, timing changes, and ultimately functional failure of the device [1]....

    [...]

Book
21 Aug 2002
TL;DR: This book discusses basic radiation damage mechanisms in semiconductor materials and devices, displacement damage in Group IV semiconductor materials, and GaAs-based field-effect transistors for radiation-hard applications.
Abstract: Radiation Environments and Component Selection Strategy.- Basic Radiation Damage Mechanisms in Semiconductor Materials and Devices.- Displacement Damage in Group IV Semiconductor Materials.- Radiation Damage in GaAs.- Space Radiation Aspects of Silicon Bipolar Technologies.- Radiation Damage in Silicon MOS Devices.- GaAs Based Field Effect Transistors for Radiation-Hard Applications.- Opto-Electronic Components for Space.- Advanced Semiconductor Materials and Devices - Outlook.

375 citations


"Spatial avoidance of hardware fault..." refers background in this paper

  • ...When the magnitude of the SET is large enough to cause a logic transition on a receiving gate, logic failures in the circuit can exist [2-3]....

    [...]

Book
22 Jul 2013
TL;DR: This paper introduces the fundamental concepts of fault tolerant computing; key topics covered are voting, fault detection, clock synchronization, Byzantine Agreement, diagnosis, and reliability analysis.
Abstract: This paper introduces the fundamental concepts of fault tolerant computing. Key topics covered are voting, fault detection, clock synchronization, Byzantine Agreement, diagnosis, and reliability analysis. Low level mechanisms such as Hamming codes or low level communications protocols are not covered. The paper is tutorial in nature and does not cover any topic in detail. The focus is on rationale and approach rather than detailed exposition.

66 citations


"Spatial avoidance of hardware fault..." refers background or methods in this paper

  • ...Watchdog timers independently observe the operation of a system and initiate a reset when the system becomes idle for too long [8]....

    [...]

  • ...For more complex systems, TMR can be used in conjunction with a recovery sequence which can reset and reinitialize the system when a fault is detected [8]....

    [...]

Proceedings ArticleDOI
04 Mar 2006
TL;DR: A framework that allows Earth and space scientists to use FPGA resources through an abstraction layer is explored, and a synthetic aperture radar application is used to demonstrate the power of the system architecture.
Abstract: Complex real-time signal and image processing applications require low-latency and high-performance hardware to achieve optimal performance. Building such a high-performance platform for space deployment is hampered by hostile environmental conditions and power constraints. Custom space-based FPGA coprocessors help alleviate these constraints, but their use is typically restricted by the need for TMR or radiation-hardened components. This paper explores a framework that allows Earth and space scientists to use FPGA resources through an abstraction layer. A synthetic aperture radar application is used to demonstrate the power of the system architecture. The performance of the application is shown to achieve a speedup of 19 when compared to a software solution and is able to maintain comparable data reliability. Projected speedups, for the same case study executing on the proposed flight system architecture, are several times better and also discussed. This work supports the Dependable Multiprocessor project at Honeywell and the University of Florida, a mission for the Space Technology 8 (ST-8) satellite of NASA's New Millennium Program.

42 citations