Spatial avoidance of hardware faults using FPGA partial reconfiguration of tile-based soft processors

doi:10.1109/AERO.2010.5446663

Proceedings Article•DOI•

Clint Gauer¹, Brock J. LaMeres¹, David M. Racek¹•Institutions (1)

Montana State University¹

06 Mar 2010-pp 1-11

TL;DR: This paper presents the design of a many-core computer architecture with fault detection and recovery using partial reconfiguration of an FPGA, which has the advantage of recovering from faults in both the circuit fabric and the configuration RAM of an FPGA in addition to spatially avoiding permanently damaged regions of the chip.

Abstract: This paper presents the design of a many-core computer architecture with fault detection and recovery using partial reconfiguration of an FPGA. The FPGA fabric is partitioned into tiles that contain homogeneous soft processors. At any given time, three processors are configured in triple modular redundancy (TMR) to detect faults. Spare processors are brought online to replace faulted tiles in real time. A recovery procedure involving partial reconfiguration is used to repair faulted tiles. This approach can recover from faults in both the circuit fabric and the configuration RAM of an FPGA, and it spatially avoids permanently damaged regions of the chip.
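
The detect-and-replace loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `TileManager` class and the `run_tile` and `reconfigure_tile` callbacks are hypothetical names, and the real system would perform voting and partial reconfiguration in hardware rather than in software.

```python
from collections import Counter


def majority_vote(outputs):
    """Return (voted value, slots that disagree with it) for the active tiles."""
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: more than one active tile disagrees")
    return value, [slot for slot, out in enumerate(outputs) if out != value]


class TileManager:
    """Tracks three active tiles, a pool of spares, and permanently damaged tiles."""

    def __init__(self, num_tiles):
        if num_tiles < 3:
            raise ValueError("TMR needs at least three tiles")
        self.active = [0, 1, 2]                   # tile ids currently voting
        self.spares = list(range(3, num_tiles))   # healthy tiles held in reserve
        self.damaged = set()                      # tiles that repair could not fix

    def step(self, run_tile, reconfigure_tile):
        """Run one voted computation and replace any dissenting tile."""
        outputs = [run_tile(tile) for tile in self.active]
        value, faulty_slots = majority_vote(outputs)
        for slot in faulty_slots:
            bad_tile = self.active[slot]
            if not self.spares:
                raise RuntimeError("no spare tiles left")
            # Bring a spare online in place of the faulted tile.
            self.active[slot] = self.spares.pop(0)
            # Hypothetical callback standing in for partial reconfiguration;
            # it returns True when the tile works again after being rewritten.
            if reconfigure_tile(bad_tile):
                self.spares.append(bad_tile)
            else:
                self.damaged.add(bad_tile)        # spatially avoid this region
        return value
```

In this sketch a tile that still fails after reconfiguration is moved to `damaged` and never reused, which corresponds to the spatial avoidance of permanently damaged regions; a tile that recovers rejoins the spare pool.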

Citations

Proceedings Article•DOI•

A novel high-performance fault-tolerant ICAP controller

Ali Ebrahim¹, Khaled Benkrid¹, Xabier Iturbe¹, Chuan Hong¹•Institutions (1)

University of Edinburgh¹

25 Jun 2012

TL;DR: This paper presents a novel high-performance and fault-tolerant ICAP controller which can operate at a high speed and recover from emerging faults, and demonstrates the use of Triple Modular Redundancy (TMR) in some of the ICAP controller components which have the ability to reconfigure the rest of the ICAP controller when faults are detected.

Abstract: Dynamic Partial Reconfiguration (DPR) is an important feature of modern FPGAs as it allows for better exploitation of FPGA resources over time and space. The Internal Configuration Access Port (ICAP) enables DPR from within an FPGA chip, leading to the possibility of fully autonomous FPGA-based systems. This paper presents a novel high-performance and fault-tolerant ICAP controller which can operate at a high speed and recover from emerging faults. Test results showed that our ICAP controller is 25 times faster than the Xilinx XPS_HWICAP IP core. We demonstrate the use of Triple Modular Redundancy (TMR) in some of the ICAP controller components which have the ability to reconfigure the rest of the ICAP controller when faults are detected. This method is shown to have a 49% smaller area footprint compared to traditional full TMR.

28 citations

Cites background from "Spatial avoidance of hardware fault..."

...In [8], [9] and [10], the authors demonstrated how current TMR techniques can be improved by reconfiguring faulty modules to keep the system free of faults....
[...]

Proceedings Article•DOI•

SEU Mitigation and Validation of the LEON3 Soft Processor Using Triple Modular Redundancy for Space Processing

Michael Wirthlin¹, Andrew M. Keller¹, Chase McCloskey¹, Parker Ridd¹, David S. Lee², Jeffrey Draper³•Institutions (3)

Brigham Young University¹, Sandia National Laboratories², University of Southern California³

21 Feb 2016

TL;DR: This paper investigates the improvements in reliability of a LEON3 soft processor operating on a SRAM-based FPGA when using triple-modular redundancy and other processor-specific mitigation techniques and demonstrates an average improvement of 10×.

Abstract: Processors are an essential component in most satellite payload electronics and handle a variety of functions including command handling and data processing. There is growing interest in implementing soft processors on commercial FPGAs within satellites. Commercial FPGAs offer reconfigurability, large logic density, and I/O bandwidth; however, they are sensitive to ionizing radiation and systems developed for space must implement single-event upset mitigation to operate reliably. This paper investigates the improvements in reliability of a LEON3 soft processor operating on a SRAM-based FPGA when using triple-modular redundancy and other processor-specific mitigation techniques. The improvements in reliability provided by these techniques are validated with both fault injection and heavy ion radiation tests. The fault injection experiments indicate an improvement of 51× and the radiation testing results demonstrate an average improvement of 10×. Orbit failure rate estimations were computed and suggest that the TMR LEON3 processor has a mean-time to failure of over 76 years in a geosynchronous orbit.

26 citations

Cites methods from "Spatial avoidance of hardware fault..."

...A triple modular small Xilinx PicoBlaze soft processor was tested on the Xilinx Virtex 5 architecture using a tile-based approach and resynchronization through partial reconfiguration [12]....
[...]

Journal Article•DOI•

Increasing Radiation Tolerance of Field-Programmable-Gate-Array-Based Computers Through Redundancy and Environmental Awareness

Jennifer Hane, Brock J. LaMeres¹, Todd Kaiser¹, Raymond J. Weber¹, Todd M. Buerkle²•Institutions (2)

Montana State University¹, Micron Technology²

11 Feb 2014-Journal of Aerospace Information Systems

TL;DR: This paper introduces a computer architecture for static random-access-memory-based field-programmable gate arrays that resists failures caused by ionizing radiation.

Abstract: Radiation-tolerant computing is of great importance to the aerospace community because future missions demand more computational power. Of special interest to the aerospace community are flight computers implemented on static random-access-memory-based field-programmable gate arrays. Such computer systems allow the in-flight reconfiguration of hardware that enables the practical deployment of truly reconfigurable computers. However, commercial static random-access-memory-based field-programmable gate arrays are uniquely susceptible to ionizing radiation. This paper introduces a computer architecture for static random-access-memory-based field-programmable gate arrays that resists failures caused by ionizing radiation. The approach extends the widely accepted fault mitigation practice of triple modular redundancy and configuration memory scrubbing by adding spare circuitry and environmental awareness through an ionizing radiation sensor. This paper describes the design of the system in addition to a theore...

13 citations

Cites methods from "Spatial avoidance of hardware fault..."

...Previous work at Montana State University has also employed TMR and scrubbing techniques [13]....
[...]

Proceedings Article•DOI•

FPGA-based reliable TMR controller design for S2A architectures

Hassan H. Halawa¹, Ramez M. Daoud¹, Hassanein H. Amer¹, Gehad I. Alkady¹, Ali AbdelKader¹•Institutions (1)

American University in Cairo¹

26 Oct 2015

TL;DR: This paper proposes the use of Triple Modular Redundancy at the controller level, and calculates system reliability using Markov models to quantitatively show the advantage of the proposed technique in terms of extended lifetime.

Abstract: Fault-tolerance is becoming an essential feature in the design of Networked Control Systems (NCSs). Furthermore, Sensor-to-Actuator (S2A) architectures have shown some advantages over conventional In-Loop architectures. This paper focuses on fault-tolerant controllers in the context of S2A systems. It proposes the use of Triple Modular Redundancy at the controller level. The fault-tolerant controller will be hosted in an FPGA that has a spare location. The voter in this TMR scheme is fault-secure to guarantee that the controllers never produce an undetected incorrect control action. Finally, system reliability is calculated using Markov models to quantitatively show, via case studies, the advantage of the proposed technique in terms of extended lifetime.
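
To give a feel for the lifetime advantage the abstract refers to, the sketch below evaluates the classic closed-form reliability of TMR, with and without one spare. It is not the Markov model from the paper: it assumes identical modules with independent, exponentially distributed failures, a perfect voter, a hot spare, and instantaneous fault-free switching. The function names are illustrative.

```python
import math
from math import comb


def module_reliability(failure_rate, t):
    """R(t) = exp(-lambda * t) for one module with a constant failure rate."""
    return math.exp(-failure_rate * t)


def k_of_n_reliability(r, k, n):
    """Probability that at least k of n independent modules (each with
    reliability r) are still working."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


def tmr_reliability(r):
    """Plain TMR: the voter needs 2 of 3 modules, i.e. 3r^2 - 2r^3."""
    return k_of_n_reliability(r, 2, 3)


def tmr_with_spare_reliability(r):
    """TMR plus one hot spare under the ideal-switching assumption:
    the system survives while any 2 of the 4 modules still work."""
    return k_of_n_reliability(r, 2, 4)
```

For a module reliability of 0.9, plain TMR gives 0.972 and TMR with one spare gives 0.9963 under these assumptions. Plain TMR is less reliable than a single module once the module reliability drops below 0.5, which is one reason spares and repair matter for long missions.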

13 citations

Cites background or methods from "Spatial avoidance of hardware fault..."

...Another technique was developed in [33] where an FPGA is divided into partially reconfigurable homogeneous tiles each of which contains a copy of the soft core processor....
[...]
...Also, one of the main differences between the proposed technique and those in [31, 33] is that the system does not stop operation during recovery which is essential in real-time systems such as the one targeted in this research....
[...]

Dissertation•

Fault Tolerant Cryptographic Primitives for Space Applications

Marcio Juliato

28 Apr 2011

TL;DR: It was possible to show that the proposed fault tolerant scheme based on information redundancy leads to a better implementation and provides better SEU resistance than the traditional Triple Modular Redundancy (TMR).

Abstract: Spacecraft are extensively used by the public and private sectors to support a variety of services. Considering the cost and the strategic importance of these spacecraft, there has been an increasing demand to utilize strong cryptographic primitives to assure their security. Moreover, it is of utmost importance to consider fault tolerance in their designs due to the harsh environment found in space, while keeping area and power consumption low. The problem of recovering spacecraft from failures or attacks and bringing them back to an operational and safe state is crucial for reliability. Despite the recent interest in incorporating on-board security, there is limited research in this area. This research proposes a trusted hardware module approach for recovering a spacecraft's subsystems and their cryptographic capabilities after an attack or a major failure has happened. The proposed fault tolerant trusted modules are capable of performing platform restoration as well as recovering the cryptographic capabilities of the spacecraft. This research also proposes efficient fault tolerant architectures for the secure hash (SHA-2) and message authentication code (HMAC) algorithms. The proposed architectures are the first in the literature to detect and correct errors by using Hamming codes to protect the main registers. Furthermore, a quantitative analysis of the probability of failure of the proposed fault tolerance mechanisms is introduced. Based upon an extensive set of experimental results along with probability of failure analysis, it was possible to show that the proposed fault tolerant scheme based on information redundancy leads to a better implementation and provides better SEU resistance than the traditional Triple Modular Redundancy (TMR). The fault tolerant cryptographic primitives introduced in this research are of crucial importance for the implementation of on-board security in spacecraft.
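
The abstract mentions Hamming codes protecting the main registers. As a small illustration of how such a code detects and corrects a single flipped bit, here is a sketch of the textbook Hamming(7,4) code. The code parameters and register layout actually used in the dissertation are not given here, so the 4-bit data width and the function names below are assumptions made for the example.

```python
def hamming74_encode(data_bits):
    """Encode [d1, d2, d3, d4] into a 7-bit codeword laid out as
    positions 1..7 = p1, p2, d1, p3, d2, d3, d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit.

    Returns (data_bits, error_position), where error_position is the
    1-based position that was corrected, or 0 if no error was seen.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3   # syndrome read as a binary number
    if error_position:
        c[error_position - 1] ^= 1          # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], error_position
```

Compared with TMR, which stores three full copies of every bit, this code stores 7 bits for every 4 data bits, which is the kind of information-redundancy saving the abstract contrasts with TMR.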

7 citations

Cites methods from "Spatial avoidance of hardware fault..."

...An approach to optimize the reconfiguration of the FPGA is proposed in [100]....
[...]

References

Book•

Handbook of Radiation Effects

Andrew Holmes-Siedle¹, Len Adams²•Institutions (2)

Brunel University London¹, European Space Research and Technology Centre²

29 Jul 1993

TL;DR: This handbook covers the response of materials and devices to radiation in space radiation environments, including metal-oxide semiconductor (MOS) devices, diodes, solar cells, and opto-electronics.

Abstract: Introduction; 1: Radiation environments; 2: The response of materials and devices to radiation; 3: Metal-oxide semiconductor (MOS) devices; 4: Discrete bipolar transistors; 5: Diodes, solar cells, and opto-electronics; 6: Power devices; 7: Optical media; 8: Other components; 9: Polymers and other organics; 10: The interaction of space radiation with shielding materials; 11: Computer methods for particle transport; 12: Radiation testing; 13: Radiation-hardening of parts; 14: Equipment hardening and hardness assurance; 15: Conclusions; Appendices: A Useful general and geophysical data; B Useful radiation data; C Useful data on materials; D Radiation response data for electronic components; E Depth-dose curves for representative satellite orbits; F Degradation in polymers; Index

685 citations

"Spatial avoidance of hardware fault..." refers background in this paper

...When cosmic particles (typically heavy ions and protons) strike integrated circuits (IC), fault conditions called Single Event Effects (SEE) can occur [1]....
[...]
...This phenomenon permanently degrades the transistor and can result in threshold shifts, increased device leakage, timing changes, and ultimately functional failure of the device [1]....
[...]

Book•

Radiation Effects in Advanced Semiconductor Materials and Devices

Cor Claeys, Eddy Simoen

21 Aug 2002

TL;DR: This book discusses basic radiation damage mechanisms in semiconductor materials and devices, displacement damage in Group IV semiconductor materials, radiation damage in GaAs, and GaAs-based field-effect transistors for radiation-hard applications.

Abstract: Radiation Environments and Component Selection Strategy.- Basic Radiation Damage Mechanisms in Semiconductor Materials and Devices.- Displacement Damage in Group IV Semiconductor Materials.- Radiation Damage in GaAs.- Space Radiation Aspects of Silicon Bipolar Technologies.- Radiation Damage in Silicon MOS Devices.- GaAs Based Field Effect Transistors for Radiation-Hard Applications.- Opto-Electronic Components for Space.- Advanced Semiconductor Materials and Devices - Outlook.

375 citations

"Spatial avoidance of hardware fault..." refers background in this paper

...When the magnitude of the SET is large enough to cause a logic transition on a receiving gate, logic failures in the circuit can exist [2-3]....
[...]

Book•

A Primer on Architectural Level Fault Tolerance

Ricky W. Butler¹•Institutions (1)

Langley Research Center¹

22 Jul 2013

TL;DR: This paper introduces the fundamental concepts of fault tolerant computing and key topics covered are voting, fault detection, clock synchronization, Byzantine Agreement, diagnosis, and reliability analysis.

Abstract: This paper introduces the fundamental concepts of fault tolerant computing. Key topics covered are voting, fault detection, clock synchronization, Byzantine Agreement, diagnosis, and reliability analysis. Low level mechanisms such as Hamming codes or low level communications protocols are not covered. The paper is tutorial in nature and does not cover any topic in detail. The focus is on rationale and approach rather than detailed exposition.

66 citations

"Spatial avoidance of hardware fault..." refers background or methods in this paper

...Watchdog timers independently observe the operation of a system and initiate a reset when the system becomes idle for too long [8]....
[...]
...For more complex systems, TMR can be used in conjunction with a recovery sequence which can reset and reinitialize the system when a fault is detected [8]....
[...]

Proceedings Article•DOI•

Hardware/software interface for high-performance space computing with FPGA coprocessors

J. Greco¹, Grzegorz Cieslewski¹, Adam Jacobs¹, Ian A. Troxel¹, Alan D. George¹•Institutions (1)

University of Florida¹

04 Mar 2006

TL;DR: A framework that allows Earth and space scientists to use FPGA resources through an abstraction layer is explored, and a synthetic aperture radar application is used to demonstrate the power of the system architecture.

Abstract: Complex real-time signal and image processing applications require low-latency and high-performance hardware to achieve optimal performance. Building such a high-performance platform for space deployment is hampered by hostile environmental conditions and power constraints. Custom space-based FPGA coprocessors help alleviate these constraints, but their use is typically restricted by the need for TMR or radiation-hardened components. This paper explores a framework that allows Earth and space scientists to use FPGA resources through an abstraction layer. A synthetic aperture radar application is used to demonstrate the power of the system architecture. The performance of the application is shown to achieve a speedup of 19 when compared to a software solution and is able to maintain comparable data reliability. Projected speedups, for the same case study executing on the proposed flight system architecture, are several times better and also discussed. This work supports the Dependable Multiprocessor project at Honeywell and the University of Florida, a mission for the Space Technology 8 (ST-8) satellite of NASA's New Millennium Program.

42 citations

Radiation Hardened Electronics for Space Environments (RHESE) Project Overview

Andrew S. Keys, James H. Adams, John D. Cressler, Marshall C. Patrick, Michael A. Johnson, Ronald C. Darty

25 Jun 2008

14 citations