Proceedings ArticleDOI

SEU Mitigation and Validation of the LEON3 Soft Processor Using Triple Modular Redundancy for Space Processing

TL;DR: This paper investigates the improvements in reliability of a LEON3 soft processor operating on an SRAM-based FPGA when using triple-modular redundancy and other processor-specific mitigation techniques, and demonstrates an average improvement of 10× in radiation testing.
Abstract: Processors are an essential component in most satellite payload electronics and handle a variety of functions including command handling and data processing. There is growing interest in implementing soft processors on commercial FPGAs within satellites. Commercial FPGAs offer reconfigurability, large logic density, and I/O bandwidth; however, they are sensitive to ionizing radiation, and systems developed for space must implement single-event upset mitigation to operate reliably. This paper investigates the improvements in reliability of a LEON3 soft processor operating on an SRAM-based FPGA when using triple-modular redundancy and other processor-specific mitigation techniques. The improvements in reliability provided by these techniques are validated with both fault injection and heavy ion radiation tests. The fault injection experiments indicate an improvement of 51× and the radiation testing results demonstrate an average improvement of 10×. Orbit failure rate estimates were computed and suggest that the TMR LEON3 processor has a mean time to failure of over 76 years in a geosynchronous orbit.
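As an illustration of the triple-modular redundancy idea named in the abstract, the following is a minimal Python sketch of a bitwise 2-of-3 majority voter. It is a software toy, not the paper's FPGA implementation; the function names `majority_vote` and `flip_bit` and the 8-bit example value are assumptions made for illustration only.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value held by
    at least two of the three replicas."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(word: int, bit: int) -> int:
    """Model a single-event upset by inverting one bit of a word."""
    return word ^ (1 << bit)


# Hypothetical 8-bit register value held identically by three replicas.
golden = 0b1011_0010

# One replica suffers an upset; the voter masks it.
upset = flip_bit(golden, 3)
voted_single = majority_vote(golden, upset, golden)

# Two replicas suffer the same upset; the voter is outvoted and the error escapes.
voted_double = majority_vote(upset, upset, golden)
```

In this sketch `voted_single` equals `golden`, while `voted_double` equals the corrupted value, which is why TMR is usually paired with repair so that upsets do not accumulate across replicas.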
Citations
Journal ArticleDOI
TL;DR: In this paper, a variety of SEU mitigation and repair techniques are applied to the LEON3 soft-core processor to study the effects and complementary nature of each technique, including triple modular redundancy (TMR), configuration memory (CRAM) scrubbing, and internal block memory (BRAM) scrubbing.
Abstract: A variety of mitigation techniques have been demonstrated to reduce the sensitivity of FPGA designs to soft errors. Without mitigation, SEUs can cause failure by altering the logic, routing, and state of a design operating on an SRAM-based FPGA. Various combinations of SEU mitigation and repair techniques are applied to the LEON3 soft-core processor to study the effects and complementary nature of each technique. This work focuses on triple modular redundancy (TMR), configuration memory (CRAM) scrubbing, and internal block memory (BRAM) scrubbing. All mitigation methods demonstrate some improvement in both fault injection and neutron radiation testing. Results in this paper show complementary SEU mitigation techniques working together to improve fault tolerance. The results also suggest that fault injection can be a good way to estimate the cross section of a design before going to a radiation test. TMR with CRAM scrubbing demonstrates a $27\times$ improvement whereas TMR with both CRAM and BRAM scrubbing demonstrates approximately a $50\times$ improvement.

36 citations


Cites background or methods or result from "SEU Mitigation and Validation of th..."

  • ...This paper uses the same configuration of the LEON3 system as [5]....

    [...]

  • ...A previous experiment [5] applied triple modular redundancy (TMR), internal block memory (BRAM) scrubbing and configuration memory (CRAM) scrubbing to the LEON3 softcore processor to improve its fault-tolerance....

    [...]

  • ...The results during radiation testing, summarized in Table III, are an improvement over the results obtained during a similar experiment [5]....

    [...]

  • ...The experiment in this paper tested more variations of SEU mitigation techniques than [5], which only compares the unmitigated design and the fully mitigated design (i....

    [...]

  • ...This logic is also protected by TMR [5], [12]....

    [...]

Journal ArticleDOI
TL;DR: Three strategies to mitigate against single-event upsets within the configuration memory of static random access memory field-programmable gate arrays are presented: incremental routing, incremental placement, and striping.
Abstract: Triple modular redundancy (TMR) with repair has proven to be an effective strategy for mitigating the effects of single-event upsets within the configuration memory of static random access memory field-programmable gate arrays. Applying TMR to the design successfully reduces the design’s neutron cross section by $80\times$. The effectiveness of TMR, however, is limited by the presence of single bits in the configuration memory which cause more than one TMR domain to fail simultaneously. We present three strategies to mitigate against these failures and improve the effectiveness of TMR: incremental routing, incremental placement, and striping. These techniques were tested using both fault injection and a wide spectrum neutron beam with the best technique offering a $400\times$ reduction to the design’s sensitive neutron cross section. An analysis from the radiation test shows that no single bits caused failure and that multicell upsets were the main cause of failure for these mitigation strategies.

34 citations


Cites background or methods from "SEU Mitigation and Validation of th..."

  • ...However, recent experiments have measured the improvement to be on the order of 10–100× [6], [17]....

    [...]

  • ...The failure rate was calculated for an unmitigated LEON3 processor in GEO orbit implemented on a Xilinx 7Series device (see Table V in [17])....

    [...]

Proceedings ArticleDOI
01 Jul 2019
TL;DR: The TMR RISC-V processor showed a 33× reduction in the neutron cross section and a 27% decrease in operational frequency, resulting in a 24× improvement of the mean work to failure with a cost of around 5.6× resource utilization.
Abstract: Many space applications are considering the use of commercial SRAM-based FPGAs over radiation hardened devices. When using SRAM-based FPGAs, soft processors may be required to fulfill application requirements, but the FPGA designs must overcome radiation-induced soft errors to provide a reliable system. TMR is one solution in designing a fault tolerant soft processor to mitigate the failures caused by SEUs. This paper compares the neutron soft-error reliability of an unmitigated and TMR version of a Taiga RISC-V soft processor on a Xilinx SRAM-based FPGA. The TMR RISC-V processor showed a 33× reduction in the neutron cross section and a 27% decrease in operational frequency, resulting in a 24× improvement of the mean work to failure with a cost of around 5.6× resource utilization.
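The three figures quoted in this abstract are consistent with a simple relationship, sketched below in Python. The sketch assumes that mean work to failure scales with work rate divided by failure rate, and that work rate is proportional to clock frequency; the abstract does not spell out its formula, so the function name `mwtf_improvement` and this model are assumptions.

```python
def mwtf_improvement(cross_section_reduction: float,
                     frequency_decrease: float) -> float:
    """Relative mean-work-to-failure gain under the assumed model:
    failures become `cross_section_reduction` times rarer, while the
    work done per unit time shrinks by `frequency_decrease` (a fraction)."""
    if not 0.0 <= frequency_decrease < 1.0:
        raise ValueError("frequency_decrease must be in [0, 1)")
    return cross_section_reduction * (1.0 - frequency_decrease)


# Figures quoted in the abstract: 33x cross-section reduction, 27% slower clock.
gain = mwtf_improvement(33.0, 0.27)
print(f"{gain:.1f}x")  # prints "24.1x", in line with the reported 24x
```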

29 citations


Cites background from "SEU Mitigation and Validation of th..."

  • ...In many of the efforts to produce a fault tolerant soft processor, the LEON2 and LEON3 processors were targeted and modified to provide improved reliability [14]–[16]....

    [...]

Journal ArticleDOI
TL;DR: In this paper, a radiation-hardened nonvolatile lookup table (LUT) circuit utilizing spin Hall effect (SHE)-magnetic random access memory (MRAM) devices is proposed.
Abstract: In this paper, we have developed a radiation-hardened non-volatile lookup table (LUT) circuit utilizing spin Hall effect (SHE)-magnetic random access memory (MRAM) devices. The design is motivated by modeling the effect of radiation particles striking hybrid complementary metal oxide semiconductor/spin based circuits, and the resistive behavior of SHE-MRAM devices via established and precise physics equations. The models developed are leveraged in the SPICE circuit simulator to verify the functionality of the proposed design. The proposed hardening technique is based on using feedback transistors, as well as increasing the radiation capacity of the sensitive nodes. Simulation results show that our proposed LUT circuit can achieve multiple node upset (MNU) tolerance with more than 38% and 60% power-delay product improvement as well as 26% and 50% reduction in device count compared to the previous energy-efficient radiation-hardened LUT designs. Finally, we have performed a process variation analysis showing that the MNU immunity of our proposed circuit is realized at the cost of increased susceptibility to transistor and MRAM variations compared to an unprotected LUT design.

19 citations


Cites background from "SEU Mitigation and Validation of th..."

  • ...Introduction and previous work Radiation-induced soft errors in nanometer-scale electronic circuits are of increasing concern in mission-critical space-based [1], high altitude [2], and terrestrial applications [3]....

    [...]

Proceedings ArticleDOI
01 Sep 2016
TL;DR: A brief survey of fault-tolerance methods and their suitability to cross-layer design of real-time systems is provided.
Abstract: Continued transistor scaling and increasing power density have resulted in a considerable increase in fault rates of nano-technology systems. Cross-layer fault tolerance techniques present a more cost-efficient methodology for adapting to such increased fault rates as opposed to fixing everything at the hardware layer. The effectiveness (Coverage, Fault-Masking and Recovery) and overheads (Execution time, Energy and Cost) of each fault tolerance technique vary with the layer and frequency at which it is applied. The choice of appropriate fault-aware design should also account for the application-specific design goals and constraints of real-time systems. To this end, we provide a brief survey of fault-tolerance methods and discuss their suitability to cross-layer design. We also provide a few case studies that motivate the need for effective design space exploration (DSE) for cross-layer fault-aware design of real-time systems and discuss a few factors that have a major impact on such DSE.

13 citations

References
Journal ArticleDOI
TL;DR: CREME96, an update of the Cosmic Ray Effects on Micro-Electronics (CREME) code, is a suite of programs for creating numerical models of the ionizing-radiation environment in near-Earth orbits and for evaluating radiation effects in spacecraft.
Abstract: CREME96 is an update of the Cosmic Ray Effects on Micro-Electronics code, a widely-used suite of programs for creating numerical models of the ionizing-radiation environment in near-Earth orbits and for evaluating radiation effects in spacecraft. CREME96, which is now available over the World-Wide Web (WWW) at http://crsp3.nrl.navy.mil/creme96/, has many significant features, including: (1) improved models of the galactic cosmic ray, anomalous cosmic ray, and solar energetic particle ("flare") components of the near-Earth environment; (2) improved geomagnetic transmission calculations; (3) improved nuclear transport routines; (4) improved single-event upset (SEU) calculation techniques, for both proton-induced and direct-ionization-induced SEUs; and (5) an easy-to-use graphical interface, with extensive on-line tutorial information. In this paper we document some of these improvements.

605 citations


"SEU Mitigation and Validation of th..." refers methods in this paper

  • ...Once cross-section curve estimates are created, the orbit error rate is estimated using a tool called CREME-96 [31]....

    [...]

Proceedings ArticleDOI
07 Mar 2005
TL;DR: The experimental results presented in this paper demonstrate that the number and placement of voters in the TMR design can directly affect the fault tolerance, with the fraction of routing upsets able to cause an error in the TMR circuit ranging from 4.03% to 0.98%.
Abstract: Triple modular redundancy (TMR) is a suitable fault-tolerant technique for SRAM-based FPGAs. However, one of the main challenges in achieving 100% robustness in designs protected by TMR running on programmable platforms is to prevent upsets in the routing from provoking undesirable connections between signals from distinct redundant logic parts, which can generate an error in the output. This paper investigates the optimal design of the TMR logic (e.g., by cleverly inserting voters) to ensure robustness. Four different versions of a TMR digital filter were analyzed by fault injection. Faults were randomly inserted straight into the bitstream of the FPGA. The experimental results presented in this paper demonstrate that the number and placement of voters in the TMR design can directly affect the fault tolerance, with the fraction of routing upsets able to cause an error in the TMR circuit ranging from 4.03% to 0.98%.

243 citations


"SEU Mitigation and Validation of th..." refers methods in this paper

  • ...One of the most common ways of applying structural mitigation is using triple-modular redundancy or TMR [3]....

    [...]

BookDOI
01 Oct 1985
TL;DR: The terms fault, error and failure are carefully defined and distinguished in the hope that an agreed terminology will emerge in the fault tolerance community.
Abstract: At present, the fault tolerance community is hampered by using a set of conflicting terms to refer to closely related fault tolerance concepts. This paper presents informal, but precise, definitions and terminology for these concepts. In particular, the terms fault, error and failure are carefully defined and distinguished. The aim is to promote discussion in the hope that an agreed terminology will emerge.

211 citations

Proceedings ArticleDOI
F. C. Lima, C. Carmichael, J. Fabula, R. Padovani, Ricardo Reis
10 Sep 2001
TL;DR: In this paper, the authors present the results of a single-bit-upset fault injection analysis performed on a Virtex FPGA triple modular redundancy (TMR) design, in which each programmable bit upset able to cause an error in the TMR design was investigated.
Abstract: This paper presents the meaningful results of a single bit upset fault injection analysis performed in Virtex FPGA triple modular redundancy (TMR) design. Each programmable bit upset able to cause an error in the TMR design has been investigated. Final conclusion using the TMR "golden" comparison method shows that "no errors" were reported by Virtex TMR design implementation in the presence of single bit upsets in the customization logic. The proton radiation ground test has confirmed the results achieved by fault injection.
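The style of experiment described here, upsetting one bit at a time and comparing against a golden result, can be sketched with a toy software model. The Python below is not a model of Virtex configuration memory; it only injects upsets into three replicated data words, and the names `inject_single_upsets`, `inject_double_upsets`, and the 8-bit width are assumptions chosen for illustration.

```python
from itertools import product

WIDTH = 8  # assumed word width for the toy model


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority voter."""
    return (a & b) | (a & c) | (b & c)


def inject_single_upsets(golden: int, width: int = WIDTH) -> int:
    """Flip every bit of every replica, one at a time, and count how many
    injections make the voted output differ from the golden value."""
    failures = 0
    for replica, bit in product(range(3), range(width)):
        copies = [golden, golden, golden]
        copies[replica] ^= 1 << bit  # single-bit upset in one replica
        if majority_vote(*copies) != golden:
            failures += 1
    return failures


def inject_double_upsets(golden: int, width: int = WIDTH) -> int:
    """Flip the same bit in two replicas at once (a stand-in for a fault
    that crosses TMR domains) and count voted-output mismatches."""
    failures = 0
    for bit in range(width):
        corrupted = golden ^ (1 << bit)
        if majority_vote(corrupted, corrupted, golden) != golden:
            failures += 1
    return failures


single_failures = inject_single_upsets(0xA5)
double_failures = inject_double_upsets(0xA5)
```

In this toy model no single upset escapes the voter (`single_failures` is 0), while every upset shared by two replicas does (`double_failures` equals `WIDTH`), mirroring the distinction between single-domain and multi-domain faults.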

130 citations


"SEU Mitigation and Validation of th..." refers methods in this paper

  • ...A useful way of learning more about the SEU sensitivity of an FPGA design and to understand the benefits of a mitigation technique is to apply artificial fault injection within the configuration memory [29]....

    [...]

Proceedings ArticleDOI
01 Dec 2006
TL;DR: This paper presents a survey of soft-core processors that are used in embedded systems, and several soft-core processors available from commercial vendors and open-source communities are reviewed and compared based on major architectural features.
Abstract: A soft-core processor is a hardware description language (HDL) model of a specific processor (CPU) that can be customized for a given application and synthesized for an ASIC or FPGA target. In many applications, soft-core processors provide several advantages over custom designed processors such as reduced cost, flexibility, platform independence and greater immunity to obsolescence. Embedded systems are hardware and software components working together to perform a specific function. Usually they contain embedded processors that are often in the form of soft-core processors that execute software code. This paper presents a survey of soft-core processors that are used in embedded systems. Several soft-core processors available from commercial vendors and open-source communities are reviewed and compared based on major architectural features. In addition, several real world examples of embedded systems that employ soft-core processors are summarized. As the complexity of embedded systems continues to increase, it is expected that the usage of customizable soft-core processors will become more widespread.

122 citations


"SEU Mitigation and Validation of th..." refers background in this paper

  • ...A soft processor can be an attractive alternative to a rad-hard processor by providing processor-specific customization, the ability to add custom reliability techniques, and the ability to provide customized FPGA logic [1]....

    [...]