Author

Andrew M. Keller

Bio: Andrew M. Keller is an academic researcher from Brigham Young University. The author has contributed to research on the topics of triple modular redundancy and fault injection. The author has an h-index of 5 and has co-authored 14 publications receiving 116 citations.

Papers
Journal ArticleDOI
TL;DR: In this paper, a variety of SEU mitigation and repair techniques are applied to the LEON3 soft-core processor to study the effects and complementary nature of each technique, including triple modular redundancy (TMR), configuration memory (CRAM) scrubbing, and internal block memory (BRAM) scrubbing.
Abstract: A variety of mitigation techniques have been demonstrated to reduce the sensitivity of FPGA designs to soft errors. Without mitigation, SEUs can cause failure by altering the logic, routing, and state of a design operating on an SRAM-based FPGA. Various combinations of SEU mitigation and repair techniques are applied to the LEON3 soft-core processor to study the effects and complementary nature of each technique. This work focuses on triple modular redundancy (TMR), configuration memory (CRAM) scrubbing, and internal block memory (BRAM) scrubbing. All mitigation methods demonstrate some improvement in both fault injection and neutron radiation testing. Results in this paper show complementary SEU mitigation techniques working together to improve fault tolerance. The results also suggest that fault injection can be a good way to estimate the cross section of a design before going to a radiation test. TMR with CRAM scrubbing demonstrates a $27\times$ improvement, whereas TMR with both CRAM and BRAM scrubbing demonstrates approximately a $50\times$ improvement.

36 citations

Journal ArticleDOI
TL;DR: Three strategies to mitigate against single-event upsets within the configuration memory of static random access memory field-programmable gate arrays are presented: incremental routing, incremental placement, and striping.
Abstract: Triple modular redundancy (TMR) with repair has proven to be an effective strategy for mitigating the effects of single-event upsets within the configuration memory of static random access memory field-programmable gate arrays. Applying TMR to the design successfully reduces the design’s neutron cross section by $80\times$. The effectiveness of TMR, however, is limited by the presence of single bits in the configuration memory which cause more than one TMR domain to fail simultaneously. We present three strategies to mitigate against these failures and improve the effectiveness of TMR: incremental routing, incremental placement, and striping. These techniques were tested using both fault injection and a wide-spectrum neutron beam, with the best technique offering a $400\times$ reduction to the design’s sensitive neutron cross section. An analysis from the radiation test shows that no single bits caused failure and that multicell upsets were the main cause of failure for these mitigation strategies.

34 citations

Proceedings ArticleDOI
01 Apr 2018
TL;DR: This paper describes how common mode failures are introduced during the implementation process and introduces an approach for resolving them through a custom incremental placement tool for Xilinx 7-Series FPGAs.
Abstract: TMR combined with configuration scrubbing is an effective technique to mitigate against radiation-induced CRAM upsets on SRAM-based FPGAs. However, its effectiveness is limited by low-level common mode failures due to the physical mapping of a design to the FPGA device. This paper describes how common mode failures are introduced during the implementation process and introduces an approach for resolving them through a custom incremental placement tool for Xilinx 7-Series FPGAs. Multiple designs across multiple generations of devices are shown to be sensitive to common mode failures. Applying the incremental placement technique yields an improvement of 10,721× over an unmitigated design through fault-injection testing. Radiation testing is then performed to show that the MTTF of this technique is 91,500 days in GEO, a 367× improvement over the unmitigated design and a 5× improvement over baseline TMR.

29 citations
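
The MTTF figures quoted in the entry above can be related to one another with simple arithmetic. The following is a minimal sketch, assuming each improvement factor is a plain ratio of MTTF values; the function and variable names are illustrative, and the derived baseline values are back-calculated estimates, not numbers reported by the paper.

```python
# Back-of-the-envelope check of the MTTF figures quoted in the abstract.
# Assumption: "N x improvement" means MTTF_mitigated / MTTF_reference = N.

DAYS_PER_YEAR = 365.25  # average Julian year, used only for unit conversion


def reference_mttf(mitigated_mttf_days: float, improvement: float) -> float:
    """Return the MTTF of the reference design implied by an improvement factor."""
    return mitigated_mttf_days / improvement


mttf_incremental_placement = 91_500.0  # days, quoted in the abstract

# Implied (not reported) MTTF values for the two reference designs.
mttf_unmitigated = reference_mttf(mttf_incremental_placement, 367.0)
mttf_baseline_tmr = reference_mttf(mttf_incremental_placement, 5.0)

print(f"incremental placement: {mttf_incremental_placement / DAYS_PER_YEAR:.1f} years")
print(f"implied unmitigated:   {mttf_unmitigated:.1f} days")
print(f"implied baseline TMR:  {mttf_baseline_tmr:.1f} days")
```

Under this reading, the unmitigated design would fail on the order of once every several months in the same orbit, while the incremental placement design lasts on the order of centuries.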

Proceedings ArticleDOI
21 Feb 2016
TL;DR: This paper investigates the improvements in reliability of a LEON3 soft processor operating on an SRAM-based FPGA when using triple modular redundancy and other processor-specific mitigation techniques and demonstrates an average improvement of 10×.
Abstract: Processors are an essential component in most satellite payload electronics and handle a variety of functions including command handling and data processing. There is growing interest in implementing soft processors on commercial FPGAs within satellites. Commercial FPGAs offer reconfigurability, large logic density, and I/O bandwidth; however, they are sensitive to ionizing radiation, and systems developed for space must implement single-event upset mitigation to operate reliably. This paper investigates the improvements in reliability of a LEON3 soft processor operating on an SRAM-based FPGA when using triple modular redundancy and other processor-specific mitigation techniques. The improvements in reliability provided by these techniques are validated with both fault injection and heavy ion radiation tests. The fault injection experiments indicate an improvement of 51× and the radiation testing results demonstrate an average improvement of 10×. Orbit failure rate estimations were computed and suggest that the TMR LEON3 processor has a mean time to failure of over 76 years in a geosynchronous orbit.

26 citations

Journal ArticleDOI
TL;DR: Two field-programmable gate array designs are tested for dynamic single event upset (SEU) sensitivity on two different 28-nm static random access memory-based FPGAs: an Intel Stratix V and a Xilinx Kintex 7 FPGA.
Abstract: Two field-programmable gate array (FPGA) designs are tested for dynamic single event upset (SEU) sensitivity on two different 28-nm static random access memory-based FPGAs: an Intel Stratix V and a Xilinx Kintex 7 FPGA. These designs were tested in both a conventional unmitigated version and a version that tolerates SEUs using feedback triple modular redundancy (TMR). The unmitigated design sensitivity and the low-level device sensitivity were found to be similar between the devices through neutron radiation testing. Results also show that feedback TMR and configuration scrubbing benefit both designs on both FPGAs. While TMR is helpful, the benefit of TMR depends on the design structure and the device architecture. TMR and scrubbing reduced dynamic SEU sensitivity by a factor of 4–$54\times$.

24 citations


Cited by
Proceedings ArticleDOI
01 Jul 2019
TL;DR: The TMR RISC-V processor showed a 33× reduction in the neutron cross section and a 27% decrease in operational frequency, resulting in a 24× improvement of the mean work to failure with a cost of around 5.6× resource utilization.
Abstract: Many space applications are considering the use of commercial SRAM-based FPGAs over radiation hardened devices. When using SRAM-based FPGAs, soft processors may be required to fulfill application requirements, but the FPGA designs must overcome radiation-induced soft errors to provide a reliable system. TMR is one solution in designing a fault tolerant soft processor to mitigate the failures caused by SEUs. This paper compares the neutron soft-error reliability of an unmitigated and TMR version of a Taiga RISC-V soft processor on a Xilinx SRAM-based FPGA. The TMR RISC-V processor showed a 33× reduction in the neutron cross section and a 27% decrease in operational frequency, resulting in a 24× improvement of the mean work to failure with a cost of around 5.6× resource utilization.

29 citations
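
The three figures quoted in the entry above (33× cross-section reduction, 27% frequency decrease, 24× mean work to failure improvement) appear to be mutually consistent. The sketch below assumes that work completed per unit time scales linearly with clock frequency and that failure rate scales linearly with neutron cross section; both are simplifying assumptions made here for illustration, and the function name is hypothetical.

```python
# Consistency check for the mean-work-to-failure (MWTF) figure in the abstract.
# Assumptions (not stated in the source):
#   - work per unit time is proportional to clock frequency
#   - failure rate is proportional to neutron cross section


def mwtf_improvement(cross_section_reduction: float, frequency_decrease: float) -> float:
    """Estimate the MWTF gain from a reliability gain and a throughput loss.

    cross_section_reduction: factor by which the cross section shrinks (e.g. 33.0)
    frequency_decrease: fractional loss in clock frequency (e.g. 0.27 for 27%)
    """
    relative_throughput = 1.0 - frequency_decrease
    return cross_section_reduction * relative_throughput


estimate = mwtf_improvement(33.0, 0.27)
print(f"estimated MWTF improvement: {estimate:.1f}x")
```

With these inputs the estimate is 33 × 0.73 ≈ 24, which matches the reported 24× improvement.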

Journal ArticleDOI
TL;DR: This paper presents radiation test data on a Xilinx Kintex-7 SRAM-based FPGA using an ultrahigh-energy heavy-ion test beam, available for the first time to third-party radiation testing at CERN.
Abstract: In recent years, field-programmable gate array (FPGA) devices have attracted considerable attention due to the increasing performance that technology scaling provides, along with their high flexibility through in-field reprogramming and partial reconfiguration. However, when such devices are deployed in safety- and mission-critical applications such as avionics and space, it is mandatory to verify the reliability of the device in the target environment, where radiation effects are considered one of the major sources of faults in the system. For static random access memory (SRAM)-based FPGA devices, the SRAM cells holding the configuration data for the implemented circuit are highly susceptible to single-event upsets (SEUs) induced by charged particles striking the device, and a single SEU in the configuration memory may corrupt the implemented circuit design, causing system misbehavior. In this paper, we present radiation test data on a Xilinx Kintex-7 SRAM-based FPGA using an ultrahigh-energy heavy-ion test beam, available for the first time to third-party radiation testing at CERN.

26 citations