A Fault Tolerant Design Methodology for a FPGA-based Softcore Processor

doi:10.3182/20120403-3-DE-3010.00005

Journal Article

Paulo A. Garcia¹, Tiago Gomes¹, F. Salgado¹, Jorge Cabral¹, Paulo Cardoso¹, Mongkol Ekpanyapong², Adriano Tavares¹

University of Minho¹, Asian Institute of Technology²

01 Jan 2012 - IFAC Proceedings Volumes (Elsevier) - Vol. 45, Iss. 4, pp. 145-150

TL;DR: Describes the architecture of a fault-tolerant softcore processor that triplicates all units and protects on-chip caches with a parity scheme, and presents the impact of both implementations on area, clock frequency, and I/O requirements when targeting FPGAs.
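
The two mechanisms named in the summary, unit triplication with voting and parity protection, can be illustrated with a minimal software sketch. It is not the paper's hardware design: the function names, the word values, and the choice of even parity are assumptions made for illustration only.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority over the outputs of three replicated units."""
    return (a & b) | (a & c) | (b & c)


def even_parity(word: int) -> int:
    """Parity bit that makes the total number of set bits even."""
    return bin(word).count("1") & 1


def cache_write(word: int) -> tuple[int, int]:
    """Store a word together with its parity bit (hypothetical cache line)."""
    return word, even_parity(word)


def cache_read(word: int, stored_parity: int) -> tuple[int, bool]:
    """Return the word and a flag that is False on a parity mismatch."""
    return word, even_parity(word) == stored_parity


# One replica disagrees in a single bit; the voter masks the fault.
voted = majority_vote(0b1010, 0b1010, 0b0010)

# A single bit flip in a stored word is detected (but not corrected) by parity.
word, parity = cache_write(0b1011)
_, ok_clean = cache_read(word, parity)
_, ok_flipped = cache_read(word ^ 0b0001, parity)
```

Voting masks a fault in any one replica, whereas a single parity bit only detects an odd number of flipped bits and leaves recovery to some other mechanism.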

About: This article was published in IFAC Proceedings Volumes on 2012-01-01. It has received 7 citations to date. The article focuses on the topics: General protection fault & Stuck-at fault.

Citations

Proceedings Article

Towards a Heterogeneous Fault-Tolerance Architecture based on Arm and RISC-V Processors

Cristiano Rodrigues¹, Ivo Marques¹, Sandro Pinto¹, Tiago Gomes¹, Adriano Tavares¹

University of Minho¹

01 Oct 2019

TL;DR: Lock-V, a heterogeneous architecture that explores a Dual-Core Lockstep (DCLS) fault-tolerance technique in two different processing units: a hard-core Arm Cortex-A9 and a soft-core RISC-V-based processor is proposed.

Abstract: Computer systems are a permanent part of our daily lives in a wide range of applications. In systems with mixed-criticality requirements, e.g., autonomous driving or aerospace applications, devices are expected to continue operating properly even in the event of a failure. One approach to improving the robustness of a device's operation is to enable fault-tolerant mechanisms during the system's design. This article proposes Lock-V, a heterogeneous architecture that explores a Dual-Core Lockstep (DCLS) fault-tolerance technique in two different processing units: a hard-core Arm Cortex-A9 and a soft-core RISC-V-based processor. It resorts to a System-on-Chip (SoC) solution with software programmability (available through the hard-core Arm Cortex-A9) and field-programmable gate array (FPGA) technology, taking advantage of the latter to support the deployment of the RISC-V soft-core along with dedicated hardware accelerators towards the realization of the DCLS.

11 citations

Cites background or methods from "A Fault Tolerant Design Methodology..."

...While some techniques replicate processing units in a technique called dual-core lockstep (DCLS) -implemented either loosely- or tightly-coupled to the processor- [4,7]–[11], others apply a triple modular redundancy (TMR) mechanism, where the processing units are triplicated and a voter module is added to the system [12]....
[...]
...module and a voter must be added to the system [12]....
[...]

Journal Article

Lock-V: A heterogeneous fault tolerance architecture based on Arm and RISC-V

Ivo Marques¹, Cristiano Rodrigues¹, Adriano Tavares¹, Sandro Pinto¹, Tiago Gomes¹

University of Minho¹

01 May 2021 - Microelectronics Reliability

TL;DR: In this paper, a heterogeneous fault tolerance architecture that explores a dual-core lockstep (DCLS) technique to mitigate single event upset (SEU) and common-mode failure (CMF) problems is presented.

8 citations

Dynamically Reconfigurable Fault-Tolerant Design of r-VEX Softcore Processor

Muhammad Usman Saleem

01 Jan 2018

TL;DR: A dynamically reconfigurable fault-tolerant mode is presented in r-VEX, a softcore processor, so that it could be used as an attractive alternative to expensive radiation-hardened processors for space-based applications.

Abstract: Over many years, technology scaling has continuously reduced the lateral and vertical dimensions of transistors. On the one hand, this scaling has led to a commensurate performance gain for very-large-scale integration (VLSI) circuits; on the other hand, it has made such circuits more vulnerable to ionizing radiation, which can cause single event effects (SEEs). These SEEs may cause the underlying user circuitry to deviate from its normal behavior. Devices destined for space missions need special protection against such anomalies, as the space environment is filled with a massive amount of high-energy particles and ionizing radiation. In this thesis, the design, implementation, and verification of a fault-tolerant r-VEX, a softcore processor, is presented, so that it could be used as an attractive alternative to expensive radiation-hardened processors for space-based applications. r-VEX is a VLIW-based, dynamically reconfigurable processor. In keeping with this inherent attribute, a dynamically reconfigurable fault-tolerant mode is presented in this work, which gives the running application the option to activate and deactivate the fault-tolerant mode multiple times. In this mode, the processor pipeline is protected by a non-traditional TMR approach that requires 3 lanegroups running in 2-way mode. For the reliability of user memories, Hamming codes are implemented as the ECC scheme. The functionality of our fault-tolerant design is verified using both a simulation-based platform (ModelSim) and an on-board FPGA platform (ML605 development kit). To measure the fault-tolerant capabilities of the r-VEX core, saboteurs are used to artificially inject faults at various predefined locations in the core. The results show that our design can mitigate all injected single faults in the pipeline and double faults in the caches without triggering any failure. The dynamically configurable fault-tolerant feature is obtained at the cost of about 30% additional resource utilization and a 20% reduction in the maximum operating frequency.
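
The abstract names Hamming codes as the ECC scheme for user memories but gives no parameters. As a rough illustration of how such a code corrects a single flipped bit, the sketch below uses the textbook Hamming(7,4) code; the code size, bit ordering, and function names are assumptions and are not taken from the thesis.

```python
from itertools import product


def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword.

    Codeword positions 1..7 hold: p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit; return (data bits, syndrome)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 | (s2 << 1) | (s3 << 2)  # 1-based position of the error
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


# Inject every possible single-bit fault into every codeword and decode.
all_corrected = True
for data in product([0, 1], repeat=4):
    clean = hamming74_encode(data)
    for pos in range(7):
        faulty = clean.copy()
        faulty[pos] ^= 1
        decoded, syndrome = hamming74_decode(faulty)
        all_corrected &= decoded == list(data) and syndrome == pos + 1
```

A plain Hamming(7,4) code corrects only one error per codeword, so this sketch does not by itself explain the double-fault result reported for the caches.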

5 citations

Proceedings Article

Reliability Evaluation of the Count Min Sketch (CMS) against Single Event Transients (SETs)

Jinhua Zhu¹, Zhen Gao¹, Jie Jin¹, Pedro Reviriego²

Tianjin University¹, Charles III University of Madrid²

25 Apr 2021

TL;DR: In this paper, the effect of SETs on the hash functions of the count min sketch (CMS) frequency estimate is analyzed theoretically in terms of overestimation probability, underestimation probability and the equal probability, and further discussed for data with different frequency.

Abstract: Estimating the frequency of the elements in a data set is commonly needed in data analysis. As data sets grow, accurately computing the number of times that each element appears with a counter becomes impractical. Instead, the Count Min Sketch (CMS) is widely used in big data processing to estimate frequency due to its simplicity and small storage needs. However, soft errors caused by Single Event Transients (SETs) will affect the hardware implementation of the CMS, mainly the hash functions. In this paper, the effect of SETs on the hash functions of the CMS frequency estimate is analyzed theoretically in terms of overestimation probability, underestimation probability, and the equal probability, and further discussed for data with different frequencies. Simulation results verify the correctness of the theoretical analysis and reveal several valuable conclusions. First, a large portion of SETs can be tolerated by the CMS itself, and the reliability of the CMS improves when a larger number of arrays is used. Second, the average probabilities of overestimation and underestimation are almost the same, and decrease for larger numbers of arrays. Third, SETs are more likely to cause underestimation for the most frequent data elements. Finally, the overall effect of SETs on the CMS is slightly affected by the number of counters in each array, and seems to be independent of the distribution of the input sequence. The results and analysis presented in this paper provide a starting point for the design of efficient SET fault-tolerant schemes for the CMS.
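
For readers unfamiliar with the structure, a minimal Count Min Sketch with a crude fault model can be sketched as follows. This is not the paper's simulation setup: the class, the BLAKE2b-based hash, the sketch dimensions, and the fault model (one bit of one array's index flipped during a query) are all assumptions chosen for illustration.

```python
import hashlib


class CountMinSketch:
    """depth arrays of width counters; the estimate is the minimum over arrays."""

    def __init__(self, depth: int = 4, width: int = 64):
        self.depth, self.width = depth, width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row: int, item: str) -> int:
        # One independent-looking hash per array, derived by salting with the row.
        digest = hashlib.blake2b(item.encode(), salt=bytes([row])).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for r in range(self.depth):
            self.rows[r][self._index(r, item)] += count

    def estimate(self, item: str, faulty_row=None, flip_bit: int = 0) -> int:
        """Query; optionally flip one bit of one array's index (assumed fault model)."""
        values = []
        for r in range(self.depth):
            idx = self._index(r, item)
            if r == faulty_row:
                idx = (idx ^ (1 << flip_bit)) % self.width
            values.append(self.rows[r][idx])
        return min(values)


cms = CountMinSketch()
cms.add("a", 5)
fault_free = cms.estimate("a")           # exact here: no other items collide
with_fault = cms.estimate("a", faulty_row=0)  # the wrong counter is read in array 0
```

Without faults the sketch can only overestimate, because collisions only add to counters. In this example the faulty array reads an empty counter, so the minimum drops below the true count, which is the underestimation case the abstract discusses.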

1 citation

Proceedings Article

RAPTOR-Design: Refactorable Architecture Processor to Optimize Recurrent Design

Paulo A. Garcia¹, Tiago Gomes¹, F. Salgado¹, Jorge Cabral¹, João L. Monteiro¹, Adriano Tavares¹

University of Minho¹

05 Nov 2012

TL;DR: RAPTOR-Design is presented, a framework for System-on-Chip (SoC) design which incorporates a customizable processor architecture and allows rapid software-to-hardware migration, custom hardware integration in a tightly-coupled fashion and seamless Fault Tolerance (FT) capabilities for FPGA platforms.

Abstract: The growth in embedded systems complexity has created the demand for novel tools which allow rapid systems development and facilitate the designer's management of complexity. Especially since systems must incorporate a variety of often contradictory characteristics, achieving design metrics in short development time is an increasing challenge. This paper presents RAPTOR-Design, a framework for System-on-Chip (SoC) design which incorporates a customizable processor architecture and allows rapid software-to-hardware migration, custom hardware integration in a tightly-coupled fashion and seamless Fault Tolerance (FT) capabilities for FPGA platforms. Impact on design metrics of processor customization, FT-capabilities and custom hardware integration are presented, as well as an overview of the design process using RAPTOR-Design.

Cites methods from "A Fault Tolerant Design Methodology..."

...The RAPTOR framework builds upon previous work presented in [23] and [24]....
[...]

References

Proceedings Article

A fault tolerant approach to microprocessor design

Christopher T. Weaver¹, Todd Austin

University of Michigan¹

01 Jul 2001

TL;DR: This work proposes a fault-tolerant approach to reliable microprocessor design that provides significant resistance to core processor design errors and operational faults such as supply voltage noise and energetic particle strikes, and shows through cycle-accurate simulation and timing analysis of a physical checker design that it preserves system performance while keeping area overheads and power demands low.

Abstract: We propose a fault-tolerant approach to reliable microprocessor design. Our approach, based on the use of an online checker component in the processor pipeline, provides significant resistance to core processor design errors and operational faults such as supply voltage noise and energetic particle strikes. We show through cycle-accurate simulation and timing analysis of a physical checker design that our approach preserves system performance while keeping area overheads and power demands low. Furthermore, analyses suggest that the checker is a fairly simple state machine that can be formally verified, scaled in performance, and reused. Further simulation analyses show virtually no performance impacts when our simple checker design is coupled with a high-performance microprocessor model. Timing analyses indicate that a fully synthesized unpipelined 4-wide checker component in 0.25 µm technology is capable of checking Alpha instructions at 288 MHz. Physical analyses also confirm that costs are quite modest; our prototype checker requires less than 6% the area and 1.5% the power of an Alpha 21264 processor in the same technology. Additional improvements to the checker component are described which allow for improved detection of design, fabrication and operational faults.

154 citations

Proceedings Article

A fault injection analysis of Virtex FPGA TMR design methodology

F. C. Lima¹, C. Carmichael¹, J. Fabula¹, R. Padovani¹, Ricardo Reis

Xilinx¹

10 Sep 2001

TL;DR: In this paper, the authors present the results of a single-bit-upset fault injection analysis performed on a Virtex FPGA triple modular redundancy (TMR) design; each programmable bit upset able to cause an error in the TMR design was investigated.

Abstract: This paper presents the results of a single-bit-upset fault injection analysis performed on a Virtex FPGA triple modular redundancy (TMR) design. Each programmable bit upset able to cause an error in the TMR design has been investigated. The final conclusion, using the TMR "golden" comparison method, shows that "no errors" were reported by the Virtex TMR design implementation in the presence of single bit upsets in the customization logic. The proton radiation ground test has confirmed the results achieved by fault injection.

130 citations

Proton Testing of SEU Mitigation Methods for the Virtex FPGA

C. Carmichael, Earl Fuller, Joe Fabula, Fernanda De Lima

01 Jan 2001

TL;DR: The Xilinx prescribed SEU mitigation schemes were tested for a generic functional usage at the proton facility in UC Davis and demonstrated improvement in the programmed functional upset sensitivity and the system consequence of upsets.

Abstract: Total ionizing dose (TID), heavy ion and proton characterization have previously been performed on Virtex FPGAs, fabricated on epitaxial silicon, to evaluate the on-orbit radiation performance expected for this technology. The dominant risk is Single Event Upset (SEU), so upset detection and mitigation schemes were developed and tested to demonstrate the improvement in the programmed functional upset sensitivity and the system consequence of upsets. The Xilinx prescribed SEU mitigation schemes were tested for a generic functional usage at the proton facility in UC Davis.

65 citations

Proceedings Article

Fast dynamic and partial reconfiguration data path with low hardware overhead on Xilinx FPGAs

Michael Hübner, Diana Gohringer, Juanjo Noguera, Jürgen Becker

19 Apr 2010

TL;DR: A novel methodology is shown for including the configuration access port in the data path of a processor core, in order to adapt the internal architecture and to re-use this access port as a data sink and source.

Abstract: Dynamic and partial reconfiguration of Xilinx FPGAs is a well-known technique in runtime adaptive system design. With this technique, parts of a configuration can be substituted while other parts stay operative without any disturbance. The advantage is that spatial and temporal partitioning can be exploited to increase performance and to reduce power consumption through the re-use of chip area. This paper shows a novel methodology for the inclusion of the configuration access port into the data path of a processor core in order to adapt the internal architecture and to re-use this access port as a data sink and source. The chip area utilized by the hardware drivers for the internal configuration access port (ICAP) has to be as small as possible in comparison to the application functionality. Therefore, a hardware design with a small footprint, but with adequate performance in terms of data throughput, is necessary. This paper presents a fast data path for dynamic and partial reconfiguration data with the advantage of a small footprint on the hardware resources.

59 citations

Journal Article

Study of the Effects of SEU-Induced Faults on a Pipeline Protected Microprocessor

Emmanuel Touloupis, James A. Flint, V.A. Chouliaras, David D. Ward

01 Dec 2007 - IEEE Transactions on Computers

TL;DR: The study proposes statistical methods for both the single and dual fault injection campaigns and demonstrates the fault-tolerant capability of both processors in terms of fault latencies, the probability of fault manifestation, and the behavior of latent faults.

Abstract: This paper presents a detailed analysis of the behavior of a novel fault-tolerant 32-bit embedded CPU as compared to a default (non-fault-tolerant) implementation of the same processor during a fault injection campaign of single and double faults. The fault-tolerant processor tested is characterized by per-cycle voting of microarchitectural and the flop-based architectural states, redundancy at the pipeline level, and a distributed voting scheme. Its fault-tolerant behavior is characterized for three different workloads from the automotive application domain. The study proposes statistical methods for both the single and dual fault injection campaigns and demonstrates the fault-tolerant capability of both processors in terms of fault latencies, the probability of fault manifestation, and the behavior of latent faults.

58 citations