Journal Article

A Fault Tolerant Design Methodology for a FPGA-based Softcore Processor

01 Jan 2012-IFAC Proceedings Volumes (Elsevier)-Vol. 45, Iss: 4, pp 145-150
TL;DR: The architecture of a Fault Tolerant softcore processor using triplication of all units as well as using a parity protection scheme for on-chip caches is described, presenting the impact on area, clock frequency and I/O requirements of both implementations, targeting FPGAs.
About: This article is published in IFAC Proceedings Volumes. The article was published on 2012-01-01. It has received 7 citations to date. The article focuses on the topics: General protection fault & Stuck-at fault.
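
The TL;DR above names two mechanisms: triplication of units and parity protection for on-chip caches. As a minimal sketch of both ideas, assuming a bitwise 2-of-3 voter and a single even-parity bit per word (the function names and the 8-bit word are illustrative and are not taken from the article):

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority over the outputs of three replicated units."""
    return (a & b) | (a & c) | (b & c)


def even_parity(word: int) -> int:
    """Parity bit that makes the total number of 1 bits even."""
    return bin(word).count("1") & 1


def parity_ok(word: int, stored_parity: int) -> bool:
    """True if the word still matches the parity bit stored alongside it."""
    return even_parity(word) == stored_parity


# Hypothetical 8-bit value produced by each of the three replicas.
correct = 0b1011_0010
# A single-bit upset in one replica.
faulty = correct ^ 0b0000_1000

# The voter masks the upset because two replicas still agree.
voted = majority_vote(correct, faulty, correct)

# Parity only detects the upset; it cannot say which bit flipped.
stored_parity = even_parity(correct)
detected = not parity_ok(faulty, stored_parity)
```

The sketch shows the usual trade-off between the two mechanisms: triplication masks a single fault at roughly three times the logic cost, while a single parity bit is cheap but only detects an odd number of flipped bits.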
Citations
Proceedings Article
01 Oct 2019
TL;DR: Lock-V, a heterogeneous architecture that explores a Dual-Core Lockstep (DCLS) fault-tolerance technique in two different processing units: a hard-core Arm Cortex-A9 and a soft-core RISC-V-based processor is proposed.
Abstract: Computer systems are present in our daily lives in a wide range of applications. In systems with mixed-criticality requirements, e.g., autonomous driving or aerospace applications, devices are expected to continue operating properly even in the event of a failure. An approach to improve the robustness of the device's operation lies in enabling fault-tolerant mechanisms during the system's design. This article proposes Lock-V, a heterogeneous architecture that explores a Dual-Core Lockstep (DCLS) fault-tolerance technique in two different processing units: a hard-core Arm Cortex-A9 and a soft-core RISC-V-based processor. It resorts to a System-on-Chip (SoC) solution with software programmability (available through the hard-core Arm Cortex-A9) and field-programmable gate array (FPGA) technology, taking advantage of the latter to support the deployment of the RISC-V soft-core along with dedicated hardware accelerators towards the realization of the DCLS.

11 citations


Cites background or methods from "A Fault Tolerant Design Methodology..."

  • ...While some techniques replicate processing units in a technique called dual-core lockstep (DCLS) -implemented either loosely- or tightly-coupled to the processor- [4,7]–[11], others apply a triple modular redundancy (TMR) mechanism, where the processing units are triplicated and a voter module is added to the system [12]....

    [...]

  • ...module and a voter must be added to the system [12]....

    [...]

Journal Article
TL;DR: In this paper, a heterogeneous fault tolerance architecture that explores a dual-core lockstep (DCLS) technique to mitigate single event upset (SEU) and common-mode failure (CMF) problems is presented.

8 citations

01 Jan 2018
TL;DR: A dynamically reconfigurable fault-tolerant mode is presented in r-VEX, a softcore processor, so that it could be used as an attractive alternative to expensive radiation-hardened processors for space-based applications.
Abstract: Over the past many years, technology scaling has resulted in a continuous reduction of the lateral and vertical dimensions of transistors. Technology scaling has, on the one hand, led to a commensurate performance gain for very-large-scale integration (VLSI) circuits, but on the other hand, has also made such circuits more vulnerable to ionizing radiation, which can cause single event effects (SEEs). These SEEs may cause the underlying user circuitry to deviate from its normal behavior. Devices destined for space missions need special protection against such anomalies, as the space environment is filled with a massive amount of high-energy particles and ionizing radiation. In this thesis, the design, implementation, and verification of a fault-tolerant r-VEX, a softcore processor, is presented, so that it could be used as an attractive alternative to expensive radiation-hardened processors for space-based applications. r-VEX is a VLIW-based, dynamically reconfigurable processor. In keeping with this inherent attribute, a dynamically reconfigurable fault-tolerant mode is presented in this work, which gives the running application the option to activate and deactivate the fault-tolerant mode multiple times. In this mode, for the protection of the processor pipeline, a non-traditional TMR approach that requires 3 lanegroups running in 2-way mode is implemented. For the reliability of user memories, Hamming codes are implemented as an ECC coding scheme. The functionality of our fault-tolerant design is verified using both a simulation-based platform (ModelSim) and an on-board FPGA platform (ML605 development kit). To measure the fault-tolerant capabilities of the r-VEX core, saboteurs are used to artificially inject faults at various predefined locations in the core. The obtained results have shown that our design can mitigate all injected single faults in the pipeline and double faults in the caches, without triggering any failure. The dynamically configurable fault-tolerant feature is obtained at the cost of about 30% additional resource utilization and a 20% reduction in the maximum operating frequency.
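
The abstract above names Hamming codes as the ECC scheme for user memories but does not state the code parameters. As a minimal sketch, assuming a textbook Hamming(7,4) code with single-error correction (the word width, the function names, and the software fault injection loop are illustrative assumptions, not the thesis's design):

```python
def hamming74_encode(nibble: int) -> list[int]:
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8  # index 0 is unused so list indices match bit positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # parity over positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # parity over positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # parity over positions 4, 5, 6, 7
    return code[1:]


def hamming74_decode(bits: list[int]) -> tuple[int, int]:
    """Return (data nibble, syndrome); a non-zero syndrome is the corrected position."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1  # flip the single bit the syndrome points at
    d = [code[3], code[5], code[6], code[7]]
    nibble = sum(bit << i for i, bit in enumerate(d))
    return nibble, syndrome


# Inject every possible single-bit fault into every codeword and check recovery.
all_corrected = True
for value in range(16):
    word = hamming74_encode(value)
    for pos in range(7):
        corrupted = list(word)
        corrupted[pos] ^= 1
        decoded, syndrome = hamming74_decode(corrupted)
        if decoded != value or syndrome != pos + 1:
            all_corrected = False
```

This code corrects one flipped bit per codeword. Handling two faults in the same word would need an extended code or a different organization, which this sketch does not attempt to reproduce.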

5 citations

Proceedings Article
25 Apr 2021
TL;DR: In this paper, the effect of SETs on the hash functions of the count min sketch (CMS) frequency estimate is analyzed theoretically in terms of overestimation probability, underestimation probability and the equal probability, and further discussed for data with different frequency.
Abstract: Estimating the frequency of the elements in a data set is commonly needed in data analysis. As data sets grow, accurately computing the number of times that each element appears with a counter becomes impractical. Instead, the Count Min Sketch (CMS) is widely used in big data processing to estimate frequency due to its simplicity and small storage needs. However, soft errors caused by Single Event Transients (SETs) will affect the hardware implementation of the CMS, mainly the hash functions. In this paper, the effect of SETs on the hash functions of the CMS frequency estimate is analyzed theoretically in terms of overestimation probability, underestimation probability, and the equal probability, and further discussed for data with different frequencies. Simulation results verify the correctness of the theoretical analysis and reveal several valuable conclusions. First, a large portion of SETs can be tolerated by the CMS itself, and the reliability of the CMS improves when a larger number of arrays is used. Second, the average probabilities of overestimation and underestimation are almost the same, and both decrease for larger numbers of arrays. Third, SETs are more likely to cause underestimation for the most frequent data elements. Finally, the overall effect of SETs on the CMS is slightly affected by the number of counters in each array, and seems to be independent of the distribution of the input sequence. The results and analysis presented in this paper provide a starting point for the design of efficient SET fault-tolerant schemes for the CMS.

1 citation

Proceedings Article
05 Nov 2012
TL;DR: RAPTOR-Design is presented, a framework for System-on-Chip (SoC) design which incorporates a customizable processor architecture and allows rapid software-to-hardware migration, custom hardware integration in a tightly-coupled fashion and seamless Fault Tolerance (FT) capabilities for FPGA platforms.
Abstract: The growth in embedded systems complexity has created the demand for novel tools which allow rapid systems development and facilitate the designer's management of complexity. Especially since systems must incorporate a variety of often contradictory characteristics, achieving design metrics in a short development time is an increasing challenge. This paper presents RAPTOR-Design, a framework for System-on-Chip (SoC) design which incorporates a customizable processor architecture and allows rapid software-to-hardware migration, custom hardware integration in a tightly-coupled fashion and seamless Fault Tolerance (FT) capabilities for FPGA platforms. The impact of processor customization, FT capabilities and custom hardware integration on design metrics is presented, as well as an overview of the design process using RAPTOR-Design.

Cites methods from "A Fault Tolerant Design Methodology..."

  • ...The RAPTOR framework builds upon previous work presented in [23] and [24]....

    [...]

References
Proceedings Article
01 Jul 2001
TL;DR: This work proposes a fault-tolerant approach to reliable microprocessor design that provides significant resistance to core processor design errors and operational faults such as supply voltage noise and energetic particle strikes, and shows through cycle-accurate simulation and timing analysis of a physical checker design that it preserves system performance while keeping area overheads and power demands low.
Abstract: We propose a fault-tolerant approach to reliable microprocessor design. Our approach, based on the use of an online checker component in the processor pipeline, provides significant resistance to core processor design errors and operational faults such as supply voltage noise and energetic particle strikes. We show through cycle-accurate simulation and timing analysis of a physical checker design that our approach preserves system performance while keeping area overheads and power demands low. Furthermore, analyses suggest that the checker is a fairly simple state machine that can be formally verified, scaled in performance, and reused. Further simulation analyses show virtually no performance impacts when our simple checker design is coupled with a high-performance microprocessor model. Timing analyses indicate that a fully synthesized unpipelined 4-wide checker component in 0.25 μm technology is capable of checking Alpha instructions at 288 MHz. Physical analyses also confirm that costs are quite modest; our prototype checker requires less than 6% the area and 1.5% the power of an Alpha 21264 processor in the same technology. Additional improvements to the checker component are described which allow for improved detection of design, fabrication and operational faults.

154 citations

Proceedings Article
F. C. Lima, C. Carmichael, J. Fabula, R. Padovani, Ricardo Reis
10 Sep 2001
TL;DR: In this paper, the authors present the results of a single bit upset fault injection analysis performed on a Virtex FPGA triple modular redundancy (TMR) design. Each programmable bit upset able to cause an error in the TMR design has been investigated.
Abstract: This paper presents the meaningful results of a single bit upset fault injection analysis performed on a Virtex FPGA triple modular redundancy (TMR) design. Each programmable bit upset able to cause an error in the TMR design has been investigated. The final conclusion using the TMR "golden" comparison method shows that "no errors" were reported by the Virtex TMR design implementation in the presence of single bit upsets in the customization logic. The proton radiation ground test has confirmed the results achieved by fault injection.

130 citations

01 Jan 2001
TL;DR: The Xilinx prescribed SEU mitigation schemes were tested for a generic functional usage at the proton facility in UC Davis and demonstrated improvement in the programmed functional upset sensitivity and the system consequence of upsets.
Abstract: Total ionizing dose (TID), heavy ion and proton characterization have previously been performed on Virtex FPGAs, fabricated on epitaxial silicon, to evaluate the on-orbit radiation performance expected for this technology. The dominant risk is Single Event Upset (SEU), so upset detection and mitigation schemes were developed and tested to demonstrate the improvement in the programmed functional upset sensitivity and the system consequence of upsets. The Xilinx prescribed SEU mitigation schemes were tested for a generic functional usage at the proton facility in UC Davis.

65 citations

Proceedings Article
19 Apr 2010
TL;DR: A novel methodology for the inclusion of the configuration access port into the data path of a processor core in order to adapt the internal architecture and to re-use this access port as data sink and source is shown.
Abstract: Dynamic and partial reconfiguration of Xilinx FPGAs is a well-known technique in runtime adaptive system design. With this technique, parts of a configuration can be substituted while other parts stay operative without any disturbance. The advantage is that spatial and temporal partitioning can be exploited to increase performance and to reduce power consumption through the re-use of chip area. This paper shows a novel methodology for the inclusion of the configuration access port into the data path of a processor core in order to adapt the internal architecture and to re-use this access port as data sink and source. The chip area utilized by the hardware drivers for the internal configuration access port (ICAP) has to be as small as possible in comparison to the application functionality. Therefore, a hardware design with a small footprint, but with adequate performance in terms of data throughput, is necessary. This paper presents a fast data path for dynamic and partial reconfiguration data with the advantage of a small footprint on the hardware resources.

59 citations

Journal Article
TL;DR: The study proposes statistical methods for both the single and dual fault injection campaigns and demonstrates the fault-tolerant capability of both processors in terms of fault latencies, the probability of fault manifestation, and the behavior of latent faults.
Abstract: This paper presents a detailed analysis of the behavior of a novel fault-tolerant 32-bit embedded CPU as compared to a default (non-fault-tolerant) implementation of the same processor during a fault injection campaign of single and double faults. The fault-tolerant processor tested is characterized by per-cycle voting of microarchitectural and the flop-based architectural states, redundancy at the pipeline level, and a distributed voting scheme. Its fault-tolerant behavior is characterized for three different workloads from the automotive application domain. The study proposes statistical methods for both the single and dual fault injection campaigns and demonstrates the fault-tolerant capability of both processors in terms of fault latencies, the probability of fault manifestation, and the behavior of latent faults.

58 citations