Proceedings Article

On the optimal reconfiguration times for TMR circuits on SRAM based FPGAs

24 Jun 2013, pp. 9-14

TL;DR: This work proposes a novel circuit instrumentation method for probing Triple Modular Redundancy (TMR) circuits for error detection at the granularity of individual domains and then uses selective run-time dynamic reconfiguration for recovery.

Abstract: Unreliable and harsh environmental conditions in avionics and space applications demand run-time adaptation capabilities to withstand environmental changes and radiation-induced faults. Modern SRAM-based FPGAs integrating high computational power with partial and dynamic reconfiguration abilities are a usual candidate for such systems. However, due to the vulnerability of these devices to Single Event Upsets (SEUs), designs need proper fault-handling mechanisms. In this work we propose a novel circuit instrumentation method for probing Triple Modular Redundancy (TMR) circuits for error detection at the granularity of individual domains and then use selective run-time dynamic reconfiguration for recovery. Error detection logic is inserted in the physical net-list to identify and localize faults. Moreover, selective domain reconfiguration is achieved by careful considerations in the placement phase on the FPGA reconfigurable area. The proposed technique is suitable for systems having hard real-time constraints. Our results demonstrate that this approach has an overhead of 2 LUTs per majority voter in internal partitions in terms of area when compared to the standard TMR circuits. In addition, it brings down the reconfiguration times of TMR circuits to a single domain and ensures a 100% availability of the device assuming the Single Event Upset fault model.
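
The idea of voting across three domains while also localizing the faulty one can be illustrated with a small software model. This is only a sketch under assumptions: the function names `majority` and `faulty_domain` are hypothetical, the outputs are modelled as integers rather than nets in a physical net-list, and it does not reproduce the paper's actual detection logic or its LUT-level implementation.

```python
from typing import Optional


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over the outputs of three TMR domains."""
    return (a & b) | (a & c) | (b & c)


def faulty_domain(a: int, b: int, c: int) -> Optional[int]:
    """Return the index (0, 1 or 2) of the single domain that disagrees
    with the voted value, or None if all agree or the fault cannot be
    attributed to exactly one domain."""
    voted = majority(a, b, c)
    mismatches = [i for i, value in enumerate((a, b, c)) if value != voted]
    return mismatches[0] if len(mismatches) == 1 else None


# Illustrative use: domain 2 has one flipped bit, so only that domain
# would need to be reconfigured while the voted output stays correct.
outputs = (0b1010, 0b1010, 0b1000)
voted_value = majority(*outputs)
suspect = faulty_domain(*outputs)
```

Under the single-fault assumption stated in the abstract, at most one domain disagrees with the vote, which is what makes reconfiguring a single domain sufficient in this model.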



Citations
Journal Article
01 Sep 2013
TL;DR: In this article, the authors evaluate trade-offs of the N-modular redundancy technique in SRAM-based FPGAs, analyzing cross-section, area and power consumption for different numbers of redundant modules.
Abstract: This paper evaluates different trade-offs of N-modular redundancy technique in SRAM-based FPGAs. Redundant copies of the same module were implemented and the outputs voted by self-adapted majority voter. The redundant design was exposed to neutrons and the error rate was evaluated. Results in cross-section, area and power consumption were analyzed for different numbers of redundant modules, ranging from 3 copies (standard TMR) up to 7 copies.

14 citations


Cites background from "On the optimal reconfiguration time..."

  • ...So, the scrubbing can be limited to the bitstream region implementing the faulty module [18], [19]....

    [...]

Proceedings Article
26 Oct 2015
TL;DR: This paper proposes the use of Triple Modular Redundancy at the controller level, and calculates system reliability using Markov models to quantitatively show the advantage of the proposed technique in terms of extended lifetime.
Abstract: Fault-tolerance is becoming an essential feature in the design of Networked Control Systems (NCSs). Furthermore, Sensor-to-Actuator (S2A) architectures have shown some advantages over conventional In-Loop architectures. This paper focuses on fault-tolerant controllers in the context of S2A systems. It proposes the use of Triple Modular Redundancy at the controller level. The fault-tolerant controller will be hosted in an FPGA that has a spare location. The voter in this TMR scheme is fault-secure to guarantee that the controllers never produce an undetected incorrect control action. Finally, system reliability is calculated using Markov models to quantitatively show, via case studies, the advantage of the proposed technique in terms of extended lifetime.
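
As a rough illustration of why redundancy at the controller level can extend lifetime, the textbook reliability of a plain TMR arrangement can be computed directly. This is a sketch under stated assumptions, not the Markov model of the cited paper: it assumes a perfect voter, no repair, no spare, and independent modules with a constant failure rate. The names `module_reliability` and `tmr_reliability` and the example numbers are hypothetical.

```python
import math


def module_reliability(failure_rate: float, t: float) -> float:
    """Reliability of one module with a constant failure rate: R = exp(-lambda * t)."""
    return math.exp(-failure_rate * t)


def tmr_reliability(r: float) -> float:
    """Reliability of TMR with a perfect voter and no repair.

    The system works if all three modules work (r**3) or exactly two
    work (3 * r**2 * (1 - r)), which simplifies to 3r^2 - 2r^3.
    """
    return 3 * r**2 - 2 * r**3


# Illustrative numbers only: failure rate in failures per hour, time in hours.
r_single = module_reliability(1e-4, 1000.0)
r_tmr = tmr_reliability(r_single)
```

In this simplified model TMR is more reliable than a single module only while the module reliability stays above 0.5, which is one reason repair or reconfiguration is combined with TMR for long missions.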

13 citations


Cites background from "On the optimal reconfiguration time..."

  • ...Finally, it differs from [30, 32] in that it targets permanent as well as transient faults....

    [...]

  • ...In [30], a modified implementation of TMR circuits was proposed allowing for the insertion of fault detection logic, mainly minority voters, on the granularity of a domain....

    [...]

Journal Article
TL;DR: A fault-tolerant strategy of selective triple modular redundancy based on multi-objective optimization and evolvable hardware is presented against single-event upsets for circuits implemented on field programmable gate arrays (FPGAs) based on static random access memory (SRAM); the experimental results show that it can generate trade-off solutions between hardware resource consumption and system reliability.
Abstract: To improve the reliability of spaceborne electronic systems, a fault-tolerant strategy of selective triple modular redundancy (STMR) based on multi-objective optimization and evolvable hardware (EHW) against single-event upsets (SEUs) for circuits implemented on field programmable gate arrays (FPGAs) based on static random access memory (SRAM) is presented in this paper. Various topologies of circuit with the same functionality are evolved using EHW firstly. Then the SEU-sensitive gates of each circuit are identified using signal probabilities of all the lines in it, and each circuit is hardened against SEUs by selectively applying triple modular redundancy (TMR) to these SEU-sensitive gates. Afterward, each circuit hardened has been evaluated by SEU Simulation, and the multi-objective optimization technology is introduced to optimize the area overhead and the number of functional errors of all the circuits. The proposed fault-tolerant strategy is tested on four circuits from microelectronics center of North Carolina (MCNC) benchmark suite. The experimental results show that it can generate innovative trade-off solutions to compromise between hardware resource consumption and system reliability. The maximum savings in the area overhead of the STMR circuit over the full TMR design is 58% with the same SEU immunity.

11 citations


Cites background or methods from "On the optimal reconfiguration time..."

  • ...(1) To improve the reliability of electronics in space, the multi-objective evolutionary design of STMR system against SEUs is proposed in this paper....

    [...]

  • ...(1) The input is considered to be sensitive only if its value is dominant over other inputs....

    [...]

  • ...(1) Obtaining the required topologies using EHW....

    [...]

  • ...Each optimal chromosome evolved in (1) is translated into VHDL format to design a new VRC structure capable of implementing STMR....

    [...]

  • ...The basic concept is as follows: (1) a set of the primary input probabilities has been generated and propagated through the combinational circuit; (2) the output signal probabilities of all the lines in the circuit are calculated; (3) SEU sensitive gates are identified; (4) TMR is introduced to these sensitive gates....

    [...]
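
Steps (1) and (2) of the concept quoted above, propagating primary input probabilities through a combinational circuit, can be sketched as follows. This is a minimal illustration under assumptions: signals are treated as statistically independent, only AND, OR and NOT gates are modelled, and the netlist format and the name `propagate` are hypothetical. The criterion the cited paper uses to identify SEU-sensitive gates in step (3) is not given in the snippet, so it is not reproduced here.

```python
from typing import Dict, List, Tuple

# A gate is (output_name, gate_type, input_names); the netlist is assumed
# to be listed in topological order so inputs are known before use.
Gate = Tuple[str, str, List[str]]


def propagate(netlist: List[Gate], input_probs: Dict[str, float]) -> Dict[str, float]:
    """Compute the probability of each line being logic 1,
    assuming all gate inputs are statistically independent."""
    probs = dict(input_probs)
    for out, kind, ins in netlist:
        p = [probs[name] for name in ins]
        if kind == "NOT":
            probs[out] = 1.0 - p[0]
        elif kind == "AND":
            prod = 1.0
            for x in p:
                prod *= x  # all inputs must be 1
            probs[out] = prod
        elif kind == "OR":
            prod = 1.0
            for x in p:
                prod *= 1.0 - x  # output is 0 only if all inputs are 0
            probs[out] = 1.0 - prod
        else:
            raise ValueError(f"unsupported gate type: {kind}")
    return probs


# Illustrative circuit: y = (a AND b) OR (NOT c), all inputs equally likely.
example = [("n1", "AND", ["a", "b"]), ("n2", "NOT", ["c"]), ("y", "OR", ["n1", "n2"])]
line_probs = propagate(example, {"a": 0.5, "b": 0.5, "c": 0.5})
```

The independence assumption ignores reconvergent fan-out, so the resulting probabilities are approximations for circuits where one signal reaches a gate along several paths.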

Proceedings Article
01 Sep 2017
TL;DR: A design is proposed that takes advantage of the Dynamic Partial Reconfiguration property inherent in some FPGAs and can support FPGA full configuration as well as partial reconfiguration over TCP/IP networks through on-chip configuration network interface with minimal off-chip components.
Abstract: Fault-Tolerance is currently a very important feature in industrial automation. This paper focuses on Fault-Tolerant FPGA-based controllers for Sensor-to-Actuator Networked Control Systems. A design is proposed that takes advantage of the Dynamic Partial Reconfiguration property inherent in some FPGAs. The controller is assumed to consist of a small processor, memory and associated hardware. Several Fault-Tolerance techniques are applied to this generic system and single points of failure are avoided even in the error detection and recovery mechanisms. The fault model considered is single and/or some multiple event upsets. A case study is presented to quantify the increase in reliability of the presented system. Furthermore, the proposed architecture can support FPGA full configuration as well as partial reconfiguration over TCP/IP networks through on-chip configuration network interface with minimal off-chip components.

11 citations


Cites methods from "On the optimal reconfiguration time..."

  • ...In [10], TMR is used along with DPR for recovery of transient faults....

    [...]

Proceedings Article
15 Jun 2015
TL;DR: A fully reconfigurable medium-grained triple modular redundancy (TMR) architecture which forms part of a runtime adaptive on-board processor (OBP) is presented and fault mitigation is extended to the voting mechanism by applying the reconfiguration methodology not only to domain replicas but also to the voter itself.
Abstract: The impact of SRAM-based FPGAs is constantly growing in aerospace industry despite the fact that their volatile configuration memory is highly susceptible to radiation effects. Therefore, strong fault-handling mechanisms have to be developed in order to protect the design and make it capable of fighting against both soft and permanent errors. In this paper, a fully reconfigurable medium-grained triple modular redundancy (TMR) architecture which forms part of a runtime adaptive on-board processor (OBP) is presented. Fault mitigation is extended to the voting mechanism by applying our reconfiguration methodology not only to domain replicas but also to the voter itself. The proposed approach takes advantage of adaptive configuration placement and modular property of the OBP, thus allowing on-line creation of different medium-grained TMRs and selection of their granularity level. Consequently, we are able to narrow down the fault-affected area thus making the error recovery process faster and less power consuming. The conventional hardware based voting is supported by the ICAP-based one in order to additionally strengthen the reconfigurable intermediate voting. In addition, the implementation methodology ensures using only one memory footprint for all voters and their voting adaptations thus saving storing resources in expensive rad-hard memories.

10 citations


Cites background from "On the optimal reconfiguration time..."

  • ...the optimal granularity of DMR/TMR domains [4], its voting mechanism or even optimal reconfiguration time after the detection of the fault [11]....

    [...]


References
Proceedings Article
26 Mar 2006
TL;DR: An efficient approach of applying mitigation to an FPGA design to protect against single event upsets (SEUs) and applies triple modular redundancy (TMR) selectively based on the classification of the circuit structure.
Abstract: This paper describes an efficient approach of applying mitigation to an FPGA design to protect against Single Event Upsets (SEUs). This approach applies mitigation selectively to FPGA circuit structures depending on their importance within the design. Higher priority is given to structures causing "persistent" errors within the design. For certain applications, applying selective mitigation to the persistent components can yield higher returns in reliability per unit cost than full mitigation. A software tool is also introduced which automatically classifies circuit structures based on this concept and applies Triple Modular Redundancy (TMR) selectively based on the classification of the circuit structure.

189 citations


"On the optimal reconfiguration time..." refers background or methods in this paper

  • ...TMR with configuration scrubbing [3][4] has reconfiguration time on the order of milliseconds, which can be intolerable for systems with hard real-time constraints....

    [...]

  • ...However, exploiting the reconfiguration abilities of FPGA, TMR life can be extended by writing the configuration bit-stream periodically in order to stop the accumulation of multiple independent single event upsets [3] [4]....

    [...]

Proceedings Article
26 Sep 2007
TL;DR: The adoption of the triple modular redundancy coupled with the partial dynamic reconfiguration of field programmable gate arrays to mitigate the effects of soft errors in such class of device platforms is presented.
Abstract: This paper presents the adoption of the triple modular redundancy coupled with the partial dynamic reconfiguration of field programmable gate arrays to mitigate the effects of soft errors in such class of device platforms. We propose an exploration of the design space with respect to several parameters (e.g., area and recovery time) in order to select the most convenient way to apply this technique to the device under consideration. The application to a case study is presented and used to exemplify the proposed approach.

136 citations


"On the optimal reconfiguration time..." refers background in this paper

  • ...Similarly, a system level partitioning based approach is proposed in [8] in order to detect and localize faults and use dynamic reconfiguration for recovery....

    [...]

Proceedings Article
29 Sep 2009
TL;DR: This paper presents a novel technique that allows partial reconfiguration to be used with configuration scrubbing: a self scrubber performs the necessary operations to reconfigure a portion of the design while continuously scrubbing the entire FPGA.
Abstract: SRAM-based FPGA devices are susceptible to single event effects (SEE) including single event upsets (SEU) within the configuration memory. Configuration scrubbing along with TMR or other hardware redundancy techniques are often used to mitigate the effects of these SEUs. However, the use of traditional configuration scrubbing prevents the ability to reconfigure the FPGA dynamically or to perform partial reconfiguration. This paper presents a novel technique that allows partial reconfiguration to be used with configuration scrubbing. A self scrubber, utilizing a small portion of the FPGA, performs the necessary operations to reconfigure a portion of the design while continuously scrubbing the entire FPGA.

130 citations


"On the optimal reconfiguration time..." refers methods in this paper

  • ...A method on how to resolve this situation is presented in [9]....

    [...]

Proceedings Article
24 Jun 2009
TL;DR: The method combines large grain TMR with special voters capable of signalizing the faulty module and check point states that allow the sequential synchronization of the recovered module with the Xilinx TMR (XTMR) approach, minimizing time and energy spent in the process.
Abstract: This paper presents an innovative method that allows the use of dynamic partial reconfiguration combined with triple modular redundancy (TMR) in SRAM-based FPGAs fault-tolerant designs. The method combines large grain TMR with special voters capable of signalizing the faulty module and check point states that allow the sequential synchronization of the recovered module with the Xilinx TMR (XTMR) approach. As a result, only the faulty domain is reconfigured, minimizing time and energy spent in the process. In addition, the use of checkpoint states avoids system downtime, since the synchronization of the recovered module is performed while the others are kept running. Experimental results show that the method has a reduced fault recovery time compared to the standard TMR implementation, maintaining the compatible area overhead and performance.

34 citations


"On the optimal reconfiguration time..." refers background in this paper

  • ...An interesting work is presented in [10], where the authors suggest removing all the internal voters and implementing TMR as a single partition, i.e., voters are used only at the output....

    [...]

Proceedings Article
02 Mar 2006
TL;DR: Methods for efficient on-line failure detection are presented, integrated in a reconfigurable system for execution and test of multiple automotive inner cabin functions; they also allow a certain degree of failure recovery and even make it possible for a system to heal itself from more advanced faults.
Abstract: The rapid development of hardware/software and microelectronic technology enables the realization of more complex systems with new characteristics. These characteristics could lead to further advances in electronic measurement-, control- and regulation systems. The industrial demands of future electronic systems rely on systems to be fault-tolerant, since the complexity increased to the point where it is impossible to detect all errors during the design phase. The ability for a system to recover from a failure requires that incorrect system operation can be detected and analysed during run-time. To achieve this, methods for performing tests of functionalities and components dynamically must be incorporated in the system behaviour during the design phase. This paper presents methods for efficient on-line failure detection, integrated in a reconfigurable system for execution and test of multiple automotive inner cabin functions. These methods also allow a certain degree of failure recovery, and even make it possible for a system to heal itself from more advanced faults. By exploiting the ability of dynamic and partial hardware reconfiguration, the monitoring can also be performed with less hardware overhead since the monitoring functionalities are configured only when they are required.

33 citations


"On the optimal reconfiguration time..." refers methods in this paper

  • ...In [7] the authors present a method to recover a system to health by dynamic partial reconfiguration....

    [...]