Open Access, Journal Article

Resilience Articulation Point (RAP): Cross-layer dependability modeling for nanometer system-on-chip resilience

TLDR
This paper shows by example how probabilistic bit flips are systematically abstracted and propagated towards higher abstraction levels up to the application software layer, and how RAP can be used to parameterize architecture-level resilience methods.
About
This article is published in Microelectronics Reliability. The article was published on 2014-06-01 and is currently open access. It has received 28 citations to date. The article focuses on the topics: Abstraction layer & Resilience (network).


Citations
Proceedings Article

Multi-Layer Dependability: From Microarchitecture to Application Level

TL;DR: It is shown that multi-layer dependability is an indispensable way to cope with the increasing number of technology-induced dependability problems that threaten further scaling, and that the paradigm of multi-layer dependability has considerable potential for significantly increasing dependability at reasonable effort.
Journal Article

From Layout to System: Early Stage Power Delivery and Architecture Co-Exploration

TL;DR: A run-time simulation framework covering both power delivery (PD) and architecture captures their interactions, achieves less than 1% deviation from SPICE for an entire PD system simulation, and is used to investigate the impact of dynamic noise on system-level oxide breakdown reliability.
Proceedings Article

Workload- and Instruction-Aware Timing Analysis: The missing Link between Technology and System-level Resilience

TL;DR: An enhanced static timing analysis is presented that links technology-level effects to the system level and vice versa, and the accurate and efficient consideration of system workload and of the impact of executed instructions on circuit timing is discussed.
Proceedings Article

SystemC-based multi-level error injection for the evaluation of fault-tolerant systems

TL;DR: An approach based on simulation-based error injection and system prototypes modeled in SystemC is presented, which is the realization of an efficient multi-level error effect simulation for the evaluation of the fault-tolerance of a system.
Proceedings Article

Fault-Tolerant Regularity-Based Real-Time Virtual Resources

TL;DR: A fault tolerance model for Regularity-based Real-Time Virtual Resources to recover from transient hardware faults without modifying user applications and shows the effectiveness of the proposed framework while incurring minimal overhead.
References
Journal Article

Impact of CMOS technology scaling on the atmospheric neutron soft error rate

TL;DR: If the increasing number of bits is taken into account, then the SER per chip is not expected to increase faster than linearly with decreasing gate length L_G.
Proceedings Article

Scheduling dynamic dataflow graphs with bounded memory using the token flow model

TL;DR: The authors build upon research by E. A. Lee (1991) concerning the token flow model by analyzing the properties of cycles of the schedule: sequences of actor executions that return the graph to its initial state.
Journal Article

Space, atmospheric, and terrestrial radiation environments

TL;DR: The progress on developing models of the radiation environment since the 1960s is reviewed, with emphasis on models that can be applied to predicting the performance of microelectronics used in spacecraft and instruments.
Journal Article

Developing a low-cost high-quality software tool for dynamic fault-tree analysis

TL;DR: An approach to tool development that attacks the inability to produce, at a reasonable cost, supporting software tools that have the usability and dependability characteristics that industrial users require, and the evolvability to accommodate software change as the underlying analysis methods are refined and enhanced.
Journal Article

Investigation of multi-bit upsets in a 150 nm technology SRAM device

TL;DR: The results showed that the memory architecture is critical in affecting the single-bit EDAC effectiveness and the predominant MBU shape is strongly influenced by the vertical and horizontal distance of the active nodes of the memory cells.
Frequently Asked Questions (13)
Q1. What have the authors contributed in "Resilience Articulation Point (RAP): Cross-layer dependability modeling for nanometer system-on-chip resilience"?

The Resilience Articulation Point (RAP) model aims to provide researchers and developers with a probabilistic fault abstraction and error propagation framework covering all hardware/software layers of a System on Chip. This paper introduces the ideas of RAP based on examples of radiation-induced soft errors in SRAM cells, voltage variations, and sequential CMOS logic. Thus, design concerns at higher abstraction layers can be investigated without the need to consider the full details of lower levels of design.

The correlation coefficient method is adopted to obtain error probabilities and correlations of primary outputs due to a particle strike at internal nodes. 

Based on success trees, a variant of the well-known fault trees, the proposed method not only considers multiple transient and permanent faults concurrently, but its carefully introduced success tree structure also makes it possible to trace a system failure back to the critical effect.

Over time and with aging effects taking place, more and more components tend to be permanently defective, making permanent effects the dominant source of failure. 

For decreased reliability, in which voltage drops of up to 300 mV occur, the authors can react at the application layer by increasing the number of iterations in order to regain communications performance.

To make the fault injection experiment feasible, the authors used a Mixture Importance Sampling approach to simulate only relevant scenarios.
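
As a rough illustration of why importance sampling makes rare-event experiments feasible, the following minimal sketch estimates a small tail probability with a mixture proposal. It is not the authors' setup: the standard normal target, the shifted normal component, the mixing weight `lam`, and all function names are assumptions chosen only for illustration.

```python
import math
import random


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_is_tail_prob(threshold, n_samples=100_000, lam=0.5, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) by mixture importance sampling.

    Hypothetical proposal: g = lam * N(0, 1) + (1 - lam) * N(threshold, 1).
    Each sample in the rare region is reweighted by p(x) / g(x).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Draw from the mixture: nominal component or shifted component.
        if rng.random() < lam:
            x = rng.gauss(0.0, 1.0)
        else:
            x = rng.gauss(threshold, 1.0)
        if x > threshold:
            p = normal_pdf(x)
            g = lam * normal_pdf(x) + (1.0 - lam) * normal_pdf(x, mu=threshold)
            total += p / g
    return total / n_samples


if __name__ == "__main__":
    estimate = mixture_is_tail_prob(4.0)
    exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))
    print(f"estimate={estimate:.3e} exact={exact:.3e}")
```

Keeping a share of the nominal distribution in the mixture bounds the importance weights, which is one common motivation for mixing rather than sampling from the shifted distribution alone.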

The authors assume in the following that the probability distribution of injected charges due to a neutron strike follows an exponential distribution [13]:

$$f_Q(Q_{\text{injected}}) = \frac{1}{Q_s} \exp\left(-\frac{Q_{\text{injected}}}{Q_s}\right) \tag{3}$$

The parameter $Q_s$ is the charge collection slope due to one neutron strike, which is technology dependent [10].
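
The exponential charge model can be explored numerically with a short sketch. The density follows the exponential form stated above; the tail probability is the standard closed form for an exponential distribution. The threshold `q_crit` (a charge level of interest, such as a critical charge) and the function names are assumptions introduced here, not taken from the paper.

```python
import math


def injected_charge_pdf(q, q_s):
    """Exponential density f_Q(q) = (1 / q_s) * exp(-q / q_s) for q >= 0."""
    if q_s <= 0:
        raise ValueError("q_s must be positive")
    if q < 0:
        return 0.0
    return math.exp(-q / q_s) / q_s


def prob_charge_exceeds(q_crit, q_s):
    """Tail probability P(Q > q_crit) = exp(-q_crit / q_s).

    q_crit is a hypothetical threshold charge used only for illustration.
    """
    if q_s <= 0:
        raise ValueError("q_s must be positive")
    return math.exp(-max(q_crit, 0.0) / q_s)


if __name__ == "__main__":
    # Example values in arbitrary but consistent charge units.
    print(injected_charge_pdf(1.0, 2.0))
    print(prob_charge_exceeds(2.0, 2.0))
```

A larger charge collection slope spreads the density toward higher charges, so the probability of exceeding any fixed threshold grows with `q_s`.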

When operand variables of arithmetic operations are stored in an SRAM memory array, $P_{\text{word}}(\vec{x}, t)$ describes the probability with which these variables contain erroneous data.
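
To make the idea of a word-level error probability concrete, the sketch below computes the probability of a given bit-flip pattern in a word, and the probability that the word contains any error at all. It assumes statistically independent bit flips with known per-bit probabilities, which is a simplification made here for illustration (multi-bit upsets, for instance, are correlated). The function names are hypothetical.

```python
def pattern_probability(pattern, bit_flip_probs):
    """Probability of one specific flip pattern, assuming independent bits.

    pattern[i] is 1 if bit i is flipped, 0 otherwise.
    bit_flip_probs[i] is the assumed flip probability of bit i.
    """
    if len(pattern) != len(bit_flip_probs):
        raise ValueError("pattern and bit_flip_probs must have equal length")
    prob = 1.0
    for flipped, p in zip(pattern, bit_flip_probs):
        prob *= p if flipped else (1.0 - p)
    return prob


def any_error_probability(bit_flip_probs):
    """Probability that at least one bit of the word is flipped."""
    no_error = 1.0
    for p in bit_flip_probs:
        no_error *= 1.0 - p
    return 1.0 - no_error


if __name__ == "__main__":
    probs = [0.1, 0.2]
    print(pattern_probability([1, 0], probs))
    print(any_error_probability(probs))
```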

In [35], the authors studied the effects of hardware errors in the system memories of a MIMO-BICM receiver on the system's communications performance, because the memories consume a large amount of the system's area.

According to Fig. 12, the probability that a faulty data word is read from the cache decreases as expected, while the overall system failure probability only slightly decreases compared to the unprotected cache. 

In this case, the optimization algorithm presented in [30] will suggest a solution of s = 8 and a time window of T_TW = 1.30 ms, which will just meet the aforementioned demand.

The authors show in this section how the bit flip probabilities in an SRAM array can be modeled using the generic model from Section 3. A bit flip in an SRAM cell occurs, for example, when a particle strike induces enough charge at a point within the cell to flip the cell's content.

The error probability P_E(c) for each pipeline component c is obtained using HW-level reliability methods such as EPP [24], CEP [25], and CLASS [26].