Journal Article

Derivation and Calibration of a Transient Error Reliability Model

TLDR
A new modeling methodology to characterize failure processes in digital computers due to hardware transients is presented, and models of common fault-tolerant redundant structures are developed using decreasing hazard function distributions.
Abstract
This paper presents a new modeling methodology to characterize failure processes in digital computers due to hardware transients. The basic assumption is that system sensitivity to hardware transient errors is a function of critical resource usage. The failure rate of a given resource is approximated by a deterministic function of time, which depends on the average workload of that resource, plus a Gaussian process. The probability density function of the time to failure obtained under this assumption has a decreasing hazard function, which explains why decreasing hazard function densities such as the Weibull fit experimental data so well. Data on transient errors obtained from several systems are analyzed, and statistical tests confirm the good fit between decreasing hazard distributions and the actual data. Finally, models of common fault-tolerant redundant structures are developed using decreasing hazard function distributions. The analysis indicates significant differences between reliability predictions based on the exponential distribution and those based on decreasing hazard function distributions: the model results show reliability differences of 0.2 and factors greater than 2 in Mission Time Improvement. System designers should be aware of these differences.
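
The contrast the abstract draws between exponential and decreasing hazard predictions can be illustrated with a minimal Python sketch. It is not the paper's model: it uses a plain Weibull distribution with shape below 1 and, as an assumed example of a redundant structure, a triple modular redundancy (TMR) system with a perfect voter and independent, identical modules. The parameter values and function names are hypothetical, chosen only so that both distributions share the same mean time to failure.

```python
import math


def weibull_reliability(t, scale, shape):
    """Survival probability R(t) = exp(-(t/scale)^shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, scale, shape):
    """Hazard h(t) = (shape/scale) * (t/scale)^(shape-1).

    For shape < 1 this decreases with t; for shape == 1 it is the
    constant hazard of the exponential distribution.
    """
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def weibull_mean(scale, shape):
    """Mean time to failure: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def tmr_reliability(r):
    """2-of-3 majority system, assuming a perfect voter and
    independent modules that each survive with probability r."""
    return 3.0 * r ** 2 - 2.0 * r ** 3


if __name__ == "__main__":
    # Hypothetical parameters, not taken from the paper's data.
    shape = 0.5   # shape < 1 gives a decreasing hazard
    scale = 0.5   # with shape 0.5 the mean is 0.5 * Gamma(3) = 1.0
    rate = 1.0 / weibull_mean(scale, shape)  # exponential with equal mean

    for t in (0.05, 0.1, 0.5, 1.0):
        r_exp = math.exp(-rate * t)
        r_wei = weibull_reliability(t, scale, shape)
        print(
            f"t={t:4.2f}  simplex exp={r_exp:.3f} weibull={r_wei:.3f}  "
            f"TMR exp={tmr_reliability(r_exp):.3f} "
            f"weibull={tmr_reliability(r_wei):.3f}"
        )
```

Matching the means is one possible way to make the two distributions comparable; fitting both to the same observed data, as the paper does, is another and may give different gaps.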


Citations
Proceedings Article

The effects of energy management on reliability in real-time embedded systems

TL;DR: In this article, the authors investigate the effects of frequency and voltage scaling on the fault rate, propose two fault rate models based on previously published data, and analyze the effect of energy management on reliability.
Journal Article

Design and synthesis of self-checking VLSI circuits

TL;DR: Methods for the cost-effective design of combinational and sequential self-checking functional circuits and checkers are examined and the area overhead for all proposed design alternatives is studied in detail.
Journal Article

Reliability-Aware Energy Management for Periodic Real-Time Tasks

TL;DR: This work investigates static and dynamic reliability-aware energy management schemes to minimize energy consumption for periodic real-time systems while preserving system reliability and presents two integrated approaches to reclaim both static and dynamic slack at runtime.
Journal Article

The interplay of power management and fault recovery in real-time systems

TL;DR: The results show that traditional periodic checkpointing is not the best policy for the combined purpose of conserving energy and guaranteeing recovery, and better energy savings are possible through a nonuniform distribution of checkpoints that takes into account the energy consumption and reliability factors.
Journal Article

Analysis of Checkpointing for Real-Time Systems

TL;DR: The effects of checkpointing strategies on task response time are analysed, some insights for optimal checkpointing are offered, and exact schedulability tests for fault-tolerant task sets under a specified failure hypothesis are provided.
References
Journal Article

The UNIX time-sharing system

TL;DR: The nature and implementation of the file system and of the user command interface are discussed, including the ability to initiate asynchronous processes and over 100 subsystems including a dozen languages.
Journal Article

The CRAY-1 computer system

TL;DR: The CRAY-1 is the only computer to have been built to date that satisfies ERDA's Class VI requirement (a computer capable of processing from 20 to 60 million floating point operations per second) and its Fortran compiler (CFT) is designed to give the scientific user immediate access to the benefits of the Cray-1's vector processing architecture.