Proceedings Article

DEFINE: a distributed fault injection and monitoring environment

Abstract
This paper presents DEFINE, a distributed fault injection and monitoring environment, as a tool for evaluating system dependability, investigating fault propagation, and validating fault-tolerant mechanisms. DEFINE can inject both hardware faults (hardware-induced software errors) and software faults into any process running in a distributed system, in either user mode or supervisor mode, and can monitor fault impact and propagation within software systems and across machines. It employs two fault injection techniques: (i) hardware clock interrupts, which control the timing of fault injection and activation, and (ii) software traps, which are used to inject all faults except communication faults and memory faults in the data/stack segment. Experiments on six Sun SPARCstations, studying system behavior under faults, demonstrate the application of DEFINE.
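The two injection techniques named in the abstract can be illustrated with a minimal user-space sketch, assuming Python on a Unix system. This is an analogy, not DEFINE's implementation: DEFINE injects faults into real processes in user or supervisor mode, whereas here a one-shot `SIGALRM` interval timer (`signal.setitimer`) stands in for the hardware clock interrupt that decides *when* a fault lands, and a call-counting decorator stands in for a software trap planted at a code location that decides *where* it lands. The faults are single-bit flips in a buffer. All names (`memory`, `flip_bit`, `arm_timer_injection`, `trap_on_call`, `checksum`) are hypothetical.

```python
import random
import signal
import time

# Hypothetical stand-in for a target process's data segment.
memory = bytearray(16)
fault_log = []  # records (technique, offset, bit) for each injected fault


def flip_bit(buf, offset, bit):
    """Emulate a single-bit fault by XOR-ing one bit of buf in place."""
    buf[offset] ^= 1 << bit


# Technique (i), by analogy: a timer interrupt decides WHEN the fault is injected.
def _on_timer(signum, frame):
    offset = random.randrange(len(memory))
    bit = random.randrange(8)
    flip_bit(memory, offset, bit)
    fault_log.append(("timer", offset, bit))


def arm_timer_injection(delay_s):
    """Schedule one fault injection delay_s seconds from now (Unix, main thread)."""
    signal.signal(signal.SIGALRM, _on_timer)
    signal.setitimer(signal.ITIMER_REAL, delay_s)  # interval 0: fires once


# Technique (ii), by analogy: a "trap" at a code location decides WHERE.
def trap_on_call(n, offset, bit):
    """Decorator: corrupt the buffer argument on the n-th call only."""
    def wrap(fn):
        calls = 0

        def trapped(buf, *args, **kwargs):
            nonlocal calls
            calls += 1
            if calls == n:  # the trap fires exactly once
                flip_bit(buf, offset, bit)
                fault_log.append(("trap", offset, bit))
            return fn(buf, *args, **kwargs)
        return trapped
    return wrap


@trap_on_call(n=2, offset=0, bit=7)
def checksum(buf):
    """Hypothetical workload whose result exposes the injected fault."""
    return sum(buf) & 0xFF
```

For example, calling `checksum(bytearray(4))` repeatedly on the same buffer returns 0 on the first call and 128 from the second call onward, because the trap flips bit 7 of byte 0 once and the corruption persists. Calling `arm_timer_injection(0.01)` and then sleeping briefly lets the `SIGALRM` handler flip one random bit in `memory`. Logging each fault in `fault_log` mirrors, in a very reduced form, the monitoring side of such a tool: the record of what was injected is what lets impact and propagation be traced afterwards.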


Citations
Journal Article

Xception: a technique for the experimental evaluation of dependability in modern computers

TL;DR: Experimental results are presented that demonstrate the accuracy and potential of Xception in evaluating the dependability properties of today's complex computer systems.
Journal Article

FERRARI: a flexible software-based fault and error injection system

TL;DR: The methodology and guidelines for designing flexible software-based fault and error injection are described, and FERRARI, a tool that incorporates these techniques, is presented; the results demonstrate the effectiveness of software-based error injection in evaluating the dependability properties of complex systems.
Journal Article

Dependability of COTS microkernel-based systems

TL;DR: A prototype environment called MAFALDA (Microkernel Assessment by Fault injection AnaLysis and Design Aid) is described; it aims to provide objective failure data on a candidate microkernel and to improve its error detection capabilities.
Journal Article

DEPEND: a simulation-based environment for system level dependability analysis

TL;DR: The rationale is presented for DEPEND, a functional simulation tool that provides an integrated design and fault injection environment for system-level dependability analysis, together with techniques for simulating realistic fault scenarios, curbing the explosion of simulation time, and handling the large fault model and component domain that system-level analysis involves.
Journal Article

Assessing Dependability with Software Fault Injection: A Survey

TL;DR: This survey provides a comprehensive overview of the state of the art on Software Fault Injection to support researchers and practitioners in the selection of the approach that best fits their dependability assessment goals.