Proceedings Article

System structure for software fault tolerance

Brian Randell
- pp 437-449
Abstract
The paper presents, and discusses the rationale behind, a method for structuring complex computing systems by the use of what we term “recovery blocks”, “conversations” and “fault-tolerant interfaces”. The aim is to facilitate the provision of dependable error detection and recovery facilities which can cope with errors caused by residual design inadequacies, particularly in the system software, rather than merely the occasional malfunctioning of hardware components.
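
The abstract names recovery blocks without spelling out their form. The following is only a minimal sketch, assuming the commonly described shape of a recovery block: a primary and one or more alternates are tried in turn, each starting from the state saved on entry, until one passes an acceptance test. All names here (`recovery_block`, `buggy_sort`, `simple_sort`, `sorted_and_same_length`) are hypothetical and are not taken from the paper.

```python
import copy


class RecoveryBlockFailure(Exception):
    """Raised when every alternate fails the acceptance test."""


def recovery_block(state, acceptance_test, alternates):
    """Try each alternate on a copy of `state` until one passes the test.

    Working on a deep copy stands in for restoring the state saved on
    entry to the block; a rejected alternate leaves `state` untouched.
    """
    for alternate in alternates:
        candidate = copy.deepcopy(state)  # fresh copy of the entry state
        try:
            alternate(candidate)
        except Exception:
            continue  # an exception is treated as a detected error
        if acceptance_test(candidate):
            return candidate
    raise RecoveryBlockFailure("all alternates rejected")


def buggy_sort(data):
    # Hypothetical primary with a design fault: it drops the last item.
    data["items"] = sorted(data["items"][:-1])


def simple_sort(data):
    # Hypothetical alternate: a simpler implementation of the same task.
    data["items"] = sorted(data["items"])


def sorted_and_same_length(original):
    # Acceptance test: a cheap plausibility check, not a full proof.
    def test(data):
        items = data["items"]
        return (len(items) == len(original)
                and all(a <= b for a, b in zip(items, items[1:])))
    return test


entry_state = {"items": [3, 1, 2]}
result = recovery_block(entry_state,
                        sorted_and_same_length(entry_state["items"]),
                        [buggy_sort, simple_sort])
print(result["items"])
```

In this sketch the primary's output is rejected because an item is missing, so the alternate runs from the unchanged entry state. Copying the whole state is a simplification chosen for brevity; it says nothing about how the paper preserves recovery information.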


Citations

Basic Concepts and Taxonomy of Dependable and Secure Computing

TL;DR: This paper gives the main definitions relating to dependability, a generic concept that includes, as special cases, such attributes as reliability, availability, safety, integrity, and maintainability.
Journal Article

A survey of rollback-recovery protocols in message-passing systems

TL;DR: This survey covers rollback-recovery techniques that do not require special language constructs and distinguishes between checkpoint-based protocols, which rely solely on checkpointing for system state restoration, and log-based protocols.
Journal Article

The N-Version Approach to Fault-Tolerant Software

TL;DR: Principal requirements for the implementation of N-version software are summarized and the DEDIX distributed supervisor and testbed for the execution of N-version software is described.
Journal Article

Checkpointing and Rollback-Recovery for Distributed Systems

TL;DR: In this article, the authors consider the problem of bringing a distributed system to a consistent state after transient failures, and propose a distributed algorithm to create consistent checkpoints, as well as a rollback-recovery algorithm to recover the system from transient failures.
Journal Article

Resilience and survivability in communication networks: Strategies, principles, and survey of disciplines

TL;DR: An architectural framework for resilience and survivability in communication networks is provided, together with a survey of the disciplines that resilience encompasses and of significant past failures of the network infrastructure.
References
Journal Article

The structure of the “THE”-multiprogramming system

TL;DR: A multiprogramming system is described in which all activities are divided over a number of sequential processes placed at various hierarchical levels, in each of which one or more independent abstractions have been implemented.
Book Chapter

A program structure for error detection and recovery

TL;DR: A method of structuring programs which aids the design and validation of facilities for the detection of and recovery from software errors and a mechanism for the automatic preservation of restart information at a level of overhead which is believed to be tolerable.
Proceedings Article

PLANNER: a language for proving theorems in robots

TL;DR: The deductive system of PLANNER is subordinate to the hierarchical control structure in order to make the language efficient, and the use of a general-purpose matching language makes the deductive system more powerful.
Proceedings Article

Recovery semantics for a DB/DC system

TL;DR: A unified, systematic view of integrity/recovery as it relates to a data-processing system (whether man, machine, or both) is presented.
Proceedings Article

A recursive virtual machine architecture

Hugh C. Lauer, +1 more
TL;DR: This paper summarizes the preliminary design of a computer system with a recursive, virtual machine architecture and gives a brief account of the considerations leading to that design.