Journal Article

A model for error recovery with global checkpointing

Krishna Kant
01 Sep 1983 - Vol. 30, Iss. 3, pp. 225-239
TL;DR: A new technique for providing software fault tolerance in concurrent systems is proposed; it combines the traditional global checkpointing mechanism with the recovery block concept to yield an easily implementable error recovery mechanism.
About
This article was published in Information Sciences on 1983-09-01. It has received 4 citations so far. The article focuses on the topics: Software fault tolerance & Overhead (computing).

Citations
Journal Article

A survey of rollback-recovery protocols in message-passing systems

TL;DR: This survey covers rollback-recovery techniques that do not require special language constructs and distinguishes between checkpoint-based protocols, which rely solely on checkpointing for system state restoration, and log-based protocols.

Checkpointing and the modeling of program execution time

TL;DR: This chapter considers several models of checkpointing and recovery in a program in order to derive the distribution of program execution time or its expectation, and it is shown that the expected execution time increases linearly with the processing requirement in the presence of checkpoints.
Journal Article

A quasi-synchronous checkpointing algorithm that prevents contention for stable storage

TL;DR: This paper proposes a staggered quasi-synchronous checkpointing algorithm which reduces contention for network stable storage without any synchronization overhead.
Journal Article

On selecting rollback points for error recovery

TL;DR: A generalized formulation for checkpoint selection is given, from which three different schemes are derived; each scheme is shown to have a smaller expected cost of recovery and a larger optimal checkpoint interval than rolling back to the most recent checkpoint.
References
Journal Article

System structure for software fault tolerance

TL;DR: In this article, the authors present a method for structuring complex computing systems by the use of what they term "recovery blocks", "conversations", and "fault-tolerant interfaces".
Journal Article

System structure for software fault tolerance

TL;DR: In this article, the authors present and discuss the rationale behind a method for structuring complex computing systems by the use of what they term "recovery blocks," "conversations," and "fault" tolerant interfa...
Journal Article

Performance-Related Reliability Measures for Computing Systems

TL;DR: These measures, which reflect the interaction between the reliability and the performance characteristics of computing systems, can be used to evaluate traditional computer architectures; gracefully degrading systems; and distributed systems.
Journal Article

On the Optimum Checkpoint Interval

TL;DR: It is shown that the optimum checkpoint interval is a function of the load of the system, and it is proved that the total operating time between successive checkpoints should be a deterministic quantity in order to maximize the availability.
Journal Article

Theories of Software Reliability: How Good Are They and How Can They Be Improved?

TL;DR: An examination of the assumptions used in early bug-counting models of software reliability shows them to be deficient, and it is suggested that current theories are only the first step along what threatens to be a long road.