Open Access Book

Self-stabilization

TL;DR
A self-stabilizing system is designed as if it had no past history: it can be started in any possible state of its state space and still converges to correct behavior. This complements the Byzantine fault model, for which a formal impossibility proof shows that correct behavior requires fewer than one-third of the processors to be Byzantine.
Abstract
Fault tolerance and reliability are important issues for flight vehicles such as aircraft, space shuttles, and satellites. A self-stabilizing system recovers automatically following disturbances that force the system into an arbitrary state. Self-stabilization is an essential property of any autonomous control or computing system. Important branches of distributed computing theory were initiated because of the need for fault tolerance in aircraft computing devices. The Byzantine fault model, for example, was a creation of the NASA SIFT project of more than a couple of decades ago.
The Byzantine fault model is an elegant abstraction of faults in which faulty parts are assumed to behave as adversaries. The idea is to use redundancy, for instance in the number of processors, to overcome the behavior of faulty processors. This line of thinking fits the common engineering practice of having several design teams design critical parts independently, to make sure that there is no mistake in the calculations. Analogously, in the Byzantine fault model, several processors simultaneously perform the same calculations to implement a robust flight controller; the flight controller can thus function well in spite of the faulty behavior of several of the processors. Because faulty behavior cannot be anticipated, the most severe behavior is assumed: one reminiscent of an adversary in the Byzantine court, in which backstabbing was common. A formal impossibility proof shows that, to ensure the correct behavior of the system, fewer than one-third of the processors may be of the Byzantine type. The task examined is agreement, or consensus, in which processors need to decide on the same output, which is the input of one of the processors. The decision can be viewed as choosing the common result among the results of the design teams in the engineering example.
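The one-third bound can be checked with simple arithmetic. The sketch below is illustrative only: the function names `agreement_possible` and `max_tolerable_faults` are hypothetical, and the code merely encodes the condition n ≥ 3m + 1 (equivalently, m < n/3) for n processors of which m are Byzantine. It does not implement an agreement protocol.

```python
def agreement_possible(n: int, m: int) -> bool:
    """Return True if n processors can reach agreement with m Byzantine ones.

    Encodes the condition n >= 3m + 1, i.e. fewer than one-third faulty.
    (Hypothetical helper, written for illustration.)
    """
    if n < 1 or m < 0 or m > n:
        raise ValueError("need n >= 1 and 0 <= m <= n")
    return n >= 3 * m + 1


def max_tolerable_faults(n: int) -> int:
    """Return the largest m that still satisfies n >= 3m + 1."""
    if n < 1:
        raise ValueError("need n >= 1")
    return (n - 1) // 3


if __name__ == "__main__":
    # Print the largest tolerable number of Byzantine processors per system size.
    for n in (3, 4, 7, 10):
        print(n, max_tolerable_faults(n))
```

Under this condition, three processors cannot tolerate even a single Byzantine processor, while four processors can tolerate one.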
The intuition behind the impossibility result is as follows. Assume you have met two people, Alice and Bob, one of whom is honest while the other is not. You may try to decide what to do by speaking with each of them, but because you do not know which of the two is honest, you first have to find this out. You may ask Alice directly which of them is not honest; Alice will answer Bob, and Bob, if asked, will obviously answer Alice. Each of them may also describe the conversations he or she had with the other, knowing that no one listened to the communication between them. This symmetry in the weight of Alice's and Bob's answers makes it impossible to decide. It is possible to formally prove that agreement can be achieved if, and only if, fewer than one-third of the processors are Byzantine (e.g., Ref. 7).
Systems that tolerate Byzantine faults are designed for flight devices, which need to be extremely robust. In such a device, the assumptions made for reaching agreement can be violated. Is it reasonable to assume that, during any period of the execution, fewer than one-third of the processors are faulty? What happens if, for a short period, more than a third are faulty, or perhaps experience a weaker fault than a Byzantine fault (say, one caused by a transient electric spark)? What happens if messages sent by non-faulty processors are lost at some instant?
Seven years prior to the introduction of the Byzantine fault model, Edsger W. Dijkstra suggested an important and fundamental fault tolerance property called self-stabilization (Ref. 3): to design the system as if it had no past history (no yesterday), that is, as a system that can be started in any possible state of its state space. It is therefore not assumed that consistency has been maintained from a fixed initial state by always executing steps according to the program of the processors. Self-stabilizing systems thus overcome transient faults.
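Convergence from an arbitrary state can be illustrated with Dijkstra's K-state token ring, a classic self-stabilizing mutual-exclusion algorithm. The text above does not describe this algorithm, so the sketch below is an illustrative choice rather than the author's method; the function names, the ring size, and the randomly chosen scheduler are assumptions. Machine 0 is privileged when its counter equals that of the last machine, and every other machine is privileged when its counter differs from its left neighbor's. With K at least the number of machines and one privileged machine moving at a time, the ring reaches a configuration with exactly one privilege and stays there.

```python
import random


def privileged(x):
    """Return the indices of the machines that currently hold a privilege."""
    n = len(x)
    holders = []
    if x[0] == x[n - 1]:          # machine 0 compares with the last machine
        holders.append(0)
    for i in range(1, n):         # the others compare with their left neighbor
        if x[i] != x[i - 1]:
            holders.append(i)
    return holders


def step(x, K, rng):
    """Let one privileged machine (picked by the scheduler rng) make its move."""
    i = rng.choice(privileged(x))  # at least one machine is always privileged
    if i == 0:
        x[0] = (x[0] + 1) % K      # machine 0 increments its counter modulo K
    else:
        x[i] = x[i - 1]            # other machines copy their left neighbor
    return i


def stabilize(x, K, rng, max_steps=10_000):
    """Run moves until exactly one machine is privileged; return the move count."""
    if len(x) < 2 or K < len(x):
        raise ValueError("need at least 2 machines and K >= number of machines")
    steps = 0
    while len(privileged(x)) != 1:
        step(x, K, rng)
        steps += 1
        if steps > max_steps:      # safety guard for this sketch only
            raise RuntimeError("did not stabilize within max_steps")
    return steps


if __name__ == "__main__":
    rng = random.Random(1)
    n, K = 6, 7
    state = [rng.randrange(K) for _ in range(n)]   # arbitrary initial state
    print("initial state:", state, "privileged:", privileged(state))
    print("moves until stable:", stabilize(state, K, rng))
    print("stable state:", state, "privileged:", privileged(state))
```

The initial state is drawn at random to stand in for the effect of a transient fault. No reset step is needed, because the rules themselves drive the ring back to a single circulating privilege.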
Temporary violations of the assumptions made by the algorithm designer can be viewed as leaving the system in an arbitrary initial state from which the system resumes. Self-stabilizing systems work correctly when started in any initial system state. Thus, even if the system loses its consistency due to an unexpected temporary violation of the assumptions made, it converges to legal behavior once the assumptions start to hold again. Self-stabilization is a strong fault tolerance property for systems; it ensures automatic recovery once faults stop occurring. A self-stabilizing system is designed to start in any possible state where its components—e.g., processors, processes, communication links, communication buffers—are in an arbitrary state; i.e., arbitrary variable values,


Citations
Journal Article

Internet of things: Vision, applications and research challenges

TL;DR: A survey of technologies, applications and research challenges for the Internet of Things is presented, in which digital and physical entities can be linked by means of appropriate information and communication technologies to enable a whole new class of applications and services.
Proceedings Article

Self-Managed Systems: an Architectural Challenge

TL;DR: Some of the current promising work in self-management is discussed and an outline three-layer reference model is presented as a context in which to articulate some of the main outstanding research challenges.
Proceedings Article

Distributed computation in dynamic networks

TL;DR: A worst-case model in which the communication links for each round are chosen by an adversary, and nodes do not know who their neighbors for the current round are before they broadcast their messages is considered.
Journal Article

Survey of local algorithms

TL;DR: This work surveys the state of the art in the field of local algorithm design, covering impossibility results, deterministic local algorithms, randomized local algorithms, and local algorithms for geometric graphs.
Journal Article

Dynamic networks: models and algorithms

TL;DR: This column surveys some recent work on dynamic network algorithms, focusing on the effect that model parameters such as the type of adversary, the network diameter, and the graph expansion can have on the performance of algorithms.
References
Journal Article

Reaching Agreement in the Presence of Faults

TL;DR: It is shown that the problem is solvable for, and only for, n ≥ 3m + 1, where m is the number of faulty processors and n is the total number of processors. A weaker assumption on fault behavior can be approximated in practice using cryptographic methods.
Journal Article

Self-stabilizing microprocessor: analyzing and overcoming soft errors

TL;DR: This work presents design schemes for a self-stabilizing microprocessor and a new technique for analyzing the effect of soft errors and shows that the problem of computing the reliability of a circuit such that logical masking is taken into account is an NP-hard problem.

Self-Stabilizing Autonomic Recoverer for Eventual Byzantine Software (Extended Abstract)

TL;DR: A general yet practical framework and paradigm, based on a theoretical foundation, for the monitoring and restarting of systems, and an autonomic recoverer that monitors and restarts the system is proposed.