Open Access

Automatically increasing fault tolerance in distributed systems

TLDR
For synchronous systems, this dissertation presents a complete study of the relationship between fault-tolerance and round complexity of translations, develops new translations that are optimal, and proves that some previously developed translations are optimal.
Abstract
Developing fault-tolerant distributed protocols is a difficult task, and the difficulty increases with the severity of the failures to be tolerated. One way to deal with this difficulty is to develop protocols tolerant of benign failures and then transform them into protocols that tolerate more severe failures. This transformation mechanism is called a translation. This dissertation considers a variety of processor failures and synchrony models. The failures studied range from simple stopping failures to arbitrary faulty behavior. The synchrony models range from systems in which processors are fully synchronized (synchronous systems) to systems in which processors are not synchronized at all (asynchronous systems). For all synchrony models, the dissertation gives general definitions of translations and of measures to evaluate their performance. The two measures considered are communication complexity and fault-tolerance. Communication complexity is the communication overhead incurred when using a translation. Fault-tolerance is the maximum proportion of processors that can be faulty without affecting the correctness of the translations. For synchronous systems, this dissertation presents a complete study of the relationship between fault-tolerance and round complexity of translations. It develops new translations that are optimal and proves that some previously developed translations are optimal. For asynchronous systems, it proves that some previously developed translations are optimal. For systems that are only partially synchronous, this dissertation discusses some of the issues involved in designing efficient translations.


Citations
Proceedings Article

Nysiad: practical protocol transformation to tolerate Byzantine failures

TL;DR: Nysiad is presented, a system that implements a new technique for transforming a scalable distributed system or network protocol tolerant only of crash failures into one that tolerates arbitrary failures, including such failures as freeloading and malicious attacks.
Journal Article

Simplifying fault-tolerance: providing the abstraction of crash failures

TL;DR: Methods that automatically translate algorithms tolerant of simple crash failures into ones tolerant of more severe failures are considered, showing that previously developed translations to send-omission failures are optimal with respect to both fault-tolerance and round-complexity.
Book Chapter

Making distributed applications robust

TL;DR: A novel translation of systems that are tolerant of crash failures to systems that are tolerant of Byzantine failures in an asynchronous environment is presented, making weaker assumptions than previous approaches.
Journal Article

Time Bounds for Decision Problems in the Presence of Timing Uncertainty and Failures

TL;DR: This paper presents a new stretching technique for deriving lower bounds in the presence of late timing failures and yields the following lower bounds for a semi-synchronous model of distributed message-passing when there is inexact information about time and process failures.