scispace - formally typeset

Showing papers on "Consensus" published in 1993


Journal ArticleDOI
TL;DR: Research on the consensus problem is surveyed, approaches are compared, applications are outlined, and directions for future work are suggested.
Abstract: The consensus problem is concerned with the agreement on a system status by the fault-free segment of a processor population in spite of the possible inadvertent or even malicious spread of disinformation by the faulty segment of that population. The resulting protocols are useful throughout fault-tolerant distributed systems and will impact the design of other decision systems to come. This paper surveys research on the consensus problem, compares approaches, outlines applications, and suggests directions for future work.

428 citations


Journal ArticleDOI
TL;DR: It is proved using a combinatorial argument that any k-resilient protocol for the k-set agreement problem would satisfy the uncertainty condition, while this is not true for any (k−1)-resilient protocol.
Abstract: We define the k-SET CONSENSUS PROBLEM as an extension of the CONSENSUS problem, where each processor decides on a single value such that the set of decided values in any run is of size at most k. We require the agreement condition that all values decided upon are initial values of some processor. We show that the problem has a simple (k−1)-resilient protocol in a totally asynchronous system. In an attempt to come up with a matching lower bound on the number of failures, we study the uncertainty condition, which requires that there must be some initial configuration from which all possible input values can be decided. We prove using a combinatorial argument that any k-resilient protocol for the k-set agreement problem would satisfy the uncertainty condition, while this is not true for any (k−1)-resilient protocol. This result seems to strengthen the conjecture that there is no k-resilient protocol for this problem. We prove this result for a restricted class of protocols. Our motivation for studying this problem is to test whether the number of choices allowed to the processors is related to the number of faults. We hope that this will provide intuition towards achieving better bounds for more practical problems that arise in distributed computing, e.g., the renaming problem. The larger goal is to characterize the boundary between possibility and impossibility in asynchronous systems given multiple faults.
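The two conditions stated in this abstract can be sketched as a check on a single finished run. This is only an illustrative sketch, not the paper's protocol: the function name and the dictionary representation of a run are assumptions made for the example.

```python
def satisfies_k_set_consensus(initial_values, decisions, k):
    """Check one finished run against the two conditions in the abstract.

    initial_values: dict mapping processor id -> its initial value
    decisions:      dict mapping processor id -> the value it decided
                    (processors that never decided are simply absent)
    k:              maximum number of distinct decided values allowed

    Both dictionaries are a hypothetical representation chosen for this sketch.
    """
    decided = set(decisions.values())
    # Condition 1: the set of decided values has size at most k.
    at_most_k = len(decided) <= k
    # Condition 2: every decided value is the initial value of some processor.
    all_initial = decided <= set(initial_values.values())
    return at_most_k and all_initial


inputs = {"p1": 1, "p2": 2, "p3": 3}
print(satisfies_k_set_consensus(inputs, {"p1": 1, "p2": 2, "p3": 1}, k=2))
print(satisfies_k_set_consensus(inputs, {"p1": 1, "p2": 2, "p3": 3}, k=2))
```

With k = 1 the same check reduces to the ordinary consensus requirement that all decided values coincide.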

376 citations


Journal ArticleDOI
TL;DR: A modified version of the protocol yields a weak shared coin whose bias is guaranteed to be in the range 1/2 ± ϵ regardless of scheduler behavior, and which is the first such protocol for the shared-memory model to guarantee that all processors agree on the outcome of the coin.

59 citations


Journal ArticleDOI
TL;DR: Cloture Votes, the protocol presented in this paper, takes further steps in this direction by making consensus possible with n = 4t + 1, r = t + 1, and polynomial message size.
Abstract: The Distributed Consensus problem involves n processors each of which holds an initial binary value. At most t processors may be faulty and ignore any protocol (even behaving maliciously), yet it is required that the nonfaulty processors eventually agree on a value that was initially held by one of them. We measure the quality of a consensus protocol using the following parameters: total number of processors n, number of rounds of message exchange r, and maximal message size m. The known lower bounds are respectively 3t + 1, t + 1, and 1.
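As a rough illustration of the three quality parameters, the sketch below compares a parameter choice against the lower bounds quoted in the abstract (3t + 1 processors, t + 1 rounds, message size 1). The function name is hypothetical, and meeting the bounds is only a necessary condition; it says nothing about whether a protocol with those parameters exists.

```python
def violated_lower_bounds(n, r, m, t):
    """Return the names of the parameters that fall below the known lower bounds.

    n: total number of processors       (lower bound 3t + 1)
    r: rounds of message exchange       (lower bound t + 1)
    m: maximal message size             (lower bound 1)
    t: maximum number of faulty processors
    """
    violations = []
    if n < 3 * t + 1:
        violations.append("n")
    if r < t + 1:
        violations.append("r")
    if m < 1:
        violations.append("m")
    return violations


print(violated_lower_bounds(n=4, r=2, m=1, t=1))  # all bounds met with equality
print(violated_lower_bounds(n=3, r=1, m=0, t=1))  # every bound violated
```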

49 citations


Proceedings ArticleDOI
01 Jun 1993
TL;DR: It is shown that for any k
Abstract: In the classical consensus problem, each of n processors receives a private input value and produces a decision value which is one of the original input values, with the requirement that all processors decide the same value. A central result in distributed computing is that, in several standard models including the asynchronous shared-memory model, this problem has no deterministic solution. The k-set agreement problem is a generalization of the classical consensus proposed by Chaudhuri (Inform. and Comput., 105 (1993), pp. 132-158), where the agreement condition is weakened so that the decision values produced may be different, as long as the number of distinct values is at most k. For n > k ≥ 2 it was not known whether this problem is solvable deterministically in the asynchronous shared memory model. In this paper, we resolve this question by showing that for any k

46 citations


Journal ArticleDOI
TL;DR: This work considers problems requiring consistent, simultaneous coordination in synchronous distributed systems and analyses these problems in terms of common knowledge in several failure models, showing that such problems cannot be solved, even in failure-free executions.
Abstract: There is a very close relationship between common knowledge and simultaneity in synchronous distributed systems. The analysis of several well-known problems in terms of common knowledge has led to round-optimal protocols for these problems, including Reliable Broadcast, Distributed Consensus, and the Distributed Firing Squad problem. These problems require that the correct processors coordinate their actions in some way but place no restrictions on the behaviour of the faulty processors. In systems with benign processor failures, however, it is reasonable to require that the actions of a faulty processor be consistent with those of the correct processors, assuming it performs any action at all. We consider problems requiring consistent, simultaneous coordination. We then analyze these problems in terms of common knowledge in several failure models. The analysis of these stronger problems requires a stronger definition of common knowledge, and we study the relationship between these two definitions. In many cases, the two definitions are actually equivalent, and simple modifications of previous solutions yield round-optimal solutions to these problems. When the definitions differ, however, we show that such problems cannot be solved, even in failure-free executions.

31 citations


Journal ArticleDOI
TL;DR: This paper shows how to achieve consensus in the butterfly network using O(t + log n loglog n) one-bit parallel transmission steps, while tolerating the asymptotically optimal number of faulty processors (O(n/log n)), and decreases the number of exceptions to O(t) by using additional links, while maintaining the same running time.
Abstract: The Distributed Consensus problem involves n processors each of which holds an initial binary value. At most t of the processors may be faulty and ignore any protocol (even behaving maliciously), yet it is required that the non-faulty processors eventually agree on a value that was initially held by one of them. In this paper we focus on consensus in networks whose degree is bounded, following the work of Dwork, Peleg, Pippenger and Upfal [8]. In such a context, complete consensus among all the correct processors is not possible and some exceptions must be allowed. We first show how to achieve consensus in the butterfly network using O(t + log n loglog n) one-bit parallel transmission steps, while tolerating the asymptotically optimal number of faulty processors (O(n/log n)) and having the asymptotically minimal number of exceptions (O(t log t)). This result considerably improves on the running time of existing butterfly consensus protocols [2, 8]. In particular, it replaces the running time of O(n log n loglog n) of [2] with an asymptotically optimal one. As in [8], we can then decrease the number of exceptions to O(t) by using additional links, while maintaining the same running time. The protocol is derived from a consensus protocol for completely connected networks that is interesting in its own right: it achieves Distributed Consensus with optimal number of processors, asymptotically optimal total bit transfer and nearly optimal number of rounds.

26 citations


Journal ArticleDOI
TL;DR: A randomized algorithm for the consensus problem in the message-passing model based on the algorithm of Aspnes and Herlihy [AH] in the shared-memory model is presented, which is the fastest known randomized algorithm that solves the consensus problem against a strong fail-stop adversary with one-half resiliency.
Abstract: This paper presents a schematic algorithm for distributed systems. This schematic algorithm uses a "black-box" procedure for communication, the output of which must meet two requirements: a global-order requirement and a deadlock-free requirement. This algorithm is valid in any distributed system model that can provide such a communication procedure that complies with these requirements. Two such models exist in an asynchronous fail-stop environment: one in the shared-memory model and one in the message-passing model. The implementation of the black-box procedure in these models enables us to translate existing algorithms between the two models whenever these algorithms are based on the schematic algorithm. We demonstrate this idea in two ways. First, we present a randomized algorithm for the consensus problem in the message-passing model based on the algorithm of Aspnes and Herlihy (AH) in the shared-memory model. This solution is the fastest known randomized algorithm that solves the consensus problem against a strong fail-stop adversary with one-half resiliency. Second, we solve the processor renaming problem in the shared-memory model based on the solution of Attiya et al. (ABD+) in the message-passing model. The existence of the solution to the renaming problem should be contrasted with the impossibility result for the consensus problem in the shared-memory model (CIL), (DDS), (LA).

20 citations


Journal ArticleDOI
TL;DR: A consensus protocol for n processes which can tolerate up to ⌈n/2⌉ − 1 failures and which uses a single (2⌈1.5n − 1⌉)-valued shared register is presented.

12 citations



01 Aug 1993
TL;DR: This paper defines a clear semantics of the virtually-synchronous model, and shows that distributed commit can be solved by the model, providing an interesting broader picture of the problem of building fault-tolerant applications.
Abstract: The purpose of this paper is to define a clear semantics of the virtually-synchronous model, and to show that distributed commit can be solved by the model. This is in a sense not surprising, as it has been shown that distributed consensus can be solved in the asynchronous model with a very weak failure detector. Considering this result, the virtually-synchronous model becomes extremely powerful, and more basic than the transaction model, providing an interesting broader picture of the problem of building fault-tolerant applications.

Journal ArticleDOI
TL;DR: This paper presents an already known consensus protocol which has a cost of O(n²) in the number of exchanged messages and O(n) in terms of time needed to arrive at an agreement, and presents several refinements to this protocol which make it linear in the absence of failures.

Journal ArticleDOI
TL;DR: It is shown that a necessary and sufficient condition for the existence of a deterministic consensus protocol is delivery of each broadcast message to at least ⌈(n + k + 1)/2⌉ processes in an n-process system subject to k crash failures with either eventual fair broadcasting or eventual full broadcasting.
Abstract: We consider consensus protocols in asynchronous distributed systems that are based on broadcast communication. We show that a necessary and sufficient condition for the existence of a deterministic consensus protocol is delivery of each broadcast message to at least ⌈(n + k + 1)/2⌉ processes in an n-process system subject to k crash failures with either eventual fair broadcasting or eventual full broadcasting. The broadcast model captures the idea of a broadcast communication medium, such as the Ethernet, in which messages, if delivered, are delivered immediately and in order but not necessarily to all processes.
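The delivery threshold ⌈(n + k + 1)/2⌉ from this abstract can be worked out for concrete system sizes with a few lines of integer arithmetic. The sketch below only evaluates the formula; the function names are hypothetical and nothing here models the broadcast medium or the protocol itself.

```python
def min_delivery_count(n, k):
    """Smallest number of processes each broadcast must reach: ceil((n + k + 1) / 2).

    n: number of processes in the system
    k: number of crash failures to tolerate
    Uses integer arithmetic: ceil(x / 2) == (x + 1) // 2 for any integer x.
    """
    return (n + k + 2) // 2


def delivery_meets_threshold(delivered_to, n, k):
    """True if a broadcast that reached `delivered_to` processes meets the threshold."""
    return delivered_to >= min_delivery_count(n, k)


for n, k in [(4, 1), (5, 1), (7, 2)]:
    print(n, k, min_delivery_count(n, k))
```

For example, with n = 5 processes and k = 1 crash failure the threshold is ⌈7/2⌉ = 4, so a broadcast reaching only 3 processes falls short.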

Journal ArticleDOI
TL;DR: A fault-tolerant server is implemented on top of a distributed operating system, the MACH microkernel, and provides user applications with a client-server communication mechanism in which replication is transparent.

01 Dec 1993
TL;DR: This report presents the concepts and mechanisms used in fault tolerant distributed systems, and introduces the concepts of fault tolerant group communication and distributed consensus.
Abstract: Fault-tolerance is an unavoidable requirement in distributed systems. First, multiple resources imply multiple potential causes of failure, so much research on distributed systems has aimed to ensure that dependability is not degraded by distribution. Second, fault tolerance can in itself be a motivating factor for distribution. Indeed, fault tolerance cannot be ensured without redundancy, and the distribution of processing and data on different processors provides an approach for structuring and managing this redundancy. In this report, we present the concepts and mechanisms used in fault tolerant distributed systems. A distributed application can be considered either as a set of processes exchanging messages or as a set of transactions acting on distributed data items. The known techniques of fault-tolerance are given for both computational models. We then introduce the concepts of fault tolerant group communication and distributed consensus.

Book
01 Jan 1993
TL;DR: High speed interconnection of workstations: concepts, problems and experiences (B. Heinrichs, T. Meuser, O. Spaniol).
Abstract: High speed interconnection of workstations: concepts, problems and experiences (B. Heinrichs, T. Meuser, O. Spaniol). High performance architecture issues (D.A. Nicole). Issues in object-oriented distributed systems (S. Krakowiak). Distributed Algorithms. What is a deadlock? (Y.C. Tay). A distributed algorithm for resource management (J. Ezpeleta, S. Haddad). Getting cooperative environment by coordinating services through a network (P. Bergougnoux, F. Barrere, P. Vidal). A distributed consensus protocol with a coordinator (F. Guerra, S. Arevalo, A. Alvarez, J. Miranda). Using global state properties to attain mutual exclusion in distributed systems (J. Vila-Carbo). Programming communicating distributed reactive automata: the weak synchronous paradigm (E. Boniol, M. Adelantado). Parallel implementations of two algorithms for solving linear programming problems (G.L. Reijns, R.M. Wiegers, G.-J. Boesschen Hospers). Performance Evaluation. Analysis of the quality of service in a MAN environment (M. Conti An approximate analysis of DQDB networks with the bandwidth balancing mechanism (Y. Matsumoto). LAN distributed fault-tolerance (J. Miro-Julia). A statistical study of the factors that affect the performance of a class of parallel programs on a MIMD computer (R. Candlin, J. Phillips). A multiprocessor parallel disk system evaluation (J. Carretero, F. Perez, P. de Miguel, F. García, L. Alonso). A decomposition approximation method for closed queueing networks with fork/join subnetworks (B. Baynat, Y. Dallery). Petri Nets. Modelling and analysis of deterministic concurrent systems with bulk services and arrivals (E. Teruel, J.M. Colom, M. Silva). A protocol specification language with a high-level Petri net semantics (B. Zouari, S. Haddad, M. Taghelit). Interconnect. A performance comparison between the Fieldbus protocol standards PROFIBUS and FIP (M. Ettl, U. Klehmet). Analysis of a class of polling protocols for Fieldbus networks (P. Raja, G. Noubir, L. Ruiz, J. Hernandez, M. Riese, J.D. Decotignie). A theory to increase the effective redundancy in Wormhole networks (J. Duato). Distributed Operating Systems. A systematic approach to load distribution strategies for distributed systems (C. Jacqmot, E. Milgrom). A cooperative algorithm for load balancing in interconnected transputer network (H. Guyennet, F. Spies). Uniform co-scheduling using object-oriented design techniques (N. Islam, R. Campbell). Distributed access to persistent objects (S.B. Lim, L. Xiao, R. Campbell). Parallel Simulation. Distributed simulation: a simulation system for the discrete event systems (B. Dado, P. Menhart, J. Safarik). Towards the distributed implementation of discrete event simulation languages (J. Miguel, M. Graña). Design Methods. Enhancing structured analysis by timed statecharts for real-time and concurrency specification (M. von der Beeck). Heuristics driven real time software design. (Part contents).

Proceedings ArticleDOI
23 Mar 1993
TL;DR: The authors provide the first asymptotically optimal distributed consensus protocol for semi-synchronous systems that tolerates general omission failures and terminates faster than the best known protocols for these failure classes.
Abstract: In a distributed consensus protocol, a number of processors communicating by message passing start with some initial values. The protocol terminates with all nonfaulty processors agreeing on one of these values. The authors investigate the time needed to reach consensus in partially synchronous systems under various classes of processor failures. They provide the first asymptotically optimal distributed consensus protocol for semi-synchronous systems that tolerates general omission failures. When the failures occurring are restricted to omission and crash failures, the protocol terminates faster, matching the best known protocols for these failure classes.