scispace - formally typeset
Search or ask a question
Topic

State machine replication

About: State machine replication is a research topic. Over the lifetime, 525 publications have been published within this topic receiving 24983 citations. The topic is also known as: replicated state machine.


Papers
More filters
Proceedings ArticleDOI
22 Feb 1999
TL;DR: A new replication algorithm that is able to tolerate Byzantine faults that works in asynchronous environments like the Internet and incorporates several important optimizations that improve the response time of previous algorithms by more than an order of magnitude.
Abstract: This paper describes a new replication algorithm that is able to tolerate Byzantine faults. We believe that Byzantinefault-tolerant algorithms will be increasingly important in the future because malicious attacks and software errors are increasingly common and can cause faulty nodes to exhibit arbitrary behavior. Whereas previous algorithms assumed a synchronous system or were too slow to be used in practice, the algorithm described in this paper is practical: it works in asynchronous environments like the Internet and incorporates several important optimizations that improve the response time of previous algorithms by more than an order of magnitude. We implemented a Byzantine-fault-tolerant NFS service using our algorithm and measured its performance. The results show that our service is only 3% slower than a standard unreplicated NFS.

3,562 citations

Journal ArticleDOI
TL;DR: The state machine approach is a general method for implementing fault-tolerant services in distributed systems and protocols for two different failure models—Byzantine and fail stop are described.
Abstract: The state machine approach is a general method for implementing fault-tolerant services in distributed systems. This paper reviews the approach and describes protocols for two different failure models—Byzantine and fail stop. Systems reconfiguration techniques for removing faulty components and integrating repaired components are also discussed.

2,559 citations

Journal ArticleDOI
TL;DR: A new replication algorithm, BFT, is described that can be used to build highly available systems that tolerate Byzantine faults and is used to implement the first Byzantine-fault-tolerant NFS file system, BFS.
Abstract: Our growing reliance on online services accessible on the Internet demands highly available systems that provide correct service without interruptions. Software bugs, operator mistakes, and malicious attacks are a major cause of service interruptions and they can cause arbitrary behavior, that is, Byzantine faults. This article describes a new replication algorithm, BFT, that can be used to build highly available systems that tolerate Byzantine faults. BFT can be used in practice to implement real services: it performs well, it is safe in asynchronous environments such as the Internet, it incorporates mechanisms to defend against Byzantine-faulty clients, and it recovers replicas proactively. The recovery mechanism allows the algorithm to tolerate any number of faults over the lifetime of the system provided fewer than 1/3 of the replicas become faulty within a small window of vulnerability. BFT has been implemented as a generic program library with a simple interface. We used the library to implement the first Byzantine-fault-tolerant NFS file system, BFS. The BFT library and BFS perform well because the library incorporates several important optimizations, the most important of which is the use of symmetric cryptography to authenticate messages. The performance results show that BFS performs 2p faster to 24p slower than production implementations of the NFS protocol that are not replicated. This supports our claim that the BFT library can be used to build practical systems that tolerate Byzantine faults.

2,190 citations

Proceedings Article
19 Jun 2014
TL;DR: Raft is a consensus algorithm for managing a replicated log that separates the key elements of consensus, such as leader election, log replication, and safety, and it enforces a stronger degree of coherency to reduce the number of states that must be considered.
Abstract: Raft is a consensus algorithm for managing a replicated log. It produces a result equivalent to (multi-)Paxos, and it is as efficient as Paxos, but its structure is different from Paxos; this makes Raft more understandable than Paxos and also provides a better foundation for building practical systems. In order to enhance understandability, Raft separates the key elements of consensus, such as leader election, log replication, and safety, and it enforces a stronger degree of coherency to reduce the number of states that must be considered. Results from a user study demonstrate that Raft is easier for students to learn than Paxos. Raft also includes a new mechanism for changing the cluster membership, which uses overlapping majorities to guarantee safety.

1,811 citations

Journal ArticleDOI
TL;DR: Fault-tolerant consensus protocols are given for various cases of partial synchrony and various fault models that allow partially synchronous processors to reach some approximately common notion of time.
Abstract: The concept of partial synchrony in a distributed system is introduced. Partial synchrony lies between the cases of a synchronous system and an asynchronous system. In a synchronous system, there is a known fixed upper bound D on the time required for a message to be sent from one processor to another and a known fixed upper bound P on the relative speeds of different processors. In an asynchronous system no fixed upper bounds D and P exist. In one version of partial synchrony, fixed bounds D and P exist, but they are not known a priori. The problem is to design protocols that work correctly in the partially synchronous system regardless of the actual values of the bounds D and P. In another version of partial synchrony, the bounds are known, but are only guaranteed to hold starting at some unknown time T, and protocols must be designed to work correctly regardless of when time T occurs. Fault-tolerant consensus protocols are given for various cases of partial synchrony and various fault models. Lower bounds that show in most cases that our protocols are optimal with respect to the number of faults tolerated are also given. Our consensus protocols for partially synchronous processors use new protocols for fault-tolerant “distributed clocks” that allow partially synchronous processors to reach some approximately common notion of time.

1,613 citations


Network Information
Related Topics (5)
Server
79.5K papers, 1.4M citations
84% related
Cache
59.1K papers, 976.6K citations
82% related
Scalability
50.9K papers, 931.6K citations
79% related
Network packet
159.7K papers, 2.2M citations
79% related
Wireless ad hoc network
49K papers, 1.1M citations
79% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202144
202051
201956
201838
201737
201629