Practical Byzantine fault tolerance and proactive recovery

doi:10.1145/571637.571640

Journal Article

Miguel Castro, +1 more

01 Nov 2002

ACM Transactions on Computer Systems

Vol. 20, Iss. 4, pp. 398-461

TLDR

A new replication algorithm, BFT, is described that can be used to build highly available systems that tolerate Byzantine faults and is used to implement the first Byzantine-fault-tolerant NFS file system, BFS.

Abstract:

Our growing reliance on online services accessible on the Internet demands highly available systems that provide correct service without interruptions. Software bugs, operator mistakes, and malicious attacks are a major cause of service interruptions and they can cause arbitrary behavior, that is, Byzantine faults. This article describes a new replication algorithm, BFT, that can be used to build highly available systems that tolerate Byzantine faults. BFT can be used in practice to implement real services: it performs well, it is safe in asynchronous environments such as the Internet, it incorporates mechanisms to defend against Byzantine-faulty clients, and it recovers replicas proactively. The recovery mechanism allows the algorithm to tolerate any number of faults over the lifetime of the system provided fewer than 1/3 of the replicas become faulty within a small window of vulnerability. BFT has been implemented as a generic program library with a simple interface. We used the library to implement the first Byzantine-fault-tolerant NFS file system, BFS. The BFT library and BFS perform well because the library incorporates several important optimizations, the most important of which is the use of symmetric cryptography to authenticate messages. The performance results show that BFS performs 2% faster to 24% slower than production implementations of the NFS protocol that are not replicated. This supports our claim that the BFT library can be used to build practical systems that tolerate Byzantine faults.
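
The "fewer than 1/3 of the replicas" bound in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the BFT library: the function names are hypothetical, and the 2f + 1 quorum size is an assumption drawn from the usual presentation of the protocol rather than from the abstract itself.

```python
# Illustrative arithmetic for the "fewer than 1/3 faulty" bound.
# All names here are hypothetical; none come from the BFT library.


def max_faulty(n: int) -> int:
    """Largest f such that f < n / 3, i.e. 3f + 1 <= n."""
    if n < 1:
        raise ValueError("need at least one replica")
    return (n - 1) // 3


def min_replicas(f: int) -> int:
    """Smallest replica count that keeps f faulty replicas below 1/3."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def quorum_size(f: int) -> int:
    """Assumed quorum size (2f + 1) for a group of 3f + 1 replicas."""
    return 2 * f + 1


def quorum_overlap(f: int) -> int:
    """Minimum number of replicas shared by any two quorums.

    Two sets of size 2f + 1 drawn from 3f + 1 replicas share at least
    2 * (2f + 1) - (3f + 1) = f + 1 members, so at least one shared
    member is non-faulty.
    """
    return 2 * quorum_size(f) - min_replicas(f)


if __name__ == "__main__":
    for n in (4, 7, 10):
        f = max_faulty(n)
        print(f"n={n}: tolerates f={f}, quorum={quorum_size(f)}, "
              f"overlap>={quorum_overlap(f)}")
```

Under these assumptions, four replicas tolerate one faulty replica and seven tolerate two. With proactive recovery, the bound applies to each window of vulnerability rather than to the whole lifetime of the system.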

Citations

Deconstructing Stellar Consensus.

Quantitative survivability evaluation of three virtual machine-based server architectures

A Decentralized Sharding Service Network Framework with Scalability

Coping with dependent failures in distributed systems

Towards Trustworthy Integrated Clinical Environments

References

Time, clocks, and the ordering of events in a distributed system

The Byzantine Generals Problem

Impossibility of distributed consensus with one faulty process

Related Papers (5)

The Byzantine Generals Problem

Implementing fault-tolerant services using the state machine approach: a tutorial

Practical Byzantine fault tolerance

Impossibility of distributed consensus with one faulty process

The part-time parliament