Showing papers on "Two-phase commit protocol" published in 1980


Journal Article
TL;DR: A locking protocol is presented that coordinates access to a distributed database and maintains system consistency under both normal and abnormal conditions, together with a proposed extension that adapts the algorithm to highly skewed distributions of activity.
Abstract: A locking protocol to coordinate access to a distributed database and to maintain system consistency throughout normal and abnormal conditions is presented. The proposed protocol is robust in the face of crashes of any participating site, as well as communication failures. Recovery from any number of failures during normal operation or any of the recovery stages is supported. Recovery is done in such a way that maximum forward progress is achieved by the recovery procedures. Integration of virtually any locking discipline including predicate lock methods is permitted by this protocol. The locking algorithm operates, and operates correctly, when the network is partitioned, either intentionally or by failure of communication lines. Each partition is able to continue with work local to it, and operation merges gracefully when the partitions are reconnected. A subroutine of the protocol, that assures reliable communication among sites, is shown to have better performance than two-phase commit methods. For many topologies of interest, the delay introduced by the overall protocol is not a direct function of the size of the network. The communications cost is shown to grow in a relatively slow, linear fashion with the number of sites participating in the transaction. An informal proof of the correctness of the algorithm is also presented in this paper. The algorithm has as its core a centralized locking protocol with distributed recovery procedures. A centralized controller with local appendages at each site coordinates all resource control, with requests initiated by application programs at any site. However, no site experiences undue load. Recovery is broken down into three disjoint mechanisms: for single node recovery, merge of partitions, and reconstruction of the centralized controller and tables. The disjointness of the mechanisms contributes to comprehensibility and ease of proof. The paper concludes with a proposal for an extension aimed at optimizing operation of the algorithm to adapt to highly skewed distributions of activity. The extension applies nicely to interconnected computer networks.

62 citations


Proceedings Article
01 Oct 1980
TL;DR: Two strategies for handling transactions during recovery are discussed, along with an algorithm that selects, for any transaction, the strategy with minimal costs during recovery.
Abstract: If failures occur at one site of a distributed data base system, then in most cases multi-site transactions will be affected. Since these transactions cannot commit, processing at remote sites is affected, too. Until the end of recovery the transaction throughput of these sites remains diminished. In this paper two strategies will be discussed for handling transactions during recovery. An algorithm will be given which selects for any transaction the strategy with minimal costs during recovery.

2 citations