scispace - formally typeset
Search or ask a question
Journal ArticleDOI

Hamsaz: replication coordination analysis and synthesis

02 Jan 2019-Vol. 3, pp 1-32
TL;DR: This work presents novel coordination protocols that are parametric in terms of the analysis results and provide the well-coordination requirements and implemented a tool called Hamsaz that can automatically analyze the given object, instantiate the protocols and synthesize replicated objects.
Abstract: Distributed system replication is widely used as a means of fault-tolerance and scalability. However, it provides a spectrum of consistency choices that impose a dilemma for clients between correctness, responsiveness and availability. Given a sequential object and its integrity properties, we automatically synthesize a replicated object that guarantees state integrity and convergence and avoids unnecessary coordination. Our approach is based on a novel sufficient condition for integrity and convergence called well-coordination that requires certain orders between conflicting and dependent operations. We statically analyze the given sequential object to decide its conflicting and dependent methods and use this information to avoid coordination. We present novel coordination protocols that are parametric in terms of the analysis results and provide the well-coordination requirements. We implemented a tool called Hamsaz that can automatically analyze the given object, instantiate the protocols and synthesize replicated objects. We have applied Hamsaz to a suite of use-cases and synthesized replicated objects that are significantly more responsive than the strongly consistent baseline.
Citations
More filters
Journal ArticleDOI
10 Oct 2019
TL;DR: This work presents a fundamentally different approach to programming in the presence of replicated state based on the use of invertible relational specifications of an inductively-defined data type as a mechanism to capture salient aspects of the data type relevant to how its different instances can be safely merged in a replicated environment.
Abstract: Programming geo-replicated distributed systems is challenging given the complexity of reasoning about different evolving states on different replicas. Existing approaches to this problem impose significant burden on application developers to consider the effect of how operations performed on one replica are witnessed and applied on others. To alleviate these challenges, we present a fundamentally different approach to programming in the presence of replicated state. Our insight is based on the use of invertible relational specifications of an inductively-defined data type as a mechanism to capture salient aspects of the data type relevant to how its different instances can be safely merged in a replicated environment. Importantly, because these specifications only address a data type's (static) structural properties, their formulation does not require exposing low-level system-level details concerning asynchrony, replication, visibility, etc. As a consequence, our framework enables the correct-by-construction synthesis of rich merge functions over arbitrarily complex (i.e., composable) data types. We show that the use of a rich relational specification language allows us to extract sufficient conditions to automatically derive merge functions that have meaningful non-trivial convergence properties. We incorporate these ideas in a tool called Quark, and demonstrate its utility via a detailed evaluation study on real-world benchmarks.

21 citations

Book ChapterDOI
25 Apr 2020
TL;DR: This work proposes a proof methodology for establishing that a given object maintains a given invariant, taking into account any concurrency control, for the subclass of state-based distributed systems.
Abstract: To provide high availability in distributed systems, object replicas allow concurrent updates. Although replicas eventually converge, they may diverge temporarily, for instance when the network fails. This makes it difficult for the developer to reason about the object's properties , and in particular, to prove invariants over its state. For the sub-class of state-based distributed systems, we propose a proof methodology for establishing that a given object maintains a given invariant, taking into account any concurrency control. Our approach allows reasoning about individual operations separately. We demonstrate that our rules are sound, and we illustrate their use with some representative examples. We automate the rule using Boogie, an SMT-based tool.

20 citations


Cites background or result from "Hamsaz: replication coordination an..."

  • ...This correctness problem has been addressed before; however, previous works mostly consider the operation-based propagation approach [11, 13, 19, 24]....

    [...]

  • ...This is in contrast to other works on the verification of properties of replicated objects [11, 13]....

    [...]

  • ...Houshmand et al.[13] extends CISE by lowering the causal consistency requirements and generating concurrency control protocols....

    [...]

DOI
01 Jan 2019
TL;DR: This work proposes a new language, Gallifrey, which provides orthogonal replication through restrictions with merge strategies, contingencies for conflicts arising from concurrency, and branches, a novel concurrency control construct inspired by version control, to contain provisional behavior.
Abstract: Programming efficient distributed, concurrent systems requires new abstractions that go beyond traditional sequential programming. But programmers already have trouble getting sequential code right, so simplicity is essential. The core problem is that low-latency, high-availability access to data requires replication of mutable state. Keeping replicas fully consistent is expensive, so the question is how to expose asynchronously replicated objects to programmers in a way that allows them to reason simply about their code. We propose an answer to this question in our ongoing work designing a new language, Gallifrey, which provides orthogonal replication through _restrictions_ with _merge strategies_, _contingencies_ for conflicts arising from concurrency, and _branches_, a novel concurrency control construct inspired by version control, to contain provisional behavior.

17 citations

Journal ArticleDOI
13 Nov 2020
TL;DR: This paper presents an extension to Liquid Haskell that facilitates stating and semi-automatically proving properties of typeclasses, and implements a framework for programming distributed applications based on replicated data types (RDTs).
Abstract: This paper presents an extension to Liquid Haskell that facilitates stating and semi-automatically proving properties of typeclasses. Liquid Haskell augments Haskell with refinement types—our work allows such types to be attached to typeclass method declarations, and ensures that instance implementations respect these types. The engineering of this extension is a modular interaction between GHC, the Glasgow Haskell Compiler, and Liquid Haskell’s core proof infrastructure. The design sheds light on the interplay between modular proofs and typeclass resolution, which in Haskell is coherent by default (meaning that resolution always selects the same implementation for a particular instantiating type), but in other dependently typed languages is not. We demonstrate the utility of our extension by using Liquid Haskell to modularly verify that 34 instances satisfy the laws of five standard typeclasses. More substantially, we implement a framework for programming distributed applications based on replicated data types (RDTs). We define a typeclass whose Liquid Haskell type captures the mathematical properties RDTs should satisfy; prove in Liquid Haskell that these properties are sufficient to ensure that replicas’ states converge despite out-of-order update delivery; implement (and prove correct) several instances of our RDT typeclass; and use them to build two realistic applications, a multi-user calendar event planner and a collaborative text editor.

16 citations

Journal ArticleDOI
01 Aug 2020
TL;DR: This work proposes a new consistency protocol, the observable atomic consistency protocol (OACP), to make write-dominant applications as fast as possible and as consistent as needed, and provides a high-level programming interface to improve the efficiency and correctness of distributed programming.
Abstract: Strong consistency is widely used in systems such as relational databases. In a distributed system, strong consistency ensures that all clients observe consistent data updates atomically on all servers. However, such systems need to sacrifice availability when synchronization occurs. We propose a new consistency protocol, the observable atomic consistency protocol (OACP) to make write-dominant applications as fast as possible and as consistent as needed. OACP combines the advantages of (1) mergeable data types, specifically, convergent replicated data types, to reduce synchronization and (2) reliable total order broadcast to provide on-demand strong consistency. We also provide a high-level programming interface to improve the efficiency and correctness of distributed programming. We present a formal, mechanized model of OACP in rewriting logic and verify key correctness properties using the model checking tool Maude. Furthermore, we provide a prototype implementation of OACP based on Akka, a widely-used actor-based middleware. Our experimental evaluation shows that OACP can reduce coordination overhead compared to the state-of-the-art Raft consensus protocol. Our results also suggest that OACP increases availability through mergeable data types and provides acceptable latency for achieving strong consistency, enabling a principled relaxation of strong consistency to improve performance.

11 citations

References
More filters
Book ChapterDOI
Leslie Lamport1
TL;DR: In this paper, the concept of one event happening before another in a distributed system is examined, and a distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events.
Abstract: The concept of one event happening before another in a distributed system is examined, and is shown to define a partial ordering of the events. A distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events. The use of the total ordering is illustrated with a method for solving synchronization problems. The algorithm is then specialized for synchronizing physical clocks, and a bound is derived on how far out of synchrony the clocks can become.

8,381 citations

Journal ArticleDOI
Leslie Lamport1
TL;DR: In this article, the concept of one event happening before another in a distributed system is examined, and a distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events.
Abstract: The concept of one event happening before another in a distributed system is examined, and is shown to define a partial ordering of the events. A distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events. The use of the total ordering is illustrated with a method for solving synchronization problems. The algorithm is then specialized for synchronizing physical clocks, and a bound is derived on how far out of synchrony the clocks can become.

6,804 citations

Journal ArticleDOI
TL;DR: In this paper, it is shown that every protocol for this problem has the possibility of nontermination, even with only one faulty process.
Abstract: The consensus problem involves an asynchronous system of processes, some of which may be unreliable The problem is for the reliable processes to agree on a binary value In this paper, it is shown that every protocol for this problem has the possibility of nontermination, even with only one faulty process By way of contrast, solutions are known for the synchronous case, the “Byzantine Generals” problem

4,389 citations


"Hamsaz: replication coordination an..." refers background or result in this paper

  • ...There has been a known dilemma [Abadi 2012; Fischer et al. 1985; Gilbert and Lynch 2002, 2012] between strong and weak consistency of replicated objects....

    [...]

  • ...Following fundamental impossibility results [Fischer et al. 1985; Gilbert and Lynch 2002], this protocol has a trade-off between availability and consistency....

    [...]

Proceedings ArticleDOI
14 Oct 2007
TL;DR: D Dynamo is presented, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience and makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
Abstract: Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems.This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.

4,349 citations


"Hamsaz: replication coordination an..." refers background in this paper

  • ...…[Madhusudan and Thiagarajan 2001] to tolerate faults, online services rely on geo-replicated data stores [Cooper et al. 2008; Corbett et al. 2013; DeCandia et al. 2007; Li et al. 2012; Lloyd et al. 2011, 2013; Sovran et al. 2011] to manage the ever-growing amount of data and hand-held devices…...

    [...]

  • ...Embedded control systems replicate controllers [Madhusudan and Thiagarajan 2001] to tolerate faults, online services rely on geo-replicated data stores [Cooper et al. 2008; Corbett et al. 2013; DeCandia et al. 2007; Li et al. 2012; Lloyd et al. 2011, 2013; Sovran et al. 2011] to manage the ever-growing amount of data and hand-held devices replicate data for off-line use....

    [...]

Journal ArticleDOI
TL;DR: The Paxon parliament's protocol provides a new way of implementing the state machine approach to the design of distributed systems.
Abstract: Recent archaeological discoveries on the island of Paxos reveal that the parliament functioned despite the peripatetic propensity of its part-time legislators. The legislators maintained consistent copies of the parliamentary record, despite their frequent forays from the chamber and the forgetfulness of their messengers. The Paxon parliament's protocol provides a new way of implementing the state machine approach to the design of distributed systems.

2,965 citations


"Hamsaz: replication coordination an..." refers methods in this paper

  • ...Strongly consistent replication (via Viewstamp [Oki and Liskov 1988], Paxos [Lamport 1998] and Raft [Ongaro and Ousterhout 2014] protocols) guarantees the same total order of operations across all replicas....

    [...]