Hamsaz: replication coordination analysis and synthesis

doi:10.1145/3290387

Journal Article•DOI•

Hamsaz: replication coordination analysis and synthesis

Farzin Houshmand¹, Mohsen Lesani¹•Institutions (1)

University of California, Riverside¹

02 Jan 2019-Vol. 3, pp 1-32

TL;DR: This work presents novel coordination protocols that are parametric in the analysis results and satisfy the well-coordination requirements, and implements a tool called Hamsaz that automatically analyzes a given object, instantiates the protocols, and synthesizes replicated objects.

Abstract: Distributed system replication is widely used as a means of fault-tolerance and scalability. However, it provides a spectrum of consistency choices that impose a dilemma for clients between correctness, responsiveness and availability. Given a sequential object and its integrity properties, we automatically synthesize a replicated object that guarantees state integrity and convergence and avoids unnecessary coordination. Our approach is based on a novel sufficient condition for integrity and convergence called well-coordination that requires certain orders between conflicting and dependent operations. We statically analyze the given sequential object to decide its conflicting and dependent methods and use this information to avoid coordination. We present novel coordination protocols that are parametric in terms of the analysis results and provide the well-coordination requirements. We implemented a tool called Hamsaz that can automatically analyze the given object, instantiate the protocols and synthesize replicated objects. We have applied Hamsaz to a suite of use-cases and synthesized replicated objects that are significantly more responsive than the strongly consistent baseline.
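
To make the idea of conflicting methods concrete, the following is a minimal sketch, not the Hamsaz analysis itself. It assumes a hypothetical bank-account object with the invariant `balance >= 0`, and it replaces the paper's static analysis with brute-force enumeration over a small bounded state space. The names `state_commute`, `permissible_concur`, and `conflicts` are illustrative and may not match the paper's formal definitions in every detail.

```python
from itertools import product

# Assumption: a tiny bounded universe stands in for static (solver-based) analysis.
STATES = range(0, 6)   # candidate balances 0..5, all satisfying the invariant
ARGS = range(1, 4)     # candidate amounts 1..3


def invariant(balance):
    """Integrity property of the hypothetical account object."""
    return balance >= 0


def deposit(balance, n):
    return balance + n


def withdraw(balance, n):
    return balance - n


METHODS = {"deposit": deposit, "withdraw": withdraw}


def state_commute(f, g):
    """Both application orders reach the same state for every state and argument."""
    return all(f(g(s, b), a) == g(f(s, a), b)
               for s, a, b in product(STATES, ARGS, ARGS))


def permissible_concur(f, g):
    """If f(a) and g(b) each preserve the invariant in s, f(a) still does after g(b)."""
    for s, a, b in product(STATES, ARGS, ARGS):
        if invariant(f(s, a)) and invariant(g(s, b)):
            if not invariant(f(g(s, b), a)):
                return False
    return True


def conflicts(f, g):
    """Two methods conflict if they fail to commute or can invalidate each other."""
    return not (state_commute(f, g)
                and permissible_concur(f, g)
                and permissible_concur(g, f))


def conflict_table():
    names = sorted(METHODS)
    return {(m, n): conflicts(METHODS[m], METHODS[n]) for m in names for n in names}


if __name__ == "__main__":
    for pair, is_conflict in conflict_table().items():
        print(pair, "conflict" if is_conflict else "no conflict")
```

In this sketch only two concurrent withdrawals conflict, since each may be permissible on its own while both together overdraw the account. Under that assumption, only `withdraw` calls would need to be ordered against each other, and deposits could proceed without coordination.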

Citations

Journal Article•DOI•

Mergeable replicated data types

Gowtham Kaki¹, Swarn Priya¹, KC Sivaramakrishnan², Suresh Jagannathan¹•Institutions (2)

Purdue University¹, Indian Institute of Technology Madras²

10 Oct 2019

TL;DR: This work presents a fundamentally different approach to programming in the presence of replicated state based on the use of invertible relational specifications of an inductively-defined data type as a mechanism to capture salient aspects of the data type relevant to how its different instances can be safely merged in a replicated environment.

Abstract: Programming geo-replicated distributed systems is challenging given the complexity of reasoning about different evolving states on different replicas. Existing approaches to this problem impose significant burden on application developers to consider the effect of how operations performed on one replica are witnessed and applied on others. To alleviate these challenges, we present a fundamentally different approach to programming in the presence of replicated state. Our insight is based on the use of invertible relational specifications of an inductively-defined data type as a mechanism to capture salient aspects of the data type relevant to how its different instances can be safely merged in a replicated environment. Importantly, because these specifications only address a data type's (static) structural properties, their formulation does not require exposing low-level system-level details concerning asynchrony, replication, visibility, etc. As a consequence, our framework enables the correct-by-construction synthesis of rich merge functions over arbitrarily complex (i.e., composable) data types. We show that the use of a rich relational specification language allows us to extract sufficient conditions to automatically derive merge functions that have meaningful non-trivial convergence properties. We incorporate these ideas in a tool called Quark, and demonstrate its utility via a detailed evaluation study on real-world benchmarks.
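
As a rough illustration of what a merge function over replicated state can look like, here is a minimal sketch of a three-way merge for a counter and a set. It assumes each merge is given the two diverged versions plus their common ancestor; the function names are hypothetical, and the code is hand-written rather than derived from relational specifications as the abstract describes.

```python
def merge_counter(ancestor, left, right):
    """Apply both branches' deltas to the common ancestor."""
    return ancestor + (left - ancestor) + (right - ancestor)


def merge_set(ancestor, left, right):
    """Keep elements no branch removed, plus elements either branch added."""
    kept = ancestor & left & right
    added = (left - ancestor) | (right - ancestor)
    return kept | added


if __name__ == "__main__":
    print(merge_counter(5, 7, 4))
    print(sorted(merge_set({1, 2, 3}, {1, 2, 4}, {2, 3, 5})))
```

Both functions are symmetric in `left` and `right`, so the two replicas arrive at the same result regardless of which side initiates the merge.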

21 citations

Book Chapter•DOI•

Proving the Safety of Highly-Available Distributed Objects

Sreeja Nair, Gustavo Petri, Marc Shapiro

25 Apr 2020

TL;DR: This work proposes a proof methodology for establishing that a given object maintains a given invariant, taking into account any concurrency control, for the subclass of state-based distributed systems.

Abstract: To provide high availability in distributed systems, object replicas allow concurrent updates. Although replicas eventually converge, they may diverge temporarily, for instance when the network fails. This makes it difficult for the developer to reason about the object's properties, and in particular, to prove invariants over its state. For the sub-class of state-based distributed systems, we propose a proof methodology for establishing that a given object maintains a given invariant, taking into account any concurrency control. Our approach allows reasoning about individual operations separately. We demonstrate that our rules are sound, and we illustrate their use with some representative examples. We automate the rule using Boogie, an SMT-based tool.

20 citations

Cites background or result from "Hamsaz: replication coordination an..."

...This correctness problem has been addressed before; however, previous works mostly consider the operation-based propagation approach [11, 13, 19, 24]....
[...]
...This is in contrast to other works on the verification of properties of replicated objects [11, 13]....
[...]
...Houshmand et al. [13] extends CISE by lowering the causal consistency requirements and generating concurrency control protocols....
[...]

DOI•

A Tour of Gallifrey, a Language for Geodistributed Programming

Matthew Milano, Rolph Recto, Tom Magrino, Andrew C. Myers

01 Jan 2019

TL;DR: This work proposes a new language, Gallifrey, which provides orthogonal replication through restrictions with merge strategies, contingencies for conflicts arising from concurrency, and branches, a novel concurrency control construct inspired by version control, to contain provisional behavior.

Abstract: Programming efficient distributed, concurrent systems requires new abstractions that go beyond traditional sequential programming. But programmers already have trouble getting sequential code right, so simplicity is essential. The core problem is that low-latency, high-availability access to data requires replication of mutable state. Keeping replicas fully consistent is expensive, so the question is how to expose asynchronously replicated objects to programmers in a way that allows them to reason simply about their code. We propose an answer to this question in our ongoing work designing a new language, Gallifrey, which provides orthogonal replication through _restrictions_ with _merge strategies_, _contingencies_ for conflicts arising from concurrency, and _branches_, a novel concurrency control construct inspired by version control, to contain provisional behavior.

17 citations

Journal Article•DOI•

Verifying replicated data types with typeclass refinements in Liquid Haskell

Yiyun Liu¹, James Parker¹, Patrick Redmond², Lindsey Kuper², Michael Hicks¹, Niki Vazou³•Institutions (3)

University of Maryland, College Park¹, University of California, Santa Cruz², IMDEA³

13 Nov 2020

TL;DR: This paper presents an extension to Liquid Haskell that facilitates stating and semi-automatically proving properties of typeclasses, and implements a framework for programming distributed applications based on replicated data types (RDTs).

Abstract: This paper presents an extension to Liquid Haskell that facilitates stating and semi-automatically proving properties of typeclasses. Liquid Haskell augments Haskell with refinement types—our work allows such types to be attached to typeclass method declarations, and ensures that instance implementations respect these types. The engineering of this extension is a modular interaction between GHC, the Glasgow Haskell Compiler, and Liquid Haskell’s core proof infrastructure. The design sheds light on the interplay between modular proofs and typeclass resolution, which in Haskell is coherent by default (meaning that resolution always selects the same implementation for a particular instantiating type), but in other dependently typed languages is not. We demonstrate the utility of our extension by using Liquid Haskell to modularly verify that 34 instances satisfy the laws of five standard typeclasses. More substantially, we implement a framework for programming distributed applications based on replicated data types (RDTs). We define a typeclass whose Liquid Haskell type captures the mathematical properties RDTs should satisfy; prove in Liquid Haskell that these properties are sufficient to ensure that replicas’ states converge despite out-of-order update delivery; implement (and prove correct) several instances of our RDT typeclass; and use them to build two realistic applications, a multi-user calendar event planner and a collaborative text editor.

16 citations

Journal Article•DOI•

Replicated data types that unify eventual consistency and observable atomic consistency

Xin Zhao¹, Philipp Haller¹•Institutions (1)

Royal Institute of Technology¹

01 Aug 2020

TL;DR: This work proposes a new consistency protocol, the observable atomic consistency protocol (OACP), to make write-dominant applications as fast as possible and as consistent as needed, and provides a high-level programming interface to improve the efficiency and correctness of distributed programming.

Abstract: Strong consistency is widely used in systems such as relational databases. In a distributed system, strong consistency ensures that all clients observe consistent data updates atomically on all servers. However, such systems need to sacrifice availability when synchronization occurs. We propose a new consistency protocol, the observable atomic consistency protocol (OACP) to make write-dominant applications as fast as possible and as consistent as needed. OACP combines the advantages of (1) mergeable data types, specifically, convergent replicated data types, to reduce synchronization and (2) reliable total order broadcast to provide on-demand strong consistency. We also provide a high-level programming interface to improve the efficiency and correctness of distributed programming. We present a formal, mechanized model of OACP in rewriting logic and verify key correctness properties using the model checking tool Maude. Furthermore, we provide a prototype implementation of OACP based on Akka, a widely-used actor-based middleware. Our experimental evaluation shows that OACP can reduce coordination overhead compared to the state-of-the-art Raft consensus protocol. Our results also suggest that OACP increases availability through mergeable data types and provides acceptable latency for achieving strong consistency, enabling a principled relaxation of strong consistency to improve performance.
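
The convergent replicated data types mentioned above can be illustrated with a minimal sketch of a grow-only counter (G-Counter), a standard state-based example. This is not OACP and does not include the total order broadcast component; the class and method names are hypothetical.

```python
class GCounter:
    """Grow-only counter: one monotonically increasing slot per replica."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, n=1):
        # A replica only ever advances its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise maximum is commutative, associative, and idempotent,
        # so replicas converge whatever the order or repetition of merges.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


if __name__ == "__main__":
    a, b = GCounter("a"), GCounter("b")
    a.increment(2)
    b.increment(3)
    a.merge(b)
    b.merge(a)
    print(a.value(), b.value())
```

Because merging needs no agreement on ordering, updates to such a type can be applied locally without synchronization, which is the property the abstract relies on to reduce coordination.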

11 citations

References

Book Chapter•DOI•

Time, clocks, and the ordering of events in a distributed system

Leslie Lamport¹•Institutions (1)

CA Technologies¹

04 Oct 2019-Concurrency and Computation: Practice and Experience

TL;DR: In this paper, the concept of one event happening before another in a distributed system is examined, and a distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events.

Abstract: The concept of one event happening before another in a distributed system is examined, and is shown to define a partial ordering of the events. A distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events. The use of the total ordering is illustrated with a method for solving synchronization problems. The algorithm is then specialized for synchronizing physical clocks, and a bound is derived on how far out of synchrony the clocks can become.
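
The logical clock rules summarized in this abstract can be sketched in a few lines. The sketch below is a minimal illustration with hypothetical names (`LamportClock`, `local_event`, `send`, `receive`). It assumes integer process ids and breaks ties between equal timestamps by process id, which yields a total order consistent with happened-before.

```python
class LamportClock:
    """Logical clock for one process; stamps are (time, pid) pairs."""

    def __init__(self, pid):
        self.pid = pid
        self.time = 0

    def local_event(self):
        # Rule 1: advance the clock between successive local events.
        self.time += 1
        return (self.time, self.pid)

    def send(self):
        # Sending is an event; its stamp travels with the message.
        return self.local_event()

    def receive(self, message_stamp):
        # Rule 2: move past both the local clock and the message's timestamp.
        message_time, _sender = message_stamp
        self.time = max(self.time, message_time) + 1
        return (self.time, self.pid)


if __name__ == "__main__":
    p, q = LamportClock(1), LamportClock(2)
    first = p.local_event()
    sent = p.send()
    received = q.receive(sent)
    # Tuples compare by time first, then pid, giving a total order of events.
    print(sorted([received, sent, first]))
```

Comparing the `(time, pid)` pairs lexicographically orders every pair of events, including concurrent ones that the happened-before relation leaves unordered.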

8,381 citations

Journal Article•DOI•

Time, clocks, and the ordering of events in a distributed system

Leslie Lamport¹•Institutions (1)

CA Technologies¹

01 Jul 1978-Communications of The ACM

TL;DR: In this article, the concept of one event happening before another in a distributed system is examined, and a distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events.

6,804 citations

Journal Article•DOI•

Impossibility of distributed consensus with one faulty process

Michael J. Fischer¹, Nancy Lynch², Mike Paterson³•Institutions (3)

Yale University¹, Massachusetts Institute of Technology², University of Warwick³

01 Apr 1985-Journal of the ACM

TL;DR: In this paper, it is shown that every protocol for this problem has the possibility of nontermination, even with only one faulty process.

Abstract: The consensus problem involves an asynchronous system of processes, some of which may be unreliable. The problem is for the reliable processes to agree on a binary value. In this paper, it is shown that every protocol for this problem has the possibility of nontermination, even with only one faulty process. By way of contrast, solutions are known for the synchronous case, the “Byzantine Generals” problem.

4,389 citations

"Hamsaz: replication coordination an..." refers background or result in this paper

...There has been a known dilemma [Abadi 2012; Fischer et al. 1985; Gilbert and Lynch 2002, 2012] between strong and weak consistency of replicated objects....
[...]
...Following fundamental impossibility results [Fischer et al. 1985; Gilbert and Lynch 2002], this protocol has a trade-off between availability and consistency....
[...]

Proceedings Article•DOI•

Dynamo: amazon's highly available key-value store

Giuseppe DeCandia¹, Deniz Hastorun¹, Madan Mohan Rao Jampani¹, Gunavardhan Kakulapati¹, Avinash Lakshman¹, Alex Pilchin¹, Swaminathan Sivasubramanian¹, Peter Sven Vosshall¹, Werner Vogels¹•Institutions (1)

Amazon.com¹

14 Oct 2007

TL;DR: Dynamo is presented, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience and that makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.

Abstract: Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems. This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
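
The abstract does not spell out how object versions are compared. The sketch below assumes vector clocks, a common mechanism for this purpose, represented as plain dictionaries from node name to counter. The helper names `descends` and `compare` are hypothetical, and the code is an illustration rather than Dynamo's implementation.

```python
def descends(a, b):
    """True if version a has seen every update recorded in version b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify two versions; 'concurrent' means the application must reconcile."""
    a_sees_b, b_sees_a = descends(a, b), descends(b, a)
    if a_sees_b and b_sees_a:
        return "equal"
    if a_sees_b:
        return "a supersedes b"
    if b_sees_a:
        return "b supersedes a"
    return "concurrent"


if __name__ == "__main__":
    print(compare({"x": 2}, {"x": 1}))
    print(compare({"x": 1, "y": 1}, {"x": 1, "z": 1}))
```

When neither version descends from the other, both would be kept and handed to the application, which corresponds to the application-assisted conflict resolution mentioned in the abstract.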

4,349 citations

"Hamsaz: replication coordination an..." refers background in this paper

...Embedded control systems replicate controllers [Madhusudan and Thiagarajan 2001] to tolerate faults, online services rely on geo-replicated data stores [Cooper et al. 2008; Corbett et al. 2013; DeCandia et al. 2007; Li et al. 2012; Lloyd et al. 2011, 2013; Sovran et al. 2011] to manage the ever-growing amount of data and hand-held devices replicate data for off-line use....
[...]

Journal Article•DOI•

The part-time parliament

Leslie Lamport

01 May 1998-ACM Transactions on Computer Systems

TL;DR: The Paxon parliament's protocol provides a new way of implementing the state machine approach to the design of distributed systems.

Abstract: Recent archaeological discoveries on the island of Paxos reveal that the parliament functioned despite the peripatetic propensity of its part-time legislators. The legislators maintained consistent copies of the parliamentary record, despite their frequent forays from the chamber and the forgetfulness of their messengers. The Paxon parliament's protocol provides a new way of implementing the state machine approach to the design of distributed systems.

2,965 citations

"Hamsaz: replication coordination an..." refers methods in this paper

...Strongly consistent replication (via Viewstamp [Oki and Liskov 1988], Paxos [Lamport 1998] and Raft [Ongaro and Ousterhout 2014] protocols) guarantees the same total order of operations across all replicas....
[...]