Open Access | Journal Article

Safe replication through bounded concurrency verification

TLDR
The paper proposes a novel programming framework for replicated data types (RDTs), equipped with an automatic (bounded) verification technique that discovers and fixes weak consistency anomalies, and shows that in practice bounded safety guarantees typically generalize to the unbounded case.
Abstract
High-level data types are often associated with semantic invariants that must be preserved by any correct implementation. Enforcing strong guarantees such as linearizability or serializability can often prevent invariant violations in concurrent settings, but such mechanisms are impractical in geo-distributed replicated environments, the platform of choice for many scalable Web services. To achieve the high availability essential to this domain, these environments admit various forms of weak consistency that do not guarantee that all replicas have a consistent view of an application's state. Consequently, they often admit anomalous behaviors that violate a data type's invariants and that are extremely challenging, even for experts, to understand and debug. In this paper, we propose a novel programming framework for replicated data types (RDTs) equipped with an automatic (bounded) verification technique that discovers and fixes weak consistency anomalies. Our approach, implemented in a tool called Q9, systematically explores the state space of an application executing on top of an eventually consistent data store, under an unrestricted consistency model but with a finite concurrency bound. Q9 uncovers anomalies (i.e., invariant violations) that manifest as finite counterexamples, and automatically generates repairs for such anomalies by selectively strengthening consistency guarantees for specific operations. Using Q9, we have uncovered a range of subtle anomalies in implementations of well-known benchmarks, and have been able to apply the repairs it mandates to effectively eliminate them. Notably, these benchmarks were written following best practices for managing distributed replicated state (e.g., they are composed of provably convergent RDTs (CRDTs), avoid mutable state, etc.).
While the safety guarantees offered by our technique are constrained by the concurrency bound, we show that in practice bounded safety guarantees typically generalize to the unbounded case.
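
The idea of bounded exploration followed by repair can be illustrated with a toy example. The sketch below is not Q9 and does not reproduce its algorithm; it is a minimal brute-force illustration under assumed names (`INITIAL_BALANCE`, `withdraw_effect`, `find_anomaly` are all hypothetical). It models a bank account whose invariant is a non-negative balance, lets each of a bounded number of withdraw operations observe an arbitrary subset of the earlier operations' effects (an unrestricted weak consistency model), and checks the invariant once all effects have been applied. Passing `strong=True` stands in for a repair that strengthens the consistency of withdraw so that each operation sees every earlier one.

```python
from itertools import combinations, product

INITIAL_BALANCE = 100  # assumed starting state for this toy example


def withdraw_effect(visible_effects, amount):
    """Effect emitted by a withdraw that sees only `visible_effects`.

    The operation debits the account only if the balance it can observe
    locally covers the requested amount; otherwise it emits a no-op (0).
    """
    local_balance = INITIAL_BALANCE + sum(visible_effects)
    return -amount if local_balance >= amount else 0


def subsets(items):
    """Yield every subset of `items` as a tuple."""
    for size in range(len(items) + 1):
        yield from combinations(items, size)


def find_anomaly(amounts, strong=False):
    """Search all visibility choices for len(amounts) withdraw operations.

    Operation i may see any subset of operations 0..i-1 (weak), or exactly
    all of them when `strong` is True. Returns (visibility, final_balance)
    for the first choice that drives the final balance negative, else None.
    """
    per_op_choices = [
        [tuple(range(i))] if strong else list(subsets(range(i)))
        for i in range(len(amounts))
    ]
    for visibility in product(*per_op_choices):
        effects = []
        for i, seen in enumerate(visibility):
            seen_effects = [effects[j] for j in seen]
            effects.append(withdraw_effect(seen_effects, amounts[i]))
        final_balance = INITIAL_BALANCE + sum(effects)
        if final_balance < 0:  # invariant violated: finite counterexample
            return visibility, final_balance
    return None


if __name__ == "__main__":
    print(find_anomaly([70, 70]))               # counterexample under weak consistency
    print(find_anomaly([70, 70], strong=True))  # no violation after strengthening
```

With a bound of two operations, the weak model admits a counterexample in which neither withdraw sees the other, so both succeed against a balance of 100. Requiring each withdraw to see all earlier ones removes the violation within that bound. The search space grows exponentially with the bound, which is why a real tool would rely on a solver rather than enumeration.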

