

Replicated Data Types: Specification, Verification, Optimality
Sebastian Burckhardt (Microsoft Research)    Alexey Gotsman (IMDEA Software Institute)
Hongseok Yang (University of Oxford)    Marek Zawirski (INRIA & UPMC-LIP6)
Abstract
Geographically distributed systems often rely on replicated eventually consistent data stores to achieve availability and performance. To resolve conflicting updates at different replicas, researchers and practitioners have proposed specialized consistency protocols, called replicated data types, that implement objects such as registers, counters, sets or lists. Reasoning about replicated data types has however not been on par with comparable work on abstract data types and concurrent data types, lacking specifications, correctness proofs, and optimality results.
To fill in this gap, we propose a framework for specifying replicated data types using relations over events and verifying their implementations using replication-aware simulations. We apply it to 7 existing implementations of 4 data types with nontrivial conflict-resolution strategies and optimizations (last-writer-wins register, counter, multi-value register and observed-remove set). We also present a novel technique for obtaining lower bounds on the worst-case space overhead of data type implementations and use it to prove optimality of 4 implementations. Finally, we show how to specify consistency of replicated stores with multiple objects axiomatically, in analogy to prior work on weak memory models. Overall, our work provides foundational reasoning tools to support research on replicated eventually consistent stores.
Categories and Subject Descriptors D.2.4 [Software Engineer-
ing]: Software/Program Verification; F.3.1 [Logics and Meanings
of Programs]: Specifying and Verifying and Reasoning about Pro-
grams
Keywords Replication; eventual consistency; weak memory
1. Introduction
To achieve availability and scalability, many networked computing
systems rely on replicated stores, allowing multiple clients to issue
operations on shared data on a number of replicas, which commu-
nicate changes to each other using message passing. For example,
large-scale Internet services rely on geo-replication, which places
data replicas in geographically distinct locations, and applications
for mobile devices store replicas locally to support offline use. One
benefit of such architectures is that the replicas remain locally avail-
able to clients even when network connections fail. Unfortunately,
the famous CAP theorem [19] shows that such high Availability
and tolerance to network Partitions are incompatible with strong
Consistency, i.e., the illusion of a single centralized replica han-
dling all operations. For this reason, modern replicated stores often
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
POPL ’14, January 22–24, 2014, San Diego, CA, USA.
Copyright © 2014 ACM 978-1-4503-2544-8/14/01…$15.00.
http://dx.doi.org/10.1145/2535838.2535848
provide weaker forms of consistency, commonly dubbed eventual
consistency [36]. ‘Eventual’ usually refers to the guarantee that
    if clients stop issuing update requests, then the replicas will eventually reach a consistent state.    (1)
Eventual consistency is a hot research area, and new replicated
stores implementing it appear every year [1, 13, 16, 18, 23, 27,
33, 34, 37]. Unfortunately, their semantics is poorly understood:
the very term eventual consistency is a catch-all buzzword, and
different stores claiming to be eventually consistent actually pro-
vide subtly different guarantees. The property (1), which is a form
of quiescent consistency, is too weak to capture these. Although
it requires the replicas to converge to the same state eventually, it
doesn’t say which one it will be. Furthermore, (1) does not provide
any guarantees in realistic scenarios when updates never stop ar-
riving. The difficulty of reasoning about the behavior of eventually
consistent stores comes from a multitude of choices to be made in
their design, some of which we now explain.
Allowing the replicas to be temporarily inconsistent enables
eventually consistent stores to satisfy clients’ requests from the
local replica immediately, and broadcast the changes to the other
replicas only after the fact, when the network connection permits
this. However, this means that clients can concurrently issue con-
flicting operations on the same data item at different replicas; fur-
thermore, if the replicas are out-of-sync, these operations will be
applied to its copies in different states. For example, two users shar-
ing an online store account can write two different zip codes into
the delivery address; the same users connected to replicas with dif-
ferent views of the shopping cart can also add and concurrently
remove the same product. In such situations the store needs to en-
sure that, after the replicas exchange updates, the changes by dif-
ferent clients will be merged and all conflicts will be resolved in a
meaningful way. Furthermore, to ensure eventual consistency (1),
the conflict resolution has to be uniform across replicas, so that, in
the end, they converge to the same state.
The protocols achieving this are commonly encapsulated within
replicated data types [1, 10, 16, 18, 31, 33, 34] that implement ob-
jects, such as registers, counters, sets or lists, with various conflict-
resolution strategies. The strategies can be as simple as establishing
a total order on all operations using timestamps and letting the last
writer win, but can also be much more subtle. Thus, a data type
can detect the presence of a conflict and let the client deal with it:
e.g., the multi-value register used in Amazon’s Dynamo key-value
store [18] would return both conflicting zip codes in the above ex-
ample. A data type can also resolve the conflict in an application-
specific way. For example, the observed-remove set [7, 32] pro-
cesses concurrent operations trying to add and remove the same
element so that an add always wins, an outcome that may be appro-
priate for a shopping cart.
Replicated data type implementations are often nontrivial, since
they have to maintain not only client-observable object state, but
also metadata needed to detect and resolve conflicts and to han-
dle network failures. This makes reasoning about their behavior
challenging. The situation gets only worse if we consider multiple
replicated objects: in this case, asynchronous propagation of
updates between replicas may lead to counterintuitive behaviors—
anomalies, in database terminology. The following code illustrates
an anomaly happening in real replicated stores [1, 18]:
    Replica r_1:                  Replica r_2:
      x.wr(post)                    y.wr(comment)
      i = y.rd  // comment          j = x.rd  // empty
                                                           (2)
We have two clients reading from and writing to register objects x and y at two different replicas; i and j are client-local variables. The first client makes a post by writing to x at replica r_1 and then comments on the post by writing to y. After every write, replica r_1 might send a message with the update to replica r_2. If the messages carrying the writes of post to x and comment to y arrive at replica r_2 out of the order they were issued in, the second client can see the comment, but not the post. Different replicated stores may allow such an anomaly or not, and this has to be taken into account when reasoning about them.
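To make the reordering concrete, the following Python sketch (ours, not from the paper; the Replica class, message format and method names are purely illustrative) models two replicas holding copies of registers x and y and delivers the two update messages to the second replica in the opposite order from which they were issued, reproducing the anomaly in (2).

```python
class Replica:
    """A toy replica holding copies of named registers."""
    def __init__(self):
        self.regs = {"x": "empty", "y": "empty"}

    def wr(self, obj, value):
        self.regs[obj] = value
        return ("update", obj, value)   # message to broadcast later

    def rd(self, obj):
        return self.regs[obj]

    def apply(self, msg):
        _, obj, value = msg
        self.regs[obj] = value

r1, r2 = Replica(), Replica()

# Client at r1 posts and then comments.
m_post = r1.wr("x", "post")
m_comment = r1.wr("y", "comment")

# The network delivers the comment before the post at r2.
r2.apply(m_comment)

# Client at r2 now observes the anomaly from (2).
i = r2.rd("y")   # "comment"
j = r2.rd("x")   # "empty": the comment is visible but the post is not
print(i, j)

r2.apply(m_post)  # once the delayed message arrives, the replicas converge
```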
In this paper, we propose techniques for reasoning about even-
tually consistent replicated stores in the following three areas.
1. Specification. We propose a comprehensive framework for
specifying the semantics of replicated stores. Its key novel com-
ponent is replicated data type specifications (§3), which provide
the first way of specifying the semantics of replicated objects
with advanced conflict resolution declaratively, like abstract data
types [25]. We achieve this by defining the result of a data type
operation not by a function of states, but of operation contexts: sets of events affecting the result of the operation, together with some relationships between them. We show that our specifications are sufficiently flexible to handle data types representing a variety of conflict-resolution strategies: last-writer-wins register, counter, multi-value register and observed-remove set.
We then specify the semantics of a whole store with multiple
objects, possibly of different types, by consistency axioms (§7),
which constrain the way the store processes incoming requests in
the style of weak shared-memory models [2] and thus define the
anomalies allowed. As an illustration, we define consistency mod-
els used in existing replicated stores, including a weak form of
eventual consistency [1, 18] and different kinds of causal consis-
tency [23, 27, 33, 34]. We find that, when specialized to last-writer-
wins registers, these specifications are very close to fragments of
the C/C++ memory model [5]. Thus, our specification framework
generalizes axiomatic shared-memory models to replicated stores
with nontrivial conflict resolution.
2. Verification. We propose a method for proving the correctness
of replicated data type implementations with respect to our speci-
fications and apply it to seven existing implementations of the four
data types mentioned above, including those with nontrivial opti-
mizations. Reasoning about the implementations is difficult due to
the highly concurrent nature of a replicated store, with multiple
replicas simultaneously updating their object copies and exchang-
ing messages. We address this challenge by proposing replication-
aware simulations (§5). Like classical simulations from data refine-
ment [21], these associate a concrete state of an implementation
with its abstract description—structures on events, in our case. To
combat the complexity of replication, they consider the state of an
object at a single replica or a message in transit separately and as-
sociate it with abstract descriptions of only those events that led to
it. Verifying an implementation then requires only reasoning about
an instance of its code running at a single replica.
Here, however, we have to deal with another challenge: code at
a single replica can access both the state of an object and a message
at the same time, e.g., when updating the former upon receiving the
latter. To reason about such code, we often need to rely on cer-
tain agreement properties correlating the abstract descriptions of
the message and the object state. Establishing these properties re-
quires global reasoning. Fortunately, we find that agreement prop-
erties needed to prove realistic implementations depend only on ba-
sic facts about their messaging behavior and can thus be established
once for broad classes of data types. Then a particular implementa-
tion within such a class can be verified by reasoning purely locally.
By carefully structuring reasoning in this way, we achieve easy
and intuitive proofs of single data type implementations. We then
lift these results to stores with multiple objects of different types by
showing how consistency axioms can be proved given properties of
the transport layer and data type implementations (§7).
3. Optimality. Replicated data type designers strive to optimize
their implementations; knowing that one is optimal can help guide
such efforts in the most promising direction. However, proving
optimality is challenging, as it requires quantifying over all possible implementations satisfying the same specification.
For most data types we studied, the primary optimization target
is the size of the metadata needed to resolve conflicts or handle net-
work failures. To establish optimality of metadata size, we present
a novel method for proving lower bounds on the worst-case meta-
data overhead of replicated data types—the proportion of metadata
relative to the client-observable content. The main idea is to find a
large family of executions of an arbitrary correct implementation
such that, given the results of data type operations from a certain
fixed point in any of the executions, we can recover the previous
execution history. This implies that, across executions, the states at
this point are distinct and thus must have some minimal size.
Using our method, we prove that four of the implementations
we verified have an optimal worst-case metadata overhead among
all implementations satisfying the same specification. Two of these
(counter, last-writer-wins register) are well-known; one (optimized
observed-remove set [6]) is a recently proposed nontrivial opti-
mization; and one (optimized multi-value register) is a small im-
provement of a known implementation [33] that we discovered dur-
ing a failed attempt to prove optimality of the latter. We summarize
all the bounds we proved in Fig. 10.
We hope that the theoretical foundations we develop will help
in exploring the design space of replicated data types and replicated
eventually consistent stores in a systematic way.
2. Replicated Data Types
We now describe our formal model for replicated stores and intro-
duce replicated data type implementations, which implement op-
erations on a single object at a replica and the protocol used by
replicas to exchange updates to this object. Our formalism follows
closely the models used by replicated data type designers [33].
A replicated store is organized as a collection of named objects Obj = {x, y, z, . . .}. Each object is hosted at all replicas r, s ∈ ReplicaID. The sets of objects and replicas may be infinite, to model their dynamic creation. Clients interact with the store by performing operations on objects at a specified replica. Each object x ∈ Obj has a type τ = type(x) ∈ Type, whose type signature (Op_τ, Val_τ) determines the set of supported operations Op_τ (ranged over by o) and the set of their return values Val_τ (ranged over by a, b, c, d). We assume that a special value ⊥ ∈ Val_τ belongs to all sets Val_τ and is used for operations that return no value. For example, we can define a counter data type ctr and an integer register type intreg with operations for reading, incrementing or writing an integer a: Val_ctr = Val_intreg = ℤ ∪ {⊥}, Op_ctr = {rd, inc} and Op_intreg = {rd} ∪ {wr(a) | a ∈ ℤ}.
We also assume sets Message of messages (ranged over by m) and timestamps Timestamp (ranged over by t). For simplicity, we let timestamps be positive integers: Timestamp = ℕ₁.
DEFINITION 1. A replicated data type implementation for a data type τ is a tuple D_τ = (Σ, σ⃗₀, M, do, send, receive), where σ⃗₀ : ReplicaID → Σ, M ⊆ Message and
    do : Op_τ × Σ × Timestamp → Σ × Val_τ;
    send : Σ → Σ × M;    receive : Σ × M → Σ.
Figure 1. Illustrations of a concrete (a) and two abstract executions (b, c).
[Figure: (a) a concrete execution with events 1: x.inc and 2: send at replica r_1; 3: receive, 4: x.inc and 5: send at r_2; and 6: receive, 7: x.rd and 8: receive at r_3, with message deliveries shown as arrows (the late delivery to r_3 arrives only at event 8); (b) an abstract execution over events 1: x.inc, 4: x.inc and 7: x.rd: 1, with vis edges 1 → 4 and 4 → 7; (c) the same events with 7: x.rd: 2 and vis edges 1 → 4, 4 → 7 and 1 → 7.]
We denote a component of D_τ, such as do, by D_τ.do. A tuple D_τ defines the class of implementations of objects with type τ, meant to be instantiated for every such object in the store. Σ is the set of states (ranged over by σ) used to represent the current state of the object, including metadata, at a single replica. The initial state at every replica is given by σ⃗₀.
D_τ provides three methods that the rest of the store implementation can call at a given replica; we assume that these methods execute atomically. We visualize store executions resulting from repeated calls to the methods as in Fig. 1(a), by arranging the calls on several vertical timelines corresponding to replicas at which they occur and denoting the delivery of messages by diagonal arrows. In §4, we formalize them as sequences of transitions called concrete executions and define the store semantics by their sets; the intuition given by Fig. 1(a) should suffice for the following discussion.
A client request to perform an operation o ∈ Op_τ triggers the call do(o, σ, t) (e.g., event 1 in Fig. 1(a)). This takes the current state σ ∈ Σ of the object at the replica where the request is issued and a timestamp t ∈ Timestamp provided by the rest of the store implementation and produces the updated object state and the return value of the operation. The data type implementation can use the timestamp provided, e.g., to implement the last-writer-wins conflict-resolution strategy mentioned in §1, but is free to ignore it.
Nondeterministically, in moments when the network is able to
accept messages, a replica calls send. Given the current state of the
object at the replica, send produces a message in M to broadcast to
all other replicas (event 2 in Fig. 1(a)); sometimes send also alters
the state of the object. Using broadcast rather than point-to-point
communication does not limit generality, since we can always tag
messages with the intended receiver. Another replica that receives
the message generated by send calls receive to merge the enclosed
update into its copy of the object state (event 3 in Fig. 1(a)).
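As a reading aid, the signature from Definition 1 can be transcribed into code. The following Python sketch is ours rather than the paper's; the class and method names mirror do, send and receive, σ⃗₀ is modelled as an initial method taking a replica identifier, and None plays the role of ⊥.

```python
from typing import Generic, Tuple, TypeVar

State = TypeVar("State")   # Σ: replica-local object state, including metadata
Msg = TypeVar("Msg")       # M: messages exchanged between replicas
Op = TypeVar("Op")         # Op_τ: supported operations
Val = TypeVar("Val")       # Val_τ: return values (None stands for ⊥)

class DataType(Generic[State, Msg, Op, Val]):
    """Transcription of D_τ = (Σ, σ⃗₀, M, do, send, receive)."""

    def initial(self, replica_id: str) -> State:
        """σ⃗₀: the initial state at the given replica."""
        raise NotImplementedError

    def do(self, op: Op, state: State, timestamp: int) -> Tuple[State, Val]:
        """Apply a client operation; return the new state and the return value."""
        raise NotImplementedError

    def send(self, state: State) -> Tuple[State, Msg]:
        """Produce a message to broadcast; may also update the state."""
        raise NotImplementedError

    def receive(self, state: State, msg: Msg) -> State:
        """Merge an incoming message into the local state."""
        raise NotImplementedError
```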
We now reproduce three replicated data type implementations due to Shapiro et al. [33]. They fall into two categories: in op-based implementations, each message carries a description of the latest operations that the sender has performed, and in state-based implementations, a description of all operations it knows about.
Op-based counter (ctr). Fig. 2(a) shows an implementation of the ctr data type. A replica stores a pair ⟨a, d⟩, where a is the current value of the counter, and d is the number of increments performed since the last broadcast (we use angle brackets for tuples representing states and messages). The send method returns d and resets it; the receive method adds the content of the message to a. This implementation is correct, as long as each message is delivered exactly once (we show how to prove this in §5). Since inc operations commute, they never conflict: applying them in different orders at different replicas yields the same final state.
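A direct transcription of Fig. 2(a) into Python may help; this sketch is ours and, like the original, assumes that each message is delivered exactly once (None stands for ⊥).

```python
class OpBasedCounter:
    """Fig. 2(a): state is (a, d) = (counter value, increments since last broadcast)."""

    def initial(self, replica_id):
        return (0, 0)

    def do(self, op, state, timestamp):
        a, d = state
        if op == "rd":
            return (a, d), a
        if op == "inc":
            return (a + 1, d + 1), None
        raise ValueError(op)

    def send(self, state):
        a, d = state
        return (a, 0), d        # broadcast the pending increments, then reset d

    def receive(self, state, msg):
        a, d = state
        return (a + msg, d)     # add the sender's increments (exactly-once delivery assumed)
```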
State-based counter (ctr). The implementation in Fig. 2(b) summarizes the currently known history by recording the contribution of every replica to the counter value separately (reminiscent of vector clocks [29]).
Figure 2. Three replicated data type implementations

(a) Op-based counter (ctr)
    Σ = ℕ₀ × ℕ₀    M = ℕ₀    σ⃗₀ = λr. ⟨0, 0⟩
    do(rd, ⟨a, d⟩, t) = (⟨a, d⟩, a)
    do(inc, ⟨a, d⟩, t) = (⟨a + 1, d + 1⟩, ⊥)
    send(⟨a, d⟩) = (⟨a, 0⟩, d)
    receive(⟨a, d⟩, d′) = ⟨a + d′, d⟩

(b) State-based counter (ctr)
    Σ = ReplicaID × (ReplicaID → ℕ₀)    σ⃗₀ = λr. ⟨r, λs. 0⟩    M = ReplicaID → ℕ₀
    do(rd, ⟨r, v⟩, t) = (⟨r, v⟩, ∑{v(s) | s ∈ ReplicaID})
    do(inc, ⟨r, v⟩, t) = (⟨r, v[r ↦ v(r) + 1]⟩, ⊥)
    send(⟨r, v⟩) = (⟨r, v⟩, v)
    receive(⟨r, v⟩, v′) = ⟨r, λs. max{v(s), v′(s)}⟩

(c) State-based last-writer-wins register (intreg)
    Σ = ℤ × (Timestamp ∪ {0})    σ⃗₀ = λr. ⟨0, 0⟩    M = Σ
    do(rd, ⟨a, t⟩, t′) = (⟨a, t⟩, a)
    do(wr(a′), ⟨a, t⟩, t′) = if t < t′ then (⟨a′, t′⟩, ⊥) else (⟨a, t⟩, ⊥)
    send(⟨a, t⟩) = (⟨a, t⟩, ⟨a, t⟩)
    receive(⟨a, t⟩, ⟨a′, t′⟩) = if t < t′ then ⟨a′, t′⟩ else ⟨a, t⟩
A replica stores its identifier r and a vector v such that for each replica s the entry v(s) gives the number of increments made by clients at s that have been received by r. A rd operation returns the sum of all entries in the vector. An inc operation increments the entry for the current replica. We denote by v[i ↦ j] the function that has the same value as v everywhere, except for i, where it has the value j. The send method returns the vector, and the receive method takes the maximum of each entry in the vectors v and v′ given to it. This is correct because an entry for s in either vector reflects a prefix of the sequence of increments done at replica s. Hence, we know that min{v(s), v′(s)} increments by s are taken into account both in v(s) and in v′(s).
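A corresponding Python sketch of Fig. 2(b) (ours; a dictionary plays the role of the vector v, with absent entries read as 0):

```python
class StateBasedCounter:
    """Fig. 2(b): state is (r, v), where v maps each replica to its increment count."""

    def initial(self, replica_id):
        return (replica_id, {})   # missing entries are implicitly 0

    def do(self, op, state, timestamp):
        r, v = state
        if op == "rd":
            return (r, v), sum(v.values())
        if op == "inc":
            v2 = dict(v)
            v2[r] = v2.get(r, 0) + 1
            return (r, v2), None
        raise ValueError(op)

    def send(self, state):
        r, v = state
        return (r, v), dict(v)    # the message is the whole vector

    def receive(self, state, msg):
        r, v = state
        merged = {s: max(v.get(s, 0), msg.get(s, 0)) for s in set(v) | set(msg)}
        return (r, merged)        # entry-wise maximum: commutative and idempotent
```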
State-based last-writer-wins (LWW) register (intreg). Unlike counters, registers have update operations that are not commutative. To resolve conflicts, the implementation in Fig. 2(c) uses the last-writer-wins strategy, creating a total order on writes by associating a unique timestamp with each of them. A state contains the current value, returned by rd, and the timestamp at which it was written (initially, we have 0 instead of a timestamp). A wr(a′) compares its timestamp t′ with the timestamp t of the current value a and sets the value to the one with the highest timestamp. Note that here we have to allow for t′ < t, since we do not make any assumptions about timestamps apart from uniqueness: e.g., the rest of the store implementation can compute them using physical or Lamport clocks [22]. We show how to state assumptions about timestamps in §4. The send method just returns the state, and the receive method chooses the winning value by comparing the timestamps in the current state and the message, like wr.
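A Python sketch of Fig. 2(c) (ours; operations are encoded as "rd" or ("wr", a), timestamps are assumed unique, and None stands for ⊥):

```python
class LwwRegister:
    """Fig. 2(c): state is (a, t) = (current value, timestamp of the write that produced it)."""

    def initial(self, replica_id):
        return (0, 0)             # 0 stands in for "no write yet"

    def do(self, op, state, timestamp):
        a, t = state
        if op == "rd":
            return (a, t), a
        if isinstance(op, tuple) and op[0] == "wr":
            a_new = op[1]
            # keep the value written with the highest timestamp
            return ((a_new, timestamp) if t < timestamp else (a, t)), None
        raise ValueError(op)

    def send(self, state):
        return state, state       # the message is the whole state

    def receive(self, state, msg):
        return msg if state[1] < msg[1] else state
```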
State-based vs. op-based. State-based implementations converge to a consistent state faster than op-based implementations because they are transitively delivering, meaning that they can propagate updates indirectly. For example, when using the counter in Fig. 2(b), in the execution in Fig. 1(a) the read at r_3 (event 7) returns 2, even though the message from r_1 has not arrived yet, because r_3 learns about r_1's update via r_2. State-based implementations are also resilient against transport failures like message loss, reordering, or duplication. Op-based implementations require the replicated store using them to mask such failures (e.g., using message sequence numbers, retransmission buffers, or reorder buffers).
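The resilience claim can be exercised on the entry-wise maximum merge used by the state-based counter: duplicated or reordered deliveries lead to the same state. A small illustrative check (ours; the message contents follow the execution in Fig. 1(a) and the outcome in Fig. 1(c)):

```python
# Entry-wise maximum merge used by the state-based counter (Fig. 2(b)).
def merge(v, w):
    return {s: max(v.get(s, 0), w.get(s, 0)) for s in set(v) | set(w)}

v_r3 = {}                      # vector held at replica r3 before any delivery
m_r1 = {"r1": 1}               # message carrying r1's increment
m_r2 = {"r1": 1, "r2": 1}      # message from r2, which already knows about r1's increment

# Reordered and duplicated deliveries all lead to the same state:
a = merge(merge(v_r3, m_r2), m_r1)
b = merge(merge(merge(v_r3, m_r1), m_r2), m_r2)
assert a == b == {"r1": 1, "r2": 1}
assert sum(a.values()) == 2    # the read at r3 returns 2, as in Fig. 1(c)
```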

The potential weakness of state-based implementations is the
size of states and messages, which motivates our examination of
space optimality in §6. For example, we show that the counter
in Fig. 2(b) is optimal, meaning that no counter implementation
satisfying the same requirements (transitive delivery and resilience
against message loss, reordering, and duplication) can do better.
3. Specifying Replicated Data Types and Stores
Consider the concrete execution in Fig. 1(a). What are valid return
values for the read in event 7? Intuitively, 1 or 2 can be justifiable,
but not 100. We now present a framework for specifying the ex-
pected outcome declaratively, without referring to implementation
details. For example, we give a specification of a replicated counter
that is satisfied by both implementations in Fig. 2(a, b).
In presenting the framework, we rely on the intuitive under-
standing of the way a replicated store executes given in §2. Later we
define the store semantics formally (§4), which lets us state what it
means for a store to satisfy our specifications (§4 and §7).
3.1 Abstract Executions and Specification Structure
We define our specifications on abstract executions, which in-
clude only user-visible events (corresponding to do calls) and
describe the other information about the store processing in an
implementation-independent form. Informally, we consider a con-
crete execution correct if it can be justified by an abstract execution
satisfying the specifications that is “similar” to it and, in particular,
has the same operations and return values.
Abstract executions are inspired by axiomatic definitions of
weak shared-memory models [2]. In particular, we use their pre-
viously proposed reformulation with visibility and arbitration rela-
tions [13], which are similar to the reads-from and coherence rela-
tions from weak shared-memory models. We provide a comparison
with shared-memory models in §7 and with [13] in §8.
DEFINITION 2. An abstract execution is a tuple
A = (E, repl, obj, oper, rval, ro, vis, ar), where
- E ⊆ Event is a set of events from a countable universe Event;
- each event e ∈ E describes a replica repl(e) ∈ ReplicaID performing an operation oper(e) ∈ Op_type(obj(e)) on an object obj(e) ∈ Obj, which returns the value rval(e) ∈ Val_type(obj(e));
- ro ⊆ E × E is a replica order, which is a union of transitive, irreflexive and total orders on events at each replica;
- vis ⊆ E × E is an acyclic visibility relation such that ∀e, f ∈ E. e →vis f ⟹ obj(e) = obj(f);
- ar ⊆ E × E is an arbitration relation, which is a union of transitive, irreflexive and total orders on events on each object.
We also require that ro, vis and ar be well-founded.
In the following, we denote components of A and similar structures as in A.repl. We also use (e, f) ∈ r and e →r f interchangeably.
Informally, e →vis f means that f is aware of e and thus e's effect can influence f's return value. In implementation terms, this may be the case if the update performed by e has been delivered to the replica performing f before f is issued. The exact meaning of "delivered", however, depends on how much information messages carry in the implementation. For example, as we explain in §3.2, the return value of a read from a counter is equal to the number of inc operations visible to it. Then, as we formalize in §4, the abstract execution illustrated in Fig. 1(b) justifies the op-based implementation in Fig. 2(a) reading 1 in the concrete execution in Fig. 1(a). The abstract execution in Fig. 1(c) justifies the state-based implementation in Fig. 2(b) reading 2 due to transitive delivery (§2). There is no abstract execution that would justify reading 100.
The ar relation represents the ordering information provided by the store, e.g., via timestamps. The following abstract execution corresponds to a variant of the anomaly (2).
[Figure: events x.wr(empty) →ro x.wr(post) →ro y.wr(comment) at one replica and y.rd: comment →ro x.rd: empty at another; vis edges make y.wr(comment) visible to the read of y and x.wr(empty) visible to the read of x; an ar edge orders x.wr(empty) before x.wr(post).]
The ar edge means that any replica that sees both writes to x should assume that post overwrites empty.
We give a store specification by two components, constraining
abstract executions:
1. Replicated data type specifications determine return values of
operations in an abstract execution in terms of its vis and ar rela-
tions, and thus define conflict-resolution policies for individual
objects in the store. The specifications are the key novel compo-
nent of our framework, and we discuss them next.
2. Consistency axioms constrain vis and ar and thereby disallow
anomalies and extend the semantics of individual objects to that
of the entire store. We defer their discussion to §7. See Fig. 13 for
their flavor; in particular, COCV prohibits the anomaly above.
Each of these components can be varied separately, and our spec-
ifications will define the semantics of any possible combination.
Given a specification of a store, we can determine whether a set
of events can be observed by its users by checking if there is an
abstract execution with this set of events satisfying the data type
specifications and consistency axioms.
3.2 Replicated Data Type Specifications
In a sequential setting, the semantics of a data type τ can be specified by a function S_τ : Op⁺_τ → Val_τ, which, given a nonempty sequence of operations performed on an object, specifies the return value of the last operation. For a register, read operations return the value of the last preceding write, or zero if there is no prior write. For a counter, read operations return the number of preceding increments. Thus, for any sequence of operations ξ:
    S_intreg(ξ rd) = a, if wr(0) ξ = ξ₁ wr(a) ξ₂ and ξ₂ does not contain wr operations;
    S_ctr(ξ rd) = (the number of inc operations in ξ);
    S_intreg(ξ wr(a)) = S_ctr(ξ inc) = ⊥.
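These sequential specifications translate directly into code. The following Python sketch is ours; an operation sequence is a list whose last element is the operation being specified, and None stands for ⊥.

```python
def S_ctr(ops):
    """Return value of the last operation in ops for a sequential counter."""
    *prefix, last = ops
    if last == "rd":
        return sum(1 for o in prefix if o == "inc")
    return None                              # inc returns ⊥

def S_intreg(ops):
    """Return value of the last operation in ops for a sequential register."""
    *prefix, last = ops
    if last == "rd":
        writes = [o[1] for o in prefix if isinstance(o, tuple) and o[0] == "wr"]
        return writes[-1] if writes else 0   # last preceding write, or zero
    return None                              # wr(a) returns ⊥

assert S_ctr(["inc", "inc", "rd"]) == 2
assert S_intreg([("wr", 5), ("wr", 7), "rd"]) == 7
```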
In a replicated store, the story is more interesting. We specify a data type τ by a function F_τ, generalizing S_τ. Just like S_τ, this determines the return value of an operation based on prior operations performed on the object. However, F_τ takes as a parameter not a sequence, but an operation context, which includes all we need to know about a store execution to determine the return value of a given operation o: the set E of all events that are visible to o, together with the operations performed by the events and visibility and arbitration relations on them.
DEFINITION 3. An operation context for a data type τ is a tuple L = (o, E, oper, vis, ar), where o ∈ Op_τ, E is a finite subset of Event, oper : E → Op_τ, vis ⊆ E × E is acyclic and ar ⊆ E × E is transitive, irreflexive and total.
We can extract the context of an event e ∈ A.E in an abstract execution A by selecting all events visible to it according to A.vis:
    ctxt(A, e) = (A.oper(e), G, (A.oper)|_G, (A.vis)|_G, (A.ar)|_G),
where G = (A.vis)⁻¹(e) and ·|_G is the restriction to events in G. Thus, in the abstract execution in Fig. 1(b), the operation context of the read from x includes only one increment event; in the execution in Fig. 1(c) it includes two.
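The extraction of contexts amounts to restricting the execution to the vis-predecessors of an event. A small Python sketch (ours; an abstract execution is represented only by the fields needed here, with vis and ar as sets of event pairs):

```python
def ctxt(oper, vis, ar, e):
    """Operation context of event e: (operation, visible events, oper|G, vis|G, ar|G).

    oper maps events to operations; vis and ar are sets of (event, event) pairs.
    """
    G = {f for (f, g) in vis if g == e}            # G = vis⁻¹(e)
    oper_G = {f: oper[f] for f in G}
    vis_G = {(f, g) for (f, g) in vis if f in G and g in G}
    ar_G = {(f, g) for (f, g) in ar if f in G and g in G}
    return oper[e], G, oper_G, vis_G, ar_G

# The execution of Fig. 1(b): both increments happened, but only event 4 is visible to the read.
oper = {1: "inc", 4: "inc", 7: "rd"}
vis = {(1, 4), (4, 7)}
ar = set()
print(ctxt(oper, vis, ar, 7))   # the context of the read contains the single event 4
```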
DEFINITION 4. A replicated data type specification for a type τ is a function F_τ that, given an operation context L for τ, specifies a return value F_τ(L) ∈ Val_τ.
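For instance, the counter specification sketched in §3.1, where a read returns the number of visible increments, can be phrased against Definition 4 as follows. This Python sketch is ours and only anticipates the formal definition of F_ctr given later in the paper.

```python
def F_ctr(context):
    """Counter specification: a read returns the number of visible inc events."""
    op, E, oper, vis, ar = context
    if op == "rd":
        return sum(1 for e in E if oper[e] == "inc")
    return None   # inc returns ⊥

# Contexts of the read in Fig. 1(b) and Fig. 1(c): one and two visible increments.
assert F_ctr(("rd", {4}, {4: "inc"}, set(), set())) == 1
assert F_ctr(("rd", {1, 4}, {1: "inc", 4: "inc"}, {(1, 4)}, {(1, 4)})) == 2
```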

References
Lamport, L. Time, clocks, and the ordering of events in a distributed system.
DeCandia, G., et al. Dynamo: Amazon's highly available key-value store.
Gilbert, S., Lynch, N. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services.
Mattern, F. Virtual time and global states of distributed systems.