CAP Theorem: Revision of its related consistency models

Francesc D. Muñoz-Escoí
Rubén de Juan-Marín
José-Ramón García-Escrivá
Instituto Universitario Mixto Tecnológico de Informática,
Universitat Politècnica de València, 46022 Valencia (Spain)
J. R. González de Mendívil
Depto. de Ingeniería Matemática e Informática,
Universidad Pública de Navarra, 31006 Pamplona (Spain)
José M. Bernabéu-Aubán
Instituto Universitario Mixto Tecnológico de Informática,
Universitat Politècnica de València, 46022 Valencia (Spain)
Abstract
The CAP theorem states that only two of these properties can be simultaneously guaranteed in
a distributed service: (i) consistency, (ii) availability, and (iii) network partition tolerance. This
theorem was stated and proved assuming that “consistency” refers to atomic consistency. However,
multiple consistency models exist and atomic consistency is located at the strongest edge of that
spectrum.
Many distributed services deployed in cloud platforms should be highly available and scalable.
Network partitions may arise in those deployments and should be tolerated. One way of dealing
with CAP constraints consists in relaxing consistency. Therefore, it is interesting to explore the set
of consistency models not supported in an available and partition-tolerant service (CAP-constrained
models). Other weaker consistency models could be maintained when scalable services are deployed
in partitionable systems (CAP-free models). Three contributions arise: (1) multiple other CAP-
constrained models are identified, (2) a borderline between CAP-constrained and CAP-free models
is set, and (3) a hierarchy of consistency models depending on their strength and convergence is built.
KEYWORDS: Inter-replica consistency; CAP theorem; Service availability; Network partition;
Consistency model
1 Introduction
Scalable distributed services try to maintain their service continuity in all situations. When they are
geo-replicated, a trade-off exists among three properties: replica consistency (C), service availability
(A) and network partition tolerance (P). Only two of those three properties can be simultaneously
guaranteed. Such a trade-off was suggested long ago (Davidson et al., 1985) [1], explained by
Fox and Brewer [2] in 1999 and proved by Gilbert and Lynch [3] in 2002. However, the compromise
between strongly consistent actions, availability and tolerance to network partitions was already implicit
in Johnson and Thomas (1975) [4] and justified by Birman and Friedman [5] in 1996.
Service availability and network partition tolerance are dichotomies. They are either respected
or not. Service availability means that every client request that reaches a service instance should be
answered. When a network partition arises, the instances of a service may be spread among multiple
disjoint node subgroups. Network partition tolerance means that every service instance subgroup
continues to operate while the network remains partitioned.
e-mail: fmunyoz@iti.upv.es
On the other hand, service replica consistency admits a gradation of consistency levels. In spite of
this, when we simply refer to “consistency” we understand that it means atomic consistency [6]; i.e.,
that all instances are able to maintain the same values for each variable at the same time, providing a
behaviour equivalent to that of a single copy. This is why that kind of consistency was assumed in the
original proofs of the CAP theorem [3].
With the advent of cloud computing, it is easy to develop and deploy highly scalable distributed
services [7]. Those applications usually provide world-wide services: they are deployed in multiple
datacentres and this implies that network partition tolerance is a must for those services. Thus, those
services regularly prioritise availability when they must deal with the constraints of the CAP theorem,
and consistency is the property being sacrificed. However, that sacrifice should not be complete. Brewer
[8] explains that network partitions are rare, even for world-wide geo-replicated services. If services
demand partition tolerance and availability, their consistency may still be quite strong most of the time,
relaxing it when any temporary network partition arises.
It is worth exploring which levels of consistency are strong enough to be directly implied by the
CAP constraints; i.e., those CAP-constrained models are not supported when the network becomes
partitioned. On the other hand, there are several relaxed models that remain available when a network
partition arises. They constitute the CAP-free set of models and there is a (not yet completely known)
frontier between CAP-free and CAP-constrained models. Two questions arise in this scope: (1) Does
CAP affect only atomic consistency or are there any other “CAP-constrained” models? (2) If there
are other such models, where does the CAP-constrained vs. CAP-free frontier lie? Although some
partial answers to these questions have been given in previous papers [9, 10, 11], let us provide a revised
answer to them in the following sections.
2 System Model
A distributed system $S = (P, O)$ is assumed. The real-time domain is represented by set $T$. $S$ is partially
synchronous and consists of: (1) a set of processes $P$ connected by a network where processes
communicate through message passing, and (2) a set of objects $O$, with their states and methods. Processes
in $P$ may fail. Scalable distributed services may be deployed in $S$. Those services consist of a set of
objects $O$. Objects are replicated in order to improve their availability. Their instances are deployed in
$P$ using a replication protocol and respecting some replica consistency model.
Function $Connect : P \times P \times T \rightarrow \{false, true\}$, used as $Connect(p_1, p_2, t)$, returns true when
processes $p_1$ and $p_2$ are connected at time $t$, and false otherwise. Communication may fail when a
temporary network partition occurs, defined as follows.
Definition 2.1 (Network partition). When a network partition $NP = (S, K, it, et)$ occurs in a system
$S = (P, O)$ from some initial time $it \in T$ to an end time $et \in T$ ($it < et$), $S$ becomes partitioned in a set
$K$ of network components, with $|K| > 1$, such that:
1. $\bigcup_{i \in K} S_i = S$, where $S_i = (P_i, O)$
2. $\bigcup_{i \in K} P_i = P$
3. $\forall i, j \in K, i \neq j : P_i \cap P_j = \emptyset$
4. $\forall i, j \in K, i \neq j, \forall p_m \in P_i, \forall p_n \in P_j, \forall t \in T, it \leq t \leq et : Connect(p_m, p_n, t) = false$
5. $\forall i \in K, \forall p_m, p_n \in P_i, \forall t \in T, it \leq t \leq et : Connect(p_m, p_n, t) = true$
Processes in different components cannot communicate with each other. Processes in the same
component intercommunicate without problems. A partitionable system model is assumed in regard to
process behaviour.
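The conditions of Def. 2.1 can be checked mechanically on a finite sample of time points. The following is a minimal sketch, not part of the original specification: the function name `is_network_partition`, the `side` map and the sampled `times` are illustrative assumptions, and condition 1 (every component keeps the whole object set) is structural, so it is not tested.

```python
from itertools import combinations

def is_network_partition(processes, components, connect, times):
    """Check |K| > 1 and conditions 2-5 of Def. 2.1 on a finite set of time points."""
    components = [set(c) for c in components]
    if len(components) <= 1:                         # |K| > 1
        return False
    if set().union(*components) != set(processes):   # condition 2: components cover P
        return False
    for ci, cj in combinations(components, 2):
        if ci & cj:                                  # condition 3: pairwise disjoint
            return False
        for t in times:                              # condition 4: no cross-component link
            if any(connect(pm, pn, t) or connect(pn, pm, t)
                   for pm in ci for pn in cj):
                return False
    for c in components:                             # condition 5: connected inside
        for t in times:
            if not all(connect(pm, pn, t) for pm in c for pn in c):
                return False
    return True

# Hypothetical topology: p1 and p2 on one side of the partition, p3 on the other.
side = {"p1": "A", "p2": "A", "p3": "B"}

def connect(pm, pn, t):
    return side[pm] == side[pn]

good = is_network_partition(side, [{"p1", "p2"}, {"p3"}], connect, range(5))
bad = is_network_partition(side, [{"p1"}, {"p2", "p3"}], connect, range(5))
print(good, bad)
```

The second call fails condition 4, because p1 and p2 can still communicate although they were placed in different components.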

Proposition 2.1 (Partitionable system). When a network partition $NP = (S, K, it, et)$ occurs in $S =
(P, O)$, every operation from every process $p_i \in P$ is able to start and/or finish in a regular way in the
$(it, et)$ interval, independently of the connectivity of $p_i$ with each other process $p_j \in P$.
According to Prop. 2.1, no operation gets indefinitely blocked while a network partition lasts in
S. Considering the CAP constraints, availability and network partition tolerance are respected, while
consistency compliance may be sacrificed.
3 Basic Specification
Viotti and Vukolić propose a framework for specifying distributed (non-transactional) data consistency
models in [12], based on that presented in [13, 14]. Since the CAP theorem involves software services
deployed in distributed systems, it makes sense to consider those models in this scope. That framework
may be summarised as follows.
3.1 Specification Framework
Services consist of processes and objects. Object values belong to set V . Processes interact with objects
invoking their operations, whose types belong to set OT .
Tuples $(proc, type, obj, ival, oval, st, rt)$ represent operations, where:
$proc \in P$ is the identifier of the process that invokes the operation.
$type \in OT$ is the operation type; e.g., $wr$ for writes and $rd$ for reads.
$obj \in O$ is the identifier of the invoked object.
$ival \in V \cup \{\sqcup\}$ is the operation input value, or $\sqcup$ in case of a read operation.
$oval \in V \cup \{\sqcup, \nabla, \Theta\}$ is the operation output value, or $\sqcup$ in case of a write, or $\nabla$ if the operation
does not return, or $\Theta$ when a write completes in $proc$ but not in other subsets of $P$.
$st \in T$ is the operation invocation (i.e., start) time.
$rt \in T$ is the operation return time.
In a tuple $T = (e_1, \ldots, e_n)$, $T.e_i$ refers to element $e_i$ in that tuple.
A history $H$ is a set of operations. A history contains all operations invoked in an execution $E$ of
$S$. $H|_{wr}$ (respectively, $H|_{rd}$) denotes the set of write (respectively, read) operations in a history $H$.
Formally, $H|_{wr} = \{op \in H : op.type = wr\}$.
The following relations are needed: (1) $rb$ (returns-before) is a partial order on $H$ based on real-time
precedence: $rb \triangleq \{(a, b) : a, b \in H \wedge a.rt < b.st\}$, (2) $ss$ (same-session) is an equivalence relation on $H$
that groups the operations invoked by the same process: $ss \triangleq \{(a, b) : a, b \in H \wedge a.proc = b.proc\}$, (3) $so$
(session order) is a partial order defined as: $so \triangleq rb \cap ss$, (4) $ob$ (same-object) is an equivalence relation
on $H$ that groups the operations invoked on the same object: $ob \triangleq \{(a, b) : a, b \in H \wedge a.obj = b.obj\}$,
and (5) $concur$ is a symmetric binary relation that includes all pairs of real-time concurrent operations
invoked on the same object: $concur \triangleq ob \setminus rb$.
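The five relations above are plain set constructions, so they can be computed directly from a finite history. The sketch below is only illustrative: the `Op` record, the three-operation history and its timestamps are assumptions made for this example, with `None` standing for the "no value" marker of `ival` and `oval`.

```python
from collections import namedtuple

# One operation tuple; field names follow the (proc, type, obj, ival, oval, st, rt) layout.
Op = namedtuple("Op", "name proc type obj ival oval st rt")

def relations(history):
    """Build rb, ss, so, ob and concur as sets of (name, name) pairs."""
    pairs = [(a, b) for a in history for b in history]
    rb = {(a.name, b.name) for a, b in pairs if a.rt < b.st}       # returns-before
    ss = {(a.name, b.name) for a, b in pairs if a.proc == b.proc}  # same session
    ob = {(a.name, b.name) for a, b in pairs if a.obj == b.obj}    # same object
    so = rb & ss                                                   # session order
    # Literal reading of "ob minus rb": it also keeps (b, a) when (a, b) is in rb.
    concur = ob - rb
    return {"rb": rb, "ss": ss, "so": so, "ob": ob, "concur": concur}

# Hypothetical history: p1 writes and then reads x, p2 writes x while p1 is reading.
history = [
    Op("w1", "p1", "wr", "x", 1, None, 0, 1),
    Op("r1", "p1", "rd", "x", None, 1, 2, 3),
    Op("w2", "p2", "wr", "x", 2, None, 2, 4),
]
rel = relations(history)
print(sorted(rel["rb"]))
print(sorted(rel["so"]))
```

Here `w1` returns before both other operations start, but only `(w1, r1)` belongs to the same session, so it is the only pair in `so`; `r1` and `w2` overlap in real time and are therefore related by `concur` in both directions.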
Moreover, there are other specification aspects to be considered. To begin with, the $concur$ relation
is complemented with a function $Concur : H \rightarrow 2^H$ that denotes the set of write operations concurrent
with a given operation: $Concur(a) \triangleq \{b \in H|_{wr} : (a, b) \in concur\}$. The projection $rel|_{wr \rightarrow rd}$ identifies
all pairs of operations in relation $rel$ that consist of a write and a read operation. $H/{\approx_{rel}}$ denotes
the set of equivalence classes determined by relation $rel$, $rel^{-1}$ denotes the inverse relation of $rel$ and
$rel(a) = \{b \in A : (a, b) \in rel\}$. Note that $rel(a)$ is a set, since there may be many elements related
transitively to $a$.
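An execution is defined as $E = (H, vis, ar)$ and is built on a history $H$, complemented with two
relations $vis$ and $ar$ on elements of $H$, where: (1) $vis$ (visibility) is an acyclic partial order that accounts
for the propagation of write operations; two write operations are invisible to each other when they are
not ordered by $vis$, and (2) $ar$ (arbitration) is a total order on operations of the history that specifies
how conflicts due to invisible operations are resolved in $E$ in order to respect its consistency models.
The happens-before ($hb$) partial order is defined as the transitive closure of the union of $so$ and $vis$;
i.e., $hb \triangleq (so \cup vis)^{+}$.

Table 1: Definition of basic consistency predicates.
Predicate                      Definition
RVAL($\mathcal{F}$)            $\forall op \in H : op.oval \in \mathcal{F}(op, cxt(E, op))$
PRAM                           $so \subseteq vis$
SINGLEORDER                    $\exists H' \subseteq \{op \in H : op.oval = \nabla\} : vis = ar \setminus (H' \times H)$
LAZYSINGLEORDER                $\exists H' \subseteq \{op \in H : op.oval \in \{\nabla, \Theta\}\} : vis = ar \setminus (H' \times H)$
REALTIME                       $rb \subseteq ar$
REALTIMEWRITES                 $rb|_{wr \rightarrow op} \subseteq ar$
SEQRVAL($\mathcal{F}$)         $\forall op \in H : Concur(op) = \emptyset \Rightarrow op.oval \in \mathcal{F}(op, cxt(E, op))$
EVENTUALVISIBILITY             $\forall a \in H, \forall [f] \in H/{\approx_{ss}} : |\{b \in [f] : (a, b) \in rb \wedge (a, b) \notin vis\}| < \infty$
NOCIRCULARCAUSALITY            $acyclic(hb)$
STRONGCONVERGENCE              $\forall a, b \in H|_{rd} : vis^{-1}(a)|_{wr} = vis^{-1}(b)|_{wr} \Rightarrow a.oval = b.oval$
CAUSALVISIBILITY               $hb \subseteq vis$
CAUSALARBITRATION              $hb \subseteq ar$
TIMEDVISIBILITY($\Delta$)      $\forall a \in H|_{wr}, \forall b \in H, \forall t \in T : a.rt = t \wedge b.st = t + \Delta \Rightarrow (a, b) \in vis$
REALTIMEWW                     $rb|_{wr \rightarrow wr} \subseteq ar$
CONCURRVAL($\mathcal{F}$)      $\forall op \in H : op.oval \in \mathcal{F}(op, cxt(E, op) \cup Concur(op))$
K-REALTIMEREADS($K$)           $\forall a \in H|_{wr}, \forall b \in H|_{rd}, \forall PW \subseteq H|_{wr}, \forall pw \in PW : |PW| < K \wedge (a, pw) \in ar \wedge (pw, b) \in rb \wedge (a, b) \in rb \Rightarrow (a, b) \in ar$
NOJOIN                         $\forall a_i, b_i, a_j, b_j \in H : a_i \not\approx_{ss} a_j \wedge (a_i, a_j) \in ar \setminus vis \wedge a_i \preceq_{so} b_i \wedge a_j \preceq_{so} b_j \Rightarrow (b_i, b_j), (b_j, b_i) \notin vis$
ATMOSTONEJOIN                  $\forall a_i, a_j \in H : a_i \not\approx_{ss} a_j \wedge (a_i, a_j) \in ar \setminus vis \Rightarrow |\{b_i \in H : a_i \preceq_{so} b_i \wedge (\exists b_j \in H : a_j \preceq_{so} b_j \wedge (b_i, b_j) \in vis)\}| \leq 1 \wedge |\{b_j \in H : a_j \preceq_{so} b_j \wedge (\exists b_i \in H : a_i \preceq_{so} b_i \wedge (b_j, b_i) \in vis)\}| \leq 1$
PEROBJECTPRAM                  $(so \cap ob) \subseteq vis$
PEROBJECTSINGLEORDER           $\exists H' \subseteq \{op \in H : op.oval = \nabla\} : vis \cap ob = ar \cap ob \setminus (H' \times H)$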
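Since $hb$ is a transitive closure and several predicates of Table 1 are set inclusions, both are easy to evaluate on small finite relations. The following sketch assumes relations are given as Python sets of pairs; the helper names and the three-operation example are illustrative only.

```python
def transitive_closure(rel):
    """Smallest transitive relation containing rel (naive fixed-point iteration)."""
    closure = set(rel)
    while True:
        extra = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if extra <= closure:
            return closure
        closure |= extra

def happens_before(so, vis):
    return transitive_closure(so | vis)           # hb = (so U vis)+

def pram(so, vis):
    return so <= vis                              # PRAM: so is a subset of vis

def causal_visibility(so, vis):
    return happens_before(so, vis) <= vis         # CAUSALVISIBILITY: hb subset of vis

def no_circular_causality(so, vis):
    # A transitive relation is acyclic iff it relates no element to itself.
    return all(a != b for a, b in happens_before(so, vis))

# Hypothetical relations over three operations a, b, c.
so = {("a", "b")}
vis = {("a", "b"), ("b", "c")}
hb = happens_before(so, vis)
print(sorted(hb))
print(pram(so, vis), causal_visibility(so, vis), no_circular_causality(so, vis))
```

In this example PRAM holds, whereas CAUSALVISIBILITY does not: the pair (a, c) is in $hb$ by transitivity but is missing from $vis$.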
Some extensions to [12] are needed in order to deal with partitionable networks. Those extensions
are specified hereafter.
$\mathcal{E}$ is the set of executions in $S$. $\mathcal{E}_P$ is the subset of $\mathcal{E}$ that contains all executions in which the
conditions of Def. 2.1 are met, i.e., their network becomes temporarily partitioned. On the other hand,
$\mathcal{E}_C$ is the complementary subset of $\mathcal{E}_P$ in which no network partition has occurred. Thus,
$\mathcal{E} = \mathcal{E}_P \cup \mathcal{E}_C$ and $\mathcal{E}_P \cap \mathcal{E}_C = \emptyset$.
The context $C$ of an operation $op$ in execution $E$ is defined as: $C_{op} = cxt(E, op) \triangleq (E.vis^{-1}(op),
E.vis|_{C_{op}.H}, E.ar|_{C_{op}.H})$, i.e., a projection of $E$ that only keeps in its history those operations in
$vis^{-1}(op)$. For each data type, function $\mathcal{F}$ specifies the set of intended return values of $op$ in relation to
its context: $\mathcal{F}(op, cxt(E, op))$. With $\mathcal{F}$, the return value consistency is defined as: $RVAL(\mathcal{F}) \triangleq \forall op \in
E.H : op.oval \in \mathcal{F}(op, cxt(E, op))$. In this scope, we use by default a register data type ($\mathcal{F}_{reg}$). Let us
explain how $op.oval$ is chosen from $C_{op}$ in $\mathcal{F}_{reg}$. From $vis^{-1}(op)$, only those $op_2 \in C_{op}.H : op_2.oval \notin
\{\nabla, \Theta\} \wedge op_2.obj = op.obj$ are considered. Multiple candidates may arise. If so, only those operations
without vis-successors in $C_{op}.H$ are assessed. From that subset, with operations invisible to each other,
the read value is that of the latest operation in $ar$ order. If no candidate exists, then $op.oval$ is a special
value.
Let us use an execution $E_x$ for explaining the specification aspects presented in previous
paragraphs. Let $S$ be $(\{p_1, p_2\}, \{x\})$ and
$E_x = (\{o_1 = (p_1, wr, x, 1, \sqcup, 0, 1), o_2 = (p_2, wr, x, 2, \sqcup, 0, 1), o_3 = (p_1, rd, x, \sqcup, 1, 1, 2),$
$o_4 = (p_2, rd, x, \sqcup, 2, 1, 2), o_5 = (p_1, rd, x, \sqcup, 2, 3, 4), o_6 = (p_2, rd, x, \sqcup, 1, 3, 4)\},$
$\{(o_1, o_3), (o_3, o_5), (o_2, o_4), (o_4, o_6), (o_2, o_5), (o_1, o_6)\},$
$\{(o_4, o_1), (o_1, o_6), (o_6, o_3), (o_3, o_2), (o_2, o_5)\})$.
Local execution order introduces $(o_1, o_3)$, $(o_3, o_5)$, $(o_2, o_4)$ and $(o_4, o_6)$ in $vis$. Values written in $o_1$ and $o_2$ are
propagated to the other process, so $(o_2, o_5)$ and $(o_1, o_6)$ are in $vis$. Since $ar$ is a total order, it sets this
ordering in $E_x$: $o_4 < o_1 < o_6 < o_3 < o_2 < o_5$. There are four reads: $o_3$, $o_4$, $o_5$ and $o_6$, with these context
histories: $C_{o_3}.H = \{o_1\}$, $C_{o_4}.H = \{o_2\}$, $C_{o_5}.H = \{\underline{o_1}, o_3, o_2\}$, $C_{o_6}.H = \{\underline{o_2}, o_4, o_1\}$. In each $C_i.H$, the
underlined operations ($o_1$ in $C_{o_5}.H$ and $o_2$ in $C_{o_6}.H$) are discarded when RVAL($\mathcal{F}$) is applied, since they
have subsequent operations in $vis$ that are also in $C_i.H$. From the remaining subsets, any potential conflict is
resolved according to $ar$. This explains the read values.
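The read values of $E_x$ can be recomputed by following the register rule step by step. The sketch below is an illustration under stated assumptions: `BLANK` stands for the "no value" marker of `ival`/`oval`, a write contributes its input value and a read its output value as a candidate, `ar` is encoded as a list, and the filter on non-returning or tentative operations is omitted because $E_x$ contains none.

```python
BLANK = None  # stands for the "no value" marker in ival/oval

# name: (proc, type, obj, ival, oval, st, rt), copied from E_x
ops = {
    "o1": ("p1", "wr", "x", 1, BLANK, 0, 1),
    "o2": ("p2", "wr", "x", 2, BLANK, 0, 1),
    "o3": ("p1", "rd", "x", BLANK, 1, 1, 2),
    "o4": ("p2", "rd", "x", BLANK, 2, 1, 2),
    "o5": ("p1", "rd", "x", BLANK, 2, 3, 4),
    "o6": ("p2", "rd", "x", BLANK, 1, 3, 4),
}
vis = {("o1", "o3"), ("o3", "o5"), ("o2", "o4"),
       ("o4", "o6"), ("o2", "o5"), ("o1", "o6")}
ar = ["o4", "o1", "o6", "o3", "o2", "o5"]  # total order, earliest first

def context_history(op):
    """All operations that reach op through vis (transitively)."""
    seen, stack = set(), [op]
    while stack:
        cur = stack.pop()
        for a, b in vis:
            if b == cur and a not in seen:
                seen.add(a)
                stack.append(a)
    return seen

def value_of(name):
    _, typ, _, ival, oval, _, _ = ops[name]
    return ival if typ == "wr" else oval

def f_reg(op):
    """Register rule: latest (in ar) context operation without vis-successors in the context."""
    ctx = {o for o in context_history(op) if ops[o][2] == ops[op][2]}
    maximal = [o for o in ctx if not any((o, s) in vis for s in ctx)]
    if not maximal:
        return None
    return value_of(max(maximal, key=ar.index))

reads = {name: f_reg(name) for name in ("o3", "o4", "o5", "o6")}
print(reads)
```

For $o_5$, for instance, the context is $\{o_1, o_3, o_2\}$; $o_1$ is dropped because $o_3$ follows it in $vis$, and $o_2$ wins over $o_3$ because it is later in $ar$, which yields the value 2 recorded in the history.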
3.2 Distributed Consistency Models
Viotti and Vukolić [12] distinguish ten groups of consistency models: (1) linearisable and other strong
models, (2) weak and eventual consistency, (3) PRAM and sequential consistency, (4) session
guarantees, (5) causal models, (6) staleness-based models, (7) fork-based models, (8) composite and tunable
models, (9) per-object models, and (10) synchronised models. Synchronised models are described in
[12] for completeness; they make sense in multiprocessor computers but not in general distributed
systems. The models in the eighth group cannot be specified with the proposed consistency predicates.
Therefore, no relation with the models contained in other groups can be set for them. Those two groups
are not considered hereafter. Table 1 shows a set of consistency predicates. With those predicates,
consistency models may be specified as shown in Table 2.
Consistency models are also known as consistency conditions. Both terms are synonyms, but
generate two different kinds of names. Conditions use nouns (e.g., linearisability [15]) while models use
adjectives (e.g., atomic, regular and safe [6]). For the sake of uniformity, this paper uses models and
adjectives in order to refer to consistency in all cases.
An execution $E$ satisfies a consistency model $M$ built as a conjunction of multiple consistency
predicates ($M \triangleq P_1 \wedge \cdots \wedge P_n$) iff $E$ satisfies all those predicates. Formally: $E \models M \Leftrightarrow E \models P_1 \wedge \cdots \wedge P_n$.
In regard to the consistency models specified in Table 2, PREFIXSEQUENTIAL($\mathcal{F}$) is derived from
the “prefix consistency” proposed in Bayou [23]. Bayou manages a partitionable system. To this end,
write operations have two states, tentative and committed, which are modelled using $\Theta$ or $\sqcup$, respectively,
as the value of the $oval$ operation attribute. That management is specified using LAZYSINGLEORDER
in Table 2. In Bayou, a write operation $op$ returns control once it reaches a single server $p_i$. At that time,
$op$ is still tentative (i.e., $op.oval = \Theta$). For $op$ to be committed, $p_i$ propagates it to a primary manager. The
primary manager for $op.obj$ chooses a commit order (that conditions the $ar$ relation in that execution)
for all new writes on that object. That chosen sequence is kept in a log and lazily communicated
to every other process. Disconnected nodes should eventually contact the primary manager to learn
that commit order. At that time, those previously disconnected processes communicate their tentative
writes to the primary (to be ordered on the next commit) and apply the already committed writes on their
local replicas. This means that tentative writes that were initially applied in a disconnected node may be
undone and reapplied in their correct sequence position. When a write $op$ is applied onto
the replica of object $op.obj$ in a process $p_j$ in the commit order, $op.oval$ becomes $\sqcup$ in $p_j$'s view
of that history. Such a view may be represented as $H|_{p_j}$. This explains why different processes in the
same execution may have different available committed prefixes of that execution at the same time in
the PREFIXSEQUENTIAL($\mathcal{F}$) model.
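The tentative/committed life cycle described above can be pictured with a toy model. This is only a simplified sketch and not Bayou's actual protocol or API: the `Primary` and `Replica` classes, their method names, and the choice of committing tentative writes immediately on contact (instead of on a later commit round) are all assumptions made for illustration.

```python
class Primary:
    """Hypothetical primary manager: its log is the commit order."""
    def __init__(self):
        self.log = []

    def commit(self, writes):
        self.log.extend(writes)          # arrival order becomes commit order

class Replica:
    """Hypothetical replica holding a committed prefix plus tentative writes."""
    def __init__(self, name):
        self.name = name
        self.committed = []              # committed prefix known so far
        self.tentative = []              # local writes not yet committed

    def write(self, value):
        # The write returns at once; it is still tentative.
        self.tentative.append((self.name, value))

    def sync(self, primary):
        # Hand tentative writes to the primary, then learn the commit order.
        primary.commit(self.tentative)
        self.tentative = []
        self.committed = list(primary.log)

    def view(self):
        # Tentative writes are (re)applied after the committed prefix.
        return self.committed + self.tentative

    def read(self):
        v = self.view()
        return v[-1][1] if v else None

primary, a, b = Primary(), Replica("a"), Replica("b")
a.write(1)
b.write(2)                               # a and b are disconnected from each other
before = (a.read(), b.read())
a.sync(primary)
b.sync(primary)                          # b learns [(a, 1), (b, 2)]
after = (a.read(), b.read())
prefix_ok = b.committed[:len(a.committed)] == a.committed
print(before, after, prefix_ok)
```

After both synchronisations, `a` still holds a shorter committed prefix than `b`, yet one is a prefix of the other; `a` catches up the next time it contacts the primary.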
3.3 CAP-related Definitions
Let us assume that the executions in system $S$ are driven by a consistency model $CM \triangleq P_1 \wedge \cdots \wedge P_n$. All
executions in $\mathcal{E}_C$ always comply with the definition of $CM$. However, that behaviour may vary when
network partitions arise. That fact motivates the following definitions.
Definition 3.1 (CAP-free consistency model). $CM$ is CAP-free if every execution $E$ in $\mathcal{E}_P$ respects all
consistency predicates that define $CM$.
Formally: $\forall E \in \mathcal{E}_P : E \models P_1 \wedge \cdots \wedge P_n$.
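Definition 3.1 quantifies over every partitioned execution, so a finite sample can only refute CAP-freedom, never prove it. The sketch below illustrates that use under explicit assumptions: executions are dictionaries of relations, `single_order` is simplified to $vis = ar$ (i.e., $H'$ is taken as empty), and the sampled execution and the two predicate lists are hypothetical.

```python
def pram(e):
    return e["so"] <= e["vis"]           # PRAM: so subset of vis

def realtime(e):
    return e["rb"] <= e["ar"]            # REALTIME: rb subset of ar

def single_order(e):
    return e["vis"] == e["ar"]           # SINGLEORDER, assuming H' is empty

def satisfies(execution, model):
    """E |= M iff E satisfies every predicate of the conjunction."""
    return all(predicate(execution) for predicate in model)

def find_counterexample(model, partitioned_sample):
    """Return a sampled execution that violates the model, or None."""
    for e in partitioned_sample:
        if not satisfies(e, model):
            return e
    return None

# Hypothetical partitioned execution: p1 writes (w), later p2 reads (r) in
# another component, so the write never becomes visible to the read.
e_p = {"so": set(), "rb": {("w", "r")}, "vis": set(), "ar": {("w", "r")}}

strong = [single_order, realtime]        # predicates of a strong model
weak = [pram]

strong_violation = find_counterexample(strong, [e_p])
weak_violation = find_counterexample(weak, [e_p])
print(strong_violation is e_p, weak_violation)
```

The strong conjunction is refuted by this single execution, since arbitration must order the write before the read while visibility cannot; the weak one survives the sample, which by itself does not establish that it is CAP-free.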
