CAP Theorem: Revision of its related consistency models

doi:10.1093/COMJNL/BXY142


Francesc D. Muñoz-Escoí∗, Rubén de Juan-Marín, José-Ramón García-Escrivá

Instituto Universitario Mixto Tecnológico de Informática,

Universitat Politècnica de València, 46022 Valencia (Spain)

J. R. González de Mendívil

Depto. de Ingeniería Matemática e Informática,

Universidad Pública de Navarra, 31006 Pamplona (Spain)

José M. Bernabéu-Aubán

Instituto Universitario Mixto Tecnológico de Informática,

Universitat Politècnica de València, 46022 Valencia (Spain)

Abstract

The CAP theorem states that only two of these properties can be simultaneously guaranteed in

a distributed service: (i) consistency, (ii) availability, and (iii) network partition tolerance. This

theorem was stated and proved assuming that “consistency” refers to atomic consistency. However,

multiple consistency models exist and atomic consistency is located at the strongest edge of that

spectrum.

Many distributed services deployed in cloud platforms should be highly available and scalable.

Network partitions may arise in those deployments and should be tolerated. One way of dealing

with CAP constraints consists in relaxing consistency. Therefore, it is interesting to explore the set

of consistency models not supported in an available and partition-tolerant service (CAP-constrained

models). Other weaker consistency models could be maintained when scalable services are deployed

in partitionable systems (CAP-free models). Three contributions arise: (1) multiple other CAP-

constrained models are identiﬁed, (2) a borderline between CAP-constrained and CAP-free models

is set, and (3) a hierarchy of consistency models depending on their strength and convergence is built.

KEYWORDS: Inter-replica consistency; CAP theorem; Service availability; Network partition; Consistency model

1 Introduction

Scalable distributed services try to maintain their service continuity in all situations. When they are

geo-replicated, a trade-off exists among three properties: replica consistency (C), service availability

(A) and network partition tolerance (P). Only two of those three properties can be simultaneously

guaranteed. Such a trade-off was suggested long ago (Davidson et al., 1985) [1], explained by

Fox and Brewer [2] in 1999 and proved by Gilbert and Lynch [3] in 2002. However, the compromise

between strongly consistent actions, availability and tolerance to network partitions was already implicit

in Johnson and Thomas (1975) [4] and justiﬁed by Birman and Friedman [5] in 1996.

Service availability and network partition tolerance are dichotomies. They are either respected or not. Service availability means that every client request that reaches a service instance should be answered. When a network partition arises, the instances of a service may be spread among multiple disjoint node subgroups. Network partition tolerance means that every service instance subgroup goes on while the network remains partitioned.

∗ e-mail: fmunyoz@iti.upv.es

On the other hand, service replica consistency admits a gradation of consistency levels. In spite of

this, when we simply refer to “consistency” we understand that it means atomic consistency [6]; i.e.,

that all instances are able to maintain the same values for each variable at the same time, providing a

behaviour equivalent to that of a single copy. That kind of consistency was therefore assumed in the original

proofs of the CAP theorem [3].

With the advent of cloud computing, it is easy to develop and deploy highly scalable distributed

services [7]. Those applications usually provide world-wide services: they are deployed in multiple

datacentres and this implies that network partition tolerance is a must for those services. Thus, those

services regularly prioritise availability when they must deal with the constraints of the CAP theorem,

and consistency is the property being sacriﬁced. However, that sacriﬁce should not be complete. Brewer

[8] explains that network partitions are rare, even for world-wide geo-replicated services. If services

demand partition tolerance and availability, their consistency may still be quite strong most of the time,

relaxing it when any temporary network partition arises.

It is worth exploring which levels of consistency are strong enough to fall under the

CAP constraints; i.e., which models are CAP-constrained and thus cannot be supported while the network remains

partitioned. On the other hand, there are several relaxed models that remain available when a network

partition arises. They constitute the CAP-free set of models and there is a (not yet completely known)

frontier between CAP-free and CAP-constrained models. Two questions arise in this scope: (1) Does

CAP affect only atomic consistency, or are there other “CAP-constrained” models? (2) If there

are other such models, where does the frontier between CAP-constrained and CAP-free models lie? Although some

partial answers to these questions have been given in previous papers [9, 10, 11], let us provide a revised

answer to them in the following sections.

2 System Model

A distributed system S = (P,O) is assumed. The real-time domain is represented by set T. S is partially

synchronous and consists of: (1) a set of processes P connected by a network where processes communicate

through message passing, and (2) a set of objects O, with their states and methods. Processes

in P may fail. Scalable distributed services may be deployed in S. Those services consist of a set of

objects O. Objects are replicated in order to improve their availability. Their instances are deployed in

P using a replication protocol and respecting some replica consistency model.

Function $Connect : P \times P \times T \to \{false, true\}$, used as $Connect(p_1, p_2, t)$, returns true when processes $p_1$ and $p_2$ are connected at time $t$, and false otherwise. Communication may fail when a temporary network partition occurs, defined as follows.

Definition 2.1 (Network partition). When a network partition $NP = (S, K, it, et)$ occurs in a system $S = (P, O)$ from some initial time $it \in T$ to an end time $et \in T$ ($it < et$), $S$ becomes partitioned in a set $K$ of network components, with $|K| > 1$, such that:

1. $\bigcup_{i \in K} S_i \subseteq S$, where $S_i = (P_i, O)$

2. $\bigcup_{i \in K} P_i \subseteq P$

3. $\forall i, j \in K, i \neq j : P_i \cap P_j = \emptyset$

4. $\forall i, j \in K, i \neq j, \forall p_m \in P_i, \forall p_n \in P_j, \forall t \in T, it \leq t \leq et : Connect(p_m, p_n, t) = false$

5. $\forall i \in K, \forall p_m, p_n \in P_i, \forall t \in T, it \leq t \leq et : Connect(p_m, p_n, t) = true$

Processes in different components cannot communicate with each other. Processes in the same

component intercommunicate without problems. A partitionable system model is assumed in regard to

process behaviour.

Proposition 2.1 (Partitionable system). When a network partition $NP = (S, K, it, et)$ occurs in $S = (P, O)$, every operation from every process $p_i \in P$ is able to start and/or finish in a regular way in the $(it, et)$ interval, independently of the connectivity of $p_i$ with each other process $p_j \in P$.

According to Prop. 2.1, no operation gets indeﬁnitely blocked while a network partition lasts in

S. Considering the CAP constraints, availability and network partition tolerance are respected, while

consistency compliance may be sacriﬁced.
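The conditions of Def. 2.1 can be checked mechanically on a finite model. The following is a minimal sketch, assuming that components are given as Python sets of process identifiers, that Connect is supplied as a callable, and that the interval [it, et] is approximated by a finite list of sampled time points. The name `is_network_partition` and that sampling are illustrative assumptions, not part of the original definition.

```python
from itertools import combinations


def is_network_partition(processes, components, connect, times):
    """Check conditions 2-5 of Def. 2.1 on a finite sample of time points.

    processes  -- set of process identifiers (the set P)
    components -- list of sets of process identifiers (the P_i, i in K)
    connect    -- callable connect(p, q, t) -> bool (the Connect function)
    times      -- iterable of sampled time points in [it, et] (assumption)
    """
    # |K| > 1: a partition needs at least two components.
    if len(components) <= 1:
        return False
    # Condition 2: the union of the components is contained in P.
    if not set().union(*components) <= set(processes):
        return False
    # Condition 3: components are pairwise disjoint.
    if any(a & b for a, b in combinations(components, 2)):
        return False
    for t in times:
        # Condition 4: no connectivity across different components.
        for a, b in combinations(components, 2):
            if any(connect(p, q, t) for p in a for q in b):
                return False
        # Condition 5: full connectivity inside each component.
        for c in components:
            if not all(connect(p, q, t) for p in c for q in c):
                return False
    return True


# Usage: three processes split into two components.
group = {"p1": 0, "p2": 0, "p3": 1}
same_group = lambda p, q, t: group[p] == group[q]
print(is_network_partition(set(group), [{"p1", "p2"}, {"p3"}], same_group, [0, 1, 2]))
```

Condition 1 is not checked separately because, with every component sharing the same object set O, it follows from condition 2 in this finite model.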

3 Basic Speciﬁcation

Viotti and Vukolić propose a framework for specifying distributed (non-transactional) data consistency

models in [12], based on that presented in [13, 14]. Since the CAP theorem involves software services

deployed in distributed systems, it makes sense to consider those models in this scope. That framework

may be summarised as follows.

3.1 Speciﬁcation Framework

Services consist of processes and objects. Object values belong to set V . Processes interact with objects

invoking their operations, whose types belong to set OT .

Tuples (proc,type,obj,ival,oval,st,rt) represent operations, where:

• proc ∈ P is the identiﬁer of the process that invokes the operation.

• type ∈ OT is the operation type; e.g., wr for writes and rd for reads.

• obj ∈ O is the identiﬁer of the invoked object.

• ival ∈ V ∪{t} is the operation input value, or t in case of a read operation.

• oval ∈ V ∪{t,∇,Θ} is the operation output value, or t in case of a write or ∇ if the operation

does not return or Θ when a write completes in proc but not in other subsets of P.

• st ∈ T is the operation invocation (i.e., start) time.

• rt ∈ T is the operation return time.

In a tuple $T = (e_1, \ldots, e_n)$, $T.e_i$ refers to element $e_i$ in that tuple.

A history $H$ is a set of operations. A history contains all operations invoked in an execution $E$ of $S$. $H|_{wr}$ (respectively, $H|_{rd}$) denotes the set of write (respectively, read) operations in a history $H$. Formally, $H|_{wr} = \{op \in H : op.type = wr\}$.

The following relations are needed: (1) $rb$ (returns-before) is a partial order on $H$ based on real-time precedence: $rb \equiv \{(a, b) : a, b \in H \wedge a.rt < b.st\}$, (2) $ss$ (same-session) is an equivalence relation on $H$ that groups the operations invoked by the same process: $ss \equiv \{(a, b) : a, b \in H \wedge a.proc = b.proc\}$, (3) $so$ (session order) is a partial order defined as: $so \equiv rb \cap ss$, (4) $ob$ (same-object) is an equivalence relation on $H$ that groups the operations invoked on the same object: $ob \equiv \{(a, b) : a, b \in H \wedge a.obj = b.obj\}$, and (5) $concur$ is a symmetric binary relation that includes all pairs of real-time concurrent operations invoked on the same object: $concur \equiv ob \setminus rb$.
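To make the first four relations concrete, here is a minimal sketch that computes rb, ss, so and ob as sets of pairs over a small history. The `Op` record, the helper names and the use of `None` for the unused input or output value are illustrative assumptions; the three sample operations are hypothetical and are not taken from the paper.

```python
from collections import namedtuple

# One operation tuple (proc, type, obj, ival, oval, st, rt); "name" is only a label.
Op = namedtuple("Op", "name proc type obj ival oval st rt")


def returns_before(H):
    """rb: a returns strictly before b is invoked."""
    return {(a, b) for a in H for b in H if a.rt < b.st}


def same_session(H):
    """ss: both operations are invoked by the same process."""
    return {(a, b) for a in H for b in H if a.proc == b.proc}


def session_order(H):
    """so = rb ∩ ss."""
    return returns_before(H) & same_session(H)


def same_object(H):
    """ob: both operations are invoked on the same object."""
    return {(a, b) for a in H for b in H if a.obj == b.obj}


def writes(H):
    """H|wr: the write operations of the history."""
    return {op for op in H if op.type == "wr"}


# Hypothetical history: two concurrent writes, then a read by p1.
w1 = Op("w1", "p1", "wr", "x", 1, None, 0, 1)
w2 = Op("w2", "p2", "wr", "x", 2, None, 0, 1)
r1 = Op("r1", "p1", "rd", "x", None, 1, 2, 3)
H = {w1, w2, r1}

print(sorted((a.name, b.name) for a, b in returns_before(H)))
print(sorted((a.name, b.name) for a, b in session_order(H)))
```

Representing each relation as a plain set of pairs keeps the set algebra of the definitions (intersection, difference, inclusion) directly available through Python's set operators.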

Moreover, there are other specification aspects to be considered. To begin with, the $concur$ relation is complemented with a function $Concur : H \to 2^H$ that denotes the set of write operations concurrent with a given operation: $Concur(a) \equiv \{b \in H|_{wr} : (a, b) \in concur\}$. The projection $rel|_{wr \to rd}$ identifies all pairs of operations in relation $rel$ that consist of a write and a read operation. $H/\approx_{rel}$ denotes the set of equivalence classes determined by relation $rel$, $rel^{-1}$ denotes the inverse relation of $rel$ and $rel(a) = \{b \in A : (a, b) \in rel\}$. Note that $rel(a)$ is a set, since there may be many elements related transitively to $a$.

Table 1: Definition of basic consistency predicates.

Predicate                  Definition
RVAL($\mathcal{F}$)        $\forall op \in H : op.oval \in \mathcal{F}(op, cxt(E, op))$
PRAM                       $so \subseteq vis$
SINGLEORDER                $\exists H' \subseteq \{op \in H : op.oval = \nabla\} : vis = ar \setminus (H' \times H)$
LAZYSINGLEORDER            $\exists H' \subseteq \{op \in H : op.oval \in \{\nabla, \Theta\}\} : vis = ar \setminus (H' \times H)$
REALTIME                   $rb \subseteq ar$
REALTIMEWRITES             $rb|_{wr \to op} \subseteq ar$
SEQRVAL($\mathcal{F}$)     $\forall op \in H : Concur(op) = \emptyset \Rightarrow op.oval \in \mathcal{F}(op, cxt(E, op))$
EVENTUALVISIBILITY         $\forall a \in H, \forall [f] \in H/\approx_{ss} : |\{b \in [f] : (a, b) \in rb \wedge (a, b) \notin vis\}| < \infty$
NOCIRCULARCAUSALITY        $acyclic(hb)$
STRONGCONVERGENCE          $\forall a, b \in H|_{rd} : vis^{-1}(a)|_{wr} = vis^{-1}(b)|_{wr} \Rightarrow a.oval = b.oval$
CAUSALVISIBILITY           $hb \subseteq vis$
CAUSALARBITRATION          $hb \subseteq ar$
TIMEDVISIBILITY($\Delta$)  $\forall a \in H|_{wr}, \forall b \in H, \forall t \in T : a.rt = t \wedge b.st = t + \Delta \Rightarrow (a, b) \in vis$
REALTIMEWW                 $rb|_{wr \to wr} \subseteq ar$
CONCURRVAL($\mathcal{F}$)  $\forall op \in H : op.oval \in \mathcal{F}(op, cxt(E, op) \cup Concur(op))$
K-REALTIMEREADS($K$)       $\forall a \in H|_{wr}, \forall b \in H|_{rd}, \forall PW \subseteq H|_{wr}, \forall pw \in PW : |PW| < K \wedge (a, pw) \in ar \wedge (pw, b) \in rb \wedge (a, b) \in rb \Rightarrow (a, b) \in ar$
NOJOIN                     $\forall a_i, b_i, a_j, b_j \in H : a_i \not\approx_{ss} a_j \wedge (a_i, a_j) \in ar \setminus vis \wedge a_i \preceq_{so} b_i \wedge a_j \preceq_{so} b_j \Rightarrow (b_i, b_j), (b_j, b_i) \notin vis$
ATMOSTONEJOIN              $\forall a_i, a_j \in H : a_i \not\approx_{ss} a_j \wedge (a_i, a_j) \in ar \setminus vis \Rightarrow |\{b_i \in H : a_i \preceq_{so} b_i \wedge (\exists b_j \in H : a_j \preceq_{so} b_j \wedge (b_i, b_j) \in vis)\}| \leq 1 \wedge |\{b_j \in H : a_j \preceq_{so} b_j \wedge (\exists b_i \in H : a_i \preceq_{so} b_i \wedge (b_j, b_i) \in vis)\}| \leq 1$
PEROBJECTPRAM              $(so \cap ob) \subseteq vis$
PEROBJECTSINGLEORDER       $\exists H' \subseteq \{op \in H : op.oval = \nabla\} : vis \cap ob = ar \cap ob \setminus (H' \times H)$

An execution is defined as $E = (H, vis, ar)$ and is built on a history $H$, complemented with two relations $vis$ and $ar$ on elements of $H$, where: (1) $vis$ (visibility) is an acyclic partial order that accounts for the propagation of write operations; two write operations are invisible to each other when they are not ordered by $vis$, and (2) $ar$ (arbitration) is a total order on operations of the history that specifies how conflicts due to invisible operations are resolved in $E$ in order to respect its consistency models.

The happens-before ($hb$) partial order is defined as the transitive closure of the union of $so$ and $vis$; i.e., $hb \equiv (so \cup vis)^+$.

Some extensions to [12] are needed in order to deal with partitionable networks. Those extensions

are speciﬁed hereafter.

$\mathcal{E}$ is the set of executions in $S$. $\mathcal{E}_P$ is the subset of $\mathcal{E}$ that contains all executions in which the conditions of Def. 2.1 are met, i.e., their network becomes temporarily partitioned. On the other hand, $\mathcal{E}_C$ is the complementary subset of $\mathcal{E}_P$ in which no network partition has occurred. Thus, $\mathcal{E} = \mathcal{E}_P \cup \mathcal{E}_C$ and $\mathcal{E}_P \cap \mathcal{E}_C = \emptyset$.

The context $C$ of an operation $op$ in execution $E$ is defined as: $C_{op} = cxt(E, op) \equiv (E.vis^{-1}(op), E.vis|_{C_{op}.H}, E.ar|_{C_{op}.H})$, i.e., a projection of $E$ that only keeps in its history those operations in $vis^{-1}(op)$. For each data type, function $\mathcal{F}$ specifies the set of intended return values of $op$ in relation to its context: $\mathcal{F}(op, cxt(E, op))$. With $\mathcal{F}$, the return value consistency is defined as: $RVAL(\mathcal{F}) \equiv \forall op \in E.H : op.oval \in \mathcal{F}(op, cxt(E, op))$. In this scope, we use by default a register data type ($\mathcal{F}_{reg}$). Let us explain how $op.oval$ is chosen from $C_{op}$ in $\mathcal{F}_{reg}$. From $vis^{-1}(op)$, only those $op_2 \in C_{op}.H : op_2.oval \notin \{\nabla, \Theta\} \wedge op_2.obj = op.obj$ are considered. Multiple candidates may arise. If so, only those operations without vis-successors in $C_{op}.H$ are assessed. From that subset, with operations invisible to each other, the read value is that of the latest operation in $ar$ order. If no candidate exists, then $op.oval$ is a special value $\bot$.

Let us use an execution $E_x$ for explaining the specification aspects presented in previous paragraphs. Let $S$ be $(\{p_1, p_2\}, \{x\})$ and $E_x = (\{o_1 = (p_1, wr, x, 1, t, 0, 1), o_2 = (p_2, wr, x, 2, t, 0, 1), o_3 = (p_1, rd, x, t, 1, 1, 2), o_4 = (p_2, rd, x, t, 2, 1, 2), o_5 = (p_1, rd, x, t, 2, 3, 4), o_6 = (p_2, rd, x, t, 1, 3, 4)\}, \{(o_1, o_3), (o_3, o_5), (o_2, o_4), (o_4, o_6), (o_2, o_5), (o_1, o_6)\}, \{(o_4, o_1), (o_1, o_6), (o_6, o_3), (o_3, o_2), (o_2, o_5)\})$. Local execution order introduces $(o_1, o_3)$, $(o_3, o_5)$, $(o_2, o_4)$ and $(o_4, o_6)$ in $vis$. Values written in $o_1$ and $o_2$ are propagated to the other process, so $(o_2, o_5)$ and $(o_1, o_6)$ are in $vis$. Since $ar$ is a total order, it sets this ordering in $E_x$: $o_4 < o_1 < o_6 < o_3 < o_2 < o_5$. There are four reads: $o_3$, $o_4$, $o_5$ and $o_6$, with these context histories: $C_{o_3}.H = \{o_1\}$, $C_{o_4}.H = \{o_2\}$, $C_{o_5}.H = \{\underline{o_1}, o_3, o_2\}$, $C_{o_6}.H = \{\underline{o_2}, o_4, o_1\}$. In each $C_i.H$, the underlined operations are discarded when $RVAL(\mathcal{F})$ is applied, since they have subsequent operations in $vis$ that are also in $C_i.H$. From the remaining subsets, any potential conflict is resolved according to $ar$. This explains the read values.
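The selection rule of the register data type can be replayed on this example. The sketch below is a hedged illustration rather than the authors' implementation: it assumes that vis is taken as the transitive closure of the listed pairs (which is what the stated context histories require), that the value contributed by a surviving operation is its input value for a write and its output value for a read, and that the placeholder and special values are encoded as the strings shown. All function and constant names are hypothetical.

```python
from collections import namedtuple

Op = namedtuple("Op", "proc type obj ival oval st rt")
T = "t"                                   # placeholder printed as "t" in the tuples
NABLA, THETA, BOTTOM = "nabla", "theta", "bottom"   # assumed encodings


def transitive_closure(pairs):
    """Transitive closure of a finite binary relation given as pairs."""
    rel = set(pairs)
    while True:
        new = {(a, d) for (a, b) in rel for (c, d) in rel if b == c} - rel
        if not new:
            return rel
        rel |= new


def f_reg(name, ops, vis, ar):
    """Value that read `name` should return under the register rule."""
    op = ops[name]
    context = {a for (a, b) in vis if b == name}            # vis^-1(op)
    candidates = {c for c in context
                  if ops[c].obj == op.obj and ops[c].oval not in (NABLA, THETA)}
    # Keep only candidates without vis-successors inside the context history.
    maximal = [c for c in candidates
               if not any((c, d) in vis for d in context)]
    if not maximal:
        return BOTTOM
    last = max(maximal, key=ar.index)                       # latest in ar order
    # Assumption: a write contributes its input value, a read its output value.
    return ops[last].ival if ops[last].type == "wr" else ops[last].oval


ops = {
    "o1": Op("p1", "wr", "x", 1, T, 0, 1), "o2": Op("p2", "wr", "x", 2, T, 0, 1),
    "o3": Op("p1", "rd", "x", T, 1, 1, 2), "o4": Op("p2", "rd", "x", T, 2, 1, 2),
    "o5": Op("p1", "rd", "x", T, 2, 3, 4), "o6": Op("p2", "rd", "x", T, 1, 3, 4),
}
vis = transitive_closure({("o1", "o3"), ("o3", "o5"), ("o2", "o4"),
                          ("o4", "o6"), ("o2", "o5"), ("o1", "o6")})
ar = ["o4", "o1", "o6", "o3", "o2", "o5"]

results = {r: f_reg(r, ops, vis, ar) for r in ("o3", "o4", "o5", "o6")}
print(results)   # {'o3': 1, 'o4': 2, 'o5': 2, 'o6': 1}
```

Under these assumptions the computed values coincide with the output values recorded in the four read tuples of the example.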

3.2 Distributed Consistency Models

Viotti and Vukolić [12] distinguish ten groups of consistency models: (1) linearisable and other strong models, (2) weak and eventual consistency, (3) PRAM and sequential consistency, (4) session guarantees, (5) causal models, (6) staleness-based models, (7) fork-based models, (8) composite and tunable models, (9) per-object models, and (10) synchronised models. Synchronised models are described in [12] for completeness; they make sense in multiprocessor computers but not in general distributed systems. The models in the eighth group cannot be specified with the proposed consistency predicates. Therefore, no relation with the models contained in other groups can be set for them. Those two groups are not considered hereafter. Table 1 shows a set of consistency predicates. With those predicates, consistency models may be specified as shown in Table 2.

Consistency models are also known as consistency conditions. The two terms are synonymous but lead to different naming styles: conditions use nouns (e.g., linearisability [15]) while models use adjectives (e.g., atomic, regular and safe [6]). For the sake of uniformity, this paper uses models and adjectives in order to refer to consistency in all cases.

An execution $E$ satisfies a consistency model $M$ built as a conjunction of multiple consistency predicates ($M \equiv P_1 \wedge \cdots \wedge P_n$) iff $E$ satisfies all those predicates. Formally: $E \models M \Leftrightarrow E \models P_1 \wedge \cdots \wedge P_n$.
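Model satisfaction as a conjunction of predicates can be sketched as follows. This is a minimal illustration, assuming a finite execution whose arbitration order is given as a list. Only two predicates from Table 1 (PRAM and REALTIME) are encoded, and the model built from them is a hypothetical example chosen for brevity, not one of the models of Table 2.

```python
from collections import namedtuple

Op = namedtuple("Op", "name proc type obj st rt")
# An execution (H, vis, ar); here ar is a list giving the total order.
Execution = namedtuple("Execution", "H vis ar")


def rb(E):
    """Returns-before pairs of the history of E."""
    return {(a, b) for a in E.H for b in E.H if a.rt < b.st}


def so(E):
    """Session order: rb restricted to operations of the same process."""
    return {(a, b) for (a, b) in rb(E) if a.proc == b.proc}


def pram(E):
    """PRAM: so is included in vis."""
    return so(E) <= E.vis


def realtime(E):
    """REALTIME: rb is included in ar."""
    pos = {op: i for i, op in enumerate(E.ar)}
    return all(pos[a] < pos[b] for (a, b) in rb(E))


def satisfies(E, model):
    """E |= M iff E satisfies every predicate P_1 ... P_n of the model."""
    return all(predicate(E) for predicate in model)


# Hypothetical model and executions, for illustration only.
example_model = [pram, realtime]
w = Op("w", "p1", "wr", "x", 0, 1)
r = Op("r", "p1", "rd", "x", 2, 3)

visible = Execution({w, r}, {(w, r)}, [w, r])
hidden = Execution({w, r}, set(), [w, r])
print(satisfies(visible, example_model), satisfies(hidden, example_model))
```

Keeping each predicate as an independent function mirrors the way Table 1 is reused: a model is just the list of predicates it combines.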

In regard to the consistency models specified in Table 2, PREFIXSEQUENTIAL($\mathcal{F}$) is derived from the “prefix consistency” proposed in Bayou [23]. Bayou manages a partitionable system. To this end, write operations have two states, tentative and committed, which are modelled using $\Theta$ or t, respectively, as the value of the $oval$ operation attribute. That management is specified using LAZYSINGLEORDER in Table 2. In Bayou, a write operation $op$ returns control once it reaches a single server $p_i$. At that time, $op$ is still tentative (i.e., $op.oval = \Theta$). To be committed, $p_i$ propagates $op$ to a primary manager. The primary manager for $op.obj$ chooses a commit order (that conditions the $ar$ relation in that execution) for all new writes on that object, and that chosen sequence is kept in a log and lazily communicated to every other process. Disconnected nodes should eventually contact the primary manager to learn that commit order. At that time, those previously disconnected processes communicate their tentative writes to the primary (to be ordered on the next commit) and apply the already committed writes on their local replicas. This means that tentative writes may be undone and reapplied in their correct sequence position when they had been initially applied in a disconnected node. When a write $op$ is applied onto the replica of object $op.obj$ in a process $p_j$ in the commit order, $op.oval$ becomes t in $p_j$'s view of that history. Such a view may be represented as $H|_{p_j}$. This explains why different processes in the same execution may have different available committed prefixes of that execution at the same time in the PREFIXSEQUENTIAL($\mathcal{F}$) model.
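The tentative/committed handling described above can be illustrated with a toy model. The sketch below is a strong simplification and not Bayou's actual protocol or API: it assumes a single object, a primary that commits writes immediately in arrival order, and replicas that rebuild their state as the committed prefix followed by their remaining tentative writes. The classes `Primary` and `Replica` and all of their methods are hypothetical names.

```python
class Primary:
    """Hypothetical primary manager keeping the commit log of one object."""

    def __init__(self):
        self.log = []                      # committed writes, in commit order

    def commit(self, writes):
        # Simplification: writes are committed in the order they arrive.
        for w in writes:
            if w not in self.log:
                self.log.append(w)


class Replica:
    """Hypothetical server holding a committed prefix plus tentative writes."""

    def __init__(self, name):
        self.name = name
        self.committed = []                # known committed prefix
        self.tentative = []                # local writes still tentative
        self.counter = 0

    def write(self, value):
        # The write returns control at once; it is only tentative so far.
        self.counter += 1
        w = (self.name, self.counter, value)
        self.tentative.append(w)
        return w

    def read(self):
        # State = committed prefix, then tentative writes reapplied on top.
        applied = self.committed + self.tentative
        return applied[-1][2] if applied else None

    def sync(self, primary):
        # Hand tentative writes to the primary and learn the commit order.
        primary.commit(self.tentative)
        self.committed = list(primary.log)
        self.tentative = [w for w in self.tentative if w not in self.committed]


# Two disconnected replicas write, then contact the primary at different times.
primary, r1, r2 = Primary(), Replica("r1"), Replica("r2")
r1.write(1)
r2.write(2)
r2.sync(primary)
r1.sync(primary)
print(len(r1.committed), len(r2.committed))   # r2 still holds a shorter prefix
```

After these steps the two replicas hold committed prefixes of different lengths of the same commit log, which is the situation the text attributes to the prefix-based model; a further `r2.sync(primary)` brings both to the same prefix.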

3.3 CAP-related Deﬁnitions

Let us assume that the executions in system $S$ are driven by a consistency model $CM \equiv P_1 \wedge \cdots \wedge P_n$. All executions in $\mathcal{E}_C$ always comply with the definition of $CM$. However, that behaviour may vary when network partitions arise. That fact motivates the following definitions.

Definition 3.1 (CAP-free consistency model). $CM$ is CAP-free if every execution $E$ in $\mathcal{E}_P$ respects all consistency predicates that define $CM$.

Formally: $\forall E \in \mathcal{E}_P : E \models P_1 \wedge \cdots \wedge P_n$.
