Optimistic replication

Yasushi Saito, Marc Shapiro
01 Mar 2005, Vol. 37, Iss. 1, pp. 42-81
Abstract
Data replication is a key technology in distributed systems that enables higher availability and performance. This article surveys optimistic replication algorithms. They allow replica contents to diverge in the short term to support concurrent work practices and tolerate failures in low-quality communication links. The importance of such techniques is increasing as collaboration through wide-area and mobile networks becomes popular. Optimistic replication deploys algorithms not seen in traditional “pessimistic” systems. Instead of synchronous replica coordination, an optimistic algorithm propagates changes in the background, discovers conflicts after they happen, and reaches agreement on the final contents incrementally. We explore the solution space for optimistic replication algorithms. This article identifies key challenges facing optimistic replication systems (ordering operations, detecting and resolving conflicts, propagating changes efficiently, and bounding replica divergence) and provides a comprehensive survey of techniques developed for addressing these challenges.


Optimistic replication
Yasushi Saito
Hewlett-Packard Laboratories, Palo Alto, CA (USA)
and
Marc Shapiro
Microsoft Research Ltd., Cambridge (UK)
Data replication is a key technology in distributed data sharing systems, enabling higher availability and perfor-
mance. This paper surveys optimistic replication algorithms that allow replica contents to diverge in the short
term, in order to support concurrent work practices and to tolerate failures in low-quality communication links.
The importance of such techniques is increasing as collaboration through wide-area and mobile networks be-
comes popular.
Optimistic replication techniques are different from traditional “pessimistic” ones. Instead of synchronous
replica coordination, an optimistic algorithm propagates changes in the background, discovers conflicts after they
happen and reaches agreement on the final contents incrementally.
We explore the solution space for optimistic replication algorithms. This paper identifies key challenges facing
optimistic replication systems (ordering operations, detecting and resolving conflicts, propagating changes
efficiently, and bounding replica divergence) and provides a comprehensive survey of techniques developed for
addressing these challenges.
Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]: Distributed Systems—Dis-
tributed applications; H.3.4 [Information Storage and Retrieval]: Systems and Software—Distributed systems
General Terms: Algorithms, Performance
Additional Key Words and Phrases: Replication, Distributed Systems, Internet
1. INTRODUCTION
Data replication consists of maintaining multiple copies of critical data, called replicas, on
separate computers. It is a critical enabling technology of distributed services, improving
both their availability and performance. Availability is improved by allowing access to the
data even when some of the replicas are unavailable. Performance improves in two ways:
latency is reduced by letting users access nearby replicas and avoid remote network access,
and throughput is increased by letting multiple computers serve the data.
This work is supported in part by DARPA Grant F30602-97-2-0226 and National Science Foundation Grant #
EIA-9870740.
Authors’ addresses: Yasushi Saito, Hewlett-Packard Laboratories, 1501 Page Mill Rd, MS 1U-34, Palo Alto, CA,
93403, USA. mailto:yasushi@cs.washington.edu, http://www.hpl.hp.com/personal/Yasushi_Saito. Marc
Shapiro, Microsoft Research Ltd., 7 J J Thomson Ave, Cambridge CB3 0FB, United Kingdom.
mailto:Marc.Shapiro@acm.org, http://www-sor.inria.fr/~shapiro/.

This paper surveys optimistic replication algorithms. Compared to traditional “pes-
simistic” techniques, optimistic replication promises higher availability and performance,
but lets replicas temporarily diverge and lets users see inconsistent data. The remainder of
this introduction gives an overview of optimistic replication, defines its basic elements,
and compares it to traditional replication techniques.
1.1 Traditional replication techniques and their limitations
Traditional replication techniques try to maintain single-copy consistency: they give
users an illusion of having a single, highly available copy of data [Bernstein and Goodman
1983; Bernstein et al. 1987]. This goal can be achieved in many ways, but the basic concept
remains the same: traditional techniques block access to a replica unless it is provably up
to date. We call these techniques “pessimistic” for this reason. For example, primary-copy
algorithms, used widely in commercial systems, elect a primary replica that is responsible
for handling all accesses to a particular object [Bernstein et al. 1987; Dietterich 1994; Or-
acle 1996]. After an update, the primary synchronously writes the change to the secondary
replicas. If the primary crashes, secondaries confer to elect a new primary. Such pes-
simistic techniques perform well in local-area networks, in which latencies are small and
failures uncommon. Given the continuing progress of Internet technologies, it is tempt-
ing to apply pessimistic algorithms to wide-area data replication. We cannot expect good
performance and availability in this environment, however, for three key reasons.
First, the Internet remains slow and unreliable. The Internet’s communication end-to-
end latency and availability do not seem to be improving [Zhang et al. 2000; Chandra
et al. 2001]. In addition, mobile computers with intermittent connectivity are becoming
increasingly popular. A pessimistic replication algorithm, attempting to synchronize with
an unavailable site, would block completely. Well-known impossibility results even raise
the possibility that it might corrupt data; for instance, it is impossible to agree on a single
primary after a failure when network delay is unpredictable [Fischer et al. 1985; Chandra
and Toueg 1996].
Second, pessimistic algorithms scale poorly in the wide area. It is difficult to build a
large, pessimistically replicated system with frequent updates, because its throughput and
availability suffer as the number of sites increases [Yu and Vahdat 2001; Yu and Vahdat
2002]. This is why many Internet and mobile services are optimistic, for instance Usenet
[Spencer and Lawrence 1998; Lidl et al. 1994], DNS [Mockapetris 1987; Mockapetris and
Dunlap 1988; Albitz and Liu 2001], and mobile file and database systems [Walker et al.
1983; Kistler and Satyanarayanan 1992; Moore 1995; Ratner 1998].
Third, some human activities require asynchronous data sharing. Cooperative engineer-
ing or program development often requires people to work in relative isolation. It is better
to allow concurrent operations, and to repair occasional conflicts after they happen, than to
lock out the data while someone is editing it.
1.2 What is optimistic replication?
Optimistic replication is a group of techniques for sharing data efficiently in wide-area
or mobile environments. The key feature that separates optimistic replication algorithms
from their pessimistic counterparts is their approach to concurrency control. Pessimistic
algorithms synchronously coordinate replicas during accesses and block the other users
during an update. In contrast, optimistic algorithms let data be read or written without
a priori synchronization, based on the “optimistic” assumption that problems will occur
only rarely, if at all. Updates are propagated in the background, and occasional conflicts
are fixed after they happen. It is not a new idea,¹ but its use has exploded due to the
proliferation of the Internet and mobile computing technologies.
Optimistic algorithms offer many advantages over their pessimistic counterparts. First,
they improve availability: applications make progress even when network links and sites
are unreliable.² Second, they are flexible with respect to networking, because techniques
such as epidemic replication propagate operations reliably to all replicas, even when the
communication graph is unknown and variable. Third, optimistic algorithms should be
able to scale to a large number of replicas, because they require little synchronization
among sites. Fourth, sites and users are highly autonomous: for example, services such
as FTP and Usenet mirroring [Nakagawa 1996; Krasel 2000] let a replica be added with
no change to existing sites. Optimistic replication also enables asynchronous collaboration
between users, for instance in CVS [Cederqvist et al. 2001; Vesperman 2003] or Lotus
Notes [Kawell et al. 1988]. Finally, optimistic algorithms provide quick feedback, as they
can apply updates tentatively as soon as they are submitted.
These benefits, however, come at a cost. Any distributed system faces a trade-off be-
tween availability and consistency [Fox and Brewer 1999; Yu and Vahdat 2002]. Where a
pessimistic algorithm waits, an optimistic one speculates. Optimistic replication faces the
unique challenges of diverging replicas and conflicts between concurrent operations. It is
thus applicable only for applications that can tolerate occasional conflicts and inconsistent
data. Fortunately, in many real-world systems, especially file systems, conflicts are known
to be rather rare, thanks to the data partitioning and access arbitration that naturally happen
between users [Ousterhout et al. 1985; Baker et al. 1991; Vogels 1999; Wang et al. 2001].
1.3 Elements of optimistic replication
This section introduces some basic concepts of optimistic replication and defines com-
mon terms used throughout the paper. Figure 1 illustrates how these concepts fit together,
and Table 1 provides a reference for common terms. This section provides only a terse
overview, as later ones will go into more detail.
1.3.1 Objects, replicas, and sites. Any replicated system has a concept of the minimal
unit of replication. We call such a unit an object. A replica is a copy of an object stored at
a site (a computer). A site may store replicas of multiple objects, but we often use the terms
replica and site interchangeably, since most optimistic replication algorithms manage each
object independently. When describing algorithms, it is useful to distinguish sites that can
update an object (called master sites) from those that store read-only replicas. We use
the symbol N to denote the total number of replicas and M to denote the number of master
replicas for a given object. Common values are M = 1 (single-master systems) and M = N.
1.3.2 Operations. An optimistic replication system must allow access to a replica even
while it is disconnected. In this paper, we call a self-contained update to an object an
operation. To update an object, a user submits an operation at some site. An operation
includes a prescription to update the object as well as a precondition for detecting conflicts.
The concrete nature of prescriptions and preconditions varies widely among systems.
¹ Our earliest reference is from Johnson and Thomas [1976], but the idea was certainly developed much earlier.
² Tolerating Byzantine (malicious) failures is outside our scope; we cite a few recent papers in this area: Spreitzer
et al. [1997], Minsky [2002] and Mazières and Shasha [2002].

[Figure 1 panels:
(a) Operation submission: Users at different sites submit operations independently.
(b) Propagation: Sites communicate and exchange operations.
(c) Scheduling: Sites compute the ordering of operations.
(d) Conflict resolution: Sites detect conflicts and transform offending operations to produce results intended by users.
(e) Commitment: Sites agree on the final ordering and reconciliation result. Their changes become permanent.]
Fig. 1. Elements of optimistic replication and their roles. Disks represent replicas, memo sheets represent
operations, and arrows represent communications between replicas.
Many systems support only whole-object updates, including Palm [PalmSource 2002] and
DNS [Albitz and Liu 2001]. Such systems are called state-transfer systems, as they only
need to record and transmit the final values of objects, not the sequence of operations.
Other systems, called operation-transfer systems, allow for more sophisticated descrip-
tions of updates. For example, updates in Bayou [Terry et al. 1995] are written in SQL.
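To make the distinction concrete, the following Python sketch contrasts the two styles on a toy object. It is only an illustration under stated assumptions: the names Operation, apply_ops and apply_state, and the "stock" record, are hypothetical and do not come from any of the systems cited above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Obj = Dict[str, int]  # hypothetical object: a small key/value record


@dataclass(frozen=True)
class Operation:
    """A self-contained update: a precondition plus a prescription."""
    precondition: Callable[[Obj], bool]  # must hold, else the operation conflicts
    prescription: Callable[[Obj], Obj]   # returns the updated object


def apply_ops(obj: Obj, ops: List[Operation]) -> Obj:
    """Operation transfer: replay the received operations in order."""
    for op in ops:
        if not op.precondition(obj):
            raise ValueError("conflict: precondition violated")
        obj = op.prescription(obj)
    return obj


def apply_state(obj: Obj, new_value: Obj) -> Obj:
    """State transfer: discard the old value, adopt the sender's final value."""
    return dict(new_value)


# Operation transfer ships the description "add 5 to stock, if stock exists".
add_five = Operation(
    precondition=lambda o: "stock" in o,
    prescription=lambda o: {**o, "stock": o["stock"] + 5},
)
replica = {"stock": 10}
via_ops = apply_ops(replica, [add_five])

# State transfer ships only the resulting value of the whole object.
via_state = apply_state(replica, {"stock": 15})
```

Both calls leave the receiving replica with the same value here; the difference lies in what is recorded and transmitted, a description of the change in one case and the final contents in the other.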
A site applies an operation locally immediately, and it exchanges and applies remote
operations in the background. Such systems are said to offer eventual consistency, because
they guarantee that the state of replicas will converge only eventually. Such a weak guar-
antee is enough for many optimistic replication applications, but some systems provide
stronger guarantees, e.g., that a replica’s state is never more than 1 hour old.
1.3.3 Propagation. An operation submitted by the user of a replica is tentatively ap-
plied to the local replica to let the user continue working based on that update. It is also
logged, i.e., remembered in order to be propagated to other sites later. These systems of-
ten deploy epidemic propagation to let all sites receive operations, even when they cannot
communicate with each other directly [Demers et al. 1987]. Epidemic propagation lets any
two sites that happen to communicate exchange their local operations as well as operations
they received from a third site: an operation spreads like a virus does among humans.
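The following Python sketch illustrates the idea of epidemic propagation under simplifying assumptions: each site keeps its log as a dictionary keyed by a unique operation identifier, and a pairwise session merges the two logs wholesale. The Site class and the exchange function are hypothetical names chosen for illustration; real systems use more economical exchange protocols.

```python
from typing import Dict


class Site:
    def __init__(self, name: str) -> None:
        self.name = name
        self.log: Dict[str, str] = {}  # operation id -> operation payload
        self._next = 0

    def submit(self, payload: str) -> str:
        """Log a locally submitted operation under a unique identifier."""
        op_id = f"{self.name}:{self._next}"
        self._next += 1
        self.log[op_id] = payload
        return op_id


def exchange(x: Site, y: Site) -> None:
    """One pairwise session: each site learns every operation the other knows,
    including operations that the other received from third sites."""
    merged = {**x.log, **y.log}
    x.log = dict(merged)
    y.log = dict(merged)


a, b, c = Site("A"), Site("B"), Site("C")
op = a.submit("set title = 'draft'")
exchange(a, b)  # A and B happen to communicate
exchange(b, c)  # later, B and C communicate; A and C never do
```

After the two sessions, site C holds the operation submitted at A even though the two never communicated directly.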
1.3.4 Tentative execution and scheduling. Because of background propagation, opera-
tions are not always received in the same order at all sites. Each site must reconstruct an
appropriate ordering that produces an equivalent result across sites and matches the users’
intuitive expectations. Thus, an operation is initially considered tentative. A site might
reorder or transform operations repeatedly until it agrees with others on the final operation
ordering. We use the term scheduling to refer to the (often non-deterministic) ordering
policy.
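As a minimal sketch of one possible scheduling policy, the Python code below orders operations by a (timestamp, site) pair, so that sites receiving the same operations in different orders still compute the same schedule. This is an assumption made for illustration, not the policy of any particular system: the timestamp field stands for some logical clock value, and the Op, schedule and replay names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Op:
    timestamp: int  # assumed logical clock value at submission
    site: str       # submitting site, used only to break timestamp ties
    text: str       # payload: a string to append to the shared document


def schedule(received: Iterable[Op]) -> List[Op]:
    """Order operations by (timestamp, site), regardless of arrival order."""
    return sorted(received, key=lambda op: (op.timestamp, op.site))


def replay(ops: Iterable[Op]) -> str:
    """Rebuild the tentative state by applying the schedule from scratch."""
    return "".join(op.text for op in schedule(ops))


o1, o2, o3 = Op(1, "A", "x"), Op(1, "B", "y"), Op(2, "A", "z")
site_a_state = replay([o1, o3, o2])  # arrival order at site A
site_b_state = replay([o2, o1, o3])  # arrival order at site B
```

Appending is not commutative, so the two sites agree only because they replay the operations in the same computed order rather than in arrival order.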
1.3.5 Detecting and resolving conflicts. With no a priori site coordination, multiple
users may update the same object at the same time. One could simply ignore such a situation:
for instance, a room-booking system could handle two requests to the same room
by picking one arbitrarily and discarding the other. However, simply dropping concurrent
requests is not desirable in many applications, including room booking. This problem is
called lost updates.
A better way to handle this problem is to detect operations that are in conflict and resolve
them, for example, by letting people renegotiate their schedule. A conflict happens when
the precondition of an operation is violated, if it is to be executed according to the system’s
scheduling policy. In many systems, preconditions are built implicitly into the replication
algorithm. The simplest example is when all concurrent operations are flagged to be in
conflict, as with the Palm Pilot [PalmSource 2002] and the Coda mobile file system [Kumar
and Satyanarayanan 1995]. Other systems let users write preconditions explicitly: for
example, in a room-booking system written in Bayou, a precondition might check the status
of the room and disallow double booking [Terry et al. 1995].
Conflict resolution is usually highly application specific. Most systems simply flag a
conflict and let users fix it manually. Some systems can resolve a conflict automatically.
For example, in Coda, concurrent writes to a ’*.o’ file can be resolved simply by recom-
piling the source file [Kumar and Satyanarayanan 1995]. We discuss conflict detection and
resolution in more detail in Sections 5 and 6.
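The room-booking example can be sketched in Python as follows. This is a hypothetical illustration of precondition-based conflict detection, not Bayou's actual interface: the BookRoom class, its fields, and run_schedule are assumed names, and resolution is left to whoever inspects the list of flagged operations.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Bookings = Dict[Tuple[str, int], str]  # (room, hour) -> user


@dataclass(frozen=True)
class BookRoom:
    user: str
    room: str
    hour: int

    def precondition(self, bookings: Bookings) -> bool:
        """The requested slot must still be free (no double booking)."""
        return (self.room, self.hour) not in bookings

    def prescription(self, bookings: Bookings) -> None:
        """Record the booking."""
        bookings[(self.room, self.hour)] = self.user


def run_schedule(ops: List[BookRoom]) -> Tuple[Bookings, List[BookRoom]]:
    """Apply operations in scheduled order and collect those in conflict."""
    bookings: Bookings = {}
    conflicts: List[BookRoom] = []
    for op in ops:
        if op.precondition(bookings):
            op.prescription(bookings)
        else:
            conflicts.append(op)  # flagged for resolution, not silently dropped
    return bookings, conflicts


alice = BookRoom("alice", "room-1", 10)
bob = BookRoom("bob", "room-1", 10)  # concurrent request for the same slot
carol = BookRoom("carol", "room-2", 10)
bookings, conflicts = run_schedule([alice, bob, carol])
```

Here the second request for room-1 is reported as a conflict instead of becoming a lost update; a resolver could then propose another room or hour, or ask the users to renegotiate.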
1.3.6 Commitment. Scheduling and conflict resolution often both involve non-
deterministic choices, e.g., regarding ordering of concurrent operations. Moreover, a
replica may not have received all the operations that others have. Commitment refers to
an algorithm to converge the state of replicas by letting sites agree on the set of operations
and their final ordering and conflict-resolution results.
1.4 Comparison with advanced transaction models
Optimistic replication is related to relaxed (or advanced) transaction models [Elmagarmid
1992; Ramamritham and Chrysanthis 1996]. Both relax the ACID requirements of traditional
databases to improve performance and availability, but the motives are different.³
Advanced transaction models try to increase the system’s throughput by, for example,
letting transactions read values produced by non-committed transactions [Pu et al. 1995].
Designed for a single-node or well-connected distributed database, they require frequent
communication during transaction execution.
Optimistic replication systems, in contrast, are designed to work with a high degree of
asynchrony and autonomy. Sites exchange operations in the background and still agree on
a common state. They must learn about relationships between operations, often long after
they were submitted, and at sites other than where they were submitted. Their techniques, such
as the use of operations, scheduling, and conflict detection, reflect the characteristics of
environments for which they are designed. Preconditions play a role similar to traditional
concurrency control mechanisms, such as two-phase locking or optimistic concurrency
control [Bernstein et al. 1987], but they operate without inter-site coordination. Conflict
resolution corresponds to transaction abortion, in that both are designed to fix problems in
concurrency control.
That said, there are many commonalities between optimistic replication and advanced
transaction models.
³ ACID demands that a group of operations, called a transaction, be: Atomic (all-or-nothing), Consistent (safe
when executed sequentially), Isolated (intermediate state is not observable) and Durable (the final state is
persistent) [Gray and Reuter 1993].
