
On optimistic methods for concurrency control

01 Jun 1981-ACM Transactions on Database Systems (ACM)-Vol. 6, Iss: 2, pp 213-226
TL;DR: In this paper, two families of non-locking concurrency controls are presented, which are optimistic in the sense that they rely mainly on transaction backup as a control mechanism, "hoping" that conflicts between transactions will not occur.
Abstract: Most current approaches to concurrency control in database systems rely on locking of data objects as a control mechanism. In this paper, two families of nonlocking concurrency controls are presented. The methods used are “optimistic” in the sense that they rely mainly on transaction backup as a control mechanism, “hoping” that conflicts between transactions will not occur. Applications for which these methods should be more efficient than locking are discussed.

Summary (2 min read)

1. INTRODUCTION

  • The correctness criteria used for validation are based on the notion of serial equivalence [4, 12, 14].
  • In the next two sections concurrency controls that rely on the serial equivalence criteria developed in Section 3 for validation are presented.
  • The family of concurrency controls in Section 4 has serial final validation steps, while the concurrency controls of Section 5 have completely parallel validation, at a higher total cost.

3. THE VALIDATION PHASE

  • Where "0" is the usual notation for functional composition.
  • The idea behind this correctness criterion is that, first, each transaction is assumed to have been written so as to individually preserve the integrity of the shared data structure.
  • That is, if d satisfies all integrity criteria, then for each Ti, Ti(d) satisfies all integrity criteria.
  • Now, if the initial state d_initial satisfies all integrity criteria and the concurrent execution of T1, T2, ..., Tn is serially equivalent, then from (1), by repeated application of the integrity-preserving property of each transaction, the final state d_final satisfies all integrity criteria (see the sketch after this list).
  • If semantic information is available, then other approaches may be more attractive (see, e.g., [6, 8]).
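As a sketch of the serial-equivalence argument summarized above, in LaTeX notation (the permutation symbol π is our own shorthand for the equivalent serial order, not necessarily the paper's notation):

    \[
      d_{\mathrm{final}} \;=\; \bigl(T_{\pi(n)} \circ \cdots \circ T_{\pi(1)}\bigr)(d_{\mathrm{initial}}) \tag{1}
    \]

Since each Ti maps integrity-preserving states to integrity-preserving states, applying the transactions one at a time along the serial order shows that if d_initial satisfies all integrity criteria, so does d_final.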

3.1 Validation of Serial Equivalence

  • Condition (1) states that Ti actually completes before Tj starts.
  • Condition (2) states that the writes of Ti do not affect the read phase of Tj, and that Ti finishes writing before Tj starts writing, hence does not overwrite Tj (also, note that Tj cannot affect the read phase of Ti).
  • Finally, condition (3) is similar to condition (2) but does not require that Ti finish writing before Tj starts writing; it simply requires that Ti not affect the read phase or the write phase of Tj (again note that Tj cannot affect the read phase of Ti, by the last part of the condition).
  • See [12] for a set of similar conditions for serialization; a small sketch of checking the three conditions above follows this list.
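The three conditions can be checked mechanically. Below is a minimal Python sketch of such a check; the record fields (phase-boundary timestamps and read/write sets) and function names are our own illustration, not the paper's code.

    from dataclasses import dataclass, field

    @dataclass
    class TxnRecord:
        tn: int                               # transaction number (assigned at end of read phase)
        read_set: set = field(default_factory=set)
        write_set: set = field(default_factory=set)
        read_start: float = 0.0               # start of read phase
        read_end: float = 0.0                 # end of read phase
        write_start: float = 0.0              # start of write phase
        write_end: float = 0.0                # end of write phase

    def may_precede(ti: TxnRecord, tj: TxnRecord) -> bool:
        """True if Ti (the transaction with the smaller number) satisfies one of
        the three conditions with respect to Tj, so the pair is serially
        equivalent with Ti before Tj."""
        # (1) Ti completes, including its write phase, before Tj starts.
        if ti.write_end < tj.read_start:
            return True
        # (2) Ti's writes do not intersect Tj's read set, and Ti finishes
        #     writing before Tj starts writing.
        if not (ti.write_set & tj.read_set) and ti.write_end < tj.write_start:
            return True
        # (3) Ti's writes intersect neither Tj's read set nor its write set,
        #     and Ti completes its read phase before Tj completes its read phase.
        if (not (ti.write_set & tj.read_set)
                and not (ti.write_set & tj.write_set)
                and ti.read_end < tj.read_end):
            return True
        return False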

3.2 Assigning Transaction Numbers

  • When a transaction number is needed, the counter is incremented, and the resulting value returned.
  • Also, transaction numbers must be assigned somewhere before validation, since the validation conditions above require knowledge of the transaction number of the transaction being validated.
  • On first thought, the authors might assign transaction numbers at the beginning of the read phase; however, this is not optimistic (hence contrary to the philosophy of this paper) for the following reason.
  • If T2 completes its read phase much earlier than T1, then before being validated T2 must wait for the completion of the read phase of T1, since the validation of T2 in this case relies on knowledge of the write set of T1.
  • For these and similar considerations the authors assign transaction numbers at the end of the read phase; a minimal sketch of the counter involved follows this list.
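A minimal sketch of the transaction-number counter described above, drawn at the end of the read phase (the lock and the names are our own; the concrete controls of Sections 4 and 5 refine exactly when the counter is read and incremented):

    import threading

    _tnc_lock = threading.Lock()
    _tnc = 0                       # global transaction-number counter

    def assign_transaction_number() -> int:
        """Called when a transaction finishes its read phase: increment the
        counter and return the new value as that transaction's number."""
        global _tnc
        with _tnc_lock:
            _tnc += 1
            return _tnc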

3.3 Some Practical Considerations

  • Since the concurrency control can only maintain finitely many write sets, the authors have a difficulty (this difficulty does not arise if transaction numbers are assigned at the beginning of the read phase).
  • Of course, the authors take the optimistic approach and assume such transactions are very rare; still, a solution is needed.
  • The authors solve this problem by only requiring the concurrency control to maintain some finite number of the most recent write sets, where the number is large enough to validate almost all transactions (they say write set a is more recent than write set b if the transaction number associated with a is greater than that associated with b).
  • In such a case the transaction is aborted and restarted, receiving a new transaction number at the completion of the read phase (a sketch of this bounded history follows this list).
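A hedged sketch of the bounded history of recent write sets and the forced restart it implies (the bound, the data structure, and the names are illustrative assumptions, not the paper's code):

    from collections import OrderedDict

    MAX_RETAINED = 1000                    # illustrative bound on retained write sets
    recent_write_sets = OrderedDict()      # transaction number -> write set (oldest first)

    def record_write_set(tn: int, write_set: set) -> None:
        recent_write_sets[tn] = set(write_set)
        while len(recent_write_sets) > MAX_RETAINED:
            recent_write_sets.popitem(last=False)        # evict the oldest write set

    def validate_reads(start_tn: int, finish_tn: int, read_set: set) -> bool:
        """Check the read set against the write sets of transactions numbered
        start_tn+1 .. finish_tn.  If one of those write sets is no longer
        retained, the transaction must be aborted and restarted, receiving a
        new transaction number at the end of its new read phase."""
        for t in range(start_tn + 1, finish_tn + 1):
            if t not in recent_write_sets:
                raise RuntimeError("write set no longer retained: abort and restart")
            if recent_write_sets[t] & read_set:
                return False
        return True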

4. SERIAL VALIDATION

  • Until now the authors have not considered the question of read-only transactions, or queries.
  • This need not occur in a critical section, so the above discussion on multiple validation stages does not apply to queries.
  • This method for handling queries also applies to the concurrency controls of the next section.
  • Note that for query-dominant systems, validation will often be trivial.
  • For this type of system an optimistic approach appears ideal. A rough sketch of the serial validation step of this section follows this list.
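Section 4 of the paper places validation and the write phase of update transactions inside one critical section keyed on the transaction-number counter. The following is a rough Python sketch of that shape only, not the paper's pseudocode; Txn, write_sets, and the write_phase callable are our own illustrative names.

    import threading
    from dataclasses import dataclass, field

    validation_lock = threading.Lock()     # the critical section of Section 4
    tnc = 0                                # transaction-number counter
    write_sets = {}                        # committed transaction number -> write set

    @dataclass
    class Txn:
        start_tn: int                      # value of tnc when the read phase began
        read_set: set = field(default_factory=set)
        write_set: set = field(default_factory=set)

    def tend_serial(txn: Txn, write_phase) -> bool:
        """Validate txn against every transaction numbered after its read phase
        began; if valid and txn is an update, run its write phase and take the
        next transaction number inside the same critical section."""
        global tnc
        with validation_lock:
            finish_tn = tnc
            valid = all(not (write_sets[t] & txn.read_set)
                        for t in range(txn.start_tn + 1, finish_tn + 1))
            if valid and txn.write_set:
                write_phase(txn)                   # make the local copies global
                tnc += 1
                write_sets[tnc] = set(txn.write_set)
        return valid

A read-only query never increments the counter and needs no write phase, and (as the bullets above note) its validation need not be performed in the critical section at all; in a query-dominant system the loop above usually finds no committed write sets to check, which is why validation is then trivial.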

5. PARALLEL VALIDATION

  • Finally, a solution is possible where transactions that have been invalidated by a transaction in finish active wait for that transaction to either be invalidated, and hence ignored, or validated, causing backup (this possibility was pointed out by James Saxe).
  • This solution involves a more sophisticated process communication mechanism than the binary semaphore needed to implement the critical sections above. A rough sketch of the parallel validation step follows this list.
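For orientation, a rough Python sketch of the overall shape of parallel validation as Section 5 describes it: two short critical sections bracketing a validation loop that runs concurrently with other transactions' validations. It reuses the Txn, tnc, write_sets, and write_phase names (and the threading import) from the sketch after Section 4, and is an illustration of the structure, not the paper's pseudocode.

    short_lock = threading.Lock()          # protects tnc and the active set only
    active = []                            # transactions past their read phase, not yet finished

    def tend_parallel(txn: Txn, write_phase) -> bool:
        global tnc
        with short_lock:                               # first short critical section
            finish_tn = tnc
            finish_active = list(active)               # snapshot of concurrently validating txns
            active.append(txn)
        # Validation proper runs outside any critical section.
        valid = all(not (write_sets[t] & txn.read_set)
                    for t in range(txn.start_tn + 1, finish_tn + 1))
        valid = valid and all(
            not (other.write_set & (txn.read_set | txn.write_set))
            for other in finish_active)
        if valid and txn.write_set:
            write_phase(txn)                           # the write phase may also run in parallel
        with short_lock:                               # second short critical section
            if valid and txn.write_set:
                tnc += 1
                write_sets[tnc] = set(txn.write_set)
            active.remove(txn)
        return valid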

6. ANALYSIS OF AN APPLICATION

  • The authors' final and most important consideration is determining how likely it is that one insertion will cause another concurrent insertion to be invalidated.
  • Clearly this depends on the size of the write set of I1, and this is determined by the degree of splitting.
  • The authors also assume that an insertion is equally likely to access any path from the root to a leaf.

7. CONCLUSIONS

  • A more general problem is the following: Consider the case of a database system where transaction conflict is rare, but not rare enough to justify the use of any of the optimistic approaches presented here.
  • Some type of generalized concurrency control is needed that provides "just the right amount" of locking versus backup.
  • Ideally, this should vary as the likelihood of transaction conflict in the system varies.


On Optimistic Methods for Concurrency
Control
H.T. KUNG and JOHN T. ROBINSON
Carnegie-Mellon University
Most current approaches to concurrency control in database systems rely on locking of data objects
as a control mechanism. In this paper, two families of nonlocking concurrency controls are presented.
The methods used are “optimistic” in the sense that they rely mainly on transaction backup as a
control mechanism, “hoping” that conflicts between transactions will not occur. Applications for
which these methods should be more efficient than locking are discussed.
Key Words and Phrases: databases, concurrency controls,
transaction processing
CR Categories: 4.32, 4.33
1. INTRODUCTION
Consider the problem of providing shared access to a database organized as a
collection of objects. We assume that certain distinguished objects, called the
roots, are always present and access to any object other than a root is gained only
by first accessing a root and then following pointers to that object. Any sequence
of accesses to the database that preserves the integrity constraints of the data is
called a transaction (see, e.g., [4]).
If our goal is to maximize the throughput of accesses to the database, then
there are at least two cases where highly concurrent access is desirable.
(1) The amount of data is sufficiently great that at any given time only a fraction
of the database can be present in primary memory, so that it is necessary to
swap parts of the database from secondary memory as needed.
(2) Even if the entire database can be present in primary memory, there may be
multiple processors.
In both cases the hardware will be underutilized if the degree of concurrency
is too low.
However, as is well known, unrestricted concurrent access to a shared database
will, in general, cause the integrity of the database to be lost.

Permission to copy without fee all or part of this material is granted provided that the copies are not
made or distributed for direct commercial advantage, the ACM copyright notice and the title of the
publication and its date appear, and notice is given that copying is by permission of the Association
for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific
permission.
This research was supported in part by the National Science Foundation under Grant MCS 78-236-76
and the Office of Naval Research under Contract N00014-76-C-0370.
Authors' address: Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA
15213.
© 1981 ACM 0362-5915/81/0600-0213 $00.75

Most current
approaches to this problem involve some type of locking. That is, a mechanism
is provided whereby one process can deny certain other processes access to some
portion of the database. In particular, a lock may be associated with each node of
the directed graph, and any given process is required to follow some locking
protocol so as to guarantee that no other process can ever discover any lack of
integrity in the database temporarily caused by the given process.
The locking approach has the following inherent disadvantages.
(1) Lock maintenance represents an overhead that is not present in the sequential
case. Even read-only transactions (queries), which cannot possibly affect the
integrity of the data, must, in general, use locking in order to guarantee that
the data being read are not modified by other transactions at the same time.
Also, if the locking protocol is not deadlock-free, deadlock detection must be
considered to be part of lock maintenance overhead.
(2) There are no general-purpose deadlock-free locking protocols for databases
that always provide high concurrency. Because of this, some research has
been directed at developing special-purpose locking protocols for various
special cases. For example, in the case of B-trees [1], at least nine locking
protocols have been proposed [2, 3, 9, 10, 13].
(3) In the case that large parts of the database are on secondary memory,
concurrency is significantly lowered whenever it is necessary to leave some
congested node locked (a congested node is one that is often accessed, e.g.,
the root of a tree) while waiting for a secondary memory access.
(4) To allow a transaction to abort itself when mistakes occur, locks cannot be
released until the end of the transaction. This may again significantly lower
concurrency.
(5) Most important for the purposes of this paper, locking may be necessary only
in the worst case. Consider the following simple example: The directed graph
consists solely of roots, and each transaction involves one root only, any root
equally likely. Then if there are n roots and two processes executing transactions
at the same rate, locking is really needed (if at all) every n transactions, on the
average (a one-line sketch of this estimate follows the list).
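A one-line version of the estimate in (5), under the stated assumptions (two processes, n roots, each root equally likely), in LaTeX notation:

    \[
      \Pr[\text{both processes touch the same root}] = \tfrac{1}{n}
      \quad\Longrightarrow\quad
      \text{locking matters only about once every } n \text{ transactions.}
    \]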
In general, one may expect the argument of (5) to hold whenever (a) the
number of nodes in the graph is very large compared to the total number of nodes
involved in all the running transactions at a given time, and (b) the probability
of modifying a congested node is small. In many applications, (a) and (b) are
designed to hold (see Section 6 for the B-tree application).
Research directed at finding deadlock-free locking protocols may be seen as an
attempt to lower the expense of concurrency control by eliminating transaction
backup as a control mechanism. In this paper we consider the converse problem,
that of eliminating locking. We propose two families of concurrency controls that
do not use locking. These methods are “optimistic” in the sense that they rely for
efficiency on the hope that conflicts between transactions will not occur. If (5)
does hold, such conflict will be rare. This approach also has the advantage that
it is completely general, applying equally well to any shared directed graph
structure and associated access algorithms. Since locks are not used, it is deadlock-
free (however, starvation is a possible problem, for which we discuss a solution).

Fig. 1. The three phases of a transaction: a read phase, then a validation phase, then a possible write phase, laid out along a time axis.
It is also possible using this approach to avoid problems (3) and (4) above. Finally,
if the transaction pattern becomes query dominant (i.e., most transactions are
read-only), then the concurrency control overhead becomes almost totally
negligible (a partial solution to problem (1)).
The idea behind this optimistic approach is quite simple, and may be summa-
rized as follows.
(1) Since reading a value or a pointer from a node can never cause a loss of
integrity, reads are completely unrestricted (however, returning a result from
a query is considered to be equivalent to a write, and so is subject to validation
as discussed below).
(2) Writes are severely restricted. It is required that any transaction consist of
two or three phases: a read phase, a validation phase, and a possible write
phase (see Figure 1). During the read phase, all writes take place on local
copies of the nodes to be modified. Then, if it can be established during the
validation phase that the changes the transaction made will not cause a loss
of integrity, the local copies are made global in the write phase. In the case
of a query, it must be determined that the result the query would return is
actually correct. The step in which it is determined that the transaction will
not cause a loss of integrity (or that it will return the correct result) is called
validation.
If, in a locking approach, locking is only necessary in the worst case, then in an
optimistic approach validation will fail also only in the worst case. If validation
does fail, the transaction will be backed up and start over again as a new
transaction. Thus a transaction will have a write phase only if the preceding
validation succeeds.
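The control flow just described — a read phase on local copies, validation, a possible write phase, and backup followed by a restart as a new transaction when validation fails — can be summarized in a small Python sketch. It is illustrative only; tbegin, tend, and cleanup stand for the procedures defined in Sections 2, 4, and 5, and are passed in rather than assumed to exist.

    def run_optimistically(body, tbegin, tend, cleanup):
        """Run a user-written transaction `body` under an optimistic control:
        retry it as a brand-new transaction whenever validation fails."""
        while True:
            txn = tbegin()            # initialize read/write/create/delete sets
            result = body(txn)        # read phase: reads unrestricted, writes go to local copies
            if tend(txn):             # validation, and (if it succeeds) the write phase
                return result
            cleanup(txn)              # discard local copies, then back up and start over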
In Section 2 we discuss in more detail the read and write phases of transactions.
In Section 3 a particularly strong form of validation is presented. The correctness
criteria used for validation are based on the notion of serial equivalence [4, 12,
14]. In the next two sections concurrency controls that rely on the serial equiva-
lence criteria developed in Section 3 for validation are presented. The family of
concurrency controls in Section 4 has serial final validation steps, while the
concurrency controls of Section 5 have completely parallel validation, at a higher
total cost. In Section 6 we analyze the application of optimistic methods
to controlling concurrent insertions in B-trees. Section 7 contains a summary and
a discussion of future research.

2. THE READ AND WRITE PHASES
In this section we briefly discuss how the concurrency control can support the
read and write phases of user-programmed transactions (in a manner invisible to
the user), and how this can be implemented efficiently. The validation phase will
be treated in the following three sections.
We assume that an underlying system provides for the manipulation of objects
of various types. For simplicity, assume all objects are of the same type. Objects
are manipulated by the following procedures, where n is the name of an object, i
is a parameter to the type manager, and v is a value of arbitrary type (v could
be a pointer, i.e., an object name, or data):

    create             create a new object and return its name.
    delete(n)          delete object n.
    read(n, i)         read item i of object n and return its value.
    write(n, i, v)     write v as item i of object n.

In order to support the read and write phases of transactions we also use the
following procedures:

    copy(n)            create a new object that is a copy of object n and return its name.
    exchange(n1, n2)   exchange the names of objects n1 and n2.
The concurrency control is invisible to the user; transactions are written as if
the above procedures were used directly. However, transactions are required to
use the syntactically identical procedures tcreate, tdelete, tread, and twrite. For
each transaction, the concurrency control maintains sets of object names accessed
by the transaction. These sets are initialized to be empty by a tbegin call. The
body of the user-written transaction is in fact the read phase mentioned in the
introduction; the subsequent validation phase does not begin until after a tend
call. The procedures tbegin and tend are shown in detail in Sections 4 and 5. The
semantics of the remaining procedures are as follows:
tcreate = (
    n := create;
    create set := create set ∪ {n};
    return n)

twrite(n, i, v) = (
    if n ∈ create set
        then write(n, i, v)
    else if n ∈ write set
        then write(copies[n], i, v)
    else (
        m := copy(n);
        copies[n] := m;
        write set := write set ∪ {n};
        write(copies[n], i, v)))

tread(n, i) = (
    read set := read set ∪ {n};
    if n ∈ write set
        then return read(copies[n], i)
        else return read(n, i))

tdelete(n) = (
    delete set := delete set ∪ {n}).
Above, copies is an associative vector of object names, indexed by object name.
We see that in the read phase, no global writes take place. Instead, whenever the
first write to a given object is requested, a copy is made, and all subsequent writes
are directed to the copy. This copy is potentially global but is inaccessible to
other transactions during the read phase by our convention that all nodes are
accessed only by following pointers from a root node. If the node is a root node,
the copy is inaccessible since it has the wrong name (all transactions “know” the
global names of root nodes). It is assumed that no root node is created or deleted,
that no dangling pointers are left to deleted nodes, and that created nodes become
accessible by writing new pointers (these conditions are part of the integrity
criteria for the data structure that each transaction is required to individually
preserve).
When the transaction completes, it will request its validation and write phases
via a tend call. If validation succeeds, then the transaction enters the write phase,
which is simply
for n ∈ write set do exchange(n, copies[n]).
After the write phase all written values become "global," all created nodes
become accessible, and all deleted nodes become inaccessible. Of course some
cleanup is necessary, which we do not consider to be part of the write phase since
it does not interact with other transactions:

    (for n ∈ delete set do delete(n);
     for n ∈ write set do delete(copies[n])).
This cleanup is also necessary if a transaction is aborted.
Note that since objects are virtual (objects are referred to by name, not by
physical address), the exchange operation, and hence the write phase, can be
made quite fast: essentially, all that is necessary is to exchange the physical
address parts of the two object descriptors.
Finally, we note that the concept of two-phase transactions appears to be quite
valuable for recovery purposes, since at the end of the read phase, all changes
that the transaction intends to make to the data structure are known.
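To make the bookkeeping above concrete, here is a minimal Python rendering of tread, twrite, and the exchange-based write phase. This is our own illustration, not the paper's implementation: the object manager is a plain dict standing in for the underlying type manager, and tcreate/tdelete are omitted for brevity.

    objects = {}                  # object name -> {item index: value}
    _next_name = 0

    def create():
        global _next_name
        _next_name += 1
        objects[_next_name] = {}
        return _next_name

    def copy(n):
        m = create()
        objects[m] = dict(objects[n])
        return m

    def exchange(n1, n2):
        objects[n1], objects[n2] = objects[n2], objects[n1]

    class Transaction:
        def __init__(self):
            self.read_set, self.write_set = set(), set()
            self.create_set, self.delete_set = set(), set()
            self.copies = {}      # global object name -> name of its local copy

        def twrite(self, n, i, v):
            if n in self.create_set:
                objects[n][i] = v
                return
            if n not in self.write_set:       # first write to n: make a local copy
                self.copies[n] = copy(n)
                self.write_set.add(n)
            objects[self.copies[n]][i] = v

        def tread(self, n, i):
            self.read_set.add(n)
            target = self.copies[n] if n in self.write_set else n
            return objects[target].get(i)

        def write_phase(self):                # run only after successful validation
            for n in self.write_set:
                exchange(n, self.copies[n])   # swap the copy in under the global name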
3. THE VALIDATION PHASE
A widely used criterion for verifying the correctness of concurrent execution of
transactions has been variously called serial equivalence [4], serial reproducibility
[11], and serializability [14]. This criterion may be defined as follows.
Let transactions T1, T2, ..., Tn be executed concurrently. Denote an instance of
the shared data structure by d, and let D be the set of all possible d, so that each
Ti may be considered as a function:

    Ti : D → D.

Citations
Journal ArticleDOI
12 Nov 2000
TL;DR: OceanStore monitoring of usage patterns allows adaptation to regional outages and denial of service attacks; monitoring also enhances performance through pro-active movement of data.
Abstract: OceanStore is a utility infrastructure designed to span the globe and provide continuous access to persistent information. Since this infrastructure is comprised of untrusted servers, data is protected through redundancy and cryptographic techniques. To improve performance, data is allowed to be cached anywhere, anytime. Additionally, monitoring of usage patterns allows adaptation to regional outages and denial of service attacks; monitoring also enhances performance through pro-active movement of data. A prototype implementation is currently under development.

3,376 citations

Book
01 Aug 1990
TL;DR: This third edition of a classic textbook can be used to teach at the senior undergraduate and graduate levels and concentrates on fundamental theories as well as techniques and algorithms in distributed data management.
Abstract: This third edition of a classic textbook can be used to teach at the senior undergraduate and graduate levels. The material concentrates on fundamental theories as well as techniques and algorithms. The advent of the Internet and the World Wide Web, and, more recently, the emergence of cloud computing and streaming data applications, has forced a renewal of interest in distributed and parallel data management, while, at the same time, requiring a rethinking of some of the traditional techniques. This book covers the breadth and depth of this re-emerging field. The coverage consists of two parts. The first part discusses the fundamental principles of distributed data management and includes distribution design, data integration, distributed query processing and optimization, distributed transaction management, and replication. The second part focuses on more advanced topics and includes discussion of parallel database systems, distributed object management, peer-to-peer data management, web data management, data stream systems, and cloud computing. New in this Edition: New chapters, covering database replication, database integration, multidatabase query processing, peer-to-peer data management, and web data management. Coverage of emerging topics such as data streams and cloud computing Extensive revisions and updates based on years of class testing and feedback Ancillary teaching materials are available.

2,395 citations

Journal ArticleDOI
TL;DR: This survey describes a wide array of practical query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
Abstract: Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On the contrary, modern data models exacerbate the problem: In order to manipulate large sets of complex objects as efficiently as today's database systems manipulate simple records, query-processing algorithms and software will become more complex, and a solid understanding of algorithm and architectural issues is essential for the designer of database management software. This survey provides a foundation for the design and implementation of query execution facilities in new database management systems. It describes a wide array of practical query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.

1,427 citations

Book
30 Jan 2009
TL;DR: What constitutes a distributed operating system and how it is distinguished from a computer network are discussed, and several examples of current research projects are examined in some detail.
Abstract: Distributed operating systems have many aspects in common with centralized ones, but they also differ in certain ways. This paper is intended as an introduction to distributed operating systems, and especially to current university research about them. After a discussion of what constitutes a distributed operating system and how it is distinguished from a computer network, various key design issues are discussed. Then several examples of current research projects are examined in some detail, namely, the Cambridge Distributed Computing System, Amoeba, V, and Eden.

1,327 citations

Book
01 Jan 2000
TL;DR: The article gives an overview of technologies to distribute the execution of simulation programs over multiple computer systems, with particular emphasis on synchronization (also called time management) algorithms as well as data distribution techniques.
Abstract: Originating from basic research conducted in the 1970's and 1980's, the parallel and distributed simulation field has matured over the last few decades. Today, operational systems have been fielded for applications such as military training, analysis of communication networks, and air traffic control systems, to mention a few. The article gives an overview of technologies to distribute the execution of simulation programs over multiple computer systems. Particular emphasis is placed on synchronization (also called time management) algorithms as well as data distribution techniques.

1,217 citations

References
Journal ArticleDOI
TL;DR: It is argued that a transaction needs to lock a logical rather than a physical subset of the database, and an implementation of predicate locks which satisfies the consistency condition is suggested.
Abstract: In database systems, users access shared data under the assumption that the data satisfies certain consistency constraints. This paper defines the concepts of transaction, consistency and schedule and shows that consistency requires that a transaction cannot request new locks after releasing a lock. Then it is argued that a transaction needs to lock a logical rather than a physical subset of the database. These subsets may be specified by predicates. An implementation of predicate locks which satisfies the consistency condition is suggested.

2,031 citations

Book ChapterDOI
Jim Gray
01 Jan 1978
TL;DR: This paper is a compendium of data base management operating systems folklore and focuses on particular issues unique to the transaction management component especially locking and recovery.
Abstract: This paper is a compendium of data base management operating systems folklore. It is an early paper and is still in draft form. It is intended as a set of course notes for a class on data base operating systems. After a brief overview of what a data management system is it focuses on particular issues unique to the transaction management component especially locking and recovery.

1,635 citations

Journal ArticleDOI
TL;DR: Several efficiently recognizable subclasses of the class of serializable histories are introduced and it is shown how these results can be extended to far more general transaction models, to transactions with partly interpreted functions, and to distributed database systems.
Abstract: A sequence of interleaved user transactions in a database system may not be serializable, i.e., equivalent to some sequential execution of the individual transactions. Using a simple transaction model, it is shown that recognizing the transaction histories that are serializable is an NP-complete problem. Several efficiently recognizable subclasses of the class of serializable histories are therefore introduced; most of these subclasses correspond to serializability principles existing in the literature and used in practice. Two new principles that subsume all previously known ones are also proposed. Necessary and sufficient conditions are given for a class of histories to be the output of an efficient history scheduler; these conditions imply that there can be no efficient scheduler that outputs all serializable histories, and also that all subclasses of serializable histories studied above have an efficient scheduler. Finally, it is shown how these results can be extended to far more general transaction models, to transactions with partly interpreted functions, and to distributed database systems.

1,028 citations

Journal ArticleDOI
TL;DR: The B-tree and its variants have been found to be highly useful for storing large amounts of information, especially on secondary storage devices, and a single additional “link” pointer in each node allows a process to easily recover from tree modifications performed by other concurrent processes.
Abstract: The B-tree and its variants have been found to be highly useful (both theoretically and in practice) for storing large amounts of information, especially on secondary storage devices. We examine the problem of overcoming the inherent difficulty of concurrent operations on such structures, using a practical storage model. A single additional “link” pointer in each node allows a process to easily recover from tree modifications performed by other concurrent processes. Our solution compares favorably with earlier solutions in that the locking scheme is simpler (no read-locks are used) and only a (small) constant number of nodes are locked by any update process at any given time. An informal correctness proof for our system is given.

528 citations

Journal ArticleDOI
Rudolf Bayer, M. Schkolnick
TL;DR: It is concluded that B-trees can be used advantageously in a multi-user environment because the solution presented here uses simple locking protocols which can be tuned to specific requirements.
Abstract: Concurrent operations on B-trees pose the problem of insuring that each operation can be carried out without interfering with other operations being performed simultaneously by other users. This problem can become critical if these structures are being used to support access paths, like indexes, to data base systems. In this case, serializing access to one of these indexes can create an unacceptable bottleneck for the entire system. Thus, there is a need for locking protocols that can assure integrity for each access while at the same time providing a maximum possible degree of concurrency. Another feature required from these protocols is that they be deadlock free, since the cost to resolve a deadlock may be high. Recently, there has been some questioning on whether B-tree structures can support concurrent operations. In this paper, we examine the problem of concurrent access to B-trees. We present a deadlock free solution which can be tuned to specific requirements. An analysis is presented which allows the selection of parameters so as to satisfy these requirements. The solution presented here uses simple locking protocols. Thus, we conclude that B-trees can be used advantageously in a multi-user environment.

405 citations

Frequently Asked Questions (1)
Q1. What are the contributions in "On optimistic methods for concurrency control"?

In this paper, two families of nonlocking concurrency controls are presented. Applications for which these methods should be more efficient than locking are discussed.