On optimistic methods for concurrency control

Hsiang-Tsung Kung, John T. Robinson

01 Jun 1981-ACM Transactions on Database Systems (ACM)-Vol. 6, Iss: 2, pp 213-226

TL;DR: In this paper, two families of non-locking concurrency controls are presented, which are optimistic in the sense that they rely mainly on transaction backup as a control mechanism, "hoping" that conflicts between transactions will not occur.

Abstract: Most current approaches to concurrency control in database systems rely on locking of data objects as a control mechanism. In this paper, two families of nonlocking concurrency controls are presented. The methods used are “optimistic” in the sense that they rely mainly on transaction backup as a control mechanism, “hoping” that conflicts between transactions will not occur. Applications for which these methods should be more efficient than locking are discussed.

Summary

1. INTRODUCTION

The correctness criteria used for validation are based on the notion of serial equivalence [4, 12, 14].
In the next two sections concurrency controls that rely on the serial equivalence criteria developed in Section 3 for validation are presented.
The family of concurrency controls in Section 4 has serial final validation steps, while the concurrency controls of Section 5 have completely parallel validation, although at a higher total cost.

3. THE VALIDATION PHASE

Here "∘" is the usual notation for functional composition.
The idea behind this correctness criterion is that, first, each transaction is assumed to have been written so as to individually preserve the integrity of the shared data structure.
That is, if d satisfies all integrity criteria, then for each Ti, Ti(d) satisfies all integrity criteria.
Now, if d_initial satisfies all integrity criteria and the concurrent execution of T1, T2, ..., Tn is serially equivalent, then from (1), by repeated application of the integrity-preserving property of each transaction, d_final satisfies all integrity criteria.
If semantic information is available, then other approaches may be more attractive (see, e.g., [6, 8]).
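The serial equivalence criterion can be illustrated with a small brute-force check. This is only a sketch under stated assumptions: transactions are modeled as plain Python functions from a state to a state, the "database" is a single integer, and the names `compose_in_order`, `is_serially_equivalent`, `add_ten`, and `double` are hypothetical and do not come from the paper.

```python
from functools import reduce
from itertools import permutations


def compose_in_order(transactions, d):
    """Apply the transactions one after another (a serial schedule) to state d."""
    return reduce(lambda state, t: t(state), transactions, d)


def is_serially_equivalent(transactions, d_initial, d_final):
    """True if some serial order of the transactions maps d_initial to d_final."""
    return any(
        compose_in_order(order, d_initial) == d_final
        for order in permutations(transactions)
    )


# Toy transactions over an integer "database" (illustrative assumptions).
def add_ten(d):
    return d + 10


def double(d):
    return d * 2


# Starting from 1, the two serial orders give (1 + 10) * 2 = 22 and 1 * 2 + 10 = 12.
print(is_serially_equivalent([add_ten, double], 1, 22))
print(is_serially_equivalent([add_ten, double], 1, 21))
```

Trying every permutation is exponential in the number of transactions, so this is useful only as a definition-level illustration, not as a validation procedure.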

3.1 Validation of Serial Equivalence

Condition (1) states that Ti actually completes before Tj starts.
Condition (2) states that the writes of Ti do not affect the read phase of Tj, and that Ti finishes writing before Tj starts writing, hence does not overwrite Tj (also, note that Tj cannot affect the read phase of Ti).
Finally, condition (3) is similar to condition (2) but does not require that Ti finish writing before Tj starts writing; it simply requires that Ti not affect the read phase or the write phase of Tj (again note that Tj cannot affect the read phase of Ti, by the last part of the condition).
See [12] for a set of similar conditions for serialization.
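The three conditions can be sketched as a predicate over read sets, write sets, and phase boundaries. This is a minimal sketch, assuming each transaction records logical timestamps for its phases; the `Txn` class, its field names, and `may_precede` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)
    start_read: int = 0   # logical time the read phase began (assumed bookkeeping)
    end_read: int = 0     # logical time the read phase ended
    start_write: int = 0  # logical time the write phase began
    end_write: int = 0    # logical time the write phase ended


def may_precede(ti: Txn, tj: Txn) -> bool:
    """True if Ti may be ordered before Tj under one of conditions (1)-(3)."""
    # (1) Ti completes before Tj starts.
    if ti.end_write < tj.start_read:
        return True
    # (2) Ti's writes miss Tj's reads, and Ti finishes writing before Tj starts writing.
    if not (ti.write_set & tj.read_set) and ti.end_write < tj.start_write:
        return True
    # (3) Ti's writes miss Tj's reads and writes, and Ti finishes reading first.
    if (not (ti.write_set & (tj.read_set | tj.write_set))
            and ti.end_read < tj.end_read):
        return True
    return False
```

Condition (3) is the one that permits overlapping write phases, which is why it also has to rule out write-write intersections.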

3.2 Assigning Transaction Numbers

When a transaction number is needed, the counter is incremented, and the resulting value returned.
Also, transaction numbers must be assigned somewhere before validation, since the validation conditions above require knowledge of the transaction number of the transaction being validated.
On first thought, the authors might assign transaction numbers at the beginning of the read phase; however, this is not optimistic (hence contrary to the philosophy of this paper) for the following reason.
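Consider two transactions T1 and T2 that start at roughly the same time and are assigned transaction numbers n and n + 1, respectively. Even if T2 completes its read phase much earlier than T1, T2 must wait before being validated for the completion of the read phase of T1, since the validation of T2 in this case relies on knowledge of the write set of T1.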
For these and similar considerations the authors assign transaction numbers at the end of the read phase.

3.3 Some Practical Considerations

Since the concurrency control can only maintain finitely many write sets, the authors have a difficulty (this difficulty does not arise if transaction numbers are assigned at the beginning of the read phase).
Of course, the authors take the optimistic approach and assume such transactions are very rare; still, a solution is needed.
The authors solve this problem by only requiring the concurrency control to maintain some finite number of the most recent write sets where the number is large enough to validate almost all transactions (they say write set a is more recent than write set b if the transaction number associated with a is greater than that associated with b).
In such a case the transaction is aborted and restarted, receiving a new transaction number at the completion of the read phase.
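One way to picture the bounded history of write sets is the sketch below. It is an illustration under assumptions, not the paper's procedure: the class `RecentWriteSets`, the `capacity` parameter, and the three string results are hypothetical, and the sketch simplifies by assigning the transaction number at the moment a write set is recorded.

```python
from collections import deque


class RecentWriteSets:
    """Keeps only the `capacity` most recent (transaction number, write set) pairs."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.tnc = 0  # transaction number counter
        self.history = deque(maxlen=capacity)

    def record(self, write_set):
        """Increment the counter and remember the write set under the new number."""
        self.tnc += 1
        self.history.append((self.tnc, frozenset(write_set)))
        return self.tnc

    def validate(self, start_tn, read_set):
        """Check a transaction that began when the counter was start_tn.

        Returns "restart" if a needed write set is no longer retained,
        "conflict" if a retained newer write set intersects read_set,
        and "ok" otherwise.
        """
        needed_from = start_tn + 1
        if needed_from <= self.tnc and self.history[0][0] > needed_from:
            return "restart"
        for tn, write_set in self.history:
            if tn > start_tn and write_set & read_set:
                return "conflict"
        return "ok"
```

A long-running transaction whose starting number has fallen out of the retained window gets "restart", which corresponds to the abort-and-restart case described above.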

4. SERIAL VALIDATION

Until now the authors have not considered the question of read-only transactions, or queries.
This need not occur in a critical section, so the above discussion on multiple validation stages does not apply to queries.
This method for handling queries also applies to the concurrency controls of the next section.
Note that for query-dominant systems, validation will often be trivial.
For this type of system an optimistic approach appears ideal.

5. PARALLEL VALIDATION

Finally, a solution is possible where transactions that have been invalidated by a transaction in finish active wait for that transaction to either be invalidated, and hence ignored, or validated, causing backup (this possibility was pointed out by James Saxe).
This solution involves a more sophisticated process communication mechanism than the binary semaphore needed to implement the critical sections above.

6. ANALYSIS OF AN APPLICATION

The authors' final and most important consideration is determining how likely it is that one insertion will cause another concurrent insertion to be invalidated.
Clearly this depends on the size of the write set of I1, and this is determined by the degree of splitting.
The authors also assume that an insertion is equally likely to access any path from root to leaf.

7. CONCLUSIONS

A more general problem is the following: Consider the case of a database system where transaction conflict is rare, but not rare enough to justify the use of any of the optimistic approaches presented here.
Some type of generalized concurrency control is needed that provides "just the right amount" of locking versus backup.
Ideally, this should vary as the likelihood of transaction conflict in the system varies.

On Optimistic Methods for Concurrency Control

H.T. KUNG and JOHN T. ROBINSON

Carnegie-Mellon University

Most current approaches to concurrency control in database systems rely on locking of data objects

as a control mechanism. In this paper, two families of nonlocking concurrency controls are presented.

The methods used are “optimistic” in the sense that they rely mainly on transaction backup as a

control mechanism, “hoping” that conflicts between transactions will not occur. Applications for

which these methods should be more efficient than locking are discussed.

Key Words and Phrases: databases, concurrency controls, transaction processing

CR Categories: 4.32, 4.33

1. INTRODUCTION

Consider the problem of providing shared access to a database organized as a

collection of objects. We assume that certain distinguished objects, called the

roots, are always present and access to any object other than a root is gained only

by first accessing a root and then following pointers to that object. Any sequence

of accesses to the database that preserves the integrity constraints of the data is

called a transaction (see, e.g., [4]).

If our goal is to maximize the throughput of accesses to the database, then

there are at least two cases where highly concurrent access is desirable.

(1) The amount of data is sufficiently great that at any given time only a fraction

of the database can be present in primary memory, so that it is necessary to

swap parts of the database from secondary memory as needed.

(2) Even if the entire database can be present in primary memory, there may be

multiple processors.

In both cases the hardware will be underutilized if the degree of concurrency

is too low.

However, as is well known, unrestricted concurrent access to a shared database

will, in general, cause the integrity of the database to be lost. Most current

Permission to copy without fee all or part of this material is granted provided that the copies are not

made or distributed for direct commercial advantage, the ACM copyright notice and the title of the

publication and its date appear, and notice is given that copying is by permission of the Association

for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific

permission.

This research was supported in part by the National Science Foundation under Grant MCS 78-236-76

and the Office of Naval Research under Contract N00014-76-C-0370.

Authors’ address: Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA

15213.

© 1981 ACM 0362-5915/81/0600-0213 $00.75

ACM Transactions on Database Systems, Vol. 6, No. 2, June 1981, Pages 213-226.

approaches to this problem involve some type of locking. That is, a mechanism

is provided whereby one process can deny certain other processes access to some

portion of the database. In particular, a lock may be associated with each node of

the directed graph, and any given process is required to follow some locking

protocol so as to guarantee that no other process can ever discover any lack of

integrity in the database temporarily caused by the given process.

The locking approach has the following inherent disadvantages.

(1) Lock maintenance represents an overhead that is not present in the sequential

case. Even read-only transactions (queries), which cannot possibly affect the

integrity of the data, must, in general, use locking in order to guarantee that

the data being read are not modified by other transactions at the same time.

Also, if the locking protocol is not deadlock-free, deadlock detection must be

considered to be part of lock maintenance overhead.

(2) There are no general-purpose deadlock-free locking protocols for databases

that always provide high concurrency. Because of this, some research has

been directed at developing special-purpose locking protocols for various

special cases. For example, in the case of B-trees [1], at least nine locking

protocols have been proposed [2, 3, 9, 10, 13].

(3) In the case that large parts of the database are on secondary memory,

concurrency is significantly lowered whenever it is necessary to leave some

congested node locked (a congested node is one that is often accessed, e.g.,

the root of a tree) while waiting for a secondary memory access.

(4) To allow a transaction to abort itself when mistakes occur, locks cannot be

released until the end of the transaction. This may again significantly lower

concurrency.

(5) Most important for the purposes of this paper, locking may be necessary only

in the worst case. Consider the following simple example: The directed graph

consists solely of roots, and each transaction involves one root only, any root

equally likely. Then if there are n roots and two processes executing

transactions at the same rate, locking is really needed (if at all) every n

transactions, on the average.
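The arithmetic behind the example in (5) can be checked by enumeration. This is a small sketch, assuming each of the two processes picks its root independently and uniformly at random; the function name `conflict_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def conflict_probability(n_roots):
    """Probability that two independent, uniform root choices coincide."""
    pairs = list(product(range(n_roots), repeat=2))
    conflicts = sum(1 for a, b in pairs if a == b)
    return Fraction(conflicts, len(pairs))


# With n roots the probability is 1/n, so a conflict is expected
# about once every n transactions.
for n in (2, 10, 1000):
    p = conflict_probability(n)
    print(n, p, 1 / p)
```

Of the n * n equally likely pairs of choices, exactly n pick the same root, which gives the 1/n figure and hence one potential conflict per n transactions on average.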

In general, one may expect the argument of (5) to hold whenever (a) the

number of nodes in the graph is very large compared to the total number of nodes

involved in all the running transactions at a given time, and (b) the probability

of modifying a congested node is small. In many applications, (a) and (b) are

designed to hold (see Section 6 for the B-tree application).

Research directed at finding deadlock-free locking protocols may be seen as an

attempt to lower the expense of concurrency control by eliminating transaction

backup as a control mechanism. In this paper we consider the converse problem,

that of eliminating locking. We propose two families of concurrency controls that

do not use locking. These methods are “optimistic” in the sense that they rely for

efficiency on the hope that conflicts between transactions will not occur. If (5)

does hold, such conflict will be rare. This approach also has the advantage that

it is completely general, applying equally well to any shared directed graph

structure and associated access algorithms. Since locks are not used, it is

deadlock-free (however, starvation is a possible problem, a solution for which we discuss).

Fig. 1. The three phases of a transaction: read, validation, and write, in that order along the time axis.

It is also possible using this approach to avoid problems (3) and (4) above. Finally,

if the transaction pattern becomes query dominant (i.e., most transactions are

read-only), then the concurrency control overhead becomes almost totally

negligible (a partial solution to problem (1)).

The idea behind this optimistic approach is quite simple, and may be

summarized as follows.

(1) Since reading a value or a pointer from a node can never cause a loss of

integrity, reads are completely unrestricted (however, returning a result from

a query is considered to be equivalent to a write, and so is subject to validation

as discussed below).

(2) Writes are severely restricted. It is required that any transaction consist of

two or three phases: a read phase, a validation phase, and a possible write

phase (see Figure 1). During the read phase, all writes take place on local

copies of the nodes to be modified. Then, if it can be established during the

validation phase that the changes the transaction made will not cause a loss

of integrity, the local copies are made global in the write phase. In the case

of a query, it must be determined that the result the query would return is

actually correct. The step in which it is determined that the transaction will

not cause a loss of integrity (or that it will return the correct result) is called

validation.

If, in a locking approach, locking is only necessary in the worst case, then in an

optimistic approach validation will fail also only in the worst case. If validation

does fail, the transaction will be backed up and start over again as a new

transaction. Thus a transaction will have a write phase only if the preceding

validation succeeds.

In Section 2 we discuss in more detail the read and write phases of transactions.

In Section 3 a particularly strong form of validation is presented. The correctness

criteria used for validation are based on the notion of serial equivalence [4, 12,

14]. In the next two sections concurrency controls that rely on the serial

equivalence criteria developed in Section 3 for validation are presented. The family of

concurrency controls in Section 4 have serial final validation steps, while the

concurrency controls of Section 5 have completely parallel validation, at however

higher total cost. In Section 6 we analyze the application of optimistic methods

to controlling concurrent insertions in B-trees. Section 7 contains a summary and

a discussion of future research.

2. THE READ AND WRITE PHASES

In this section we briefly discuss how the concurrency control can support the

read and write phases of user-programmed transactions (in a manner invisible to

the user), and how this can be implemented efficiently. The validation phase will

be treated in the following three sections.

We assume that an underlying system provides for the manipulation of objects

of various types. For simplicity, assume all objects are of the same type. Objects

are manipulated by the following procedures, where n is the name of an object,

i is a parameter to the type manager, and v is a value of arbitrary type (v could

be a pointer, i.e., an object name, or data):

    create            create a new object and return its name.
    delete(n)         delete object n.
    read(n, i)        read item i of object n and return its value.
    write(n, i, v)    write v as item i of object n.

In order to support the read and write phases of transactions we also use the

following procedures:

    copy(n)             create a new object that is a copy of object n and return its name.
    exchange(n1, n2)    exchange the names of objects n1 and n2.

The concurrency control is invisible to the user; transactions are written as if

the above procedures were used directly. However, transactions are required to

use the syntactically identical procedures tcreate, tdelete, tread, and twrite. For

each transaction, the concurrency control maintains sets of object names accessed

by the transaction. These sets are initialized to be empty by a tbegin call. The

body of the user-written transaction is in fact the read phase mentioned in the

introduction; the subsequent validation phase does not begin until after a tend

call. The procedures tbegin and tend are shown in detail in Sections 4 and 5. The

semantics of the remaining procedures are as follows:

    tcreate = (
        n := create;
        create set := create set ∪ {n};
        return n)

    twrite(n, i, v) = (
        if n ∈ create set
            then write(n, i, v)
        else if n ∈ write set
            then write(copies[n], i, v)
        else (
            m := copy(n);
            copies[n] := m;
            write set := write set ∪ {n};
            write(copies[n], i, v)))

    tread(n, i) = (
        read set := read set ∪ {n};
        if n ∈ write set
            then return read(copies[n], i)
        else
            return read(n, i))

    tdelete(n) = (
        delete set := delete set ∪ {n}).

Above, copies is an associative vector of object names, indexed by object name.

We see that in the read phase, no global writes take place. Instead, whenever the

first write to a given object is requested, a copy is made, and all subsequent writes

are directed to the copy. This copy is potentially global but is inaccessible to

other transactions during the read phase by our convention that all nodes are

accessed only by following pointers from a root node. If the node is a root node,

the copy is inaccessible since it has the wrong name (all transactions “know” the

global names of root nodes). It is assumed that no root node is created or deleted,

that no dangling pointers are left to deleted nodes, and that created nodes become

accessible by writing new pointers (these conditions are part of the integrity

criteria for the data structure that each transaction is required to individually

preserve).
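The read-phase bookkeeping described above can be sketched in Python. This is an illustration under assumptions rather than the paper's implementation: `Store` is a hypothetical in-memory stand-in for the underlying object manager, objects are dictionaries of items, and the constructor of `Transaction` plays the role of the tbegin call.

```python
import itertools


class Store:
    """Hypothetical in-memory object manager: object name -> dict of items."""

    def __init__(self):
        self.objects = {}
        self._next_name = itertools.count(1)

    def create(self):
        n = next(self._next_name)
        self.objects[n] = {}
        return n

    def copy(self, n):
        m = self.create()
        self.objects[m] = dict(self.objects[n])
        return m

    def read(self, n, i):
        return self.objects[n][i]

    def write(self, n, i, v):
        self.objects[n][i] = v


class Transaction:
    def __init__(self, store):
        # Plays the role of tbegin: all sets start empty.
        self.store = store
        self.read_set = set()
        self.write_set = set()
        self.create_set = set()
        self.delete_set = set()
        self.copies = {}  # original object name -> name of its local copy

    def tcreate(self):
        n = self.store.create()
        self.create_set.add(n)
        return n

    def twrite(self, n, i, v):
        if n in self.create_set:
            # Objects created by this transaction are not yet reachable by others.
            self.store.write(n, i, v)
        else:
            if n not in self.write_set:
                # First write to n: make a local copy and redirect writes to it.
                self.copies[n] = self.store.copy(n)
                self.write_set.add(n)
            self.store.write(self.copies[n], i, v)

    def tread(self, n, i):
        self.read_set.add(n)
        if n in self.write_set:
            return self.store.read(self.copies[n], i)
        return self.store.read(n, i)

    def tdelete(self, n):
        # Deletion is only recorded here; it is carried out during cleanup.
        self.delete_set.add(n)
```

In this sketch a transaction sees its own writes through `tread`, while a direct `Store.read` of the original object still returns the old value until a write phase makes the copy global.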

When the transaction completes, it will request its validation and write phases

via a tend call. If validation succeeds, then the transaction enters the write phase,

which is simply

for n ∈ write set do exchange(n, copies[n]).

After the write phase all written values become “global,” all created nodes

become accessible, and all deleted nodes become inaccessible. Of course some

cleanup is necessary, which we do not consider to be part of the write phase since

it does not interact with other transactions:

(for n ∈ delete set do delete(n);

for n ∈ write set do delete(copies[n])).

This cleanup is also necessary if a transaction is aborted.

Note that since objects are virtual (objects are referred to by name, not by

physical address), the exchange operation, and hence the write phase, can be

made quite fast: essentially, all that is necessary is to exchange the physical

address parts of the two object descriptors.
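The descriptor swap can be sketched as follows. This is a minimal illustration under assumptions: `ObjectTable` is a hypothetical name table mapping object names to positions in a storage list (standing in for physical addresses), and `write_phase` mirrors the loop over the write set.

```python
class ObjectTable:
    """Hypothetical object table: name -> index into storage (the 'physical address')."""

    def __init__(self):
        self.descriptor = {}
        self.storage = []

    def create(self, name, payload):
        self.storage.append(payload)
        self.descriptor[name] = len(self.storage) - 1

    def read(self, name):
        return self.storage[self.descriptor[name]]

    def exchange(self, n1, n2):
        # Swap only the address parts; no object contents are moved.
        d = self.descriptor
        d[n1], d[n2] = d[n2], d[n1]


def write_phase(table, write_set, copies):
    """Make each local copy global by exchanging names with its original."""
    for n in write_set:
        table.exchange(n, copies[n])


table = ObjectTable()
table.create("root", {"x": 1})       # global version
table.create("root_copy", {"x": 2})  # local copy written during the read phase
write_phase(table, {"root"}, {"root": "root_copy"})
print(table.read("root"), table.read("root_copy"))
```

After the exchange, the name of the copy refers to the old contents, which is what the cleanup step then deletes.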

Finally, we note that the concept of two-phase transactions appears to be quite

valuable for recovery purposes, since at the end of the read phase, all changes

that the transaction intends to make to the data structure are known.

3. THE VALIDATION PHASE

A widely used criterion for verifying the correctness of concurrent execution of

transactions has been variously called serial equivalence [4], serial reproducibility

[11], and linearizability [14]. This criterion may be defined as follows.

Let transactions T1, T2, ..., Tn be executed concurrently. Denote an instance of

the shared data structure by d, and let D be the set of all possible d, so that each

Ti may be considered as a function:

Ti: D → D.
