Software transactional memory for dynamic-sized data structures

TL;DR: A new form of software transactional memory designed to support dynamic-sized data structures, and a novel non-blocking implementation of this STM that uses modular contention managers to ensure progress in practice.
Abstract: We propose a new form of software transactional memory (STM) designed to support dynamic-sized data structures, and we describe a novel non-blocking implementation. The non-blocking property we consider is obstruction-freedom. Obstruction-freedom is weaker than lock-freedom; as a result, it admits substantially simpler and more efficient implementations. A novel feature of our obstruction-free STM implementation is its use of modular contention managers to ensure progress in practice. We illustrate the utility of our dynamic STM with a straightforward implementation of an obstruction-free red-black tree, thereby demonstrating a sophisticated non-blocking dynamic data structure that would be difficult to implement by other means. We also present the results of simple preliminary performance experiments that demonstrate that an "early release" feature of our STM is useful for reducing contention, and that our STM lends itself to the effective use of modular contention managers.

Summary (3 min read)

1. INTRODUCTION

  • Using locks in programs for shared-memory multiprocessors introduces well-known software engineering problems.
  • Coarse-grained locks, which protect relatively large amounts of data, generally do not scale: threads block one another even when they do not really interfere, and the lock becomes a source of contention.
  • Fine-grained locks can mitigate these scalability problems, but they introduce software engineering problems as the locking conventions for guaranteeing correctness and avoiding deadlock become complex and error-prone.
  • A thread preempted while holding a lock will obstruct other threads.

  • Dynamic Software Transactional Memory (DSTM) is a low-level application programming interface (API) for synchronizing shared data without using locks.
  • Transactional memory supports a computational model in which each thread announces the start of a transaction, executes a sequence of operations on shared objects, and then tries to commit the transaction.
  • Like stronger non-blocking progress conditions such as lock-freedom and wait-freedom, obstruction-freedom ensures that a halted thread cannot prevent other threads from making progress.
  • It consults a contention manager to determine whether it should abort the other transaction immediately or wait for some time to allow the other transaction a chance to complete.
  • The authors believe that this approach will yield simpler and more efficient concurrent data structures, which will help accelerate their widespread acceptance and deployment.

2. OVERVIEW AND EXAMPLES

  • Ultimately, the authors would like DSTM to support nested transactions, so that a class whose methods use transactions can invoke from within a transaction methods of other classes that also use transactions.
  • The authors have not acquired sufficient experience programming with DSTM to decide on the appropriate nesting semantics, so they do not specify this behavior for now.

2.1 Extended Example

  • An attractive feature of DSTM is that the authors can reason about this code almost as if it were sequential.
  • The principal differences are the need to catch Denied exceptions and to retry transactions that fail to commit, and the need to distinguish between transactional nodes and non-transactional list elements.
  • Note that after catching a Denied exception, the authors must still call commitTransaction() to terminate the transaction, even though it is guaranteed to fail.

2.2 Conflict Reduction

  • DSTM provides several mechanisms for eliminating unneeded conflicts.
  • One conventional mechanism is to allow transactions to open nodes in read-only mode, indicating that the transaction will not modify the object.

List list = (List)node.open(READ);

  • Concurrent transactions that open the same transactional object for reading do not conflict.
  • Once an object has been released, other transactions accessing that object do not conflict with the releasing transaction over the released object.
  • The effects in this case are potentially even worse than merely observing an inconsistent state, because the releasing transaction can actually commit even though it is not linearizable.
  • A transaction that adds an element to the list "upgrades" its access to the node to be modified by reopening that node in WRITE mode.
  • Because a transaction may open the same object several times, the DSTM matches, for each object, invocations of release with invocations of open(READ); this is analogous to the technique of lock coupling (see [5], e.g.), but does not use any locks.

3. IMPLEMENTATION

  • A transaction object (class Transaction) has a status field that is initialized to be ACTIVE, and is later set to either COMMITTED or ABORTED using a CAS instruction.
  • CAS functionality is provided by the AtomicReference class in the experimental prototype of Doug Lea's java.util.concurrent package [1].

3.1 Opening a Transactional Object

  • To match invocations of open(READ) and release, the transaction also maintains a counter for each pair in its readonly table.
  • If an object is opened in READ mode when it already has an entry in the table, the transaction increments the corresponding counter instead of inserting a new pair.
  • This counter is decremented by the release method, and the pair is removed when the counter is reduced to zero.
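
The bullets above describe a per-object counter that pairs each open(READ) with a matching release. The following minimal sketch shows one way such bookkeeping could look; the ReadTable and ReadEntry names are illustrative assumptions, not classes defined in the paper.

import java.util.HashMap;
import java.util.Map;

// Hypothetical bookkeeping for read-mode opens, sketched from the description above.
final class ReadTable {
    private static final class ReadEntry {
        final Object version;  // version observed at first open(READ); kept for validation (not shown)
        int count = 1;         // number of unmatched open(READ) calls
        ReadEntry(Object version) { this.version = version; }
    }

    private final Map<Object, ReadEntry> entries = new HashMap<>();

    // Called when a transaction opens an object in READ mode.
    void noteOpenRead(Object tmObject, Object version) {
        ReadEntry e = entries.get(tmObject);
        if (e != null) {
            e.count++;  // already present: increment the counter instead of adding a new pair
        } else {
            entries.put(tmObject, new ReadEntry(version));
        }
    }

    // Called by release(); the pair is removed only when the counter reaches zero.
    void noteRelease(Object tmObject) {
        ReadEntry e = entries.get(tmObject);
        if (e == null) return;  // e.g., the object was upgraded to WRITE mode: silently ignore
        if (--e.count == 0) {
            entries.remove(tmObject);
        }
    }
}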

3.2 Validating and Committing a Transaction

  • This problem does not arise in their Java implementation, because garbage collection (GC) ensures that a Locator object does not get recycled until no thread has a pointer to it.
  • While GC eliminates the ABA problem in this case, the authors caution the reader against assuming that the ABA problem can never occur in environments that support GC.
  • Committing a transaction requires two steps: validating the entries in the read-only table as described above, and calling CAS to attempt to change the status field of the Transaction object from ACTIVE to COMMITTED.

3.3 Costs

  • Validating a transaction that has opened W objects for writing and R objects for reading (that have not been released) requires O(R) work.
  • Because validation must be performed whenever an object is opened and when the transaction commits, the total overhead due to the DSTM implementation for a transaction that opens R objects for reading and W objects for writing is O((R + W)R), plus the cost of copying each of the W objects opened for writing once.
  • Note that, in addition to reducing the potential for conflict, releasing objects opened for reading also reduces the overhead due to validation: released objects do not need to be validated.
  • Thus, if at most K objects are open for reading at any time, then the total overhead for a transaction is only O((R + W)K) plus the cost of cloning the W objects.
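
Written out, the accounting behind the bounds in the bullets above (using the paper's R, W, and K) is:

\[ \text{total overhead} \;=\; \underbrace{(R + W + 1)}_{\text{validations: one per open, one at commit}} \times O(R) \;=\; O\big((R + W)\,R\big), \]

plus the cost of copying each of the W objects opened for writing; if early release keeps at most K objects open for reading at any time, each validation costs only O(K), giving O((R + W)K) plus the cloning cost.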

4. CONTENTION MANAGEMENT

  • In addition to their explicit purposes, the contention manager's methods may implement other measures, such as backoff and queuing, to manage contention.
  • The authors have done only preliminary work using these methods to implement some simple contention management strategies, and they expect the ContentionManager interface to evolve as they gain experience with which methods, especially notification methods, are useful for implementing more sophisticated strategies.

4.1 Examples

  • One can imagine many variations on this strategy, as well as different strategies based on queuing rather than backoff combined with spinning.
  • Discovering which strategies work best remains an open area of research.
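
As a concrete illustration of the kind of backoff-and-spin strategy mentioned above, here is a minimal sketch of one possible contention management policy. The interface and method names shown are assumptions for illustration, not the ContentionManager interface actually defined in the paper.

import java.util.Random;

// A hypothetical backoff-and-spin contention management policy, sketched from the
// strategies discussed above. The interface and method names are illustrative only.
interface SimpleContentionManager {
    // Decide whether the calling transaction should abort the conflicting one now.
    boolean shouldAbortEnemyNow();
    // Called when a conflict is observed, so the policy can back off before retrying.
    void onConflict();
}

final class BackoffManager implements SimpleContentionManager {
    private static final int MAX_ATTEMPTS = 10;
    private static final long BASE_DELAY_NANOS = 1_000;
    private final Random random = new Random();
    private int attempts = 0;

    public boolean shouldAbortEnemyNow() {
        // Be polite for a bounded number of attempts, then abort the other transaction.
        return attempts >= MAX_ATTEMPTS;
    }

    public void onConflict() {
        attempts++;
        long delay = BASE_DELAY_NANOS << Math.min(attempts, 20);  // exponential backoff
        long deadline = System.nanoTime() + (long) (random.nextDouble() * delay);
        while (System.nanoTime() < deadline) {
            Thread.onSpinWait();  // spin; a real policy might instead yield or sleep
        }
    }
}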

5. RESULTS

  • One shortcoming of their current DSTM implementation with respect to the range of possible contention managers is that there is no way for one transaction to detect that another transaction has opened an object in READ mode.
  • As a result, a transaction that opens such an object in WRITE mode will cause the reading transaction to abort, without any opportunity to detect the conflict beforehand.
  • Clearly there is a tradeoff between the amount of synchronization needed to open an object for reading in a "visible" way in order to enable competing transactions to "be polite" and the benefit derived from doing so.
  • The authors are currently working on some ideas in this direction.

6. CONCLUDING REMARKS

  • Thanks to Ron Larson for getting us access to the Sun Fire 15K computer, and to Doug Lea for his experimental java.util.concurrent package.
  • Thanks also to Guy Steele and Jan-Willem Maessen for useful feedback, especially about the DSTM interface.
  • (Jan also suggested nulling out the extra pointer of locators whose transactions are aborted or committed, to allow garbage collection of the obsolete version.)


Software Transactional Memory for Dynamic-Sized Data Structures
Maurice Herlihy
Department of Computer Science
Brown University
Providence, RI 02912, USA
mph@cs.brown.edu
Mark Moir
Sun Microsystems Laboratories
1 Network Drive
Burlington, MA 01803, USA
mark.moir@sun.com
Victor Luchangco
Sun Microsystems Laboratories
1 Network Drive
Burlington, MA 01803, USA
victor.luchangco@sun.com
William N. Scherer III
Department of Computer Science
University of Rochester
Rochester, NY 14620, USA
scherer@cs.rochester.edu
ABSTRACT
We propose a new form of software transactional memory (STM) designed to support dynamic-sized data structures, and we describe a novel non-blocking implementation. The non-blocking property we consider is obstruction-freedom. Obstruction-freedom is weaker than lock-freedom; as a result, it admits substantially simpler and more efficient implementations. A novel feature of our obstruction-free STM implementation is its use of modular contention managers to ensure progress in practice. We illustrate the utility of our dynamic STM with a straightforward implementation of an obstruction-free red-black tree, thereby demonstrating a sophisticated non-blocking dynamic data structure that would be difficult to implement by other means. We also present the results of simple preliminary performance experiments that demonstrate that an "early release" feature of our STM is useful for reducing contention, and that our STM lends itself to the effective use of modular contention managers.

Copyright is held by Sun Microsystems, Inc. PODC'03, July 13-16, 2003, Boston, Massachusetts, USA. ACM 1-58113-708-7/03/0007.
1. INTRODUCTION
Using locks in programs for shared-memory multiprocessors introduces well-known software engineering problems. Coarse-grained locks, which protect relatively large amounts of data, generally do not scale: threads block one another even when they do not really interfere, and the lock becomes a source of contention. Fine-grained locks can mitigate these scalability problems, but they introduce software engineering problems as the locking conventions for guaranteeing correctness and avoiding deadlock become complex and error-prone. Locks also cause vulnerability to thread failures and delays. For example, a thread preempted while holding a lock will obstruct other threads.
Dynamic Software Transactional Memory (DSTM) is a low-level application programming interface (API) for synchronizing shared data without using locks. A transaction is a sequence of steps executed by a single thread. Transactions are atomic: each transaction either commits (it takes effect) or aborts (its effects are discarded). Transactions are linearizable [9]: they appear to take effect in a one-at-a-time order. Transactional memory supports a computational model in which each thread announces the start of a transaction, executes a sequence of operations on shared objects, and then tries to commit the transaction. If the commit succeeds, the transaction's operations take effect; otherwise, they are discarded. Although transactional memory was originally proposed as a hardware architecture [8], there have been several proposals for non-blocking software transactional memory (STM) and similar constructs [3, 4, 10, 13, 14, 15]. (We use "non-blocking" broadly to include all progress conditions requiring that the failure or indefinite delay of a thread cannot prevent other threads from making progress, rather than as a synonym for "lock-free", as some authors prefer.)
We present the first dynamic STM. Prior STM designs required both the memory usage and the transactions to be defined statically in advance. In contrast, our new DSTM allows transactions and transactional objects to be created dynamically, and transactions may determine the sequence of objects to access based on the values observed in objects accessed earlier in the same transaction. As a result, DSTM is well suited to the implementation of dynamic-sized data structures such as lists and trees.
We have developed prototype implementations of DSTM in the C++ and Java programming languages. In this paper, we focus on the Java version, which is considerably simpler because there is no need for explicit memory management. Our Java implementation uses an experimental prototype of Doug Lea's java.util.concurrent package [1] to call native compare-and-swap (CAS) operations.

Much of the simplicity of our implementation is due to our choice of non-blocking progress condition. A synchronization mechanism is obstruction-free [7] if any thread that runs by itself for long enough makes progress (which implies that a thread makes progress if it runs for long enough without encountering a synchronization conflict from a concurrent thread). Like stronger non-blocking progress conditions such as lock-freedom and wait-freedom, obstruction-freedom ensures that a halted thread cannot prevent other threads from making progress.
Unlike lock-freedom, obstruction-freedom does not rule out livelock; interfering concurrent threads may repeatedly prevent one another from making progress. Livelock is, of course, unacceptable. Nonetheless, we believe that there is great benefit in treating the mechanisms that ensure progress as a matter of policy, evaluated by their empirical effectiveness for a given application and execution environment. As demonstrated here and elsewhere [7, 11], compared to lock-freedom, obstruction-freedom admits substantially simpler implementations that are more efficient in the absence of synchronization conflicts among concurrent threads.
Obstruction-freedom also allows simple schemes for prioritizing transactions because it allows any transaction to abort any other transaction at any time. In particular, a high-priority transaction may always abort a low-priority transaction. In a lock-based approach, the high-priority transaction would be blocked if the low-priority transaction held a lock that the high-priority transaction required, resulting in priority inversion and intricate schemes to circumvent this inversion. On the other hand, in a lock-free implementation, the high-priority transaction may have to help the low-priority transaction complete in order to ensure that some transaction will complete.
Our obstruction-free DSTM implementation provides a simple open-ended mechanism for guaranteeing progress and prioritizing transactions. Specifically, one transaction can detect that it is about to abort another before it does so. In this case, it consults a contention manager to determine whether it should abort the other transaction immediately or wait for some time to allow the other transaction a chance to complete. Contention managers in our implementation are modular: various contention management schemes can be implemented and "plugged in" without affecting the correctness of the transaction code. Thus we can design, implement and verify an obstruction-free data structure once, and then vary the contention managers to provide the desired progress guarantees and transaction prioritization. These contention managers can exploit information about time, operating system services, scheduling, hardware environments, and other details about the system and execution environment, as well as programmer-supplied information. These practical sources of information have been largely neglected in the literature on lock-free synchronization. We believe that this approach will yield simpler and more efficient concurrent data structures, which will help accelerate their widespread acceptance and deployment.
Section 2 illustrates the use of DSTM through a series of simple examples. To evaluate the utility of DSTM for implementing complex data structures, we have also used it to implement an obstruction-free red-black tree. As far as we are aware, this red-black tree is the most complex non-blocking data structure achieved to date. Although our implementation is a reasonably straightforward transformation of a sequential implementation [6], it would be very difficult to construct such a non-blocking implementation from first principles. Indeed, it would be difficult to implement even a lock-based red-black tree that allows operations accessing different parts of the tree to proceed in parallel.

Section 3 describes how our STM detects synchronization conflicts and how transactions commit and abort, with an emphasis on how the obstruction-free property simplifies the underlying algorithm. In Section 4, we describe how our implementation interfaces with contention managers, which are responsible for ensuring progress. Section 5 describes some simple experiments conducted with our prototype DSTM implementation. Concluding remarks appear in Section 6. Code for our DSTM implementation, contention managers, and related experiments is publicly available [2].
2. OVERVIEW AND EXAMPLES
In this section, we illustrate the use of DSTM through a series of simple examples. DSTM manages a collection of transactional objects, which are accessed by transactions. A transaction is a short-lived, single-threaded computation that either commits or aborts. A transactional object is a container for a regular Java object. A transaction can access the contained object by opening the transactional object, and then reading or modifying the regular object. Changes to objects opened by a transaction are not seen outside the transaction until the transaction commits. If the transaction commits, then these changes take effect; otherwise, they are discarded.

Transactional objects can be created dynamically at any time. The creation and initialization of a transactional object is not performed as part of any transaction.
Concretely, the basic unit of parallel computation is the TMThread class, which extends regular Java threads. Like a regular Java thread, it provides a run() method that does all the work. In addition, the TMThread class provides additional methods for starting, committing or aborting transactions, and for checking on the status of a transaction. Threads can be created and destroyed dynamically.

Transactional objects are implemented by the TMObject class. To implement an atomic counter, one would create a new instance of a Counter class (not shown), and then create a TMObject to hold it:

Counter counter = new Counter(0);
TMObject tmObject = new TMObject(counter);
Any class whose objects may be encapsulated within a transactional object must implement the TMCloneable interface. This interface requires the object to export a public clone() method that returns a new, logically disjoint copy of the object. DSTM uses this method when opening transactional objects, as described below. (DSTM guarantees that the object being cloned does not change during the cloning, so no synchronization is necessary in the clone() method.)
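
The Counter class mentioned above is not shown in the paper; a minimal sketch consistent with the TMCloneable contract just described might look as follows (the field and method bodies are assumptions for illustration).

// A hypothetical Counter suitable for wrapping in a TMObject: it implements
// TMCloneable by returning a logically disjoint copy, and needs no synchronization
// in clone() because DSTM guarantees the object does not change while being cloned.
class Counter implements TMCloneable {
    private int value;

    Counter(int value) {
        this.value = value;
    }

    void inc() {
        value++;            // operates on the transaction's private version
    }

    int get() {
        return value;
    }

    public Object clone() {
        return new Counter(this.value);  // new, logically disjoint copy
    }
}
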
A thread calls beginTransaction() to start a transaction. Once it is started, a transaction is active until it is either committed or aborted.

While it is active, a transaction can access the encapsulated counter by calling open():

Counter counter = (Counter)tmObject.open(WRITE);
counter.inc(); // increment the counter

The argument to open() is a constant indicating that the caller may modify the object. The open() method returns a copy of the encapsulated regular Java object, created using that object's clone() method; we call this copy the transaction's version. (The open() method actually returns an object of class java.lang.Object, which we must explicitly cast back to class Counter.)

The thread can manipulate its version of an object by calling its methods in the usual way. DSTM guarantees that no other thread can access this version, so there is no need for further synchronization.

Note that a transaction's version is meaningful only during the lifetime of the transaction. References to versions should not be stored in other objects; only references to transactional objects are meaningful across transactions.

A thread attempts to commit its transaction by invoking commitTransaction(), which returns true if and only if the commit is successful. A thread may also abort its transaction by invoking abortTransaction().
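
Putting the calls above together, a complete atomic increment would be wrapped in the usual retry loop. The following is a sketch following the retry pattern used in Figure 1, assuming the Counter class sketched earlier; it is illustrative, not code from the paper.

// Sketch of an atomic increment using the calls described above.
void atomicIncrement(TMObject tmObject) {
    TMThread thread = (TMThread) Thread.currentThread();
    while (true) {
        thread.beginTransaction();
        try {
            Counter counter = (Counter) tmObject.open(WRITE);
            counter.inc();               // modify the transaction's private version
        } catch (Denied d) {
            // cannot commit; still fall through to commitTransaction(), which will fail
        }
        if (thread.commitTransaction())  // returns true iff the commit succeeded
            return;                      // otherwise retry the whole transaction
    }
}
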
We guarantee that successfully committed transactions are linearizable: they appear to execute in a one-at-a-time order. But what kind of consistency guarantee should we make for a transaction that eventually aborts? One might argue that it does not matter, as the transaction's changes to transactional objects are discarded anyway. However, synchronization conflicts could cause a transaction to observe inconsistencies among the objects it opens before it aborts. For example, while a transaction T is executing, another transaction might modify objects that T has already accessed as well as objects that T will subsequently access. In this case, T will see only partial effects of that transaction. Because transactions should appear to execute in isolation, observing such inconsistencies may cause a transaction to have unexpected side-effects, such as dereferencing a null pointer, array bounds violations, and so on.
DSTM addresses this problem by validating a transaction whenever it opens a transactional object. Validation consists of checking for synchronization conflicts, that is, whether any object opened by the transaction has since been opened in a conflicting mode by another transaction. If a synchronization conflict has occurred, open() throws a Denied exception instead of returning a value, indicating to the transaction that it cannot successfully commit in the future. The set of transactional objects opened before the first such exception is guaranteed to be consistent: open() returns the actual states of the objects at some recent instant. (Throwing an exception also allows the thread to avoid wasting effort by continuing the transaction.)
Ultimately, we would like DSTM to support nested transactions, so that a class whose methods use transactions can invoke from within a transaction methods of other classes that also use transactions. However, we have not acquired sufficient experience programming with DSTM to decide on the appropriate nesting semantics, so we do not specify this behavior for now. (Our implementation does support a rudimentary form of nested transactions, but we do not use it in any of the examples discussed in this paper.)
2.1 Extended Example
Consider a linked list whose values are stored in increasing order. We will use this list to implement an integer set (class IntSet) that provides insert(), delete(), and member() methods. Relevant code excerpts are shown in Figure 1.

public class IntSet {
    private TMObject first;

    class List implements TMCloneable {
        int value;
        TMObject next;

        List(int v) {
            this.value = v;
        }

        public Object clone() {
            List newList = new List(this.value);
            newList.next = this.next;
            return newList;
        }
    }

    public IntSet() {
        List firstList = new List(Integer.MIN_VALUE);
        this.first = new TMObject(firstList);
        firstList.next =
            new TMObject(new List(Integer.MAX_VALUE));
    }

    public boolean insert(int v) {
        List newList = new List(v);
        TMObject newNode = new TMObject(newList);
        TMThread thread =
            (TMThread) Thread.currentThread();
        while (true) {
            thread.beginTransaction();
            boolean result = true;
            try {
                List prevList =
                    (List) this.first.open(WRITE);
                List currList =
                    (List) prevList.next.open(WRITE);
                while (currList.value < v) {
                    prevList = currList;
                    currList =
                        (List) currList.next.open(WRITE);
                }
                if (currList.value == v) {
                    result = false;
                } else {
                    result = true;
                    newList.next = prevList.next;
                    prevList.next = newNode;
                }
            } catch (Denied d) {}
            if (thread.commitTransaction())
                return result;
        }
    }
}

Figure 1: Integer Set Example
The IntSet class uses two types of objects: nodes and list elements; nodes are transactional objects (class TMObject) that contain list elements (class List), which are regular Java objects. The List class has the following fields: value is the integer value, and next is the TMObject containing the next list element. We emphasize that next is a TMObject, not a list element, because this field must be meaningful across transactions. Because list elements are encapsulated within transactional objects, the List class implements the TMCloneable interface, providing a public clone() method.
The IntSet constructor allocates two sentinel nodes, containing list elements holding the minimum and maximum integer values (which we assume are never inserted or deleted).

For brevity, we focus on insert(). This method takes an integer value; it returns true if the insertion takes place, and false if the value was already in the set. It first creates a new list element to hold the integer argument, and a new node to hold that list element. It then repeatedly retries the following transaction until it succeeds. The transaction traverses the list, maintaining a "current" node and a "previous" node. At the end of the traversal, the current node contains the smallest value in the list that is greater than or equal to the value being inserted. Depending on the value of the current node, the transaction either detects a duplicate or inserts the new node between the previous and current nodes, and then tries to commit. If the commit succeeds, the method returns; otherwise, it resumes the loop to retry the transaction.
An attractive feature of DSTM is that we can reason about this code almost as if it were sequential. The principal differences are the need to catch Denied exceptions and to retry transactions that fail to commit, and the need to distinguish between transactional nodes and non-transactional list elements. Note that after catching a Denied exception, we must still call commitTransaction() to terminate the transaction, even though it is guaranteed to fail.
2.2 Conflict Reduction
A transaction A will typically fail to commit if a concurrent transaction B opens an object already opened by A. Ultimately, it is the responsibility of the contention manager (discussed in Section 4) to ensure that conflicting transactions eventually do not overlap. Even so, the IntSet implementation just described introduces a number of unnecessary conflicts. For example, consider a transaction that calls member() to test whether a particular value is in the set, running concurrently with a transaction that calls insert() to insert a larger value. One transaction will cause the other to abort, since they will conflict on opening the first node of the list. Such a conflict is unnecessary, however, because the transaction inserting the value does not modify any of the nodes traversed by the other transaction. Designing the operations to avoid such conflicts reduces the need for contention management, and thereby generally improves performance and scalability.

DSTM provides several mechanisms for eliminating unneeded conflicts. One conventional mechanism is to allow transactions to open nodes in read-only mode, indicating that the transaction will not modify the object.
List list = (List)node.open(READ);
Concurrent transactions that open the same transactional object for reading do not conflict. Because it is often difficult, especially in the face of aliasing, for a transaction to keep track of the objects it has opened, and in what mode each was opened, we allow a transaction to open an object several times, and in different modes.

The revised insert() method (not shown) walks down the list in read-only mode until it identifies which nodes to modify. It then "upgrades" its access from read-only to regular access by reopening that transactional object in WRITE mode. Read-only access is particularly useful for navigating through tree-like data structures where all transactions pass through a common root, but most do not modify the root.
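
The revised insert() with a read-mode traversal and a WRITE-mode upgrade is not shown in the paper; the following sketch, modeled on the structure of Figure 1, illustrates what such a replacement body for IntSet.insert() might look like. It is an assumption for illustration, not the authors' code, and omits the early-release optimization shown in Figure 2.

// Illustrative sketch (not the paper's code): traverse in READ mode and upgrade
// only the node to be modified to WRITE mode, as described in Section 2.2.
public boolean insert(int v) {
    List newList = new List(v);
    TMObject newNode = new TMObject(newList);
    TMThread thread = (TMThread) Thread.currentThread();
    while (true) {
        thread.beginTransaction();
        boolean result = true;
        try {
            TMObject prevNode = this.first;
            List prevList = (List) prevNode.open(READ);
            List currList = (List) prevList.next.open(READ);
            while (currList.value < v) {
                prevNode = prevList.next;
                prevList = currList;
                currList = (List) currList.next.open(READ);
            }
            if (currList.value == v) {
                result = false;                          // duplicate: nothing to modify
            } else {
                prevList = (List) prevNode.open(WRITE);  // upgrade only the node we change
                newList.next = prevList.next;
                prevList.next = newNode;
            }
        } catch (Denied d) {}
        if (thread.commitTransaction())
            return result;
    }
}
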
public boolean delete(int v) {
    TMThread thread =
        (TMThread) Thread.currentThread();
    while (true) {
        thread.beginTransaction();
        boolean result = true;
        try {
            TMObject lastNode = null;
            TMObject prevNode = this.first;
            List prevList = (List) prevNode.open(READ);
            List currList =
                (List) prevList.next.open(READ);
            while (currList.value < v) {
                if (lastNode != null)
                    lastNode.release();
                lastNode = prevNode;
                prevNode = prevList.next;
                prevList = currList;
                currList = (List) currList.next.open(READ);
            }
            if (currList.value != v) {
                result = false;
            } else {
                result = true;
                prevList = (List) prevNode.open(WRITE);
                prevList.next.open(WRITE);
                prevList.next = currList.next;
            }
        } catch (Denied d) {}
        if (thread.commitTransaction())
            return result;
    }
}

Figure 2: Delete method with early release
DSTM also provides a novel and more powerful (and more dangerous!) way to reduce conflicts. Before it commits, a transaction may release an object that it has opened in READ mode by invoking the release() method. Once an object has been released, other transactions accessing that object do not conflict with the releasing transaction over the released object. The programmer must ensure that subsequent changes by other transactions to released objects will not violate the linearizability of the releasing transaction. The danger here is similar to the problem mentioned earlier to motivate validation; releasing objects from a transaction causes future validations of that transaction to ignore the released objects. Therefore, as before, a transaction can observe inconsistent state. The effects in this case are potentially even worse because that transaction can actually commit, even though it is not linearizable.
In our IntSet example, releasing nodes is useful for navigating through the list with a minimum of conflicts, as shown in Figure 2. As a transaction traverses the list, opening each node in READ mode, it releases every node before its prev node. (This is analogous to the technique of lock coupling (see [5], e.g.), but of course does not use any locks.) A transaction that adds an element to the list "upgrades" its access to the node to be modified by reopening that node in WRITE mode. A transaction that removes an element from the list opens in WRITE mode both the node to be modified and the node to be removed. It is easy to check that these steps preserve linearizability.
Figure 3: Transactional object structure

Because a transaction may open the same object several times, the DSTM matches, for each object, invocations of release() with invocations of open(READ); an object is not actually released until release() has been invoked as many times as open(READ) for that object. Objects opened in WRITE mode by a transaction cannot be released before the transaction commits; if a transaction opens an object in READ mode and then "upgrades" to WRITE mode, subsequent requests to release the object are silently ignored.

Clearly, the release facility must be used with care; careless use may violate transaction linearizability. Nevertheless, we have found it useful for designing shared pointer-based data structures such as lists and trees, in which a transaction reads its way through a complex structure.
3. IMPLEMENTATION
We now describe our DSTM implementation. A transaction object (class Transaction) has a status field that is initialized to be ACTIVE, and is later set to either COMMITTED or ABORTED using a CAS instruction. (A CAS(a, e, n) instruction takes three parameters: an address a, an expected value e, and a new value n. If the value currently stored at address a matches the expected value e, then CAS stores the new value n at address a and returns true; we say that the CAS succeeds in this case. Otherwise, CAS returns false and does not modify the memory; we say that the CAS fails in this case.) CAS functionality is provided by the AtomicReference class in the experimental prototype of Doug Lea's java.util.concurrent package [1].
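
A minimal sketch of the status-word scheme just described, using the modern java.util.concurrent.atomic.AtomicReference rather than the experimental prototype the paper used; the class and method names here are assumptions for illustration.

import java.util.concurrent.atomic.AtomicReference;

// Sketch of the transaction status word described above: the status starts ACTIVE and
// is moved exactly once, by CAS, to COMMITTED or ABORTED.
final class TransactionSketch {
    enum Status { ACTIVE, COMMITTED, ABORTED }

    private final AtomicReference<Status> status =
        new AtomicReference<>(Status.ACTIVE);

    Status getStatus() {
        return status.get();
    }

    // Attempt to commit: succeeds only if the status is still ACTIVE.
    boolean tryCommit() {
        return status.compareAndSet(Status.ACTIVE, Status.COMMITTED);
    }

    // Any transaction may try to abort this one (obstruction-freedom permits this);
    // the CAS fails harmlessly if the transaction already committed or aborted.
    boolean tryAbort() {
        return status.compareAndSet(Status.ACTIVE, Status.ABORTED);
    }
}
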
3.1 Opening a Transactional Object
Recall that a transactional object (class TMObject) is a container for a regular Java object, which we call a version. Logically, each transactional object has three fields:

  • transaction points to the transaction that most recently opened the transactional object in WRITE mode;
  • oldObject points to an old object version; and
  • newObject points to a new object version.

The current (i.e., most recently committed) version of a transactional object is determined by the status of the transaction that most recently opened the object in WRITE mode. If that transaction is committed, then the new object is the current version and the old object is meaningless. If the transaction is aborted, then the old object is the current version and the new object is meaningless. If the transaction is active, then the old object is the current version, and the new object is the active transaction's tentative version. This version will become current if the transaction commits successfully; otherwise, it will be discarded. Observe that, if several transactional objects have most recently been opened in WRITE mode by the same active transaction, then changing the status field of that transaction from ACTIVE
to COMMITTED atomically changes the current version of each respective object from its old version to its new version; this is the essence of how atomic transactions are achieved in our implementation. (Because objects opened for reading by a transaction that successfully commits can change after the transaction successfully validates them but before it executes the CAS that changes its status, the transaction is linearized to the invocation of the commit, not to the point that the CAS succeeds. This point is subtle, and we defer a complete discussion and the proof of linearizability to the full version of this paper.)

Figure 4: Opening a transactional object after recent commit
Figure 5: Opening a transactional object after recent abort
The interesting part of our implementation is how a transaction can safely open a transactional object without changing its current version (which should occur only when the transaction successfully commits). To achieve this, we need to atomically access the three fields mentioned above. However, current architectures do not generally provide hardware support for such atomic updates. Therefore, we introduce a level of indirection, whereby each TMObject has a single reference field start that points to a Locator object (Figure 3). The Locator object contains the three fields mentioned above: transaction points to the transaction that created the Locator, and oldObject and newObject point to the old and new object versions. This indirection allows us to change the three fields atomically by calling CAS to swing the start pointer from one Locator object to another.
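
A sketch of the Locator indirection described in this paragraph, reusing the TransactionSketch class sketched earlier and with AtomicReference standing in for the CAS-capable reference; the field names follow the text, everything else is illustrative.

import java.util.concurrent.atomic.AtomicReference;

// Sketch of the structure in Figure 3: a TMObject holds a single start reference to a
// Locator, and a Locator bundles the transaction with the old and new object versions.
final class LocatorSketch {
    final TransactionSketch transaction;  // transaction that created this Locator
    final Object oldObject;               // old object version
    final Object newObject;               // new object version

    LocatorSketch(TransactionSketch transaction, Object oldObject, Object newObject) {
        this.transaction = transaction;
        this.oldObject = oldObject;
        this.newObject = newObject;
    }
}

final class TMObjectSketch {
    // Swinging this one reference with CAS changes all three logical fields atomically.
    final AtomicReference<LocatorSketch> start;

    TMObjectSketch(LocatorSketch initial) {
        this.start = new AtomicReference<>(initial);
    }
}
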
We now explain in more detail how transaction A opens a TMObject in WRITE mode. Let B be the transaction that most recently opened the object in WRITE mode. A prepares a new Locator object with transaction set to A. Suppose B is committed. A sets the new locator's oldObject field to B's newObject version (the current version) and its newObject field to a clone of that version, and then calls CAS to swing the object's start pointer from B's locator to the new one.
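
Continuing the sketch above, the WRITE-mode open just described might be realized roughly as follows for the committed case. This is a simplified assumption: it presumes TMCloneable declares a public clone() method, and it omits validation, the aborted and active-writer cases, and contention management.

// Simplified sketch of open(WRITE) when the most recent writer B has committed: build
// a new Locator whose oldObject is the current version and whose newObject is a private
// clone, then CAS the start pointer from B's Locator to ours.
static Object openWriteCommittedCase(TMObjectSketch obj, TransactionSketch self) {
    while (true) {
        LocatorSketch oldLocator = obj.start.get();
        // Assume here that oldLocator.transaction has status COMMITTED,
        // so the current version is its newObject.
        Object currentVersion = oldLocator.newObject;
        Object tentativeVersion = ((TMCloneable) currentVersion).clone();
        LocatorSketch newLocator =
            new LocatorSketch(self, currentVersion, tentativeVersion);
        if (obj.start.compareAndSet(oldLocator, newLocator)) {
            return tentativeVersion;  // the caller may now modify its private version
        }
        // CAS failed: another transaction moved the start pointer; retry.
    }
}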

References

T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms. MIT Press, 1990.

M. Herlihy and J. Wing. Linearizability: a correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems, 12(3), 1990.

M. Herlihy and J. E. B. Moss. Transactional memory: architectural support for lock-free data structures. In Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993.

M. M. Michael and M. L. Scott. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In Proceedings of the 15th Annual ACM Symposium on Principles of Distributed Computing, 1996.

N. Shavit and D. Touitou. Software transactional memory. Distributed Computing, 10(2), 1997.