
Revocation techniques for Java concurrency

01 Oct 2006-Concurrency and Computation: Practice and Experience (John Wiley & Sons, Ltd.)-Vol. 18, Iss: 12, pp 1613-1656
TL;DR: Two approaches to managing concurrency in Java using a guarded region abstraction are proposed, one of which extends the functionality of revocable monitors by implementing guarded regions as lightweight transactions that can be executed concurrently (or in parallel on multiprocessor platforms).
Abstract: This paper proposes two approaches to managing concurrency in Java using a guarded region abstraction. Both approaches use revocation of such regions—the ability to undo their effects automatically and transparently. These new techniques alleviate many of the constraints that inhibit construction of transparently scalable and robust concurrent applications. The first solution, revocable monitors, augments existing mutual exclusion monitors with the ability to dynamically resolve priority inversion and deadlock, by reverting program execution to a consistent state when such situations are detected, while preserving Java semantics. The second technique, transactional monitors, extends the functionality of revocable monitors by implementing guarded regions as lightweight transactions that can be executed concurrently (or in parallel on multiprocessor platforms). The presentation includes discussion of design and implementation issues for both schemes, as well as a detailed performance study to compare their behavior with the traditional, state-of-the-art implementation of Java monitors based on mutual exclusion. Copyright © 2006 John Wiley & Sons, Ltd.

Summary (10 min read)

1. Introduction

  • Managing complexity is a major challenge in constructing robust large-scale server applications (such as database management systems, application servers, airline reservation systems, etc.).
  • Race conditions are a serious issue for non-trivial concurrent programs.
  • Some instances of deadlock may be resolved by revocation.
  • Transactional monitors extend the functionality of revocable monitors by implementing guarded regions as lightweight transactions that can be executed concurrently (or in parallel on multiprocessor platforms).

2. Revocable monitors: Overview

  • There are several ways to remedy erroneous or undesirable behavior in concurrent programs.
  • Priority ceiling and priority inheritance solve the unbounded priority inversion problem, illustrated using the code fragment in Figure 1 (both Tl and Th execute the same code, and methods foo and bar contain an arbitrary sequence of operations).
  • The priority ceiling technique raises the priority of any locking thread to the highest priority of any thread that ever uses that lock (ie, its priority ceiling).
  • In Figure 4(d) the same thread enters mon2, updates object o2, and attempts to enter monitor mon1.
  • Note, however, that while revocable monitors are unable to assist in resolving schedule-independent deadlocks, the final observable effect of the resulting livelock (ie, repeated attempts to resolve the deadlock situation via revocation) is the same as for deadlock – none of the threads will make progress.
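The guarded-region code fragment described above (Figure 1: both Tl and Th execute the same region, with foo and bar standing for arbitrary operations) can be sketched in plain Java. Only the names foo and bar come from the paper; the class and field are illustrative. Under a priority scheduler, a low-priority thread holding the monitor while a high-priority thread blocks at the same synchronized block is precisely the inversion revocable monitors resolve by revoking the low-priority thread's region.

```java
// Hypothetical sketch of the Figure 1 fragment: Tl and Th both call region().
class Guarded {
    private int shared = 0;

    private void foo() { shared++; }   // arbitrary sequence of operations
    private void bar() { shared--; }

    void region() {
        // Th blocks at this monitor entry while Tl executes inside.
        synchronized (this) {
            foo();
            bar();
        }
    }

    int value() {
        synchronized (this) { return shared; }
    }
}
```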

3. Revocable monitors: Design

  • One of the main principles underlying the design of revocable monitors is a compliance requirement: programmers must perceive all programs executing in their system to behave exactly the same as on all other platforms implemented according to the specification of a given language.
  • In order to achieve this goal the authors must adhere to the execution semantics of the language and follow the memory access rules specified by those semantics.
  • The authors' approach is inspired by optimistic concurrency control protocols [29].
  • In the read phase all updates are redirected to the log, the validation phase verifies the integrity of all data accessed during the entire execution, and the write phase atomically installs all updates into the shared space.
  • It is only when a monitor is revoked that the information from the log is used to roll back changes performed by a thread executing that monitor.

3.1. The Java memory model (JMM)

  • For single-threaded execution the happens-before relation is defined by program order.
  • As a consequence, it is possible that partial results computed by some thread T executing within monitor M become visible to (and are used by) another thread T′ even before thread T releases M, if accesses to those updated objects performed by T′ are not mediated by first acquiring M.
  • In the execution presented in Figure 7, vol is a volatile variable and edges depict a happens-before relation.
  • As in the previous example, the execution is JMM-consistent up to the roll-back point because a read performed by T′ is allowed, but the roll-back would violate consistency.
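The visibility hazard described above can be illustrated with a (hypothetical) access pattern; all names are assumptions. Thread T writes a field inside monitor m while T′ reads it without acquiring m, so no happens-before edge orders the write and the read, and revoking T's write after T′ has consumed it would violate JMM-consistency. The sequential calls in the sketch demonstrate only the access pattern, not an actual race.

```java
// Sketch of the Section 3.1 hazard: an unmediated read of a field
// that was updated inside a monitor whose effects may later be revoked.
class Leak {
    int data = 0;                   // shared field, deliberately not volatile
    final Object m = new Object();  // monitor M

    void writerT() {
        synchronized (m) {
            data = 42;              // partial result; M might later be revoked
        }
    }

    int readerTPrime() {
        return data;                // no acquire of m: no happens-before edge
    }
}
```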

3.2. Preserving JMM-consistency

  • The authors might trace read-write dependencies among all threads and, upon rollback of a monitor, trigger a cascade of roll-backs for threads whose read-write dependencies are violated.
  • An obvious disadvantage of this approach is the need to consider all operations (including non-monitored ones) for a potential roll-back.
  • Without the ability to restore the full execution context of T′, the subsequent roll-back of monitor inner by that thread becomes infeasible.
  • The solution that does seem flexible enough to handle all possible problematic cases, and simple enough to avoid complex analyses and/or maintaining significant additional meta-data, is to disable the revocability of monitors whose roll-back could create inconsistencies with respect to the JMM.
  • If the arriving thread itself holds an outer monitor then it now becomes associated with the monitor.

4. Revocable monitors: Implementation

  • To demonstrate the validity of their approach, the authors base their implementation on a well-known Java execution environment with a high-quality compiler.
  • The authors use IBM's Jikes RVM [3], a state-of-the-art research virtual machine (VM) for Java with performance comparable to many production VMs.
  • ∗The write may additionally be guarded by other monitors nested within M.
  • †A monitor object associated with the receiver object is released upon a call to wait and re-acquired after returning from the call.

4.1. Monitor roll-back

  • The authors' implementation uses bytecode rewriting∗ to save program state (values of local variables and method parameters) for re-execution, and the existing exception mechanisms to return control to the beginning of the synchronized block.
  • The authors modify the compiler and run-time system to suppress generation (and invocation) of undesirable exception handlers during a roll-back operation, to insert access "barriers"† for logging, and to revert updates performed up to revocation of a synchronized block.

4.1.1. Bytecode transformation

  • There exist two different synchronization constructs in Java: synchronized methods and synchronized blocks.
  • For each synchronized method the authors create a non-synchronized wrapper with a signature identical to the original method.
  • The roll-back exception is thrown internally by the VM (see below), but the code to catch it is injected into the bytecode.
  • Otherwise, the handler re-throws the Rollback exception to the enclosing synchronized block.
  • Note that their solution does not preclude the use of languages that do not have a similar intermediate representation – the authors could use source-code rewriting instead.
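A source-level view of what this bytecode transformation produces might look like the following. The Rollback exception class, the renamed method, and the simulated-revocation hook are all assumptions for illustration, not the authors' actual generated code.

```java
// Hypothetical rendering of the Section 4.1.1 transformation: the original
// synchronized method is renamed, and a non-synchronized wrapper with an
// identical signature retries it whenever a Rollback is caught.
class RollbackException extends RuntimeException {}

class Account {
    private int balance = 0;
    int simulatedRevocations = 0;   // demo-only hook standing in for the VM

    // original synchronized logic (renamed; the name is assumed)
    private synchronized void deposit$orig(int amount) {
        if (simulatedRevocations > 0) {   // the VM would revoke here, after
            simulatedRevocations--;       // the undo log reverted all updates
            throw new RollbackException();
        }
        balance += amount;
    }

    // generated wrapper: identical signature, injected retry handler
    void deposit(int amount) {
        while (true) {
            try {
                deposit$orig(amount);
                return;                    // committed normally
            } catch (RollbackException e) {
                // state already rolled back; re-execute the region
            }
        }
    }

    synchronized int balance() { return balance; }
}
```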

4.1.2. Compiler and run-time modifications

  • The roll-back operation is initiated by throwing a Rollback exception.
  • To handle this, the authors augment exception handling to ignore all handlers (including finally blocks) that do not explicitly catch the Rollback exception, when it is thrown.
  • Roll-back relies on information collected by the compiler-inserted write barriers.
  • For object and array stores, three values are recorded: the target object or array, the offset of the modified field or array slot, and the previous (old) value in that field/slot.
  • Upon monitor revocation, information stored in the log is used to undo updates to shared data performed by the thread executing this monitor.
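The per-thread undo log can be sketched as follows. To keep the example self-contained, an int[] stands in for the target object and array indices stand in for field offsets; the real barriers record raw field offsets inside the VM, and the class and method names here are assumed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal sketch of the Section 4.1.2 undo log: each write barrier records
// (target, offset, old value); revocation replays the log in reverse.
class UndoLog {
    private static final class Entry {
        final int[] target; final int offset; final int oldValue;
        Entry(int[] t, int o, int v) { target = t; offset = o; oldValue = v; }
    }
    private final Deque<Entry> log = new ArrayDeque<>();

    // write barrier: log the previous value, then perform the store
    void write(int[] target, int offset, int newValue) {
        log.push(new Entry(target, offset, target[offset]));
        target[offset] = newValue;
    }

    // revocation: undo the updates in reverse order of logging
    void rollBack() {
        while (!log.isEmpty()) {
            Entry e = log.pop();
            e.target[e.offset] = e.oldValue;
        }
    }

    void commit() { log.clear(); }  // keep the updates, discard the log
}
```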

4.1.3. Discussion

  • Instead of using bytecode transformations, the authors note that an alternative strategy might be to implement re-execution entirely at the VM level (ie, all the code modifications necessary to support roll-backs would only involve the compiler and the Java run-time system).
  • This approach simply requires that the current state of the thread (ie, contents of local variables, non-volatile registers, stack pointer, etc) be remembered upon entry to a synchronized block, and restored when a roll-back is required.
  • Unfortunately, this strategy has the significant drawback that it introduces implicit control-flow edges in the program's control-flow graph that are not visible to the compiler.
  • Liveness information necessary for the garbage collector may be computed incorrectly in the presence of such roll-back actions.
  • Unfortunately, in the absence of any compiler modifications, the built-in exception handling mechanism may execute an arbitrary number of other user-defined exception handlers and finalizers, violating the transparency of their design.

4.2. Priority inversion avoidance

  • A thread acquiring a monitor deposits its priority in the header of the monitor object.
  • Before another thread can acquire the monitor, the scheduler checks whether its own priority is higher than the priority of the thread currently executing within the synchronized block.
  • The Jikes RVM does not include a priority scheduler; threads are scheduled in round-robin order.
  • This does not affect the generality of their solution nor does it invalidate the results obtained, since the problems solved by their mechanisms cannot be solved simply by using a priority scheduler.
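The priority check described above can be sketched as follows. The field and method names are assumed, and in the real system the monitor "header" lives in object metadata rather than in a Java class.

```java
// Sketch of Section 4.2: a thread acquiring the monitor deposits its priority
// in the monitor header; a higher-priority arrival triggers revocation of the
// current holder's guarded region.
class RevocableMonitor {
    private int holderPriority = Integer.MIN_VALUE;
    private boolean held = false;

    synchronized void acquire(int priority) {
        held = true;
        holderPriority = priority;   // deposit priority in the "header"
    }

    // true if the current holder should be revoked in favor of the arrival
    synchronized boolean shouldRevokeFor(int arrivingPriority) {
        return held && arrivingPriority > holderPriority;
    }

    synchronized void release() {
        held = false;
        holderPriority = Integer.MIN_VALUE;
    }
}
```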

5. Revocable monitors: Experiments

  • The authors quantify the overhead of the revocable monitors mechanism using a detailed micro-benchmark.
  • The authors measure executions that exhibit priority inversion to verify whether the increased overheads induced by their implementation are mitigated by higher overall throughput of high-priority threads.
  • The experiments are performed on a uni-processor system, since revocable monitors do nothing to increase concurrency in applications, so applications will exhibit no more parallelism using revocable monitors on multiprocessors than they would using non-revocable monitors.
  • The authors' results quantify this sacrifice of total throughput to be approximately 30%, while throughput for high-priority threads improves by 25% to 100%.

5.1. Benchmark program

  • The micro-benchmark executes several low and high-priority threads contending on the same lock.
  • Their benchmark takes several additional parameters controlling its behavior.
  • The benchmark code executed on this VM is compiled using the Jikes RVM optimizing compiler without any modification.
  • The authors also record the impact that their solution has on the overall elapsed time of the entire micro-benchmark, including low-priority elapsed times: this is simply the difference between the end time-stamp of the last thread to finish and the begin time-stamp of the first thread to start, regardless of priority.

5.2. Results

  • Figures 9 and 10 plot elapsed times for high-priority threads executed on both the modified VM (indicated by a solid line) and unmodified VM (indicated by a dotted line), normalized with respect to the configuration executing 100% reads on an unmodified VM using standard non-revocable monitors.
  • If the authors discard the configuration where there are eight high-priority threads competing with only two low-priority ones, the average elapsed time of a high-priority thread is half that of the reference implementation.
  • As expected, if the number of write operations within a synchronized block is sufficiently large, the overhead of logging and roll-backs may start outweighing the potential benefit.
  • Since their focus is on lowering elapsed times for high-priority threads, the authors consider the impact on overall elapsed time (on average 30% higher on the modified VM) to be acceptable.

6. Transactional monitors: Overview

  • Revocable monitors solve one class of problems related to writing concurrent programs, but because of the compliance requirement with respect to Java's execution semantics and memory model they are still required to use mutual exclusion as the underlying synchronization mechanism.
  • Transactional monitors permit concurrent execution within the monitor so long as the effects of the resulting schedule are serializable: the effects of concurrent execution of the monitor are equivalent to some legal serial schedule that would arise if no interleaving of the actions of different threads occurred within the guarded region.
  • The interleaving presented in Figure 14 results in both threads successfully completing their executions – it preserves serializability since T′'s withdrawal from the checking account does not compromise T's read from the savings account.
  • Isolation and consistency imply that shared state appears unchanged by other threads.

7. Transactional monitors: Design

  • Transactional monitors maintain serializability by tracking all accesses to shared data performed during the read phase within a thread-specific log.
  • A log is invalidated if committing its changes would violate serializability of actions performed in the monitored region.
  • One of the most important principles underlying their design is transparency of the transactional monitors mechanism: an application programmer should not be concerned with how monitors are represented, nor with details of the logging mechanism, abort or commit operations.
  • The decision about when a thread should attempt to detect serializability violations is strongly dependent on the cost of detection and may vary from one implementation of transactional monitors to another.
  • The updates of child monitors are visible upon release only within the scope of their parent (and, upon release of the outermost monitor, are propagated to the shared space).

8. Transactional monitors: Implementation

  • An implementation that directly reflects the concept behind transactional monitors would redirect all shared data accesses performed by a thread within a transactional monitor to a thread-local log.
  • When an object is first accessed, the accessing thread records its current value in the log and refers to it for all subsequent operations.
  • When a transactional monitor is entered, actions to initialize logs, etc, may have to be taken by threads before they are allowed to enter the monitor.
  • While nesting adds complications, there are no inherent difficulties in supporting it [33].

8.1. Low-contention concurrency

  • Conceptually, transactional monitors use thread-local logs to record updates and install these updates into the original objects when a thread commits.
  • If the contention on shared data accesses is low, the log is superfluous.
  • If the number of objects concurrently written by different threads executing within the same monitor is small and the number of threads performing concurrent writes is also small,∗ then reads and writes can operate directly over the original data.
  • To preserve correctness, an implementation must still prevent multiple non-serializable writes to objects and must disallow readers from seeing partial or inconsistent updates to objects performed by the writers.
  • To address these concerns, the authors devise a low-contention implementation that stores the following information in each transactional monitor object: writer – the thread currently executing within the monitor that has performed writes to objects guarded by the monitor; thread count – the number of threads concurrently executing within the monitor.

8.1.1. Initialization

  • A thread attempting to enter the monitor must first check whether there is any active writer within the monitor.
  • If there is no active writer, the thread can freely proceed after incrementing the thread count.

8.1.2. Read and write barriers

  • Because there are no object copies or logs, there are no read barriers; threads read values from the original shared objects.
  • Write barriers are necessary to ensure that no other thread has performed writes within the monitor.
  • A write may proceed if the writer field in the monitor object is nil, indicating no other writers are executing within the monitor, and the current thread atomically sets the writer field before executing the write.
  • If either condition does not hold, then the thread must roll back and re-execute the monitor.

8.1.3. Conflict detection

  • In order for the shared data operations of a thread exiting the monitor to be consistent and serializable with respect to other threads, there must have been no other writers within the monitor besides the exiting thread.
  • This is guaranteed by exclusion of other threads from entering monitors in which a writer exists, and by the write barrier, which revokes threads that try to write when a writer already exists.
  • So long as there has been no concurrent writer within the monitor, actions of read-only threads are trivially serializable.
  • Thus, read-only threads simply check this condition on monitor exit.

8.1.4. Monitor exit

  • All threads decrement the monitor thread count on exit from the monitor.
  • The last thread to leave the monitor (ie, when the thread count reaches zero) clears the monitor writer field.
  • Since there are no copies or logs, all updates are immediately visible in the original object.
  • The actions performed by the low-contention scheme executing the account example from Figure 13 are illustrated in Figure 16, where wavy lines represent threads T and T′, circles represent objects c (checking account) and s (savings account), and updated objects are marked grey.
  • Subsequently, thread T reads object c, then thread T′ updates object s and exits the monitor; no conflicts are detected since there were no intervening writes on behalf of other threads executing within the monitor.
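The protocol of Sections 8.1.1–8.1.4 can be sketched with atomic fields. All names are assumptions, entry uses a busy-wait for brevity, and roll-back/re-execution is left to the caller: a tryWrite that returns false signals that the thread must revoke and retry the region.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hedged sketch of the low-contention scheme: reads go straight to shared
// objects; the first write claims the single writer slot atomically.
class LowContentionMonitor {
    private final AtomicReference<Thread> writer = new AtomicReference<>(null);
    private final AtomicInteger threadCount = new AtomicInteger(0);

    // 8.1.1: entry is blocked while an active writer exists
    void enter() {
        while (writer.get() != null) Thread.onSpinWait();
        threadCount.incrementAndGet();
    }

    // 8.1.2: write barrier — claim the writer slot, or report that the
    // caller must roll back and re-execute the monitor
    boolean tryWrite(Runnable store) {
        Thread me = Thread.currentThread();
        if (writer.get() == me || writer.compareAndSet(null, me)) {
            store.run();          // update the original object in place
            return true;
        }
        return false;             // another writer exists: revoke
    }

    // 8.1.3 + 8.1.4: read-only threads are trivially serializable when no
    // writer intervened; the last thread out clears the writer field
    void exit() {
        if (threadCount.decrementAndGet() == 0) writer.set(null);
    }
}
```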

8.2. High-contention concurrency

  • When there is even moderate contention for shared data, the previous strategy is unlikely to perform well, because attempts to execute multiple writes, even to distinct objects, may result in conflicts, aborting all but one of the writers.
  • ∗This example is based on an interleaving of operations where the conflict really exists (ie, serializability is violated).
  • The authors can avoid being penalized for contention by permitting threads to manipulate copies of shared objects, committing their changes only when they do not conflict with the actions of other threads.
  • The global write map and thread count can be combined into one data structure to simplify access to it.
  • Each thread also holds the following (thread-local) information: local writes – a list of object copies created by the thread when executing within the current transactional monitor; local read map – a local bit-map, implemented similarly to the global write map, which identifies those objects read by the thread within the current monitor.

8.2.1. Initialization

  • The first thread attempting to enter a monitor initializes the monitor by clearing the global write map and setting the thread counter to one.
  • If updates have already been installed, the remaining threads still executing within the monitor are allowed to continue their execution, but no further threads are allowed to enter the monitor.
  • The authors do this so as to avoid accumulating spurious conflicts due to threads that have successfully exited the monitor after having performed writes.
  • Otherwise, an out-dated global write map could accumulate such spurious conflicts.
  • Each thread entering a monitor must also clear its local data structures.

8.2.2. Read and write barriers

  • Before writing to an object, the thread first checks the bit representing the object in the local write map; if it is set, then the current thread may already have a copy of this object (the mapping is imprecise).
  • If no copy exists yet, one is created; the object's bit in the local write map is set, and the write is redirected to the copy.
  • If a copy exists, the read is performed against the copy, otherwise the original object is read; in both cases the local read map is set.
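These barriers can be sketched as follows. BitSet stands in for the bit-maps, an IdentityHashMap for the circular copy list, and hashing the object's identity into 64 bits reproduces the imprecision the text mentions; all names are assumptions.

```java
import java.util.BitSet;
import java.util.IdentityHashMap;
import java.util.Map;

// Sketch of the Section 8.2.2 copy-on-write barriers (one thread's view).
class CopyingBarriers {
    private final Map<int[], int[]> copies = new IdentityHashMap<>();
    final BitSet localWriteMap = new BitSet();
    final BitSet localReadMap = new BitSet();

    // imprecise mapping: many objects may share one bit
    private int bitFor(int[] obj) { return System.identityHashCode(obj) & 63; }

    void write(int[] obj, int slot, int value) {
        int[] copy = copies.computeIfAbsent(obj, o -> o.clone()); // copy on first write
        localWriteMap.set(bitFor(obj));
        copy[slot] = value;                 // redirect the write to the copy
    }

    int read(int[] obj, int slot) {
        localReadMap.set(bitFor(obj));
        // the set bit only suggests a copy may exist; confirm before using it
        int[] copy = localWriteMap.get(bitFor(obj)) ? copies.get(obj) : null;
        return (copy != null ? copy : obj)[slot];
    }
}
```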

8.2.3. Conflict detection

  • Before a thread can exit a monitor, conflict detection checks if the global write map and the thread's local read map are disjoint.
  • If they are disjoint then no reads by the current thread could have been interleaved with committed writes of other threads within the monitor, so the thread proceeds to exit the monitor.
  • If the maps intersect then a potentially harmful interleaving may have occurred that may violate serializability; in this case, the exiting thread must abort and re-execute the monitored region.
  • Only if the thread passes the test for conflicts can it proceed to exit the monitor, as follows.
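The disjointness test can be sketched with BitSet. The names are assumptions; the real maps hash object addresses into fixed-size bit-vectors, so the test is conservative – a false intersection forces an unnecessary abort, but a real conflict is never missed.

```java
import java.util.BitSet;

// Sketch of the Section 8.2.3 conflict test between the global write map
// (committed writes of other threads) and a thread's local read map.
class ConflictDetector {
    // exit is allowed only if committed writes and local reads are disjoint
    static boolean canExit(BitSet globalWriteMap, BitSet localReadMap) {
        return !globalWriteMap.intersects(localReadMap);
    }

    // on commit, the exiting thread merges its local writes into the global map
    static void commit(BitSet globalWriteMap, BitSet localWriteMap) {
        globalWriteMap.or(localWriteMap);
    }
}
```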

8.2.4. Monitor exit

  • Having passed the test for possible conflicts, the thread proceeds to commit its updates atomically before exiting the monitor.
  • The copies are discarded from the thread's circular copy list.
  • The actions of the high-contention scheme are illustrated in Figure 17, where circles represent objects c (checking account) and s (savings account), and updated objects are marked grey.
  • In Figure 17(d) thread T′ modifies object s; object s is greyed and the update is also reflected in T′'s local write map.
  • Thread T subsequently reads object s, marking its local read map (Figure 17(f)), and attempts to exit the monitor.

8.3.1. Native methods

  • The authors disallow execution of native methods within regions guarded by transactional monitors.
  • If the effects of executing a native method do not affect the shared state (eg, a call to obtain the current system time), it can safely be performed within a guarded region.
  • Their current implementation does not provide such functionality.
  • If the commit succeeds, the updates are retained, and execution reverts to mutual-exclusion semantics: a conventional mutual-exclusion lock is acquired for the remainder of the monitor.
  • Any other thread that attempts to commit its changes while the lock is held must abort.

8.3.2. Existing synchronization mechanisms

  • Double-guarding a code fragment with both a transactional monitor and a mutual-exclusion monitor (the latter expressed using Java's synchronized keyword) does not strengthen existing serializability guarantees.
  • Indeed, code protected in such a manner will behave correctly.
  • The visibility rule for mutual-exclusion monitors embedded within a transactional monitor will change with respect to the original Java memory model: all updates performed within a region guarded by a mutual-exclusion monitor become visible only upon commit of the transactional monitor guarding that region.
  • ∗This example is also based on an interleaving of operations where the conflict really exists (ie, serializability invariants are violated).

8.3.3. Wait-notify

  • The authors allow invocation of wait and notify methods inside a region guarded by a transactional monitor, provided that they are also guarded by a mutual-exclusion monitor (and invoked on the object representing that mutual-exclusion monitor).
  • This requirement is identical to the original Java execution semantics – a thread invoking wait or notify must hold the corresponding monitor.
  • Invoking wait releases the corresponding mutual-exclusion monitor and the current thread waits for notification, but updates performed so far do not become visible until the thread resumes and exits the transactional monitor.
  • Invoking notify postpones the effects of notification until exit from the transactional monitor.
  • That is, notification modifies the shared state of a program and is therefore subject to the same visibility rules as other shared updates.

9. Transactional monitors: Experiments

  • To evaluate the performance of the prototype implementation, the authors use and extend the multi-threaded version of the OO7 object operations benchmark [14], originally developed in the database community.
  • The authors' incarnation of OO7 uses modified traversal routines to allow parameterization of synchronization and concurrency behavior.
  • The authors have selected this benchmark because it provides a great deal of flexibility in the choice of run-time parameters (eg, percentage of reads and writes to shared data performed by the application) and extended it to allow control over placement of synchronization primitives and the amount of contention on data access.
  • When choosing OO7 for their measurements, their goal was to accurately gauge various trade-offs inherent in different implementations of transactional monitors, rather than emulating workloads of selected potential applications.
  • Thus, the authors believe the benchmark captures essential features of scalable concurrent programs that can be used to quantify the impact of the design decisions underlying a transactional monitor implementation.

9.1. The OO7 benchmark

  • The OO7 benchmark suite [14] provides a great deal of flexibility for benchmark parameters (eg, database structure, fractions of reads/writes to shared/private data).
  • Each composite part comprises a graph ofatomic parts, and adocument object containing a small amount of text.
  • The authors' implementation of OO7 conforms to the standard OO7 database specification.
  • The parameters of the workload control the mix of these four basic operations: read/write and private/shared.
  • To foster some degree of interesting interleaving and contention in the case of concurrent execution, their traversals also take a parameter that allows extra overhead to be added to read operations to increase the time spent performing traversals.

9.2. Measurements

  • The authors' measurements were obtained on an eight-way 700MHz Intel Pentium III with 2GB of RAM running Linux kernel version 2.4.20-20.9 (RedHat 9.0) in single-user mode.
  • The authors ran each benchmark configuration in its own invocation of RVM, repeating the benchmark six times in each invocation, and discarding the results of the first iteration, in which the benchmark classes are loaded and compiled, to eliminate the overheads of compilation.
  • Monitors were entered at three levels of the assembly hierarchy: level one (module level), level three (second layer of composite parts) and level six (fifth layer of composite parts).
  • Every thread performs 1000 traversals (enters 1000 guarded regions) and visits 2M atomic parts during each iteration.

9.3. Results

  • The expected behavior for transactional monitor implementations optimized for low-contention applications is one in which performance is maximized when contention on guarded shared data accesses is low, for example, if most operations in guarded regions are reads.
  • Potential performance improvements over a mutual-exclusion implementation arise from the improved scalability that should be observable when executing on multi-processor platforms.
  • The authors' experimental results confirm these hypotheses.
  • The remaining graphs illustrate the scalability of both schemes by plotting normalized execution times for the high-contention scheme (Figure 23(a)) and the low-contention scheme (Figure 23(b)) when varying the number of threads (and processors) for monitor entries placed at levels one, three, and six, along with information concerning the number of aborts and copies created (Figure 27(c)).

11. Conclusions

  • The authors have presented a revocation-based priority inversion avoidance technique and demonstrated its utility in improving throughput of high-priority threads in a priority scheduling environment.
  • The solution proposed is relatively simple to implement, portable, and can be adapted to solve other types of problems (eg, deadlocks).
  • The authors have also introduced transactional monitors, a new synchronization mechanism, as an alternative to mutual exclusion.
  • The authors have presented two different schemes tailored to different concurrent access patterns and examined their performance and scalability.
  • All the techniques the authors described use compiler support to insert barriers to monitor accesses to shared data, and run-time modifications to implement revocation.


CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. 2005; 00:1–41 Prepared using cpeauth.cls [Version: 2002/09/19 v2.02]
Revocation techniques for Java concurrency
Adam Welc, Suresh Jagannathan, Antony L. Hosking
Department of Computer Sciences
Purdue University
250 N. University Street
West Lafayette, IN 47907-2066, U.S.A.
SUMMARY
This paper proposes two approaches to managing concurrency in Java using a guarded region abstraction.
Both approaches use revocation of such regions: the ability to undo their effects automatically
and transparently. These new techniques alleviate many of the constraints that inhibit construction
of transparently scalable and robust concurrent applications. The first solution, revocable monitors,
augments existing mutual exclusion monitors with the ability to resolve priority inversion and deadlock
dynamically, by reverting program execution to a consistent state when such situations are detected,
while preserving Java semantics. The second technique, transactional monitors, extends the functionality
of revocable monitors by implementing guarded regions as lightweight transactions that can be executed
concurrently (or in parallel on multiprocessor platforms). The presentation includes discussion of design
and implementation issues for both schemes, as well as a detailed performance study to compare their
behavior with the traditional, state-of-the-art implementation of Java monitors based on mutual exclusion.
KEY WORDS: isolation, atomicity, concurrency, synchronization, Java, speculation
1. Introduction
Managing complexity is a major challenge in constructing robust large-scale server applications
(such as database management systems, application servers, airline reservation systems, etc). In
a typical environment, large numbers of clients may access a server application concurrently. To
provide satisfactory response time and throughput, applications are often made concurrent. Thus, many
programming languages (eg, Smalltalk, C++, ML, Modula-3, Java) provide mechanisms that enable
concurrent programming via a thread abstraction, with threads being the smallest unit of concurrent
E-mail: welc@cs.purdue.edu
E-mail: suresh@cs.purdue.edu
E-mail: hosking@cs.purdue.edu
Contract/grant sponsor: National Science Foundation; contract/grant number: IIS-9988637, CCR-0085792, STI-0034141
Copyright © 2005 John Wiley & Sons, Ltd.

execution. Another key mechanism offered by these languages is the notion of guarded code regions in
which accesses to shared data performed by one thread are isolated from accesses performed by other
threads, and all updates performed by a thread within a guarded region become visible to the other
threads atomically, once the executing thread exits the region. Guarded regions (e.g., Java synchronized
methods and blocks, Modula-3 LOCK statements) are usually implemented using mutual-exclusion
locks.
In this paper, we explore two new approaches to concurrent programming, comparing their
performance against use of a state-of-the-art mutual exclusion implementation that uses thin locks
to minimize the overhead of locking [4]. Our discussion is grounded in the context of the Java
programming language, but is applicable to any language that offers the following mechanisms:
Multithreading: concurrent threads of control executing over objects in a shared address space.
Synchronized blocks: lexically-delimited blocks of code, guarded by dynamically-scoped
monitors (locks). Threads synchronize on a given monitor, acquiring it on entry to the block
and releasing it on exit. Only one thread may be perceived to execute within a synchronized
block at any time, ensuring exclusive access to all monitor-protected blocks.
Exception scopes: blocks of code in which an error condition can change the normal flow
of control of the active thread, by exiting active scopes, and transferring control to a handler
associated with each block.
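In Java, the three mechanisms combine as in the following minimal sketch (the Counter class and its names are ours, for illustration): threads execute concurrently over a shared object, a synchronized block guards each update, and an exception scope surrounds the work done by each thread.

```java
// Minimal illustration of the three mechanisms: threads, a synchronized
// block guarding shared state, and an exception scope around the updates.
public class Counter {
    private int value = 0;
    private final Object monitor = new Object();

    public void increment() {
        synchronized (monitor) {   // acquire on entry, release on exit
            value++;               // exclusive access while inside
        }
    }

    public int get() {
        synchronized (monitor) { return value; }
    }

    public static int runThreads(int threads, int perThread) throws InterruptedException {
        Counter c = new Counter();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                try {
                    for (int j = 0; j < perThread; j++) c.increment();
                } catch (RuntimeException e) {
                    // exception scope: control transfers here, exiting any
                    // active synchronized blocks (their monitors are released)
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return c.get();
    }
}
```

Because only one thread may execute inside the block at a time, the final count equals the total number of increments regardless of interleaving.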
Difficulties arising in the use of mutual exclusion locking with multiple threads are widely recognized, such as race conditions, priority inversion, and deadlock.
Race conditions are a serious issue for non-trivial concurrent programs. A race exists when two
threads can access the same object, and one of the accesses is a write. To avoid races, programmers
must carefully construct their application to trade off performance and throughput (by maximizing
concurrent access to shared data) for correctness (by limiting concurrent access when it could lead to
incorrect behavior), or rely on race detector tools that identify when races occur [7, 8, 18]. Recent work
has advocated higher-level safety properties such as atomicity for concurrent applications [19].
In languages with priority scheduling of threads, a low-priority thread may hold a lock even while other threads, which may have higher priority, are waiting to acquire it. Priority inversion results when a low-priority thread Tl holds a lock required by some high-priority thread Th, forcing the high-priority Th to wait until Tl releases the lock. Even worse, an unbounded number of runnable medium-priority threads Tm may exist, thus preventing Tl from running, making unbounded the time that Tl (and hence Th) must wait. Such situations can cause havoc in applications where high-priority threads demand some level of guaranteed throughput.
Deadlock results when two or more threads are unable to proceed because each is waiting on a lock held by another. Such a situation is easily constructed for two threads, T1 and T2: T1 first acquires lock L1 while T2 acquires L2, then T1 tries to acquire L2 while T2 tries to acquire L1, resulting in deadlock. Deadlocks may also result from a far more complex interaction among multiple threads and may stay undetected until and beyond application deployment. The ability to resolve a deadlock dynamically is much more attractive than permanently stalling some subset of concurrent threads.
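The two-lock deadlock just described disappears if a thread that cannot acquire its second lock releases the one it already holds and retries, a manual, lock-level analogue of the revocation idea developed in this paper. The sketch below (class and method names are ours, using java.util.concurrent.locks.ReentrantLock.tryLock) shows two threads acquiring the same pair of locks in opposite orders, yet both completing:

```java
import java.util.concurrent.locks.ReentrantLock;

// Manual back-off: if the second lock cannot be acquired immediately,
// release the first and retry, instead of blocking forever as in Figure 2.
public class BackOff {
    static void withBoth(ReentrantLock a, ReentrantLock b, Runnable body)
            throws InterruptedException {
        while (true) {
            a.lock();
            if (b.tryLock()) {           // blocking here could deadlock
                try { body.run(); return; }
                finally { b.unlock(); a.unlock(); }
            }
            a.unlock();                  // "revoke": give up and retry
            Thread.sleep(1);             // brief back-off before retrying
        }
    }

    public static int demo() throws InterruptedException {
        ReentrantLock l1 = new ReentrantLock(), l2 = new ReentrantLock();
        int[] shared = new int[2];
        Thread t1 = new Thread(() -> {
            try { withBoth(l1, l2, () -> shared[0]++); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread t2 = new Thread(() -> {
            try { withBoth(l2, l1, () -> shared[1]++); } // opposite lock order
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        t1.start(); t2.start(); t1.join(); t2.join();
        return shared[0] + shared[1];    // both threads complete
    }
}
```

The back-off must be programmed explicitly here, and it cannot undo writes already performed while the first lock was held; the revocation mechanisms proposed in this paper automate both concerns.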
For real-world concurrent programs with complex module and dependency structures, it is difficult
to perform an exhaustive exploration of the space of possible interleavings to determine statically
when races, deadlocks, or priority inversions may arise. For such applications, the ability to redress
undesirable interactions transparently among scheduling decisions and lock management is very useful.
Copyright © 2005 John Wiley & Sons, Ltd. Concurrency Computat.: Pract. Exper. 2005; 00:1–41
REVOCATION TECHNIQUES FOR JAVA CONCURRENCY 3
These observations inspire the first solution we propose: revocable monitors. Our technique augments
existing mutual exclusion monitors with the ability to resolve priority inversion dynamically (and
automatically). Some instances of deadlock may be resolved by revocation. However, we note that
deadlocks inherent to a program that are independent of scheduling decisions will manifest themselves
as livelock when revocation is used.
A second difficulty with using mutual exclusion to mediate data accesses among threads is ensuring
adequate performance when running on multi-processor platforms. To manipulate a complex shared
data structure like a tree or heap, applications must either impose a global locking scheme on the
roots, or employ locks at lower-level nodes in the structure. The former strategy is simple, but reduces
realizable concurrency and may induce false exclusion: threads wishing to access a distinct piece of the
structure may nonetheless block while waiting for another thread that is accessing an unrelated piece
of the structure. The latter approach permits multiple threads to access the structure simultaneously,
but incurs implementation complexity, and requires more memory to hold the necessary lock state.
Our solution to this problem is an alternative to lock-based mutual exclusion: transactional
monitors. These extend the functionality of revocable monitors by implementing guarded regions as
lightweight transactions that can be executed concurrently (or in parallel on multiprocessor platforms).
Transactional monitors define the following data visibility property that preserves isolation and
atomicity invariants on shared data protected by the monitor: all updates to objects guarded by a
transactional monitor become visible to other threads only on successful completion of the monitor
transaction.
Because transactional monitors impose serializability invariants on the regions they
protect (i.e., preserve the appearance of serial execution), they can help reduce race conditions by
allowing programmers to more aggressively guard code regions that may access shared data without
paying a significant performance penalty. Since the system dynamically records and redresses state
violations (by revoking the effects of the transaction when a serializability violation is detected),
programmers are relieved from the burden of having to determine when mutual exclusion can safely
be relaxed. Thus, programmers can afford to over-specify code regions that must be guarded, provided
the implementation can relax such over-specification safely and efficiently whenever possible.
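The optimistic execution style that transactional monitors embody can be pictured with a rough, hypothetical sketch (our simplification over a single shared cell, not the paper's implementation): each attempt reads a version number, computes without holding any lock, and commits only if no conflicting commit intervened; otherwise its effects are discarded and the region transparently re-executes.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.IntUnaryOperator;

// Hypothetical optimistic guarded region over a single integer cell:
// speculate without locks, validate the version at commit, retry on conflict.
public class TxCell {
    private volatile int value;
    private final AtomicLong version = new AtomicLong();

    // Atomic validate-and-publish step; fails if another thread committed
    // since readVersion was observed (a serializability violation).
    public synchronized boolean commit(long readVersion, int newValue) {
        if (version.get() != readVersion) return false; // conflict: revoke
        value = newValue;
        version.incrementAndGet();
        return true;
    }

    public void atomically(IntUnaryOperator body) {
        while (true) {                            // retry = re-execution
            long v = version.get();
            int result = body.applyAsInt(value);  // speculative, lock-free
            if (commit(v, result)) return;
        }
    }

    public int get() { return value; }
}
```

Even this toy version shows the programming-model benefit: callers simply wrap their update in atomically(...) and never reason about when exclusion can be relaxed.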
While revocable monitors and transactional monitors rely on similar mechanisms, and can exist
side-by-side in the same virtual machine, their semantics and intended utility are quite different. We
expect revocable monitors to be used primarily to resolve deadlock as well as to improve throughput for
high-priority threads by transparently averting priority inversion. In contrast, we envision transactional
monitors as an entirely new synchronization framework that addresses the performance impact of
classical mutual exclusion while simplifying concurrent programming.
We examine the performance and scalability of these different approaches in the context of a state-of-
the-art Java compiler and virtual machine, namely the Jikes Research Virtual Machine (RVM) [3] from
IBM. Jikes RVM is an ideal platform to compare our solutions with pure lock-based mutual exclusion,
since it already uses sophisticated strategies to minimize the overhead of traditional mutual-exclusion
locks [4]. A detailed evaluation in this context provides an accurate depiction of the tradeoffs embodied
and benefits obtained using the solutions we propose.
A slightly weaker visibility property is present in Java for updates performed within a synchronized block (or method);
these are guaranteed to be visible to other threads only upon exit from the block.
Tl, Th:
synchronized(mon) {
  o1.f++;
  o2.f++;
  bar();
}

Tm:
foo();

Figure 1. Priority inversion
2. Revocable monitors: Overview
There are several ways to remedy erroneous or undesirable behavior in concurrent programs. Static
techniques can sometimes identify erroneous conditions, allowing programmers to restructure their
application appropriately. When static techniques are infeasible, dynamic techniques can be used both
to identify problems and remedy them when possible. Solutions to priority inversion such as the priority
ceiling and priority inheritance protocols [40] are good examples of such dynamic solutions.
Priority ceiling and priority inheritance solve an unbounded priority inversion problem, illustrated using the code fragment in Figure 1 (both Tl and Th execute the same code, and methods foo() and bar() contain an arbitrary sequence of operations). Let us assume that thread Tl (low priority) is first to acquire the monitor mon, modifies objects o1 and o2, and is then preempted by thread Tm (medium priority). Note that thread Th (high priority) is not permitted to enter monitor mon until it has been released by Tl, but since method foo() executed by Tm may contain an arbitrary sequence of actions (e.g., synchronous communication with another thread), it may take an arbitrary amount of time before Tl is allowed to run again (and exit the monitor). Thus thread Th may be forced to wait for an unbounded amount of time before it is allowed to complete its actions.
The priority ceiling technique raises the priority of any locking thread to the highest priority of any thread that ever uses that lock (i.e., its priority ceiling). This requires the programmer to supply the priority ceiling for each lock used throughout the execution of a program. In contrast, priority inheritance raises the priority of a thread only when holding a lock causes it to block a higher-priority thread. When this happens, the low-priority thread inherits the priority of the higher-priority thread it is blocking. Both of these solutions prevent a medium-priority thread from blocking the execution of the low-priority thread (and thus also the high-priority thread) indefinitely. However, even in the absence of the medium-priority thread, the high-priority thread is forced to wait until the low-priority thread releases its lock. In the example given, the time to execute method bar() is potentially unbounded, thus high-priority thread Th may still be delayed indefinitely until low-priority thread Tl finishes executing bar() and releases the monitor. Neither priority ceiling nor priority inheritance offers a solution to this problem.
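The inheritance bookkeeping itself is small, as the following toy monitor of our own sketches (Java thread priorities are only scheduling hints, so this illustrates the mechanism rather than a real-time guarantee): when a higher-priority thread blocks on the lock, the current owner temporarily inherits the waiter's priority, and the original priority is restored on release. As the text notes, the waiter must still wait out the critical section itself.

```java
// Toy priority-inheritance lock: a blocked higher-priority waiter donates
// its priority to the owner; unlock() restores the owner's original priority.
public class PiLock {
    private Thread owner;
    private int ownerOriginalPriority;

    public synchronized void lock() throws InterruptedException {
        Thread me = Thread.currentThread();
        while (owner != null) {
            if (me.getPriority() > owner.getPriority())
                owner.setPriority(me.getPriority());   // inherit waiter's priority
            wait();
        }
        owner = me;
        ownerOriginalPriority = me.getPriority();
    }

    public synchronized void unlock() {
        owner.setPriority(ownerOriginalPriority);      // undo any inheritance
        owner = null;
        notifyAll();
    }
}
```

A revocable monitor goes further: rather than boosting the owner, it rolls the owner back so the high-priority thread need not wait for the critical section to finish.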
Besides priority inversion, deadlock is another potentially unwanted consequence of using mutual-exclusion abstractions. A typical deadlock situation is illustrated with the code fragment in Figure 2. Let us assume the following sequence of actions: thread T1 acquires monitor mon1 and updates object
T1:
synchronized(mon1) {
  o1.f++;
  synchronized(mon2) {
    bar();
  }
}

T2:
synchronized(mon2) {
  o2.f++;
  synchronized(mon1) {
    bar();
  }
}

Figure 2. Deadlock
o1, thread T2 acquires monitor mon2 and updates object o2, thread T1 attempts to acquire monitor mon2 (T1 blocks since mon2 is already held by thread T2), and thread T2 attempts to acquire monitor mon1 (T2 blocks as well since mon1 is already held by T1). The result is that both threads are deadlocked: they will remain blocked indefinitely, and method bar() will never be executed by either thread.
In both of the scenarios illustrated by Figures 1 and 2, one can identify a single offending thread that must be revoked in order to resolve either the priority inversion or the deadlock. For priority inversion the offending thread is the low-priority thread currently executing the monitor. For deadlock, it is either of the threads engaged in the deadlock; there exist various techniques for preventing or detecting deadlock [21], but all require that the actions of one of the threads leading to deadlock be revoked.
Revocable monitors can alleviate both these issues. Our approach to revocation combines compiler
techniques with run-time detection and resolution. When the need for revocation is encountered, the
run-time system selectively revokes the offending thread executing the monitor (i.e., synchronized
block) and its effects. All updates to shared data performed within the monitor are logged. Upon
detecting priority inversion or deadlock (either at lock acquisition, or in the background), the run-time
system interrupts the offending thread, uses the logged updates to undo that thread’s shared updates,
and transfers control of the thread back to the beginning of the block for retry. Externally, the effect of
the roll-back is to make it appear that the offending thread never entered the block.
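The logging and roll-back steps can be pictured with a simplified undo log of our own (the actual system instruments heap writes inside the compiler and virtual machine): before each guarded write the old value is recorded, and revocation replays the records in reverse chronological order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified undo log: before each guarded write, record how to restore
// the old value; on revocation, replay the records in reverse order.
public class UndoLog {
    private final Deque<Runnable> undo = new ArrayDeque<>();

    // Called by (conceptually) instrumented code before overwriting a field.
    public void logWrite(Runnable restore) { undo.push(restore); }

    public void rollback() {
        while (!undo.isEmpty()) undo.pop().run(); // newest record first
    }

    public void commit() { undo.clear(); }        // normal monitor exit
}

class Account {
    int balance;

    void deposit(int amount, UndoLog log) {
        final int old = balance;                  // capture before the write
        log.logWrite(() -> balance = old);
        balance += amount;
    }
}
```

After rollback() the account is exactly as it was before the guarded region began, which is what makes it appear that the offending thread never entered the block.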
The process of revoking the effects performed by a low-priority thread within a monitor is illustrated in Figure 3, where wavy lines represent threads Tl and Th, circles represent objects o1 and o2, updated objects are marked grey, and the box represents the dynamic scope of a common monitor guarding a synchronized block executed by the threads. This scenario is based on the code from Figure 1 (data access operations performed within method bar() have been omitted for brevity). In Figure 3(a) low-priority thread Tl is about to enter the synchronized block, which it does in Figure 3(b), modifying object o1. High-priority thread Th tries to acquire the same monitor, but is blocked by low-priority Tl (Figure 3(c)). Here, a priority inheritance approach [40] would raise the priority of thread Tl to that of Th, but Th would still have to wait for Tl to release the lock. If a priority ceiling protocol were used, the priority of Tl would be raised to the ceiling upon its entry to the synchronized block, but the problem of Th being forced to wait for Tl to release the lock would remain. Instead, our approach preempts Tl, undoing any updates to o1, and transfers control in Tl back to the point of entry to the synchronized block. Here Tl must wait while Th enters the monitor, and updates objects o1 (Figure 3(e))