
Showing papers by "Michael Spear" published in 2007


Proceedings ArticleDOI
12 Aug 2007
TL;DR: It is argued that privatization comprises a pair of symmetric subproblems: private operations may fail to see updates made by transactions that have committed but not yet completed; conversely, transactions that are doomed but have not yet aborted may see updates made by private code, causing them to perform erroneous, externally visible operations.
Abstract: Early implementations of software transactional memory (STM) assumed that sharable data would be accessed only within transactions. Memory may appear inconsistent in programs that violate this assumption, even when program logic would seem to make extra-transactional accesses safe. Designing STM systems that avoid such inconsistency has been dubbed the privatization problem. We argue that privatization comprises a pair of symmetric subproblems: private operations may fail to see updates made by transactions that have committed but not yet completed; conversely, transactions that are doomed but have not yet aborted may see updates made by private code, causing them to perform erroneous, externally visible operations. We explain how these problems arise in different styles of STM, present strategies to address them, and discuss their implementation tradeoffs. We also propose a taxonomy of contracts between the system and the user, analogous to programmer-centric memory consistency models, which allow us to classify programs based on their privatization requirements. Finally, we present empirical comparisons of several privatization strategies. Our results suggest that the best strategy may depend on application characteristics.
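The privatization idiom at issue can be pictured with a short sketch. The STM_* macros below are hypothetical placeholders (stubbed so the example compiles), not any particular system's interface; the comments mark where the two symmetric subproblems bite.

```cpp
// Illustrative sketch only: the STM_* macros stand in for a real STM API.
#include <atomic>

#define STM_BEGIN()        ((void)0)             // start a transaction (stub)
#define STM_END()          ((void)0)             // commit (stub)
#define STM_READ(loc)      ((loc).load())        // transactional read (stub)
#define STM_WRITE(loc, v)  ((loc).store(v))      // transactional write (stub)

struct Node { int value; Node* next; };

std::atomic<Node*> head{nullptr};   // shared, transactionally managed list head

// Privatization idiom: transactionally unlink a node, then use it privately.
int pop_and_use_privately() {
    STM_BEGIN();
    Node* n = STM_READ(head);
    if (n) STM_WRITE(head, n->next);   // once we commit, n is logically private
    STM_END();

    if (!n) return -1;

    // Private (non-transactional) use. The two symmetric hazards:
    //  1. a transaction that committed before ours but has not finished its
    //     write-back may still be updating *n, so this read can see stale data;
    //  2. a doomed-but-not-yet-aborted transaction may still be reading *n
    //     while we modify or free it, and may act on the values it sees.
    int v = n->value;
    delete n;
    return v;
}
```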

163 citations


Proceedings ArticleDOI
09 Jun 2007
TL;DR: This work describes an alert-on-update mechanism (AOU) that allows a thread to receive fast, asynchronous notification when previously-identified lines are written by other threads, and a programmable data isolation mechanism (PDI) that allows a thread to hide its speculative writes from other threads, ignoring conflicts, until software decides to make them visible.
Abstract: There has been considerable recent interest in both hardware and software transactional memory (TM). We present an intermediate approach, in which hardware serves to accelerate a TM implementation controlled fundamentally by software. Specifically, we describe an alert-on-update mechanism (AOU) that allows a thread to receive fast, asynchronous notification when previously-identified lines are written by other threads, and a programmable data isolation mechanism (PDI) that allows a thread to hide its speculative writes from other threads, ignoring conflicts, until software decides to make them visible. These mechanisms reduce bookkeeping, validation, and copying overheads without constraining software policy on a host of design decisions. We have used AOU and PDI to implement a hardware-accelerated software transactional memory system we call RTM. We have also used AOU alone to create a simpler "RTM-Lite". Across a range of microbenchmarks, RTM outperforms RSTM, a publicly available software transactional memory system, by as much as 8.7x (geometric mean of 3.5x) in single-thread mode. At 16 threads, it outperforms RSTM by as much as 5x, with an average speedup of 2x. Performance degrades gracefully when transactions overflow hardware structures. RTM-Lite is slightly faster than RTM for transactions that modify only small objects; full RTM is significantly faster when objects are large. In a strong argument for policy flexibility, we find that the choice between eager (first-access) and lazy (commit-time) conflict detection can lead to significant performance differences in both directions, depending on application characteristics.
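As a rough illustration of how software might use AOU, the sketch below arms a transaction's status word and treats a remote write to it as an asynchronous abort signal, in place of per-access validation. AOU and PDI are proposed hardware mechanisms, so aou_arm/aou_disarm here are invented software stubs, not real intrinsics, and the commit path is heavily simplified.

```cpp
// Illustrative sketch only: AOU is proposed hardware, so these are stubs.
#include <atomic>
#include <cstdio>

static std::atomic<int> tx_status{0};    // 0 = running, nonzero = aborted by a rival
thread_local bool alert_fired = false;

// In hardware, a remote write to an armed line would invoke this asynchronously.
void aou_handler() { alert_fired = true; }

// Software stubs standing in for what would be ISA-level intrinsics.
void aou_arm(std::atomic<int>*, void (*)()) {}
void aou_disarm(std::atomic<int>*) {}

bool run_transaction(void (*speculative_body)()) {
    alert_fired = false;
    aou_arm(&tx_status, aou_handler);    // watch our status word: a rival that
                                         // aborts us writes it, and the alert
                                         // replaces per-access polling
    speculative_body();                  // with PDI, these stores would stay
                                         // hidden in the cache until commit
    aou_disarm(&tx_status);
    if (alert_fired || tx_status.load() != 0) {
        std::puts("aborted: roll back speculative state and retry");
        return false;
    }
    return true;                         // commit: buffered writes become visible
}
```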

120 citations


Proceedings ArticleDOI
27 Sep 2007
TL;DR: An open-source implementation of Delaunay triangulation that uses transactions as one component of a larger parallelization strategy, and employs one of the fastest known sequential algorithms to triangulate geometrically partitioned regions in parallel.
Abstract: Transactional memory has been widely hailed as a simpler alternative to locks in multithreaded programs, but few nontrivial transactional programs are currently available. We describe an open-source implementation of Delaunay triangulation that uses transactions as one component of a larger parallelization strategy. The code is written in C++, for use with the RSTM software transactional memory library (also open source). It employs one of the fastest known sequential algorithms to triangulate geometrically partitioned regions in parallel; it then employs alternating, barrier-separated phases of transactional and partitioned work to stitch those regions together. Experiments on multiprocessor and multicore machines confirm excellent single-thread performance and good speedup with increasing thread count. Since execution time is dominated by geometrically partitioned computation, performance is largely insensitive to the overhead of transactions, but highly sensitive to any costs imposed on sharable data that are currently "privatized".
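The parallelization strategy can be sketched structurally as alternating, barrier-separated phases: privatized per-region triangulation, then transactional stitching of the seams. None of the names below come from the benchmark, and TX_BEGIN/TX_END are stand-ins for an STM interface; this is a shape sketch (C++20), not the actual code.

```cpp
// Structural sketch of alternating privatized and transactional phases.
#include <barrier>
#include <functional>
#include <thread>
#include <vector>

#define TX_BEGIN() ((void)0)   // placeholder: start a transaction
#define TX_END()   ((void)0)   // placeholder: validate and commit

void triangulate_region(int) { /* fast sequential Delaunay on a private region (stub) */ }
void stitch_seam(int)        { /* merge boundary triangles with neighbors (stub) */ }

void worker(int id, std::barrier<>& phase_barrier) {
    // Phase 1: partitioned ("privatized") work; each thread owns its region,
    // so no synchronization or transactional overhead is needed here.
    triangulate_region(id);

    phase_barrier.arrive_and_wait();     // all regions finished before stitching

    // Phase 2: transactional work; seams between regions are shared data,
    // so updates to them are wrapped in transactions.
    TX_BEGIN();
    stitch_seam(id);
    TX_END();

    phase_barrier.arrive_and_wait();     // ready for the next alternation, if any
}

int main() {
    const int nthreads = 4;
    std::barrier<> phase_barrier(nthreads);
    std::vector<std::jthread> pool;
    for (int i = 0; i < nthreads; ++i)
        pool.emplace_back(worker, i, std::ref(phase_barrier));
}   // jthreads join on destruction
```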

65 citations


01 Jan 2007
TL;DR: It is concluded that while the interface is a significant improvement on earlier efforts, and makes it practical for systems researchers to build nontrivial applications, it fails to realize the programming simplicity that was supposed to be the motivation for transactions in the first place.
Abstract: Like many past extensions to user programming models, transactions can be added to the programming language or implemented in a library using existing language features. We describe a library-based transactional memory API for C++. Designed to address the limitations of an earlier API with similar functionality, the new interface leverages macros, exceptions, multiple inheritance, generics (templates), and overloading of operators (including pointer dereference) in an attempt to minimize syntactic clutter, admit a wide variety of back-end implementations, avoid arbitrary restrictions on otherwise valid language constructs, enable privatization, catch as many programmer errors as possible, and provide semantics that “seem natural” to C++ programmers. Having used our API to construct several small applications and one large one, we conclude that while the interface is a significant improvement on earlier efforts, and makes it practical for systems researchers to build nontrivial applications, it fails to realize the programming simplicity that was supposed to be the motivation for transactions in the first place. Several groups have proposed compiler support as a way to improve the performance of transactions. We conjecture that compiler and language support will be even more important as a way to improve the programming model.
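The following is a flavor sketch of the kind of macro-and-template API the abstract describes: shared objects are opened through smart pointers whose overloaded operator-> mediates every access. The names (stm::Object, sh_ptr, rd_ptr, wr_ptr, BEGIN_TRANSACTION) are illustrative approximations backed by sequential stubs so the snippet compiles; they are not the library's exact interface.

```cpp
// Flavor sketch; the "stm" namespace here is a sequential stub, not a real STM.
namespace stm {
    template <typename T> class Object {};        // base class for shared types

    template <typename T> struct sh_ptr {         // handle to a shared object
        explicit sh_ptr(T* p) : obj(p) {}
        T* obj;
    };
    template <typename T> struct rd_ptr {         // read-only "open"
        explicit rd_ptr(const sh_ptr<T>& s) : obj(s.obj) {}
        const T* operator->() const { return obj; }   // a real one would log the read
        const T* obj;
    };
    template <typename T> struct wr_ptr {         // open for writing
        explicit wr_ptr(sh_ptr<T>& s) : obj(s.obj) {}
        T* operator->() const { return obj; }         // a real one would clone/log the write
        T* obj;
    };
}
#define BEGIN_TRANSACTION {   // a real macro would also set up checkpoint and retry
#define END_TRANSACTION   }   // a real macro would validate and commit

// Client code: the flavor of program this style of API leads to.
class Counter : public stm::Object<Counter> {
    int value = 0;
public:
    int  get() const { return value; }
    void inc()       { ++value; }
};

void increment(stm::sh_ptr<Counter>& c) {
    BEGIN_TRANSACTION
        stm::wr_ptr<Counter> w(c);   // overloaded operator-> mediates the access
        w->inc();
    END_TRANSACTION
}

int read_value(stm::sh_ptr<Counter>& c) {
    int v = 0;
    BEGIN_TRANSACTION
        stm::rd_ptr<Counter> r(c);   // read-only open: lighter bookkeeping
        v = r->get();
    END_TRANSACTION
    return v;
}
```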

37 citations


Proceedings ArticleDOI
09 Jun 2007
TL;DR: A second nonblocking STM system is presented that uses multiple AOU lines (one per accessed object) to eliminate validation overhead entirely, resulting in a nonblocking, zero-indirection STM system that outperforms competing systems by as much as a factor of 2.
Abstract: Nonblocking implementations of software transactional memory (STM) typically impose an extra level of indirection when accessing an object; some researchers have claimed that the cost of this indirection outweighs the semantic advantages of nonblocking progress guarantees. We consider this claim in the context of a simple hardware assist, alert-on-update (AOU), which allows a thread to request immediate notification if specified line(s) are replaced or invalidated in its cache. We show that even a single AOU line allows us to construct a simple, nonblocking STM system without extra indirection. At the same time, we observe that per-load validation operations, required for intra-object consistency in both the new system and in lock-based (blocking) STM, at least partially negate the resulting performance gain. Moreover, inter-object consistency checks, also required in both kinds of systems, remain the dominant cost for transactions that access many objects. We therefore present a second nonblocking STM system that uses multiple AOU lines (one per accessed object) to eliminate validation overhead entirely, resulting in a nonblocking, zero-indirection STM system that outperforms competing systems by as much as a factor of 2.
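The validation cost being eliminated can be made concrete with a small sketch of incremental (polling-style) validation: each time a transaction opens another object, it re-checks everything it has read so far. The data structures here are generic illustrations, not those of the systems in the paper.

```cpp
// Sketch of per-open incremental validation, the cost AOU-based designs remove.
#include <atomic>
#include <vector>

struct ObjectHeader { std::atomic<unsigned> version; };

struct ReadLogEntry {
    ObjectHeader* obj;
    unsigned      version_seen;   // version at the time we first read the object
};

thread_local std::vector<ReadLogEntry> read_log;

// Called every time a transaction opens another object: re-check every object
// read so far, so total validation work grows quadratically with the read set.
bool validate_read_set() {
    for (const ReadLogEntry& e : read_log)
        if (e.obj->version.load(std::memory_order_acquire) != e.version_seen)
            return false;         // a writer committed underneath us: doomed
    return true;
}

// With alert-on-update, each opened object's header line is armed instead, and
// a remote write triggers an immediate alert, so this polling loop (and its
// quadratic cost) disappears from the common path.
```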

36 citations


Proceedings ArticleDOI
14 Mar 2007
TL;DR: A traditional advantage of shared-memory multiprocessors is their ability to support very fast implicit communication: if thread A modifies location D, thread B will see the change as soon as it tries to read D; no explicit receive is required.
Abstract: A traditional advantage of shared-memory multiprocessors is their ability to support very fast implicit communication: if thread A modifies location D, thread B will see the change as soon as it tries to read D; no explicit receive is required. There are times, however, when B needs to know of A’s action immediately. Event-based programs and condition synchronization are obvious examples, but there are many others. Consider a program in which B reads V from D, computes a new value V′ (a potentially time-consuming task), and uses compare-and-swap to install V′ in D only if no other thread has completed an intervening update. If some thread A has completed an update, then all of B’s work subsequent to that update will be wasted. More significantly, suppose we generalize the notion of atomic update to implement software transactional memory (STM) [4]. Now A may not only force B to back out (abort) and retry, it may also allow B to read mutually inconsistent values from different locations. If B does not learn of A’s action immediately, these inconsistent values (which should not logically be visible together) may cause B to perform erroneous operations that cannot be undone [5]. STM systems typically avoid such errors by performing incremental validation prior to every potentially dangerous operation—in effect, they poll for conflicts. Since validation effort is proportional to the number of objects read so far, the total cost is quadratic in the number of objects read, and may cripple performance [5]. Interprocessor interrupts are the standard alternative to polling in shared-memory multiprocessors, but they are typically triggered by the operating system and have prohibitive latencies. This cost is truly unfortunate, since most of the infrastructure necessary to
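The optimistic-update pattern described above looks roughly like the following when written with std::atomic; the function names are illustrative.

```cpp
// The read / compute V' / compare-and-swap pattern from the text.
#include <atomic>

long expensive_function(long v) { return v + 1; }   // stands in for the slow step

void optimistic_update(std::atomic<long>& D) {
    for (;;) {
        long V       = D.load();                // read the current value
        long V_prime = expensive_function(V);   // potentially time-consuming work
        // Install V' only if no other thread updated D in the meantime;
        // on failure, every cycle spent computing V_prime was wasted work,
        // and the conflict is only discovered here rather than immediately.
        if (D.compare_exchange_weak(V, V_prime))
            return;
    }
}
```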

21 citations


Proceedings ArticleDOI
12 Aug 2007
TL;DR: This note describes one candidate benchmark: an implementation of Delaunay triangulation, which employs one of the fastest known sequential algorithms to triangulate geometrically partitioned regions in parallel.
Abstract: With the rise of multicore processors, much recent attention has focused on transactional memory (TM). Unfortunately, the field has yet to develop standard benchmarks to capture application characteristics or to facilitate system comparisons. This note describes one candidate benchmark: an implementation of Delaunay triangulation [4]. Source for this benchmark is packaged with Version 3 of the Rochester Software Transactional Memory (RSTM) open-source C++ library [1,9]. It employs one of the fastest known sequential algorithms to triangulate geometrically partitioned regions in parallel; it then employs alternating, barrier-separated phases of transactional and partitioned (“privatized”) work to stitch those regions together. Experiments on multiprocessor and multicore machines confirm good speedup and excellent single-thread performance. They also highlight the cost of extra indirection in the implementation of transactional data: since execution time is dominated by privatized phases, performance is largely insensitive to the overhead of transactions per se, but highly sensitive to any costs imposed on privatized data. Experience with the application-writing process provides strong anecdotal evidence that TM will eventually require language and compiler support.
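A simplified illustration of the indirection cost in question: if each shared object is reached through a per-object header, even privatized phases pay an extra pointer hop per access. The layout shown is generic, not RSTM's actual metadata format.

```cpp
// Generic sketch of why header indirection hurts privatized phases.
struct Point { double x, y; };

struct TxHeader {            // per-object transactional metadata
    Point* current_version;  // payload lives behind one more pointer
    // ...ownership / version information would go here...
};

double sum_x_with_indirection(TxHeader* const* objs, int n) {
    double s = 0;
    for (int i = 0; i < n; ++i)
        s += objs[i]->current_version->x;   // two dereferences per element
    return s;
}

double sum_x_privatized(const Point* const* pts, int n) {
    double s = 0;
    for (int i = 0; i < n; ++i)
        s += pts[i]->x;                     // one dereference: the cost a
    return s;                               // privatized phase would like to pay
}
```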

12 citations


Book ChapterDOI
24 Sep 2007
TL;DR: Interoperability enables seamless integration with legacy code, atomic composition of nonblocking operations, and the equivalent of hand-optimized, closed nested transactions.
Abstract: This brief announcement focuses on interoperability of software transactions with ad hoc nonblocking algorithms. Specifically, we modify arbitrary nonblocking operations so that (1) they can be used both inside and outside transactions, (2) external uses serialize with transactions, and (3) internal uses succeed if and only if the surrounding transaction commits. Interoperability enables seamless integration with legacy code, atomic composition of nonblocking operations, and the equivalent of hand-optimized, closed nested transactions.
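One way to picture the idea, under heavy simplification: a lock-free operation whose linearizing step is deferred to commit time when invoked inside a transaction, so it takes effect only if the transaction commits. The in_transaction/register_on_commit hooks are hypothetical stubs, and the paper's actual mechanism (including how external uses serialize with transactions) is not shown.

```cpp
// Illustrative sketch only; the hooks below are invented placeholders.
#include <atomic>
#include <functional>

bool in_transaction() { return false; }                      // stub: STM runtime hook
void register_on_commit(std::function<void()> f) { f(); }    // stub: run immediately

struct Node { int value; Node* next; };

struct TreiberStack {
    std::atomic<Node*> top{nullptr};

    void do_push(Node* n) {                       // standard lock-free push
        Node* old = top.load();
        do { n->next = old; } while (!top.compare_exchange_weak(old, n));
    }

    // Usable both inside and outside transactions: inside, the linearizing
    // step is deferred so its effect appears only if the transaction commits.
    void push(int v) {
        Node* n = new Node{v, nullptr};
        if (in_transaction())
            register_on_commit([this, n] { do_push(n); });
        else
            do_push(n);                           // ordinary nonblocking use
    }
};
```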

3 citations