scispace - formally typeset
Search or ask a question

Showing papers on "Transactional memory published in 2006"


Journal Article
TL;DR: This paper introduces the transactional locking II (TL2) algorithm, a software transactional memory (STM) algorithm based on a combination of commit-time locking and a novel global version-clock based validation technique, which is ten-fold faster than a single lock.
Abstract: The transactional memory programming paradigm is gaining momentum as the approach of choice for replacing locks in concurrent programming. This paper introduces the transactional locking II (TL2) algorithm, a software transactional memory (STM) algorithm based on a combination of commit-time locking and a novel global version-clock based validation technique. TL2 improves on state-of-the-art STMs in the following ways: (1) unlike all other STMs it fits seamlessly with any systems memory life-cycle, including those using malloc/free (2) unlike all other lock-based STMs it efficiently avoids periods of unsafe execution, that is, using its novel version-clock validation, user code is guaranteed to operate only on consistent memory states, and (3) in a sequence of high performance benchmarks, while providing these new properties, it delivered overall performance comparable to (and in many cases better than) that of all former STM algorithms, both lock-based and non-blocking. Perhaps more importantly, on various benchmarks, TL2 delivers performance that is competitive with the best hand-crafted fine-grained concurrent structures. Specifically, it is ten-fold faster than a single lock. We believe these characteristics make TL2 a viable candidate for deployment of transactional memory today, long before hardware transactional support is available.

893 citations


Book ChapterDOI
18 Sep 2006
TL;DR: TL2 as mentioned in this paper is a software transactional memory (STM) algorithm based on a combination of commit-time locking and a novel global version-clock based validation technique, which is ten times faster than a single lock.
Abstract: The transactional memory programming paradigm is gaining momentum as the approach of choice for replacing locks in concurrent programming. This paper introduces the transactional locking II (TL2) algorithm, a software transactional memory (STM) algorithm based on a combination of commit-time locking and a novel global version-clock based validation technique. TL2 improves on state-of-the-art STMs in the following ways: (1) unlike all other STMs it fits seamlessly with any system's memory life-cycle, including those using malloc/free (2) unlike all other lock-based STMs it efficiently avoids periods of unsafe execution, that is, using its novel version-clock validation, user code is guaranteed to operate only on consistent memory states, and (3) in a sequence of high performance benchmarks, while providing these new properties, it delivered overall performance comparable to (and in many cases better than) that of all former STM algorithms, both lock-based and non-blocking. Perhaps more importantly, on various benchmarks, TL2 delivers performance that is competitive with the best hand-crafted fine-grained concurrent structures. Specifically, it is ten-fold faster than a single lock. We believe these characteristics make TL2 a viable candidate for deployment of transactional memory today, long before hardware transactional support is available.

891 citations


Proceedings ArticleDOI
27 Feb 2006
TL;DR: This paper presents a new implementation of transactional memory, log-based transactionalMemory (LogTM), that makes commits fast by storing old values to a per-thread log in cacheable virtual memory and storing new values in place.
Abstract: Transactional memory (TM) simplifies parallel programming by guaranteeing that transactions appear to execute atomically and in isolation. Implementing these properties includes providing data version management for the simultaneous storage of both new (visible if the transaction commits) and old (retained if the transaction aborts) values. Most (hardware) TM systems leave old values "in place" (the target memory address) and buffer new values elsewhere until commit. This makes aborts fast, but penalizes (the much more frequent) commits. In this paper, we present a new implementation of transactional memory, log-based transactional memory (LogTM), that makes commits fast by storing old values to a per-thread log in cacheable virtual memory and storing new values in place. LogTM makes two additional contributions. First, LogTM extends a MOESI directory protocol to enable both fast conflict detection on evicted blocks and fast commit (using lazy cleanup). Second, LogTM handles aborts in (library) software with little performance penalty. Evaluations running micro- and SPLASH-2 benchmarks on a 32-way multiprocessor support our decision to optimize for commit by showing that only 1-2% of transactions abort.

724 citations


Proceedings ArticleDOI
20 Oct 2006
TL;DR: Using a simulated multiprocessor with HTM support, the viability of the HyTM approach is demonstrated: it can provide performance and scalability approaching that of an unbounded HTM implementation, without the need to support all transactions with complicatedHTM support.
Abstract: Transactional memory (TM) promises to substantially reduce the difficulty of writing correct, efficient, and scalable concurrent programs But "bounded" and "best-effort" hardware TM proposals impose unreasonable constraints on programmers, while more flexible software TM implementations are considered too slow Proposals for supporting "unbounded" transactions in hardware entail significantly higher complexity and risk than best-effort designsWe introduce Hybrid Transactional Memory (HyTM), an approach to implementing TMin software so that it can use best effort hardware TM (HTM) to boost performance but does not depend on HTM Thus programmers can develop and test transactional programs in existing systems today, and can enjoy the performance benefits of HTM support when it becomes availableWe describe our prototype HyTM system, comprising a compiler and a library The compiler allows a transaction to be attempted using best-effort HTM, and retried using the software library if it fails We have used our prototype to "transactify" part of the Berkeley DB system, as well as several benchmarks By disabling the optional use of HTM, we can run all of these tests on existing systems Furthermore, by using a simulated multiprocessor with HTM support, we demonstrate the viability of the HyTM approach: it can provide performance and scalability approaching that of an unbounded HTM implementation, without the need to support all transactions with complicated HTM support

525 citations


Proceedings ArticleDOI
29 Mar 2006
TL;DR: McRT-STM as mentioned in this paper is a software transactional memory (STM) system that is part of McRT, an experimental Multi-Core RunTime (MCRT) implementation that supports nested transactions with partial aborts, conditional signaling within a transaction, and object based conflict detection for C/C++ applications.
Abstract: Applications need to become more concurrent to take advantage of the increased computational power provided by chip level multiprocessing. Programmers have traditionally managed this concurrency using locks (mutex based synchronization). Unfortunately, lock based synchronization often leads to deadlocks, makes fine-grained synchronization difficult, hinders composition of atomic primitives, and provides no support for error recovery. Transactions avoid many of these problems, and therefore, promise to ease concurrent programming.We describe a software transactional memory (STM) system that is part of McRT, an experimental Multi-Core RunTime. The McRT-STM implementation uses a number of novel algorithms, and supports advanced features such as nested transactions with partial aborts, conditional signaling within a transaction, and object based conflict detection for C/C++ applications. The McRT-STM exports interfaces that can be used from C/C++ programs directly or as a target for compilers translating higher level linguistic constructs.We present a detailed performance analysis of various STM design tradeoffs such as pessimistic versus optimistic concurrency, undo logging versus write buffering, and cache line based versus object based conflict detection. We also show a MCAS implementation that works on arbitrary values, coexists with the STM, and can be used as a more efficient form of transactional memory. To provide a baseline we compare the performance of the STM with that of fine-grained and coarse-grained locking using a number of concurrent data structures on a 16-processor SMP system. We also show our STM performance on a non-synthetic workload -- the Linux sendmail application.

487 citations


Proceedings ArticleDOI
20 Oct 2006
TL;DR: An innovative concurrent-program invariant that captures programmers' atomicity assumptions is proposed and a tool with two implementations is described that can automatically extract such invariants and detect atomicity violation bugs.
Abstract: Concurrency bugs are among the most difficult to test and diagnose of all software bugs. The multicore technology trend worsens this problem. Most previous concurrency bug detection work focuses on one bug subclass, data races, and neglects many other important ones such as atomicity violations, which will soon become increasingly important due to the emerging trend of transactional memory models.This paper proposes an innovative, comprehensive, invariantbased approach called AVIO to detect atomicity violations. Our idea is based on a novel observation called access interleaving invariant, which is a good indication of programmers' assumptions about the atomicity of certain code regions. By automatically extracting such invariants and detecting violations of these invariants at run time, AVIO can detect a variety of atomicity violations.Based on this idea, we have designed and built two implementations of AVIO and evaluated the trade-offs between them. The first implementation, AVIO-S, is purely in software, while the second, AVIO-H, requires some simple extensions to the cache coherence hardware. AVIO-S is cheaper and more accurate but incurs much higher overhead and thus more run-time perturbation than AVIOH. Therefore, AVIO-S is more suitable for in-house bug detection and postmortem bug diagnosis, while AVIO-H can be used for bug detection during production runs.We evaluate both implementations of AVIO using large realworld server applications (Apache and MySQL) with six representative real atomicity violation bugs, and SPLASH-2 benchmarks. Our results show that AVIO detects more tested atomicity violations of various types and has 25 times fewer false positives than previous solutions on average.

406 citations


Journal ArticleDOI
11 Jun 2006
TL;DR: A new 'direct access' implementation that avoids searching thread-private logs is introduced, compiler optimizations to reduce the amount of logging are developed, and a series of GC-time techniques to compact the logs generated by long-running atomic blocks are presented.
Abstract: Atomic blocks allow programmers to delimit sections of code as 'atomic', leaving the language's implementation to enforce atomicity. Existing work has shown how to implement atomic blocks over word-based transactional memory that provides scalable multi-processor performance without requiring changes to the basic structure of objects in the heap. However, these implementations perform poorly because they interpose on all accesses to shared memory in the atomic block, redirecting updates to a thread-private log which must be searched by reads in the block and later reconciled with the heap when leaving the block.This paper takes a four-pronged approach to improving performance: (1) we introduce a new 'direct access' implementation that avoids searching thread-private logs, (2) we develop compiler optimizations to reduce the amount of logging (e.g. when a thread accesses the same data repeatedly in an atomic block), (3) we use runtime filtering to detect duplicate log entries that are missed statically, and (4) we present a series of GC-time techniques to compact the logs generated by long-running atomic blocks.Our implementation supports short-running scalable concurrent benchmarks with less than 50\% overhead over a non-thread-safe baseline. We support long atomic blocks containing millions of shared memory accesses with a 2.5-4.5x slowdown.

332 citations


Journal ArticleDOI
11 Jun 2006
TL;DR: A high-performance software transactional memory system (STM) integrated into a managed runtime environment is presented and the JIT compiler is the first to optimize the overheads of STM, and novel techniques for enabling JIT optimizations on STM operations are shown.
Abstract: Programmers have traditionally used locks to synchronize concurrent access to shared data. Lock-based synchronization, however, has well-known pitfalls: using locks for fine-grain synchronization and composing code that already uses locks are both difficult and prone to deadlock. Transactional memory provides an alternate concurrency control mechanism that avoids these pitfalls and significantly eases concurrent programming. Transactional memory language constructs have recently been proposed as extensions to existing languages or included in new concurrent language specifications, opening the door for new compiler optimizations that target the overheads of transactional memory.This paper presents compiler and runtime optimizations for transactional memory language constructs. We present a high-performance software transactional memory system (STM) integrated into a managed runtime environment. Our system efficiently implements nested transactions that support both composition of transactions and partial roll back. Our JIT compiler is the first to optimize the overheads of STM, and we show novel techniques for enabling JIT optimizations on STM operations. We measure the performance of our optimizations on a 16-way SMP running multi-threaded transactional workloads. Our results show that these techniques enable transactional memory's performance to compete with that of well-tuned synchronization.

318 citations


Journal ArticleDOI
01 May 2006
TL;DR: Bulk is presented, a novel approach to simplify these mechanisms to hash-encode a thread's access information in a concise signature, and then support in hardware signature operations that efficiently process sets of addresses that implement the mechanisms described.
Abstract: Transactional Memory (TM), Thread-Level Speculation (TLS), and Checkpointed multiprocessors are three popular architectural techniques based on the execution of multiple, cooperating speculative threads. In these environments, correctly maintaining data dependences across threads requires mechanisms for disambiguating addresses across threads, invalidating stale cache state, and making committed state visible. These mechanisms are both conceptually involved and hard to implement. In this paper, we present Bulk, a novel approach to simplify these mechanisms. The idea is to hash-encode a thread's access information in a concise signature, and then support in hardware signature operations that efficiently process sets of addresses. Such operations implement the mechanisms described. Bulk operations are inexact but correct, and provide substantial conceptual and implementation simplicity. We evaluate Bulk in the context of TLS using SPECint2000 codes and TM using multithreaded Java workloads. Despite its simplicity, Bulk has competitive performance with more complex schemes. We also find that signature configuration is a key design parameter.

314 citations


Journal ArticleDOI
TL;DR: A hardware implementation of unbounded transactional memory, called UTM, is described, which exploits the common case for performance without sacrificing correctness on transactions whose footprint can be nearly as large as virtual memory.
Abstract: This article advances the following thesis: transactional memory should be virtualized to support transactions of arbitrary footprint and duration. Such support should be provided through hardware and be made visible to software through the machines instruction set architecture. We call a transactional memory system unbounded if the system can handle transactions of arbitrary duration that have footprints nearly as big as the systems virtual memory. The primary goal of unbounded transactional memory is to make concurrent programming easier without incurring much implementation overhead. Unbounded transactional-memory architectures can achieve high performance in the common case of small transactions, without sacrificing correctness in large transactions

295 citations


Proceedings ArticleDOI
16 Oct 2006
TL;DR: DSTM2, a Java™ software library that provides a flexible framework for implementing object-based software transactional memory (STM), is described, providing a substantial improvement over the awkward programming interface of the previous DSTM library.
Abstract: We describe DSTM2, a Java™ software library that provides a flexible framework for implementing object-based software transactional memory (STM). The library uses transactional factories to transform sequential (unsynchronized) classes into atomic (transactionally synchronized) ones, providing a substantial improvement over the awkward programming interface of our previous DSTM library. Furthermore, researchers can experiment with alternative STM mechanisms by providing their own factories. We demonstrate this flexibility by presenting two factories: one that uses essentially the same mechanisms as the original DSTM (with some enhancements),and another that uses a completely different approach.Because DSTM2 is packaged as a Java library, a wide range of programmers can easily try it out, and the community can begin to gain experience with transactional programming. Furthermore, researchers will be able to use the body of transactional programs that arises from this community experience to test and evaluate different STM mechanisms simply by supplying new transactional factories. We believe that this flexible approach will help to build consensus about the best ways to implement transactions, and will avoid the premature "lock-in" that may arise if STM mechanisms are baked into compilers before such experimentation is done.

Journal ArticleDOI
TL;DR: It is shown that a direct translation of lock-based critical sections into transactions can introduce deadlock into otherwise correct programs, and the terms strong atomicity and weak atomicity are introduced to describe the interaction of transactional and non-transactional code.
Abstract: Transactional memory has great potential for simplifying multithreaded programming by allowing programmers to specify regions of the program that must appear to execute atomically Transactional memory implementations then optimistically execute these transactions concurrently to obtain high performance This work shows that the same atomic guarantees that give transactions their power also have unexpected and potentially serious negative effects on programs that were written assuming narrower scopes of atomicity We make four contributions: (1) we show that a direct translation of lock-based critical sections into transactions can introduce deadlock into otherwise correct programs, (2) we introduce the terms strong atomicity and weak atomicity to describe the interaction of transactional and non-transactional code, (3) we show that code that is correct under weak atomicity can deadlock under strong atomicity, and (4) we demonstrate that sequentially composing transactional code can also introduce deadlocks These observations invalidate the intuition that transactions are strictly safer than lock-based critical sections, that strong atomicity is strictly safer than weak atomicity, and that transactions are always composable

Proceedings ArticleDOI
29 Mar 2006
TL;DR: This work proposes a novel hybrid hardware-software transactional memory scheme that approaches the performance of a hardware scheme when resources are not exhausted and gracefully falls back to a software scheme otherwise.
Abstract: High performance parallel programs are currently difficult to write and debug. One major source of difficulty is protecting concurrent accesses to shared data with an appropriate synchronization mechanism. Locks are the most common mechanism but they have a number of disadvantages, including possibly unnecessary serialization, and possible deadlock. Transactional memory is an alternative mechanism that makes parallel programming easier. With transactional memory, a transaction provides atomic and serializable operations on an arbitrary set of memory locations. When a transaction commits, all operations within the transaction become visible to other threads. When it aborts, all operations in the transaction are rolled back.Transactional memory can be implemented in either hardware or software. A straightforward hardware approach can have high performance, but imposes strict limits on the amount of data updated in each transaction. A software approach removes these limits, but incurs high overhead. We propose a novel hybrid hardware-software transactional memory scheme that approaches the performance of a hardware scheme when resources are not exhausted and gracefully falls back to a software scheme otherwise.

Proceedings ArticleDOI
20 Oct 2006
TL;DR: This paper extends the recently-proposed flat Log-based Transactional Memory (LogTM) with nested transactions and proposes escape actions that allow trusted code to run outside the confines of the transactional memory system.
Abstract: Nested transactional memory (TM) facilitates software composition by letting one module invoke another without either knowing whether the other uses transactions. Closed nested transactions extend isolation of an inner transaction until the toplevel transaction commits. Implementations may flatten nested transactions into the top-level one, resulting in a complete abort on conflict, or allow partial abort of inner transactions. Open nested transactions allow a committing inner transaction to immediately release isolation, which increases parallelism and expressiveness at the cost of both software and hardware complexity.This paper extends the recently-proposed flat Log-based Transactional Memory (LogTM) with nested transactions. Flat LogTM saves pre-transaction values in a log, detects conflicts with read (R) and write (W) bits per cache block, and, on abort, invokes a software handler to unroll the log. Nested LogTM supports nesting by segmenting the log into a stack of activation records and modestly replicating R/W bits. To facilitate composition with nontransactional code, such as language runtime and operating system services, we propose escape actions that allow trusted code to run outside the confines of the transactional memory system.

Journal ArticleDOI
01 May 2006
TL;DR: This paper introduces three key mechanisms: two-phase commit; support for software handlers on commit, violation, and abort; and full support for open- and closed-nested transactions with independent rollback, which provide a flexible interface to implement programming language and operating system functionality.
Abstract: Transactional Memory (TM) simplifies parallel programming by allowing for parallel execution of atomic tasks. Thus far, TM systems have focused on implementing transactional state buffering and conflict resolution. Missing is a robust hardware/software interface, not limited to simplistic instructions defining transaction boundaries. Without rich semantics, current TM systems cannot support basic features of modern programming languages and operating systems such as transparent library calls, conditional synchronization, system calls, I/O, and runtime exceptions. This paper presents a comprehensive instruction set architecture (ISA) for TM systems. Our proposal introduces three key mechanisms: two-phase commit; support for software handlers on commit, violation, and abort; and full support for open- and closed-nested transactions with independent rollback. These mechanisms provide a flexible interface to implement programming language and operating system functionality. We also show that these mechanisms are practical to implement at the ISA and microarchitecture level for various TM systems. Using an execution-driven simulation, we demonstrate both the functionality (e.g., I/O and conditional scheduling within transactions) and performance potential (2.2× improvement for SPECjbb2000) of the proposed mechanisms. Overall, this paper establishes a rich and efficient interface to foster both hardware and software research on transactional memory.

Proceedings ArticleDOI
09 Dec 2006
TL;DR: This paper is the first to propose architectural support for accelerating transactions executed entirely in software by adapting a high-performance STM algorithm supporting rich transactional semantics to the ISA extensions (called hardware accelerated software transactional memory or HASTM).
Abstract: Transactional memory provides a concurrency control mechanism that avoids many of the pitfalls of lock-based synchronization. Researchers have proposed several different implementations of transactional memory, broadly classified into software transactional memory (STM) and hardware transactional memory (HTM). Both approaches have their pros and cons: STMs provide rich and flexible transactional semantics on stock processors but incur significant overheads. HTMs, on the other hand, provide high performance but implement restricted semantics or add significant hardware complexity. This paper is the first to propose architectural support for accelerating transactions executed entirely in software. We propose instruction set architecture (ISA) extensions and novel hardware mechanisms that improve STM performance. We adapt a high-performance STM algorithm supporting rich transactional semantics to our ISA extensions (called hardware accelerated software transactional memory or HASTM). HASTM accelerates fully virtualized nested transactions, supports language integration, and provides both object-based and cache-line based conflict detection. We have implemented HASTM in an accurate multi-core IA32 simulator. Our simulation results show that (1) HASTM single-thread performance is comparable to a conventional HTM implementation; (2) HASTM scaling is comparable to a STM implementation; and (3) HASTM is resilient to spurious aborts and can scale better than HTM in a multi-core setting. Thus, HASTM provides the flexibility and rich semantics of STM, while giving the performance of HTM.

Journal ArticleDOI
11 Jun 2006
TL;DR: The implementation of the Atomos scheduler demonstrates the use of open nesting within the virtual machine and introduces the concept of transactional memory violation handlers that allow programs to recover from data dependency violations without rolling back.
Abstract: Atomos is the first programming language with implicit transactions, strong atomicity, and a scalable multiprocessor implementation. Atomos is derived from Java, but replaces its synchronization and conditional waiting constructs with simpler transactional alternatives.The Atomos watch statement allows programmers to specify fine-grained watch sets used with the Atomos retry conditional waiting statement for efficient transactional conflict-driven wakeup even in transactional memory systems with a limited number of transactional contexts. Atomos supports open-nested transactions, which are necessary for building both scalable application programs and virtual machine implementations.The implementation of the Atomos scheduler demonstrates the use of open nesting within the virtual machine and introduces the concept of transactional memory violation handlers that allow programs to recover from data dependency violations without rolling back.Atomos programming examples are given to demonstrate the usefulness of transactional programming primitives. Atomos and Java are compared through the use of several benchmarks. The results demonstrate both the improvements in parallel programming ease and parallel program performance provided by Atomos.

01 Jan 2006
TL;DR: This work considers the design of low-overhead, obstruction-free software transactional memory for non-garbage-collected languages and eliminates dynamic allocation of transactional metadata and co-locates data that are separate in other systems, thereby reducing the expected number of cache misses on the common-case code path.
Abstract: Recent years have seen the development of several dierent systems for software transactional memory (STM). Most either employ locks in the underlying implementation or depend on thread-safe general-purpose garbage collection to collect stale data and metadata. We consider the design of low-overhead, obstruction-free software transactional memory for non-garbage-collected languages. Our design eliminates dynamic allocation of transactional metadata and co-locates data that are separate in other systems, thereby reducing the expected number of cache misses on the common-case code path, while preserving nonblocking progress and requiring no atomic instructions other than single-word load, store, and compare-and-swap (or load-linked/store-conditional). We also employ a simple, epoch-based storage management system and introduce a novel conservative mechanism to make reader transactions visible to writers without inducing additional metadata copying or dynamic allocation. Experimental results show throughput significantly higher than that of existing nonblocking STM systems, and highlight significant, application-specific dierences among conflict detection and validation strategies.


Journal ArticleDOI
TL;DR: This paper proposes the use of Versioned Boxes, which keep a history of values, as the basis for language-level memory transactions, which may improve significantly the concurrency on applications which have longer transactions and a high read/write ratio.

Proceedings ArticleDOI
11 Jan 2006
TL;DR: Pessimistic atomic sections are presented, a fresh approach that retains many of the advantages of optimistic atomic sections as seen in "transactional memory" without sacrificing performance or compatibility.
Abstract: The movement to multi-core processors increases the need for simpler, more robust parallel programming models. Atomic sections have been widely recognized for their ease of use. They are simpler and safer to use than manual locking and they increase modularity. But existing proposals have several practical problems, including high overhead and poor interaction with I/O. We present pessimistic atomic sections, a fresh approach that retains many of the advantages of optimistic atomic sections as seen in "transactional memory" without sacrificing performance or compatibility. Pessimistic atomic sections employ the locking mechanisms familiar to programmers while relieving them of most burdens of lock-based programming, including deadlocks. Significantly, pessimistic atomic sections separate correctness from performance: they allow programmers to extract more parallelism via finer-grained locking without fear of introducing bugs. We believe this property is crucial for exploiting multi-core processor designs.We describe a tool, Autolocker, that automatically converts pessimistic atomic sections into standard lock-based code. Autolocker relies extensively on program analysis to determine a correct locking policy free of deadlocks and race conditions. We evaluate the expressiveness of Autolocker by modifying a 50,000 line high-performance web server to use atomic sections while retaining the original locking policy. We analyze Autolocker's performance using microbenchmarks, where Autolocker outperforms software transactional memory by more than a factor of 3.

Journal ArticleDOI
TL;DR: A reference model for nested transactions at the level of memory accesses is offered, and possible hardware architecture designs that implement that model are sketched, both closed and open.


Proceedings ArticleDOI
29 Mar 2006
TL;DR: A transactional memory runtime system providing scaling and strong consistency on commodity clusters for both distributed scientific applications and database applications is investigated and shows near-linear scaling up to 8 transactional nodes for the most common e-commerce workload, the TPC-W shopping mix.
Abstract: We investigate a transactional memory runtime system providing scaling and strong consistency, i.e., 1-copy serializability on commodity clusters for both distributed scientific applications and database applications. We introduce a novel page-level distributed concurrency control algorithm, called Distributed Multiversioning (DMV). DMV automatically detects and resolves conflicts caused by data races for distributed transactions accessing shared in-memory data structures. DMV's key novelty is in exploiting the distributed data versions that naturally come about in a replicated cluster in order to avoid read-write conflicts, hence provide scaling. DMV runs conflicting read-only and update transactions in parallel on different replicas instead of using different physical data copies within a single node as in classic multiversioning. In its most general form, DMV can be used to implement a software transactional memory system on a cluster for scaling C++ applications. DMV supports highly multithreaded database applications as well by centralizing updates on a master replica and creating the required page versions for read-only transactions lazily, on a set of slave replicas. We also show that a version-aware scheduling technique can distribute the read-only transactions across the slaves in such a way to minimize version conflicts.In our evaluation, we use DMV as a lightweight approach to scaling a hash table microbenchmark workload and the industry-standard e-commerce workload of the TPC-W benchmark on a commodity cluster. Our measurements show scaling for both benchmarks. In particular, we show near-linear scaling up to 8 transactional nodes for the most common e-commerce workload, the TPC-W shopping mix. We further show that our scaling for the TPC-W e-commerce benchmark compares favorably with that of an existing coarse-grained asynchronous replication technique.

Proceedings ArticleDOI
20 Oct 2006
TL;DR: This paper introduces Page-based Transactional Memory to support unbounded transactions, and combines transaction bookkeeping with the virtual memory system to support fast transaction conflict detection, commit, abort, and to maintain transactions' speculative data.
Abstract: Exploiting thread level parallelism is paramount in the multicore era Transactions enable programmers to expose such parallelism by greatly simplifying the multi-threaded programming model Virtualized transactions (unbounded in space and time) are desirable, as they can increase the scope of transactions' use, and thereby further simplify a programmer's job However, hardware support is essential to support efficient execution of unbounded transactions In this paper, we introduce Page-based Transactional Memory to support unbounded transactions We combine transaction bookkeeping with the virtual memory system to support fast transaction conflict detection, commit, abort, and to maintain transactions' speculative data

01 Jan 2006
TL;DR: This work investigates the performance of a STM using snapshot isolation and a novel lazy multi-version snapshot algorithm to decrease the validation costs which can increase quadratically with the number of objects read in STMs with invisible reads.
Abstract: Software transactional memory (STM) has been proposed to simplify the development and to increase the scalability of concurrent programs. One problem of existing STMs is that of having long-running read transactions co-exist with shorter update transactions. This problem is of practical importance and has so far not been addressed by other papers in this domain. We approach this problem by investigating the performance of a STM using snapshot isolation and a novel lazy multi-version snapshot algorithm to decrease the validation costs which can increase quadratically with the number of objects read in STMs with invisible reads. Our measurements demonstrate that snapshot isolation can increase throughput for workloads with long transactions. In comparison to other STMs with invisible reads, we can reduce the validation costs by using our lazy consistent snapshot algorithm.

Proceedings ArticleDOI
10 Jun 2006
TL;DR: This paper is the first to integrate a software transactional memory system with a malloc/free based memory allocator and presents the first algorithm which ensures that space allocated in an aborted transaction is properly freed and does not lead to a space blowup.
Abstract: Emerging multi-core processors promise to provide an exponentially increasing number of hardware threads with every generation. Applications will need to be highly concurrent to fullyuse the power of these processors. To enable maximum concurrency, libraries (such as malloc-free packages) would therefore need to use non-blocking algorithms. But lock-free algorithms are notoriously difficult to reason about and inappropriate for average programmers. Transactional memory promises to significantly ease concurrent programming for the average programmer. This paper describes a highly efficient non-blocking malloc/free algorithm that supports memory allocation and deallocation inside transactional code blocks. Thus this paper describes a memory allocator that is suitable for emerging multi-core applications, while supporting modern concurrency constructs.This paper makes several novel contributions. It is the first to integrate a software transactional memory system with a malloc/free based memory allocator. We present the first algorithm which ensures that space allocated in an aborted transaction is properly freed and does not lead to a space blowup. Unlike previous lock-free malloc packages, our algorithm avoids atomic operations on typical code paths, making our algorithm substantially more efficient.

Book ChapterDOI
18 Sep 2006
TL;DR: The most comprehensive study to date of conflict detection strategies is presented, characterizing the tradeoffs among them and identifying the ones that perform the best for various types of workload and introducing a lightweight heuristic mechanism—the global commit counter— that can greatly reduce the cost of validation and of single-threaded execution.
Abstract: In a software transactional memory (STM) system, conflict detection is the problem of determining when two transactions cannot both safely commit. Validation is the related problem of ensuring that a transaction never views inconsistent data, which might potentially cause a doomed transaction to exhibit irreversible, externally visible side effects. Existing mechanisms for conflict detection vary greatly in their degree of speculation and their relative treatment of read-write and write-write conflicts. Validation, for its part, appears to be a dominant factor—perhaps the dominant factor—in the cost of complex transactions. We present the most comprehensive study to date of conflict detection strategies, characterizing the tradeoffs among them and identifying the ones that perform the best for various types of workload. In the process we introduce a lightweight heuristic mechanism—the global commit counter—that can greatly reduce the cost of validation and of single-threaded execution. The heuristic also allows us to experiment with mixed invalidation, a more opportunistic interleaving of reading and writing transactions. Experimental results on a 16-processor SunFire machine running our RSTM system indicate that the choice of conflict detection strategy can have a dramatic impact on performance, and that the best choice is workload dependent. In workloads whose transactions rarely conflict, the commit counter does little to help (and can even hurt) performance. For less scalable applications, however—those in which STM performance has traditionally been most problematic—it can improve transaction throughput many fold.

Proceedings ArticleDOI
29 Mar 2006
TL;DR: This paper presents new algorithms for runtime (dynamic) detection of violations of conflict-atomicity and view- Atomicity, which are analogous to conflict-serializability andView- serializability in database systems and are more efficient and accurate than previous algorithms with comparable asymptotic complexity.
Abstract: Atomicity is an important correctness condition for concurrent systems. Informally, atomicity is the property that every concurrent execution of a set of transactions is equivalent to some serial execution of the same transactions. In multi-threaded programs, executions of procedures (or methods) can be regarded as transactions. Correctness in the presence of concurrency often requires atomicity of these transactions. Tools that automatically detect atomicity violations can uncover subtle errors that are hard to find with traditional debugging and testing techniques.This paper presents new algorithms for runtime (dynamic) detection of violations of conflict-atomicity and view-atomicity, which are analogous to conflict-serializability and view-serializability in database systems. In these algorithms, the recorded events are formed into a graph with edges representing the synchronization within each transaction and possible interactions between transactions. We give conditions on the graph that imply conflict-atomicity and view-atomicity. Experiments show that these new algorithms are more efficient in most experiments and are more accurate than previous algorithms with comparable asymptotic complexity.

Proceedings ArticleDOI
20 Oct 2006
TL;DR: eXtended Transactional Memory is presented, the first TM virtualization system that virtualizes all aspects of transactional execution (time, space, and nesting depth), and it is demonstrated that despite being software-based, XTM and its enhancements are competitive with hardware-based alternatives.
Abstract: For transactional memory (TM) to achieve widespread acceptance, transactions should not be limited to the physical resources of any specific hardware implementation. TM systems should guarantee correct execution even when transactions exceed scheduling quanta, overflow the capacity of hardware caches and physical memory, or include more independent nesting levels than what is supported in hardware. Existing proposals for TM virtualization are either incomplete or rely on complex hardware implementations, which are an overkill if virtualization is invoked infrequently in the common case.We present eXtended Transactional Memory (XTM), the first TM virtualization system that virtualizes all aspects of transactional execution (time, space, and nesting depth). XTM is implemented in software using virtual memory support. It operates at page granularity, using private copies of overflowed pages to buffer memory updates until the transaction commits and snapshots of pages to detect interference between transactions. We also describe two enhancements to XTM that use limited hardware support to address key performance bottlenecks.We compare XTM to hardwarebased virtualization using both real applications and synthetic microbenchmarks. We show that despite being software-based, XTM and its enhancements are competitive with hardware-based alternatives. Overall, we demonstrate that XTM provides a complete, flexible, and low-cost mechanism for practical TM virtualization.