
Showing papers on "Transactional memory published in 2010"


Proceedings ArticleDOI
09 Jan 2010
TL;DR: An ownership-record-free software transactional memory (STM) system, NOrec, that combines extremely low overhead with unusually clean semantics is presented; experience with it suggests that NOrec may be an ideal candidate for the software fallback role on current multicore processors.
Abstract: Drawing inspiration from several previous projects, we present an ownership-record-free software transactional memory (STM) system that combines extremely low overhead with unusually clean semantics. While unlikely to scale to hundreds of active threads, this "NOrec" system offers many appealing features: very low fast-path latency--as low as any system we know of that admits concurrent updates; publication and privatization safety; livelock freedom; a small, constant amount of global metadata; full compatibility with existing data structure layouts; no false conflicts due to hash collisions; compatibility with both managed and unmanaged languages, and both static and dynamic compilation; and easy accommodation of closed nesting, inevitable (irrevocable) transactions, and starvation avoidance mechanisms. To the best of our knowledge, no extant STM system combines this set of features. While transactional memory for processors with hundreds of cores is likely to require hardware support, software implementations will be required for backward compatibility with current and near-future processors with 2--64 cores, as well as for fall-back in future machines when hardware resources are exhausted. Our experience suggests that NOrec may be an ideal candidate for such a software system. We also observe that it has considerable appeal for use within the operating system, and in systems that require both closed nesting and publication safety.

327 citations
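
The mechanism the abstract hints at is a single global sequence lock plus value-based validation of a private read log, in place of per-location ownership records. Below is a minimal sketch of that idea, assuming illustrative names, single-word writes, and no write log; it is not the authors' code.

```cpp
#include <atomic>
#include <utility>
#include <vector>

std::atomic<unsigned> global_seq{0};            // even = quiescent, odd = writer committing

struct Tx {
    unsigned snapshot = 0;                       // global_seq value the tx is consistent with
    std::vector<std::pair<int*, int>> reads;     // value-based read log (no ownership records)

    void begin() {
        do { snapshot = global_seq.load(); } while (snapshot & 1u);
    }

    // Value-based validation: re-read every logged location and compare values.
    bool validate() {
        for (;;) {
            unsigned s = global_seq.load();
            if (s & 1u) continue;                // a writer is mid-commit; spin
            for (auto& r : reads)
                if (*r.first != r.second) return false;
            if (s == global_seq.load()) { snapshot = s; return true; }
        }
    }

    bool read(int* addr, int& out) {             // false => abort and retry
        int v = *addr;
        while (snapshot != global_seq.load()) {  // someone committed since we began
            if (!validate()) return false;       // a logged value changed: abort
            v = *addr;                           // re-read under the extended snapshot
        }
        reads.push_back({addr, v});
        out = v;
        return true;
    }

    bool commit(int* addr, int val) {            // single-location writer, for brevity
        for (;;) {
            unsigned s = snapshot;
            if (global_seq.compare_exchange_strong(s, s + 1)) break;  // acquire seqlock
            if (!validate()) return false;       // re-sync snapshot, then retry
        }
        *addr = val;                             // write back while holding the lock
        global_seq.store(snapshot + 2);          // release with a new even value
        return true;
    }
};
```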


Book
02 Jun 2010
TL;DR: This book presents an overview of the state of the art in the design and implementation of transactional memory systems, as of early spring 2010.
Abstract: The advent of multicore processors has renewed interest in the idea of incorporating transactions into the programming model used to write parallel programs. This approach, known as transactional memory, offers an alternative, and hopefully better, way to coordinate concurrent threads. The ACI (atomicity, consistency, isolation) properties of transactions provide a foundation to ensure that concurrent reads and writes of shared data do not produce inconsistent or incorrect results. At a higher level, a computation wrapped in a transaction executes atomically - either it completes successfully and commits its result in its entirety or it aborts. In addition, isolation ensures the transaction produces the same result as if no other transactions were executing concurrently. Although transactions are not a parallel programming panacea, they shift much of the burden of synchronizing and coordinating parallel computations from a programmer to a compiler, to a language runtime system, or to hardware. The challenge for the system implementers is to build an efficient transactional memory infrastructure. This book presents an overview of the state of the art in the design and implementation of transactional memory systems, as of early spring 2010. Table of Contents: Introduction / Basic Transactions / Building on Basic Transactions / Software Transactional Memory / Hardware-Supported Transactional Memory / Conclusions

309 citations


Journal ArticleDOI
TL;DR: This special issue on transactional memory introduces transactional memory as a concept, presents an overview of some of the most important approaches so far, and includes five articles that advance the state of the art in transactional memory research.

305 citations


Proceedings ArticleDOI
09 Jan 2010
TL;DR: A user-study is described in which 237 undergraduate students in an operating systems course implemented the same programs using coarse and fine-grain locks, monitors, and transactions; the number and types of programming errors the students made were much lower for transactions than for locks.
Abstract: Chip multi-processors (CMPs) have become ubiquitous, while tools that ease concurrent programming have not. The promise of increased performance for all applications through ever more parallel hardware requires good tools for concurrent programming, especially for average programmers. Transactional memory (TM) has enjoyed recent interest as a tool that can help programmers program concurrently. The TM research community is heavily invested in the claim that programming with transactional memory is easier than alternatives (like locks), but evidence for or against the veracity of this claim is scant. In this paper, we describe a user-study in which 237 undergraduate students in an operating systems course implement the same programs using coarse and fine-grain locks, monitors, and transactions. We surveyed the students after the assignment, and examined their code to determine the types and frequency of programming errors for each synchronization technique. Inexperienced programmers found baroque syntax a barrier to entry for transactional programming. On average, subjective evaluation showed that students found transactions harder to use than coarse-grain locks, but slightly easier to use than fine-grained locks. Detailed examination of synchronization errors in the students' code tells a rather different story. Overwhelmingly, the number and types of programming errors the students made were much lower for transactions than for locks. On a similar programming problem, over 70% of students made errors with fine-grained locking, while less than 10% made errors with transactions.

158 citations


Journal ArticleDOI
TL;DR: The first time-based STM algorithm, the Lazy Snapshot Algorithm (LSA), is formally introduced, and its semantics and the impact of its design parameters, notably multiversioning and dynamic snapshot extension, are studied.
Abstract: Software transactional memory (STM) is a concurrency control mechanism that is widely considered to be easier to use by programmers than other mechanisms such as locking. The first generations of STMs have either relied on visible read designs, which simplify conflict detection while pessimistically ensuring a consistent view of shared data to the application, or optimistic invisible read designs that are significantly more efficient but require incremental validation to preserve consistency, at a cost that increases quadratically with the number of objects read in a transaction. Most of the recent designs now use a “time-based” (or “time stamp-based”) approach to still benefit from the performance advantage of invisible reads without incurring the quadratic overhead of incremental validation. In this paper, we give an overview of the time-based STM approach and discuss its benefits and limitations. We formally introduce the first time-based STM algorithm, the Lazy Snapshot Algorithm (LSA). We study its semantics and the impact of its design parameters, notably multiversioning and dynamic snapshot extension. We compare it against other classical designs and we demonstrate that its performance is highly competitive, both for obstruction-free and lock-based STM designs.

150 citations
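
As background, here is a hedged sketch of the general time-based pattern the abstract describes: a global clock plus per-object commit timestamps replace quadratic incremental validation. This is in the spirit of LSA but not the paper's algorithm; the read-set revalidation inside snapshot extension is elided.

```cpp
#include <atomic>

std::atomic<long> global_clock{0};   // advanced by committing writers

struct VersionedObj {
    std::atomic<long> version{0};    // commit timestamp of the last writer
    int value = 0;
};

struct Tx {
    long rv = 0;                     // read timestamp: snapshot validity bound
    void begin() { rv = global_clock.load(); }

    // A read is consistent if the object was not written after our snapshot.
    bool read(VersionedObj& o, int& out) {
        long v1 = o.version.load();
        out = o.value;
        long v2 = o.version.load();
        if (v1 != v2 || v1 > rv)
            return tryExtend();      // snapshot too old: try to extend it
        return true;
    }

    // Dynamic snapshot extension: LSA moves rv forward after revalidating
    // its read set; the revalidation step is elided in this sketch.
    bool tryExtend() {
        rv = global_clock.load();
        return true;                 // assume revalidation of prior reads passed
    }
};
```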


Proceedings ArticleDOI
13 Apr 2010
TL;DR: Measurements on a wide range of benchmarks indicate that the overheads traditionally associated with software transactional memories can be significantly reduced with the help of ASF.
Abstract: AMD's Advanced Synchronization Facility (ASF) is an x86 instruction set extension proposal intended to simplify and speed up the synchronization of concurrent programs. In this paper, we report our experiences using ASF for implementing transactional memory. We have extended a C/C++ compiler to support language-level transactions and generate code that takes advantage of ASF. We use a software fallback mechanism for transactions that cannot be committed within ASF (e.g., because of hardware capacity limitations). Our evaluation uses a cycle-accurate x86 simulator that we have extended with ASF support. Building a complete ASF-based software stack allows us to evaluate the performance gains that a user-level program can obtain from ASF. Our measurements on a wide range of benchmarks indicate that the overheads traditionally associated with software transactional memories can be significantly reduced with the help of ASF.

135 citations
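
The execution structure described here, a hardware speculative region with a software fallback for transactions the hardware cannot commit, follows a common hybrid pattern. A minimal sketch under stated assumptions: asf_begin/asf_commit are hypothetical stand-ins for ASF's speculate/commit instructions (stubbed so the sketch runs anywhere), and a global lock stands in for the STM fallback path.

```cpp
#include <mutex>

// Hypothetical stand-ins for the hardware primitives; the stub always
// reports a hardware abort so the sketch is self-contained and runnable.
bool asf_begin()  { return false; }
void asf_commit() {}

std::mutex fallback_lock;                 // stands in for the software TM path
constexpr int MAX_HW_RETRIES = 3;

template <class F>
void atomic_region(F body) {
    for (int i = 0; i < MAX_HW_RETRIES; ++i) {
        if (asf_begin()) {                // speculative region started
            body();
            asf_commit();                 // commit the hardware transaction
            return;
        }                                 // capacity/contention abort: retry
    }
    // Software fallback for transactions the hardware cannot commit.
    std::lock_guard<std::mutex> g(fallback_lock);
    body();
}
// Usage: atomic_region([]{ /* reads and writes of shared state */ });
```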


Book
24 Sep 2010
TL;DR: The aim of this book is to provide theoretical foundations for transactional memory: defining a model of a TM, and answering precisely when a TM implementation is correct, what kinds of properties it can ensure, what the power and limitations of a TM are, and what inherent trade-offs are involved in designing a TM algorithm.
Abstract: Transactional memory (TM) is an appealing paradigm for concurrent programming on shared memory architectures. With a TM, threads of an application communicate, and synchronize their actions, via in-memory transactions. Each transaction can perform any number of operations on shared data, and then either commit or abort. When the transaction commits, the effects of all its operations become immediately visible to other transactions; when it aborts, however, those effects are entirely discarded. Transactions are atomic: programmers get the illusion that every transaction executes all its operations instantaneously, at some single and unique point in time. Yet, a TM runs transactions concurrently to leverage the parallelism offered by modern processors. The aim of this book is to provide theoretical foundations for transactional memory. This includes defining a model of a TM, as well as answering precisely when a TM implementation is correct, what kind of properties it can ensure, what are the power and limitations of a TM, and what inherent trade-offs are involved in designing a TM algorithm. While the focus of this book is on the fundamental principles, its goal is to capture the common intuition behind the semantics of TMs and the properties of existing TM implementations.

119 citations


Proceedings ArticleDOI
25 Jul 2010
TL;DR: This paper studies inherent properties of STMs that use multiple versions to guarantee successful commits of all read-only transactions, and presents an STM algorithm using visible reads that efficiently garbage collects useless object versions.
Abstract: An effective way to reduce the number of aborts in software transactional memory (STM) is to keep multiple versions of transactional objects. In this paper, we study inherent properties of STMs that use multiple versions to guarantee successful commits of all read-only transactions. We first show that these STMs cannot be disjoint-access parallel. We then consider the problem of garbage collecting old object versions, and show that no STM can be optimal in the number of previous versions kept. Moreover, we show that garbage collecting useless versions is impossible in STMs that implement invisible reads. Finally, we present an STM algorithm using visible reads that efficiently garbage collects useless object versions.

117 citations
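
A minimal sketch of the multi-versioning idea under discussion: each object keeps a timestamp-indexed list of old versions so a read-only transaction can always find the version that was current at its start time and commit without aborting. Data structures and locking here are illustrative, not the paper's algorithm.

```cpp
#include <atomic>
#include <iterator>
#include <map>
#include <mutex>

std::atomic<long> global_clock{0};

struct MVObject {
    std::mutex m;
    std::map<long, int> versions;    // commit timestamp -> value

    void write(int v) {              // a writer commits a new version
        std::lock_guard<std::mutex> g(m);
        versions[global_clock.fetch_add(1) + 1] = v;
    }

    // Read the newest version whose timestamp is <= the reader's snapshot.
    bool readAt(long snapshot, int& out) {
        std::lock_guard<std::mutex> g(m);
        auto it = versions.upper_bound(snapshot);   // first version after snapshot
        if (it == versions.begin()) return false;   // needed version was GC'd
        out = std::prev(it)->second;
        return true;
    }
};
```

The garbage-collection results in the abstract concern exactly the `readAt` failure case: how many old versions must be retained so that no read-only transaction ever hits it.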


Patent
10 Nov 2010
TL;DR: In this paper, the authors present a method for selecting a first transaction execution mode to begin a transaction in an unbounded transactional memory (UTM) system having a plurality of transaction execution modes.
Abstract: In one embodiment, the present invention includes a method for selecting a first transaction execution mode to begin a first transaction in an unbounded transactional memory (UTM) system having a plurality of transaction execution modes. These transaction execution modes include hardware modes to execute within a cache memory of a processor, a hardware assisted mode to execute using transactional hardware of the processor and a software buffer, and a software transactional memory (STM) mode to execute without the transactional hardware. The first transaction execution mode can be selected to be the highest performing of the hardware modes if no pending transaction is executing in the STM mode; otherwise a lower performing mode can be selected. Other embodiments are described and claimed.

99 citations
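
A small sketch of the selection policy in the claim, assuming a hypothetical mode enum for the claim's three mode classes and a counter standing in for "a pending transaction is executing in STM mode":

```cpp
#include <atomic>

enum class Mode { CACHE_HW, HW_ASSISTED, STM };   // illustrative mode classes

std::atomic<int> stm_transactions_in_flight{0};   // maintained by the runtime

Mode select_mode() {
    // Pure hardware modes are only chosen when no STM transaction is
    // pending, mirroring the claim's selection rule.
    if (stm_transactions_in_flight.load() == 0)
        return Mode::CACHE_HW;       // highest performing hardware mode
    return Mode::HW_ASSISTED;        // lower performing but STM-compatible
}
```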


Proceedings ArticleDOI
17 Jan 2010
TL;DR: A generalization of transactional memory in which a transaction consists of coarse-grained data-type operations rather than simple memory read/write operations is defined, and it is discussed how the semantics applies to numerous TM implementation details covered widely in the literature.
Abstract: Traditional transactional memory systems suffer from overly conservative conflict detection, yielding so-called false conflicts, because they are based on fine-grained, low-level read/write conflicts. In response, the recent trend has been toward integrating various abstract data-type libraries using ad-hoc methods of high-level conflict detection. These proposals have led to improved performance, but a lack of a unified theory has led to confusion in the literature. We clarify these recent proposals by defining a generalization of transactional memory in which a transaction consists of coarse-grained (abstract data-type) operations rather than simple memory read/write operations. We provide semantics for both pessimistic (e.g. transactional boosting) and optimistic (e.g. traditional TMs and recent alternatives) execution. We show that both are included in the standard atomic semantics, yet find that the choice imposes different requirements on the coarse-grained operations: pessimistic requires operations be left-movers, optimistic requires right-movers. Finally, we discuss how the semantics applies to numerous TM implementation details discussed widely in the literature.

77 citations
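
A hedged sketch of the pessimistic (boosting-style) execution the paper formalizes: an abstract lock guards each operation on a linearizable collection, and an inverse operation is logged so an abort can roll the effect back. A real implementation holds abstract locks until commit (two-phase) and uses per-key locks; this sketch simplifies both.

```cpp
#include <functional>
#include <mutex>
#include <set>
#include <vector>

std::mutex abstract_lock;             // per-key in a real system; global here
std::set<int> the_set;                // underlying linearizable collection

struct BoostedTx {
    std::vector<std::function<void()>> undo;   // logged inverse operations

    bool insert(int k) {
        std::lock_guard<std::mutex> g(abstract_lock);
        bool added = the_set.insert(k).second;
        if (added) undo.push_back([k] { the_set.erase(k); });  // inverse op
        return added;
    }
    void abort() {                    // run inverses in reverse order
        for (auto it = undo.rbegin(); it != undo.rend(); ++it) (*it)();
        undo.clear();
    }
    void commit() { undo.clear(); }   // effects are already in place
};
```

The mover requirement in the abstract maps onto this structure: pessimistic execution applies effects immediately and must be able to move them left (undo), while optimistic execution defers them and must be able to move them right (replay at commit).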


Proceedings ArticleDOI
09 Jan 2010
TL;DR: This work proposes, implements, and evaluates several novel kernel-level scheduling support mechanisms for TM contention management, and introduces kernel-level TM scheduling support into both the Linux and Solaris kernels; it is believed to be the first to investigate kernel-level support for TM contention management.
Abstract: Transactional Memory (TM) is considered one of the most promising paradigms for developing concurrent applications. TM has been shown to scale well on multiple cores when the data access pattern behaves "well," i.e., when few conflicts are induced. In contrast, data patterns with frequent write sharing, with long transactions, or when many threads contend for a smaller number of cores, result in numerous conflicts. Until recently, TM implementations had little control of transactional threads, which remained under the supervision of the kernel's transaction-ignorant scheduler. Conflicts are thus traditionally resolved by consulting an STM-level contention manager. Consequently, the contention managers of these "conventional" TM implementations suffer from a lack of precision and often fail to ensure reasonable performance in high-contention workloads. Recently, scheduling-based TM contention management has been proposed for increasing TM efficiency under high contention [2, 5, 19]. However, only user-level schedulers have been considered. In this work, we propose, implement and evaluate several novel kernel-level scheduling support mechanisms for TM contention management. We also investigate different strategies for efficient communication between the kernel and the user-level TM library. To the best of our knowledge, our work is the first to investigate kernel-level support for TM contention management. We have introduced kernel-level TM scheduling support into both the Linux and Solaris kernels. Our experimental evaluation demonstrates that lightweight kernel-level scheduling support significantly reduces the number of aborts while improving transaction throughput on various workloads.
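
The paper's mechanisms live inside the kernel scheduler; the user-level sketch below only illustrates the policy they enable, which is to deschedule a repeatedly aborting thread so the conflicting transaction can finish, instead of letting it spin. try_transaction is a hypothetical stub.

```cpp
#include <algorithm>
#include <chrono>
#include <thread>

// Hypothetical transaction attempt; stubbed to succeed so the sketch terminates.
bool try_transaction() { return true; }

void run_with_scheduling_backoff() {
    int aborts = 0;
    while (!try_transaction()) {
        ++aborts;
        if (aborts < 3)
            std::this_thread::yield();                         // mild backoff first
        else
            std::this_thread::sleep_for(                       // user-level stand-in
                std::chrono::microseconds(1 << std::min(aborts, 10)));  // for a kernel deschedule
    }
}
```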

Proceedings ArticleDOI
02 Dec 2010
TL;DR: EigenBench, a lightweight yet powerful microbenchmark for fully evaluating a transactional memory system, is presented and it is shown that EigenBench is useful for thoroughly exploring the orthogonal space of TM application characteristics.
Abstract: There are a significant number of Transactional Memory (TM) proposals, varying in almost all aspects of the design space. Although several transactional benchmarks have been suggested, a simple, yet thorough, evaluation framework is still needed to completely characterize a TM system and allow for comparison among the various proposals. Unfortunately, TM system evaluation is difficult because the application characteristics which affect performance are often difficult to isolate from each other. We propose a set of orthogonal application characteristics that form a basis for transactional behavior and are useful in fully understanding the performance of a TM system. In this paper, we present EigenBench, a lightweight yet powerful microbenchmark for fully evaluating a transactional memory system. We show that EigenBench is useful for thoroughly exploring the orthogonal space of TM application characteristics. Because of its flexibility, our microbenchmark is also capable of reproducing a representative set of TM performance pathologies. In this paper, we use EigenBench to evaluate two well-known TM systems and provide significant insight about their strengths and weaknesses. We also demonstrate how EigenBench can be used to mimic the evaluation coverage of a popular TM benchmark suite called STAMP.
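
A sketch of the kind of orthogonal knobs such a microbenchmark varies; the parameter names and the hot/private array split are illustrative, not EigenBench's actual interface.

```cpp
#include <cstdlib>
#include <vector>

struct Knobs {
    int tx_len;         // accesses per transaction (working-set size)
    double write_frac;  // fraction of accesses that are writes
    double hot_frac;    // fraction of accesses to the shared "hot" array
};

// One transaction body; each knob can be varied independently of the others.
void tx_body(const Knobs& k, std::vector<int>& hot, std::vector<int>& priv) {
    for (int i = 0; i < k.tx_len; ++i) {
        bool shared = (std::rand() / (double)RAND_MAX) < k.hot_frac;
        std::vector<int>& arr = shared ? hot : priv;
        std::size_t idx = std::rand() % arr.size();
        if ((std::rand() / (double)RAND_MAX) < k.write_frac)
            arr[idx]++;                 // transactional write
        else
            (void)arr[idx];             // transactional read
    }
}
```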

Patent
27 May 2010
Abstract: Mechanisms are provided, in a data processing system having a processor and a transactional memory, for executing a transaction in the data processing system. These mechanisms execute a transaction comprising one or more instructions that modify at least a portion of the transactional memory. The transaction is suspended in response to a transaction suspend instruction being executed by the processor. A suspended block of code is executed in a non-transactional manner while the transaction is suspended. A determination is made as to whether an interrupt occurs while the transaction is suspended. In response to an interrupt occurring while the transaction is suspended, a transaction abort operation is delayed until after the transaction suspension is discontinued.

Proceedings ArticleDOI
04 Dec 2010
TL;DR: An out-of-order hardware design to implement ASF on a future AMD processor is developed, and the experimental results show that the combined use of the L1 cache and the LS unit is very helpful for the performance robustness of ASF-based lock-free data structures, and that the selective use of speculative accesses enables transactional programs to scale with limited ASF hardware resources.
Abstract: Advanced Synchronization Facility (ASF) is an AMD64 hardware extension for lock-free data structures and transactional memory. It provides a speculative region that atomically executes speculative accesses in the region. Five new instructions are added to demarcate the region, use speculative accesses selectively, and control the speculative hardware context. Programmers can use speculative regions to build flexible multi-word atomic primitives with no additional software support by relying on the minimum guarantee of available ASF hardware resources for lock-free programming. Transactional programs with high-level TM language constructs can either be compiled directly to the ASF code or be linked to software TM systems that use ASF to accelerate transactional execution. In this paper we develop an out-of-order hardware design to implement ASF on a future AMD processor and evaluate it with an in-house simulator. The experimental results show that the combined use of the L1 cache and the LS unit is very helpful for the performance robustness of ASF-based lock-free data structures, and that the selective use of speculative accesses enables transactional programs to scale with limited ASF hardware resources.

Journal ArticleDOI
TL;DR: This paper identifies where existing Distributed Transactional Memory platforms still fail to meet the requirements of the cloud and of its users, and points out several open research problems whose solution is deemed essential to materialize the Cloud-TM vision.
Abstract: One of the main challenges to harness the potential of Cloud computing is the design of programming models that simplify the development of large-scale parallel applications and that allow ordinary programmers to take full advantage of the computing power and the storage provided by the Cloud, both of which are made available, on demand, in a pay-only-for-what-you-use pricing model. In this paper, we discuss the use of the Transactional Memory programming model in the context of the cloud computing paradigm, which we refer to as Cloud-TM. We identify where existing Distributed Transactional Memory platforms still fail to meet the requirements of the cloud and of its users, and we point out several open research problems whose solution we deem essential to materialize the Cloud-TM vision.

Proceedings ArticleDOI
28 Jan 2010
TL;DR: This article presents an accurate analytical model of 2PL concurrency control, which overcomes several limitations of preexisting analytical results and captures relevant features of realistic data access patterns, by taking into account access distributions that depend on transactions' execution phases.
Abstract: Nowadays the 2-Phase-Locking (2PL) concurrency control algorithm still plays a core role in the construction of transactional systems (e.g. database systems and transactional memories). Hence, any technique allowing accurate analysis and prediction of the performance of 2PL-based systems can be of wide interest and applicability. In this article we present an accurate analytical model of 2PL concurrency control, which overcomes several limitations of preexisting analytical results. In particular, our model captures relevant features of realistic data access patterns, by taking into account access distributions that depend on transactions' execution phases. Also, our model provides significantly more accurate performance predictions in heavy contention scenarios, where the number of transactions enqueued due to conflicting lock requests is expected to be non-minimal. The accuracy of our model has been verified against simulation results based on both synthetic data access patterns and patterns derived from the TPC-C benchmark.
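
For context, the classical first-order approximation that models of this kind refine (this is the textbook estimate under uniform access, not the paper's phase-dependent model): with N concurrent transactions, each acquiring k locks out of D lockable items,

```latex
% Textbook first-order 2PL contention estimates (background assumption,
% not the paper's model): uniform access over D items.
P_{\mathrm{conflict}} \;\approx\; \frac{(N-1)\,k}{D}
\qquad
\mathbb{E}[\text{waits per transaction}] \;\approx\; k \cdot P_{\mathrm{conflict}}
\;=\; \frac{(N-1)\,k^{2}}{D}
```

The paper's contribution is to replace the uniform-access assumption with access distributions that change across a transaction's execution phases, and to model the queueing that builds up under heavy contention.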

Proceedings ArticleDOI
24 Apr 2010
TL;DR: An efficient implementation of commit-time invalidation, a strategy where transactions resolve their conflicts with in-flight (uncommitted) transactions before they commit, is presented; it is up to 3x faster than TL2, a state-of-the-art validating STM.
Abstract: To improve the performance of transactional memory (TM), researchers have found many eager and lazy optimizations for conflict detection, the process of determining if transactions can commit. Despite these optimizations, nearly all TMs perform one aspect of lazy conflict detection in the same manner to preserve serializability. That is, they perform commit-time validation, where a transaction is checked for conflicts with previously committed transactions during its commit phase. While commit-time validation is efficient for workloads that exhibit limited contention, it can limit transaction throughput for contending workloads. This paper presents an efficient implementation of commit-time invalidation, a strategy where transactions resolve their conflicts with in-flight (uncommitted) transactions before they commit. Commit-time invalidation supplies the contention manager (CM) with data that is unavailable through commit-time validation, allowing the CM to make decisions that increase transaction throughput. Commit-time invalidation also requires notably fewer operations than commit-time validation for memory-intensive transactions, uses zero commit-time operations for dynamically detected read-only transactions, and guarantees full opacity for any transaction in O(N) time, an improvement over incremental validation's O(N²) time. Our experimental results show that for contending workloads, our efficient commit-time invalidating software TM (STM) is up to 3x faster than TL2, a state-of-the-art validating STM.
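
A hedged sketch of the invalidation direction: instead of every transaction revalidating its own read set, a committing writer checks its write set against the read sets of registered in-flight transactions and marks the conflicting ones invalid. The data structures are illustrative, and synchronization of read-set access is simplified here (a real STM mediates it).

```cpp
#include <atomic>
#include <mutex>
#include <unordered_set>
#include <vector>

struct Tx {
    std::unordered_set<void*> read_set, write_set;
    std::atomic<bool> invalidated{false};
};

std::mutex registry_mutex;
std::vector<Tx*> in_flight;                        // registered active transactions

bool commit(Tx& me) {
    if (me.invalidated.load()) return false;       // an earlier committer beat us
    std::lock_guard<std::mutex> g(registry_mutex); // serialize commits (sketch only)
    if (me.invalidated.load()) return false;
    if (me.write_set.empty()) return true;         // read-only: zero commit work
    for (Tx* other : in_flight) {
        if (other == &me) continue;
        for (void* addr : me.write_set)            // invalidate every in-flight
            if (other->read_set.count(addr))       // reader of what we wrote
                other->invalidated.store(true);
    }
    // ... write back me.write_set here ...
    return true;
}
```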

Proceedings ArticleDOI
25 Jul 2010
TL;DR: Transactional predication, a method for building transactional maps and sets on top of an underlying non-composable concurrent map, is introduced, and an experimental evaluation shows that predication has better performance than existing transactional collection algorithms across a range of workloads.
Abstract: Concurrent collection classes are widely used in multi-threaded programming, but they provide atomicity only for a fixed set of operations. Software transactional memory (STM) provides a convenient and powerful programming model for composing atomic operations, but concurrent collection algorithms that allow their operations to be composed using STM are significantly slower than their non-composable alternatives. We introduce transactional predication, a method for building transactional maps and sets on top of an underlying non-composable concurrent map. We factor the work of most collection operations into two parts: a portion that does not need atomicity or isolation, and a single transactional memory access. The result approximates semantic conflict detection using the STM's structural conflict detection mechanism. The separation also allows extra optimizations when the collection is used outside a transaction. We perform an experimental evaluation that shows that predication has better performance than existing transactional collection algorithms across a range of workloads.
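
A minimal sketch of the predication idea, assuming a mutex-protected map as a stand-in for the non-composable concurrent map and a plain struct as a stand-in for an STM-managed cell. In a real system, the two marked accesses would be the transaction's only STM-visible memory accesses; everything else needs no atomicity or isolation.

```cpp
#include <memory>
#include <mutex>
#include <unordered_map>

struct TVar { bool present = false; };    // would be an STM-managed cell

class PredicatedSet {
    std::mutex m;                          // stands in for a concurrent map
    std::unordered_map<int, std::shared_ptr<TVar>> preds;

    // Non-transactional part: find or create the key's predicate.
    std::shared_ptr<TVar> predicateFor(int key) {
        std::lock_guard<std::mutex> g(m);
        auto& p = preds[key];
        if (!p) p = std::make_shared<TVar>();
        return p;
    }
public:
    // Each operation performs a single transactional access to the predicate,
    // so the STM's structural conflict detection approximates semantic conflicts.
    bool contains(int key) { return predicateFor(key)->present; }  // tx read
    void add(int key)      { predicateFor(key)->present = true; }  // tx write
    void remove(int key)   { predicateFor(key)->present = false; } // tx write
};
```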

Patent
11 Jun 2010
TL;DR: In this article, a processing core of a plurality of processing cores is configured to execute a speculative region of code as a single atomic memory transaction with respect to one or more others of the plurality of processing cores.
Abstract: A processing core of a plurality of processing cores is configured to execute a speculative region of code as a single atomic memory transaction with respect to one or more others of the plurality of processing cores. In response to determining an abort condition for an issued one of the plurality of program instructions, and in response to determining that the issued program instruction is not part of a mispredicted execution path, the processing core is configured to abort an attempt to execute the speculative region of code.

Book ChapterDOI
31 Aug 2010
TL;DR: A novel program logic is proposed that uses invariants on history traces to reason about optimistic concurrency algorithms; Michael's non-blocking stack algorithm is verified, showing that the intuition behind such an algorithm can be naturally captured using trace invariants.
Abstract: Optimistic concurrency algorithms provide good performance for parallel programs but they are extremely hard to reason about. Program logics such as concurrent separation logic and rely-guarantee reasoning can be used to verify these algorithms, but they make heavy use of history variables which may obscure the high-level intuition underlying the design of these algorithms. In this paper, we propose a novel program logic that uses invariants on history traces to reason about optimistic concurrency algorithms. We use past tense temporal operators in our assertions to specify execution histories. Our logic supports modular program specifications with history information by providing separation over both space (program states) and time. We verify Michael's non-blocking stack algorithm and show that the intuition behind such an algorithm can be naturally captured using trace invariants.

Proceedings ArticleDOI
04 Dec 2010
TL;DR: DynTM (Dynamically Adaptable HTM) is presented, the first fully-flexible HTM system that permits the simultaneous execution of transactions using complementary version and conflict management strategies; it obtains an average speedup of 34% over HTM systems that employ fixed version and conflict management policies.
Abstract: Most Hardware Transactional Memory (HTM) implementations choose fixed version and conflict management policies at design time. While eager HTM systems store transactional state in-place in memory and resolve conflicts when they are produced, lazy HTM systems buffer the transactional state in specialized hardware and defer the resolution of conflicts until commit time. Each scheme has its strengths and weaknesses, but, unfortunately, both approaches are too inflexible in the way they manage data versioning and transactional contention. Thus, fixed HTM systems may result in a significant performance opportunity loss when they execute complex transactional applications. In this paper, we present DynTM (Dynamically Adaptable HTM), the first fully-flexible HTM system that permits the simultaneous execution of transactions using complementary version and conflict management strategies. At the heart of DynTM is a novel coherence protocol that allows tracking conflicts among eager and lazy transactions. Both the eager and the lazy execution modes of DynTM exhibit very high performance compared to modern HTM systems. For example, the DynTM lazy execution mode implements local commits to improve on previous proposals. In addition, lazy transactions share the majority of hardware support with eager transactions, reducing substantially the hardware cost compared to other lazy HTM systems. By utilizing a simple predictor to decide the best execution mode for each transaction at runtime, DynTM obtains an average speedup of 34% over HTM systems that employ fixed version and conflict management policies.
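
A small sketch of the runtime choice DynTM's predictor makes: pick eager or lazy execution per static transaction site based on how each mode fared before. The per-site abort-count policy shown is illustrative, not the paper's predictor.

```cpp
#include <unordered_map>

enum class Mode { EAGER, LAZY };

struct SiteStats { int eager_aborts = 0, lazy_aborts = 0; };
std::unordered_map<int, SiteStats> stats;    // keyed by static transaction site

// Choose the mode that has aborted less often at this site so far.
Mode predict(int site) {
    const SiteStats& s = stats[site];
    return (s.eager_aborts <= s.lazy_aborts) ? Mode::EAGER : Mode::LAZY;
}

void record_abort(int site, Mode m) {
    if (m == Mode::EAGER) stats[site].eager_aborts++;
    else                  stats[site].lazy_aborts++;
}
```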

Proceedings ArticleDOI
13 Jun 2010
TL;DR: This paper uses Sun's prototype multicore chip, code-named Rock, to experiment with HTM algorithms, and discusses ways in which its limitations prevent better results, or would prevent production use of algorithms even if they are successful.
Abstract: We explore the potential of hardware transactional memory (HTM) to improve concurrent algorithms. We illustrate a number of use cases in which HTM enables significantly simpler code to achieve similar or better performance than existing algorithms for conventional architectures. We use Sun's prototype multicore chip, code-named Rock, to experiment with these algorithms, and discuss ways in which its limitations prevent better results, or would prevent production use of algorithms even if they are successful. Our use cases include concurrent data structures such as double ended queues, work stealing queues and scalable non-zero indicators, as well as a scalable malloc implementation and a simulated annealing application. We believe that our paper makes a compelling case that HTM has substantial potential to make effective concurrent programming easier, and that we have made valuable contributions in guiding designers of future HTM features to exploit this potential.

Patent
31 Mar 2010
TL;DR: The transactional memory system described in this paper implements parallel co-transactions that access a shared memory such that at most one of the transactions in a set will succeed and all others will fail.
Abstract: The transactional memory system described herein may implement parallel co-transactions that access a shared memory such that at most one of the co-transactions in a set will succeed and all others will fail (e.g., be aborted). Co-transactions may improve the performance of programs that use transactional memory by attempting to perform the same high-level operation using multiple algorithmic approaches, transactional memory implementations, and/or speculation options in parallel, and allowing only the first to complete to commit its results. If none of the co-transactions succeeds, one or more may be retried, possibly using a different approach and/or transactional memory implementation. The at-most-one property may be managed through the use of a shared "done" flag. Conflicts between co-transactions in a set and accesses made by transactions or activities outside the set may be managed using lazy write ownership acquisition and/or a priority-based approach. Each co-transaction may execute on a different processor resource.
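
A sketch of the co-transaction pattern in the claim, assuming plain threads and an atomic "done" flag in place of the transactional machinery; the failure handling is illustrative.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Run alternative implementations of one high-level operation in parallel;
// the first to finish sets the shared flag, and all others must fail.
void run_cotransactions(const std::vector<std::function<bool()>>& alternatives) {
    std::atomic<bool> done{false};
    std::vector<std::thread> workers;
    for (const auto& alt : alternatives) {
        workers.emplace_back([&done, alt] {
            if (done.load()) return;         // a sibling already won
            if (!alt()) return;              // this alternative failed outright
            if (done.exchange(true)) {
                // Lost the commit race: a real TM would abort this
                // co-transaction's effects here.
            }
            // Otherwise this co-transaction is the unique winner.
        });
    }
    for (auto& w : workers) w.join();
    // If done is still false here, no alternative succeeded; per the claim,
    // one or more co-transactions may be retried with a different approach.
}
```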

Patent
26 Oct 2010
TL;DR: In this article, the authors propose an alert-on-update mechanism for fast software-controlled conflict detection and programmable data isolation, allowing potentially conflicting readers and writers to proceed concurrently under software control.
Abstract: In a transactional memory technique, hardware serves simply to optimize the performance of transactions that are controlled fundamentally by software. The hardware support reduces the overhead of common TM tasks—conflict detection, validation, and data isolation—for common-case bounded transactions. Software control preserves policy flexibility and supports transactions unbounded in space and in time. The hardware includes 1) an alert-on-update mechanism for fast software-controlled conflict detection; and 2) programmable data isolation, allowing potentially conflicting readers and writers to proceed concurrently under software control.

Patent
31 Mar 2010
TL;DR: Locality information (as reflected by the value of a respective locale guard) associated with each of a plurality of data partitions (locales) in a shared memory is leveraged to elide various operations in transactional read/write fences when transactions access data in locales owned by their threads.
Abstract: The system and methods described herein may reduce read/write fence latencies and cache pressure related to STM metadata accesses. These techniques may leverage locality information (as reflected by the value of a respective locale guard) associated with each of a plurality of data partitions (locales) in a shared memory to elide various operations in transactional read/write fences when transactions access data in locales owned by their threads. The locale state may be disabled, free, exclusive, or shared. For a given memory access operation of an atomic transaction targeting an object in the shared memory, the system may implement the memory access operation using a contention mediation mechanism selected based on the value of the locale guard associated with the locale in which the target object resides. For example, a traditional read/write fence may be employed in some memory access operations, while other access operations may employ an optimized read/write fence.
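
A sketch of the fence-elision fast path the claims describe: if the current thread owns the target locale exclusively, the read barrier's metadata work can be skipped. The state encoding and the stubbed slow path are illustrative.

```cpp
#include <atomic>

enum LocaleState { DISABLED, FREE, EXCLUSIVE, SHARED };  // per the claims

struct Locale {
    std::atomic<int>  state{FREE};
    std::atomic<long> owner{-1};     // owning thread id when EXCLUSIVE
};

// The traditional STM read fence; stubbed so the sketch is self-contained.
int full_read_barrier(int* addr) { return *addr; }

int tx_read(Locale& loc, int* addr, long my_tid) {
    // Fast path: this thread owns the locale exclusively, so no other
    // transaction can conflict here and the fence work can be elided.
    if (loc.state.load() == EXCLUSIVE && loc.owner.load() == my_tid)
        return *addr;
    return full_read_barrier(addr);  // slow path: mediate contention
}
```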

Proceedings ArticleDOI
Hanjun Kim1, Arun Raman1, Feng Liu1, Jae W. Lee1, David I. August1 
04 Dec 2010
TL;DR: Distributed Software Multi-threaded Transactional Memory (DSMTX) as discussed by the authors is a runtime system for non-shared memory clusters, allowing them to efficiently address inter-node communication costs.
Abstract: While clusters of commodity servers and switches are the most popular form of large-scale parallel computers, many programs are not easily parallelized for execution upon them. In particular, high inter-node communication cost and lack of globally shared memory appear to make clusters suitable only for server applications with abundant task-level parallelism and scientific applications with regular and independent units of work. Clever use of pipeline parallelism (DSWP), thread-level speculation (TLS), and speculative pipeline parallelism (Spec-DSWP) can mitigate the costs of inter-thread communication on shared memory multicore machines. This paper presents Distributed Software Multi-threaded Transactional Memory (DSMTX), a runtime system which makes these techniques applicable to non-shared memory clusters, allowing them to efficiently address inter-node communication costs. Initial results suggest that DSMTX enables efficient cluster execution of a wider set of application types. For 11 sequential C programs parallelized for a 4-core 32-node (128 total core) cluster without shared memory, DSMTX achieves a geomean speedup of 49x. This compares favorably to the 15x speedup achieved by our implementation of TLS-only support for clusters.

Proceedings ArticleDOI
13 Apr 2010
TL;DR: It is shown that the STM provides not only ease of programming, but also better performance than that achievable with state-of-the-art lock-based programming, for this realistic high-impact application.
Abstract: In this paper, we study parallelization of multiplayer games using software transactional memory (STM) support. We show that the STM provides not only ease of programming, but also better performance than that achievable with state-of-the-art lock-based programming, for this realistic high-impact application. For this purpose, we use a game benchmark, SynQuake, that extracts the main data structures and the essential features of the popular game Quake. SynQuake can be driven with a synthetic workload generator that flexibly emulates client game actions and various hot-spot scenarios in the game world. We implement, evaluate and compare the STM version of SynQuake with a state-of-the-art lock-based parallelization of Quake, which we ported to SynQuake. While in STM-SynQuake support for maintaining the consistency of each complex game action is automatic, conservative locking of surrounding objects within a bounding box, for the duration of the game action, is inherently needed in lock-based SynQuake. This leads to higher scalability of STM-SynQuake versus lock-based SynQuake, due to a higher degree of false sharing in the latter. Task assignment to threads has a second-order effect on the scalability of STM-SynQuake, due to its impact on the application's true sharing patterns. We show that a dynamic locality-aware task assignment to threads provides the best trade-off between load balancing and conflict reduction.

Proceedings ArticleDOI
02 May 2010
TL;DR: Two STMs for graphics processors are designed and implemented, one blocking and one non-blocking, and experimental results comparing the performance of the two STMs are described and explained.
Abstract: The introduction of general-purpose computing on many-core graphics processor systems, and the general shift in the industry towards parallelism, has created a demand for ease of parallelization. Software transactional memory (STM) simplifies development of concurrent code by allowing the programmer to mark sections of code to be executed concurrently and atomically in an optimistic manner. In contrast to locks, STMs are easy to compose and do not suffer from deadlocks. We have designed and implemented two STMs for graphics processors, one blocking and one non-blocking. The design issues involved in designing these two STMs are described and explained in the paper, together with experimental results comparing the performance of the two STMs.

Journal ArticleDOI
TL;DR: An adaptive locking technique is proposed that dynamically observes whether a critical section would be best executed transactionally or while holding a mutex lock; adaptive locks are found to consistently match or outperform the better of the two component mechanisms (mutexes or transactions).
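
A hedged sketch of the adaptive idea: per lock site, keep using the transactional path while it succeeds, and fall back to the mutex once it has failed too often. try_htm_execute is a hypothetical stub, and the fixed failure threshold stands in for whatever cost model the technique actually uses.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical transactional attempt; stubbed to fail so the sketch runs
// anywhere (always taking the mutex path).
template <class F> bool try_htm_execute(F) { return false; }

struct AdaptiveLock {
    std::mutex m;
    std::atomic<int> tx_failures{0};     // observed transactional cost at this site

    template <class F>
    void critical(F body) {
        if (tx_failures.load() < 8) {              // still worth speculating
            if (try_htm_execute(body)) return;     // committed transactionally
            tx_failures.fetch_add(1);              // remember the failure
        }
        std::lock_guard<std::mutex> g(m);          // mutex path
        body();
    }
};
```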