
Showing papers on "Transactional memory published in 2010"


Proceedings ArticleDOI
09 Jan 2010
TL;DR: An ownership-record-free software transactional memory (STM) system, NOrec, that combines extremely low overhead with unusually clean semantics is presented; experience with it suggests that NOrec may be an ideal candidate for the software fallback role on current multicore processors.
Abstract: Drawing inspiration from several previous projects, we present an ownership-record-free software transactional memory (STM) system that combines extremely low overhead with unusually clean semantics. While unlikely to scale to hundreds of active threads, this "NOrec" system offers many appealing features: very low fast-path latency--as low as any system we know of that admits concurrent updates; publication and privatization safety; livelock freedom; a small, constant amount of global metadata; full compatibility with existing data structure layouts; no false conflicts due to hash collisions; compatibility with both managed and unmanaged languages, and both static and dynamic compilation; and easy accommodation of closed nesting, inevitable (irrevocable) transactions, and starvation avoidance mechanisms. To the best of our knowledge, no extant STM system combines this set of features. While transactional memory for processors with hundreds of cores is likely to require hardware support, software implementations will be required for backward compatibility with current and near-future processors with 2--64 cores, as well as for fall-back in future machines when hardware resources are exhausted. Our experience suggests that NOrec may be an ideal candidate for such a software system. We also observe that it has considerable appeal for use within the operating system, and in systems that require both closed nesting and publication safety.

327 citations
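
The mechanism the abstract hints at is a single global sequence lock plus value-based validation of a private read log, in place of per-location ownership records. Below is a minimal sketch of that idea, assuming illustrative names, single-word writes, and no write log; it is not the authors' code.

```cpp
#include <atomic>
#include <utility>
#include <vector>

std::atomic<unsigned> global_seq{0};            // even = quiescent, odd = writer committing

struct Tx {
    unsigned snapshot = 0;                       // global_seq value the tx is consistent with
    std::vector<std::pair<int*, int>> reads;     // value-based read log (no ownership records)

    void begin() {
        do { snapshot = global_seq.load(); } while (snapshot & 1u);
    }

    // Value-based validation: re-read every logged location and compare values.
    bool validate() {
        for (;;) {
            unsigned s = global_seq.load();
            if (s & 1u) continue;                // a writer is mid-commit; spin
            for (auto& r : reads)
                if (*r.first != r.second) return false;
            if (s == global_seq.load()) { snapshot = s; return true; }
        }
    }

    bool read(int* addr, int& out) {             // false => abort and retry
        int v = *addr;
        while (snapshot != global_seq.load()) {  // someone committed since we began
            if (!validate()) return false;       // a logged value changed: abort
            v = *addr;                           // re-read under the extended snapshot
        }
        reads.push_back({addr, v});
        out = v;
        return true;
    }

    bool commit(int* addr, int val) {            // single-location writer, for brevity
        for (;;) {
            unsigned s = snapshot;
            if (global_seq.compare_exchange_strong(s, s + 1)) break;  // acquire seqlock
            if (!validate()) return false;       // re-sync snapshot, then retry
        }
        *addr = val;                             // write back while holding the lock
        global_seq.store(snapshot + 2);          // release with a new even value
        return true;
    }
};
```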


Book
02 Jun 2010
TL;DR: This book presents an overview of the state of the art in the design and implementation of transactional memory systems, as of early spring 2010.
Abstract: The advent of multicore processors has renewed interest in the idea of incorporating transactions into the programming model used to write parallel programs. This approach, known as transactional memory, offers an alternative, and hopefully better, way to coordinate concurrent threads. The ACI (atomicity, consistency, isolation) properties of transactions provide a foundation to ensure that concurrent reads and writes of shared data do not produce inconsistent or incorrect results. At a higher level, a computation wrapped in a transaction executes atomically - either it completes successfully and commits its result in its entirety or it aborts. In addition, isolation ensures the transaction produces the same result as if no other transactions were executing concurrently. Although transactions are not a parallel programming panacea, they shift much of the burden of synchronizing and coordinating parallel computations from a programmer to a compiler, to a language runtime system, or to hardware. The challenge for the system implementers is to build an efficient transactional memory infrastructure. This book presents an overview of the state of the art in the design and implementation of transactional memory systems, as of early spring 2010. Table of Contents: Introduction / Basic Transactions / Building on Basic Transactions / Software Transactional Memory / Hardware-Supported Transactional Memory / Conclusions

309 citations


Journal ArticleDOI
TL;DR: This special issue on transactional memory introduces transactional memory as a concept, presents an overview of some of the most important approaches so far, and includes five articles that advance the state of the art in transactional memory research.

305 citations


Proceedings ArticleDOI
09 Jan 2010
TL;DR: A user-study is described in which 237 undergraduate students in an operating systems course implemented the same programs using coarse and fine-grain locks, monitors, and transactions; the number and types of programming errors the students made were much lower for transactions than for locks.
Abstract: Chip multi-processors (CMPs) have become ubiquitous, while tools that ease concurrent programming have not. The promise of increased performance for all applications through ever more parallel hardware requires good tools for concurrent programming, especially for average programmers. Transactional memory (TM) has enjoyed recent interest as a tool that can help programmers program concurrently. The TM research community is heavily invested in the claim that programming with transactional memory is easier than alternatives (like locks), but evidence for or against the veracity of this claim is scant. In this paper, we describe a user-study in which 237 undergraduate students in an operating systems course implement the same programs using coarse and fine-grain locks, monitors, and transactions. We surveyed the students after the assignment, and examined their code to determine the types and frequency of programming errors for each synchronization technique. Inexperienced programmers found baroque syntax a barrier to entry for transactional programming. On average, subjective evaluation showed that students found transactions harder to use than coarse-grain locks, but slightly easier to use than fine-grained locks. Detailed examination of synchronization errors in the students' code tells a rather different story. Overwhelmingly, the number and types of programming errors the students made were much lower for transactions than for locks. On a similar programming problem, over 70% of students made errors with fine-grained locking, while less than 10% made errors with transactions.

158 citations


Journal ArticleDOI
TL;DR: The first time-based STM algorithm, the Lazy Snapshot Algorithm (LSA), is formally introduced, and its semantics and the impact of its design parameters, notably multiversioning and dynamic snapshot extension, are studied.
Abstract: Software transactional memory (STM) is a concurrency control mechanism that is widely considered to be easier to use by programmers than other mechanisms such as locking. The first generations of STMs have either relied on visible read designs, which simplify conflict detection while pessimistically ensuring a consistent view of shared data to the application, or optimistic invisible read designs that are significantly more efficient but require incremental validation to preserve consistency, at a cost that increases quadratically with the number of objects read in a transaction. Most of the recent designs now use a “time-based” (or “time stamp-based”) approach to still benefit from the performance advantage of invisible reads without incurring the quadratic overhead of incremental validation. In this paper, we give an overview of the time-based STM approach and discuss its benefits and limitations. We formally introduce the first time-based STM algorithm, the Lazy Snapshot Algorithm (LSA). We study its semantics and the impact of its design parameters, notably multiversioning and dynamic snapshot extension. We compare it against other classical designs and we demonstrate that its performance is highly competitive, both for obstruction-free and lock-based STM designs.

150 citations
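
As background, here is a hedged sketch of the general time-based pattern the abstract describes: a global clock plus per-object commit timestamps replace quadratic incremental validation. This is in the spirit of LSA but not the paper's algorithm; the read-set revalidation inside snapshot extension is elided.

```cpp
#include <atomic>

std::atomic<long> global_clock{0};   // advanced by committing writers

struct VersionedObj {
    std::atomic<long> version{0};    // commit timestamp of the last writer
    int value = 0;
};

struct Tx {
    long rv = 0;                     // read timestamp: snapshot validity bound
    void begin() { rv = global_clock.load(); }

    // A read is consistent if the object was not written after our snapshot.
    bool read(VersionedObj& o, int& out) {
        long v1 = o.version.load();
        out = o.value;
        long v2 = o.version.load();
        if (v1 != v2 || v1 > rv)
            return tryExtend();      // snapshot too old: try to extend it
        return true;
    }

    // Dynamic snapshot extension: LSA moves rv forward after revalidating
    // its read set; the revalidation step is elided in this sketch.
    bool tryExtend() {
        rv = global_clock.load();
        return true;                 // assume revalidation of prior reads passed
    }
};
```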


Proceedings ArticleDOI
13 Apr 2010
TL;DR: Measurements on a wide range of benchmarks indicate that the overheads traditionally associated with software transactional memories can be significantly reduced with the help of ASF.
Abstract: AMD's Advanced Synchronization Facility (ASF) is an x86 instruction set extension proposal intended to simplify and speed up the synchronization of concurrent programs. In this paper, we report our experiences using ASF for implementing transactional memory. We have extended a C/C++ compiler to support language-level transactions and generate code that takes advantage of ASF. We use a software fallback mechanism for transactions that cannot be committed within ASF (e.g., because of hardware capacity limitations). Our evaluation uses a cycle-accurate x86 simulator that we have extended with ASF support. Building a complete ASF-based software stack allows us to evaluate the performance gains that a user-level program can obtain from ASF. Our measurements on a wide range of benchmarks indicate that the overheads traditionally associated with software transactional memories can be significantly reduced with the help of ASF.

135 citations
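
The execution structure described here, a hardware speculative region with a software fallback for transactions the hardware cannot commit, follows a common hybrid pattern. A minimal sketch under stated assumptions: asf_begin/asf_commit are hypothetical stand-ins for ASF's speculate/commit instructions (stubbed so the sketch runs anywhere), and a global lock stands in for the STM fallback path.

```cpp
#include <mutex>

// Hypothetical stand-ins for the hardware primitives; the stub always
// reports a hardware abort so the sketch is self-contained and runnable.
bool asf_begin()  { return false; }
void asf_commit() {}

std::mutex fallback_lock;                 // stands in for the software TM path
constexpr int MAX_HW_RETRIES = 3;

template <class F>
void atomic_region(F body) {
    for (int i = 0; i < MAX_HW_RETRIES; ++i) {
        if (asf_begin()) {                // speculative region started
            body();
            asf_commit();                 // commit the hardware transaction
            return;
        }                                 // capacity/contention abort: retry
    }
    // Software fallback for transactions the hardware cannot commit.
    std::lock_guard<std::mutex> g(fallback_lock);
    body();
}
// Usage: atomic_region([]{ /* reads and writes of shared state */ });
```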


Book
24 Sep 2010
TL;DR: The aim of this book is to provide theoretical foundations for transactional memory: defining a model of a TM, and answering precisely when a TM implementation is correct, what kinds of properties it can ensure, what the power and limitations of a TM are, and what inherent trade-offs are involved in designing a TM algorithm.
Abstract: Transactional memory (TM) is an appealing paradigm for concurrent programming on shared memory architectures. With a TM, threads of an application communicate, and synchronize their actions, via in-memory transactions. Each transaction can perform any number of operations on shared data, and then either commit or abort. When the transaction commits, the effects of all its operations become immediately visible to other transactions; when it aborts, however, those effects are entirely discarded. Transactions are atomic: programmers get the illusion that every transaction executes all its operations instantaneously, at some single and unique point in time. Yet, a TM runs transactions concurrently to leverage the parallelism offered by modern processors. The aim of this book is to provide theoretical foundations for transactional memory. This includes defining a model of a TM, as well as answering precisely when a TM implementation is correct, what kind of properties it can ensure, what are the power and limitations of a TM, and what inherent trade-offs are involved in designing a TM algorithm. While the focus of this book is on the fundamental principles, its goal is to capture the common intuition behind the semantics of TMs and the properties of existing TM implementations.

119 citations


Proceedings ArticleDOI
25 Jul 2010
TL;DR: This paper studies inherent properties of STMs that use multiple versions to guarantee successful commits of all read-only transactions, and presents an STM algorithm using visible reads that efficiently garbage collects useless object versions.
Abstract: An effective way to reduce the number of aborts in software transactional memory (STM) is to keep multiple versions of transactional objects. In this paper, we study inherent properties of STMs that use multiple versions to guarantee successful commits of all read-only transactions. We first show that these STMs cannot be disjoint-access parallel. We then consider the problem of garbage collecting old object versions, and show that no STM can be optimal in the number of previous versions kept. Moreover, we show that garbage collecting useless versions is impossible in STMs that implement invisible reads. Finally, we present an STM algorithm using visible reads that efficiently garbage collects useless object versions.

117 citations
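
A minimal sketch of the multi-versioning idea under discussion: each object keeps a timestamp-indexed list of old versions so a read-only transaction can always find the version that was current at its start time and commit without aborting. Data structures and locking here are illustrative, not the paper's algorithm.

```cpp
#include <atomic>
#include <iterator>
#include <map>
#include <mutex>

std::atomic<long> global_clock{0};

struct MVObject {
    std::mutex m;
    std::map<long, int> versions;    // commit timestamp -> value

    void write(int v) {              // a writer commits a new version
        std::lock_guard<std::mutex> g(m);
        versions[global_clock.fetch_add(1) + 1] = v;
    }

    // Read the newest version whose timestamp is <= the reader's snapshot.
    bool readAt(long snapshot, int& out) {
        std::lock_guard<std::mutex> g(m);
        auto it = versions.upper_bound(snapshot);   // first version after snapshot
        if (it == versions.begin()) return false;   // needed version was GC'd
        out = std::prev(it)->second;
        return true;
    }
};
```

The garbage-collection results in the abstract concern exactly the `readAt` failure case: how many old versions must be retained so that no read-only transaction ever hits it.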


Patent
10 Nov 2010
TL;DR: In this paper, the authors present a method for selecting a first transaction execution mode to begin a transaction in an unbounded transactional memory (UTM) system having a plurality of transaction execution modes.
Abstract: In one embodiment, the present invention includes a method for selecting a first transaction execution mode to begin a first transaction in an unbounded transactional memory (UTM) system having a plurality of transaction execution modes. These transaction execution modes include hardware modes to execute within a cache memory of a processor, a hardware assisted mode to execute using transactional hardware of the processor and a software buffer, and a software transactional memory (STM) mode to execute without the transactional hardware. The first transaction execution mode can be selected to be the highest performing of the hardware modes if no pending transaction is executing in the STM mode; otherwise a lower performing mode can be selected. Other embodiments are described and claimed.

99 citations
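
A small sketch of the selection policy in the claim, assuming a hypothetical mode enum for the claim's three mode classes and a counter standing in for "a pending transaction is executing in STM mode":

```cpp
#include <atomic>

enum class Mode { CACHE_HW, HW_ASSISTED, STM };   // illustrative mode classes

std::atomic<int> stm_transactions_in_flight{0};   // maintained by the runtime

Mode select_mode() {
    // Pure hardware modes are only chosen when no STM transaction is
    // pending, mirroring the claim's selection rule.
    if (stm_transactions_in_flight.load() == 0)
        return Mode::CACHE_HW;       // highest performing hardware mode
    return Mode::HW_ASSISTED;        // lower performing but STM-compatible
}
```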


Proceedings ArticleDOI
17 Jan 2010
TL;DR: A generalization of transactional memory in which a transaction consists of coarse-grained data-type operations rather than simple memory read/write operations is defined, and it is discussed how the semantics applies to numerous TM implementation details covered widely in the literature.
Abstract: Traditional transactional memory systems suffer from overly conservative conflict detection, yielding so-called false conflicts, because they are based on fine-grained, low-level read/write conflicts. In response, the recent trend has been toward integrating various abstract data-type libraries using ad-hoc methods of high-level conflict detection. These proposals have led to improved performance, but a lack of a unified theory has led to confusion in the literature. We clarify these recent proposals by defining a generalization of transactional memory in which a transaction consists of coarse-grained (abstract data-type) operations rather than simple memory read/write operations. We provide semantics for both pessimistic (e.g. transactional boosting) and optimistic (e.g. traditional TMs and recent alternatives) execution. We show that both are included in the standard atomic semantics, yet find that the choice imposes different requirements on the coarse-grained operations: pessimistic requires operations be left-movers, optimistic requires right-movers. Finally, we discuss how the semantics applies to numerous TM implementation details discussed widely in the literature.

77 citations
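
A hedged sketch of the pessimistic (boosting-style) execution the paper formalizes: an abstract lock guards each operation on a linearizable collection, and an inverse operation is logged so an abort can roll the effect back. A real implementation holds abstract locks until commit (two-phase) and uses per-key locks; this sketch simplifies both.

```cpp
#include <functional>
#include <mutex>
#include <set>
#include <vector>

std::mutex abstract_lock;             // per-key in a real system; global here
std::set<int> the_set;                // underlying linearizable collection

struct BoostedTx {
    std::vector<std::function<void()>> undo;   // logged inverse operations

    bool insert(int k) {
        std::lock_guard<std::mutex> g(abstract_lock);
        bool added = the_set.insert(k).second;
        if (added) undo.push_back([k] { the_set.erase(k); });  // inverse op
        return added;
    }
    void abort() {                    // run inverses in reverse order
        for (auto it = undo.rbegin(); it != undo.rend(); ++it) (*it)();
        undo.clear();
    }
    void commit() { undo.clear(); }   // effects are already in place
};
```

The mover requirement in the abstract maps onto this structure: pessimistic execution applies effects immediately and must be able to move them left (undo), while optimistic execution defers them and must be able to move them right (replay at commit).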


Proceedings ArticleDOI
09 Jan 2010
TL;DR: This work proposes, implements, and evaluates several novel kernel-level scheduling support mechanisms for TM contention management, and introduces kernel-level TM scheduling support into both the Linux and Solaris kernels; it is believed to be the first to investigate kernel-level support for TM contention management.
Abstract: Transactional Memory (TM) is considered one of the most promising paradigms for developing concurrent applications. TM has been shown to scale well on multiple cores when the data access pattern behaves "well," i.e., when few conflicts are induced. In contrast, data patterns with frequent write sharing, with long transactions, or when many threads contend for a smaller number of cores, result in numerous conflicts. Until recently, TM implementations had little control of transactional threads, which remained under the supervision of the kernel's transaction-ignorant scheduler. Conflicts are thus traditionally resolved by consulting an STM-level contention manager. Consequently, the contention managers of these "conventional" TM implementations suffer from a lack of precision and often fail to ensure reasonable performance in high-contention workloads. Recently, scheduling-based TM contention management has been proposed for increasing TM efficiency under high contention [2, 5, 19]. However, only user-level schedulers have been considered. In this work, we propose, implement and evaluate several novel kernel-level scheduling support mechanisms for TM contention management. We also investigate different strategies for efficient communication between the kernel and the user-level TM library. To the best of our knowledge, our work is the first to investigate kernel-level support for TM contention management. We have introduced kernel-level TM scheduling support into both the Linux and Solaris kernels. Our experimental evaluation demonstrates that lightweight kernel-level scheduling support significantly reduces the number of aborts while improving transaction throughput on various workloads.
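
The paper's mechanisms live inside the kernel scheduler; the user-level sketch below only illustrates the policy they enable, which is to deschedule a repeatedly aborting thread so the conflicting transaction can finish, instead of letting it spin. try_transaction is a hypothetical stub.

```cpp
#include <algorithm>
#include <chrono>
#include <thread>

// Hypothetical transaction attempt; stubbed to succeed so the sketch terminates.
bool try_transaction() { return true; }

void run_with_scheduling_backoff() {
    int aborts = 0;
    while (!try_transaction()) {
        ++aborts;
        if (aborts < 3)
            std::this_thread::yield();                         // mild backoff first
        else
            std::this_thread::sleep_for(                       // user-level stand-in
                std::chrono::microseconds(1 << std::min(aborts, 10)));  // for a kernel deschedule
    }
}
```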

Proceedings ArticleDOI
02 Dec 2010
TL;DR: EigenBench, a lightweight yet powerful microbenchmark for fully evaluating a transactional memory system, is presented and it is shown that EigenBench is useful for thoroughly exploring the orthogonal space of TM application characteristics.
Abstract: There are a significant number of Transactional Memory (TM) proposals, varying in almost all aspects of the design space. Although several transactional benchmarks have been suggested, a simple, yet thorough, evaluation framework is still needed to completely characterize a TM system and allow for comparison among the various proposals. Unfortunately, TM system evaluation is difficult because the application characteristics which affect performance are often difficult to isolate from each other. We propose a set of orthogonal application characteristics that form a basis for transactional behavior and are useful in fully understanding the performance of a TM system. In this paper, we present EigenBench, a lightweight yet powerful microbenchmark for fully evaluating a transactional memory system. We show that EigenBench is useful for thoroughly exploring the orthogonal space of TM application characteristics. Because of its flexibility, our microbenchmark is also capable of reproducing a representative set of TM performance pathologies. In this paper, we use EigenBench to evaluate two well-known TM systems and provide significant insight about their strengths and weaknesses. We also demonstrate how EigenBench can be used to mimic the evaluation coverage of a popular TM benchmark suite called STAMP.
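
A sketch of the kind of orthogonal knobs such a microbenchmark varies; the parameter names and the hot/private array split are illustrative, not EigenBench's actual interface.

```cpp
#include <cstdlib>
#include <vector>

struct Knobs {
    int tx_len;         // accesses per transaction (working-set size)
    double write_frac;  // fraction of accesses that are writes
    double hot_frac;    // fraction of accesses to the shared "hot" array
};

// One transaction body; each knob can be varied independently of the others.
void tx_body(const Knobs& k, std::vector<int>& hot, std::vector<int>& priv) {
    for (int i = 0; i < k.tx_len; ++i) {
        bool shared = (std::rand() / (double)RAND_MAX) < k.hot_frac;
        std::vector<int>& arr = shared ? hot : priv;
        std::size_t idx = std::rand() % arr.size();
        if ((std::rand() / (double)RAND_MAX) < k.write_frac)
            arr[idx]++;                 // transactional write
        else
            (void)arr[idx];             // transactional read
    }
}
```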

Patent
27 May 2010
Abstract: Mechanisms are provided, in a data processing system having a processor and a transactional memory, for executing a transaction in the data processing system. These mechanisms execute a transaction comprising one or more instructions that modify at least a portion of the transactional memory. The transaction is suspended in response to a transaction suspend instruction being executed by the processor. A suspended block of code is executed in a non-transactional manner while the transaction is suspended. A determination is made as to whether an interrupt occurs while the transaction is suspended. In response to an interrupt occurring while the transaction is suspended, a transaction abort operation is delayed until after the transaction suspension is discontinued.

Proceedings ArticleDOI
04 Dec 2010
TL;DR: An out-of-order hardware design to implement ASF on a future AMD processor is developed, and the experimental results show that the combined use of the L1 cache and the LS unit is very helpful for the performance robustness of ASF-based lock-free data structures, and that the selective use of speculative accesses enables transactional programs to scale with limited ASF hardware resources.
Abstract: Advanced Synchronization Facility (ASF) is an AMD64 hardware extension for lock-free data structures and transactional memory. It provides a speculative region that atomically executes speculative accesses in the region. Five new instructions are added to demarcate the region, use speculative accesses selectively, and control the speculative hardware context. Programmers can use speculative regions to build flexible multi-word atomic primitives with no additional software support by relying on the minimum guarantee of available ASF hardware resources for lock-free programming. Transactional programs with high-level TM language constructs can either be compiled directly to the ASF code or be linked to software TM systems that use ASF to accelerate transactional execution. In this paper we develop an out-of-order hardware design to implement ASF on a future AMD processor and evaluate it with an in-house simulator. The experimental results show that the combined use of the L1 cache and the LS unit is very helpful for the performance robustness of ASF-based lock-free data structures, and that the selective use of speculative accesses enables transactional programs to scale with limited ASF hardware resources.

Journal ArticleDOI
TL;DR: This paper identifies where existing Distributed Transactional Memory platforms still fail to meet the requirements of the cloud and of its users, and points out several open research problems whose solution is deemed essential to materialize the Cloud-TM vision.
Abstract: One of the main challenges to harness the potential of Cloud computing is the design of programming models that simplify the development of large-scale parallel applications and that allow ordinary programmers to take full advantage of the computing power and the storage provided by the Cloud, both of which are made available, on demand, in a pay-only-for-what-you-use pricing model. In this paper, we discuss the use of the Transactional Memory programming model in the context of the cloud computing paradigm, which we refer to as Cloud-TM. We identify where existing Distributed Transactional Memory platforms still fail to meet the requirements of the cloud and of its users, and we point out several open research problems whose solution we deem essential to materialize the Cloud-TM vision.

Proceedings ArticleDOI
28 Jan 2010
TL;DR: This article presents an accurate analytical model of 2PL concurrency control, which overcomes several limitations of preexisting analytical results and captures relevant features of realistic data access patterns, by taking into account access distributions that depend on transactions' execution phases.
Abstract: Nowadays the 2-Phase-Locking (2PL) concurrency control algorithm still plays a core role in the construction of transactional systems (e.g. database systems and transactional memories). Hence, any technique allowing accurate analysis and prediction of the performance of 2PL-based systems can be of wide interest and applicability. In this article we present an accurate analytical model of 2PL concurrency control, which overcomes several limitations of preexisting analytical results. In particular, our model captures relevant features of realistic data access patterns, by taking into account access distributions that depend on transactions' execution phases. Also, our model provides significantly more accurate performance predictions in heavy contention scenarios, where the number of transactions enqueued due to conflicting lock requests is expected to be non-minimal. The accuracy of our model has been verified against simulation results based on both synthetic data access patterns and patterns derived from the TPC-C benchmark.
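
For context, the classical first-order approximation that models of this kind refine (this is the textbook estimate under uniform access, not the paper's phase-dependent model): with N concurrent transactions, each acquiring k locks out of D lockable items,

```latex
% Textbook first-order 2PL contention estimates (background assumption,
% not the paper's model): uniform access over D items.
P_{\mathrm{conflict}} \;\approx\; \frac{(N-1)\,k}{D}
\qquad
\mathbb{E}[\text{waits per transaction}] \;\approx\; k \cdot P_{\mathrm{conflict}}
\;=\; \frac{(N-1)\,k^{2}}{D}
```

The paper's contribution is to replace the uniform-access assumption with access distributions that change across a transaction's execution phases, and to model the queueing that builds up under heavy contention.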

Proceedings ArticleDOI
24 Apr 2010
TL;DR: An efficient implementation of commit-time invalidation, a strategy where transactions resolve their conflicts with in-flight (uncommitted) transactions before they commit, is presented; it is up to 3x faster than TL2, a state-of-the-art validating STM.
Abstract: To improve the performance of transactional memory (TM), researchers have found many eager and lazy optimizations for conflict detection, the process of determining if transactions can commit. Despite these optimizations, nearly all TMs perform one aspect of lazy conflict detection in the same manner to preserve serializability. That is, they perform commit-time validation, where a transaction is checked for conflicts with previously committed transactions during its commit phase. While commit-time validation is efficient for workloads that exhibit limited contention, it can limit transaction throughput for contending workloads. This paper presents an efficient implementation of commit-time invalidation, a strategy where transactions resolve their conflicts with in-flight (uncommitted) transactions before they commit. Commit-time invalidation supplies the contention manager (CM) with data that is unavailable through commit-time validation, allowing the CM to make decisions that increase transaction throughput. Commit-time invalidation also requires notably fewer operations than commit-time validation for memory-intensive transactions, uses zero commit-time operations for dynamically detected read-only transactions, and guarantees full opacity for any transaction in O(N) time, an improvement over incremental validation's O(N²) time. Our experimental results show that for contending workloads, our efficient commit-time invalidating software TM (STM) is up to 3x faster than TL2, a state-of-the-art validating STM.
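
A hedged sketch of the invalidation direction: instead of every transaction revalidating its own read set, a committing writer checks its write set against the read sets of registered in-flight transactions and marks the conflicting ones invalid. The data structures are illustrative, and synchronization of read-set access is simplified here (a real STM mediates it).

```cpp
#include <atomic>
#include <mutex>
#include <unordered_set>
#include <vector>

struct Tx {
    std::unordered_set<void*> read_set, write_set;
    std::atomic<bool> invalidated{false};
};

std::mutex registry_mutex;
std::vector<Tx*> in_flight;                        // registered active transactions

bool commit(Tx& me) {
    if (me.invalidated.load()) return false;       // an earlier committer beat us
    std::lock_guard<std::mutex> g(registry_mutex); // serialize commits (sketch only)
    if (me.invalidated.load()) return false;
    if (me.write_set.empty()) return true;         // read-only: zero commit work
    for (Tx* other : in_flight) {
        if (other == &me) continue;
        for (void* addr : me.write_set)            // invalidate every in-flight
            if (other->read_set.count(addr))       // reader of what we wrote
                other->invalidated.store(true);
    }
    // ... write back me.write_set here ...
    return true;
}
```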

Proceedings ArticleDOI
25 Jul 2010
TL;DR: Transactional predication, a method for building transactional maps and sets on top of an underlying non-composable concurrent map, is introduced, and an experimental evaluation shows that predication has better performance than existing transactional collection algorithms across a range of workloads.
Abstract: Concurrent collection classes are widely used in multi-threaded programming, but they provide atomicity only for a fixed set of operations. Software transactional memory (STM) provides a convenient and powerful programming model for composing atomic operations, but concurrent collection algorithms that allow their operations to be composed using STM are significantly slower than their non-composable alternatives. We introduce transactional predication, a method for building transactional maps and sets on top of an underlying non-composable concurrent map. We factor the work of most collection operations into two parts: a portion that does not need atomicity or isolation, and a single transactional memory access. The result approximates semantic conflict detection using the STM's structural conflict detection mechanism. The separation also allows extra optimizations when the collection is used outside a transaction. We perform an experimental evaluation that shows that predication has better performance than existing transactional collection algorithms across a range of workloads.
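
A minimal sketch of the predication idea, assuming a mutex-protected map as a stand-in for the non-composable concurrent map and a plain struct as a stand-in for an STM-managed cell. In a real system, the two marked accesses would be the transaction's only STM-visible memory accesses; everything else needs no atomicity or isolation.

```cpp
#include <memory>
#include <mutex>
#include <unordered_map>

struct TVar { bool present = false; };    // would be an STM-managed cell

class PredicatedSet {
    std::mutex m;                          // stands in for a concurrent map
    std::unordered_map<int, std::shared_ptr<TVar>> preds;

    // Non-transactional part: find or create the key's predicate.
    std::shared_ptr<TVar> predicateFor(int key) {
        std::lock_guard<std::mutex> g(m);
        auto& p = preds[key];
        if (!p) p = std::make_shared<TVar>();
        return p;
    }
public:
    // Each operation performs a single transactional access to the predicate,
    // so the STM's structural conflict detection approximates semantic conflicts.
    bool contains(int key) { return predicateFor(key)->present; }  // tx read
    void add(int key)      { predicateFor(key)->present = true; }  // tx write
    void remove(int key)   { predicateFor(key)->present = false; } // tx write
};
```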

Patent
11 Jun 2010
TL;DR: In this article, a processing core of a plurality of processing cores is configured to execute a speculative region of code as a single atomic memory transaction with respect to one or more others of the plurality of processing cores.
Abstract: A processing core of a plurality of processing cores is configured to execute a speculative region of code as a single atomic memory transaction with respect to one or more others of the plurality of processing cores. In response to determining an abort condition for an issued one of the plurality of program instructions, and in response to determining that the issued program instruction is not part of a mispredicted execution path, the processing core is configured to abort an attempt to execute the speculative region of code.

Book ChapterDOI
31 Aug 2010
TL;DR: A novel program logic is proposed that uses invariants on history traces to reason about optimistic concurrency algorithms; Michael's non-blocking stack algorithm is verified, showing that the intuition behind such an algorithm can be naturally captured using trace invariants.
Abstract: Optimistic concurrency algorithms provide good performance for parallel programs but they are extremely hard to reason about. Program logics such as concurrent separation logic and rely-guarantee reasoning can be used to verify these algorithms, but they make heavy use of history variables which may obscure the high-level intuition underlying the design of these algorithms. In this paper, we propose a novel program logic that uses invariants on history traces to reason about optimistic concurrency algorithms. We use past tense temporal operators in our assertions to specify execution histories. Our logic supports modular program specifications with history information by providing separation over both space (program states) and time. We verify Michael's non-blocking stack algorithm and show that the intuition behind such an algorithm can be naturally captured using trace invariants.

Proceedings ArticleDOI
04 Dec 2010
TL;DR: DynTM (Dynamically Adaptable HTM) is presented, the first fully-flexible HTM system that permits the simultaneous execution of transactions using complementary version and conflict management strategies; it obtains an average speedup of 34% over HTM systems that employ fixed version and conflict management policies.
Abstract: Most Hardware Transactional Memory (HTM) implementations choose fixed version and conflict management policies at design time. While eager HTM systems store transactional state in-place in memory and resolve conflicts when they are produced, lazy HTM systems buffer the transactional state in specialized hardware and defer the resolution of conflicts until commit time. Each scheme has its strengths and weaknesses, but, unfortunately, both approaches are too inflexible in the way they manage data versioning and transactional contention. Thus, fixed HTM systems may result in a significant performance opportunity loss when they execute complex transactional applications. In this paper, we present DynTM (Dynamically Adaptable HTM), the first fully-flexible HTM system that permits the simultaneous execution of transactions using complementary version and conflict management strategies. At the heart of DynTM is a novel coherence protocol that allows tracking conflicts among eager and lazy transactions. Both the eager and the lazy execution modes of DynTM exhibit very high performance compared to modern HTM systems. For example, the DynTM lazy execution mode implements local commits to improve on previous proposals. In addition, lazy transactions share the majority of hardware support with eager transactions, reducing substantially the hardware cost compared to other lazy HTM systems. By utilizing a simple predictor to decide the best execution mode for each transaction at runtime, DynTM obtains an average speedup of 34% over HTM systems that employ fixed version and conflict management policies.
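
A small sketch of the runtime choice DynTM's predictor makes: pick eager or lazy execution per static transaction site based on how each mode fared before. The per-site abort-count policy shown is illustrative, not the paper's predictor.

```cpp
#include <unordered_map>

enum class Mode { EAGER, LAZY };

struct SiteStats { int eager_aborts = 0, lazy_aborts = 0; };
std::unordered_map<int, SiteStats> stats;    // keyed by static transaction site

// Choose the mode that has aborted less often at this site so far.
Mode predict(int site) {
    const SiteStats& s = stats[site];
    return (s.eager_aborts <= s.lazy_aborts) ? Mode::EAGER : Mode::LAZY;
}

void record_abort(int site, Mode m) {
    if (m == Mode::EAGER) stats[site].eager_aborts++;
    else                  stats[site].lazy_aborts++;
}
```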

Proceedings ArticleDOI
13 Jun 2010
TL;DR: This paper uses Sun's prototype multicore chip, code-named Rock, to experiment with HTM algorithms, and discusses ways in which its limitations prevent better results, or would prevent production use of algorithms even if they are successful.
Abstract: We explore the potential of hardware transactional memory (HTM) to improve concurrent algorithms. We illustrate a number of use cases in which HTM enables significantly simpler code to achieve similar or better performance than existing algorithms for conventional architectures. We use Sun's prototype multicore chip, code-named Rock, to experiment with these algorithms, and discuss ways in which its limitations prevent better results, or would prevent production use of algorithms even if they are successful. Our use cases include concurrent data structures such as double ended queues, work stealing queues and scalable non-zero indicators, as well as a scalable malloc implementation and a simulated annealing application. We believe that our paper makes a compelling case that HTM has substantial potential to make effective concurrent programming easier, and that we have made valuable contributions in guiding designers of future HTM features to exploit this potential.

Patent
31 Mar 2010
TL;DR: The transactional memory system described in this paper implements parallel co-transactions that access a shared memory such that at most one of the transactions in a set will succeed and all others will fail.
Abstract: The transactional memory system described herein may implement parallel co-transactions that access a shared memory such that at most one of the co-transactions in a set will succeed and all others will fail (e.g., be aborted). Co-transactions may improve the performance of programs that use transactional memory by attempting to perform the same high-level operation using multiple algorithmic approaches, transactional memory implementations, and/or speculation options in parallel, and allowing only the first to complete to commit its results. If none of the co-transactions succeeds, one or more may be retried, possibly using a different approach and/or transactional memory implementation. The at-most-one property may be managed through the use of a shared "done" flag. Conflicts between co-transactions in a set and accesses made by transactions or activities outside the set may be managed using lazy write ownership acquisition and/or a priority-based approach. Each co-transaction may execute on a different processor resource.
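
A sketch of the co-transaction pattern in the claim, assuming plain threads and an atomic "done" flag in place of the transactional machinery; the failure handling is illustrative.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Run alternative implementations of one high-level operation in parallel;
// the first to finish sets the shared flag, and all others must fail.
void run_cotransactions(const std::vector<std::function<bool()>>& alternatives) {
    std::atomic<bool> done{false};
    std::vector<std::thread> workers;
    for (const auto& alt : alternatives) {
        workers.emplace_back([&done, alt] {
            if (done.load()) return;         // a sibling already won
            if (!alt()) return;              // this alternative failed outright
            if (done.exchange(true)) {
                // Lost the commit race: a real TM would abort this
                // co-transaction's effects here.
            }
            // Otherwise this co-transaction is the unique winner.
        });
    }
    for (auto& w : workers) w.join();
    // If done is still false here, no alternative succeeded; per the claim,
    // one or more co-transactions may be retried with a different approach.
}
```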

Patent
26 Oct 2010
TL;DR: In this article, the authors propose an alert-on-update mechanism for fast software-controlled conflict detection and programmable data isolation, allowing potentially conflicting readers and writers to proceed concurrently under software control.
Abstract: In a transactional memory technique, hardware serves simply to optimize the performance of transactions that are controlled fundamentally by software. The hardware support reduces the overhead of common TM tasks—conflict detection, validation, and data isolation—for common-case bounded transactions. Software control preserves policy flexibility and supports transactions unbounded in space and in time. The hardware includes 1) an alert-on-update mechanism for fast software-controlled conflict detection; and 2) programmable data isolation, allowing potentially conflicting readers and writers to proceed concurrently under software control.

Patent
31 Mar 2010
TL;DR: Locality information (as reflected by the value of a respective locale guard) associated with each of a plurality of data partitions (locales) in a shared memory is leveraged to elide various operations in transactional read/write fences when transactions access data in locales owned by their threads.
Abstract: The system and methods described herein may reduce read/write fence latencies and cache pressure related to STM metadata accesses. These techniques may leverage locality information (as reflected by the value of a respective locale guard) associated with each of a plurality of data partitions (locales) in a shared memory to elide various operations in transactional read/write fences when transactions access data in locales owned by their threads. The locale state may be disabled, free, exclusive, or shared. For a given memory access operation of an atomic transaction targeting an object in the shared memory, the system may implement the memory access operation using a contention mediation mechanism selected based on the value of the locale guard associated with the locale in which the target object resides. For example, a traditional read/write fence may be employed in some memory access operations, while other access operations may employ an optimized read/write fence.
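
A sketch of the fence-elision fast path the claims describe: if the current thread owns the target locale exclusively, the read barrier's metadata work can be skipped. The state encoding and the stubbed slow path are illustrative.

```cpp
#include <atomic>

enum LocaleState { DISABLED, FREE, EXCLUSIVE, SHARED };  // per the claims

struct Locale {
    std::atomic<int>  state{FREE};
    std::atomic<long> owner{-1};     // owning thread id when EXCLUSIVE
};

// The traditional STM read fence; stubbed so the sketch is self-contained.
int full_read_barrier(int* addr) { return *addr; }

int tx_read(Locale& loc, int* addr, long my_tid) {
    // Fast path: this thread owns the locale exclusively, so no other
    // transaction can conflict here and the fence work can be elided.
    if (loc.state.load() == EXCLUSIVE && loc.owner.load() == my_tid)
        return *addr;
    return full_read_barrier(addr);  // slow path: mediate contention
}
```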

Proceedings ArticleDOI
Hanjun Kim1, Arun Raman1, Feng Liu1, Jae W. Lee1, David I. August1 
04 Dec 2010
TL;DR: Distributed Software Multi-threaded Transactional Memory (DSMTX) as discussed by the authors is a runtime system for non-shared memory clusters, allowing them to efficiently address inter-node communication costs.
Abstract: While clusters of commodity servers and switches are the most popular form of large-scale parallel computers, many programs are not easily parallelized for execution upon them. In particular, high inter-node communication cost and lack of globally shared memory appear to make clusters suitable only for server applications with abundant task-level parallelism and scientific applications with regular and independent units of work. Clever use of pipeline parallelism (DSWP), thread-level speculation (TLS), and speculative pipeline parallelism (Spec-DSWP) can mitigate the costs of inter-thread communication on shared memory multicore machines. This paper presents Distributed Software Multi-threaded Transactional Memory (DSMTX), a runtime system which makes these techniques applicable to non-shared memory clusters, allowing them to efficiently address inter-node communication costs. Initial results suggest that DSMTX enables efficient cluster execution of a wider set of application types. For 11 sequential C programs parallelized for a 4-core 32-node (128 total core) cluster without shared memory, DSMTX achieves a geomean speedup of 49x. This compares favorably to the 15x speedup achieved by our implementation of TLS-only support for clusters.

Proceedings ArticleDOI
13 Apr 2010
TL;DR: It is shown that the STM provides not only ease of programming, but also better performance than that achievable with state-of-the-art lock-based programming, for this realistic high-impact application.
Abstract: In this paper, we study parallelization of multiplayer games using software transactional memory (STM) support. We show that the STM provides not only ease of programming, but also better performance than that achievable with state-of-the-art lock-based programming, for this realistic high-impact application. For this purpose, we use a game benchmark, SynQuake, that extracts the main data structures and the essential features of the popular game Quake. SynQuake can be driven with a synthetic workload generator that flexibly emulates client game actions and various hot-spot scenarios in the game world. We implement, evaluate and compare the STM version of SynQuake with a state-of-the-art lock-based parallelization of Quake, which we ported to SynQuake. While in STM-SynQuake support for maintaining the consistency of each complex game action is automatic, conservative locking of surrounding objects within a bounding box, for the duration of the game action, is inherently needed in lock-based SynQuake. This leads to higher scalability of STM-SynQuake versus lock-based SynQuake, due to a higher degree of false sharing in the latter. Task assignment to threads has a second-order effect on the scalability of STM-SynQuake, due to its impact on the application's true sharing patterns. We show that a dynamic locality-aware task assignment to threads provides the best trade-off between load balancing and conflict reduction.

Proceedings ArticleDOI
02 May 2010
TL;DR: Two STMs for graphics processors are designed and implemented, one blocking and one non-blocking, and experimental results comparing the performance of the two STMs are described and explained.
Abstract: The introduction of general-purpose computing on many-core graphics processor systems, and the general shift in the industry towards parallelism, has created a demand for ease of parallelization. Software transactional memory (STM) simplifies development of concurrent code by allowing the programmer to mark sections of code to be executed concurrently and atomically in an optimistic manner. In contrast to locks, STMs are easy to compose and do not suffer from deadlocks. We have designed and implemented two STMs for graphics processors, one blocking and one non-blocking. The design issues involved in designing these two STMs are described and explained in the paper, together with experimental results comparing the performance of the two STMs.

Journal ArticleDOI
TL;DR: An adaptive locking technique is proposed that dynamically observes whether a critical section would be best executed transactionally or while holding a mutex lock; adaptive locks are found to consistently match or outperform the better of the two component mechanisms (mutexes or transactions).
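
A hedged sketch of the adaptive idea: per lock site, keep using the transactional path while it succeeds, and fall back to the mutex once it has failed too often. try_htm_execute is a hypothetical stub, and the fixed failure threshold stands in for whatever cost model the technique actually uses.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical transactional attempt; stubbed to fail so the sketch runs
// anywhere (always taking the mutex path).
template <class F> bool try_htm_execute(F) { return false; }

struct AdaptiveLock {
    std::mutex m;
    std::atomic<int> tx_failures{0};     // observed transactional cost at this site

    template <class F>
    void critical(F body) {
        if (tx_failures.load() < 8) {              // still worth speculating
            if (try_htm_execute(body)) return;     // committed transactionally
            tx_failures.fetch_add(1);              // remember the failure
        }
        std::lock_guard<std::mutex> g(m);          // mutex path
        body();
    }
};
```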