scispace - formally typeset
Search or ask a question
Topic

Transactional memory

About: Transactional memory is a research topic. Over the lifetime, 2365 publications have been published within this topic receiving 60818 citations.


Papers
More filters
Proceedings ArticleDOI
04 Dec 2010
TL;DR: This paper proposes a new method to determine thread interleaving constraints from the tested interleavings in the form of lifeguard transactions (LifeTxes), and shows that 11 out of 14 real concurrency bugs in programs like Apache, MySQL and Mozilla could be avoided using the proposed approach for a negligible performance overhead.
Abstract: Parallel programming is hard, because it is impractical to test all possible thread interleavings. One promising approach to improve a multi-threaded program’s reliability is to constrain a production run’s thread interleavings in such a way that untested interleavings are avoided as much as possible. Such an approach would avoid hard-to-test rare thread interleavings in production runs, and thereby improve correctness. However, a key challenge in realizing this goal is in determining thread interleaving constraints from the tested correct interleavings, and enforcing them efficiently in production runs. In this paper, we propose a new method to determine thread interleaving constraints from the tested interleavings in the form of lifeguard transactions (LifeTxes). An untested code region initially is contained in a single LifeTx. As the code region is tested over more thread interleavings, its original LifeTx is automatically split into multiple smaller LifeTxes so that the newly tested interleavings are permitted in production runs. To efficiently enforce LifeTx constraints in production runs, we propose a hardware design similar to the eager conflict detection capability that exist in a conventional hardware transactional memory (TM) systems, but without the need for versioning, rollback and unbounded TM support.We show that 11 out of 14 real concurrency bugs in programs like Apache, MySQL and Mozilla could be avoided using the proposed approach for a negligible performance overhead.

28 citations

Proceedings ArticleDOI
31 May 2011
TL;DR: This work develops an HTM system that allows selection of versioning and conflict resolution policies at the granularity of cache lines and discovers that this neat match in granularity with that of the cache coherence protocol results in a design that is very simple and yet able to track closely or exceed the performance of the best performing policy for a given workload.
Abstract: Hardware Transactional Memory (HTM) systems, in prior research, have either fixed policies of conflict resolution and data versioning for the entire system or allowed a degree of flexibility at the level of transactions. Unfortunately, this results in susceptibility to pathologies, lower average performance over diverse workload characteristics or high design complexity. In this work we explore a new dimension along which flexibility in policy can be introduced. Recognizing the fact that contention is more a property of data rather than that of an atomic code block, we develop an HTM system that allows selection of versioning and conflict resolution policies at the granularity of cache lines. We discover that this neat match in granularity with that of the cache coherence protocol results in a design that is very simple and yet able to track closely or exceed the performance of the best performing policy for a given workload. It also brings together the benefits of parallel commits (inherent in traditional eager HTMs) and good optimistic concurrency without deadlock avoidance mechanisms (inherent in lazy HTMs), with little increase in complexity.

28 citations

Proceedings ArticleDOI
23 Feb 2013
TL;DR: Evaluation results indicate that the approach provides promising performance at low thread counts: FastLane almost systematically wins over a classical STM in the 1-6 threads range, and often performs better than sequential execution of the non-instrumented version of the same application starting with 2 threads.
Abstract: Software transactional memory (STM) can lead to scalable implementations of concurrent programs, as the relative performance of an application increases with the number of threads that support it. However, the absolute performance is typically impaired by the overheads of transaction management and instrumented accesses to shared memory. This often leads STM-based programs with low thread counts to perform worse than a sequential, non-instrumented version of the same application.In this paper, we propose FastLane, a new STM algorithm that bridges the performance gap between sequential execution and classical STM algorithms when running on few cores. FastLane seeks to reduce instrumentation costs and thus performance degradation in its target operation range. We introduce a novel algorithm that differentiates between two types of threads: One thread (the master) executes transactions pessimistically without ever aborting, thus with minimal instrumentation and management costs, while other threads (the helpers) can commit speculative transactions only when they do not conflict with the master. Helpers thus contribute to the application progress without impairing on the performance of the master.We implement FastLane as an extension of a state-of-the-art STM runtime system and compiler. Multiple code paths are produced for execution on a single, few, and many cores. The runtime system selects the code path providing the best throughput, depending on the number of cores available on the target machine. Evaluation results indicate that our approach provides promising performance at low thread counts: FastLane almost systematically wins over a classical STM in the 1-6 threads range, and often performs better than sequential execution of the non-instrumented version of the same application starting with 2 threads.

28 citations

Proceedings ArticleDOI
18 Oct 2014
TL;DR: This paper illustrates three lightweight STM designs that are capable of scaling to a large number of GPU threads and proposes to support software transactional memory on GPU that is able to achieve performance comparable to fine-grained locking, while requiring minimal programming efforts.
Abstract: Graphics Processing Units (GPUs) provide an attractive option for extracting data-level parallelism from diverse applications However, some applications, although possess abundant data-level parallelism, exhibit irregular memory access patterns to the shared data structures Porting such applications to GPUs requires synchronization mechanisms such as locks, which significantly increase the programming complexity Coarse-grained locking, where a single lock controls all the shared resources, although reduces programming efforts, can substantially serialize GPU threads On the other hand, fine-grained locking, where each data element is protected by an independent lock, although facilitates maximum parallelism, requires significant programming efforts To overcome these challenges, we propose to support software transactional memory (STM) on GPU that is able to achieve performance comparable to fine-grained locking, while requiring minimal programming efforts Software-based transactional execution can incur significant runtime overheads due to activities such as detecting conflicts across thousands of GPU threads and managing a consistent memory state Thus, in this paper we illustrate three lightweight STM designs that are capable of scaling to a large number of GPU threads In our system, programmers simply mark the critical sections in the applications, and the underlying STM support is able to achieve performance comparable to fine-grained locking

28 citations

Patent
Yuan C. Chou1
15 Jan 2010
TL;DR: In this paper, a method of read-set and write-set management distinguishes between shared and non-shared memory regions is presented, and a subset of a readset and a writeset that access the shared memory region is checked for conflicts with the one or more concurrent transactions at a first granularity.
Abstract: A method of read-set and write-set management distinguishes between shared and non-shared memory regions. A shared memory region, used by a transactional memory application, which may be shared by one or more concurrent transactions is identified. A non-shared memory region, used by the transactional memory application, which is not shared by the one or more concurrent transactions is identified. A subset of a read-set and a write-set that access the shared memory region is checked for conflicts with the one or more concurrent transactions at a first granularity. A subset of the read-set and the write-set that access the non-shared memory region is checked for conflicts with the one or more concurrent transactions at a second granularity. The first granularity is finer than the second granularity.

28 citations


Network Information
Related Topics (5)
Compiler
26.3K papers, 578.5K citations
87% related
Cache
59.1K papers, 976.6K citations
86% related
Parallel algorithm
23.6K papers, 452.6K citations
84% related
Model checking
16.9K papers, 451.6K citations
84% related
Programming paradigm
18.7K papers, 467.9K citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202316
202240
202129
202063
201970
201888