Topic

Transactional memory

About: Transactional memory is a research topic. Over its lifetime, 2,365 publications have been published within this topic, receiving 60,818 citations.


Papers
Book Chapter DOI
07 Oct 2015
TL;DR: The key idea in ALE is to use a sequence of fine-grained locks in the fallback-path to detect conflicts with the fast-path, and at the same time reduce the cost of these locks by executing the fallback-path as a series of segments, where each segment is a short hardware transaction of dynamic length.
Abstract: Hardware lock-elision (HLE) introduces concurrency into legacy lock-based code by optimistically executing critical sections in a fast-path as hardware transactions. Its main limitation is that, in case of repeated aborts, it reverts to a fallback-path that acquires a serial lock. This fallback-path lacks hardware-software concurrency, because all fast-path hardware transactions abort and wait for the completion of the fallback. Software lock elision has no such limitation, but its overheads are simply too high. We propose amalgamated lock-elision (ALE), a novel lock-elision algorithm that provides hardware-software concurrency and efficiency: the fallback-path executes concurrently with fast-path hardware transactions, while common-case fast-path reads incur no overheads and proceed without any instrumentation. The key idea in ALE is to use a sequence of fine-grained locks in the fallback-path to detect conflicts with the fast-path, and at the same time reduce the cost of these locks by executing the fallback-path as a series of segments, where each segment is a short hardware transaction of dynamic length. We implemented ALE in GCC and tested the new system on an Intel Haswell 16-way chip that provides hardware transactions. We benchmarked linked lists, hash tables and red-black trees, and also converted KyotoCacheDB to use ALE in GCC; in all cases ALE significantly outperforms HLE.

17 citations
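
To make the fast-path/fallback-path structure in the abstract above concrete, the fragment below is a minimal sketch of the baseline hardware lock-elision pattern that ALE improves on, written against Intel's RTM intrinsics. It is not ALE's segmented fallback; the names (elided_critical_section, fallback_lock, MAX_RETRIES) are illustrative assumptions, not taken from the paper.

    /* Minimal sketch (assumed names; compile with: gcc -mrtm -pthread) of the
     * plain lock-elision pattern the abstract starts from: attempt the critical
     * section as an Intel RTM hardware transaction that "subscribes" to the
     * fallback lock, and only after repeated aborts take the serial lock.
     * ALE's segmented, concurrently running fallback is NOT shown here. */
    #include <immintrin.h>   /* _xbegin / _xend / _xabort / _XBEGIN_STARTED */
    #include <pthread.h>

    static pthread_mutex_t fallback_lock = PTHREAD_MUTEX_INITIALIZER;
    static volatile int lock_taken = 0;   /* read ("subscribed to") in the fast-path */

    #define MAX_RETRIES 3

    void elided_critical_section(void (*body)(void))
    {
        for (int attempt = 0; attempt < MAX_RETRIES; ++attempt) {
            unsigned status = _xbegin();
            if (status == _XBEGIN_STARTED) {
                if (lock_taken)           /* fallback in progress: give up */
                    _xabort(0xff);
                body();                   /* speculative execution of the critical section */
                _xend();                  /* commit the hardware transaction */
                return;
            }
            /* transaction aborted: fall through and retry (optionally back off) */
        }
        /* Fallback-path: serial lock. Every running fast-path transaction that
         * read lock_taken now aborts -- the lack of hardware-software
         * concurrency that ALE is designed to remove. */
        pthread_mutex_lock(&fallback_lock);
        lock_taken = 1;
        body();
        lock_taken = 0;
        pthread_mutex_unlock(&fallback_lock);
    }

ALE's contribution, per the abstract, is to replace the single serial lock in this fallback with a sequence of fine-grained locks and to run the fallback itself as short hardware-transaction segments, so fast-path transactions no longer have to wait for the whole fallback to finish.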

Proceedings Article DOI
18 Oct 2007
TL;DR: A constructive critique of locking and transactional memory is presented: their strengths, weaknesses, and challenges.
Abstract: The advent of multi-core and multi-threaded processor architectures highlights the need to address the well-known shortcomings of the ubiquitous lock-based synchronization mechanisms. The emerging transactional-memory synchronization mechanism is viewed as a promising alternative to locking for high-concurrency environments, including operating systems. This paper presents a constructive critique of locking and transactional memory: their strengths, weaknesses, and challenges.

17 citations

Proceedings Article DOI
10 Oct 2011
TL;DR: STM2, a novel parallel STM designed for high-performance, aggressively multithreaded systems, is presented; it shows speedups between 1.8x and 5.2x over the tested STM systems on average, with peaks up to 12.8x.
Abstract: Extracting high performance from modern chip multithreading (CMT) processors is a complex task, especially for large CMT systems. Programmers must efficiently parallelize performance-critical software while avoiding deadlocks and race conditions. Transactional memory (TM) is a promising programming model that allows programmers to focus on parallelism rather than on maintaining correctness and avoiding deadlock. Software-only implementations (STMs) are especially compelling because they run on commodity hardware and therefore provide high portability. Unfortunately, STM systems usually suffer from high overheads, which may limit their usage, especially at scale. In this paper we present STM2, a novel parallel STM designed for high-performance, aggressively multithreaded systems. STM2 significantly lowers runtime overhead by offloading read-set validation, bookkeeping and conflict detection to auxiliary threads running on sibling hardware threads. Auxiliary threads perform STM operations in parallel with their paired application threads and absorb STM overhead, significantly improving performance. We exploit the fact that, on modern multi-core processors, sets of cores can share L1 or L2 caches. This lets us achieve closer coupling between the application thread and the auxiliary thread (when compared with traditional multi-processor systems). Our experiments, performed on an IBM POWER7 machine, a state-of-the-art, aggressively multi-threaded system, show that our approach outperforms several well-known STM implementations. In particular, STM2 shows speedups between 1.8x and 5.2x over the tested STM systems, on average, with peaks up to 12.8x.

17 citations
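
As a rough illustration of the offloading idea in the abstract above, the sketch below pairs an application thread with an auxiliary thread through a single-producer/single-consumer log: the application thread records the version it observed for each read, and the helper revalidates those entries in the background. The names (pair_ctx, log_read, helper_main) and the data layout are invented for this example; this is not the STM2 code or API.

    /* Schematic sketch (invented names; not the STM2 API) of offloading read-set
     * validation to a paired auxiliary thread.
     * Compile with: gcc -std=c11 -pthread stm_helper_sketch.c */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define LOG_SIZE 1024u

    typedef struct {
        atomic_uint *orec;      /* per-location version word ("ownership record") */
        unsigned     seen;      /* version observed when the read was performed   */
    } read_entry;

    typedef struct {
        read_entry  log[LOG_SIZE];
        atomic_uint head;       /* produced by the application thread */
        atomic_bool conflict;   /* helper flags a stale read here     */
        atomic_bool done;
    } pair_ctx;

    /* Application thread: record a read; validation happens asynchronously. */
    void log_read(pair_ctx *c, atomic_uint *orec, unsigned seen)
    {
        unsigned h = atomic_load_explicit(&c->head, memory_order_relaxed);
        c->log[h % LOG_SIZE] = (read_entry){ orec, seen };
        atomic_store_explicit(&c->head, h + 1, memory_order_release);
    }

    /* Auxiliary thread: drain the log and re-check versions in parallel. */
    void *helper_main(void *arg)
    {
        pair_ctx *c = arg;
        unsigned next = 0;
        for (;;) {
            bool finished = atomic_load_explicit(&c->done, memory_order_acquire);
            unsigned h = atomic_load_explicit(&c->head, memory_order_acquire);
            for (; next != h; ++next) {
                read_entry e = c->log[next % LOG_SIZE];
                if (atomic_load(e.orec) != e.seen)       /* version changed?  */
                    atomic_store(&c->conflict, true);    /* reader must abort */
            }
            if (finished && next == atomic_load(&c->head))
                return NULL;
        }
    }

    int main(void)
    {
        static pair_ctx c;                  /* zero-initialized                 */
        atomic_uint orec = 7;               /* version word of one shared datum */
        pthread_t helper;
        pthread_create(&helper, NULL, helper_main, &c);

        log_read(&c, &orec, 7);             /* consistent read of version 7     */
        atomic_fetch_add(&orec, 1);         /* a concurrent writer commits      */
        log_read(&c, &orec, 7);             /* now stale: helper flags conflict */

        atomic_store_explicit(&c.done, true, memory_order_release);
        pthread_join(helper, NULL);
        printf("conflict detected: %s\n", atomic_load(&c.conflict) ? "yes" : "no");
        return 0;
    }

The benefit the abstract describes comes from placing the helper on a sibling hardware thread that shares an L1 or L2 cache with its application thread, so the log stays cache-resident and most of the validation cost is hidden behind the application's own work.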

Book Chapter DOI
04 Jan 2014
TL;DR: This work presents HiperTM, a high-performance active replication protocol for fault-tolerant distributed transactional memory that guarantees 0% out-of-order optimistic deliveries and performance up to 1.2× better than the atomic-broadcast-based competitor PaxosSTM.
Abstract: We present HiperTM, a high-performance active replication protocol for fault-tolerant distributed transactional memory. The active replication paradigm allows transactions to execute locally, costing them only a single network communication step during transaction execution. Shared objects are replicated across all sites, avoiding remote object accesses. Replica consistency is ensured by (a) OS-Paxos, an optimistic atomic broadcast layer that total-orders transactional requests, and (b) SCC, a local multi-version concurrency control protocol that enforces a commit order equivalent to transactions' delivery order. SCC executes write transactions serially without incurring any synchronization overhead, and runs read-only transactions in parallel to write transactions with non-blocking execution and abort-freedom. Our implementation reveals that HiperTM guarantees 0% out-of-order optimistic deliveries and performance up to 1.2× better than the atomic-broadcast-based competitor PaxosSTM.

16 citations
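
To illustrate the commit-in-delivery-order idea described above, the toy fragment below applies already total-ordered write transactions serially while read-only transactions read from consistent multi-version snapshots without blocking. The atomic broadcast layer (OS-Paxos) is assumed to have produced the delivery order and is not modeled; all names (mv_object, apply_write, snapshot_read) are invented for the example, not taken from HiperTM.

    /* Toy single-object sketch of committing in delivery order with
     * multi-version snapshot reads.  Compile with: gcc -std=c11 mvcc_sketch.c */
    #include <stdatomic.h>
    #include <stdio.h>

    #define MAX_VERSIONS 64   /* no overflow handling in this toy */

    typedef struct {
        unsigned long order;  /* total-order position that wrote this version */
        int value;
    } version;

    typedef struct {
        version v[MAX_VERSIONS];
        atomic_uint count;    /* number of committed versions */
    } mv_object;

    static atomic_ulong last_committed;   /* highest applied total-order position */

    /* Applied by a single apply thread, in the order delivered by the
     * (optimistic) atomic broadcast layer -- hence no write-write races. */
    void apply_write(mv_object *o, unsigned long order, int value)
    {
        unsigned n = atomic_load_explicit(&o->count, memory_order_relaxed);
        o->v[n] = (version){ order, value };
        atomic_store_explicit(&o->count, n + 1, memory_order_release);
        atomic_store_explicit(&last_committed, order, memory_order_release);
    }

    /* Read-only transaction: capture a snapshot once, then read the newest
     * version no newer than the snapshot -- never blocks, never aborts. */
    int snapshot_read(mv_object *o, unsigned long snapshot)
    {
        unsigned n = atomic_load_explicit(&o->count, memory_order_acquire);
        int val = 0;
        for (unsigned i = 0; i < n; ++i)
            if (o->v[i].order <= snapshot)
                val = o->v[i].value;
        return val;
    }

    int main(void)
    {
        static mv_object x;
        apply_write(&x, 1, 10);                       /* delivered first  */
        unsigned long snap = atomic_load(&last_committed);
        apply_write(&x, 2, 20);                       /* delivered second */
        printf("reader with snapshot %lu sees %d (latest is %d)\n",
               snap, snapshot_read(&x, snap), snapshot_read(&x, 2));
        return 0;
    }

Because a read-only transaction only consults versions no newer than its snapshot, it runs in parallel with the serial write stream and never aborts, which matches the non-blocking, abort-free read-only execution the abstract attributes to SCC.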

Patent
12 Oct 2012
TL;DR: In this patent, an instruction sequence including, in order, a load-and-reserve instruction specifying a read access to a target memory block, an instruction delimiting transactional memory access instructions belonging to a memory transaction, and a store-conditional instruction specifying a conditional write access to the target memory block is detected.
Abstract: In a processor, an instruction sequence including, in order, a load-and-reserve instruction specifying a read access to a target memory block, an instruction delimiting transactional memory access instructions belonging to a memory transaction, and a store-conditional instruction specifying a conditional write access to the target memory block is detected. In response to detecting the instruction sequence, the processor causes the conditional write access to the target memory block to fail.

16 citations


Network Information
Related Topics (5)
Compiler: 26.3K papers, 578.5K citations, 87% related
Cache: 59.1K papers, 976.6K citations, 86% related
Parallel algorithm: 23.6K papers, 452.6K citations, 84% related
Model checking: 16.9K papers, 451.6K citations, 84% related
Programming paradigm: 18.7K papers, 467.9K citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    16
2022    40
2021    29
2020    63
2019    70
2018    88