Topic

Transactional memory

About: Transactional memory is a research topic. Over its lifetime, 2,365 publications have been published within this topic, receiving 60,818 citations.


Papers
Proceedings Article
01 Dec 2011
TL;DR: A novel speculative transactional memory architecture, "SPT", is proposed that supports both TLS and TM semantics, including its special hardware, compiler, and execution support, and that further trades off several important factors in multicore architecture design.
Abstract: Combining the benefits of thread-level speculation (TLS) and transactional memory (TM) can effectively enhance the performance of a chip multiprocessor (CMP). This paper proposes a novel speculative transactional memory architecture, "SPT", that supports both TLS and TM semantics, including its special hardware, compiler, and execution support. It further trades off several important factors in the multicore architecture design. The experimental results show that 16 cache lines is the proper speculative buffer capacity and that write-back is the better cache design choice for speculation.
Dissertation
22 Mar 2013
TL;DR: A new STM design, STM2, is proposed, based on an assisted execution model in which time-consuming TM operations are offloaded to auxiliary threads while application threads optimistically perform computation; in addition, subtle transactional data races are discovered in widely used STAMP applications.
Abstract: Chip Multithreading (CMT) processors promise to deliver higher performance by running more than one stream of instructions in parallel. To exploit CMT's capabilities, programmers have to parallelize their applications, which is not a trivial task. Transactional Memory (TM) is one of the parallel programming models that aim to simplify synchronization by raising the level of abstraction between semantic atomicity and the means by which that atomicity is achieved. TM is a promising programming model, but important challenges must still be addressed to make it more practical and efficient in mainstream parallel programming. The first challenge addressed in this dissertation is making the evaluation of TM proposals more solid, with realistic TM benchmarks and the ability to run the same benchmarks on different STM systems. We first introduce RMS-TM, a comprehensive benchmark suite for evaluating HTMs and STMs. RMS-TM consists of seven applications from the Recognition, Mining and Synthesis (RMS) domain that are representative of future workloads. RMS-TM features current TM research issues such as nesting and I/O inside transactions, while also providing a variety of TM characteristics. Most STM systems are implemented as user-level libraries: the programmer is expected to manually instrument not only transaction boundaries but also individual loads and stores within transactions. This library-based approach is increasingly tedious and error-prone, and it also makes reliable performance comparisons difficult. To enable an "apples-to-apples" performance comparison, we then develop a software layer that allows researchers to test the same applications with interchangeable STM back ends. The second challenge addressed is enhancing the performance and scalability of TM applications running on aggressive multi-core/multi-threaded processors. The performance and scalability of current TM designs, in particular STM designs, do not always meet the programmer's expectations, especially at scale. To overcome this limitation, we propose a new STM design, STM2, based on an assisted execution model in which time-consuming TM operations are offloaded to auxiliary threads while application threads optimistically perform computation. Surprisingly, our results show that STM2 provides, on average, speedups between 1.8x and 5.2x over state-of-the-art STM systems. On the other hand, we notice that assisted-execution systems may show low processor utilization. To alleviate this problem and increase the efficiency of STM2, we enrich STM2 with a runtime mechanism that automatically and adaptively detects the computing demands of application and auxiliary threads and dynamically partitions hardware resources between the pair through the hardware thread prioritization mechanism implemented in POWER machines. The third challenge is to define what it means for a TM program to be correctly synchronized. The current definition of transactional data race requires all transactions to be totally ordered "as if" serialized by a global lock, which limits the scalability of TM designs. To remove this constraint, we first propose relaxing the current definition of transactional data race to allow a higher level of concurrency. Based on this definition, we propose the first practical race detection algorithm for C/C++ applications (TRADE) and implement the corresponding race detection tool. Then, we introduce a new definition of transactional data race that is more intuitive, transparent to the underlying TM implementation, and applicable to a broad set of C/C++ TM programs. Based on this new definition, we propose T-Rex, an efficient and scalable race detection tool for C/C++ TM applications. Using TRADE and T-Rex, we have discovered subtle transactional data races in widely used STAMP applications that had not been reported before.
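The assisted-execution idea lends itself to a small illustration. In the sketch below (all names, the versioned-word layout, and the queue-based hand-off are assumptions made for illustration; STM2's actual design, data layout, and commit protocol differ), the application thread logs each transactional read into a shared queue, while an auxiliary thread drains the queue and revalidates the recorded versions, moving validation work off the application thread's critical path.

```cpp
// Hedged sketch of an assisted-execution STM: an auxiliary thread revalidates
// the read set while the application thread keeps computing. Illustration only;
// STM2's real data layout, synchronisation, and commit protocol differ.
#include <atomic>
#include <mutex>
#include <queue>

struct VersionedWord {                  // one transactional memory location
    std::atomic<int> version{0};
    std::atomic<int> value{0};
};

struct ReadEntry { VersionedWord* loc; int seen_version; };

class AssistedTx {
public:
    // Transactional read: record what we saw and let the helper re-check it.
    int read(VersionedWord& w) {
        int ver = w.version.load(std::memory_order_acquire);
        int val = w.value.load(std::memory_order_acquire);
        std::lock_guard<std::mutex> g(m_);
        pending_.push({&w, ver});
        return val;
    }
    // Application thread: make sure every logged read is still consistent.
    bool commit() { drain(); return valid_.load(std::memory_order_acquire); }
    // Auxiliary thread: call repeatedly to take validation off the critical path.
    void helper_step() { drain(); }
private:
    void drain() {
        std::lock_guard<std::mutex> g(m_);
        while (!pending_.empty()) {
            ReadEntry e = pending_.front();
            pending_.pop();
            if (e.loc->version.load(std::memory_order_acquire) != e.seen_version)
                valid_.store(false, std::memory_order_release);  // read set stale
        }
    }
    std::mutex m_;
    std::queue<ReadEntry> pending_;
    std::atomic<bool> valid_{true};
};
```

In a full assisted-execution system the helper would also maintain write logs and participate in contention management; this sketch only offloads read-set validation.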
Posted Content
TL;DR: This paper proposes an algorithm for maintaining a concurrent directed graph that is concurrently updated by threads adding and deleting vertices and edges, under the constraint that the graph always remain acyclic; to the authors' knowledge, it is the first work to propose a concurrent data structure for an adjacency-list representation of graphs.
Abstract: In this paper, we propose an algorithm for maintaining a concurrent directed graph (for shared-memory architectures) that is concurrently being updated by threads adding and deleting vertices and edges. The update methods of the algorithm are deadlock-free, while the contains methods are wait-free. To the best of our knowledge, this is the first work to propose a concurrent data structure for an adjacency-list representation of graphs. We extend the lazy-list implementation of a concurrent set to achieve this. We believe that many applications can benefit from this concurrent graph structure. An important application that inspired us is serialization graph testing (SGT) in databases and transactional memory. Motivated by this application, we pose the constraint on this concurrent graph data structure that the graph should always be acyclic. We ensure this by checking for graph acyclicity whenever we add an edge. To detect cycles efficiently, we propose a wait-free reachability algorithm. We compare the performance of the proposed concurrent data structure with the coarse-grained locking implementation that has traditionally been used to implement SGT, and show that our algorithm achieves, on average, an 8x improvement in throughput over coarse-grained and sequential implementations.
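The add-edge/acyclicity idea can be conveyed with a much-simplified sketch: before inserting edge u→v, check whether u is already reachable from v; if it is, the new edge would close a cycle and is rejected. The sketch below uses one coarse lock and a plain DFS as stand-ins for the paper's deadlock-free lazy-list updates and wait-free reachability.

```cpp
// Simplified sketch: acyclic directed graph guarded by one coarse lock.
// The paper's algorithm uses fine-grained lazy lists and wait-free
// reachability; this only illustrates the add-edge/cycle-check idea.
#include <mutex>
#include <unordered_map>
#include <unordered_set>
#include <vector>

class AcyclicGraph {
public:
    void add_vertex(int v) {
        std::lock_guard<std::mutex> g(m_);
        adj_.try_emplace(v);
    }
    // Returns false if the edge u->v would create a cycle.
    bool add_edge(int u, int v) {
        std::lock_guard<std::mutex> g(m_);
        adj_.try_emplace(u);
        adj_.try_emplace(v);
        if (reachable(v, u)) return false;   // u already reachable from v: cycle
        adj_[u].insert(v);
        return true;
    }
    bool contains_edge(int u, int v) {
        std::lock_guard<std::mutex> g(m_);
        auto it = adj_.find(u);
        return it != adj_.end() && it->second.count(v) != 0;
    }
private:
    bool reachable(int from, int to) {       // iterative DFS
        std::vector<int> stack{from};
        std::unordered_set<int> seen;
        while (!stack.empty()) {
            int x = stack.back();
            stack.pop_back();
            if (x == to) return true;
            if (!seen.insert(x).second) continue;
            for (int y : adj_[x]) stack.push_back(y);
        }
        return false;
    }
    std::mutex m_;
    std::unordered_map<int, std::unordered_set<int>> adj_;
};
```

A coarse-grained version like this is essentially the baseline the authors compare against; their contribution is achieving the same semantics with fine-grained, non-blocking synchronisation.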
Proceedings Article
03 Jan 2023
TL;DR: NBTC is a new methodology for atomic composition of nonblocking operations on concurrent data structures; it makes it easy to transform most nonblocking data structures into transactional counterparts while preserving their nonblocking liveness and high concurrency.
Abstract: We introduce nonblocking transaction composition (NBTC), a new methodology for atomic composition of nonblocking operations on concurrent data structures. Unlike previous software transactional memory (STM) approaches, NBTC leverages the linearizability of existing nonblocking structures, reducing the number of memory accesses that must be executed together, atomically, to only one per operation in most cases (these are typically the linearizing instructions of the constituent operations). Our obstruction-free implementation of NBTC, which we call Medley, makes it easy to transform most nonblocking data structures into transactional counterparts while preserving their nonblocking liveness and high concurrency. In our experiments, Medley outperforms Lock-Free Transactional Transform (LFTT), the fastest prior competing methodology, by 40--170%. The marginal overhead of Medley's transactional composition, relative to separate operations performed in succession, is roughly 2.2×. For persistent memory, we observe that failure atomicity for transactions can be achieved "almost for free" with epoch-based periodic persistence. Toward that end, we integrate Medley with nbMontage, a general system for periodically persistent data structures. The resulting txMontage provides ACID transactions and achieves throughput up to two orders of magnitude higher than that of the OneFile persistent STM system.
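What such composition buys a client can be conveyed with a deliberately naive sketch: two map updates that must take effect together. The version below simulates the atomic commit with a single global mutex over an ordinary std::map (both stand-ins chosen for brevity, not part of the paper); Medley instead tracks each constituent operation's linearizing instruction and commits obstruction-free over genuinely nonblocking structures.

```cpp
// Naive stand-in for transactional composition: two updates made atomic with a
// global mutex. Medley achieves the same client-visible atomicity without
// blocking, by deferring only each operation's linearizing instruction.
#include <map>
#include <mutex>

std::mutex commit_lock;              // stand-in for a real commit protocol
std::map<int, long> accounts;        // stand-in for a nonblocking map

// Move `amount` from one account to another, atomically, or do nothing.
bool transfer(int from, int to, long amount) {
    std::lock_guard<std::mutex> g(commit_lock);
    auto it = accounts.find(from);
    if (it == accounts.end() || it->second < amount)
        return false;                // insufficient funds: whole transaction aborts
    it->second -= amount;
    accounts[to] += amount;          // both writes become visible together
    return true;
}
```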
Posted Content
30 Jul 2022
TL;DR: TMS2-RA is a relaxed operational transactional memory (TM) specification that provides a formal semantics for TM libraries and their clients; it can be implemented by a C11 library, TML-RA, that uses relaxed and release-acquire atomics.
Abstract: Transactional memory (TM) is an intensively studied synchronisation paradigm with many proposed implementations in software and hardware, and combinations thereof. However, TM under relaxed memory, e.g., C11 (the 2011 C/C++ standard) is still poorly understood, lacking rigorous foundations that support verifiable implementations. This paper addresses this gap by developing TMS2-RA, a relaxed operational TM specification. We integrate TMS2-RA with RC11 (the repaired C11 memory model that disallows load-buffering) to provide a formal semantics for TM libraries and their clients. We develop a logic, TARO, for verifying client programs that use TMS2-RA for synchronisation. We also show how TMS2-RA can be implemented by a C11 library, TML-RA, that uses relaxed and release-acquire atomics, yet guarantees the synchronisation properties required by TMS2-RA. We benchmark TML-RA and show that it outperforms its sequentially consistent counterpart in the STAMP benchmarks. Finally, we use a simulation-based verification technique to prove correctness of TML-RA. Our entire development is supported by the Isabelle/HOL proof assistant.
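The name TML-RA suggests a Transactional Mutex Lock (TML) design expressed with release-acquire atomics: a single global even/odd counter serialises writers while readers validate against it. A minimal TML-style sketch in C++ atomics is shown below; the precise RC11 orderings, and the proof that they suffice, are exactly what the paper's Isabelle/HOL development establishes, so treat the orderings here as illustrative assumptions.

```cpp
// Minimal TML-style (Transactional Mutex Lock) sketch with acquire/release
// atomics. Even glb: no active writer; odd glb: a writer holds the lock.
// Orderings here are illustrative; TML-RA's exact placement is in the paper.
#include <atomic>

std::atomic<unsigned> glb{0};
thread_local unsigned loc;           // snapshot of glb taken at transaction begin

void tx_begin() {
    do { loc = glb.load(std::memory_order_acquire); } while (loc & 1u);
}

// Returns false if the transaction must abort and restart from tx_begin().
bool tx_read(const std::atomic<int>& addr, int& out) {
    out = addr.load(std::memory_order_acquire);
    return glb.load(std::memory_order_acquire) == loc;   // snapshot still valid?
}

bool tx_write(std::atomic<int>& addr, int val) {
    if (!(loc & 1u)) {               // first write: try to become the single writer
        unsigned expected = loc;
        if (!glb.compare_exchange_strong(expected, loc + 1,
                                         std::memory_order_acq_rel))
            return false;            // lost the race: abort and restart
        ++loc;                       // loc now odd: we own the write lock
    }
    addr.store(val, std::memory_order_release);
    return true;
}

void tx_commit() {
    if (loc & 1u)
        glb.store(loc + 1, std::memory_order_release);    // even again: release
}
```

In this scheme a read-only transaction never writes glb, so readers commit with nothing beyond their validation loads, which is what makes TML-style designs attractive for read-dominated workloads.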

Network Information
Related Topics (5)
Compiler: 26.3K papers, 578.5K citations, 87% related
Cache: 59.1K papers, 976.6K citations, 86% related
Parallel algorithm: 23.6K papers, 452.6K citations, 84% related
Model checking: 16.9K papers, 451.6K citations, 84% related
Programming paradigm: 18.7K papers, 467.9K citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    16
2022    40
2021    29
2020    63
2019    70
2018    88