Topic
Transactional memory
About: Transactional memory is a research topic. Over its lifetime, 2,365 publications have appeared within this topic, receiving 60,818 citations.
Papers
••
01 May 2022
TL;DR: CSMV (Client Server Multiversioned) is a multi-versioned software TM (STM) for GPUs that adopts an innovative client-server design. It achieves speed-ups of up to three orders of magnitude over state-of-the-art STMs for GPUs and can accelerate irregular applications running on state-of-the-art STMs for CPUs by up to 20×.
Abstract: GPUs have traditionally focused on streaming applications with regular parallelism. Over the last years, though, GPUs have also been successfully used to accelerate irregular applications in a number of application domains by using fine-grained synchronization schemes. Unfortunately, fine-grained synchronization strategies are notoriously complex and error-prone. This has motivated the search for alternative paradigms aimed at simplifying concurrent programming and, among these, Transactional Memory (TM) is probably one of the most prominent proposals. This paper introduces CSMV (Client Server Multiversioned), a multi-versioned Software TM (STM) for GPUs that adopts an innovative client-server design. By decoupling the execution of transactions from their commit process, CSMV provides two main benefits: (i) it enables the use of fast on-chip memory to access the global metadata used to synchronize transactions, and (ii) it allows for implementing highly efficient collaborative commit procedures, tailored to take full advantage of the architectural characteristics of GPUs. Via an extensive experimental study, we show that CSMV achieves up to three orders of magnitude speed-ups with respect to state-of-the-art STMs for GPUs and that it can accelerate irregular applications running on state-of-the-art STMs for CPUs by up to 20×.
••
TL;DR: This work designs and implements a partial abort scheme for STM based on automated software instrumentation, which injects into the application the capability to undo the required portions of a transaction's execution; it can also correctly undo non-transactional operations executed on the stack and the heap during a transaction.
Abstract: Software transactional memory (STM) provides synchronization support to ensure atomicity and isolation when threads access shared data in concurrent applications. With STM, shared data accesses are encapsulated within transactions automatically handled by the STM layer. Hence, programmers are not required to use code-synchronization mechanisms explicitly, such as locking. In this article, we present our experience in designing and implementing a partial abort scheme for STM. The objective of our work is threefold: (1) enabling STM to undo only part of the transaction execution in the case of conflict, (2) designing a scheme that is fully transparent to programmers, thus allowing existing STM applications to run without modification, and (3) providing a scheme that can be easily integrated within existing STM runtime environments without altering their internal structure. The scheme we designed is based on automated software instrumentation, which injects into the application the capability to undo the required portions of transaction executions. Further, it can also correctly undo non-transactional operations executed on the stack and the heap during a transaction. This capability allows programmers to write transactional code without concerns about the side effects of aborted transactions on both shared and thread-private data. We integrated and evaluated our partial abort scheme within the TinySTM open-source library. We analyze the experimental results we achieved with common STM benchmark applications, focusing on the advantages and disadvantages of the proposed solutions for implementing the different components of our scheme. Hence, we highlight the appropriate choices and possible solutions to further improve partial abort schemes.
••
15 Apr 2023
TL;DR: In this talk, the author investigates the challenges that arise when leveraging existing hardware TM (HTM) systems in conjunction with another recent disruptive hardware technology, namely Non-Volatile Memory (NVM) such as Intel Optane DC, which offers much higher density than DRAM while attaining competitive performance and preserving DRAM's byte addressability.
Abstract: Transactional memory (TM) has emerged as a powerful paradigm to simplify concurrent programming. Nowadays, hardware-based TM (HTM) implementations are available in several mainstream CPUs (e.g., by ARM, Intel and IBM). Due to their hardware nature, HTM implementations spare the cost of software instrumentation and can efficiently detect conflicts by extending existing cache-coherency protocols. However, their cache-centric approach also imposes a number of limitations that impact how effectively such systems can be used in practice. This talk investigates the challenges that arise when leveraging existing HTM systems in conjunction with another recent disruptive hardware technology, namely Non-Volatile Memory (NVM). NVM, such as Intel Optane DC, provides much higher density than existing DRAM, while attaining competitive performance and preserving DRAM's byte addressability. However, the cache-centric approach adopted by existing HTM implementations raises a crucial problem when these are used in conjunction with NVM: since CPU caches are volatile, existing HTM implementations fail to guarantee that data updated by committed transactions are atomically persisted to NVM. I will overview how this problem has so far been tackled in the literature, with a focus on solutions that do not assume ad-hoc hardware mechanisms not provided by current HTM implementations, but rather rely on hardware-software co-design techniques to ensure consistency on unmodified existing HTM systems.
I will conclude by presenting ongoing research directions that depart from state-of-the-art approaches in a twofold way: (i) they assume the availability of durable caches, i.e., systems equipped with additional power sources that ensure that cache contents can be safely persisted to NVM upon crashes; (ii) they assume a weaker isolation level at the TM level, namely Snapshot Isolation, which, despite being more relaxed than the reference consistency model for TM systems (e.g., opacity), can still ensure correct execution of a wide range of applications while enabling new optimizations to boost the efficiency of HTM applications operating on NVM.
••
TL;DR: In this paper, a scenario-aware conflict management strategy called LosaTM is proposed to resolve most false conflicts at a half-cache-line granularity by leveraging the proposed feature of multiple-grained coherency maintenance in the coherence protocol.
Abstract: The vigorous development of highly compute-intensive applications has led to the demand for maximizing the concurrency of multicore processors. Best-effort hardware transactional memory (HTM) is an important technology adopted by vendors to improve the potential concurrency of multicore processors, but the HTM implementations in commercial products have some drawbacks due to their simplicity and need further optimization to enable more exploitation of concurrency. In this article, we propose and evaluate a novel design of HTM, called LosaTM, which provides a scenario-aware conflict management strategy. By leveraging the proposed feature of multiple-grained coherency maintenance in the coherence protocol, LosaTM resolves most false conflicts at a half-cache-line granularity. Furthermore, we design a winner/aborter vector conflict management algorithm to improve the efficiency of LosaTM in handling the friendly-fire and unfairness-competition scenarios that we newly define. In order to coordinate these integrated conflict management strategies, a scheduling strategy is also proposed to adaptively select the appropriate management policy according to the specific conflict scenario. We use gem5 to simulate LosaTM in detail on an 8-core tiled CMP system; the simulation results show that it incurs a hardware overhead of only 0.7% of the L1 cache size while achieving a 38% average reduction in execution time on the native STAMP benchmarks. The speedup also demonstrates that LosaTM outperforms state-of-the-art designs from previous works.
••
10 Jun 2014
TL;DR: The ability of TLS hardware to allow programmers to parallelize code almost arbitrarily and then performance-tune afterwards, based on feedback supplied by the TLS system, provided significant improvements to programmer productivity and made parallel programming much less error-prone.
Abstract: Our 1999 paper described how to use hardware with thread-level speculation (TLS) support to effectively parallelize a number of serial application benchmarks with minimal programmer intervention. The ability of TLS hardware to allow programmers to parallelize code almost arbitrarily and then performance-tune afterwards, based on feedback supplied by the TLS system, provided significant improvements to programmer productivity and made parallel programming much less error-prone. Since this paper appeared, we have investigated other hardware variations that could provide similar benefits in terms of programmer productivity, such as ones based on an extension of transactional memory. Unfortunately, these concepts have not been implemented on any real systems. As a result, there is still an opportunity to implement schemes like the ones described in this paper in order to dramatically ease parallel programming in future systems.