Topic: Transactional memory
About: Transactional memory is a research topic. Over its lifetime, 2,365 publications on this topic have been published, receiving 60,818 citations.
Papers published on a yearly basis
Papers
14 Feb 2009. TL;DR: Presents the programming model, design, and implementation of NePalTM, a transactional memory system in which atomic blocks can be used for concurrency control at an arbitrary level of nested parallelism.
Abstract: We present the programming model, design, and implementation of NePalTM, a transactional memory system in which atomic blocks can be used for concurrency control at an arbitrary level of nested parallelism.
20 citations
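The idea of an atomic block whose body itself contains parallel work can be sketched minimally as follows. This assumes nothing about NePalTM's actual API: a single global lock stands in for the transactional machinery, and `atomic_block` and `block_body` are illustrative names.

```python
import threading

# Hypothetical sketch, not the NePalTM API: an "atomic block" that is
# isolated from other top-level blocks while its body spawns nested
# parallel workers. A global lock emulates block-level isolation; a real
# nested-parallel TM would instead track nested read/write sets.
_block_lock = threading.Lock()
counter = 0

def atomic_block(body):
    with _block_lock:          # whole block appears atomic to other blocks
        body()

def block_body():
    global counter
    partial = [0, 0]
    def worker(i):             # nested parallelism *inside* the atomic block
        partial[i] = 50
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
    for t in threads: t.start()
    for t in threads: t.join()
    counter += sum(partial)    # safe: the enclosing block holds the lock

outer = [threading.Thread(target=atomic_block, args=(block_body,))
         for _ in range(4)]
for t in outer: t.start()
for t in outer: t.join()
print(counter)                 # 4 blocks x (50 + 50) = 400
```

The nested workers write disjoint slots, so the only shared update (`counter`) happens under the enclosing block's isolation, which is the property nested-parallel atomic blocks are meant to preserve.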
17 Sep 2005. TL;DR: Addresses the motivation for multi-core architectures, their unique characteristics, and potential solutions to the fundamental software challenges, including architectural enhancements for transactional memory, fine-grain message passing, and speculative multi-threading.
Abstract: Summary form only given. It is likely that 2005 will be viewed as the year that parallelism came to the masses, with multiple vendors shipping dual/multi-core platforms into the mainstream consumer and enterprise markets. Assuming that this trend will follow Moore's Law scaling, mainstream systems will contain over 10 processing cores by the end of the decade, yielding unprecedented theoretical peak performance. However, it is unclear whether the software community is sufficiently ready for this transition and will be able to unleash these capabilities due to the significant challenges associated with parallel programming. This keynote addresses the motivation for multi-core architectures, their unique characteristics, and potential solutions to the fundamental software challenges, including architectural enhancements for transactional memory, fine-grain message passing, and speculative multi-threading. Finally, we stress the need for a concerted, accelerated effort, starting at the academic-level and encompassing the entire platform software ecosystem, to successfully make the multi-core architectural transition.
20 citations
23 Mar 2006. TL;DR: Describes a software transactional memory system that utilizes decomposed software transactional memory instructions as well as runtime optimizations to achieve efficient performance, such as code movement around procedure calls, addition of operations to provide strong atomicity, removal of unnecessary read-to-update upgrades, and removal of operations for newly allocated objects.
Abstract: A software transactional memory system is described which utilizes decomposed software transactional memory instructions as well as runtime optimizations to achieve efficient performance. The decomposed instructions allow a compiler with knowledge of the instruction semantics to perform optimizations that would be unavailable on traditional software transactional memory systems. Additionally, high-level software transactional memory optimizations are performed, such as code movement around procedure calls, addition of operations to provide strong atomicity, removal of unnecessary read-to-update upgrades, and removal of operations for newly allocated objects. During execution, multi-use header words for objects are extended to provide per-object housekeeping as well as fast snapshots that capture changes to objects. Additionally, entries to software transactional memory logs are filtered using an associative table during execution, preventing needless writes to the logs. Finally, a garbage collector with knowledge of the software transactional memory system compacts software transactional memory logs during garbage collection.
20 citations
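One of the optimizations above, filtering STM log entries through an associative table so each location is logged at most once per transaction, can be sketched as follows. The `Txn` class and its method names are hypothetical illustrations, not the system's actual interface.

```python
# Hypothetical sketch of log filtering in an undo-log STM: an associative
# table (here a set) records which addresses have already been logged, so
# repeated writes to the same location add no redundant log entries.
class Txn:
    def __init__(self, heap):
        self.heap = heap
        self.undo_log = []        # (addr, old_value) entries for rollback
        self.logged = set()       # associative filter: addrs already logged

    def open_for_update(self, addr):
        if addr not in self.logged:   # skip needless duplicate log writes
            self.logged.add(addr)
            self.undo_log.append((addr, self.heap[addr]))

    def write(self, addr, value):
        self.open_for_update(addr)
        self.heap[addr] = value       # in-place update; undone on abort

    def abort(self):
        for addr, old in reversed(self.undo_log):
            self.heap[addr] = old

heap = {"x": 1, "y": 2}
t = Txn(heap)
t.write("x", 10)
t.write("x", 20)          # second write to x adds no new log entry
t.write("y", 30)
print(len(t.undo_log))    # 2 entries, not 3
t.abort()
print(heap)               # {'x': 1, 'y': 2}
```

The filter trades a small lookup per write for fewer log writes, which is the payoff the abstract describes.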
30 Jun 2009. TL;DR: Finds that all delay-based CMs, which pause a transaction for some finite duration upon conflict, are unsuitable for the evaluated benchmarks with even moderate amounts of contention.
Abstract: In Transactional Memory (TM), contention management is the process of selecting which transaction should be aborted when a data access conflict arises. In this paper, the performance of published contention managers (CMs) is re-investigated using complex benchmarks recently published in the literature. Our results redefine the CM performance hierarchy. Greedy and Priority are found to give the best performance overall. Polka is still competitive, but by no means best performing as previously published, and in some cases degrades performance by orders of magnitude. In the worst example, execution of a benchmark completes in 6.5 seconds with Priority, yet fails to complete even after 20 minutes with Polka. Analysis of the benchmark found it aborted only 22% of all transactions, spread consistently over the duration of its execution. More generally, all delay-based CMs, which pause a transaction for some finite duration upon conflict, are found to be unsuitable for the evaluated benchmarks with even moderate amounts of contention. This has significant implications, given that TM is primarily aimed at easing concurrent programming for mainstream software development, where applications are unlikely to be highly optimised to reduce aborts.
20 citations
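The contrast the paper draws between the two CM families can be sketched roughly as follows. `greedy_resolve` and `delay_based_resolve` are illustrative stand-ins under simplified assumptions, not the published Greedy or Polka algorithms.

```python
# Rough sketch of two contention-management styles: a Greedy-style manager
# resolves a conflict immediately by timestamp priority, while a
# delay-based manager pauses the attacker before retrying. Under
# contention those pauses accumulate, which is the pathology measured.
import time

def greedy_resolve(attacker_ts, victim_ts):
    # Older transaction (smaller start timestamp) wins; the younger aborts.
    return "abort_victim" if attacker_ts < victim_ts else "abort_attacker"

def delay_based_resolve(attempt, base_delay=0.001, max_attempts=4):
    # Back off for an exponentially growing interval, aborting the victim
    # only after several failed retries.
    if attempt < max_attempts:
        time.sleep(base_delay * (2 ** attempt))
        return "retry"
    return "abort_victim"

print(greedy_resolve(attacker_ts=5, victim_ts=9))   # abort_victim
print(delay_based_resolve(attempt=0))               # retry
print(delay_based_resolve(attempt=4))               # abort_victim
```

The immediate, priority-based decision avoids the stacked sleep intervals that make delay-based managers degrade under even moderate contention.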
TL;DR: The proposed clfB-tree, a B-tree structure whose tree node fits in a single cache line, achieves atomicity and consistency via in-place update, which requires at most four cache line flushes.
Abstract: Emerging byte-addressable non-volatile memory (NVRAM) is expected to replace block-device storage as an alternative low-latency persistent storage device. If NVRAM is used as a persistent storage device, a cache line, rather than a disk page, becomes the unit of data transfer, consistency, and durability. In this work, we design and develop clfB-tree, a B-tree structure whose tree node fits in a single cache line. We employ the existing write-combining store buffer and restricted transactional memory to provide a failure-atomic cache-line write operation. Using these failure-atomic cache-line writes, we atomically update a clfB-tree node via a single cache-line flush instruction without major changes in hardware. However, many processors do not provide a software interface to transactional memory. For those processors, the proposed clfB-tree achieves atomicity and consistency via in-place update, which requires at most four cache-line flushes. We evaluate the performance of clfB-tree on an NVRAM emulation board with an ARM Cortex A-9 processor and on a workstation with an Intel Xeon E7-4809 v3 processor. Our experimental results show that clfB-tree outperforms wB-tree and CDDS B-tree by a large margin in terms of both insertion and search performance.
20 citations
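The failure-atomic cache-line update described above can be illustrated with a toy simulation. The node layout, the `clflush` stand-in, and the function names are assumptions for illustration, not the clfB-tree implementation.

```python
# Illustrative sketch: a tree node constrained to one 64-byte "cache
# line", rebuilt off-line and published with a single simulated
# cache-line flush, so a crash observes either the old node or the new
# one, never a torn mix of the two.
CACHE_LINE = 64

flush_count = 0
def clflush(buf):              # stand-in for a cache-line flush instruction
    global flush_count
    flush_count += 1

def pack_node(keys):
    data = b"".join(k.to_bytes(8, "little") for k in keys)
    assert len(data) <= CACHE_LINE, "node must fit in one cache line"
    return data.ljust(CACHE_LINE, b"\x00")

persistent_node = pack_node([10, 20, 30])

def atomic_insert(keys, new_key):
    global persistent_node
    shadow = pack_node(sorted(keys + [new_key]))  # build the new node copy
    persistent_node = shadow                      # single cache-line store
    clflush(persistent_node)                      # one flush publishes it

atomic_insert([10, 20, 30], 25)
print(flush_count)             # 1 flush, within the paper's bound of four
```

Because the whole node fits in one cache line, a single flush suffices here; the in-place fallback the abstract mentions needs up to four flushes when hardware transactional support is absent.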