Showing papers on "Software transactional memory" published in 2020

Journal Article•DOI•

Adding concurrency to smart contracts

Thomas D. Dickerson¹, Paul Gazzillo², Maurice Herlihy¹, Eric Koskinen³•Institutions (3)

Brown University¹, Stevens Institute of Technology², University of Central Florida³

01 Jun 2020-Distributed Computing

TL;DR: This paper presents a novel way to permit miners and validators to execute smart contracts in parallel, based on techniques adapted from software transactional memory, and proves that the validator's execution is equivalent to the miner's execution.

Abstract: Modern cryptocurrency systems, such as the Ethereum project, permit complex financial transactions through scripts called smart contracts. These smart contracts are executed many, many times, always without real concurrency. First, all smart contracts are serially executed by miners before appending them to the blockchain. Later, those contracts are serially re-executed by validators to verify that the smart contracts were executed correctly by miners. Serial execution limits system throughput and fails to exploit today’s concurrent multicore and cluster architectures. Nevertheless, serial execution appears to be required: contracts share state, and contract programming languages have a serial semantics. This paper presents a novel way to permit miners and validators to execute smart contracts in parallel, based on techniques adapted from software transactional memory. Miners execute smart contracts speculatively in parallel, allowing non-conflicting contracts to proceed concurrently, and “discovering” a serializable concurrent schedule for a block’s transactions. This schedule is captured and encoded as a deterministic fork-join program used by validators to re-execute the miner’s parallel schedule deterministically but concurrently. We have proved that the validator’s execution is equivalent to the miner’s execution. Smart contract benchmarks run on a JVM with ScalaSTM show that a speedup of 1.39× can be obtained for miners and 1.59× for validators with just three concurrent threads.
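
One way to picture the schedule passed from miner to validator is as a sequence of fork-join phases: transactions in the same phase touch disjoint state and may run in parallel, while conflicting transactions are ordered across phases. The Python sketch below is a simplified illustration, not the paper's algorithm. It assumes read and write sets are known up front (the paper discovers them speculatively), and the names `ContractTx` and `fork_join_phases` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContractTx:
    """A contract call plus the storage keys it reads and writes (assumed known)."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def conflicts(a, b):
    """Two calls conflict if either writes a key the other reads or writes."""
    return bool(a.writes & (b.reads | b.writes) or b.writes & a.reads)


def fork_join_phases(block):
    """Put each call in the earliest phase after every earlier conflicting call."""
    phase_of = {}
    phases = []
    for i, tx in enumerate(block):
        level = 0
        for j in range(i):
            if conflicts(block[j], tx):
                level = max(level, phase_of[j] + 1)
        phase_of[i] = level
        if level == len(phases):
            phases.append([])
        phases[level].append(tx.name)
    return phases


block = [
    ContractTx("a", writes=frozenset({"x"})),
    ContractTx("b", writes=frozenset({"y"})),
    ContractTx("c", reads=frozenset({"x"}), writes=frozenset({"z"})),
    ContractTx("d", reads=frozenset({"y", "z"})),
]
print(fork_join_phases(block))
```

A validator could run each inner list concurrently and join before starting the next one, which keeps every conflicting pair in its original block order.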

67 citations

Proceedings Article•DOI•

Persistent memory and the rise of universal constructions

Andreia Correia¹, Pascal Felber¹, Pedro Ramalhete²•Institutions (2)

University of Neuchâtel¹, Cisco Systems, Inc.²

15 Apr 2020

TL;DR: CX-PUC is presented, the first bounded wait-free persistent universal construction requiring no annotation of the underlying sequential data structure, and Redo-PTM is proposed, a new generic construction based on a finite number of replicas and Herlihy's wait-free consensus, which uses physical instead of logical logging.

Abstract: Non-Volatile Main Memory (NVMM) has brought forth the need for data structures that are not only concurrent but also resilient to non-corrupting failures. Until now, persistent transactional memory libraries (PTMs) have focused on providing correct recovery from non-corrupting failures without memory leaks. Most PTMs that provide concurrent access do so with blocking progress. The main focus of this paper is to design practical PTMs with wait-free progress based on universal constructions. We first present CX-PUC, the first bounded wait-free persistent universal construction requiring no annotation of the underlying sequential data structure. CX-PUC is an adaptation to persistence of CX, a recently proposed universal construction. We next introduce CX-PTM, a PTM that achieves better throughput and supports transactions over multiple data structure instances, at the price of requiring annotation of the loads and stores in the data structure, as is commonplace in software transactional memory. Finally, we propose a new generic construction, Redo-PTM, based on a finite number of replicas and Herlihy's wait-free consensus, which uses physical instead of logical logging. By exploiting its capability of providing wait-free ACID transactions, we have used Redo-PTM to implement the world's first persistent key-value store with bounded wait-free progress.

22 citations

Proceedings Article•DOI•

Nonblocking persistent software transactional memory

H. Alan Beadle¹, Wentao Cai¹, Haosen Wen¹, Michael L. Scott¹•Institutions (1)

University of Rochester¹

19 Feb 2020

TL;DR: QSTM, a persistent word-based software transactional memory (STM) system, is presented that is nonblocking and does not require either the modification of target data structures or the use of a wide CAS instruction.

Abstract: While developed largely for higher density and lower power, byte-addressable nonvolatile memory can also allow data to persist across program runs and system crashes without the need to flush to disk or flash. If data is to be recovered after a crash, however, care must be taken to ensure that the contents of memory are consistent at all times. This can be challenging in multithreaded applications with write-back caches. We present QSTM, a persistent word-based software transactional memory (STM) system to address this problem. Unlike past such systems, QSTM is nonblocking and does not require either the modification of target data structures or the use of a wide CAS instruction.

16 citations

Book Chapter•DOI•

Efficient Concurrent Execution of Smart Contracts in Blockchains Using Object-Based Transactional Memory

Parwat Singh Anjana¹, Hagit Attiya², Sweta Kumari², Sathya Peri¹, Archit Somani²•Institutions (2)

Indian Institute of Technology, Hyderabad¹, Technion – Israel Institute of Technology²

03 Jun 2020

TL;DR: In this article, the authors execute smart contract transactions (SCTs) with multiple threads to achieve better efficiency and higher throughput, using Multi-Version OSTMs, which maintain multiple versions of each shared data item, as opposed to Single-Version OSTMs (SVOSTMs).

Abstract: Several popular blockchains such as Ethereum execute complex transactions through user-defined scripts. A block of the chain typically consists of multiple smart contract transactions (SCTs). To append a block into the blockchain, a miner executes these SCTs. On receiving this block, other nodes act as validators, who re-execute these SCTs as part of the consensus protocol to validate the block. In Ethereum and other blockchains that support cryptocurrencies, a miner gets an incentive every time such a valid block is successfully added to the blockchain. When executing SCTs sequentially, miners and validators fail to harness the power of multiprocessing offered by the prevalence of multi-core processors, thus degrading throughput. By leveraging multiple threads to execute SCTs, we can achieve better efficiency and higher throughput. Recently, Read-Write Software Transactional Memory Systems (RWSTMs) were used for concurrent execution of SCTs. It is known that Object-based STMs (OSTMs), using higher-level objects (such as hash-tables or lists), achieve better throughput as compared to RWSTMs. Even greater concurrency can be obtained using Multi-Version OSTMs (MVOSTMs), which maintain multiple versions for each shared data item as opposed to Single-Version OSTMs (SVOSTMs).

10 citations

Journal Article•DOI•

Formal analysis and verification of the PSTM architecture using CSP

Ailun Liu¹, Huibiao Zhu¹, Miroslav Popovic², Shuangqing Xiang¹, Lei Zhang¹•Institutions (2)

East China Normal University¹, University of Novi Sad²

01 Jul 2020-Journal of Systems and Software

TL;DR: This paper applies the process algebra CSP to formally verify the PSTM architecture at a fine-grained level and concludes that the architecture communicates properly and guarantees atomicity, isolation, consistency and optimism.

7 citations

Book Chapter•DOI•

Defining and Verifying Durable Opacity: Correctness for Persistent Software Transactional Memory

Eleni Bila¹, Simon Doherty², Brijesh Dongol¹, John Derrick², Gerhard Schellhorn³, Heike Wehrheim⁴•Institutions (4)

University of Surrey¹, University of Sheffield², University of Augsburg³, University of Paderborn⁴

15 Jun 2020

TL;DR: In this article, the authors observe that non-volatile memory, which preserves its contents even after power loss, has led to the design of a number of persistent concurrent data structures built to satisfy the notion of durable linearizability.

Abstract: Non-volatile memory (NVM), aka persistent memory, is a new paradigm for memory that preserves its contents even after power loss. The expected ubiquity of NVM has stimulated interest in the design of novel concepts ensuring correctness of concurrent programming abstractions in the face of persistency. So far, this has led to the design of a number of persistent concurrent data structures, built to satisfy an associated notion of correctness: durable linearizability.

6 citations

Proceedings Article•DOI•

Thread Affinity in Software Transactional Memory

Douglas Pereira Pasqualin¹, Matthias Diener², André Rauber Du Bois¹, Maurício L. Pilla¹•Institutions (2)

Universidade Federal de Pelotas¹, University of Illinois at Urbana–Champaign²

01 Jul 2020

TL;DR: A method to detect sharing behavior directly inside the STM library by tracking and analyzing how threads perform STM operations is introduced, which is then used to perform an optimized mapping of the application's threads to cores in order to improve the efficiency of STM operations.

Abstract: Software Transactional Memory (STM) is an abstraction to synchronize accesses to shared resources. It simplifies parallel programming by replacing the use of explicit locks and synchronization mechanisms with atomic blocks. A well-known approach to improve performance of STM applications is to serialize transactions to avoid conflicts using schedulers and mapping algorithms. However, in current architectures with complex memory hierarchies it is also important to consider where the memory of the program is allocated and how it is accessed. An important technique for improving memory locality is to map threads and data of an application based on their memory access behavior. This technique is called sharing-aware mapping. In this paper, we introduce a method to detect sharing behavior directly inside the STM library by tracking and analyzing how threads perform STM operations. This information is then used to perform an optimized mapping of the application's threads to cores in order to improve the efficiency of STM operations. Experimental results with the STAMP benchmarks show performance gains of up to 9.7x (1.4x on average), and a reduction of the number of aborts of up to 8.5x, compared to the Linux scheduler.

5 citations

Journal Article•DOI•

Convoider: A Concurrency Bug Avoider Based on Transparent Software Transactional Memory

Zhen Yu, Yu Zuo, Yong Zhao

01 Feb 2020-International Journal of Parallel Programming

TL;DR: Experimental results show that Convoider succeeds in transparently transactionalizing twelve real-world applications and perfectly avoids 94% of the concurrency bugs used in the authors' experiments.

Abstract: Software transactional memory is an effective mechanism to avoid concurrency bugs in multi-threaded programs. However, two problems hinder the adoption of such traditional systems in the wild: the high human cost of equipping programs with transaction functionality, and low compatibility with I/O calls and condition variables. This paper presents Convoider, which tries to solve these problems. By intercepting inter-thread operations and designating code among them as transactions in each thread, Convoider automatically transactionalizes target programs without any source code modification or recompilation. By saving/restoring stack frames and CPU registers on beginning/aborting a transaction, Convoider makes execution flow revocable. By turning threads into processes, leveraging virtual memory protection and customizing memory allocation/deallocation, Convoider makes memory manipulations revocable. By maintaining virtual file systems and redirecting I/O operations onto them, Convoider makes I/O effects revocable. By converting lock/unlock operations to no-ops, customizing signal/wait operations on condition variables and committing memory changes transactionally, Convoider makes deadlocks, data races and atomicity violations impossible. Experimental results show that Convoider succeeds in transparently transactionalizing twelve real-world applications and perfectly avoids 94% of the thirty-one concurrency bugs used in our experiments. This study can help efficiently transactionalize legacy multi-threaded applications and effectively improve their runtime reliability.

5 citations

Proceedings Article•DOI•

Online Sharing-Aware Thread Mapping in Software Transactional Memory

Douglas Pereira Pasqualin¹, Matthias Diener², André Rauber Du Bois¹, Maurício L. Pilla¹•Institutions (2)

Universidade Federal de Pelotas¹, University of Illinois at Urbana–Champaign²

01 Sep 2020

TL;DR: STMap is introduced, an online, low overhead mechanism to detect the sharing behavior and perform the mapping directly inside the STM library, by tracking and analyzing how threads perform STM operations.

Abstract: Software Transactional Memory (STM) is an alternative abstraction to synchronize processes in parallel programming. One advantage is simplicity, since it is possible to replace the use of explicit locks with atomic blocks. Regarding STM performance, many studies have already focused on reducing the number of aborts. However, in current multicore architectures with complex memory hierarchies, it is also important to consider where the memory of a program is allocated and how it is accessed. This paper proposes the use of a technique called sharing-aware mapping, which maps the threads of an application to cores based on their memory access behavior, to achieve better performance in STM systems. We introduce STMap, an online, low-overhead mechanism to detect the sharing behavior and perform the mapping directly inside the STM library, by tracking and analyzing how threads perform STM operations. In experiments with the STAMP benchmark suite and synthetic benchmarks, STMap shows performance gains of up to 77% on a Xeon system (17.5% on average) and 85% on an Opteron system (9.1% on average), compared to the Linux scheduler.
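
The idea of sharing-aware mapping can be illustrated with a small Python sketch: count how often two threads touch the same address inside STM operations, then place the threads that share most on neighboring cores. This is a generic illustration under stated assumptions, not STMap's actual detection or mapping algorithm, which the abstract does not spell out. `SharingTracker` and `pair_threads` are hypothetical names, and the greedy pairing is just one simple policy.

```python
from collections import defaultdict


class SharingTracker:
    """Counts how often pairs of threads touch the same address (hypothetical helper)."""

    def __init__(self):
        self.last_thread = {}             # address -> thread that last accessed it
        self.shared = defaultdict(int)    # (low_tid, high_tid) -> sharing events

    def record(self, tid, addr):
        """Intended to be called on every transactional read or write."""
        prev = self.last_thread.get(addr)
        if prev is not None and prev != tid:
            self.shared[(min(prev, tid), max(prev, tid))] += 1
        self.last_thread[addr] = tid


def pair_threads(shared, n_threads):
    """Greedily pair the threads that share most; leftovers stay unpaired."""
    free = set(range(n_threads))
    pairs = []
    for (a, b), _count in sorted(shared.items(), key=lambda kv: -kv[1]):
        if a in free and b in free:
            pairs.append((a, b))
            free -= {a, b}
    return pairs, sorted(free)


tracker = SharingTracker()
for tid, addr in [(0, 100), (2, 100), (0, 100), (2, 100), (1, 200), (3, 200)]:
    tracker.record(tid, addr)
print(pair_threads(tracker.shared, 4))
```

Each resulting pair could then be pinned to cores that share a cache, for example with an OS affinity call; that step is platform-specific and left out here.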

5 citations

Proceedings Article•DOI•

Lock-free transactional vector

Kenneth Lamar¹, Christina Peterson¹, Damian Dechev¹•Institutions (1)

University of Central Florida¹

22 Feb 2020

TL;DR: This work presents the first lock-free transactional vector, which pre-processes transactions to reduce shared memory access and simplify access logic, and generally offers better scalability than STM and STO, and competitive performance with Transactional Boosting, but with additional lock-free guarantees.

Abstract: The vector is a fundamental data structure, offering constant-time traversal to elements and a dynamically resizable range of indices. While several concurrent vectors exist, a composition of concurrent vector operations dependent on each other can lead to undefined behavior. Techniques for providing transactional capabilities for data structure operations include Software Transactional Memory (STM) and transactional transformation methodologies. Transactional transformations convert concurrent data structures into their transactional equivalents at an operation level, rather than STM's object or memory level. To the best of our knowledge, existing STMs do not support dynamic read/write sets in a lock-free manner, and transactional transformation methodologies are unsuitable for the vector's contiguous memory layout. In this work, we present the first lock-free transactional vector. It integrates the fast lock-free resizing and instant logical status changes from related works. Our approach pre-processes transactions to reduce shared memory access and simplify access logic. This can be done without locking elements or verifying conflicts between transactions. We compare our design against state-of-the-art transactional designs, GCC STM, Transactional Boosting, and STO. All data structures are tested on four different platforms, including x86_64 and ARM architectures. We find that our lock-free transactional vector generally offers better scalability than STM and STO, and competitive performance with Transactional Boosting, but with additional lock-free guarantees. In scenarios with only reads and writes, our vector is as much as 47% faster than Transactional Boosting.

4 citations

Journal Article•DOI•

Adaptive Model-Based Scheduling in Software Transactional Memory

Pierangelo Di Sanzo¹, Alessandro Pellegrini¹, Marco Sannicandro¹, Bruno Ciciani¹, Francesco Quaglia²•Institutions (2)

Sapienza University of Rome¹, University of Rome Tor Vergata²

01 May 2020-IEEE Transactions on Computers

TL;DR: This article presents an adaptive model-based transaction scheduling technique relying on a Markov Chain-based performance model of STM systems, and presents a scheduler that implements the adaptive technique, integrated within the open source TinySTM package.

Abstract: Software Transactional Memory (STM) stands as a powerful concurrent programming paradigm, enabling atomicity and isolation while accessing shared data. On the downside, STM may suffer from performance degradation due to excessive conflicts among concurrent transactions, which waste CPU cycles and energy because of transaction aborts. An approach to cope with this issue consists of putting in place smart scheduling strategies which temporarily suspend the execution of some transactions in order to reduce the transaction conflict rate. In this article, we present an adaptive model-based transaction scheduling technique relying on a Markov Chain-based performance model of STM systems. Our scheduling technique is adaptive in a twofold sense: (i) It controls the execution of transactions depending on throughput predictions by the model as a function of the current system state. (ii) It re-tunes the Markov Chain-based model on-line to adapt it, and the resulting transaction scheduling decisions, to dynamic variations of the workload. We have been able to achieve the latter target thanks to the fact that our performance model is extremely lightweight. In fact, to be recomputed, it requires a reduced set of input parameters, whose values can be estimated via a few on-line samples related to the current workload dynamics. We also present a scheduler that implements our adaptive technique, which we integrated within the open source TinySTM package. Further, we report the results of an experimental study based on the STAMP benchmark suite, which has been aimed at assessing both the accuracy of our performance model in predicting the actual system throughput and the advantages of the adaptive scheduling policy over literature techniques.

Proceedings Article•DOI•

TardisTM: incremental repair for transactional memory

Daming D. Chen¹, Phillip B. Gibbons¹, Todd C. Mowry¹•Institutions (1)

Carnegie Mellon University¹

22 Feb 2020

TL;DR: This paper designs a mechanism for localizing conflicts back to transactional program points, defines the semantics for optional repair handler annotations, and extends the conflict detection algorithm to ensure all repairs are completed.

Abstract: Transactional memory (TM) provides developers with a transaction primitive for concurrent code execution that transparently checks for concurrency conflicts. When such a conflict is detected, the system recovers by aborting and restarting the transaction. Although correct, this behavior wastes work and inhibits forward progress. In this paper, we present TardisTM, a software TM system that supports repairing concurrency conflicts while preserving unaffected computation. Our key insight is that existing conflict detection mechanisms can be extended to perform incremental transaction repair, when augmented with additional runtime information. To do so, we design a mechanism for localizing conflicts back to transactional program points, define the semantics for optional repair handler annotations, and extend the conflict detection algorithm to ensure all repairs are completed. To evaluate our system, we characterize the benefit of repair on a set of benchmark programs; we measure up to 2.95x speedup over mutual exclusion, and 93% abort reduction over a baseline software TM system that does not support repair.

Posted Content•

Using Nesting to Push the Limits of Transactional Data Structure Libraries.

Gal Assa¹, Hagar Meir², Guy Golan-Gueta, Idit Keidar¹, Alexander Spiegelman•Institutions (2)

Technion – Israel Institute of Technology¹, IBM²

02 Jan 2020-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This paper builds a Java TDSL with built-in support for nesting in a number of data structures, and shows that the library outperforms TL2 twofold without nesting, and by up to 16x when nesting is used.

Abstract: Transactional data structure libraries (TDSL) combine the ease-of-programming of transactions with the high performance and scalability of custom-tailored concurrent data structures. They can be very efficient thanks to their ability to exploit data structure semantics in order to reduce overhead, aborts, and wasted work compared to general-purpose software transactional memory. However, TDSLs were not previously used for complex use-cases involving long transactions and a variety of data structures. In this paper, we boost the performance and usability of a TDSL, towards allowing it to support complex applications. A key idea is nesting. Nested transactions create checkpoints within a longer transaction, so as to limit the scope of abort, without changing the semantics of the original transaction. We build a Java TDSL with built-in support for nested transactions over a number of data structures. We conduct a case study of a complex network intrusion detection system that invests a significant amount of work to process each packet. Our study shows that our library outperforms publicly available STMs twofold without nesting, and by up to 16x when nesting is used.

Proceedings Article•DOI•

Memory Tagging: Minimalist Synchronization for Scalable Concurrent Data Structures

Dan Alistarh¹, Trevor Brown², Nandini Singhal³•Institutions (3)

Institute of Science and Technology Austria¹, University of Waterloo², Microsoft³

06 Jul 2020

TL;DR: This paper introduces memory tagging, a simple hardware mechanism which enables the programmer to "tag" a dynamic set of memory locations, at cache-line granularity, and later validate whether the memory has been concurrently modified, with the possibility of updating one of the underlying locations atomically if validation succeeds.

Abstract: There has been a significant amount of research on hardware and software support for efficient concurrent data structures; yet, the question of how to build correct, simple, and scalable data structures has not yet been definitively settled. In this paper, we revisit this question from a minimalist perspective, and ask: what is the smallest amount of synchronization required for correct and efficient concurrent search data structures, and how could this minimal synchronization support be provided in hardware? To address these questions, we introduce memory tagging, a simple hardware mechanism which enables the programmer to "tag" a dynamic set of memory locations, at cache-line granularity, and later validate whether the memory has been concurrently modified, with the possibility of updating one of the underlying locations atomically if validation succeeds. We provide several examples showing that this mechanism can enable fast and arguably simple concurrent data structure designs, such as lists, binary search trees, balanced search trees, range queries, and Software Transactional Memory (STM) implementations. We provide an implementation of memory tags in the Graphite multi-core simulator, showing that the mechanism can be implemented entirely at the level of L1 cache, and that it can enable non-trivial speedups versus existing implementations of the above data structures.

Proceedings Article•DOI•

Nesting and composition in transactional data structure libraries

Gal Assa¹, Hagar Meir², Guy Golan-Gueta³, Idit Keidar¹, Alexander Spiegelman³•Institutions (3)

Technion – Israel Institute of Technology¹, IBM², VMware³

19 Feb 2020

TL;DR: This work builds a Java TDSL with built-in support for nesting in a number of data structures, and shows that the library outperforms TL2 twofold without nesting, and by up to 16x when nesting is used.

Abstract: Transactional data structure libraries (TDSL) combine the ease-of-programming of transactions with the high performance and scalability of custom-tailored concurrent data structures. They can be very efficient thanks to their ability to exploit data structure semantics in order to reduce overhead, aborts, and wasted work compared to general-purpose software transactional memory. However, TDSLs were not previously used for complex use-cases involving long transactions and a variety of data structures. In this work, we boost the performance and usability of a TDSL, allowing it to support complex applications. A key idea is nesting. Nested transactions create checkpoints within a longer transaction, so as to limit the scope of abort, without changing the semantics of the original transaction. We build a Java TDSL with built-in support for nesting in a number of data structures. We conduct a case study of a complex network intrusion detection system that invests a significant amount of work to process each packet. Our study shows that our library outperforms TL2 twofold without nesting, and by up to 16x when nesting is used. Finally, we discuss cross-library nesting, namely dynamic composition of transactions from multiple libraries.
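
The checkpoint effect of nesting described above can be sketched in a few lines of Python: a child transaction buffers its own writes, a child abort discards only that buffer, and a child commit merges it into the parent. This is a toy model under stated assumptions, not the paper's Java TDSL. `NestedTx` and `Abort` are hypothetical names, and conflict detection is omitted entirely, so aborts are raised by hand.

```python
class Abort(Exception):
    """Signals that the current (sub)transaction must be rolled back."""


class NestedTx:
    def __init__(self, store, parent=None):
        self.store = store      # committed shared state (a plain dict here)
        self.parent = parent
        self.writes = {}        # buffered writes, invisible until commit

    def read(self, key):
        tx = self
        while tx is not None:   # the innermost buffered write wins
            if key in tx.writes:
                return tx.writes[key]
            tx = tx.parent
        return self.store.get(key)

    def write(self, key, value):
        self.writes[key] = value

    def nested(self, body, retries=3):
        """Run body in a child; a child abort discards only the child's writes."""
        for _ in range(retries):
            child = NestedTx(self.store, parent=self)
            try:
                result = body(child)
            except Abort:
                continue        # the parent's earlier work is kept
            self.writes.update(child.writes)   # child commit merges into parent
            return result
        raise Abort("nested transaction kept aborting")

    def commit(self):
        self.store.update(self.writes)


store = {}
attempts = []

def body(child):
    attempts.append(1)
    child.write("b", len(attempts))
    if len(attempts) < 2:
        raise Abort()           # simulate a conflict on the first try
    return child.read("a")

tx = NestedTx(store)
tx.write("a", 1)
seen = tx.nested(body)
tx.commit()
print(seen, store)
```

Only the nested part runs twice in this example; the write to `"a"` made before the checkpoint is never redone, which is the saving that nesting aims for in long transactions.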

Book Chapter•DOI•

Towards a formal account for software transactional memory

Doriana Medić¹, Claudio Antares Mezzina², Iain Phillips³, Nobuko Yoshida³•Institutions (3)

University of Bologna¹, University of Urbino², Imperial College London³

09 Jul 2020

TL;DR: A formal framework for describing STM is defined and it is shown how with a minor variation of the rules it is possible to model two common policies for STM: reader preference and writer preference.

Abstract: Software transactional memory (STM) is a concurrency control mechanism for shared memory systems. In contrast to lock-based mechanisms, it allows multiple processes to access the same set of variables concurrently. Then, according to the policy in use, the effects of accesses to shared variables can be committed (hence, made permanent) or undone. In this paper, we define a formal framework for describing STMs and show how, with a minor variation of the rules, it is possible to model two common policies for STM: reader preference and writer preference.

Posted Content•

Modularising Verification Of Durable Opacity.

Eleni Bila¹, John Derrick², Simon Doherty², Brijesh Dongol¹, Gerhard Schellhorn³, Heike Wehrheim•Institutions (3)

University of Surrey¹, University of Sheffield², University of Augsburg³

30 Nov 2020-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: A durably opaque version of NOrec (no ownership records), an existing STM algorithm proven to be opaque, is developed and a verification technique based on refinement is designed, separating the proof of durability of memory accesses from the proof of opacity.

Abstract: Non-volatile memory (NVM), also known as persistent memory, is an emerging paradigm for memory that preserves its contents even after power loss. NVM is widely expected to become ubiquitous, and hardware architectures are already providing support for NVM programming. This has stimulated interest in the design of novel concepts ensuring correctness of concurrent programming abstractions in the face of persistency and in the development of associated verification approaches. Software transactional memory (STM) is a key programming abstraction that supports concurrent access to shared state. In a fashion similar to linearizability as the correctness condition for concurrent data structures, there is an established notion of correctness for STMs known as opacity. We have recently proposed durable opacity as the natural extension of opacity to a setting with non-volatile memory. Together with this novel correctness condition, we designed a verification technique based on refinement. In this paper, we extend this work in two directions. First, we develop a durably opaque version of NOrec (no ownership records), an existing STM algorithm proven to be opaque. Second, we modularize our existing verification approach by separating the proof of durability of memory accesses from the proof of opacity. For NOrec, this allows us to re-use an existing opacity proof and complement it with a proof of the durability of accesses to shared state.

Journal Article•DOI•

Thread-Level Locking for SIMT Architectures

Lan Gao¹, Yunlong Xu², Rui Wang³, Zhongzhi Luan³, Zhibin Yu⁴, Depei Qian³•Institutions (4)

Capital Normal University¹, Xi'an Jiaotong University², Beihang University³, Chinese Academy of Sciences⁴

01 May 2020-IEEE Transactions on Parallel and Distributed Systems

TL;DR: This article proposes a software-based thread-level synchronization mechanism called lock stealing for GPUs to avoid live-locks, and describes how to implement the lock stealing algorithm in mutual exclusion locks and readers-writer locks with high performance.

Abstract: As more emerging applications are moving to GPUs, thread-level synchronization has become a requirement. However, GPUs only provide warp-level and thread-block-level rather than thread-level synchronization. Moreover, using CPU synchronization mechanisms to implement thread-level synchronization for GPUs is highly likely to cause live-locks. In this article, we first propose a software-based thread-level synchronization mechanism called lock stealing for GPUs to avoid live-locks. We then describe how to implement our lock stealing algorithm in mutual exclusion locks and readers-writer locks with high performance. Finally, by putting it all together, we develop a thread-level locking library (TLLL) for commercial GPUs. To evaluate TLLL and show its general applicability, we use it to implement six widely used programs. We compare TLLL against the state-of-the-art ad-hoc GPU synchronization, GPU software transactional memory (STM), and CPU hardware transactional memory (HTM), respectively. The results show that, compared with the ad-hoc GPU synchronization for Delaunay mesh refinement (DMR), TLLL improves the performance by 22 percent on average on a GTX970 GPU, and shows up to 11 percent of performance improvement on a Volta V100 GPU. Moreover, it significantly reduces the required memory size. Such low memory consumption enables DMR to successfully run on the GTX970 GPU with the 10-million mesh size, and the V100 GPU with the 40-million mesh size, with which the ad-hoc synchronization cannot run successfully. In addition, TLLL outperforms the GPU STM by 65 percent, and the CPU HTM (running on a Xeon E5-2620 v4 CPU with 16 hardware threads) by 43 percent on average.

Proceedings Article•DOI•

Brief Announcement: On Implementing Software Transactional Memory in the C++ Memory Model

Matthew Rodriguez¹, Michael Spear¹•Institutions (1)

Lehigh University¹

31 Jul 2020

TL;DR: This work discusses some consequences of the C++ memory model on STM, identifies an easy-to-fix implementation error, and describes an unavoidable formal race condition that occurs in an important class of STM algorithms.

Abstract: High-performance software transactional memory (STM) implementations rely on nuanced use of synchronization variables to coordinate speculative accesses to program data. We discuss some consequences of the C++ memory model on STM, identify an easy-to-fix implementation error, and describe an unavoidable formal race condition that occurs in an important class of STM algorithms.

Posted Content•

Defining and Verifying Durable Opacity: Correctness for Persistent Software Transactional Memory

Eleni Bila¹, Simon Doherty², Brijesh Dongol¹, John Derrick², Gerhard Schellhorn³, Heike Wehrheim⁴•Institutions (4)

University of Surrey¹, University of Sheffield², University of Augsburg³, University of Paderborn⁴

17 Apr 2020-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This paper develops a durably opaque version of an existing STM algorithm, namely the Transactional Mutex Lock (TML), and develops a proof technique for durable opacity based on refinement between TML and an operational characterisation of durable opacity by adapting the TMS2 specification.

Abstract: Non-volatile memory (NVM), aka persistent memory, is a new paradigm for memory that preserves its contents even after power loss. The expected ubiquity of NVM has stimulated interest in the design of novel concepts ensuring correctness of concurrent programming abstractions in the face of persistency. So far, this has led to the design of a number of persistent concurrent data structures, built to satisfy an associated notion of correctness: durable linearizability. In this paper, we transfer the principle of durable concurrent correctness to the area of software transactional memory (STM). Software transactional memory algorithms allow for concurrent access to shared state. Like linearizability for concurrent data structures, opacity is the established notion of correctness for STMs. First, we provide a novel definition of durable opacity extending opacity to handle crashes and recovery in the context of NVM. Second, we develop a durably opaque version of an existing STM algorithm, namely the Transactional Mutex Lock (TML). Third, we design a proof technique for durable opacity based on refinement between TML and an operational characterisation of durable opacity by adapting the TMS2 specification. Finally, we apply this proof technique to show that the durable version of TML is indeed durably opaque. The correctness proof is mechanized within Isabelle.
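
For readers unfamiliar with TML, the following is a minimal sketch of the ordinary (volatile) Transactional Mutex Lock as it is usually described in the earlier TML literature: a single global sequence lock, eager in-place writes, and at most one writer at a time. It does not show the durable variant from this paper; the persistence and recovery steps are omitted, and the class and method names (`TML`, `Tx`, `cas`, `Abort`) are illustrative assumptions, as is the use of a Python lock to stand in for a hardware compare-and-swap.

```python
import threading


class Abort(Exception):
    """Raised when a transaction must be restarted."""


class TML:
    """Global sequence lock: even means no writer, odd means a writer is active."""

    def __init__(self):
        self.glb = 0
        self._guard = threading.Lock()  # assumption: emulates an atomic CAS

    def cas(self, expected, new):
        with self._guard:
            if self.glb == expected:
                self.glb = new
                return True
            return False


class Tx:
    """One transaction over a shared dict used as 'memory' (hypothetical layout)."""

    def __init__(self, tml, memory):
        self.tml, self.mem = tml, memory
        while True:  # begin: spin until no writer holds the lock
            self.loc = tml.glb
            if self.loc % 2 == 0:
                break

    def read(self, addr):
        value = self.mem[addr]
        if self.tml.glb != self.loc:  # a writer started since we began
            raise Abort()
        return value

    def write(self, addr, value):
        if self.loc % 2 == 0:  # first write: try to become the single writer
            if not self.tml.cas(self.loc, self.loc + 1):
                raise Abort()
            self.loc += 1
        self.mem[addr] = value  # in-place update while holding the lock

    def commit(self):
        if self.loc % 2 == 1:  # only writers release the lock
            self.tml.glb = self.loc + 1


# Usage: a reader that began before a writer aborts on its next read.
tml, mem = TML(), {"x": 0}
reader, writer = Tx(tml, mem), Tx(tml, mem)
writer.write("x", 5)
try:
    reader.read("x")
    reader_aborted = False
except Abort:
    reader_aborted = True
writer.commit()
```

Read-only transactions never modify `glb`, which is why `commit` is a no-op for them in this sketch.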

Book Chapter•DOI•

Characterizing the Sharing Behavior of Applications Using Software Transactional Memory

Douglas Pereira Pasqualin¹, Matthias Diener², André Rauber Du Bois¹, Maurício L. Pilla¹•Institutions (2)

Universidade Federal de Pelotas¹, University of Illinois at Urbana–Champaign²

15 Nov 2020

TL;DR: In this article, the sharing behavior of the STAMP benchmark suite is analyzed using information extracted from the STM runtime, providing information to guide thread mapping based on their sharing behavior.

Abstract: Software Transactional Memory (STM) is an alternative abstraction for process synchronization in parallel programming. It is often easier to use than locks, avoiding issues such as deadlocks. In order to improve STM performance, many studies have been made on transactional schedulers. However, in current architectures with complex memory hierarchies, it is also important to map threads in such a way that threads that share data are executed close to each other in the memory hierarchy, so that they can access data protected by STM faster. For a successful thread mapping of an STM application, it is important to perform an in-depth analysis of its sharing behavior to determine its suitability for different mapping policies and the expected performance gains. This paper characterizes the sharing behavior of the STAMP benchmark suite by using information extracted from the STM runtime, providing information to guide thread mapping based on their sharing behavior. Our main findings are that most of the STAMP applications are suitable for a static thread mapping approach to improve the performance since (1) the applications do not present dynamic behavior and (2) the sharing pattern does not change between executions. Furthermore, we show that sharing information gathered from the STM runtime can be used to analyze and reduce false sharing in TM applications.
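
One way to picture "sharing information extracted from the STM runtime" is as a thread-by-thread matrix counting how many addresses each pair of threads has both touched. The sketch below is only an illustration under assumptions: the trace format (a list of `(thread_id, address)` pairs) and the function names are hypothetical, and the paper's actual metrics and collection mechanism may differ.

```python
from collections import defaultdict
from itertools import combinations


def sharing_matrix(accesses, num_threads):
    """Count, for each pair of threads, how many addresses both accessed."""
    threads_per_addr = defaultdict(set)
    for thread_id, addr in accesses:
        threads_per_addr[addr].add(thread_id)

    matrix = [[0] * num_threads for _ in range(num_threads)]
    for threads in threads_per_addr.values():
        for a, b in combinations(sorted(threads), 2):
            matrix[a][b] += 1
            matrix[b][a] += 1
    return matrix


def closest_partner(matrix, thread_id):
    """Thread sharing the most addresses with thread_id (ties go to the lowest id)."""
    row = matrix[thread_id]
    candidates = [t for t in range(len(row)) if t != thread_id]
    return max(candidates, key=lambda t: row[t])


# Hypothetical trace: threads 0 and 1 share two addresses, 0/2/3 share one.
trace = [(0, 0x10), (1, 0x10), (0, 0x20), (1, 0x20), (2, 0x30), (3, 0x30), (0, 0x30)]
m = sharing_matrix(trace, num_threads=4)
```

A static mapping policy could then place each thread near its `closest_partner` in the cache hierarchy; whether that pays off depends on the sharing pattern being stable across executions, which is what the paper investigates.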

Proceedings Article•DOI•

Transactional Memory: A Review

Tabassum¹, Meenu¹•Institutions (1)

Madan Mohan Malaviya University of Technology¹

01 Mar 2020

TL;DR: This paper discusses transactional memory and its basic approaches (HTM, STM, and HyTM) with version management and synchronization, and discusses previous work and challenges of STM.

Abstract: Software Transactional Memory is an important method for simplifying parallel programming and improving scalability on multiple cores. Python uses a system clock instead of a global counter. STM in Python deals with data shared among parallel threads of a locked (as in Python) data structure. Haskell has thread-local journaling for operations on mutable, globally shared objects. In this review paper, we discuss transactional memory and its basic approaches, HTM, STM, and HyTM, along with version management and synchronization. This paper also discusses previous work and challenges of STM.

Scalable Consistency in the Multi-core Era

Deepthi Devaki Akkoorath

01 Jan 2020

TL;DR: The Global-Local View Model is proposed, a programming model that exploits the heterogeneous access latencies in many-core systems and provides a consistency semantics that allows for more scalability even under contention.

Abstract: The advent of heterogeneous many-core systems has increased the spectrum of achievable performance from multi-threaded programming. As the processor components become more distributed, the cost of synchronization and communication needed to access the shared resources increases. Concurrent linearizable access to shared objects can be prohibitively expensive in a high contention workload. Though there are various mechanisms (e.g., lock-free data structures) to circumvent the synchronization overhead in linearizable objects, it still incurs performance overhead for many concurrent data types. Moreover, many applications do not require linearizable objects and apply ad-hoc techniques to eliminate synchronous atomic updates. In this thesis, we propose the Global-Local View Model. This programming model exploits the heterogeneous access latencies in many-core systems. In this model, each thread maintains different views on the shared object: a thread-local view and a global view. As the thread-local view is not shared, it can be updated without incurring synchronization costs. The local updates become visible to other threads only after the thread-local view is merged with the global view. This scheme improves the performance at the expense of linearizability. Besides the weak operations on the local view, the model also allows strong operations on the global view. Combining operations on the global and the local views, we can build data types with customizable consistency semantics on the spectrum between sequential and purely mergeable data types. Thus the model provides a framework that captures the semantics of Multi-View Data Types. We discuss a formal operational semantics of the model. We also introduce a verification method to verify the correctness of the implementation of several multi-view data types. Frequently, applications require updating shared objects in an “all-or-nothing” manner. 
Therefore, the mechanisms to synchronize access to individual objects are not sufficient. Software Transactional Memory (STM) is a mechanism that helps the programmer to correctly synchronize access to multiple mutable shared data by serializing the transactional reads and writes. But under high contention, serializable transactions incur frequent aborts and limit parallelism, which can lead to severe performance degradation. Mergeable Transactional Memory (MTM), proposed in this thesis, allows accessing multi-view data types within a transaction. Instead of aborting and re-executing the transaction, MTM merges its changes using the data-type specific merge semantics. Thus it provides a consistency semantics that allows for more scalability even under contention. The evaluation of our prototype implementation in Haskell shows that mergeable transactions outperform serializable transactions even under low contention while providing a structured and type-safe interface.
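
The split between a thread-local view and a global view can be illustrated with a counter, which is about the simplest mergeable data type. This is a sketch under assumptions, not the thesis's API: the class name, the method names, and the choice that a weak read returns the global value plus the caller's unmerged delta are all hypothetical.

```python
import threading


class MultiViewCounter:
    """Counter with an unsynchronized thread-local view and a locked global view."""

    def __init__(self):
        self._global = 0
        self._lock = threading.Lock()
        self._local = threading.local()  # each thread gets its own pending delta

    def _delta(self):
        return getattr(self._local, "delta", 0)

    def weak_increment(self, n=1):
        # Weak operation: touches only the caller's local view, no locking.
        self._local.delta = self._delta() + n

    def weak_read(self):
        # Assumption: global value plus this thread's own unmerged updates.
        return self._global + self._delta()

    def merge(self):
        # Publish local updates; only now do other threads see them.
        with self._lock:
            self._global += self._delta()
        self._local.delta = 0

    def strong_increment(self, n=1):
        # Strong operation: applied directly to the global view.
        with self._lock:
            self._global += n

    def strong_read(self):
        with self._lock:
            return self._global


# Usage: local updates stay invisible to strong reads until merge() is called.
counter = MultiViewCounter()
counter.weak_increment(3)
before_merge = counter.strong_read()
counter.merge()
after_merge = counter.strong_read()
```

Because addition is commutative, merging deltas never needs to abort or retry, which is the property that lets mergeable types avoid the abort-and-re-execute cycle of serializable transactions.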

Proceedings Article•DOI•

Nonblocking Persistent Software Transactional Memory

H. Alan Beadle¹, Wentao Cai¹, Haosen Wen¹, Michael L. Scott¹•Institutions (1)

University of Rochester¹

01 Dec 2020

TL;DR: QSTM is a nonblocking persistent STM that requires neither the modification of target data structures nor the availability of a wide CAS instruction, trading modest performance costs (within a factor of 2 of OneFile in almost all cases) for dramatically lower space overhead.

Abstract: Newly emerging nonvolatile alternatives to DRAM raise the possibility that applications might compute directly on long-lived data, rather than serializing them to and from a file system or database. To ensure crash consistency, such data must, like a file system or database, provide failure-atomic transactional semantics. Several persistent software transactional memory (STM) systems have been devised to provide these semantics, but only one, the OneFile system of Ramalhete et al., is nonblocking. Nonblocking progress is desirable to avoid both performance anomalies due to process preemption or failures and deadlock due to priority inversion. Unfortunately, OneFile achieves nonblocking progress at the cost of 2× space overhead, sacrificing much of the cost and density benefit of nonvolatile memory relative to DRAM. OneFile also requires extensive and intrusive changes to data declarations, and works only on a machine with double-width compare-and-swap (CAS) or load-linked/store-conditional (LL/SC) instructions. To address these limitations, we introduce QSTM, a nonblocking persistent STM that requires neither the modification of target data structures nor the availability of a wide CAS instruction. We describe our system, give arguments for safety and liveness, and compare performance to that of the Mnemosyne and OneFile persistent STM systems. We argue that modest performance costs (within a factor of 2 of OneFile in almost all cases) are easily justified by dramatically lower space overhead and higher programmer convenience.

Dissertation•

Formal Verification of Software Transactional Memory Based on Timed Automata

Бранислав Кордић

03 Feb 2020

Proceedings Article•DOI•

Analysis of Polka Contention Manager for use in Multicore Hard Real-Time Systems

Adrien Quillet¹, Audrey Queudet², Didier Lime¹•Institutions (2)

École centrale de Nantes¹, University of Nantes²

09 Jun 2020

TL;DR: New realistic assumptions relative to real-time systems are introduced that make it possible to ensure wait-free progress when the Polka contention manager is used, and upper bounds are proved both on the number of aborts and on the execution time of transactions.

Abstract: Transactional memory (TM) draws the attention of both academic and development groups; indeed, this concept offers an alternative to lock-based approaches, easing programmers' work. Despite the large number of investigations around this topic, the question of the correctness of most TM implementations remains open. More specifically, the lack of upper bounds on the execution time of transactions prevents the use of TM in real-time systems. To address this issue, we introduce new realistic assumptions relative to real-time systems, which make it possible to ensure wait-free progress (i.e., all transactions progress) when the Polka contention manager is considered. In that context, through a thorough formalization of the system, we prove upper bounds both on the number of aborts and on the execution time of transactions.

Posted Content•

Software Transactional Memory with Interactions

Marino Miculan¹, Marco Peressotti²•Institutions (2)

University of Udine¹, University of Southern Denmark²

17 Jul 2020-arXiv: Programming Languages

TL;DR: Open Transactional Memory (OTM) is presented, a programming abstraction supporting safe, data-driven interactions between composable memory transactions, achieved by relaxing isolation between transactions while still ensuring atomicity.

Abstract: Software transactional memory (STM) is an emerging abstraction for concurrent programming and an alternative to lock-based synchronization. Most STM models admit only isolated transactions, which are not adequate in multithreaded programming where transactions need to interact via shared data before committing. To overcome this limitation, in this paper we present Open Transactional Memory (OTM), a programming abstraction supporting safe, data-driven interactions between composable memory transactions. This is achieved by relaxing isolation between transactions while still ensuring atomicity. This model allows for loosely-coupled interactions since transaction merging is driven only by accesses to shared data, with no need to specify participants beforehand.

Posted Content•

Persistence and Synchronization: Friends or Foes?

Pradeep Fernando, Irina Calciu, Jayneel Gandhi, Aasheesh Kolli, Ada Gavrilovska

26 Dec 2020

TL;DR: In this paper, the impact of combining existing crash-consistency and synchronization methods for achieving performant and correct NVM transactional systems is evaluated, in terms of support for hardware transactional memory and the boundaries of the persistence domain (transient or persistent caches).

Abstract: Emerging non-volatile memory (NVM) technologies promise memory-speed, byte-addressable persistent storage with a load/store interface. However, programming applications to directly manipulate NVM data is complex and error-prone. Applications generally employ libraries that hide the low-level details of the hardware and provide a transactional programming model to achieve crash-consistency. Furthermore, applications continue to expect correctness during concurrent executions, achieved through the use of synchronization. To achieve this, applications seek well-known ACID guarantees. However, realizing this presents designers of transactional systems with a range of choices in how to combine several low-level techniques, given target hardware features and workload characteristics. In this paper, we provide a comprehensive evaluation of the impact of combining existing crash-consistency and synchronization methods for achieving performant and correct NVM transactional systems. We consider different hardware characteristics, in terms of support for hardware transactional memory (HTM) and the boundaries of the persistence domain (transient or persistent caches). By characterizing persistent transactional systems in terms of their properties, we make it possible to better understand the tradeoffs of different implementations and to arrive at better design choices for providing ACID guarantees. We use both real hardware with Intel Optane DC persistent memory and simulation to evaluate a persistent version of hardware transactional memory, a persistent version of software transactional memory, and undo/redo logging. Through our empirical study, we show two major factors that impact the cost of supporting persistence in transactional systems: the persistence domain (transient or persistent caches) and application characteristics, such as transaction size and parallelism.

Dissertation•

Exploring Progress Guarantees in Multi-Version Software Transactional Memory Systems

Sweta Kumari, Sathya Peri

13 Feb 2020

TL;DR: This thesis explores the weaker progress condition starvation-freedom for transactional memory systems (SV-RWSTMs, MV-RWSTMs, SV-OSTMs, and MV-OSTMs) and proposes a Starvation-Free Multi-Version RWSTM (SF-MV-RWSTM) algorithm.

Abstract: In the current era of multi-core processors, Software Transactional Memory systems (STMs) have garnered significant interest as an elegant alternative for addressing synchronization and concurrency issues with multi-threaded programming to utilize the cores properly. Client programs use STMs by issuing transactions. A transaction of an STM is a piece of code that performs reads and writes to the shared memory. Typical STMs work on read/write methods and maintain a single version of each transactional object (t-object); these are called Single-Version Read-Write STMs (SV-RWSTMs or RWSTMs). It has been shown in the literature that maintaining multiple versions of each t-object reduces the number of aborts and enhances performance. Several Multi-Version RWSTMs (MV-RWSTMs) have been proposed in the literature that maintain multiple versions and provide increased concurrency along with better performance than SV-RWSTMs. Some STMs work at higher-level operations and ensure greater concurrency than MV-RWSTMs and SV-RWSTMs. They include more semantically rich operations such as push/pop on stack objects, enqueue/dequeue on queue objects, and insert/lookup/delete on sets, trees, or hash table objects, depending upon the underlying data structure used to implement the higher-level system. Such STMs are known as Single-Version Object-based STMs (SV-OSTMs or OSTMs). To achieve even greater concurrency, researchers have proposed Multi-Version OSTM (MV-OSTM), which maintains multiple versions of each t-object in OSTMs. MV-OSTM reduces the number of aborts and improves performance over SV-OSTMs, MV-RWSTMs, and SV-RWSTMs. All the STMs defined above ensure that a transaction either commits or aborts. A transaction aborted due to conflicts (two transactions are said to be in conflict if both access the same t-object x and at least one of them performs a write/update on x) is typically re-issued with the expectation that it will complete successfully in a subsequent incarnation. However, many existing STMs fail to provide starvation-freedom, i.e., in these systems concurrency conflicts may prevent an incarnated transaction from ever committing. To overcome this limitation, we developed an efficient STM system that ensures starvation-freedom as a progress condition. An STM system is said to be starvation-free if, whenever a thread invoking a transaction Ti gets the opportunity to retry Ti on every abort (due to the presence of a fair underlying scheduler with bounded termination) and Ti is not parasitic (i.e., Ti will try to commit given a chance), Ti eventually commits. Wait-freedom is another interesting progress condition for STMs, in which every transaction commits regardless of the nature of concurrent transactions and the underlying scheduler. But it was shown in the literature that it is not possible to achieve wait-freedom in dynamic STMs, in which the t-objects of transactions are not known in advance. So in this thesis, we explore the weaker progress condition starvation-freedom for transactional memory systems (SV-RWSTMs, MV-RWSTMs, SV-OSTMs, and MV-OSTMs) while assuming that the t-objects of the transactions are not known in advance. Some researchers have explored starvation-freedom in SV-RWSTMs while maintaining a single version of each t-object. We denote such an algorithm as Starvation-Free Single-Version RWSTM (SF-SV-RWSTM). Although SF-SV-RWSTM guarantees starvation-freedom, it can still abort many transactions spuriously, which brings down the efficiency and progress of the entire system. To overcome this issue, we systematically developed a novel and efficient starvation-free algorithm, Starvation-Free Multi-Version RWSTM (SF-MV-RWSTM). It maintains multiple versions of each t-object, which reduces the number of aborts and improves performance over SF-SV-RWSTMs. The proposed SF-MV-RWSTM can be used either where the number of versions is unbounded and Garbage Collection (GC) is used to delete unwanted versions, as SF-MV-RWSTM-GC, or where only the latest K versions are maintained, as Starvation-Free K-Version RWSTM (SF-K-RWSTM). Our experimental analysis demonstrates that the proposed SF-K-RWSTM algorithm performs best among its variants (SF-MV-RWSTM and SF-MV-RWSTM-GC) and state-of-the-art STMs under long-running transactions with high contention. SF-K-RWSTM satisfies the popular correctness criterion local opacity and ensures starvation-freedom as its progress condition. To achieve starvation-freedom along with higher concurrency, we proposed starvation-freedom in SV-OSTM as SF-SV-OSTM, which assigns a priority to the transaction on abort. SF-SV-OSTM satisfies the correctness criterion conflict-opacity while ensuring starvation-freedom as its progress condition. To achieve even greater concurrency while ensuring starvation-freedom, we maintained multiple versions of each t-object in starvation-free OSTMs and proposed a novel and efficient Starvation-Free Multi-Version OSTM (SF-MV-OSTM). The number of versions maintained by SF-MV-OSTM can either be unbounded with Garbage Collection (GC), as SF-MV-OSTM-GC, or bounded to the latest K versions, as SF-K-OSTM. SF-K-OSTM ensures starvation-freedom, satisfies local opacity as its correctness criterion, and shows performance benefits compared with state-of-the-art STMs. This thesis explores the progress guarantee starvation-freedom in single- and multi-version RWSTMs and single- and multi-version OSTMs while satisfying the correctness criteria conflict-opacity and local opacity. It shows that maintaining multiple versions improves concurrency over a single version while reducing the number of aborts and increasing throughput. This motivated us to use efficient multi-version STMs to improve the performance of smart contract executions in blockchain systems. Blockchain platforms such as Ethereum and several others execute complex transactions in blocks through user-defined scripts known as smart contracts. Normally, a block of the chain consists of multiple smart contract transactions that are added by a miner. To append a correct block to the blockchain, miners execute these smart contract transactions sequentially. Later, the validators serially re-execute the smart contract transactions of the block. If the validators agree with the final state of the block as recorded by the miner and reach consensus, then the block is said to be validated, and the respective miner gets an incentive for the valid block successfully added to the blockchain. Nowadays, multi-core processors are ubiquitous. By employing serial execution of the transactions, the miners and validators fail to utilize the cores properly and, as a result, have poor throughput. Adding concurrency to smart contract execution can result in better utilization of the cores and, as a result, higher throughput. In this thesis, we develop a framework in which the miner executes the smart contract transactions concurrently using efficient Multi-Version Software Transactional Memory systems (MVSTMs). The miner proposes a block that consists of a set of transactions, a block graph, the hash of the previous block, and the final state of each shared t-object. The block graph captures the conflict relations among the transactions. Later, the validators re-execute the same smart contract transactions concurrently and deterministically with the help of the block graph given by the miner to verify the final state. If the validation is successful, then the proposed block is appended to the blockchain as a part of the consensus protocol. In the case of a blockchain used as a cryptocurrency, like Ethereum or Bitcoin, the respective miner also gets a reward for producing the block. If validation is not successful, then the validator discards the proposed block. Concurrent execution of smart contract transactions by the miner and validator achieves significant performance gain compared to a serial miner and validator. But concurrent execution of smart contracts poses some interesting challenges. So, in this thesis, we show how to overcome these challenges to improve the performance of smart contract execution by executing them concurrently using efficient MVSTM protocols.
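
The role of the block graph can be sketched as follows. The conflict rule is the one stated in the abstract (two transactions conflict when they access the same t-object and at least one writes it). Everything else is an assumption made for illustration: transactions are represented by hypothetical read and write sets listed in the miner's commit order, and the validator schedule is expressed as "waves" of transactions that could run in parallel. The thesis's own construction, which is built on MVSTM protocols, may differ.

```python
from collections import defaultdict


def conflicts(a, b):
    """True if a and b touch a common t-object and at least one of them writes it."""
    return bool(a["writes"] & (b["reads"] | b["writes"]) or b["writes"] & a["reads"])


def build_block_graph(txs):
    """Miner side: add edge i -> j when tx i committed before tx j and they conflict."""
    graph = defaultdict(set)
    for j in range(len(txs)):
        for i in range(j):
            if conflicts(txs[i], txs[j]):
                graph[i].add(j)
    return graph


def validator_waves(graph, num_txs):
    """Validator side: topological layering; each wave holds mutually
    non-conflicting transactions whose predecessors have all finished."""
    indegree = [0] * num_txs
    for successors in graph.values():
        for j in successors:
            indegree[j] += 1

    ready = [t for t in range(num_txs) if indegree[t] == 0]
    waves = []
    while ready:
        waves.append(ready)
        unlocked = []
        for t in ready:
            for s in graph.get(t, ()):
                indegree[s] -= 1
                if indegree[s] == 0:
                    unlocked.append(s)
        ready = sorted(unlocked)
    return waves


# Hypothetical block of four transactions in the miner's commit order.
block = [
    {"reads": {"a"}, "writes": {"a"}},        # T0
    {"reads": {"b"}, "writes": {"b"}},        # T1, independent of T0
    {"reads": {"a"}, "writes": {"c"}},        # T2, reads what T0 wrote
    {"reads": {"b", "c"}, "writes": set()},   # T3, depends on T1 and T2
]
graph = build_block_graph(block)
waves = validator_waves(graph, len(block))
```

Because every conflicting pair is ordered by an edge, any validator that respects the waves reproduces the miner's final state regardless of how threads interleave within a wave.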

Proceedings Article•

Software Transactional Memory with Interactions.

Marino Miculan¹, Marco Peressotti²•Institutions (2)

University of Udine¹, University of Southern Denmark²

01 Jan 2020

TL;DR: Open Transactional Memory (OTM) is a programming abstraction supporting safe, data-driven interactions between composable memory transactions; it allows loosely-coupled interactions since transaction merging is driven only by accesses to shared data.
