Author
Michael Spear
Other affiliations: Microsoft, University of Rochester, Intel
Bio: Michael Spear is an academic researcher from Lehigh University. The author has contributed to research in topics: Transactional memory & Software transactional memory. The author has an hindex of 29, co-authored 93 publications receiving 2986 citations. Previous affiliations of Michael Spear include Microsoft & University of Rochester.
Papers published on a yearly basis
Papers
More filters
••
09 Jan 2010TL;DR: An ownership-record-free software transactional memory (STM) system that combines extremely low overhead with unusually clean semantics is presented, and the experience suggests that NOrec may be an ideal candidate for such a software system.
Abstract: Drawing inspiration from several previous projects, we present an ownership-record-free software transactional memory (STM) system that combines extremely low overhead with unusually clean semantics. While unlikely to scale to hundreds of active threads, this "NOrec" system offers many appealing features: very low fast-path latency--as low as any system we know of that admits concurrent updates; publication and privatization safety; livelock freedom; a small, constant amount of global metadata, and full compatibility with existing data structure layouts; no false conflicts due to hash collisions; compatibility with both managed and unmanaged languages, and both static and dynamic compilation; and easy acccommodation of closed nesting, inevitable (irrevocable) transactions, and starvation avoidance mechanisms. To the best of our knowledge, no extant STM system combines this set of features.While transactional memory for processors with hundreds of cores is likely to require hardware support, software implementations will be required for backward compatibility with current and near-future processors with 2--64 cores, as well as for fall-back in future machines when hardware resources are exhausted. Our experience suggests that NOrec may be an ideal candidate for such a software system. We also observe that it has considerable appeal for use within the operating system, and in systems that require both closed nesting and publication safety.
327 citations
01 Jan 2006
TL;DR: This work considers the design of low-overhead, obstruction-free software transactional memory for non-garbage-collected languages and eliminates dynamic allocation of transactional metadata and co-locates data that are separate in other systems, thereby reducing the expected number of cache misses on the common-case code path.
Abstract: Recent years have seen the development of several dierent systems for software transactional memory (STM). Most either employ locks in the underlying implementation or depend on thread-safe general-purpose garbage collection to collect stale data and metadata. We consider the design of low-overhead, obstruction-free software transactional memory for non-garbage-collected languages. Our design eliminates dynamic allocation of transactional metadata and co-locates data that are separate in other systems, thereby reducing the expected number of cache misses on the common-case code path, while preserving nonblocking progress and requiring no atomic instructions other than single-word load, store, and compare-and-swap (or load-linked/store-conditional). We also employ a simple, epoch-based storage management system and introduce a novel conservative mechanism to make reader transactions visible to writers without inducing additional metadata copying or dynamic allocation. Experimental results show throughput significantly higher than that of existing nonblocking STM systems, and highlight significant, application-specific dierences among conflict detection and validation strategies.
183 citations
••
12 Aug 2007TL;DR: It is argued that privatization comprises a pair of symmetric subproblems: private operations may fail to see updates made by transactions that have committed but not yet completed; conversely, transactions that are doomed but have not yet aborted may see Updates made by private code, causing them to perform erroneous, externally visible operations.
Abstract: Early implementations of software transactional memory (STM) assumed that sharable data would be accessed only within transactions. Memory may appear inconsistent in programs that violate this assumption, even when program logic would seem to make extra-transactional accesses safe. Designing STM systems that avoid such inconsistency has been dubbed the privatization problem. We argue that privatization comprises a pair of symmetric subproblems: private operations may fail to see updates made by transactions that have committed but not yet completed; conversely, transactions that are doomed but have not yet aborted may see updates made by private code, causing them to perform erroneous, externally visible operations. We explain how these problems arise in different styles of STM, present strategies to address them, and discuss their implementation tradeoffs. We also propose a taxonomy of contracts between the system and the user, analogous to programmer-centric memory consistency models, which allow us to classify programs based on their privatization requirements. Finally, we present empirical comparisons of several privatization strategies. Our results suggest that the best strategy may depend on application characteristics.
163 citations
••
14 Jun 2008TL;DR: The RingSTM system is the first STM that is inherently livelock-free and privatization-safe while at the same time permitting parallel writeback by concurrent disjoint transactions.
Abstract: Existing Software Transactional Memory (STM) designs attach metadata to ranges of shared memory; subsequent runtime instructions read and update this metadata in order to ensure that an in-flight transaction's reads and writes remain correct. The overhead of metadata manipulation and inspection is linear in the number of reads and writes performed by a transaction, and involves expensive read-modify-write instructions, resulting in substantial overheads.We consider a novel approach to STM, in which transactions represent their read and write sets as Bloom filters, and transactions commit by enqueuing a Bloom filter onto a global list. Using this approach, our RingSTM system requires at most one read-modify-write operation for any transaction, and incurs validation overhead linear not in transaction size, but in the number of concurrent writers who commit. Furthermore, RingSTM is the first STM that is inherently livelock-free and privatization-safe while at the same time permitting parallel writeback by concurrent disjoint transactions.We evaluate three variants of the RingSTM algorithm, and find that it offers superior performance and/or stronger semantics than the state-of-the-art TL2 algorithm under a number of workloads.
155 citations
••
14 Feb 2009TL;DR: Experimental evaluation demonstrates that the overhead of the mechanisms is low, particularly when conflicts are rare, and that the strategy as a whole provides good throughput and fairness, including livelock and starvation freedom, even for challenging workloads.
Abstract: In Software Transactional Memory (STM), contention management refers to the mechanisms used to ensure forward progress--to avoid livelock and starvation, and to promote throughput and fairness. Unfortunately, most past approaches to contention management were designed for obstruction-free STM frameworks, and impose significant constant-time overheads. Priority-based approaches in particular typically require that reads be visible to all transactions, an expensive property that is not easy to support in most STM systems.In this paper we present a comprehensive strategy for contention management via fair resolution of conflicts in an STM with invisible reads. Our strategy depends on (1) lazy acquisition of ownership, (2) extendable timestamps, and (3) an efficient way to capture both priority and conflicts. We introduce two mechanisms--one using Bloom filters, the other using visible read bits--that implement point (3). These mechanisms unify the notions of conflict resolution, inevitability, and transaction retry. They are orthogonal to the rest of the contention management strategy, and could be used in a wide variety of hardware and software TM systems. Experimental evaluation demonstrates that the overhead of the mechanisms is low, particularly when conflicts are rare, and that our strategy as a whole provides good throughput and fairness, including livelock and starvation freedom, even for challenging workloads.
149 citations
Cited by
More filters
••
30 Sep 2008TL;DR: This paper introduces the Stanford Transactional Application for Multi-Processing (STAMP), a comprehensive benchmark suite for evaluating TM systems and uses the suite to evaluate six different TM systems, identify their shortcomings, and motivate further research on their performance characteristics.
Abstract: Transactional Memory (TM) is emerging as a promising technology to simplify parallel programming. While several TM systems have been proposed in the research literature, we are still missing the tools and workloads necessary to analyze and compare the proposals. Most TM systems have been evaluated using microbenchmarks, which may not be representative of any real-world behavior, or individual applications, which do not stress a wide range of execution scenarios. We introduce the Stanford Transactional Application for Multi-Processing (STAMP), a comprehensive benchmark suite for evaluating TM systems. STAMP includes eight applications and thirty variants of input parameters and data sets in order to represent several application domains and cover a wide range of transactional execution cases (frequent or rare use of transactions, large or small transactions, high or low contention, etc.). Moreover, STAMP is portable across many types of TM systems, including hardware, software, and hybrid systems. In this paper, we provide descriptions and a detailed characterization of the applications in STAMP. We also use the suite to evaluate six different TM systems, identify their shortcomings, and motivate further research on their performance characteristics.
934 citations
••
05 Mar 2011TL;DR: In tests emulating the performance characteristics of forthcoming SCMs, Mnemosyne can persist data as fast as 3 microseconds and can be up to 1400% faster than alternative persistence strategies, such as Berkeley DB or Boost serialization, that are designed for disks.
Abstract: New storage-class memory (SCM) technologies, such as phase-change memory, STT-RAM, and memristors, promise user-level access to non-volatile storage through regular memory instructions. These memory devices enable fast user-mode access to persistence, allowing regular in-memory data structures to survive system crashes.In this paper, we present Mnemosyne, a simple interface for programming with persistent memory. Mnemosyne addresses two challenges: how to create and manage such memory, and how to ensure consistency in the presence of failures. Without additional mechanisms, a system failure may leave data structures in SCM in an invalid state, crashing the program the next time it starts.In Mnemosyne, programmers declare global persistent data with the keyword "pstatic" or allocate it dynamically. Mnemosyne provides primitives for directly modifying persistent variables and supports consistent updates through a lightweight transaction mechanism. Compared to past work on disk-based persistent memory, Mnemosyne reduces latency to storage by writing data directly to memory at the granularity of an update rather than writing memory pages back to disk through the file system. In tests emulating the performance characteristics of forthcoming SCMs, we show that Mnemosyne can persist data as fast as 3 microseconds. Furthermore, it provides a 35 percent performance increase when applied in the OpenLDAP directory server. In microbenchmark studies we find that Mnemosyne can be up to 1400% faster than alternative persistence strategies, such as Berkeley DB or Boost serialization, that are designed for disks.
873 citations
••
05 Mar 2011TL;DR: A lightweight, high-performance persistent object system called NV-heaps is implemented that provides transactional semantics while preventing these errors and providing a model for persistence that is easy to use and reason about.
Abstract: Persistent, user-defined objects present an attractive abstraction for working with non-volatile program state. However, the slow speed of persistent storage (i.e., disk) has restricted their design and limited their performance. Fast, byte-addressable, non-volatile technologies, such as phase change memory, will remove this constraint and allow programmers to build high-performance, persistent data structures in non-volatile storage that is almost as fast as DRAM. Creating these data structures requires a system that is lightweight enough to expose the performance of the underlying memories but also ensures safety in the presence of application and system failures by avoiding familiar bugs such as dangling pointers, multiple free()s, and locking errors. In addition, the system must prevent new types of hard-to-find pointer safety bugs that only arise with persistent objects. These bugs are especially dangerous since any corruption they cause will be permanent.We have implemented a lightweight, high-performance persistent object system called NV-heaps that provides transactional semantics while preventing these errors and providing a model for persistence that is easy to use and reason about. We implement search trees, hash tables, sparse graphs, and arrays using NV-heaps, BerkeleyDB, and Stasis. Our results show that NV-heap performance scales with thread count and that data structures implemented using NV-heaps out-perform BerkeleyDB and Stasis implementations by 32x and 244x, respectively, by avoiding the operating system and minimizing other software overheads. We also quantify the cost of enforcing the safety guarantees that NV-heaps provide and measure the costs of NV-heap primitive operations.
850 citations
••
20 Feb 2008TL;DR: Opacity is defined as a property of concurrent transaction histories and its graph theoretical interpretation is given and it is proved that every single-version TM system that uses invisible reads and does not abort non-conflicting transactions requires, in the worst case, k steps for an operation to terminate.
Abstract: Transactional memory (TM) is perceived as an appealing alternative to critical sections for general purpose concurrent programming. Despite the large amount of recent work on TM implementations, however, very little effort has been devoted to precisely defining what guarantees these implementations should provide. A formal description of such guarantees is necessary in order to check the correctness of TM systems, as well as to establish TM optimality results and inherent trade-offs.This paper presents opacity, a candidate correctness criterion for TM implementations. We define opacity as a property of concurrent transaction histories and give its graph theoretical interpretation. Opacity captures precisely the correctness requirements that have been intuitively described by many TM designers. Most TM systems we know of do ensure opacity.At a very first approximation, opacity can be viewed as an extension of the classical database serializability property with the additional requirement that even non-committed transactions are prevented from accessing inconsistent states. Capturing this requirement precisely, in the context of general objects, and without precluding pragmatic strategies that are often used by modern TM implementations, such as versioning, invisible reads, lazy updates, and open nesting, is not trivial.As a use case of opacity, we prove the first lower bound on the complexity of TM implementations. Basically, we show that every single-version TM system that uses invisible reads and does not abort non-conflicting transactions requires, in the worst case, ?(k) steps for an operation to terminate, where k is the total number of objects shared by transactions. This (tight) bound precisely captures an inherent trade-off in the design of TM systems. The bound also highlights a fundamental gap between systems in which transactions can be fully isolated from the outside environment, e.g., databases or certain specialized transactional languages, and systems that lack such isolation capabilities, e.g., general TM frameworks.
500 citations