
Showing papers by Tatiana Shpeisman published in 2007


Proceedings ArticleDOI
14 Mar 2007
TL;DR: New language constructs to support open nesting in Java are described, it is shown how these constructs can be mapped efficiently to existing STM data structures, and an evaluation demonstrates how open nesting can enhance application scalability.
Abstract: Transactional memory (TM) promises to simplify concurrent programming while providing scalability competitive to fine-grained locking. Language-based constructs allow programmers to denote atomic regions declaratively and to rely on the underlying system to provide transactional guarantees along with concurrency. In contrast with fine-grained locking, TM allows programmers to write simpler programs that are composable and deadlock-free. TM implementations operate by tracking loads and stores to memory and by detecting concurrent conflicting accesses by different transactions. By automating this process, they greatly reduce the programmer's burden, but they also are forced to be conservative. In certain cases, conflicting memory accesses may not actually violate the higher-level semantics of a program, and a programmer may wish to allow seemingly conflicting transactions to execute concurrently. Open nested transactions enable expert programmers to differentiate between physical conflicts, at the level of memory, and logical conflicts that actually violate application semantics. A TM system with open nesting can permit physical conflicts that are not logical conflicts, and thus increase concurrency among application threads. Here we present an implementation of open nested transactions in a Java-based software transactional memory (STM) system. We describe new language constructs to support open nesting in Java, and we discuss new abstract locking mechanisms that a programmer can use to prevent logical conflicts. We demonstrate how these constructs can be mapped efficiently to existing STM data structures. Finally, we evaluate our system on a set of Java applications and data structures, demonstrating how open nesting can enhance application scalability.
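The abstract does not spell out the constructs themselves, so the following Java sketch is only an assumption-laden illustration of the idea: a transactional map insert whose size-counter update runs as an open-nested action under an abstract lock, so that inserts of different keys no longer physically conflict on the shared counter. The `Txn` facade, `openAtomic`, and the `"map-size"` lock name are hypothetical, not the paper's API.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch, not the paper's actual constructs.
final class OpenNestedSizeExample {
    // Assumed STM facade; real systems expose closed and open nesting differently.
    interface Txn {
        <T> T atomic(Callable<T> work) throws Exception;            // closed-nested transaction
        <T> T openAtomic(Callable<T> work,                          // open-nested: commits eagerly,
                         Object abstractLock,                       // guarded by a logical lock,
                         Runnable compensation) throws Exception;   // undone by compensation on abort
    }

    private final Txn stm;
    private long size;   // stand-in for a transactionally managed field

    OpenNestedSizeExample(Txn stm) { this.stm = stm; }

    void insert(Object key, Object value) throws Exception {
        stm.atomic(() -> {
            // ... transactional writes that link (key, value) into the map ...

            // The size update runs as an open-nested action: its write commits
            // immediately, the abstract lock "map-size" captures the logical
            // conflict, and the compensation undoes the increment if the outer
            // transaction later aborts.
            stm.openAtomic(() -> { size++; return null; },
                           "map-size",
                           () -> size--);
            return null;
        });
    }
}
```

Two inserts of different keys then conflict only if their abstract locks conflict, which is exactly the physical-versus-logical distinction the abstract describes.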

212 citations


Proceedings ArticleDOI
10 Jun 2007
TL;DR: The paper introduces a dynamic escape analysis that differentiates private and public data at runtime to make barriers cheaper and a static not-accessed-in-transaction analysis that removes many barriers completely; results on a set of Java programs show that strong atomicity can be implemented efficiently in a high-performance STM system.
Abstract: Transactional memory provides a new concurrency control mechanism that avoids many of the pitfalls of lock-based synchronization. High-performance software transactional memory (STM) implementations thus far provide weak atomicity: Accessing shared data both inside and outside a transaction can result in unexpected, implementation-dependent behavior. To guarantee isolation and consistent ordering in such a system, programmers are expected to enclose all shared-memory accesses inside transactions. A system that provides strong atomicity guarantees isolation even in the presence of threads that access shared data outside transactions. A strongly-atomic system also orders transactions with conflicting non-transactional memory operations in a consistent manner. In this paper, we discuss some surprising pitfalls of weak atomicity, and we present an STM system that avoids these problems via strong atomicity. We demonstrate how to implement non-transactional data accesses via efficient read and write barriers, and we present compiler optimizations that further reduce the overheads of these barriers. We introduce a dynamic escape analysis that differentiates private and public data at runtime to make barriers cheaper and a static not-accessed-in-transaction analysis that removes many barriers completely. Our results on a set of Java programs show that strong atomicity can be implemented efficiently in a high-performance STM system.
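As a rough illustration of what a read barrier for non-transactional accesses might look like, the sketch below assumes each shared object carries a transaction record whose word is an even version number when unlocked and odd when a transactional writer owns it; the names and layout are assumptions, and memory-fencing details are elided.

```java
// Minimal sketch under stated assumptions, not the paper's implementation.
final class StrongAtomicityBarriers {
    static final class SharedCell {
        volatile long txnWord;   // even = version number, odd = owned by an in-flight transaction
        int value;               // the data guarded by txnWord
    }

    // Barrier emitted at a non-transactional read of cell.value.
    static int readBarrier(SharedCell cell) {
        long before = cell.txnWord;              // snapshot the transaction record
        if ((before & 1L) == 0) {                // no transactional writer owns the cell
            int v = cell.value;                  // fast path: plain load of the field
            if (cell.txnWord == before) {        // record unchanged, so the load is consistent
                return v;
            }
        }
        return slowPathRead(cell);               // contend with an in-flight transaction
    }

    private static int slowPathRead(SharedCell cell) {
        // Wait for, or abort, the owning transaction and retry; omitted in this sketch.
        throw new UnsupportedOperationException("sketch only");
    }
}
```

The optimizations in the paper act on exactly this kind of barrier: the dynamic escape analysis lets provably private objects skip it at runtime, and the static not-accessed-in-transaction analysis removes it entirely for fields that no transaction ever touches.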

209 citations


Proceedings ArticleDOI
21 Mar 2007
TL;DR: This paper presents the architecture of McRT and discusses experiences with the system, including an experimental evaluation that led to several interesting, non-intuitive findings and key insights about the structure of the system stack at this scale.
Abstract: Hardware trends suggest that large-scale CMP architectures, with tens to hundreds of processing cores on a single piece of silicon, are imminent within the next decade. While existing CMP machines have traditionally been handled in the same way as SMPs, this magnitude of parallelism introduces several fundamental challenges at the architectural level and this, in turn, translates to novel challenges in the design of the software stack for these platforms. This paper presents the "Many Core Run Time" (McRT), a software prototype of an integrated language runtime that was designed to explore configurations of the software stack for enabling performance and scalability on large-scale CMP platforms. This paper presents the architecture of McRT and discusses our experiences with the system, including an experimental evaluation that led to several interesting, non-intuitive findings, providing key insights about the structure of the system stack at this scale. A key contribution of this paper is to demonstrate how McRT enables near-linear improvements in performance and scalability for desktop workloads such as the popular XviD encoder and a set of RMS (recognition, mining, and synthesis) applications. Another key contribution of this work is its use of McRT to explore non-traditional system configurations such as a light-weight executive in which McRT runs on "bare metal" and replaces the traditional OS. Such configurations are becoming an increasingly attractive way to leverage heterogeneous computing units, as seen in today's CPU-GPU configurations.

83 citations


Patent
30 Dec 2007
TL;DR: In this paper, a method and apparatus for efficient strong atomicity is described, in which optimized strong-atomicity operations are inserted at non-transactional read accesses.
Abstract: A method and apparatus for providing efficient strong atomicity is herein described. Optimized strong-atomicity operations may be inserted at non-transactional read accesses. A global transaction value is copied at the beginning of a non-transactional function to a local transaction value (LTV), essentially creating a local timestamp of the global transaction value. At a non-transactional memory access within the function, a counter or version value is compared to the LTV to determine whether a transaction has started updating memory locations, or specifically the memory location accessed. If the memory location has not been updated by a transaction, execution is accelerated by avoiding the full set of slowpath strong-atomicity operations that would otherwise ensure the validity of the accessed data. Otherwise, the slowpath operations may be executed to resolve contention between a transactional and a non-transactional access to the same memory location.
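A minimal sketch of that fast-path check follows, assuming a global counter that advances when transactions begin updating memory and per-location version numbers; the names (globalTxnValue, snapshotLtv, VersionedCell) are hypothetical rather than the patent's terminology.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of the LTV timestamp check; all names are assumptions.
final class LtvFastPath {
    // Advanced whenever a transaction begins updating shared memory.
    static final AtomicLong globalTxnValue = new AtomicLong();

    static final class VersionedCell {
        volatile long version;   // version of the last transactional write to this location
        int value;
    }

    // Called once at the beginning of a non-transactional function.
    static long snapshotLtv() {
        return globalTxnValue.get();   // the local transaction value (LTV), a local timestamp
    }

    // Barrier at a non-transactional read inside that function.
    static int read(long ltv, VersionedCell cell) {
        if (cell.version <= ltv) {
            // No transaction has updated this location since the LTV was taken,
            // so the plain load is valid and the full slow path can be skipped.
            return cell.value;
        }
        return slowPathRead(cell);     // possible conflict with a transaction: full barrier
    }

    private static int slowPathRead(VersionedCell cell) {
        // Full strong-atomicity machinery (validation, waiting, aborting); omitted here.
        throw new UnsupportedOperationException("sketch only");
    }
}
```

A non-transactional function would call snapshotLtv() once on entry and pass the resulting LTV to each of its reads, paying the slowpath cost only when the comparison fails.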

48 citations


Patent
28 Sep 2007
TL;DR: In this article, if a cache line is resident in the cache and it is not the first time that the line has been read since the last write, the data may be read directly from the cache line, improving performance.
Abstract: In accordance with some embodiments, software transactional memory may be used for both managed and unmanaged environments. If a cache line is resident in the cache and this is not the first time that the cache line has been read since the last write, then the data may be read directly from the cache line, improving performance. Otherwise, a normal read may be utilized to read the information. Similarly, write performance can be accelerated in some instances.
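Cache residency is a hardware property that plain Java code cannot observe, so the sketch below is only a software model of the decision the abstract describes: a per-line flag stands in for the "resident and already read since the last write" test, and the instrumented read stands in for the normal STM read barrier.

```java
// Purely illustrative model; the flag below is a stand-in, not a real cache query.
final class CacheAssistedStmRead {
    static final class Line {
        boolean readSinceLastWrite;   // cleared whenever the line is written
        int data;
    }

    static int read(Line line) {
        if (line.readSinceLastWrite) {
            return line.data;               // fast path: reuse the already-validated line
        }
        int v = instrumentedRead(line);     // normal STM read with its full bookkeeping
        line.readSinceLastWrite = true;     // later reads of this line may take the fast path
        return v;
    }

    static void write(Line line, int newData) {
        line.data = newData;                // a real system would also run the STM write barrier
        line.readSinceLastWrite = false;    // force the next read back onto the instrumented path
    }

    private static int instrumentedRead(Line line) {
        // Standard STM read barrier (logging, validation); details omitted in this sketch.
        return line.data;
    }
}
```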

41 citations