Journal ArticleDOI

Language support for lightweight transactions

26 Oct 2003 - Vol. 49, Iss. 4, pp. 388-402
TL;DR: It is argued that these problems can be addressed by moving to a declarative style of concurrency control in which programmers directly indicate the safety properties that they require; the resulting system is easier for mainstream programmers to use, prevents lock-based priority-inversion and deadlock problems, and can offer performance advantages.
Abstract: Concurrent programming is notoriously difficult. Current abstractions are intricate and make it hard to design computer systems that are reliable and scalable. We argue that these problems can be addressed by moving to a declarative style of concurrency control in which programmers directly indicate the safety properties that they require. In our scheme the programmer demarcates sections of code which execute within lightweight software-based transactions that commit atomically and exactly once. These transactions can update shared data, instantiate objects, invoke library features and so on. They can also block, waiting for arbitrary boolean conditions to become true. Transactions which do not access the same shared memory locations can commit concurrently. Furthermore, in general, no performance penalty is incurred for memory accesses outside transactions. We present a detailed design of this proposal along with an implementation and evaluation. We argue that the resulting system (i) is easier for mainstream programmers to use, (ii) prevents lock-based priority-inversion and deadlock problems and (iii) can offer performance advantages.
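
For concreteness, the abstract describes atomic blocks that can also wait on a boolean guard before running. The Java sketch below emulates only those semantics with a single monitor (the class and names are ours; the paper's system instead runs the bodies as software transactions, so non-conflicting calls commit concurrently rather than serializing on one lock):

```java
// A bounded buffer written against the intended semantics: each method
// behaves like an atomic section guarded by a boolean condition, roughly
// "run this block atomically once items.size() < CAP holds".
final class SharedBuffer {
    private static final int CAP = 16;
    private final java.util.ArrayDeque<Integer> items = new java.util.ArrayDeque<>();

    synchronized void put(int x) throws InterruptedException {
        while (items.size() >= CAP) wait();   // block until the guard holds
        items.addLast(x);                     // body executes atomically
        notifyAll();                          // let blocked guards re-evaluate
    }

    synchronized int take() throws InterruptedException {
        while (items.isEmpty()) wait();       // guard: buffer non-empty
        int v = items.removeFirst();
        notifyAll();
        return v;
    }
}
```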


Citations
Proceedings ArticleDOI
12 Oct 2005
TL;DR: A modern object-oriented programming language, X10, is designed for high-performance, high-productivity programming of NUCC systems; an overview of the X10 programming model and language, experience with the reference implementation, and results from some initial productivity comparisons between the X10 and Java™ languages are presented.
Abstract: It is now well established that the device scaling predicted by Moore's Law is no longer a viable option for increasing the clock frequency of future uniprocessor systems at the rate that had been sustained during the last two decades. As a result, future systems are rapidly moving from uniprocessor to multiprocessor configurations, so as to use parallelism instead of frequency scaling as the foundation for increased compute capacity. The dominant emerging multiprocessor structure for the future is a Non-Uniform Cluster Computing (NUCC) system with nodes that are built out of multi-core SMP chips with non-uniform memory hierarchies, and interconnected in horizontally scalable cluster configurations such as blade servers. Unlike previous generations of hardware evolution, this shift will have a major impact on existing software. Current OO language facilities for concurrent and distributed programming are inadequate for addressing the needs of NUCC systems because they do not support the notions of non-uniform data access within a node, or of tight coupling of distributed nodes. We have designed a modern object-oriented programming language, X10, for high performance, high productivity programming of NUCC systems. A member of the partitioned global address space family of languages, X10 highlights the explicit reification of locality in the form of places; lightweight activities embodied in async, future, foreach, and ateach constructs; a construct for termination detection (finish); the use of lock-free synchronization (atomic blocks); and the manipulation of cluster-wide global data structures. We present an overview of the X10 programming model and language, experience with our reference implementation, and results from some initial productivity comparisons between the X10 and Java™ languages.
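
As a rough illustration of the activity model named above, the sketch below approximates X10's finish { async S1; async S2; } idiom in plain Java with an ExecutorService; the class name and pool setup are ours, and X10's constructs are language-level features rather than library calls:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// "finish" waits for every spawned "async" activity to terminate;
// invokeAll gives the same join-all behavior over a thread pool.
public class FinishAsyncAnalogue {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            pool.invokeAll(java.util.List.of(                              // "finish"
                Executors.callable(() -> System.out.println("activity 1")), // "async"
                Executors.callable(() -> System.out.println("activity 2")))); // "async"
        } finally {
            pool.shutdown();
        }
    }
}
```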

1,469 citations

Proceedings ArticleDOI
C. Ranger, R. Raghuraman, A. Penmetsa, Gary Bradski, Christos Kozyrakis
10 Feb 2007
TL;DR: It is established that, given a careful implementation, MapReduce is a promising model for scalable performance on shared-memory systems with simple parallel code.
Abstract: This paper evaluates the suitability of the MapReduce model for multi-core and multi-processor systems. MapReduce was created by Google for application development on data-centers with thousands of servers. It allows programmers to write functional-style code that is automatically parallelized and scheduled in a distributed system. We describe Phoenix, an implementation of MapReduce for shared-memory systems that includes a programming API and an efficient runtime system. The Phoenix runtime automatically manages thread creation, dynamic task scheduling, data partitioning, and fault tolerance across processor nodes. We study Phoenix with multi-core and symmetric multiprocessor systems and evaluate its performance potential and error recovery features. We also compare MapReduce code to code written in lower-level APIs such as P-threads. Overall, we establish that, given a careful implementation, MapReduce is a promising model for scalable performance on shared-memory systems with simple parallel code.
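
To make the programming model concrete, here is a small shared-memory word count in Java parallel streams; Phoenix itself exposes a C API, so this only illustrates the map (emit a key per word) and reduce (count per key) structure the abstract describes:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

// Word count in the MapReduce style on shared memory: the runtime
// parallelizes and schedules the work, much as the Phoenix runtime
// does for its C API.
public class WordCount {
    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog the end";
        Map<String, Long> counts = Arrays.stream(text.split("\\s+"))
            .parallel()                                   // automatic parallelization
            .collect(Collectors.groupingByConcurrent(     // shared-memory reduce phase
                w -> w, Collectors.counting()));
        System.out.println(counts);                       // e.g. {the=3, fox=1, ...}
    }
}
```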

1,058 citations

Proceedings ArticleDOI
30 Sep 2008
TL;DR: This paper introduces the Stanford Transactional Applications for Multi-Processing (STAMP), a comprehensive benchmark suite for evaluating TM systems, and uses the suite to evaluate six different TM systems, identify their shortcomings, and motivate further research on their performance characteristics.
Abstract: Transactional Memory (TM) is emerging as a promising technology to simplify parallel programming. While several TM systems have been proposed in the research literature, we are still missing the tools and workloads necessary to analyze and compare the proposals. Most TM systems have been evaluated using microbenchmarks, which may not be representative of any real-world behavior, or individual applications, which do not stress a wide range of execution scenarios. We introduce the Stanford Transactional Applications for Multi-Processing (STAMP), a comprehensive benchmark suite for evaluating TM systems. STAMP includes eight applications and thirty variants of input parameters and data sets in order to represent several application domains and cover a wide range of transactional execution cases (frequent or rare use of transactions, large or small transactions, high or low contention, etc.). Moreover, STAMP is portable across many types of TM systems, including hardware, software, and hybrid systems. In this paper, we provide descriptions and a detailed characterization of the applications in STAMP. We also use the suite to evaluate six different TM systems, identify their shortcomings, and motivate further research on their performance characteristics.

934 citations

Book ChapterDOI
18 Sep 2006
TL;DR: TL2 is a software transactional memory (STM) algorithm based on a combination of commit-time locking and a novel global version-clock-based validation technique; on various benchmarks it is ten-fold faster than a single lock.
Abstract: The transactional memory programming paradigm is gaining momentum as the approach of choice for replacing locks in concurrent programming. This paper introduces the transactional locking II (TL2) algorithm, a software transactional memory (STM) algorithm based on a combination of commit-time locking and a novel global version-clock-based validation technique. TL2 improves on state-of-the-art STMs in the following ways: (1) unlike all other STMs, it fits seamlessly with any system's memory life-cycle, including those using malloc/free; (2) unlike all other lock-based STMs, it efficiently avoids periods of unsafe execution, that is, using its novel version-clock validation, user code is guaranteed to operate only on consistent memory states; and (3) in a sequence of high-performance benchmarks, while providing these new properties, it delivered overall performance comparable to (and in many cases better than) that of all former STM algorithms, both lock-based and non-blocking. Perhaps more importantly, on various benchmarks, TL2 delivers performance that is competitive with the best hand-crafted fine-grained concurrent structures. Specifically, it is ten-fold faster than a single lock. We believe these characteristics make TL2 a viable candidate for deployment of transactional memory today, long before hardware transactional support is available.
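
A heavily simplified Java sketch of the two ingredients named above, commit-time locking and global version-clock validation, is given below; the names (TVar, Txn) are ours, and real TL2 also brackets each read with version re-checks and adds contention management:

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Simplified TL2-style transaction: sample the global clock at start,
// validate versions on read, lock the write set and re-validate at commit.
public class TL2Sketch {
    static final AtomicLong globalClock = new AtomicLong(0);
    static class Abort extends RuntimeException {}

    static final class TVar {                  // one transactional word
        volatile long value;
        volatile long version;                 // clock value of last commit
        final ReentrantLock lock = new ReentrantLock();
    }

    static final class Txn {
        final long rv = globalClock.get();     // read-version sampled at start
        final Map<TVar, Long> writes = new LinkedHashMap<>();
        final Set<TVar> reads = new HashSet<>();

        long read(TVar v) {
            Long pending = writes.get(v);
            if (pending != null) return pending;     // read-your-own-write
            long val = v.value;
            if (v.version > rv || v.lock.isLocked()) // possibly inconsistent state:
                throw new Abort();                   // abort instead of exposing it
            reads.add(v);
            return val;
        }

        void write(TVar v, long val) { writes.put(v, val); } // buffered until commit

        void commit() {
            List<TVar> locked = new ArrayList<>();
            try {
                for (TVar v : writes.keySet()) {          // 1. lock the write set
                    if (!v.lock.tryLock()) throw new Abort();
                    locked.add(v);
                }
                long wv = globalClock.incrementAndGet();  // 2. bump the global clock
                for (TVar v : reads)                      // 3. re-validate the read set
                    if (v.version > rv
                            || (v.lock.isLocked() && !v.lock.isHeldByCurrentThread()))
                        throw new Abort();
                for (Map.Entry<TVar, Long> e : writes.entrySet()) { // 4. write back
                    e.getKey().value = e.getValue();
                    e.getKey().version = wv;              // stamp with write-version
                }
            } finally {
                for (TVar v : locked) v.lock.unlock();
            }
        }
    }
}
```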

891 citations

Proceedings ArticleDOI
05 Mar 2011
TL;DR: A lightweight, high-performance persistent object system called NV-heaps is implemented that provides transactional semantics while preventing these errors and providing a model for persistence that is easy to use and reason about.
Abstract: Persistent, user-defined objects present an attractive abstraction for working with non-volatile program state. However, the slow speed of persistent storage (i.e., disk) has restricted their design and limited their performance. Fast, byte-addressable, non-volatile technologies, such as phase change memory, will remove this constraint and allow programmers to build high-performance, persistent data structures in non-volatile storage that is almost as fast as DRAM. Creating these data structures requires a system that is lightweight enough to expose the performance of the underlying memories but also ensures safety in the presence of application and system failures by avoiding familiar bugs such as dangling pointers, multiple free()s, and locking errors. In addition, the system must prevent new types of hard-to-find pointer safety bugs that only arise with persistent objects. These bugs are especially dangerous since any corruption they cause will be permanent. We have implemented a lightweight, high-performance persistent object system called NV-heaps that provides transactional semantics while preventing these errors and providing a model for persistence that is easy to use and reason about. We implement search trees, hash tables, sparse graphs, and arrays using NV-heaps, BerkeleyDB, and Stasis. Our results show that NV-heap performance scales with thread count and that data structures implemented using NV-heaps outperform BerkeleyDB and Stasis implementations by 32x and 244x, respectively, by avoiding the operating system and minimizing other software overheads. We also quantify the cost of enforcing the safety guarantees that NV-heaps provide and measure the costs of NV-heap primitive operations.
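
One way to picture the transactional-semantics requirement is an undo log: record the old value before each in-place update, discard the log on commit, replay it on abort. The Java sketch below shows only that logging discipline; all names are hypothetical, and NV-heaps itself is a C++ system in which the log entries would be flushed to non-volatile memory before the update:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Undo-log discipline: log the old value before mutating in place, so an
// interrupted transaction can be rolled back rather than leaving the
// (notionally persistent) state corrupted.
final class PersistentCell {
    long value;                               // stands in for a field in NVM
}

final class UndoLogTxn {
    private final Deque<Runnable> undo = new ArrayDeque<>();

    void set(PersistentCell c, long v) {
        final long old = c.value;
        undo.push(() -> c.value = old);       // 1. log the old value first
        c.value = v;                          // 2. then update in place
    }

    void commit() { undo.clear(); }           // success: forget the log
    void abort() {                            // failure: roll back, newest first
        while (!undo.isEmpty()) undo.pop().run();
    }
}
```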

850 citations

References
Book
01 Dec 1989
TL;DR: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today.
Abstract: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today. In this edition, the authors bring their trademark method of quantitative analysis not only to high-performance desktop machine design, but also to the design of embedded and server systems. They have illustrated their principles with designs from all three of these domains, including examples from consumer electronics, multimedia and Web technologies, and high-performance computing.

11,671 citations


"Language support for lightweight tr..." refers background in this paper

  • ...This is important to ensure effective caching [10]....


  • ...Modern cache coherence protocols allow multiple CPUs to concurrently hold the same cache block, so long as they do not attempt to write to it [10]....


Journal ArticleDOI
TL;DR: It is suggested that input and output are basic primitives of programming and that parallel composition of communicating sequential processes is a fundamental program structuring method.
Abstract: This paper suggests that input and output are basic primitives of programming and that parallel composition of communicating sequential processes is a fundamental program structuring method. When combined with a development of Dijkstra's guarded command, these concepts are surprisingly versatile. Their use is illustrated by sample solutions of a variety of a familiar programming exercises.
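
CSP's communication primitive is an unbuffered, synchronous handshake between processes. A rough Java analogue (our example, not Hoare's notation) uses SynchronousQueue, whose put blocks until a matching take arrives:

```java
import java.util.concurrent.SynchronousQueue;

// CSP-style unbuffered communication: sender and receiver must rendezvous.
// SynchronousQueue has no capacity, so put() blocks until take() arrives.
public class Rendezvous {
    public static void main(String[] args) throws InterruptedException {
        SynchronousQueue<String> chan = new SynchronousQueue<>();
        Thread producer = new Thread(() -> {
            try {
                chan.put("hello");            // blocks until a receiver is ready
            } catch (InterruptedException ignored) {}
        });
        producer.start();
        System.out.println(chan.take());      // completes the handshake
        producer.join();
    }
}
```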

11,419 citations


"Language support for lightweight tr..." refers background in this paper

  • ...There is a substantial body of work on theoretical models of concurrency, of which Hoare’s Communicating Sequential Processes [15] is a major starting point....


Book
01 Jan 1985

9,210 citations

Book
19 Sep 1996
TL;DR: In this book, the authors present a detailed overview of the Java Virtual Machine, including the internal structure of the class file format, the internal form of Fully Qualified Class and Interface names, and the creation of new class instances.
Abstract: A comprehensive specification of the Java Virtual Machine, covering Java programming language concepts; the structure of the virtual machine (runtime data areas, frames, and the instruction set); the class file format and its verification; loading, linking, and initialization; the full instruction set; compiling for the JVM; and the threads-and-locks memory model.

3,111 citations

Proceedings ArticleDOI
01 May 1993
TL;DR: Simulation results show that transactional memory matches or outperforms the best known locking techniques for simple benchmarks, even in the absence of priority inversion, convoying, and deadlock.
Abstract: A shared data structure is lock-free if its operations do not require mutual exclusion. If one process is interrupted in the middle of an operation, other processes will not be prevented from operating on that object. In highly concurrent systems, lock-free data structures avoid common problems associated with conventional locking techniques, including priority inversion, convoying, and difficulty of avoiding deadlock. This paper introduces transactional memory, a new multiprocessor architecture intended to make lock-free synchronization as efficient (and easy to use) as conventional techniques based on mutual exclusion. Transactional memory allows programmers to define customized read-modify-write operations that apply to multiple, independently-chosen words of memory. It is implemented by straightforward extensions to any multiprocessor cache-coherence protocol. Simulation results show that transactional memory matches or outperforms the best known locking techniques for simple benchmarks, even in the absence of priority inversion, convoying, and deadlock.
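
The usage pattern the paper describes, transactionally read several words, compute tentative writes, then commit atomically or retry, can be approximated in software. In the Java sketch below (names are ours), a compare-and-swap over an immutable pair stands in for the paper's LT (load-transactional) and COMMIT hardware primitives:

```java
import java.util.concurrent.atomic.AtomicReference;

// Retry loop in the shape of a transaction: snapshot two words, compute
// tentative new values, attempt an atomic commit, and retry on conflict.
final class TwoWordCounter {
    private static final class Pair {
        final long x, y;
        Pair(long x, long y) { this.x = x; this.y = y; }
    }

    private final AtomicReference<Pair> state =
        new AtomicReference<>(new Pair(10, 0));

    void transferOne() {
        while (true) {                           // transaction retry loop
            Pair s = state.get();                // "LT": read both words
            Pair n = new Pair(s.x - 1, s.y + 1); // tentative writes
            if (state.compareAndSet(s, n))       // "COMMIT": atomic if unchanged
                return;
            // CAS failed: another thread interfered, like an aborted transaction
        }
    }
}
```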

2,406 citations


"Language support for lightweight tr..." refers background or methods in this paper

  • ...One promising example is transactional memory [14], which allows memory accesses to be grouped into transactions which either commit, becoming globally visible at the same instant in time, or abort without being observed....


  • ...already been made for hardware transactional memories, with suggestions for implementation techniques based on extended cache coherence protocols [14]....
