
Showing papers on "Concurrency control" published in 2008


Proceedings ArticleDOI
09 Jun 2008
TL;DR: Overall, overheads and optimizations that explain a total difference of about a factor of 20x in raw performance are identified, and it is shown that there is no single "high pole in the tent" in modern (memory resident) database systems, but that substantial time is spent in logging, latching, locking, B-tree, and buffer management operations.
Abstract: Online Transaction Processing (OLTP) databases include a suite of features - disk-resident B-trees and heap files, locking-based concurrency control, support for multi-threading - that were optimized for computer technology of the late 1970's. Advances in modern processors, memories, and networks mean that today's computers are vastly different from those of 30 years ago, such that many OLTP databases will now fit in main memory, and most OLTP transactions can be processed in milliseconds or less. Yet database architecture has changed little. Based on this observation, we look at some interesting variants of conventional database systems that one might build that exploit recent hardware trends, and speculate on their performance through a detailed instruction-level breakdown of the major components involved in a transaction processing database system (Shore) running a subset of TPC-C. Rather than simply profiling Shore, we progressively modified it so that after every feature removal or optimization, we had a (faster) working system that fully ran our workload. Overall, we identify overheads and optimizations that explain a total difference of about a factor of 20x in raw performance. We also show that there is no single "high pole in the tent" in modern (memory resident) database systems, but that substantial time is spent in logging, latching, locking, B-tree, and buffer management operations.

331 citations


Journal ArticleDOI
TL;DR: The promise of STM may well be undermined by its overheads and limited workload applicability.
Abstract: TM (transactional memory) is a concurrency control paradigm that provides atomic and isolated execution for regions of code. TM is considered by many researchers to be one of the most promising solutions to address the problem of programming multicore processors.

252 citations


Journal ArticleDOI
01 Nov 2008
TL;DR: In this paper, a novel method is proposed that provides small-size controllers, based on a set covering approach that conveniently relates siphons and markings, and is compared with other methods proposed in the literature.
Abstract: Deadlock prevention is a crucial step in the modeling of flexible manufacturing systems. In the Petri net framework, deadlock prevention policies based on siphon control are often employed, since it is easy to specify generalized mutual exclusion constraints that avoid the emptying of siphons. However, such policies may require an excessive computational load and result in impractical, oversized control subnets. This is often a consequence of the redundancy in the control conditions derived from siphons. In this paper, a novel method is proposed that provides small-size controllers, based on a set covering approach that conveniently relates siphons and markings. Some examples are provided to demonstrate the feasibility of the approach and to compare it with other methods proposed in the literature.

222 citations
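
To make the siphon machinery concrete: a siphon is a set of places that, once emptied, can never regain tokens, which is why these policies guard against its emptying. Below is a minimal, illustrative Java check of the defining property (every transition that feeds the set also drains it); the net encoding and names are hypothetical, and this is not the paper's set covering construction.

```java
import java.util.*;

/** Minimal Petri net siphon check: a place set S is a siphon if every
 *  transition that adds a token to S also removes one from S, so an
 *  empty siphon can never refill. Illustrative sketch only. */
public class SiphonCheck {
    // pre.get(t) = input places of transition t; post.get(t) = output places
    static boolean isSiphon(Set<Integer> s, List<Set<Integer>> pre, List<Set<Integer>> post) {
        for (int t = 0; t < pre.size(); t++) {
            boolean addsToS = post.get(t).stream().anyMatch(s::contains);
            boolean takesFromS = pre.get(t).stream().anyMatch(s::contains);
            if (addsToS && !takesFromS) return false; // t could refill S from outside
        }
        return true;
    }

    public static void main(String[] args) {
        // Two places {0,1}, two transitions: t0 moves 0 -> 1, t1 moves 1 -> 0
        List<Set<Integer>> pre = List.of(Set.of(0), Set.of(1));
        List<Set<Integer>> post = List.of(Set.of(1), Set.of(0));
        System.out.println(isSiphon(Set.of(0, 1), pre, post)); // true: closed cycle
        System.out.println(isSiphon(Set.of(0), pre, post));    // false: t1 refills place 0
    }
}
```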


Journal ArticleDOI
01 Jan 2008
TL;DR: A deadlock control policy is proposed and proved to be computationally efficient and less conservative than the existing policies in the literature and an industrial case study is used to show the results.
Abstract: In many flexible assembly systems, base components are transported with pallets; parts to be mounted onto the base ones are transported by trays with no pallets. When an assembly operation is performed by using some parts in a tray but not all, the tray with the remaining parts still occupies a buffer space. In this way, an assembly/disassembly material flow is formed. In such a material flow, deadlock can occur both in the base component and part flow. Furthermore, the assembly operations can also result in a deadlock. Thus, it is a great challenge to tackle deadlocks in such processes. This paper models them using resource-oriented Petri nets. Based on the models, a deadlock control policy is proposed and proved to be computationally efficient and less conservative than the existing policies in the literature. An industrial case study is used to show the results.

215 citations


Journal ArticleDOI
TL;DR: It is observed that the overall performance of TM is significantly worse at low levels of parallelism, which is likely to limit the adoption of this programming paradigm.
Abstract: TM (transactional memory) is a concurrency control paradigm that provides atomic and isolated execution for regions of code. TM is considered by many researchers to be one of the most promising solutions to address the problem of programming multicore processors. Its most appealing feature is that most programmers only need to reason locally about shared data accesses, mark the code region to be executed transactionally, and let the underlying system ensure the correct concurrent execution. This model promises to provide the scalability of fine-grain locking, while avoiding common pitfalls of lock composition such as deadlock. In this article we explore the performance of a highly optimized STM and observe that the overall performance of TM is significantly worse at low levels of parallelism, which is likely to limit the adoption of this programming paradigm.

177 citations


Journal ArticleDOI
01 Jun 2008
TL;DR: FlexTM (FLEXible Transactional Memory) is described, together with an STM-inspired protocol that uses CSTs to manage conflicts in a distributed manner (no global arbitration) and allows parallel commits; its distributed commit protocol is also more efficient than a central hardware manager.
Abstract: A high-concurrency transactional memory (TM) implementation needs to track concurrent accesses, buffer speculative updates, and manage conflicts. We present a system, FlexTM (FLEXible Transactional Memory), that coordinates four decoupled hardware mechanisms: read and write signatures, which summarize per-thread access sets; per-thread conflict summary tables (CSTs), which identify the threads with which conflicts have occurred; Programmable Data Isolation, which maintains speculative updates in the local cache and employs a thread-private buffer (in virtual memory) in the rare event of overflow; and Alert-On-Update, which selectively notifies threads about coherence events. All mechanisms are software-accessible, to enable virtualization and to support transactions of arbitrary length. FlexTM allows software to determine when to manage conflicts (either eagerly or lazily), and to employ a variety of conflict management and commit protocols. We describe an STM-inspired protocol that uses CSTs to manage conflicts in a distributed manner (no global arbitration) and allows parallel commits. In experiments with a prototype on Simics/GEMS, FlexTM exhibits 5x speedup over high-quality software TM, with no loss in policy flexibility. Its distributed commit protocol is also more efficient than a central hardware manager. Our results highlight the importance of flexibility in determining when to manage conflicts: lazy maximizes concurrency and helps to ensure forward progress, while eager provides better overall utilization in a multi-programmed system.

141 citations
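
As a rough illustration of the first of FlexTM's mechanisms, a read or write signature can be sketched in software as a Bloom-filter-like bit vector: intersection tests may report false positives (flagging a conflict that never happened) but never false negatives. The size, hash functions, and class names below are assumptions for the sketch, not FlexTM's hardware design.

```java
import java.util.BitSet;

/** Hedged sketch of a per-thread access signature in the spirit of
 *  FlexTM: each thread summarizes its access set in a Bloom-filter-like
 *  bit vector; a conflict is flagged when two signatures may intersect. */
public class AccessSignature {
    private static final int BITS = 1024;
    private final BitSet bits = new BitSet(BITS);

    private static int hash(long addr, int seed) {
        long h = addr * 0x9E3779B97F4A7C15L + seed;
        return (int) Math.floorMod(h ^ (h >>> 32), (long) BITS);
    }

    public void add(long addr) {           // record an access
        bits.set(hash(addr, 17));
        bits.set(hash(addr, 31));
    }

    /** May-intersect test: false positives possible, false negatives not. */
    public boolean conflictsWith(AccessSignature other) {
        BitSet copy = (BitSet) bits.clone();
        copy.and(other.bits);
        return !copy.isEmpty();
    }

    public static void main(String[] args) {
        AccessSignature writerA = new AccessSignature();
        AccessSignature readerB = new AccessSignature();
        writerA.add(0x1000);
        readerB.add(0x2000);
        System.out.println(writerA.conflictsWith(readerB)); // false unless hashes collide
        readerB.add(0x1000);
        System.out.println(writerA.conflictsWith(readerB)); // true: shared address
    }
}
```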


Proceedings ArticleDOI
19 Oct 2008
TL;DR: A software transactional memory system that introduces first-class C++ language constructs for transactional programming, a production-quality optimizing C++ compiler that translates and optimizes these extensions, and a high-performance STM runtime library are presented.
Abstract: This paper presents a software transactional memory system that introduces first-class C++ language constructs for transactional programming. We describe new C++ language extensions, a production-quality optimizing C++ compiler that translates and optimizes these extensions, and a high-performance STM runtime library. The transactional language constructs support C++ language features including classes, inheritance, virtual functions, exception handling, and templates. The compiler automatically instruments the program for transactional execution and optimizes TM overheads. The runtime library implements multiple execution modes and implements a novel STM algorithm that supports both optimistic and pessimistic concurrency control. The runtime switches a transaction's execution mode dynamically to improve performance and to handle calls to precompiled functions and I/O libraries. We present experimental results on 8 cores (two quad-core CPUs) running a set of 20 non-trivial parallel programs. Our measurements show that our system scales well as the number of cores increases and that our compiler and runtime optimizations improve scalability.

135 citations


Proceedings ArticleDOI
10 May 2008
TL;DR: This paper presents a dynamic analysis for detecting violations of atomic-set serializability, and shows that a set of problematic data access patterns characterize executions that are not atomic-set serializable.
Abstract: Previously we presented atomic sets, memory locations that share some consistency property, and units of work, code fragments that preserve consistency of atomic sets on which they are declared. We also proposed atomic-set serializability as a correctness criterion for concurrent programs, stating that units of work must be serializable for each atomic set. We showed that a set of problematic data access patterns characterize executions that are not atomic-set serializable. Our criterion subsumes data races (single-location atomic sets) and serializability (all locations in one set). In this paper, we present a dynamic analysis for detecting violations of atomic-set serializability. The analysis can be implemented efficiently, and does not depend on any specific synchronization mechanism. We implemented the analysis and evaluated it on a suite of real programs and benchmarks. We found a number of known errors as well as several problems not previously reported.

123 citations
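
To give a feel for the kind of pattern the analysis looks for, the toy checker below scans a logged access trace for one classic violation: a unit of work reads a location, another unit writes it, and the first unit then writes it based on the stale read. The trace encoding and the brute-force scan are illustrative assumptions; the paper defines a complete pattern set and an efficient detector.

```java
import java.util.*;

/** Toy illustration of dynamic atomic-set-serializability checking:
 *  flag the interleaving R_U(l) W_V(l) W_U(l), where unit V writes
 *  location l between a read and a write of l by unit U. */
public class AtomicSetChecker {
    record Access(int unit, char kind, String loc) {}   // kind: 'R' or 'W'

    static boolean hasViolation(List<Access> trace) {
        for (int i = 0; i < trace.size(); i++)
            for (int j = i + 1; j < trace.size(); j++)
                for (int k = j + 1; k < trace.size(); k++) {
                    Access a = trace.get(i), b = trace.get(j), c = trace.get(k);
                    if (a.unit() == c.unit() && a.unit() != b.unit()
                            && a.kind() == 'R' && b.kind() == 'W' && c.kind() == 'W'
                            && a.loc().equals(b.loc()) && b.loc().equals(c.loc()))
                        return true;  // stale read: U's write used an old value
                }
        return false;
    }

    public static void main(String[] args) {
        List<Access> trace = List.of(
            new Access(1, 'R', "balance"),
            new Access(2, 'W', "balance"),   // interleaved writer
            new Access(1, 'W', "balance"));
        System.out.println(hasViolation(trace)); // true
    }
}
```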


Journal ArticleDOI
01 May 2008
TL;DR: A polynomial-time algorithm for finding the set of elementary siphons is proposed, which avoids complete siphon enumeration, and it is shown that a dependent siphon can always be controlled by properly supervising its elementary siphons.
Abstract: As a structural object, siphons are well recognized in the analysis and control of deadlocks in resource allocation systems modeled with Petri nets. Many deadlock prevention policies characterize the deadlock behavior of the systems in terms of siphons and utilize this characterization to avoid deadlocks. This paper develops a novel methodology to find interesting siphons for deadlock control purposes in a class of Petri nets, i.e., a system of simple sequential processes with resources (S3PR). Resource circuits in an S3PR are first detected, from which, in general, a small portion of emptiable minimal siphons can be derived. The remaining emptiable ones can be found by their composition. A polynomial-time algorithm for finding the set of elementary siphons is proposed, which avoids complete siphon enumeration. It is shown that a dependent siphon can always be controlled by properly supervising its elementary siphons. A computationally efficient deadlock control policy is accordingly developed. Experimental study shows the efficiency of the proposed siphon computation approach.

119 citations


Journal ArticleDOI
01 Aug 2008
TL;DR: This paper describes how a temporal indexing technique, the TSB-tree, was integrated into Immortal DB to serve as the core access method, creating a high-performance transaction time database system built into an RDBMS engine, which has not been achieved before.
Abstract: Immortal DB is a transaction time database system designed to enable high performance for temporal applications. It is built into a commercial database engine, Microsoft SQL Server. This paper describes how we integrated a temporal indexing technique, the TSB-tree, into Immortal DB to serve as the core access method. The TSB-tree provides high performance access and update for both current and historical data. A main challenge was integrating TSB-tree functionality while preserving original B+tree functionality, including concurrency control and recovery. We discuss the overall architecture, including our unique treatment of index terms, and practical issues such as uncommitted data and log management. Performance is a primary concern. To increase performance, versions are locally delta compressed, exploiting the commonality between adjacent versions of the same record. This technique is also applied to index terms in index pages. There is a tradeoff between query performance and storage space. We discuss optimizing performance regarding this tradeoff throughout the paper. The result of our efforts is a high-performance transaction time database system built into an RDBMS engine, which has not been achieved before. We include a thorough experimental study and analysis that confirms the very good performance that it achieves.

115 citations
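
The delta-compression idea is simple to sketch: store a new version as the length of the prefix it shares with its predecessor plus the changed tail. The toy scheme below is an assumption-laden stand-in; Immortal DB's actual record format is not reproduced here.

```java
import java.util.Arrays;

/** Illustrative sketch of delta-compressing a record version against
 *  its predecessor: keep only a shared-prefix length and the new tail. */
public class VersionDelta {
    final int prefixLen;   // bytes shared with the previous version
    final byte[] tail;     // remaining bytes of the new version

    VersionDelta(byte[] prev, byte[] next) {
        int n = Math.min(prev.length, next.length), p = 0;
        while (p < n && prev[p] == next[p]) p++;
        prefixLen = p;
        tail = Arrays.copyOfRange(next, p, next.length);
    }

    byte[] reconstruct(byte[] prev) {
        byte[] out = new byte[prefixLen + tail.length];
        System.arraycopy(prev, 0, out, 0, prefixLen);
        System.arraycopy(tail, 0, out, prefixLen, tail.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] v1 = "balance=100;name=ann".getBytes();
        byte[] v2 = "balance=150;name=ann".getBytes();
        VersionDelta d = new VersionDelta(v1, v2);
        // assuming a 4-byte prefix-length field
        System.out.println("stored bytes: " + (4 + d.tail.length) + " vs " + v2.length);
        System.out.println(new String(d.reconstruct(v1)));
    }
}
```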


Proceedings ArticleDOI
08 Nov 2008
TL;DR: A model and implementation of dependence-aware transactional memory (DATM) are presented as a novel solution to the problem of scaling under contention, and it is shown that DATM increases concurrency, for example by reducing the runtime of STAMP benchmarks by up to 39% and reducing transaction restarts by up to 94%.
Abstract: Transactional memory (TM) is a promising paradigm for helping programmers take advantage of emerging multi-core platforms. Though they perform well under low contention, hardware TM systems have a reputation of not performing well under high contention, as compared to locks. This paper presents a model and implementation of dependence-aware transactional memory (DATM), a novel solution to the problem of scaling under contention. Unlike many proposals to deal with write-shared data (which arise in common data structures like counters and linked lists), DATM operates transparently to the programmer. The main idea in DATM is to accept any transaction execution interleaving that is conflict serializable, including interleavings that contain simple conflicts. Current TM systems reduce useful concurrency by restarting conflicting transactions, even if the execution interleaving is conflict serializable. DATM manages dependences between uncommitted transactions, sometimes forwarding data between them to safely commit conflicting transactions. The evaluation of our prototype shows that DATM increases concurrency, for example by reducing the runtime of STAMP benchmarks by up to 39% and reducing transaction restarts by up to 94%.
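
DATM's admission criterion, conflict serializability, has a classic operational test: record a dependence edge between transactions whenever their operations conflict, and accept the interleaving as long as the dependence graph stays acyclic. The sketch below implements only that textbook test; DATM's data forwarding and hardware mechanisms are not modeled.

```java
import java.util.*;

/** Conflict serializability is equivalent to acyclicity of the
 *  transaction dependence graph; this is the textbook DFS cycle test. */
public class DependenceGraph {
    private final Map<Integer, Set<Integer>> edges = new HashMap<>();

    /** Record a conflict dependence: 'from' must commit before 'to'. */
    public void addDependence(int from, int to) {
        edges.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    public boolean isSerializable() {      // true iff no cycle
        Map<Integer, Integer> state = new HashMap<>(); // 0=unseen 1=on stack 2=done
        for (Integer v : edges.keySet())
            if (state.getOrDefault(v, 0) == 0 && hasCycle(v, state)) return false;
        return true;
    }

    private boolean hasCycle(int v, Map<Integer, Integer> state) {
        state.put(v, 1);
        for (int w : edges.getOrDefault(v, Set.of())) {
            int s = state.getOrDefault(w, 0);
            if (s == 1 || (s == 0 && hasCycle(w, state))) return true;
        }
        state.put(v, 2);
        return false;
    }

    public static void main(String[] args) {
        DependenceGraph g = new DependenceGraph();
        g.addDependence(1, 2);             // T1 wrote x before T2 read x
        g.addDependence(2, 3);
        System.out.println(g.isSerializable()); // true: commit order T1,T2,T3
        g.addDependence(3, 1);             // closing the cycle forces a restart
        System.out.println(g.isSerializable()); // false
    }
}
```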

Proceedings ArticleDOI
09 Jun 2008
TL;DR: A modification to the concurrency control algorithm of a database management system is described that automatically detects and prevents snapshot isolation anomalies at runtime for arbitrary applications, thus providing serializable isolation.
Abstract: Many popular database management systems offer snapshot isolation rather than full serializability. There are well-known anomalies permitted by snapshot isolation that can lead to violations of data consistency by interleaving transactions that individually maintain consistency. Until now, the only way to prevent these anomalies was to modify the applications by introducing artificial locking or update conflicts, following careful analysis of conflicts between all pairs of transactions. This paper describes a modification to the concurrency control algorithm of a database management system that automatically detects and prevents snapshot isolation anomalies at runtime for arbitrary applications, thus providing serializable isolation. The new algorithm preserves the properties that make snapshot isolation attractive, including that readers do not block writers and vice versa. An implementation and performance study of the algorithm are described, showing that the throughput approaches that of snapshot isolation in most cases.
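
The algorithm builds on the observation that every snapshot isolation anomaly involves a pivot transaction with both an incoming and an outgoing read-write antidependency. The sketch below shows only that flag bookkeeping, with invented names and none of a real engine's lock-manager integration.

```java
/** Bare sketch of pivot detection for serializable snapshot isolation:
 *  track rw-antidependencies per transaction and abort one that
 *  acquires both an incoming and an outgoing edge. */
public class SsiTransaction {
    boolean inConflict;   // some concurrent txn has an rw-edge into us
    boolean outConflict;  // we read something a concurrent txn overwrote
    boolean aborted;

    /** Called when reader R has an rw-antidependency to writer W. */
    static void onRwAntidependency(SsiTransaction reader, SsiTransaction writer) {
        reader.outConflict = true;
        writer.inConflict = true;
        // A transaction with both flags is a potential pivot: abort it.
        if (reader.inConflict && reader.outConflict) reader.aborted = true;
        if (writer.inConflict && writer.outConflict) writer.aborted = true;
    }

    public static void main(String[] args) {
        SsiTransaction t1 = new SsiTransaction(), t2 = new SsiTransaction(),
                       t3 = new SsiTransaction();
        onRwAntidependency(t1, t2); // T1 read x, T2 wrote x
        onRwAntidependency(t2, t3); // T2 read y, T3 wrote y -> T2 is a pivot
        System.out.println("T2 aborted: " + t2.aborted); // true
    }
}
```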

Proceedings ArticleDOI
07 Apr 2008
TL;DR: This work evaluates the performance impact of proposals for modifying the application programs, without changing their semantics, so that they are certain to execute serializably even on an engine that uses SI, and gives guidelines on which conflicts to introduce so as to ensure correctness with little impact on performance.
Abstract: Several common DBMS engines use the multi-version concurrency control mechanism called Snapshot Isolation, even though application programs can experience non-serializable executions when run concurrently on such a platform. Several proposals exist for modifying the application programs, without changing their semantics, so that they are certain to execute serializably even on an engine that uses SI. We evaluate the performance impact of these proposals, and find that some have limited impact (only a few percent drop in throughput at a given multi-programming level) while others lead to a much greater reduction in throughput of up to 60% in high contention scenarios. We present experimental results for both an open-source and a commercial engine. We relate these to the theory, giving guidelines on which conflicts to introduce so as to ensure correctness with little impact on performance.


Proceedings ArticleDOI
20 Feb 2008
TL;DR: A run-time tool that computes the dependence densities from a deterministic single-threaded program execution provides insights into the potential for optimistic parallelization, opportunities for algorithmic scheduling, and performance defects due to synchronization bottlenecks.
Abstract: This work presents a quantitative approach to analyze parallelization opportunities in programs with irregular memory accesses where potential data dependencies mask available parallelism. The model captures data and causal dependencies among critical sections as algorithmic properties and quantifies them as a density computed over the number of executed instructions. The model abstracts from runtime aspects such as scheduling, the number of threads, and the concurrency control used in a particular parallelization. We illustrate the model on several applications requiring ordered and unordered execution of critical sections. We describe a run-time tool that computes the dependence densities from a deterministic single-threaded program execution. This density metric provides insights into the potential for optimistic parallelization, opportunities for algorithmic scheduling, and performance defects due to synchronization bottlenecks. Based on the results of our analysis, we classify applications into three categories with low, medium, and high dependence densities. Applications with low dependence density are naturally good candidates for optimistic concurrency, applications with medium density may require a scheduler that is aware of the algorithmic dependencies for optimistic concurrency to be effective, and applications with high dependence density may not be suitable for parallelization.
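
A hedged reconstruction of the metric: given a single-threaded trace of critical sections, each with its accessed locations and instruction count, count the dependent section pairs and normalize by instructions executed. The record layout and the exact normalization below are assumptions; the paper's precise definition may differ.

```java
import java.util.*;

/** Sketch of a dependence-density style metric over a critical-section
 *  trace: dependent pairs divided by instructions executed. */
public class DependenceDensity {
    record Section(Set<String> accesses, long instructions) {}

    static double density(List<Section> trace) {
        long deps = 0, instrs = 0;
        for (int i = 0; i < trace.size(); i++) {
            instrs += trace.get(i).instructions();
            for (int j = i + 1; j < trace.size(); j++)
                if (!Collections.disjoint(trace.get(i).accesses(),
                                          trace.get(j).accesses()))
                    deps++;   // sections i and j touch common data
        }
        return instrs == 0 ? 0 : (double) deps / instrs;
    }

    public static void main(String[] args) {
        List<Section> trace = List.of(
            new Section(Set.of("a", "b"), 1000),
            new Section(Set.of("c"), 1000),
            new Section(Set.of("b"), 1000));   // depends on the first section
        System.out.printf("density = %.6f%n", density(trace)); // 1 dep / 3000 instrs
    }
}
```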

Proceedings ArticleDOI
07 Jun 2008
TL;DR: It is shown that, under certain conditions, the verification problem can be reduced to a finite-state problem, and the use of the method is illustrated by proving the correctness of several STMs, including two-phase locking, DSTM, TL2, and optimistic concurrency control.
Abstract: Model checking software transactional memories (STMs) is difficult because of the unbounded number, length, and delay of concurrent transactions and the unbounded size of the memory. We show that, under certain conditions, the verification problem can be reduced to a finite-state problem, and we illustrate the use of the method by proving the correctness of several STMs, including two-phase locking, DSTM, TL2, and optimistic concurrency control. The safety properties we consider include strict serializability and opacity; the liveness properties include obstruction freedom, livelock freedom, and wait freedom. Our main contribution lies in the structure of the proofs, which are largely automated and not restricted to the STMs mentioned above. In a first step we show that every STM that enjoys certain structural properties either violates a safety or liveness requirement on some program with two threads and two shared variables, or satisfies the requirement on all programs. In the second step we use a model checker to prove the requirement for the STM applied to a most general program with two threads and two variables. In the safety case, the model checker constructs a simulation relation between two carefully constructed finite-state transition systems, one representing the given STM applied to a most general program, and the other representing a most liberal safe STM applied to the same program. In the liveness case, the model checker analyzes fairness conditions on the given STM transition system.

Proceedings ArticleDOI
19 Oct 2008
TL;DR: Measurements on a set of transactional and non-transactional Java workloads demonstrate that the techniques presented substantially reduce the overhead of strong atomicity from a factor of 5x down to 10% or less over an efficient weak atomicity baseline.
Abstract: Transactional memory (TM) is a promising concurrency control alternative to locks. Recent work has highlighted important memory model issues regarding TM semantics and exposed problems in existing TM implementations. For safe, managed languages such as Java, there is a growing consensus towards strong atomicity semantics as a sound, scalable solution. Strong atomicity has presented a challenge to implement efficiently because it requires instrumentation of non-transactional memory accesses, incurring significant overhead even when a program makes minimal or no use of transactions. To minimize overhead, existing solutions require either a sophisticated type system, specialized hardware, or static whole-program analysis. These techniques do not translate easily into a production setting on existing hardware. In this paper, we present novel dynamic optimizations that significantly reduce strong atomicity overheads and make strong atomicity practical for dynamic language environments. We introduce analyses that optimistically track which non-transactional memory accesses can avoid strong atomicity instrumentation, and we describe a lightweight speculation and recovery mechanism that applies these analyses to generate speculatively-optimized but safe code for strong atomicity in a dynamically-loaded environment. We show how to implement these mechanisms efficiently by leveraging existing dynamic optimization infrastructure in a Java system. Measurements on a set of transactional and non-transactional Java workloads demonstrate that our techniques substantially reduce the overhead of strong atomicity from a factor of 5x down to 10% or less over an efficient weak atomicity baseline.
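
The overhead being attacked can be pictured as follows: under strong atomicity, even code outside any transaction must execute read/write barriers so it cannot race with in-flight transactions. In the hypothetical sketch below the barrier is a per-object lock and the optimization is a flag marking objects ever touched transactionally; the check-then-act race on that flag is exactly the kind of hazard the paper's speculation-and-recovery mechanism exists to handle safely.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Sketch of a strong-atomicity write barrier with a hypothetical
 *  fast-path flag. Not the paper's implementation; the flag race here
 *  is why the real system pairs fast paths with speculation/recovery. */
public class StrongAtomicityBarrier {
    static class Obj {
        volatile boolean everAccessedInTxn; // would be set by the TM runtime
        final ReentrantLock txnLock = new ReentrantLock();
        int field;
    }

    /** A non-transactional write, as instrumented code might issue it. */
    static void nonTxnWrite(Obj o, int v) {
        if (!o.everAccessedInTxn) {   // fast path: no transaction can observe o
            o.field = v;
            return;
        }
        o.txnLock.lock();             // slow path: serialize with transactions
        try {
            o.field = v;
        } finally {
            o.txnLock.unlock();
        }
    }

    public static void main(String[] args) {
        Obj local = new Obj();
        nonTxnWrite(local, 42);       // takes the uninstrumented fast path
        System.out.println(local.field);
    }
}
```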


Book ChapterDOI
26 Aug 2008
TL;DR: Through an extensive evaluation, a new Concurrency Control Algorithm (CCA), called P-only Concurrency control (PoCC), is shown to perform better than the other four proposed CCAs for a synthetic benchmark, and the STAMP and Lee-TM benchmarks.
Abstract: Concurrency control for Transactional Memory (TM) is investigated as a means for improving resource usage by adjusting dynamically the number of concurrently executing transactions. The proposed control system takes as feedback the measured Transaction Commit Rate to adjust the concurrency. Through an extensive evaluation, a new Concurrency Control Algorithm (CCA), called P-only Concurrency Control (PoCC), is shown to perform better than our other four proposed CCAs for a synthetic benchmark, and the STAMP and Lee-TM benchmarks.
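
A P-only (proportional-only) controller of this kind is easy to sketch: measure the transaction commit rate over a control period and nudge the concurrency limit in proportion to its distance from a setpoint. The gain, setpoint, and bounds below are invented for illustration, not the tuned values from the paper.

```java
/** Minimal P-only concurrency controller in the spirit of PoCC:
 *  limit += Kp * (measured commit rate - setpoint), clamped. */
public class PonlyConcurrencyControl {
    private double limit;             // allowed concurrent transactions
    private final double setpoint;    // target commit rate, e.g. 0.7
    private final double gain;        // proportional gain Kp
    private final int max;

    PonlyConcurrencyControl(int initial, int max, double setpoint, double gain) {
        this.limit = initial; this.max = max;
        this.setpoint = setpoint; this.gain = gain;
    }

    /** One control period: returns the new concurrency limit. */
    int update(long commits, long aborts) {
        double tcr = commits + aborts == 0 ? 1.0
                   : (double) commits / (commits + aborts);
        limit += gain * (tcr - setpoint);      // P-only control law
        limit = Math.max(1, Math.min(max, limit));
        return (int) Math.round(limit);
    }

    public static void main(String[] args) {
        PonlyConcurrencyControl cc = new PonlyConcurrencyControl(8, 32, 0.7, 4.0);
        System.out.println(cc.update(90, 10));  // high commit rate -> raise limit
        System.out.println(cc.update(30, 70));  // heavy aborts -> throttle back
    }
}
```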

Book ChapterDOI
07 Jul 2008
TL;DR: This work implements a uniform transactional execution environment for Java programs in which transactions can be integrated with more traditional concurrency control constructs and freely combine abstractions that use both coding styles.
Abstract: Transactional memory (TM) has recently emerged as an effective tool for extracting fine-grain parallelism from declarative critical sections. In order to make STM systems practical, significant effort has been made to integrate transactions into existing programming languages. Unfortunately, existing approaches fail to provide a simple implementation that permits lock-based and transaction-based abstractions to coexist seamlessly. Because of the fundamental semantic differences between locks and transactions, legacy applications or libraries written using locks cannot be transparently used within atomic regions. To address these shortcomings, we implement a uniform transactional execution environment for Java programs in which transactions can be integrated with more traditional concurrency control constructs. Programmers can run arbitrary programs that utilize traditional mutual-exclusion-based programming techniques, execute new programs written with explicit transactional constructs, and freely combine abstractions that use both coding styles.

Proceedings ArticleDOI
10 Mar 2008
TL;DR: This work presents a parallel JPEG decoder and FFT exhibiting 3.05x and 3.3x speedups on a four-core processor, and demonstrates that the language restrictions in the SHIM concurrent programming language are practical by presenting a SHIM to C-plus-Pthreads compiler.
Abstract: Multicore shared-memory architectures are becoming prevalent and bring many programming challenges. Among the biggest are data races: accesses to shared resources that make a program's behavior depend on scheduling decisions beyond its control. To eliminate such races, the SHIM concurrent programming language adopts deterministic message passing as its sole communication mechanism. We demonstrate such language restrictions are practical by presenting a SHIM to C-plus-Pthreads compiler that can produce efficient code for shared-memory multiprocessors. We present a parallel JPEG decoder and FFT exhibiting 3.05x and 3.3x speedups on a four-core processor.
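
Java's SynchronousQueue gives a cheap way to mimic the rendezvous flavor of SHIM's deterministic message passing: a send blocks until the matching receive, so with one fixed sender and one fixed receiver per channel the sequence of received values cannot depend on scheduling. This sketch only imitates the style; it is not output of the SHIM compiler.

```java
import java.util.concurrent.SynchronousQueue;

/** Rendezvous-style channel demo: the producer's put blocks until the
 *  consumer's take, so the received sequence is always 0, 1, 4. */
public class RendezvousDemo {
    public static void main(String[] args) throws InterruptedException {
        SynchronousQueue<Integer> chan = new SynchronousQueue<>();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 3; i++) chan.put(i * i); // blocks until taken
            } catch (InterruptedException ignored) {}
        });
        producer.start();

        for (int i = 0; i < 3; i++)
            System.out.println("received " + chan.take()); // deterministic order
        producer.join();
    }
}
```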

Proceedings ArticleDOI
Jin-Heum Paek, Tae-Eog Lee
19 Sep 2008
TL;DR: A Petri net modeling method without the swap restriction is proposed, and it is shown that deadlock prevention constraints (not required for the model), together with the initial branching rule for a branch-and-bound procedure, reduce the solution space significantly.
Abstract: In a dual-armed cluster tool, the swap operation method that exchanges a wafer on a robot arm with another wafer at a chamber has been mostly used. It is known to minimize the tool cycle time although it restricts the robot task sequence. Recent cluster tools have new scheduling requirements such as reentrant wafer flows for atomic layer deposition processes, constraints on the wafer delay times within chambers after processing, and concurrent processing of different wafer types. The restricted swap operation method may neither minimize the tool cycle time nor satisfy the wafer delay constraints, and can even cause a deadlock. We examine new robot task sequences for dual-armed cluster tools that use the two robot arms more flexibly, without the swap restriction. We first propose a Petri net modeling method without the swap restriction. From the model, we identify necessary conditions under which deadlocks are prevented. We then systematically develop a mixed integer programming model that determines an optimal robot task sequence. From experiments, we show that deadlock prevention constraints, though not required for the model, together with the initial branching rule for a branch-and-bound procedure reduce the solution space significantly.

Journal ArticleDOI
TL;DR: A notion of spatial-behavioral typing suitable to discipline concurrent interactions and resource usage in distributed object systems is developed, building on an interpretation of types as properties expressible in a spatial logic.

Patent
Asser N. Tantawi, Giovanni Pacifici, Wolfgang Segmuller, Mike Spreitzer, Alaa S. Youssef
14 Aug 2008
TL;DR: In this article, the authors use external performance monitors to build a simple black box model of the computer system, comprising two resources: a virtual bottleneck resource and a delay resource representing all non-bottleneck resources combined.
Abstract: Provides control of the workload, flow control, and concurrency control of a computer system through the use of only external performance monitors. Data collected by external performance monitors are used to build a simple, black box model of the computer system, comprising two resources: a virtual bottleneck resource and a delay resource representing all non-bottleneck resources combined. The service times of the two resource types are two parameters of the black box model. The two parameters are evaluated based on historical data collected by the external performance monitors. The workload capacity that avoids saturation of the bottleneck resource is then determined and used as a control variable by a flow controller to limit the workload on the computer system. The workload may include a mix of traffic classes. In such a case, data is collected, parameters are evaluated and control variables are determined for each of the traffic classes.
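
Under the patent's two-resource abstraction, a back-of-the-envelope capacity rule falls out of standard operational laws: if the bottleneck serves a request in S seconds and all other resources contribute a combined delay Z, the bottleneck saturates near (S + Z) / S concurrent requests. The sketch below assumes that rule; fitting S and Z from external monitor data, as the patent describes, is omitted.

```java
/** Hedged sketch of the two-resource black-box capacity rule:
 *  the bottleneck saturates near N* = (S + Z) / S, and that N*
 *  becomes the flow controller's admission limit. */
public class BlackBoxFlowControl {
    /** @param s bottleneck service time per request (seconds)
     *  @param z combined delay of all non-bottleneck resources (seconds) */
    static int concurrencyLimit(double s, double z) {
        return (int) Math.max(1, Math.round((s + z) / s));
    }

    public static void main(String[] args) {
        // e.g. fitted from external monitors: 20 ms at the bottleneck,
        // 180 ms elsewhere -> saturation near 10 concurrent requests
        System.out.println(concurrencyLimit(0.020, 0.180)); // 10
    }
}
```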

Proceedings ArticleDOI
20 Feb 2008
TL;DR: The approach makes use of futures, a simple annotation that introduces asynchronous concurrency into Java programs but provides no concurrency control, and inserts lightweight barriers that block and resume threads executing futures if a dependency violation may ensue.
Abstract: Migrating sequential programs to effectively utilize next generation multicore architectures is a key challenge facing application developers and implementors. Languages like Java that support complex control- and dataflow abstractions confound classical automatic parallelization techniques. On the other hand, introducing multithreading and concurrency control explicitly into programs can impose a high conceptual burden on the programmer, and may entail a significant rewrite of the original program. In this paper, we consider a new technique to address this issue. Our approach makes use of futures, a simple annotation that introduces asynchronous concurrency into Java programs, but provides no concurrency control. To ensure concurrent execution does not yield behavior inconsistent with sequential execution (i.e., execution yielded by erasing all futures), we present a new interprocedural summary-based dataflow analysis. The analysis inserts lightweight barriers that block and resume threads executing futures if a dependency violation may ensue. There are no constraints on how threads execute other than those imposed by these barriers. Our experimental results indicate futures can be leveraged to transparently ensure safety and profitably exploit parallelism; in contrast to earlier efforts, our technique is completely portable, and requires no modifications to the underlying JVM.
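
The shape of the mechanism is easy to demonstrate by hand: run a future asynchronously, and place a barrier before any access that may conflict with it so execution cannot diverge from the sequential (future-erased) order. In the sketch below the barrier is simply Future.get(); the interprocedural analysis that decides where barriers are needed is the paper's contribution and is not reproduced.

```java
import java.util.concurrent.*;

/** Hand-placed "barrier" demo: block on the future before touching
 *  shared state it may write, preserving sequential semantics. */
public class FutureBarrierDemo {
    static int shared = 0;

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();

        Future<?> f = pool.submit(() -> { shared = 42; }); // async "future" region

        // barrier: a dependency violation may ensue, so wait for f here
        f.get();

        System.out.println(shared);  // guaranteed 42, matching sequential order
        pool.shutdown();
    }
}
```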

Journal ArticleDOI
TL;DR: This position paper reflects on the distinguishing features of these memory transactions with respect to their database cousins.
Abstract: Transactions are back in the spotlight! They are emerging in concurrent programming languages under the name of transactional memory (TM). Their new role? Concurrency control on new multi-core processors. From afar they look the same as good ol' database transactions. But are they really? In this position paper, we reflect on the distinguishing features of these memory transactions with respect to their database cousins. Disclaimer: By its very nature, this position paper does not try to avoid subjectivity.

Book ChapterDOI
15 Dec 2008
TL;DR: This paper presents a lock-based STM system designed from simple basic principles that satisfies the opacity safety property, never aborts a write-only transaction, employs only bounded control variables, has no centralized contention point, and is formally proved correct.
Abstract: The aim of a software transactional memory (STM) system is to facilitate the delicate problem of low-level concurrency management, i.e., the design of programs made up of processes/threads that concurrently access shared objects. To that end, an STM system allows a programmer to write transactions accessing shared objects, without having to take care of the fact that these objects are concurrently accessed: the programmer is discharged from the delicate problem of concurrency management. Given a transaction, the STM system commits or aborts it. Ideally, it has to be efficient (this is measured by the number of transactions committed per time unit), while ensuring that as few transactions as possible are aborted. From a safety point of view (the one addressed in this paper), an STM system has to ensure that, whatever its fate (commit or abort), each transaction always operates on a consistent state. STM systems have recently received a lot of attention. Among the proposed solutions, lock-based systems and clock-based systems have been particularly investigated. Their design is mainly efficiency-oriented, the properties they satisfy are not always clearly stated, and few of them are formally proved. This paper presents a lock-based STM system designed from simple basic principles. Its main features are the following: it (1) uses visible reads, (2) does not require the shared memory to manage several versions of each object, (3) uses neither timestamps, nor version numbers, (4) satisfies the opacity safety property, (5) aborts a transaction only when it conflicts with some other live transaction (progressiveness property), (6) never aborts a write-only transaction, (7) employs only bounded control variables, (8) has no centralized contention point, and (9) is formally proved correct.

Proceedings ArticleDOI
08 Nov 2008
TL;DR: By choosing an appropriate runtime block mapping strategy, average performance can be increased by 18%, while simultaneously reducing average operand communication by 70%, saving energy as well as improving performance.
Abstract: Distributed processors must balance communication and concurrency. When dividing instructions among the processors, key factors are the available concurrency, criticality of dependence chains, and communication penalties. The amount of concurrency determines the importance of the other factors: if concurrency is high, wider distribution of instructions is likely to tolerate the increased operand routing latencies. If concurrency is low, mapping dependent instructions close to one another is likely to reduce communication costs that contribute to the critical path. This paper explores these tradeoffs for distributed Explicit Dataflow Graph Execution (EDGE) architectures that execute blocks of dataflow instructions atomically. A runtime block mapper assigns instructions from a single thread to distributed hardware resources (cores) based on compiler-assigned instruction identifiers. We explore two approaches: fixed strategies that map all blocks to the same number of cores, and adaptive strategies that vary the number of cores for each block. The results show that the best fixed strategy varies based on the cores' issue width. A simple adaptive strategy improves performance over the best fixed strategies for single and dual-issue cores, but its benefits decrease as the cores' issue width increases. These results show that by choosing an appropriate runtime block mapping strategy, average performance can be increased by 18%, while simultaneously reducing average operand communication by 70%, saving energy as well as improving performance. These results indicate that runtime block mapping is a promising mechanism for balancing communication and concurrency in distributed processors.

Proceedings ArticleDOI
09 Nov 2008
TL;DR: In this paper, the authors show how existing discrete-event system (DES) theory can be successfully applied to the problem of concurrency control for concurrent code, with guarantees that follow from rigorously proven DES theory.
Abstract: The development of controls for the execution of concurrent code is non-trivial. We show how existing discrete-event system (DES) theory can be successfully applied to this problem. From code without concurrency controls and a specification of desired behaviours, concurrency control code is generated. By applying rigorously proven DES theory, we guarantee that the control scheme is nonblocking (and thus free of both deadlock and livelock) and minimally restrictive. Some conflicts between specifications and source can be automatically resolved without introducing new specifications. Moreover, the approach is independent of specific programming or specification languages. Two examples using Java are presented to illustrate the approach. Additional applicable DES results are discussed as future work.

Patent
23 Jun 2008
TL;DR: Fine-grained concurrency control for transactions in the presence of database updates is proposed, in which each transaction is assigned a snapshot version number (SVN).
Abstract: Embodiments of the present invention provide fine-grained concurrency control for transactions in the presence of database updates. During operations, each transaction is assigned a snapshot version number, or SVN. An SVN refers to a historical snapshot of the database that can be created periodically or on demand. Transactions are thus tied to a particular SVN, such as the one in effect when the transaction was created. Queries belonging to the transactions can access data that is consistent as of a point in time, for example, corresponding to the latest SVN when the transaction was created. At various times, data from the database stored in a memory can be updated using the snapshot data corresponding to an SVN. When a transaction is committed, a snapshot of the database with a new SVN is created based on the data modified by the transaction, and the snapshot is synchronized to the memory. When a transaction query requires data from a version of the database corresponding to an SVN, the data in the memory may be synchronized with the snapshot data corresponding to that SVN.
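
A minimal sketch of the SVN idea, with invented names: keep each key's versions in a map ordered by SVN, pin a transaction to the SVN current at begin, and serve its reads from the newest version at or below that SVN. Synchronization with the in-memory copy and all durability concerns are elided.

```java
import java.util.*;

/** Toy SVN-versioned store: commit creates a new snapshot version,
 *  and a transaction pinned at SVN n reads the newest version <= n. */
public class SvnStore {
    private long currentSvn = 0;
    private final Map<String, TreeMap<Long, String>> versions = new HashMap<>();

    /** Commit a batch of writes as one new snapshot; returns its SVN. */
    synchronized long commit(Map<String, String> writes) {
        long svn = ++currentSvn;
        writes.forEach((k, v) ->
            versions.computeIfAbsent(k, x -> new TreeMap<>()).put(svn, v));
        return svn;
    }

    synchronized long beginTransaction() { return currentSvn; } // pin snapshot

    synchronized String read(String key, long svn) {
        TreeMap<Long, String> vs = versions.get(key);
        if (vs == null) return null;
        Map.Entry<Long, String> e = vs.floorEntry(svn); // newest version <= svn
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        SvnStore db = new SvnStore();
        db.commit(Map.of("x", "1"));
        long txn = db.beginTransaction();      // sees SVN 1
        db.commit(Map.of("x", "2"));           // SVN 2, invisible to txn
        System.out.println(db.read("x", txn)); // prints 1
    }
}
```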