
Showing papers on "Concurrency control" published in 2010


Journal ArticleDOI
01 Sep 2010
TL;DR: This paper proposes a distributed database system which combines a simple deadlock avoidance technique with concurrency control schemes that guarantee equivalence to a predetermined serial ordering of transactions, effectively removing all nondeterminism from typical OLTP workloads.
Abstract: Replication is a widely used method for achieving high availability in database systems. Due to the nondeterminism inherent in traditional concurrency control schemes, however, special care must be taken to ensure that replicas don't diverge. Log shipping, eager commit protocols, and lazy synchronization protocols are well-understood methods for safely replicating databases, but each comes with its own cost in availability, performance, or consistency. In this paper, we propose a distributed database system which combines a simple deadlock avoidance technique with concurrency control schemes that guarantee equivalence to a predetermined serial ordering of transactions. This effectively removes all nondeterminism from typical OLTP workloads, allowing active replication with no synchronization overhead whatsoever. Further, our system eliminates the requirement for two-phase commit for any kind of distributed transaction, even across multiple nodes within the same replica. By eschewing deadlock detection and two-phase commit, under many workloads our system outperforms traditional systems that allow nondeterministic transaction reordering.
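
A minimal sketch of the central idea, not the authors' system: if every replica receives transactions in the same predetermined order and grants locks strictly in that order, conflicting transactions serialize identically everywhere, so no deadlock detection, two-phase commit, or replica synchronization is needed. The class name and the toy transactions below are invented for illustration.

import threading

class DeterministicScheduler:
    def __init__(self, db):
        self.db = db
        self.lock_table = {}                  # key -> id of the owning transaction
        self.cond = threading.Condition()

    def _acquire(self, txn_id, keys):
        with self.cond:
            while any(k in self.lock_table for k in keys):
                self.cond.wait()
            for k in keys:
                self.lock_table[k] = txn_id

    def _release(self, keys):
        with self.cond:
            for k in keys:
                self.lock_table.pop(k, None)
            self.cond.notify_all()

    def _worker(self, keys, logic):
        logic(self.db)
        self._release(keys)

    def run(self, ordered_txns):
        # Every replica feeds the same ordered list and requests locks strictly in
        # that order, so no deadlock detection is needed and conflicting transactions
        # serialize identically on all replicas without any coordination.
        threads = []
        for txn_id, keys, logic in ordered_txns:
            self._acquire(txn_id, keys)
            t = threading.Thread(target=self._worker, args=(keys, logic))
            t.start()
            threads.append(t)
        for t in threads:
            t.join()

db = {"a": 0, "b": 0}
sched = DeterministicScheduler(db)
sched.run([(1, ["a"], lambda d: d.update(a=d["a"] + 1)),
           (2, ["a", "b"], lambda d: d.update(b=d["a"]))])
print(db)   # {'a': 1, 'b': 1} on every replica given the same input order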

185 citations


Book
07 Sep 2010
TL;DR: In this article, the authors provide a categorization of replica control mechanisms, present several replica and concurrency control mechanisms in detail, and discuss many of the issues that arise when such solutions need to be implemented within or on top of relational database systems.
Abstract: Database replication is widely used for fault-tolerance, scalability and performance. The failure of one database replica does not stop the system from working, as available replicas can take over the tasks of the failed replica. Scalability can be achieved by distributing the load across all replicas, and adding new replicas should the load increase. Finally, database replication can provide fast local access to geographically distributed clients if data copies are located close to them. Despite its advantages, replication is not a straightforward technique to apply, and there are many hurdles to overcome. At the forefront is replica control: assuring that data copies remain consistent when updates occur. There exist many alternatives in regard to where updates can occur and when changes are propagated to data copies, how changes are applied, where the replication tool is located, etc. A particular challenge is to combine replica control with transaction management as it requires several operations to be treated as a single logical unit, and it provides atomicity, consistency, isolation and durability across the replicated system. The book provides a categorization of replica control mechanisms, presents several replica and concurrency control mechanisms in detail, and discusses many of the issues that arise when such solutions need to be implemented within or on top of relational database systems. Table of Contents: Overview / 1-Copy-Equivalence and Consistency / Basic Protocols / Replication Architecture / The Scalability of Replication / Eager Replication and 1-Copy-Serializability / 1-Copy-Snapshot Isolation / Lazy Replication / Self-Configuration and Elasticity / Other Aspects of Replication

181 citations


Proceedings ArticleDOI
06 Jun 2010
TL;DR: This paper compares two low overhead concurrency control schemes that allow partitions to work on other transactions during network stalls, yet have little cost in the common case when concurrency is not needed, and quantifies the range of workloads over which each technique is beneficial.
Abstract: Database partitioning is a technique for improving the performance of distributed OLTP databases, since "single partition" transactions that access data on one partition do not need coordination with other partitions. For workloads that are amenable to partitioning, some argue that transactions should be executed serially on each partition without any concurrency at all. This strategy makes sense for a main memory database where there are no disk or user stalls, since the CPU can be fully utilized and the overhead of traditional concurrency control, such as two-phase locking, can be avoided. Unfortunately, many OLTP applications have some transactions which access multiple partitions. This introduces network stalls in order to coordinate distributed transactions, which will limit the performance of a database that does not allow concurrency. In this paper, we compare two low overhead concurrency control schemes that allow partitions to work on other transactions during network stalls, yet have little cost in the common case when concurrency is not needed. The first is a light-weight locking scheme, and the second is an even lighter-weight type of speculative concurrency control that avoids the overhead of tracking reads and writes, but sometimes performs work that eventually must be undone. We quantify the range of workloads over which each technique is beneficial, showing that speculative concurrency control generally outperforms locking as long as there are few aborts or few distributed transactions that involve multiple rounds of communication. On a modified TPC-C benchmark, speculative concurrency control can improve throughput relative to the other schemes by up to a factor of two.
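
A toy sketch of the speculative idea, not the paper's actual scheme: while a multi-partition transaction waits on the network, queued local transactions run against a shadow copy of the partition; if the pending transaction aborts, the speculated work is redone on the committed state. The class and method names are invented.

import copy

class SpeculativePartition:
    def __init__(self, state):
        self.state = state              # committed partition state
        self.shadow = None              # speculative copy used during a network stall
        self.speculated = []            # local transactions executed speculatively

    def begin_distributed(self, local_effects):
        # Optimistically assume the distributed transaction will commit and
        # apply its local effects to a shadow copy of the partition.
        self.shadow = copy.deepcopy(self.state)
        local_effects(self.shadow)
        self.speculated = []

    def execute(self, txn):
        if self.shadow is None:
            txn(self.state)             # normal serial execution, no concurrency needed
        else:
            txn(self.shadow)            # keep the CPU busy during the stall
            self.speculated.append(txn)

    def finish_distributed(self, committed):
        if committed:
            self.state = self.shadow    # speculative results become the real state
        else:
            for txn in self.speculated:
                txn(self.state)         # speculation wasted: redo on the clean state
        self.shadow, self.speculated = None, []

p = SpeculativePartition({"x": 0})
p.begin_distributed(lambda s: s.update(x=10))   # stalled multi-partition transaction
p.execute(lambda s: s.update(y=s["x"] + 1))     # speculated single-partition transaction
p.finish_distributed(committed=True)
print(p.state)                                   # {'x': 10, 'y': 11}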

160 citations


Journal ArticleDOI
TL;DR: The first time-based STM algorithm, the Lazy Snapshot Algorithm (LSA), is formally introduced and its semantics and the impact of its design parameters, notably multiversioning and dynamic snapshot extension are studied.
Abstract: Software transactional memory (STM) is a concurrency control mechanism that is widely considered to be easier to use by programmers than other mechanisms such as locking. The first generations of STMs have either relied on visible read designs, which simplify conflict detection while pessimistically ensuring a consistent view of shared data to the application, or optimistic invisible read designs that are significantly more efficient but require incremental validation to preserve consistency, at a cost that increases quadratically with the number of objects read in a transaction. Most of the recent designs now use a “time-based” (or “time stamp-based”) approach to still benefit from the performance advantage of invisible reads without incurring the quadratic overhead of incremental validation. In this paper, we give an overview of the time-based STM approach and discuss its benefits and limitations. We formally introduce the first time-based STM algorithm, the Lazy Snapshot Algorithm (LSA). We study its semantics and the impact of its design parameters, notably multiversioning and dynamic snapshot extension. We compare it against other classical designs and we demonstrate that its performance is highly competitive, both for obstruction-free and lock-based STM designs.
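
A greatly simplified, single-threaded sketch in the spirit of the time-based read rule (no multiversioning, no locks; all names are invented): each object carries its writer's commit timestamp, and a transaction validates its reads only when it must extend its snapshot, rather than incrementally on every read.

class TimeBasedSTM:
    def __init__(self):
        self.now = 0
        self.store = {}                       # key -> (value, commit timestamp)

    def commit_write(self, key, value):
        self.now += 1                         # global version clock
        self.store[key] = (value, self.now)

    def begin(self):
        # A transaction's snapshot is a validity interval, initially ending at "now".
        return {"upper": self.now, "reads": {}}

    def read(self, txn, key):
        value, ts = self.store[key]
        if ts > txn["upper"]:
            # The object was written after our snapshot; try to extend the snapshot
            # to the present, which is only safe if every object read so far is
            # still unmodified.
            if all(self.store[k][1] == seen_ts for k, seen_ts in txn["reads"].items()):
                txn["upper"] = self.now
            else:
                raise RuntimeError("abort: snapshot cannot be extended consistently")
        txn["reads"][key] = ts
        return value

stm = TimeBasedSTM()
stm.commit_write("a", 1)
t = stm.begin()
stm.commit_write("a", 2)                      # a concurrent writer commits
print(stm.read(t, "a"))                       # snapshot extends, read returns 2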

150 citations


Journal ArticleDOI
01 Sep 2010
TL;DR: This paper presents a comprehensive solution that is based on an extended deferred-indexing method with integrated versioning that enables time-travel queries that are efficiently processed without adversely affecting queries on the current data.
Abstract: The RDF data model is gaining importance for applications in computational biology, knowledge sharing, and social communities. Recent work on RDF engines has focused on scalable performance for querying, and has largely disregarded updates. In addition to incremental bulk loading, applications also require online updates with flexible control over multi-user isolation levels and data consistency. The challenge lies in meeting these requirements while retaining the capability for fast querying. This paper presents a comprehensive solution that is based on an extended deferred-indexing method with integrated versioning. The version store enables time-travel queries that are efficiently processed without adversely affecting queries on the current data. For flexible consistency, transactional concurrency control is provided with options for either snapshot isolation or full serializability. All methods are integrated in an extension of the RDF-3X system, and their very good performance for both queries and updates is demonstrated by measurements of multi-user workloads with real-life data as well as stress-test synthetic loads.
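
A sketch of the versioning idea behind time-travel queries, not RDF-3X's data layout or the paper's deferred-indexing method: each triple keeps the commit versions at which it was added or deleted, so a query "as of" a version simply filters on those intervals. The class and method names are invented.

from collections import defaultdict

class VersionedTripleStore:
    def __init__(self):
        self.version = 0
        self.lifespans = defaultdict(list)    # triple -> list of [added_at, deleted_at]

    def add(self, triple):
        self.version += 1
        self.lifespans[triple].append([self.version, None])

    def delete(self, triple):
        self.version += 1
        spans = self.lifespans.get(triple)
        if spans and spans[-1][1] is None:
            spans[-1][1] = self.version       # close the triple's current lifespan

    def match(self, pattern, as_of=None):
        # A triple matches "as of" version v if one of its lifespans contains v;
        # None in the pattern acts as a wildcard.
        as_of = self.version if as_of is None else as_of
        for triple, spans in self.lifespans.items():
            alive = any(a <= as_of and (d is None or d > as_of) for a, d in spans)
            if alive and all(p is None or p == t for p, t in zip(pattern, triple)):
                yield triple

store = VersionedTripleStore()
store.add(("bob", "knows", "alice"))          # version 1
store.delete(("bob", "knows", "alice"))       # version 2
print(list(store.match(("bob", None, None), as_of=1)))   # [('bob', 'knows', 'alice')]
print(list(store.match(("bob", None, None))))            # [] at the current version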

139 citations


Patent
07 Jul 2010
TL;DR: In this article, an ordered and shared log of indexed transaction records represented as multi-version data structures of nodes and node pointers is used for enforcing concurrency control, where each node of a record is assigned a log address.
Abstract: Architecture that includes an ordered and shared log of indexed transaction records represented as multi-version data structures of nodes and node pointers. The log is the sole, monolithic source of datastore state and is used for enforcing concurrency control. The architecture also includes a transaction processing component that appends transaction records to the log from concurrent transactions executing on different processors. Each node of a record is assigned a log address.
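
One plausible reading of the claim, sketched with invented classes: every appended node receives a log address, children are addresses rather than object pointers, and path-copying makes the single shared log a multi-version source of state. This is an illustration, not the patented design.

class Node:
    def __init__(self, key, value, left=None, right=None):
        # Children are log addresses, not object references, so old versions of
        # the tree keep pointing at old (immutable) subtrees in the log.
        self.key, self.value, self.left, self.right = key, value, left, right

class SharedLog:
    def __init__(self):
        self.entries = []                # the single ordered, shared log

    def append(self, node):
        self.entries.append(node)
        return len(self.entries) - 1     # the node's log address

def insert(log, root_addr, key, value):
    # Path-copying insert: only the nodes on the search path are re-appended,
    # so each transaction record yields a complete new version of the tree.
    if root_addr is None:
        return log.append(Node(key, value))
    n = log.entries[root_addr]
    if key < n.key:
        return log.append(Node(n.key, n.value, insert(log, n.left, key, value), n.right))
    if key > n.key:
        return log.append(Node(n.key, n.value, n.left, insert(log, n.right, key, value)))
    return log.append(Node(key, value, n.left, n.right))

log = SharedLog()
v1 = insert(log, None, "a", 1)           # first version of the datastore
v2 = insert(log, v1, "b", 2)             # second version; v1 is still readable
print(v1, v2, len(log.entries))          # 0 2 3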

111 citations


Journal ArticleDOI
Goetz Graefe1
TL;DR: The topic of concurrency control in B-trees is clarified, simplified, and structured by dividing it into two subtopics and exploring each of them in depth.
Abstract: B-trees have been ubiquitous in database management systems for several decades, and they are used in other storage systems as well. Their basic structure and basic operations are well and widely understood including search, insertion, and deletion. Concurrency control of operations in B-trees, however, is perceived as a difficult subject with many subtleties and special cases. The purpose of this survey is to clarify, simplify, and structure the topic of concurrency control in B-trees by dividing it into two subtopics and exploring each of them in depth.
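
As background, the sketch below shows one classic technique in this space, latch coupling during a root-to-leaf descent; it illustrates only the protection of the physical tree structure and is not the survey's full treatment. The node layout is invented for the example.

import threading

class BTreeNode:
    def __init__(self, keys, children=None):
        self.latch = threading.Lock()     # protects this node's physical structure
        self.keys = keys
        self.children = children          # None for leaf nodes

def find_leaf(root, key):
    # Latch coupling ("crabbing"): the parent's latch is held only until the
    # child's latch is taken, so at most two latches are held at any time and
    # concurrent operations on disjoint paths do not block each other.
    node = root
    node.latch.acquire()
    while node.children is not None:
        i = 0
        while i < len(node.keys) and key >= node.keys[i]:
            i += 1
        child = node.children[i]
        child.latch.acquire()             # take the child first ...
        node.latch.release()              # ... then drop the parent
        node = child
    node.latch.release()
    return node

leaves = [BTreeNode([1, 2]), BTreeNode([5, 9])]
root = BTreeNode([5], leaves)
print(find_leaf(root, 7).keys)            # [5, 9]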

96 citations


Proceedings ArticleDOI
01 May 2010
TL;DR: A two-phase testing technique is proposed that can effectively detect atomic-set serializability violations and can identify more concurrency bugs than two recent testing tools, RaceFuzzer and AtomFuzzer.
Abstract: Concurrency bugs are notoriously difficult to detect because there can be vast combinations of interleavings among concurrent threads, yet only a small fraction can reveal them. Atomic-set serializability characterizes a wide range of concurrency bugs, including data races and atomicity violations. In this paper, we propose a two-phase testing technique that can effectively detect atomic-set serializability violations. In Phase I, our technique infers potential violations that do not appear in a concrete execution and prunes those interleavings that are violation-free. In Phase II, our technique actively controls a thread scheduler to enumerate these potential scenarios identified in Phase I to look for real violations. We have implemented our technique as a prototype system AssetFuzzer and applied it to a number of subject programs for evaluating concurrency defect analysis techniques. The experimental results show that AssetFuzzer can identify more concurrency bugs than two recent testing tools RaceFuzzer and AtomFuzzer.

91 citations


Proceedings ArticleDOI
19 Apr 2010
TL;DR: This work substitutes the original HDFS layer of Hadoop with a new, concurrency-optimized data storage layer based on the BlobSeer data management service, improving the efficiency of Hadoop for data-intensive Map-Reduce applications, which naturally exhibit a high degree of data access concurrency.
Abstract: Hadoop is a software framework supporting the Map-Reduce programming model. It relies on the Hadoop Distributed File System (HDFS) as its primary storage system. The efficiency of HDFS is crucial for the performance of Map-Reduce applications. We substitute the original HDFS layer of Hadoop with a new, concurrency-optimized data storage layer based on the BlobSeer data management service. Thereby, the efficiency of Hadoop is significantly improved for data-intensive Map-Reduce applications, which naturally exhibit a high degree of data access concurrency. Moreover, BlobSeer's features (built-in versioning, its support for concurrent append operations) open the possibility for Hadoop to further extend its functionalities. We report on extensive experiments conducted on the Grid'5000 testbed. The results illustrate the benefits of our approach over the original HDFS-based implementation of Hadoop.

69 citations


Proceedings ArticleDOI
28 Jan 2010
TL;DR: This article presents an accurate analytical model of 2PL concurrency control, which overcomes several limitations of preexisting analytical results and captures relevant features of realistic data access patterns, by taking into account access distributions that depend on transactions' execution phases.
Abstract: Nowadays the 2-Phase-Locking (2PL) concurrency control algorithm still plays a core role in the construction of transactional systems (e.g. database systems and transactional memories). Hence, any technique allowing accurate analysis and prediction of the performance of 2PL-based systems can be of wide interest and applicability. In this article we present an accurate analytical model of 2PL concurrency control, which overcomes several limitations of preexisting analytical results. In particular, our model captures relevant features of realistic data access patterns by taking into account access distributions that depend on transactions' execution phases. Also, our model provides significantly more accurate performance predictions in heavy contention scenarios, where the number of transactions enqueued due to conflicting lock requests is expected to be non-minimal. The accuracy of our model has been verified against simulation results based on both synthetic data access patterns and patterns derived from the TPC-C benchmark.
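
Since the analytical model itself is not reproduced in the abstract, the toy Monte-Carlo sketch below only illustrates the kind of quantity such models predict: the probability that a lock request under 2PL hits a lock held by another in-flight transaction. The function and its parameters are invented and assume uniform access, which is much simpler than the paper's phase-dependent distributions.

import random

def simulate_2pl(num_txns=1000, txn_len=8, db_size=1000, concurrency=20, seed=1):
    # Each transaction requests txn_len exclusive locks drawn from db_size items;
    # we count how often a request collides with the locks already held by the
    # other (concurrency - 1) in-flight transactions.
    rng = random.Random(seed)
    conflicts = requests = 0
    for _ in range(num_txns):
        held_by_others = set()
        for _ in range(concurrency - 1):
            held_by_others.update(rng.sample(range(db_size), txn_len))
        for _ in range(txn_len):
            requests += 1
            if rng.randrange(db_size) in held_by_others:
                conflicts += 1
    return conflicts / requests

print(simulate_2pl())   # conflict probability grows with concurrency and txn_len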

59 citations


Proceedings ArticleDOI
20 Oct 2010
TL;DR: New efficient techniques for computing interaction invariants are proposed and have been used to verify properties and deadlock-freedom of DALA, an autonomous robot whose behaviors at the functional level are described with 500,000 lines of C code.
Abstract: We propose invariant-based techniques for the efficient verification of safety and deadlock properties of concurrent systems. We assume that components and component interactions are described within the BIP framework, a tool for component-based design. We build on a compositional methodology in which the invariant is obtained by combining the invariants of the individual components with an interaction invariant that takes concurrency and interaction between components into account. In this paper, we propose new efficient techniques for computing interaction invariants. This is achieved in several steps. First, we propose a formalization of incremental component-based design. Then we suggest sufficient conditions that ensure the preservation of invariants through the introduction of new interactions. For cases in which these conditions are not satisfied, we propose methods for generating new invariants in an incremental manner. The reuse of existing invariants considerably reduces the verification effort. Our techniques have been implemented in the D-Finder toolset. Among the experiments conducted, we have been able to verify properties and deadlock-freedom of DALA, an autonomous robot whose behaviors at the functional level are described with 500,000 lines of C code. This experiment, which was conducted with industrial partners, is far beyond the scope of existing academic tools such as NuSMV or SPIN.

Patent
Ying Chen1, Bin He1, Rui Wang1
01 Mar 2010
TL;DR: In this paper, a method is described for concurrency management of ETL processes in a database that has database tables and is communicatively coupled to a computer. But the method is limited to a single ETL process accessing a database.
Abstract: System and methods manage concurrent ETL processes accessing a database. Exemplary embodiments include a method for concurrency management for ETL processes in a database having database tables and communicatively coupled to a computer, the method including establishing a session lock for the database, determining that a current ETL process is accessing the database at a current time, associating a current expiration time with the session lock, the expiration time being stored in a lock table in the database, sending the session lock to the current ETL process and performing ETL-level locking for the current ETL process.
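
A rough sketch of the session-lock idea in the abstract, using SQLite only for convenience: the lock, its holder, and its expiration time live in a lock table inside the database itself. The table, column, and parameter names are invented for the example; this is not the patented method.

import sqlite3, time, uuid

def acquire_session_lock(conn, holder=None, ttl_seconds=300):
    holder = holder or str(uuid.uuid4())
    now = time.time()
    conn.execute("""CREATE TABLE IF NOT EXISTS etl_lock
                    (name TEXT PRIMARY KEY, holder TEXT, expires_at REAL)""")
    with conn:                            # check-and-set in a single transaction
        row = conn.execute(
            "SELECT holder, expires_at FROM etl_lock WHERE name = 'session'").fetchone()
        if row is None:
            conn.execute("INSERT INTO etl_lock VALUES ('session', ?, ?)",
                         (holder, now + ttl_seconds))
        elif row[0] == holder or row[1] < now:
            # Re-entrant caller, or the previous holder's lock has expired.
            conn.execute("UPDATE etl_lock SET holder = ?, expires_at = ? WHERE name = 'session'",
                         (holder, now + ttl_seconds))
        else:
            return None                   # another ETL process holds a live lock
    return holder

conn = sqlite3.connect(":memory:")
print(acquire_session_lock(conn, holder="etl-1"))   # etl-1
print(acquire_session_lock(conn, holder="etl-2"))   # None while etl-1's lock is live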

Proceedings ArticleDOI
15 Jul 2010
TL;DR: AGGRO as discussed by the authors is an innovative Optimistic Atomic Broadcast-based (OAB) active replication protocol that aims at maximizing the overlap between communication and processing through a novel AGGRessively Optimistic concurrency control scheme.
Abstract: Software Transactional Memories (STMs) are emerging as a potentially disruptive programming model. In this paper we address the issue of how to enhance the dependability of STM systems via replication. In particular, we present AGGRO, an innovative Optimistic Atomic Broadcast-based (OAB) active replication protocol that aims at maximizing the overlap between communication and processing through a novel AGGRessively Optimistic concurrency control scheme. The key idea underlying AGGRO is to propagate dependencies across uncommitted transactions in a controlled manner, namely according to a serialization order compliant with the optimistic message delivery order provided by the OAB service. Another relevant distinguishing feature of AGGRO is that it does not require a-priori knowledge about the read/write sets of transactions, but rather detects and handles conflicts dynamically, i.e. as soon (and only if) they materialize. Based on a detailed simulation study, we show the striking performance gains achievable by AGGRO (up to a 6x increase of the maximum sustainable throughput, and a 75% response time reduction) compared to literature approaches for active replication of transactional systems.

Journal ArticleDOI
TL;DR: It is concluded that an S4R is live if all its siphons are max″-controlled, a condition based on a new concept called max″-controlled siphons; examples are given to illustrate it.
Abstract: Over the last two decades, a number of deadlock control policies based on Petri nets were proposed for flexible manufacturing systems (FMSs). As a structural object of a Petri net, siphons are widely used in deadlock control. For systems of sequential systems with shared resources (S4R), the current deadlock control policies based on max or max′-controlled siphons tend to overly restrict the behaviour of a controlled system. The controllability conditions of a siphon are relaxed by a new concept called max″-controlled siphons. We conclude that an S4R is live if all its siphons are max″-controlled. Compared with the existing policies, the proposed one is more general. Examples are given to illustrate it.

Patent
28 Jun 2010
TL;DR: An architecture is proposed that addresses the efficient detection of conflicts and the merging of data structures such as trees, when possible, using a meld operation to detect conflicts and merge the trees.
Abstract: Architecture that addresses the efficient detection of conflicts and the merging of data structures such as trees, when possible. The process of detecting conflicts and merging the trees is a meld operation. Confluent trees offer transactional consistency with some degree of isolation, and scaling out a concurrent system based on confluent trees can be accomplished where the meld operation is more efficient than the transaction computations. Transactions execute optimistically using lazily versioned “intention trees” that efficiently describe dependencies and effects using structure and content version information for each intention subtree. The data structure is modified by melding the intention trees in sequence, which causes each transaction to either commit (producing an incremental new version of the data structure) or abort (identifying a conflict which prevents the intention tree from being melded). The architecture is computationally efficient and completes without needing to access much of each tree.
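
A hedged sketch of the conflict test implied by "structure and content version information": an intention remembers the version of the subtree it was computed from, and melding splices it in (producing an incremental new version of the tree) only if that version is unchanged. This is an illustration with invented names and simplified version bookkeeping, not the patented meld algorithm.

class VNode:
    def __init__(self, key, value, version, left=None, right=None):
        self.key, self.value, self.version = key, value, version
        self.left, self.right = left, right

def meld(root, path, base_version, replacement):
    # Copy the path from the root; if the target subtree's version no longer
    # matches what the transaction read, report a conflict (abort) instead.
    if not path:
        if root.version != base_version:
            return None                       # a concurrent meld changed this subtree
        return replacement
    child = root.left if path[0] == "L" else root.right
    new_child = meld(child, path[1:], base_version, replacement)
    if new_child is None:
        return None
    if path[0] == "L":
        return VNode(root.key, root.value, root.version + 1, new_child, root.right)
    return VNode(root.key, root.value, root.version + 1, root.left, new_child)

tree = VNode("m", 1, version=1, left=VNode("c", 2, version=1))
ok = meld(tree, ["L"], base_version=1, replacement=VNode("c", 99, version=2))
stale = meld(ok, ["L"], base_version=1, replacement=VNode("c", 7, version=3))
print(ok.left.value, stale)                   # 99 None  (the second intention conflicts)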

Proceedings ArticleDOI
10 Mar 2010
TL;DR: This course, which has been taught for ten years, introduces concurrency in the context of event-driven programming and makes use of graphics and animations with the support of a library that reduces the syntactic overhead of using these constructs.
Abstract: Because of the growing importance of concurrent programming, many people are trying to figure out where in the curriculum to introduce students to concurrency. In this paper we discuss the use of concurrency in an introductory computer science course. This course, which has been taught for ten years, introduces concurrency in the context of event-driven programming. It also makes use of graphics and animations with the support of a library that reduces the syntactic overhead of using these constructs. Students learn to use separate threads in a way that enables them to write programs that match their intuitions of the world. While the separate threads do interact, programs are selected so that race conditions are generally not an issue.

Proceedings ArticleDOI
17 May 2010
TL;DR: The Scalaris key/value store supports strong data consistency and atomic transactions and uses an enhanced Paxos Commit protocol with only four communication steps rather than six.
Abstract: Key/value stores which are built on structured overlay networks often lack support for atomic transactions and strong data consistency among replicas. This is unfortunate, because consistency guarantees and transactions would allow a wide range of additional application domains to benefit from the inherent scalability and fault-tolerance of DHTs. The Scalaris key/value store supports strong data consistency and atomic transactions. It uses an enhanced Paxos Commit protocol with only four communication steps rather than six. This improvement was possible by exploiting information from the replica distribution in the DHT. Scalaris enables implementation of more reliable and scalable infrastructure for collaborative Web services that require strong consistency and atomic changes across multiple items.

Journal ArticleDOI
TL;DR: This work validate the design of a non-trivial CRDT, a replicated sequence, with performance measurements in the context of Wikipedia, and proposes a flexible two-tier architecture and a protocol for migrating between tiers.
Abstract: Replicas of a commutative replicated data type (CRDT) eventually converge without any complex concurrency control. We validate the design of a non-trivial CRDT, a replicated sequence, with performance measurements in the context of Wikipedia. Furthermore, we discuss how to eliminate a remaining scalability bottleneck: Whereas garbage collection previously required a system-wide consensus, here we propose a flexible two-tier architecture and a protocol for migrating between tiers. We also discuss how the CRDT concept can be generalised, and its limitations.
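
As a self-contained illustration of why CRDTs converge without concurrency control, here is a grow-only counter, a much simpler CRDT than the replicated sequence validated in the paper: its merge is a per-replica maximum, which is commutative, associative, and idempotent, so replicas converge regardless of delivery order.

class GCounter:
    def __init__(self, replica_id, n_replicas):
        self.replica_id = replica_id
        self.counts = [0] * n_replicas        # one slot per replica

    def increment(self):
        self.counts[self.replica_id] += 1     # only the local slot is ever incremented

    def value(self):
        return sum(self.counts)

    def merge(self, other):
        # Element-wise max: applying merges in any order, any number of times,
        # yields the same state, so no coordination between replicas is needed.
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

a, b = GCounter(0, 2), GCounter(1, 2)
a.increment(); b.increment(); b.increment()
a.merge(b); b.merge(a)
assert a.value() == b.value() == 3            # both replicas converge without coordination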

Proceedings ArticleDOI
04 Dec 2010
TL;DR: This paper uses a technique which approximates conflict-serializability and implements it in hardware on top of a base hardware transactional memory system that provides support for isolation and conflict detection, and shows that it captures the benefits of conflict-serializability.
Abstract: Today's transactional memory systems implement the two-phase-locking (2PL) algorithm which aborts transactions every time a conflict happens. 2PL is a simple algorithm that provides fast transactional operations. However, it limits concurrency in applications with high contention by increasing the rate of aborts. More relaxed algorithms that can commit conflicting transactions have recently been shown to provide better concurrency both in software and hardware. However, existing approaches for implementing such algorithms increase latencies of transactional operations, require complex hardware support and alter standard cache coherence protocols. In this paper, we discuss how a relaxed concurrency control algorithm can be efficiently implemented in hardware. More specifically, we use a technique which approximates conflict-serializability and implement it in hardware on top of a base hardware transactional memory system that provides support for isolation and conflict detection. Our novel hardware scheme is based on recording conflicts as they occur, instead of aborting transactions. Transactions serialize at commit time according to these conflicts by sending broadcast messages. Our evaluation of this hardware scheme using a simulator and standard benchmarks shows that it captures the benefits of conflict-serializability. Applications with long transactions and high contention benefit the most: abort rates are reduced by up to 7.2 times and performance is improved by up to 66%. We argue that this improvement comes with little additional hardware complexity and requires no changes to the transactional programming model.
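
As background for "recording conflicts instead of aborting", the sketch below is the textbook serialization-graph test: conflicts become directed edges between transactions, and a transaction must abort only if committing would close a cycle. It illustrates the general principle of conflict-serializability, not the paper's hardware scheme.

from collections import defaultdict

class SerializationGraph:
    def __init__(self):
        self.edges = defaultdict(set)     # txn -> transactions that must serialize after it

    def record_conflict(self, first, second):
        self.edges[first].add(second)     # 'first' must come before 'second'

    def creates_cycle(self, start):
        # Depth-first search: a path leading back to 'start' means the recorded
        # conflicts admit no serial order, so the execution is not serializable.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            for nxt in self.edges[node]:
                if nxt == start:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

g = SerializationGraph()
g.record_conflict("T1", "T2")
g.record_conflict("T2", "T1")             # conflicting accesses in both directions
print(g.creates_cycle("T1"))              # True -> one of the two must abort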

Proceedings ArticleDOI
04 Dec 2010
TL;DR: The lock control unit (LCU) as discussed by the authors is an acceleration mechanism collocated with each core to explicitly handle fast reader-writer locking, which decouples the hardware lock from the requestor core by associating a unique thread-id to each lock request.
Abstract: Many shared-memory parallel systems use lock-based synchronization mechanisms to provide mutual exclusion or reader-writer access to memory locations. Software locks are inefficient either in memory usage, lock transfer time, or both. Proposed hardware locking mechanisms are either too specific (for example, requiring static assignment of threads to cores and vice-versa), support a limited number of concurrent locks, require tag values to be associated with every memory location, rely on the low latencies of single-chip multicore designs, or are slow in adversarial cases such as suspended threads in a lock queue. Additionally, few proposals cover reader-writer locks and their associated fairness issues. In this paper we introduce the Lock Control Unit (LCU), an acceleration mechanism collocated with each core to explicitly handle fast reader-writer locking. By associating a unique thread-id to each lock request we decouple the hardware lock from the requestor core. This provides correct and efficient execution in the presence of thread migration. By making the LCU logic autonomous from the core, it seamlessly handles thread preemption. Our design offers richer semantics than previous proposals, such as try-lock support, while providing direct core-to-core transfers. We evaluate our proposal with microbenchmarks, a fine-grain Software Transactional Memory system and programs from the Parsec and Splash parallel benchmark suites. The lock transfer time decreases by up to 30% when compared to previous hardware proposals. Transactional Memory systems limited by reader-locking congestion boost up to 3x while still preserving graceful fairness and starvation-freedom properties. Finally, commonly used applications achieve speedups of up to 7% when compared to software models.
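
For readers unfamiliar with the semantics the LCU accelerates, the sketch below is a plain software reader-writer lock with writer preference: many concurrent readers, exclusive writers, and arriving readers wait once a writer is queued. It is purely illustrative and says nothing about the LCU's hardware design.

import threading

class ReaderWriterLock:
    def __init__(self):
        self.cond = threading.Condition()
        self.readers = 0
        self.writer = False
        self.writers_waiting = 0

    def acquire_read(self):
        with self.cond:
            # Block new readers while a writer is active or queued, so writers
            # are not starved by a continuous stream of readers.
            while self.writer or self.writers_waiting:
                self.cond.wait()
            self.readers += 1

    def release_read(self):
        with self.cond:
            self.readers -= 1
            if self.readers == 0:
                self.cond.notify_all()

    def acquire_write(self):
        with self.cond:
            self.writers_waiting += 1
            while self.writer or self.readers:
                self.cond.wait()
            self.writers_waiting -= 1
            self.writer = True

    def release_write(self):
        with self.cond:
            self.writer = False
            self.cond.notify_all()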

Proceedings ArticleDOI
08 Mar 2010
TL;DR: This paper presents a method to remove deadlocks in application-specific NoCs and shows that this method results in a large reduction in the number of resources needed, NoC power consumption, and area when compared to state-of-the-art deadlock removal methods.
Abstract: Networks-on-Chip (NoCs) are a promising interconnect paradigm to address the communication bottleneck of Systems-on-Chip (SoCs). Wormhole flow control is widely used as the transmission protocol in NoCs, as it offers high throughput and low latency. To match the application characteristics, customized irregular topologies and routing functions are used. With wormhole flow control and custom irregular NoC topologies, deadlocks can occur during system operation. Ensuring deadlock-free operation of custom NoCs is a major challenge. In this paper, we address this important issue and present a method to remove deadlocks in application-specific NoCs. Our method can be applied to any NoC topology and routing function, and the potential deadlocks are removed by adding a minimal number of virtual or physical channels. Experiments on a variety of realistic benchmarks show that our method results in a large reduction in the number of resources needed (88% on average), NoC power consumption, and area (66% savings on average) when compared to state-of-the-art deadlock removal methods.

Journal ArticleDOI
TL;DR: The design and implementation of NBmalloc are shown: a lock-free memory allocator designed to enhance parallelism in the system, inspired by Hoard, with a modular design that preserves scalability and helps avoid false sharing and heap blowup.
Abstract: Efficient, scalable memory allocation for multithreaded applications on multiprocessors is a significant goal of recent research. In the distributed computing literature it has been emphasized that lock-based synchronization and concurrency control may limit the parallelism in multiprocessor systems. Thus, system services that employ such methods can hinder reaching the full potential of these systems. A natural research question is the pertinence and the impact of lock-free concurrency control in key services for multiprocessors, such as the memory allocation service, which is the theme of this work. We show the design and implementation of NBmalloc, a lock-free memory allocator designed to enhance the parallelism in the system. The architecture of NBmalloc is inspired by Hoard, a well-known concurrent memory allocator, with a modular design that preserves scalability and helps avoid false sharing and heap blowup. Within our effort to design appropriate lock-free algorithms for NBmalloc, we propose and show a lock-free implementation of a new data structure, flat-set, supporting conventional "internal" set operations as well as "inter-object" operations for moving items between flat-sets. The design of NBmalloc also involved a series of other algorithmic problems, which are discussed in the paper. Further, we present the implementation of NBmalloc and a study of its behaviour on a set of multiprocessor systems. The results show that the good properties of Hoard w.r.t. false sharing and heap blowup are preserved.

Proceedings ArticleDOI
19 Jun 2010
TL;DR: By incorporating a connected-component labeling algorithm into this platform, the authors measure the benefits of this asymmetric multiprocessor for real-time and dynamic image processing.
Abstract: Future systems will have to support multiple concurrent, dynamic, compute-intensive applications while respecting real-time and energy consumption constraints. Within this framework, this paper presents an architecture named SCMP. This asymmetric multiprocessor can support dynamic migration and preemption of tasks, thanks to concurrent control of tasks, while offering a specific data sharing solution. Its tasks are controlled by a dedicated HW-RTOS that allows online scheduling of independent real-time and non-real-time tasks. By incorporating a connected-component labeling algorithm into this platform, we have been able to measure its benefits for real-time and dynamic image processing.

Journal ArticleDOI
TL;DR: An overview of the VELOX TM stack and its associated challenges and contributions is presented, spanning from programming language to the hardware support, and including runtime and libraries, compilers, and application environments.
Abstract: The adoption of multi- and many-core architectures for mainstream computing undoubtedly brings profound changes in the way software is developed. In particular, the use of fine-grained locking as the multi-core programmer's coordination methodology is considered by more and more experts as a dead end. The transactional memory (TM) programming paradigm is a strong contender to become the approach of choice for replacing locks and implementing atomic operations in concurrent programming. Combining sequences of concurrent operations into atomic transactions allows a great reduction in the complexity of both programming and verification, by making parts of the code appear to execute sequentially without the need to program using fine-grained locking. Transactions remove from the programmer the burden of figuring out the interaction among concurrent operations that happen to conflict when accessing the same locations in memory. The EU-funded FP7 VELOX project designs, implements and evaluates an integrated TM stack, spanning from the programming language to the hardware support, and including runtime and libraries, compilers, and application environments. This paper presents an overview of the VELOX TM stack and its associated challenges and contributions.

Proceedings ArticleDOI
13 Sep 2010
TL;DR: It is argued that the S-NET approach delivers a flexible component technology which liberates application developers from the logistics of task and data management while at the same time making it unnecessary for a distributed computing professional to acquire detailed knowledge of the application area.
Abstract: Development and implementation of the coordination language S-NET has been reported previously. In this paper we apply the S-NET design methodology to a computer graphics problem. We demonstrate (i) how a complete separation of concerns can be achieved between algorithm engineering and concurrency engineering and (ii) that the S-NET implementation is quite capable of achieving performance that matches what can be achieved using low-level tools such as MPI. We find this remarkable as under S-NET communication, concurrency and synchronization are completely separated from algorithmic code. We argue that our approach delivers a flexible component technology which liberates application developers from the logistics of task and data management while at the same time making it unnecessary for a distributed computing professional to acquire detailed knowledge of the application area.

Proceedings ArticleDOI
19 Apr 2010
TL;DR: An extensive simulation study is presented aimed at assessing the efficiency of some recently proposed database-oriented replication schemes, when employed in the context of STM systems, and pointing out the limited efficiency and scalability of these schemes.
Abstract: Software Transactional Memories (STMs) are emerging as a highly attractive programming model, thanks to their ability to mask concurrency management issues to the overlying applications. In this paper we are interested in dependability of STM systems via replication. In particular we present an extensive simulation study aimed at assessing the efficiency of some recently proposed database-oriented replication schemes, when employed in the context of STM systems. Our results point out the limited efficiency and scalability of these schemes, highlighting the need for redesigning ad-hoc solutions well fitting the requirements of STM environments. Possible directions for the re-design process are also discussed and supported by some early quantitative data.

Book ChapterDOI
20 Mar 2010
TL;DR: This paper considers a sequential library annotated with assertions along with a proof that these assertions hold in a sequential execution and shows how it can be used to derive concurrency control that ensures that any execution of the library methods, when invoked by concurrent clients, satisfies the same assertions.
Abstract: We are interested in identifying and enforcing the isolation requirements of a concurrent program, i.e., concurrency control that ensures that the program meets its specification. The thesis of this paper is that this can be done systematically starting from a sequential proof, i.e., a proof of correctness of the program in the absence of concurrent interleavings. We illustrate our thesis by presenting a solution to the problem of making a sequential library thread-safe for concurrent clients. We consider a sequential library annotated with assertions along with a proof that these assertions hold in a sequential execution. We show how we can use the proof to derive concurrency control that ensures that any execution of the library methods, when invoked by concurrent clients, satisfies the same assertions. We also present an extension to guarantee that the library is linearizable with respect to its sequential specification.

Proceedings ArticleDOI
01 Dec 2010
TL;DR: The consistency of multiple data replicas is discussed in this paper, and lazy update approaches are used to separate the process of data replication from data access in cloud computing, which improves the throughput of data access and reduces response time.
Abstract: Cloud computing promises that, building on compute and storage virtualization technologies, consumers are able to rent infrastructure "in the Cloud" as needed, deploy applications and store data, and access them via Web protocols on a pay-per-use basis. For application and data deployment in the cloud, high availability and service level agreements must be satisfied. To improve performance and the number of concurrent accesses to applications and data, dynamic application deployment and data replication are usually employed. The consistency of multiple data replicas is discussed in this paper, and lazy update approaches are used to separate the process of data replication from data access in cloud computing, which improves the throughput of data access and reduces response time. Initial experiments show the methods are efficient and effective.
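
A toy sketch of the lazy-update idea described in the abstract (separating replication from data access): writes are acknowledged at the primary immediately and a background thread applies them to the replicas later, trading replica freshness for write latency and throughput. The class and its structure are invented, not the paper's system.

import queue, threading, time

class LazyReplicatedStore:
    def __init__(self, n_replicas=2):
        self.primary = {}
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = queue.Queue()
        threading.Thread(target=self._propagate, daemon=True).start()

    def write(self, key, value):
        self.primary[key] = value          # acknowledged without waiting for replicas
        self.pending.put((key, value))

    def read(self, key, replica=0):
        return self.replicas[replica].get(key)   # may briefly return stale data

    def _propagate(self):
        # Background propagation: replicas are updated after the write returns.
        while True:
            key, value = self.pending.get()
            for r in self.replicas:
                r[key] = value

store = LazyReplicatedStore()
store.write("x", 1)
time.sleep(0.1)                            # give the lazy propagation a moment
print(store.read("x"))                     # 1 once the update has been applied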

Proceedings ArticleDOI
08 Dec 2010
TL;DR: This paper investigates the use of multiple threads to concurrently process XPath queries on a shared incoming XML document using an approach that builds on YFilter, and divides the NFA into several smaller ones for concurrent processing.
Abstract: The importance of XPath in XML filtering systems has led to a significant body of research on improving the processing performance of XPath queries. Most of the work, however, has been in the context of a single processing core. Given the prevalence of multicore processors, we believe that a parallel approach can provide significant benefits for a number of application scenarios. In this paper we thus investigate the use of multiple threads to concurrently process XPath queries on a shared incoming XML document. Using an approach that builds on YFilter, we divide the NFA into several smaller ones for concurrent processing. We implement and test two strategies for load balancing: a static approach and a dynamic approach. We test our approach on an eight-core machine, and show that it provides reasonable speedup up to eight cores.
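
A much-simplified sketch of the partitioning idea: the query workload is split into chunks that separate threads evaluate against one shared, already-parsed document. It uses trivial ElementTree path expressions rather than a shared YFilter NFA, and the document, queries, and function names are invented for the example.

import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

DOC = ET.fromstring("<lib><book genre='db'/><book genre='os'/><paper/></lib>")
QUERIES = ["./book", "./paper", "./book[@genre='db']", "./journal"]

def evaluate(chunk):
    # Each worker evaluates its share of the queries against the shared document.
    return {q: len(DOC.findall(q)) for q in chunk}

def parallel_filter(queries, workers=2):
    chunks = [queries[i::workers] for i in range(workers)]   # static load balancing
    matches = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(evaluate, chunks):
            matches.update(result)
    return matches

print(parallel_filter(QUERIES))   # e.g. {'./book': 2, './paper': 1, ...}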

Journal ArticleDOI
TL;DR: This paper presents a novel hardware-oriented deadlock detection algorithm suitable for current and future MPSoCs that leverages specialized hardware to guarantee O(1) overall runtime complexity.
Abstract: Due to rapid technology advances, multiprocessor systems-on-chip (MPSoCs) are likely to become commodity computing platforms for embedded applications. In the future, it is possible that an MPSoC will be equipped with a large number of processing elements as well as on-chip resources. The management of these resources faces many challenges, among which deadlock is one of the most crucial issues. This paper presents a novel hardware-oriented deadlock detection algorithm suitable for current and future MPSoCs. Unlike previously published methods, whose runtime complexities are often affected by the number of processing elements and resources in the system, the proposed algorithm leverages specialized hardware to guarantee O(1) overall runtime complexity. Such complexity is achieved by: 1) classifying resource allocation events; 2) for each type of event, using hardware to perform a set of specific detection and/or preparation operations that take only constant runtime; and 3) updating necessary information for multiple resources in parallel in hardware. We implement the algorithm in Verilog HDL and demonstrate through simulation that each algorithm invocation takes at most four clock cycles.