Showing papers on "Concurrency control published in 2017"


Journal ArticleDOI
01 Mar 2017
TL;DR: Conducts an extensive study of MVCC's four key design decisions (concurrency control protocol, version storage, garbage collection, and index management) and identifies the fundamental bottlenecks of each design choice.
Abstract: Multi-version concurrency control (MVCC) is currently the most popular transaction management scheme in modern database management systems (DBMSs). Although MVCC was discovered in the late 1970s, it is used in almost every major relational DBMS released in the last decade. Maintaining multiple versions of data potentially increases parallelism without sacrificing serializability when processing transactions. But scaling MVCC in a multi-core and in-memory setting is non-trivial: when there are a large number of threads running in parallel, the synchronization overhead can outweigh the benefits of multi-versioning. To understand how MVCC performs when processing transactions in modern hardware settings, we conduct an extensive study of the scheme's four key design decisions: concurrency control protocol, version storage, garbage collection, and index management. We implemented state-of-the-art variants of all of these in an in-memory DBMS and evaluated them using OLTP workloads. Our analysis identifies the fundamental bottlenecks of each design choice.
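
To make the version-storage and visibility aspects concrete, here is a minimal sketch (not the paper's implementation) of an append-only version chain with begin/end timestamps; names such as VersionChain and read_visible are illustrative only.

```python
import threading

class Version:
    """One version of a tuple in an append-only chain (newest first)."""
    def __init__(self, value, begin_ts, end_ts=float("inf")):
        self.value = value
        self.begin_ts = begin_ts   # commit timestamp of the creating transaction
        self.end_ts = end_ts       # set when a newer version supersedes this one
        self.next = None           # pointer to the older version

class VersionChain:
    """Append-only (newest-to-oldest) version storage for a single key."""
    def __init__(self):
        self.head = None
        self.latch = threading.Lock()

    def install(self, value, commit_ts):
        """Install a new version; the previous head stops being visible at commit_ts."""
        with self.latch:
            v = Version(value, commit_ts)
            if self.head is not None:
                self.head.end_ts = commit_ts
                v.next = self.head
            self.head = v

    def read_visible(self, read_ts):
        """Return the newest version whose lifetime [begin_ts, end_ts) contains read_ts."""
        v = self.head
        while v is not None:
            if v.begin_ts <= read_ts < v.end_ts:
                return v.value
            v = v.next
        return None

# Usage: a reader at timestamp 15 still sees the old value after a later write.
chain = VersionChain()
chain.install("v1", commit_ts=10)
chain.install("v2", commit_ts=20)
assert chain.read_visible(15) == "v1"
assert chain.read_visible(25) == "v2"
```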

140 citations


Proceedings ArticleDOI
09 May 2017
TL;DR: Cicada is a single-node multi-core in-memory transactional database with serializability that reduces overhead and contention at several levels of the system by leveraging optimistic and multi-version concurrency control schemes and multiple loosely synchronized clocks while mitigating their drawbacks.
Abstract: Multi-core in-memory databases promise high-speed online transaction processing. However, the performance of individual designs suffers when the workload characteristics miss their small sweet spot of a desired contention level, read-write ratio, record size, processing rate, and so forth. Cicada is a single-node multi-core in-memory transactional database with serializability. To provide high performance under diverse workloads, Cicada reduces overhead and contention at several levels of the system by leveraging optimistic and multi-version concurrency control schemes and multiple loosely synchronized clocks while mitigating their drawbacks. On the TPC-C and YCSB benchmarks, Cicada outperforms Silo, TicToc, FOEDUS, MOCC, two-phase locking, Hekaton, and ERMIA in most scenarios, achieving up to 3X higher throughput than the next fastest design. It handles up to 2.07 M TPC-C transactions per second and 56.5 M YCSB transactions per second, and scans up to 356 M records per second on a single 28-core machine.
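
The "multiple loosely synchronized clocks" idea can be pictured with a small sketch: each worker allocates timestamps from a thread-local clock that only moves forward and is bumped whenever it observes a larger timestamp from another worker. This is an illustrative simplification, not Cicada's actual clock algorithm, and the names below are invented for the example.

```python
class LocalClock:
    """Thread-local timestamp allocator: never hands out a timestamp that
    orders before anything it has already observed from other workers."""
    def __init__(self, worker_id, num_workers):
        self.worker_id = worker_id
        self.num_workers = num_workers
        self.last = 0

    def observe(self, remote_epoch):
        # Bump the local clock when a larger epoch is seen on a committed version.
        if remote_epoch > self.last:
            self.last = remote_epoch

    def next_ts(self):
        # Advance locally; the worker id breaks ties so timestamps stay unique.
        self.last += 1
        return self.last * self.num_workers + self.worker_id

clock_a = LocalClock(worker_id=0, num_workers=2)
clock_b = LocalClock(worker_id=1, num_workers=2)
t1 = clock_a.next_ts()                       # epoch 1 on worker A
clock_b.observe(t1 // clock_b.num_workers)   # worker B reads a version stamped by A
t2 = clock_b.next_ts()                       # epoch 2 on worker B
assert t2 > t1                               # later allocation orders after what it observed
```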

126 citations


Journal ArticleDOI
01 Jan 2017
TL;DR: To achieve truly scalable operation, distributed concurrency control solutions must seek a tighter coupling with either novel network hardware or applications (via data modeling and semantically-aware execution), or both.
Abstract: Increasing transaction volumes have led to a resurgence of interest in distributed transaction processing. In particular, partitioning data across several servers can improve throughput by allowing servers to process transactions in parallel. But executing transactions across servers limits the scalability and performance of these systems. In this paper, we quantify the effects of distribution on concurrency control protocols in a distributed environment. We evaluate six classic and modern protocols in an in-memory distributed database evaluation framework called Deneva, providing an apples-to-apples comparison between each. Our results expose severe limitations of distributed transaction processing engines. Moreover, in our analysis, we identify several protocol-specific scalability bottlenecks. We conclude that to achieve truly scalable operation, distributed concurrency control solutions must seek a tighter coupling with either novel network hardware (in the local area) or applications (via data modeling and semantically-aware execution), or both.
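
As background for why cross-server transactions are expensive, the sketch below shows the classic two-phase commit pattern that distributed protocols like those evaluated here must layer their concurrency control on. It is a minimal illustration (in-process "shards", no failure handling), not Deneva's code, and all names are invented for the example.

```python
class Participant:
    """One shard that can tentatively prepare a write set and later commit or abort it."""
    def __init__(self, name):
        self.name = name
        self.store = {}
        self.staged = {}

    def prepare(self, txn_id, writes):
        # Vote yes only if the writes can be staged (locked/logged) here.
        self.staged[txn_id] = writes
        return True

    def commit(self, txn_id):
        self.store.update(self.staged.pop(txn_id))

    def abort(self, txn_id):
        self.staged.pop(txn_id, None)

def two_phase_commit(txn_id, writes_per_shard, shards):
    """Phase 1: every shard must vote yes; Phase 2: commit everywhere, else abort everywhere."""
    votes = [s.prepare(txn_id, writes_per_shard.get(s.name, {})) for s in shards]
    if all(votes):
        for s in shards:
            s.commit(txn_id)
        return "committed"
    for s in shards:
        s.abort(txn_id)
    return "aborted"

shards = [Participant("s1"), Participant("s2")]
print(two_phase_commit("t1", {"s1": {"x": 1}, "s2": {"y": 2}}, shards))  # committed
```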

105 citations


Proceedings ArticleDOI
14 Oct 2017
TL;DR: Eris can process a large class of distributed transactions in a single round-trip from the client to the storage system without any explicit coordination between shards or replicas in the normal case, providing atomicity, consistency, and fault tolerance with less than 10% overhead.
Abstract: Distributed storage systems aim to provide strong consistency and isolation guarantees on an architecture that is partitioned across multiple shards for scalability and replicated for fault tolerance. Traditionally, achieving all of these goals has required an expensive combination of atomic commitment and replication protocols -- introducing extensive coordination overhead. Our system, Eris, takes a different approach. It moves a core piece of concurrency control functionality, which we term multi-sequencing, into the datacenter network itself. This network primitive takes on the responsibility for consistently ordering transactions, and a new lightweight transaction protocol ensures atomicity. The end result is that Eris avoids both replication and transaction coordination overhead: we show that it can process a large class of distributed transactions in a single round-trip from the client to the storage system without any explicit coordination between shards or replicas in the normal case. It provides atomicity, consistency, and fault tolerance with less than 10% overhead -- achieving throughput 3.6-35x higher and latency 72-80% lower than a conventional design on standard benchmarks.
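
The multi-sequencing primitive can be pictured as a sequencer that stamps each transaction with one sequence number per shard it touches, so every shard can independently detect drops and reorderings. The sketch below is an illustrative reconstruction from the abstract, not the in-network implementation; class names are invented.

```python
from collections import defaultdict

class MultiSequencer:
    """Assigns each transaction one sequence number per destination shard,
    so every shard observes a gap-free, consistently ordered stream."""
    def __init__(self):
        self.counters = defaultdict(int)

    def stamp(self, txn_id, shards):
        stamps = {}
        for shard in shards:
            self.counters[shard] += 1
            stamps[shard] = self.counters[shard]
        return {"txn": txn_id, "stamps": stamps}

class Shard:
    """Executes transactions strictly in sequence-number order; a gap means a lost message."""
    def __init__(self, name):
        self.name = name
        self.expected = 1
        self.log = []

    def deliver(self, msg):
        seq = msg["stamps"][self.name]
        if seq != self.expected:
            raise RuntimeError(f"{self.name}: missing sequence number {self.expected}")
        self.log.append(msg["txn"])
        self.expected += 1

seq = MultiSequencer()
a, b = Shard("A"), Shard("B")
m1 = seq.stamp("t1", ["A", "B"])   # touches both shards
m2 = seq.stamp("t2", ["A"])        # touches only shard A
for shard, msg in ((a, m1), (b, m1), (a, m2)):
    shard.deliver(msg)
print(a.log, b.log)                # ['t1', 't2'] ['t1']
```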

99 citations


Journal ArticleDOI
01 Jan 2017
TL;DR: This paper designs a new serializable concurrency control protocol, piece-wise visibility (PWV), with the explicit goal of enabling early write visibility, and finds that PWV can outperform serializable protocols by an order of magnitude and read committed by 3X on high contention workloads.
Abstract: In order to guarantee recoverable transaction execution, database systems permit a transaction's writes to be observable only at the end of its execution. As a consequence, there is generally a delay between the time a transaction performs a write and the time later transactions are permitted to read it. This delayed write visibility can significantly impact the performance of serializable database systems by reducing concurrency among conflicting transactions. This paper makes the observation that delayed write visibility stems from the fact that database systems can arbitrarily abort transactions at any point during their execution. Accordingly, we make the case for database systems which only abort transactions under a restricted set of conditions, thereby enabling a new recoverability mechanism, early write visibility, which safely makes transactions' writes visible prior to the end of their execution. We design a new serializable concurrency control protocol, piece-wise visibility (PWV), with the explicit goal of enabling early write visibility. We evaluate PWV against state-of-the-art serializable protocols and a highly optimized implementation of read committed, and find that PWV can outperform serializable protocols by an order of magnitude and read committed by 3X on high contention workloads.
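
A hedged illustration of the piece-wise idea: the transaction is decomposed into pieces, and each piece's writes become visible to other transactions as soon as the piece finishes rather than at transaction commit, which is only safe because the system promises not to abort the transaction after it starts publishing. The names below are illustrative, not from the paper.

```python
class Store:
    def __init__(self):
        self.data = {}

    def publish(self, writes):
        # Early write visibility: writes become readable before the whole
        # transaction finishes, because aborts are restricted by design.
        self.data.update(writes)

def run_piecewise(store, pieces):
    """Run a transaction as a list of pieces; each piece reads the current
    store state and returns the writes it publishes immediately."""
    for piece in pieces:
        writes = piece(store.data)
        store.publish(writes)

store = Store()
store.data["balance_a"] = 100
store.data["balance_b"] = 0

transfer_pieces = [
    lambda d: {"balance_a": d["balance_a"] - 10},   # piece 1: debit
    lambda d: {"balance_b": d["balance_b"] + 10},   # piece 2: credit (already sees piece 1's write)
]
run_piecewise(store, transfer_pieces)
print(store.data)   # {'balance_a': 90, 'balance_b': 10}
```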

93 citations


Journal ArticleDOI
TL;DR: A hybrid system architecture that enables a team of mobile robots to complete a task in a complex environment by self-organizing into a multihop ad hoc network and solving the concurrent communication and mobility problem is developed.
Abstract: We develop a hybrid system architecture that enables a team of mobile robots to complete a task in a complex environment by self-organizing into a multihop ad hoc network and solving the concurrent communication and mobility problem. The proposed system consists of a two-layer feedback loop. An outer loop performs infrequent global coordination and a local inner loop determines motion and communication variables. This system provides the lightweight coordination and responsiveness of decentralized systems while avoiding local minima. This allows a team to complete a task in complex environments while maintaining desired end-to-end data rates. The behavior of the system is evaluated in experiments that demonstrate: 1) successful task completion in complex environments; 2) achievement of equal or greater end-to-end data rates as compared to a centralized system; and 3) robustness to unexpected events such as motion restriction.

61 citations


Proceedings ArticleDOI
04 Apr 2017
TL;DR: To build DCatch, the authors design a set of happens-before rules that model a wide variety of communication and concurrency mechanisms in real-world distributed cloud systems, along with tools to help prune false positives and trigger DCbugs.
Abstract: In big data and cloud computing era, reliability of distributed systems is extremely important. Unfortunately, distributed concurrency bugs, referred to as DCbugs, widely exist. They hide in the large state space of distributed cloud systems and manifest non-deterministically depending on the timing of distributed computation and communication. Effective techniques to detect DCbugs are desired. This paper presents a pilot solution, DCatch, in the world of DCbug detection. DCatch predicts DCbugs by analyzing correct execution of distributed systems. To build DCatch, we design a set of happens-before rules that model a wide variety of communication and concurrency mechanisms in real-world distributed cloud systems. We then build runtime tracing and trace analysis tools to effectively identify concurrent conflicting memory accesses in these systems. Finally, we design tools to help prune false positives and trigger DCbugs. We have evaluated DCatch on four representative open-source distributed cloud systems, Cassandra, Hadoop MapReduce, HBase, and ZooKeeper. By monitoring correct execution of seven workloads on these systems, DCatch reports 32 DCbugs, with 20 of them being truly harmful.
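
One standard way to encode happens-before rules of this kind (for example, a message send happens before its receive) is with vector clocks; the sketch below is a generic illustration of that encoding, not DCatch's actual rule set, and all names are invented.

```python
class Node:
    """A process with a vector clock; send/receive establish happens-before edges."""
    def __init__(self, index, num_nodes):
        self.index = index
        self.clock = [0] * num_nodes

    def local_event(self):
        self.clock[self.index] += 1
        return list(self.clock)

    def send(self):
        self.clock[self.index] += 1
        return list(self.clock)          # timestamp carried on the message

    def receive(self, msg_clock):
        self.clock = [max(a, b) for a, b in zip(self.clock, msg_clock)]
        self.clock[self.index] += 1
        return list(self.clock)

def happens_before(c1, c2):
    """c1 happens-before c2 iff c1 <= c2 component-wise and c1 != c2."""
    return all(a <= b for a, b in zip(c1, c2)) and c1 != c2

n0, n1 = Node(0, 2), Node(1, 2)
e_send = n0.send()                 # event on node 0
e_local = n1.local_event()         # concurrent event on node 1
e_recv = n1.receive(e_send)        # receiving orders node 1 after the send
print(happens_before(e_send, e_recv))   # True: send -> receive is ordered
print(happens_before(e_send, e_local) or happens_before(e_local, e_send))
# False: the two events are concurrent, so accesses they make to shared state are candidate DCbugs
```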

46 citations


Journal ArticleDOI
01 Aug 2017
TL;DR: Asynchronous Parallel Table Replication (ATR) employs a novel optimistic lock-free parallel log replay scheme which exploits characteristics of multi-version concurrency control (MVCC) in order to enable real-time reporting by minimizing the propagation delay between the primary and replicas.
Abstract: Modern in-memory database systems are facing the need of efficiently supporting mixed workloads of OLTP and OLAP. A conventional approach to this requirement is to rely on ETL-style, application-driven data replication between two very different OLTP and OLAP systems, sacrificing real-time reporting on operational data. An alternative approach is to run OLTP and OLAP workloads in a single machine, which eventually limits the maximum scalability of OLAP query performance. In order to tackle this challenging problem, we propose a novel database replication architecture called Asynchronous Parallel Table Replication (ATR). ATR supports OLTP workloads in one primary machine, while it supports heavy OLAP workloads in replicas. Here, row-store formats can be used for OLTP transactions at the primary, while column-store formats are used for OLAP analytical queries at the replicas. ATR is designed to support elastic scalability of OLAP query performance while it minimizes the overhead for transaction processing at the primary and minimizes CPU consumption for replayed transactions at the replicas. ATR employs a novel optimistic lock-free parallel log replay scheme which exploits characteristics of multi-version concurrency control (MVCC) in order to enable real-time reporting by minimizing the propagation delay between the primary and replicas. Through extensive experiments with a concrete implementation available in a commercial database system, we demonstrate that ATR achieves sub-second visibility delay even for update-intensive workloads, providing scalable OLAP performance without notable overhead to the primary.
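
The lock-free parallel replay idea can be sketched as follows: replayers apply log records for different records in parallel, and a record's log entry is applied only once the version it was based on is already installed at the replica, so no ordering is needed across different records. This is an illustrative simplification based on the abstract, not ATR's replay algorithm.

```python
import random

def parallel_style_replay(log_entries, replica):
    """Apply log entries that may arrive out of order: an entry is applied only
    when the replica already holds the version the entry was based on."""
    pending = list(log_entries)
    random.shuffle(pending)                 # simulate out-of-order arrival across replayers
    while pending:
        still_pending = []
        for key, prev_ver, new_ver, value in pending:
            if replica.get(key, (0, None))[0] == prev_ver:
                replica[key] = (new_ver, value)            # per-record ordering is enough
            else:
                still_pending.append((key, prev_ver, new_ver, value))  # predecessor not applied yet
        pending = still_pending
    return replica

# Two independent keys, two versions each; cross-key replay order does not matter.
log = [("x", 0, 1, "x1"), ("x", 1, 2, "x2"), ("y", 0, 1, "y1"), ("y", 1, 2, "y2")]
print(parallel_style_replay(log, {}))   # {'x': (2, 'x2'), 'y': (2, 'y2')}
```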

31 citations


Journal ArticleDOI
01 Oct 2017
TL;DR: An extensive experimental study is presented to understand the impact of each system architecture on overall scalability, the interaction between system architecture and concurrency control protocols, and the pros and cons of new architectures that have been proposed recently to explicitly deal with high-contention workloads.
Abstract: Main-memory OLTP engines are being increasingly deployed on multicore servers that provide abundant thread-level parallelism. However, recent research has shown that even the state-of-the-art OLTP engines are unable to exploit available parallelism for high contention workloads. While previous studies have shown the lack of scalability of all popular concurrency control protocols, they consider only one system architecture---a non-partitioned, shared everything one where transactions can be scheduled to run on any core and can access any data or metadata stored in shared memory. In this paper, we perform a thorough analysis of the impact of other architectural alternatives (Data-oriented transaction execution, Partitioned Serial Execution, and Delegation) on scalability under high contention scenarios. In doing so, we present Trireme, a main-memory OLTP engine testbed that implements four system architectures and several popular concurrency control protocols in a single code base. Using Trireme, we present an extensive experimental study to understand i) the impact of each system architecture on overall scalability, ii) the interaction between system architecture and concurrency control protocols, and iii) the pros and cons of new architectures that have been proposed recently to explicitly deal with high-contention workloads.

30 citations


Proceedings ArticleDOI
03 Apr 2017
TL;DR: This paper uses a recently proposed isolation level, called Non-Monotonic Snapshot Isolation, to achieve ACID transactions with low latency, and presents Blotter, a geo-replicated system that leverages these semantics in the design of a new concurrency control protocol that leaves a small amount of local state during reads to make commits more efficient.
Abstract: Most geo-replicated storage systems use weak consistency to avoid the performance penalty of coordinating replicas in different data centers. This departure from strong semantics poses problems to application programmers, who need to address the anomalies enabled by weak consistency. In this paper we use a recently proposed isolation level, called Non-Monotonic Snapshot Isolation, to achieve ACID transactions with low latency. To this end, we present Blotter, a geo-replicated system that leverages these semantics in the design of a new concurrency control protocol that leaves a small amount of local state during reads to make commits more efficient, which is combined with a configuration of Paxos that is tailored for good performance in wide area settings. Read operations always run on the local data center, and update transactions complete in a small number of message steps to a subset of the replicas. We implemented Blotter as an extension to Cassandra. Our experimental evaluation shows that Blotter has a small overhead at the data center scale, and performs better across data centers when compared with our implementations of the core Spanner protocol and of Snapshot Isolation on the same codebase.

29 citations


Proceedings ArticleDOI
09 May 2017
TL;DR: It is argued that the real issues that will have the most impact are not easily solved by more "clever" algorithms; instead, in many cases they can only be solved by hardware improvements and artificial intelligence.
Abstract: Most of the academic papers on concurrency control published in the last five years have assumed the following two design decisions: (1) applications execute transactions with serializable isolation and (2) applications execute most (if not all) of their transactions using stored procedures. But results from a recent survey of database administrators indicate that these assumptions are not realistic. This survey includes both legacy deployments where the cost of changing the application to use either serializable isolation or stored procedures is not feasible, as well as new "greenfield" projects that are not encumbered by prior constraints. As such, the research produced by our community is not helping people with their real-world systems and thus is essentially irrelevant. I know this because I am guilty of writing these papers too. In this talk/denouncement, I will descend from my ivory tower and argue that we need to rethink our agenda for concurrency control research. Recent trends focus on asking the wrong questions and solving the wrong problems. I contend that the real issues that will have the most impact are not easily solved by more "clever" algorithms. Instead, in many cases, they can only be solved by hardware improvements and artificial intelligence.

Proceedings ArticleDOI
09 May 2017
TL;DR: This paper proposes a novel approach for conflict resolution in MVCC for in-memory databases that maximizes the reuse of the computations done in the initial execution round, and increases the transaction processing throughput.
Abstract: The optimistic variants of Multi-Version Concurrency Control (MVCC) avoid blocking concurrent transactions at the cost of having a validation phase. Upon failure in the validation phase, the transaction is usually aborted and restarted from scratch. The "abort and restart" approach becomes a performance bottleneck for use cases with high contention objects or long running transactions. In addition, restarting from scratch creates a negative feedback loop in the system, because the system incurs additional overhead that may create even more conflicts. In this paper, we propose a novel approach for conflict resolution in MVCC for in-memory databases. This low overhead approach summarizes the transaction programs in the form of a dependency graph. The dependency graph also contains the constructs used in the validation phase of the MVCC algorithm. Then, when encountering conflicts among transactions, our mechanism quickly detects the conflict locations in the program and partially re-executes the conflicting transactions. This approach maximizes the reuse of the computations done in the initial execution round, and increases the transaction processing throughput.
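
A hedged sketch of the partial re-execution idea: the transaction program is represented as a dependency graph of pieces, and when validation detects a conflict on a particular read, only the piece that made that read and the pieces downstream of it are re-executed while earlier results are reused. The graph structure and names here are illustrative, not the paper's representation.

```python
class Piece:
    """A unit of a transaction program with explicit data dependencies."""
    def __init__(self, name, reads, compute, deps=()):
        self.name = name
        self.reads = set(reads)     # database keys this piece reads
        self.compute = compute      # fn(db, upstream_results) -> result
        self.deps = list(deps)      # upstream pieces whose results it consumes
        self.result = None

def execute(pieces, db):
    for p in pieces:                # pieces are listed in topological order
        p.result = p.compute(db, {d.name: d.result for d in p.deps})

def reexecute_conflicting(pieces, db, dirty_keys):
    """Re-run only pieces that read a changed key, plus everything downstream of them."""
    tainted = set()
    for p in pieces:
        if p.reads & dirty_keys or any(d in tainted for d in p.deps):
            p.result = p.compute(db, {d.name: d.result for d in p.deps})
            tainted.add(p)
    return {p.name: p.result for p in pieces}

db = {"price": 10, "qty": 3}
p1 = Piece("lookup_price", ["price"], lambda db, up: db["price"])
p2 = Piece("lookup_qty",   ["qty"],   lambda db, up: db["qty"])
p3 = Piece("total", [], lambda db, up: up["lookup_price"] * up["lookup_qty"], deps=(p1, p2))
pieces = [p1, p2, p3]
execute(pieces, db)

db["price"] = 12                    # validation detected a conflicting write on "price"
print(reexecute_conflicting(pieces, db, {"price"}))
# lookup_qty is reused; lookup_price and total are recomputed -> total becomes 36
```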

Journal ArticleDOI
TL;DR: A novel prototyping technique for concurrent control systems implemented in field programmable gate array (FPGA) devices is proposed in the paper, which allows for dynamic modification of the implemented system.
Abstract: A novel prototyping technique for concurrent control systems implemented in field programmable gate array (FPGA) devices is proposed in the paper. The method allows for dynamic modification of the implemented system. It means that the functionality of a part of the controller can be changed, while the rest of the system is still running. The approach applies to unified modeling language state machine diagrams as a specification of the system. Contrary to other methods, the presented concept requires neither major changes to the design, nor the application of external, specialized tools. The proposed idea has been experimentally verified with the use of Xilinx FPGAs.

Journal ArticleDOI
TL;DR: This paper presents an empirical study of the differences and similarities between concurrency bugs and other bugs, as well as the differences among various concurrency bug types, in terms of their severity, fixing time, and reproducibility.
Abstract: Concurrent programming puts demands on software debugging and testing, as concurrent software may exhibit problems not present in sequential software, e.g., deadlocks and race conditions. In aiming to increase efficiency and effectiveness of debugging and bug-fixing for concurrent software, a deep understanding of concurrency bugs, their frequency, and their fixing times would be helpful. Similarly, to design effective tools and techniques for testing and debugging concurrent software, understanding the differences between non-concurrency and concurrency bugs in real-world software would be useful. This paper presents an empirical study focusing on understanding the differences and similarities between concurrency bugs and other bugs, as well as the differences among various concurrency bug types in terms of their severity, fixing time, and reproducibility. Our basis is a comprehensive analysis of bug reports covering several generations of five open source software projects. The analysis involves a total of 11,860 bug reports from the last decade, including 351 reports related to concurrency bugs. We found that concurrency bugs are different from other bugs in terms of their fixing time and severity while they are similar in terms of reproducibility. Our findings shed light on concurrency bugs and could thereby influence future design and development of concurrent software, their debugging and testing, as well as related tools.

Proceedings Article
01 Jan 2017
TL;DR: Adaptive concurrency control (ACC) is presented, which dynamically clusters data and chooses the optimal concurrency control protocol for each cluster, addressing three key challenges: how to cluster data to minimize cross-cluster access while maintaining load balancing, how to model workloads and perform protocol selection accordingly, and how to support mixed concurrency control protocols running simultaneously.
Abstract: Use of transactional multicore main-memory databases is growing due to dramatic increases in memory size and CPU cores available for a single machine. To leverage these resources, recent concurrency control protocols have been proposed for main-memory databases, but are largely optimized for specific workloads. Due to shifting and unknown access patterns, workloads may change and one specific algorithm cannot dynamically fit all varied workloads. Thus, it is desirable to choose the right concurrency control protocol for a given workload. To address this issue, we present adaptive concurrency control (ACC), which dynamically clusters data and chooses the optimal concurrency control protocol for each cluster. ACC addresses three key challenges: i) how to cluster data to minimize cross-cluster access and maintain load-balancing, ii) how to model workloads and perform protocol selection accordingly, and iii) how to support mixed concurrency control protocols running simultaneously. In this paper, we outline these challenges and present preliminary results.
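
A minimal sketch of the "mixed protocols" direction: records are grouped into clusters, each cluster is tagged with the concurrency control protocol chosen for its observed workload, and a router dispatches each access to that cluster's protocol. The selection rule below is a stand-in placeholder, not ACC's workload model, and all names are invented.

```python
class TwoPhaseLocking:
    name = "2PL"

class OptimisticCC:
    name = "OCC"

def choose_protocol(stats):
    """Toy selection rule: locking for high-conflict clusters, OCC otherwise.
    (ACC would use a learned, workload-driven model instead.)"""
    return TwoPhaseLocking() if stats["conflict_rate"] > 0.1 else OptimisticCC()

class AdaptiveRouter:
    """Maps each key to its cluster and each cluster to its chosen protocol."""
    def __init__(self, cluster_of_key, cluster_stats):
        self.cluster_of_key = cluster_of_key
        self.protocols = {c: choose_protocol(s) for c, s in cluster_stats.items()}

    def protocol_for(self, key):
        return self.protocols[self.cluster_of_key[key]]

router = AdaptiveRouter(
    cluster_of_key={"hot_counter": "hot", "order:42": "cold"},
    cluster_stats={"hot": {"conflict_rate": 0.4}, "cold": {"conflict_rate": 0.01}},
)
print(router.protocol_for("hot_counter").name)  # 2PL
print(router.protocol_for("order:42").name)     # OCC
```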

Journal ArticleDOI
01 Aug 2017
TL;DR: The serial safety net (SSN) as discussed by the authors is a serializability-enforcing certifier which can be applied on top of various CC schemes that offer higher performance but admit anomalies, such as snapshot isolation and read committed.
Abstract: Concurrency control (CC) algorithms must trade off strictness for performance. In particular, serializable CC schemes generally pay higher cost to prevent anomalies, both in runtime overhead such as the maintenance of lock tables and in efforts wasted by aborting transactions. We propose the serial safety net (SSN), a serializability-enforcing certifier which can be applied on top of various CC schemes that offer higher performance but admit anomalies, such as snapshot isolation and read committed. The underlying CC mechanism retains control of scheduling and transactional accesses, while SSN tracks the resulting dependencies. At commit time, SSN performs a validation test by examining only direct dependencies of the committing transaction to determine whether it can commit safely or must abort to avoid a potential dependency cycle. SSN performs robustly for a variety of workloads. It maintains the characteristics of the underlying CC without biasing toward a certain type of transactions, though the underlying CC scheme might. Besides traditional OLTP workloads, SSN also efficiently handles heterogeneous workloads which include a significant portion of long, read-mostly transactions. SSN can avoid tracking the vast majority of reads (thus reducing the overhead of serializability certification) and still produce serializable executions with little overhead. The dependency tracking and validation tests can be done efficiently, fully parallel and latch-free, for multi-version systems on modern hardware with substantial core count and large main memory. We demonstrate the efficiency, accuracy and robustness of SSN using extensive simulations and an implementation that overlays snapshot isolation in ERMIA, a memory-optimized OLTP engine that supports multiple CC schemes. Evaluation results confirm that SSN is a promising approach to serializability with robust performance and low overhead for various workloads.
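
The flavor of the commit-time test can be sketched as follows: each transaction tracks the latest commit stamp among its direct predecessors (transactions it depends on) and the earliest stamp among its direct successors (transactions that depend on it), and it must abort if the predecessor bound reaches the successor bound, since that indicates a possible dependency cycle. This is a simplified reading of the abstract, not the precise SSN definitions; all names are illustrative.

```python
import math

class Txn:
    def __init__(self, name):
        self.name = name
        self.cstamp = None              # commit timestamp, assigned at commit time
        self.pstamp = 0                 # latest commit stamp among direct predecessors
        self.sstamp = math.inf          # earliest stamp among direct successors

    def observe_predecessor(self, other):
        # e.g. this transaction read a version that `other` created
        if other.cstamp is not None:
            self.pstamp = max(self.pstamp, other.cstamp)

    def observe_successor(self, other):
        # e.g. `other` overwrote a version this transaction read
        if other.cstamp is not None:
            self.sstamp = min(self.sstamp, other.cstamp)

    def try_commit(self, clock):
        self.cstamp = clock
        # Exclusion test (simplified): a predecessor committing at or after a
        # successor would close a dependency cycle, so the transaction must abort.
        return self.pstamp < min(self.sstamp, self.cstamp)

t_old, t_new, t = Txn("old"), Txn("new"), Txn("reader")
t_old.cstamp = 5
t_new.cstamp = 9
t.observe_predecessor(t_old)    # t read something t_old wrote
t.observe_successor(t_new)      # t_new overwrote something t read
print(t.try_commit(clock=10))   # True: pstamp=5 < min(sstamp=9, cstamp=10), so commit is safe
```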

Proceedings ArticleDOI
09 May 2017
TL;DR: Tebaldi partitions conflicts at a fine granularity and matches them to specialized CCs within a hierarchical framework that is modular, extensible, and able to support a wide variety of concurrency control techniques, from single-version to multiversion and from lock-based to timestamp-based.
Abstract: This paper presents Tebaldi, a distributed key-value store that explores new ways to harness the performance opportunity of combining different specialized concurrency control mechanisms (CCs) within the same database. Tebaldi partitions conflicts at a fine granularity and matches them to specialized CCs within a hierarchical framework that is modular, extensible, and able to support a wide variety of concurrency control techniques, from single-version to multiversion and from lock-based to timestamp-based. When running the TPC-C benchmark, Tebaldi yields more than 20× the throughput of the basic two-phase locking protocol, and over 3.7× the throughput of Callas, a recent system that, like Tebaldi, aims to combine different CCs.

Proceedings ArticleDOI
26 Jan 2017
TL;DR: Evaluation using key-value store benchmarks on a 20-core HTM-capable multi-core machine shows that Eunomia leads to 5X-11X speedup under high contention, while incurring small overhead under low contention.
Abstract: While hardware transactional memory (HTM) has recently been adopted to construct efficient concurrent search tree structures, such designs fail to deliver scalable performance under contention. In this paper, we first conduct a detailed analysis on an HTM-based concurrent B+Tree, which uncovers several reasons for excessive HTM aborts induced by both false and true conflicts under contention. Based on the analysis, we advocate Eunomia, a design pattern for search trees which contains several principles to reduce HTM aborts, including splitting HTM regions with version-based concurrency control to reduce HTM working sets, partitioned data layout to reduce false conflicts, proactively detecting and avoiding true conflicts, and adaptive concurrency control. To validate their effectiveness, we apply such designs to construct a scalable concurrent B+Tree using HTM. Evaluation using key-value store benchmarks on a 20-core HTM-capable multi-core machine shows that Eunomia leads to 5X-11X speedup under high contention, while incurring small overhead under low contention.

Proceedings ArticleDOI
01 Jan 2017
TL;DR: This paper presents the first formal verification of a pessimistic software TM algorithm, namely, an algorithm proposed by Matveev and Shavit, and proves that this pessimistic TM is a refinement of an intermediate opaque I/O-automaton, known as TMS2.
Abstract: Transactional Memory (TM) is a high-level programming abstraction for concurrency control that provides programmers with the illusion of atomically executing blocks of code, called transactions. TMs come in two categories, optimistic and pessimistic, where in the latter transactions never abort. While this simplifies the programming model, high-performing pessimistic TMs can be complex. In this paper, we present the first formal verification of a pessimistic software TM algorithm, namely, an algorithm proposed by Matveev and Shavit. The correctness criterion used is opacity, formalising the transactional atomicity guarantees. We prove that this pessimistic TM is a refinement of an intermediate opaque I/O-automaton, known as TMS2. To this end, we develop a rely-guarantee approach for reducing the complexity of the proof. Proofs are mechanised in the interactive prover Isabelle.

Journal ArticleDOI
23 Apr 2017-Symmetry
TL;DR: In order to generate a valid, unique, and symmetric queue among collaborative sites, a set of correlated mechanisms is presented in this paper, in which all Co-CAD sites maintain symmetric and consistent operating procedures.
Abstract: One basic issue with collaborative computer aided design (Co-CAD) is how to maintain valid and consistent modeling results across all design sites. Moreover, modeling history is important in parametric CAD modeling. Therefore, different from a typical co-editing approach, this paper proposes a novel method for Co-CAD synchronization, in which all Co-CAD sites maintain symmetric and consistent operating procedures. Consequently, the consistency of both modeling results and history can be achieved. In order to generate a valid, unique, and symmetric queue among collaborative sites, a set of correlated mechanisms is presented in this paper. Firstly, the causal relationship of operations is maintained. Secondly, the operation queue is reconstructed for partial concurrency operation, and the concurrent operation can be retrieved. Thirdly, a symmetric, concurrent operation control strategy is proposed to determine the order of operations and resolve possible conflicts. Compared with existing Co-CAD consistency methods, the proposed method is convenient and flexible in supporting collaborative design. The experiment performed based on the collaborative modeling procedure demonstrates the correctness and applicability of this work.

Patent
10 May 2017
TL;DR: A high-concurrency data transmission method based on RDMA (Remote Direct Memory Access) is proposed. A graded buffer area is established before a client accesses the data transmission system; during remote reads and writes, the client actively transfers data among the user buffer area, the graded buffer area, and a remote memory area; and the server sets a lock field at the head of each independent data block of the remote memory area for concurrency control, so that when multiple clients read/write data concurrently, control is achieved through a distributed lock protocol in which the server locks locally and the clients unlock remotely.
Abstract: The invention discloses a high-concurrency data transmission method based on RDMA (Remote Direct Memory Access). The method comprises the steps of establishing a graded buffer area before a client accesses a data transmission system; actively carrying out data transfer among a user buffer area, the graded buffer area and a remote memory area by the client when remote data read/write is carried out; and setting a lock field at the head part of each independent data block of the remote memory area by a server, wherein the lock field is used for concurrency control: when a plurality of clients concurrently read/write data, concurrency control is carried out through a distributed lock protocol in which the server locks locally and the clients unlock remotely. The method has the advantages that data replication is reduced when a file is read/written, the processing pressure on the server is reduced, and efficient concurrency control is provided.

Journal ArticleDOI
TL;DR: The implementation reveals that HiperTM guarantees 0% out-of-order optimistic deliveries and performance up to 3.5× better than the atomic-broadcast-based competitor (PaxosSTM) using the standard configuration of the TPC-C benchmark.

Proceedings ArticleDOI
24 Sep 2017
TL;DR: This paper proposes a simpler and leaner protocol for serializable read-only write-only transactions, which uses only one round trip to commit a transaction in the absence of failures irrespective of contention, and integrates this protocol into ALOHA-KV, a scalable distributed key-value store for read- only write- only transactions.
Abstract: There is a trend in recent database research to pursue coordination avoidance and weaker transaction isolation under a long-standing assumption: concurrent serializable transactions under read-write or write-write conflicts require costly synchronization, and thus may incur a steep price in terms of performance. In particular, distributed transactions, which access multiple data items atomically, are considered inherently costly. They require concurrency control for transaction isolation since both read-write and write-write conflicts are possible, and they rely on distributed commitment protocols to ensure atomicity in the presence of failures. This paper presents serializable read-only and write-only distributed transactions as a counterexample to show that concurrent transactions can be processed in parallel with low-overhead despite conflicts. Inspired by the slotted ALOHA network protocol, we propose a simpler and leaner protocol for serializable read-only write-only transactions, which uses only one round trip to commit a transaction in the absence of failures irrespective of contention. Our design is centered around an epoch-based concurrency control (ECC) mechanism that minimizes synchronization conflicts and uses a small number of additional messages whose cost is amortized across many transactions. We integrate this protocol into ALOHA-KV, a scalable distributed key-value store for read-only write-only transactions, and demonstrate that the system can process close to 15 million read/write operations per second per server when each transaction batches together thousands of such operations.
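
A hedged sketch of epoch-based grouping: write-only transactions arriving during the open epoch are buffered and become visible atomically when the epoch closes, while read-only transactions read the snapshot formed by all closed epochs, so reads and writes never interleave within an epoch and per-operation synchronization is avoided. This illustrates the general ECC idea under simplified assumptions, not ALOHA-KV's protocol; names are invented.

```python
class EpochStore:
    """Write-only transactions buffer into the current epoch; read-only
    transactions see only data from epochs that have already closed."""
    def __init__(self):
        self.visible = {}        # state as of the last closed epoch
        self.pending = []        # write-only transactions buffered in the open epoch
        self.epoch = 0

    def write_txn(self, writes):
        # A write-only transaction: just append its write set; no reads, no locks.
        self.pending.append(dict(writes))

    def read_txn(self, keys):
        # A read-only transaction: a consistent snapshot over all closed epochs.
        return {k: self.visible.get(k) for k in keys}

    def advance_epoch(self):
        # Closing the epoch makes every buffered write-only transaction visible atomically.
        for writes in self.pending:
            self.visible.update(writes)
        self.pending.clear()
        self.epoch += 1

store = EpochStore()
store.write_txn({"a": 1, "b": 2})
print(store.read_txn(["a", "b"]))   # {'a': None, 'b': None}: epoch not closed yet
store.advance_epoch()
print(store.read_txn(["a", "b"]))   # {'a': 1, 'b': 2}: the whole transaction is visible
```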

Posted Content
TL;DR: AnKerDB as discussed by the authors is a heterogeneous transaction processing system that outsources OLAP transactions to run on separate (virtual) snapshots while OLTP transactions run on the most recent representation of the database.
Abstract: Efficient transactional management is a delicate task. As systems face transactions of inherently different types, ranging from point updates to long running analytical computations, it is hard to satisfy their individual requirements with a single processing component. Unfortunately, most systems nowadays rely on such a single component that implements its parallelism using multi-version concurrency control (MVCC). While MVCC parallelizes short-running OLTP transactions very well, it struggles in the presence of mixed workloads containing long-running scan-centric OLAP queries, as scans have to work their way through large amounts of versioned data. To overcome this problem, we propose a system which reintroduces the concept of heterogeneous transaction processing: OLAP transactions are outsourced to run on separate (virtual) snapshots while OLTP transactions run on the most recent representation of the database. Inside both components, MVCC ensures a high degree of concurrency. The biggest challenge of such a heterogeneous approach is to generate the snapshots at a high frequency. Previous approaches heavily suffered from the tremendous cost of snapshot creation. In our system, we overcome the restrictions of the OS by introducing a custom system call vm_snapshot that is hand-tailored to our precise needs: it allows fine-granular snapshot creation at very high frequencies, rendering the snapshot creation phase orders of magnitude faster than state-of-the-art approaches. Our experimental evaluation on a heterogeneous workload based on TPC-H transactions and handcrafted OLTP transactions shows that our system enables significantly higher analytical transaction throughputs on mixed workloads than homogeneous approaches. In this sense, we introduce a system that accelerates Analytical processing by introducing custom Kernel functionalities: AnKerDB.
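
The "separate (virtual) snapshots" idea goes back to fork-based copy-on-write snapshotting, the conventional and slower mechanism that the proposed vm_snapshot call is designed to replace. The sketch below illustrates that baseline on POSIX systems (it relies on os.fork); it is not the paper's custom system call, and the function names are illustrative.

```python
import os

def run_olap_on_snapshot(database, olap_query):
    """Fork the process: the child sees a copy-on-write snapshot of the database
    and runs the long OLAP query, while the parent keeps applying OLTP updates."""
    pid = os.fork()
    if pid == 0:                              # child: frozen virtual snapshot
        result = olap_query(database)
        print("OLAP result on snapshot:", result)
        os._exit(0)
    # parent: OLTP continues against the live data, unaffected by the long scan
    database["balance"] += 100
    os.waitpid(pid, 0)

db = {"balance": 1000}
run_olap_on_snapshot(db, lambda snapshot: sum(snapshot.values()))  # sees 1000
print("OLTP view after update:", db)                               # {'balance': 1100}
```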

Journal ArticleDOI
TL;DR: This article presents Transactional Correctness tool for Abstract Data Types (TxC-ADT), the first tool that can check the correctness of transactional data structures and presents a technique for defining correctness as a happens-before relation, an essential aspect for checking correctness of transactions that synchronize only for high-level semantic conflicts.
Abstract: Transactional memory simplifies multiprocessor programming by providing the guarantee that a sequential block of code in the form of a transaction will exhibit atomicity and isolation. Transactional data structures offer the same guarantee to concurrent data structures by enabling the atomic execution of a composition of operations. The concurrency control of transactional memory systems preserves atomicity and isolation by detecting read/write conflicts among multiple concurrent transactions. State-of-the-art transactional data structures improve on this concurrency control protocol by providing explicit transaction-level synchronization for only non-commutative operations. Since read/write conflicts are handled by thread-level concurrency control, the correctness of transactional data structures cannot be evaluated according to the read/write histories. This presents a challenge for existing correctness verification techniques for transactional memory, because correctness is determined according to the transitions taken by the transactions in the presence of read/write conflicts. In this article, we present Transactional Correctness tool for Abstract Data Types (TxC-ADT), the first tool that can check the correctness of transactional data structures. TxC-ADT elevates the standard definitions of transactional correctness to be in terms of an abstract data type, an essential aspect for checking correctness of transactions that synchronize only for high-level semantic conflicts. To accommodate a diverse assortment of transactional correctness conditions, we present a technique for defining correctness as a happens-before relation. Defining a correctness condition in this manner enables an automated approach in which correctness is evaluated by generating and analyzing a transactional happens-before graph during model checking. A transactional happens-before graph is maintained on a per-thread basis, making our approach applicable to transactional correctness conditions that do not enforce a total order on a transactional execution. We demonstrate the practical applications of TxC-ADT by checking Lock Free Transactional Transformation and Transactional Data Structure Libraries for serializability, strict serializability, opacity, and causal consistency.

Patent
10 May 2017
TL;DR: Disclosed as discussed by the authors is a programming model for the definition of services to be operated on large sets of data with numerous responsibilities, the programming model comprising program units in a tree topology for high performance and implicit concurrency control, where each program unit definition comprises responsibilities defined in behaviors and configurations.
Abstract: Disclosed is a programming model utilized for the definition of services to be operated on large sets of data with numerous responsibilities, the programming model comprising program units in a tree topology for high performance and implicit concurrency control, where each program unit definition comprises responsibilities defined in behaviors and configurations. A runtime environment may be utilized to provide implicit concurrency, parallelization, and concurrency control for operations executed on program unit instances.

Journal ArticleDOI
27 Dec 2017
TL;DR: In this paper, the authors present a program logic that enables compositional reasoning about the behavior of concurrently executing weakly-isolated transactions, and they also describe an inference procedure based on this foundation that ascertains the weakest isolation level that still guarantees the safety of high-level consistency assertions associated with such transactions.
Abstract: Serializability is a well-understood correctness criterion that simplifies reasoning about the behavior of concurrent transactions by ensuring they are isolated from each other while they execute. However, enforcing serializable isolation comes at a steep cost in performance because it necessarily restricts opportunities to exploit concurrency even when such opportunities would not violate application-specific invariants. As a result, database systems in practice support, and often encourage, developers to implement transactions using weaker alternatives. These alternatives break the strong isolation guarantees offered by serializable transactions to permit greater concurrency. Unfortunately, the semantics of weak isolation is poorly understood, and usually explained only informally in terms of low-level implementation artifacts. Consequently, verifying high-level correctness properties in such environments remains a challenging problem. To address this issue, we present a novel program logic that enables compositional reasoning about the behavior of concurrently executing weakly-isolated transactions. Recognizing that the proof burden necessary to use this logic may dissuade application developers, we also describe an inference procedure based on this foundation that ascertains the weakest isolation level that still guarantees the safety of high-level consistency assertions associated with such transactions. The key to effective inference is the observation that weakly-isolated transactions can be viewed as functional (monadic) computations over an abstract database state, allowing us to treat their operations as state transformers over the database. This interpretation enables automated verification using off-the-shelf SMT solvers. Our development is parametric over a transaction’s specific isolation semantics, allowing it to be applicable over a range of concurrency control mechanisms. Case studies and experiments on real-world applications (written in an embedded DSL in OCaml) demonstrate the utility of our approach, and provide strong evidence that automated verification of weakly-isolated transactions can be placed on the same formal footing as their strongly-isolated serializable counterparts.

Proceedings ArticleDOI
01 Sep 2017
TL;DR: A new priority heuristic method, based on a deadline computed by considering write-only operations, is proposed for wireless environments and shows better results than earlier priority heuristics.
Abstract: Priority scheduling among running transactions is one of the most important issues in the design of mobile distributed real-time database systems (MDRTDBS). In MDRTDBS, several priority heuristics are used with different concurrency control methods to perform correct transaction scheduling and minimize the transaction abort rate. Priority heuristic approaches deal with the problem of assigning priorities among transactions so that the concurrency control (CC) mechanism can meet the typical time constraints. In recent years, the performance of CC protocols for distributed real-time database systems (DRTDBS) has been examined using different priority heuristic methods. However, very few approaches have been proposed on priority heuristics for wireless environments. Hence, a new priority heuristic method based on a deadline computed by considering write-only operations is proposed for wireless environments. This improved priority heuristic approach shows better results than earlier priority heuristics. In recent years, researchers have classified transactions into two types, called read-only transactions (ROT) and update transactions. The new priority heuristic for mobile environments considers ROT and update transactions separately. Further, a study has also been done to examine the impact of these priority heuristics compared with number-of-locks and mixed-method approaches.
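
As an illustration of deadline-driven priority assignment, the sketch below derives a transaction's deadline from an estimate of its write-only work and dispatches the earliest deadline first. The cost model and constants are placeholders invented for the example, not the heuristic proposed in the paper.

```python
import heapq

def deadline(arrival_time, write_ops, per_write_cost=2.0, slack_factor=3.0):
    """Toy deadline: arrival time plus slack proportional to estimated write work.
    (The paper computes deadlines from write-only operations; the constants here
    are placeholders.)"""
    return arrival_time + slack_factor * per_write_cost * write_ops

class PriorityScheduler:
    """Earliest-deadline-first dispatch of transactions."""
    def __init__(self):
        self.queue = []

    def submit(self, txn_id, arrival_time, write_ops):
        heapq.heappush(self.queue, (deadline(arrival_time, write_ops), txn_id))

    def next_transaction(self):
        return heapq.heappop(self.queue)[1]

sched = PriorityScheduler()
sched.submit("update_heavy", arrival_time=0, write_ops=10)   # deadline 60
sched.submit("small_update", arrival_time=5, write_ops=1)    # deadline 11
print(sched.next_transaction())   # small_update is dispatched first
```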

Journal ArticleDOI
TL;DR: A situationally adaptive scheduler (SAS) is proposed that learns architectural choices offline using synthetically generated graphs and achieves performance comparable to the optimal setup that optimizes both algorithmic and architectural choices.
Abstract: Situational dynamic changes in graph analytic algorithm implementations give rise to efficiency challenges in concurrent hardware, such as GPUs and large-scale multicores. These performance variations stem from input dependence, such as the density and degree of the graph being processed. Consequently, concurrency control becomes challenging, because the complex data-dependent behavior in these workloads exhibits a range of plausible algorithmic and architectural choices. This article addresses the question of how to efficiently harness the multidimensional search space of such choices for graph analytic workloads in a real-time execution environment. A key insight is that architectural choices are sufficient to yield a concurrency control setting that is comparable to the optimal setup that optimizes both algorithmic and architectural choices. The authors propose a situationally adaptive scheduler (SAS) that learns the architectural choices offline using synthetically generated graphs. SAS-assisted execution in a real-time setup provides geometric performance gains of 40 percent for a large-scale GPU (Nvidia GTX-970), 35 percent for a smaller GPU (Nvidia GTX-750Ti), and 30 percent for a large-scale multicore (Intel Xeon Phi).

Proceedings ArticleDOI
20 Sep 2017
TL;DR: This paper designs a set of experiments that allow them to shed lights on the internal mechanisms used in TSX to manage conflicts among transactions and to track their readsets and writesets, and builds an analytical model of TSX focused on capturing the impact on performance of two key mechanisms.
Abstract: This paper investigates the problem of deriving a white box performance model of Hardware Transactional Memory (HTM) systems. The proposed model targets TSX, a popular implementation of HTM integrated in Intel processors starting with the Haswell family in 2013.An inherent difficulty with building white-box models of commercially available HTM systems is that their internals are either vaguely documented or undisclosed by their manufacturers. We tackle this challenge by designing a set of experiments that allow us to shed lights on the internal mechanisms used in TSX to manage conflicts among transactions and to track their readsets and writesets.We exploit the information inferred from this experimental study to build an analytical model of TSX focused on capturing the impact on performance of two key mechanisms: the concurrency control scheme and the management of transactional meta-data in the processor's caches. We validate the proposed model by means of an extensive experimental study encompassing a broad range of workloads executed on a real system.