
Showing papers on "Concurrency control published in 2021"


Proceedings ArticleDOI
26 Oct 2021
TL;DR: PACTree as mentioned in this paper is a hybrid index that employs a trie index for its internal nodes and B+-tree-like leaf nodes, and decouples the trie index from the critical path of the index so that updates to internal nodes do not block concurrent accesses.
Abstract: Non-Volatile Memory (NVM), which provides relatively fast and byte-addressable persistence, is now commercially available. However, we cannot equate a real NVM with a slow DRAM, as it is much more complicated than we expect. In this work, we revisit and analyze both NVM and NVM-specific persistent memory indexes. We find that there is still a lot of room for improvement if we consider NVM hardware, its software stack, persistent index design, and concurrency control. Based on our analysis, we propose Packed Asynchronous Concurrency (PAC) guidelines for designing high-performance persistent index structures. The key idea behind the guidelines is to 1) access NVM hardware in a packed manner to minimize its bandwidth utilization and 2) exploit asynchronous concurrency control to decouple the long NVM latency from the critical path of the index. We develop PACTree, a high-performance persistent range index following the PAC guidelines. PACTree is a hybrid index that employs a trie index for its internal nodes and B+-tree-like leaf nodes. The trie index structure packs partial keys in internal nodes. Moreover, we decouple the trie index and B+-tree-like leaf nodes. The decoupling allows us to prevent blocking concurrent accesses by updating internal nodes asynchronously. Our evaluation shows that PACTree outperforms state-of-the-art persistent range indexes by 7x in performance and 20x in 99.99 percentile tail latency.

24 citations
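
To make the "asynchronous concurrency control" part of the PAC guidelines concrete, here is a minimal Python sketch of the decoupling idea (my own illustration, not the authors' code): leaf updates complete on the critical path, while the internal trie-like index is maintained by a background thread, so concurrent readers are never blocked by structure modifications. Names such as AsyncIndex are hypothetical.

import threading, queue, bisect

class AsyncIndex:
    # Simplified sketch of PAC-style decoupling; not PACTree's actual structure.
    def __init__(self):
        self.leaves = {}                 # stand-in for B+-tree-like leaf nodes: key -> value
        self.internal_keys = []          # stand-in for the trie internal nodes (sorted routing keys)
        self.pending = queue.Queue()     # asynchronous structure-modification log
        self.worker = threading.Thread(target=self._apply_internal_updates, daemon=True)
        self.worker.start()

    def insert(self, key, value):
        self.leaves[key] = value         # critical path: only the leaf is touched
        self.pending.put(key)            # internal-node maintenance is deferred

    def _apply_internal_updates(self):
        while True:
            key = self.pending.get()
            if key is None:
                break
            pos = bisect.bisect_left(self.internal_keys, key)
            if pos == len(self.internal_keys) or self.internal_keys[pos] != key:
                self.internal_keys.insert(pos, key)   # slow work happens off the critical path

    def lookup(self, key):
        return self.leaves.get(key)      # readers never wait for internal-node maintenance

idx = AsyncIndex()
for k in range(10):
    idx.insert(k, f"value-{k}")
print(idx.lookup(7))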


Proceedings ArticleDOI
01 Apr 2021
TL;DR: In concurrent consensus, as proposed in this paper, replicas independently propose transactions, thereby reducing the influence of any single replica on performance; the resulting RCC paradigm achieves up to 2.75× higher throughput than other consensus protocols and scales to 91 replicas.
Abstract: Recently, we saw the emergence of consensus-based database systems that promise resilience against failures, strong data provenance, and federated data management. Typically, these fully-replicated systems are operated on top of a primary-backup consensus protocol, which limits the throughput of these systems to the capabilities of a single replica (the primary).To push throughput beyond this single-replica limit, we propose concurrent consensus. In concurrent consensus, replicas independently propose transactions, thereby reducing the influence of any single replica on performance. To put this idea in practice, we propose our RCC paradigm that can turn any primary-backup consensus protocol into a concurrent consensus protocol by running many consensus instances concurrently. RCC is designed with performance in mind and requires minimal coordination between instances. Furthermore, RCC also promises increased resilience against failures. We put the design of RCC to the test by implementing it in ResilientDB, our high-performance resilient blockchain fabric, and comparing it with state-of-the-art primary-backup consensus protocols. Our experiments show that RCC achieves up to 2.75× higher throughput than other consensus protocols and can be scaled to 91 replicas.

17 citations


Journal ArticleDOI
TL;DR: A novel two-phase concurrency control protocol that, for the first time, optimizes both phases, significantly outperforms state-of-the-art solutions, and is further optimized by integrating with PBFT.
Abstract: Although the emergence of the programmable smart contract makes blockchain systems easily embrace a wide range of industrial services, how to execute smart contracts efficiently has become a big challenge nowadays. Due to the existence of Byzantine nodes, existing mature concurrency control protocols in databases cannot be employed directly, since the mechanism of executing smart contracts varies a lot. Furthermore, even though smart contract execution follows a two-phase style, i.e., the primary node executes a batch of smart contracts in the first phase and the validators replay them in the second phase, existing parallel solutions merely focus on optimizing the first phase rather than the second. In this paper, we propose a novel two-phase concurrency control protocol to optimize both phases for the first time. First, the primary executes transactions in parallel and generates a transaction dependency graph with high parallelism for validators. Then, a graph partition algorithm is devised to divide the original graph into several sub-graphs to preserve parallelism and reduce communication cost remarkably. Finally, we propose a deterministic replay protocol to re-execute the primary's parallel schedule concurrently. Moreover, this two-phase protocol is further optimized by integrating with PBFT. Theoretical analysis and extensive experimental results illustrate that the proposed scheme outperforms state-of-the-art solutions significantly.

15 citations
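
The core of the second phase described above is a deterministic replay of the primary's schedule. The following Python sketch, written under simplifying assumptions rather than taken from the paper, shows how a dependency graph can be derived from read/write sets in the primary's commit order and then replayed so that every validator reaches an equivalent serial order.

from collections import defaultdict, deque

def build_dependency_graph(schedule):
    """schedule: list of (tx_id, read_set, write_set) in the primary's commit order."""
    edges = defaultdict(set)
    for i, (ti, ri, wi) in enumerate(schedule):
        for tj, rj, wj in schedule[i + 1:]:
            # tj must follow ti if their accesses conflict (WR, WW, or RW on a key)
            if wi & (rj | wj) or ri & wj:
                edges[ti].add(tj)
    return edges

def deterministic_replay(schedule, edges, execute):
    # Illustrative topological replay; validators could run independent pieces in parallel.
    indegree = {t: 0 for t, _, _ in schedule}
    for src in edges:
        for dst in edges[src]:
            indegree[dst] += 1
    ready = deque(t for t, d in indegree.items() if d == 0)
    while ready:
        t = ready.popleft()
        execute(t)                      # every validator sees the same graph, so all converge
        for dst in edges[t]:
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)

schedule = [("t1", {"a"}, {"b"}), ("t2", {"b"}, {"c"}), ("t3", {"x"}, {"y"})]
deterministic_replay(schedule, build_dependency_graph(schedule), print)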


Proceedings ArticleDOI
09 Jun 2021
TL;DR: Bamboo as discussed by the authors is a concurrency control protocol that enables parallelism by modifying conventional two-phase locking while maintaining the same correctness guarantees, and it achieves a speedup of up to 19x over baselines.
Abstract: Hotspots, a small set of tuples frequently read/written by a large number of transactions, cause contention in a concurrency control protocol. While a hotspot may comprise only a small fraction of a transaction's execution time, conventional strict two-phase locking allows a transaction to release its locks only after the transaction completes, which leaves significant parallelism unexploited. Ideally, a concurrency control protocol serializes transactions only for the duration of the hotspots, rather than the duration of transactions. We observe that exploiting such parallelism requires violating two-phase locking. In this paper, we propose Bamboo, a new concurrency control protocol that can enable such parallelism by modifying conventional two-phase locking while maintaining the same correctness guarantees. We thoroughly analyze the effect of cascading aborts involved in reading uncommitted data and discuss optimizations that can be applied to further improve the performance. Our evaluation on TPC-C shows a performance improvement of up to 4x compared to the best of the pessimistic and optimistic baseline protocols. On synthetic workloads that contain a single hotspot, Bamboo achieves a speedup of up to 19x over baselines.

14 citations
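
A rough Python sketch of the early-lock-release idea behind Bamboo (not the authors' implementation; names are illustrative): a transaction hands over a hotspot tuple before committing, readers of the dirty value record a commit dependency, and an abort of the writer cascades to its dependents.

class Tx:
    def __init__(self, name):
        self.name = name
        self.depends_on = set()     # transactions whose dirty data we observed
        self.aborted = False

class HotspotTuple:
    # Illustrative sketch only: the tuple's lock is logically released right after the hotspot access.
    def __init__(self, value):
        self.value = value
        self.dirty_writer = None    # uncommitted transaction holding the newest value

    def write(self, tx, value):
        self.value = value
        self.dirty_writer = tx

    def read(self, tx):
        if self.dirty_writer is not None and self.dirty_writer is not tx:
            tx.depends_on.add(self.dirty_writer)   # reading uncommitted data creates a dependency
        return self.value

def try_commit(tx):
    if tx.aborted or any(dep.aborted for dep in tx.depends_on):
        tx.aborted = True           # cascading abort
        return False
    return True

counter = HotspotTuple(0)
t1, t2 = Tx("t1"), Tx("t2")
counter.write(t1, counter.read(t1) + 1)   # t1 updates the hotspot, then keeps running
counter.write(t2, counter.read(t2) + 1)   # t2 reads t1's dirty value and proceeds in parallel
t1.aborted = True                         # if t1 later aborts...
print(try_commit(t2))                     # ...t2 must also abort (prints False)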


Proceedings ArticleDOI
26 Oct 2021
TL;DR: Caracal as mentioned in this paper is a shared-memory, deterministic database that performs well under both skew and contention by batching concurrency control operations in an epoch in a predetermined order.
Abstract: Deterministic databases offer several benefits: they ensure serializable execution while avoiding concurrency-control related aborts, and they scale well in distributed environments. Today, most deterministic database designs use partitioning to scale up and avoid contention. However, partitioning requires significant programmer effort, leads to poor performance under skewed workloads, and incurs unnecessary overheads in certain uncontended workloads. We present the design of Caracal, a novel shared-memory, deterministic database that performs well under both skew and contention. Our deterministic scheme batches transactions in epochs and executes the transactions in an epoch in a predetermined order. Our scheme enables reducing contention by batching concurrency control operations. It also allows analyzing the transactions in the epoch to determine contended keys accurately. Certain transactions can then be split into independent contended and uncontended pieces and run deterministically and in parallel, further reducing contention. Based on these ideas, we present two novel optimizations, batch append and split-on-demand, for managing contention. With these optimizations, Caracal scales well and outperforms existing deterministic schemes in most workloads by 1.9x to 9.7x.

13 citations
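
The epoch-based scheme can be illustrated with a small Python sketch (a simplification under my own assumptions, not Caracal's code): transactions in an epoch receive a predetermined order, version slots for their writes are appended in a batch up front, and execution then fills the slots, so reads always resolve against the fixed order and no aborts are needed for serializability.

from collections import defaultdict

class EpochDB:
    # Toy epoch-at-a-time deterministic execution; phases are simplified.
    def __init__(self, data):
        self.data = dict(data)                      # committed values
        self.versions = defaultdict(dict)           # key -> {serial_id: value}

    def run_epoch(self, txns):
        # Phase 1 (batch append): reserve a version slot per write, in epoch order.
        for serial_id, (writes, _) in enumerate(txns):
            for key in writes:
                self.versions[key][serial_id] = None
        # Phase 2 (execution): reads see the latest version with a smaller serial id.
        for serial_id, (writes, fn) in enumerate(txns):
            read = lambda key, sid=serial_id: self._read(key, sid)
            for key, value in fn(read).items():
                self.versions[key][serial_id] = value
        # Epoch end: fold the newest version of each key back into committed state.
        for key, slots in self.versions.items():
            self.data[key] = slots[max(slots)]
        self.versions.clear()

    def _read(self, key, serial_id):
        earlier = [s for s in self.versions.get(key, {}) if s < serial_id
                   and self.versions[key][s] is not None]
        return self.versions[key][max(earlier)] if earlier else self.data[key]

db = EpochDB({"x": 0})
transfer = lambda read: {"x": read("x") + 10}
db.run_epoch([({"x"}, transfer), ({"x"}, transfer)])
print(db.data["x"])   # 20: both increments applied in the predetermined order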


Proceedings ArticleDOI
17 Feb 2021
TL;DR: In this paper, the authors explore different trade-offs in terms of memory usage vs. number of fences and flushes for durable transactions on persistent memory (PM) and present two new algorithms, named Trinity and Quadra, each implemented in the form of a user-level library persistent transactional memory (PTM).
Abstract: Durable techniques coupled with transactional semantics provide to application developers the guarantee that data is saved consistently in persistent memory (PM), even in the event of a non-corrupting failure. Persistence fences and flush instructions are known to have a significant impact on the throughput of persistent transactions. In this paper we explore different trade-offs in terms of memory usage vs. number of fences and flushes. We present two new algorithms, named Trinity and Quadra, for durable transactions on PM and implement each of them in the form of a user-level library persistent transactional memory (PTM). Quadra achieves the lower bound with respect to the number of persistence fences and executes one flush instruction per modified cache line. Trinity can be easily combined with concurrency control techniques based on fine grain locking, and we have integrated it with our TL2 adaptation, with eager locking and write-through update strategy. Moreover, the combination of Trinity and TL2 into a PTM provides good scalability for data structures and workloads with a disjoint access pattern. We used this disjoint PTM to implement a key-value (KV) store with durable linearizable transactions. When compared with previous work, our TL2 KV store provides better throughput in nearly all experiments.

11 citations


Journal ArticleDOI
TL;DR: RCC is developed, the first unified and comprehensive RDMA-enabled distributed transaction processing framework, containing six serializable concurrency control protocols: not only the classical protocols NOWAIT, WAITDIE, and OCC, but also the more advanced MVCC, SUNDIAL, and the deterministic protocol CALVIN.
Abstract: Online transaction processing (OLTP) is widely used on modern cloud infrastructures to complete important businesses such as payments and stock exchanges. Remote Direct Memory Access (RDMA) is a technology that enables ultra-low inter-server memory access latency, which is critical for implementing high-performance concurrency control protocols in distributed OLTP. In this paper, we develop RCC, the first unified and comprehensive RDMA-enabled distributed transaction processing framework containing six serializable concurrency control protocols: not only the classical protocols NOWAIT, WAITDIE, and OCC, but also the more advanced MVCC, SUNDIAL, and the deterministic protocol CALVIN. Our goal is to unbiasedly compare protocols on OLTP workloads in a common execution environment with the concurrency control protocol being the only changeable component. From RCC, we obtained new insights on building RDMA-based protocols. We analyzed stage-wise latency breakdowns to develop more efficient hybrid implementations. Moreover, RCC can enumerate all stage-wise hybrid designs under a given workload characteristic. Our results show that throughput-wise hybrid designs are better than RPC or one-sided counterparts by 32.2% and up to 67%; three hybrid designs are better than their pure counterparts by up to 17.8%. RCC can provide performance insights and be used as a common open-source infrastructure for fast prototyping of new implementations.

8 citations


Proceedings ArticleDOI
09 Jun 2021
TL;DR: RisGraph as mentioned in this paper proposes a data structure named Indexed Adjacency Lists and uses sparse arrays and hybrid parallel mode to enable localized data access and inter-update parallelism.
Abstract: Evolving graphs in the real world are large-scale and constantly changing, as hundreds of thousands of updates may come every second. Monotonic algorithms such as Reachability and Shortest Path are widely used in real-time analytics to gain both static and temporal insights and can be accelerated by incremental computing. Existing streaming systems adopt the incremental computing model and achieve either low latency or high throughput, but not both. However, both high throughput and low latency are required in real scenarios such as financial fraud detection. This paper presents RisGraph, a real-time streaming system that provides low-latency analysis for each update with high throughput. RisGraph addresses the challenge with localized data access and inter-update parallelism. We propose a data structure named Indexed Adjacency Lists and use sparse arrays and Hybrid Parallel Mode to enable localized data access. To achieve inter-update parallelism, we propose a domain-specific concurrency control mechanism based on the classification of safe and unsafe updates. Experiments show that RisGraph can ingest millions of updates per second for graphs with several hundred million vertices and billions of edges, and the P999 processing time latency is within 20 milliseconds. RisGraph achieves orders-of-magnitude improvement on throughput when analyses are executed for each update without batching and performs better than existing systems with batches of up to 20 million updates.

8 citations
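
The safe/unsafe classification can be pictured with a small incremental-BFS example in Python (an assumed simplification, not RisGraph's code): an edge insertion that cannot improve any current distance is "safe" and needs only an adjacency-list append, so it can run in parallel with other safe updates, while an "unsafe" insertion must go through the incremental recomputation path.

from collections import defaultdict, deque

class IncrementalBFS:
    # Illustrative sketch of classifying updates for a monotonic algorithm (unit-weight shortest paths).
    def __init__(self, source):
        self.adj = defaultdict(set)
        self.dist = {source: 0}

    def classify(self, u, v):
        du = self.dist.get(u, float("inf"))
        dv = self.dist.get(v, float("inf"))
        return "safe" if du + 1 >= dv else "unsafe"

    def insert_edge(self, u, v):
        kind = self.classify(u, v)
        self.adj[u].add(v)                      # safe updates need only this append
        if kind == "unsafe":                    # unsafe updates re-propagate distances
            frontier = deque([(v, self.dist[u] + 1)])
            while frontier:
                node, d = frontier.popleft()
                if d < self.dist.get(node, float("inf")):
                    self.dist[node] = d
                    frontier.extend((nxt, d + 1) for nxt in self.adj[node])
        return kind

g = IncrementalBFS(source=0)
print(g.insert_edge(0, 1))   # unsafe: vertex 1 becomes reachable at distance 1
print(g.insert_edge(1, 2))   # unsafe: vertex 2 becomes reachable
print(g.insert_edge(0, 2))   # unsafe: distance of vertex 2 improves from 2 to 1
print(g.insert_edge(1, 2))   # safe: re-inserting cannot improve any distance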


Proceedings ArticleDOI
Gyuyoung Kwauk1, Seungkwan Kang1, Hans Kasan1, Hyojun Son1, John Kim1 
01 Feb 2021
TL;DR: BoomGATE is proposed for deadlock avoidance in large-scale networks; it complements the RINR algorithm with opportunistic flow control (OFC), in which "illegal routes" are allowed if and only if sufficient buffer space can be guaranteed so that cyclic dependencies do not occur.
Abstract: Avoiding routing deadlock is an important component of an interconnection network. For large-scale systems with high-radix topologies that leverage non-minimal adaptive routing, virtual channels (VCs) are commonly used to prevent routing deadlock. However, VCs in large-scale networks can be costly because of deep buffers and restrict VC usage. In this work, we propose BoomGATE for deadlock avoidance in large-scale networks. In particular, BoomGATE consists of two components: the Restricted Intermediate-node Non-minimal Routing (RINR) algorithm and opportunistic flow control (OFC), which both exploit the low-diameter characteristics of high-radix networks while maximizing path diversity within the topology. We identify how routing deadlock in fully-connected topologies is caused by non-minimal routes and propose to restrict non-minimal routing to ensure deadlock freedom without additional virtual channels. We also propose an algorithm that ensures path diversity is load-balanced across all nodes in the system. However, since path diversity is restricted by the RINR algorithm, we complement RINR with opportunistic flow control (OFC), where "illegal routes" are allowed if and only if sufficient buffer can be guaranteed to ensure that cyclic dependency does not occur. We propose both a static and a dynamic OFC implementation. We evaluate the performance of BoomGATE and demonstrate there is minimal performance loss compared to global adaptive routing, while reducing the amount of buffers required by 50%.

7 citations


Journal ArticleDOI
TL;DR: In this article, a concurrent execution strategy based on concurrency degree optimization is proposed for performance optimization within a single shard, which increases the concurrency of smart contract execution by 39% on average and the transaction throughput of the whole system by 21%.
Abstract: Throughput performance is a critical issue in blockchain technology, especially in blockchain sharding systems. Although sharding proposals can improve transaction throughput by parallel processing, the essence of each shard is still a small blockchain. With serial execution of smart contract transactions, performance has not significantly improved, and there is still room for improvement. A smart contract concurrent execution strategy based on concurrency degree optimization is proposed for performance optimization within a single shard. This strategy is applied to each shard. First, it characterizes the conflicting contract feature information by executing a smart contract, analyzing the factors that affect the concurrent execution of the smart contracts, and clustering the contract transactions. Second, in shards with high transaction frequency, a Variable Shadow Speculative Concurrency Control (SCC-VS) algorithm for smart contract scheduling is proposed, which finds a serializable schedule of contract transactions by redundant computation while considering the execution time, conflict rate, and available resources of contract transactions. Finally, experimental results show that the strategy increases the concurrency of smart contract execution by 39% on average and the transaction throughput of the whole system by 21% on average.

6 citations


Book ChapterDOI
Chen Zhihao1, Xiaodong Qi1, Xiaofan Du1, Zhao Zhang1, Cheqing Jin1 
11 Apr 2021
TL;DR: PEEP as discussed by the authors employs a deterministic concurrency mechanism to obtain a predetermined serial order for parallel execution, and offers parallel update operations on the state tree, which can be implemented on any radix tree with the Merkle property.
Abstract: Unlike blockchain systems in public settings, the stricter trust model in permissioned blockchains opens an opportunity for pursuing higher throughput. Recently, as consensus protocols have developed significantly, the existing serial execution of transactions has become a key factor limiting overall performance. However, it is not easy to extend the concurrency control protocols widely used in database systems to blockchain systems. In particular, there are two challenges to achieving parallel execution of transactions in blockchain: (i) the final results of different replicas may diverge, since most protocols only promise that the effect of transactions is equivalent to some serial order, and this order may vary across concurrent executions; and (ii) almost all state trees used to manage the states of a blockchain do not support fast concurrent updates. In view of these challenges, we propose a parallel execution engine called PEEP for permissioned blockchain systems. Specifically, PEEP employs a deterministic concurrency mechanism to obtain a predetermined serial order for parallel execution, and offers parallel update operations on the state tree, which can be implemented on any radix tree with the Merkle property. Finally, extensive experiments show that PEEP greatly outperforms existing serial execution.

Proceedings ArticleDOI
20 Nov 2021
TL;DR: In this paper, the authors apply formal methods to specify and verify an industrial distributed database, Taurus, which uses a combination of several fundamental protocols, including Multi-Version Concurrency Control and Raft-based Cluster Management.
Abstract: Distributed database services are an increasingly important part of cloud computing. They are required to satisfy several key properties, including consensus and fault tolerance. Given the highly concurrent nature of these systems, subtle errors can arise that are difficult to discover through traditional testing methods. Formal verification can help in discovering bugs and ensuring correctness of these systems. In this paper, we apply formal methods to specify and verify an industrial distributed database, Taurus, which uses a combination of several fundamental protocols, including Multi-Version Concurrency Control and Raft-based Cluster Management. TLA+ is used to model an abstraction of the system and specify its properties. The properties are verified using the TLC model checker, as well as by theorem proving using the TLA proof system (TLAPS). We show that model checking is able to reproduce a bug in Taurus that was found during testing. But our most significant result is twofold: we successfully verified an abstract model of Taurus, and convinced our industrial partners of the usefulness of formal methods to industrial systems.

Proceedings ArticleDOI
06 Jul 2021
TL;DR: In this paper, the integration of ArrayQL inside a relational database system, either addressable through a separate query interface or integrated into SQL as user-defined functions, is described.
Abstract: Array database systems offer a declarative language for array-based access on multidimensional data. This study explains the integration of ArrayQL inside a relational database system, either addressable through a separate query interface or integrated into SQL as user-defined functions. With a relational database system as the target, we inherit benefits such as query optimisation and multi-version concurrency control by design. Apart from SQL, having another query language allows processing the data without extraction or transformation out of its relational form. This is possible as we work on a relational array representation, for which we translate each ArrayQL operator into relational algebra. In our evaluation, ArrayQL within Umbra computes matrix operations faster than state-of-the-art database extensions.

Proceedings Article
01 Jan 2021
TL;DR: The Deferred Action Framework is proposed, a new system architecture for scheduling maintenance tasks in an MVCC DBMS integrated with the system’s transactional semantics that can support garbage collection and index cleaning without compromising performance while facilitating higher-level implementation goals, such as non-blocking schema changes and self-driving optimizations.
Abstract: Almost every database management system (DBMS) supporting transactions created in the last decade implements multi-version concurrency control (MVCC). But these systems rely on physical data structures (e.g., B+trees, hash tables) that do not natively support multi-versioning. As a result, there is a disconnect between the logical semantics of transactions and the DBMS’s underlying implementation. System developers must invest in engineering efforts to coordinate transactional access to these data structures and nontransactional maintenance tasks. This burden leads to challenges when reasoning about the system’s correctness and performance and inhibits its modularity. In this paper, we propose the Deferred Action Framework (DAF), a new system architecture for scheduling maintenance tasks in an MVCC DBMS integrated with the system’s transactional semantics. DAF allows the system to register arbitrary actions and then defer their processing until they are deemed safe by transactional processing. We show that DAF can support garbage collection and index cleaning without compromising performance while facilitating higher-level implementation goals, such as non-blocking schema changes and self-driving optimizations.
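
A minimal sketch of the deferred-action idea, under assumptions about the mechanism (the timestamp scheme and safety condition are simplified, and none of the names come from the paper): maintenance work such as freeing an old MVCC version or cleaning an index entry is registered with the current timestamp and only runs once no active transaction could still observe the affected state.

import heapq, itertools

class DeferredActionQueue:
    # Illustrative sketch, not DAF's API.
    def __init__(self):
        self.clock = itertools.count(1)
        self.active = {}            # txn id -> begin timestamp
        self.deferred = []          # min-heap of (visible_after_ts, action)

    def begin(self, txn):
        self.active[txn] = next(self.clock)

    def commit(self, txn):
        self.active.pop(txn)
        next(self.clock)

    def defer(self, action):
        heapq.heappush(self.deferred, (next(self.clock), action))

    def process(self):
        oldest_active = min(self.active.values(), default=float("inf"))
        while self.deferred and self.deferred[0][0] < oldest_active:
            _, action = heapq.heappop(self.deferred)
            action()                # safe: every live transaction began after this action

daf = DeferredActionQueue()
daf.begin("t1")
daf.defer(lambda: print("garbage-collect old version of row 42"))
daf.process()                      # nothing happens: t1 might still read the old version
daf.commit("t1")
daf.process()                      # now the deferred action runs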

Journal ArticleDOI
TL;DR: A taxonomy of concurrency control solutions is defined and the maturity of these solutions is assessed in light of the characteristics of real-time and mobile environments.
Abstract: Recently, mobile computing has changed the way that spatial data and GIS are processed. Unlike wired and stand-alone GIS, the trend has now switched from offline to real-time data processing using location-aware services, such as GPS technology. The increased usage of location-aware services in multiuser real-time environments has made transaction management incredibly significant. If simultaneous query operations on the same data item are not handled intelligently, this results in data inconsistency issues. Concurrency control protocols are one of the primary mechanisms for overcoming this issue in a multiuser environment. To the best of our knowledge, the impact of technological advancements on concurrency control has not been thoroughly studied in the literature. In this article, we explored the literature on concurrency control algorithms in depth with respect to real-time applications and applications with moving objects. We defined a taxonomy of concurrency control solutions and assessed the maturity of these solutions in light of the characteristics of real-time and mobile environments. We compared the most recent developments made in the literature and presented meaningful insights. Challenges are also identified and discussed, which can assist in doing research in this domain in the future.

Journal ArticleDOI
TL;DR: This paper aims to provide a complete model of the relational database, which is still widely used because of its well-known ACID properties, namely atomicity, consistency, isolation, and durability, and to highlight the adoption of relational model approaches by bigdata techniques.
Abstract: A database management system is a constant application of science that provides a platform for the creation, movement, and use of voluminous data. The area has witnessed a series of developments and technological advancements from its conventional structured database to the recent buzzword, bigdata. This paper aims to provide a complete model of the relational database, which is still widely used because of its well-known ACID properties, namely atomicity, consistency, isolation, and durability. Specifically, the objective of this paper is to highlight the adoption of relational model approaches by bigdata techniques. Towards addressing the reason for this incorporation, this paper qualitatively studies the advancements made over time on the relational data model. First, the variations in the data storage layout are illustrated based on the needs of the application. Second, quick data retrieval techniques like indexing, query processing and concurrency control methods are revealed. The paper provides vital insights to appraise the efficiency of the structured database in the unstructured environment, particularly when both consistency and scalability become an issue in the working of the hybrid transactional and analytical database management system.

Proceedings ArticleDOI
01 Apr 2021
TL;DR: In this paper, a discriminative admission control mechanism for shared-everything database, referred to as DAC, is proposed to identify and classify high-conflict transactions according to the set of records they try to access, which is defined as a conflict zone.
Abstract: Due to the variability of IT applications, the back-end databases usually run a mixed OLTP workload, which comprises a variety of transactions. Some of these transactions are high-conflict and others are low-conflict. Furthermore, high-conflict transactions may contend on different groups of data stored in the database. Without precise admission control, too many transactions that conflict on the same group of records are simultaneously executed by the OLTP engine, and this will lead to the well-known problem of data-contention thrashing. Under mixed OLTP workloads, conflicting transactions may be blocked for a long time or eventually rolled back, and other transactions do not get enough opportunity to be processed. To achieve the optimal performance for each kind of transaction, we design a discriminative admission control mechanism for shared-everything databases, referred to as DAC. DAC can quickly identify and classify high-conflict transactions according to the set of records they try to access, which is defined as a conflict zone. DAC makes admission control over OLTP transactions with the conflict zone as the granularity. By adaptively adjusting the transaction concurrency level for each zone, transaction blocking and waiting among the same kind of high-conflict transactions can be alleviated. Furthermore, thread resources are released to make the execution of low-conflict transactions less affected. We evaluate DAC using a main-memory database prototype and a classical disk-based database system. Experimental results demonstrate that DAC can help the OLTP engine significantly improve performance under mixed OLTP workloads.
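
A hedged Python sketch of zone-based admission control in the spirit of DAC (zone detection and the adaptation policy are simplified assumptions of mine): transactions touching the same hot group of records share a bounded number of admission slots, while low-conflict transactions run unthrottled.

import threading
from collections import defaultdict

HOT_ZONES = {frozenset({"warehouse:1"}): 2}      # conflict zone -> max concurrent transactions (illustrative)

class ZoneAdmissionControl:
    # Simplified sketch; real zone identification and adaptive limits are out of scope.
    def __init__(self, hot_zones):
        self.limits = {zone: threading.BoundedSemaphore(k) for zone, k in hot_zones.items()}

    def zone_of(self, record_set):
        for zone in self.limits:
            if zone & record_set:
                return zone
        return None                                # low-conflict: no zone matched

    def run(self, record_set, body):
        zone = self.zone_of(frozenset(record_set))
        if zone is None:
            return body()                          # low-conflict transactions are unthrottled
        with self.limits[zone]:                    # at most k conflicting transactions execute at once
            return body()

dac = ZoneAdmissionControl(HOT_ZONES)
threads = [threading.Thread(target=dac.run,
                            args=({"warehouse:1", f"order:{i}"}, lambda: None))
           for i in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print("all transactions admitted through zone limits")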

Journal ArticleDOI
01 Jan 2021
TL;DR: This work introduces Ocean Vista—a novel distributed protocol that guarantees strict serializability and outperforms a leading distributed transaction processing engine (TAPIR) more than tenfold in terms of peak throughput, albeit at the cost of additional latency for gossip and a more restricted transaction model.
Abstract: Providing ACID transactions under conflicts across globally distributed data is the Everest of transaction processing protocols. Transaction processing in this scenario is particularly costly due to the high latency of cross-continent network links, which inflates concurrency control and data replication overheads. To mitigate the problem, we introduce Ocean Vista—a novel distributed protocol that guarantees strict serializability. We observe that concurrency control and replication address different aspects of resolving the visibility of transactions, and we address both concerns using a multi-version protocol that tracks visibility using version watermarks and arrives at correct visibility decisions using efficient gossip. Gossiping the watermarks enables asynchronous transaction processing and acknowledging transaction visibility in batches in the concurrency control and replication protocols, which improves efficiency under high cross-data center network delays. In particular, Ocean Vista can access conflicting transactions in parallel and supports efficient write-quorum/read-one access using one round trip in the common case. We demonstrate experimentally in a multi-data center cloud environment that our design outperforms a leading distributed transaction processing engine (TAPIR) more than tenfold in terms of peak throughput, albeit at the cost of additional latency for gossip and a more restricted transaction model. The latency penalty is generally bounded by one wide area network (WAN) round trip time (RTT), and in the best case (i.e., under light load) our system nearly breaks even with TAPIR by committing transactions in around one WAN RTT.
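
A small sketch of watermark-based visibility, simplified from the description above (this is my own illustration, not the protocol's code; replication and quorums are omitted): transactions write versioned data immediately, servers gossip the lowest timestamp still in flight, and a read returns the newest version below the global watermark, since nothing older can still arrive.

class Server:
    # Illustrative sketch only.
    def __init__(self):
        self.in_flight = set()         # version timestamps of unfinished transactions

    def local_watermark(self, clock):
        return min(self.in_flight, default=clock + 1)

class WatermarkStore:
    def __init__(self, servers):
        self.servers = servers
        self.versions = {}             # key -> {timestamp: value}
        self.clock = 0

    def begin(self, server):
        self.clock += 1
        server.in_flight.add(self.clock)
        return self.clock

    def write(self, key, ts, value):
        self.versions.setdefault(key, {})[ts] = value

    def finish(self, server, ts):
        server.in_flight.discard(ts)

    def gossip_watermark(self):        # every version below this is fully visible
        return min(s.local_watermark(self.clock) for s in self.servers)

    def read(self, key):
        visible = self.gossip_watermark()
        candidates = [ts for ts in self.versions.get(key, {}) if ts < visible]
        return self.versions[key][max(candidates)] if candidates else None

s1, s2 = Server(), Server()
store = WatermarkStore([s1, s2])
t1 = store.begin(s1); store.write("balance", t1, 100)
t2 = store.begin(s2); store.write("balance", t2, 250)
print(store.read("balance"))   # None: both writers are still in flight
store.finish(s1, t1)
print(store.read("balance"))   # 100: the watermark advanced past t1 only
store.finish(s2, t2)
print(store.read("balance"))   # 250: all versions are now below the watermark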

Proceedings ArticleDOI
19 Jun 2021
TL;DR: In this article, the authors present a sound fully-automated schema refactoring procedure that refactors a program's data layout to eliminate statically identified concurrency bugs, allowing more transactions to be safely executed under weaker and more performant database guarantees.
Abstract: Serializability is a well-understood concurrency control mechanism that eases reasoning about highly-concurrent database programs. Unfortunately, enforcing serializability has a high performance cost, especially on geographically distributed database clusters. Consequently, many databases allow programmers to choose when a transaction must be executed under serializability, with the expectation that transactions would only be so marked when necessary to avoid serious concurrency bugs. However, this is a significant burden to impose on developers, requiring them to (a) reason about subtle concurrent interactions among potentially interfering transactions, (b) determine when such interactions would violate desired invariants, and (c) then identify the minimum number of transactions whose executions should be serialized to prevent these violations. To mitigate this burden, this paper presents a sound fully-automated schema refactoring procedure that refactors a program’s data layout – rather than its concurrency control logic – to eliminate statically identified concurrency bugs, allowing more transactions to be safely executed under weaker and more performant database guarantees. Experimental results over a range of realistic database benchmarks indicate that our approach is highly effective in eliminating concurrency bugs, with safe refactored programs showing an average of 120% higher throughput and 45% lower latency compared to a serialized baseline.

Journal ArticleDOI
Chenchen Huang1, Huiqi Hu1, Xuecheng Qi1, Xuan Zhou1, Aoying Zhou1 
TL;DR: RS-Store as discussed by the authors is a key-value store with RDMA, which can overcome the CPU bottleneck of the storage layer by enabling two access modes: local access and remote access.
Abstract: Many key-value stores use RDMA to optimize the messaging and data transmission between the application layer and the storage layer, most of which only provide point-wise operations. A skiplist-based store can support both point operations and range queries, but its CPU-intensive access operations combined with the high-speed network easily cause the storage layer to reach CPU bottlenecks. The common solution to this problem is offloading some operations into the application layer and using RDMA to bypass the CPU and directly perform remote access, but this method has only been used in hash table-based stores. In this paper, we present RS-store, a skiplist-based key-value store with RDMA, which can overcome the CPU bottleneck of the storage layer by enabling two access modes: local access and remote access. In RS-store, we redesign a novel data structure R-skiplist to save the communication cost in remote access, and implement a latch-free concurrency control mechanism to handle all the concurrency between the two access modes. RS-store also supports client-active range queries, which can reduce the storage layer's CPU consumption. Finally, we evaluate RS-store on an RDMA-capable cluster. Experimental results show that RS-store achieves up to 2x improvements over RDMA-enabled RocksDB in throughput and application scalability.

Journal ArticleDOI
TL;DR: In this paper, the main concepts of database recovery and the architectural choices to implement an in-memory database system are reviewed; the techniques to recover in-memory databases are then presented, along with the recovery strategies of a representative sample of modern in-memory databases.
Abstract: Many of today's applications need massive real-time data processing. In-memory database systems have become a good alternative for these requirements. These systems maintain the primary copy of the database in main memory to achieve high throughput rates and low latency. However, a database in RAM is more vulnerable to failures than in traditional disk-oriented databases because of the memory volatility. DBMSs implement recovery activities (logging, checkpoint, and restart) for recovery purposes. Although the recovery component looks similar in disk- and memory-oriented systems, these systems differ dramatically in the way they implement their architectural components, such as data storage, indexing, concurrency control, query processing, durability, and recovery. This survey aims to provide a thorough review of in-memory database recovery techniques. To achieve this goal, we reviewed the main concepts of database recovery and architectural choices to implement an in-memory database system. Only then, we present the techniques to recover in-memory databases and discuss the recovery strategies of a representative sample of modern in-memory databases.

Book ChapterDOI
11 Apr 2021
TL;DR: In this article, the authors cover most of the issues and challenges with transaction scheduling algorithms in one place to put out the current research status and discuss the immediate future directions requiring actions/efforts by the modern data-driven research community.
Abstract: The multi-site real-time transactional data-analysis based applications and the underlying research efforts to improve the performance of such applications have received renewed attention from researchers in the last four years. This reveals that the current scenario possesses numerous unanswered and truly relevant issues and challenges requiring a multi-disciplinary research approach to work on and solve the core database transaction processing issues. Our focus is to cover most of the issues and challenges with transaction scheduling algorithms in one place to lay out the current research status. At a high level, the domains covered are real-time priority assignment heuristics, real-time concurrency control protocols, and real-time commit processing. The article also points towards the immediate-future directions requiring actions/efforts by the modern data-driven research community.

Proceedings ArticleDOI
09 Aug 2021
TL;DR: SPMFS as mentioned in this paper partitions global metadata structures of the file system into per-core structures to distribute load and relieve contention, and designs a dedicated I/O thread pool to offer optimal parallelism inherent in underlying NVM regardless of varying user-threads.
Abstract: The first commercial Non-Volatile Memory (NVM) (i.e., Intel Optane DC Persistent Memory) exhibits limited parallelism, especially for write operations, which is generally neglected by existing NVM-aware file systems. Besides, the concurrency control of file systems also limits their scalability on high-performance NVMs under mainstream multi-core architectures. To effectively exploit the full parallelism inherent in both NVMs and multi-core processors to enhance overall performance, this paper proposes a novel scalable persistent memory file system, called SPMFS. SPMFS first partitions the global metadata structures of the file system into per-core structures to distribute load and relieve contention. Second, SPMFS presents a fine-grained range lock to support concurrent accesses upon a file. Finally, SPMFS designs a dedicated I/O thread pool to offer the optimal parallelism inherent in the underlying NVM regardless of the varying number of user threads. We implement an SPMFS prototype and evaluate it under a variety of workloads generated by IOtest, Filebench, FIO, and production traces from Alibaba Pangu. The experiments show that SPMFS provides better scalability, and achieves up to 2.37 × write throughput improvement over state-of-the-art kernel NVM-aware file systems (Ext4-DAX, NOVA, and PMFS) and user-space file systems (Strata and Libnvmmio), without sacrificing read performance.
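
The fine-grained range lock can be illustrated with a short Python sketch (an assumed design, not SPMFS source): writers to disjoint byte ranges of the same file proceed in parallel, while overlapping ranges serialize against each other.

import threading

class RangeLock:
    # Illustrative per-file range lock; half-open byte ranges [start, end).
    def __init__(self):
        self.cond = threading.Condition()
        self.held = []                          # ranges currently locked

    def _overlaps(self, start, end):
        return any(s < end and start < e for s, e in self.held)

    def acquire(self, start, end):
        with self.cond:
            while self._overlaps(start, end):
                self.cond.wait()
            self.held.append((start, end))

    def release(self, start, end):
        with self.cond:
            self.held.remove((start, end))
            self.cond.notify_all()

lock = RangeLock()
data = bytearray(8)

def write_range(start, payload):
    lock.acquire(start, start + len(payload))
    data[start:start + len(payload)] = payload   # only this byte range is protected
    lock.release(start, start + len(payload))

threads = [threading.Thread(target=write_range, args=(0, b"aaaa")),
           threading.Thread(target=write_range, args=(4, b"bbbb"))]
for t in threads: t.start()
for t in threads: t.join()
print(data)   # bytearray(b'aaaabbbb'): the two disjoint writes ran concurrently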

Proceedings ArticleDOI
01 Jan 2021
TL;DR: The canonical amoebot model as mentioned in this paper formalizes all communication as message passing, leveraging adversarial activation models of concurrent executions, and embeds concurrency control directly in algorithm design.
Abstract: The amoebot model abstracts active programmable matter as a collection of simple computational elements called amoebots that interact locally to collectively achieve tasks of coordination and movement. Since its introduction (SPAA 2014), a growing body of literature has adapted its assumptions for a variety of problems; however, without a standardized hierarchy of assumptions, precise systematic comparison of results under the amoebot model is difficult. We propose the canonical amoebot model, an updated formalization that distinguishes between core model features and families of assumption variants. A key improvement addressed by the canonical amoebot model is concurrency. Much of the existing literature implicitly assumes amoebot actions are isolated and reliable, reducing analysis to the sequential setting where at most one amoebot is active at a time. However, real programmable matter systems are concurrent. The canonical amoebot model formalizes all amoebot communication as message passing, leveraging adversarial activation models of concurrent executions. Under this granular treatment of time, we take two complementary approaches to concurrent algorithm design. Using hexagon formation as a case study, we first establish a set of sufficient conditions for algorithm correctness under any concurrent execution, embedding concurrency control directly in algorithm design. We then present a concurrency control framework that uses locks to convert amoebot algorithms that terminate in the sequential setting and satisfy certain conventions into algorithms that exhibit equivalent behavior in the concurrent setting. Together, the canonical amoebot model and these complementary approaches to concurrent algorithm design open new directions for distributed computing research on programmable matter.

Proceedings ArticleDOI
01 Jul 2021
TL;DR: FastBlock as discussed by the authors utilizes symbolic execution to identify minimal atomic sections in each transaction and guarantees the atomicity of these sections in the execution step via an efficient concurrency control mechanism, hardware transactional memory (HTM).
Abstract: The efficiency of the block lifecycle determines the performance of a blockchain, which is critically affected by the execution, mining and validation steps in the blockchain lifecycle. To accelerate blockchains, many works focus on optimizing the mining step while ignoring other steps. In this paper, we propose a novel blockchain framework, FastBlock, to speed up the execution and validation steps by introducing efficient concurrency. To efficiently prevent potential concurrency violations, FastBlock utilizes symbolic execution to identify minimal atomic sections in each transaction and guarantees the atomicity of these sections in the execution step via an efficient concurrency control mechanism, hardware transactional memory (HTM). To enable a deterministic validation step, FastBlock concurrently re-executes transactions based on a happen-before graph without increasing block size. Finally, we implement FastBlock and evaluate it in terms of conflicting transaction rate, number of transactions per block, and varying thread number. Our results indicate that FastBlock is efficient: the execution step and validation step speed up by 3.0x and 2.3x on average, respectively, over the original serial model with eight concurrent threads.

Proceedings ArticleDOI
TL;DR: In this article, the authors present a sound and fully automated schema refactoring procedure that transforms a program's data layout, rather than its concurrency control logic, to eliminate statically identified concurrency bugs, allowing more transactions to be safely executed under weaker and more performant database guarantees.
Abstract: Serializability is a well-understood concurrency control mechanism that eases reasoning about highly-concurrent database programs. Unfortunately, enforcing serializability has a high-performance cost, especially on geographically distributed database clusters. Consequently, many databases allow programmers to choose when a transaction must be executed under serializability, with the expectation that transactions would only be so marked when necessary to avoid serious concurrency bugs. However, this is a significant burden to impose on developers, requiring them to (a) reason about subtle concurrent interactions among potentially interfering transactions, (b) determine when such interactions would violate desired invariants, and (c) then identify the minimum number of transactions whose executions should be serialized to prevent these violations. To mitigate this burden, in this paper we present a sound and fully automated schema refactoring procedure that transforms a program's data layout -- rather than its concurrency control logic -- to eliminate statically identified concurrency bugs, allowing more transactions to be safely executed under weaker and more performant database guarantees. Experimental results over a range of database benchmarks indicate that our approach is highly effective in eliminating concurrency bugs, with safe refactored programs showing an average of 120% higher throughput and 45% lower latency compared to the baselines.

Journal ArticleDOI
Zhuo Ren1, Yu Gu1, Chuanwen Li1, Fangfang Li1, Ge Yu1 
TL;DR: GHSH as mentioned in this paper is a fully concurrent dynamic hyperspace hash table for GPU that uses atomic operations instead of locking to make the approach highly parallel and lock-free, which can accelerate processing queries of some secondary attributes in addition to just primary keys.
Abstract: Hyperspace hashing, which is often applied to NoSQL databases, builds indexes by mapping objects with multiple attributes to a multidimensional space. It can accelerate processing queries on some secondary attributes in addition to just primary keys. In recent years, the rich computing resources of GPUs have provided opportunities for implementing high-performance hyperspace hashing. In this study, we construct GHSH, a fully concurrent dynamic hyperspace hash table for GPU. By using atomic operations instead of locking, we make our approach highly parallel and lock-free. We propose a special concurrency control strategy that ensures wait-free read operations. Our data structure is designed considering GPU-specific hardware characteristics. We also propose a warp-level pre-combination data sharing strategy to obtain high parallel acceleration. Experiments on an Nvidia RTX2080Ti GPU suggest that GHSH performs about 20-100X faster than its counterpart on CPU. Specifically, GHSH performs updates with up to 396 M updates/s and processes search queries with up to 995 M queries/s. Compared to other GPU hashes that cannot conduct queries on non-key attributes, GHSH demonstrates comparable building and retrieval performance.
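
A CPU-side Python sketch of the hyperspace-hashing idea that GHSH builds on (GPU atomics and warp-level strategies are out of scope; the class and parameters are illustrative): each attribute hashes to one coordinate of a multidimensional grid, so a query that fixes only some attributes probes just the matching hyperplane of cells instead of the whole table.

from itertools import product

class HyperspaceHash:
    # Toy hyperspace hash over a fixed attribute list; not GHSH's GPU layout.
    def __init__(self, attributes, buckets_per_dim=4):
        self.attributes = attributes
        self.n = buckets_per_dim
        self.cells = {}                                  # coordinate tuple -> list of objects

    def _coord(self, attr, value):
        return hash((attr, value)) % self.n

    def insert(self, obj):
        coord = tuple(self._coord(a, obj[a]) for a in self.attributes)
        self.cells.setdefault(coord, []).append(obj)

    def search(self, **known):
        # Fix coordinates for known attributes, enumerate the rest of the hyperplane.
        axes = [[self._coord(a, known[a])] if a in known else range(self.n)
                for a in self.attributes]
        for coord in product(*axes):
            for obj in self.cells.get(coord, []):
                if all(obj[a] == v for a, v in known.items()):
                    yield obj

index = HyperspaceHash(attributes=["user", "city"])
index.insert({"user": "alice", "city": "Paris"})
index.insert({"user": "bob", "city": "Paris"})
print(list(index.search(city="Paris")))   # secondary-attribute lookup without scanning every cell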

Book ChapterDOI
01 Jan 2021
TL;DR: In this article, the authors have proposed a concurrency control mechanism for Hadoop file system using IPFS, which allows multiple users to access the same file without getting any distortion in the content of that file.
Abstract: A distributed file system is used to store and share files in a peer-to-peer network using the InterPlanetary File System (IPFS) protocol. In a distributed file system, multiple central servers can save the files, which will be accessed by various remote clients with proper authorization rights within the network. Nowadays, the amount of data generated each minute is huge and can be accessed by multiple users. This creates a problem in managing, accessing and executing data and finally leads to concurrency control issues. Concurrency control is a process of simultaneously managing the execution of data in a shared database while ensuring the serializability of data for multiple users. Thus, in the context of Hadoop, if multiple clients want to write updated data in the HDFS file system, a protocol needs to be followed to make sure that the write done by one client does not influence the computation performed by the other client. This paper elaborates how multiple users can access the same file without getting any distortion in the content of that file. It also provides a theoretical solution to handle the concurrency control problem in Hadoop. The solution discussed in this paper is to implement Hadoop's Java-based Filesystem interface for the decentralised, peer-to-peer file system using IPFS. The proposed interface will allow the Hadoop MapReduce functions to be directly performed on data files hosted on IPFS.

Posted Content
TL;DR: In this article, the authors propose Reinshard, a new blockchain that inherits the properties of hybrid consensus for optimal sharding, where the hybrid consensus is attained through Verifiable Delay Function (VDF).
Abstract: Decentralized control, low-complexity, flexible and efficient communications are the requirements of an architecture that aims to scale blockchains beyond the current state. Such properties are attainable by reducing ledger size and providing parallel operations in the blockchain. Sharding is one of the approaches that lower the burden of the nodes and enhance performance. However, the current solutions lack the features for resolving concurrency during cross-shard communications. With multiple participants belonging to different shards, handling concurrent operations is essential for optimal sharding. This issue becomes prominent due to the lack of architectural support and requires additional consensus for cross-shard communications. Inspired by hybrid Proof-of-Work/Proof-of-Stake (PoW/PoS), like Ethereum, hybrid consensus and 2-hop blockchain, we propose Reinshard, a new blockchain that inherits the properties of hybrid consensus for optimal sharding. Reinshard uses PoW and PoS chain-pairs with PoS sub-chains for all the valid chain-pairs where the hybrid consensus is attained through Verifiable Delay Function (VDF). Our architecture provides a secure method of arranging nodes in shards and resolves concurrency conflicts using the delay factor of VDF. The applicability of Reinshard is demonstrated through security and experimental evaluations. A practical concurrency problem is considered to show the efficacy of Reinshard in providing optimal sharding.

Proceedings Article
01 Jan 2021
TL;DR: PolyJuice as discussed by the authors proposes a learning-based framework that explicitly optimizes concurrency control via offline training to maximize performance by searching in a policy space of fine-grained actions.
Abstract: Concurrency control algorithms are key determinants of the performance of in-memory databases. Existing algorithms are designed to work well for certain workloads. For example, optimistic concurrency control (OCC) is better than two-phase-locking (2PL) under low contention, while the converse is true under high contention. To adapt to different workloads, prior works mix or switch between a few known algorithms using manual insights or simple heuristics. We propose a learning-based framework that instead explicitly optimizes concurrency control via offline training to maximize performance. Instead of choosing among a small number of known algorithms, our approach searches in a "policy space" of fine-grained actions, resulting in novel algorithms that can outperform existing algorithms by specializing to a given workload. We build Polyjuice based on our learning framework and evaluate it against several existing algorithms. Under different configurations of TPC-C and TPC-E, Polyjuice can achieve throughput numbers higher than the best of existing algorithms by 15% to 56%.
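
A toy Python sketch of what "searching a policy space of fine-grained actions" can look like (the features, actions, and benchmark below are placeholders, not Polyjuice's interface): an offline search picks, per access type, actions such as whether to read dirty data and how long to back off, scored by a stand-in workload function.

import itertools, random

ACCESS_TYPES = ["hot_read", "hot_write", "cold_access"]
ACTIONS = {"dirty_read": [False, True],      # read uncommitted data or not
           "wait": [0, 1, 4]}                # how long to back off before retrying

def benchmark(policy, seed=0):
    # Stand-in for running the real workload; the scoring rules are invented for illustration.
    rng = random.Random(seed)
    score = 0.0
    for access in ACCESS_TYPES:
        dirty, wait = policy[access]
        if access == "hot_read":
            score += 3.0 if dirty else 1.0
        elif access == "hot_write":
            score += 2.0 - abs(wait - 1)
        else:
            score += 1.0 - 0.2 * wait
        score += rng.uniform(-0.05, 0.05)    # measurement noise
    return score

def offline_search():
    # Exhaustive search over the tiny policy space; a learned search would replace this loop.
    best_policy, best_score = None, float("-inf")
    choices = list(itertools.product(ACTIONS["dirty_read"], ACTIONS["wait"]))
    for assignment in itertools.product(choices, repeat=len(ACCESS_TYPES)):
        policy = dict(zip(ACCESS_TYPES, assignment))
        score = benchmark(policy)
        if score > best_score:
            best_policy, best_score = policy, score
    return best_policy

print(offline_search())   # per-access-type actions specialized to the (toy) workload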