Book

Modern B-Tree Techniques

15 Aug 2011
TL;DR: This tutorial of B-tree techniques will stimulate research and development of modern B-tree indexing techniques for future data management systems.
Abstract: In summary, the core design of B-trees has remained unchanged in 40 years: balanced trees, pages or other units of I/O as nodes, efficient root-to-leaf search, splitting and merging nodes, etc. On the other hand, an enormous amount of research and development has improved every aspect of B-trees including data contents such as multi-dimensional data, access algorithms such as multi-dimensional queries, data organization within each node such as compression and cache optimization, concurrency control such as separation of latching and locking, recovery such as multi-level recovery, etc. Gray and Reuter believed in 1993 that “B-trees are by far the most important access path structure in database and file systems.” It seems that this statement remains true today. B-tree indexes are likely to gain new importance in relational databases due to the advent of flash storage. Fast access latencies permit many more random I/O operations than traditional disk storage, thus shifting the break-even point between a full-bandwidth scan and a B-tree index search, even if the scan has the benefit of columnar database storage. We hope that this tutorial of B-tree techniques will stimulate research and development of modern B-tree indexing techniques for future data management systems.
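
A minimal sketch of the root-to-leaf search described in the abstract, assuming a simplified in-memory node layout (the Node class and field names are invented for illustration; a real B-tree maps each node to a page of I/O and adds splitting, merging, latching, and logging):

    # Minimal root-to-leaf B-tree search over a toy in-memory node layout.
    from bisect import bisect_right

    class Node:
        def __init__(self, keys, children=None, values=None):
            self.keys = keys            # sorted separator keys (interior) or record keys (leaf)
            self.children = children    # child nodes, or None for a leaf
            self.values = values        # records, or None for an interior node

    def search(node, key):
        while node.children is not None:                     # descend interior levels
            node = node.children[bisect_right(node.keys, key)]
        i = bisect_right(node.keys, key) - 1                 # probe the leaf
        return node.values[i] if i >= 0 and node.keys[i] == key else None

    # Two-level example: the root separates the two leaves at key 30.
    leaf1 = Node([10, 20], values=["a", "b"])
    leaf2 = Node([30, 40], values=["c", "d"])
    root = Node([30], children=[leaf1, leaf2])
    assert search(root, 40) == "d" and search(root, 25) is None
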
Citations
Proceedings ArticleDOI
14 Jun 2016
TL;DR: This paper proposes the Fingerprinting Persistent Tree (FPTree), a novel hybrid SCM-DRAM persistent and concurrent B-Tree that achieves performance similar to DRAM-based counterparts, together with a hybrid concurrency scheme for the FPTree that is partially based on Hardware Transactional Memory.
Abstract: The advent of Storage Class Memory (SCM) is driving a rethink of storage systems towards a single-level architecture where memory and storage are merged. In this context, several works have investigated how to design persistent trees in SCM as a fundamental building block for these novel systems. However, these trees are significantly slower than DRAM-based counterparts since trees are latency-sensitive and SCM exhibits higher latencies than DRAM. In this paper we propose a novel hybrid SCM-DRAM persistent and concurrent B-Tree, named Fingerprinting Persistent Tree (FPTree) that achieves similar performance to DRAM-based counterparts. In this novel design, leaf nodes are persisted in SCM while inner nodes are placed in DRAM and rebuilt upon recovery. The FPTree uses Fingerprinting, a technique that limits the expected number of in-leaf probed keys to one. In addition, we propose a hybrid concurrency scheme for the FPTree that is partially based on Hardware Transactional Memory. We conduct a thorough performance evaluation and show that the FPTree outperforms state-of-the-art persistent trees with different SCM latencies by up to a factor of 8.2. Moreover, we show that the FPTree scales very well on a machine with 88 logical cores. Finally, we integrate the evaluated trees in memcached and a prototype database. We show that the FPTree incurs an almost negligible performance overhead over using fully transient data structures, while significantly outperforming other persistent trees.
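
A rough sketch of the Fingerprinting technique summarized above: each leaf keeps a one-byte hash per key, so a lookup scans the cheap fingerprints first and, in expectation, compares only about one full key. The class and helper names below are invented for the example; the actual FPTree additionally handles SCM placement, persistence, and concurrency.

    import hashlib

    def fingerprint(key: bytes) -> int:
        """One-byte hash of a key (illustrative stand-in for the paper's fingerprints)."""
        return hashlib.blake2b(key, digest_size=1).digest()[0]

    class Leaf:
        def __init__(self):
            self.fps, self.keys, self.vals = [], [], []

        def insert(self, key: bytes, val) -> None:
            self.fps.append(fingerprint(key))
            self.keys.append(key)
            self.vals.append(val)

        def lookup(self, key: bytes):
            fp = fingerprint(key)
            for i, f in enumerate(self.fps):           # cheap scan over 1-byte fingerprints
                if f == fp and self.keys[i] == key:    # full key compared only on a fingerprint hit
                    return self.vals[i]
            return None

    leaf = Leaf()
    leaf.insert(b"apple", 1)
    leaf.insert(b"pear", 2)
    assert leaf.lookup(b"pear") == 2 and leaf.lookup(b"plum") is None

With uniform one-byte fingerprints and n keys per unsorted leaf, the expected number of spurious full-key comparisons is roughly n/256, which is how the expected number of in-leaf probed keys stays close to one.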

281 citations


Cites background from "Modern B-Tree Techniques"

  • ...We consider only the case of unique keys, which is often an acceptable assumption in practice [15]....


Journal ArticleDOI
TL;DR: The design and implementation of modern column-oriented database systems is surveyed, with a specific focus on three influential research prototypes, MonetDB, MonetDB/X100, and C-Store, which form the basis for several well-known commercial column-store implementations.
Abstract: Database system performance is directly related to the efficiency of the system at storing data on primary storage (for example, disk) and moving it into CPU registers for processing. For this reason, there is a long history in the database community of research exploring physical storage alternatives, including sophisticated indexing, materialized views, and vertical and horizontal partitioning. In recent years, there has been renewed interest in so-called column-oriented systems, sometimes also called column-stores. Column-store systems completely vertically partition a database into a collection of individual columns that are stored separately. By storing each column separately on disk, these column-based systems enable queries to read just the attributes they need, rather than having to read entire rows from disk and discard unneeded attributes once they are in memory. The Design and Implementation of Modern Column-Oriented Database Systems discusses modern column-stores, their architecture and evolution as well as the benefits they can bring in data analytics. There is a specific focus on three influential research prototypes, MonetDB, MonetDB/X100, and C-Store. These systems have formed the basis for several well-known commercial column-store implementations. Their similarities and differences are described and they are discussed in terms of their specific architectural features for compression, late materialization, join processing, vectorization and adaptive indexing (database cracking). The Design and Implementation of Modern Column-Oriented Database Systems is an excellent reference on the topic for database researchers and practitioners.
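
A toy illustration of the vertical partitioning described above, with an invented three-column table: when each attribute is stored as its own array, a query reads only the columns it actually touches rather than whole rows.

    rows = [(1, "alice", 30), (2, "bob", 25), (3, "carol", 41)]

    # Row store: complete records stored one after another.
    row_store = list(rows)

    # Column store: one array per attribute, stored (and scanned) separately.
    col_store = {
        "id":   [r[0] for r in rows],
        "name": [r[1] for r in rows],
        "age":  [r[2] for r in rows],
    }

    # "SELECT id WHERE age > 28" touches only the id and age columns.
    result = [i for i, a in zip(col_store["id"], col_store["age"]) if a > 28]
    assert result == [1, 3]
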

190 citations

Proceedings ArticleDOI
TL;DR: A new learned index called ALEX is presented which addresses practical issues that arise when implementing learned indexes for workloads that contain a mix of point lookups, short range queries, inserts, updates, and deletes and effectively combines the core insights from learned indexes with proven storage and indexing techniques to achieve high performance and low memory footprint.
Abstract: Recent work on "learned indexes" has changed the way we look at the decades-old field of DBMS indexing. The key idea is that indexes can be thought of as "models" that predict the position of a key in a dataset. Indexes can, thus, be learned. The original work by Kraska et al. shows that a learned index beats a B+Tree by a factor of up to three in search time and by an order of magnitude in memory footprint. However, it is limited to static, read-only workloads. In this paper, we present a new learned index called ALEX which addresses practical issues that arise when implementing learned indexes for workloads that contain a mix of point lookups, short range queries, inserts, updates, and deletes. ALEX effectively combines the core insights from learned indexes with proven storage and indexing techniques to achieve high performance and low memory footprint. On read-only workloads, ALEX beats the learned index from Kraska et al. by up to 2.2X on performance with up to 15X smaller index size. Across the spectrum of read-write workloads, ALEX beats B+Trees by up to 4.1X while never performing worse, with up to 2000X smaller index size. We believe ALEX presents a key step towards making learned indexes practical for a broader class of database workloads with dynamic updates.
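
The "index as model" idea can be sketched with a single linear model over a sorted array: predict a position from the key, then correct the prediction with a bounded local search. The sketch below (invented names, exact-match lookups only) shows just that core idea; ALEX itself uses a tree of linear models over gapped arrays and also supports inserts, updates, and deletes.

    import bisect

    class LinearModelIndex:
        def __init__(self, sorted_keys):
            self.keys = sorted_keys
            n = len(sorted_keys)
            mean_x = sum(sorted_keys) / n
            mean_y = (n - 1) / 2
            var = sum((x - mean_x) ** 2 for x in sorted_keys) or 1.0
            # Least-squares fit of position ~ slope * key + intercept.
            self.slope = sum((x - mean_x) * (y - mean_y)
                             for y, x in enumerate(sorted_keys)) / var
            self.intercept = mean_y - self.slope * mean_x

        def lookup(self, key):
            """Predict the key's position, then correct with a local search."""
            guess = int(self.slope * key + self.intercept)
            guess = min(max(guess, 0), len(self.keys) - 1)
            lo = hi = guess
            step = 1
            while lo > 0 and self.keys[lo] > key:                    # widen downward
                lo = max(0, lo - step)
                step *= 2
            step = 1
            while hi < len(self.keys) - 1 and self.keys[hi] < key:   # widen upward
                hi = min(len(self.keys) - 1, hi + step)
                step *= 2
            i = bisect.bisect_left(self.keys, key, lo, hi + 1)
            return i if i < len(self.keys) and self.keys[i] == key else None

    idx = LinearModelIndex(list(range(0, 1000, 2)))        # keys 0, 2, ..., 998
    assert idx.lookup(500) == 250 and idx.lookup(501) is None
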

121 citations


Cites methods from "Modern B-Tree Techniques"

  • ...positions that are filled by elements. A node is full if the next insert results in exceeding d_u. By default we set d_l = 0.6 and d_u = 0.8 to achieve average data storage utilization of 0.7, similar to B+Tree [13], which in our experience always produces good results and did not need to be tuned. In contrast, B+Tree nodes typically have d_l = 0.5 and d_u = 1. Section 5 presents a theoretical analysis of how the den...


  • ...node. The data size of B+Tree is the sum of the sizes of all leaf nodes. At initialization, the Gapped Arrays in data nodes are set to have 70% space utilization, comparable to B+Tree leaf node space utilization [13]. 6.1.1 Datasets. We run all experiments using 8-byte keys from some dataset and randomly generated fixed-size payloads. We evaluate ALEX on 4 datasets, whose characteristics and CDFs are shown in Tabl...


Proceedings Article
01 Jan 2016
TL;DR: The paper conjectures that when optimizing the read-update-memory (RUM) overheads, optimizing in any two areas negatively impacts the third; it articulates this as the RUM Conjecture, shows how it manifests in state-of-the-art access methods, and envisions a trend toward RUM-aware access methods for future data systems.
Abstract: The database research community has been building methods to store, access, and update data for more than four decades. Throughout the evolution of the structures and techniques used to access data, access methods adapt to the ever changing hardware and workload requirements. Today, even small changes in the workload or the hardware lead to a redesign of access methods. The need for new designs has been increasing as data generation and workload diversification grow exponentially, and hardware advances introduce increased complexity. New workload requirements are introduced by the emergence of new applications, and data is managed by large systems composed of more and more complex and heterogeneous hardware. As a result, it is increasingly important to develop application-aware and hardware-aware access methods. The fundamental challenges that every researcher, systems architect, or designer faces when designing a new access method are how to minimize, i) read times (R), ii) update cost (U), and iii) memory (or storage) overhead (M). In this paper, we conjecture that when optimizing the read-update-memory overheads, optimizing in any two areas negatively impacts the third. We present a simple model of the RUM overheads, and we articulate the RUM Conjecture. We show how the RUM Conjecture manifests in state-of-the-art access methods, and we envision a trend toward RUM-aware access methods for future data systems.
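
A back-of-the-envelope illustration of the trade-off the conjecture describes, with all sizes assumed purely for the example: adding a B-tree index drives the read overhead down while pushing the update and memory overheads up.

    # Illustrative RUM-style ratios for an assumed scenario (not numbers from the paper).
    PAGE = 8_192            # bytes fetched or written per I/O
    RECORD = 100            # bytes a point lookup needs, or an update logically changes
    BASE = 1_000_000_000    # bytes of base data
    INDEX = 50_000_000      # assumed size of an auxiliary B-tree index

    def overheads(bytes_read, bytes_written, total_space):
        """RUM overheads as ratios of physical work (or space) to the logical minimum."""
        return (bytes_read / RECORD, bytes_written / RECORD, total_space / BASE)

    # Plain heap file: cheap updates and no extra space, but a lookup scans everything.
    print("heap  :", overheads(bytes_read=BASE, bytes_written=RECORD, total_space=BASE))
    # B-tree index: a lookup touches ~4 pages, but an update rewrites a leaf page and
    # the index itself costs space, so lowering R raises U and M.
    print("b-tree:", overheads(bytes_read=4 * PAGE, bytes_written=PAGE, total_space=BASE + INDEX))
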

90 citations


Cites background from "Modern B-Tree Techniques"

  • ...Examples include indexes with constant time access such as hash-based indexes or logarithmic time structures such as B-Trees [22], Tries [19], Prefix B-Trees [9], and Skiplists [45]....


  • ...Notable proposals are Database Cracking [31, 32, 33, 48], Adaptive Merging [22, 25], and Adaptive Indexing [23, 24, 26, 34], which balance the read performance versus the overhead of creating an index....


Proceedings ArticleDOI
27 May 2018
TL;DR: This work presents the Height Optimized Trie, a fast and space-efficient in-memory index structure that outperforms other state-of-the-art index structures for string keys both in terms of search performance and memory footprint, while being competitive for integer keys.
Abstract: We present the Height Optimized Trie (HOT), a fast and space-efficient in-memory index structure. The core algorithmic idea of HOT is to dynamically vary the number of bits considered at each node, which enables a consistently high fanout and thereby good cache efficiency. The layout of each node is carefully engineered for compactness and fast search using SIMD instructions. Our experimental results, which use a wide variety of workloads and data sets, show that HOT outperforms other state-of-the-art index structures for string keys both in terms of search performance and memory footprint, while being competitive for integer keys. We believe that these properties make HOT highly useful as a general-purpose index structure for main-memory databases.
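
The "dynamically vary the number of bits considered at each node" idea can be illustrated in a few lines: consider only the bit positions on which the keys reaching a node actually differ, so the node keeps a high fanout even for sparse or clustered keys. The functions below are invented for the example; HOT's engineered node layouts and SIMD search are not shown.

    KEY_BITS = 32   # assume 32-bit integer keys for the example

    def discriminative_bits(keys):
        """Bit positions (most significant first) on which the given keys differ."""
        return [b for b in range(KEY_BITS)
                if len({(k >> (KEY_BITS - 1 - b)) & 1 for k in keys}) > 1]

    def partition(keys, bits):
        """Group keys by the values of the chosen discriminative bits."""
        groups = {}
        for k in keys:
            sig = tuple((k >> (KEY_BITS - 1 - b)) & 1 for b in bits)
            groups.setdefault(sig, []).append(k)
        return groups

    keys = [0x00000001, 0x00000002, 0x80000000, 0x80000001]
    bits = discriminative_bits(keys)        # only bits 0, 30, and 31 discriminate these keys
    assert bits == [0, 30, 31]
    assert len(partition(keys, bits)) == 4  # fanout 4 from just 3 considered bits
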

75 citations


Cites background from "Modern B-Tree Techniques"

  • ...Traditionally, index structures used fine-grained locking and lock coupling to provide concurrent accesses to index structures [7, 8]....


References
Proceedings ArticleDOI
01 Jun 1984
TL;DR: A dynamic index structure called an R-tree is described which meets this need; algorithms for searching and updating it are given, and the structure is concluded to be useful for current database systems in spatial applications.
Abstract: In order to handle spatial data efficiently, as required in computer aided design and geo-data applications, a database system needs an index mechanism that will help it retrieve data items quickly according to their spatial locations. However, traditional indexing methods are not well suited to data objects of non-zero size located in multi-dimensional spaces. In this paper we describe a dynamic index structure called an R-tree which meets this need, and give algorithms for searching and updating it. We present the results of a series of tests which indicate that the structure performs well, and conclude that it is useful for current database systems in spatial applications.
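
A toy sketch of the search side of the structure: every entry carries a minimum bounding rectangle (MBR), and a range query descends into each child whose MBR intersects the query window. Node splitting and the paper's insertion heuristics are omitted, and the names are invented.

    def intersects(a, b):
        """Axis-aligned rectangle overlap test; rectangles are (x1, y1, x2, y2)."""
        ax1, ay1, ax2, ay2 = a
        bx1, by1, bx2, by2 = b
        return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2

    class RNode:
        def __init__(self, entries, leaf):
            self.entries = entries   # list of (mbr, child_node_or_object)
            self.leaf = leaf

    def range_search(node, query, out):
        for mbr, item in node.entries:
            if intersects(mbr, query):
                if node.leaf:
                    out.append(item)                    # report the stored object
                else:
                    range_search(item, query, out)      # descend into the child subtree
        return out

    leaf = RNode([((0, 0, 1, 1), "A"), ((5, 5, 6, 6), "B")], leaf=True)
    root = RNode([((0, 0, 6, 6), leaf)], leaf=False)
    assert range_search(root, (0, 0, 2, 2), []) == ["A"]
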

7,336 citations


"Modern B-Tree Techniques" refers methods in this paper

  • ...For example, more and more efficient strategies for construction and bulk loading of R-tree indexes have been forthcoming over a long time [23, 62] compared to simply sorting for efficient construction of a B-tree index, which also applies to B-tree indexes adapted to multiple dimensions [6, 109]....


Proceedings Article
01 Jan 2006
TL;DR: Bigtable, as described in this paper, is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers; many projects at Google, including web indexing, Google Earth, and Google Finance, store data in Bigtable.
Abstract: Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this article, we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.
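
The "simple data model" referred to here is, in the full paper, a sparse, sorted, multidimensional map from (row key, column key, timestamp) to an uninterpreted byte string. A toy sketch of that mapping follows; the webtable-style example mirrors the paper, while the helper functions are invented.

    table = {}   # sparse map: (row key, column key, timestamp) -> uninterpreted value

    def put(row, column, timestamp, value):
        table[(row, column, timestamp)] = value

    def read_latest(row, column):
        """Return the most recent version of a cell, or None if the cell is empty."""
        versions = [(ts, v) for (r, c, ts), v in table.items() if (r, c) == (row, column)]
        return max(versions)[1] if versions else None

    # Webtable-style example: page contents and anchors keyed by reversed URL.
    put("com.cnn.www", "contents:", 2, b"<html>...v2")
    put("com.cnn.www", "contents:", 3, b"<html>...v3")
    put("com.cnn.www", "anchor:cnnsi.com", 5, b"CNN")
    assert read_latest("com.cnn.www", "contents:") == b"<html>...v3"
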

4,843 citations

Proceedings ArticleDOI
14 Oct 2007
TL;DR: Dynamo is presented, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience; it makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
Abstract: Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems.This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
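
The object versioning and application-assisted conflict resolution summarized above can be sketched with vector clocks, the mechanism the full paper describes: versions whose clocks are causally ordered can be discarded, while concurrent versions are returned as siblings for the application to merge. The shopping-cart example mirrors the paper; the code itself is only illustrative.

    def descends(a, b):
        """True if vector clock a has seen everything in b (a supersedes or equals b)."""
        return all(a.get(node, 0) >= count for node, count in b.items())

    def reconcile(versions):
        """Drop versions dominated by a newer one; what remains are concurrent
        siblings that the application must merge itself."""
        return [(val, clock) for val, clock in versions
                if not any(other is not clock and descends(other, clock)
                           for _, other in versions)]

    v1 = ("cart=[book]",      {"Sx": 1})
    v2 = ("cart=[book,pen]",  {"Sx": 2})            # causally after v1
    v3 = ("cart=[book,mug]",  {"Sx": 1, "Sy": 1})   # concurrent with v2
    assert reconcile([v1, v2, v3]) == [v2, v3]      # v1 is discarded; v2 and v3 conflict
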

4,349 citations


"Modern B-Tree Techniques" refers background in this paper

  • ..., the topics of the present section and the following sections, differentiate traditional database management systems from key-value stores now employed in various web services and their implementations [21, 29]. Implicit in this section is that B-tree structures can support not only read-only searches but also — concurrently — updates including insertions, deletions, and modifications of existing records, both of key and nonkey fields....


  • ...Nonetheless, many of the techniques are readily applicable or at least transferable to other possible application domains of B-trees, in particular to information retrieval [83], file systems [71], and “No SQL” databases and key-value stores recently popularized for web services and cloud computing [21, 29]....


Book
01 Jan 1992
TL;DR: Using transactions as a unifying conceptual framework, the authors show how to build high-performance distributed systems and high-availability applications with finite budgets and risk.
Abstract: From the Publisher: The key to client/server computing. Transaction processing techniques are deeply ingrained in the fields of databases and operating systems and are used to monitor, control and update information in modern computer systems. This book will show you how large, distributed, heterogeneous computer systems can be made to work reliably. Using transactions as a unifying conceptual framework, the authors show how to build high-performance distributed systems and high-availability applications with finite budgets and risk. The authors provide detailed explanations of why various problems occur as well as practical, usable techniques for their solution. Throughout the book, examples and techniques are drawn from the most successful commercial and research systems. Extensive use of compilable C code fragments demonstrates the many transaction processing algorithms presented in the book. The book will be valuable to anyone interested in implementing distributed systems or client/server architectures.
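
As a minimal illustration of the transaction abstraction the book is organized around, the sketch below provides all-or-nothing updates through an in-memory undo log (invented names; the book's actual techniques for logging, locking, and recovery go far beyond this).

    class Transaction:
        """Toy all-or-nothing updates over a dict, via an in-memory undo log."""
        def __init__(self, store):
            self.store, self.undo = store, []

        def write(self, key, value):
            self.undo.append((key, self.store.get(key)))   # remember the old value
            self.store[key] = value

        def commit(self):
            self.undo.clear()                              # keep the new values

        def rollback(self):
            for key, old in reversed(self.undo):           # undo in reverse order
                if old is None:
                    self.store.pop(key, None)              # key did not exist before
                else:
                    self.store[key] = old
            self.undo.clear()

    accounts = {"alice": 100, "bob": 50}
    t = Transaction(accounts)
    t.write("alice", accounts["alice"] - 70)
    t.write("bob", accounts["bob"] + 70)
    t.rollback()                                           # e.g. the transfer is abandoned
    assert accounts == {"alice": 100, "bob": 50}
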

3,522 citations


"Modern B-Tree Techniques" refers background in this paper

  • ...The present survey goes beyond the “classic” B-tree references [7, 8, 27, 59] in multiple ways....


  • ...In other words, changing 20 bytes in a page of 8 KB required writing 16 KB to the recovery log, plus appropriate record headers, which are fairly large for log records [59]....


  • ...Gray and Reuter asserted that “B-trees are by far the most important access path structure in database and file systems” [59]....


  • ...Thus, the latter design is more effective at preventing deadlocks [59] even if it introduces an asymmetry in the lock matrix....


  • ...On the other hand, an enormous amount of research and development has improved every aspect of B-trees including data contents such as multi-dimensional data, access algorithms such as multi-dimensional queries, data organization within each node such as compression and cache optimization, concurrency control such as separation of latching and locking, recovery such as multi-level recovery, etc. Gray and Reuter believed in 1993 that “B-trees are by far the most important access path structure in database and file systems.”...


Journal ArticleDOI
TL;DR: The simple data model provided by Bigtable is described, which gives clients dynamic control over data layout and format, and the design and implementation of Bigtable are described.
Abstract: Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this article, we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.

3,259 citations


"Modern B-Tree Techniques" refers background in this paper

  • ..., the topics of the present section and the following sections, differentiate traditional database management systems from key-value stores now employed in various web services and their implementations [21, 29]. Implicit in this section is that B-tree structures can support not only read-only searches but also — concurrently — updates including insertions, deletions, and modifications of existing records, both of key and nonkey fields....


  • ...Nonetheless, many of the techniques are readily applicable or at least transferable to other possible application domains of B-trees, in particular to information retrieval [83], file systems [71], and “No SQL” databases and key-value stores recently popularized for web services and cloud computing [21, 29]....
