Proceedings ArticleDOI

HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots

11 Apr 2011-pp 195-206
TL;DR: This work presents an efficient hybrid system, called HyPer, that can handle both OLTP and OLAP simultaneously by using hardware-assisted replication mechanisms to maintain consistent snapshots of the transactional data.
Abstract: The two areas of online transaction processing (OLTP) and online analytical processing (OLAP) present different challenges for database architectures. Currently, customers with high rates of mission-critical transactions have split their data into two separate systems, one database for OLTP and one so-called data warehouse for OLAP. While allowing for decent transaction rates, this separation has many disadvantages, including data freshness issues, caused by only periodically initiating the Extract-Transform-Load data staging, and excessive resource consumption, caused by maintaining two separate information systems. We present an efficient hybrid system, called HyPer, that can handle both OLTP and OLAP simultaneously by using hardware-assisted replication mechanisms to maintain consistent snapshots of the transactional data. HyPer is a main-memory database system that guarantees the ACID properties of OLTP transactions and executes OLAP query sessions (multiple queries) on the same, arbitrarily current and consistent snapshot. The utilization of the processor-inherent support for virtual memory management (address translation, caching, copy on update) yields both at the same time: unprecedented transaction rates of up to 100,000 per second and very fast OLAP query response times on a single system executing both workloads in parallel. The performance analysis is based on a combined TPC-C and TPC-H benchmark.
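The "hardware-assisted replication" the abstract refers to is realized in the paper by fork()-ing the OLTP process: the operating system's copy-on-write page handling gives the OLAP child a consistent snapshot essentially for free. A minimal, POSIX-only Python sketch of the idea, with a toy dict standing in for the transactional data (illustrative, not HyPer's actual C++ implementation):

```python
import os

# Toy "database" living in the parent (OLTP) process's address space.
db = {"balance": 100}

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # Child = OLAP snapshot: it sees the state as of fork(), even though
    # the parent keeps updating its own pages (OS copy-on-write).
    os.close(w)
    os.read(r, 1)                 # wait until the parent has applied its update
    os._exit(0 if db["balance"] == 100 else 1)
else:
    os.close(r)
    db["balance"] = 999           # OLTP update in the parent, invisible to the child
    os.write(w, b"x")             # signal the child to read its snapshot
    _, status = os.waitpid(pid, 0)
    print("child saw consistent snapshot:", os.WEXITSTATUS(status) == 0)
```

The exit code of the child verifies that its snapshot still shows the pre-update value, which is exactly the property HyPer's OLAP sessions rely on.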
Citations

Journal ArticleDOI
01 Jun 2011
TL;DR: This work presents a novel compilation strategy that translates a query into compact and efficient machine code using the LLVM compiler framework and integrates these techniques into the HyPer main memory database system and shows that this results in excellent query performance while requiring only modest compilation time.
Abstract: As main memory grows, query performance is more and more determined by the raw CPU costs of query processing itself. The classical iterator style query processing technique is very simple and flexible, but shows poor performance on modern CPUs due to lack of locality and frequent instruction mispredictions. Several techniques like batch oriented processing or vectorized tuple processing have been proposed in the past to improve this situation, but even these techniques are frequently out-performed by hand-written execution plans. In this work we present a novel compilation strategy that translates a query into compact and efficient machine code using the LLVM compiler framework. By aiming at good code and data locality and predictable branch layout the resulting code frequently rivals the performance of hand-written C++ code. We integrated these techniques into the HyPer main memory database system and show that this results in excellent query performance while requiring only modest compilation time.
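The data-centric compilation idea can be illustrated without LLVM: instead of interpreting an operator tree tuple-by-tuple, generate one tight, query-specific loop and compile it once. A toy Python sketch, using exec as a stand-in for LLVM code generation (the name compile_query and the schema are invented for the example):

```python
# Generate a fused scan+filter+aggregate loop specialized for one query,
# rather than dispatching through generic iterator operators per tuple.
def compile_query(predicate_src):
    src = (
        "def q(rows):\n"
        "    total = 0\n"
        "    for price, qty in rows:\n"
        f"        if {predicate_src}:\n"          # predicate inlined into the loop
        "            total += price * qty\n"
        "    return total\n"
    )
    ns = {}
    exec(compile(src, "<query>", "exec"), ns)      # "compile" the query plan
    return ns["q"]

q = compile_query("qty > 5")
print(q([(10, 3), (20, 8), (5, 10)]))  # 20*8 + 5*10 = 210
```

The payoff in the real system comes from the same shape at the machine-code level: one predictable loop with good locality instead of virtual calls per tuple.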

518 citations


Cites methods from "HyPer: A hybrid OLTP&OLAP main memo..."

  • ...As described in [5], they are derived from TPC-H queries but adapted to the combined TPC-C and TPC-H schema....


  • ...We therefore used the TPC-CH benchmark from [5] for experiments....


  • ...We have implemented the techniques proposed in this paper both in the HyPer main-memory database management systems [5], and in a disk-based DBMS....


  • ...We demonstrate the impact of these techniques by integrating them into the HyPer main-memory database management system [5] and performing various comparisons with other systems....


Proceedings ArticleDOI
22 Jun 2013
TL;DR: An overview of the design of the Hekaton engine is given and some experimental results are reported; the engine is designed for high concurrency and uses only latch-free data structures and a new optimistic, multiversion concurrency control technique.
Abstract: Hekaton is a new database engine optimized for memory resident data and OLTP workloads. Hekaton is fully integrated into SQL Server; it is not a separate system. To take advantage of Hekaton, a user simply declares a table memory optimized. Hekaton tables are fully transactional and durable and accessed using T-SQL in the same way as regular SQL Server tables. A query can reference both Hekaton tables and regular tables and a transaction can update data in both types of tables. T-SQL stored procedures that reference only Hekaton tables can be compiled into machine code for further performance improvements. The engine is designed for high concurrency. To achieve this it uses only latch-free data structures and a new optimistic, multiversion concurrency control technique. This paper gives an overview of the design of the Hekaton engine and reports some experimental results.
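Multiversion concurrency control of the kind Hekaton uses keeps several timestamped versions of each record so that readers never block writers. A highly simplified Python sketch of version-chain reads (illustrative only; Hekaton's actual engine is latch-free C++ with begin/end-timestamp validation, and the class below is invented for the example):

```python
import math

class MVRecord:
    """Toy multiversion record: versions are (begin_ts, end_ts, value)."""

    def __init__(self, value, ts):
        self.versions = [(ts, math.inf, value)]

    def update(self, value, ts):
        b, e, v = self.versions[-1]
        self.versions[-1] = (b, ts, v)           # close the current version
        self.versions.append((ts, math.inf, value))

    def read(self, ts):
        # A reader at timestamp ts sees the version valid at that instant.
        for b, e, v in self.versions:
            if b <= ts < e:
                return v
        return None

r = MVRecord("a", ts=1)
r.update("b", ts=5)
print(r.read(3), r.read(7))  # a b
```

A transaction that began at timestamp 3 keeps seeing "a" even after the update at timestamp 5 commits, which is what lets reads proceed without latches or locks.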

504 citations

Journal ArticleDOI
TL;DR: This survey aims to provide a thorough review of a wide range of in-memory data management and processing proposals and systems, including both data storage systems and data processing frameworks.
Abstract: Growing main memory capacity has fueled the development of in-memory big data management and processing. By eliminating the disk I/O bottleneck, it is now possible to support interactive data analytics. However, in-memory systems are much more sensitive to other sources of overhead that do not matter in traditional I/O-bound disk-based systems. Some issues, such as fault tolerance and consistency, are also more challenging to handle in an in-memory environment. We are witnessing a revolution in the design of database systems that exploit main memory as their data storage layer. Much of this research has focused on several dimensions: modern CPU and memory hierarchy utilization, time/space efficiency, parallelism, and concurrency control. In this survey, we aim to provide a thorough review of a wide range of in-memory data management and processing proposals and systems, including both data storage systems and data processing frameworks. We also give a comprehensive presentation of important technologies in memory management and some key factors that need to be considered in order to achieve efficient in-memory data management and processing.

391 citations



Proceedings ArticleDOI
08 Apr 2013
TL;DR: In this article, an adaptive radix tree (trie) is proposed for efficient indexing in main memory databases, which is very space efficient and solves the problem of excessive worst-case space consumption, which plagues most radix trees.
Abstract: Main memory capacities have grown to the point where most databases fit into RAM. For main-memory database systems, index structure performance is a critical bottleneck. Traditional in-memory data structures like balanced binary search trees are not efficient on modern hardware, because they do not optimally utilize on-CPU caches. Hash tables, also often used for main-memory indexes, are fast but only support point queries. To overcome these shortcomings, we present ART, an adaptive radix tree (trie) for efficient indexing in main memory. Its lookup performance surpasses highly tuned, read-only search trees, while supporting very efficient insertions and deletions as well. At the same time, ART is very space efficient and solves the problem of excessive worst-case space consumption, which plagues most radix trees, by adaptively choosing compact and efficient data structures for internal nodes. Even though ART's performance is comparable to hash tables, it maintains the data in sorted order, which enables additional operations like range scan and prefix lookup.
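ART's key idea, adapting the internal node representation to its fan-out, can be sketched compactly: a node starts as a small sorted array (like ART's Node4) and is promoted to a direct-indexed structure (standing in for Node256) once it grows. The Python below is a toy illustration, not the paper's cache-optimized C++; it omits path compression, lazy expansion, and the Node16/Node48 variants:

```python
class ARTNode:
    SMALL_MAX = 4  # like ART's Node4; the real structure also has Node16/48/256

    def __init__(self):
        self.children = []      # small node: sorted (byte, child) pairs
        self.index = None       # large node: dict byte -> child
        self.value = None       # payload if a key ends at this node

    def _get(self, b):
        if self.index is not None:
            return self.index.get(b)
        for k, c in self.children:
            if k == b:
                return c
        return None

    def _put(self, b, child):
        if self.index is not None:
            self.index[b] = child
        else:
            self.children.append((b, child))
            self.children.sort(key=lambda kc: kc[0])
            if len(self.children) > self.SMALL_MAX:   # grow the node representation
                self.index, self.children = dict(self.children), []

def insert(root, key, value):
    node = root
    for b in key:               # iterate the key byte by byte
        nxt = node._get(b)
        if nxt is None:
            nxt = ARTNode()
            node._put(b, nxt)
        node = nxt
    node.value = value

def lookup(root, key):
    node = root
    for b in key:
        node = node._get(b)
        if node is None:
            return None
    return node.value

root = ARTNode()
insert(root, b"hyper", 1)
insert(root, b"hekaton", 2)
print(lookup(root, b"hyper"), lookup(root, b"hekaton"), lookup(root, b"art"))  # 1 2 None
```

Because children are kept in byte order, the same structure supports sorted iteration, which is what enables the range scans and prefix lookups mentioned in the abstract.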

372 citations


Cites background from "HyPer: A hybrid OLTP&OLAP main memo..."

  • ...Our system HyPer, for example, compiles transactions to machine code and gets rid of buffer management, locking, and latching overhead....


  • ...This has led to very intense research and commercial activities in main-memory database systems like H-Store/VoltDB [1], SAP HANA [2], and HyPer [3]....


  • ...One of its unique characteristics is that it very efficiently supports both transactional (OLTP) and analytical (OLAP) workloads at the same time [3]....


References
Book
01 Feb 1987
TL;DR: The design and implementation of concurrency control and recovery mechanisms for transaction management in centralized and distributed database systems is described; multiprogramming transactions for high performance can lead to interference between queries and updates, which concurrency control mechanisms must avoid.
Abstract: This book is an introduction to the design and implementation of concurrency control and recovery mechanisms for transaction management in centralized and distributed database systems. Concurrency control and recovery have become increasingly important as businesses rely more and more heavily on their on-line data processing activities. For high performance, the system must maximize concurrency by multiprogramming transactions. But this can lead to interference between queries and updates, which concurrency control mechanisms must avoid. In addition, a satisfactory recovery system is necessary to ensure that inevitable transaction and database system failures do not corrupt the database.

3,891 citations

Book
01 Jan 1992
TL;DR: Using transactions as a unifying conceptual framework, the authors show how to build high-performance distributed systems and high-availability applications with finite budgets and risk.
Abstract: From the Publisher: The key to client/server computing. Transaction processing techniques are deeply ingrained in the fields of databases and operating systems and are used to monitor, control and update information in modern computer systems. This book will show you how large, distributed, heterogeneous computer systems can be made to work reliably. Using transactions as a unifying conceptual framework, the authors show how to build high-performance distributed systems and high-availability applications with finite budgets and risk. The authors provide detailed explanations of why various problems occur as well as practical, usable techniques for their solution. Throughout the book, examples and techniques are drawn from the most successful commercial and research systems. Extensive use of compilable C code fragments demonstrates the many transaction processing algorithms presented in the book. The book will be valuable to anyone interested in implementing distributed systems or client/server architectures.

3,522 citations


"HyPer: A hybrid OLTP&OLAP main memo..." refers methods in this paper

  • ...We employ logical redo logging [35] by logging the parameters of the stored procedures that represent the transactions....

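Logical redo logging as described in the excerpt can be sketched in a few lines: the log records the stored procedure's name and parameters, and recovery replays the procedures deterministically against the last consistent state. A toy Python sketch (names like run_tx and the transfer procedure are invented for the example, not HyPer's interface):

```python
import json

log = []  # logical redo log: one entry per committed transaction

def run_tx(db, name, proc, *args):
    # Log the procedure invocation, not the physical page changes.
    log.append(json.dumps({"proc": name, "args": args}))
    proc(db, *args)

def transfer(db, src, dst, amount):
    db[src] -= amount
    db[dst] += amount

db = {"A": 100, "B": 0}
run_tx(db, "transfer", transfer, "A", "B", 40)

# Recovery: replay the logical log against the initial state.
recovered = {"A": 100, "B": 0}
procs = {"transfer": transfer}
for entry in log:
    e = json.loads(entry)
    procs[e["proc"]](recovered, *e["args"])

print(recovered == db)  # True
```

Logging parameters instead of data pages keeps log volume tiny, but it requires that replay be deterministic, which is why the transactions must be registered stored procedures.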

01 Aug 2001
TL;DR: The study of distributed systems which bring to life the vision of ubiquitous computing systems, also known as ambient intelligence, is concentrated on in this work.
Abstract: With digital equipment becoming increasingly networked, either on wired or wireless networks, for personal and professional use alike, distributed software systems have become a crucial element in information and communications technologies. The study of these systems forms the core of the ARLES' work, which is specifically concerned with defining new system software architectures, based on the use of emerging networking technologies. In this context, we concentrate on the study of distributed systems which bring to life the vision of ubiquitous computing systems, also known as ambient intelligence.

2,774 citations


"HyPer: A hybrid OLTP&OLAP main memo..." refers background or methods or result in this paper

  • ...The OLTP performance of VoltDB we list for comparison was not measured on our hardware but extracted from the product overview brochure [18] and discussions on their web site [37]....


  • ...As the VoltDB publications point out [18], these throughput numbers correspond to the very best published TPC-C results for high-scaled disk-based database configurations....



  • ...VoltDB [18] is the commercialization of H-Store....


Proceedings ArticleDOI
22 May 1995
TL;DR: It is shown that these phenomena and the ANSI SQL definitions fail to properly characterize several popular isolation levels, including the standard locking implementations of the levels covered, and new phenomena that better characterize isolation types are introduced.
Abstract: ANSI SQL-92 [MS, ANSI] defines Isolation Levels in terms of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantoms. This paper shows that these phenomena and the ANSI SQL definitions fail to properly characterize several popular isolation levels, including the standard locking implementations of the levels covered. Ambiguity in the statement of the phenomena is investigated and a more formal statement is arrived at; in addition new phenomena that better characterize isolation types are introduced. Finally, an important multiversion isolation type, called Snapshot Isolation, is defined.
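The "first committer wins" rule of snapshot isolation, the multiversion isolation type this paper defines, can be illustrated with a toy store: each transaction reads from the snapshot taken at begin and commits only if no concurrently committed transaction wrote an overlapping set of keys. A minimal, illustrative Python sketch (the class and field names are invented for the example):

```python
class SIStore:
    def __init__(self, data):
        self.data = dict(data)
        self.commit_log = []          # (commit_ts, set of written keys)
        self.ts = 0

    def begin(self):
        self.ts += 1
        # The transaction reads only from its private snapshot.
        return {"start": self.ts, "snapshot": dict(self.data), "writes": {}}

    def commit(self, tx):
        # First committer wins: abort on write-write overlap with any
        # transaction that committed after this one began.
        for cts, keys in self.commit_log:
            if cts > tx["start"] and keys & tx["writes"].keys():
                return False
        self.ts += 1
        self.commit_log.append((self.ts, set(tx["writes"])))
        self.data.update(tx["writes"])
        return True

s = SIStore({"x": 0})
t1, t2 = s.begin(), s.begin()
t1["writes"]["x"] = 1
t2["writes"]["x"] = 2
print(s.commit(t1), s.commit(t2))  # True False
```

The second commit is rejected because both transactions wrote "x" concurrently; readers, by contrast, never block, since they only consult their snapshot.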

1,086 citations


"HyPer: A hybrid OLTP&OLAP main memo..." refers background in this paper

  • ...A variation of the multiversion synchronization is called snapshot isolation and was first described in [32]....


Proceedings ArticleDOI
01 Jun 1984
TL;DR: This paper considers the changes necessary to permit a relational database system to take advantage of large amounts of main memory; it evaluates AVL vs. B+-tree access methods and hash-based vs. sort-merge query processing strategies, and studies recovery issues when most or all of the database fits in main memory.
Abstract: With the availability of very large, relatively inexpensive main memories, it is becoming possible to keep large databases resident in main memory. In this paper we consider the changes necessary to permit a relational database system to take advantage of large amounts of main memory. We evaluate AVL vs. B+-tree access methods for main memory databases and hash-based query processing strategies vs. sort-merge, and study recovery issues when most or all of the database fits in main memory. As expected, B+-trees are the preferred storage mechanism unless more than 80-90% of the database fits in main memory. A somewhat surprising result is that hash-based query processing strategies are advantageous for large memory situations.

922 citations


"HyPer: A hybrid OLTP&OLAP main memo..." refers background in this paper

  • ...[36] and extended in the recent paper about the so-called Aether system [25] are possible: Group commit or asynchronous commit....
