Proceedings ArticleDOI

HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots

11 Apr 2011-pp 195-206
TL;DR: This work presents an efficient hybrid system, called HyPer, that can handle both OLTP and OLAP simultaneously by using hardware-assisted replication mechanisms to maintain consistent snapshots of the transactional data.
Abstract: The two areas of online transaction processing (OLTP) and online analytical processing (OLAP) present different challenges for database architectures. Currently, customers with high rates of mission-critical transactions have split their data into two separate systems, one database for OLTP and one so-called data warehouse for OLAP. While allowing for decent transaction rates, this separation has many disadvantages, including data freshness issues, caused by only periodically initiating the Extract-Transform-Load data staging, and excessive resource consumption, caused by maintaining two separate information systems. We present an efficient hybrid system, called HyPer, that can handle both OLTP and OLAP simultaneously by using hardware-assisted replication mechanisms to maintain consistent snapshots of the transactional data. HyPer is a main-memory database system that guarantees the ACID properties of OLTP transactions and executes OLAP query sessions (multiple queries) on the same, arbitrarily current and consistent snapshot. The utilization of the processor-inherent support for virtual memory management (address translation, caching, copy on update) yields both at the same time: unprecedented transaction rates of up to 100,000 per second and very fast OLAP query response times on a single system executing both workloads in parallel. The performance analysis is based on a combined TPC-C and TPC-H benchmark.
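The "hardware-assisted replication" the abstract refers to is realized in the paper by fork()-ing the OLTP process: the operating system's copy-on-write page handling gives the OLAP child a consistent snapshot essentially for free. A minimal, POSIX-only Python sketch of the idea, with a toy dict standing in for the transactional data (illustrative, not HyPer's actual C++ implementation):

```python
import os

# Toy "database" living in the parent (OLTP) process's address space.
db = {"balance": 100}

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # Child = OLAP snapshot: it sees the state as of fork(), even though
    # the parent keeps updating its own pages (OS copy-on-write).
    os.close(w)
    os.read(r, 1)                 # wait until the parent has applied its update
    os._exit(0 if db["balance"] == 100 else 1)
else:
    os.close(r)
    db["balance"] = 999           # OLTP update in the parent, invisible to the child
    os.write(w, b"x")             # signal the child to read its snapshot
    _, status = os.waitpid(pid, 0)
    print("child saw consistent snapshot:", os.WEXITSTATUS(status) == 0)
```

The exit code of the child verifies that its snapshot still shows the pre-update value, which is exactly the property HyPer's OLAP sessions rely on.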
Citations

Journal ArticleDOI
01 Jun 2011
TL;DR: This work presents a novel compilation strategy that translates a query into compact and efficient machine code using the LLVM compiler framework and integrates these techniques into the HyPer main memory database system and shows that this results in excellent query performance while requiring only modest compilation time.
Abstract: As main memory grows, query performance is more and more determined by the raw CPU costs of query processing itself. The classical iterator style query processing technique is very simple and flexible, but shows poor performance on modern CPUs due to lack of locality and frequent instruction mispredictions. Several techniques like batch oriented processing or vectorized tuple processing have been proposed in the past to improve this situation, but even these techniques are frequently out-performed by hand-written execution plans. In this work we present a novel compilation strategy that translates a query into compact and efficient machine code using the LLVM compiler framework. By aiming at good code and data locality and predictable branch layout the resulting code frequently rivals the performance of hand-written C++ code. We integrated these techniques into the HyPer main memory database system and show that this results in excellent query performance while requiring only modest compilation time.
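The data-centric compilation idea can be illustrated without LLVM: instead of interpreting an operator tree tuple-by-tuple, generate one tight, query-specific loop and compile it once. A toy Python sketch, using exec as a stand-in for LLVM code generation (the name compile_query and the schema are invented for the example):

```python
# Generate a fused scan+filter+aggregate loop specialized for one query,
# rather than dispatching through generic iterator operators per tuple.
def compile_query(predicate_src):
    src = (
        "def q(rows):\n"
        "    total = 0\n"
        "    for price, qty in rows:\n"
        f"        if {predicate_src}:\n"          # predicate inlined into the loop
        "            total += price * qty\n"
        "    return total\n"
    )
    ns = {}
    exec(compile(src, "<query>", "exec"), ns)      # "compile" the query plan
    return ns["q"]

q = compile_query("qty > 5")
print(q([(10, 3), (20, 8), (5, 10)]))  # 20*8 + 5*10 = 210
```

The payoff in the real system comes from the same shape at the machine-code level: one predictable loop with good locality instead of virtual calls per tuple.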

518 citations


Cites methods from "HyPer: A hybrid OLTP&OLAP main memo..."

  • ...As described in [5], they are derived from TPC-H queries but adapted to the combined TPC-C and TPC-H schema....


  • ...We therefore used the TPC-CH benchmark from [5] for experiments....


  • ...We have implemented the techniques proposed in this paper both in the HyPer main-memory database management systems [5], and in a disk-based DBMS....


  • ...We demonstrate the impact of these techniques by integrating them into the HyPer main-memory database management system [5] and performing various comparisons with other systems....


Proceedings ArticleDOI
22 Jun 2013
TL;DR: An overview of the design of the Hekaton engine is given and some experimental results are reported; the engine is designed for high concurrency and uses only latch-free data structures and a new optimistic, multiversion concurrency control technique.
Abstract: Hekaton is a new database engine optimized for memory resident data and OLTP workloads. Hekaton is fully integrated into SQL Server; it is not a separate system. To take advantage of Hekaton, a user simply declares a table memory optimized. Hekaton tables are fully transactional and durable and accessed using T-SQL in the same way as regular SQL Server tables. A query can reference both Hekaton tables and regular tables and a transaction can update data in both types of tables. T-SQL stored procedures that reference only Hekaton tables can be compiled into machine code for further performance improvements. The engine is designed for high concurrency. To achieve this it uses only latch-free data structures and a new optimistic, multiversion concurrency control technique. This paper gives an overview of the design of the Hekaton engine and reports some experimental results.
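Multiversion concurrency control of the kind Hekaton uses keeps several timestamped versions of each record so that readers never block writers. A highly simplified Python sketch of version-chain reads (illustrative only; Hekaton's actual engine is latch-free C++ with begin/end-timestamp validation, and the class below is invented for the example):

```python
import math

class MVRecord:
    """Toy multiversion record: versions are (begin_ts, end_ts, value)."""

    def __init__(self, value, ts):
        self.versions = [(ts, math.inf, value)]

    def update(self, value, ts):
        b, e, v = self.versions[-1]
        self.versions[-1] = (b, ts, v)           # close the current version
        self.versions.append((ts, math.inf, value))

    def read(self, ts):
        # A reader at timestamp ts sees the version valid at that instant.
        for b, e, v in self.versions:
            if b <= ts < e:
                return v
        return None

r = MVRecord("a", ts=1)
r.update("b", ts=5)
print(r.read(3), r.read(7))  # a b
```

A transaction that began at timestamp 3 keeps seeing "a" even after the update at timestamp 5 commits, which is what lets reads proceed without latches or locks.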

504 citations

Journal ArticleDOI
TL;DR: This survey aims to provide a thorough review of a wide range of in-memory data management and processing proposals and systems, including both data storage systems and data processing frameworks.
Abstract: Growing main memory capacity has fueled the development of in-memory big data management and processing. By eliminating the disk I/O bottleneck, it is now possible to support interactive data analytics. However, in-memory systems are much more sensitive to other sources of overhead that do not matter in traditional I/O-bound disk-based systems. Some issues, such as fault tolerance and consistency, are also more challenging to handle in an in-memory environment. We are witnessing a revolution in the design of database systems that exploit main memory as their data storage layer. Much of this research has focused on several dimensions: modern CPU and memory hierarchy utilization, time/space efficiency, parallelism, and concurrency control. In this survey, we aim to provide a thorough review of a wide range of in-memory data management and processing proposals and systems, including both data storage systems and data processing frameworks. We also give a comprehensive presentation of important technologies in memory management and some key factors that need to be considered in order to achieve efficient in-memory data management and processing.

391 citations



Proceedings ArticleDOI
08 Apr 2013
TL;DR: In this article, an adaptive radix tree (trie) is proposed for efficient indexing in main memory databases, which is very space efficient and solves the problem of excessive worst-case space consumption, which plagues most radix trees.
Abstract: Main memory capacities have grown to the point where most databases fit into RAM. For main-memory database systems, index structure performance is a critical bottleneck. Traditional in-memory data structures like balanced binary search trees are not efficient on modern hardware, because they do not optimally utilize on-CPU caches. Hash tables, also often used for main-memory indexes, are fast but only support point queries. To overcome these shortcomings, we present ART, an adaptive radix tree (trie) for efficient indexing in main memory. Its lookup performance surpasses highly tuned, read-only search trees, while supporting very efficient insertions and deletions as well. At the same time, ART is very space efficient and solves the problem of excessive worst-case space consumption, which plagues most radix trees, by adaptively choosing compact and efficient data structures for internal nodes. Even though ART's performance is comparable to hash tables, it maintains the data in sorted order, which enables additional operations like range scan and prefix lookup.
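ART's key idea, adapting the internal node representation to its fan-out, can be sketched compactly: a node starts as a small sorted array (like ART's Node4) and is promoted to a direct-indexed structure (standing in for Node256) once it grows. The Python below is a toy illustration, not the paper's cache-optimized C++; it omits path compression, lazy expansion, and the Node16/Node48 variants:

```python
class ARTNode:
    SMALL_MAX = 4  # like ART's Node4; the real structure also has Node16/48/256

    def __init__(self):
        self.children = []      # small node: sorted (byte, child) pairs
        self.index = None       # large node: dict byte -> child
        self.value = None       # payload if a key ends at this node

    def _get(self, b):
        if self.index is not None:
            return self.index.get(b)
        for k, c in self.children:
            if k == b:
                return c
        return None

    def _put(self, b, child):
        if self.index is not None:
            self.index[b] = child
        else:
            self.children.append((b, child))
            self.children.sort(key=lambda kc: kc[0])
            if len(self.children) > self.SMALL_MAX:   # grow the node representation
                self.index, self.children = dict(self.children), []

def insert(root, key, value):
    node = root
    for b in key:               # iterate the key byte by byte
        nxt = node._get(b)
        if nxt is None:
            nxt = ARTNode()
            node._put(b, nxt)
        node = nxt
    node.value = value

def lookup(root, key):
    node = root
    for b in key:
        node = node._get(b)
        if node is None:
            return None
    return node.value

root = ARTNode()
insert(root, b"hyper", 1)
insert(root, b"hekaton", 2)
print(lookup(root, b"hyper"), lookup(root, b"hekaton"), lookup(root, b"art"))  # 1 2 None
```

Because children are kept in byte order, the same structure supports sorted iteration, which is what enables the range scans and prefix lookups mentioned in the abstract.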

372 citations


Cites background from "HyPer: A hybrid OLTP&OLAP main memo..."

  • ...Our system HyPer, for example, compiles transactions to machine code and gets rid of buffer management, locking, and latching overhead....


  • ...This has led to very intense research and commercial activities in main-memory database systems like H-Store/VoltDB [1], SAP HANA [2], and HyPer [3]....


  • ...One of its unique characteristics is that it very efficiently supports both transactional (OLTP) and analytical (OLAP) workloads at the same time [3]....


References
Book
01 Feb 1987
TL;DR: The design and implementation of concurrency control and recovery mechanisms for transaction management in centralized and distributed database systems is described; multiprogramming transactions for high performance can lead to interference between queries and updates, which concurrency control mechanisms must avoid.
Abstract: This book is an introduction to the design and implementation of concurrency control and recovery mechanisms for transaction management in centralized and distributed database systems. Concurrency control and recovery have become increasingly important as businesses rely more and more heavily on their on-line data processing activities. For high performance, the system must maximize concurrency by multiprogramming transactions. But this can lead to interference between queries and updates, which concurrency control mechanisms must avoid. In addition, a satisfactory recovery system is necessary to ensure that inevitable transaction and database system failures do not corrupt the database.

3,891 citations

Book
01 Jan 1992
TL;DR: Using transactions as a unifying conceptual framework, the authors show how to build high-performance distributed systems and high-availability applications with finite budgets and risk.
Abstract: From the Publisher: The key to client/server computing. Transaction processing techniques are deeply ingrained in the fields of databases and operating systems and are used to monitor, control and update information in modern computer systems. This book will show you how large, distributed, heterogeneous computer systems can be made to work reliably. Using transactions as a unifying conceptual framework, the authors show how to build high-performance distributed systems and high-availability applications with finite budgets and risk. The authors provide detailed explanations of why various problems occur as well as practical, usable techniques for their solution. Throughout the book, examples and techniques are drawn from the most successful commercial and research systems. Extensive use of compilable C code fragments demonstrates the many transaction processing algorithms presented in the book. The book will be valuable to anyone interested in implementing distributed systems or client/server architectures.

3,522 citations


"HyPer: A hybrid OLTP&OLAP main memo..." refers methods in this paper

  • ...We employ logical redo logging [35] by logging the parameters of the stored procedures that represent the transactions....

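Logical redo logging as described in the excerpt can be sketched in a few lines: the log records the stored procedure's name and parameters, and recovery replays the procedures deterministically against the last consistent state. A toy Python sketch (names like run_tx and the transfer procedure are invented for the example, not HyPer's interface):

```python
import json

log = []  # logical redo log: one entry per committed transaction

def run_tx(db, name, proc, *args):
    # Log the procedure invocation, not the physical page changes.
    log.append(json.dumps({"proc": name, "args": args}))
    proc(db, *args)

def transfer(db, src, dst, amount):
    db[src] -= amount
    db[dst] += amount

db = {"A": 100, "B": 0}
run_tx(db, "transfer", transfer, "A", "B", 40)

# Recovery: replay the logical log against the initial state.
recovered = {"A": 100, "B": 0}
procs = {"transfer": transfer}
for entry in log:
    e = json.loads(entry)
    procs[e["proc"]](recovered, *e["args"])

print(recovered == db)  # True
```

Logging parameters instead of data pages keeps log volume tiny, but it requires that replay be deterministic, which is why the transactions must be registered stored procedures.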

01 Aug 2001
TL;DR: The study of distributed systems which bring to life the vision of ubiquitous computing systems, also known as ambient intelligence, is concentrated on in this work.
Abstract: With digital equipment becoming increasingly networked, either on wired or wireless networks, for personal and professional use alike, distributed software systems have become a crucial element in information and communications technologies. The study of these systems forms the core of the ARLES' work, which is specifically concerned with defining new system software architectures, based on the use of emerging networking technologies. In this context, we concentrate on the study of distributed systems which bring to life the vision of ubiquitous computing systems, also known as ambient intelligence.

2,774 citations


"HyPer: A hybrid OLTP&OLAP main memo..." refers background or methods or result in this paper

  • ...The OLTP performance of VoltDB we list for comparison was not measured on our hardware but extracted from the product overview brochure [18] and discussions on their web site [37]....


  • ...As the VoltDB publications point out [18], these throughput numbers correspond to the very best published TPC-C results for high-scaled disk-based database configurations....



  • ...VoltDB [18] is the commercialization of H-Store....


Proceedings ArticleDOI
22 May 1995
TL;DR: It is shown that these phenomena and the ANSI SQL definitions fail to properly characterize several popular isolation levels, including the standard locking implementations of the levels covered, and new phenomena that better characterize isolation types are introduced.
Abstract: ANSI SQL-92 [MS, ANSI] defines Isolation Levels in terms of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantoms. This paper shows that these phenomena and the ANSI SQL definitions fail to properly characterize several popular isolation levels, including the standard locking implementations of the levels covered. Ambiguity in the statement of the phenomena is investigated and a more formal statement is arrived at; in addition new phenomena that better characterize isolation types are introduced. Finally, an important multiversion isolation type, called Snapshot Isolation, is defined.
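The "first committer wins" rule of snapshot isolation, the multiversion isolation type this paper defines, can be illustrated with a toy store: each transaction reads from the snapshot taken at begin and commits only if no concurrently committed transaction wrote an overlapping set of keys. A minimal, illustrative Python sketch (the class and field names are invented for the example):

```python
class SIStore:
    def __init__(self, data):
        self.data = dict(data)
        self.commit_log = []          # (commit_ts, set of written keys)
        self.ts = 0

    def begin(self):
        self.ts += 1
        # The transaction reads only from its private snapshot.
        return {"start": self.ts, "snapshot": dict(self.data), "writes": {}}

    def commit(self, tx):
        # First committer wins: abort on write-write overlap with any
        # transaction that committed after this one began.
        for cts, keys in self.commit_log:
            if cts > tx["start"] and keys & tx["writes"].keys():
                return False
        self.ts += 1
        self.commit_log.append((self.ts, set(tx["writes"])))
        self.data.update(tx["writes"])
        return True

s = SIStore({"x": 0})
t1, t2 = s.begin(), s.begin()
t1["writes"]["x"] = 1
t2["writes"]["x"] = 2
print(s.commit(t1), s.commit(t2))  # True False
```

The second commit is rejected because both transactions wrote "x" concurrently; readers, by contrast, never block, since they only consult their snapshot.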

1,086 citations


"HyPer: A hybrid OLTP&OLAP main memo..." refers background in this paper

  • ...A variation of the multiversion synchronization is called snapshot isolation and was first described in [32]....


Proceedings ArticleDOI
01 Jun 1984
TL;DR: This paper considers the changes necessary to permit a relational database system to take advantage of large amounts of main memory; it evaluates AVL vs. B+-tree access methods and hash-based vs. sort-merge query processing strategies, and studies recovery issues when most or all of the database fits in main memory.
Abstract: With the availability of very large, relatively inexpensive main memories, it is becoming possible to keep large databases resident in main memory. In this paper we consider the changes necessary to permit a relational database system to take advantage of large amounts of main memory. We evaluate AVL vs. B+-tree access methods for main memory databases and hash-based query processing strategies vs. sort-merge, and study recovery issues when most or all of the database fits in main memory. As expected, B+-trees are the preferred storage mechanism unless more than 80-90% of the database fits in main memory. A somewhat surprising result is that hash-based query processing strategies are advantageous for large memory situations.

922 citations


"HyPer: A hybrid OLTP&OLAP main memo..." refers background in this paper

  • ...[36] and extended in the recent paper about the so-called Aether system [25] are possible: Group commit or asynchronous commit....
