Home
/
Authors
/
Elizabeth O'Neil

Author

Elizabeth O'Neil

Other affiliations: University of Massachusetts Amherst

Bio: Elizabeth O'Neil is an academic researcher from University of Massachusetts Boston. The author has contributed to research in topics: Snapshot isolation & Isolation (database systems). The author has an hindex of 18, co-authored 26 publications receiving 6414 citations. Previous affiliations of Elizabeth O'Neil include University of Massachusetts Amherst.

Papers

PDF

Open Access

More filters

Journal Article•DOI•

The log-structured merge-tree (LSM-tree)

[...]

Patrick O'Neil¹, Edward C. Cheng, Dieter Gawlick², Elizabeth O'Neil¹•Institutions (2)

University of Massachusetts Boston¹, Oracle Corporation²

01 Jun 1996-Acta Informatica

TL;DR: The log-structured mergetree (LSM-tree) is a disk-based data structure designed to provide low-cost indexing for a file experiencing a high rate of record inserts (and deletes) over an extended period.

...read moreread less

Abstract: High-performance transaction system applications typically insert rows in a History table to provide an activity trace; at the same time the transaction system generates log records for purposes of system recovery. Both types of generated information can benefit from efficient indexing. An example in a well-known setting is the TPC-A benchmark application, modified to support efficient queries on the history for account activity for specific accounts. This requires an index by account-id on the fast-growing History table. Unfortunately, standard disk-based index structures such as the B-tree will effectively double the I/O cost of the transaction to maintain an index such as this in real time, increasing the total system cost up to fifty percent. Clearly a method for maintaining a real-time index at low cost is desirable. The log-structured mergetree (LSM-tree) is a disk-based data structure designed to provide low-cost indexing for a file experiencing a high rate of record inserts (and deletes) over an extended period. The LSM-tree uses an algorithm that defers and batches index changes, cascading the changes from a memory-based component through one or more disk components in an efficient manner reminiscent of merge sort. During this process all index values are continuously accessible to retrievals (aside from very short locking periods), either through the memory component or one of the disk components. The algorithm has greatly reduced disk arm movements compared to a traditional access methods such as B-trees, and will improve cost-performance in domains where disk arm costs for inserts with traditional access methods overwhelm storage media costs. The LSM-tree approach also generalizes to operations other than insert and delete. However, indexed finds requiring immediate response will lose I/O efficiency in some cases, so the LSM-tree is most useful in applications where index inserts are more common than finds that retrieve the entries. This seems to be a common property for history tables and log files, for example. The conclusions of Sect. 6 compare the hybrid use of memory and disk components in the LSM-tree access method with the commonly understood advantage of the hybrid method to buffer disk pages in memory.

...read moreread less

1,097 citations

Proceedings Article•DOI•

A critique of ANSI SQL isolation levels

[...]

Hal Berenson¹, Phil Bernstein¹, Jim Gray², Jim Melton³, Elizabeth O'Neil, Patrick O'Neil - Show less +2 more•Institutions (3)

Microsoft¹, University of California, Berkeley², Sybase³

22 May 1995

TL;DR: It is shown that these phenomena and the ANSI SQL definitions fail to properly characterize several popular isolation levels, including the standard locking implementations of the levels covered, and new phenomena that better characterize isolation types are introduced.

...read moreread less

Abstract: ANSI SQL-92 [MS, ANSI] defines Isolation Levels in terms of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantoms. This paper shows that these phenomena and the ANSI SQL definitions fail to properly characterize several popular isolation levels, including the standard locking implementations of the levels covered. Ambiguity in the statement of the phenomena is investigated and a more formal statement is arrived at; in addition new phenomena that better characterize isolation types are introduced. Finally, an important multiversion isolation type, called Snapshot Isolation, is defined.

...read moreread less

1,086 citations

Book Chapter•DOI•

C-store: a column-oriented DBMS

[...]

Michael Stonebraker¹, Daniel J. Abadi¹, Adam Batkin², Xuedong Chen³, Mitch Cherniack², Miguel Ferreira¹, Edmond Lau¹, Amerson Lin¹, Samuel Madden¹, Elizabeth O'Neil³, Patrick O'Neil³, Alexander Rasin⁴, Nga Tran², Stan Zdonik⁴ - Show less +10 more•Institutions (4)

Massachusetts Institute of Technology¹, Brandeis University², University of Massachusetts Boston³, Brown University⁴

01 Dec 2018

TL;DR: Preliminary performance data on a subset of TPC-H is presented and it is shown that the system the team is building, C-Store, is substantially faster than popular commercial products.

...read moreread less

Abstract: This paper presents the design of a read-optimized relational DBMS that contrasts sharply with most current systems, which are write-optimized. Among the many differences in its design are: storage of data by column rather than by row, careful coding and packing of objects into storage including main memory during query processing, storing an overlapping collection of column-oriented projections, rather than the current fare of tables and indexes, a non-traditional implementation of transactions which includes high availability and snapshot isolation for read-only transactions, and the extensive use of bitmap indexes to complement B-tree structures.We present preliminary performance data on a subset of TPC-H and show that the system we are building, C-Store, is substantially faster than popular commercial products. Hence, the architecture looks very encouraging.

...read moreread less

1,063 citations

Proceedings Article•DOI•

The LRU-K page replacement algorithm for database disk buffering

[...]

Elizabeth O'Neil¹, Patrick O'Neil², Gerhard Weikum²•Institutions (2)

University of Massachusetts Amherst¹, ETH Zurich²

01 Jun 1993

TL;DR: The LRU-K algorithm surpasses conventional buffering algorithms in discriminating between frequently and infrequently referenced pages, and adapts in real time to changing patterns of access.

...read moreread less

Abstract: This paper introduces a new approach to database disk buffering, called the LRU-K method The basic idea of LRU-K is to keep track of the times of the last K references to popular database pages, using this information to statistically estimate the interarrival times of references on a page by page basis Although the LRU-K approach performs optimal statistical inference under relatively standard assumptions, it is fairly simple and incurs little bookkeeping overhead As we demonstrate with simulation experiments, the LRU-K algorithm surpasses conventional buffering algorithms in discriminating between frequently and infrequently referenced pages In fact, LRU-K can approach the behavior of buffering algorithms in which page sets with known access frequencies are manually assigned to different buffer pools of specifically tuned sizes Unlike such customized buffering algorithms however, the LRU-K method is self-tuning, and does not rely on external hints about workload characteristics Furthermore, the LRU-K algorithm adapts in real time to changing patterns of access

...read moreread less

1,033 citations

Proceedings Article•

C-store: a column-oriented DBMS

[...]

Massachusetts Institute of Technology¹, Brandeis University², University of Massachusetts Boston³, Brown University⁴

30 Aug 2005

TL;DR: C-Store as mentioned in this paper is a read-optimized relational DBMS that contrasts sharply with most current systems, which are write-optimised, and it uses bitmap indexes to complement B-tree structures.

...read moreread less

970 citations

1
2
3
4
…
5
6

Collapse

Cited by

PDF

Open Access

More filters

Proceedings Article•

Bigtable: A Distributed Storage System for Structured Data (Awarded Best Paper!).

[...]

Fay W. Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Deepak Chandra, Andrew Fikes, Robert Gruber - Show less +5 more

01 Jan 2006

TL;DR: Bigtable as mentioned in this paper is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers, including web indexing, Google Earth and Google Finance.

...read moreread less

Abstract: Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this article, we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.

...read moreread less

4,843 citations

Proceedings Article•DOI•

Benchmarking cloud serving systems with YCSB

[...]

Brian F. Cooper¹, Adam Silberstein¹, Erwin Tam¹, Raghu Ramakrishnan¹, Russell Sears¹ - Show less +1 more•Institutions (1)

Yahoo!¹

10 Jun 2010

TL;DR: This work presents the "Yahoo! Cloud Serving Benchmark" (YCSB) framework, with the goal of facilitating performance comparisons of the new generation of cloud data serving systems, and defines a core set of benchmarks and reports results for four widely used systems.

...read moreread less

Abstract: While the use of MapReduce systems (such as Hadoop) for large scale data analysis has been widely recognized and studied, we have recently seen an explosion in the number of systems developed for cloud data serving. These newer systems address "cloud OLTP" applications, though they typically do not support ACID transactions. Examples of systems proposed for cloud serving use include BigTable, PNUTS, Cassandra, HBase, Azure, CouchDB, SimpleDB, Voldemort, and many others. Further, they are being applied to a diverse range of applications that differ considerably from traditional (e.g., TPC-C like) serving workloads. The number of emerging cloud serving systems and the wide range of proposed applications, coupled with a lack of apples-to-apples performance comparisons, makes it difficult to understand the tradeoffs between systems and the workloads for which they are suited. We present the "Yahoo! Cloud Serving Benchmark" (YCSB) framework, with the goal of facilitating performance comparisons of the new generation of cloud data serving systems. We define a core set of benchmarks and report results for four widely used systems: Cassandra, HBase, Yahoo!'s PNUTS, and a simple sharded MySQL implementation. We also hope to foster the development of additional cloud benchmark suites that represent other classes of applications by making our benchmark tool available via open source. In this regard, a key feature of the YCSB framework/tool is that it is extensible--it supports easy definition of new workloads, in addition to making it easy to benchmark new systems.

...read moreread less

3,276 citations

Journal Article•DOI•

Bigtable: A Distributed Storage System for Structured Data

[...]

Fay W. Chang¹, Jeffrey Dean¹, Sanjay Ghemawat¹, Wilson C. Hsieh¹, Deborah A. Wallach¹, Michael Burrows¹, Tushar Deepak Chandra¹, Andrew Fikes¹, Robert E. Gruber¹ - Show less +5 more•Institutions (1)

Google¹

01 Jun 2008-ACM Transactions on Computer Systems

TL;DR: The simple data model provided by Bigtable is described, which gives clients dynamic control over data layout and format, and the design and implementation of Bigtable are described.

...read moreread less

3,259 citations

Proceedings Article•

In search of an understandable consensus algorithm

[...]

Diego Ongaro¹, John Ousterhout¹•Institutions (1)

Stanford University¹

19 Jun 2014

TL;DR: Raft is a consensus algorithm for managing a replicated log that separates the key elements of consensus, such as leader election, log replication, and safety, and it enforces a stronger degree of coherency to reduce the number of states that must be considered.

...read moreread less

Abstract: Raft is a consensus algorithm for managing a replicated log. It produces a result equivalent to (multi-)Paxos, and it is as efficient as Paxos, but its structure is different from Paxos; this makes Raft more understandable than Paxos and also provides a better foundation for building practical systems. In order to enhance understandability, Raft separates the key elements of consensus, such as leader election, log replication, and safety, and it enforces a stronger degree of coherency to reduce the number of states that must be considered. Results from a user study demonstrate that Raft is easier for students to learn than Paxos. Raft also includes a new mechanism for changing the cluster membership, which uses overlapping majorities to guarantee safety.

...read moreread less

1,811 citations

Proceedings Article•DOI•

Bigtable: a distributed storage system for structured data

[...]

Google¹

06 Nov 2006

TL;DR: Bigtable as discussed by the authors is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers, including web indexing, Google Earth and Google Finance.

...read moreread less

Abstract: Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.

...read moreread less

1,523 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse