Author

Aaron J. Elmore

Bio: Aaron J. Elmore is an academic researcher from the University of Chicago. His research focuses on data management and relational databases. He has an h-index of 22 and has co-authored 79 publications receiving 1,974 citations. His previous affiliations include the University of California, Santa Barbara and the University of Illinois at Chicago.

Papers published on a yearly basis

Papers
Proceedings ArticleDOI
12 Jun 2011
TL;DR: Zephyr is proposed, a technique to efficiently migrate a live database in a shared-nothing transactional database architecture. It uses phases of on-demand pull and asynchronous push of data, requires minimal synchronization, provides ACID guarantees during migration, and ensures correctness in the presence of failures.
Abstract: Multitenant data infrastructures for large cloud platforms hosting hundreds of thousands of applications face the challenge of serving applications characterized by small data footprints and unpredictable load patterns. When such a platform is built on an elastic pay-per-use infrastructure, an added challenge is to minimize the system's operating cost while guaranteeing the tenants' service level agreements (SLAs). Elastic load balancing is therefore an important feature to enable scale-up during high load while scaling down when the load is low. Live migration, a technique to migrate tenants with minimal service interruption and no downtime, is critical to allow lightweight elastic scaling. We focus on the problem of live migration in the database layer. We propose Zephyr, a technique to efficiently migrate a live database in a shared-nothing transactional database architecture. Zephyr uses phases of on-demand pull and asynchronous push of data, requires minimal synchronization, results in no service unavailability and few or no aborted transactions, minimizes the data transfer overhead, provides ACID guarantees during migration, and ensures correctness in the presence of failures. We outline a prototype implementation using an open-source relational database engine and present a thorough evaluation using various transactional workloads. Zephyr's efficiency is evident from the few tens of failed operations, the 10-20% change in average transaction latency, minimal messaging, and no overhead during normal operation when migrating a live database.
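
The core of the migration protocol can be pictured as a hybrid of two transfer modes. Below is a minimal, illustrative Python sketch of that dual-mode idea, assuming database pages as the unit of ownership; the class and method names are invented for illustration, and the real protocol additionally handles indexes, concurrency control, and failure recovery.

# Minimal sketch of Zephyr-style live migration (illustrative only).
# During migration the destination pulls pages on first access while
# the source pushes the remaining pages asynchronously.

class MigratingTenant:
    def __init__(self, pages):
        self.source = dict(pages)   # page_id -> data, still owned by source
        self.dest = {}              # pages whose ownership has moved

    def read_at_destination(self, page_id):
        # On-demand pull: the first access at the destination fetches
        # the page from the source and transfers ownership.
        if page_id not in self.dest:
            self.dest[page_id] = self.source.pop(page_id)
        return self.dest[page_id]

    def background_push(self, batch=2):
        # Asynchronous push: the source streams leftover pages without
        # blocking transactions running at the destination.
        for page_id in list(self.source)[:batch]:
            self.dest[page_id] = self.source.pop(page_id)

    def done(self):
        return not self.source

tenant = MigratingTenant({1: "a", 2: "b", 3: "c"})
print(tenant.read_at_destination(2))  # pulled on demand
while not tenant.done():
    tenant.background_push()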

264 citations

Journal ArticleDOI
12 Aug 2015
TL;DR: A new view of federated databases is presented to address the growing need for managing information that spans multiple data models, and a polystore architecture designed to unify querying over those models is proposed.
Abstract: This paper presents a new view of federated databases to address the growing need for managing information that spans multiple data models. This trend is fueled by the proliferation of storage engines and query languages, based on the observation that 'no one size fits all'. To address this shift, we propose a polystore architecture designed to unify querying over multiple data models. We consider the challenges and opportunities associated with polystores. Open questions in this space revolve around query optimization and the assignment of objects to storage engines. We introduce our approach to these topics and discuss our prototype in the context of the Intel Science and Technology Center for Big Data.
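
A hedged sketch of the routing idea behind a polystore: each subquery is dispatched to the engine whose data model it targets, and the results are combined. The engine names, the plan format, and the stand-in query strings below are assumptions for illustration, not an API from the paper.

# Illustrative polystore-style routing: subqueries go to the engine
# matching their data model; results are gathered for later combination.

ENGINES = {
    "relational": lambda q: [("patient", 42)],       # stand-in for a SQL engine
    "key_value":  lambda q: {"patient:42": "Alice"}, # stand-in for a KV store
}

def run_polystore(plan):
    # plan: list of (data_model, subquery) pairs
    return [ENGINES[model](subquery) for model, subquery in plan]

results = run_polystore([
    ("relational", "SELECT id FROM patients"),
    ("key_value",  "GET patient:42"),
])
print(results)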

244 citations

Journal ArticleDOI
01 Nov 2014
TL;DR: E-Store is presented, an elastic partitioning framework for distributed OLTP DBMSs that automatically scales resources in response to demand spikes, periodic events, and gradual changes in an application's workload.
Abstract: On-line transaction processing (OLTP) database management systems (DBMSs) often serve time-varying workloads due to daily, weekly, or seasonal fluctuations in demand, or because of rapid growth in demand due to a company's business success. In addition, many OLTP workloads are heavily skewed to "hot" tuples or ranges of tuples. For example, the majority of NYSE volume involves only 40 stocks. To deal with such fluctuations, an OLTP DBMS needs to be elastic; that is, it must be able to expand and contract resources in response to load fluctuations and dynamically balance load as hot tuples vary over time. This paper presents E-Store, an elastic partitioning framework for distributed OLTP DBMSs. It automatically scales resources in response to demand spikes, periodic events, and gradual changes in an application's workload. E-Store addresses localized bottlenecks through a two-tier data placement strategy: cold data is distributed in large chunks, while smaller ranges of hot tuples are assigned explicitly to individual nodes. This is in contrast to traditional single-tier hash and range partitioning strategies. Our experimental evaluation of E-Store shows the viability of our approach and its efficacy under variations in load across a cluster of machines. Compared to single-tier approaches, E-Store improves throughput by up to 130% while reducing latency by 80%.
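
The two-tier placement strategy can be illustrated in a few lines of Python. This is a simplified sketch under assumed parameters (the chunk size, node count, and hot-tuple map below are arbitrary); E-Store itself derives these assignments from live workload monitoring.

# Sketch of two-tier placement: hot tuples get explicit per-tuple
# assignments, cold data is placed in large coarse-grained chunks.

NODES = 4
CHUNK = 1000                     # cold tuples placed in blocks of 1000
hot_map = {7: 2, 123456: 0}      # hot tuple_id -> node, assigned explicitly

def locate(tuple_id):
    # Tier 1: explicit mapping for hot tuples.
    if tuple_id in hot_map:
        return hot_map[tuple_id]
    # Tier 2: coarse chunk placement for cold data.
    return (tuple_id // CHUNK) % NODES

print(locate(7), locate(7001))   # hot tuple vs. cold chunk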

172 citations

Posted Content
TL;DR: A dataset version control system and a platform built on it, DataHub, are proposed, giving users the ability to create, branch, merge, difference, and search large, divergent collections of datasets.
Abstract: Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference and search large, divergent collections of datasets, and (b) a platform, DataHub, that gives users the ability to perform collaborative data analysis building on this version control system. We outline the challenges in providing dataset version control at scale.
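
One way to picture the version-control core is as a directed acyclic graph of dataset versions, where branching adds a node with one parent and merging adds a node with two. The sketch below is illustrative only; the VersionGraph class is invented, and storage of the actual dataset contents is elided.

# Toy dataset version graph in the spirit of git-like dataset versioning.

class VersionGraph:
    def __init__(self):
        self.parents = {"v0": []}     # version -> parent versions

    def commit(self, new, parent):
        self.parents[new] = [parent]

    def merge(self, new, left, right):
        # A merge node records both lineages, enabling later diffs.
        self.parents[new] = [left, right]

    def history(self, version):
        # Walk all ancestors, e.g. to difference two branches.
        seen, stack = set(), [version]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(self.parents[v])
        return seen

g = VersionGraph()
g.commit("v1", "v0"); g.commit("v2a", "v1"); g.commit("v2b", "v1")
g.merge("v3", "v2a", "v2b")
print(sorted(g.history("v3")))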

141 citations

Journal ArticleDOI
01 Aug 2015
TL;DR: BigDAWG is presented, a reference implementation of a new architecture for "Big Data" applications that showcases novel approaches for querying across multiple storage engines, data visualization, and scalable real-time analytics.
Abstract: This paper presents BigDAWG, a reference implementation of a new architecture for "Big Data" applications. Such applications not only call for large-scale analytics, but also for real-time streaming support, smaller analytics at interactive speeds, data visualization, and cross-storage-system queries. Guided by the principle that "one size does not fit all", we build on top of a variety of storage engines, each designed for a specialized use case. To illustrate the promise of this approach, we demonstrate its effectiveness on a hospital application using data from an intensive care unit (ICU). This complex application serves the needs of doctors and researchers and provides real-time support for streams of patient data. It showcases novel approaches for querying across multiple storage engines, data visualization, and scalable real-time analytics.
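
As an illustration of the cross-storage-system querying the paper showcases, the sketch below unions recent readings from a stand-in stream store with rows from a stand-in historical archive. The two "engines" and the ICU-flavored schema are invented for this example and are not BigDAWG components.

# Toy cross-engine query: live stream data plus historical archive data.

stream_store = [("patient42", "2024-01-01T10:05", 91)]   # recent heart rate
archive = [("patient42", "2023-12-31T23:00", 88)]        # historical store

def heart_rate_history(patient):
    # A single logical query spans both underlying systems.
    rows = [r for r in archive if r[0] == patient]
    rows += [r for r in stream_store if r[0] == patient]
    return sorted(rows, key=lambda r: r[1])

print(heart_rate_history("patient42"))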

119 citations


Cited by
Posted Content
TL;DR: Datasheets for datasets are proposed to document a dataset's motivation, composition, collection process, and recommended uses, facilitating communication between dataset creators and dataset consumers.
Abstract: The machine learning community currently has no standardized process for documenting datasets, which can lead to severe consequences in high-stakes domains. To address this gap, we propose datasheets for datasets. In the electronics industry, every component, no matter how simple or complex, is accompanied with a datasheet that describes its operating characteristics, test results, recommended uses, and other information. By analogy, we propose that every dataset be accompanied with a datasheet that documents its motivation, composition, collection process, recommended uses, and so on. Datasheets for datasets will facilitate better communication between dataset creators and dataset consumers, and encourage the machine learning community to prioritize transparency and accountability.
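
The proposal concerns documentation rather than code, but the datasheet structure can be made concrete as a small schema. The field names below are a subset drawn from the abstract (motivation, composition, collection process, recommended uses); the dataclass itself is an illustrative assumption, not part of the proposal.

# Minimal datasheet schema; fields mirror the questions the abstract lists.

from dataclasses import dataclass, asdict

@dataclass
class Datasheet:
    motivation: str          # why the dataset was created
    composition: str         # what the instances are
    collection_process: str  # how the data was gathered
    recommended_uses: str    # tasks it is (and is not) suited for

sheet = Datasheet(
    motivation="Benchmark OLTP elasticity research.",
    composition="Synthetic order records, one row per transaction.",
    collection_process="Generated by a TPC-C-style workload driver.",
    recommended_uses="Load-balancing experiments; not for ML training.",
)
print(asdict(sheet))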

1,080 citations

Book ChapterDOI
01 Dec 2018
TL;DR: Preliminary performance data on a subset of TPC-H is presented, showing that C-Store, the system the team is building, is substantially faster than popular commercial products.
Abstract: This paper presents the design of a read-optimized relational DBMS that contrasts sharply with most current systems, which are write-optimized. Among the many differences in its design are: storage of data by column rather than by row; careful coding and packing of objects into storage, including main memory, during query processing; storing an overlapping collection of column-oriented projections rather than the current fare of tables and indexes; a non-traditional implementation of transactions that includes high availability and snapshot isolation for read-only transactions; and the extensive use of bitmap indexes to complement B-tree structures. We present preliminary performance data on a subset of TPC-H and show that the system we are building, C-Store, is substantially faster than popular commercial products. Hence, the architecture looks very encouraging.
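
The benefit of storage by column rather than by row can be shown in miniature: a scan that filters on one attribute and aggregates another touches only those two columns. This toy sketch omits C-Store's compression, projections, and transaction machinery.

# Toy contrast between row and column layouts.

rows = [(1, "US", 9.5), (2, "DE", 5.0), (3, "US", 7.5)]

# Column-oriented layout: one array per attribute.
ids, countries, prices = map(list, zip(*rows))

def sum_prices_for(country):
    # Reads only two columns; a row store would fetch whole tuples.
    return sum(p for c, p in zip(countries, prices) if c == country)

print(sum_prices_for("US"))   # 17.0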

1,063 citations

Book ChapterDOI
01 Dec 2018
TL;DR: The current RDBMS code lines, while attempting to be a "one size fits all" solution, in fact, excel at nothing and should be retired in favor of a collection of "from scratch" specialized engines.
Abstract: In previous papers [SC05, SBC+07], some of us predicted the end of "one size fits all" as a commercial relational DBMS paradigm. These papers presented reasons and experimental evidence showing that the major RDBMS vendors can be outperformed by 1-2 orders of magnitude by specialized engines in the data warehouse, stream processing, text, and scientific database markets. Assuming that specialized engines dominate these markets over time, the current relational DBMS code lines will be left with the business data processing (OLTP) market and hybrid markets where more than one kind of capability is required. In this paper we show that current RDBMSs can be beaten by nearly two orders of magnitude in the OLTP market as well. The experimental evidence comes from comparing a new OLTP prototype, H-Store, which we have built at M.I.T., to a popular RDBMS on the standard transactional benchmark, TPC-C. We conclude that the current RDBMS code lines, while attempting to be a "one size fits all" solution, in fact excel at nothing. Hence, they are 25-year-old legacy code lines that should be retired in favor of a collection of "from scratch" specialized engines. The DBMS vendors (and the research community) should start with a clean sheet of paper and design systems for tomorrow's requirements, not continue to push code lines and architectures designed for yesterday's needs.

679 citations

Journal ArticleDOI
TL;DR: A comprehensive introduction to knowledge graphs is provided, covering graph-based data models and query languages, the roles of schema, identity, and context, deductive and inductive techniques for representing and extracting knowledge, and prominent open and enterprise knowledge graphs.
Abstract: In this paper we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After some opening remarks, we motivate and contrast various graph-based data models and query languages that are used for knowledge graphs. We discuss the roles of schema, identity, and context in knowledge graphs. We explain how knowledge can be represented and extracted using a combination of deductive and inductive techniques. We summarise methods for the creation, enrichment, quality assessment, refinement, and publication of knowledge graphs. We provide an overview of prominent open knowledge graphs and enterprise knowledge graphs, their applications, and how they use the aforementioned techniques. We conclude with high-level future research directions for knowledge graphs.
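
The directed edge-labelled graph model the survey starts from can be demonstrated in a few lines: the data is a set of (subject, predicate, object) triples, and a query is a pattern with wildcards. The triples and predicate names below are invented examples.

# Minimal directed edge-labelled graph with one-hop pattern matching.

triples = {
    ("Santiago", "capital_of", "Chile"),
    ("Arica", "city_in", "Chile"),
    ("Chile", "part_of", "South America"),
}

def match(subject=None, predicate=None, obj=None):
    # None acts as a wildcard, like a variable in a graph query.
    return [t for t in triples
            if (subject   in (None, t[0]) and
                predicate in (None, t[1]) and
                obj       in (None, t[2]))]

print(match(predicate="capital_of"))   # which node is a capital of which?
print(match(obj="Chile"))              # every edge pointing at Chile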

560 citations