
Showing papers by Nesime Tatbul published in 2022


Journal ArticleDOI

TL;DR: Bao takes advantage of the wisdom built into existing query optimizers by providing per-query optimization hints, and combines modern tree convolutional neural networks with Thompson sampling, a well-studied reinforcement learning algorithm, to automatically learn from its mistakes and adapt to changes in query workloads, data, and schema.
Abstract: Recent efforts applying machine learning techniques to query optimization have shown few practical gains due to substantive training overhead, inability to adapt to changes, and poor tail performance. Motivated by these difficulties, we introduce Bao (the Bandit optimizer). Bao takes advantage of the wisdom built into existing query optimizers by providing per-query optimization hints. Bao combines modern tree convolutional neural networks with Thompson sampling, a well-studied reinforcement learning algorithm. As a result, Bao automatically learns from its mistakes and adapts to changes in query workloads, data, and schema. Experimentally, we demonstrate that Bao can quickly learn strategies that improve end-to-end query execution performance, including tail latency, for several workloads containing long-running queries. In cloud environments, we show that Bao can offer both reduced costs and better performance compared with a commercial system.
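To make the learning loop described above concrete, here is a minimal Python sketch of Thompson sampling over a set of optimizer hint sets. Everything in it is an illustrative assumption: the hint-set names, the run_query stand-in, and the Beta-Bernoulli reward model, which takes the place of Bao's actual tree convolutional network that predicts plan latency.

```python
# Minimal sketch: Thompson sampling over per-query optimizer hint sets.
# All names here (HINT_SETS, run_query) are hypothetical; Bao models
# expected latency with a tree convolutional network, not the simple
# Beta-Bernoulli "did it beat the default plan?" reward used below.
import random

HINT_SETS = ["default", "disable_nested_loop",
             "disable_hash_join", "force_index_scan"]

# One Beta(alpha, beta) posterior per hint set.
posteriors = {h: [1.0, 1.0] for h in HINT_SETS}

def run_query(hint_set: str) -> bool:
    """Stand-in for real execution: did this hint set beat the default?"""
    simulated_win_rate = {"default": 0.50, "disable_nested_loop": 0.70,
                          "disable_hash_join": 0.40, "force_index_scan": 0.60}
    return random.random() < simulated_win_rate[hint_set]

for _ in range(1000):
    # Sample a win rate from each posterior and act greedily on the samples:
    # this is the exploration/exploitation trade-off of Thompson sampling.
    draws = {h: random.betavariate(a, b) for h, (a, b) in posteriors.items()}
    chosen = max(draws, key=draws.get)
    won = run_query(chosen)
    posteriors[chosen][0] += won       # alpha counts wins
    posteriors[chosen][1] += 1 - won   # beta counts losses

print(max(posteriors, key=lambda h: posteriors[h][0] / sum(posteriors[h])))
```

Because each arm's posterior tightens as evidence accumulates, the loop self-corrects after bad choices, which is the "learns from its mistakes" behavior the abstract claims.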

7 citations


Proceedings Article
TL;DR: A new self-organizing, self-optimizing, meta-data rich storage format for the cloud that enables order-of-magnitude performance improvements in data-intensive applications through instance-optimization, i.e., the adaptation of data representation to exploit both the distribution of the data and the workload operating on it.
Abstract: We propose a new self-organizing, self-optimizing, meta-data rich storage format for the cloud, called a self-organizing data container (SDC), that enables order-of-magnitude performance improvements in data-intensive applications through instance-optimization, i.e., the adaptation of data representation to exploit both the distribution of the data and the workload operating on it. Unlike existing low-level cloud storage formats like Apache Arrow and Parquet, SDCs capture both data and metadata, like access histories and distributional statistics, and are designed to be flexible enough to encompass a variety of modern high-performance representations for data analytics, including partitioning, replication, indexing, and materialization. We present a preliminary design for SDCs and some motivating experiments, and discuss new challenges they present.
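As a rough illustration of the data-plus-metadata coupling described above, the sketch below pairs rows with per-column access counts and reorganizes the physical layout around the most frequently scanned column. All identifiers and the sort-based "reorganization" are assumptions made up for this example, not the SDC design from the paper.

```python
# Illustrative only: a container that tracks its own workload metadata
# and adapts its physical layout. Names and mechanics are invented here,
# not taken from the SDC paper.
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class SelfOrganizingContainer:
    rows: list[dict]                                           # the data itself
    access_history: Counter = field(default_factory=Counter)   # column -> scans

    def scan(self, column: str, predicate) -> list[dict]:
        self.access_history[column] += 1                       # record the workload
        return [row for row in self.rows if predicate(row[column])]

    def maybe_reorganize(self) -> None:
        # Instance optimization in miniature: physically cluster the data
        # on whichever column the workload scans most often.
        if self.access_history:
            hot_column, _ = self.access_history.most_common(1)[0]
            self.rows.sort(key=lambda row: row[hot_column])

sdc = SelfOrganizingContainer(rows=[{"a": 3, "b": 1}, {"a": 1, "b": 2}])
sdc.scan("a", lambda v: v > 1)
sdc.maybe_reorganize()      # rows are now clustered on the hot column "a"
```

A real SDC would also keep distributional statistics and choose among partitioning, replication, indexing, and materialization rather than a single sort, per the abstract.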

5 citations


Proceedings Article
TL;DR: Mach’s lean, loosely coordinated architecture aggressively leverages the characteristics of metrics data and observability workloads, yielding an order-of-magnitude improvement over existing approaches—especially those marketed as “time series database systems” (TSDBs).
Abstract: Observability is gaining traction as a key capability for understanding the internal behavior of large-scale system deployments. Instrumenting these systems to report quantitative telemetry data called metrics enables engineers to monitor and maintain services that operate at an enormous scale so they can respond rapidly to any issues that might arise. To be useful, metrics must be ingested, stored, and queryable in real time, but many existing solutions cannot keep up with the sheer volume of generated data. This paper describes Mach, a pluggable storage engine we are building specifically to handle high-volume metrics data. Similar to many popular libraries (e.g., Berkeley DB, LevelDB, RocksDB, WiredTiger), Mach provides a simple API to store and retrieve data. Mach’s lean, loosely coordinated architecture aggressively leverages the characteristics of metrics data and observability workloads, yielding an order-of-magnitude improvement over existing approaches—especially those marketed as “time series database systems” (TSDBs). In fact, our preliminary results show that Mach can achieve nearly 10× higher write throughput and 3× higher read throughput compared to several widely used alternatives.
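The abstract only says that Mach exposes a "simple API to store and retrieve data" in the style of LevelDB or RocksDB; the sketch below imagines what such a minimal per-series push/query interface could look like. Every identifier here is hypothetical, not Mach's actual API.

```python
# Hypothetical sketch of a LevelDB-style minimal interface for metrics;
# Mach's real API is not shown in the summary above, so these names
# (MetricsStore, push, query) are invented for illustration.
import bisect

class MetricsStore:
    """Append-only (timestamp, value) samples, kept per series."""

    def __init__(self) -> None:
        self._series: dict[str, list[tuple[int, float]]] = {}

    def push(self, series_id: str, timestamp: int, value: float) -> None:
        # Metrics arrive in rough time order, so appends stay cheap:
        # the kind of workload characteristic the abstract says Mach exploits.
        self._series.setdefault(series_id, []).append((timestamp, value))

    def query(self, series_id: str, start: int, end: int) -> list[tuple[int, float]]:
        """Return all samples with start <= timestamp <= end."""
        samples = self._series.get(series_id, [])
        lo = bisect.bisect_left(samples, (start,))
        hi = bisect.bisect_right(samples, (end, float("inf")))
        return samples[lo:hi]

store = MetricsStore()
store.push("cpu.host1", 100, 0.42)
store.push("cpu.host1", 160, 0.57)
print(store.query("cpu.host1", 90, 150))   # [(100, 0.42)]
```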

1 citation


Journal ArticleDOI
TL;DR: This short note summarizes the discussion of a panel held during VLDB 2021, titled "Artifacts, Availability & Reproducibility", which aimed to assess the reproducibility of data management research and to propose changes moving forward.
Abstract: In the last few years, SIGMOD and VLDB have intensified efforts to encourage, facilitate, and establish reproducibility as a key process for accepted research papers, awarding them with the Reproducibility badge. In addition, complementary efforts have focused on increasing the sharing of accompanying artifacts of published work (code, scripts, data), independently of reproducibility, awarding them the Artifacts Available badge. In this short note, we summarize the discussion of a panel held during VLDB 2021 titled "Artifacts, Availability & Reproducibility". We first present a more detailed summary of these recent efforts. Then, we present the discussion and the key points that were made, aiming to assess the reproducibility of data management research and to propose changes moving forward.

1 citation


Journal ArticleDOI
TL;DR: A tutorial on machine programming is presented, introducing its three pillars of intention, invention, and adaptation, and providing an overview of the data ecosystem central to all machine programming systems, highlighting challenges and novel opportunities relevant to the data systems community.
Abstract: Machine programming is an emerging research area that improves the software development life cycle from design through deployment. We present a tutorial on machine programming research highlighting aspects relevant to the data systems community. We divide this tutorial into three parts. First, we give an introduction to machine programming and its three pillars: intention, invention, and adaptation. Then, we provide an overview of the data ecosystem central to all machine programming systems, highlighting challenges and novel opportunities relevant to the data systems community. Finally, we describe recent advances in machine programming research and how these directions use various data sets to improve the ease of creating and maintaining performant software systems.

Journal ArticleDOI
TL;DR: The Scalable Data Science (SDS) research track was introduced as part of the 2021 International Conference on Very Large Data Bases (VLDB) to enhance the impact and visibility of the VLDB community on data science practice.
Abstract: As part of the International Conference on Very Large Data Bases (VLDB) 2021 / Proceedings of the VLDB Endowment Volume 14, a new Research Track category named Scalable Data Science (SDS) was launched [2, 6]. The goal of SDS is to attract cutting-edge and impactful real-world work in the scalable data science arena to enhance the impact and visibility of the VLDB community on data science practice, spur new technical connections, and inspire new follow-on research. The inaugural year proved to be successful, with numerous interesting papers from a wide cross section of both industry and academia, spanning several data science topics, and originating from several countries around the world. In this report, we reflect on the inaugural year of SDS with some statistics on both submissions and accepted papers, SDS invited talks, and our observations, lessons, and tips as inaugural Associate Editors for SDS. We hope this article is helpful to future authors, reviewers, and organizers of SDS, as well as other interested members of the wider database / data management community and beyond.


Proceedings ArticleDOI
10 Jun 2022
TL;DR: Over the last two decades, the DaMoN Workshop has established itself as the primary database venue for ideas on how to exploit new hardware for data management, in particular how to improve the performance or scalability of databases, how new hardware unlocks new database application scenarios, and how data management could benefit from future hardware.
Abstract: New hardware, such as multi-core CPUs, GPUs, FPGAs, new memory and storage technologies, and low-power devices, brings new challenges and opportunities for optimizing database system performance. Consequently, exploiting the characteristics of modern hardware has become an important topic of database systems research. In the last two decades, the DaMoN Workshop has established itself as the primary database venue to present ideas on how to exploit new hardware for data management, in particular how to improve performance or scalability of databases, how new hardware unlocks new database application scenarios, and how data management could benefit from future hardware.