Open Access, Proceedings Article

Active Learning for ML Enhanced Database Systems

Abstract
Recent research has shown promising results by using machine learning (ML) techniques to improve the performance of database systems, e.g., in query optimization or index recommendation. However, in many production deployments, the ML models' performance degrades significantly when the test data diverges from the data used to train these models. In this paper, we address this performance degradation by using B-instances to collect additional data during deployment. We propose an active data collection platform, ADCP, that employs active learning (AL) to gather relevant data cost-effectively. We develop a novel AL technique, Holistic Active Learner (HAL), that robustly combines multiple noisy signals for data gathering in the context of database applications. HAL applies to various ML tasks, budget sizes, cost types, and budgeting interfaces for database applications. We evaluate ADCP on both industry-standard benchmarks and real customer workloads. Our evaluation shows that, compared with other baselines, our technique improves ML models' prediction performance by up to 2x with the same cost budget. In particular, on production workloads, our technique reduces the prediction error of ML models by 75% using about 100 additionally collected queries.
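The idea of combining several noisy signals to choose which queries to collect under a cost budget can be illustrated with a minimal sketch. This is not the HAL algorithm from the paper: it swaps in a generic rank-averaging heuristic with greedy budgeted selection, and the function names (`rank_normalize`, `select_queries`), the two example signals, and all numbers are hypothetical.

```python
def rank_normalize(scores):
    """Map raw scores to ranks in [0, 1]; the highest score gets rank 1.0.

    Using ranks instead of raw values keeps one badly scaled or noisy
    signal from dominating the combination.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    denom = max(len(scores) - 1, 1)
    ranks = [0.0] * len(scores)
    for position, idx in enumerate(order):
        ranks[idx] = position / denom
    return ranks


def select_queries(signal_scores, costs, budget):
    """Greedily pick candidate indices by combined rank per unit cost.

    signal_scores: one list of scores per signal, aligned by candidate.
    costs: positive collection cost per candidate (assumed known up front).
    budget: total cost that may be spent.
    """
    n = len(costs)
    per_signal = [rank_normalize(scores) for scores in signal_scores]
    # Average the per-signal ranks to get one combined usefulness score.
    combined = [sum(r[i] for r in per_signal) / len(per_signal) for i in range(n)]
    # Prefer candidates with the most combined usefulness per unit of cost.
    order = sorted(range(n), key=lambda i: combined[i] / costs[i], reverse=True)
    chosen, spent = [], 0.0
    for i in order:
        if spent + costs[i] <= budget:
            chosen.append(i)
            spent += costs[i]
    return chosen, spent


# Hypothetical example: four candidate queries, two signals.
uncertainty = [0.9, 0.1, 0.5, 0.7]  # e.g. model disagreement (assumed)
novelty = [0.6, 0.1, 0.9, 0.3]      # e.g. distance from training data (assumed)
costs = [4.0, 1.0, 1.0, 1.0]        # e.g. estimated execution cost (assumed)

chosen, spent = select_queries([uncertainty, novelty], costs, budget=2.0)
print(chosen, spent)
```

In this toy setup the candidate with the highest combined rank is skipped because its cost exceeds the budget, which is the kind of trade-off a cost-aware selection has to make.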
