Active Learning for ML Enhanced Database Systems
Lin Ma,Bailu Ding,Sudipto Das,Adith Swaminathan +3 more
- pp 175-191
TLDR
This paper proposes an active data collection platform, ADCP, that employs active learning (AL) to gather relevant data cost-effectively and develops a novel AL technique, Holistic Active Learner (HAL), that robustly combines multiple noisy signals for data gathering in the context of database applications.
Abstract:
Recent research has shown promising results by using machine learning (ML) techniques to improve the performance of database systems, e.g., in query optimization or index recommendation. However, in many production deployments, the ML models' performance degrades significantly when the test data diverges from the data used to train these models. In this paper, we address this performance degradation by using B-instances to collect additional data during deployment. We propose an active data collection platform, ADCP, that employs active learning (AL) to gather relevant data cost-effectively. We develop a novel AL technique, Holistic Active Learner (HAL), that robustly combines multiple noisy signals for data gathering in the context of database applications. HAL applies to various ML tasks, budget sizes, cost types, and budgeting interfaces for database applications. We evaluate ADCP on both industry-standard benchmarks and real customer workloads. Our evaluation shows that, compared with other baselines, our technique improves ML models' prediction performance by up to 2x with the same cost budget. In particular, on production workloads, our technique reduces the prediction error of ML models by 75% using about 100 additionally collected queries.
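The abstract describes combining several noisy signals to decide which data to collect under a cost budget. The sketch below illustrates that general idea only; it is not HAL or ADCP. It assumes a generic rank-averaging heuristic and a greedy score-per-cost selection, and every name in it (`Candidate`, `average_rank_scores`, `select_within_budget`, the signal names) is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    """A hypothetical unlabeled query that could be executed to obtain a label."""
    name: str
    cost: float                # assumed cost of collecting the label; must be > 0
    signals: Dict[str, float]  # assumed informativeness signals; higher = more useful


def average_rank_scores(cands: List[Candidate]) -> Dict[str, float]:
    """Combine signals by averaging each candidate's rank under every signal.

    Ranks depend less on a signal's scale than raw values do. This is a generic
    rank-aggregation heuristic, not the combination rule used by HAL.
    """
    signal_names = sorted({s for c in cands for s in c.signals})
    totals = {c.name: 0.0 for c in cands}
    if not signal_names:
        return totals
    for sig in signal_names:
        ordered = sorted(cands, key=lambda c: c.signals.get(sig, 0.0))
        for rank, c in enumerate(ordered, start=1):  # rank 1 = least useful
            totals[c.name] += rank
    return {name: total / len(signal_names) for name, total in totals.items()}


def select_within_budget(cands: List[Candidate], budget: float) -> List[str]:
    """Greedily pick candidates by combined score per unit cost until the budget is spent."""
    if any(c.cost <= 0 for c in cands):
        raise ValueError("every candidate needs a positive cost")
    scores = average_rank_scores(cands)
    ranked = sorted(cands, key=lambda c: scores[c.name] / c.cost, reverse=True)
    chosen, spent = [], 0.0
    for c in ranked:
        if spent + c.cost <= budget:
            chosen.append(c.name)
            spent += c.cost
    return chosen


# Made-up example pool; the numbers carry no meaning beyond illustration.
pool = [
    Candidate("q1", cost=1.0, signals={"uncertainty": 0.9, "diversity": 0.2}),
    Candidate("q2", cost=5.0, signals={"uncertainty": 0.8, "diversity": 0.9}),
    Candidate("q3", cost=1.0, signals={"uncertainty": 0.1, "diversity": 0.1}),
    Candidate("q4", cost=2.0, signals={"uncertainty": 0.7, "diversity": 0.8}),
]
picked = select_within_budget(pool, budget=3.0)
```

With this pool and a budget of 3.0, the greedy pass selects `q1` and `q4`. `q2` has the highest combined rank but is skipped because its cost alone exceeds the budget.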
Citations
Proceedings ArticleDOI
AI Meets Database: AI4DB and DB4AI
Guoliang Li,Xuanhe Zhou,Lei Cao +2 more
TL;DR: In this article, the authors review existing studies on AI4DB and DB4AI and provide research challenges and future directions in AI-oriented declarative language, data governance, training acceleration, and inference acceleration.
Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Mario A. Nascimento,M. Tamer Özsu,Donald Kossmann,Renée J. Miller,José A. Blakeley,K. Bernhard Schiefer +5 more
Journal ArticleDOI
Machine learning for databases
Guoliang Li,Xuanhe Zhou,Lei Cao +2 more
TL;DR: In this article, the authors categorize database tasks into three typical problems that can be optimized by different machine learning models: (i) NP-hard problems (e.g., knob space exploration, index/view selection, and partition-key recommendation for offline optimization; query rewrite and join order selection for online optimization), (ii) regression problems, and (iii) prediction problems (e.g., transaction scheduling, trend prediction).
Proceedings ArticleDOI
MB2: Decomposed Behavior Modeling for Self-Driving Database Management Systems
Lin Ma,William Zhang,Jie Jiao,Wuwen Wang,Matthew Butrovich,Wan Shen Lim,Prashanth Menon,Andrew Pavlo +7 more
TL;DR: ModelBot2 (MB2) is an end-to-end framework for constructing and maintaining prediction models using machine learning (ML) in self-driving DBMSs. It decomposes a DBMS's architecture into fine-grained operating units that make it easier to estimate the system's behavior for configurations that it has never seen before.
Journal ArticleDOI
A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration
Hai Lan,Zhifeng Bao,Yuwei Peng +2 more
TL;DR: In this article, the authors survey work on cost-based query optimizers, which use a plan enumeration algorithm to find a (sub)plan, apply a cost model to obtain the cost of that plan, and select the plan with the lowest cost. Due to the inaccuracy in cardinality estimation, errors in the cost model, and the huge plan space, the optimizer cannot find the optimal execution plan for a complex query in a reasonable time.