Open Access · Journal Article · DOI

An end-to-end learning-based cost estimator

Ji Sun, +1 more
- Vol. 13, Iss. 3, pp. 307-319
TLDR
This work proposes an effective end-to-end learning-based cost estimation framework based on a tree-structured model, which can estimate both cost and cardinality simultaneously, and is likely the first end-to-end cost estimator based on deep learning.
Abstract
Cost and cardinality estimation is vital to the query optimizer, as it guides query plan selection. However, traditional empirical cost and cardinality estimation techniques cannot provide high-quality estimates, because they may not effectively capture the correlations between multiple tables. Recently, the database community has shown that learning-based cardinality estimation outperforms the empirical methods. However, existing learning-based methods have several limitations. First, they focus on estimating the cardinality but cannot estimate the cost. Second, they are either too heavyweight or unable to represent complicated structures, e.g., complex predicates. To address these challenges, we propose an effective end-to-end learning-based cost estimation framework based on a tree-structured model, which can estimate both cost and cardinality simultaneously. We propose effective feature extraction and encoding techniques, which consider both queries and physical operations in feature extraction, and we embed these features into our tree-structured model. We also propose an effective method to encode string values, which improves the generalization ability for predicate matching. As it is prohibitively expensive to enumerate all string values, we design a pattern-based method, which selects patterns to cover string values and utilizes the patterns to embed string values. We conducted experiments on real-world datasets, and the experimental results showed that our method outperformed the baselines.
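As a rough illustration of the tree-structured idea described above (not the paper's actual architecture), the sketch below encodes a query plan bottom-up: each plan node carries a feature vector, a shared recursive cell combines it with its children's states, and two output heads read cost and cardinality off the same representation. All dimensions, weight values, and the feature layout are invented for illustration; a trained version would learn them from query logs.

```python
import numpy as np

FEAT, HID = 8, 16
rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (HID, FEAT + 2 * HID))  # shared recursive cell
b = np.zeros(HID)
head_cost = rng.normal(0, 0.1, HID)            # multitask output heads
head_card = rng.normal(0, 0.1, HID)

def encode(node):
    """Bottom-up: combine a node's features with its children's states."""
    kids = node.get("children", [])
    left = encode(kids[0]) if len(kids) > 0 else np.zeros(HID)
    right = encode(kids[1]) if len(kids) > 1 else np.zeros(HID)
    return np.tanh(W @ np.concatenate([node["feat"], left, right]) + b)

def estimate(plan):
    h = encode(plan)  # one representation drives both estimates
    return float(head_cost @ h), float(head_card @ h)

# Toy plan: a join over two scans.
scan = lambda: {"feat": rng.normal(size=FEAT)}
plan = {"feat": rng.normal(size=FEAT), "children": [scan(), scan()]}
cost, card = estimate(plan)
```

Sharing one representation across both heads is the multitask aspect: signals from cost labels and cardinality labels both shape the tree encoder.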



Citations
Journal Article (DOI)

QTune: a query-aware database tuning system with deep reinforcement learning

TL;DR: A query-aware database tuning system, QTune, built on a deep reinforcement learning (DRL) model, which can efficiently and effectively tune database configurations based on both the query vector and the database state, and which outperforms state-of-the-art tuning methods.
Proceedings Article (DOI)

Reinforcement Learning with Tree-LSTM for Join Order Selection

TL;DR: RTOS is a novel learned optimizer that uses reinforcement learning with a tree-structured long short-term memory (LSTM) network for join order selection. It improves on existing DRL-based approaches in two main aspects: it adopts graph neural networks to capture the structure of join trees, and it supports modifications of the database schema and multi-alias table names.
Posted Content

RadixSpline: A Single-Pass Learned Index

TL;DR: RadixSpline is introduced, a learned index that can be built in a single pass over the data and is competitive with state-of-the-art learned index models, like RMI, in size and lookup performance.
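A toy sketch of the spline-index idea behind RadixSpline (illustrative only: the real structure adds a radix table over the spline points and a tunable error bound, while this version uses a fixed knot stride). In one pass over sorted keys it keeps every k-th key as a knot; a lookup interpolates between knots to guess a position, then searches a small error-bounded window.

```python
import bisect

class SplineIndex:
    """Toy single-pass spline index over sorted, distinct keys."""
    def __init__(self, keys, every=4):
        self.keys = keys
        # One pass: keep every `every`-th key as a spline knot.
        self.knots = [(keys[i], i) for i in range(0, len(keys), every)]
        if self.knots[-1][0] != keys[-1]:
            self.knots.append((keys[-1], len(keys) - 1))
        self.err = every  # knot spacing bounds the interpolation error

    def lookup(self, key):
        ks = [k for k, _ in self.knots]
        j = max(0, bisect.bisect_right(ks, key) - 1)
        if j + 1 >= len(self.knots):
            guess = self.knots[-1][1]
        else:
            (k0, p0), (k1, p1) = self.knots[j], self.knots[j + 1]
            guess = p0 + int((key - k0) / (k1 - k0) * (p1 - p0))
        lo = max(0, guess - self.err)
        hi = min(len(self.keys), guess + self.err + 1)
        off = bisect.bisect_left(self.keys, key, lo, hi)
        if off < len(self.keys) and self.keys[off] == key:
            return off
        return -1

idx = SplineIndex(list(range(0, 200, 2)))
```

For example, `idx.lookup(84)` returns position 42, while `idx.lookup(85)` returns -1 because 85 is not among the keys.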
Journal Article (DOI)

DeepDB: learn from data, not from queries!

TL;DR: In this article, the authors propose a data-driven approach to learned DBMS components that directly supports changes to the workload and data without the need for retraining; their empirical evaluation demonstrates that the approach not only provides better accuracy than state-of-the-art learned components but also generalizes better to unseen queries.
Proceedings Article (DOI)

Bao: Making Learned Query Optimization Practical

TL;DR: Bao combines modern tree convolutional neural networks with Thompson sampling, a well-studied reinforcement learning algorithm, to automatically learn from its mistakes and adapt to changes in query workloads, data, and schema.
References
Posted Content

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: In this paper, the Skip-gram model is used to learn high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships and improve both the quality of the vectors and the training speed.
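As a small illustration of the Skip-gram setup referenced here, the helper below generates the (target, context) training pairs the model learns from: each word predicts the words within a window around it. The function name and the SQL-flavored token example are invented for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Pair each target token with every token inside its context window."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((target, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs

pairs = skipgram_pairs(["select", "name", "from", "users"], window=1)
# pairs -> [('select', 'name'), ('name', 'select'), ('name', 'from'),
#           ('from', 'name'), ('from', 'users'), ('users', 'from')]
```

The actual model then fits word vectors so that a target's embedding scores its observed contexts highly (with negative sampling speeding up training).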
Journal Article (DOI)

Multitask Learning

TL;DR: Multitask Learning (MTL) is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias.
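The most common MTL setup, hard parameter sharing, can be sketched in a few lines (a minimal illustration with made-up dimensions and random weights, not a trained model): one shared layer feeds several task-specific heads, so the shared weights are shaped by every task's training signal at once.

```python
import numpy as np

rng = np.random.default_rng(1)
shared = rng.normal(0, 0.1, (4, 3))      # shared representation layer
heads = {"task_a": rng.normal(0, 0.1, 4),  # one head per task
         "task_b": rng.normal(0, 0.1, 4)}

def predict(x):
    h = np.tanh(shared @ x)              # computed once, reused by all heads
    return {name: float(w @ h) for name, w in heads.items()}

out = predict(np.array([1.0, 0.5, -0.2]))
```

During training, gradients from every task's loss flow into `shared`, which is exactly the inductive bias the TL;DR describes.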
Journal Article (DOI)

Probabilistic counting algorithms for data base applications

TL;DR: This paper introduces a class of probabilistic counting algorithms with which one can estimate the number of distinct elements in a large collection of data in a single pass, using only a small amount of additional storage and only a few operations per element scanned.
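A toy version of the Flajolet-Martin idea from this line of work, under simplifying assumptions (a single hash function and no averaging, whereas the paper combines many sketches to reduce variance): each element is hashed, the position of the hash's lowest set bit is recorded in a bitmap, and the smallest unseen position R yields the estimate 2^R / 0.77351.

```python
import hashlib

def fm_estimate(items, bits=32):
    """Single-sketch Flajolet-Martin distinct-count estimate (noisy)."""
    bitmap = 0
    for x in items:
        h = int.from_bytes(
            hashlib.blake2b(str(x).encode(), digest_size=8).digest(), "big")
        # Index of the lowest set bit of the hash (bits if hash is 0).
        rho = (h & -h).bit_length() - 1 if h else bits
        bitmap |= 1 << rho
    r = 0
    while bitmap & (1 << r):  # smallest bit position never observed
        r += 1
    return (1 << r) / 0.77351

est = fm_estimate(range(1000))
```

The state is one machine word per sketch, which is what makes the single-pass, small-memory claim possible.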
Journal Article (DOI)

Automating string processing in spreadsheets using input-output examples

TL;DR: Describes the design of a string programming/expression language that supports restricted forms of regular expressions, conditionals, and loops, together with an algorithm, based on several novel concepts, for synthesizing a desired program in this language from input-output examples.
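A drastically simplified nod to the programming-by-example idea: enumerate a tiny space of string programs (split on a delimiter, pick a field) and return the first one consistent with all input-output examples. The delimiter set and program space here are invented for illustration; the actual language is far richer, with regular expressions, conditionals, and loops.

```python
def synthesize(examples):
    """Return (delimiter, field) consistent with all (input, output) pairs."""
    for delim in " ,-@.":
        for field in range(3):
            try:
                if all(inp.split(delim)[field] == out
                       for inp, out in examples):
                    return delim, field
            except IndexError:
                pass  # this program doesn't apply to some input
    return None

prog = synthesize([("jane@acme.com", "acme.com"), ("bo@x.org", "x.org")])
# prog -> ("@", 1): split on "@" and take the second field
```

Even this toy search shows the core loop: candidate programs are checked against the examples, and the examples alone pin down the intended transformation.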
Journal Article (DOI)

HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm

TL;DR: This extended abstract describes and analyses HYPERLOGLOG, a near-optimal probabilistic algorithm dedicated to estimating the number of distinct elements (the cardinality) of very large data ensembles; it makes it possible to estimate cardinalities well beyond 10^9 with a typical accuracy of 2% while using a memory of only 1.5 kilobytes.
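A compact HyperLogLog sketch under simplifying assumptions (2^p registers, the standard alpha constant, and no small- or large-range corrections, which the full algorithm includes): the low p hash bits pick a register, each register keeps the maximum leading-zero rank seen, and the harmonic mean of the registers gives the estimate.

```python
import hashlib

def hll_estimate(items, p=10):
    """Raw HyperLogLog estimate with m = 2^p registers (no corrections)."""
    m = 1 << p
    regs = [0] * m
    for x in items:
        h = int.from_bytes(
            hashlib.blake2b(str(x).encode(), digest_size=8).digest(), "big")
        j = h & (m - 1)                        # register index: low p bits
        w = h >> p                             # remaining 64 - p bits
        rank = (64 - p) - w.bit_length() + 1   # leading zeros + 1
        regs[j] = max(regs[j], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    return alpha * m * m / sum(2.0 ** -r for r in regs)

est = hll_estimate(range(100_000))
```

With p = 10 the sketch is about a kilobyte of registers, and the expected relative error is roughly 1.04 / sqrt(m), i.e. a few percent, matching the accuracy-versus-memory trade-off the abstract highlights.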