Open Access · Journal Article · DOI

An end-to-end learning-based cost estimator

Ji Sun, +1 more
- Vol. 13, Iss. 3, pp. 307-319
TLDR
This work proposes an effective end-to-end learning-based cost estimation framework based on a tree-structured model, which can estimate both cost and cardinality simultaneously, and is likely the first end-to-end cost estimator based on deep learning.
Abstract
Cost and cardinality estimation is vital to the query optimizer, as it guides query plan selection. However, traditional empirical cost and cardinality estimation techniques cannot provide high-quality estimates, because they may not effectively capture the correlations between multiple tables. Recently, the database community has shown that learning-based cardinality estimation outperforms the empirical methods. However, existing learning-based methods have several limitations. First, they focus on estimating the cardinality but cannot estimate the cost. Second, they are either too heavyweight or unable to represent complicated structures, e.g., complex predicates. To address these challenges, we propose an effective end-to-end learning-based cost estimation framework based on a tree-structured model, which can estimate both cost and cardinality simultaneously. We propose effective feature extraction and encoding techniques, which consider both queries and physical operations in feature extraction, and we embed these features into our tree-structured model. We also propose an effective method to encode string values, which improves the generalization ability for predicate matching. As it is prohibitively expensive to enumerate all string values, we design a pattern-based method, which selects patterns to cover string values and utilizes the patterns to embed string values. We conducted experiments on real-world datasets, and the experimental results showed that our method outperformed the baselines.
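As a rough illustration of the tree-structured idea described above (not the paper's actual architecture), the sketch below encodes a query plan bottom-up: each plan node carries a feature vector, a shared recursive cell combines it with its children's states, and two output heads read cost and cardinality off the same representation. All dimensions, weight values, and the feature layout are invented for illustration; a trained version would learn them from query logs.

```python
import numpy as np

FEAT, HID = 8, 16
rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (HID, FEAT + 2 * HID))  # shared recursive cell
b = np.zeros(HID)
head_cost = rng.normal(0, 0.1, HID)            # multitask output heads
head_card = rng.normal(0, 0.1, HID)

def encode(node):
    """Bottom-up: combine a node's features with its children's states."""
    kids = node.get("children", [])
    left = encode(kids[0]) if len(kids) > 0 else np.zeros(HID)
    right = encode(kids[1]) if len(kids) > 1 else np.zeros(HID)
    return np.tanh(W @ np.concatenate([node["feat"], left, right]) + b)

def estimate(plan):
    h = encode(plan)  # one representation drives both estimates
    return float(head_cost @ h), float(head_card @ h)

# Toy plan: a join over two scans.
scan = lambda: {"feat": rng.normal(size=FEAT)}
plan = {"feat": rng.normal(size=FEAT), "children": [scan(), scan()]}
cost, card = estimate(plan)
```

Sharing one representation across both heads is the multitask aspect: signals from cost labels and cardinality labels both shape the tree encoder.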



Citations
Journal Article (DOI)

QTune: a query-aware database tuning system with deep reinforcement learning

TL;DR: A query-aware database tuning system, QTune, built on a deep reinforcement learning (DRL) model, which can efficiently and effectively tune database configurations based on both the query vector and the database state, and which outperforms state-of-the-art tuning methods.
Proceedings Article (DOI)

Reinforcement Learning with Tree-LSTM for Join Order Selection

TL;DR: RTOS is a novel learned optimizer that uses reinforcement learning with a tree-structured long short-term memory (LSTM) network for join order selection. It improves on existing DRL-based approaches in two main aspects: it adopts graph neural networks to capture the structure of join trees, and it supports modifications of the database schema and multi-alias table names.
Posted Content

RadixSpline: A Single-Pass Learned Index

TL;DR: RadixSpline is introduced, a learned index that can be built in a single pass over the data and is competitive with state-of-the-art learned index models, like RMI, in size and lookup performance.
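A toy sketch of the spline-index idea behind RadixSpline (illustrative only: the real structure adds a radix table over the spline points and a tunable error bound, while this version uses a fixed knot stride). In one pass over sorted keys it keeps every k-th key as a knot; a lookup interpolates between knots to guess a position, then searches a small error-bounded window.

```python
import bisect

class SplineIndex:
    """Toy single-pass spline index over sorted, distinct keys."""
    def __init__(self, keys, every=4):
        self.keys = keys
        # One pass: keep every `every`-th key as a spline knot.
        self.knots = [(keys[i], i) for i in range(0, len(keys), every)]
        if self.knots[-1][0] != keys[-1]:
            self.knots.append((keys[-1], len(keys) - 1))
        self.err = every  # knot spacing bounds the interpolation error

    def lookup(self, key):
        ks = [k for k, _ in self.knots]
        j = max(0, bisect.bisect_right(ks, key) - 1)
        if j + 1 >= len(self.knots):
            guess = self.knots[-1][1]
        else:
            (k0, p0), (k1, p1) = self.knots[j], self.knots[j + 1]
            guess = p0 + int((key - k0) / (k1 - k0) * (p1 - p0))
        lo = max(0, guess - self.err)
        hi = min(len(self.keys), guess + self.err + 1)
        off = bisect.bisect_left(self.keys, key, lo, hi)
        if off < len(self.keys) and self.keys[off] == key:
            return off
        return -1

idx = SplineIndex(list(range(0, 200, 2)))
```

For example, `idx.lookup(84)` returns position 42, while `idx.lookup(85)` returns -1 because 85 is not among the keys.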
Journal Article (DOI)

DeepDB: learn from data, not from queries!

TL;DR: In this article, the authors propose a data-driven approach to learned DBMS components that directly supports changes to the workload and data without the need for retraining; their empirical evaluation demonstrates that the approach not only provides better accuracy than state-of-the-art learned components but also generalizes better to unseen queries.
Proceedings Article (DOI)

Bao: Making Learned Query Optimization Practical

TL;DR: Bao combines modern tree convolutional neural networks with Thompson sampling, a well-studied reinforcement learning algorithm, to automatically learn from its mistakes and adapt to changes in query workloads, data, and schema.
References
Posted Content

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: In this paper, the Skip-gram model is used to learn high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships and improve both the quality of the vectors and the training speed.
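As a small illustration of the Skip-gram setup referenced here, the helper below generates the (target, context) training pairs the model learns from: each word predicts the words within a window around it. The function name and the SQL-flavored token example are invented for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Pair each target token with every token inside its context window."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((target, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs

pairs = skipgram_pairs(["select", "name", "from", "users"], window=1)
# pairs -> [('select', 'name'), ('name', 'select'), ('name', 'from'),
#           ('from', 'name'), ('from', 'users'), ('users', 'from')]
```

The actual model then fits word vectors so that a target's embedding scores its observed contexts highly (with negative sampling speeding up training).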
Journal Article (DOI)

Multitask Learning

TL;DR: Multitask Learning (MTL) is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias.
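The most common MTL setup, hard parameter sharing, can be sketched in a few lines (a minimal illustration with made-up dimensions and random weights, not a trained model): one shared layer feeds several task-specific heads, so the shared weights are shaped by every task's training signal at once.

```python
import numpy as np

rng = np.random.default_rng(1)
shared = rng.normal(0, 0.1, (4, 3))      # shared representation layer
heads = {"task_a": rng.normal(0, 0.1, 4),  # one head per task
         "task_b": rng.normal(0, 0.1, 4)}

def predict(x):
    h = np.tanh(shared @ x)              # computed once, reused by all heads
    return {name: float(w @ h) for name, w in heads.items()}

out = predict(np.array([1.0, 0.5, -0.2]))
```

During training, gradients from every task's loss flow into `shared`, which is exactly the inductive bias the TL;DR describes.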
Journal Article (DOI)

Probabilistic counting algorithms for data base applications

TL;DR: This paper introduces a class of probabilistic counting algorithms with which one can estimate the number of distinct elements in a large collection of data in a single pass, using only a small amount of additional storage and only a few operations per element scanned.
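A toy version of the Flajolet-Martin idea from this line of work, under simplifying assumptions (a single hash function and no averaging, whereas the paper combines many sketches to reduce variance): each element is hashed, the position of the hash's lowest set bit is recorded in a bitmap, and the smallest unseen position R yields the estimate 2^R / 0.77351.

```python
import hashlib

def fm_estimate(items, bits=32):
    """Single-sketch Flajolet-Martin distinct-count estimate (noisy)."""
    bitmap = 0
    for x in items:
        h = int.from_bytes(
            hashlib.blake2b(str(x).encode(), digest_size=8).digest(), "big")
        # Index of the lowest set bit of the hash (bits if hash is 0).
        rho = (h & -h).bit_length() - 1 if h else bits
        bitmap |= 1 << rho
    r = 0
    while bitmap & (1 << r):  # smallest bit position never observed
        r += 1
    return (1 << r) / 0.77351

est = fm_estimate(range(1000))
```

The state is one machine word per sketch, which is what makes the single-pass, small-memory claim possible.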
Journal Article (DOI)

Automating string processing in spreadsheets using input-output examples

TL;DR: Describes the design of a string programming/expression language that supports restricted forms of regular expressions, conditionals, and loops, together with an algorithm, based on several novel concepts, for synthesizing a desired program in this language from input-output examples.
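A drastically simplified nod to the programming-by-example idea: enumerate a tiny space of string programs (split on a delimiter, pick a field) and return the first one consistent with all input-output examples. The delimiter set and program space here are invented for illustration; the actual language is far richer, with regular expressions, conditionals, and loops.

```python
def synthesize(examples):
    """Return (delimiter, field) consistent with all (input, output) pairs."""
    for delim in " ,-@.":
        for field in range(3):
            try:
                if all(inp.split(delim)[field] == out
                       for inp, out in examples):
                    return delim, field
            except IndexError:
                pass  # this program doesn't apply to some input
    return None

prog = synthesize([("jane@acme.com", "acme.com"), ("bo@x.org", "x.org")])
# prog -> ("@", 1): split on "@" and take the second field
```

Even this toy search shows the core loop: candidate programs are checked against the examples, and the examples alone pin down the intended transformation.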
Journal Article (DOI)

HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm

TL;DR: This extended abstract describes and analyses HYPERLOGLOG, a near-optimal probabilistic algorithm dedicated to estimating the number of distinct elements (the cardinality) of very large data ensembles; it makes it possible to estimate cardinalities well beyond 10^9 with a typical accuracy of 2% while using a memory of only 1.5 kilobytes.
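A compact HyperLogLog sketch under simplifying assumptions (2^p registers, the standard alpha constant, and no small- or large-range corrections, which the full algorithm includes): the low p hash bits pick a register, each register keeps the maximum leading-zero rank seen, and the harmonic mean of the registers gives the estimate.

```python
import hashlib

def hll_estimate(items, p=10):
    """Raw HyperLogLog estimate with m = 2^p registers (no corrections)."""
    m = 1 << p
    regs = [0] * m
    for x in items:
        h = int.from_bytes(
            hashlib.blake2b(str(x).encode(), digest_size=8).digest(), "big")
        j = h & (m - 1)                        # register index: low p bits
        w = h >> p                             # remaining 64 - p bits
        rank = (64 - p) - w.bit_length() + 1   # leading zeros + 1
        regs[j] = max(regs[j], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    return alpha * m * m / sum(2.0 ** -r for r in regs)

est = hll_estimate(range(100_000))
```

With p = 10 the sketch is about a kilobyte of registers, and the expected relative error is roughly 1.04 / sqrt(m), i.e. a few percent, matching the accuracy-versus-memory trade-off the abstract highlights.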