Proceedings Article

Improved histograms for selectivity estimation of range predicates

TLDR
This paper provides a taxonomy of histograms that captures all previously proposed histogram types and indicates many new possibilities. It introduces novel choices for several of the taxonomy dimensions and derives new histogram types by combining choices in effective ways.
Abstract
Many commercial database systems maintain histograms to summarize the contents of relations and permit efficient estimation of query result sizes and access plan costs. Although several types of histograms have been proposed in the past, there has never been a systematic study of all histogram aspects, the available choices for each aspect, and the impact of such choices on histogram effectiveness. In this paper, we provide a taxonomy of histograms that captures all previously proposed histogram types and indicates many new possibilities. We introduce novel choices for several of the taxonomy dimensions, and derive new histogram types by combining choices in effective ways. We also show how sampling techniques can be used to reduce the cost of histogram construction. Finally, we present results from an empirical study of the proposed histogram types used in selectivity estimation of range predicates and identify the histogram types that have the best overall performance.
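
To make the idea of histogram-based selectivity estimation concrete, here is a minimal Python sketch. It assumes a classical equi-depth histogram and a uniform spread of values inside each bucket; it is not one of the new histogram types derived in the paper, and the function names `build_equi_depth` and `estimate_range_selectivity` are hypothetical, chosen only for illustration.

```python
def build_equi_depth(values, num_buckets):
    """Build (low, high, count) buckets holding roughly equal numbers of values.

    Assumption: a plain equi-depth partitioning, used here only as an example.
    """
    data = sorted(values)
    n = len(data)
    buckets = []
    for i in range(num_buckets):
        start = i * n // num_buckets
        end = (i + 1) * n // num_buckets
        if start == end:
            continue  # more buckets requested than values available
        chunk = data[start:end]
        buckets.append((chunk[0], chunk[-1], len(chunk)))
    return buckets


def estimate_range_selectivity(buckets, lo, hi):
    """Estimate the fraction of values v with lo <= v <= hi.

    Assumption: values are spread uniformly between a bucket's low and
    high boundaries, so a partial overlap contributes proportionally.
    """
    total = sum(count for _, _, count in buckets)
    if total == 0:
        return 0.0
    estimate = 0.0
    for b_lo, b_hi, count in buckets:
        overlap_lo = max(lo, b_lo)
        overlap_hi = min(hi, b_hi)
        if overlap_lo > overlap_hi:
            continue  # the query range misses this bucket entirely
        if b_hi == b_lo:
            estimate += count  # single-valued bucket that lies inside the range
        else:
            estimate += count * (overlap_hi - overlap_lo) / (b_hi - b_lo)
    return estimate / total


buckets = build_equi_depth(range(100), 4)
print(estimate_range_selectivity(buckets, 25, 49))
```

The same buckets could be built from a random sample of the column rather than the full data, which is the general direction the abstract points to for reducing construction cost; how the paper actually does this is not shown here.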


Citations
Book

Data Streams: Models and Algorithms

TL;DR: This volume covers mining aspects of data streams comprehensively: each contributed chapter contains a survey on the topic, the key ideas in the field for that particular topic, and future research directions.
Proceedings Article

An overview of query optimization in relational systems

TL;DR: The goal of this article is not to be comprehensive, but rather to explain the foundations and present samplings of significant work in this area of query optimization.
Proceedings Article

Space-efficient online computation of quantile summaries

TL;DR: The actual space bounds obtained on experimental data are significantly better than the worst case guarantees of the algorithm as well as the observed space requirements of earlier algorithms.
Journal Article

Approximate Query Processing Using Wavelets

TL;DR: The use of multi-dimensional wavelets is proposed as an effective tool for general-purpose approximate query processing in modern, high-dimensional applications, and a novel wavelet decomposition algorithm is proposed that can build wavelet-coefficient synopses of the data in an I/O-efficient manner.

An Overview of Query Optimization in Relational Systems (paper)

TL;DR: There has been extensive work in query optimization since the early 1970s, and it is hard to capture the breadth and depth of this large body of work in a short article.
References
Book

A practical guide to splines

Carl de Boor
TL;DR: This book presents those parts of the theory which are especially useful in calculations and stresses the representation of splines as linear combinations of B-splines as well as specific approximation methods, interpolation, smoothing and least-squares approximation, the solution of an ordinary differential equation by collocation, curve fitting, and surface fitting.
Book Chapter

Probability Inequalities for sums of Bounded Random Variables

TL;DR: In this article, upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt; the bounds are extended to certain sums of dependent random variables such as U statistics.
Journal Article

The Kolmogorov-Smirnov Test for Goodness of Fit

TL;DR: In this paper, the maximum difference between an empirical and a hypothetical cumulative distribution is calculated, and confidence limits for a cumulative distribution are described, showing that the test is superior to the chi-square test.
Proceedings Article

Access path selection in a relational database management system

TL;DR: System R is an experimental database management system developed to carry out research on the relational model of data. Given a user specification of desired data as a boolean expression of predicates, it chooses access paths for both simple (single-relation) queries and complex queries (such as joins).