Proceedings Article
Improved histograms for selectivity estimation of range predicates
Viswanath Poosala, Peter J. Haas, Yannis Ioannidis, Eugene J. Shekita
Vol. 25, Iss. 2, pp. 294-305
TLDR
A taxonomy of histograms is provided that captures all previously proposed histogram types and indicates many new possibilities. The paper introduces novel choices for several of the taxonomy dimensions and derives new histogram types by combining choices in effective ways.
Abstract:
Many commercial database systems maintain histograms to summarize the contents of relations and permit efficient estimation of query result sizes and access plan costs. Although several types of histograms have been proposed in the past, there has never been a systematic study of all histogram aspects, the available choices for each aspect, and the impact of such choices on histogram effectiveness. In this paper, we provide a taxonomy of histograms that captures all previously proposed histogram types and indicates many new possibilities. We introduce novel choices for several of the taxonomy dimensions, and derive new histogram types by combining choices in effective ways. We also show how sampling techniques can be used to reduce the cost of histogram construction. Finally, we present results from an empirical study of the proposed histogram types used in selectivity estimation of range predicates and identify the histogram types that have the best overall performance.
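The abstract does not spell out a construction or estimation algorithm, so the following is only a minimal illustrative sketch, not the paper's method. It builds a classic equi-depth histogram (one of the long-established histogram types such a taxonomy would cover), optionally from a uniform random sample of the column, and estimates the selectivity of a range predicate lo <= X <= hi by assuming values are spread continuously and uniformly within each bucket. The names `Bucket`, `build_equi_depth`, and `estimate_range_selectivity`, and the uniform-within-bucket assumption, are hypothetical choices for illustration; the histogram types the paper ultimately recommends may partition and approximate differently.

```python
import random
from dataclasses import dataclass


@dataclass
class Bucket:
    low: float       # smallest (sampled) value assigned to this bucket
    high: float      # largest (sampled) value assigned to this bucket
    fraction: float  # fraction of (sampled) tuples that fall in this bucket


def build_equi_depth(values, num_buckets, sample_size=None, seed=None):
    """Hypothetical helper: build an equi-depth histogram.

    If sample_size is given, the histogram is built from a uniform random
    sample drawn without replacement, so only the sample must be sorted.
    """
    data = list(values)
    if sample_size is not None and sample_size < len(data):
        rng = random.Random(seed)
        data = rng.sample(data, sample_size)
    data.sort()
    n = len(data)
    buckets = []
    for b in range(num_buckets):
        # Contiguous slices of the sorted data with (nearly) equal counts.
        start = b * n // num_buckets
        end = (b + 1) * n // num_buckets
        if start == end:
            continue  # more buckets than values: skip empty slices
        chunk = data[start:end]
        buckets.append(Bucket(chunk[0], chunk[-1], len(chunk) / n))
    return buckets


def estimate_range_selectivity(buckets, lo, hi):
    """Estimate the fraction of tuples with lo <= X <= hi.

    Assumption (illustrative only): values are continuous and uniformly
    spread between each bucket's low and high bounds.
    """
    total = 0.0
    for bkt in buckets:
        if hi < bkt.low or lo > bkt.high:
            continue  # bucket lies entirely outside the predicate range
        if bkt.high == bkt.low:
            total += bkt.fraction  # single-valued bucket is fully covered
            continue
        overlap = min(hi, bkt.high) - max(lo, bkt.low)
        total += bkt.fraction * overlap / (bkt.high - bkt.low)
    return total


if __name__ == "__main__":
    column = list(range(1000))
    exact = build_equi_depth(column, num_buckets=10)
    sampled = build_equi_depth(column, num_buckets=10, sample_size=200, seed=42)
    print(estimate_range_selectivity(exact, 0, 499))
    print(estimate_range_selectivity(sampled, 0, 499))
```

On the 1,000 distinct integers 0 through 999 with 10 buckets, the full-data histogram estimates about 0.5 for 0 <= X <= 499, which is the true selectivity; the sampled variant sorts only 200 values, trading some accuracy for cheaper construction. Note that under the continuous-value assumption a point predicate (lo == hi) inside a multi-valued bucket is estimated as 0, so equality predicates would need a different approximation.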