Statistical estimators for relational algebra expressions

doi:10.1145/308386.308455

Proceedings Article


Wen-Chi Hou, +2 more

- pp 276-287


TLDR

This paper designs a sampling plan based on the cluster sampling method to improve the utilization of sampled data and to reduce the cost of sampling, and proposes consistent and unbiased estimators for arbitrary COUNT(E) type queries.

Abstract:

Present database systems process all the data related to a query before giving out responses. As a result, the size of the data to be processed becomes excessive for real-time/time-constrained environments. A new methodology is needed to cut down systematically the time to process the data involved in processing the query. To this end, we propose to use data samples and construct an approximate synthetic response to a given query. In this paper, we consider only COUNT(E) type queries, where E is an arbitrary relational algebra expression. We make no assumptions about the distribution of attribute values and ordering of tuples in the input relations, and propose consistent and unbiased estimators for arbitrary COUNT(E) type queries. We design a sampling plan based on the cluster sampling method to improve the utilization of sampled data and to reduce the cost of sampling. We also evaluate the performance of the proposed estimators.
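To make the idea of estimating COUNT(E) from cluster samples concrete, the following is a minimal sketch, not the estimator defined in the paper. It assumes the simplest case, where E is a single selection over one relation, and treats each disk page as a cluster. It uses the textbook scale-up estimator for a population total under equal-probability sampling of clusters without replacement. The names scale_up_estimate, estimate_count, pages, and predicate are hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, Iterable, Sequence, TypeVar

T = TypeVar("T")


def scale_up_estimate(
    pages: Sequence[Sequence[T]],
    predicate: Callable[[T], bool],
    sampled: Iterable[int],
) -> float:
    """Scale the number of qualifying tuples on the sampled pages up to the
    whole relation: (total pages / sampled pages) * hits."""
    sampled = list(sampled)
    if not sampled:
        raise ValueError("at least one page must be sampled")
    # Every tuple on a sampled page is inspected, so a page read is fully used.
    hits = sum(1 for i in sampled for t in pages[i] if predicate(t))
    return len(pages) * hits / len(sampled)


def estimate_count(
    pages: Sequence[Sequence[T]],
    predicate: Callable[[T], bool],
    n_pages: int,
    rng: random.Random,
) -> float:
    """Draw n_pages pages uniformly at random without replacement and return
    the scale-up estimate of the number of tuples satisfying predicate."""
    if not 1 <= n_pages <= len(pages):
        raise ValueError("n_pages must be between 1 and the number of pages")
    sampled = rng.sample(range(len(pages)), n_pages)
    return scale_up_estimate(pages, predicate, sampled)


if __name__ == "__main__":
    # Toy relation: 1000 integers stored 10 per page (assumed layout).
    relation = list(range(1000))
    pages = [relation[i:i + 10] for i in range(0, len(relation), 10)]
    estimate = estimate_count(pages, lambda x: x % 7 == 0, 20, random.Random(42))
    print(f"estimated count: {estimate:.1f}")
```

Because every page has the same chance of being drawn, the average of this estimate over all possible page samples equals the true count, which is the sense of "unbiased" used above. Reading whole pages rather than individual tuples reflects the motivation given for cluster sampling: one I/O yields many sampled tuples. Handling joins and other operators in E requires the more general construction the paper proposes, which this sketch does not attempt.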



