Proceedings Article

Statistical estimators for relational algebra expressions

TL;DR
This paper designs a sampling plan based on the cluster sampling method to improve the utilization of sampled data and to reduce the cost of sampling, and proposes consistent and unbiased estimators for arbitrary COUNT(E) type queries.
Abstract
Current database systems process all the data related to a query before returning a response. As a result, the volume of data to be processed becomes excessive for real-time and time-constrained environments. A new methodology is needed to systematically reduce the time spent processing the data a query involves. To this end, we propose using data samples to construct an approximate synthetic response to a given query. In this paper, we consider only COUNT(E) type queries, where E is an arbitrary relational algebra expression. We make no assumptions about the distribution of attribute values or the ordering of tuples in the input relations, and we propose consistent and unbiased estimators for arbitrary COUNT(E) type queries. We design a sampling plan based on the cluster sampling method to improve the utilization of sampled data and to reduce the cost of sampling. We also evaluate the performance of the proposed estimators.
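
The abstract does not spell out the estimators themselves. As a hedged illustration only, the sketch below shows two textbook unbiased COUNT estimators of the general kind described: a cluster-sampling estimator for COUNT(σ_p(R)) and a sample-based estimator for COUNT(R ⋈ S). These are standard inverse-inclusion-probability (Horvitz-Thompson style) estimators, not the authors' own. The function names, the block-list storage model, and the parameters `m`, `n_r`, and `n_s` are hypothetical assumptions.

```python
import random
from collections import Counter
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def cluster_count_estimate(
    blocks: Sequence[Sequence[T]],
    predicate: Callable[[T], bool],
    m: int,
    rng: random.Random,
) -> float:
    """Hypothetical sketch: estimate COUNT(sigma_p(R)) for R stored as M blocks.

    Draws m of the M blocks (clusters) uniformly without replacement and scans
    every tuple in each drawn block. Each block is drawn with probability m/M,
    so scaling the qualifying count by M/m gives an unbiased estimate.
    """
    M = len(blocks)
    if not 1 <= m <= M:
        raise ValueError("m must be between 1 and the number of blocks")
    drawn = rng.sample(range(M), m)
    hits = sum(1 for i in drawn for t in blocks[i] if predicate(t))
    return hits * M / m


def join_count_estimate(
    r: Sequence[T],
    s: Sequence[U],
    key_r: Callable[[T], Hashable],
    key_s: Callable[[U], Hashable],
    n_r: int,
    n_s: int,
    rng: random.Random,
) -> float:
    """Hypothetical sketch: estimate COUNT(R join S) where key_r(r) == key_s(s).

    Takes independent simple random samples without replacement: n_r tuples of
    R and n_s tuples of S. A given pair (r, s) lands in the sampled cross
    product with probability (n_r / |R|) * (n_s / |S|), so dividing the
    matched-pair count by that probability gives an unbiased estimate.
    """
    sample_r = rng.sample(r, n_r)
    sample_s = rng.sample(s, n_s)
    key_counts = Counter(key_s(t) for t in sample_s)  # missing keys count as 0
    matches = sum(key_counts[key_r(t)] for t in sample_r)
    return matches * len(r) * len(s) / (n_r * n_s)


if __name__ == "__main__":
    # Ten hypothetical blocks of 100 integer "tuples" each.
    blocks = [list(range(100 * i, 100 * i + 100)) for i in range(10)]
    print(cluster_count_estimate(blocks, lambda t: t % 7 == 0, 3, random.Random(0)))
    r, s = list(range(200)), list(range(50))
    key = lambda t: t % 10
    print(join_count_estimate(r, s, key, key, 40, 10, random.Random(0)))
```

When every block or tuple is sampled (m = M, or n_r = |R| and n_s = |S|), both sketches return the exact count. Reading whole blocks means each sampled I/O yields many tuples, which is the general cost argument for cluster sampling that the abstract makes.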

Citations
Journal Article

Query evaluation techniques for large databases

TL;DR: This survey describes a wide array of practical query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
Proceedings Article

Online aggregation

TL;DR: In this article, the authors propose an online aggregation interface that allows users to both observe the progress of their aggregation queries and control execution on the fly, and present a suite of techniques that extend a database system to meet these requirements.
Book

Maintenance of materialized views: problems, techniques, and applications

TL;DR: This chapter contains sections titled: Introduction, The Idea Behind View Maintenance, Using Full Information, Using Partial Information, Open Problems, Acknowledgments.
Journal Article

Estimating the Number of Species: A Review

TL;DR: In this paper, the problem of estimating the number of kinds (species) in a population of animals and plants is discussed. The focus is not on estimating the relative sizes of the classes but on estimating C, the number of classes itself.
Journal Article

Ripple joins for online aggregation

TL;DR: It is shown how ripple joins can be implemented in an existing DBMS using iterators, and an overview of the methods used to compute confidence intervals and to adaptively optimize the ripple join “aspect-ratio” parameters are given.
References
Proceedings Article

Simple Random Sampling from Relational Databases

Frank Olken et al.
TL;DR: This paper discusses sampling as a fundamental operation for the auditing and statistical analysis of large databases, and shows that samples of relational queries can often be computed for a small fraction of the effort of computing the entire relational query.
Journal Article

Antisampling for Estimation: An Overview

TL;DR: This approach is a comprehensive attempt (apparently the first) to estimate statistics without any sampling, using "antisampling" techniques that are analogous to sampling techniques and exhibit similar estimation accuracy, but that run much faster than sampling on large computer databases.