Proceedings Article
Statistical estimators for relational algebra expressions
Wen-Chi Hou, Gultekin Ozsoyoglu, Baldeo K. Taneja
pp. 276-287
TL;DR: This paper designs a sampling plan based on the cluster sampling method to improve the utilization of sampled data and to reduce the cost of sampling, and proposes consistent and unbiased estimators for arbitrary COUNT(E)-type queries.
Abstract:
Present database systems process all the data related to a query before returning a response. As a result, the amount of data to be processed becomes excessive for real-time and time-constrained environments. A new methodology is needed to systematically cut down the time spent processing the data involved in a query. To this end, we propose to use data samples and construct an approximate synthetic response to a given query.

In this paper, we consider only COUNT(E)-type queries, where E is an arbitrary relational algebra expression. We make no assumptions about the distribution of attribute values or the ordering of tuples in the input relations, and propose consistent and unbiased estimators for arbitrary COUNT(E)-type queries. We design a sampling plan based on the cluster sampling method to improve the utilization of sampled data and to reduce the cost of sampling. We also evaluate the performance of the proposed estimators.
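The abstract combines two ideas: scaling a count observed in a sample up to an estimate of COUNT(E), and sampling whole disk blocks (clusters) instead of individual tuples. Both can be illustrated for the simplest case, where E is a single selection σ_p(R). The sketch below is a textbook illustration under that assumption, not the paper's estimators, which cover arbitrary relational algebra expressions. The function names, the block layout, and the predicate are all hypothetical.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def srs_count_estimate(
    relation: Sequence[T], pred: Callable[[T], bool], n: int, rng: random.Random
) -> float:
    """Estimate COUNT(sigma_pred(R)) from a simple random sample of n tuples.

    Tuples are drawn without replacement; N * k / n is unbiased because each
    tuple lands in the sample with probability n / N.
    """
    N = len(relation)
    sample = rng.sample(relation, n)
    k = sum(1 for t in sample if pred(t))
    return N * k / n


def cluster_count_estimate(
    blocks: Sequence[Sequence[T]], pred: Callable[[T], bool], m: int, rng: random.Random
) -> float:
    """Estimate the same COUNT by sampling m of the M blocks (clusters).

    Every tuple in a sampled block is inspected, so one block read yields many
    sampled tuples; M * y / m is unbiased because each block is chosen with
    probability m / M.
    """
    M = len(blocks)
    chosen = rng.sample(range(M), m)
    y = sum(1 for i in chosen for t in blocks[i] if pred(t))
    return M * y / m


if __name__ == "__main__":
    rng = random.Random(42)
    relation = list(range(1000))                               # hypothetical relation R
    blocks = [relation[i:i + 20] for i in range(0, 1000, 20)]  # hypothetical layout: 50 blocks of 20 tuples
    is_multiple_of_4 = lambda t: t % 4 == 0                    # true COUNT is 250
    print(srs_count_estimate(relation, is_multiple_of_4, 100, rng))
    print(cluster_count_estimate(blocks, is_multiple_of_4, 10, rng))
```

In this toy layout the cluster estimate is exact, because every block holds exactly five qualifying tuples. When qualifying tuples are concentrated in a few blocks, cluster sampling has higher variance per sampled tuple. That is the trade-off against its lower I/O cost.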
Citations
Journal Article
Query evaluation techniques for large databases
TL;DR: This survey describes a wide array of practical query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
Proceedings Article
Online aggregation
TL;DR: In this article, the authors propose an online aggregation interface that allows users to both observe the progress of their aggregation queries and control execution on the fly, and present a suite of techniques that extend a database system to meet these requirements.
Book
Maintenance of materialized views: problems, techniques, and applications
TL;DR: This chapter contains sections titled: Introduction, The Idea Behind View Maintenance, Using Full Information, Using Partial Information, Open Problems, Acknowledgments.
Journal Article
Estimating the Number of Species: A Review
John Bunge, M. Fitzpatrick
TL;DR: In this paper, the problem of estimating the number of kinds in a population of animals and plants is discussed. The focus is not on estimating the relative sizes of the classes but on estimating C, the number of classes, itself.
Journal Article
Ripple joins for online aggregation
TL;DR: It is shown how ripple joins can be implemented in an existing DBMS using iterators, and an overview is given of the methods used to compute confidence intervals and to adaptively optimize the ripple join “aspect-ratio” parameters.
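The online aggregation and ripple join entries both rely on a running estimate that improves as more randomly ordered tuples are read. The sketch below shows the COUNT estimate for a square (1:1 aspect ratio) ripple join over an equijoin on a single key. It is a simplified, hypothetical illustration and not the cited papers' implementation. It assumes both inputs fit in memory and can be shuffled, stops when the smaller input is exhausted, and omits confidence intervals and adaptive aspect ratios. After n steps each (r, s) pair has been seen with probability n²/(|R|·|S|), so scaling the observed matches by the inverse of that probability gives an unbiased estimate.

```python
import random
from collections import Counter
from typing import Hashable, Iterator, Sequence, Tuple


def ripple_join_count(
    r_keys: Sequence[Hashable], s_keys: Sequence[Hashable], rng: random.Random
) -> Iterator[Tuple[int, float]]:
    """Square (1:1 aspect ratio) ripple join over an equijoin key.

    Each step reads one new tuple from R and one from S, both in random order,
    joins each new tuple with every tuple already read from the other relation,
    and yields (n, estimate) where estimate = |R| * |S| * matches / n**2.
    """
    r_order = list(r_keys)
    s_order = list(s_keys)
    rng.shuffle(r_order)
    rng.shuffle(s_order)
    seen_r: Counter = Counter()  # join-key frequencies among R tuples read so far
    seen_s: Counter = Counter()  # join-key frequencies among S tuples read so far
    matches = 0                  # join pairs found among the tuples read so far
    for n in range(1, min(len(r_order), len(s_order)) + 1):
        r, s = r_order[n - 1], s_order[n - 1]
        matches += seen_s[r]     # new R tuple against S tuples read in earlier steps
        seen_r[r] += 1
        matches += seen_r[s]     # new S tuple against all R tuples read, including r
        seen_s[s] += 1
        yield n, len(r_order) * len(s_order) * matches / (n * n)


if __name__ == "__main__":
    rng = random.Random(7)
    R = [rng.randrange(100) for _ in range(5000)]  # hypothetical join-key columns
    S = [rng.randrange(100) for _ in range(5000)]
    for n, est in ripple_join_count(R, S, rng):
        if n % 1000 == 0:
            print(n, round(est))
```

When both inputs have the same size and are read to the end, the final estimate equals the exact join size, since the scale factor becomes 1.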
References
Proceedings Article
Simple Random Sampling from Relational Databases
Frank Olken, Doron Rotem
TL;DR: Sampling is discussed as a fundamental operation for the auditing and statistical analysis of large databases, and it is shown how samples of relational queries can often be computed for a small fraction of the effort of computing the entire relational query.
Journal Article
Antisampling for Estimation: An Overview
TL;DR: This approach is a comprehensive attempt (apparently the first) to estimate statistics without any sampling, using "antisampling" techniques that are analogous to sampling techniques and exhibit similar estimation accuracy, but can run much faster than sampling on large computer databases.