MCDB: a Monte Carlo approach to managing uncertain data

doi:10.1145/1376616.1376686

Proceedings Article

Ravi Jampani, +5 more

- pp 687-700

TLDR

MCDB is introduced, a system for managing uncertain data that is based on a Monte Carlo approach, which can easily handle arbitrary joint probability distributions over discrete or continuous attributes, arbitrarily complex SQL queries, and arbitrary functionals of the query-result distribution such as means, variances, and quantiles.

Abstract:

To deal with data uncertainty, existing probabilistic database systems augment tuples with attribute-level or tuple-level probability values, which are loaded into the database along with the data itself. This approach can severely limit the system's ability to gracefully handle complex or unforeseen types of uncertainty, and does not permit the uncertainty model to be dynamically parameterized according to the current state of the database. We introduce MCDB, a system for managing uncertain data that is based on a Monte Carlo approach. MCDB represents uncertainty via "VG functions," which are used to pseudorandomly generate realized values for uncertain attributes. VG functions can be parameterized on the results of SQL queries over "parameter tables" that are stored in the database, facilitating what-if analyses. By storing parameters, and not probabilities, and by estimating, rather than exactly computing, the probability distribution over possible query answers, MCDB avoids many of the limitations of prior systems. For example, MCDB can easily handle arbitrary joint probability distributions over discrete or continuous attributes, arbitrarily complex SQL queries, and arbitrary functionals of the query-result distribution such as means, variances, and quantiles. To achieve good performance, MCDB uses novel query processing techniques, executing a query plan exactly once, but over "tuple bundles" instead of ordinary tuples. Experiments indicate that our enhanced functionality can be obtained with acceptable overheads relative to traditional systems.
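
The ideas in the abstract can be illustrated with a small, self-contained sketch. It is not MCDB's implementation: the parameter table, the `normal_vg` function, the example query, and all names below are hypothetical, chosen only to show the flow the abstract describes. A VG function draws pseudorandom realizations of an uncertain attribute from stored parameters, each uncertain value is kept as a bundle of N realizations, the query is evaluated in a single pass over those bundles, and functionals such as the mean, variance, and quantiles are estimated from the resulting N query answers.

```python
import random
import statistics

# Hypothetical parameter table: customer -> (mean, std dev) of an uncertain
# order amount. Parameters are stored, not probabilities.
PARAM_TABLE = {
    "alice": (100.0, 15.0),
    "bob": (250.0, 40.0),
    "carol": (80.0, 5.0),
}


def normal_vg(rng, mean, std, n):
    """Toy VG function: n pseudorandom realizations of one uncertain attribute."""
    return [rng.gauss(mean, std) for _ in range(n)]


def build_bundles(params, n, seed=0):
    """Attach a bundle of n realizations to every row of the parameter table."""
    rng = random.Random(seed)
    return {key: normal_vg(rng, m, s, n) for key, (m, s) in params.items()}


def total_over_threshold(bundles, threshold):
    """Roughly: SELECT SUM(amount) WHERE amount > threshold.

    The data is scanned once; position i of every bundle belongs to
    Monte Carlo repetition i, so the result is one total per repetition.
    """
    n = len(next(iter(bundles.values())))
    totals = [0.0] * n
    for values in bundles.values():
        for i, v in enumerate(values):
            if v > threshold:
                totals[i] += v
    return totals


def summarize(samples):
    """Estimate functionals of the query-result distribution from samples."""
    ordered = sorted(samples)
    last = len(ordered) - 1
    return {
        "mean": statistics.fmean(samples),
        "variance": statistics.variance(samples),
        "q05": ordered[int(0.05 * last)],
        "q95": ordered[int(0.95 * last)],
    }


if __name__ == "__main__":
    bundles = build_bundles(PARAM_TABLE, n=1000)
    totals = total_over_threshold(bundles, threshold=90.0)
    print(summarize(totals))
```

Changing an entry in `PARAM_TABLE` and rerunning is a crude stand-in for the what-if analyses mentioned above: the uncertainty model is re-parameterized from stored data without loading any probability values.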

Citations

Probabilistic Databases

Tuffy: scaling up statistical inference in Markov logic networks using an RDBMS

k-nearest neighbors in uncertain graphs

Probabilistic databases: diamonds in the dirt

Related Papers (5)

Efficient query evaluation on probabilistic databases

Efficient Top-k Query Evaluation on Probabilistic Data

ULDBs: databases with uncertainty and lineage

Evaluating probabilistic queries over imprecise data

Trio: a system for data, uncertainty, and lineage