Proceedings Article

MCDB: a monte carlo approach to managing uncertain data

TLDR
MCDB is a system for managing uncertain data that is based on a Monte Carlo approach. It can easily handle arbitrary joint probability distributions over discrete or continuous attributes, arbitrarily complex SQL queries, and arbitrary functionals of the query-result distribution such as means, variances, and quantiles.
Abstract
To deal with data uncertainty, existing probabilistic database systems augment tuples with attribute-level or tuple-level probability values, which are loaded into the database along with the data itself. This approach can severely limit the system's ability to gracefully handle complex or unforeseen types of uncertainty, and does not permit the uncertainty model to be dynamically parameterized according to the current state of the database. We introduce MCDB, a system for managing uncertain data that is based on a Monte Carlo approach. MCDB represents uncertainty via "VG functions," which are used to pseudorandomly generate realized values for uncertain attributes. VG functions can be parameterized on the results of SQL queries over "parameter tables" that are stored in the database, facilitating what-if analyses. By storing parameters, and not probabilities, and by estimating, rather than exactly computing, the probability distribution over possible query answers, MCDB avoids many of the limitations of prior systems. For example, MCDB can easily handle arbitrary joint probability distributions over discrete or continuous attributes, arbitrarily complex SQL queries, and arbitrary functionals of the query-result distribution such as means, variances, and quantiles. To achieve good performance, MCDB uses novel query processing techniques, executing a query plan exactly once, but over "tuple bundles" instead of ordinary tuples. Experiments indicate that our enhanced functionality can be obtained with acceptable overheads relative to traditional systems.
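The abstract's core mechanism can be illustrated with a small Python sketch. It covers VG functions that pseudorandomly instantiate uncertain attributes from stored parameters, a query executed once over "tuple bundles," and Monte Carlo estimates of functionals of the answer distribution. This is not MCDB's implementation or API. The following are all hypothetical: the parameter table SALES_PARAMS, the choice of a Normal distribution, the function names normal_vg, build_bundles, total_sales, and summarize, and the SUM query. A plain dict stands in for the SQL query over a parameter table that MCDB would use to parameterize a VG function.

```python
import random
import statistics

# Hypothetical parameter table: per-customer (mean, std. dev.) of next-year sales.
# Names and numbers are illustrative only; in MCDB these parameters would come
# from a SQL query over a stored parameter table.
SALES_PARAMS = {
    "alice": (100.0, 10.0),
    "bob": (250.0, 40.0),
    "carol": (80.0, 5.0),
}


def normal_vg(mean, std, n, rng):
    """Hypothetical VG function: return n pseudorandom realizations of one
    uncertain attribute, drawn from Normal(mean, std)."""
    return [rng.gauss(mean, std) for _ in range(n)]


def build_bundles(params, n, seed=42):
    """Instantiate each uncertain tuple once as a 'tuple bundle': the
    deterministic key plus a list of n realized values, one per Monte Carlo
    repetition. A fixed seed makes the pseudorandom stream reproducible."""
    rng = random.Random(seed)
    return {cust: normal_vg(m, s, n, rng) for cust, (m, s) in params.items()}


def total_sales(bundles, n):
    """Evaluate SELECT SUM(sales) in a single pass over the bundles,
    producing one query answer per Monte Carlo repetition i."""
    return [sum(vals[i] for vals in bundles.values()) for i in range(n)]


def summarize(samples):
    """Estimate functionals of the query-result distribution from samples."""
    qs = statistics.quantiles(samples, n=20)  # 19 cut points: qs[0]=5%, qs[-1]=95%
    return {
        "mean": statistics.fmean(samples),
        "variance": statistics.variance(samples),
        "q05": qs[0],
        "q95": qs[-1],
    }


if __name__ == "__main__":
    N = 10_000
    bundles = build_bundles(SALES_PARAMS, N)
    answers = total_sales(bundles, N)
    print(summarize(answers))
```

Editing SALES_PARAMS and rerunning gives a crude what-if analysis. Storing parameters rather than probabilities is what makes that re-parameterization cheap. Increasing N tightens the estimates, since their standard error shrinks roughly as 1/sqrt(N), at the cost of proportionally more VG calls.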

Citations
Book

Probabilistic Databases

TL;DR: This book describes the foundations of managing data whose uncertainties are quantified as probabilities and presents fundamental theoretical results for query evaluation on probabilistic databases.
Journal Article

Tuffy: scaling up statistical inference in Markov logic networks using an RDBMS

TL;DR: This work presents Tuffy, a scalable Markov Logic Networks framework that achieves scalability through three novel contributions: a bottom-up approach to grounding, a hybrid architecture that performs AI-style local search efficiently using an RDBMS, and a theoretical insight that shows when the efficiency of stochastic local search can be improved.
Journal Article

k-nearest neighbors in uncertain graphs

TL;DR: The authors propose novel distance functions that extend well-known graph concepts, such as shortest paths, to uncertain graphs. These functions outperform previously used alternatives in identifying true neighbors in real-world biological data and scale to graphs with tens of millions of edges.
Posted Content

Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS

TL;DR: Tuffy uses a bottom-up approach to grounding that leverages the full power of the relational optimizer, and it performs AI-style local search efficiently using an RDBMS.
Journal Article

Probabilistic databases: diamonds in the dirt

TL;DR: Treasures abound from hidden facts found in imprecise data sets.
References
Book

An Introduction to Copulas

TL;DR: This book discusses the fundamental properties of copulas and some of their primary applications, which include the study of dependence and measures of association, and the construction of families of bivariate distributions.
Book

Approximation Theorems of Mathematical Statistics

TL;DR: This book covers the basic sample statistics, asymptotic theory in parametric inference, and the asymptotic relative efficiency of statistics.
Journal Article

Non-Uniform Random Variate Generation.

B. J. T. Morgan et al., 01 Sep 1988
TL;DR: This chapter reviews the main methods for generating random variables, vectors and processes in non-uniform random variate generation, and provides information on the expected time complexity of various algorithms before addressing modern topics such as indirectly specified distributions, random processes, and Markov chain methods.
Book

Non-uniform random variate generation

Luc Devroye
TL;DR: This book surveys the main methods in non-uniform random variate generation, giving the expected time complexity of various algorithms before addressing modern topics such as indirectly specified distributions, random processes, and Markov chain methods.
Book

Monte Carlo: Concepts, Algorithms, and Applications

TL;DR: This book presents the concepts, algorithms, and applications of Monte Carlo methods, including pseudorandom number generation and sampling from probability distributions.