Proceedings Article
MCDB: a Monte Carlo approach to managing uncertain data
Ravi Jampani, Fei Xu, Mingxi Wu, Luis Perez, Chris Jermaine, Peter J. Haas
pp. 687-700
TL;DR
MCDB is a system for managing uncertain data based on a Monte Carlo approach. It can easily handle arbitrary joint probability distributions over discrete or continuous attributes, arbitrarily complex SQL queries, and arbitrary functionals of the query-result distribution such as means, variances, and quantiles.
Abstract:
To deal with data uncertainty, existing probabilistic database systems augment tuples with attribute-level or tuple-level probability values, which are loaded into the database along with the data itself. This approach can severely limit the system's ability to gracefully handle complex or unforeseen types of uncertainty, and does not permit the uncertainty model to be dynamically parameterized according to the current state of the database. We introduce MCDB, a system for managing uncertain data that is based on a Monte Carlo approach. MCDB represents uncertainty via "VG functions," which are used to pseudorandomly generate realized values for uncertain attributes. VG functions can be parameterized on the results of SQL queries over "parameter tables" that are stored in the database, facilitating what-if analyses. By storing parameters, and not probabilities, and by estimating, rather than exactly computing, the probability distribution over possible query answers, MCDB avoids many of the limitations of prior systems. For example, MCDB can easily handle arbitrary joint probability distributions over discrete or continuous attributes, arbitrarily complex SQL queries, and arbitrary functionals of the query-result distribution such as means, variances, and quantiles. To achieve good performance, MCDB uses novel query processing techniques, executing a query plan exactly once, but over "tuple bundles" instead of ordinary tuples. Experiments indicate that our enhanced functionality can be obtained with acceptable overheads relative to traditional systems.
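The Python sketch below illustrates, under simplifying assumptions, how the pieces named in the abstract could fit together. A VG function draws realized values from parameters kept in a table. Each uncertain tuple is carried as a tuple bundle holding one value per Monte Carlo repetition. A query is evaluated in one pass over the bundles, and the per-repetition answers are used to estimate a mean, a variance, and a quantile. The parameter table, the choice of a normal distribution, and every name here (`PARAMS`, `normal_vg`, `TupleBundle`, `instantiate`, `sum_where`, `empirical_quantile`) are hypothetical illustrations, not MCDB's actual interface or implementation.

```python
import random
import statistics
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical "parameter table": per-customer (mean, std) of an uncertain
# spend attribute. Here it is a plain dict; in MCDB such parameters would come
# from SQL queries over parameter tables stored in the database.
PARAMS: Dict[str, Tuple[float, float]] = {
    "alice": (100.0, 10.0),
    "bob": (250.0, 40.0),
    "carol": (80.0, 5.0),
}


def normal_vg(mean: float, std: float, rng: random.Random) -> float:
    """Hypothetical VG function: pseudorandomly generates one realized value."""
    return rng.gauss(mean, std)


@dataclass
class TupleBundle:
    """A tuple whose uncertain attribute stores one value per Monte Carlo repetition."""
    key: str
    values: List[float]  # values[i] is the attribute value in the i-th sampled database


def instantiate(params: Dict[str, Tuple[float, float]], n_reps: int,
                seed: int = 0) -> List[TupleBundle]:
    """Call the VG function n_reps times per tuple, building one bundle per tuple."""
    rng = random.Random(seed)  # seeded so the sampled databases are reproducible
    return [
        TupleBundle(key, [normal_vg(m, s, rng) for _ in range(n_reps)])
        for key, (m, s) in params.items()
    ]


def sum_where(bundles: List[TupleBundle], n_reps: int,
              threshold: Optional[float] = None) -> List[float]:
    """SELECT SUM(spend) WHERE spend > threshold, evaluated in one pass over the bundles.

    The predicate is checked per repetition, so a tuple may be present in some
    sampled databases and absent in others. Returns one query result per repetition.
    """
    totals = [0.0] * n_reps
    for b in bundles:
        for i, v in enumerate(b.values):
            if threshold is None or v > threshold:
                totals[i] += v
    return totals


def empirical_quantile(samples: List[float], q: float) -> float:
    """Simple order-statistic estimate of the q-quantile (0 <= q <= 1)."""
    ordered = sorted(samples)
    return ordered[min(int(q * len(ordered)), len(ordered) - 1)]


if __name__ == "__main__":
    n_reps = 1000
    bundles = instantiate(PARAMS, n_reps, seed=42)
    totals = sum_where(bundles, n_reps, threshold=90.0)
    print("mean     ", statistics.fmean(totals))
    print("variance ", statistics.variance(totals))
    print("95th pct ", empirical_quantile(totals, 0.95))
```

In this sketch, adding repetitions makes each bundle longer rather than triggering another query execution. That is a simplified analogue of the abstract's point about executing the plan once over tuple bundles. A real system would also need to handle joins, correlated attributes, and bundles that do not fit in memory.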