Showing papers on "Uncertain data published in 2013"

PDF

Open Access

Journal Article•DOI•

Adaptive Robust Optimization for the Security Constrained Unit Commitment Problem

[...]

Dimitris Bertsimas¹, Eugene Litvinov², Xu Andy Sun³, Jinye Zhao², Tongxin Zheng² - Show less +1 more•Institutions (3)

Massachusetts Institute of Technology¹, ISO New England², Georgia Institute of Technology³

01 Feb 2013-IEEE Transactions on Power Systems

TL;DR: In this paper, a two-stage adaptive robust unit commitment model for the security constrained unit commitment problem in the presence of nodal net injection uncertainty is proposed, which only requires a deterministic uncertainty set, rather than a hard-to-obtain probability distribution on the uncertain data.

...read moreread less

Abstract: Unit commitment, one of the most critical tasks in electric power system operations, faces new challenges as the supply and demand uncertainty increases dramatically due to the integration of variable generation resources such as wind power and price responsive demand. To meet these challenges, we propose a two-stage adaptive robust unit commitment model for the security constrained unit commitment problem in the presence of nodal net injection uncertainty. Compared to the conventional stochastic programming approach, the proposed model is more practical in that it only requires a deterministic uncertainty set, rather than a hard-to-obtain probability distribution on the uncertain data. The unit commitment solutions of the proposed model are robust against all possible realizations of the modeled uncertainty. We develop a practical solution methodology based on a combination of Benders decomposition type algorithm and the outer approximation technique. We present an extensive numerical study on the real-world large scale power system operated by the ISO New England. Computational results demonstrate the economic and operational advantages of our model over the traditional reserve adjustment approach.

...read moreread less

1,454 citations

Journal Article•DOI•

Maximum Likelihood Estimation from Uncertain Data in the Belief Function Framework

[...]

T. Denoeux

01 Jan 2013-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This work proposes a variant of the EM algorithm that iteratively maximizes the maximization of a generalized likelihood criterion, which can be interpreted as a degree of agreement between the statistical model and the uncertain observations.

...read moreread less

Abstract: We consider the problem of parameter estimation in statistical models in the case where data are uncertain and represented as belief functions. The proposed method is based on the maximization of a generalized likelihood criterion, which can be interpreted as a degree of agreement between the statistical model and the uncertain observations. We propose a variant of the EM algorithm that iteratively maximizes this criterion. As an illustration, the method is applied to uncertain data clustering using finite mixture models, in the cases of categorical and continuous attributes.

...read moreread less

249 citations

Journal Article•DOI•

Clustering Uncertain Data Based on Probability Distribution Similarity

[...]

Bin Jiang¹, Jian Pei¹, Yufei Tao², Xuemin Lin³•Institutions (3)

Simon Fraser University¹, The Chinese University of Hong Kong², University of New South Wales³

01 Apr 2013-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This work uses the well-known Kullback-Leibler divergence to measure similarity between uncertain objects in both the continuous and discrete cases, and integrates it into partitioning and density-based clustering methods to cluster uncertain objects.

...read moreread less

Abstract: Clustering on uncertain data, one of the essential tasks in mining uncertain data, posts significant challenges on both modeling similarity between uncertain objects and developing efficient computational methods The previous methods extend traditional partitioning clustering methods like $(k)$-means and density-based clustering methods like DBSCAN to uncertain data, thus rely on geometric distances between objects Such methods cannot handle uncertain objects that are geometrically indistinguishable, such as products with the same mean but very different variances in customer ratings Surprisingly, probability distributions, which are essential characteristics of uncertain objects, have not been considered in measuring similarity between uncertain objects In this paper, we systematically model uncertain objects in both continuous and discrete domains, where an uncertain object is modeled as a continuous and discrete random variable, respectively We use the well-known Kullback-Leibler divergence to measure similarity between uncertain objects in both the continuous and discrete cases, and integrate it into partitioning and density-based clustering methods to cluster uncertain objects Nevertheless, a naive implementation is very costly Particularly, computing exact KL divergence in the continuous case is very costly or even infeasible To tackle the problem, we estimate KL divergence in the continuous case by kernel density estimation and employ the fast Gauss transform technique to further speed up the computation Our extensive experiment results verify the effectiveness, efficiency, and scalability of our approaches

...read moreread less

149 citations

Journal Article•DOI•

Transdimensional inference in the geosciences

[...]

Malcolm Sambridge¹, Thomas Bodin², Kerry Gallagher³, Hrvoje Tkalčić¹•Institutions (3)

Australian National University¹, University of California², University of Rennes³

13 Feb 2013-Philosophical Transactions of the Royal Society A

TL;DR: Concepts of transdimensional inference are introduced to a general readership and illustrate with particular seismological examples.

...read moreread less

Abstract: Seismologists construct images of the Earth's interior structure using observations, derived from seismograms, collected at the surface. A common approach to such inverse problems is to build a single 'best' Earth model, in some sense. This is despite the fact that the observations by themselves often do not require, or even allow, a single best-fit Earth model to exist. Interpretation of optimal models can be fraught with difficulties, particularly when formal uncertainty estimates become heavily dependent on the regularization imposed. Similar issues occur across the physical sciences with model construction in ill-posed problems. An alternative approach is to embrace the non-uniqueness directly and employ an inference process based on parameter space sampling. Instead of seeking a best model within an optimization framework, one seeks an ensemble of solutions and derives properties of that ensemble for inspection. While this idea has itself been employed for more than 30 years, it is now receiving increasing attention in the geosciences. Recently, it has been shown that transdimensional and hierarchical sampling methods have some considerable benefits for problems involving multiple parameter types, uncertain data errors and/or uncertain model parametrizations, as are common in seismology. Rather than being forced to make decisions on parametrization, the level of data noise and the weights between data types in advance, as is often the case in an optimization framework, the choice can be informed by the data themselves. Despite the relatively high computational burden involved, the number of areas where sampling methods are now feasible is growing rapidly. The intention of this article is to introduce concepts of transdimensional inference to a general readership and illustrate with particular seismological examples. A growing body of references provide necessary detail.

...read moreread less

141 citations

Journal Article•DOI•

SVDD-based outlier detection on uncertain data

[...]

Bo Liu¹, Yanshan Xiao¹, Longbing Cao², Zhifeng Hao¹, Feiqi Deng³ - Show less +1 more•Institutions (3)

Guangdong University of Technology¹, University of Technology, Sydney², South China University of Technology³

01 Jan 2013-Knowledge and Information Systems

TL;DR: A new SVDD-based approach to detect outliers on uncertain data that outperforms state-of-art outlier detection techniques and reduces the contribution of the examples with the least confidence score on the construction of the decision boundary.

...read moreread less

Abstract: Outlier detection is an important problem that has been studied within diverse research areas and application domains. Most existing methods are based on the assumption that an example can be exactly categorized as either a normal class or an outlier. However, in many real-life applications, data are uncertain in nature due to various errors or partial completeness. These data uncertainty make the detection of outliers far more difficult than it is from clearly separable data. The key challenge of handling uncertain data in outlier detection is how to reduce the impact of uncertain data on the learned distinctive classi- fier. This paper proposes a new SVDD-based approach to detect outliers on uncertain data. The proposed approach operates in two steps. In the first step, a pseudo-training set is gen- erated by assigning a confidence score to each input example, which indicates the likelihood of an example tending normal class. In the second step, the generated confidence score is incorporated into the support vector data description training phase to construct a global

...read moreread less

127 citations

Proceedings Article•DOI•

Static analysis for probabilistic programs: inferring whole program properties from finitely many paths

[...]

Sriram Sankaranarayanan¹, Aleksandar Chakarov¹, Sumit Gulwani²•Institutions (2)

University of Colorado Boulder¹, Microsoft²

16 Jun 2013

TL;DR: A static analysis approach that provides guaranteed interval bounds on the values (assertion probabilities) of queries that seek the probabilities of assertions over program variables and demonstrates promising results on a suite of benchmarks including robotic manipulators and medical decision making programs.

...read moreread less

Abstract: We propose an approach for the static analysis of probabilistic programs that sense, manipulate, and control based on uncertain data. Examples include programs used in risk analysis, medical decision making and cyber-physical systems. Correctness properties of such programs take the form of queries that seek the probabilities of assertions over program variables. We present a static analysis approach that provides guaranteed interval bounds on the values (assertion probabilities) of such queries. First, we observe that for probabilistic programs, it is possible to conclude facts about the behavior of the entire program by choosing a finite, adequate set of its paths. We provide strategies for choosing such a set of paths and verifying its adequacy. The queries are evaluated over each path by a combination of symbolic execution and probabilistic volume-bound computations. Each path yields interval bounds that can be summed up with a "coverage" bound to yield an interval that encloses the probability of assertion for the program as a whole. We demonstrate promising results on a suite of benchmarks from many different sources including robotic manipulators and medical decision making programs.

...read moreread less

124 citations

Journal Article•DOI•

A survey of queries over uncertain data

[...]

Yijie Wang¹, Xiaoyong Li¹, Xiaoling Li¹, Yuan Wang¹•Institutions (1)

National University of Defense Technology¹

30 Apr 2013-Knowledge and Information Systems

TL;DR: This paper presents and analyzes several typical uncertain queries, such as skyline queries, top-$$k$$ queries, nearest-neighbor queries, aggregate queries, join queries, range queries, and threshold queries over uncertain data, and summarizes the main features of uncertain queries.

...read moreread less

Abstract: Uncertain data have already widely existed in many practical applications recently, such as sensor networks, RFID networks, location-based services, and mobile object management. Query processing over uncertain data as an important aspect of uncertain data management has received increasing attention in the field of database. Uncertain query processing poses inherent challenges and demands non-traditional techniques, due to the data uncertainty. This paper surveys this interesting and still evolving research area in current database community, so that readers can easily obtain an overview of the state-of-the-art techniques. We first provide an overview of data uncertainty, including uncertainty types, probability representation models, and sources of probabilities. We next outline the current major types of uncertain queries and summarize the main features of uncertain queries. Particularly, we present and analyze several typical uncertain queries in detail, such as skyline queries, top- $$k$$ queries, nearest-neighbor queries, aggregate queries, join queries, range queries, and threshold queries over uncertain data. Finally, we present many interesting research topics on uncertain queries that have not yet been explored.

...read moreread less

114 citations

Journal Article•DOI•

Clustering Large Probabilistic Graphs

[...]

George Kollios¹, Michalis Potamias, Evimaria Terzi¹•Institutions (1)

Boston University¹

01 Feb 2013-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A connection is established between the objective function and correlation clustering to propose practical approximation algorithms for the problem of clustering probabilistic graphs and show the practicality of the techniques using a large social network of Yahoo! users consisting of one billion edges.

...read moreread less

Abstract: We study the problem of clustering probabilistic graphs. Similar to the problem of clustering standard graphs, probabilistic graph clustering has numerous applications, such as finding complexes in probabilistic protein-protein interaction (PPI) networks and discovering groups of users in affiliation networks. We extend the edit-distance-based definition of graph clustering to probabilistic graphs. We establish a connection between our objective function and correlation clustering to propose practical approximation algorithms for our problem. A benefit of our approach is that our objective function is parameter-free. Therefore, the number of clusters is part of the output. We also develop methods for testing the statistical significance of the output clustering and study the case of noisy clusterings. Using a real protein-protein interaction network and ground-truth data, we show that our methods discover the correct number of clusters and identify established protein relationships. Finally, we show the practicality of our techniques using a large social network of Yahoo! users consisting of one billion edges.

...read moreread less

110 citations

Journal Article•DOI•

Common weights data envelopment analysis with uncertain data: A robust optimization approach

[...]

Hashem Omrani¹•Institutions (1)

Urmia University of Technology¹

01 Dec 2013-Computers & Industrial Engineering

TL;DR: The proposed robust DEA model is solved and the ideal solution is found for each decision making units (DMUs) by utilizing the goal programming technique.

...read moreread less

77 citations

Journal Article•DOI•

Robust sustainable bi-directional logistics network design under uncertainty

[...]

Vincenzo De Rosa¹, Marina Gebhard¹, Evi Hartmann¹, Jens Wollenweber•Institutions (1)

University of Erlangen-Nuremberg¹

01 Sep 2013-International Journal of Production Economics

TL;DR: In this paper, the authors study a strategic capacitated facility location problem with integrated bi-directional product flows through a network of multiple supply stages, including production allocations, uncertain data development, facility locations and flexible capacity adjustments.

...read moreread less

75 citations

Journal Article•DOI•

Modeling fuzzy capacitated p-hub center problem and a genetic algorithm solution

[...]

Mahdi Bashiri¹, Masoud Mirzaei¹, Marcus Randall²•Institutions (2)

Shahed University¹, Bond University²

01 Mar 2013-Applied Mathematical Modelling

TL;DR: In this paper, a hybrid approach to the p-hub center allocation problem is presented, in which the location of hub facilities is determined by both quantitative and qualitative parameters simultaneously, and fuzzy systems are used to cope with these conditions and they are used as the basis of this work.

...read moreread less

Journal Article•DOI•

Robust weighted vertex p-center model considering uncertain data: An application to emergency management

[...]

Chung-Cheng Jason Lu¹•Institutions (1)

National Chiao Tung University¹

01 Oct 2013-European Journal of Operational Research

TL;DR: A theorem is proposed that facilitates identification of the worst-case scenario for a given set of facility locations and the impact of the degree of data uncertainty on the selected performance measures and the tradeoff between solution quality and robustness is examined.

...read moreread less

Book Chapter•DOI•

Discovering Frequent Patterns from Uncertain Data Streams with Time-Fading and Landmark Models

[...]

Carson K. Leung¹, Alfredo Cuzzocrea², Fan Jiang¹•Institutions (2)

University of Manitoba¹, University of Calabria²

01 Jan 2013

TL;DR: This paper proposes mining algorithms that use the time-fading model and the landmark model to discover frequent patterns from streams of uncertain data.

...read moreread less

Abstract: Streams of data can be continuously generated by sensors in various real-life applications such as environment surveillance. Partially due to the inherited limitation of the sensors, data in these streams can be uncertain. To discover useful knowledge in the form of frequent patterns from streams of uncertain data, a few algorithms have been developed. They mostly use the sliding window model for processing and mining data streams. However, for some applications, other stream processing models such as the time-fading model and the landmark model are more appropriate. In this paper, we propose mining algorithms that use (i) the time-fading model and (ii) the landmark model to discover frequent patterns from streams of uncertain data.

...read moreread less

Journal Article•DOI•

Finding Probabilistic Prevalent Colocations in Spatially Uncertain Data Sets

[...]

Lizhen Wang¹, Pinping Wu¹, Hongmei Chen¹•Institutions (1)

Yunnan University¹

01 Apr 2013-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This paper proposes pruning strategies for candidates to reduce the amount of computation of the probabilistic participation index values and designs an improved dynamic programming algorithm for identifying candidates that is suitable for parallel computation, and approximate computation.

...read moreread less

Abstract: A spatial colocation pattern is a group of spatial features whose instances are frequently located together in geographic space. Discovering colocations has many useful applications. For example, colocated plant species discovered from plant distribution data sets can contribute to the analysis of plant geography, phytosociology studies, and plant protection recommendations. In this paper, we study the colocation mining problem in the context of uncertain data, as the data generated from a wide range of data sources are inherently uncertain. One straightforward method to mine the prevalent colocations in a spatially uncertain data set is to simply compute the expected participation index of a candidate and decide if it exceeds a minimum prevalence threshold. Although this definition has been widely adopted, it misses important information about the confidence which can be associated with the participation index of a colocation. We propose another definition, probabilistic prevalent colocations, trying to find all the colocations that are likely to be prevalent in a randomly generated possible world. Finding probabilistic prevalent colocations (PPCs) turn out to be difficult. First, we propose pruning strategies for candidates to reduce the amount of computation of the probabilistic participation index values. Next, we design an improved dynamic programming algorithm for identifying candidates. This algorithm is suitable for parallel computation, and approximate computation. Finally, the effectiveness and efficiency of the methods proposed as well as the pruning strategies and the optimization techniques are verified by extensive experiments with “real + synthetic” spatially uncertain data sets.

...read moreread less

Journal Article•DOI•

Complex event processing over distributed probabilistic event streams

[...]

Y. H. Wang¹, K. Cao¹, X. M. Zhang¹•Institutions (1)

Hunan University¹

01 Dec 2013-Computers & Mathematics With Applications

TL;DR: A query plan based method using tree data structure is used to process hierarchical complex event from distributed event streams and query plan optimization is proposed based on query optimization technology of probabilistic databases.

...read moreread less

Abstract: With the rapid development of Internet of Things (IoT), enormous events are produced every day. Complex Event Processing (CEP), which can be used to extract high level patterns from raw data, becomes the key part of the IoT middleware. In large-scale IoT applications, the current CEP technology encounters the challenge of massive distributed data which cannot be handled by most of the current methods efficiently. Another challenge is the uncertainty of the data caused by noise, sensor error or wireless communication techniques. In order to solve these challenges, in this paper a high-performance complex event processing method over distributed probabilistic event streams is proposed. With the ability to report confidence for processed complex events over uncertain data, this method uses probabilistic nondeterministic finite automaton and active instance stacks to process a complex event in both single and distributed probabilistic event streams. A parallel algorithm is designed to improve the performance. A query plan-based method is used to process the hierarchical complex event from distributed event streams. Query plan optimization is proposed based on the query optimization technology of probabilistic databases. The experimental study shows that this method is efficient in processing complex events over distributed probabilistic event streams.

...read moreread less

Book Chapter•DOI•

PUF-Tree: A Compact Tree Structure for Frequent Pattern Mining of Uncertain Data

[...]

Carson K. Leung¹, Syed Khairuzzaman Tanbeer¹•Institutions (1)

University of Manitoba¹

14 Apr 2013

TL;DR: A more compact tree structure to capture uncertain data and an algorithm for mining all frequent patterns from the tree are proposed and Experimental results show that the tree is usually more compact than the UF-tree or UFP-tree.

...read moreread less

Abstract: Many existing algorithms mine frequent patterns from traditional databases of precise data. However, there are situations in which data are uncertain. In recent years, researchers have paid attention to frequent pattern mining from uncertain data. When handling uncertain data, UF-growth and UFP-growth are examples of well-known mining algorithms, which use the UF-tree and the UFP-tree respectively. However, these trees can be large, and thus degrade the mining performance. In this paper, we propose (i) a more compact tree structure to capture uncertain data and (ii) an algorithm for mining all frequent patterns from the tree. Experimental results show that (i) our tree is usually more compact than the UF-tree or UFP-tree, (ii) our tree can be as compact as the FP-tree, and (iii) our mining algorithm finds frequent patterns efficiently.

...read moreread less

Book Chapter•DOI•

Mining Frequent Patterns from Uncertain Data with MapReduce for Big Data Analytics

[...]

Carson K. Leung¹, Yaroslav Hayduk¹•Institutions (1)

University of Manitoba¹

22 Apr 2013

TL;DR: Experimental results show the effectiveness of the proposed tree-based algorithm and its enhancements in mining frequent patterns from uncertain data with MapReduce for Big Data analytics.

...read moreread less

Abstract: Frequent pattern mining is commonly used in many real-life applications. Since its introduction, the mining of frequent patterns from precise data has drawn attention of many researchers. In recent years, more attention has been drawn on mining from uncertain data. Items in each transaction of these uncertain data are usually associated with existential probabilities, which express the likelihood of these items to be present in the transaction. When compared with mining from precise data, the search/solution space for mining from uncertain data is much larger due to presence of the existential probabilities. Moreover, we are living in the era of Big Data. In this paper, we propose a tree-based algorithm that uses MapReduce to mine frequent patterns from Big uncertain data. In addition, we also propose some enhancements to further improve its performance. Experimental results show the effectiveness of our algorithm and its enhancements in mining frequent patterns from uncertain data with MapReduce for Big Data analytics.

...read moreread less

Journal Article•DOI•

Efficient Keyword Search on Uncertain Graph Data

[...]

Ye Yuan¹, Guoren Wang¹, Lei Chen², Haixun Wang³•Institutions (3)

Northeastern University (China)¹, Hong Kong University of Science and Technology², Microsoft³

01 Dec 2013-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A filtering-and-verification strategy based on a probabilistic keyword index, PKIndex, which offline compute path-based top-k probabilities, and attach these values to PKIndex in an optimal, compressed way to improve the search efficiency.

...read moreread less

Abstract: As a popular search mechanism, keyword search has been applied to retrieve useful data in documents, texts, graphs, and even relational databases. However, so far, there is no work on keyword search over uncertain graph data even though the uncertain graphs have been widely used in many real applications, such as modeling road networks, influential detection in social networks, and data analysis on PPI networks. Therefore, in this paper, we study the problem of top-k keyword search over uncertain graph data. Following the similar answer definition for keyword search over deterministic graphs, we consider a subtree in the uncertain graph as an answer to a keyword query if 1) it contains all the keywords; 2) it has a high score (defined by users or applications) based on keyword matching; and 3) it has low uncertainty. Keyword search over deterministic graphs is already a hard problem as stated in [1], [2], [3]. Due to the existence of uncertainty, keyword search over uncertain graphs is much harder. Therefore, to improve the search efficiency, we employ a filtering-and-verification strategy based on a probabilistic keyword index, PKIndex. For each keyword, we offline compute path-based top-k probabilities, and attach these values to PKIndex in an optimal, compressed way. In the filtering phase, we perform existence, path-based and tree-based probabilistic pruning phases, which filter out most false subtrees. In the verification, we propose a sampling algorithm to verify the candidates. Extensive experimental results demonstrate the effectiveness of the proposed algorithms.

...read moreread less

Journal Article•DOI•

Probabilistic top-k dominating queries in uncertain databases

[...]

Xiang Lian¹, Lei Chen²•Institutions (2)

University of Texas–Pan American¹, Hong Kong University of Science and Technology²

01 Mar 2013-Information Sciences

TL;DR: This paper formulate and tackle an important query, namely probabilistic top-k dominating (PTD) query, in the uncertain database, and proposes an effective pruning approach to reduce the PTD search space, and presents an efficient query procedure to answer PTD queries.

...read moreread less

Journal Article•DOI•

Selection of defuzzification method to obtain crisp values for representing uncertain data in a modified sweep algorithm

[...]

Gunadi Widi Nurcahyo

13 Dec 2013-European Journal of Immunology

TL;DR: In this article, a study of using fuzzy-based parameters for solving public bus routing problem with uncertain demand is presented, where uncertain data are represented as linguistic values which are fully dependent on the user's preference.

...read moreread less

Abstract: A study of using fuzzy-based parameters for solving public bus routing problem with uncertain demand is presented. The fuzzy-based parameters are designed to provide data required by the route selection procedure. The uncertain data are represented as linguistic values which are fully dependent on the user’s preference. Fuzzy inference rules are assigned to relate the fuzzy parameters to the crisp values which are concerned in the route selection process. This paper focuses on the selection of the Defuzzification method to discover the most appropriate method for obtaining crisp values which represent uncertain data. We also present a step by step evaluation showing that the fuzzy-based parameters are capable to represent uncertain data replacing the use of exact data which common route selection algorithms usually use.

...read moreread less

Journal Article•DOI•

Probabilistic skyline operator over sliding windows

[...]

Wenjie Zhang¹, Xuemin Lin¹, Ying Zhang¹, Wei Wang¹, Gaoping Zhu¹, Jeffrey Xu Yu² - Show less +2 more•Institutions (2)

University of New South Wales¹, The Chinese University of Hong Kong²

01 Nov 2013-Information Systems

TL;DR: This paper studies the problem of efficiently computing the skyline over sliding windows on uncertain data elements against probability thresholds, and characterize the properties of elements to be kept in the computation.

...read moreread less

Journal Article•DOI•

Uncertainty Quantification in Linear Interpolation for Isosurface Extraction

[...]

Tushar M. Athawale¹, Alireza Entezari¹•Institutions (1)

University of Florida¹

01 Dec 2013-IEEE Transactions on Visualization and Computer Graphics

TL;DR: An approach for deriving the probability density function of a random variable modeling the positional uncertainty in the isosurface extraction, when the uncertainty is quantified by a uniform distribution, provides a closed-form characterization of the mentioned random variable.

...read moreread less

Abstract: We present a study of linear interpolation when applied to uncertain data. Linear interpolation is a key step for isosurface extraction algorithms, and the uncertainties in the data lead to non-linear variations in the geometry of the extracted isosurface. We present an approach for deriving the probability density function of a random variable modeling the positional uncertainty in the isosurface extraction. When the uncertainty is quantified by a uniform distribution, our approach provides a closed-form characterization of the mentioned random variable. This allows us to derive, in closed form, the expected value as well as the variance of the level-crossing position. While the former quantity is used for constructing a stable isosurface for uncertain data, the latter is used for visualizing the positional uncertainties in the expected isosurface level crossings on the underlying grid.

...read moreread less

Journal Article•DOI•

Outlier detection on uncertain data based on local information

[...]

Jing Liu¹, Huifang Deng¹•Institutions (1)

South China University of Technology¹

01 Oct 2013-Knowledge Based Systems

TL;DR: Based on local information: local density and local uncertainty level, a new outlier detection algorithm is designed in this paper to calculate uncertain local outlier factor (ULOF) for each point in an uncertain dataset.

...read moreread less

Abstract: Based on local information: local density and local uncertainty level, a new outlier detection algorithm is designed in this paper to calculate uncertain local outlier factor (ULOF) for each point in an uncertain dataset. In this algorithm, all concepts, definitions and formulations for conventional local outlier detection approach (LOF) are generalized to include uncertainty information. The least squares algorithm on multi-times curve fitting is used to generate an approximate probability density function of distance between two points. An iteration algorithm is proposed to evaluate K–η–distance and a pruning strategy is adopted to reduce the size of candidate set of nearest-neighbors. The comparison between ULOF algorithm and the state-of-the-art approaches has been made. Results of several experiments on synthetic and real data sets demonstrate the effectiveness of the proposed approach.

...read moreread less

Proceedings Article•DOI•

Optimization algorithms for home energy resource scheduling in presence of data uncertainty

[...]

Stefano Squartini, Matteo Boaro, Francesco De Angelis, Danilo Fuselli, Francesco Piazza - Show less +1 more

09 Jun 2013

TL;DR: A comparison among different linear and nonlinear methods for home energy resource scheduling is proposed, considering the presence of data uncertainty into account, and results show how the offline approaches provide good performance also in presence of uncertain data.

...read moreread less

Abstract: Smart Home Energy Management is a very hot topic for the scientific community and some interesting solutions have also recently appeared on the market. One key issue is represented by the capability of planning the usage of energy resources in order to reduce the overall energy costs. This means that, considering the dynamic electricity price and the availability of adequately sized storage system, the expert system is supposed to automatically decide the more convenient policy for energy management from and towards the grid. In this work a comparison among different linear and nonlinear methods for home energy resource scheduling is proposed, considering the presence of data uncertainty into account. Indeed, whereas the employment of advanced optimization frameworks can take advantage by their inherent offline approach, the need to forecast the energy price and the amount of self-generated power. A residential scenario, in which a system storage and renewable resources are available and exploitable to match the user load demand, has been considered for performed computer simulations: obtained results show how the offline approaches provide good performance also in presence of uncertain data.

...read moreread less

Journal Article•DOI•

On the possibilistic approach to linear regression models involving uncertain, indeterminate or interval data

[...]

Michal Černý¹, Jaromír Antoch², Milan Hladík², Milan Hladík¹•Institutions (2)

University of Economics, Prague¹, Charles University in Prague²

20 Sep 2013-Information Sciences

TL;DR: It is shown that in the general case, very elementary questions about properties of the OLS-set are computationally intractable (assuming P ≠ NP).

...read moreread less

Proceedings Article•DOI•

Cleaning uncertain data for top-k queries

[...]

Luyi Mo¹, Reynold Cheng¹, Xiang Li¹, David W. Cheung¹, Xuan S. Yang¹ - Show less +1 more•Institutions (1)

University of Hong Kong¹

08 Apr 2013

TL;DR: This paper develops efficient algorithms to compute the quality of this query under the possible world semantics, and addresses the cleaning of a probabilistic database, in order to improve top-k query quality.

...read moreread less

Abstract: The information managed in emerging applications, such as sensor networks, location-based services, and data integration, is inherently imprecise. To handle data uncertainty, probabilistic databases have been recently developed. In this paper, we study how to quantify the ambiguity of answers returned by a probabilistic top-k query. We develop efficient algorithms to compute the quality of this query under the possible world semantics. We further address the cleaning of a probabilistic database, in order to improve top-k query quality. Cleaning involves the reduction of ambiguity associated with the database entities. For example, the uncertainty of a temperature value acquired from a sensor can be reduced, or cleaned, by requesting its newest value from the sensor. While this “cleaning operation” may produce a better query result, it may involve a cost and fail. We investigate the problem of selecting entities to be cleaned under a limited budget. Particularly, we propose an optimal solution and several heuristics. Experiments show that the greedy algorithm is efficient and close to optimal.

...read moreread less

Posted Content•

Subgraph Pattern Matching over Uncertain Graphs with Identity Linkage Uncertainty

[...]

Walaa Eldin Moustafa¹, Angelika Kimmig², Amol Deshpande¹, Lise Getoor¹•Institutions (2)

University of Maryland, College Park¹, Katholieke Universiteit Leuven²

30 May 2013-arXiv: Databases

TL;DR: In this article, a probabilistic entity graph (PEG) model is proposed to capture uncertainties and answer queries over graph-structured data, which takes into account node attribute uncertainty, edge existence uncertainty, and identity uncertainty.

...read moreread less

Abstract: There is a growing need for methods which can capture uncertainties and answer queries over graph-structured data. Two common types of uncertainty are uncertainty over the attribute values of nodes and uncertainty over the existence of edges. In this paper, we combine those with identity uncertainty. Identity uncertainty represents uncertainty over the mapping from objects mentioned in the data, or references, to the underlying real-world entities. We propose the notion of a probabilistic entity graph (PEG), a probabilistic graph model that defines a distribution over possible graphs at the entity level. The model takes into account node attribute uncertainty, edge existence uncertainty, and identity uncertainty, and thus enables us to systematically reason about all three types of uncertainties in a uniform manner. We introduce a general framework for constructing a PEG given uncertain data at the reference level and develop highly efficient algorithms to answer subgraph pattern matching queries in this setting. Our algorithms are based on two novel ideas: context-aware path indexing and reduction by join-candidates, which drastically reduce the query search space. A comprehensive experimental evaluation shows that our approach outperforms baseline implementations by orders of magnitude.

...read moreread less

Proceedings Article•DOI•

Voronoi-based nearest neighbor search for multi-dimensional uncertain databases

[...]

Peiwu Zhang¹, Reynold Cheng¹, Nikos Mamoulis¹, Matthias Renz², Andreas Züfle², Yu Tang¹, Tobias Emrich² - Show less +3 more•Institutions (2)

University of Hong Kong¹, Ludwig Maximilian University of Munich²

08 Apr 2013

TL;DR: How to derive an axis-parallel hyper-rectangle (called the Uncertain Bounding Rectangle, or UBR) that tightly contains a PV-cell is studied, and the PV-index, a structure that stores UBRs, is developed to evaluate probabilistic nearest neighbor queries over uncertain data.

...read moreread less

Abstract: In Voronoi-based nearest neighbor search, the Voronoi cell of every point p in a database can be used to check whether p is the closest to some query point q. We extend the notion of Voronoi cells to support uncertain objects, whose attribute values are inexact. Particularly, we propose the Possible Voronoi cell (or PV-cell). A PV-cell of a multi-dimensional uncertain object o is a region R, such that for any point pϵR, o may be the nearest neighbor of p. If the PV-cells of all objects in a database S are known, they can be used to identify objects that have a chance to be the nearest neighbor of q. However, there is no efficient algorithm for computing an exact PV-cell. We hence study how to derive an axis-parallel hyper-rectangle (called the Uncertain Bounding Rectangle, or UBR) that tightly contains a PV-cell. We further develop the PV-index, a structure that stores UBRs, to evaluate probabilistic nearest neighbor queries over uncertain data. An advantage of the PV-index is that upon updates on S, it can be incrementally updated. Extensive experiments on both synthetic and real datasets are carried out to validate the performance of the PV-index.

...read moreread less

Journal Article•DOI•

A Fixed Point Approach to Origin-Destination Matrices Estimation Using Uncertain Data and Fuzzy Programming on Congested Networks

[...]

Leonardo Caggiani¹, Michele Ottomanelli¹, Domenico Sassanelli¹•Institutions (1)

University of Bari¹

01 Mar 2013-Transportation Research Part C-emerging Technologies

TL;DR: A Fuzzy-GLS estimation method that allows to improve the estimation performances of classic GLS estimator by including, in addition to traffic counts, uncertain information about starting O–D demand (i.e. outdated estimates, spot data, expert knowledge, etc.).

...read moreread less

Abstract: Origin–destination (O–D) matrix estimation methods based on traffic counts have been largely discussed and investigated. The most used methods are based on Generalised Least Square estimators (GLS) that use as input data a starting O–D matrix and a set of traffic counts. In addition to traffic counts, the analysts could know other general information about travel demand or link flows, based on their experience, or spot data, but few works deal with the matter of including effectively these sources of information. This paper proposes a Fuzzy-GLS estimation method that allows to improve the estimation performances of classic GLS estimator by including, in addition to traffic counts, uncertain information about starting O–D demand (i.e. outdated estimates, spot data, expert knowledge, etc.). The methods explicitly take into account the relevant level of uncertainty by taking as much advantage as possible from the few vague available data. The method is developed using fuzzy sets theory and fuzzy programming that seems to be a convenient theoretical framework to represent uncertainty in the available data. A solution algorithm for the proposed problem is also presented. The method has been tested by numerical applications and then compared to the classical GLS method under different sets of constraints to the problem.

...read moreread less

Journal Article•DOI•

Performance analysis of complex repairable industrial systems using PSO and fuzzy confidence interval based methodology.

[...]

Harish Garg¹•Institutions (1)

Indian Institute of Technology Roorkee¹

01 Mar 2013-Isa Transactions

TL;DR: The proposed confidence interval based fuzzy Lambda-Tau (CIBFLT) methodology for analyzing the behavior of the complex repairable industrial systems is illustrated through a case study of washing unit, the main part of the paper industry.

...read moreread less

Abstract: The main objective of the present paper is to propose a methodology for analyzing the behavior of the complex repairable industrial systems. In real-life situations, it is difficult to find the most optimal design policies for MTBF (mean time between failures), MTTR (mean time to repair) and related costs by utilizing available resources and uncertain data. For this, the availability–cost optimization model has been constructed for determining the optimal design parameters for improving the system design efficiency. The uncertainties in the data related to each component of the system are estimated with the help of fuzzy and statistical methodology in the form of the triangular fuzzy numbers. Using these data, the various reliability parameters, which affects the system performance, are obtained in the form of the fuzzy membership function by the proposed confidence interval based fuzzy Lambda-Tau (CIBFLT) methodology. The computed results by CIBFLT are compared with the existing fuzzy Lambda-Tau methodology. Sensitivity analysis on the system MTBF has also been addressed. The methodology has been illustrated through a case study of washing unit, the main part of the paper industry.

...read moreread less

Collapse