Showing papers by "Reynold Cheng published in 2008"


Proceedings ArticleDOI
07 Apr 2008
TL;DR: The constrained probabilistic nearest-neighbor query (C-PNN) is proposed, which returns the IDs of objects whose probabilities of being the nearest neighbor exceed a given threshold, subject to a given error bound on the answers.
Abstract: In applications like location-based services, sensor monitoring and biological databases, the values of the database items are inherently uncertain in nature. An important query for uncertain objects is the probabilistic nearest-neighbor query (PNN), which computes the probability of each object being the nearest neighbor of a query point. Evaluating this query is computationally expensive, since it must consider the relationships among uncertain objects and requires numerical integration or Monte Carlo methods. Sometimes, a query user may not be concerned about the exact probability values; for example, they may only need answers that have sufficiently high confidence. We thus propose the constrained probabilistic nearest-neighbor query (C-PNN), which returns the IDs of objects whose probabilities are higher than some threshold, with a given error bound in the answers. The C-PNN can be answered efficiently with probabilistic verifiers. These are methods that derive lower and upper bounds on answer probabilities, so that an object can be quickly decided on whether it should be included in the answer. We have developed three probabilistic verifiers, which can be used on uncertain data with arbitrary probability density functions. Extensive experiments were performed to examine the effectiveness of these approaches.
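
To make the verifier idea concrete, here is a minimal sketch of bound-based filtering in Python; the function name, the tolerance semantics, and the threshold handling are assumptions for illustration, not taken from the paper:

```python
# Hypothetical sketch: classify an object from bounds on its probability of
# being the nearest neighbor, without computing the exact probability.
def verify(lower: float, upper: float, threshold: float, delta: float) -> str:
    """`delta` is the error tolerance: objects whose probability lies in
    [threshold - delta, threshold) may be reported either way."""
    if lower >= threshold:
        return "accept"     # the probability surely clears the threshold
    if upper < threshold - delta:
        return "reject"     # cannot qualify even within the error bound
    return "unknown"        # bounds too loose; refine (e.g., integrate numerically)

print(verify(0.85, 0.95, threshold=0.8, delta=0.05))  # -> accept
```

An object classified as "unknown" would be passed to a more expensive refinement step, which is exactly the cost the verifiers aim to avoid for most objects.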

171 citations


Proceedings ArticleDOI
07 Apr 2008
TL;DR: This paper presents a model for handling arbitrary probabilistic uncertain data natively at the database level; the model is consistent with possible worlds semantics and closed under basic relational operators.
Abstract: The inherent uncertainty of data in numerous applications such as sensor databases, text annotations, and information retrieval motivates the need to handle imprecise data at the database level. Uncertainty can be at the attribute or tuple level and is present in both continuous and discrete data domains. This paper presents a model for handling arbitrary probabilistic uncertain data (both discrete and continuous) natively at the database level. Our approach leads to a natural and efficient representation for probabilistic data. We develop a model that is consistent with possible worlds semantics and closed under basic relational operators. This is the first model that accurately and efficiently handles both continuous and discrete uncertainty. The model is implemented in a real database system (PostgreSQL), and the effectiveness and efficiency of our approach are validated experimentally.
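
As a rough illustration of attribute-level uncertainty that can be either discrete or continuous, the following Python sketch (all names are assumptions; the paper's actual representation is more sophisticated) stores a distribution per attribute, and drawing one value per attribute instantiates a single possible world:

```python
import random
from dataclasses import dataclass
from typing import Dict

@dataclass
class DiscretePDF:
    """Finite set of alternatives whose probabilities sum to 1."""
    pmf: Dict[str, float]
    def sample(self) -> str:
        return random.choices(list(self.pmf), weights=list(self.pmf.values()))[0]

@dataclass
class UniformPDF:
    """Continuous uncertainty modeled as a uniform density over [lo, hi]."""
    lo: float
    hi: float
    def sample(self) -> float:
        return random.uniform(self.lo, self.hi)

# One possible world of a two-attribute uncertain tuple:
temp = UniformPDF(20.0, 25.0)                     # sensor reading with bounded error
label = DiscretePDF({"car": 0.7, "truck": 0.3})   # uncertain text annotation
world = (temp.sample(), label.sample())
```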

122 citations


Journal ArticleDOI
01 Aug 2008
TL;DR: This work presents the PWS-quality metric, a universal measure that quantifies the ambiguity of query answers under the possible world semantics, and investigates how such a metric can be used for data cleaning purposes.
Abstract: Uncertain or imprecise data are pervasive in applications like location-based services, sensor monitoring, and data collection and integration. For these applications, probabilistic databases can be used to store uncertain data, and querying facilities are provided to yield answers with statistical confidence. Given that a limited amount of resources is available to "clean" the database (e.g., by probing some sensors to obtain their latest values), we address the problem of choosing the set of uncertain objects to be cleaned, in order to achieve the best improvement in the quality of query answers. For this purpose, we present the PWS-quality metric, which is a universal measure that quantifies the ambiguity of query answers under the possible world semantics. We study how PWS-quality can be efficiently evaluated for two major query classes: (1) queries that examine the satisfiability of tuples independent of other tuples (e.g., range queries); and (2) queries that require knowledge of the relative ranking of the tuples (e.g., MAX queries). We then propose a polynomial-time solution to achieve an optimal improvement in PWS-quality. Other fast heuristics are presented as well. Experiments, performed on both real and synthetic datasets, show that the PWS-quality metric can be evaluated quickly, and that our cleaning algorithm provides an optimal solution with high efficiency. To the best of our knowledge, this is the first work that develops a quality metric for a probabilistic database and investigates how such a metric can be used for data cleaning purposes.
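
The abstract does not spell out the metric's formula, but an entropy-style reading of "ambiguity under the possible world semantics" can be sketched as follows (an assumption for illustration, not necessarily the paper's exact definition):

```python
import math

def pws_ambiguity(answer_probs: dict) -> float:
    """Shannon entropy (in bits) of the distribution over query answers.
    Keys are distinct answer sets; values are the total probability of the
    possible worlds producing each answer set (values sum to 1).
    0.0 means the answer is certain; larger values mean more ambiguity."""
    return -sum(p * math.log2(p) for p in answer_probs.values() if p > 0)

# A range query that returns {t1} with probability 0.9 and {} with 0.1:
print(pws_ambiguity({("t1",): 0.9, (): 0.1}))  # ~0.469 bits
```

Under such a reading, cleaning an object is worthwhile exactly when it is expected to reduce this entropy the most for the query at hand.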

97 citations


Proceedings ArticleDOI
04 Nov 2008
TL;DR: A framework for preserving location privacy is proposed, based on the idea of sending suitably modified location information to the service provider; it not only prevents the service provider from knowing the exact locations of users, but also protects information about user movements and locations from being disclosed to other users who are not authorized to access it.
Abstract: The expanding use of location-based services has profound implications on the privacy of personal information. In this paper, we propose a framework for preserving location privacy based on the idea of sending to the service provider suitably modified location information. Agents execute data transformation and the service provider directly processes the transformed dataset. Our technique not only prevents the service provider from knowing the exact locations of users, but also protects information about user movements and locations from being disclosed to other users who are not authorized to access this information. We also define a privacy model to analyze our framework, and examine our approach experimentally.
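
The abstract does not describe the transformation itself; purely as an illustration of the architecture (trusted agents hold a secret transformation, and the provider processes transformed points directly), one conceivable distance-preserving choice is a secret rotation plus translation, sketched below with assumed names and parameters:

```python
import math

class Agent:
    """Trusted agent that transforms locations before they reach the provider."""
    def __init__(self, theta: float, dx: float, dy: float):
        self.theta, self.dx, self.dy = theta, dx, dy   # secret key, never shared

    def transform(self, x: float, y: float) -> tuple:
        # Rotation + translation preserves distances, so the provider can
        # still answer distance-based queries over the transformed dataset.
        c, s = math.cos(self.theta), math.sin(self.theta)
        return (c * x - s * y + self.dx, s * x + c * y + self.dy)

agent = Agent(theta=1.234, dx=5071.0, dy=-883.0)
print(agent.transform(22.28, 114.16))   # what the service provider sees
```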

20 citations


Book ChapterDOI
09 Jul 2008
TL;DR: An entropy-based metric is presented to quantify the degree of ambiguity of probabilistic query answers due to data uncertainty, and a new method is developed to improve query answer quality.
Abstract: In applications like sensor network monitoring and location-based services, due to limited network bandwidth and battery power, a system cannot always acquire accurate and fresh data from the external environment. To capture data errors in these environments, recent research has proposed modeling uncertainty as a probability density function (pdf), together with the notion of probabilistic queries, which provide statistical guarantees on answer correctness. In this paper, we present an entropy-based metric to quantify the degree of ambiguity of probabilistic query answers due to data uncertainty. Based on this metric, we develop a new method to improve query answer quality. The main idea is to acquire (or probe) data from a selected set of sensing devices, in order to reduce data uncertainty and improve the quality of a query answer. Given that a query is assigned a limited number of probing resources, we investigate how to achieve the optimal improvement in answer quality. To improve the efficiency of our solution, we further present heuristics that achieve near-optimal quality improvement. We generalize our solution to handle multiple queries. An experimental simulation over a realistic dataset is performed to validate our approaches.
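
A minimal sketch of the probing step (the paper's optimal algorithm and heuristics are more involved; the greedy rule and names below are assumptions): with a budget of k probes, pick the sensors whose refreshed readings are expected to shrink the answer's entropy the most.

```python
def choose_probes(expected_reduction: dict, budget: int) -> list:
    """Greedily select up to `budget` sensor ids, ranked by the expected
    drop in answer entropy (assumed precomputed from the current pdfs)."""
    ranked = sorted(expected_reduction, key=expected_reduction.get, reverse=True)
    return ranked[:budget]

print(choose_probes({"s1": 0.42, "s2": 0.10, "s3": 0.31}, budget=2))  # ['s1', 's3']
```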

11 citations


Journal ArticleDOI
01 Dec 2008
TL;DR: This paper proposes a novel index structure, the Mean Variance Tree (MVTree), which is built on the mean and variance of the data rather than on the actual, continuously changing values, thereby significantly reducing the index update cost.
Abstract: Traditional spatial indexes like the R-tree usually assume that the database is not updated frequently. In applications like location-based services and sensor networks, this assumption no longer holds, since data updates can be numerous and frequent. As a result, these indexes can suffer from a high update overhead, leading to poor performance. In this paper, we propose a novel index structure, the Mean Variance Tree (MVTree), which is built on the mean and variance of the data instead of the actual data values, which can change continuously. Since the mean and variance are relatively stable features compared to the actual values, the MVTree significantly reduces the index update cost. The mean and variance of a data item can be dynamically adjusted to match the observed fluctuation of the data. Our experiments show that the MVTree substantially improves index update performance while maintaining satisfactory query performance.
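
The update-saving idea can be sketched as follows (a hypothetical fragment; the entry layout and the width factor k are assumptions): an index entry is keyed on a value's mean and variance, so an incoming reading triggers a structural update only when it falls outside the range those statistics predict.

```python
import math

class MVEntry:
    """Index entry keyed on a data item's mean and variance."""
    def __init__(self, mean: float, var: float, k: float = 3.0):
        self.mean, self.var, self.k = mean, var, k

    def covers(self, value: float) -> bool:
        """True if `value` lies within mean +/- k*stddev, i.e. the reading
        can be absorbed without touching the tree structure."""
        return abs(value - self.mean) <= self.k * math.sqrt(self.var)

entry = MVEntry(mean=25.0, var=0.5)
print(entry.covers(25.9))   # True: no index update needed
print(entry.covers(30.0))   # False: adjust the statistics and update the index
```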

6 citations


Proceedings Article
01 Jan 2008

1 citation