
Showing papers by "Reynold Cheng published in 2005"


Proceedings Article
30 Aug 2005
TL;DR: The U-tree is proposed, an access method designed to optimize both the I/O and CPU time of range retrieval on multi-dimensional imprecise data; it is fully dynamic and does not place any constraints on the data pdfs.
Abstract: In an "uncertain database", an object o is associated with a multi-dimensional probability density function(pdf), which describes the likelihood that o appears at each position in the data space. A fundamental operation is the "probabilistic range search" which, given a value pq and a rectangular area rq, retrieves the objects that appear in rq with probabilities at least pq. In this paper, we propose the U-tree, an access method designed to optimize both the I/O and CPU time of range retrieval on multi-dimensional imprecise data. The new structure is fully dynamic (i.e., objects can be incrementally inserted/deleted in any order), and does not place any constraints on the data pdfs. We verify the query and update efficiency of U-trees with extensive experiments.
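The abstract does not describe the U-tree's internals, but the probabilistic range predicate it accelerates can be sketched by estimating each object's appearance probability from samples drawn from its pdf. The function names below are hypothetical, not the paper's API, and the linear scan is exactly what the U-tree exists to avoid:

```python
def appearance_probability(samples, rect):
    """Estimate the probability that an object lies inside rect, given
    points sampled from its pdf. rect = (xlo, ylo, xhi, yhi)."""
    xlo, ylo, xhi, yhi = rect
    inside = sum(1 for x, y in samples if xlo <= x <= xhi and ylo <= y <= yhi)
    return inside / len(samples)

def probabilistic_range_search(objects, rect, p_q):
    """Return ids of objects appearing in rect with probability >= p_q.
    A brute-force scan; an index like the U-tree prunes most of these checks."""
    return sorted(oid for oid, samples in objects.items()
                  if appearance_probability(samples, rect) >= p_q)
```

With the threshold p_q = 0.5, an object with half of its probability mass inside the query rectangle still qualifies; raising p_q to 0.9 filters it out.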

310 citations


Proceedings Article
30 Aug 2005
TL;DR: U-DBMS extends the database system with uncertainty management functionalities, and each data value is represented as an interval and a probability distribution function, and it can be processed with probabilistic query operators to produce imprecise answers.
Abstract: In many systems, sensors are used to acquire information from external environments such as temperature, pressure and locations. Due to continuous changes in these values, and limited resources (e.g., network bandwidth and battery power), it is often infeasible for the database to store the exact values at all times. Queries that use these old values can produce invalid results. In order to manage the uncertainty between the actual sensor value and the database value, we propose a system called U-DBMS. U-DBMS extends the database system with uncertainty management functionalities. In particular, each data value is represented as an interval and a probability distribution function, and it can be processed with probabilistic query operators to produce imprecise (but correct) answers. This demonstration presents a PostgreSQL-based system that handles uncertainty and probabilistic queries for constantly-evolving data.
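As a minimal illustration of the interval-plus-pdf representation, assuming the simplest possible pdf (uniform over the interval; U-DBMS itself supports general distributions), a probabilistic selection operator returns each row with a qualification probability instead of a hard true/false:

```python
def prob_above(interval, threshold):
    """P(value > threshold) when the value is uniform over interval."""
    lo, hi = interval
    if threshold <= lo:
        return 1.0
    if threshold >= hi:
        return 0.0
    return (hi - threshold) / (hi - lo)

def probabilistic_select(rows, threshold):
    """Imprecise-but-correct answer: map each row name to the probability
    that its uncertain value exceeds the threshold."""
    return {name: prob_above(interval, threshold)
            for name, interval in rows.items()}
```

A room whose last reported temperature has drifted into the interval [20, 30] answers the query "temperature above 28?" with probability 0.2 rather than a possibly invalid yes/no.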

116 citations


01 Jan 2005
TL;DR: This paper proposes that when data mining is performed on uncertain data, data uncertainty has to be considered in order to obtain high quality data mining results, and presents the UK-means clustering algorithm as an example to illustrate how the traditional K-means algorithm can be modified to handle data uncertainty in data mining.
Abstract: Data uncertainty is often found in real-world applications due to reasons such as imprecise measurement, outdated sources, or sampling errors. Recently, much research has been published in the area of managing data uncertainty in databases. We propose that when data mining is performed on uncertain data, data uncertainty has to be considered in order to obtain high quality data mining results. We call this the "Uncertain Data Mining" problem. In this paper, we present a framework for possible research directions in this area. We also present the UK-means clustering algorithm as an example to illustrate how the traditional K-means algorithm can be modified to handle data uncertainty in data mining.
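A minimal sketch of the UK-means idea: each uncertain object is represented by samples of its pdf, and cluster assignment minimizes the expected distance to a center rather than the distance from a single value. The representation and loop structure here are illustrative, not the paper's implementation:

```python
def expected_sq_dist(samples, center):
    """Expected squared Euclidean distance from an uncertain object
    (given as pdf samples) to a cluster center."""
    cx, cy = center
    return sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in samples) / len(samples)

def uk_means(objects, centers, iterations=10):
    """UK-means sketch: assign each uncertain object to the center with the
    smallest expected distance, then recompute each center as the mean of
    its members' sample points."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for samples in objects:
            best = min(range(len(centers)),
                       key=lambda i: expected_sq_dist(samples, centers[i]))
            clusters[best].append(samples)
        for i, members in enumerate(clusters):
            if members:
                pts = [p for s in members for p in s]
                centers[i] = (sum(x for x, _ in pts) / len(pts),
                              sum(y for _, y in pts) / len(pts))
    labels = [min(range(len(centers)),
                  key=lambda i: expected_sq_dist(s, centers[i]))
              for s in objects]
    return centers, labels
```

When every object has a single sample (no uncertainty), this degenerates to ordinary K-means, which is the relationship the paper exploits.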

52 citations


01 Jan 2005
TL;DR: This paper investigates different non-value-based error tolerance definitions and discusses how they are applied to two classes of entity-based queries: non-rank-based and rank-based queries.
Abstract: We study the problem of applying adaptive filters for approximate query processing in a distributed stream environment. We propose filter bound assignment protocols with the objective of reducing communication cost. Most previous works focus on value-based queries (e.g., average) with numerical error tolerance. In this paper, we cover entity-based queries (e.g., a nearest neighbor query returns object names rather than a single value). In particular, we study non-value-based tolerance (e.g., the answer to the nearest-neighbor query should rank third or above). We investigate different non-value-based error tolerance definitions and discuss how they are applied to two classes of entity-based queries: non-rank-based and rank-based queries. Extensive experiments show that our protocols achieve significant savings in both communication overhead and server computation.

45 citations


Proceedings Article
30 Aug 2005
TL;DR: In this article, the problem of applying adaptive filters for approximate query processing in a distributed stream environment is studied and filter bound assignment protocols with the objective of reducing communication cost are proposed.
Abstract: We study the problem of applying adaptive filters for approximate query processing in a distributed stream environment. We propose filter bound assignment protocols with the objective of reducing communication cost. Most previous works focus on value-based queries (e.g., average) with numerical error tolerance. In this paper, we cover entity-based queries (e.g., nearest neighbor) with non-value-based error tolerance. We investigate different non-value-based error tolerance definitions and discuss how they are applied to two classes of entity-based queries: non-rank-based and rank-based queries. Extensive experiments show that our protocols achieve significant savings in both communication overhead and server computation.

44 citations


Proceedings ArticleDOI
05 Apr 2005
TL;DR: This paper proposes an index structure explicitly designed to perform well for both querying and updating; it observes that objects often stay in a region for an extended amount of time, and exploits this phenomenon to optimize an index for both updates and queries.
Abstract: Index structures are designed to optimize search performance, while at the same time supporting efficient data updates. Although not explicit, existing index structures are typically based upon the assumption that the rate of updates will be small compared to the rate of querying. This assumption is not valid in streaming data environments such as sensor and moving object databases, where updates are received incessantly. In fact, for many applications, the rate of updates may well exceed the rate of querying. In such environments, index structures suffer from poor performance due to the large overhead of keeping the index updated with the latest data. Recent efforts at indexing moving object data assume objects move in a restrictive manner (e.g. in straight lines with constant velocity). In this paper, we propose an index structure explicitly designed to perform well for both querying and updating. We assume a more relaxed model of object movement. In particular, we observe that objects often stay in a region (e.g., building) for an extended amount of time, and exploit this phenomenon to optimize an index for both updates and queries. The paper is developed with the example of R-trees, but the ideas can be extended to other index structures as well. We present the design of the change tolerant R-tree, and an experimental evaluation.

32 citations


Proceedings ArticleDOI
17 Aug 2005
TL;DR: This paper proposes a statistical approach to decide which sensors should be used to answer a query with the aid of continuous probabilistic query (CPQ), which is originally used to manage uncertain data and is associated with a probabilistic guarantee on the query result.
Abstract: An approach to improve the reliability of query results based on error-prone sensors is to use redundant sensors. However, this approach is expensive; moreover, some sensors may malfunction and their readings need to be discarded. In this paper, we propose a statistical approach to decide which sensors should be used to answer a query. In particular, we propose to solve the problem with the aid of continuous probabilistic query (CPQ), which is originally used to manage uncertain data and is associated with a probabilistic guarantee on the query result. Based on the historical data values from the sensors, the query type, and the requirement on the query, we present methods to select an appropriate set of sensors and provide reliable answers for aggregate queries. Our algorithm is demonstrated in simulation experiments to provide accurate and robust query results.
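The abstract does not spell out the selection criterion, so the sketch below uses a crude stand-in: keep only sensors whose historical readings are stable (sample standard deviation under a bound) before answering an aggregate query. This is purely illustrative of the "select sensors from history, then aggregate" flow, not the paper's statistical test:

```python
def select_sensors(history, max_std):
    """Keep sensors whose historical readings vary little; drop the rest as
    likely malfunctioning. A toy proxy for the paper's statistical approach."""
    chosen = []
    for sid, readings in history.items():
        mean = sum(readings) / len(readings)
        var = sum((r - mean) ** 2 for r in readings) / len(readings)
        if var ** 0.5 <= max_std:
            chosen.append(sid)
    return sorted(chosen)

def robust_average(history, max_std):
    """Aggregate query (average) answered only over the selected sensors,
    using each selected sensor's latest reading."""
    ids = select_sensors(history, max_std)
    latest = [history[sid][-1] for sid in ids]
    return sum(latest) / len(latest)
```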

13 citations


Proceedings ArticleDOI
Yuni Xia, Sunil Prabhakar, Shan Lei, Reynold Cheng, Rahul Shah
13 Mar 2005
TL;DR: A novel index structure, the MVTree, which is built based on the mean and variance of the data instead of the actual data values that are in constant flux is proposed, which significantly reduces the index update cost.
Abstract: Constantly evolving data arise in various mobile applications such as location-based services and sensor networks. The problem of indexing the data for efficient query processing is of increasing importance. Due to the constant changing nature of the data, traditional indexes suffer from a high update overhead which leads to poor performance. In this paper, we propose a novel index structure, the MVTree, which is built based on the mean and variance of the data instead of the actual data values that are in constant flux. Since the mean and variance are relatively stable features compared to the actual values, the MVTree significantly reduces the index update cost. The distribution interval and probability distribution function of the data are not required to be known a priori. The mean and variance for each data item can be dynamically adjusted to match the observed fluctuation of the data. Experiments show that compared to traditional index schemes, the MVTree substantially improves index update performance while maintaining satisfactory query performance.
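The property the abstract attributes to indexing on (mean, variance) rather than raw values is that most incoming readings leave the index untouched. A toy entry that only triggers a reinsert when a reading drifts outside a mean-plus-or-minus-k-standard-deviations band, with an exponentially weighted adjustment of the summary, is sketched below; the class and its parameters are hypothetical, not the MVTree's actual update rule:

```python
class MVEntry:
    """Sketch of an index entry keyed on (mean, std) rather than raw values.
    The entry is reinserted into the index only when a reading falls outside
    the band mean +/- k*std, so most updates cost nothing."""

    def __init__(self, mean, std, k=3.0):
        self.mean, self.std, self.k = mean, std, k
        self.reinserts = 0  # how many updates actually touched the index

    def update(self, value, alpha=0.1):
        if abs(value - self.mean) > self.k * self.std:
            # Out of band: the stable summary no longer fits the data,
            # so the entry would be reinserted under a new (mean, std) key.
            self.reinserts += 1
        # Exponentially weighted adjustment of mean and variance, so the
        # summary tracks the observed fluctuation of the data over time.
        d = value - self.mean
        self.mean += alpha * d
        self.std = ((1 - alpha) * (self.std ** 2 + alpha * d * d)) ** 0.5
```

Readings that fluctuate within the band update the summary cheaply in place; only a genuine regime change (e.g., a sensor jumping from 10 to 100) forces index work.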

13 citations


Book ChapterDOI
01 Jan 2005
TL;DR: Sensors are often used to monitor the status of an environment continuously, and if the value of an entity being monitored is constantly evolving, the recorded data value may differ from the actual value.
Abstract: Sensors are often used to monitor the status of an environment continuously. The sensor readings are reported to the application for making decisions and answering user queries. For example, a fire-alarm system in a building employs temperature sensors to detect any abrupt change in temperature. An aircraft is equipped with sensors to track the wind speed, and radars are used to report the aircraft’s location to a military application. These applications usually include a database or server to which the sensor readings are sent. Limited network bandwidth and battery power imply that it is often not practical for the server to record the exact status of an entity it monitors at every time instant. In particular, if the value of an entity (e.g., temperature, location) being monitored is constantly evolving, the recorded data value may differ from the actual value. Querying the database can then produce incorrect results. Consider a simple example where a user asks the database: “which room has a temperature between 10

7 citations


01 Jan 2005
TL;DR: This paper proposes using imprecise queries to hide the location of the query issuer and evaluate uncertain information, and suggests a framework where uncertainty can be controlled to provide high quality and privacy-preserving services.
Abstract: Location-based services, such as finding the nearest gas station, require users to supply their location information. However, a user’s location can be tracked without her consent or knowledge. Lowering the spatial and temporal resolution of location data sent to the server has been proposed as a solution. Although this technique is effective in protecting privacy, it may be overkill and the quality of desired services can be severely affected. In this paper, we investigate the relationship between uncertainty, privacy, and quality of services. We propose using imprecise queries to hide the location of the query issuer and evaluate uncertain information. We also suggest a framework where uncertainty can be controlled to provide high quality and privacy-preserving services. We study how the idea can be applied to a moving range query over moving objects. We further investigate how the linkability of the proposed solution can be protected against trajectory-tracing.

4 citations


01 Jan 2005
TL;DR: This paper presents equality and inequality operators for uncertain data, introduces the concept of "approximation" in these comparison operators, and develops three sets of pruning techniques, namely item-level, page-level and index-level pruning.
Abstract: In database systems that collect information about the external environment, such as temperature and location values, it is often infeasible to obtain accurate information due to measurement and sampling errors, and resource limitations. Queries evaluated over these inaccurate data can potentially yield incorrect results. To avoid these problems, the idea of using uncertainty models (such as an interval associated with a probability density function) instead of a single value for modeling a data item has been explored in recent years. These works have focused on simple queries such as range and nearest-neighbor queries. Queries that join multiple relations have not been addressed in earlier work despite the significance of joins in databases. In this paper we address join queries over uncertain data. As with other queries over uncertain data, these joins return probabilistic answers. A Probabilistic Join Query (PJQ) augments the results with probability guarantees to indicate the likelihood of each join tuple being part of the result. Traditional join operators, such as equality and inequality, need to be extended to support uncertain data. In this paper, we present the notion of equality and inequality operators for uncertainty. We also introduce the concept of "approximation" in these comparison operators. Although PJQs are more informative than traditional joins, they are expensive to evaluate. To overcome this problem, we observe that often it is only necessary to know whether the probability of the results exceeds a given threshold, instead of the precise probability value. By incorporating this constraint into PJQ, it is possible to achieve much better performance. In particular, we develop three sets of optimization techniques, namely item-level, page-level and index-level pruning, for different join operators. These techniques facilitate pruning with little space and time overhead, and are easily adapted to most join algorithms.
Extensive simulation results show that these techniques improve the performance of joins significantly.
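A toy version of item-level pruning for an equality-within-resolution-c join predicate, assuming independent uniform pdfs (the paper's operators and pruning techniques are more general): a pair can be skipped outright when its intervals, widened by c, do not even overlap, since the join probability is then exactly zero.

```python
def overlap_len(a, b):
    """Length of the overlap between two intervals a = (lo, hi), b = (lo, hi)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def may_join(x, y, c):
    """Item-level pruning: P(|X - Y| <= c) can be nonzero only if the
    intervals, widened by c, overlap."""
    return overlap_len((x[0] - c, x[1] + c), y) > 0

def join_prob_uniform(x, y, c, steps=1000):
    """Numerically estimate P(|X - Y| <= c) for independent X, Y uniform
    over intervals x and y, by midpoint integration over X."""
    xl, xh = x
    yl, yh = y
    total = 0.0
    for i in range(steps):
        v = xl + (i + 0.5) * (xh - xl) / steps   # midpoint sample of X
        lo, hi = max(yl, v - c), min(yh, v + c)  # portion of Y within c of v
        total += max(0.0, hi - lo) / (yh - yl)
    return total / steps
```

The cheap `may_join` test stands in for the threshold idea: when an upper bound on the probability already falls below the query threshold, the expensive integration is never run.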