Showing papers by "Yufei Tao" published in 2005

Open Access

Journal Article•DOI•

Progressive skyline computation in database systems

Dimitris Papadias¹, Yufei Tao², Greg Fu³, Bernhard Seeger⁴•Institutions (4)

Hong Kong University of Science and Technology¹, City University of Hong Kong², JPMorgan Chase³, University of Marburg⁴

01 Mar 2005

TL;DR: In this paper, a branch-and-bound skyline (BBS) algorithm based on nearest-neighbor search is proposed, which is I/O optimal and performs a single access only to those nodes that may contain skyline points.

Abstract: The skyline of a d-dimensional dataset contains the points that are not dominated by any other point on all dimensions. Skyline computation has recently received considerable attention in the database community, especially for progressive methods that can quickly return the initial results without reading the entire database. All the existing algorithms, however, have some serious shortcomings which limit their applicability in practice. In this article we develop branch-and-bound skyline (BBS), an algorithm based on nearest-neighbor search, which is I/O optimal, that is, it performs a single access only to those nodes that may contain skyline points. BBS is simple to implement and supports all types of progressive processing (e.g., user preferences, arbitrary dimensionality, etc.). Furthermore, we propose several interesting variations of skyline computation, and show how BBS can be applied for their efficient processing.
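To make the skyline definition in this abstract concrete, here is a minimal brute-force sketch. It is not BBS: it uses no index and makes no attempt at I/O optimality. The function names `dominates` and `skyline` are illustrative, and the sketch assumes that smaller values are preferred on every dimension.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Return True if p dominates q (assumption: smaller is better).

    p dominates q when p is no worse than q on every dimension
    and strictly better on at least one.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance to the beach) for five hotels.
hotels = [(1, 9), (2, 4), (3, 5), (4, 1), (5, 5)]
print(skyline(hotels))
```

In this toy dataset, (3, 5) and (5, 5) are both dominated by (2, 4), so only the other three points remain in the skyline.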

905 citations

Proceedings Article•

Indexing multi-dimensional uncertain data with arbitrary probability density functions

Yufei Tao¹, Reynold Cheng², Xiaokui Xiao¹, Wang Kay Ngai³, Ben Kao³, Sunil Prabhakar⁴•Institutions (4)

City University of Hong Kong¹, Hong Kong Polytechnic University², University of Hong Kong³, Purdue University⁴

30 Aug 2005

TL;DR: The U-tree is proposed, an access method designed to optimize both the I/O and CPU time of range retrieval on multi-dimensional imprecise data; it is fully dynamic and does not place any constraints on the data pdfs.

Abstract: In an "uncertain database", an object o is associated with a multi-dimensional probability density function (pdf), which describes the likelihood that o appears at each position in the data space. A fundamental operation is the "probabilistic range search" which, given a value pq and a rectangular area rq, retrieves the objects that appear in rq with probabilities at least pq. In this paper, we propose the U-tree, an access method designed to optimize both the I/O and CPU time of range retrieval on multi-dimensional imprecise data. The new structure is fully dynamic (i.e., objects can be incrementally inserted/deleted in any order), and does not place any constraints on the data pdfs. We verify the query and update efficiency of U-trees with extensive experiments.
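The probabilistic range search defined above can be illustrated with a small linear-scan sketch. It is not the U-tree and uses no index. As a simplifying assumption that the abstract does not make, each object's pdf is taken to be uniform over an axis-aligned rectangle with positive side lengths, so its appearance probability in rq is the fraction of the rectangle that rq covers. The names `overlap_fraction` and `prob_range_search` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# A rectangle is (lows, highs), one coordinate per dimension.
Rect = Tuple[Sequence[float], Sequence[float]]


def overlap_fraction(region: Rect, rq: Rect) -> float:
    """Probability that an object uniformly distributed in `region` lies in `rq`.

    Assumption: the pdf is uniform over `region`, whose sides are all positive.
    """
    frac = 1.0
    for lo, hi, qlo, qhi in zip(region[0], region[1], rq[0], rq[1]):
        inter = min(hi, qhi) - max(lo, qlo)  # overlap length on this dimension
        if inter <= 0:
            return 0.0
        frac *= inter / (hi - lo)
    return frac


def prob_range_search(objects: Dict[str, Rect], rq: Rect, pq: float) -> List[str]:
    """Return ids of objects that appear in rq with probability at least pq."""
    return [oid for oid, region in objects.items()
            if overlap_fraction(region, rq) >= pq]


# Hypothetical 2-D objects and query rectangle.
objects = {
    "a": ((0, 0), (2, 2)),  # entirely inside the query: probability 1.0
    "b": ((3, 3), (5, 5)),  # only touches the query boundary: probability 0.0
    "c": ((1, 1), (5, 5)),  # a quarter of its area is covered: probability 0.25
}
rq = ((0, 0), (3, 3))
print(prob_range_search(objects, rq, 0.5))
```

For arbitrary pdfs the appearance probability would have to be obtained by integration or sampling instead of an area ratio; the filtering step against pq stays the same.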

310 citations

Journal Article•DOI•

Aggregate nearest neighbor queries in spatial databases

Dimitris Papadias¹, Yufei Tao², Kyriakos Mouratidis¹, Chun Kit Hui¹•Institutions (2)

Hong Kong University of Science and Technology¹, City University of Hong Kong²

01 Jun 2005-ACM Transactions on Database Systems

TL;DR: If Q fits in memory and

Abstract: Given two spatial datasets P (e.g., facilities) and Q (queries), an aggregate nearest neighbor (ANN) query retrieves the point(s) of P with the smallest aggregate distance(s) to points in Q. Assuming, for example, n users at locations q1, …, qn, an ANN query outputs the facility p ∈ P that minimizes the sum of distances |pqi| for 1 ≤ i ≤ n that the users have to travel in order to meet there. Similarly, another ANN query may report the point p ∈ P that minimizes the maximum distance that any user has to travel, or the minimum distance from some user to his/her closest facility. If Q fits in memory and P is indexed by an R-tree, we develop algorithms for aggregate nearest neighbors that capture several versions of the problem, including weighted queries and incremental reporting of results. Then, we analyze their performance and propose cost models for query optimization. Finally, we extend our techniques for disk-resident queries and approximate ANN retrieval. The efficiency of the algorithms and the accuracy of the cost models are evaluated through extensive experiments with real and synthetic datasets.
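The three aggregate variants described in this abstract (sum, maximum, and minimum of the distances) can be sketched with a brute-force scan. This is only an illustration of the query semantics: it ignores the R-tree, the cost models, and the weighted and incremental versions, and it assumes Euclidean distance. The name `aggregate_nn` is hypothetical.

```python
import math
from typing import Callable, Iterable, List, Tuple

Point = Tuple[float, ...]


def aggregate_nn(P: List[Point], Q: List[Point],
                 agg: Callable[[Iterable[float]], float] = sum) -> Point:
    """Return the point of P with the smallest aggregate distance to Q.

    `agg` combines the distances from a candidate p to every q in Q:
    sum (total travel), max (worst-off user), or min (closest user).
    Ties are broken by the order of P.
    """
    return min(P, key=lambda p: agg(math.dist(p, q) for q in Q))


# Hypothetical facilities P and user locations Q on a line.
P = [(0, 0), (2.5, 0), (5, 0)]
Q = [(1, 0), (3, 0), (7, 0)]

print(aggregate_nn(P, Q, sum))  # minimizes the total distance
print(aggregate_nn(P, Q, max))  # minimizes the longest trip
print(aggregate_nn(P, Q, min))  # minimizes the shortest trip
```

Here the sum and min variants both pick (2.5, 0), while the max variant prefers (5, 0) because it shortens the trip of the farthest user.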

283 citations

Proceedings Article•

Catching the best views of skyline: a semantic approach based on decisive subspaces

Jian Pei¹, Wen Jin¹, Martin Ester¹, Yufei Tao²•Institutions (2)

Simon Fraser University¹, City University of Hong Kong²

30 Aug 2005

TL;DR: The semantics of skylines are investigated, the subspace skyline analysis is proposed, and a novel notion of skyline group is introduced which essentially is a group of objects that are coincidentally in the skyline of some subspaces.

Abstract: The skyline operator is important for multi-criteria decision making applications. Although many recent studies developed efficient methods to compute skyline objects in a specific space, the fundamental problem on the semantics of skylines remains open: Why and in which subspaces is (or is not) an object in the skyline? Practically, users may also be interested in the skylines in any subspaces. Then, what is the relationship between the skylines in the subspaces and those in the super-spaces? How can we effectively analyze the subspace skylines? Can we efficiently compute skylines in various subspaces? In this paper, we investigate the semantics of skylines, propose the subspace skyline analysis, and extend the full-space skyline computation to subspace skyline computation. We introduce a novel notion of skyline group which essentially is a group of objects that are coincidentally in the skylines of some subspaces. We identify the decisive subspaces that qualify skyline groups in the subspace skylines. The new notions concisely capture the semantics and the structures of skylines in various subspaces. Multidimensional roll-up and drill-down analysis is introduced. We also develop an efficient algorithm, Skyey, to compute the set of skyline groups and, for each subspace, the set of objects that are in the subspace skyline. A performance study is reported to evaluate our approach.

271 citations

Journal Article•DOI•

A threshold-based algorithm for continuous monitoring of k nearest neighbors

Kyriakos Mouratidis¹, Dimitris Papadias¹, Spiridon Bakiras¹, Yufei Tao²•Institutions (2)

Hong Kong University of Science and Technology¹, City University of Hong Kong²

01 Nov 2005-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This work presents a threshold-based algorithm for the continuous monitoring of nearest neighbors that minimizes the communication overhead between the server and the data objects and can be used with multiple, static, or moving queries, for any distance definition.

Abstract: Assume a set of moving objects and a central server that monitors their positions over time, while processing continuous nearest neighbor queries from geographically distributed clients. In order to always report up-to-date results, the server could constantly obtain the most recent position of all objects. However, this naive solution requires the transmission of a large number of rapid data streams corresponding to location updates. Intuitively, current information is necessary only for objects that may influence some query result (i.e., they may be included in the nearest neighbor set of some client). Motivated by this observation, we present a threshold-based algorithm for the continuous monitoring of nearest neighbors that minimizes the communication overhead between the server and the data objects. The proposed method can be used with multiple, static, or moving queries, for any distance definition, and does not require additional knowledge (e.g., velocity vectors) besides object locations.

112 citations

Book Chapter•DOI•

Probabilistic spatial queries on existentially uncertain data

Xiangyuan Dai¹, Man Lung Yiu¹, Nikos Mamoulis¹, Yufei Tao², Michail Vaitis³•Institutions (3)

University of Hong Kong¹, City University of Hong Kong², University of the Aegean³

22 Aug 2005

TL;DR: This work proposes adaptations of spatial access methods and search algorithms for probabilistic versions of range queries and nearest neighbors and conducts an extensive experimental study, which evaluates the effectiveness of proposed solutions.

Abstract: We study the problem of answering spatial queries in databases where objects exist with some uncertainty and they are associated with an existential probability. The goal of a thresholding probabilistic spatial query is to retrieve the objects that qualify the spatial predicates with probability that exceeds a threshold. Accordingly, a ranking probabilistic spatial query selects the objects with the highest probabilities to qualify the spatial predicates. We propose adaptations of spatial access methods and search algorithms for probabilistic versions of range queries and nearest neighbors and conduct an extensive experimental study, which evaluates the effectiveness of proposed solutions.
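The thresholding and ranking query types described in this abstract can be sketched for the range-query case with a plain scan. The sketch assumes a model in which each object has a known location and an existential probability, so that an object inside the query rectangle qualifies with exactly its existential probability. It does not reproduce the adapted access methods of the paper, and the names `threshold_range` and `ranking_range` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Each object is (location, existential probability).
UncertainObject = Tuple[Sequence[float], float]


def in_range(point: Sequence[float], lows: Sequence[float],
             highs: Sequence[float]) -> bool:
    """True if `point` lies inside the axis-aligned rectangle [lows, highs]."""
    return all(lo <= x <= hi for x, lo, hi in zip(point, lows, highs))


def threshold_range(objects: Dict[str, UncertainObject], lows: Sequence[float],
                    highs: Sequence[float], t: float) -> List[str]:
    """Thresholding query: objects in range whose probability exceeds t."""
    return [oid for oid, (pt, prob) in objects.items()
            if in_range(pt, lows, highs) and prob > t]


def ranking_range(objects: Dict[str, UncertainObject], lows: Sequence[float],
                  highs: Sequence[float], k: int) -> List[str]:
    """Ranking query: the k objects in range with the highest probabilities."""
    hits = [(prob, oid) for oid, (pt, prob) in objects.items()
            if in_range(pt, lows, highs)]
    hits.sort(key=lambda h: -h[0])  # stable sort, highest probability first
    return [oid for _, oid in hits[:k]]


# Hypothetical objects: location plus existential probability.
objects = {
    "a": ((1, 1), 0.9),
    "b": ((2, 2), 0.4),
    "c": ((8, 8), 0.95),  # outside the query rectangle used below
    "d": ((3, 1), 0.7),
}
print(threshold_range(objects, (0, 0), (4, 4), 0.5))
print(ranking_range(objects, (0, 0), (4, 4), 2))
```

Nearest-neighbor versions are more involved, because whether an object is the nearest neighbor also depends on the existence of every closer object.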

102 citations

Proceedings Article•DOI•

RPJ: producing fast join results on streams through rate-based optimization

Yufei Tao¹, Man Lung Yiu², Dimitris Papadias³, Marios Hadjieleftheriou⁴, Nikos Mamoulis²•Institutions (4)

City University of Hong Kong¹, University of Hong Kong², Hong Kong University of Science and Technology³, University of California, Riverside⁴

14 Jun 2005

TL;DR: A new algorithm, RPJ (Rate-based Progressive Join), is developed, which maximizes the output rate by optimizing its execution according to the characteristics of the join relations (e.g., data distribution, tuple arrival pattern, etc.).

Abstract: We consider the problem of "progressively" joining relations whose records are continuously retrieved from remote sources through an unstable network that may incur temporary failures. The objectives are to (i) start reporting the first output tuples as soon as possible (before the participating relations are completely received), and (ii) produce the remaining results at a fast rate. We develop a new algorithm RPJ (Rate-based Progressive Join) based on solid theoretical analysis. RPJ maximizes the output rate by optimizing its execution according to the characteristics of the join relations (e.g., data distribution, tuple arrival pattern, etc.). Extensive experiments prove that our technique delivers results significantly faster than the previous methods.

82 citations

Journal Article•DOI•

Historical spatio-temporal aggregation

Yufei Tao¹, Dimitris Papadias²•Institutions (2)

City University of Hong Kong¹, Hong Kong University of Science and Technology²

01 Jan 2005-ACM Transactions on Information Systems

TL;DR: Specialized methods are presented that integrate spatio-temporal indexing with pre-aggregation for the efficient processing of historical aggregate queries without a priori knowledge of grouping hierarchies.

Abstract: Spatio-temporal databases store information about the positions of individual objects over time. However, in many applications such as traffic supervision or mobile communication systems, only summarized data, like the number of cars in an area for a specific period, or phone-calls serviced by a cell each day, is required. Although this information can be obtained from operational databases, its computation is expensive, rendering online processing inapplicable. In this paper, we present specialized methods, which integrate spatio-temporal indexing with pre-aggregation. The methods support dynamic spatio-temporal dimensions for the efficient processing of historical aggregate queries without a priori knowledge of grouping hierarchies. The superiority of the proposed techniques over existing methods is demonstrated through a comprehensive probabilistic analysis and an extensive experimental evaluation.

71 citations

Proceedings Article•DOI•

Reverse nearest neighbors in large graphs

Man Lung Yiu¹, Dimitris Papadias², Nikos Mamoulis¹, Yufei Tao³•Institutions (3)

University of Hong Kong¹, Hong Kong University of Science and Technology², City University of Hong Kong³

05 Apr 2005

TL;DR: Algorithms and optimization techniques are proposed for reverse nearest neighbor (RNN) queries in large graphs, utilizing some characteristics of networks.

Abstract: A reverse nearest neighbor query returns the data objects that have a query point as their nearest neighbor. Although such queries have been studied quite extensively in Euclidean spaces, there is no previous work in the context of large graphs. In this paper, we propose algorithms and optimization techniques for RNN queries by utilizing some characteristics of networks.
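The reverse nearest neighbor definition in this abstract can be illustrated on a small weighted graph. The sketch below is a brute-force baseline, not the algorithms of the paper: it runs Dijkstra from every data node and applies none of the network-specific optimizations. It assumes an undirected graph stored as an adjacency dictionary with non-negative edge weights, data objects located on nodes, and ties counted in favor of the query; `dijkstra` and `reverse_nn` are illustrative names.

```python
import heapq
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]
INF = float("inf")


def dijkstra(graph: Graph, source: str) -> Dict[str, float]:
    """Shortest-path distances from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def reverse_nn(graph: Graph, data_nodes: List[str], q: str) -> List[str]:
    """Data nodes whose nearest neighbor, among q and the other data nodes, is q."""
    result = []
    for o in data_nodes:
        dist = dijkstra(graph, o)
        d_q = dist.get(q, INF)
        d_other = min((dist.get(x, INF) for x in data_nodes if x != o),
                      default=INF)
        # Assumption: a tie between q and another data node counts for q.
        if d_q < INF and d_q <= d_other:
            result.append(o)
    return result


# Hypothetical path graph: a -1- b -1- q -3- c -1- d
graph = {
    "a": {"b": 1},
    "b": {"a": 1, "q": 1},
    "q": {"b": 1, "c": 3},
    "c": {"q": 3, "d": 1},
    "d": {"c": 1},
}
print(reverse_nn(graph, ["a", "c", "d"], "q"))
```

Node a is 2 away from q and 5 away from its closest fellow data node, so it is a reverse nearest neighbor of q; c and d are each other's nearest neighbors and are excluded.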

60 citations

Proceedings Article•DOI•

Venn sampling: a novel prediction technique for moving objects

Yufei Tao¹, Dimitris Papadias², Jian Zhai¹, Qing Li¹•Institutions (2)

City University of Hong Kong¹, Hong Kong University of Science and Technology²

05 Apr 2005

TL;DR: Venn sampling (VS), a novel estimation method optimized for a set of "pivot queries" that reflect the distribution of actual ones, is developed; it permits a novel "query-driven" update policy that significantly reduces the update cost of conventional policies.

Abstract: Given a region q_R and a future timestamp q_T, a "range aggregate" query estimates the number of objects expected to appear in q_R at time q_T. Currently the only methods for processing such queries are based on spatio-temporal histograms, which have several serious problems. First, they consume considerable space in order to provide accurate estimation. Second, they incur high evaluation cost. Third, their efficiency continuously deteriorates with time. Fourth, their maintenance requires significant update overhead. Motivated by this, we develop Venn sampling (VS), a novel estimation method optimized for a set of "pivot queries" that reflect the distribution of actual ones. In particular, given m pivot queries, VS achieves perfect estimation with only O(m) samples, as opposed to O(2^m) required by the current state of the art in workload-aware sampling. Compared with histograms, our technique is much more accurate (given the same space), produces estimates with negligible cost, and does not deteriorate with time. Furthermore, it permits the development of a novel "query-driven" update policy, which reduces the update cost of conventional policies significantly.

21 citations