scispace - formally typeset
Search or ask a question

Showing papers by "Yufei Tao published in 2003"


Proceedings ArticleDOI
09 Jun 2003
TL;DR: BBS is a progressive algorithm also based on nearest neighbor search, which is IO optimal, i.e., it performs a single access only to those R-tree nodes that may contain skyline points and its space overhead is significantly smaller than that of NN.
Abstract: The skyline of a set of d-dimensional points contains the points that are not dominated by any other point on all dimensions. Skyline computation has recently received considerable attention in the database community, especially for progressive (or online) algorithms that can quickly return the first skyline points without having to read the entire data file. Currently, the most efficient algorithm is NN (nearest neighbors), which applies the divide -and-conquer framework on datasets indexed by R-trees. Although NN has some desirable features (such as high speed for returning the initial skyline points, applicability to arbitrary data distributions and dimensions), it also presents several inherent disadvantages (need for duplicate elimination if d>2, multiple accesses of the same node, large space overhead). In this paper we develop BBS (branch-and-bound skyline), a progressive algorithm also based on nearest neighbor search, which is IO optimal, i.e., it performs a single access only to those R-tree nodes that may contain skyline points. Furthermore, it does not retrieve duplicates and its space overhead is significantly smaller than that of NN. Finally, BBS is simple to implement and can be efficiently applied to a variety of alternative skyline queries. An analytical and experimental comparison shows that BBS outperforms NN (usually by orders of magnitude) under all problem instances.

853 citations


Book ChapterDOI
09 Sep 2003
TL;DR: A Euclidean restriction and a network expansion framework that take advantage of location and connectivity to efficiently prune the search space are developed and applied to the most popular spatial queries.
Abstract: Despite the importance of spatial networks in real-life applications, most of the spatial database literature focuses on Euclidean spaces. In this paper we propose an architecture that integrates network and Euclidean information, capturing pragmatic constraints. Based on this architecture, we develop a Euclidean restriction and a network expansion framework that take advantage of location and connectivity to efficiently prune the search space. These frameworks are successfully applied to the most popular spatial queries, namely nearest neighbors, range search, closest pairs and e-distance joins, in the context of spatial network databases.

675 citations


Book ChapterDOI
09 Sep 2003
TL;DR: This paper proposes a new index structure called the TPR*- tree, which takes into account the unique features of dynamic objects through a set of improved construction algorithms and provides cost models that determine the optimal performance achievable by any data-partition spatio-temporal access method.
Abstract: A predictive spatio-temporal query retrieves the set of moving objects that will intersect a query window during a future time interval. Currently, the only access method for processing such queries in practice is the TPR-tree. In this paper we first perform an analysis to determine the factors that affect the performance of predictive queries and show that several of these factors are not considered by the TPR-tree, which uses the insertion/deletion algorithms of the R*-tree designed for static data. Motivated by this, we propose a new index structure called the TPR*- tree, which takes into account the unique features of dynamic objects through a set of improved construction algorithms. In addition, we provide cost models that determine the optimal performance achievable by any data-partition spatio-temporal access method. Using experimental comparison, we illustrate that the TPR*-tree is nearly-optimal and significantly outperforms the TPR-tree under all conditions.

488 citations


Proceedings ArticleDOI
09 Jun 2003
TL;DR: This paper proposes an approach that enables mobile clients to determine the validity of previous queries based on their current locations, and focuses on two of the most common spatial query types, namely nearest neighbor and window queries, define the validity region in each case and propose the corresponding query processing algorithms.
Abstract: In this paper we propose an approach that enables mobile clients to determine the validity of previous queries based on their current locations. In order to make this possible, the server returns in addition to the query result, a validity region around the client's location within which the result remains the same. We focus on two of the most common spatial query types, namely nearest neighbor and window queries, define the validity region in each case and propose the corresponding query processing algorithms. In addition, we provide analytical models for estimating the expected size of the validity region. Our techniques can significantly reduce the number of queries issued to the server, while introducing minimal computational and network overhead compared to traditional spatial queries.

310 citations


Proceedings ArticleDOI
05 Mar 2003
TL;DR: A cost model for selectivity estimation of predictive spatio-temporal window queries with high accuracy, ability to handle all query types, and efficient handling of updates is proposed.
Abstract: We propose a cost model for selectivity estimation of predictive spatio-temporal window queries. Initially, we focus on uniform data proposing formulae that capture both points and rectangles, and any type of object/query mobility combination (i.e., dynamic objects, dynamic queries or both). Then, we apply the model to nonuniform datasets by introducing spatio-temporal histograms, which in addition to the spatial, also consider the velocity distributions during partitioning. The advantages of our techniques are (i) high accuracy (1-2 orders of magnitude lower error than previous techniques), (ii) ability to handle all query types, and (iii) efficient handling of updates.

85 citations


Journal ArticleDOI
TL;DR: Time-parameterized and continuous versions of the most common spatial queries, i.e., window queries, nearest neighbors, spatial joins, are studied, proposing efficient processing algorithms and accurate cost models.
Abstract: Conventional spatial queries are usually meaningless in dynamic environments since their results may be invalidated as soon as the query or data objects move. In this paper we formulate two novel query types, time parameterized and continuous queries, applicable in such environments. A time-parameterized query retrieves the actual result at the time when the query is issued, the expiry time of the result given the current motion of the query and database objects, and the change that causes the expiration. A continuous query retrieves tuples of the form , where each result is accompanied by a future interval, during which it is valid. We study time-parameterized and continuous versions of the most common spatial queries (i.e., window queries, nearest neighbors, spatial joins), proposing efficient processing algorithms and accurate cost models.

81 citations


Journal ArticleDOI
TL;DR: Probabilistic cost models that estimate the selectivity of spatio-temporal window queries and joins, and the expected distance between a query and its nearest neighbor(s) are presented.
Abstract: Given a set of objects S, a spatio-temporal window query q retrieves the objects of S that will intersect the window during the (future) interval qT. A nearest neighbor query q retrieves the objects of S closest to q during qT. Given a threshold d, a spatio-temporal join retrieves the pairs of objects from two datasets that will come within distance d from each other during qT. In this article, we present probabilistic cost models that estimate the selectivity of spatio-temporal window queries and joins, and the expected distance between a query and its nearest neighbor(s). Our models capture any query/object mobility combination (moving queries, moving objects or both) and any data type (points and rectangles) in arbitrary dimensionality. In addition, we develop specialized spatio-temporal histograms, which take into account both location and velocity information, and can be incrementally maintained. Extensive performance evaluation verifies that the proposed techniques produce highly accurate estimation on both uniform and non-uniform data.

73 citations


Book ChapterDOI
24 Jul 2003
TL;DR: This paper presents the first theoretical study on validity queries, and develops indexes and algorithms with attractive I/O complexities that reveal the problem characteristics and permit the deployment of existing structures.
Abstract: The results of traditional spatial queries (ie, range search, nearest neighbor, etc) are usually meaningless in spatio-temporal applications, because they will be invalidated by the movements of query and/or data objects In practice, a query result R should be accompanied with validity information specifying (i) the (future) time T that R will expire, and (ii) the change C of R at time T (so that R can be updated incrementally) Although several algorithms have been proposed for this problem, their worst-case performance is the same as that of sequential scan This paper presents the first theoretical study on validity queries, and develops indexes and algorithms with attractive I/O complexities Our discussion covers numerous important variations of the problem and different query/object mobility combinations The solutions involve a set of non-trivial reductions that reveal the problem characteristics and permit the deployment of existing structures

19 citations


Proceedings ArticleDOI
03 Nov 2003
TL;DR: The Power-method is developed, a comprehensive technique applicable to a wide range of query optimization problems under various metrics that eliminates the local uniformity assumption and is accurate even in scenarios where existing approaches completely fail.
Abstract: Existing estimation approaches for multi-dimensional databases often rely on the assumption that data distribution in a small region is uniform, which seldom holds in practice. Moreover, their applicability is limited to specific estimation tasks under certain distance metric. This paper develops the Power-method, a comprehensive technique applicable to a wide range of query optimization problems under various metrics. The Power-method eliminates the local uniformity assumption and is accurate even in scenarios where existing approaches completely fail. Furthermore, it performs estimation by evaluating only one simple formula with minimal computational overhead. Extensive experiments confirm that the Power-method outperforms previous techniques in terms of accuracy and applicability to various optimization scenarios.

15 citations


Journal ArticleDOI
TL;DR: Nine young Chinese researchers working in the United States, present concise surveys and report their recent progress on the selected fields that they are working on, and hope that such an effort would attract more and more researchers, especially those in China, to enter the frontiers of database research and promote collaborations.
Abstract: The study on database technologies, or more generally, the technologies of data and information management, is an important and active research field. Recently, many exciting results have been reported. In this fast growing field, Chinese researchers play more and more active roles. Research papers from Chinese scholars, both in China and abroad, appear in prestigious academic forums.In this paper, we, nine young Chinese researchers working in the United States, present concise surveys and report our recent progress on the selected fields that we are working on. Although the paper covers only a small number of topics and the selection of the topics is far from balanced, we hope that such an effort would attract more and more researchers, especially those in China, to enter the frontiers of database research and promote collaborations. For the obvious reason, the authors are listed alphabetically, while the sections are arranged in the order of the author list.

7 citations