Showing papers by "Nikos Mamoulis" published in 2003

Open Access

Book Chapter•DOI•

Query processing in spatial network databases

Dimitris Papadias¹, Jun Zhang¹, Nikos Mamoulis², Yufei Tao³•Institutions (3)

Hong Kong University of Science and Technology¹, University of Hong Kong², City University of Hong Kong³

09 Sep 2003

TL;DR: A Euclidean restriction and a network expansion framework that take advantage of location and connectivity to efficiently prune the search space are developed and applied to the most popular spatial queries.

Abstract: Despite the importance of spatial networks in real-life applications, most of the spatial database literature focuses on Euclidean spaces. In this paper we propose an architecture that integrates network and Euclidean information, capturing pragmatic constraints. Based on this architecture, we develop a Euclidean restriction and a network expansion framework that take advantage of location and connectivity to efficiently prune the search space. These frameworks are successfully applied to the most popular spatial queries, namely nearest neighbors, range search, closest pairs and e-distance joins, in the context of spatial network databases.
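The idea of expanding outward along network connectivity until enough results are found can be sketched as follows. This is only a minimal illustration using a Dijkstra-style traversal, not the paper's algorithm: it assumes data objects sit exactly on network nodes (the abstract does not say how objects are attached), and the names `network_knn`, `graph`, and `objects` are hypothetical.

```python
import heapq

def network_knn(graph, objects, source, k):
    """Return up to k (node, network_distance) pairs nearest to `source`.

    graph:   dict mapping node -> list of (neighbor, edge_weight)
    objects: set of nodes that host a data object (simplifying assumption)
    """
    best = {source: 0.0}      # best known network distance per node
    heap = [(0.0, source)]    # frontier ordered by network distance
    visited = set()
    found = []
    while heap and len(found) < k:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue          # stale heap entry
        visited.add(node)
        if node in objects:
            found.append((node, d))
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < best.get(neighbor, float("inf")):
                best[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return found

# Small undirected road network written as a symmetric adjacency list.
graph = {
    "a": [("b", 1.0), ("d", 5.0)],
    "b": [("a", 1.0), ("c", 1.0)],
    "c": [("b", 1.0), ("d", 1.0)],
    "d": [("a", 5.0), ("c", 1.0)],
}
nearest = network_knn(graph, {"c", "d"}, "a", 2)
```

The traversal stops as soon as k objects have been settled, so nodes farther than the k-th neighbor are never expanded.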

675 citations

Proceedings Article•DOI•

Efficient processing of joins on set-valued attributes

Nikos Mamoulis¹•Institutions (1)

University of Hong Kong¹

09 Jun 2003

TL;DR: It is shown that the inverted file, a powerful index for selection queries, can also facilitate the efficient evaluation of most join predicates; join algorithms that utilize inverted files are proposed and compared with signature-based methods for several set-comparison predicates.

Abstract: Object-oriented and object-relational DBMS support set-valued attributes, which are a natural and concise way to model complex information. However, there has been limited research to date on the evaluation of query operators that apply on sets. In this paper we study the join of two relations on their set-valued attributes. Various join types are considered, namely the set containment, set equality, and set overlap joins. We show that the inverted file, a powerful index for selection queries, can also facilitate the efficient evaluation of most join predicates. We propose join algorithms that utilize inverted files and compare them with signature-based methods for several set-comparison predicates.
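As a rough illustration of how an inverted file can drive a set containment join, the sketch below indexes one relation by element and intersects the inverted lists of each probing set. It is an in-memory toy under stated assumptions, not the paper's algorithms: relations are plain dictionaries from tuple id to set, and the function names are hypothetical.

```python
from collections import defaultdict

def build_inverted_file(relation):
    """Map each element to the ids of the tuples whose set contains it."""
    inverted = defaultdict(set)
    for tid, items in relation.items():
        for item in items:
            inverted[item].add(tid)
    return inverted

def containment_join(r, s):
    """Return pairs (rid, sid) such that r[rid] is a subset of s[sid]."""
    inverted = build_inverted_file(s)
    result = []
    for rid, items in r.items():
        if not items:
            # The empty set is contained in every set.
            candidates = set(s)
        else:
            # Intersect the inverted lists, shortest first, and stop early
            # once no candidate remains.
            lists = sorted((inverted.get(item, set()) for item in items), key=len)
            candidates = set(lists[0])
            for ids in lists[1:]:
                candidates &= ids
                if not candidates:
                    break
        result.extend((rid, sid) for sid in sorted(candidates))
    return result

r = {"r1": {"a", "b"}, "r2": {"c"}}
s = {"s1": {"a", "b", "c"}, "s2": {"a", "c"}, "s3": {"b"}}
pairs = containment_join(r, s)
```

Because every surviving candidate appears in the inverted list of each element of the probing set, no separate verification step is needed in this simplified setting.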

93 citations

Proceedings Article•DOI•

Frequent-pattern based iterative projected clustering

Man Lung Yiu¹, Nikos Mamoulis¹•Institutions (1)

University of Hong Kong¹

19 Nov 2003

TL;DR: This work proposes a methodology for finding projected clusters by mining frequent itemsets, presents heuristics that improve its quality, and evaluates the techniques with synthetic and real data.

Abstract: Irrelevant attributes add noise to high dimensional clusters and make traditional clustering techniques inappropriate. Projected clustering algorithms have been proposed to find the clusters in hidden subspaces. We realize the analogy between mining frequent itemsets and discovering the relevant subspace for a given cluster. We propose a methodology for finding projected clusters by mining frequent itemsets and present heuristics that improve its quality. Our techniques are evaluated with synthetic and real data; they are scalable and discover projected clusters accurately.

88 citations

Journal Article•DOI•

Slot index spatial join

Nikos Mamoulis¹, Dimitris Papadias•Institutions (1)

University of Hong Kong¹

01 Jan 2003-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This paper proposes slot index spatial join (SISJ), an algorithm that joins a nonindexed data set with one indexed by an R-tree and compares it, analytically and experimentally, with other spatial join methods for two cases.

Abstract: Efficient processing of spatial joins is very important due to their high cost and frequent application in spatial databases and other areas involving multidimensional data. This paper proposes slot index spatial join (SISJ), an algorithm that joins a nonindexed data set with one indexed by an R-tree. We explore two optimization techniques that reduce the space requirements and the computational cost of SISJ and we compare it, analytically and experimentally, with other spatial join methods for two cases: 1) when the nonindexed input is read from disk and 2) when it is an intermediate result of a preceding database operator in a complex query plan. The importance of buffer splitting between consecutive join operators is also demonstrated through a two-join case study and a method that estimates the optimal splitting is proposed. Our evaluation shows that SISJ outperforms alternative methods in most cases and is suitable for limited memory conditions.

43 citations

Proceedings Article•DOI•

Nikos Mamoulis¹, David W. Cheung¹, Wang Lian¹•Institutions (1)

University of Hong Kong¹

05 Mar 2003

TL;DR: A method is proposed that represents set data as bitmaps (signatures) and organizes them into a hierarchical index suitable for similarity search and other related query types; experiments show that it is robust to different data characteristics, scalable to the database size and efficient for various queries.

Abstract: Data mining applications analyze large collections of set data and high dimensional categorical data. Search on these data types is not restricted to the classic problems of mining association rules and classification, but similarity search is also a frequently applied operation. Access methods for multidimensional numerical data are inappropriate for this problem and specialized indexes are needed. We propose a method that represents set data as bitmaps (signatures) and organizes them into a hierarchical index, suitable for similarity search and other related query types. In contrast to a previous technique, the signature tree is dynamic and does not rely on hardwired constants. Experiments with synthetic and real datasets show that it is robust to different data characteristics, scalable to the database size and efficient for various queries.
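To make the notion of a set signature concrete, the sketch below maps each element to one bit of a fixed-length bitmap and uses the bitmaps as a cheap filter for superset queries. It shows only a flat scan, not the hierarchical signature tree the abstract describes, and the signature length, the hash function (`zlib.crc32`), and all function names are assumptions made for illustration.

```python
import zlib

SIG_BITS = 64  # hypothetical signature length

def signature(items, bits=SIG_BITS):
    """Superimpose one bit per element into a single integer bitmap."""
    sig = 0
    for item in items:
        position = zlib.crc32(str(item).encode("utf-8")) % bits
        sig |= 1 << position
    return sig

def may_contain(sig_data, sig_query):
    """False means the data set surely lacks some query element.

    True may be a false positive, because distinct elements can share a bit.
    """
    return (sig_data & sig_query) == sig_query

def superset_search(database, query):
    """Return ids of sets containing `query`: filter by signature, then verify."""
    q_sig = signature(query)
    return [
        key
        for key, items in database.items()
        if may_contain(signature(items), q_sig) and query <= items
    ]

database = {
    "t1": {"milk", "bread"},
    "t2": {"milk"},
    "t3": {"bread", "eggs", "milk"},
}
matches = superset_search(database, {"milk", "bread"})
```

The final subset check is what keeps the answer exact; the signature test only discards sets that cannot qualify.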

33 citations

Book Chapter•DOI•

Validity information retrieval for spatio-temporal queries: Theoretical performance bounds

Yufei Tao¹, Nikos Mamoulis², Dimitris Papadias³•Institutions (3)

Carnegie Mellon University¹, University of Hong Kong², Hong Kong University of Science and Technology³

24 Jul 2003

TL;DR: This paper presents the first theoretical study on validity queries, and develops indexes and algorithms with attractive I/O complexities, using non-trivial reductions that reveal the problem characteristics and permit the deployment of existing structures.

Abstract: The results of traditional spatial queries (i.e., range search, nearest neighbor, etc.) are usually meaningless in spatio-temporal applications, because they will be invalidated by the movements of query and/or data objects. In practice, a query result R should be accompanied with validity information specifying (i) the (future) time T that R will expire, and (ii) the change C of R at time T (so that R can be updated incrementally). Although several algorithms have been proposed for this problem, their worst-case performance is the same as that of sequential scan. This paper presents the first theoretical study on validity queries, and develops indexes and algorithms with attractive I/O complexities. Our discussion covers numerous important variations of the problem and different query/object mobility combinations. The solutions involve a set of non-trivial reductions that reveal the problem characteristics and permit the deployment of existing structures.

19 citations

Book Chapter•DOI•

Evaluation of Iceberg Distance Joins

Yutao Shou¹, Nikos Mamoulis¹, Huiping Cao¹, Dimitris Papadias², David W. Cheung¹•Institutions (2)

University of Hong Kong¹, Hong Kong University of Science and Technology²

24 Jul 2003

TL;DR: Output-sensitive algorithms that prune the search space by integrating the cardinality with the distance constraint are proposed and evaluated through extensive experiments covering a wide range of problem parameters.

Abstract: The iceberg distance join returns object pairs within some distance from each other, provided that the first object appears at least a number of times in the result, e.g., “find hotels which are within 1km to at least 10 restaurants”. The output of this query is the subset of the corresponding distance join (e.g., “find hotels which are within 1km to some restaurant”) that satisfies the additional cardinality constraint. Therefore, it could be processed by using a conventional spatial join algorithm and then filtering out the non-qualifying pairs. This approach, however, is expensive, especially when the cardinality constraint is highly selective. In this paper, we propose output-sensitive algorithms that prune the search space by integrating the cardinality with the distance constraint. We deal with cases of indexed/non-indexed datasets and evaluate the performance of the proposed techniques with extensive experiments covering a wide range of problem parameters.
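The baseline that the abstract calls expensive (compute the full distance join, then filter by cardinality) can be sketched as below. This is a nested-loop toy that only makes the query semantics concrete; it is not one of the proposed output-sensitive algorithms, and the function names, the point data, and the thresholds are hypothetical.

```python
from collections import defaultdict
from math import dist

def distance_join(r, s, eps):
    """Return all pairs (rid, sid) whose points lie within distance eps."""
    return [
        (rid, sid)
        for rid, p in r.items()
        for sid, q in s.items()
        if dist(p, q) <= eps
    ]

def iceberg_distance_join(r, s, eps, min_count):
    """Keep only pairs whose first object has at least min_count partners."""
    pairs = distance_join(r, s, eps)
    counts = defaultdict(int)
    for rid, _ in pairs:
        counts[rid] += 1
    return [(rid, sid) for rid, sid in pairs if counts[rid] >= min_count]

hotels = {"h1": (0.0, 0.0), "h2": (10.0, 10.0)}
restaurants = {"a": (0.0, 1.0), "b": (1.0, 0.0), "c": (10.0, 10.5)}
result = iceberg_distance_join(hotels, restaurants, eps=1.5, min_count=2)
```

Here `h2` has only one restaurant within range, so its pair is computed by the join and then discarded, which is exactly the wasted work that integrating the cardinality constraint into the join aims to avoid.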

18 citations

Book Chapter•DOI•

Optimization of Spatial Joins on Mobile Devices

Nikos Mamoulis¹, Panos Kalnis², Spiridon Bakiras¹, Xiaochen Li²•Institutions (2)

University of Hong Kong¹, National University of Singapore²

24 Jul 2003

TL;DR: An adaptive algorithm is described that optimizes the overall process of statistics retrieval and query execution and retrieves statistics dynamically in order to generate a low-cost execution plan, while considering the storage and computational power limitations of the PDA.

Abstract: Mobile devices like PDAs are capable of retrieving information from various types of services. In many cases, the user requests cannot directly be processed by the service providers, if their hosts have limited query capabilities or the query combines data from various sources, which do not collaborate with each other. In this paper, we present a framework for optimizing spatial join queries that belong to this class. We presume that the connection and queries are ad-hoc, there is no mediator available and the services are non-collaborative. We also assume that the services are not willing to share their statistics or indexes with the client. We retrieve statistics dynamically in order to generate a low-cost execution plan, while considering the storage and computational power limitations of the PDA. Since acquiring the statistics causes overhead, we describe an adaptive algorithm that optimizes the overall process of statistics retrieval and query execution. We demonstrate the applicability of our methods with a prototype implementation on a PDA with wireless network access.

13 citations

Book Chapter•DOI•

A Filter Index for Complex Queries on Semi-structured Data

Wang Lian¹, Nikos Mamoulis¹, David W. Cheung¹•Institutions (1)

University of Hong Kong¹

17 Aug 2003

TL;DR: This paper proposes an alternative technique for queries on XML data that uses the query's components (e.g., edges, paths, twigs) to filter out a large part of the database that cannot match them before validating the query on the actual data, relying on a signature index to search quickly and prune the search space effectively.

Abstract: Answering a query on XML data usually involves breaking it into a number of small components (e.g., edges, paths, twigs, etc.), evaluating them and joining the results. In this paper we propose an alternative technique that uses these components to filter out a large part of the database that does not satisfy them, before validating the query on the actual data. Our methodology uses a signature index to search quickly and prune the search space effectively. The efficiency of the proposed technique is demonstrated by comparison with an existing index, on real data.
