
Showing papers by "Nikos Mamoulis" published in 2003


Book Chapter
09 Sep 2003
TL;DR: A Euclidean restriction and a network expansion framework that take advantage of location and connectivity to efficiently prune the search space are developed and applied to the most popular spatial queries.
Abstract: Despite the importance of spatial networks in real-life applications, most of the spatial database literature focuses on Euclidean spaces. In this paper we propose an architecture that integrates network and Euclidean information, capturing pragmatic constraints. Based on this architecture, we develop a Euclidean restriction and a network expansion framework that take advantage of location and connectivity to efficiently prune the search space. These frameworks are successfully applied to the most popular spatial queries, namely nearest neighbors, range search, closest pairs and e-distance joins, in the context of spatial network databases.

675 citations
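
The network expansion idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes, for simplicity, that data objects sit exactly on network nodes, and the names `network_knn`, `graph`, and `objects_at` are hypothetical. The search grows outward from the query node in order of network distance and stops once k objects are found, so distant parts of the network are never visited.

```python
import heapq


def network_knn(graph, objects_at, source, k):
    """Return up to k (network_distance, object) pairs nearest to `source`.

    graph:      dict mapping node -> list of (neighbor, edge_weight)
    objects_at: dict mapping node -> list of objects located at that node
                (assumption: objects lie on nodes, not along edges)
    """
    best = {source: 0.0}          # best known distance per node
    heap = [(0.0, source)]        # frontier ordered by network distance
    settled = set()
    result = []
    while heap and len(result) < k:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue              # stale heap entry
        settled.add(node)
        # Nodes are settled in nondecreasing distance, so objects found
        # here are reported in nearest-first order.
        for obj in objects_at.get(node, []):
            result.append((d, obj))
            if len(result) == k:
                break
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < best.get(neighbor, float("inf")):
                best[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return result
```

With k smaller than the number of objects, the loop ends as soon as the k-th object is reported, which is the pruning effect that expansion is meant to provide.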


Proceedings Article
09 Jun 2003
TL;DR: It is shown that the inverted file, a powerful index for selection queries, can also facilitate the efficient evaluation of most join predicates; join algorithms that utilize inverted files are proposed and compared with signature-based methods for several set-comparison predicates.
Abstract: Object-oriented and object-relational DBMS support set-valued attributes, which are a natural and concise way to model complex information. However, there has been limited research to date on the evaluation of query operators that apply on sets. In this paper we study the join of two relations on their set-valued attributes. Various join types are considered, namely the set containment, set equality, and set overlap joins. We show that the inverted file, a powerful index for selection queries, can also facilitate the efficient evaluation of most join predicates. We propose join algorithms that utilize inverted files and compare them with signature-based methods for several set-comparison predicates.

93 citations
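
As a rough illustration of how an inverted file can drive a set containment join, the sketch below indexes one relation by item and intersects posting lists for each tuple of the other. It is an in-memory toy under stated assumptions, not the algorithms of the paper: relations are plain dicts from tuple id to a set of items, and the names `build_inverted_file` and `containment_join` are hypothetical.

```python
from collections import defaultdict


def build_inverted_file(relation):
    """Map each item to the set of tuple ids whose set contains it."""
    inverted = defaultdict(set)
    for tid, items in relation.items():
        for item in items:
            inverted[item].add(tid)
    return inverted


def containment_join(r, s):
    """Return pairs (rid, sid) such that r[rid] is a subset of s[sid].

    r, s: dicts mapping tuple id -> set of items (assumed representation).
    """
    inverted = build_inverted_file(s)
    pairs = []
    for rid, items in r.items():
        if not items:
            # The empty set is contained in every set.
            pairs.extend((rid, sid) for sid in sorted(s))
            continue
        # Intersect posting lists, shortest first, and stop early
        # once no candidate survives.
        postings = sorted((inverted.get(i, set()) for i in items), key=len)
        candidates = set(postings[0])
        for posting in postings[1:]:
            candidates &= posting
            if not candidates:
                break
        pairs.extend((rid, sid) for sid in sorted(candidates))
    return pairs
```

Starting from the shortest posting list keeps the candidate set small, which is one common design choice for this kind of intersection.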


Proceedings Article
19 Nov 2003
TL;DR: This work proposes a methodology for finding projected clusters by mining frequent itemsets, presents heuristics that improve its quality, and evaluates the techniques with synthetic and real data.
Abstract: Irrelevant attributes add noise to high dimensional clusters and make traditional clustering techniques inappropriate. Projected clustering algorithms have been proposed to find the clusters in hidden subspaces. We realize the analogy between mining frequent itemsets and discovering the relevant subspace for a given cluster. We propose a methodology for finding projected clusters by mining frequent itemsets and present heuristics that improve its quality. Our techniques are evaluated with synthetic and real data; they are scalable and discover projected clusters accurately.

88 citations


Journal Article
TL;DR: This paper proposes slot index spatial join (SISJ), an algorithm that joins a nonindexed data set with one indexed by an R-tree and compares it, analytically and experimentally, with other spatial join methods for two cases.
Abstract: Efficient processing of spatial joins is very important due to their high cost and frequent application in spatial databases and other areas involving multidimensional data. This paper proposes slot index spatial join (SISJ), an algorithm that joins a nonindexed data set with one indexed by an R-tree. We explore two optimization techniques that reduce the space requirements and the computational cost of SISJ and we compare it, analytically and experimentally, with other spatial join methods for two cases: 1) when the nonindexed input is read from disk and 2) when it is an intermediate result of a preceding database operator in a complex query plan. The importance of buffer splitting between consecutive join operators is also demonstrated through a two-join case study and a method that estimates the optimal splitting is proposed. Our evaluation shows that SISJ outperforms alternative methods in most cases and is suitable for limited memory conditions.

43 citations


Proceedings Article
05 Mar 2003
TL;DR: A method is proposed that represents set data as bitmaps (signatures) and organizes them into a hierarchical index, suitable for similarity search and other related query types; it is robust to different data characteristics, scalable to the database size and efficient for various queries.
Abstract: Data mining applications analyze large collections of set data and high dimensional categorical data. Search on these data types is not restricted to the classic problems of mining association rules and classification, but similarity search is also a frequently applied operation. Access methods for multidimensional numerical data are inappropriate for this problem and specialized indexes are needed. We propose a method that represents set data as bitmaps (signatures) and organizes them into a hierarchical index, suitable for similarity search and other related query types. In contrast to a previous technique, the signature tree is dynamic and does not rely on hardwired constants. Experiments with synthetic and real datasets show that it is robust to different data characteristics, scalable to the database size and efficient for various queries.

33 citations
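
To make the bitmap (signature) representation concrete, here is a small sketch of the building blocks such an index could rest on. It does not reproduce the paper's signature tree: the hashing scheme (CRC32 modulo the signature length), the default length of 64 bits, and all function names are assumptions made for illustration.

```python
import zlib


def signature(items, bits=64):
    """Encode a set as a bitmap by setting one hashed bit per item."""
    sig = 0
    for item in items:
        position = zlib.crc32(str(item).encode("utf-8")) % bits
        sig |= 1 << position
    return sig


def may_contain(sig_super, sig_sub):
    """Filter step: False means 'definitely not a superset'.

    True can be a false positive, so matches need checking on real data.
    """
    return (sig_super & sig_sub) == sig_sub


def hamming(sig_a, sig_b):
    """Number of differing bits, a cheap proxy for set dissimilarity."""
    return bin(sig_a ^ sig_b).count("1")


def parent_signature(child_signatures):
    """Signature of an index node: bitwise OR of its children."""
    sig = 0
    for child in child_signatures:
        sig |= child
    return sig
```

Because a parent signature covers every bit of its children, a query that fails `may_contain` at a node can skip the whole subtree, which is the pruning property a hierarchical signature index relies on.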


Book Chapter
24 Jul 2003
TL;DR: This paper presents the first theoretical study on validity queries, and develops indexes and algorithms with attractive I/O complexities that reveal the problem characteristics and permit the deployment of existing structures.
Abstract: The results of traditional spatial queries (i.e., range search, nearest neighbor, etc.) are usually meaningless in spatio-temporal applications, because they will be invalidated by the movements of query and/or data objects. In practice, a query result R should be accompanied with validity information specifying (i) the (future) time T that R will expire, and (ii) the change C of R at time T (so that R can be updated incrementally). Although several algorithms have been proposed for this problem, their worst-case performance is the same as that of sequential scan. This paper presents the first theoretical study on validity queries, and develops indexes and algorithms with attractive I/O complexities. Our discussion covers numerous important variations of the problem and different query/object mobility combinations. The solutions involve a set of non-trivial reductions that reveal the problem characteristics and permit the deployment of existing structures.

19 citations


Book Chapter
24 Jul 2003
TL;DR: Output-sensitive algorithms that prune the search space by integrating the cardinality with the distance constraint are proposed and evaluated experimentally over a wide range of problem parameters.
Abstract: The iceberg distance join returns object pairs within some distance from each other, provided that the first object appears at least a number of times in the result, e.g., “find hotels which are within 1km to at least 10 restaurants”. The output of this query is the subset of the corresponding distance join (e.g., “find hotels which are within 1km to some restaurant”) that satisfies the additional cardinality constraint. Therefore, it could be processed by using a conventional spatial join algorithm and then filtering-out the non-qualifying pairs. This approach, however, is expensive, especially when the cardinality constraint is highly selective. In this paper, we propose output-sensitive algorithms that prune the search space by integrating the cardinality with the distance constraint. We deal with cases of indexed/non-indexed datasets and evaluate the performance of the proposed techniques with extensive experimental evaluation covering a wide range of problem parameters.

18 citations
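
The query semantics, and the filter-after-join baseline that the abstract describes as expensive, can be sketched as follows. This is deliberately the naive approach, not the output-sensitive algorithms the paper proposes; the function name, the dict-of-points representation, and the parameter names `eps` and `min_count` are assumptions.

```python
import math


def iceberg_distance_join(left, right, eps, min_count):
    """Return pairs (lid, rid) within distance eps, keeping only those
    left objects that have at least min_count partners.

    left, right: dicts mapping object id -> coordinate tuple (assumed).
    Naive baseline: every left object is compared with every right object.
    """
    pairs = []
    for lid, lpoint in left.items():
        # Distance join step for one left object.
        partners = [
            rid for rid, rpoint in right.items()
            if math.dist(lpoint, rpoint) <= eps
        ]
        # Cardinality filter applied only after the join work is done.
        if len(partners) >= min_count:
            pairs.extend((lid, rid) for rid in partners)
    return pairs
```

For the hotel example in the abstract, `left` would hold hotels, `right` restaurants, `eps` the 1km threshold, and `min_count` the value 10. The cost of this baseline does not shrink when `min_count` is highly selective, which is the inefficiency the abstract points out.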


Book Chapter
24 Jul 2003
TL;DR: An adaptive algorithm is described that optimizes the overall process of statistics retrieval and query execution and retrieves statistics dynamically in order to generate a low-cost execution plan, while considering the storage and computational power limitations of the PDA.
Abstract: Mobile devices like PDAs are capable of retrieving information from various types of services. In many cases, the user requests cannot directly be processed by the service providers, if their hosts have limited query capabilities or the query combines data from various sources, which do not collaborate with each other. In this paper, we present a framework for optimizing spatial join queries that belong to this class. We presume that the connection and queries are ad-hoc, there is no mediator available and the services are non-collaborative. We also assume that the services are not willing to share their statistics or indexes with the client. We retrieve statistics dynamically in order to generate a low-cost execution plan, while considering the storage and computational power limitations of the PDA. Since acquiring the statistics causes overhead, we describe an adaptive algorithm that optimizes the overall process of statistics retrieval and query execution. We demonstrate the applicability of our methods with a prototype implementation on a PDA with wireless network access.

13 citations


Book Chapter
17 Aug 2003
TL;DR: This paper proposes an alternative technique that uses the components of an XML query (e.g., edges, paths, twigs) to filter out a large part of the database that does not qualify them, before validating the query on the actual data; a signature index is used to search quickly and prune the search space effectively.
Abstract: Answering a query on XML data usually involves breaking it into a number of small components (e.g., edges, paths, twigs, etc.), evaluating them and joining the results. In this paper we propose an alternative technique that uses these components to filter a large part of the database that does not qualify them, before validating the query on the actual data. Our methodology uses a signature index to search quickly and prune the search space effectively. The efficiency of the proposed technique is demonstrated by comparison with an existing index, on real data.