Showing papers on "Skyline" published in 2007

Open Access

Proceedings Article•

Probabilistic skylines on uncertain data

Jian Pei¹, Bin Jiang², Xuemin Lin², Yidong Yuan²•Institutions (2)

Simon Fraser University¹, University of New South Wales²

23 Sep 2007

TL;DR: Proposes a novel probabilistic skyline model in which each uncertain object has a probability of being in the skyline, and a p-skyline contains all the objects whose skyline probabilities are at least p.

Abstract: Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data remains an open problem at large. In this paper, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic skyline model where an uncertain object may take a probability to be in the skyline, and a p-skyline contains all the objects whose skyline probabilities are at least p. Computing probabilistic skylines on large uncertain data sets is challenging. We develop two efficient algorithms. The bottom-up algorithm computes the skyline probabilities of some selected instances of uncertain objects, and uses those instances to prune other instances and uncertain objects effectively. The top-down algorithm recursively partitions the instances of uncertain objects into subsets, and prunes subsets and objects aggressively. Our experimental results on both the real NBA player data set and the benchmark synthetic data sets show that probabilistic skylines are interesting and useful, and our two algorithms are efficient on large data sets, and complementary to each other in performance.
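The p-skyline definition in this abstract can be illustrated with a brute-force sketch. It assumes (the abstract does not spell this out) that each uncertain object is a finite set of equally likely instances, that objects are independent, and that smaller values are better. The names `skyline_probability` and `p_skyline` are hypothetical, and this is not the paper's bottom-up or top-down algorithm.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probability(name: str, objects: Dict[str, List[Point]]) -> float:
    """Average, over the instances of `name`, of the chance that no other object dominates it."""
    total = 0.0
    for u in objects[name]:
        prob = 1.0
        for other, instances in objects.items():
            if other == name:
                continue
            # Fraction of the other object's instances that dominate u.
            dominated = sum(dominates(v, u) for v in instances)
            prob *= 1.0 - dominated / len(instances)
        total += prob
    return total / len(objects[name])


def p_skyline(objects: Dict[str, List[Point]], p: float) -> List[str]:
    """Objects whose skyline probability is at least p."""
    return [name for name in objects if skyline_probability(name, objects) >= p]


# Hypothetical toy data: object A has two instances, object B has one.
example = {"A": [(1, 1), (3, 3)], "B": [(2, 2)]}
print(p_skyline(example, 0.5))
```

This exhaustive version compares every instance with every other instance, which is exactly the cost the pruning techniques described in the abstract are meant to avoid.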

509 citations

Proceedings Article•DOI•

Selecting Stars: The k Most Representative Skyline Operator

Xuemin Lin¹, Yidong Yuan¹, Qing Zhang², Ying Zhang³•Institutions (3)

University of New South Wales¹, Commonwealth Scientific and Industrial Research Organisation², NICTA³

15 Apr 2007

TL;DR: An efficient, scalable, index-based randomized algorithm is developed by applying the FM probabilistic counting technique and a comprehensive performance evaluation demonstrates that the randomized technique is very efficient, highly accurate, and scalable.

Abstract: Skyline computation has many applications including multi-criteria decision making. In this paper, we study the problem of selecting k skyline points so that the number of points, which are dominated by at least one of these k skyline points, is maximized. We first present an efficient dynamic programming based exact algorithm in a 2d-space. Then, we show that the problem is NP-hard when the dimensionality is 3 or more and it can be approximately solved by a polynomial time algorithm with the guaranteed approximation ratio 1-1/e. To speed-up the computation, an efficient, scalable, index-based randomized algorithm is developed by applying the FM probabilistic counting technique. A comprehensive performance evaluation demonstrates that our randomized technique is very efficient, highly accurate, and scalable.
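The objective described here (pick k skyline points that together dominate as many points as possible) is a maximum-coverage problem, for which the standard greedy heuristic gives a 1-1/e guarantee. The sketch below assumes that greedy strategy and smaller-is-better dimensions; whether it matches the paper's polynomial-time algorithm in detail is not stated in the abstract, and it does not use the FM probabilistic counting technique. `greedy_representatives` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse everywhere, strictly better somewhere (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Brute-force skyline: points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def greedy_representatives(points: List[Point], k: int) -> List[Point]:
    """Greedy max-coverage: repeatedly pick the skyline point that newly dominates the most points."""
    sky = skyline(points)
    covered = set()  # indices of points already dominated by a chosen representative
    chosen: List[Point] = []
    for _ in range(min(k, len(sky))):
        best, best_gain = None, -1
        for s in sky:
            if s in chosen:
                continue
            gain = sum(
                1 for i, p in enumerate(points) if i not in covered and dominates(s, p)
            )
            if gain > best_gain:
                best, best_gain = s, gain
        chosen.append(best)
        covered.update(i for i, p in enumerate(points) if dominates(best, p))
    return chosen


# Hypothetical toy data.
data = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4), (2, 6), (6, 2)]
print(greedy_representatives(data, 1))
```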

405 citations

Proceedings Article•

Efficient computation of reverse skyline queries

Evangelos Dellis¹, Bernhard Seeger¹•Institutions (1)

University of Marburg¹

23 Sep 2007

TL;DR: This paper introduces the concept of Reverse Skyline Queries and proposes an enhanced algorithm (called RSSA) that is based on accurate pre-computed approximations of the skylines used to identify whether a point belongs to the reverse skyline or not.

Abstract: In this paper, for the first time, we introduce the concept of Reverse Skyline Queries. At first, we consider for a multidimensional data set P the problem of dynamic skyline queries according to a query point q. This kind of dynamic skyline corresponds to the skyline of a transformed data space where point q becomes the origin and all points of P are represented by their distance vector to q. The reverse skyline query returns the objects whose dynamic skyline contains the query object q. In order to compute the reverse skyline of an arbitrary query point, we first propose a Branch and Bound algorithm (called BBRS), which is an improved customization of the original BBS algorithm. Furthermore, we identify a super set of the reverse skyline that is used to bound the search space while computing the reverse skyline. To further reduce the computational cost of determining if a point belongs to the reverse skyline, we propose an enhanced algorithm (called RSSA) that is based on accurate pre-computed approximations of the skylines. These approximations are used to identify whether a point belongs to the reverse skyline or not. Through extensive experiments with both real-world and synthetic datasets, we show that our algorithms can efficiently support reverse skyline queries. Our enhanced approach improves reversed skyline processing by up to an order of magnitude compared to the algorithm without the usage of pre-computed approximations.
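The two definitions in this abstract can be made concrete with a brute-force sketch: the dynamic skyline compares points by their coordinate-wise distance to a reference point, and the reverse skyline of q collects every point whose own dynamic skyline would contain q. This is only an illustration of the definitions, not BBRS or RSSA, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dynamic_dominates(a: Sequence[float], b: Sequence[float], ref: Sequence[float]) -> bool:
    """a dominates b with respect to ref: a is at least as close to ref in every dimension, closer in one."""
    da = [abs(x - r) for x, r in zip(a, ref)]
    db = [abs(x - r) for x, r in zip(b, ref)]
    return all(x <= y for x, y in zip(da, db)) and any(x < y for x, y in zip(da, db))


def dynamic_skyline(points: List[Point], q: Point) -> List[Point]:
    """Skyline of the space in which q is the origin and points are distance vectors to q."""
    return [p for p in points if not any(dynamic_dominates(o, p, q) for o in points)]


def reverse_skyline(points: List[Point], q: Point) -> List[Point]:
    """Points p such that q is not dominated, with respect to p, by any other data point."""
    result = []
    for i, p in enumerate(points):
        others = (o for j, o in enumerate(points) if j != i)
        if not any(dynamic_dominates(o, q, p) for o in others):
            result.append(p)
    return result


# Hypothetical toy data.
data = [(1, 1), (2, 2), (10, 10)]
print(dynamic_skyline(data, (3, 3)))
print(reverse_skyline(data, (3, 3)))
```

The quadratic cost of `reverse_skyline` is what motivates the branch-and-bound search and pre-computed skyline approximations described in the abstract.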

271 citations

Proceedings Article•

Approaching the skyline in Z order

Ken C. K. Lee¹, Baihua Zheng², Huajing Li¹, Wang-Chien Lee¹•Institutions (2)

Pennsylvania State University¹, Singapore Management University²

23 Sep 2007

TL;DR: A suite of novel and efficient skyline algorithms is developed; the algorithms scale very well with data dimensionality and cardinality, and soundly outperform the state-of-the-art skyline algorithms in their specialized domains.

Abstract: Given a set of multidimensional data points, a skyline query retrieves the set of data points that are not dominated by any other point. This query is useful for multi-preference analysis and decision making. By analyzing the skyline query, we observe a close connection between the Z-order curve and skyline processing strategies and propose to use a new index structure called ZBtree to index and store data points based on the Z-order curve. We develop a suite of novel and efficient skyline algorithms, which scale very well with data dimensionality and cardinality, including (1) ZSearch, which processes skyline queries and supports progressive result delivery; (2) ZUpdate, which facilitates incremental skyline result maintenance; and (3) k-ZSearch, which answers k-dominant skyline queries (a skyline variant that retrieves a representative subset of skyline results). Extensive experiments have been conducted to evaluate our proposed algorithms and compare them against the best available algorithms designed for skyline search, skyline result update, and k-dominant skyline search, respectively. The results show that our algorithms, developed coherently based on the same ideas and concepts, soundly outperform the state-of-the-art skyline algorithms in their specialized domains.
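One way to see the connection between the Z-order curve and skyline processing: for non-negative integer coordinates, a point that dominates another always has a smaller Z-address, so a scan in Z-order only ever needs to compare a point against skyline points found earlier. The sketch below assumes smaller-is-better integer coordinates below 2**bits and uses a plain sorted list; it is not the ZBtree index or the ZSearch algorithm, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]


def dominates(a: Sequence[int], b: Sequence[int]) -> bool:
    """a dominates b: no worse everywhere, strictly better somewhere (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def z_address(point: Sequence[int], bits: int = 16) -> int:
    """Interleave the bits of all coordinates, most significant bit first."""
    z = 0
    for bit in range(bits - 1, -1, -1):
        for coord in point:
            z = (z << 1) | ((coord >> bit) & 1)
    return z


def z_order_skyline(points: List[Point], bits: int = 16) -> List[Point]:
    """Scan points in Z-order; any dominator of a point has already been visited."""
    result: List[Point] = []
    for p in sorted(points, key=lambda pt: z_address(pt, bits)):
        if not any(dominates(s, p) for s in result):
            result.append(p)  # p can be reported immediately (progressive output)
    return result


# Hypothetical toy data.
data = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
print(z_order_skyline(data))
```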

203 citations

Proceedings Article•DOI•

Multi-source Skyline Query Processing in Road Networks

Ke Deng¹, Xiaofang Zhou¹, Heng Tao Shen¹•Institutions (1)

University of Queensland¹

15 Apr 2007

TL;DR: The Lower Bound Constraint algorithm (LBC) is proven to be an instance optimal algorithm and extensive experiments demonstrate that LBC is four times more efficient than a straightforward algorithm.

Abstract: Skyline query processing has been investigated extensively in recent years, mostly for only one query reference point. An example of a single-source skyline query is to find hotels which are cheap and close to the beach (an absolute query), or close to a user-given location (a relative query). A multi-source skyline query considers several query points at the same time (e.g., to find hotels which are cheap and close to the University, the Botanic Garden and the China Town). In this paper, we consider the problem of efficient multi-source skyline query processing in road networks. It is not only the first effort to consider multi-source skyline queries in road networks but also the first effort to process relative skyline queries where the network distance between two locations needs to be computed on-the-fly. Three different query processing algorithms are proposed and evaluated in this paper. The Lower Bound Constraint algorithm (LBC) is proven to be an instance optimal algorithm. Extensive experiments using large real road network datasets demonstrate that LBC is four times more efficient than a straightforward algorithm.

195 citations

Proceedings Article•

Efficient processing of top-k dominating queries on multi-dimensional data

Man Lung Yiu¹, Nikos Mamoulis²•Institutions (2)

Aalborg University¹, University of Hong Kong²

23 Sep 2007

TL;DR: The top-k dominating query returns the k data objects that dominate the highest number of objects in a dataset; it is an important tool for decision support since it gives data analysts an intuitive way of finding significant objects.

Abstract: The top-k dominating query returns k data objects which dominate the highest number of objects in a dataset. This query is an important tool for decision support since it provides data analysts an intuitive way for finding significant objects. In addition, it combines the advantages of top-k and skyline queries without sharing their disadvantages: (i) the output size can be controlled, (ii) no ranking functions need to be specified by users, and (iii) the result is independent of the scales at different dimensions. Despite their importance, top-k dominating queries have not received adequate attention from the research community. In this paper, we design specialized algorithms that apply on indexed multi-dimensional data and fully exploit the characteristics of the problem. Experiments on synthetic datasets demonstrate that our algorithms significantly outperform a previous skyline-based approach, while our results on real datasets show the meaningfulness of top-k dominating queries.
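The query definition in this abstract can be written down directly as a brute-force count. The sketch below assumes smaller-is-better dimensions and ignores the indexed algorithms the paper proposes; `top_k_dominating` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse everywhere, strictly better somewhere (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def top_k_dominating(points: List[Point], k: int) -> List[Tuple[int, Point]]:
    """Return (score, point) pairs for the k points that dominate the most other points."""
    scored = [(sum(dominates(p, q) for q in points), p) for p in points]
    # Sort by score only; ties keep their input order because the sort is stable.
    scored.sort(key=lambda pair: -pair[0])
    return scored[:k]


# Hypothetical toy data.
data = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4), (2, 6), (6, 2)]
print(top_k_dominating(data, 2))
```

Because the score depends only on dominance counts, the output size is fixed by k and rescaling any dimension leaves the result unchanged, matching properties (i) and (iii) listed in the abstract. Note that a high-scoring point need not itself be a skyline point.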

195 citations

Proceedings Article•

Efficient skyline computation over low-cardinality domains

Michael Morse¹, Jignesh M. Patel¹, H. V. Jagadish¹•Institutions (1)

University of Michigan¹

23 Sep 2007

TL;DR: The Lattice Skyline Algorithm (LS) is proposed that is built around a new paradigm for skyline evaluation on datasets with attributes that are drawn from low-cardinality domains and it is shown that for typical dimensionalities, the complexity of LS is linear in the number of input tuples.

Abstract: Current skyline evaluation techniques follow a common paradigm that eliminates data elements from skyline consideration by finding other elements in the dataset that dominate them. The performance of such techniques is heavily influenced by the underlying data distribution (i.e. whether the dataset attributes are correlated, independent, or anti-correlated). In this paper, we propose the Lattice Skyline Algorithm (LS) that is built around a new paradigm for skyline evaluation on datasets with attributes that are drawn from low-cardinality domains. LS continues to apply even if one attribute has high cardinality. Many skyline applications naturally have such data characteristics, and previous skyline methods have not exploited this property. We show that for typical dimensionalities, the complexity of LS is linear in the number of input tuples. Furthermore, we show that the performance of LS is independent of the input data distribution. Finally, we demonstrate through extensive experimentation on both real and synthetic datasets that LS can result in a significant performance advantage over existing techniques.

149 citations

Proceedings Article•DOI•

Efficient Skyline Query Processing on Peer-to-Peer Networks

Shiyuan Wang¹, Beng Chin Ooi², Anthony K. H. Tung², Lizhen Xu¹•Institutions (2)

Southeast University¹, National University of Singapore²

15 Apr 2007

TL;DR: By partitioning the skyline search space adaptively based on query accessing patterns, the proposed solution alleviates the problem of "hot" spots in skyline query processing; experiments confirm its effectiveness and scalability on P2P networks.

Abstract: Skyline query has been gaining much interest in database research communities in recent years. Most existing studies focus mainly on centralized systems, and resolving the problem in a distributed environment such as a peer-to-peer (P2P) network is still an emerging topic. The desiderata of efficient skyline querying in P2P environment include: 1) progressive returning of answers, 2) low processing cost in terms of number of peers accessed and search messages, 3) balanced query loads among the peers. In this paper, we propose a solution that satisfies the three desiderata. Our solution is based on a balanced tree structured P2P network. By partitioning the skyline search space adaptively based on query accessing patterns, we are able to alleviate the problem of "hot" spots present in the skyline query processing. By being able to estimate the peer nodes within the query subspaces, we are able to control the amount of query forwarding, limiting the number of peers involved and the amount of messages transmitted in the network. Load balancing is achieved in query load conscious data space splitting/merging during the joining/departure of nodes and through dynamic load migration. Experiments on real and synthetic datasets confirm the effectiveness and scalability of our algorithm on P2P networks.

147 citations

Journal Article•DOI•

Efficient continuous skyline computation

Michael Morse¹, Jignesh M. Patel¹, William I. Grosky¹•Institutions (1)

University of Michigan¹

01 Sep 2007-Information Sciences

TL;DR: An operator called the continuous time-interval skyline operator is introduced for evaluating the continuous and valid skyline summary, and a new algorithm called LookOut is presented for evaluating this operator efficiently; the scalability of the algorithm is also demonstrated.

138 citations

Proceedings Article•DOI•

SKYPEER: Efficient Subspace Skyline Computation over Distributed Data

Akrivi Vlachou¹, Christos Doulkeridis¹, Yannis Kotidis¹, Michalis Vazirgiannis¹•Institutions (1)

Athens University of Economics and Business¹

15 Apr 2007

TL;DR: This paper addresses the efficient computation of subspace skyline queries in large-scale peer-to-peer (P2P) networks, where the dataset is horizontally distributed across the peers, and proposes a threshold based algorithm, called SKYPEER, which forwards the skyline query requests among peers in such a way that the amount of transferred data is significantly reduced.

Abstract: Skyline query processing has received considerable attention in the recent past. Mainly, the skyline query is used to find a set of non dominated data points in a multidimensional dataset. While most previous work has assumed a centralized setting, in this paper we address the efficient computation of subspace skyline queries in large-scale peer-to-peer (P2P) networks, where the dataset is horizontally distributed across the peers. Relying on a super-peer architecture we propose a threshold based algorithm, called SKYPEER, which forwards the skyline query requests among peers, in such a way that the amount of transferred data is significantly reduced. For efficient subspace skyline processing, we extend the notion of domination by defining the extended skyline set, which contains all data elements that are necessary to answer a skyline query in any arbitrary subspace. We prove that our algorithm provides the exact answers and we present optimization techniques to reduce communication cost and execution time. Finally, we provide an extensive experimental evaluation showing that SKYPEER performs efficiently and provides a viable solution when a large degree of distribution is required.
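The idea of an extended skyline that suffices for every subspace can be sketched as follows. The sketch assumes, since the abstract does not give the definition, that a point is excluded from the extended skyline only when some other point is strictly better in every dimension, with smaller values being better. Under that assumption, any point in some subspace skyline survives in the extended skyline. The function names are hypothetical and the threshold-based forwarding among super-peers is not modelled.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse everywhere, strictly better somewhere (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def ext_dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Assumed extended domination: a is strictly better than b in every dimension."""
    return all(x < y for x, y in zip(a, b))


def ext_skyline(points: List[Point]) -> List[Point]:
    """Points not ext-dominated by any other point (a superset of the full-space skyline)."""
    return [p for p in points if not any(ext_dominates(q, p) for q in points)]


def subspace_skyline(points: List[Point], dims: Sequence[int]) -> List[Point]:
    """Skyline computed on the projection of the points onto the given dimensions."""
    def proj(p: Point) -> Tuple[float, ...]:
        return tuple(p[d] for d in dims)
    return [p for p in points if not any(dominates(proj(q), proj(p)) for q in points)]


# Hypothetical toy data: (1, 3, 5) is outside the full-space skyline but inside the
# skyline of subspace (0,), so a peer holding it must not discard it.
data = [(1, 3, 2), (2, 2, 2), (3, 1, 1), (3, 3, 3), (1, 3, 5)]
candidates = ext_skyline(data)
print(subspace_skyline(candidates, (0,)))
```

In a distributed setting, each peer could ship only its extended skyline, and the query originator could still answer any subspace query exactly from the union of those sets.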

136 citations

Journal Article•DOI•

Efficient Skyline and Top-k Retrieval in Subspaces

Yufei Tao¹, Xiaokui Xiao¹, Jian Pei•Institutions (1)

The Chinese University of Hong Kong¹

01 Aug 2007-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A technique SUBSKY is proposed, which settles both types of queries by using purely relational technologies and outperforms alternative solutions significantly in both efficiency and scalability.

Abstract: Skyline and top-k queries are two popular operations for preference retrieval. In practice, applications that require these operations usually provide numerous candidate attributes, whereas, depending on their interests, users may issue queries regarding different subsets of the dimensions. The existing algorithms are inadequate for subspace skyline/top-k search because they have at least one of the following defects: 1) they require scanning the entire database at least once, 2) they are optimized for one subspace but incur significant overhead for other subspaces, or 3) they demand expensive maintenance cost or space consumption. In this paper, we propose a technique SUBSKY, which settles both types of queries by using purely relational technologies. The core of SUBSKY is a transformation that converts multidimensional data to one-dimensional (1D) values. These values are indexed by a simple B-tree, which allows us to answer subspace queries by accessing a fraction of the database. SUBSKY entails low maintenance overhead, which equals the cost of updating a traditional B-tree. Extensive experiments with real data confirm that our technique outperforms alternative solutions significantly in both efficiency and scalability.

Journal Article•DOI•

Approximately dominating representatives

Vladlen Koltun¹, Christos H. Papadimitriou¹•Institutions (1)

University of California, Berkeley¹

25 Feb 2007-Theoretical Computer Science

TL;DR: It is shown that the problem of minimizing the number of points returned, for a user-specified desired approximation, can be solved in polynomial time in two dimensions; for three and more it is NP-hard but has a polynomial-time logarithmic approximation.

Proceedings Article•DOI•

Computing Compressed Multidimensional Skyline Cubes Efficiently

Jian Pei¹, Ada Wai-Chee Fu², Xuemin Lin³, Haixun Wang⁴•Institutions (4)

Simon Fraser University¹, The Chinese University of Hong Kong², University of New South Wales³, IBM⁴

15 Apr 2007

TL;DR: This paper proposes a novel and efficient method, Stellar, which exploits an interesting skyline group lattice on a small subset of objects which are in the skyline of the full space and shows that this skyline group lattice is easy to compute and can be extended to the skyline group lattice on all objects.

Abstract: Recently, the skyline computation and analysis have been extended from one single full space to multidimensional subspaces, which can lead to valuable insights in some applications. Particularly, compressed skyline cubes in the form of skyline groups and their decisive subspaces provide a succinct summarization and compression of multidimensional subspace skylines. However, computing skyline cubes remains a challenging task since the existing methods have to search an exponential number of nonempty subspaces for subspace skylines. In this paper, we propose a novel and efficient method, Stellar, which exploits an interesting skyline group lattice on a small subset of objects which are in the skyline of the full space. We show that this skyline group lattice is easy to compute and can be extended to the skyline group lattice on all objects. After computing the skyline in the full space, Stellar only needs to enumerate skyline groups and their decisive subspaces using the full space skyline objects. Avoiding searching for skylines in an exponential number of subspaces improves the efficiency and the scalability of subspace skyline computation substantially in practice. An extensive performance study verifies the merits of our new method.

Proceedings Article•DOI•

The Multi-Relational Skyline Operator

Wen Jin¹, Martin Ester¹, Zengjian Hu¹, Jiawei Han²•Institutions (2)

Simon Fraser University¹, University of Illinois at Urbana–Champaign²

15 Apr 2007

TL;DR: This paper systematically studies the skyline operator on multi-relational databases, and proposes solutions aiming to seamlessly integrate state-of-the-art join methods into skyline computation.

Abstract: Skyline queries have been used extensively in decision support, recommender systems, etc., and most of the existing work focuses mainly on the efficiency issue for a single table. However, the data retrieved by users for the target skylines may often be stored in multiple tables, thus requiring join operations among tables. As a result, the cost of computing skylines on the joined table increases dramatically due to its potentially larger cardinality and dimensionality. In this paper, we systematically study the skyline operator on multi-relational databases, and propose solutions aiming to seamlessly integrate state-of-the-art join methods into skyline computation. Our experiments not only demonstrate that the proposed methods are efficient, but also show the promising applicability of extending the skyline operator to other typical database operators such as join and aggregates.

Posted Content•

Efficient Skyline Querying with Variable User Preferences on Nominal Attributes

Raymond Chi-Wing Wong¹, Ada Wai-Chee Fu², Jian Pei³, Yip Sing Ho², Tai Wong², Yubao Liu⁴•Institutions (4)

Hong Kong University of Science and Technology¹, The Chinese University of Hong Kong², Simon Fraser University³, Sun Yat-sen University⁴

13 Oct 2007-arXiv: Databases

TL;DR: In this article, the authors propose two methods with different characteristics, semi-materialization and adaptive SFS, to generate online responses for dynamic user preferences on nominal attributes.

Abstract: Current skyline evaluation techniques assume a fixed ordering on the attributes. However, dynamic preferences on nominal attributes are more realistic in known applications. In order to generate online response for any such preference issued by a user, we propose two methods of different characteristics. The first one is a semi-materialization method and the second is an adaptive SFS method. Finally, we conduct experiments to show the efficiency of our proposed algorithms.

Proceedings Article•DOI•

DeltaSky: Optimal Maintenance of Skyline Deletions without Exclusive Dominance Region Generation

Ping Wu¹, Divyakant Agrawal¹, Ömer Eğecioğlu¹, A. El Abbadi¹•Institutions (1)

University of California, Santa Barbara¹

15 Apr 2007

TL;DR: A systematic way to decompose a d-dimensional EDR into a collection of hyper-rectangles is derived and DeltaSky helps the branch and bound skyline algorithm achieve I/O optimality for deletion maintenance by finding only the newly appeared skyline points after the deletion.

Abstract: This paper addresses the problem of efficient maintenance of a materialized skyline view in response to skyline removals. While there has been significant progress on skyline query computation, an equally important but largely unanswered issue is the incremental maintenance for skyline deletions. Previous work suggested the use of the so-called exclusive dominance region (EDR) to achieve optimal I/O performance for deletion maintenance. However, the shape of an EDR becomes extremely complex in higher dimensions, and algorithms for its computation have not been developed. We derive a systematic way to decompose a d-dimensional EDR into a collection of hyper-rectangles. We show that the number of such hyper-rectangles is O(m^d), where m is the current skyline result size. We then propose a novel algorithm DeltaSky which determines whether an intermediate R-tree MBR intersects with the EDR without explicitly calculating the EDR itself. This reduces the worst-case complexity of the EDR intersection check from O(m^d) to O(md). Thus DeltaSky helps the branch and bound skyline algorithm achieve I/O optimality for deletion maintenance by finding only the newly appeared skyline points after the deletion. We discuss implementation issues and show that DeltaSky can be efficiently implemented using one extra B-Tree. Moreover, we propose two optimization techniques which further reduce the average cost in practice. Extensive experiments demonstrate that DeltaSky achieves orders of magnitude performance gain over alternative solutions.

Book Chapter•DOI•

Continuously maintaining sliding window skylines in a sensor network

Junchang Xin¹, Guoren Wang¹, Lei Chen², Xiaoyi Zhang¹, Zhenhua Wang¹•Institutions (2)

Northeastern University (China)¹, Hong Kong University of Science and Technology²

09 Apr 2007

TL;DR: This paper proposes an energy-efficient algorithm, called Sliding Window Skyline Monitoring Algorithm (SWSMA), to continuously maintain sliding window skylines over a wireless sensor network and employs two types of filters within each sensor to reduce the amount of data transferred and save the energy consumption.

Abstract: Currently, wireless sensor network has been widely used in environment monitoring. The skyline query, as an important operator for multiple criteria decision making and data mining, plays an important role in many sensing applications. Though skyline queries have been well-studied in traditional database system, the existing solutions designed for data stored in a centralized site are not directly applicable to sensor environment due to the unique characteristics of wireless sensor network. In this paper, we propose an energy-efficient algorithm, called Sliding Window Skyline Monitoring Algorithm (SWSMA), to continuously maintain sliding window skylines over a wireless sensor network. Specifically, SWSMA employs two types of filters within each sensor to reduce the amount of data transferred and save the energy consumption as a consequence. In addition to SWSMA, a set of optimization mechanisms are also discussed to improve the performance of SWSMA. Our extensive simulation studies show that SWSMA together with the optimization techniques performs effectively on reducing communication cost and saving the energy on monitoring sliding window skylines.

Book Chapter•DOI•

Towards energy-efficient skyline monitoring in wireless sensor networks

Hekang Chen¹, Shuigeng Zhou¹, Jihong Guan²•Institutions (2)

Fudan University¹, Tongji University²

29 Jan 2007

TL;DR: Experimental results show that the proposed threshold-based approach outperforms the naive approach substantially in energy saving.

Abstract: Skyline computation is a hot topic in database community due to its promising application in multi-criteria decision making. In sensor network application scenarios, skyline is still useful and important in environment monitoring, industry control, etc. To support energy-efficient skyline monitoring in sensor networks, this paper first presents a naive approach as baseline, and then proposes an advanced approach that employs hierarchical thresholds at the nodes. The threshold-based approach focuses on minimizing the transmission traffic in the network to save the energy consumption. Finally, we conduct extensive experiments to evaluate the proposed approaches on simulated data sets, and compare the threshold-based approach with the naive approach. Experimental results show that the proposed threshold-based approach outperforms the naive approach substantially in energy saving.

Book Chapter•DOI•

Eliciting matters: controlling skyline sizes by incremental integration of user preferences

Wolf-Tilo Balke¹, Ulrich Güntzer², Christoph Lofi¹•Institutions (2)

Leibniz University of Hanover¹, University of Tübingen²

09 Apr 2007

TL;DR: This paper discusses the incremental re-computation of skylines based on additional information elicited from the user; extending the traditional case of totally ordered domains, it considers preferences in their most general form as strict partial orders of attribute values.

Abstract: Today, result sets of skyline queries are unmanageable due to their exponential growth with the number of query predicates. In this paper we discuss the incremental re-computation of skylines based on additional information elicited from the user. Extending the traditional case of totally ordered domains, we consider preferences in their most general form as strict partial orders of attribute values. After getting an initial skyline set our basic approach aims at interactively increasing the system's information about the user's wishes explicitly including indifferences. The additional knowledge then is incorporated into the preference information and constantly reduces skyline sizes. In fact, our approach even allows users to specify trade-offs between different query predicates, thus effectively decreasing the query dimensionality. We give theoretical proof for the soundness and consistence of the extended preference information and an extensive experimental evaluation of the efficiency of our approach. On average, skyline sizes can be considerably decreased in each elicitation step.

Proceedings Article•

On dominating your neighborhood profitably

Cuiping Li¹, Anthony K. H. Tung, Wen Jin², Martin Ester²•Institutions (2)

Renmin University of China¹, Simon Fraser University²

23 Sep 2007

TL;DR: This paper introduces novel skyline query types taking into account not only min/max attributes but also spatial attributes and the relationships between these different attribute types, and investigates two alternative approaches for efficient query processing.

Abstract: Recent research on skyline queries has attracted much interest in the database and data mining community. Given a database, an object belongs to the skyline if it cannot be dominated with respect to the given attributes by any other database object. Current methods have only considered so-called min/max attributes like price and quality which a user wants to minimize or maximize. However, objects can also have spatial attributes like x, y coordinates which can be used to represent relevant constraints on the query results. In this paper, we introduce novel skyline query types taking into account not only min/max attributes but also spatial attributes and the relationships between these different attribute types. Such queries support a micro-economic approach to decision making, considering not only the quality but also the cost of solutions. We investigate two alternative approaches for efficient query processing, a symmetrical one based on off-the-shelf index structures, and an asymmetrical one based on index structures with special purpose extensions. Our experimental evaluation using a real dataset and various synthetic datasets demonstrates that the new query types are indeed meaningful and the proposed algorithms are efficient and scalable.

Proceedings Article•DOI•

On Efficient Processing of Subspace Skyline Queries on High Dimensional Data

Wen Jin¹, Anthony K. H. Tung², Martin Ester¹, Jiawei Han³•Institutions (3)

Simon Fraser University¹, National University of Singapore², University of Illinois at Urbana–Champaign³

09 Jul 2007

TL;DR: Methods are proposed for answering subspace skyline queries on high-dimensional data such that both prematerialization storage and query time can be moderated.

Abstract: Recent studies on efficiently answering subspace skyline queries can be separated into two approaches. The first focuses on pre-materializing a set of skyline points in various subspaces, while the second focuses on dynamically answering the queries by using a set of anchors to prune off skyline points through spatial reasoning. Despite efforts to compress the pre-materialized subspace skylines through removal of redundancy, the storage space for the first approach remains exponential in the number of dimensions. The query time for the second approach, on the other hand, also grows substantially for data with higher dimensionality, where the pruning power of anchors becomes much weaker. In this paper, we propose methods for answering subspace skyline queries on high-dimensional data such that both prematerialization storage and query time can be moderated. We propose novel notions of maximal partial-dominating space, maximal partial-dominated space and maximal equality space between pairs of skyline objects in the full space, and use these concepts as the foundation for answering subspace skyline queries on high-dimensional data. Query processing involves mostly simple pruning operations, while skyline computation is done only on a small subset of candidate skyline points in the subspace. We also develop a random sampling method to compute the subspace skyline in an on-line fashion. Extensive experiments have been conducted and demonstrate the efficiency and effectiveness of our methods.
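
To make the notion of a subspace skyline concrete, the following is a naive baseline sketch, not the method proposed in the paper. It restricts the dominance test to a chosen subset of dimensions and assumes that smaller values are better. The names `dominates_in`, `subspace_skyline` and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates_in(a: Point, b: Point, dims: Sequence[int]) -> bool:
    """Dominance restricted to the dimensions in dims (assumption: smaller is better)."""
    return all(a[d] <= b[d] for d in dims) and any(a[d] < b[d] for d in dims)


def subspace_skyline(points: List[Point], dims: Sequence[int]) -> List[Point]:
    """Naive skyline computed only over the chosen subset of dimensions."""
    return [p for p in points if not any(dominates_in(q, p, dims) for q in points)]


data = [(1, 9, 5), (2, 2, 6), (3, 1, 1), (4, 4, 4)]
full = subspace_skyline(data, (0, 1, 2))   # all three dimensions
sub_12 = subspace_skyline(data, (1, 2))    # only dimensions 1 and 2
print(full, sub_12)
```

In this sample, three points are in the full-space skyline, while in the subspace of dimensions 1 and 2 the point `(3, 1, 1)` dominates all others. A different subspace can therefore yield a different answer, which is why materializing every subspace in advance is costly.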

Proceedings Article•DOI•

Parallel Computation of Skyline Queries

A. Cosgaya-Lozano¹, Andrew Rau-Chaplin¹, Norbert Zeh¹•Institutions (1)

Dalhousie University¹

13 May 2007

TL;DR: It is shown that parallel computing is an effective method to speed up the answering of skyline queries on large data sets, and preprocessing the set of data points is proposed to quickly answer subsequent skyline queries on any subset of the dimensions.

Abstract: Skyline queries have received considerable attention in the database community. The goal is to retrieve all records in a database that have the property that no other record is better according to all of a given set of criteria. While this problem has been well studied in the computational geometry literature, the solution of this problem in the database context requires techniques designed particularly to handle large amounts of data. In this paper, we show that parallel computing is an effective method to speed up the answering of skyline queries on large data sets. We also propose to preprocess the set of data points to quickly answer subsequent skyline queries on any subset of the dimensions.
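
One generic way to parallelize a skyline computation is to split the data, compute a local skyline per chunk, and then merge the local results. The sketch below shows that pattern only; it is not the authors' algorithm. It assumes smaller values are better, and it uses threads purely to keep the example short (a process pool would be needed for a real speed-up in CPython). The names `local_skyline` and `parallel_skyline` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Assumption: smaller is better on every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def local_skyline(points: List[Point]) -> List[Point]:
    """Naive skyline of one chunk."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def parallel_skyline(points: List[Point], workers: int = 4) -> List[Point]:
    """Split into chunks, compute local skylines concurrently, then merge."""
    chunks = [points[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(local_skyline, chunks))
    # A global skyline point survives in its own chunk, so filtering the
    # union of the local skylines once more yields the global skyline.
    merged = [p for chunk in partial for p in chunk]
    return local_skyline(merged)


pts = [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1), (3, 4), (5, 5), (2, 6)]
print(sorted(parallel_skyline(pts, workers=3)))
```

The merge step is correct because dominance is transitive: a point removed in a local step is dominated by some local skyline point, which in turn takes part in the final filter.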

Book Chapter•DOI•

Telescope: zooming to interesting skylines

Jongwuk Lee¹, Gae-won You¹, Seung-won Hwang¹•Institutions (1)

Pohang University of Science and Technology¹

09 Apr 2007

TL;DR: Algorithm Telescope abstracts skyline ranking as a dynamic search over skyline subspaces guided by user-specific preference, with correctness and optimality guarantees; evaluation results validate its effectiveness and efficiency on both real-life and synthetic data.

Abstract: As data of an unprecedented scale are becoming accessible, skyline queries have been actively studied lately, to retrieve "interesting" data objects that are not dominated by any other objects, i.e., skyline objects. When the dataset is high-dimensional, however, such skyline objects are often too numerous to identify truly interesting objects. This paper studies the "curse of dimensionality" problem in skyline queries. That is, our work complements existing research efforts to address this "curse of dimensionality", by ranking skyline objects based on user-specific qualitative preference. In particular, Algorithm Telescope abstracts skyline ranking as a dynamic search over skyline subspaces guided by user-specific preference with correctness and optimality guarantees. Our extensive evaluation results validate the effectiveness and efficiency of Algorithm Telescope on both real-life and synthetic data.

Book Chapter•DOI•

Fuzzy dominance skyline queries

Marlene Goncalves¹, Leonid Tineo¹•Institutions (1)

Simón Bolívar University¹

03 Sep 2007

TL;DR: This work proposes to make Skyline queries more flexible using fuzzy comparison operators in order to retrieve interesting dominated rows, and introduces an evaluation mechanism for these queries that has reasonable performance.

Abstract: Skyline is an important and recent proposal for expressing user preferences. While no single best row exists, Skyline discards rows that are worse on all criteria than some other row and retrieves the non-dominated ones, i.e., the best ones that match user preferences. Nevertheless, some dominated rows could be of interest to the user, yet Skyline rejects them. Dominated rows could be discriminated (or ranked) by means of user preferences, but Skyline only discards them and does not discriminate among them. SQLf is a proposal for preference queries based on fuzzy logic that allows rows to be discriminated and includes user-defined terms, such as fuzzy comparison operators. In this work, we propose to make Skyline queries more flexible using fuzzy comparison operators in order to retrieve interesting dominated rows. We also introduce an evaluation mechanism for these queries, and our initial experimental study shows that this mechanism has reasonable performance.

Proceedings Article•

User Interaction Support for Incremental Refinement of Preference-Based Queries.

Wolf-Tilo Balke, Ulrich Güntzer, Christoph Lofi

01 Jan 2007

TL;DR: This work presents a sophisticated user interface to interactively refine queries in case the respective skyline set proves to be too large, and allows users to incrementally augment preferences given by their Hasse diagrams.

Abstract: Preference-based queries (or skylines) play an important role in cooperative query processing. However, their prohibitive result sizes pose a severe challenge to the paradigm’s practical applicability. Since skyline sizes can only be predicted under strong assumptions, we present a sophisticated user interface to interactively refine queries in case the respective skyline set proves to be too large. Our approach allows users to incrementally augment preferences given by their Hasse diagrams. Moreover, we prove the correctness of the incremented skyline and in our experiments show the approach’s superior efficiency. Index Terms — Information Systems, User Interfaces

Journal Article•DOI•

In-Network Processing for Skyline Queries in Sensor Networks

Yoon Kyung Kwon¹, Jae-Ho Choi¹, Yon Dohn Chung, Sang-Geun Lee•Institutions (1)

Korea University¹

01 Dec 2007-IEICE Transactions on Communications

TL;DR: A new in-network processing algorithm for skyline queries is proposed that reduces the communication cost and evenly distributes load; experiments show its advantages over in-network aggregation in terms of energy efficiency.

Abstract: Wireless sensor networks can be used in various fields, e.g., military and civil applications. Saving energy to prolong the life of sensor nodes is one of the main challenges in resource-constrained sensor networks. Therefore, in-network aggregation of data has been proposed in resource-constrained environments for energy efficiency. Most previous works on in-network aggregation only support one-dimensional data (e.g., MIN and MAX). To support multi-dimensional data, the skyline query is used. The skyline query returns a set of points that are not dominated by any other point on all dimensions. The majority of previous skyline query processing methods (e.g., BNL and BBS) work on centralized storage. Centralized query processing methods do not have merits in terms of energy efficiency in high event rate environments. In this paper, we propose a new algorithm for in-network processing of skyline queries. The proposed algorithm reduces the communication cost and evenly distributes load. The experimental results show the advantages of our algorithm over in-network aggregation in terms of improving energy efficiency.
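
The general idea of filtering skyline candidates inside the network, rather than shipping every reading to a central store, can be sketched as follows. This is a generic illustration and not the algorithm proposed in the paper. It assumes a routing tree given as a dictionary, smaller-is-better readings, and hypothetical names (`forward`, `children`, `readings`).

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Assumption: smaller is better on every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    return [p for p in points if not any(dominates(q, p) for q in points)]


def forward(node: str,
            children: Dict[str, List[str]],
            readings: Dict[str, List[Point]]) -> List[Point]:
    """Points that node sends to its parent: the skyline of its own
    readings combined with whatever its children forwarded."""
    candidates = list(readings.get(node, []))
    for child in children.get(node, []):
        candidates.extend(forward(child, children, readings))
    return skyline(candidates)


# Hypothetical routing tree: sink -> a, b and a -> c.
children = {"sink": ["a", "b"], "a": ["c"]}
readings = {"a": [(3, 3)], "b": [(1, 4), (5, 5)], "c": [(2, 2), (4, 1)]}
answer = forward("sink", children, readings)
print(sorted(answer))
```

In this example node `b` drops `(5, 5)` locally and node `a` drops its own reading `(3, 3)` after seeing `(2, 2)` from `c`, so two of the five readings never travel toward the sink.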

Proceedings Article•DOI•

Continuous monitoring of skyline query over highly dynamic moving objects

Li Tian¹, Le Wang¹, Peng Zou¹, Yan Jia¹, Aiping Li¹•Institutions (1)

National University of Defense Technology¹

10 Jun 2007

TL;DR: This work proposes a continuous skyline query processing strategy for a static query point; the main idea is a grid-based algorithm that achieves low running time by handling movements only from objects that fall in the influence region, while data changes in the free region are ignored with a correctness guarantee.

Abstract: We address the problem of continuous skyline computation on highly dynamic moving objects (i.e., objects with dynamic dimensions that move in an unrestricted and unpredictable fashion), which is quite a different scenario from that of the existing literature on skyline algorithms. We propose a continuous skyline query processing strategy for a static query point, and the main idea is as follows: (1) The work space is divided into a large number of regular grid cells, and the valid objects are indexed by this data structure. (2) Some cells are organized as the influence region, while the rest compose the free region. The algorithm achieves low running time by handling movements only from objects that fall in the influence region, while data changes in the free region are ignored with a correctness guarantee. (3) The initialization module adopts an efficient method to obtain the initial result without having to process all the data points; after that, the maintenance module updates the skyline and the influence region dynamically when data changes. We analyze the space and time costs of the proposed method and conduct an extensive experiment, which indicates that our grid-based algorithm is efficient and significantly outperforms existing methods adopted for the application.

Journal Article•DOI•

Flexible integration of multimedia sub-queries with qualitative preferences

Ilaria Bartolini¹, Paolo Ciaccia¹, Vincent Oria², M. Tamer Özsu³•Institutions (3)

University of Bologna¹, New Jersey Institute of Technology², University of Waterloo³

01 Jun 2007-Multimedia Tools and Applications

TL;DR: The potentialities of a more general approach, based on the use of qualitative preferences, able to define arbitrary partial orders on database objects, are explored, so that a larger flexibility is gained in shaping what the user is looking for.

Abstract: Complex multimedia queries, aiming to retrieve from large databases those objects that best match the query specification, are usually processed by splitting them into a set of m simpler sub-queries, each dealing with only some of the query features. To determine which are the overall best-matching objects, a rule is then needed to integrate the results of such sub-queries, i.e., how to globally rank the m-dimensional vectors of matching degrees, or partial scores, that objects obtain on the m sub-queries. It is a fact that state-of-the-art approaches all adopt as integration rule a scoring function, such as weighted average, that aggregates the m partial scores into an overall (numerical) similarity score, so that objects can be linearly ordered and only the highest scored ones returned to the user. This choice, however, forces the system to compromise between the different sub-queries and can easily lead to missing relevant results. In this paper we explore the potential of a more general approach, based on the use of qualitative preferences, able to define arbitrary partial (rather than only linear) orders on database objects, so that greater flexibility is gained in shaping what the user is looking for. For the purpose of efficient evaluation, we propose two integration algorithms able to work with any (monotone) partial order (thus also with scoring functions): MPO, which delivers objects one layer of the partial order at a time, and iMPO, which can incrementally return one object at a time, and is thus also suitable for processing top-k queries. Our analysis demonstrates that using qualitative preferences pays off. In particular, using Skyline and Region-prioritized Skyline preferences for queries on a real image database, we show that the results we get have a precision comparable to that obtainable using scoring functions, yet they are obtained much faster, saving up to about 70% of database accesses.
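
The phrase "one layer of the partial order at a time" can be illustrated by repeatedly peeling off the undominated objects. The sketch below shows only that layering, using plain Pareto dominance over partial scores where higher is better. It does not reproduce MPO or iMPO, which also control how the sub-query results are accessed. The names `layers` and `scores`, and the sample values, are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """Pareto dominance on partial scores (assumption: higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def layers(objects: Dict[str, Scores]) -> Iterator[List[str]]:
    """Yield object ids layer by layer: first the undominated objects,
    then those undominated once the first layer is removed, and so on."""
    remaining = dict(objects)
    while remaining:
        layer = [k for k, v in remaining.items()
                 if not any(dominates(w, v) for w in remaining.values())]
        yield layer
        for k in layer:
            del remaining[k]


# Hypothetical partial scores of five images on m = 2 sub-queries.
scores = {"img1": (0.9, 0.2), "img2": (0.5, 0.5), "img3": (0.4, 0.4),
          "img4": (0.1, 0.9), "img5": (0.3, 0.1)}
result = list(layers(scores))
print(result)
```

The first layer is the skyline of the partial scores. Note that `img1` and `img4` appear there even though a weighted average could rank them low, which is the kind of result a scoring function may miss.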

Journal Article•DOI•

Restricting skyline sizes using weak Pareto dominance

Wolf Tilo Balke, Ulrich Güntzer¹, Wolf Siberski•Institutions (1)

University of Tübingen¹

11 May 2007-Informatik - Forschung Und Entwicklung

TL;DR: This paper explores how to enable interactive tasks like query refinement or relevance feedback by providing interesting subsets of the full Pareto skyline, which give users a good overview over the skyline, and shows how this opens up the use of efficient and scalable query processing algorithms.

Abstract: Skyline queries have recently received a lot of attention due to their intuitive query formulation: users can state preferences with respect to several attributes. Unlike numerical or score-based preferences, preferences over discrete value domains do not show an inherent total order, but have to rely on partial orders as stated by the user. In such orders typically many object values are incomparable, increasing the size of skyline sets significantly, and making their computation expensive. In this paper we explore how to enable interactive tasks like query refinement or relevance feedback by providing interesting subsets of the full Pareto skyline, which give users a good overview over the skyline. To be practical these subsets have to be small, efficient to compute, suitable for higher numbers of query predicates, and representative. The key to improved performance and reduced result set sizes is the relaxation of Pareto semantics to the concept of weak Pareto dominance. We argue that this relaxation yields intuitive results and show how it opens up the use of efficient and scalable query processing algorithms. We first derive the complete skyline subset given by weak Pareto dominance called ‘restricted skyline’ and then considering the individual performance of objects limit this further to a subset called ‘focused skyline’. Assessing the practical impact our experiments show that our approach indeed leads to lean result set sizes and outperforms Pareto skyline computations by up to two orders of magnitude.

Book Chapter•DOI•

Domination mining and querying

Apostolos N. Papadopoulos¹, Apostolos Lyritsis¹, Alexandros Nanopoulos¹, Yannis Manolopoulos¹•Institutions (1)

Aristotle University of Thessaloniki¹

03 Sep 2007

TL;DR: This work provides a dominance-based analysis and querying scheme that aims at alleviating the skyline cardinality problem, trying to introduce ranking on the items.

Abstract: Pareto dominance plays an important role in diverse application domains such as economics and e-commerce, and it is widely used in multicriteria decision making. In these cases, objectives are usually contradictory and therefore it is not straightforward to provide a set of items that are the "best" according to the user's preferences. Skyline queries have been extensively used to recommend the most dominant items. However, in some cases skyline items are either too few or too many, causing problems in selecting the prevailing ones. The number of skyline items depends heavily on the data distribution, the data population and the dimensionality of the data set. In this work, we provide a dominance-based analysis and querying scheme that aims at alleviating the skyline cardinality problem by introducing a ranking on the items. The proposed scheme can be used either as a mining or as a querying tool, helping the user in selecting the most preferred items. Performance evaluation based on different distributions, populations and dimensionalities shows the effectiveness of the proposed scheme.
