
Showing papers on "Skyline" published in 2011


Proceedings Article
12 Jun 2011
TL;DR: This work uses hyperplane projections to obtain useful partitions of the data set for parallel processing that ensure small local skyline sets, but enable efficient merging of results as well and provides insights on the impacts of different optimization strategies.
Abstract: The skyline of a set of multi-dimensional points (tuples) consists of those points for which no clearly better point exists in the given set, using component-wise comparison on domains of interest. Skyline queries, i.e., queries that involve computation of a skyline, can be computationally expensive, so it is natural to consider parallelized approaches which make good use of multiple processors. We approach this problem by using hyperplane projections to obtain useful partitions of the data set for parallel processing. These partitions not only ensure small local skyline sets, but enable efficient merging of results as well. Our experiments show that our method consistently outperforms similar approaches for parallel skyline computation, regardless of data distribution, and provides insights on the impacts of different optimization strategies.

73 citations
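The skyline definition and the partition-then-merge idea in this abstract can be illustrated with a short Python sketch. It assumes that smaller values are better in every dimension. It uses a simple angle-based partitioner (the hypothetical `angular_partition`) as a stand-in for the paper's hyperplane projections, so it is not the authors' algorithm. The final skyline pass over the union of local skylines is what makes the merge correct for any partition: a point that is dominated in the full set is also dominated by some local skyline point.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Block-nested-loops style skyline: keep a window of currently undominated points."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated by a point already kept
        window = [w for w in window if not dominates(p, w)]  # evict points p dominates
        window.append(p)
    return window


def angular_partition(p: Point, parts: int) -> int:
    """Hypothetical 2-D partitioner: bucket by polar angle (assumes non-negative coordinates)."""
    angle = math.atan2(p[1], p[0])  # lies in [0, pi/2] for non-negative input
    return min(int(angle / (math.pi / 2) * parts), parts - 1)


def partitioned_skyline(points: Sequence[Point], parts: int,
                        part_of: Callable[[Point, int], int]) -> List[Point]:
    """Split the data, compute local skylines independently, then merge with one more pass."""
    buckets: List[List[Point]] = [[] for _ in range(parts)]
    for p in points:
        buckets[part_of(p, parts)].append(p)
    local = [skyline(b) for b in buckets]  # independent, so these could run in parallel
    return skyline([p for sky in local for p in sky])
```

For example, `partitioned_skyline(pts, 4, angular_partition)` returns the same set as `skyline(pts)`. The choice of partitioner mainly affects how large the local skylines are and how cheap the merge is, not the result.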


Proceedings Article
11 Dec 2011
TL;DR: This work establishes theoretical relationships between pattern condensed representations and skyline pattern mining and shows that it is possible to compute automatically a subset of measures involved in the user query which allows the patterns to be condensed and thus facilitates the computation of the skyline patterns.
Abstract: Pattern discovery is at the core of numerous data mining tasks. Although many methods focus on efficiency in pattern mining, they still suffer from the problem of choosing a threshold that influences the final extraction result. The goal of our study is to make the results of pattern mining useful from a user-preference point of view. To this end, we integrate the idea of skyline queries into the pattern discovery process in order to mine skyline patterns in a threshold-free manner. Because the skyline patterns satisfy a formal dominance property, they not only have a global interest but also have semantics that are easily understood by the user. In this work, we first establish theoretical relationships between pattern condensed representations and skyline pattern mining. We also show that it is possible to automatically compute a subset of the measures involved in the user query that allows the patterns to be condensed and thus facilitates the computation of the skyline patterns. This forms the basis for a novel approach to mining skyline patterns. We illustrate the efficiency of our approach on several data sets, including a use case from chemoinformatics, and show that small sets of dominant patterns are produced under various measures.

70 citations


Journal Article
TL;DR: Preference SQL is a declarative extension of standard SQL by strict partial order preferences, behaving like soft constraints under the BMO query model, enabling a seamless application integration with standard SQL back-end systems.
Abstract: Preference SQL is a declarative extension of standard SQL by strict partial order preferences, behaving like soft constraints under the BMO query model. Preference queries can be formulated intuitively following an inductive constructor-based approach. Both qualitative methods (e.g., Pareto/skyline) and quantitative methods (e.g., numerical ranking), definable over categorical as well as numerical attribute domains, can be used. The Preference SQL System is implemented as a middleware component, enabling seamless application integration with standard SQL back-end systems. The preference query optimizer performs algebraic transformations of preference relational algebra as well as cost-based algorithm selection, e.g., for efficient Pareto/skyline evaluation. Ongoing work extends Preference SQL towards efficient support for personalized location-based mobile geo-services and social networks.

68 citations


Proceedings Article
11 Apr 2011
TL;DR: One of the main contributions is to formulate the problem of displaying k representative skyline points such that the probability that a random user would click on one of them is maximized.
Abstract: The study of skylines and their variants has received considerable attention in recent years. Skylines are essentially sets of most interesting (undominated) tuples in a database. However, since the skyline is often very large, much research effort has been devoted to identifying a smaller subset of (say k) “representative skyline” points. Several different definitions of representative skylines have been considered. Most of these formulations are intuitive in that they try to achieve some kind of clustering “spread” over the entire skyline, with k points. In this work, we take a more principled approach in defining the representative skyline objective. One of our main contributions is to formulate the problem of displaying k representative skyline points such that the probability that a random user would click on one of them is maximized.

59 citations


Journal Article
TL;DR: This paper compares two parallel skyline algorithms: a parallel version of the branch-and-bound algorithm (BBS) and a new parallel algorithm based on skeletal parallel programming, which is comparable to parallel BBS in speed.

52 citations


Journal Article
TL;DR: This paper proposes a partition algorithm that divides all data sites into incomparable groups such that the skyline computations in all groups can be parallelized without changing the final result, and develops a novel algorithm framework called PaDSkyline for parallel skyline query processing among partitioned site groups.
Abstract: The skyline of a multidimensional point set is a subset of interesting points that are not dominated by others. In this paper, we investigate constrained skyline queries in a large-scale unstructured distributed environment, where relevant data are distributed among geographically scattered sites. We first propose a partition algorithm that divides all data sites into incomparable groups such that the skyline computations in all groups can be parallelized without changing the final result. We then develop a novel algorithm framework called PaDSkyline for parallel skyline query processing among partitioned site groups. We also employ intragroup optimization and a multifiltering technique to improve skyline query processing within each group. In particular, multiple (local) skyline points are sent together with the query as filtering points, which help identify unqualified local skyline points early on a data site. In this way, the amount of data to be transmitted via network connections is reduced, and thus the overall query response time is shortened further. Cost models and heuristics are proposed to guide the selection of a given number of filtering points from a superset. A cost-efficient model is developed to determine how many filtering points to use for a particular data site. The results of an extensive experimental study demonstrate that our proposals are effective and efficient.

51 citations
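The filtering-point idea from the PaDSkyline abstract can be sketched as follows. This is a minimal illustration under simplifying assumptions: smaller values are better, query constraints are omitted, and the function names are hypothetical. It does not reproduce the paper's cost-model-driven selection of filtering points. Because filtering points are real skyline points from other sites, any local point they dominate cannot be in the global skyline and need not be transmitted.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def local_skyline(points: Sequence[Point]) -> List[Point]:
    """Plain quadratic skyline of one site's data."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def filtered_local_skyline(site_points: Sequence[Point],
                           filter_points: Sequence[Point]) -> List[Point]:
    """Hypothetical site-side step: drop local skyline points dominated by a filter point.

    Filter points are skyline points shipped with the query from other sites, so anything
    they dominate is not in the global skyline and does not need to be sent back.
    """
    return [p for p in local_skyline(site_points)
            if not any(dominates(f, p) for f in filter_points)]
```

With `site = [(2, 6), (4, 4), (7, 1), (5, 5)]` and a single filtering point `(3, 3)`, the site returns only `(2, 6)` and `(7, 1)`. `(5, 5)` is removed by the local skyline, and `(4, 4)` is removed by the filter.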


Proceedings Article
04 Jul 2011
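TL;DR: A new concept, called alpha-dominant service skyline, is introduced to address two drawbacks of conventional service skylines (poor compromises between QoS attributes and overly large result sets), and a suitable algorithm for computing it efficiently is developed.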
Abstract: Nowadays, the exploding number of functionally similar Web services has led to a new challenge of selecting the most relevant services using quality of service (QoS) aspects. Traditionally, the relevance of a service is determined by computing an overall score that aggregates individual QoS values, which requires users to assign weights to QoS attributes. This is a rather demanding task, and an imprecise specification of the weights could result in missing some user-desired services. Recent approaches focus on computing the service skyline over a set of QoS aspects. This can completely free users from assigning weights to QoS attributes. However, two main drawbacks characterize such approaches. First, the service skyline often privileges services with a bad compromise between different QoS attributes. Second, as the size of the service skyline may be quite large, users will be overwhelmed during the service selection process. In this paper, we introduce a new concept, called alpha-dominant service skyline, to address the above issues, and we develop a suitable algorithm for computing it efficiently. Experimental evaluation conducted on synthetically generated datasets demonstrates both the effectiveness of the introduced concept and the efficiency of the proposed algorithm.

50 citations


Journal Article
TL;DR: The paper proposes a new approach, called skyline ordering, that forms a skyline-based partitioning of a given data set such that an order exists among the partitions; set-wide maximization techniques may then be applied within each partition.
Abstract: Given a set of multidimensional points, a skyline query returns the interesting points that are not dominated by other points. It has been observed that the actual cardinality (s) of a skyline query result may differ substantially from the desired result cardinality (k), which has prompted studies on how to reduce s for the case where k < s. Based on these observations, the paper proposes a new approach, called skyline ordering, that forms a skyline-based partitioning of a given data set such that an order exists among the partitions. Then, set-wide maximization techniques may be applied within each partition. Efficient algorithms are developed for skyline ordering and for resolving size constraints using the skyline order. The results of extensive experiments show that skyline ordering yields a flexible framework for the efficient and scalable resolution of arbitrary size constraints on skyline queries.

50 citations
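One way to picture skyline ordering is as peeling successive skyline layers: first the skyline of the data, then the skyline of what remains, and so on. The sketch below assumes smaller-is-better attributes. The tie-break inside the last layer (smallest coordinate sum) is a hypothetical stand-in for the set-wide maximization techniques the abstract mentions, not the paper's method.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_layers(points: Sequence[Point]) -> List[List[Point]]:
    """Peel successive skylines: layer i is the skyline of what layers 0..i-1 left behind."""
    remaining = list(points)
    layers: List[List[Point]] = []
    while remaining:
        layer = [p for p in remaining if not any(dominates(q, p) for q in remaining)]
        layers.append(layer)  # never empty: a finite set always has undominated points
        remaining = [p for p in remaining if p not in layer]
    return layers


def top_k_by_layers(points: Sequence[Point], k: int) -> List[Point]:
    """Fill a result of size k layer by layer, breaking ties only inside the last layer used."""
    result: List[Point] = []
    for layer in skyline_layers(points):
        if len(result) + len(layer) <= k:
            result.extend(layer)
        else:
            # Hypothetical stand-in for set-wide maximization: prefer small coordinate sums.
            result.extend(sorted(layer, key=sum)[: k - len(result)])
            break
    return result
```

Later layers are used only after earlier ones are exhausted. As a result, the same routine covers both k < s (truncate the first layer) and k > s (continue into later layers).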


Journal Article
TL;DR: This work proposes Collaborative Filtering Skyline (CFS), a general framework that combines the advantages of CF with those of the skyline operator, and proposes the top-k personalized skyline, where the user specifies the required output cardinality.
Abstract: Collaborative filtering (CF) systems exploit previous ratings and similarity in user behavior to recommend the top-k objects/records which are potentially most interesting to the user, assuming a single score per object. However, in various applications, a record (e.g., hotel) may be rated on several attributes (value, service, etc.), in which case simply returning the ones with the highest overall scores fails to capture the individual attribute characteristics and to accommodate different selection criteria. In order to enhance the flexibility of CF, we propose Collaborative Filtering Skyline (CFS), a general framework that combines the advantages of CF with those of the skyline operator. CFS generates a personalized skyline for each user based on scores of other users with similar behavior. The personalized skyline includes objects that are good on certain aspects, and eliminates the ones that are not interesting on any attribute combination. Although the integration of skylines and CF has several attractive properties, it also involves rather expensive computations. We face this challenge through a comprehensive set of algorithms and optimizations that reduce the cost of generating personalized skylines. In addition to exact skyline processing, we develop an approximate method that provides error guarantees. Finally, we propose the top-k personalized skyline, where the user specifies the required output cardinality.

48 citations


Proceedings Article
11 Apr 2011
TL;DR: It is shown that the problem of stochastic skyline is NP-complete with respect to the dimensionality, and novel algorithms are developed to efficiently compute the stochastic skyline over multi-dimensional uncertain data; these algorithms run in polynomial time if the dimensionality is fixed.
Abstract: In many applications involving multi-criteria optimal decision making, users may often want to make a personal trade-off among all optimal solutions. As a key feature, the skyline in a multi-dimensional space provides the minimum set of candidates for such purposes by removing all points not preferred by any (monotonic) utility/scoring functions; that is, the skyline removes all objects not preferred by any user no matter how their preferences vary. Driven by many applications with uncertain data, the probabilistic skyline model was proposed to retrieve uncertain objects based on skyline probabilities. Nevertheless, skyline probabilities cannot capture the preferences of monotonic utility functions. Motivated by this, in this paper we propose a novel skyline operator, namely stochastic skyline. In the light of the expected utility principle, stochastic skyline is guaranteed to provide the minimum set of candidates for the optimal solutions over all possible monotonic multiplicative utility functions. In contrast to the conventional skyline or the probabilistic skyline computation, we show that the problem of stochastic skyline is NP-complete with respect to the dimensionality. Novel algorithms are developed to efficiently compute the stochastic skyline over multi-dimensional uncertain data; they run in polynomial time if the dimensionality is fixed. We also show, by theoretical analysis and experiments, that the size of the stochastic skyline is quite similar to that of the conventional skyline over certain data. Comprehensive experiments demonstrate that our techniques are efficient and scalable regarding both CPU and IO costs.

41 citations


Proceedings Article
24 Oct 2011
TL;DR: This paper studies the authentication of location-based skyline queries, which have recently been receiving increasing attention in LBS applications, and proposes two authentication methods: one based on the traditional MR-tree index and the other based on a newly developed MR-Sky-tree.
Abstract: In outsourced spatial databases, the location-based service (LBS) provides query services to the clients on behalf of the data owner. However, if the LBS is not trustworthy, it may return incorrect or incomplete query results. Thus, authentication is needed to verify the soundness and completeness of query results. In this paper, we study the authentication problem for location-based skyline queries, which have recently been receiving increasing attention in LBS applications. We propose two authentication methods: one based on the traditional MR-tree index and the other based on a newly developed MR-Sky-tree. Experimental results demonstrate the efficiency of our proposed methods in terms of the authentication cost.

Proceedings Article
12 Jun 2011
TL;DR: The novel SFSJ algorithm is introduced that fuses the identification of skyline tuples with the computation of the join and is able to compute the correct skyline set by accessing only a subset of the input tuples, i.e., it has the property of early termination.
Abstract: This paper addresses the problem of efficiently computing the skyline set of a relational join. Existing techniques either require accessing all tuples of the input relations or demand specialized multi-dimensional access methods to generate the skyline join result. To avoid these inefficiencies, we introduce the novel SFSJ algorithm that fuses the identification of skyline tuples with the computation of the join. SFSJ is able to compute the correct skyline set by accessing only a subset of the input tuples, i.e., it has the property of early termination. SFSJ employs standard access methods for reading the input tuples and is readily implementable in an existing database system. Moreover, it can be used in pipelined execution plans, as it generates the skyline tuples progressively. Additionally, we formally analyze the performance of SFSJ and propose a novel strategy for accessing the input tuples that is proven to be optimal for SFSJ. Finally, we present an extensive experimental study that validates the effectiveness of SFSJ and demonstrates its advantages over existing techniques.
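For contrast with SFSJ, the naive join-first-skyline-later strategy that early-terminating algorithms aim to avoid looks roughly like the sketch below. It materializes the whole equi-join and only then removes dominated results. The row layout (dicts), the join key, and the attribute names are assumptions for illustration, and smaller values are treated as better. This is not SFSJ itself.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def join_then_skyline(left: Sequence[Row], right: Sequence[Row], key: str,
                      left_attrs: Sequence[str],
                      right_attrs: Sequence[str]) -> List[Tuple[Row, Row]]:
    """Naive baseline: materialize the full equi-join, then keep undominated result tuples."""
    index: Dict[Any, List[Row]] = defaultdict(list)  # hash index on the right join key
    for rrow in right:
        index[rrow[key]].append(rrow)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            vec = tuple(lrow[a] for a in left_attrs) + tuple(rrow[a] for a in right_attrs)
            joined.append((lrow, rrow, vec))
    # Every joined tuple is compared with every other one: all input is read, no early exit.
    return [(lrow, rrow) for lrow, rrow, v in joined
            if not any(dominates(w, v) for _, _, w in joined)]
```

For instance, joining hypothetical hotel and flight tables on `city` and minimizing (`price`, `cost`) keeps only the hotel/flight pairs that no other pair beats on both totals.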

Journal Article
01 Apr 2011
TL;DR: This work studies p-skyline queries, which generalize skyline queries by allowing varying attribute importance in preference relations, and proposes an algorithm for eliciting the relative importance of attributes that shows high accuracy and good scalability.
Abstract: Preference queries incorporate the notion of binary preference relation into relational database querying. Instead of returning all the answers, such queries return only the best answers, according to a given preference relation. Preference queries are a fast-growing area of database research. Skyline queries constitute one of the most thoroughly studied classes of preference queries. A well-known limitation of skyline queries is that skyline preference relations assign the same importance to all attributes. In this work, we study p-skyline queries that generalize skyline queries by allowing varying attribute importance in preference relations. We perform an in-depth study of the properties of p-skyline preference relations. In particular, we study the problems of containment and minimal extension. We apply the obtained results to the central problem of the paper: eliciting relative importance of attributes. Relative importance is implicit in the constructed p-skyline preference relation. The elicitation is based on user-selected sets of superior (positive) and inferior (negative) examples. We show that the computational complexity of elicitation depends on whether inferior examples are involved. If they are not, elicitation can be achieved in polynomial time. Otherwise, it is NP-complete. Our experiments show that the proposed elicitation algorithm has high accuracy and good scalability.

Journal Article
Ying Zhang, Wenjie Zhang, Xuemin Lin, Bin Jiang, Jian Pei
TL;DR: An efficient exact algorithm for computing the top-k skyline objects is developed for discrete cases, and an efficient randomized algorithm with an ε-approximation guarantee is developed to address applications where each object may have a massive set of instances or a continuous probability density function.

Proceedings Article
21 Mar 2011
TL;DR: A novel framework, called SkyPlan, is introduced for processing distributed skyline queries; it generates execution plans aimed at optimizing query processing performance and consistently outperforms the state-of-the-art algorithm.
Abstract: In this paper, we study the generation of efficient execution plans for skyline query processing in large-scale distributed environments. In such a setting, each server autonomously stores a fraction of the data, so all servers need to process the skyline query. An execution plan defines the order in which the individual skyline queries are processed on different servers, and influences the performance of query processing. Querying servers consecutively reduces the amount of transferred data and the number of queried servers, since skyline points obtained by one server prune points in the subsequent servers, but it also increases the latency of the system. To address this trade-off, we introduce a novel framework, called SkyPlan, for processing distributed skyline queries that generates execution plans aiming at optimizing the performance of query processing. To this end, we quantify the gain of querying different servers consecutively. Then, execution plans are generated that maximize the overall gain, while also taking into account additional objectives, such as bounding the maximum number of hops required for the query or balancing the load on different servers fairly. Finally, we present an algorithm for distributed processing based on the generated plan that continuously refines the execution plan during in-network processing. Our framework consistently outperforms the state-of-the-art algorithm.

Journal Article
TL;DR: This work proposes a new algorithm for computing all skyline probabilities that is asymptotically faster, and studies the online version of the problem, answering a query on $d$-dimensional data in $O(n^{1-1/d}\log^{d-1} n)$ time after preprocessing.
Abstract: Skyline computation is widely used in multicriteria decision making. As research in uncertain databases draws increasing attention, skyline queries with uncertain data have also been studied. Some earlier work focused on probabilistic skylines with a given threshold; Atallah and Qi [2009] studied the problem of computing skyline probabilities for all instances of uncertain objects without the use of thresholds, and proposed an algorithm with subquadratic time complexity. In this work, we propose a new algorithm for computing all skyline probabilities that is asymptotically faster: worst-case $O(n\sqrt{n}\log n)$ time and $O(n)$ space for 2D data; $O(n^{2-1/d}\log^{d-1} n)$ time and $O(n\log^{d-2} n)$ space for $d$-dimensional data. Furthermore, we study the online version of the problem: given any query point $p$ (unknown until query time), return the probability that no instance in the given data set dominates $p$. We propose an algorithm for answering such an online query for $d$-dimensional data in $O(n^{1-1/d}\log^{d-1} n)$ time after preprocessing the data in $O(n^{2-1/d}\log^{d-1} n)$ time and space.
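A common model assumes that uncertain objects are independent and that each object's instances are mutually exclusive with given probabilities. Under that model, the quantities in this abstract can be computed by brute force in quadratic time. The sketch below is that baseline, not the asymptotically faster algorithm of the paper. The function names are hypothetical, and smaller values are treated as better. An instance's skyline probability here is conditional on that instance occurring. The object-level probability weights it by the instance's probability.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
UncertainObject = List[Tuple[Point, float]]  # (instance location, probability) pairs


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def prob_not_dominated(p: Point, objects: Sequence[UncertainObject],
                       skip: Optional[int] = None) -> float:
    """Probability that no object (other than index `skip`) realizes an instance dominating p."""
    result = 1.0
    for idx, instances in enumerate(objects):
        if idx == skip:
            continue
        dominating_mass = sum(prob for q, prob in instances if dominates(q, p))
        result *= 1.0 - dominating_mass  # objects are assumed independent
    return result


def instance_skyline_probabilities(objects: Sequence[UncertainObject]) -> List[List[float]]:
    """For every instance: probability it is on the skyline, given that it occurs."""
    return [[prob_not_dominated(q, objects, skip=i) for q, _ in instances]
            for i, instances in enumerate(objects)]


def object_skyline_probabilities(objects: Sequence[UncertainObject]) -> List[float]:
    """Object-level skyline probability: instance values weighted by instance probability."""
    per_instance = instance_skyline_probabilities(objects)
    return [sum(prob * s for (_, prob), s in zip(instances, sky))
            for instances, sky in zip(objects, per_instance)]
```

Calling `prob_not_dominated(p, objects)` without `skip` answers the online query for an arbitrary point p. This costs O(n) per query, whereas the paper achieves sublinear query time after preprocessing.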

Book Chapter
28 Jun 2011
TL;DR: This paper deals with database preference queries based on the skyline paradigm, which aim at retrieving the tuples not Pareto-dominated by any other, and proposes different ways to fuzzify such queries in order to make them more flexible, to increase their discrimination power, and to make them more drastic or more tolerant.
Abstract: This paper deals with database preference queries based on the skyline paradigm, which aim at retrieving the tuples not Pareto-dominated by any other. We propose different ways to fuzzify such queries in order to make them more flexible, to increase their discrimination power, and to make them more drastic or more tolerant. In particular, some of these extensions make it possible to reduce the risk of getting many incomparable tuples, even when the number of dimensions is high.

Proceedings Article
Prasad M. Deshpande, Deepak P
21 Mar 2011
TL;DR: This paper considers Reverse Skyline query processing where the distance between attribute values are not necessarily metric, and proposes a method of using group-level reasoning and early pruning to micro-optimize processing by reducing attribute level comparisons.
Abstract: A Reverse Skyline query returns all objects whose skyline contains the query object. In this paper, we consider Reverse Skyline query processing where the distances between attribute values are not necessarily metric. We outline real-world cases that motivate Reverse Skyline processing in such scenarios. We consider various optimizations to develop efficient algorithms for Reverse Skyline processing. First, we consider block-based processing of objects to optimize IO costs. We then explore pre-processing to re-arrange objects on disk to reduce computational and IO costs. We then present our main contribution, which is a method of using group-level reasoning and early pruning to micro-optimize processing by reducing attribute-level comparisons. An extensive empirical evaluation with real-world datasets and synthetic data of varying characteristics shows that our optimization techniques are indeed very effective in dramatically speeding up Reverse Skyline processing, both in terms of computational costs and IO costs.
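The reverse skyline definition can be checked directly. An object o qualifies when no other object is at least as close to o as the query q on every attribute and strictly closer on at least one; in other words, q is in o's dynamic skyline. The brute-force sketch below only compares distance values, so the per-attribute distance functions need not be metric (no symmetry or triangle inequality is assumed). The names and the two distance functions are illustrative assumptions. The paper's block-based, pre-processing, and group-level optimizations are not shown.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Dist = Callable[[float, float], float]  # d(candidate_value, reference_value)


def absolute_difference(a: float, b: float) -> float:
    return abs(a - b)


def asymmetric_gap(a: float, b: float) -> float:
    """Hypothetical non-metric distance: overshooting the reference costs twice as much."""
    return 2 * (a - b) if a > b else (b - a)


def dyn_dominates(a: Point, b: Point, ref: Point, dists: Sequence[Dist]) -> bool:
    """a dynamically dominates b with respect to ref under per-attribute distances."""
    da = [d(x, r) for d, x, r in zip(dists, a, ref)]
    db = [d(y, r) for d, y, r in zip(dists, b, ref)]
    return all(x <= y for x, y in zip(da, db)) and any(x < y for x, y in zip(da, db))


def reverse_skyline(objects: Sequence[Point], q: Point, dists: Sequence[Dist]) -> List[Point]:
    """Brute force: keep o if no other object dynamically dominates q with respect to o."""
    return [o for i, o in enumerate(objects)
            if not any(dyn_dominates(p, q, o, dists)
                       for j, p in enumerate(objects) if j != i)]
```

For example, with objects `(1, 1)`, `(5, 5)`, and `(9, 9)` and query `(4, 4)`, the object `(9, 9)` is excluded: from its point of view `(5, 5)` is closer than the query on both attributes.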

Book Chapter
29 May 2011
TL;DR: This paper presents an approach for optimizing skyline queries over RDF data stored using a vertically partitioned schema model based on the concept of a "Header Point" which maintains a concise summary of the already visited regions of the data space.
Abstract: Skyline queries are a class of preference queries that compute the Pareto-optimal tuples from a set of tuples and are valuable for multicriteria decision making scenarios. While this problem has received significant attention in the context of a single relational table, skyline queries over joins of multiple tables, which are typical of storage models for RDF data, have received much less attention. A naive approach such as a join-first-skyline-later strategy separates the join and skyline computation phases, which limits opportunities for optimization. Other existing techniques for multi-relational skyline queries assume storage and indexing techniques that are not typically used with RDF, which would require a preprocessing step for data transformation. In this paper, we present an approach for optimizing skyline queries over RDF data stored using a vertically partitioned schema model. It is based on the concept of a "Header Point" which maintains a concise summary of the already visited regions of the data space. This summary allows some fraction of non-skyline tuples to be pruned from advancing to the skyline processing phase, thus reducing the overall cost of expensive dominance checks required in the skyline phase. We further present more aggressive pruning rules that result in the computation of near-complete skylines in significantly less time than the complete algorithm. A comprehensive performance evaluation of different algorithms is presented using datasets with different types of data distributions generated by a benchmark data generator.

Proceedings Article
21 Mar 2011
TL;DR: This work considers the problem of computing the probability of each point lying on the skyline, that is, the probability that it is not dominated by any other input point, and improves the best known exact solution.
Abstract: Given a set of points with uncertain locations, we consider the problem of computing the probability of each point lying on the skyline, that is, the probability that it is not dominated by any other input point. If each point's uncertainty is described as a probability distribution over a discrete set of locations, we improve the best known exact solution. We also suggest why we believe our solution might be optimal. Next, we describe simple, near-linear time approximation algorithms for computing the probability of each point lying on the skyline. In addition, some of our methods can be adapted to construct data structures that can efficiently determine the probability of a query point lying on the skyline.

Book Chapter
24 Aug 2011
TL;DR: This work presents a simple and efficient algorithm which, given a set P of data points and a set Q of query points in the plane, returns the set of spatial skyline points in just O(|P| log |P|) time, which is significantly lower in complexity than the best known method.
Abstract: Skyline queries have gained attention lately for supporting effective retrieval over massive spatial data. While efficient algorithms have been studied for spatial skyline queries using Euclidean distance, or, L2 norm, these algorithms are (1) still quite computationally intensive and (2) unaware of the road constraints. Our goal is to develop a more efficient algorithm for L1 norm, also known as Manhattan distance, which closely reflects road network distance for metro areas with well-connected road networks. Towards this goal, we present a simple and efficient algorithm which, given a set P of data points and a set Q of query points in the plane, returns the set of spatial skyline points in just O(|P| log |P|) time, assuming that |Q| = |P|. This is significantly lower in complexity than the best known method. In addition to efficiency and applicability, our proposed algorithm has another desirable property of independent computation and extensibility to L∞ norm, which naturally invites parallelism and widens applicability. Our extensive empirical results suggest that our algorithm outperforms the state-of-the-art approaches by orders of magnitude.
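The spatial skyline under the L1 norm can be stated as an ordinary skyline over distance vectors. Each data point p is mapped to its Manhattan distances to every query point in Q, and p survives if no other point is at least as close to all query points and strictly closer to at least one. The brute-force sketch below runs in O(|P|^2 |Q|) time and only illustrates the definition. It is not the paper's O(|P| log |P|) algorithm, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def manhattan(a: Point, b: Point) -> float:
    """L1 (Manhattan) distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def spatial_skyline(P: Sequence[Point], Q: Sequence[Point]) -> List[Point]:
    """Keep data points whose L1 distance vector to Q is not dominated by another point's."""
    vecs = [tuple(manhattan(p, q) for q in Q) for p in P]
    return [p for p, v in zip(P, vecs) if not any(dominates(w, v) for w in vecs)]
```

For query points `(0, 0)` and `(2, 0)`, the data point `(1, 1)` (distances 2 and 2) is dropped because `(1, 0)` (distances 1 and 1) is closer to both query points.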

Journal Article
01 Dec 2011
TL;DR: A new algorithm that requires remarkably fewer network distance calculations is proposed in this work; it uses a progressive nearest neighbor algorithm to minimize the set of candidates and then evaluates those candidates by comparing them only to a subset of the discovered skyline points.
Abstract: Skyline queries are used with data-intensive applications, such as mobile location-based services, to support multi-criteria decision-making and to prune the data space by returning the most "interesting" data points, i.e., the points that are not dominated by any other point. The spatial network skyline query is a special case of the skyline query problem where data points are nodes in a road network and the attributes of the data points are network distances relative to a set of query points. The difficulty of the spatial network skyline query is that computing these attributes requires an expensive network distance calculation. Previous works (Deng et al., Proceedings of the 23rd International Conference on Data Engineering, 796–805, 2007; Sharifzadeh et al., Proceedings of the 32nd International Conference on Very Large Data Bases, 751–762, 2009) that addressed this problem involved extensive network distance calculations between the query points and data points. A new algorithm that requires remarkably fewer network distance calculations is proposed in this work. Our approach uses a progressive nearest neighbor algorithm to minimize the set of candidates and then evaluates those candidates by comparing them only to a subset of the discovered skyline points. Experiments showed the effectiveness of our algorithm compared to previous works.

Journal Article
TL;DR: The skyline features are extracted from panoramic 3D scans and encoded as strings enabling the use of string matching for merging the scans, and initial results in the old city center of Bremen are presented.
Abstract: Acquisition and registration of terrestrial 3D laser scans is a fundamental task in mapping and modeling of cities in three dimensions. To automate this task, marker-free registration methods are required. Based on the existence of skyline features, this paper proposes a novel method. The skyline features are extracted from panoramic 3D scans and encoded as strings, enabling the use of string matching for merging the scans. Initial results of the proposed method in the old city center of Bremen are presented.

Book Chapter
26 Oct 2011
TL;DR: This paper deals with Skyline queries in the context of possibilistic databases, where uncertain attribute values are represented by possibility distributions, and a basic algorithm suited to their evaluation is provided.
Abstract: This paper deals with Skyline queries in the context of possibilistic databases, where uncertain attribute values are represented by possibility distributions. In this framework, Skyline queries aim at computing the extent to which any tuple from a given relation is possibly/certainly not dominated by any other tuple from that relation. Besides the interpretation of possibilistic Skyline queries, a basic algorithm suited to their evaluation is provided.

Book Chapter
22 Apr 2011
TL;DR: An efficient algorithm based on the grid index and a novel variant of the well-known Z-order curve is proposed to solve the problem of computing dynamic skylines considering range queries and results demonstrate that it is effective and efficient.
Abstract: Dynamic skyline queries are practical in many applications. For example, if no data exist to fully satisfy a query q in an information system, the data "closer" to the requirements of q can be retrieved as answers. Finding the nearest neighbors of q can be one solution; finding the data not dynamically dominated by any other data with respect to q, i.e., the dynamic skyline regarding q, can be another. A data point p is defined to dynamically dominate another data point s if, in each dimension, the distance between p and q is no larger than the distance between s and q, and in at least one dimension, the distance between p and q is smaller than that between s and q. Some approaches for answering dynamic skyline queries have been proposed. However, the existing approaches only consider the query as a point rather than a range in each dimension, although range queries are also frequently issued by users. In this paper, we make the first attempt to solve the problem of computing dynamic skylines for range queries. To deal with this problem, we propose an efficient algorithm based on the grid index and a novel variant of the well-known Z-order curve. Moreover, a series of experiments is performed to evaluate the proposed algorithm, and the results demonstrate that it is effective and efficient.
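The dynamic dominance definition in this abstract translates directly into code once each attribute is replaced by its distance to the query. To extend it to range queries, the sketch below makes an assumption that may not match the paper's definition: the distance from a value to a range [lo, hi] is zero inside the range and the gap to the nearest endpoint outside it. A point query is then the special case lo == hi. This is a quadratic brute-force illustration with hypothetical names, not the grid index and Z-order algorithm the paper proposes.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Range = Tuple[float, float]  # (lo, hi) per dimension; lo == hi gives a point query


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def range_distance(value: float, lo: float, hi: float) -> float:
    """Assumed distance from a value to [lo, hi]: 0 inside, gap to nearest endpoint outside."""
    if value < lo:
        return lo - value
    if value > hi:
        return value - hi
    return 0.0


def dynamic_skyline(points: Sequence[Point], query: Sequence[Range]) -> List[Point]:
    """Points whose per-dimension distance vector to the query is not dominated."""
    dist = [tuple(range_distance(v, lo, hi) for v, (lo, hi) in zip(p, query)) for p in points]
    return [p for p, d in zip(points, dist) if not any(dominates(e, d) for e in dist)]
```

With the point query q = (5, 5), written as `[(5, 5), (5, 5)]`, the point `(2, 2)` is dynamically dominated by `(4, 6)`. Widening the query to the range `[(4, 6), (4, 6)]` leaves only `(4, 6)`, which lies inside the range in both dimensions.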

Journal Article
TL;DR: The proposed edge-based skyline extraction algorithm is robust in severe environments with clutter and performs well even on low-resolution infrared sensor images.
Abstract: Skyline extraction in mountainous images can be used for the navigation of vehicles or UAVs (unmanned aerial vehicles), but extracting the skyline shape is very hard because of clutter such as clouds, sea lines, and field borders in the images. We developed an edge-based skyline extraction algorithm using a proposed multistage edge filtering (MEF) technique. In this method, the characteristics of clutter in the image are first defined, and lines classified as clutter are then eliminated in stages using the proposed MEF technique. After this processing, we select the final line from the remaining lines using skyline measures. The proposed algorithm is robust in severe environments with clutter and performs well even on low-resolution infrared sensor images. We tested the proposed algorithm on images obtained in the field by an infrared camera and confirmed that it achieved better performance and faster processing than conventional algorithms. Keywords: MEF, mountainous image, navigation, skyline
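
The stage-wise structure described above (eliminate clutter lines stage by stage, then pick the final line by a skyline measure) can be sketched as a filter pipeline. The clutter tests `is_short_fragment` and `is_flat_line`, their thresholds, and the "widest horizontal coverage" measure in `select_skyline` are all hypothetical stand-ins. The abstract does not say how MEF characterizes clutter or scores lines.

```python
from typing import Callable, List, Optional, Tuple

Line = List[Tuple[int, int]]  # edge pixels (x, y) of one connected line; y grows downward


def horizontal_span(line: Line) -> int:
    xs = [x for x, _ in line]
    return max(xs) - min(xs)


def vertical_extent(line: Line) -> int:
    ys = [y for _, y in line]
    return max(ys) - min(ys)


# Hypothetical clutter tests, not the paper's definitions.
def is_short_fragment(line: Line, min_span: int = 50) -> bool:
    """Small fragments (e.g. cloud edges) covering little of the image width."""
    return horizontal_span(line) < min_span


def is_flat_line(line: Line, max_dy: int = 2) -> bool:
    """Almost perfectly horizontal lines (e.g. sea lines)."""
    return vertical_extent(line) <= max_dy


def multistage_filter(lines: List[Line], stages: List[Callable[[Line], bool]]) -> List[Line]:
    """Apply clutter tests one stage at a time, discarding the lines each stage flags."""
    for is_clutter in stages:
        lines = [line for line in lines if not is_clutter(line)]
    return lines


def select_skyline(lines: List[Line]) -> Optional[Line]:
    """Pick the remaining line with the widest horizontal coverage (stand-in measure)."""
    return max(lines, key=horizontal_span) if lines else None


ridge = [(x, 100 + x % 20) for x in range(200)]  # jagged line across the whole image
sea = [(x, 150) for x in range(200)]             # flat horizon line
cloud = [(x, 30 + x) for x in range(10, 30)]     # short fragment
remaining = multistage_filter([cloud, sea, ridge], [is_short_fragment, is_flat_line])
print(select_skyline(remaining) is ridge)  # True
```

Ordering matters in this sketch: without the flat-line stage, the sea line ties the ridge on horizontal coverage, which is why clutter is removed before the final selection.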

Journal ArticleDOI
TL;DR: The SkyCluster query, which integrates K-means clustering into skyline computation and returns K "representative" and "diverse" skyline objects to users, is evaluated with an efficient approach based on the circinal index that seamlessly integrates subspace skyline computation, K-means clustering, and representative selection.
Abstract: Skyline query processing has recently received a lot of attention in the database and data-mining communities. To the best of our knowledge, existing research mainly focuses on how to efficiently return the whole skyline set. However, as the cardinality and dimensionality of the input objects increase, the number of skyline objects grows exponentially, and this "huge" skyline set becomes largely useless to users. On the other hand, in most real applications the objects are clustered, so many objects have similar attribute values. Motivated by these facts, in this paper we present a novel type of query, the SkyCluster query, to capture skyline diversity and improve the usefulness of the skyline result. The SkyCluster query integrates K-means clustering into skyline computation and returns K "representative" and "diverse" skyline objects to users. A straightforward way to process such a query is to simply combine the existing techniques developed for skyline-only and clustering-only processing. However, this approach is costly, since skyline computation and K-means clustering are both CPU-intensive. We propose an efficient evaluation approach based on the circinal index that seamlessly integrates subspace skyline computation, K-means clustering, and representative selection. We also present a novel optimization heuristic to further improve query performance. An experimental study shows that our approach is both efficient and effective.
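
The "straightforward approach" the abstract contrasts with can be sketched as three separate steps: compute the full skyline, run K-means on it, and return the skyline object nearest to each centroid. This is the costly baseline, not the circinal-index method. It assumes smaller values are better, uses plain Lloyd iterations seeded with the first K skyline points (a hypothetical deterministic initialization), and may return fewer than K objects if two centroids share a nearest object.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """Pareto dominance, assuming smaller is better on every attribute."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def kmeans(points: List[Point], k: int, iters: int = 20) -> List[Point]:
    """Plain Lloyd iterations seeded with the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as its cluster mean; keep it if the cluster is empty.
        new = [tuple(sum(v) / len(m) for v in zip(*m)) if m else c
               for c, m in zip(centroids, clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids


def sky_cluster(points: List[Point], k: int) -> List[Point]:
    """Baseline SkyCluster: skyline, then K-means, then nearest skyline object per centroid."""
    sky = skyline(points)
    if len(sky) <= k:
        return sky
    reps: List[Point] = []
    for c in kmeans(sky, k):
        rep = min(sky, key=lambda p: math.dist(p, c))
        if rep not in reps:
            reps.append(rep)
    return reps


pts = [(1, 10), (2, 9), (9, 2), (10, 1), (5, 5), (6, 6), (8, 8)]
print(skyline(pts))         # [(1, 10), (2, 9), (9, 2), (10, 1), (5, 5)]
print(sky_cluster(pts, 2))  # [(1, 10), (9, 2)]
```

Both phases scan the data repeatedly and independently, which illustrates the CPU cost that motivates integrating them in a single index-based evaluation.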

Book ChapterDOI
18 Apr 2011
TL;DR: This paper pioneers an entirely new domain for skyline queries, namely categorical data, for which corresponding ranking measures are developed, and tests the proposed algorithm using the ACM Computing Classification System.
Abstract: Skyline queries are an effective way to process large multidimensional data sets, as they pinpoint the target data so that dominated data (say, 95% of the data) can be efficiently excluded as unnecessary. However, most conventional skyline algorithms were developed to handle numerical data, so text data have largely been excluded from processing by these algorithms. In this paper, we pioneer an entirely new domain for skyline queries, namely categorical data, and develop corresponding ranking measures for skyline queries over it. We tested our proposed algorithm using the ACM Computing Classification System.

Journal ArticleDOI
TL;DR: This work presents a simple and efficient algorithm that computes exact results, and proposes a fast approximation algorithm that returns a desirable subset of the skyline results.
Abstract: As more data-intensive applications emerge, advanced retrieval semantics such as ranking and skylines have attracted the attention of researchers. Geographic information systems are a good example of an application using massive amounts of spatial data. Our goal is to efficiently support exact and approximate skyline queries over massive spatial datasets. A spatial skyline query, consisting of multiple query points, retrieves the data points for which no other data point is at least as close to every query point and strictly closer to at least one. To achieve this goal, we present a simple and efficient algorithm that computes exact results, and we also propose a fast approximation algorithm that returns a desirable subset of the skyline results. In addition, we propose a continuous query algorithm to track changes in the skyline points as a query point moves. To validate the effectiveness and efficiency of our algorithms, we provide an extensive empirical comparison between our algorithms and the best-known spatial skyline algorithms from several perspectives.
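
The spatial skyline definition can be checked with an exact brute-force sketch that compares Euclidean distances to every query point. It is an O(n²·|Q|) illustration of the query semantics only, not the paper's exact, approximate, or continuous algorithm. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def spatially_dominates(p: Point, s: Point, queries: List[Point]) -> bool:
    """p dominates s: no farther than s from every query point, strictly closer to one."""
    dp = [math.dist(p, q) for q in queries]
    ds = [math.dist(s, q) for q in queries]
    return all(a <= b for a, b in zip(dp, ds)) and any(a < b for a, b in zip(dp, ds))


def spatial_skyline(points: List[Point], queries: List[Point]) -> List[Point]:
    """Exact but naive spatial skyline: points no other point spatially dominates."""
    return [p for p in points if not any(spatially_dominates(s, p, queries) for s in points)]


queries = [(0, 0), (4, 0)]
points = [(2, 0), (2, 3), (5, 0), (-1, 0)]
print(spatial_skyline(points, queries))  # [(2, 0), (5, 0), (-1, 0)]
```

Here (2, 3) is excluded because (2, 0) is closer to both query points. (5, 0) and (-1, 0) survive because each is the closest point to one of the query points, even though each is far from the other.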

Journal ArticleDOI
TL;DR: SKIN, an execution framework for evaluating skyline queries over joins, is proposed and shown to be robust for both skyline-friendly (independent and correlated) and skyline-unfriendly (anti-correlated) data distributions.