
Showing papers on "Skyline" published in 2012


Journal ArticleDOI
TL;DR: The capabilities of Skyline were expanded to process ion intensity chromatograms of peptide analytes from full scan mass spectral data (MS1) acquired during HPLC MS/MS proteomic experiments and the utility of the MS1 filtering approach was examined.

413 citations


Journal ArticleDOI
01 Jun 2012
TL;DR: This paper outlines the objectives and the main principles that any distributed skyline approach has to fulfill, leading to useful guidelines for developing algorithms for distributed skyline processing, and reviews in detail existing approaches that are applicable for highly distributed environments.
Abstract: During the last decades, data management and storage have become increasingly distributed. Advanced query operators, such as skyline queries, are necessary in order to help users to handle the huge amount of available data by identifying a set of interesting data objects. Skyline query processing in highly distributed environments poses inherent challenges and demands and requires non-traditional techniques due to the distribution of content and the lack of global knowledge. This paper surveys this interesting and still evolving research area, so that readers can easily obtain an overview of the state-of-the-art. We outline the objectives and the main principles that any distributed skyline approach has to fulfill, leading to useful guidelines for developing algorithms for distributed skyline processing. We review in detail existing approaches that are applicable for highly distributed environments, clarify the assumptions of each approach, and provide a comparative performance analysis. Moreover, we study the skyline variants each approach supports. Our analysis leads to a taxonomy of existing approaches. Finally, we present interesting research topics on distributed skyline computation that have not yet been explored.

132 citations


Journal ArticleDOI
TL;DR: This paper proposes a novel sliding window skyline model where an uncertain tuple may take the probability to be in the skyline at a certain timestamp t, and proposes an efficient and effective approach, namely the candidate list approach, which maintains lists of candidates that might become skylines in future sliding windows.

59 citations


Journal ArticleDOI
TL;DR: This paper proposes the notion of distributed skyline queries over uncertain data, and two communication- and computation-efficient algorithms are proposed to retrieve the qualified skylines from distributed local sites.
Abstract: The skyline operator has received considerable attention from the database community, due to its importance in many applications including multicriteria decision making, preference answering, and so forth. In many applications, uncertain data inherently exist: data collected from different sources in distributed locations usually carry imprecise measurements and thus exhibit a degree of uncertainty. Taking into account the network delay and economic cost associated with sharing and communicating large amounts of distributed data over the Internet, an important problem in this scenario is to retrieve the global skyline tuples from all the distributed local sites with minimum communication cost. Based on the well-known notion of the probabilistic skyline query over centralized uncertain data, in this paper, we propose the notion of distributed skyline queries over uncertain data. Furthermore, two communication- and computation-efficient algorithms are proposed to retrieve the qualified skylines from distributed local sites. Extensive experiments have been conducted to verify the efficiency, the effectiveness and the progressiveness of our algorithms with both the synthetic and real data sets.

55 citations


Proceedings ArticleDOI
26 Mar 2012
TL;DR: This paper designs and analyzes parallel algorithms for skyline queries using the MP model and a variation of the model in (Afrati and Ullman, EDBT 2010), the GMP model, which demands weaker load balancing constraints, and presents a 1-step algorithm in the GMP model for any number of dimensions.
Abstract: In this paper, we design and analyze parallel algorithms for skyline queries. The skyline of a multidimensional set consists of the points for which no other point exists that is at least as good along every dimension. As a framework for parallel computation, we use both the MP model proposed in (Koutris and Suciu, PODS 2011), which requires that the data is perfectly load-balanced, and a variation of the model in (Afrati and Ullman, EDBT 2010), the GMP model, which demands weaker load balancing constraints. In addition to load balancing, we want to minimize the number of blocking steps, where all processors must wait and synchronize. We propose a 2-step algorithm in the MP model for any dimension of the dataset, as well as a 1-step algorithm for the case of 2 and 3 dimensions. Moreover, we present a 1-step algorithm in the GMP model for any number of dimensions.

54 citations
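
The skyline definition in the abstract above can be made concrete with a minimal in-memory sketch. It is not the MP or GMP algorithm from the paper; it only illustrates the dominance test that any parallel scheme has to distribute. The names `dominates` and `skyline`, the sample data, and the convention that smaller values are better are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: a is no worse in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance to the beach) for five hotels.
hotels = [(50, 8), (60, 2), (80, 1), (70, 5), (90, 9)]
print(skyline(hotels))  # [(50, 8), (60, 2), (80, 1)]
```

The quadratic loop is the baseline that partitioned or parallel algorithms try to beat by splitting the dominance tests across processors.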


Journal ArticleDOI
TL;DR: An energy-efficient approach is proposed to minimize the communication cost among sensor nodes of evaluating range reverse skyline query and optimization mechanisms to improve the performance of multiple reverse skylines are discussed.
Abstract: Reverse skyline query plays an important role in many sensing applications, such as environmental monitoring, habitat monitoring, and battlefield monitoring. Due to the limited power supplies of wireless sensor nodes, the existing centralized approaches, which do not consider energy efficiency, cannot be directly applied to the distributed sensor environment. In this paper, we investigate how to process reverse skyline queries energy efficiently in wireless sensor networks. Initially, we theoretically analyzed the properties of reverse skyline query and proposed a skyband-based approach to tackle the problem of reverse skyline query answering over wireless sensor networks. Then, an energy-efficient approach is proposed to minimize the communication cost among sensor nodes of evaluating range reverse skyline query. Moreover, optimization mechanisms to improve the performance of multiple reverse skylines are also discussed. Extensive experiments on both real-world data and synthetic data have demonstrated the efficiency and effectiveness of our proposed approaches with various experimental settings.

53 citations


Journal ArticleDOI
TL;DR: This work contains the first study of continuous top-k dominating queries over data streams, and two approximate algorithms are proposed (AHBA and AMSA).
Abstract: Top-k dominating queries use an intuitive scoring function which ranks multidimensional points with respect to their dominance power, i.e., the number of points that a point dominates. The k points with the best (e.g., highest) scores are returned to the user. Both top-k and skyline queries have been studied in a streaming environment, where changes to the data set are very frequent. In such an environment, continuous query processing techniques are required toward efficient monitoring of query results, since periodic query re-execution is computationally intensive, and therefore, prohibitive. This work contains the first study of continuous top-k dominating queries over data streams. In comparison to continuous top-k and skyline queries, continuous top-k dominating queries pose additional challenges. Three exact algorithms (BFA, EVA, ADA) are studied, and among them ADA, which is enhanced with additional optimization techniques, shows the best overall performance. In some cases, we are willing to trade accuracy for speed. Toward this direction, two approximate algorithms are proposed (AHBA and AMSA). AHBA offers probabilistic guarantees regarding the accuracy of the result based on the Hoeffding bound, whereas AMSA performs a more aggressive computation resulting in more efficient processing. Evaluation results, based on real-life and synthetic data sets, show the efficiency and scalability of our techniques.

53 citations
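
The scoring function described in the abstract above, where a point's score is the number of points it dominates, can be sketched in a few lines. This is a static, brute-force illustration and not one of the streaming algorithms (BFA, EVA, ADA, AHBA, AMSA) from the paper; the function names, the data, and the smaller-is-better convention are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # Assumption: smaller values are better in every dimension.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def top_k_dominating(points: List[Point], k: int) -> List[Tuple[int, Point]]:
    """Return the k points with the highest dominance score as (score, point)."""
    scored = [(sum(dominates(p, q) for q in points), p) for p in points]
    # The sort is stable, so ties keep their input order.
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return scored[:k]


data = [(1, 1), (2, 3), (3, 2), (4, 4), (5, 1)]
print(top_k_dominating(data, 2))  # [(4, (1, 1)), (1, (2, 3))]
```

Recomputing every score from scratch on each update is exactly the periodic re-execution that the abstract calls prohibitive for streams; the sketch only fixes the semantics of the query.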


Proceedings ArticleDOI
29 Oct 2012
TL;DR: Two anti-monotonic properties with varying degrees of applicability are identified: an order-specific property, which applies to SUM, MIN, and MAX, and a weak candidate-generation property, which applies to MIN and MAX only.
Abstract: We formulate and investigate the novel problem of finding the skyline k-tuple groups from an n-tuple dataset - i.e., groups of k tuples which are not dominated by any other group of equal size, based on aggregate-based group dominance relationship. The major technical challenge is to identify effective anti-monotonic properties for pruning the search space of skyline groups. To this end, we show that the anti-monotonic property in the well-known Apriori algorithm does not hold for skyline group pruning. We then identify order-specific property which applies to SUM, MIN, and MAX and weak candidate-generation property which applies to MIN and MAX only. Experimental results on both real and synthetic datasets verify that the proposed algorithms achieve orders of magnitude performance gain over a baseline method.

51 citations
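
The aggregate-based group dominance described in the abstract above can be illustrated with a brute-force sketch that enumerates all k-tuple groups and compares their SUM vectors. It deliberately omits the paper's pruning properties and is exponential in practice; the function names, the data, and the smaller-is-better convention are assumptions.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # Assumption: smaller aggregate values are better.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def group_sum(group: Sequence[Point]) -> Point:
    """Aggregate a group into one vector by summing each dimension."""
    return tuple(sum(column) for column in zip(*group))


def skyline_groups(points: List[Point], k: int) -> List[Tuple[Point, ...]]:
    """All k-tuple groups whose SUM vector is not dominated by another group's."""
    groups = list(combinations(points, k))
    sums = [group_sum(g) for g in groups]
    return [g for g, s in zip(groups, sums) if not any(dominates(t, s) for t in sums)]


tuples = [(1, 4), (2, 2), (4, 1), (3, 3)]
for group in skyline_groups(tuples, 2):
    print(group, group_sum(group))
```

In this toy data the group `((2, 2), (3, 3))` is a skyline group even though `(3, 3)` on its own is dominated by `(2, 2)`, which is consistent with the abstract's remark that Apriori-style subset pruning does not hold here.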


Journal ArticleDOI
TL;DR: A group skyline algorithm GDynamic is developed which is equivalent to a dynamic algorithm that fills a table of skyline groups that determines the dominance relation between two groups by comparing their aggregate values such as sums or averages of elements of individual dimensions.

42 citations


Journal ArticleDOI
TL;DR: This paper addresses the issue of efficiently processing continuous skyline queries in road networks by proposing two novel and important distance-based skyline queries, namely, the continuous d_ε-skyline query (Cd_ε-SQ) and the continuous k nearest neighbor-skyline query (Cknn-SQ).

41 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigated the skyline of Istanbul and its transformation due to high-rise buildings and developed a geomodel that can be applied to many urban skylines and urban areas in Turkey and other cities of the world.

Proceedings ArticleDOI
24 Jun 2012
TL;DR: This work represents each QoS attribute of a Web service using a possibility distribution and introduces two skyline extensions on uncertain QoS called pos-dominant skyline and nec-dominant skyline, and develops appropriate algorithms to efficiently compute both the pos-dominant skyline and nec-dominant skyline.
Abstract: Quality of service (QoS) has been considered as a significant criterion for selecting among functionally similar Web services. Recent approaches focus on computing the skyline over a set of QoS attributes. This can completely free users from assigning weights to QoS attributes. However, these approaches are not sufficient in a dynamic Web service environment where the delivered QoS by a Web service is inherently uncertain. In this paper, we tackle the problem of skyline on uncertain QoS. We represent each QoS attribute of a Web service using a possibility distribution and introduce two skyline extensions on uncertain QoS called pos-dominant skyline and nec-dominant skyline. We then develop appropriate algorithms to efficiently compute both the pos-dominant skyline and nec-dominant skyline. Finally, we present our experimental results that show both the effectiveness of the introduced skyline extensions and the efficiency of the proposed algorithms.

Journal ArticleDOI
TL;DR: A novel skyline operator, namely stochastic skylines, is proposed for efficiently and effectively retrieving lskyline and gskyline from a set of uncertain objects, respectively, together with efficient and effective filtering techniques.
Abstract: In many applications involving multiple criteria optimal decision making, users may often want to make a personal trade-off among all optimal solutions for selecting one object that fits best their personal needs. As a key feature, the skyline in a multidimensional space provides the minimum set of candidates for such purposes by removing all points not preferred by any (monotonic) utility/scoring functions; that is, the skyline removes all objects not preferred by any user no matter how their preferences vary. Driven by many recent applications with uncertain data, the probabilistic skyline model is proposed to retrieve uncertain objects based on skyline probabilities. Nevertheless, skyline probabilities cannot capture the preferences of monotonic utility functions. Motivated by this, in this article we propose a novel skyline operator, namely stochastic skylines. In the light of the expected utility principle, stochastic skylines guarantee to provide the minimum set of candidates to optimal solutions over a family of utility functions. We first propose the lskyline operator based on the lower orthant orders. lskyline guarantees to provide the minimum set of candidates to the optimal solutions for the family of monotonic multiplicative utility functions. While lskyline works very effectively for the family of multiplicative functions, it may miss optimal solutions for other utility /scoring functions (e.g., linear functions). To resolve this, we also propose a general stochastic skyline operator, gskyline, based on the usual orders. gskyline provides the minimum candidate set to the optimal solutions for all monotonic functions. For the first time regarding the existing literature, we investigate the complexities of determining a stochastic order between two uncertain objects whose probability distributions are described discretely. 
We firstly show that determining the lower orthant order is NP-complete with respect to the dimensionality; consequently the problem of computing lskyline is NP-complete. We also show an interesting result as follows. While the usual order involves more complicated geometric forms than the lower orthant order, the usual order may be determined in polynomial time regarding all the inputs, including the dimensionality; this implies that gskyline can be computed in polynomial time. A general framework is developed for efficiently and effectively retrieving lskyline and gskyline from a set of uncertain objects, respectively, together with efficient and effective filtering techniques. Novel and efficient verification algorithms are developed to efficiently compute lskyline over multidimensional uncertain data, which run in polynomial time if the dimensionality is fixed, and to efficiently compute gskyline in polynomial time regarding all inputs. We also show, by theoretical analysis and experiments, that the sizes of lskyline and gskyline are both quite similar to that of conventional skyline over certain data. Comprehensive experiments demonstrate that our techniques are efficient and scalable regarding both CPU and IO costs.

Proceedings ArticleDOI
13 Oct 2012
TL;DR: This work proposes a novel reconstruction technique that exploits architectural properties of urban environments to create an accurate 3D city model from incomplete data and shows that the reconstruction achieves higher accuracy than a commercial solution.
Abstract: We present a method for automatically creating compact and accurate 3D city models needed for enhanced Augmented Reality applications. The input data are panorama images and LIDAR scans collected at street level and positioned using an IMU and a GPS. Our method corrects for the GPS error and the IMU drift to produce a globally consistent and well registered dataset for the whole city. We use structure from motion and skyline detection to complement the limited range of LIDAR data. Additionally, we propose a novel reconstruction technique that exploits architectural properties of urban environments to create an accurate 3D city model from incomplete data. Our method is able to process an entire city, or several terabytes of data, in a matter of days. We show that our reconstruction achieves higher accuracy than a commercial solution.

Journal ArticleDOI
01 Feb 2012
TL;DR: A novel probabilistic skyline model where an uncertain object may take a probability to be in the skyline, and a p-skyline contains all objects whose skyline probabilities are at least p (0 < p ≤ 1).
Abstract: Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data remains an open problem at large. In this paper, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic skyline model where an uncertain object may take a probability to be in the skyline, and a p-skyline contains all objects whose skyline probabilities are at least p (0 < p ≤ 1).
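
The p-skyline idea described above can be sketched by computing skyline probabilities directly from their definition. The sketch assumes a discrete model in which each uncertain object is a list of equally likely instances and objects are independent; these modeling details, the function names, and the data are assumptions for illustration, and no pruning from the paper is reproduced.

```python
from typing import Dict, List, Sequence, Tuple

Instance = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # Assumption: smaller values are better.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probabilities(objects: Dict[str, List[Instance]]) -> Dict[str, float]:
    """Probability that each object is in the skyline, assuming equally
    likely instances and independent objects."""
    probs = {}
    for name, instances in objects.items():
        total = 0.0
        for u in instances:
            p_u = 1.0  # probability that no other object dominates instance u
            for other, other_instances in objects.items():
                if other == name:
                    continue
                dominating = sum(dominates(v, u) for v in other_instances)
                p_u *= 1.0 - dominating / len(other_instances)
            total += p_u
        probs[name] = total / len(instances)
    return probs


def p_skyline(objects: Dict[str, List[Instance]], p: float) -> List[str]:
    """Objects whose skyline probability is at least p."""
    return [n for n, pr in skyline_probabilities(objects).items() if pr >= p]


objs = {"A": [(1, 1), (4, 4)], "B": [(2, 2), (3, 3)], "C": [(5, 5)]}
print(skyline_probabilities(objs))  # {'A': 0.5, 'B': 0.5, 'C': 0.0}
print(p_skyline(objs, 0.5))         # ['A', 'B']
```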

Proceedings ArticleDOI
21 May 2012
TL;DR: A new MapReduce Skyline method for scalable parallel skyline query processing with new angular partitioning of the data space reduces the processing time in selecting optimal skyline services and defines a new performance metric to assess the local optimality of selected skyline services.
Abstract: Fast skyline selection of high-quality web services is of critical importance to upgrade e-commerce and various cloud applications. In this paper, we present a new MapReduce Skyline method for scalable parallel skyline query processing. Our new angular partitioning of the data space reduces the processing time in selecting optimal skyline services. Our method shortens the Reduce time significantly due to the elimination of more redundant dominance computations. Through Hadoop experiments on large server clusters, our method scales well with the increase of both attribute dimensionality and data-space cardinality. We define a new performance metric to assess the local optimality of selected skyline services. By experimenting over 10,000 real-life web service applications over 10 performance attribute dimensions, we find that the angular-partitioned MapReduce method is 1.7 and 2.3 times faster than the dimensional and grid partitioning methods, respectively, with a higher probability to reach the local optimality. These results are very encouraging for selecting optimal web services in real time out of a large number of web services.
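
The partition, local skyline, merge pattern behind the method above can be sketched in plain Python. This is a single-process illustration for two-dimensional, non-negative data, not the paper's Hadoop implementation; the sector formula, the function names, and the data are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # Assumption: smaller values are better.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    return [p for p in points if not any(dominates(q, p) for q in points)]


def angular_partition(point: Point, num_partitions: int) -> int:
    """Map step: assign a point to one of num_partitions equal angular sectors
    of the first quadrant (assumes non-negative coordinates)."""
    angle = math.atan2(point[1], point[0])  # in [0, pi/2]
    index = int(angle / (math.pi / 2) * num_partitions)
    return min(index, num_partitions - 1)


def partitioned_skyline(points: List[Point], num_partitions: int = 3) -> List[Point]:
    buckets: Dict[int, List[Point]] = defaultdict(list)
    for p in points:
        buckets[angular_partition(p, num_partitions)].append(p)
    # Reduce step: one local skyline per sector.
    local = [s for bucket in buckets.values() for s in skyline(bucket)]
    # Merge step: the global skyline is the skyline of the local skylines.
    return skyline(local)


services = [(1, 9), (2, 7), (4, 4), (7, 2), (9, 1), (5, 5), (8, 8), (3, 8)]
print(sorted(partitioned_skyline(services)))
```

Each angular sector tends to contain points of similar trade-off, so a sector's local skyline is more likely to survive the merge than a grid cell's; the sketch only shows the mechanics and makes no performance claim.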

Journal ArticleDOI
TL;DR: Worst-case I/O bounds similar to those for the skyline problem can be achieved for computing several skyline variants, including the k-dominant skyline, k-skyband, and α-skyline, and the performance can be improved if some dimensions of the data space have small domains.
Abstract: We consider the skyline problem (aka the maxima problem), which has been extensively studied in the database community. The input is a set P of d-dimensional points. A point dominates another if the coordinate of the former is at most that of the latter on every dimension. The goal is to find the skyline, which is the set of points p ∈ P such that p is not dominated by any other point in P. The main result of this article is that, for any fixed dimensionality d ≥ 3, in external memory the skyline problem can be settled by performing $O((N/B)\log_{M/B}^{d-2}(N/B))$ I/Os in the worst case, where N is the cardinality of P, B the size of a disk block, and M the capacity of main memory. Similar bounds can also be achieved for computing several skyline variants, including the k-dominant skyline, k-skyband, and α-skyline. Furthermore, the performance can be improved if some dimensions of the data space have small domains. When the dimensionality d is not fixed, the challenge is to outperform the naive algorithm that simply checks all pairs of points in P × P. We give an algorithm that terminates in $O((N/B)\log^{d-2} N)$ I/Os, thus beating the naive solution for any $d = O(\log N / \log\log N)$.
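
One of the variants named above, the k-skyband, can be sketched with the naive all-pairs check that the abstract uses as its baseline. The sketch runs in memory and says nothing about the I/O-efficient algorithms; it uses the common definition that the k-skyband contains the points dominated by fewer than k other points, adds a strictness condition to the dominance test so that a point does not dominate itself, and uses assumed names and data.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: at most b's coordinate in every dimension,
    and strictly smaller in at least one (assumed tie-breaking)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def k_skyband(points: List[Point], k: int) -> List[Point]:
    """Points dominated by fewer than k other points; k = 1 gives the skyline."""
    return [p for p in points if sum(dominates(q, p) for q in points) < k]


pts = [(1, 1), (2, 2), (3, 3), (1, 4)]
print(k_skyband(pts, 1))  # [(1, 1)]
print(k_skyband(pts, 2))  # [(1, 1), (2, 2), (1, 4)]
```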

Proceedings ArticleDOI
17 Sep 2012
TL;DR: An in-depth coverage of skyline computation models, algorithms and optimization techniques for improving both efficiency and quality of multi-criteria decision making and a novel dominance test technique, called GNL (GPU-based Nested Loop), which can drastically reduce the cost of dominance tests by leveraging GPUs, and outperform CPU-based dominance tests.
Abstract: Multi-criteria decision making is one of the most critical and yet most challenging components in modern enterprise business intelligence. It is well known that complex business decisions are often made based on multi-dimensional criteria. The competitiveness of optimal business decision making typically resorts to finding a good trade-off among many different and possibly contradicting criteria, e.g., maximum profit, minimum price, minimum resource consumption. A skyline query operator is by design to find the set of interesting data points (objects) over a large dimensional data collection, satisfying a set of possibly contradicting conditions. In this paper, we provide an in-depth coverage of skyline computation models, algorithms and optimization techniques for improving both efficiency and quality of multi-criteria decision making. By reviewing and revising the state of art research in multi-criteria decision making using skyline operations, we describe the essential concepts, the alternative models, and the suite of techniques for providing scalable and elastic skyline computation in massively distributed computing environments. The paper consists of four parts. First, we provide an overview of skyline query operators in terms of concepts, basic processing algorithms and representative application scenarios. Second, we review the state of art literature in skyline query processing research and development, outline the most representative classes of skyline query processing and optimization techniques and discuss the pros and cons of existing approaches. Third, we provide a comprehensive analysis on the inherent limitations of some existing skyline models and algorithms and discuss why scaling skyline query processing over large high dimensional datasets continues to pose daunting challenges. 
Finally, we present optimization techniques for designing parallel skyline query processing algorithms and how to utilize GPUs to support and scale parallel skyline computations over high dimensional large datasets. We also introduce a novel dominance test technique, called GNL (GPU-based Nested Loop), which can drastically reduce the cost of dominance tests by leveraging GPUs, and outperform CPU-based dominance tests.

Proceedings ArticleDOI
29 Oct 2012
TL;DR: This work develops novel algorithms for efficiently processing two important classes of queries involving user preferences, i.e. potential customers identification and product positioning, and develops a batched extension of the RSA algorithm that significantly improves upon processing multiple queries individually.
Abstract: The rapid growth of social web has contributed vast amounts of user preference data. Analyzing this data and its relationships with products could have several practical applications, such as personalized advertising, market segmentation, product feature promotion etc. In this work we develop novel algorithms for efficiently processing two important classes of queries involving user preferences, i.e. potential customers identification and product positioning. With regards to the first problem, we formulate product attractiveness based on the notion of reverse skyline queries. We then present a new algorithm, termed as RSA, that significantly reduces the I/O cost, as well as the computation cost, when compared to the state-of-the-art reverse skyline algorithm, while at the same time being able to quickly report the first results. Several real-world applications require processing of a large number of queries, in order to identify the product characteristics that maximize the number of potential customers. Motivated by this problem, we also develop a batched extension of our RSA algorithm that significantly improves upon processing multiple queries individually, by grouping contiguous candidates, exploiting I/O commonalities and enabling shared processing. Our experimental study using both real and synthetic data sets demonstrates the superiority of our proposed algorithms for the studied classes of queries.
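
The reverse skyline notion that the abstract above builds on can be sketched by brute force. The sketch uses the monochromatic definition (a point p belongs to the reverse skyline of a query q if no other data point is at least as close to p as q in every dimension and strictly closer in one); the paper's customer/product setting and its RSA algorithm are not reproduced, and the names and data are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dyn_dominates(a: Sequence[float], b: Sequence[float], origin: Sequence[float]) -> bool:
    """a dynamically dominates b with respect to origin: a is at least as
    close to origin in every dimension and strictly closer in one."""
    da = [abs(x - o) for x, o in zip(a, origin)]
    db = [abs(x - o) for x, o in zip(b, origin)]
    return all(x <= y for x, y in zip(da, db)) and any(x < y for x, y in zip(da, db))


def reverse_skyline(points: List[Point], q: Point) -> List[Point]:
    """Points p for which q would be in the dynamic skyline centred at p."""
    result = []
    for i, p in enumerate(points):
        others = (o for j, o in enumerate(points) if j != i)
        if not any(dyn_dominates(o, q, p) for o in others):
            result.append(p)
    return result


preferences = [(1, 1), (5, 5), (9, 9)]
print(reverse_skyline(preferences, (4, 4)))  # [(1, 1), (5, 5)]
```

Read with the product-positioning example in mind, `q` plays the role of a candidate product and the result lists the preference points that would find it attractive.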

Journal ArticleDOI
TL;DR: The Compressed SkyCube (CSC) is proposed that is much more compact, yet can still return the skyline of any subspace without consulting the base table, and has the advantage of no-precomputation in that it has efficient space cost and update cost.
Abstract: The skyline query can help identify the “best” objects in a multi-attribute dataset. During the past decade, this query has received considerable attention in the database research community. Most research focused on computing the “skyline” of a dataset, or the set of “skyline objects” that are not dominated by any other object. Such algorithms are not appropriate in an online system, which should respond in real time to skyline query requests with arbitrary subsets of the attributes (also called subspaces). To guarantee real-time response, an online system should precompute the skylines for all subspaces, and look up a skyline upon query. Unfortunately, because the number of subspaces is exponential in the number of attributes, such precomputation has very expensive storage cost and update cost. We propose the Compressed SkyCube (CSC) that is much more compact, yet can still return the skyline of any subspace without consulting the base table. The CSC therefore combines the advantage of precomputation in that it can respond to queries in real time, and the advantage of no-precomputation in that it has efficient space cost and update cost. This article presents the CSC data structures, the CSC query algorithm, the CSC update algorithm, and the CSC initial computation scheme. A solution to extend to high-dimensional data is also proposed.
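
The full precomputation that the abstract above argues against can be sketched to show where the cost comes from: a skycube stores one skyline per non-empty subset of the d attributes, which is 2^d - 1 subspaces. This is the uncompressed baseline, not the CSC structure; the function names, the data, and the smaller-is-better convention are assumptions.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # Assumption: smaller values are better.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def project(p: Point, dims: Tuple[int, ...]) -> Point:
    return tuple(p[d] for d in dims)


def skycube(points: List[Point]) -> Dict[Tuple[int, ...], List[Point]]:
    """Skyline of every non-empty subspace, keyed by the tuple of dimensions."""
    d = len(points[0])
    cube = {}
    for size in range(1, d + 1):
        for dims in combinations(range(d), size):
            cube[dims] = [
                p for p in points
                if not any(dominates(project(q, dims), project(p, dims)) for q in points)
            ]
    return cube


objects = [(1, 5, 3), (2, 2, 2), (3, 1, 4)]
cube = skycube(objects)
print(len(cube))   # 7
print(cube[(0,)])  # [(1, 5, 3)]
print(cube[(2,)])  # [(2, 2, 2)]
```

With three attributes the cube already holds seven skylines, and every insert or delete can touch all of them, which is the storage and update cost the abstract refers to.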

Journal ArticleDOI
TL;DR: An interactive preference elicitation framework is developed - while user preferences are collected at each iteration, the framework iteratively updates skylines, and it is demonstrated that a few questions are enough to acquire a skyline with a manageable size.

Journal ArticleDOI
TL;DR: A novel approach, called DiTo, for efficient top-k processing over multiple servers, where each server stores autonomously a fraction of the data, and extends DiTo to support data summarizations of bounded size, thus restricting the cost of summary distribution and maintenance.
Abstract: Recently, a trend has been observed towards supporting rank-aware query operators, such as top-k, that enable users to retrieve only a limited set of the most interesting data objects. As data nowadays is commonly stored distributed over multiple servers, a challenging problem is to support rank-aware queries in distributed environments. In this paper, we propose a novel approach, called DiTo, for efficient top-k processing over multiple servers, where each server stores autonomously a fraction of the data. Towards this goal, we exploit the inherent relationship of top-k and skyline objects, and we employ the skyline objects of servers as a data summarization mechanism for efficiently identifying the servers that store top-k results. Relying on a thresholding scheme, DiTo retrieves the top-k result set progressively, while the number of queried servers and transferred data is minimized. Furthermore, we extend DiTo to support data summarizations of bounded size, thus restricting the cost of summary distribution and maintenance. To this end, we study the challenging problem of finding an abstraction of the skyline set of fixed size that influences the performance of DiTo only slightly. Our experimental evaluation shows that DiTo performs efficiently and provides a viable solution when a high degree of distribution is required.
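
The relationship between top-k and skyline objects mentioned above can be illustrated with a small coordinator sketch. It is not the DiTo protocol: it only relies on the fact that, for a scoring function with non-negative weights where lower is better, the best score among a server's skyline points bounds the score of every point on that server, so servers whose bound cannot beat the current k-th score are never contacted. All names, the server layout, and the weights are hypothetical, and every server is assumed to hold at least one point.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    return [p for p in points if not any(dominates(q, p) for q in points)]


def score(p: Point, weights: Sequence[float]) -> float:
    # Monotone linear scoring; assumption: weights >= 0 and lower is better.
    return sum(w * x for w, x in zip(weights, p))


def distributed_top_k(servers: Dict[str, List[Point]], weights: Sequence[float], k: int):
    # Each server is summarized by the best score among its skyline points.
    bounds = sorted(
        (min(score(p, weights) for p in skyline(data)), name)
        for name, data in servers.items()
    )
    best: List[Tuple[float, Point]] = []
    queried: List[str] = []
    for bound, name in bounds:
        if len(best) == k and bound >= best[-1][0]:
            break  # no remaining server can improve the current top-k
        queried.append(name)
        candidates = best + [(score(p, weights), p) for p in servers[name]]
        best = heapq.nsmallest(k, candidates)
    return [p for _, p in best], queried


cluster = {
    "A": [(1, 1), (2, 3)],
    "B": [(5, 5), (6, 4)],
    "C": [(2, 2), (9, 1)],
}
print(distributed_top_k(cluster, (1, 1), 2))  # ([(1, 1), (2, 2)], ['A', 'C'])
```

In this toy run server `B` is never queried, which is the kind of saving in contacted servers and transferred data that the abstract describes.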

Journal ArticleDOI
TL;DR: A probabilistic skyline algorithm called PSkyline is developed which computes exact skyline probabilities of all objects in a given uncertain data set, together with an online probabilistic skyline algorithm called O-PSkyline for uncertain data streams and a top-k probabilistic skyline algorithm called K-PSkyline to find the top-k objects with the highest skyline probabilities.
Abstract: With the rapid increase in the amount of uncertain data available, probabilistic skyline computation on uncertain databases has become an important research topic. Previous work on probabilistic skyline computation, however, only identifies those objects whose skyline probabilities are higher than a given threshold, or is useful only for 2D data sets. In this paper, we develop a probabilistic skyline algorithm called PSkyline which computes exact skyline probabilities of all objects in a given uncertain data set. PSkyline aims to identify blocks of instances with skyline probability zero, and more importantly, to find incomparable groups of instances and dispense with unnecessary dominance tests altogether. To increase the chance of finding such blocks and groups of instances, PSkyline uses a new in-memory tree structure called Z-tree. We also develop an online probabilistic skyline algorithm called O-PSkyline for uncertain data streams and a top-k probabilistic skyline algorithm called K-PSkyline to find top-k objects with the highest skyline probabilities. Experimental results show that all the proposed algorithms scale well to large and high-dimensional uncertain databases.

01 Jan 2012
TL;DR: The proposed uncertain QoS-aware Skyline service selection approach based on cloud model first uses cloud model to compute the uncertainty of QoS and then adopts Skyline computing to extract Skyline services from Web services to prune redundant services.
Abstract: Because traditional QoS-aware Web service selection approach cannot ensure the reliability and the real-time of service selection, this paper proposes an uncertain QoS-aware Skyline service selection approach based on cloud model. The approach first uses cloud model to compute the uncertainty of QoS and then adopts Skyline computing to extract Skyline services from Web services to prune redundant services. Finally, mixed integer programming is employed to perform service selection from Skyline services. The study evaluates the approach experimentally using both real and synthetically generated datasets. The experimental results show that the

Proceedings ArticleDOI
24 Jun 2012
TL;DR: This paper introduces a novel concept called collective skyline to deal with the problem of multiple users' preferences and conducts a set of experiments that demonstrate the effectiveness of the introduced concept.
Abstract: In this paper, we introduce a novel concept called collective skyline to deal with the problem of multiple users' preferences. We then conduct a set of experiments that demonstrate the effectiveness of the introduced concept.

Proceedings ArticleDOI
06 Nov 2012
TL;DR: Efficient algorithms are given for computing range-skylines and a related hardness result is established.
Abstract: Let S be a set of n points in R^d where each point has t ≥ 1 real-valued attributes called features. A range-skyline query on S takes as input a query box q in R^d and returns the skyline of the points of q ∩ S, computed w.r.t. their features (not their coordinates in R^d). Efficient algorithms are given for computing range-skylines and a related hardness result is established.
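
The query semantics defined above can be sketched as a filter followed by a skyline over the feature vectors. This is a brute-force illustration of the definition, not one of the paper's algorithms; the record layout `(coordinates, features)`, the function names, the data, and the smaller-is-better convention on features are assumptions.

```python
from typing import List, Sequence, Tuple

Record = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (coordinates, features)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # Assumption: smaller feature values are better.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def in_box(coords: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> bool:
    return all(l <= c <= h for c, l, h in zip(coords, lo, hi))


def range_skyline(records: List[Record], lo: Sequence[float], hi: Sequence[float]) -> List[Record]:
    """Skyline, by features, of the records whose coordinates fall in [lo, hi]."""
    inside = [r for r in records if in_box(r[0], lo, hi)]
    return [r for r in inside if not any(dominates(s[1], r[1]) for s in inside)]


records = [
    ((1, 1), (3, 4)),
    ((2, 2), (1, 5)),
    ((3, 3), (4, 4)),
    ((8, 8), (0, 0)),  # best features, but outside the query box below
]
print(range_skyline(records, (0, 0), (5, 5)))
# [((1, 1), (3, 4)), ((2, 2), (1, 5))]
```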

Patent
11 Dec 2012
TL;DR: In this paper, a method and system for searching for points of interest along a route is disclosed, including a relation that includes records that associate link identifiers, point of interest identifiers, and distances between the links and the points of interest.
Abstract: A method and system for searching for points of interest along a route is disclosed. A relation that includes records that associate link identifiers, point of interest identifiers, and distances between the links and the points of interest is generated during the compilation process of a first version of a geographic database. The relation is stored in compiled database products. When a compiled database product is being used by a navigation system, for example, navigation application software programs use the relation to accurately and efficiently find points of interest along a computed route. Navigation systems can also use the relation to service skyline queries and responsively generate skyline graphs of points of interest.

Book ChapterDOI
15 Apr 2012
TL;DR: The experimental results show that OPRS is an effective way to solve the problem of continuous probabilistic reverse skyline queries, and that it can significantly reduce the execution time of continuous probabilistic reverse skyline queries and meet the requirements of practical applications.
Abstract: Reverse skyline plays an important role in market decision-making, environmental monitoring and market analysis. As the streaming nature and uncertainty of data become more and more apparent, probabilistic reverse skyline queries over uncertain data streams have become a new research topic. First, a novel pruning technique is proposed to reduce the number of uncertain tuples retained for processing continuous probabilistic reverse skyline queries. Then several probability pruning techniques are proposed to eliminate redundant calculations. Next, an efficient algorithm, called Optimization Probabilistic Reverse Skyline (OPRS), is proposed to process continuous probabilistic reverse skyline queries. Finally, the performance of OPRS is verified through a large number of simulation experiments. The experimental results show that OPRS is an effective way to solve the problem of continuous probabilistic reverse skyline queries, and that it can significantly reduce the execution time of continuous probabilistic reverse skyline queries and meet the requirements of practical applications.

Journal ArticleDOI
TL;DR: The RBSSQ method uses a replacement-based approach, is applicable to databases having any number of missing dimensions in the database objects, and can efficiently compute skyline sets from data items with missing values.
Abstract: With the increase of data volume, advanced query operators, such as skyline queries, are necessary in order to help users to handle the huge amount of available data by identifying a set of interesting data objects. Skyline queries help us to filter unnecessary information efficiently and provide us clues for various decision making tasks. Most of the existing skyline algorithms cannot preserve individual’s privacy and are not well suited for data with outliers and frequently updated data. Considering these issues, we earlier proposed skyline sets queries from databases where all dimensions are available for all data items and considered an efficient algorithm for computing convex skyline sets. In this paper, we use that idea for skyline sets queries for incomplete data and propose a method, namely, RBSSQ. The RBSSQ method uses a replacement-based approach and is applicable to databases having any number of missing dimensions in the database objects. We have conducted several experiments in terms of computational cost and found that our proposed method can efficiently compute skyline sets from data items with missing values.

Journal ArticleDOI
TL;DR: An innovative purely sampling-based (PS) method for skyline cardinality estimation that does not assume any particular data distribution and is, thus, more robust than LS and much faster to yield the estimates than KB.
Abstract: A skyline query returns a set of candidate records that satisfy several preferences. It is an operation commonly performed to aid decision making. Since executing a skyline query is expensive and a query plan may combine skyline queries with other data operations such as join, it is important that the query optimizer can quickly yield an accurate cardinality estimate for a skyline query. Log Sampling (LS) and Kernel-Based (KB) skyline cardinality estimation are the two state-of-the-art skyline cardinality estimation methods. LS is based on a hypothetical model A(log(n))^B. Since this model is originally derived under strong assumptions like data independence between dimensions, it does not apply well to an arbitrary data set. Consequently, LS can yield large estimation errors. KB relies on the integration of the estimated probability density function (PDF) to derive the scale factor Ψds. As the estimation of PDF and the ensuing integration both involve complex mathematical calculations, KB is time consuming. In view of these problems, we propose an innovative purely sampling-based (PS) method for skyline cardinality estimation. PS is non-parametric. It does not assume any particular data distribution and is, thus, more robust than LS. PS does not require complex mathematical calculations. Therefore, it is much simpler to implement and much faster to yield the estimates than KB. Extensive empirical studies show that for a variety of real and synthetic data sets, PS outperforms LS in terms of estimation speed, estimation accuracy, and estimation variability under the same space budget. PS outperforms KB in terms of estimation speed and estimation variability under the same performance mark.
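For intuition about the parametric model behind LS, the Python sketch below fits A and B in A(log(n))^B from skyline sizes measured on two samples, then extrapolates to a larger n. This illustrates the model only and is not the PS method proposed in the paper. The two-point fit, the natural logarithm, the smaller-is-better dominance convention, and the function names are all assumptions for this example.

```python
import math
from typing import List, Sequence, Tuple


def skyline_size(rows: List[Sequence[float]]) -> int:
    """Brute-force skyline cardinality (assumption: smaller values are better)."""
    def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    return sum(1 for p in rows if not any(dominates(q, p) for q in rows))


def fit_log_model(n1: int, s1: float, n2: int, s2: float) -> Tuple[float, float]:
    """Solve s = A * (ln n)^B for A and B from two (sample size, skyline size) pairs."""
    b = math.log(s2 / s1) / math.log(math.log(n2) / math.log(n1))
    a = s1 / math.log(n1) ** b
    return a, b


def estimate_cardinality(a: float, b: float, n: int) -> float:
    """Extrapolate the fitted model to a data set of n records."""
    return a * math.log(n) ** b
```

In use, one would draw two random samples of different sizes, measure each with `skyline_size`, pass both pairs to `fit_log_model`, and call `estimate_cardinality` with the full table size. When the dimensions are correlated or anti-correlated the fitted curve can extrapolate poorly, which is the weakness of LS that the abstract points out.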