Showing papers on "Skyline" published in 2010


Journal ArticleDOI
TL;DR: The Skyline user interface simplifies the development of mass spectrometer methods and the analysis of data from targeted proteomics experiments performed using selected reaction monitoring (SRM).
Abstract: Summary: Skyline is a Windows client application for targeted proteomics method creation and quantitative data analysis. It is open source and freely available for academic and commercial use. The Skyline user interface simplifies the development of mass spectrometer methods and the analysis of data from targeted proteomics experiments performed using selected reaction monitoring (SRM). Skyline supports using and creating MS/MS spectral libraries from a wide variety of sources to choose SRM filters and verify results based on previously observed ion trap data. Skyline exports transition lists to and imports the native output files from Agilent, Applied Biosystems, Thermo Fisher Scientific and Waters triple quadrupole instruments, seamlessly connecting mass spectrometer output back to the experimental design document. The fast and compact Skyline file format is easily shared, even for experiments requiring many sample injections. A rich array of graphs displays results and provides powerful tools for inspecting data integrity as data are acquired, helping instrument operators to identify problems early. The Skyline dynamic report designer exports tabular data from the Skyline document model for in-depth analysis with common statistical tools. Availability: Single-click, self-updating web installation is available at http://proteome.gs.washington.edu/software/skyline. This web site also provides access to instructional videos, a support board, an issues list and a link to the source code project.

3,794 citations


Proceedings ArticleDOI
01 Mar 2010
TL;DR: This work employs graph embedding techniques to enable a best-first graph exploration that considers route preferences based on arbitrary road attributes; the experimental evaluation shows that the approach reduces the search space significantly and computes the skyline efficiently.
Abstract: In recent years, the research community introduced various methods for processing skyline queries in multidimensional databases. The skyline operator retrieves all objects being optimal w.r.t. an arbitrary linear weighting of the underlying criteria. The most prominent example query is to find a reasonable set of hotels which are cheap but close to the beach. In this paper, we propose a new approach for computing skylines on routes (paths) in a road network considering multiple preferences like distance, driving time, the number of traffic lights, gas consumption, etc. Since the consideration of different preferences usually involves different routes, a skyline-fashioned answer with relevant route candidates is highly useful. In our work, we employ graph embedding techniques to enable a best-first graph exploration considering route preferences based on arbitrary road attributes. The core of our skyline query processor is a route iterator which iteratively computes the top routes according to (at least one) preference efficiently, so that route computations need not be restarted from scratch in each iteration. Furthermore, we propose pruning techniques in order to reduce the search space. Our pruning strategies aim at pruning as many route candidates as possible during the graph exploration. Therefore, we are able to prune candidates which are only partially explored. Finally, our experimental evaluation shows that our approach reduces the search space significantly and that the skyline can be computed efficiently.

145 citations
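
To make the idea of a route skyline concrete, the following minimal sketch enumerates the Pareto-optimal routes between two nodes with a generic multi-criteria label-setting search. It is not the graph-embedding and pruning method of the paper above; the function names, the `graph` layout and the sample costs are assumptions made for illustration, and all edge costs are assumed to be non-negative with lower values preferred.

```python
import heapq

def dominates(a, b):
    """True if cost vector a is no worse than b everywhere and differs from it (lower is better)."""
    return a != b and all(x <= y for x, y in zip(a, b))

def route_skyline(graph, source, target, dims):
    """Return (cost_vector, path) pairs, one per Pareto-optimal cost vector.

    graph maps a node to a list of (neighbor, cost_tuple) edges with
    non-negative costs; dims is the number of cost criteria.
    """
    heap = [((0,) * dims, source, (source,))]
    settled = {}   # node -> cost vectors already accepted at that node
    results = []
    while heap:
        # Labels leave the heap in lexicographic cost order, so any label
        # that could dominate the current one has already been processed.
        cost, node, path = heapq.heappop(heap)
        accepted = settled.setdefault(node, [])
        if any(c == cost or dominates(c, cost) for c in accepted):
            continue   # equal or dominated label: cannot yield a new skyline route
        accepted.append(cost)
        if node == target:
            results.append((cost, list(path)))
            continue
        for neighbor, edge_cost in graph.get(node, []):
            new_cost = tuple(c + e for c, e in zip(cost, edge_cost))
            heapq.heappush(heap, (new_cost, neighbor, path + (neighbor,)))
    return results

# Hypothetical road network; each edge carries (distance, driving time).
graph = {
    "A": [("B", (1, 5)), ("C", (4, 1)), ("D", (3, 7))],
    "B": [("D", (1, 5)), ("C", (1, 1))],
    "C": [("D", (4, 1))],
}
routes = route_skyline(graph, "A", "D", dims=2)
```

When several routes share exactly the same cost vector, this sketch keeps only the first one it finds.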


Journal ArticleDOI
TL;DR: This work presents a novel concept, called p-dominant service skyline, which provides an integrated solution to tackle the above two issues simultaneously, and presents a p-R-tree indexing structure and a dual-pruning scheme to efficiently compute it.
Abstract: The performance of a service provider may fluctuate due to the dynamic service environment. Thus, the quality of service actually delivered by a service provider is inherently uncertain. Existing service optimization approaches usually assume that the quality of service does not change over time. Moreover, most of these approaches rely on computing a predefined objective function. When multiple quality criteria are considered, users are required to express their preference over different (and sometimes conflicting) quality attributes as numeric weights. This is rather a demanding task and an imprecise specification of the weights could miss user-desired services. We present a novel concept, called p-dominant service skyline. A provider S belongs to the p-dominant skyline if the chance that S is dominated by any other provider is less than p. Computing the p-dominant skyline provides an integrated solution to tackle the above two issues simultaneously. We present a p-R-tree indexing structure and a dual-pruning scheme to efficiently compute the p-dominant skyline. We assess the efficiency of the proposed algorithm with an analytical study and extensive experiments.

140 citations
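
The membership rule for the p-dominant skyline can be sketched as follows. This is only one possible reading of the abstract, under loudly stated assumptions: each provider is described by a small set of equally likely, independent QoS samples, lower values are better, and "dominated by any other provider" is checked pairwise. The p-R-tree index and the dual-pruning scheme are not modelled, and all names and numbers are hypothetical.

```python
def dominates(a, b):
    """True if QoS sample a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def prob_dominates(r, s):
    """Probability that a random sample of provider r dominates a random sample of s.

    Assumes the samples of each provider are equally likely and independent.
    """
    wins = sum(dominates(a, b) for a in r for b in s)
    return wins / (len(r) * len(s))

def p_dominant_skyline(providers, p):
    """Names of providers that no other provider dominates with probability >= p."""
    return [
        name
        for name, samples in providers.items()
        if all(
            prob_dominates(other, samples) < p
            for other_name, other in providers.items()
            if other_name != name
        )
    ]

# Hypothetical (latency, price) observations per provider; lower is better.
providers = {
    "A": [(10, 2), (12, 3)],
    "B": [(11, 4), (9, 5)],
    "D": [(13, 6), (14, 1)],
}
robust = p_dominant_skyline(providers, p=0.3)
```

Raising the threshold `p` admits more providers, while lowering it keeps only those that are rarely dominated.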


Journal ArticleDOI
01 Jun 2010
TL;DR: Comprehensive experiments show that the approaches outperform state-of-the-art algorithms specialized for particular skyline problems, especially when the result contains a large number of skyline points.
Abstract: Given a set of data points in a multidimensional space, a skyline query retrieves those data points that are not dominated by any other point in the same dataset. Observing that the properties of Z-order space filling curves (or Z-order curves) perfectly match with the dominance relationships among data points in a geometrical data space, we develop and present in this paper a novel and efficient processing framework to evaluate skyline queries and their variants, and to support skyline result updates based on Z-order curves. This framework consists of ZBtree, i.e., an index structure to organize a source dataset and skyline candidates, and a suite of algorithms, namely, (1) ZSearch, which processes skyline queries, (2) ZInsert, ZDelete and ZUpdate, which incrementally maintain skyline results in the presence of source dataset updates, (3) ZBand, which answers skyband queries, (4) ZRank, which returns top-ranked skyline points, (5) k-ZSearch, which evaluates k-dominant skyline queries, and (6) ZSubspace, which supports skyline queries on a subset of dimensions. Although derived from coherent ideas and concepts, our approaches are shown via comprehensive experiments to outperform the state-of-the-art algorithms that are specialized to address particular skyline problems, especially when the result contains a large number of skyline points.

90 citations
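
The match between Z-order curves and dominance mentioned in the abstract above can be illustrated with a short sketch: if a point dominates another, its Z-address (bit-interleaved Morton code) is smaller, so a scan in Z-order only ever needs to compare a point against skyline points found earlier. This is a simplified sort-and-scan illustration, not the ZBtree index or the ZSearch algorithm; it assumes non-negative integer coordinates that fit in `bits` bits, with lower values preferred, and all names are hypothetical.

```python
def z_address(point, bits=8):
    """Interleave the bits of all coordinates, most significant bit first."""
    z = 0
    for bit in range(bits - 1, -1, -1):
        for coord in point:
            z = (z << 1) | ((coord >> bit) & 1)
    return z

def dominates(p, q):
    """True if p is no worse than q in every dimension and better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline_by_z_order(points, bits=8):
    """Scan points in Z-order; dominators always precede the points they dominate."""
    skyline = []
    for p in sorted(points, key=lambda pt: z_address(pt, bits)):
        if not any(dominates(s, p) for s in skyline):
            skyline.append(p)
    return skyline

# Hypothetical 2-D dataset.
points = [(1, 9), (3, 3), (2, 8), (5, 5), (9, 1), (4, 4), (8, 2)]
result = skyline_by_z_order(points)
```

Because the order is monotone with respect to dominance, a point accepted into `skyline` is never removed later, which is what makes a single pass sufficient.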


Patent
22 Nov 2010
TL;DR: A solar access measurement device (SAMD) located at a predetermined position is disclosed. The SAMD may include a skyline detector enabled to detect a skyline of a horizon relative to the SAMD, an orientation determination unit enabled to determine the orientation of the skyline detector, and a processor in signal communication with the skyline detector and orientation determination unit.
Abstract: A Solar Access Measurement Device (“SAMD”) located at a predetermined position is disclosed. The SAMD may include a skyline detector enabled to detect a skyline of a horizon relative to the SAMD, an orientation determination unit enabled to determine the orientation of the skyline detector, and a processor in signal communication with the skyline detector and orientation determination unit.

82 citations


Proceedings ArticleDOI
03 Dec 2010
TL;DR: The goal is to estimate global position by matching skylines extracted from omni-directional images to skyline segments from coarse 3D city models by proposing a sky-segmentation algorithm using graph cuts for estimating the geo-location.
Abstract: This paper investigates the problem of geo-localization in GPS challenged urban canyons using only skylines. Our proposed solution takes a sequence of upward facing omnidirectional images and coarse 3D models of cities to compute the geo-trajectory. The camera is oriented upwards to capture images of the immediate skyline, which is generally unique and serves as a fingerprint for a specific location in a city. Our goal is to estimate global position by matching skylines extracted from omni-directional images to skyline segments from coarse 3D city models. Under day-time and clear sky conditions, we propose a sky-segmentation algorithm using graph cuts for estimating the geo-location. In cases where the skyline gets affected by partial fog, night-time and occlusions from trees, we propose a shortest path algorithm that computes the location without prior sky detection. We show compelling experimental results for hundreds of images taken in New York, Boston and Tokyo under various weather and lighting conditions (daytime, foggy dawn and night-time).

80 citations


Proceedings ArticleDOI
22 Mar 2010
TL;DR: This work proposes that data incomparability should be treated as another key factor in optimizing skyline computation, and identifies common modules shared by existing non-index skyline algorithms to develop a cost model to guide a balanced pivot point selection.
Abstract: Skyline queries have gained a lot of attention for multi-criteria analysis in large-scale datasets. While existing skyline algorithms have focused mostly on exploiting data dominance to achieve efficiency, we propose that data incomparability should be treated as another key factor in optimizing skyline computation. Specifically, to optimize both factors, we first identify common modules shared by existing non-index skyline algorithms, and then analyze them to develop a cost model to guide a balanced pivot point selection. Based on the cost model, we lastly implement our balanced pivot selection in two algorithms, BSkyTree-S and BSkyTree-P, treating both dominance and incomparability as key factors. Our experimental results demonstrate that the proposed algorithms outperform state-of-the-art skyline algorithms by up to two orders of magnitude.

61 citations


Journal ArticleDOI
TL;DR: The assumption that the upper outline of ground objects formed against the sky, i.e. the skyline, provides sufficient information for visual navigation of desert ants is explored and allows novel approaches to technical outdoor navigation.
Abstract: Desert ants, foraging in cluttered semiarid environments, are thought to be visually guided along individual, habitual routes. While other navigational mechanisms (e.g. path integration) are well studied, the question of how ants extract reliable visual features from a complex visual scene is still largely open. This paper explores the assumption that the upper outline of ground objects formed against the sky, i.e. the skyline, provides sufficient information for visual navigation. We constructed a virtual model of the ant’s environment. In the virtual environment, panoramic images were recorded and adapted to the resolution of the desert ant’s complex eye. From these images either a skyline code or a pixel-based intensity code were extracted. Further, two homing algorithms were implemented, a modified version of the average landmark vector (ALV) model (Lambrinos et al. Robot Auton Syst 30:39–64, 2000) and a gradient ascent method. Results show less spatial aliasing for skyline coding and best homing performance for ALV homing based on skyline codes. This supports the assumption of skyline coding in visual homing of desert ants and allows novel approaches to technical outdoor navigation.

55 citations


Proceedings ArticleDOI
01 Mar 2010
TL;DR: This paper formulates skyline and top-k queries in MCNs and designs algorithms for their efficient processing; the solutions have two important properties for preference-based querying: the skyline methods are progressive and the top-k ones are incremental.
Abstract: Research on spatial network databases has so far considered that there is a single cost value associated with each road segment of the network. In most real-world situations, however, there may exist multiple cost types involved in transportation decision making. For example, the different costs of a road segment could be its Euclidean length, the driving time, the walking time, possible toll fee, etc. The relative significance of these cost types may vary from user to user. In this paper we consider such multi-cost transportation networks (MCN), where each edge (road segment) is associated with multiple cost values. We formulate skyline and top-k queries in MCNs and design algorithms for their efficient processing. Our solutions have two important properties in preference-based querying; the skyline methods are progressive and the top-k ones are incremental. The performance of our techniques is evaluated with experiments on a real road network.

49 citations


Proceedings ArticleDOI
05 Jul 2010
TL;DR: A skyline computation approach that enables service users to optimally access sets of services as an integrated service package is proposed and a dual progressive algorithm is developed that is able to progressively report the skyline.
Abstract: We propose a skyline computation approach that enables service users to optimally access sets of services as an integrated service package. We first present a one pass algorithm based on the observation that a multi-service skyline is completely determined by single service skylines. The skyline is returned after an enumeration on a significantly reduced candidate space. We then develop a dual progressive algorithm that is able to progressively report the skyline. We conduct an experimental study to assess the performance of the skyline computation approaches.

48 citations
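
The observation that a multi-service skyline is determined by the single-service skylines can be sketched as below. The sketch assumes that the QoS of a package is the per-attribute sum of its member services (an assumption for illustration; the abstract does not state the aggregation) and that lower values are better. It shows only the candidate-space reduction, not the one-pass or dual progressive algorithms, and all names are hypothetical.

```python
from itertools import product

def dominates(a, b):
    """True if a is no worse than b in every attribute and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Naive skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def aggregate(package):
    """QoS of a package, assumed here to be the per-attribute sum of its services."""
    return tuple(sum(values) for values in zip(*package))

def package_skyline(service_groups):
    """Enumerate only packages built from each group's own skyline services."""
    local_skylines = [skyline(group) for group in service_groups]
    candidates = [aggregate(pkg) for pkg in product(*local_skylines)]
    return skyline(candidates)

# Hypothetical (latency, price) values for two service groups.
groups = [
    [(1, 4), (2, 2), (3, 3)],
    [(5, 1), (1, 5), (6, 6)],
]
best_packages = package_skyline(groups)
```

With a sum aggregate, swapping a dominated service for the service that dominates it always yields a package that dominates the original, which is why the reduced enumeration loses no skyline package.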


Journal ArticleDOI
01 Sep 2010
TL;DR: SKYRANK is a framework for ranking the skyline points in the absence of a user-defined preference function, thereby discovering a limited subset of the most interesting points of the skyline set and is extended to handle top-k preference skyline queries, when the user's preferences are available.
Abstract: Skyline queries aim to help users make intelligent decisions over complex data by discovering a set of interesting points, when different and often conflicting criteria are considered. Unfortunately, as the dimensionality of the dataset grows, the skyline operator loses its discriminating power and returns a large fraction of the data. The huge size of the result set hinders decision-making and motivates the ranking of skyline points. Therefore, users prefer to retrieve the top-k skyline points instead of the whole skyline set. In this paper, we propose SKYRANK, a framework for ranking the skyline points in the absence of a user-defined preference function, thereby discovering a limited subset of the most interesting points of the skyline set. For this purpose, we define the skyline graph, which relies on the dominance relationships between the skyline points for different subsets of dimensions (subspaces). SKYRANK applies well-known authority-based ranking algorithms on the skyline graph and, as described in this paper, discovers the importance of a skyline point exploiting the subspace dominance relationships. Furthermore, we extend SKYRANK to handle top-k preference skyline queries, when the user's preferences are available. Our experimental evaluation illustrates the complexity of the dominance relationships and the ranking ability of our framework.

Proceedings ArticleDOI
01 Mar 2010
TL;DR: FlexPref, a framework for extensible preference evaluation in database systems implemented in the query processor, aims to support a wide array of preference evaluation methods in a single extensible code base.
Abstract: Personalized database systems give users answers tailored to their personal preferences. While numerous preference evaluation methods for databases have been proposed (e.g., skyline, top-k, k-dominance, k-frequency), the implementation of these methods at the core of a database system is a double-edged sword. Core implementation provides efficient query processing for arbitrary database queries; however, this approach is not practical, as each existing (and future) preference method requires a custom query processor implementation. To solve this problem, this paper introduces FlexPref, a framework for extensible preference evaluation in database systems. FlexPref, implemented in the query processor, aims to support a wide array of preference evaluation methods in a single extensible code base. Integration with FlexPref is simple, involving the registration of only three functions that capture the essence of the preference method. Once integrated, the preference method “lives” at the core of the database, enabling the efficient execution of preference queries involving common database operations. To demonstrate the extensibility of FlexPref, we provide case studies showing the implementation of three database operations (single table access, join, and sorted list access) and five state-of-the-art preference evaluation methods (top-k, skyline, k-dominance, top-k dominance, and k-frequency). We also experimentally study the strengths and weaknesses of an implementation of FlexPref in PostgreSQL over a range of single-table and multi-table preference queries.

Journal ArticleDOI
01 Sep 2010
TL;DR: This paper introduces a novel approach significantly reducing domination tests for a given subspace and the number of subspaces searched, and introduces two closure operators that enable a concise representation of skyline cubes.
Abstract: In this paper, we tackle the problem of efficient skycube computation. We introduce a novel approach significantly reducing domination tests for a given subspace and the number of subspaces searched. Technically, we identify two types of skyline points that can be directly derived without using any domination tests. Moreover, based on formal concept analysis, we introduce two closure operators that enable a concise representation of skyline cubes. We show that this concise representation is easy to compute and develop an efficient algorithm, which only needs to search a small portion of the huge search space. We show with empirical results the merits of our approach.

Journal ArticleDOI
01 Nov 2010
TL;DR: A novel ranking model, based on the concept of Skyline, is proposed as an alternative to the usual one based on aggregation functions and k-Nearest Neighbors queries; experiments show that the Skyline model is consistently the most efficient one, even if this sometimes comes at the price of reduced effectiveness.
Abstract: Many modern image database systems adopt a region-based paradigm, in which images are segmented into homogeneous regions in order to improve the retrieval accuracy. With respect to the case where images are dealt with as a whole, this leads to some peculiar query processing issues that have not been investigated so far in an integrated way. Thus, it is currently hard to understand how the different alternatives for implementing the region-based image retrieval model might impact on performance. In this paper, we analyze in detail such issues, in particular the type of matching between regions (either one-to-one or many-to-many). Then, we propose a novel ranking model, based on the concept of Skyline, as an alternative to the usual one based on aggregation functions and k-Nearest Neighbors queries. We also discuss how different query types can be efficiently supported. For all the considered scenarios we detail efficient index-based algorithms that are provably correct. Extensive experimental analysis shows, among other things, that: (1) the 1–1 matching type has to be preferred to the N–M one in terms of efficiency, whereas the two have comparable effectiveness, (2) indexing regions rather than images performs much better, and (3) the novel Skyline ranking model is consistently the most efficient one, even if this sometimes comes at the price of a reduced effectiveness.

Book ChapterDOI
01 Apr 2010
TL;DR: This paper formally defines the new problem of top-k skyline computation, proposes an intelligent method to resolve this problem, and conducts a set of experiments to show the effectiveness and efficiency of the proposed algorithm.
Abstract: The problem of top-k skyline computation has attracted considerable research attention in the past few years. Given a dataset, a top-k skyline returns k “most interesting” skyline tuples based on some kind of preference specified by the user. We extend the concept of top-k skyline to a so-called top-k combinatorial skyline query (k-CSQ). In contrast to the existing top-k skyline query (which is mainly to find the interesting skyline tuples), a k-CSQ is to find the interesting skyline tuples from various kinds of combinations of the given tuples. The k-CSQ is an important tool for areas such as decision making, market analysis, business planning, and quantitative economics research. In this paper, we will formally define this new problem, propose an intelligent method to resolve this problem, and also conduct a set of experiments to show the effectiveness and efficiency of the proposed algorithm.

Book ChapterDOI
01 Apr 2010
TL;DR: This paper proposes a novel pruning rule based on graph properties to derive the candidates for DSG-query, that are guaranteed not to introduce false negatives, and employs a filter-and-refine framework to speed up the query processing.
Abstract: Given a set of query points, a dynamic skyline query reports all data points that are not dominated by other data points according to the distances between data points and query points. In this paper, we study dynamic skyline queries in a large graph (DSG-query for short). Although dynamic skylines have been studied in Euclidean space [14], road network [5], and metric space [3,6], there is no previous work on dynamic skylines over large graphs. We employ a filter-and-refine framework to speed up the query processing that can answer DSG-query efficiently. We propose a novel pruning rule based on graph properties to derive the candidates for DSG-query, that are guaranteed not to introduce false negatives. In the refinement step, with a carefully-designed index structure, we compute short path distances between vertices in O(H), where H is the number of maximal hops between any two vertices. Extensive experiments demonstrate that our methods outperform existing algorithms by orders of magnitude.

Proceedings ArticleDOI
06 Jun 2010
TL;DR: This paper studies direction-based spatial skyline queries (DSS queries) which retrieve nearest objects around the user from different directions and proposes algorithms to answer snapshot queries which find objects on the DSS according to the user's current position.
Abstract: Traditional location-based services recommend nearest objects to the user by considering their spatial proximity. However, an object not only has its distance but also has its direction which originates from the user to it. In this paper, we study direction-based spatial skyline queries (DSS queries) which retrieve nearest objects around the user from different directions. The closer object is better than or dominates the further object if they are in the same direction. The objects that cannot be dominated by any other object are included in the direction-based spatial skyline (DSS). We propose algorithms to answer snapshot queries which find objects on the DSS according to the user's current position. We also develop algorithms to support continuous queries which retrieve objects on the DSS while the user is moving linearly. Extensive experiments verify the performance of our proposed algorithms using both real and synthetic datasets.

Journal ArticleDOI
TL;DR: This paper is the first work to address skyline queries over distributed data streams, where streams derive from multiple horizontally split data sources, and presents an efficient and effective algorithm called BOCS to handle this issue under a more challenging environment of distributed streams.
Abstract: Data management and data mining over distributed data streams have received considerable attention within the database community recently. This paper is the first work to address skyline queries over distributed data streams, where streams derive from multiple horizontally split data sources. A skyline query returns a set of interesting objects which are not dominated by any other objects within the base dataset. Previous work has concentrated on skyline computations over static data or centralized data streams. We present an efficient and effective algorithm called BOCS to handle this issue under a more challenging environment of distributed streams. BOCS consists of an efficient centralized algorithm GridSky and an associated communication protocol. Based on the strategy of progressive refinement in BOCS, the skyline is incrementally computed in two phases. In the first phase, local skylines on remote sites are maintained by GridSky. At each time, only skyline increments on remote sites are sent to the coordinator. In the second phase, a global skyline is obtained by integrating remote increments with the latest global skyline. A theoretical analysis shows that BOCS is communication-optimal among all algorithms which use a share-nothing strategy. Extensive experiments demonstrate that our proposals are efficient, scalable, and stable.
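
The second phase described above, in which the coordinator integrates remote skyline increments into the latest global skyline, can be sketched roughly as follows. This is an insert-only simplification under stated assumptions: tuple expiration in the stream windows, the GridSky algorithm and the BOCS communication protocol are not modelled, lower values are better, and the function names and sample increments are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every dimension and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def integrate(global_skyline, increment):
    """Merge newly reported local skyline points into the coordinator's skyline."""
    result = list(global_skyline)
    for p in increment:
        if any(s == p or dominates(s, p) for s in result):
            continue                      # p adds nothing new to the global skyline
        result = [s for s in result if not dominates(p, s)]   # evict points p dominates
        result.append(p)
    return result

# Hypothetical increments arriving from two remote sites over two rounds.
global_sky = []
for increment in ([(4, 4), (1, 9)], [(5, 5), (9, 1)], [(3, 3)]):
    global_sky = integrate(global_sky, increment)
```

Only points that survive on their own site need to be shipped, because a point dominated locally is also dominated globally.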

Journal ArticleDOI
TL;DR: An efficient and scalable two-phase algorithm is proposed to process the skyline queries in different subspaces based on the full space skyline, and several novel pruning techniques are proposed to balance the query cost and update cost.
Abstract: Given a set of k-dimensional objects, the skyline query finds the objects that are not dominated by others. In practice, different users may be interested in different dimensions of the data, and issue queries on any subset of k dimensions in stream environments. This paper focuses on supporting concurrent and unpredictable subspace skyline queries over data streams. Simply computing and storing the skyline objects of every subspace in stream environments would incur an expensive update cost. To balance the query cost and update cost, we only maintain the full space skyline in this paper. We first propose an efficient maintenance algorithm and several novel pruning techniques. Then, an efficient and scalable two-phase algorithm is proposed to process the skyline queries in different subspaces based on the full space skyline. Furthermore, we present theoretical analyses and extensive experiments that demonstrate that our method is both efficient and effective.

Proceedings ArticleDOI
01 Mar 2010
TL;DR: A framework for evaluating skylines in the presence of equijoins is described, including the development of algorithms to answer queries over large input tables in a non-blocking, pipeline fashion, which significantly speeds up the entire query evaluation time.
Abstract: When a database system is extended with the skyline operator, it is important to determine the most efficient way to execute a skyline query across tables with join operations. This paper describes a framework for evaluating skylines in the presence of equijoins, including: (1) the development of algorithms to answer such queries over large input tables in a non-blocking, pipeline fashion, which significantly speeds up the entire query evaluation time. These algorithms are built on top of the traditional relational Nested-Loop and the Sort-Merge join algorithms, which allows easy implementation of these methods in existing relational systems; (2) a novel method for estimating the skyline selectivity of the joined table; (3) evaluation of skyline computation based on the estimation method and the proposed evaluation techniques; and (4) a systematic experimental evaluation to validate our skyline evaluation framework.

Journal ArticleDOI
01 Dec 2010
TL;DR: A new indexing method named ZINC (for Z-order Indexing with Nested Code) that supports efficient skyline computation for data with both totally and partially ordered attribute domains and significantly outperforms the state-of-the-art TSS indexing scheme for skyline queries.
Abstract: We present a new indexing method named ZINC (for Z-order Indexing with Nested Code) that supports efficient skyline computation for data with both totally and partially ordered attribute domains. The key innovation in ZINC is based on combining the strengths of the ZB-tree, which is the state-of-the-art index method for computing skylines involving totally ordered domains, with a novel, nested coding scheme that succinctly maps partial orders into total orders. An extensive performance evaluation demonstrates that ZINC significantly outperforms the state-of-the-art TSS indexing scheme for skyline queries.

Proceedings ArticleDOI
21 Jun 2010
TL;DR: This paper proposes the notion of distributed skyline queries over uncertain data, and two communication- and computation-efficient algorithms are proposed to retrieve the qualified skylines from distributed local sites.
Abstract: The skyline operator has received considerable attention from the database community, due to its importance in many applications including multi-criteria decision making, preference answering, and so forth. In many applications, uncertain data inherently exist: data collected from different sources in distributed locations usually come with imprecise measurements and thus exhibit some kind of uncertainty. Taking into account the network delay and economic cost associated with sharing and communicating large amounts of distributed data over an internet, an important problem in this scenario is to retrieve the global skyline tuples from all the distributed local sites with minimum communication cost. Based on the well-known notion of the probabilistic skyline query over centralized uncertain data, in this paper, for the first time, we propose the notion of distributed skyline queries over uncertain data. Furthermore, two communication- and computation-efficient algorithms are proposed to retrieve the qualified skylines from distributed local sites. Extensive experiments have been conducted to verify the efficiency and the effectiveness of our algorithms with both synthetic and real data sets.

Journal ArticleDOI
01 Apr 2010
TL;DR: It is proved that traditional dominance is the only relationship satisfying all desirable properties, and some new dominance relationships are presented by relaxing some of the properties to design new top-k skyline queries that return robust results of a controllable size.
Abstract: Skyline queries are often used on data sets in multi-dimensional space for many decision-making applications. Traditionally, an object p is said to dominate another object q if, for all dimensions, it is no worse than q and is better on at least one dimension. Therefore, the skyline of a data set consists of all objects not dominated by any other object. To better cater to application requirements such as controlling the size of the skyline or handling data sets that are not well-structured, various works have been proposed to extend the definition of skyline based on variants of the dominance relationship. In view of the proliferation of variants, in this paper, a generalized framework is proposed to guide the extension of skyline query from the conventional definition to different variants. Our framework explicitly and carefully examines the various properties that should be preserved in a variant of the dominance relationship so that (1) the original advantages are maintained while adaptivity to application semantics is extended, and (2) computational complexity remains almost unaffected. We prove that traditional dominance is the only relationship satisfying all desirable properties, and present some new dominance relationships by relaxing some of the properties. These relationships are general enough for us to design new top-k skyline queries that return robust results of a controllable size. We analyze the existing skyline algorithms based on their minimum requirements on dominance properties. We also extend our analysis to data sets with missing values, and present extensive experimental results on the combinations of new dominance relationships and skyline algorithms.
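
The traditional dominance relationship defined above translates directly into code. The sketch below adds a per-dimension preference direction ("min" or "max") so that "better" is explicit, and then checks three familiar properties of traditional dominance on a sample. The abstract does not list which properties the framework examines, so irreflexivity, asymmetry and transitivity are chosen here as an assumption; the hotel data and all names are hypothetical.

```python
from itertools import product

def dominates(p, q, senses):
    """p dominates q if p is no worse in every dimension and better in at least one.

    senses[i] is "min" when smaller values are preferred in dimension i,
    and "max" when larger values are preferred.
    """
    no_worse = all((a <= b) if s == "min" else (a >= b)
                   for a, b, s in zip(p, q, senses))
    better = any((a < b) if s == "min" else (a > b)
                 for a, b, s in zip(p, q, senses))
    return no_worse and better

def skyline(objects, senses):
    """All objects not dominated by any other object."""
    return [p for p in objects if not any(dominates(q, p, senses) for q in objects)]

# Hypothetical hotels as (price, rating): cheaper and higher-rated is better.
senses = ("min", "max")
hotels = [(100, 4.0), (80, 3.5), (120, 4.0), (90, 4.5), (80, 3.0)]
best = skyline(hotels, senses)

# Property checks over the sample (not a proof).
irreflexive = all(not dominates(p, p, senses) for p in hotels)
asymmetric = all(not (dominates(p, q, senses) and dominates(q, p, senses))
                 for p, q in product(hotels, repeat=2))
transitive = all(dominates(p, r, senses)
                 for p, q, r in product(hotels, repeat=3)
                 if dominates(p, q, senses) and dominates(q, r, senses))
```

Relaxed variants of dominance can be plugged into the same `skyline` function by replacing `dominates`, which makes it easy to see which of these checks a variant stops satisfying.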

Proceedings ArticleDOI
01 Mar 2010
TL;DR: A progressive query evaluation framework ProgXe is proposed that transforms the execution of queries involving skyline over joins to be non-blocking, i.e., to be progressively generating results early and often.
Abstract: Multi-criteria decision support (MCDS) is crucial in many business and web applications such as web searches, B2B portals and on-line commerce. Such MCDS applications need to report results early, as soon as they are generated, so that they can react and formulate competitive decisions in near real-time. The ease in expressing user preferences in web-based applications has made Pareto-optimal (skyline) queries a popular class of MCDS queries. However, state-of-the-art techniques either focus on handling skylines on single input sets (i.e., no joins) or do not tackle the challenge of producing progressive early output results. In this work, we propose a progressive query evaluation framework ProgXe that transforms the execution of queries involving skyline over joins to be non-blocking, i.e., to be progressively generating results early and often. In ProgXe the query processing (join, mapping and skyline) is conducted at multiple levels of abstraction, thereby exploiting the knowledge gained from both input as well as mapped output spaces. This knowledge enables us to identify and reason about abstract-level relationships to guarantee correctness of early output. It also provides optimization opportunities previously missed by current techniques. To further optimize ProgXe, we incorporate an ordering technique that optimizes the rate at which results are reported by translating the optimization of tuple-level processing into a job-sequencing problem. Our experimental study over a wide variety of data sets demonstrates the superiority of our approach over state-of-the-art techniques.

Journal ArticleDOI
01 Sep 2010
TL;DR: Two new methods for skyline evaluation over attributes with partially ordered domains are proposed: one maps each partial order to a total order, inspired by the lattice theorem, and applies an off-the-shelf skyline algorithm; the other uses a storage and indexing approach inspired by column stores. Both are up to an order of magnitude more efficient than previous work and scale well with different problem parameters.
Abstract: Although there has been a considerable body of work on skyline evaluation in multidimensional data with totally ordered attribute domains, there are only a few methods that consider attributes with partially ordered domains. Existing work maps each partially ordered domain to a total order and then adapts algorithms for totally ordered domains to solve the problem. Nevertheless, these methods either use stronger notions of dominance, which generate false positives, or require expensive dominance checks. In this paper, we propose two new methods, which do not have these drawbacks. The first method uses an appropriate mapping of a partial order to a total order, inspired by the lattice theorem and an off-the-shelf skyline algorithm. The second technique uses an appropriate storage and indexing approach, inspired by column stores, which enables efficient verification of whether a pair of objects are incompatible. We demonstrate that both our methods are up to an order of magnitude more efficient than previous work and scale well with different problem parameters, such as complexity of partial orders.

Journal ArticleDOI
01 Jan 2010
TL;DR: This paper considers sub-space skyline queries in a more general database environment, such that the skyline operator does not stand alone in users' queries, and introduces an algorithm to answer sub-space skyline queries with constraints.
Abstract: Multi-objective optimization has been extensively studied in the machine learning literature. And recently the database community adapted the concept as skyline queries focusing mainly on retrieving optimal values from the full-space. In this paper, we consider sub-space skyline queries in a more general database environment, such that the skyline operator does not stand alone in users' queries. In particular, the skyline operator may commute with the selection operator which may express users' preferences or constraints on the skylines; we call this class skyline queries with constraints. Queries in this class are different from constrained skyline queries as described in the literature. We introduce an algorithm to answer sub-space skyline queries with constraints. We investigate the conditions under which the two classes of queries are equivalent; this allows for more efficient computation of skyline queries. Unlike the previous works, we do not design a new index specifically for handling the skylines. We try to make full use of the resources available in traditional relational databases for skyline computation. Further, we consider the case when the constraints are absent. We study the relationship between the skylines of different sub-spaces and record this information in a special data structure to help in pruning the search space.
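
Why the order of the selection and skyline operators matters can be seen in a small sketch. It is not the paper's algorithm: it only shows, on hypothetical data, that applying a constraint before a sub-space skyline and applying it afterwards can give different answers, which is why the conditions under which the two coincide are worth studying. Smaller values are assumed to be better, and `subspace_skyline`, `hotels`, and `constraint` are illustrative names.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Traditional dominance, assuming smaller values are better."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def subspace_skyline(rows: List[Row], dims: Tuple[int, ...]) -> List[Row]:
    """Skyline computed only on the attribute positions listed in dims."""
    def proj(r: Row) -> Row:
        return tuple(r[d] for d in dims)
    return [r for r in rows if not any(dominates(proj(s), proj(r)) for s in rows)]


# Hypothetical rows: (price, distance, noise); the sub-space is (price, distance).
hotels: List[Row] = [(1, 1, 9), (2, 2, 1), (3, 1, 5)]
constraint: Callable[[Row], bool] = lambda r: r[0] >= 2  # price of at least 2

# Selection first, then skyline on the surviving rows.
select_then_skyline = subspace_skyline([r for r in hotels if constraint(r)], (0, 1))

# Skyline first, then selection on the skyline rows.
skyline_then_select = [r for r in subspace_skyline(hotels, (0, 1)) if constraint(r)]
```

Here the row (1, 1, 9) dominates the other two in the sub-space but fails the constraint, so filtering first keeps two rows while filtering last keeps none.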

Book ChapterDOI
01 Apr 2010
TL;DR: This paper establishes a design space for parallel algorithms in the domain of personalized database retrieval by investigating the spectrum of base operations of different retrieval algorithms and various parallelization techniques to develop a set of highly scalable and high-performing skyline algorithms for different retrieval scenarios.
Abstract: Until recently algorithms continuously gained free performance improvements due to ever increasing processor speeds. Unfortunately, this development has reached its limit. Nowadays, new generations of CPUs focus on increasing the number of processing cores instead of simply increasing the performance of a single core. Thus, sequential algorithms will be excluded from future technological advances. Instead, highly scalable parallel algorithms are needed to fully tap new hardware potentials. In this paper we establish a design space for parallel algorithms in the domain of personalized database retrieval, taking skyline algorithms as a representative example. We will investigate the spectrum of base operations of different retrieval algorithms and various parallelization techniques to develop a set of highly scalable and high-performing skyline algorithms for different retrieval scenarios. Finally, we extensively evaluate these algorithms to showcase their superior characteristics.

Proceedings ArticleDOI
22 Mar 2010
TL;DR: An innovative and efficient method for computing skylines allowing the use of qualitative trade-offs and a novel trade-off representation structure to speed up retrieval is discussed.
Abstract: When selecting alternatives from large amounts of data, trade-offs play a vital role in everyday decision making. In databases this is primarily reflected by the top-k retrieval paradigm. But recently it has been convincingly argued that it is almost impossible for users to provide meaningful scoring functions for top-k retrieval, subsequently leading to the adoption of the skyline paradigm. Here users just specify the relevant attributes in a query and all suboptimal alternatives are filtered following the Pareto semantics. Up to now, however, the intuitive concept of compensation cannot be used in skyline queries, which also contributes to the often unmanageably large result set sizes. In this paper we discuss an innovative and efficient method for computing skylines allowing the use of qualitative trade-offs. Such trade-offs compare examples from the database on a focused subset of attributes. Thus, users can provide information on how much they are willing to sacrifice to gain an improvement in some other attribute(s). Our contribution is the design of the first skyline algorithm allowing for qualitative compensation across attributes. Moreover, we also provide a novel trade-off representation structure to speed up retrieval. Indeed our experiments show efficient performance allowing for focused skyline sets in practical applications. Moreover, we show that the necessary number of object comparisons can be reduced by an order of magnitude using our indexing techniques.

Book ChapterDOI
30 Jun 2010
TL;DR: This work studies the problem of distributed skyline computation and proposes an adaptive algorithm towards controlling the degree of parallelism and the required network traffic, and handles efficiently diverse preferences imposed on attributes.
Abstract: Skyline queries have attracted considerable attention over the last few years, mainly due to their ability to return interesting objects without the need for user-defined scoring functions. In this work, we study the problem of distributed skyline computation and propose an adaptive algorithm towards controlling the degree of parallelism and the required network traffic. In contrast to state-of-the-art methods, our algorithm handles efficiently diverse preferences imposed on attributes. The key idea is to partition the data using a grid scheme and for each query to build on-the-fly a dependency graph among partitions which can help in effective pruning. Our algorithm operates in two modes: (i) full-parallel mode, where processors are activated simultaneously or (ii) cascading mode, where processors are activated in a cascading manner using propagation of intermediate results, thus reducing network traffic and potentially increasing throughput. Performance evaluation results, based on real-life and synthetic data sets, demonstrate the scalability with respect to the number of processors and database size.
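
The idea of partitioning data with a grid and pruning whole partitions can be sketched on a single machine as follows. This is a simplified illustration under stated assumptions, not the paper's distributed algorithm or its per-query dependency graph: smaller values are assumed to be better, cells are uniform with a hypothetical side length `width`, and a cell is pruned only when some non-empty cell has a strictly smaller index on every dimension, which guarantees that every point in the pruned cell is dominated.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Cell = Tuple[int, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Traditional dominance, assuming smaller values are better."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def cell_of(p: Point, width: float) -> Cell:
    """Index of the uniform grid cell that contains p."""
    return tuple(int(x // width) for x in p)


def grid_skyline(points: List[Point], width: float) -> List[Point]:
    # Partition the points by grid cell.
    cells: Dict[Cell, List[Point]] = defaultdict(list)
    for p in points:
        cells[cell_of(p, width)].append(p)

    # Prune cell b if a non-empty cell a is strictly smaller on every
    # dimension: every point in a then dominates every point in b.
    survivors = [
        b for b in cells
        if not any(all(x < y for x, y in zip(a, b)) for a in cells)
    ]

    # Compute the skyline over the points of the surviving cells only.
    candidates = [p for c in survivors for p in cells[c]]
    return [p for p in candidates if not any(dominates(q, p) for q in candidates)]


# Hypothetical 2-D data with cells of side 5.
data = [(1, 9), (2, 2), (9, 1), (6, 6), (8, 8), (3, 7)]
result = grid_skyline(data, width=5)
```

In this example the cell holding (6, 6) and (8, 8) is discarded without any point-level comparison because the cell holding (2, 2) precedes it on both dimensions; the remaining dominance checks run only over the four surviving points.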

Proceedings ArticleDOI
01 Mar 2010
TL;DR: This work defines a probabilistic contextual skyline query (p-CSQ) that returns the tuples which are interesting with high probability and emphasizes that uncertainty lies within the query and not the data, i.e., it is in the relationships among tuples rather than in their attribute values.
Abstract: The skyline query returns the most interesting tuples according to a set of explicitly defined preferences among attribute values. This work relaxes this requirement, and allows users to pose meaningful skyline queries without stating their choices. To compensate for missing knowledge, we first determine a set of uncertain preferences based on user profiles, i.e., information collected for previous contexts. Then, we define a probabilistic contextual skyline query (p-CSQ) that returns the tuples which are interesting with high probability. We emphasize that, unlike past work, uncertainty lies within the query and not the data, i.e., it is in the relationships among tuples rather than in their attribute values. Furthermore, due to the nature of this uncertainty, popular skyline methods, which rely on a particular tuple visit order, do not apply for p-CSQs. Therefore, we present novel non-indexed and index-based algorithms for answering p-CSQs. Our experimental evaluation concludes that the proposed techniques are significantly more efficient compared to a standard block nested loops approach.