
Showing papers by "Nikos Mamoulis published in 2011"


Journal ArticleDOI
01 Feb 2011
TL;DR: A new version of the k-anonymity guarantee, k^m-anonymity, is defined to limit the effects of data dimensionality. An algorithm that finds the optimal solution is developed; however, its high cost makes it inapplicable to large, realistic problems.
Abstract: In this paper, we study the problem of protecting privacy in the publication of set-valued data. Consider a collection of supermarket transactions that contains detailed information about items bought together by individuals. Even after removing all personal characteristics of the buyer, which can serve as links to his identity, the publication of such data is still subject to privacy attacks from adversaries who have partial knowledge about the set. Unlike most previous works, we do not distinguish data as sensitive and non-sensitive; we consider all items as both potential quasi-identifiers and potential sensitive data, depending on the knowledge of the adversary. We define a new version of the k-anonymity guarantee, k^m-anonymity, to limit the effects of data dimensionality, and we propose efficient algorithms to transform the database. Our anonymization model relies on generalization instead of suppression, which is the most common practice in related works on such data. We develop an algorithm that finds the optimal solution; however, its high cost makes it inapplicable to large, realistic problems. We therefore propose a greedy heuristic, which performs generalizations in an Apriori-style, level-wise fashion. The heuristic scales much better and in most cases finds a solution close to the optimal. Finally, we investigate techniques that partition the database and perform anonymization locally, aiming to reduce memory consumption and further improve scalability. A thorough experimental evaluation on real datasets shows that a vertical partitioning approach achieves excellent results in practice.
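As a rough illustration of the guarantee itself (a brute-force checker, not the paper's optimal or Apriori-based anonymization algorithms), the sketch below tests whether a set-valued database satisfies k^m-anonymity; all names are hypothetical:

```python
from collections import Counter
from itertools import combinations

def is_km_anonymous(transactions, k, m):
    """Return True if every combination of at most m items that appears
    in the data is supported by at least k transactions (k^m-anonymity)."""
    for size in range(1, m + 1):
        support = Counter()
        for t in transactions:
            for combo in combinations(sorted(set(t)), size):
                support[combo] += 1
        if any(count < k for count in support.values()):
            return False
    return True

# Example: fails for k=2, m=2 because the pair ('beer', 'diapers')
# is supported by only one transaction.
db = [{"beer", "chips"}, {"beer", "diapers"},
      {"chips", "diapers"}, {"beer", "chips"}]
print(is_km_anonymous(db, k=2, m=2))  # False
```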

107 citations


Book
05 Dec 2011
TL;DR: This book introduces spatial data models and queries, discusses the main issues of extending a database system to support spatial data, and presents indexing approaches for spatial data, with a focus on the R-tree.
Abstract: Spatial database management deals with the storage, indexing, and querying of data with spatial features, such as location and geometric extent. Many applications require the efficient management of spatial data, including Geographic Information Systems, Computer Aided Design, and Location Based Services. The goal of this book is to provide the reader with an overview of spatial data management technology, with an emphasis on indexing and search techniques. It first introduces spatial data models and queries and discusses the main issues of extending a database system to support spatial data. It then presents indexing approaches for spatial data, with a focus on the R-tree. Query evaluation and optimization techniques for the most popular spatial query types (selections, nearest neighbor search, and spatial joins) are presented for data in Euclidean spaces and in spatial networks. The book concludes by demonstrating the broad applicability of spatial data management technology across a range of related domains: management of spatio-temporal data and high-dimensional feature vectors, multi-criteria ranking, data mining and OLAP, privacy-preserving data publishing, and spatial keyword search. Table of Contents: Introduction / Spatial Data / Indexing / Spatial Query Evaluation / Spatial Networks / Applications of Spatial Data Management Technology
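Since the book centers on the R-tree, a minimal hedged sketch of the classic R-tree window (range) query may help; the Node layout here is an illustrative assumption, not code from the book:

```python
class Node:
    """Minimal R-tree node: entries are (mbr, child) pairs, where mbr is
    (xmin, ymin, xmax, ymax) and child is a subtree (inner node) or an
    object id (leaf)."""
    def __init__(self, entries, is_leaf):
        self.entries = entries
        self.is_leaf = is_leaf

def intersects(a, b):
    """True if axis-aligned rectangles a and b overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def window_query(node, window, out):
    """Classic R-tree window query: descend only into subtrees whose
    minimum bounding rectangle intersects the query window."""
    for mbr, child in node.entries:
        if intersects(mbr, window):
            if node.is_leaf:
                out.append(child)
            else:
                window_query(child, window, out)
    return out
```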

48 citations


Journal ArticleDOI
TL;DR: This paper formally defines spatial preference queries and proposes appropriate indexing techniques and search algorithms for them; extensive evaluation reveals that an optimized branch-and-bound solution is efficient and robust with respect to different parameters.
Abstract: A spatial preference query ranks objects based on the qualities of features in their spatial neighborhood. For example, using a real estate agency database of flats for lease, a customer may want to rank the flats with respect to the appropriateness of their location, defined by aggregating the qualities of other features (e.g., restaurants, cafes, hospitals, markets) within their spatial neighborhood. The neighborhood concept can be specified by the user via different functions: it can be an explicit circular region within a given distance from the flat, or, more intuitively, higher weights can be assigned to features based on their proximity to the flat. In this paper, we formally define spatial preference queries and propose appropriate indexing techniques and search algorithms for them. Extensive evaluation of our methods on both real and synthetic data reveals that an optimized branch-and-bound solution is efficient and robust with respect to different parameters.
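A hedged, brute-force sketch of the range-based variant of the query (the paper's contribution is the indexing and branch-and-bound search that avoid this exhaustive scan; all names are illustrative):

```python
import math

def preference_score(flat, feature_sets, eps):
    """Range-based spatial preference score: for each feature type (e.g.,
    cafes, markets), take the best quality found within distance eps of
    the flat, then aggregate (here: sum) over the types."""
    score = 0.0
    for features in feature_sets:  # each: list of ((x, y), quality) pairs
        near = [q for (p, q) in features if math.dist(flat, p) <= eps]
        score += max(near, default=0.0)
    return score

def rank_flats(flats, feature_sets, eps, top_k):
    """Brute-force ranking; the paper's optimized branch-and-bound
    algorithms avoid scoring every flat exhaustively."""
    return sorted(flats,
                  key=lambda f: preference_score(f, feature_sets, eps),
                  reverse=True)[:top_k]
```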

45 citations


Posted Content
TL;DR: In this article, a geometric pruning filter is proposed to estimate the probabilistic domination count, which can be used to answer a wide range of probabilistic similarity queries on uncertain data.
Abstract: In this paper, we propose a novel, effective and efficient probabilistic pruning criterion for probabilistic similarity queries on uncertain data. Our approach supports a general uncertainty model using continuous probability density functions to describe the (possibly correlated) uncertain attributes of objects. In a nutshell, the problem to be solved is to compute the PDF of the random variable given by the probabilistic domination count: given an uncertain database object B, an uncertain reference object R, and a set D of uncertain database objects in a multi-dimensional space, the probabilistic domination count denotes the number of uncertain objects in D that are closer to R than B is. This domination count can be used to answer a wide range of probabilistic similarity queries. Specifically, we propose a novel geometric pruning filter and introduce an iterative filter-refinement strategy for conservatively and progressively estimating the probabilistic domination count in an efficient way, while preserving correctness according to the possible-worlds semantics. In an experimental evaluation, we show that our proposed technique can quickly acquire tight probability bounds for the probabilistic domination count, even for large uncertain databases.
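The quantity being bounded can be illustrated with a naive sampling estimator (an illustration of the definition via possible worlds, not the paper's geometric pruning filter; the discrete instance representation is an assumption):

```python
import math
import random

def domination_count_pdf(B, R, D, trials=10_000):
    """Monte Carlo illustration of the probabilistic domination count:
    the number of objects in D that are closer to R than B is.  Each
    uncertain object is a list of ((x, y), probability) instances; each
    trial samples one possible world and tallies the count."""
    def draw(obj):
        points, weights = zip(*obj)
        return random.choices(points, weights=weights, k=1)[0]

    histogram = [0] * (len(D) + 1)
    for _ in range(trials):
        b, r = draw(B), draw(R)
        count = sum(1 for O in D
                    if math.dist(draw(O), r) < math.dist(b, r))
        histogram[count] += 1
    return [h / trials for h in histogram]  # estimated PDF of the count
```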

37 citations


Proceedings ArticleDOI
11 Apr 2011
TL;DR: A novel geometric pruning filter is proposed and an iterative filter-refinement strategy is introduced for conservatively and progressively estimating the probabilistic domination count in an efficient way while keeping correctness according to the possible world semantics.
Abstract: In this paper, we propose a novel, effective and efficient probabilistic pruning criterion for probabilistic similarity queries on uncertain data. Our approach supports a general uncertainty model using continuous probability density functions to describe the (possibly correlated) uncertain attributes of objects. In a nutshell, the problem to be solved is to compute the PDF of the random variable given by the probabilistic domination count: given an uncertain database object B, an uncertain reference object R, and a set D of uncertain database objects in a multi-dimensional space, the probabilistic domination count denotes the number of uncertain objects in D that are closer to R than B is. This domination count can be used to answer a wide range of probabilistic similarity queries. Specifically, we propose a novel geometric pruning filter and introduce an iterative filter-refinement strategy for conservatively and progressively estimating the probabilistic domination count in an efficient way, while preserving correctness according to the possible-worlds semantics. In an experimental evaluation, we show that our proposed technique can quickly acquire tight probability bounds for the probabilistic domination count, even for large uncertain databases.

35 citations


Posted Content
TL;DR: This paper argues that a good size-l OS should be a stand-alone and meaningful synopsis of the most important information about the particular Data Subject (DS) and proposes three algorithms for the efficient generation of size- l OSs.
Abstract: A previously proposed keyword search paradigm produces, as a query result, a ranked list of Object Summaries (OSs). An OS is a tree structure of related tuples that summarizes all data held in a relational database about a particular Data Subject (DS). However, some of these OSs are very large, and therefore unfriendly to users who initially prefer synoptic information before proceeding to more comprehensive information about a particular DS. In this paper, we investigate the effective and efficient retrieval of concise and informative OSs. We argue that a good size-l OS should be a stand-alone and meaningful synopsis of the most important information about the particular DS. More precisely, we define a size-l OS as a partial OS composed of l important tuples. We propose three algorithms for the efficient generation of size-l OSs (in addition to the optimal approach, which requires exponential time). Experimental evaluation on the DBLP and TPC-H databases verifies the effectiveness and efficiency of our approach.
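A hedged sketch of the flavor of such a heuristic (illustrative only; the node attributes and the greedy rule are assumptions, not one of the paper's three algorithms):

```python
def size_l_os(root, l):
    """Greedy sketch: grow the summary from the data subject's root tuple,
    always adding the most important tuple adjacent to what is already
    chosen, so the size-l OS stays a connected, stand-alone subtree.
    Assumes nodes with .children (list) and .importance (float)."""
    summary = [root]
    frontier = list(root.children)
    while frontier and len(summary) < l:
        best = max(frontier, key=lambda n: n.importance)
        frontier.remove(best)
        summary.append(best)
        frontier.extend(best.children)
    return summary
```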

23 citations


Journal ArticleDOI
01 Nov 2011
TL;DR: In this paper, the authors investigate the effective and efficient retrieval of concise and informative object summaries (OSs), which is a tree structure of related tuples that summarizes all data held in a relational database about a particular Data Subject (DS).
Abstract: A previously proposed keyword search paradigm produces, as a query result, a ranked list of Object Summaries (OSs). An OS is a tree structure of related tuples that summarizes all data held in a relational database about a particular Data Subject (DS). However, some of these OSs are very large, and therefore unfriendly to users who initially prefer synoptic information before proceeding to more comprehensive information about a particular DS. In this paper, we investigate the effective and efficient retrieval of concise and informative OSs. We argue that a good size-l OS should be a stand-alone and meaningful synopsis of the most important information about the particular DS. More precisely, we define a size-l OS as a partial OS composed of l important tuples. We propose three algorithms for the efficient generation of size-l OSs (in addition to the optimal approach, which requires exponential time). Experimental evaluation on the DBLP and TPC-H databases verifies the effectiveness and efficiency of our approach.

22 citations


Proceedings ArticleDOI
21 Mar 2011
TL;DR: This paper proposes a novel indexing scheme, the Ordered Inverted File (OIF), which, unlike the state of the art, indexes set-valued attributes in an ordered fashion, and introduces query processing algorithms that, in effect, treat containment queries as range queries over the ordered postings lists of OIF.
Abstract: In this paper we address the problem of efficiently evaluating containment (i.e., subset, equality, and superset) queries over set-valued data. We propose a novel indexing scheme, the Ordered Inverted File (OIF), which, unlike the state of the art, indexes set-valued attributes in an ordered fashion. We introduce query processing algorithms that, in effect, treat containment queries as range queries over the ordered postings lists of OIF and exploit this ordering to quickly prune unnecessary page accesses. OIF is simple to implement, and our experiments on both real and synthetic data show that it greatly outperforms the current state-of-the-art methods for all three classes of containment queries.
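For intuition, a plain (unordered) inverted file answering the superset class of containment queries might look like the hedged sketch below; OIF's contribution is the ordering of the indexed sets, which turns these intersections into cheap range scans:

```python
from collections import defaultdict

def build_inverted_file(sets):
    """Inverted file over set-valued data; each postings list is sorted by
    set id.  (OIF goes further: it also orders the indexed sets, so
    containment queries become range scans over the postings lists.)"""
    postings = defaultdict(list)
    for sid, s in enumerate(sets):
        for item in s:
            postings[item].append(sid)
    return postings

def superset_query(postings, query):
    """Ids of indexed sets that contain every item of `query`, obtained by
    intersecting the postings lists of the query items."""
    lists = [postings.get(item, []) for item in query]
    if not lists or any(not p for p in lists):
        return []
    result = set(lists[0])
    for plist in lists[1:]:
        result &= set(plist)
    return sorted(result)
```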

22 citations


Posted Content
TL;DR: In this paper, the authors leverage the huge number of threads available on Graphics Processing Units (GPUs) to speed up composite-order bilinear pairing computation, achieving more than a 24-fold speedup at a 2048-bit security level and a record cost of 7×10^-6 USD per pairing in the Amazon cloud computing environment.
Abstract: Recently, composite-order bilinear pairings have been shown to be useful in many cryptographic constructions. However, they are costly to evaluate: the composite order should be at least 1024 bits, so the elliptic curve group order n and the base field become very large, rendering the bilinear pairing algorithm itself too slow to be practical (e.g., the Miller loop is Ω(n)). Thus, composite-order pairing computation easily becomes the bottleneck of a cryptographic construction, especially when many pairings need to be evaluated at the same time. The existing solution to this problem, which converts composite-order pairings to prime-order ones, is only valid for certain constructions. In this paper, we leverage the huge number of threads available on Graphics Processing Units (GPUs) to speed up composite-order pairing computation. We investigate suitable SIMD algorithms for base field, extension field, elliptic curve, and bilinear pairing computation, and map these algorithms onto GPUs with careful consideration. Experimental results show that our method achieves a record of 8.7 ms per pairing at a 1024-bit security level, a 20-fold speedup over the state-of-the-art CPU implementation. This result also opens the road to adopting higher security levels and using resource-rich parallel platforms, such as those available in cloud computing. In fact, we achieve more than a 24-fold speedup at a 2048-bit security level and a record cost of 7×10^-6 USD per pairing in the Amazon cloud computing environment.
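To put the reported figures in perspective, a quick back-of-the-envelope calculation using only the numbers quoted above (hedged: real throughput depends on the GPU model, batch size, and curve parameters):

```python
gpu_ms = 8.7                    # reported time per pairing, 1024-bit level
per_second = 1000 / gpu_ms      # ~115 pairings per second on one GPU
cpu_ms = gpu_ms * 20            # 20-fold speedup implies ~174 ms on a CPU
usd = 7e-6                      # reported cost per 2048-bit pairing (Amazon)
print(f"{per_second:.0f} pairings/s on GPU, ~{cpu_ms:.0f} ms each on CPU, "
      f"{1 / usd:,.0f} pairings per USD in the cloud")
```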

19 citations


Book ChapterDOI
24 Aug 2011
TL;DR: In this paper, inverse spatial queries are defined and a filter-and-refinement framework is proposed to answer them efficiently; the framework applies to a variety of inverse queries, including inverse epsilon range queries, inverse k-nearest neighbor queries, and inverse skyline queries.
Abstract: Traditional spatial queries return, for a given query object q, all database objects that satisfy a given predicate, such as epsilon range and k-nearest neighbors. This paper defines and studies inverse spatial queries, which, given a subset of database objects Q and a query predicate, return all objects which, if used as query objects with the predicate, contain Q in their result. We first show a straightforward solution for answering inverse spatial queries for any query predicate. Then, we propose a filter-and-refinement framework that can be used to improve efficiency. We show how to apply this framework on a variety of inverse queries, using appropriate space pruning strategies. In particular, we propose solutions for inverse epsilon range queries, inverse k-nearest neighbor queries, and inverse skyline queries. Our experiments show that our framework is significantly more efficient than naive approaches.
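The straightforward solution mentioned above can be sketched in a few lines for the inverse epsilon-range case (a hedged illustration of the baseline; the paper's framework adds the space pruning):

```python
import math

def inverse_epsilon_range(objects, Q, eps):
    """Straightforward inverse epsilon-range query: return every database
    object o whose epsilon-range result would contain all of Q, i.e.,
    every q in Q lies within distance eps of o.  (The filter-and-
    refinement framework prunes most of these distance checks.)"""
    return [o for o in objects
            if all(math.dist(o, q) <= eps for q in Q)]
```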

10 citations


Book ChapterDOI
20 Jul 2011
TL;DR: This paper introduces a scalable approach for continuous inverse ranking on uncertain streams and presents a framework that is able to update the query result very efficiently, as the stream provides new observations of the objects.
Abstract: This paper introduces a scalable approach for continuous inverse ranking on uncertain streams. An uncertain stream is a stream of object instances with confidences, e.g., observed positions of moving objects derived from a sensor. The confidence value assigned to each instance reflects the likelihood that the instance conforms with the current true object state. The inverse ranking query retrieves the rank of a given query object according to a given score function. In this paper we present a framework that updates the query result very efficiently as the stream provides new observations of the objects. We show, both theoretically and experimentally, that the query update can be performed in linear time. We conduct an experimental evaluation on synthetic data, which demonstrates the efficiency of our approach.
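The probability computation at the heart of an uncertain inverse ranking can be sketched with a Poisson-binomial dynamic program (a hedged illustration of the underlying math, not the authors' incremental update scheme):

```python
def rank_pdf(p_better):
    """Given, for each database object, the probability that it scores
    better than the query object q, return pdf[r] = P(exactly r objects
    outrank q).  Processing one object costs O(current length); when a
    new observation changes a single probability, that factor can be
    folded out and back in, enabling efficient stream updates."""
    pdf = [1.0]
    for p in p_better:
        nxt = [0.0] * (len(pdf) + 1)
        for r, mass in enumerate(pdf):
            nxt[r] += mass * (1 - p)   # object does not outrank q
            nxt[r + 1] += mass * p     # object outranks q
        pdf = nxt
    return pdf

print(rank_pdf([0.5, 0.2]))  # [0.4, 0.5, 0.1]
```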

Journal ArticleDOI
01 Oct 2011
TL;DR: This special issue focuses on managing information about moving objects in space and time, both for online applications and for analysis of ‘historical’ trajectory data.
Abstract: Small, GPS-enabled, wirelessly networked mobile devices such as mobile phones, personal digital assistants, and car navigation systems have become powerful, affordable, and widespread. Not only do these devices interact with the environment, such as local services and facilities, searching for useful information, but they are also capable of collecting and transmitting position data. There is a need to address both aspects: supporting online services by managing the locations of large sets of currently moving users, and analyzing enormous volumes of captured trajectory data. The latter may in particular be useful for improving mobile services. This special issue focuses on managing information about moving objects in space and time, both for online applications and for analysis of 'historical' trajectory data. The complex form of trajectory data obtained from objects (typically moving in road networks) calls for specialized indexing methods in order to meet the demands of online query evaluation. In addition, the limited resources of the mobile devices that sense and transmit the locations of the moving objects call for techniques that minimize the communication cost of location updates without sacrificing too much accuracy. Specialized data analysts and common users need effective and efficient tools for querying and mining the large volumes of mobile data that are collected, including systems that allow the identification of complex forms of data patterns and that support aggregate, proximity, and direction queries.

01 Jan 2011
TL;DR: This work proposes an anonymization technique termed disassociation that preserves the original terms but hides the fact that two or more different terms appear in the same record, and presents an algorithm that anonymizes the data by first clustering them and then locally disassociating identifying combinations of terms.
Abstract: In this work, we focus on the preservation of user privacy in the publication of sparse multidimensional data. Existing works protect the users' sensitive information by generalizing or suppressing quasi-identifiers in the original data. In many real-world cases, neither generalization nor the distinction between sensitive and non-sensitive items is appropriate. For example, web search query logs contain millions of terms that are very hard to categorize as sensitive or non-sensitive. At the same time, a generalization-based anonymization would remove the most valuable information in the dataset: the original terms. Motivated by this problem, we propose an anonymization technique termed disassociation that preserves the original terms but hides the fact that two or more different terms appear in the same record. Until now, such techniques were used to sever the link between quasi-identifiers and sensitive values in settings with a clear distinction between these types of values. Our proposal generalizes these techniques to sparse multidimensional data, where no such distinction holds. We protect the users' privacy by disassociating combinations of terms that can act as quasi-identifiers from the rest of the record, or by disassociating the constituent terms, so that the identifying combination cannot be accurately recognized. To this end, we present an algorithm that anonymizes the data by first clustering them and then locally disassociating identifying combinations of terms. We analyze the attack model and extend the k^m-anonymity guarantee to this setting. We empirically evaluate our method on real and synthetic datasets.
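A toy sketch of the core disassociation step (heavily simplified; the unsafe combinations are given explicitly here as an assumption, whereas the paper derives them from a k^m-anonymity-style analysis and clusters records first):

```python
def disassociate(record, unsafe_combos, max_chunk):
    """Split a record's terms into chunks so that no identifying
    combination of terms (a list of frozensets) survives intact inside
    any single chunk."""
    chunks, current = [], []
    for term in record:
        candidate = current + [term]
        if len(candidate) > max_chunk or any(
                combo <= set(candidate) for combo in unsafe_combos):
            chunks.append(current)
            current = [term]
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

q = ["flu", "insulin", "pregnancy test", "batteries"]
print(disassociate(q, [frozenset({"insulin", "pregnancy test"})], 3))
# [['flu', 'insulin'], ['pregnancy test', 'batteries']]
```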

Proceedings ArticleDOI
01 Nov 2011 (QUeST)
TL;DR: This paper is the first work to propose general models for spatio-temporal uncertain data that have the potential to allow efficient processing of a wide range of queries; the main challenge is to develop new algorithms based on these models.
Abstract: Many spatial query problems defined on uncertain data are computationally expensive, particularly when a time component is added to the spatial attributes. Although a wide range of applications deal with uncertain spatio-temporal data, no solution for the efficient management of such data is available yet. This paper is the first work to propose general models for spatio-temporal uncertain data that have the potential to allow efficient processing of a wide range of queries. The main challenge is to unfold this potential by developing new algorithms based on these models. In addition, we give examples of interesting spatio-temporal queries on uncertain data.
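One common way to make such a model concrete, shown here purely as an assumption for illustration (the abstract does not commit to a specific model), is a Markov chain over discretized locations, from which possible worlds can be sampled:

```python
import random

def simulate(start, transition, steps):
    """Sample one possible world of an uncertain moving object, modeled
    as a Markov chain: transition[s] is a list of (next_state,
    probability) pairs."""
    path, state = [start], start
    for _ in range(steps):
        states, weights = zip(*transition[state])
        state = random.choices(states, weights=weights, k=1)[0]
        path.append(state)
    return path

def prob_in_region(start, transition, steps, region, trials=10_000):
    """Monte Carlo estimate of P(object is inside `region` at time
    `steps`), a typical spatio-temporal query on uncertain data."""
    hits = sum(simulate(start, transition, steps)[-1] in region
               for _ in range(trials))
    return hits / trials
```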

Posted Content
TL;DR: This paper defines and studies inverse spatial queries, which, given a subset of database objects Q and a query predicate, return all objects which, if used as query objects with the predicate, contain Q in their result.
Abstract: Traditional spatial queries return, for a given query object q, all database objects that satisfy a given predicate, such as epsilon range and k-nearest neighbors. This paper defines and studies inverse spatial queries, which, given a subset of database objects Q and a query predicate, return all objects which, if used as query objects with the predicate, contain Q in their result. We first show a straightforward solution for answering inverse spatial queries for any query predicate. Then, we propose a filter-and-refinement framework that can be used to improve efficiency. We show how to apply this framework on a variety of inverse queries, using appropriate space pruning strategies. In particular, we propose solutions for inverse epsilon range queries, inverse k-nearest neighbor queries, and inverse skyline queries. Our experiments show that our framework is significantly more efficient than naive approaches.