Showing papers by Nikos Mamoulis published in 2007

Proceedings Article

Fast data anonymization with low information loss

Gabriel Ghinita, Panagiotis Karras¹, Panos Kalnis, Nikos Mamoulis¹

23 Sep 2007

TL;DR: This paper focuses on one-dimensional (i.e., single-attribute) quasi-identifiers, studies the properties of optimal solutions for k-anonymity and l-diversity based on meaningful information loss metrics, and develops efficient heuristics that solve the one-dimensional problems in linear time.

Abstract: Recent research studied the problem of publishing microdata without revealing sensitive information, leading to the privacy preserving paradigms of k-anonymity and l-diversity. k-anonymity protects against the identification of an individual's record. l-diversity, in addition, safeguards against the association of an individual with specific sensitive information. However, existing approaches suffer from at least one of the following drawbacks: (i) The information loss metrics are counter-intuitive and fail to capture data inaccuracies inflicted for the sake of privacy. (ii) l-diversity is solved by techniques developed for the simpler k-anonymity problem, which introduces unnecessary inaccuracies. (iii) The anonymization process is inefficient in terms of computation and I/O cost. In this paper we propose a framework for efficient privacy preservation that addresses these deficiencies. First, we focus on one-dimensional (i.e., single attribute) quasi-identifiers, and study the properties of optimal solutions for k-anonymity and l-diversity, based on meaningful information loss metrics. Guided by these properties, we develop efficient heuristics to solve the one-dimensional problems in linear time. Finally, we generalize our solutions to multi-dimensional quasi-identifiers using space-mapping techniques. Extensive experimental evaluation shows that our techniques clearly outperform the state-of-the-art, in terms of execution time and information loss.
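For readers new to one-dimensional k-anonymity, the sketch below shows only the problem setting: records are sorted on the single quasi-identifier and cut into consecutive groups of at least k, and information loss is measured as the average normalized width of each record's group interval. The fixed-size greedy cut and the loss formula are illustrative assumptions, not the heuristics or metrics proposed in the paper, and l-diversity is not handled.

```python
def anonymize_1d(values, k):
    """Sort on the quasi-identifier and cut into consecutive groups of >= k.

    Illustrative baseline (an assumption, not the paper's heuristic): fixed
    cuts of size k, with a short remainder merged into the last group.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k records")
    ordered = sorted(values)
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(groups[-1]) < k:          # remainder smaller than k
        tail = groups.pop()
        groups[-1].extend(tail)
    return groups


def information_loss(groups, domain_width):
    """Average, over records, of (group interval width / domain width)."""
    n = sum(len(g) for g in groups)
    total = sum(len(g) * (g[-1] - g[0]) for g in groups)
    return total / (n * domain_width)


groups = anonymize_1d([13, 1, 11, 2, 12, 3, 10], k=3)
print(groups)                        # [[1, 2, 3], [10, 11, 12, 13]]
print(information_loss(groups, 12))
```

Every published group hides each record among at least k records sharing the same interval; a smarter partitioning would place cuts where the sorted values have large gaps, which is the kind of choice a loss-aware heuristic optimizes.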

319 citations

Proceedings Article

Efficient processing of top-k dominating queries on multi-dimensional data

Man Lung Yiu¹, Nikos Mamoulis²

Aalborg University¹, University of Hong Kong²

23 Sep 2007

TL;DR: The top-k dominating query returns the k data objects that dominate the largest number of objects in a dataset; it is an important decision-support tool because it gives data analysts an intuitive way to find significant objects.

Abstract: The top-k dominating query returns k data objects which dominate the highest number of objects in a dataset. This query is an important tool for decision support since it provides data analysts an intuitive way for finding significant objects. In addition, it combines the advantages of top-k and skyline queries without sharing their disadvantages: (i) the output size can be controlled, (ii) no ranking functions need to be specified by users, and (iii) the result is independent of the scales at different dimensions. Despite their importance, top-k dominating queries have not received adequate attention from the research community. In this paper, we design specialized algorithms that apply on indexed multi-dimensional data and fully exploit the characteristics of the problem. Experiments on synthetic datasets demonstrate that our algorithms significantly outperform a previous skyline-based approach, while our results on real datasets show the meaningfulness of top-k dominating queries.
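A brute-force reference for the query semantics, assuming that smaller values are preferred in every dimension and that ties in the dominance score are broken by input order. It runs in O(n²) time and ignores indexing entirely; the paper's algorithms target indexed data precisely to avoid this cost.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere
    (assumption: smaller values are better in every dimension)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def top_k_dominating(points, k):
    """O(n^2) reference: rank points by how many other points they dominate."""
    scores = [sum(dominates(p, q) for q in points) for p in points]
    # sorted() is stable, so equal scores keep their input order
    order = sorted(range(len(points)), key=lambda i: -scores[i])
    return [(points[i], scores[i]) for i in order[:k]]


pts = [(1, 1), (2, 2), (3, 3), (1, 4), (4, 1)]
print(top_k_dominating(pts, 2))   # [((1, 1), 4), ((2, 2), 1)]
```

No weights or ranking function appear anywhere, which mirrors the abstract's point that users need not specify one, and multiplying a dimension by a positive constant does not change any dominance relation.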

195 citations

Journal Article

Discovery of Periodic Patterns in Spatiotemporal Sequences

Huiping Cao¹, Nikos Mamoulis¹, David W. Cheung¹

University of Hong Kong¹

01 Apr 2007-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This paper defines the problem of mining periodic patterns in spatiotemporal data and proposes an effective and efficient algorithm for retrieving maximal periodic patterns, and demonstrates how the mining technique can be adapted for two interesting variants of the problem.

Abstract: In many applications that track and analyze spatiotemporal data, movements obey periodic patterns; the objects follow the same routes (approximately) over regular time intervals. For example, people wake up at the same time and follow more or less the same route to their work every day. The discovery of hidden periodic patterns in spatiotemporal data could unveil important information to the data analyst. Existing approaches for discovering periodic patterns focus on symbol sequences. However, these methods cannot directly be applied to a spatiotemporal sequence because of the fuzziness of spatial locations in the sequence. In this paper, we define the problem of mining periodic patterns in spatiotemporal data and propose an effective and efficient algorithm for retrieving maximal periodic patterns. In addition, we study two interesting variants of the problem. The first is the retrieval of periodic patterns that are frequent only during a continuous subinterval of the whole history. The second problem is the discovery of periodic patterns whose instances may be shifted or distorted. We demonstrate how our mining technique can be adapted for these variants. Finally, we present a comprehensive experimental evaluation, where we show the effectiveness and efficiency of the proposed techniques.
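The sketch below illustrates only a simplified first step of periodic pattern discovery. It assumes locations have already been mapped to discrete region identifiers (the abstract notes that raw spatial locations are fuzzy, so some such mapping is needed). For a given period, it reports, at each offset within the period, the regions visited in at least a `min_sup` fraction of the complete periods. Combining these per-offset results into maximal patterns, and handling shifted or distorted instances, is beyond this sketch.

```python
from collections import Counter


def frequent_regions_per_offset(regions, period, min_sup):
    """regions[t] is the (already discretized) region id at timestamp t.

    Returns {offset: {region: support}} for regions seen at that offset in at
    least a min_sup fraction of the complete periods.
    """
    n_periods = len(regions) // period        # a trailing partial period is ignored
    if n_periods == 0:
        raise ValueError("sequence shorter than one period")
    result = {}
    for offset in range(period):
        counts = Counter(regions[p * period + offset] for p in range(n_periods))
        frequent = {r: c / n_periods for r, c in counts.items() if c / n_periods >= min_sup}
        if frequent:
            result[offset] = frequent
    return result


# Period 3 (e.g. home, work, home), observed over four periods.
seq = ["H", "W", "H", "H", "W", "W", "H", "W", "H", "H", "O", "H"]
print(frequent_regions_per_offset(seq, period=3, min_sup=0.75))
```

Here the one-off visit to region "O" falls below the support threshold, so the pattern H, W, H survives even though one period deviated from it.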

171 citations

Journal Article

Efficient top-k aggregation of ranked inputs

Nikos Mamoulis¹, Man Lung Yiu², Kit Hung Cheng¹, David W. Cheung¹

University of Hong Kong¹, Aalborg University²

01 Aug 2007-ACM Transactions on Database Systems

TL;DR: A new algorithm is proposed, designed to minimize the number of object accesses, the computational cost, and the memory requirements of top-k search with monotone aggregate functions, and is shown to be orders of magnitude faster.

Abstract: A top-k query combines different rankings of the same set of objects and returns the k objects with the highest combined score according to an aggregate function. We bring to light some key observations, which impose two phases that any top-k algorithm based on sorted accesses should go through. Based on them, we propose a new algorithm, which is designed to minimize the number of object accesses, the computational cost, and the memory requirements of top-k search with monotone aggregate functions. We provide an analysis of its cost and show that it is always no worse than the baseline "no random accesses" algorithm in terms of computations, accesses, and memory required. As a side contribution, we perform a space analysis, which indicates the memory requirements of top-k algorithms that only perform sorted accesses. For the case where the required space exceeds the available memory, we propose disk-based variants of our algorithm. We propose and optimize a multiway top-k join operator, with certain advantages over evaluation trees of binary top-k join operators. Finally, we define and study the computation of top-k cubes and the implementation of roll-up and drill-down operations in such cubes. Extensive experiments with synthetic and real data show that, compared to previous techniques, our method accesses fewer objects, while being orders of magnitude faster.
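As background for the "no random accesses" baseline mentioned above, here is a compact sketch of the classic NRA algorithm (Fagin, Lotem and Naor) for a sum aggregate. It assumes every list ranks the same objects in descending score order. Each object's lower bound sums the scores seen so far, and its upper bound fills each unseen entry with the last score read from that list. This is the baseline only, not the new algorithm proposed in the paper.

```python
def nra_top_k(lists, k):
    """No-Random-Accesses top-k for a sum aggregate.

    lists: m rankings of the same objects, each a list of (object, score)
    sorted by descending score. Returns the top-k objects; their exact
    scores may still be unknown when the algorithm stops early.
    """
    m = len(lists)
    seen = {}            # object -> {list index: score}
    last = [0.0] * m     # last score read by sorted access in each list

    def lower(o):        # unseen entries contribute nothing
        return sum(seen[o].values())

    def upper(o):        # an unseen entry is at most the last score read there
        return sum(seen[o].get(i, last[i]) for i in range(m))

    for depth in range(max(len(lst) for lst in lists)):
        for i, lst in enumerate(lists):          # one round of sorted accesses
            if depth < len(lst):
                obj, score = lst[depth]
                seen.setdefault(obj, {})[i] = score
                last[i] = score
        ranked = sorted(seen, key=lower, reverse=True)
        if len(ranked) >= k:
            kth = lower(ranked[k - 1])
            unseen_best = sum(last)              # bound for objects never seen
            others_best = max((upper(o) for o in ranked[k:]), default=float("-inf"))
            if kth >= max(unseen_best, others_best):
                return ranked[:k]
    # Every list was read completely: all scores are exact.
    return sorted(seen, key=lower, reverse=True)[:k]


lists = [
    [("a", 0.9), ("b", 0.8), ("c", 0.3), ("d", 0.1)],
    [("b", 0.9), ("a", 0.7), ("d", 0.5), ("c", 0.2)],
]
print(nra_top_k(lists, 2))   # ['b', 'a']
```

In this example the algorithm stops after two rounds, without ever reading "c" or "d": once the second-best lower bound (1.6) is at least the best possible total of any unseen object (0.8 + 0.7), no later object can overtake the answer.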

117 citations

Proceedings Article

Top-k Spatial Preference Queries

Man Lung Yiu¹, Xiangyuan Dai², Nikos Mamoulis², Michail Vaitis³

Aalborg University¹, University of Hong Kong², University of the Aegean³

15 Apr 2007

TL;DR: This paper formally defines spatial preference queries, proposes appropriate indexing techniques and search algorithms for them, and evaluates the methods experimentally over a wide range of problem settings.

Abstract: A spatial preference query ranks objects based on the qualities of features in their spatial neighborhood. For example, consider a real estate agency office that holds a database with available flats for lease. A customer may want to rank the flats with respect to the appropriateness of their location, defined after aggregating the qualities of other features (e.g., restaurants, cafes, hospital, market, etc.) within a distance range from them. In this paper, we formally define spatial preference queries and propose appropriate indexing techniques and search algorithms for them. Our methods are experimentally evaluated for a wide range of problem settings.
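A brute-force sketch of one plausible scoring rule, assumed here for illustration: a flat's score is the sum, over each feature set (restaurants, cafes, ...), of the best feature quality found within distance `radius` of the flat, with 0 when no feature of that set is in range. The paper defines its own score variants and uses indexes to avoid scanning every feature for every object.

```python
import math


def spatial_preference_top_k(objects, feature_sets, radius, k):
    """objects: {name: (x, y)}; feature_sets: list of [(x, y, quality), ...].

    Assumed score: for each feature set, the best quality within `radius`
    (0 if none), summed over all feature sets. Brute force, no index.
    """
    def score(p):
        return sum(
            max((q for x, y, q in features if math.dist(p, (x, y)) <= radius), default=0.0)
            for features in feature_sets
        )

    scores = {name: score(p) for name, p in objects.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [(name, scores[name]) for name in ranked[:k]]


flats = {"flat1": (0, 0), "flat2": (10, 0)}
restaurants = [(1, 0, 0.8), (10, 1, 0.5)]
cafes = [(0, 1, 0.6), (20, 0, 0.9)]
print(spatial_preference_top_k(flats, [restaurants, cafes], radius=2, k=2))
```

Note that the high-quality cafe at (20, 0) contributes nothing because it lies outside every flat's neighborhood.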

103 citations

Proceedings Article

Security in outsourcing of association rule mining

Wai Kit Wong¹, David W. Cheung¹, Edward Hung², Ben Kao¹, Nikos Mamoulis¹

University of Hong Kong¹, Hong Kong Polytechnic University²

23 Sep 2007

TL;DR: This paper proposes a more secure encryption scheme based on a one-to-n item mapping that transforms transactions non-deterministically, yet guarantees correct decryption and develops an effective and efficient encryption algorithm based on this method.

Abstract: Outsourcing association rule mining to an outside service provider brings several important benefits to the data owner. These include (i) relief from the high mining cost, (ii) minimization of demands in resources, and (iii) effective centralized mining for multiple distributed owners. On the other hand, security is an issue; the service provider should be prevented from accessing the actual data since (i) the data may be associated with private information, (ii) the frequency analysis is meant to be used solely by the owner. This paper proposes substitution cipher techniques in the encryption of transactional data for outsourcing association rule mining. After identifying the non-trivial threats to a straightforward one-to-one item mapping substitution cipher, we propose a more secure encryption scheme based on a one-to-n item mapping that transforms transactions non-deterministically, yet guarantees correct decryption. We develop an effective and efficient encryption algorithm based on this method. Our algorithm performs a single pass over the database and thus is suitable for applications in which data owners send streams of transactions to the service provider. A comprehensive cryptanalysis study is carried out. The results show that our technique is highly secure with a low data transformation cost.
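A toy sketch of the one-to-n idea only. Each item is assigned n distinct cipher identifiers, and every occurrence of the item in a transaction is replaced by a randomly chosen non-empty subset of them, so the same transaction can encrypt differently each time while decryption stays exact through a reverse lookup. The mapping construction and the subset choice are assumptions; the sketch makes no security claim and does not reproduce the paper's encryption algorithm or its guarantees on mined results.

```python
import random


def build_mapping(items, n, rng):
    """Assign each plaintext item n distinct cipher ids (hypothetical scheme)."""
    ids = list(range(len(items) * n))
    rng.shuffle(ids)
    enc = {item: ids[j * n:(j + 1) * n] for j, item in enumerate(items)}
    dec = {cid: item for item, cids in enc.items() for cid in cids}
    return enc, dec


def encrypt(transaction, enc, rng):
    """Replace each item by a random non-empty subset of its cipher ids."""
    out = set()
    for item in transaction:
        cids = enc[item]
        out.update(rng.sample(cids, rng.randint(1, len(cids))))
    return out


def decrypt(cipher_txn, dec):
    """Every cipher id maps back to exactly one item, so decryption is exact."""
    return {dec[cid] for cid in cipher_txn}


rng = random.Random(7)
enc, dec = build_mapping(["bread", "milk", "beer"], n=3, rng=rng)
t = {"bread", "beer"}
c1, c2 = encrypt(t, enc, rng), encrypt(t, enc, rng)   # may differ from each other
print(decrypt(c1, dec) == t, decrypt(c2, dec) == t)   # True True
```

The transformation is a single pass over each transaction, which is consistent with the streaming setting the abstract describes.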

86 citations

Journal Article

Reverse Nearest Neighbors Search in Ad Hoc Subspaces

Man Lung Yiu¹, Nikos Mamoulis²

Aalborg University¹, University of Hong Kong²

01 Mar 2007-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This paper studies an interesting generalization of the RNN query, where not all dimensions are considered, but only an ad hoc subset thereof, and develops appropriate algorithms for projected RNN queries, without relying on multidimensional indexes.

Abstract: Given an object q, modeled by a multidimensional point, a reverse nearest neighbors (RNN) query returns the set of objects in the database that have q as their nearest neighbor. In this paper, we study an interesting generalization of the RNN query, where not all dimensions are considered, but only an ad hoc subset thereof. The rationale is that 1) the dimensionality might be too high for the result of a regular RNN query to be useful, 2) missing values may implicitly define a meaningful subspace for RNN retrieval, and 3) analysts may be interested in the query results only for a set of (ad hoc) problem dimensions (i.e., object attributes). We consider a suitable storage scheme and develop appropriate algorithms for projected RNN queries, without relying on multidimensional indexes. Given the significant cost difference between random and sequential data accesses, our algorithms are based on applying sequential accesses only on the projected atomic values of the data at each dimension, to progressively derive a set of RNN candidates. Whether these candidates are actual RNN results is then validated via an optimized refinement step. In addition, we study variants of the projected RNN problem, including RkNN search, bichromatic RNN, and RNN retrieval for the case where sequential accesses are not possible. Our methods are experimentally evaluated with real and synthetic data.
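A brute-force sketch of the projected RNN semantics, assuming Euclidean distance restricted to the chosen dimensions and counting ties in q's favor. It compares every pair of points, whereas the paper relies on sequential accesses over per-dimension projected values followed by a refinement step.

```python
import math


def projected_rnn(points, q, dims):
    """Indices of points whose nearest neighbour on `dims` is the query q.

    Assumptions: Euclidean distance over the selected dimensions only, q is
    not itself in `points`, and ties count in q's favour.
    """
    def dist(a, b):
        return math.sqrt(sum((a[d] - b[d]) ** 2 for d in dims))

    result = []
    for i, p in enumerate(points):
        dq = dist(p, q)
        if all(dist(p, o) >= dq for j, o in enumerate(points) if j != i):
            result.append(i)
    return result


pts = [(0, 0, 100), (10, 0, -100), (11, 0, 0)]
q = (1, 0, 0)
print(projected_rnn(pts, q, dims=(0, 1)))      # [0]
print(projected_rnn(pts, q, dims=(0, 1, 2)))   # [0, 2]
```

The two calls show why the choice of subspace matters: the third attribute dominates the full-space distances, so dropping it changes the answer.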

50 citations

Proceedings Article

Exploiting duality in summarization with deterministic guarantees

Panagiotis Karras¹, Dimitris Sacharidis², Nikos Mamoulis¹

University of Hong Kong¹, National Technical University of Athens²

12 Aug 2007

TL;DR: Instead of tabulating all allocations of synopsis space, this paper solves the dual problem (given a maximum allowed error, find the minimum-space synopsis that achieves it) and uses it to construct fixed-space synopses with deterministic maximum-error guarantees more efficiently than previous histogram and hierarchical techniques.

Abstract: Summarization is an important task in data mining. A major challenge over the past years has been the efficient construction of fixed-space synopses that provide a deterministic quality guarantee, often expressed in terms of a maximum-error metric. Histograms and several hierarchical techniques have been proposed for this problem. However, their time and/or space complexities remain impractically high and depend not only on the data set size n, but also on the space budget B. These handicaps stem from a requirement to tabulate all allocations of synopsis space to different regions of the data. In this paper we develop an alternative methodology that dispels these deficiencies, thanks to a fruitful application of the solution to the dual problem: given a maximum allowed error, determine the minimum-space synopsis that achieves it. Compared to the state-of-the-art, our histogram construction algorithm reduces time complexity by (at least) a factor of $\frac{B \log^2 n}{\log \epsilon^*}$, and our hierarchical synopsis algorithm reduces the complexity by (at least) a factor of $\frac{\log^2 B}{\log \epsilon^* + \log n}$ in time and $B^{1 - \frac{\log B}{\log n}}$ in space, where $\epsilon^*$ is the optimal error. These complexity advantages offer both a space-efficiency and a scalability that previous approaches lacked. We verify the benefits of our approach in practice by experimentation.
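The duality can be illustrated on the simplest case: a histogram under the maximum absolute error, with each bucket represented by the midpoint of its values. Under these assumptions the dual problem is solved greedily by extending each bucket while its value range stays within 2ε; greedy extension is optimal here because any sub-range of a feasible bucket is also feasible. The primal problem (best error for B buckets) then follows by binary search over ε. This conveys only the general idea; the paper's algorithms and the complexity bounds stated in the abstract are more refined.

```python
def min_buckets(data, eps):
    """Dual problem: fewest contiguous buckets such that every value lies
    within eps of its bucket's midpoint (max absolute error <= eps)."""
    buckets, i, n = 0, 0, len(data)
    while i < n:
        lo = hi = data[i]
        j = i + 1
        # extend the bucket while its range still fits into 2 * eps
        while j < n and max(hi, data[j]) - min(lo, data[j]) <= 2 * eps:
            lo, hi = min(lo, data[j]), max(hi, data[j])
            j += 1
        buckets += 1
        i = j
    return buckets


def min_error_histogram(data, B, iters=60):
    """Primal problem via binary search on eps, using the dual solver.
    Assumes non-empty data and B >= 1."""
    lo, hi = 0.0, (max(data) - min(data)) / 2   # one bucket always achieves hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        if min_buckets(data, mid) <= B:
            hi = mid
        else:
            lo = mid
    return hi


data = [1, 1, 5, 5, 9]
print(min_buckets(data, 0))            # 3
print(min_error_histogram(data, 2))    # approximately 2.0
```

Each dual call is a single linear scan, so the cost of this sketch depends on n and the number of search steps rather than on tabulating allocations of the budget B.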

43 citations

Proceedings Article

The Haar+ Tree: A Refined Synopsis Data Structure

Panagiotis Karras¹, Nikos Mamoulis¹

University of Hong Kong¹

15 Apr 2007

TL;DR: The Haar+ tree is introduced: a refined, wavelet-inspired data structure for synopsis construction that achieves higher synopsis quality at the task of summarizing data sets with sharp discontinuities than state-of-the-art histogram and Haar wavelet techniques.

Abstract: We introduce the Haar+ tree: a refined, wavelet-inspired data structure for synopsis construction. The advantages of this structure are twofold: First, it achieves higher synopsis quality at the task of summarizing data sets with sharp discontinuities than state-of-the-art histogram and Haar wavelet techniques. Second, thanks to its search space delimitation capacity, Haar+ synopsis construction operates in time linear in the size of the data set for any monotonic distributive error metric. Through experimentation, we demonstrate the superiority of Haar+ synopses over histogram and Haar wavelet methods in both construction time and achieved quality for representative error metrics.
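For context, the sketch below computes the plain (unnormalized) Haar wavelet decomposition that Haar+ refines, using pairwise averages and half-differences, together with its inverse. The input length is assumed to be a power of two. It does not implement the Haar+ tree itself, whose structure the abstract does not spell out.

```python
def haar_decompose(data):
    """Unnormalized Haar transform: [overall average, coarse-to-fine details].
    len(data) must be a power of two."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs, current = [], list(data)
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        details = [(current[i] - current[i + 1]) / 2 for i in pairs]
        coeffs = details + coeffs          # finer levels go to the right
        current = averages
    return current + coeffs


def haar_reconstruct(coeffs):
    """Inverse transform: expand each average a with detail d into (a+d, a-d)."""
    current, pos = coeffs[:1], 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        current = [v for a, d in zip(current, details) for v in (a + d, a - d)]
        pos += len(details)
    return current


c = haar_decompose([2, 2, 0, 2, 3, 5, 4, 4])
print(c)   # [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
```

A wavelet synopsis keeps only a few of these coefficients and treats the rest as zero; a sharp discontinuity spreads non-zero details across several levels, which is the weakness the abstract says Haar+ addresses.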

26 citations

Proceedings Article

Retrieval of Spatial Join Pattern Instances from Sensor Networks

Man Lung Yiu¹, Nikos Mamoulis², Spiridon Bakiras³

Aalborg University¹, University of Hong Kong², City University of New York³

09 Jul 2007

TL;DR: This work devises acquisitional and distributed protocols for evaluating spatial join queries and extensions thereof, defined by interesting combinations of sensor readings (events) that co-occur in a spatial neighborhood.

Abstract: We study the continuous evaluation of spatial join queries and extensions thereof, defined by interesting combinations of sensor readings (events) that co-occur in a spatial neighborhood. An example of such a pattern is "a high temperature reading in the vicinity of at least four high-pressure readings". We devise acquisitional and distributed protocols for evaluating this class of queries, aiming at the minimization of energy consumption. Cases of simple and complex join queries with single or multi-hop distance constraints are considered. Finally, we experimentally compare the effectiveness of the proposed solutions on an experimental platform that simulates real sensor networks. Our results show that acquisitional protocols perform best for multi-hop or high-selectivity queries while distributed techniques should be applied for the remaining cases.
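The example pattern can be evaluated centrally with a brute-force scan, which is useful as a correctness reference for any in-network protocol. The reading format, the thresholds, and the use of Euclidean distance within `radius` as the notion of "vicinity" are assumptions made for the sketch; the paper's protocols instead aim to minimize energy by deciding which readings to acquire and transmit.

```python
import math


def pattern_instances(readings, radius, temp_thr, press_thr, min_count=4):
    """readings: list of (sensor_id, x, y, kind, value), kind in {"temp", "pressure"}.

    Returns (hot_sensor, [nearby high-pressure sensors]) for every high
    temperature reading with at least min_count high-pressure readings
    within `radius` (assumed notion of "vicinity").
    """
    hot = [r for r in readings if r[3] == "temp" and r[4] >= temp_thr]
    high = [r for r in readings if r[3] == "pressure" and r[4] >= press_thr]
    result = []
    for sid, x, y, _, _ in hot:
        near = [p[0] for p in high if math.dist((x, y), (p[1], p[2])) <= radius]
        if len(near) >= min_count:
            result.append((sid, near))
    return result


readings = [
    ("t1", 0, 0, "temp", 40), ("t2", 10, 10, "temp", 38),
    ("p1", 1, 0, "pressure", 1030), ("p2", 0, 1, "pressure", 1030),
    ("p3", -1, 0, "pressure", 1030), ("p4", 0, -1, "pressure", 1030),
    ("p5", 10, 11, "pressure", 1030),
]
print(pattern_instances(readings, radius=1.5, temp_thr=35, press_thr=1020))
```

Sensor t2 is hot but has only one high-pressure neighbor, so only t1 forms an instance of the pattern.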

13 citations

Book Chapter

Continuous monitoring of exclusive closest pairs

Leong Hou U¹, Nikos Mamoulis¹, Man Lung Yiu²

University of Hong Kong¹, Aalborg University²

16 Jul 2007

TL;DR: This paper proposes algorithms for the computation and continuous monitoring of ECP joins in memory, given a stream of events that indicate dynamic assignment requests and releases of pairs.

Abstract: Given two datasets A and B, their exclusive closest pairs (ECP) join is a one-to-one assignment of objects from the two datasets, such that (i) the closest pair (a, b) in A×B is in the result and (ii) the remaining pairs are determined by removing objects a, b from A, B respectively, and recursively searching for the next closest pair. An application of exclusive closest pairs is the computation of (car, parking slot) assignments. In this paper, we propose algorithms for the computation and continuous monitoring of ECP joins in memory, given a stream of events that indicate dynamic assignment requests and releases of pairs. Experimental results on a system prototype demonstrate the efficiency of our solutions in practice.
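The ECP definition translates directly into a static batch computation: sort all cross pairs by distance and scan them in order, keeping a pair only when neither object has been assigned yet. Each kept pair is exactly the closest pair among the objects still remaining, as the recursive definition requires. Ties are broken by object name here, which is an assumption; the sketch also ignores the continuous, event-driven monitoring that is the paper's focus.

```python
import math


def exclusive_closest_pairs(A, B):
    """A, B: dicts name -> (x, y). Returns (a, b, distance) triples of the
    one-to-one ECP assignment, in the order the pairs are chosen."""
    pairs = sorted(
        (math.dist(pa, pb), a, b) for a, pa in A.items() for b, pb in B.items()
    )
    used_a, used_b, result = set(), set(), []
    for d, a, b in pairs:
        if a not in used_a and b not in used_b:   # both objects still free
            result.append((a, b, d))
            used_a.add(a)
            used_b.add(b)
    return result


cars = {"c1": (0, 0), "c2": (2, 0)}
slots = {"s1": (1, 0), "s2": (10, 0)}
print(exclusive_closest_pairs(cars, slots))   # [('c1', 's1', 1.0), ('c2', 's2', 8.0)]
```

Slot s1 is equally close to both cars, but once c1 takes it, c2 must settle for s2; this exclusivity is what distinguishes ECP from an ordinary closest-pairs join. The sort costs O(|A||B| log(|A||B|)), which is why recomputing from scratch on every request or release is unattractive.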

Book Chapter

Continuous constraint query evaluation for spatiotemporal streams

Marios Hadjieleftheriou¹, Nikos Mamoulis², Yufei Tao³

AT&T Labs¹, University of Hong Kong², The Chinese University of Hong Kong³

16 Jul 2007

TL;DR: This paper studies the evaluation of continuous constraint queries (CCQs) over spatiotemporal streams, where a CCQ triggers an alert whenever a configuration of constraints between streaming events in space and time is satisfied.

Abstract: In this paper we study the evaluation of continuous constraint queries (CCQs) for spatiotemporal streams. A CCQ triggers an alert whenever a configuration of constraints between streaming events in space and time is satisfied. Consider, for instance, a server that receives updates from GPS-enabled agents that report their positions and other measurements (e.g., environmental readings). An example of a CCQ is: "Alert whenever at least 5 readings closer than 5km to each other and within a time difference of 5 minutes report high pressures and low temperatures". We model CCQs as Constraint Satisfaction Problems (CSPs) and develop solutions for their continuous evaluation. Our techniques (1) consider the fast arrival rate of incoming events, and (2) minimize the memory requirements, without using predefined window constraints, but by utilizing the structure of the queries. In order to show the merits of the proposed techniques, we implement a system prototype and evaluate it with real data.
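A brute-force check of the example CCQ over a finite batch of events, for illustration only. The event tuple layout and the pressure and temperature thresholds are hypothetical. The search enumerates combinations and is exponential in the group size, whereas the paper models CCQs as constraint satisfaction problems and evaluates them continuously over streams.

```python
import math
from itertools import combinations


def ccq_alert(events, size=5, max_km=5.0, max_minutes=5.0,
              min_pressure=1020.0, max_temp=10.0):
    """events: list of (x_km, y_km, t_minutes, pressure, temperature).

    Returns the first group of `size` qualifying events that are pairwise
    closer than max_km and within max_minutes of each other, else None.
    Thresholds for "high pressure" and "low temperature" are hypothetical.
    """
    candidates = [e for e in events if e[3] >= min_pressure and e[4] <= max_temp]
    for group in combinations(candidates, size):
        if all(math.dist(a[:2], b[:2]) < max_km and abs(a[2] - b[2]) <= max_minutes
               for a, b in combinations(group, 2)):
            return group
    return None


close = [(0.5 * i, 0.0, float(i), 1030.0, 5.0) for i in range(5)]
print(ccq_alert(close) is not None)   # True
print(ccq_alert(close[:4]) is None)   # True: only four qualifying readings
```

The per-event filter (pressure, temperature) corresponds to unary constraints and the pairwise distance and time checks to binary constraints, which is the shape a CSP formulation of this query would take.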

Proceedings Article

Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM)

Man Lung Yiu, Nikos Mamoulis, Spiridon Bakiras

01 Jan 2007

Evaluation of Spatial Pattern Queries in Sensor Networks

Man Lung Yiu¹, Nikos Mamoulis¹, Spiridon Bakiras¹

University of Hong Kong¹

01 Jan 2007

TL;DR: This work devises protocols for ‘in-network’ evaluation of spatial join queries, aiming at the minimization of power consumption, and develops cost models that suggest the appropriateness of each protocol based on various factors, including the selectivity of query elements, the energy requirements for sensing, and the network topology.

Abstract: We study the continuous evaluation of spatial join queries and extensions thereof, defined by interesting combinations of sensor readings (events) that co-occur in a spatial neighborhood. An example of such a pattern is “a high temperature reading in the vicinity of at least four high-pressure readings”. We devise protocols for ‘in-network’ evaluation of this class of queries, aiming at the minimization of power consumption. In addition, we develop cost models that suggest the appropriateness of each protocol, based on various factors, including selectivity of query elements, energy requirements for sensing, and network topology. Finally, we experimentally compare the effectiveness of the proposed solutions on an experimental platform that emulates real sensor networks.