
Showing papers by "Nikos Mamoulis published in 2009"


Proceedings ArticleDOI
29 Jun 2009
TL;DR: A new asymmetric scalar-product-preserving encryption (ASPE) that preserves a special type of scalar product is proposed; the secure schemes built on it are each shown to resist practical attacks of a different background-knowledge level, at a different overhead cost.
Abstract: Service providers like Google and Amazon are moving into the SaaS (Software as a Service) business. They turn their huge infrastructure into a cloud-computing environment and aggressively recruit businesses to run applications on their platforms. To enforce security and privacy on such a service model, we need to protect the data running on the platform. Unfortunately, traditional encryption methods that aim at providing "unbreakable" protection are often not adequate because they do not support the execution of applications such as database queries on the encrypted data. In this paper we discuss the general problem of secure computation on an encrypted database and propose a SCONEDB (Secure Computation ON an Encrypted DataBase) model, which captures the execution and security requirements. As a case study, we focus on the problem of k-nearest neighbor (kNN) computation on an encrypted database. We develop a new asymmetric scalar-product-preserving encryption (ASPE) that preserves a special type of scalar product. We use ASPE to construct two secure schemes that support kNN computation on encrypted data; each of these schemes is shown to resist practical attacks of a different background knowledge level, at a different overhead cost. Extensive performance studies are carried out to evaluate the overhead and the efficiency of the schemes.

801 citations
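
To make the distance-preservation idea concrete, here is a minimal numpy sketch of an asymmetric scalar-product-preserving transformation in the spirit of what the abstract describes: points are extended with -0.5|p|^2 and multiplied by M^T, queries are extended with a random positive scale and multiplied by M^{-1}. All names and parameters are our illustrative choices, and the sketch omits the hardening against the attacks the paper analyzes.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3                                    # data dimensionality

# Secret key: a random invertible (d+1) x (d+1) matrix M.
M = rng.standard_normal((d + 1, d + 1))
while abs(np.linalg.det(M)) < 1e-6:      # ensure invertibility
    M = rng.standard_normal((d + 1, d + 1))

def encrypt_point(p):
    # Extend p with -0.5*|p|^2, then multiply by M^T.
    p_hat = np.append(p, -0.5 * p.dot(p))
    return M.T @ p_hat

def encrypt_query(q):
    # Extend q with 1, scale by a fresh random r > 0, multiply by M^{-1}.
    r = rng.uniform(0.5, 2.0)
    q_hat = r * np.append(q, 1.0)
    return np.linalg.inv(M) @ q_hat

points = rng.standard_normal((10, d))
q = rng.standard_normal(d)
enc_points = [encrypt_point(p) for p in points]
enc_q = encrypt_query(q)

# E(p).E(q) = r*(p.q - 0.5|p|^2) = r*(0.5|q|^2 - 0.5|p-q|^2) with r > 0,
# so sorting encrypted scalar products descending reproduces the true
# distance order -- all a kNN server needs, without seeing the plaintext.
order_enc = np.argsort([-ep.dot(enc_q) for ep in enc_points])
order_true = np.argsort([np.linalg.norm(p - q) for p in points])
assert np.array_equal(order_enc, order_true)
```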


Proceedings ArticleDOI
29 Jun 2009
TL;DR: A dynamic indexing technique for skyline points that can be integrated into state-of-the-art sort-based skyline algorithms to boost their computational performance and scales well with the input size and dimensionality.
Abstract: The skyline operator returns from a set of multi-dimensional objects a subset of superior objects that are not dominated by others. This operation is considered very important in multi-objective analysis of large datasets. Although a large number of skyline methods have been proposed, the majority of them focus on minimizing the I/O cost. However, in high dimensional spaces, the problem can easily become CPU-bound due to the large number of computations required for comparing objects with current skyline points while scanning the database. Based on this observation, we propose a dynamic indexing technique for skyline points that can be integrated into state-of-the-art sort-based skyline algorithms to boost their computational performance. The new indexing and dominance checking approach is supported by a theoretical analysis, while our experiments show that it scales well with the input size and dimensionality, not only because unnecessary dominance checks are avoided but also because it allows efficient dominance checking with the help of bitwise operations.

142 citations
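
As a rough illustration of the setting, here is a generic sort-based (SFS-style) skyline scan where each dominance test folds the per-dimension comparisons into an integer bitmask. This is our sketch of the baseline the paper improves upon, not the paper's dynamic index for skyline points.

```python
def skyline(points):
    # Presort so no point can be dominated by a later one (a dominator
    # always has a strictly smaller coordinate sum), then scan once.
    dims = len(points[0])
    full = (1 << dims) - 1
    result = []
    for p in sorted(points, key=sum):
        dominated = False
        for s in result:
            # Bit i set iff s is no worse than p in dimension i
            # (minimization); s dominates p iff all bits set and s != p.
            mask = 0
            for i in range(dims):
                if s[i] <= p[i]:
                    mask |= 1 << i
            if mask == full and s != p:
                dominated = True
                break
        if not dominated:
            result.append(p)
    return result

print(skyline([(1, 9), (3, 3), (2, 8), (4, 4), (9, 1)]))
# -> [(3, 3), (1, 9), (2, 8), (9, 1)]  (order follows the presorting)
```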


Journal ArticleDOI
01 Jun 2009
TL;DR: An extensive study on the evaluation of top-k dominating queries, which proposes a set of algorithms that apply to indexed multi-dimensional data and investigates query evaluation on data that are not indexed.
Abstract: The top-k dominating query returns k data objects which dominate the highest number of objects in a dataset. This query is an important tool for decision support since it provides data analysts with an intuitive way of finding significant objects. In addition, it combines the advantages of top-k and skyline queries without sharing their disadvantages: (i) the output size can be controlled, (ii) no ranking functions need to be specified by users, and (iii) the result is independent of the scales at different dimensions. Despite their importance, top-k dominating queries have not received adequate attention from the research community. This paper is an extensive study on the evaluation of top-k dominating queries. First, we propose a set of algorithms that apply on indexed multi-dimensional data. Second, we investigate query evaluation on data that are not indexed. Finally, we study a relaxed variant of the query which considers dominance in dimensional subspaces. Experiments using synthetic and real datasets demonstrate that our algorithms significantly outperform a previous skyline-based approach. We also illustrate the applicability of this multi-dimensional analysis query by studying the meaningfulness of its results on real data.

111 citations
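
The query semantics fit in a few lines. The following is a naive O(n^2) reference implementation for clarity only; the paper's contribution is precisely avoiding this exhaustive counting through index-based and non-indexed algorithms.

```python
def dominates(a, b):
    # a dominates b (minimization): no worse in every dimension, better in one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def top_k_dominating(points, k):
    # Score each object by the number of objects it dominates.
    scored = [(sum(dominates(p, q) for q in points), p) for p in points]
    scored.sort(key=lambda t: -t[0])
    return scored[:k]

pts = [(1, 2), (2, 3), (3, 1), (4, 4), (2, 5)]
print(top_k_dominating(pts, 2))   # -> [(3, (1, 2)), (2, (2, 3))]
```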


Journal ArticleDOI
TL;DR: This article focuses on one-dimensional (i.e., single-attribute) quasi-identifiers, and studies the properties of optimal solutions under the k-anonymity and l-diversity models for the privacy-constrained and the accuracy-constrained anonymization problems.
Abstract: Recent research studied the problem of publishing microdata without revealing sensitive information, leading to the privacy-preserving paradigms of k-anonymity and l-diversity. k-anonymity protects against the identification of an individual's record. l-diversity, in addition, safeguards against the association of an individual with specific sensitive information. However, existing approaches suffer from at least one of the following drawbacks: (i) l-diversification is solved by techniques developed for the simpler k-anonymization problem, causing unnecessary information loss. (ii) The anonymization process is inefficient in terms of computational and I/O cost. (iii) Previous research focused exclusively on the privacy-constrained problem and ignored the equally important accuracy-constrained (or dual) anonymization problem. In this article, we propose a framework for efficient anonymization of microdata that addresses these deficiencies. First, we focus on one-dimensional (i.e., single-attribute) quasi-identifiers, and study the properties of optimal solutions under the k-anonymity and l-diversity models for the privacy-constrained (i.e., direct) and the accuracy-constrained (i.e., dual) anonymization problems. Guided by these properties, we develop efficient heuristics to solve the one-dimensional problems in linear time. Finally, we generalize our solutions to multidimensional quasi-identifiers using space-mapping techniques. Extensive experimental evaluation shows that our techniques clearly outperform the existing approaches in terms of execution time and information loss.

88 citations
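
For intuition about the one-dimensional privacy-constrained problem, here is a toy greedy k-anonymizer over a single attribute. The grouping rule (consecutive groups of at least k, generalized to ranges) is our simplification for illustration, not the paper's optimality-guided linear-time heuristic.

```python
def k_anonymize_1d(values, k):
    # Sort the quasi-identifier values and form consecutive groups of
    # size >= k; each group is generalized to its [min, max] range.
    vals = sorted(values)
    groups, i = [], 0
    while i < len(vals):
        j = i + k
        if len(vals) - j < k:       # too few left for another group
            j = len(vals)           # absorb the remainder
        groups.append((vals[i], vals[j - 1]))
        i = j
    return groups

ages = [21, 23, 24, 30, 31, 35, 36, 62]
print(k_anonymize_1d(ages, 3))
# -> [(21, 24), (30, 62)]: each group holds at least 3 records; wide
#    ranges like (30, 62) are the information loss a good split minimizes.
```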


Journal ArticleDOI
TL;DR: This work proposes adaptations of spatial access methods and search algorithms for probabilistic versions of range queries, nearest neighbors, spatial skylines, and reverse nearest neighbors, and conducts an extensive experimental study that evaluates the effectiveness of the proposed solutions.
Abstract: We study the problem of answering spatial queries in databases where objects exist with some uncertainty and they are associated with an existential probability. The goal of a thresholding probabilistic spatial query is to retrieve the objects that qualify the spatial predicates with probability that exceeds a threshold. Accordingly, a ranking probabilistic spatial query selects the objects with the highest probabilities to qualify the spatial predicates. We propose adaptations of spatial access methods and search algorithms for probabilistic versions of range queries, nearest neighbors, spatial skylines, and reverse nearest neighbors and conduct an extensive experimental study, which evaluates the effectiveness of proposed solutions.

77 citations
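
The thresholding and ranking variants differ only in how the qualifying probabilities are consumed. A minimal sketch over objects with known locations and existential probabilities, using a range predicate (the simplest case the abstract covers; the proposed access methods are not reproduced here):

```python
# objects: (id, (x, y), existential probability)
objects = [
    ("a", (1.0, 1.0), 0.9),
    ("b", (2.0, 2.0), 0.4),
    ("c", (1.5, 0.5), 0.7),
    ("d", (8.0, 8.0), 0.99),
]

def in_range(pos, lo, hi):
    return all(l <= c <= h for c, l, h in zip(pos, lo, hi))

def threshold_range(objs, lo, hi, t):
    # Thresholding version: objects inside the range whose existential
    # probability exceeds the threshold t.
    return [oid for oid, pos, p in objs if in_range(pos, lo, hi) and p > t]

def ranking_range(objs, lo, hi, m):
    # Ranking version: the m qualifying objects with the highest probability.
    hits = [(p, oid) for oid, pos, p in objs if in_range(pos, lo, hi)]
    return [oid for p, oid in sorted(hits, reverse=True)[:m]]

print(threshold_range(objects, (0, 0), (3, 3), 0.5))   # -> ['a', 'c']
print(ranking_range(objects, (0, 0), (3, 3), 2))       # -> ['a', 'c']
```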


Journal ArticleDOI
01 Aug 2009
TL;DR: This work addresses the integrity issue in the outsourcing process, i.e., how the data owner verifies the correctness of the mining results, by proposing and developing an audit environment that consists of a database transformation method and a result verification method.
Abstract: Finding frequent itemsets is the most costly task in association rule mining. Outsourcing this task to a service provider brings several benefits to the data owner, such as cost relief and less commitment to storage and computational resources. Mining results, however, can be corrupted if the service provider (i) is honest but makes mistakes in the mining process, or (ii) is lazy and reduces costly computation, returning incomplete results, or (iii) is malicious and contaminates the mining results. We address the integrity issue in the outsourcing process, i.e., how the data owner verifies the correctness of the mining results. For this purpose, we propose and develop an audit environment, which consists of a database transformation method and a result verification method. The main component of our audit environment is an artificial itemset planting (AIP) technique. We provide a theoretical foundation for our technique by proving its appropriateness and showing probabilistic guarantees about the correctness of the verification process. Through analytical and experimental studies, we show that our technique is both effective and efficient.

40 citations
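
The planting idea can be illustrated with a toy sketch: the owner injects artificial items at a support it controls, so any honest miner must report them accordingly. This is our simplification; the paper's AIP technique, its database transformation, and its probabilistic guarantees are substantially more involved.

```python
import random

def plant(transactions, itemset, target_support):
    # Copy the DB and insert an artificial itemset (items that, after the
    # database transformation, the miner cannot tell apart from real ones)
    # into an owner-chosen fraction of transactions.
    db = [set(t) for t in transactions]
    for t in random.sample(db, round(target_support * len(db))):
        t |= itemset
    return db

def verify(reported_supports, itemset, target_support, tol=0.02):
    # The owner knows the planted support exactly, so a missing or
    # misreported planted itemset exposes a faulty/lazy/malicious miner.
    got = reported_supports.get(itemset, 0.0)
    return abs(got - target_support) <= tol

txns = [{"milk", "bread"}, {"beer", "chips"}, {"milk"}] * 100
fake = frozenset({"#x1", "#x2"})
db = plant(txns, fake, 0.30)
true_support = sum(fake <= t for t in db) / len(db)
print(verify({fake: true_support}, fake, 0.30))   # honest miner -> True
print(verify({}, fake, 0.30))                     # lazy miner   -> False
```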


Proceedings ArticleDOI
18 May 2009
TL;DR: It is shown that minimizing the communication cost for multi-predicate queries is NP-hard; a dynamic programming algorithm is proposed to compute the optimal solution for small problem instances, and the proposed low-complexity heuristic algorithm is shown to be scalable and robust to different query characteristics and network sizes.
Abstract: This work aims at minimizing the cost of answering snapshot multi-predicate queries in high-communication-cost networks. High-communication-cost (HCC) networks are a family of networks where communicating data is very demanding in resources; for example, in wireless sensor networks, transmitting data drains the battery life of the sensors involved. The important class of multi-predicate queries in horizontally or vertically distributed databases is addressed. We show that minimizing the communication cost for multi-predicate queries is NP-hard and we propose a dynamic programming algorithm to compute the optimal solution for small problem instances. We also propose a low-complexity, approximate, heuristic algorithm that solves larger problem instances efficiently and can run on nodes with low computational power (e.g., sensors). Finally, we present a variant of the Fermat point problem where distances between points are minimal paths in a weighted graph, and propose a solution. An extensive experimental evaluation compares the proposed algorithms to the best known technique used to evaluate queries in wireless sensor networks and shows improvements of 10% up to 95%. The low-complexity heuristic algorithm is also shown to be scalable and robust to different query characteristics and network size.

21 citations
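
The Fermat-point variant mentioned in the abstract is easy to state: among all graph nodes, find the one minimizing the total shortest-path distance to a set of terminals (a 1-median under graph distances). A straightforward sketch using one Dijkstra run per terminal; the paper's actual solution and cost model are not reproduced here.

```python
import heapq

def dijkstra(adj, src):
    # Single-source shortest paths over a weighted adjacency dict.
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def graph_fermat_point(adj, terminals):
    # Node minimizing the sum of shortest-path distances to all terminals.
    dists = [dijkstra(adj, t) for t in terminals]
    return min(adj, key=lambda v: sum(d.get(v, float("inf")) for d in dists))

adj = {
    "a": [("b", 1), ("c", 4)],
    "b": [("a", 1), ("c", 1), ("d", 5)],
    "c": [("a", 4), ("b", 1), ("d", 1)],
    "d": [("b", 5), ("c", 1)],
}
print(graph_fermat_point(adj, ["a", "d", "b"]))   # -> 'b'
```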


Journal ArticleDOI
TL;DR: This work devises protocols for ‘in-network’ evaluation of spatial join queries, aiming at the minimization of power consumption, and develops cost models that suggest the appropriateness of each protocol based on various factors, including selectivity of query elements, energy requirements for sensing, and network topology.
Abstract: We study the continuous evaluation of spatial join queries and extensions thereof, defined by interesting combinations of sensor readings (events) that co-occur in a spatial neighborhood. An example of such a pattern is "a high temperature reading in the vicinity of at least four high-pressure readings". We devise protocols for "in-network" evaluation of this class of queries, aiming at the minimization of power consumption. In addition, we develop cost models that suggest the appropriateness of each protocol, based on various factors, including selectivity of query elements, energy requirements for sensing, and network topology. Finally, we experimentally compare the effectiveness of the proposed solutions on an experimental platform that emulates real sensor networks.

12 citations
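
To fix the query class, here is a centralized reference implementation of the example pattern from the abstract ("a high temperature reading in the vicinity of at least four high-pressure readings"). The thresholds and field names are illustrative; the paper's subject is evaluating such patterns in-network to save power, which this sketch deliberately ignores.

```python
def spatial_join_pattern(readings, radius, min_pressure_neighbors=4,
                         temp_high=35.0, pressure_high=1020.0):
    # Return IDs of high-temperature sensors with at least
    # min_pressure_neighbors high-pressure readings within the radius.
    def close(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= radius ** 2
    hot = [r for r in readings if r["temp"] >= temp_high]
    high_p = [r for r in readings if r["pressure"] >= pressure_high]
    return [h["id"] for h in hot
            if sum(close(h["pos"], p["pos"]) for p in high_p
                   if p["id"] != h["id"]) >= min_pressure_neighbors]

readings = [
    {"id": 1, "pos": (0, 0), "temp": 40, "pressure": 1000},
    {"id": 2, "pos": (1, 0), "temp": 20, "pressure": 1025},
    {"id": 3, "pos": (0, 1), "temp": 21, "pressure": 1030},
    {"id": 4, "pos": (1, 1), "temp": 22, "pressure": 1021},
    {"id": 5, "pos": (0, 2), "temp": 23, "pressure": 1022},
]
print(spatial_join_pattern(readings, radius=2.5))   # -> [1]
```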


Proceedings ArticleDOI
29 Mar 2009
TL;DR: The algorithm is an iterative process, which finds at each step the query-object pair with the highest score and removes it from the problem; this is done efficiently by maintaining and matching the skyline of the remaining objects with the remaining queries at each step.
Abstract: Consider multiple users searching for a hotel room, based on size, cost, distance to the beach, etc. Users may have variable preferences expressed by different weights on the attributes of the searched objects. Although individual preference queries can be evaluated by selecting the object in the database with the highest aggregate score, in the case of multiple requests at the same time, a single object cannot be assigned to more than one user. The challenge is to compute a fair 1-1 matching between the queries and a subset of the objects. We model this as a stable-marriage problem and propose an efficient technique for its evaluation. Our algorithm is an iterative process, which finds at each step the query-object pair with the highest score and removes it from the problem. This is done efficiently by maintaining and matching the skyline of the remaining objects with the remaining queries at each step. An experimental evaluation with synthetic and real data confirms the effectiveness of our method.

10 citations
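
A minimal sketch of the iterative process the TL;DR describes: when every user ranks objects by a weighted-sum score and there are no ties, repeatedly extracting the globally best (query, object) pair and removing both yields a stable matching (any blocking pair would have been extracted earlier). The skyline-based maintenance that makes this efficient is not reproduced here; all names are ours.

```python
def match(queries, objects):
    # queries: {query_id: weight vector}; objects: list of attribute tuples.
    pairs = []
    Q, O = list(queries.items()), list(objects)
    while Q and O:
        # Globally best remaining pair by weighted-sum score.
        score, qid, obj = max(
            (sum(w * v for w, v in zip(weights, o)), q, o)
            for q, weights in Q for o in O)
        pairs.append((qid, obj, score))
        Q = [(q, w) for q, w in Q if q != qid]   # query is served
        O = [o for o in O if o != obj]           # object is taken
    return pairs

queries = {"u1": (0.8, 0.2), "u2": (0.3, 0.7)}   # preference weights
objects = [(0.9, 0.1), (0.2, 0.9), (0.5, 0.5)]   # normalized attributes
print(match(queries, objects))
# -> [('u1', (0.9, 0.1), 0.74), ('u2', (0.2, 0.9), 0.69)]
```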


Journal ArticleDOI
01 Aug 2009
TL;DR: At its core lies a novel skyline maintenance technique, which is proved to be I/O optimal and outperforms adaptations of previous methods by several orders of magnitude.
Abstract: Consider an internship assignment system, where at the end of each academic year, interested university students search and apply for available positions, based on their preferences (e.g., nature of the job, salary, office location, etc). In a variety of facility, task or position assignment contexts, users have personal preferences expressed by different weights on the attributes of the searched objects. Although individual preference queries can be evaluated by selecting the object in the database with the highest aggregate score, in the case of multiple simultaneous requests, a single object cannot be assigned to more than one user. The challenge is to compute a fair 1-1 matching between the queries and the objects. We model this as a stable-marriage problem and propose an efficient method for its processing. Our algorithm iteratively finds stable query-object pairs and removes them from the problem. At its core lies a novel skyline maintenance technique, which we prove to be I/O optimal. We conduct an extensive experimental evaluation using real and synthetic data, which demonstrates that our approach outperforms adaptations of previous methods by several orders of magnitude.

10 citations



01 Jan 2009
TL;DR: Unexpected improvements in gas and vapor transport through the electrode are realized by incorporating a new dispersion process in the construction, reformulating the applied mix with solution additives, and creating a novel coating structure on a conductive web.
Abstract: Gas Diffusion Electrodes (GDEs) play a pivotal role in clean energy production as well as in electrochemical processes and sensors. These gas-consuming electrodes are typically designed for liquid electrolyte systems such as phosphoric acid and alkaline fuel cells, and are commercially manufactured by hand or in a batch process. However, GDEs using new electrolytes such as conductive polymer membranes demand improved electrode structures. This invention pertains to GDEs and gas diffusion media with new structures for systems using membrane electrode assemblies (MEAs), and automated methods of manufacture that lend themselves to continuous mass production. Unexpected improvements in gas and vapor transport through the electrode are realized by incorporating a new dispersion process in the construction, reformulating the applied mix with solution additives, and creating a novel coating structure on a conductive web. Furthermore, combining these changes with a judicious choice in coating methodology allows one to produce these materials in a continuous, automated fashion.

01 Jan 2009
TL;DR: This work develops preprocessing and indexing methods for phrases, paired with new search techniques for the top-k most interesting phrases on ad-hoc subsets of the corpus, and investigates alternative definitions of phrase interestingness, based on the probability of phrase occurrences.
Abstract: Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize the analysis of interesting phrases. These include named entities, important quotations, market slogans, and other multi-word phrases that are prominent in a dynamically derived ad-hoc subset of the corpus, e.g., being frequent in the subset but relatively infrequent in the overall corpus. The ad-hoc subset may be derived by means of a keyword query against the corpus, or by focusing on a particular time period. We investigate alternative definitions of phrase interestingness, based on the probability of phrase occurrences. We develop preprocessing and indexing methods for phrases, paired with new search techniques for the top-k most interesting phrases on ad-hoc subsets of the corpus. Our framework is evaluated using a large-scale real-world corpus of New York Times news articles.
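
One plausible instantiation of phrase interestingness from the family the abstract describes ("frequent in the subset but relatively infrequent in the overall corpus") is a simple frequency lift over n-grams. This is our illustrative choice; the paper studies probability-based definitions together with preprocessing, indexing, and top-k search techniques that this sketch does not capture.

```python
from collections import Counter

def top_k_interesting(subset_docs, corpus_docs, k=3, n=2):
    # Rank n-grams of the ad-hoc subset by relative-frequency lift
    # against the overall corpus (which must contain the subset).
    def ngrams(docs):
        c = Counter()
        for doc in docs:
            toks = doc.lower().split()
            c.update(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
        return c
    sub, full = ngrams(subset_docs), ngrams(corpus_docs)
    total_sub, total_full = sum(sub.values()), sum(full.values())
    def lift(g):
        return (sub[g] / total_sub) / (full[g] / total_full)
    return sorted(sub, key=lift, reverse=True)[:k]

subset = ["the credit crunch hits banks", "credit crunch fears grow"]
corpus = subset + ["the cat sat on the mat", "banks open on monday"] * 5
print(top_k_interesting(subset, corpus, k=2))
```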

BookDOI
TL;DR: Proceedings volume collecting keynotes, research papers, short papers, and demonstrations on spatial and spatio-temporal databases, on topics ranging from learning about humans with social media to the Geospatial Semantic Web.
Abstract: Table of contents.
Keynotes:
- Spatio-Tempo-Social: Learning from and about Humans with Social Media
- Recent Advances in Worst-Case Efficient Range Search Indexing
- Design and Architecture of GIS Servers for Web Based Information Systems - The ArcGIS Server System
Research Sessions:
- Versioning of Network Models in a Multiuser Environment
- Efficient Continuous Nearest Neighbor Query in Spatial Networks Using Euclidean Restriction
- Discovering Teleconnected Flow Anomalies: A Relationship Analysis of Dynamic Neighborhoods (RAD) Approach
- Continuous Spatial Authentication
- Query Integrity Assurance of Location-Based Services Accessing Outsourced Spatial Databases
- A Hybrid Technique for Private Location-Based Queries with Database Protection
- Spatial Cloaking Revisited: Distinguishing Information Leakage from Anonymity
- Analyzing Trajectories Using Uncertainty and Background Information
- Route Search over Probabilistic Geospatial Data
- Utilizing Wireless Positioning as a Tracking Data Source
- Indexing Moving Objects Using Short-Lived Throwaway Indexes
- Indexing the Trajectories of Moving Objects in Symbolic Indoor Space
- Monitoring Orientation of Moving Objects around Focal Points
- Spatial Skyline Queries: An Efficient Geometric Algorithm
- Incremental Reverse Nearest Neighbor Ranking in Vector Spaces
- Approximate Evaluation of Range Nearest Neighbor Queries with Quality Guarantee
- Time-Aware Similarity Search: A Metric-Temporal Representation for Complex Data
- Adaptive Management of Multigranular Spatio-Temporal Object Attributes
- TOQL: Temporal Ontology Querying Language
- Supporting Frameworks for the Geospatial Semantic Web
Short Papers:
- Efficient Construction of Safe Regions for Moving kNN Queries over Dynamic Datasets
- Robust Adaptable Video Copy Detection
- Efficient Evaluation of Static and Dynamic Optimal Route Queries
- Trajectory Compression under Network Constraints
- Exploring Spatio-Temporal Features for Traffic Estimation on Road Networks
- A Location Privacy Aware Friend Locator
- Semantic Trajectory Compression
Demonstrations:
- Pretty Easy Pervasive Positioning
- Spatiotemporal Pattern Queries in Secondo
- Nearest Neighbor Search on Moving Object Trajectories in Secondo
- A Visual Analytics Toolkit for Cluster-Based Classification of Mobility Data
- ELKI in Time: ELKI 0.2 for the Performance Evaluation of Distance Measures for Time Series
- Hide & Crypt: Protecting Privacy in Proximity-Based Services
- ROOTS, The ROving Objects Trip Simulator
- The TOQL System
- PDA: A Flexible and Efficient Personal Decision Assistant
- A Refined Mobile Map Format and Its Application


Proceedings ArticleDOI
18 May 2009
TL;DR: A novel query called the thresholded range aggregate query (TRA) is proposed, which retrieves the IDs of the sensors for which the average measurement in their neighborhood exceeds a user-given threshold, providing results that are robust against individual sensor abnormality and yet precisely summarize the sensors' status in each local region.
Abstract: The recent advances in wireless sensor technologies (e.g., Mica, Telos motes) enable the economic deployment of lightweight sensors for capturing data from their surrounding environment, serving various monitoring tasks, such as forest wildfire alarming and volcano activity monitoring. We propose a novel query called the thresholded range aggregate query (TRA), which retrieves the IDs of the sensors for which the average measurement in their neighborhood exceeds a user-given threshold. This query provides results that are robust against individual sensor abnormality, and yet precisely summarize the sensors' status in each local region. In order to process the (snapshot) TRA query, we develop energy-efficient protocols based on appropriate operators and filters in sensor nodes. The design of these operators and filters is non-trivial, due to the fact that each sensor measurement influences the actual results of other nodes in its neighborhood region. Furthermore, we extend our protocols for continuous evaluation of the TRA query. Experimental results show that our proposed solutions indeed offer substantial energy savings for both real and synthetic sensor networks.
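
The TRA semantics can be written down directly. Below is a centralized reference implementation; the paper's point is computing this in-network with operators and filters, which is not shown here, and the field names are illustrative.

```python
def tra_query(sensors, radius, threshold):
    # Thresholded range aggregate (TRA): return the IDs of sensors whose
    # neighborhood average measurement exceeds the threshold.
    result = []
    for s in sensors:
        vals = [t["value"] for t in sensors
                if (t["x"] - s["x"]) ** 2 + (t["y"] - s["y"]) ** 2 <= radius ** 2]
        if sum(vals) / len(vals) > threshold:   # neighborhood includes s itself
            result.append(s["id"])
    return result

sensors = [{"id": i, "x": float(i), "y": 0.0, "value": v}
           for i, v in enumerate([30, 32, 31, 28, 5])]
print(tra_query(sensors, radius=1.5, threshold=25))   # -> [0, 1, 2]
```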

Book ChapterDOI
16 Mar 2009
TL;DR: The fragment join operator is developed -- a general operator that merges two XML fragments based on their overlapping components; schema-independent query processing over multiple data sources is defined, and a novel framework is proposed to solve this problem.
Abstract: We study the problem of answering XML queries over multiple data sources under a schema-independent scenario where XML schemas and schema mappings are unavailable. We develop the fragment join operator -- a general operator that merges two XML fragments based on their overlapping components. We formally define the operator and propose an efficient algorithm for implementing it. We define schema-independent query processing over multiple data sources and propose a novel framework to solve this problem. We provide theoretical analysis and experimental results that show that our approaches are both effective and efficient.
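
As rough intuition for merging fragments on overlapping components, here is a toy merge over trees modeled as nested dicts. This dict model and the conflict rule are our simplifications for illustration and do not reflect the paper's formal operator or its algorithm.

```python
def fragment_join(a, b):
    # Merge two tree fragments (dict-of-dicts) on their overlapping
    # components: shared keys are merged recursively, disjoint subtrees
    # are kept from both sides.
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if a == b else [a, b]   # conflicting leaves: keep both
    merged = dict(a)
    for k, v in b.items():
        merged[k] = fragment_join(a[k], v) if k in a else v
    return merged

src1 = {"book": {"title": "SSTD 2009", "year": "2009"}}
src2 = {"book": {"title": "SSTD 2009", "publisher": "Springer"}}
print(fragment_join(src1, src2))
# -> {'book': {'title': 'SSTD 2009', 'year': '2009', 'publisher': 'Springer'}}
```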

Posted Content
TL;DR: A scalable approach for probabilistic top-k similarity ranking on uncertain vector data that reduces the quadratic-time dynamic programming computation of rank probabilities to linear time with the same memory requirements, facilitated by incremental accessing of the uncertain vector instances in increasing order of their distance to a reference object.
Abstract: This paper introduces a scalable approach for probabilistic top-k similarity ranking on uncertain vector data. Each uncertain object is represented by a set of vector instances that are assumed to be mutually-exclusive. The objective is to rank the uncertain data according to their distance to a reference object. We propose a framework that incrementally computes for each object instance and ranking position, the probability of the object falling at that ranking position. The resulting rank probability distribution can serve as input for several state-of-the-art probabilistic ranking models. Existing approaches compute this probability distribution by applying a dynamic programming approach of quadratic complexity. In this paper we theoretically as well as experimentally show that our framework reduces this to a linear-time complexity while having the same memory requirements, facilitated by incremental accessing of the uncertain vector instances in increasing order of their distance to the reference object. Furthermore, we show how the output of our method can be used to apply probabilistic top-k ranking for the objects, according to different state-of-the-art definitions. We conduct an experimental evaluation on synthetic and real data, which demonstrates the efficiency of our approach.
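
The object of the computation is the rank probability distribution. As background, here is the standard building block: if each other uncertain object independently lies closer to the reference than the instance under consideration with some probability, the number of closer objects is Poisson-binomial. The paper's contribution, computing these distributions incrementally in linear time, is not reproduced in this sketch.

```python
def rank_distribution(p_closer):
    # p_closer[i]: probability that uncertain object i lies closer to the
    # reference than the instance under consideration. dist[r] is then the
    # probability that exactly r objects are closer, i.e. that the instance
    # occupies ranking position r + 1.
    dist = [1.0]
    for p in p_closer:
        nxt = [0.0] * (len(dist) + 1)
        for r, mass in enumerate(dist):
            nxt[r] += mass * (1 - p)      # object i is not closer
            nxt[r + 1] += mass * p        # object i is closer
        dist = nxt
    return dist

print(rank_distribution([0.9, 0.5]))
# -> [0.05, 0.5, 0.45]: P(rank 1)=0.05, P(rank 2)=0.5, P(rank 3)=0.45
```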