
Showing papers by "Nikos Mamoulis published in 2009"


Proceedings ArticleDOI
29 Jun 2009
TL;DR: A new asymmetric scalar-product-preserving encryption (ASPE) that preserves a special type of scalar product is proposed; the secure schemes built on it are each shown to resist practical attacks of a different background-knowledge level, at a different overhead cost.
Abstract: Service providers like Google and Amazon are moving into the SaaS (Software as a Service) business. They turn their huge infrastructure into a cloud-computing environment and aggressively recruit businesses to run applications on their platforms. To enforce security and privacy on such a service model, we need to protect the data running on the platform. Unfortunately, traditional encryption methods that aim at providing "unbreakable" protection are often not adequate because they do not support the execution of applications such as database queries on the encrypted data. In this paper we discuss the general problem of secure computation on an encrypted database and propose a SCONEDB (Secure Computation ON an Encrypted DataBase) model, which captures the execution and security requirements. As a case study, we focus on the problem of k-nearest neighbor (kNN) computation on an encrypted database. We develop a new asymmetric scalar-product-preserving encryption (ASPE) that preserves a special type of scalar product. We use ASPE to construct two secure schemes that support kNN computation on encrypted data; each of these schemes is shown to resist practical attacks of a different background knowledge level, at a different overhead cost. Extensive performance studies are carried out to evaluate the overhead and the efficiency of the schemes.

801 citations
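
To make the distance-preservation idea concrete, here is a minimal numpy sketch of an asymmetric scalar-product-preserving transformation in the spirit of what the abstract describes: points are extended with -0.5|p|^2 and multiplied by M^T, queries are extended with a random positive scale and multiplied by M^{-1}. All names and parameters are our illustrative choices, and the sketch omits the hardening against the attacks the paper analyzes.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3                                    # data dimensionality

# Secret key: a random invertible (d+1) x (d+1) matrix M.
M = rng.standard_normal((d + 1, d + 1))
while abs(np.linalg.det(M)) < 1e-6:      # ensure invertibility
    M = rng.standard_normal((d + 1, d + 1))

def encrypt_point(p):
    # Extend p with -0.5*|p|^2, then multiply by M^T.
    p_hat = np.append(p, -0.5 * p.dot(p))
    return M.T @ p_hat

def encrypt_query(q):
    # Extend q with 1, scale by a fresh random r > 0, multiply by M^{-1}.
    r = rng.uniform(0.5, 2.0)
    q_hat = r * np.append(q, 1.0)
    return np.linalg.inv(M) @ q_hat

points = rng.standard_normal((10, d))
q = rng.standard_normal(d)
enc_points = [encrypt_point(p) for p in points]
enc_q = encrypt_query(q)

# E(p).E(q) = r*(p.q - 0.5|p|^2) = r*(0.5|q|^2 - 0.5|p-q|^2) with r > 0,
# so sorting encrypted scalar products descending reproduces the true
# distance order -- all a kNN server needs, without seeing the plaintext.
order_enc = np.argsort([-ep.dot(enc_q) for ep in enc_points])
order_true = np.argsort([np.linalg.norm(p - q) for p in points])
assert np.array_equal(order_enc, order_true)
```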


Proceedings ArticleDOI
29 Jun 2009
TL;DR: A dynamic indexing technique for skyline points that can be integrated into state-of-the-art sort-based skyline algorithms to boost their computational performance and scales well with the input size and dimensionality.
Abstract: The skyline operator returns from a set of multi-dimensional objects a subset of superior objects that are not dominated by others. This operation is considered very important in multi-objective analysis of large datasets. Although a large number of skyline methods have been proposed, the majority of them focus on minimizing the I/O cost. However, in high dimensional spaces, the problem can easily become CPU-bound due to the large number of computations required for comparing objects with current skyline points while scanning the database. Based on this observation, we propose a dynamic indexing technique for skyline points that can be integrated into state-of-the-art sort-based skyline algorithms to boost their computational performance. The new indexing and dominance checking approach is supported by a theoretical analysis, while our experiments show that it scales well with the input size and dimensionality, not only because unnecessary dominance checks are avoided but also because it allows efficient dominance checking with the help of bitwise operations.

142 citations
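
As a rough illustration of the setting, here is a generic sort-based (SFS-style) skyline scan where each dominance test folds the per-dimension comparisons into an integer bitmask. This is our sketch of the baseline the paper improves upon, not the paper's dynamic index for skyline points.

```python
def skyline(points):
    # Presort so no point can be dominated by a later one (a dominator
    # always has a strictly smaller coordinate sum), then scan once.
    dims = len(points[0])
    full = (1 << dims) - 1
    result = []
    for p in sorted(points, key=sum):
        dominated = False
        for s in result:
            # Bit i set iff s is no worse than p in dimension i
            # (minimization); s dominates p iff all bits set and s != p.
            mask = 0
            for i in range(dims):
                if s[i] <= p[i]:
                    mask |= 1 << i
            if mask == full and s != p:
                dominated = True
                break
        if not dominated:
            result.append(p)
    return result

print(skyline([(1, 9), (3, 3), (2, 8), (4, 4), (9, 1)]))
# -> [(3, 3), (1, 9), (2, 8), (9, 1)]  (order follows the presorting)
```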


Journal ArticleDOI
01 Jun 2009
TL;DR: An extensive study on the evaluation of top-k dominating queries, which proposes a set of algorithms that apply to indexed multi-dimensional data and investigates query evaluation on data that are not indexed.
Abstract: The top-k dominating query returns k data objects which dominate the highest number of objects in a dataset. This query is an important tool for decision support since it provides data analysts with an intuitive way of finding significant objects. In addition, it combines the advantages of top-k and skyline queries without sharing their disadvantages: (i) the output size can be controlled, (ii) no ranking functions need to be specified by users, and (iii) the result is independent of the scales at different dimensions. Despite their importance, top-k dominating queries have not received adequate attention from the research community. This paper is an extensive study on the evaluation of top-k dominating queries. First, we propose a set of algorithms that apply on indexed multi-dimensional data. Second, we investigate query evaluation on data that are not indexed. Finally, we study a relaxed variant of the query which considers dominance in dimensional subspaces. Experiments using synthetic and real datasets demonstrate that our algorithms significantly outperform a previous skyline-based approach. We also illustrate the applicability of this multi-dimensional analysis query by studying the meaningfulness of its results on real data.

111 citations
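
The query semantics fit in a few lines. The following is a naive O(n^2) reference implementation for clarity only; the paper's contribution is precisely avoiding this exhaustive counting through index-based and non-indexed algorithms.

```python
def dominates(a, b):
    # a dominates b (minimization): no worse in every dimension, better in one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def top_k_dominating(points, k):
    # Score each object by the number of objects it dominates.
    scored = [(sum(dominates(p, q) for q in points), p) for p in points]
    scored.sort(key=lambda t: -t[0])
    return scored[:k]

pts = [(1, 2), (2, 3), (3, 1), (4, 4), (2, 5)]
print(top_k_dominating(pts, 2))   # -> [(3, (1, 2)), (2, (2, 3))]
```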


Journal ArticleDOI
TL;DR: This article focuses on one-dimensional (i.e., single-attribute) quasi-identifiers, and studies the properties of optimal solutions under the k-anonymity and l-diversity models for the privacy-constrained and the accuracy-constrained anonymization problems.
Abstract: Recent research studied the problem of publishing microdata without revealing sensitive information, leading to the privacy-preserving paradigms of k-anonymity and l-diversity. k-anonymity protects against the identification of an individual's record. l-diversity, in addition, safeguards against the association of an individual with specific sensitive information. However, existing approaches suffer from at least one of the following drawbacks: (i) l-diversification is solved by techniques developed for the simpler k-anonymization problem, causing unnecessary information loss. (ii) The anonymization process is inefficient in terms of computational and I/O cost. (iii) Previous research focused exclusively on the privacy-constrained problem and ignored the equally important accuracy-constrained (or dual) anonymization problem. In this article, we propose a framework for efficient anonymization of microdata that addresses these deficiencies. First, we focus on one-dimensional (i.e., single-attribute) quasi-identifiers, and study the properties of optimal solutions under the k-anonymity and l-diversity models for the privacy-constrained (i.e., direct) and the accuracy-constrained (i.e., dual) anonymization problems. Guided by these properties, we develop efficient heuristics to solve the one-dimensional problems in linear time. Finally, we generalize our solutions to multidimensional quasi-identifiers using space-mapping techniques. Extensive experimental evaluation shows that our techniques clearly outperform the existing approaches in terms of execution time and information loss.

88 citations
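
For intuition about the one-dimensional privacy-constrained problem, here is a toy greedy k-anonymizer over a single attribute. The grouping rule (consecutive groups of at least k, generalized to ranges) is our simplification for illustration, not the paper's optimality-guided linear-time heuristic.

```python
def k_anonymize_1d(values, k):
    # Sort the quasi-identifier values and form consecutive groups of
    # size >= k; each group is generalized to its [min, max] range.
    vals = sorted(values)
    groups, i = [], 0
    while i < len(vals):
        j = i + k
        if len(vals) - j < k:       # too few left for another group
            j = len(vals)           # absorb the remainder
        groups.append((vals[i], vals[j - 1]))
        i = j
    return groups

ages = [21, 23, 24, 30, 31, 35, 36, 62]
print(k_anonymize_1d(ages, 3))
# -> [(21, 24), (30, 62)]: each group holds at least 3 records; wide
#    ranges like (30, 62) are the information loss a good split minimizes.
```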


Journal ArticleDOI
TL;DR: This work proposes adaptations of spatial access methods and search algorithms for probabilistic versions of range queries, nearest neighbors, spatial skylines, and reverse nearest neighbors, and conducts an extensive experimental study that evaluates the effectiveness of the proposed solutions.
Abstract: We study the problem of answering spatial queries in databases where objects exist with some uncertainty and they are associated with an existential probability. The goal of a thresholding probabilistic spatial query is to retrieve the objects that qualify the spatial predicates with probability that exceeds a threshold. Accordingly, a ranking probabilistic spatial query selects the objects with the highest probabilities to qualify the spatial predicates. We propose adaptations of spatial access methods and search algorithms for probabilistic versions of range queries, nearest neighbors, spatial skylines, and reverse nearest neighbors and conduct an extensive experimental study, which evaluates the effectiveness of proposed solutions.

77 citations
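
The thresholding and ranking variants differ only in how the qualifying probabilities are consumed. A minimal sketch over objects with known locations and existential probabilities, using a range predicate (the simplest case the abstract covers; the proposed access methods are not reproduced here):

```python
# objects: (id, (x, y), existential probability)
objects = [
    ("a", (1.0, 1.0), 0.9),
    ("b", (2.0, 2.0), 0.4),
    ("c", (1.5, 0.5), 0.7),
    ("d", (8.0, 8.0), 0.99),
]

def in_range(pos, lo, hi):
    return all(l <= c <= h for c, l, h in zip(pos, lo, hi))

def threshold_range(objs, lo, hi, t):
    # Thresholding version: objects inside the range whose existential
    # probability exceeds the threshold t.
    return [oid for oid, pos, p in objs if in_range(pos, lo, hi) and p > t]

def ranking_range(objs, lo, hi, m):
    # Ranking version: the m qualifying objects with the highest probability.
    hits = [(p, oid) for oid, pos, p in objs if in_range(pos, lo, hi)]
    return [oid for p, oid in sorted(hits, reverse=True)[:m]]

print(threshold_range(objects, (0, 0), (3, 3), 0.5))   # -> ['a', 'c']
print(ranking_range(objects, (0, 0), (3, 3), 2))       # -> ['a', 'c']
```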


Journal ArticleDOI
01 Aug 2009
TL;DR: This work addresses the integrity issue in the outsourcing process, i.e., how the data owner verifies the correctness of the mining results, by proposing and developing an audit environment that consists of a database transformation method and a result verification method.
Abstract: Finding frequent itemsets is the most costly task in association rule mining. Outsourcing this task to a service provider brings several benefits to the data owner, such as cost relief and less commitment to storage and computational resources. Mining results, however, can be corrupted if the service provider (i) is honest but makes mistakes in the mining process, or (ii) is lazy and reduces costly computation, returning incomplete results, or (iii) is malicious and contaminates the mining results. We address the integrity issue in the outsourcing process, i.e., how the data owner verifies the correctness of the mining results. For this purpose, we propose and develop an audit environment, which consists of a database transformation method and a result verification method. The main component of our audit environment is an artificial itemset planting (AIP) technique. We provide a theoretical foundation for our technique by proving its appropriateness and showing probabilistic guarantees about the correctness of the verification process. Through analytical and experimental studies, we show that our technique is both effective and efficient.

40 citations
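
The planting idea can be illustrated with a toy sketch: the owner injects artificial items at a support it controls, so any honest miner must report them accordingly. This is our simplification; the paper's AIP technique, its database transformation, and its probabilistic guarantees are substantially more involved.

```python
import random

def plant(transactions, itemset, target_support):
    # Copy the DB and insert an artificial itemset (items that, after the
    # database transformation, the miner cannot tell apart from real ones)
    # into an owner-chosen fraction of transactions.
    db = [set(t) for t in transactions]
    for t in random.sample(db, round(target_support * len(db))):
        t |= itemset
    return db

def verify(reported_supports, itemset, target_support, tol=0.02):
    # The owner knows the planted support exactly, so a missing or
    # misreported planted itemset exposes a faulty/lazy/malicious miner.
    got = reported_supports.get(itemset, 0.0)
    return abs(got - target_support) <= tol

txns = [{"milk", "bread"}, {"beer", "chips"}, {"milk"}] * 100
fake = frozenset({"#x1", "#x2"})
db = plant(txns, fake, 0.30)
true_support = sum(fake <= t for t in db) / len(db)
print(verify({fake: true_support}, fake, 0.30))   # honest miner -> True
print(verify({}, fake, 0.30))                     # lazy miner   -> False
```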


Proceedings ArticleDOI
18 May 2009
TL;DR: It is shown that minimizing the communication cost for multi-predicate queries is NP-hard; a dynamic programming algorithm is proposed to compute the optimal solution for small problem instances, and the proposed low-complexity heuristic algorithm is shown to be scalable and robust to different query characteristics and network sizes.
Abstract: This work aims at minimizing the cost of answering snapshot multi-predicate queries in high-communication-cost networks. High-communication-cost (HCC) networks are a family of networks where communicating data is very demanding in resources; for example, in wireless sensor networks, transmitting data drains the battery life of the sensors involved. The important class of multi-predicate queries in horizontally or vertically distributed databases is addressed. We show that minimizing the communication cost for multi-predicate queries is NP-hard and we propose a dynamic programming algorithm to compute the optimal solution for small problem instances. We also propose a low-complexity, approximate, heuristic algorithm that solves larger problem instances efficiently and can run on nodes with low computational power (e.g., sensors). Finally, we present a variant of the Fermat point problem where distances between points are minimal paths in a weighted graph, and propose a solution. An extensive experimental evaluation compares the proposed algorithms to the best known technique used to evaluate queries in wireless sensor networks and shows improvements of 10% up to 95%. The low-complexity heuristic algorithm is also shown to be scalable and robust to different query characteristics and network size.

21 citations
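
The Fermat-point variant mentioned in the abstract is easy to state: among all graph nodes, find the one minimizing the total shortest-path distance to a set of terminals (a 1-median under graph distances). A straightforward sketch using one Dijkstra run per terminal; the paper's actual solution and cost model are not reproduced here.

```python
import heapq

def dijkstra(adj, src):
    # Single-source shortest paths over a weighted adjacency dict.
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def graph_fermat_point(adj, terminals):
    # Node minimizing the sum of shortest-path distances to all terminals.
    dists = [dijkstra(adj, t) for t in terminals]
    return min(adj, key=lambda v: sum(d.get(v, float("inf")) for d in dists))

adj = {
    "a": [("b", 1), ("c", 4)],
    "b": [("a", 1), ("c", 1), ("d", 5)],
    "c": [("a", 4), ("b", 1), ("d", 1)],
    "d": [("b", 5), ("c", 1)],
}
print(graph_fermat_point(adj, ["a", "d", "b"]))   # -> 'b'
```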


Journal ArticleDOI
TL;DR: This work devises protocols for ‘in-network’ evaluation of spatial join queries, aiming at the minimization of power consumption, and develops cost models that suggest the appropriateness of each protocol based on various factors, including selectivity of query elements, energy requirements for sensing, and network topology.
Abstract: We study the continuous evaluation of spatial join queries and extensions thereof, defined by interesting combinations of sensor readings (events) that co-occur in a spatial neighborhood. An example of such a pattern is "a high temperature reading in the vicinity of at least four high-pressure readings". We devise protocols for "in-network" evaluation of this class of queries, aiming at the minimization of power consumption. In addition, we develop cost models that suggest the appropriateness of each protocol, based on various factors, including selectivity of query elements, energy requirements for sensing, and network topology. Finally, we experimentally compare the effectiveness of the proposed solutions on an experimental platform that emulates real sensor networks.

12 citations
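
To fix the query class, here is a centralized reference implementation of the example pattern from the abstract ("a high temperature reading in the vicinity of at least four high-pressure readings"). The thresholds and field names are illustrative; the paper's subject is evaluating such patterns in-network to save power, which this sketch deliberately ignores.

```python
def spatial_join_pattern(readings, radius, min_pressure_neighbors=4,
                         temp_high=35.0, pressure_high=1020.0):
    # Return IDs of high-temperature sensors with at least
    # min_pressure_neighbors high-pressure readings within the radius.
    def close(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= radius ** 2
    hot = [r for r in readings if r["temp"] >= temp_high]
    high_p = [r for r in readings if r["pressure"] >= pressure_high]
    return [h["id"] for h in hot
            if sum(close(h["pos"], p["pos"]) for p in high_p
                   if p["id"] != h["id"]) >= min_pressure_neighbors]

readings = [
    {"id": 1, "pos": (0, 0), "temp": 40, "pressure": 1000},
    {"id": 2, "pos": (1, 0), "temp": 20, "pressure": 1025},
    {"id": 3, "pos": (0, 1), "temp": 21, "pressure": 1030},
    {"id": 4, "pos": (1, 1), "temp": 22, "pressure": 1021},
    {"id": 5, "pos": (0, 2), "temp": 23, "pressure": 1022},
]
print(spatial_join_pattern(readings, radius=2.5))   # -> [1]
```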


Proceedings ArticleDOI
29 Mar 2009
TL;DR: The algorithm is an iterative process, which finds at each step the query-object pair with the highest score and removes it from the problem; this is done efficiently by maintaining and matching the skyline of the remaining objects with the remaining queries at each step.
Abstract: Consider multiple users searching for a hotel room, based on size, cost, distance to the beach, etc. Users may have variable preferences expressed by different weights on the attributes of the searched objects. Although individual preference queries can be evaluated by selecting the object in the database with the highest aggregate score, in the case of multiple requests at the same time, a single object cannot be assigned to more than one user. The challenge is to compute a fair 1-1 matching between the queries and a subset of the objects. We model this as a stable-marriage problem and propose an efficient technique for its evaluation. Our algorithm is an iterative process, which finds at each step the query-object pair with the highest score and removes it from the problem. This is done efficiently by maintaining and matching the skyline of the remaining objects with the remaining queries at each step. An experimental evaluation with synthetic and real data confirms the effectiveness of our method.

10 citations
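
A minimal sketch of the iterative process the TL;DR describes: when every user ranks objects by a weighted-sum score and there are no ties, repeatedly extracting the globally best (query, object) pair and removing both yields a stable matching (any blocking pair would have been extracted earlier). The skyline-based maintenance that makes this efficient is not reproduced here; all names are ours.

```python
def match(queries, objects):
    # queries: {query_id: weight vector}; objects: list of attribute tuples.
    pairs = []
    Q, O = list(queries.items()), list(objects)
    while Q and O:
        # Globally best remaining pair by weighted-sum score.
        score, qid, obj = max(
            (sum(w * v for w, v in zip(weights, o)), q, o)
            for q, weights in Q for o in O)
        pairs.append((qid, obj, score))
        Q = [(q, w) for q, w in Q if q != qid]   # query is served
        O = [o for o in O if o != obj]           # object is taken
    return pairs

queries = {"u1": (0.8, 0.2), "u2": (0.3, 0.7)}   # preference weights
objects = [(0.9, 0.1), (0.2, 0.9), (0.5, 0.5)]   # normalized attributes
print(match(queries, objects))
# -> [('u1', (0.9, 0.1), 0.74), ('u2', (0.2, 0.9), 0.69)]
```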


Journal ArticleDOI
01 Aug 2009
TL;DR: At its core lies a novel skyline maintenance technique, which is proved to be I/O optimal and outperforms adaptations of previous methods by several orders of magnitude.
Abstract: Consider an internship assignment system, where at the end of each academic year, interested university students search and apply for available positions, based on their preferences (e.g., nature of the job, salary, office location, etc). In a variety of facility, task or position assignment contexts, users have personal preferences expressed by different weights on the attributes of the searched objects. Although individual preference queries can be evaluated by selecting the object in the database with the highest aggregate score, in the case of multiple simultaneous requests, a single object cannot be assigned to more than one user. The challenge is to compute a fair 1-1 matching between the queries and the objects. We model this as a stable-marriage problem and propose an efficient method for its processing. Our algorithm iteratively finds stable query-object pairs and removes them from the problem. At its core lies a novel skyline maintenance technique, which we prove to be I/O optimal. We conduct an extensive experimental evaluation using real and synthetic data, which demonstrates that our approach outperforms adaptations of previous methods by several orders of magnitude.

10 citations



01 Jan 2009
TL;DR: Unexpected improvements in gas and vapor transport through the electrode are realized by incorporating a new dispersion process in the construction, reformulating the applied mix with solution additives, and creating a novel coating structure on a conductive web.
Abstract: Gas Diffusion Electrodes (GDEs) play a pivotal role in clean energy production as well as in electrochemical processes and sensors. These gas-consuming electrodes are typically designed for liquid electrolyte systems such as phosphoric acid and alkaline fuel cells, and are commercially manufactured by hand or in a batch process. However, GDEs using new electrolytes such as conductive polymer membranes demand improved electrode structures. This invention pertains to GDEs and gas diffusion media with new structures for systems using membrane electrode assemblies (MEAs), and automated methods of manufacture that lend themselves to continuous mass production. Unexpected improvements in gas and vapor transport through the electrode are realized by incorporating a new dispersion process in the construction, reformulating the applied mix with solution additives, and creating a novel coating structure on a conductive web. Furthermore, combining these changes with a judicious choice in coating methodology allows one to produce these materials in a continuous, automated fashion.

01 Jan 2009
TL;DR: This work develops preprocessing and indexing methods for phrases, paired with new search techniques for the top-k most interesting phrases on ad-hoc subsets of the corpus, and investigates alternative definitions of phrase interestingness, based on the probability of phrase occurrences.
Abstract: Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize the analysis of interesting phrases. These include named entities, important quotations, market slogans, and other multi-word phrases that are prominent in a dynamically derived ad-hoc subset of the corpus, e.g., being frequent in the subset but relatively infrequent in the overall corpus. The ad-hoc subset may be derived by means of a keyword query against the corpus, or by focusing on a particular time period. We investigate alternative definitions of phrase interestingness, based on the probability of phrase occurrences. We develop preprocessing and indexing methods for phrases, paired with new search techniques for the top-k most interesting phrases on ad-hoc subsets of the corpus. Our framework is evaluated using a large-scale real-world corpus of New York Times news articles.
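
One plausible instantiation of phrase interestingness from the family the abstract describes ("frequent in the subset but relatively infrequent in the overall corpus") is a simple frequency lift over n-grams. This is our illustrative choice; the paper studies probability-based definitions together with preprocessing, indexing, and top-k search techniques that this sketch does not capture.

```python
from collections import Counter

def top_k_interesting(subset_docs, corpus_docs, k=3, n=2):
    # Rank n-grams of the ad-hoc subset by relative-frequency lift
    # against the overall corpus (which must contain the subset).
    def ngrams(docs):
        c = Counter()
        for doc in docs:
            toks = doc.lower().split()
            c.update(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
        return c
    sub, full = ngrams(subset_docs), ngrams(corpus_docs)
    total_sub, total_full = sum(sub.values()), sum(full.values())
    def lift(g):
        return (sub[g] / total_sub) / (full[g] / total_full)
    return sorted(sub, key=lift, reverse=True)[:k]

subset = ["the credit crunch hits banks", "credit crunch fears grow"]
corpus = subset + ["the cat sat on the mat", "banks open on monday"] * 5
print(top_k_interesting(subset, corpus, k=2))
```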

BookDOI
TL;DR: Proceedings volume collecting keynotes, research papers, short papers, and demonstrations on spatial and spatio-temporal databases, on topics ranging from learning about humans with social media to the Geospatial Semantic Web.
Abstract: Table of contents.
Keynotes:
- Spatio-Tempo-Social: Learning from and about Humans with Social Media
- Recent Advances in Worst-Case Efficient Range Search Indexing
- Design and Architecture of GIS Servers for Web Based Information Systems - The ArcGIS Server System
Research Sessions:
- Versioning of Network Models in a Multiuser Environment
- Efficient Continuous Nearest Neighbor Query in Spatial Networks Using Euclidean Restriction
- Discovering Teleconnected Flow Anomalies: A Relationship Analysis of Dynamic Neighborhoods (RAD) Approach
- Continuous Spatial Authentication
- Query Integrity Assurance of Location-Based Services Accessing Outsourced Spatial Databases
- A Hybrid Technique for Private Location-Based Queries with Database Protection
- Spatial Cloaking Revisited: Distinguishing Information Leakage from Anonymity
- Analyzing Trajectories Using Uncertainty and Background Information
- Route Search over Probabilistic Geospatial Data
- Utilizing Wireless Positioning as a Tracking Data Source
- Indexing Moving Objects Using Short-Lived Throwaway Indexes
- Indexing the Trajectories of Moving Objects in Symbolic Indoor Space
- Monitoring Orientation of Moving Objects around Focal Points
- Spatial Skyline Queries: An Efficient Geometric Algorithm
- Incremental Reverse Nearest Neighbor Ranking in Vector Spaces
- Approximate Evaluation of Range Nearest Neighbor Queries with Quality Guarantee
- Time-Aware Similarity Search: A Metric-Temporal Representation for Complex Data
- Adaptive Management of Multigranular Spatio-Temporal Object Attributes
- TOQL: Temporal Ontology Querying Language
- Supporting Frameworks for the Geospatial Semantic Web
Short Papers:
- Efficient Construction of Safe Regions for Moving kNN Queries over Dynamic Datasets
- Robust Adaptable Video Copy Detection
- Efficient Evaluation of Static and Dynamic Optimal Route Queries
- Trajectory Compression under Network Constraints
- Exploring Spatio-Temporal Features for Traffic Estimation on Road Networks
- A Location Privacy Aware Friend Locator
- Semantic Trajectory Compression
Demonstrations:
- Pretty Easy Pervasive Positioning
- Spatiotemporal Pattern Queries in Secondo
- Nearest Neighbor Search on Moving Object Trajectories in Secondo
- A Visual Analytics Toolkit for Cluster-Based Classification of Mobility Data
- ELKI in Time: ELKI 0.2 for the Performance Evaluation of Distance Measures for Time Series
- Hide & Crypt: Protecting Privacy in Proximity-Based Services
- ROOTS, The ROving Objects Trip Simulator
- The TOQL System
- PDA: A Flexible and Efficient Personal Decision Assistant
- A Refined Mobile Map Format and Its Application


Proceedings ArticleDOI
18 May 2009
TL;DR: A novel query called the thresholded range aggregate query (TRA) is proposed, which retrieves the IDs of the sensors for which the average measurement in their neighborhood exceeds a user-given threshold, providing results that are robust against individual sensor abnormality and yet precisely summarize the sensors' status in each local region.
Abstract: The recent advances in wireless sensor technologies (e.g., Mica, Telos motes) enable the economic deployment of lightweight sensors for capturing data from their surrounding environment, serving various monitoring tasks, such as forest wildfire alarming and volcano activity monitoring. We propose a novel query called the thresholded range aggregate query (TRA), which retrieves the IDs of the sensors for which the average measurement in their neighborhood exceeds a user-given threshold. This query provides results that are robust against individual sensor abnormality, and yet precisely summarize the sensors' status in each local region. In order to process the (snapshot) TRA query, we develop energy-efficient protocols based on appropriate operators and filters in sensor nodes. The design of these operators and filters is non-trivial, due to the fact that each sensor measurement influences the actual results of other nodes in its neighborhood region. Furthermore, we extend our protocols for continuous evaluation of the TRA query. Experimental results show that our proposed solutions indeed offer substantial energy savings for both real and synthetic sensor networks.
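
The TRA semantics can be written down directly. Below is a centralized reference implementation; the paper's point is computing this in-network with operators and filters, which is not shown here, and the field names are illustrative.

```python
def tra_query(sensors, radius, threshold):
    # Thresholded range aggregate (TRA): return the IDs of sensors whose
    # neighborhood average measurement exceeds the threshold.
    result = []
    for s in sensors:
        vals = [t["value"] for t in sensors
                if (t["x"] - s["x"]) ** 2 + (t["y"] - s["y"]) ** 2 <= radius ** 2]
        if sum(vals) / len(vals) > threshold:   # neighborhood includes s itself
            result.append(s["id"])
    return result

sensors = [{"id": i, "x": float(i), "y": 0.0, "value": v}
           for i, v in enumerate([30, 32, 31, 28, 5])]
print(tra_query(sensors, radius=1.5, threshold=25))   # -> [0, 1, 2]
```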

Book ChapterDOI
16 Mar 2009
TL;DR: The fragment join operator is developed -- a general operator that merges two XML fragments based on their overlapping components; schema-independent query processing over multiple data sources is defined, and a novel framework is proposed to solve this problem.
Abstract: We study the problem of answering XML queries over multiple data sources under a schema-independent scenario where XML schemas and schema mappings are unavailable. We develop the fragment join operator -- a general operator that merges two XML fragments based on their overlapping components. We formally define the operator and propose an efficient algorithm for implementing it. We define schema-independent query processing over multiple data sources and propose a novel framework to solve this problem. We provide theoretical analysis and experimental results that show that our approaches are both effective and efficient.
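
As rough intuition for merging fragments on overlapping components, here is a toy merge over trees modeled as nested dicts. This dict model and the conflict rule are our simplifications for illustration and do not reflect the paper's formal operator or its algorithm.

```python
def fragment_join(a, b):
    # Merge two tree fragments (dict-of-dicts) on their overlapping
    # components: shared keys are merged recursively, disjoint subtrees
    # are kept from both sides.
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if a == b else [a, b]   # conflicting leaves: keep both
    merged = dict(a)
    for k, v in b.items():
        merged[k] = fragment_join(a[k], v) if k in a else v
    return merged

src1 = {"book": {"title": "SSTD 2009", "year": "2009"}}
src2 = {"book": {"title": "SSTD 2009", "publisher": "Springer"}}
print(fragment_join(src1, src2))
# -> {'book': {'title': 'SSTD 2009', 'year': '2009', 'publisher': 'Springer'}}
```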

Posted Content
TL;DR: A scalable approach for probabilistic top-k similarity ranking on uncertain vector data that reduces the quadratic-time dynamic programming computation of rank probabilities to linear time with the same memory requirements, facilitated by incremental accessing of the uncertain vector instances in increasing order of their distance to a reference object.
Abstract: This paper introduces a scalable approach for probabilistic top-k similarity ranking on uncertain vector data. Each uncertain object is represented by a set of vector instances that are assumed to be mutually-exclusive. The objective is to rank the uncertain data according to their distance to a reference object. We propose a framework that incrementally computes for each object instance and ranking position, the probability of the object falling at that ranking position. The resulting rank probability distribution can serve as input for several state-of-the-art probabilistic ranking models. Existing approaches compute this probability distribution by applying a dynamic programming approach of quadratic complexity. In this paper we theoretically as well as experimentally show that our framework reduces this to a linear-time complexity while having the same memory requirements, facilitated by incremental accessing of the uncertain vector instances in increasing order of their distance to the reference object. Furthermore, we show how the output of our method can be used to apply probabilistic top-k ranking for the objects, according to different state-of-the-art definitions. We conduct an experimental evaluation on synthetic and real data, which demonstrates the efficiency of our approach.
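
The object of the computation is the rank probability distribution. As background, here is the standard building block: if each other uncertain object independently lies closer to the reference than the instance under consideration with some probability, the number of closer objects is Poisson-binomial. The paper's contribution, computing these distributions incrementally in linear time, is not reproduced in this sketch.

```python
def rank_distribution(p_closer):
    # p_closer[i]: probability that uncertain object i lies closer to the
    # reference than the instance under consideration. dist[r] is then the
    # probability that exactly r objects are closer, i.e. that the instance
    # occupies ranking position r + 1.
    dist = [1.0]
    for p in p_closer:
        nxt = [0.0] * (len(dist) + 1)
        for r, mass in enumerate(dist):
            nxt[r] += mass * (1 - p)      # object i is not closer
            nxt[r + 1] += mass * p        # object i is closer
        dist = nxt
    return dist

print(rank_distribution([0.9, 0.5]))
# -> [0.05, 0.5, 0.45]: P(rank 1)=0.05, P(rank 2)=0.5, P(rank 3)=0.45
```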