
Showing papers by "Nikos Mamoulis" published in 2010


Proceedings ArticleDOI
06 Jun 2010
TL;DR: This work provides a methodology for verifying whether a non-homogeneous generalization violates k-anonymity, and proposes a randomization method that prevents attacks by adversaries who know the anonymization algorithm, showing that k-anonymity is not compromised by it.
Abstract: Most previous research on privacy-preserving data publishing, based on the k-anonymity model, has followed the simplistic approach of homogeneously giving the same generalized value in all quasi-identifiers within a partition. We observe that the anonymization error can be reduced if we follow a non-homogeneous generalization approach for groups of size larger than k. Such an approach would allow tuples within a partition to take different generalized quasi-identifier values. Anonymization following this model is not trivial, as its direct application can easily violate k-anonymity. In addition, non-homogeneous generalization allows for additional types of attack, which should be considered in the process. We provide a methodology for verifying whether a non-homogeneous generalization violates k-anonymity. Then, we propose a technique that generates a non-homogeneous generalization for a partition and show that its result satisfies k-anonymity; however, if the technique is applied straightforwardly, privacy can be compromised when the attacker knows the anonymization algorithm. Based on this, we propose a randomization method that prevents this type of attack and show that k-anonymity is not compromised by it. Non-homogeneous generalization can be used on top of any existing partitioning approach to improve its utility. In addition, we show that a new partitioning technique tailored for non-homogeneous generalization can further improve quality. A thorough experimental evaluation demonstrates that our methodology greatly improves the utility of anonymized data in practice.
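As a point of reference only (not the paper's non-homogeneous technique), the sketch below shows the baseline homogeneous k-anonymity check that the abstract contrasts against: tuples sharing the same generalized quasi-identifier values are grouped, and every group must contain at least k records. The table, attribute names, and generalized values are illustrative.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Check the homogeneous k-anonymity condition: every combination of
    (generalized) quasi-identifier values must occur in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Example: ages generalized to ranges, ZIP codes truncated (made-up data).
table = [
    {"age": "20-30", "zip": "104**"},
    {"age": "20-30", "zip": "104**"},
    {"age": "30-40", "zip": "537**"},
    {"age": "30-40", "zip": "537**"},
]
print(is_k_anonymous(table, ["age", "zip"], k=2))  # True
```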

76 citations


Journal ArticleDOI
TL;DR: A scalable approach for probabilistic top-k similarity ranking on uncertain vector data that reduces the computation of rank probabilities from quadratic to linear time, with the same memory requirements, by incrementally accessing the uncertain vector instances in increasing order of their distance to a reference object.
Abstract: This paper introduces a scalable approach for probabilistic top-k similarity ranking on uncertain vector data. Each uncertain object is represented by a set of vector instances that are assumed to be mutually exclusive. The objective is to rank the uncertain data according to their distance to a reference object. We propose a framework that incrementally computes, for each object instance and ranking position, the probability of the object falling at that ranking position. The resulting rank probability distribution can serve as input for several state-of-the-art probabilistic ranking models. Existing approaches compute this probability distribution by applying the Poisson binomial recurrence technique of quadratic complexity. In this paper, we show, both theoretically and experimentally, that our framework reduces this to linear-time complexity while having the same memory requirements, facilitated by incremental accessing of the uncertain vector instances in increasing order of their distance to the reference object. Furthermore, we show how the output of our method can be used to apply probabilistic top-k ranking for the objects, according to different state-of-the-art definitions. We conduct an experimental evaluation on synthetic and real data, which demonstrates the efficiency of our approach.
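The abstract names the Poisson binomial recurrence as the existing quadratic-time way to obtain the rank probability distribution; the sketch below implements that standard recurrence (not the paper's linear-time framework). The interpretation of the input probabilities as "each closer instance beats the current object" and the numbers used are illustrative assumptions.

```python
def poisson_binomial(probs):
    """Probability distribution of the number of successes among independent
    Bernoulli trials with success probabilities `probs` (quadratic recurrence)."""
    dist = [1.0]  # dist[j] = P(exactly j successes among the trials seen so far)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1.0 - p)   # trial fails: success count stays at j
            new[j + 1] += q * p       # trial succeeds: count becomes j + 1
        dist = new
    return dist

# P(an object lands at ranking position j+1) if the three closer objects
# "beat" it with probabilities 0.9, 0.5 and 0.2 (illustrative values).
print(poisson_binomial([0.9, 0.5, 0.2]))  # [0.04, 0.41, 0.46, 0.09]
```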

41 citations


Journal ArticleDOI
TL;DR: Efficient algorithms for optimal assignment are proposed that employ novel edge-pruning strategies based on the spatial properties of the problem, together with approximate solutions that provide a tunable trade-off between result accuracy and computation cost while abiding by theoretical quality guarantees.
Abstract: Consider a set of customers (e.g., WiFi receivers) and a set of service providers (e.g., wireless access points), where each provider has a capacity and the quality of service offered to its customers is inversely proportional to their distance. The Capacity Constrained Assignment (CCA) is a matching between the two sets such that (i) each customer is assigned to at most one provider, (ii) every provider serves no more customers than its capacity, (iii) the maximum possible number of customers are served, and (iv) the sum of Euclidean distances within the assigned provider-customer pairs is minimized. Although max-flow algorithms are applicable to this problem, they require the complete distance-based bipartite graph between the customer and provider sets. For large spatial datasets, this graph is expensive to compute and it may be too large to fit in main memory. Motivated by this fact, we propose efficient algorithms for optimal assignment that employ novel edge-pruning strategies, based on the spatial properties of the problem. Additionally, we develop incremental techniques that maintain an optimal assignment (in the presence of updates) with a processing cost several times lower than CCA recomputation from scratch. Finally, we present approximate (i.e., suboptimal) CCA solutions that provide a tunable trade-off between result accuracy and computation cost, abiding by theoretical quality guarantees. A thorough experimental evaluation demonstrates the efficiency and practicality of the proposed techniques.
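As the abstract notes, max-flow algorithms apply once the complete bipartite distance graph is materialized; the sketch below shows that baseline min-cost max-flow formulation (not the paper's edge-pruning, incremental, or approximate algorithms) using networkx. Coordinates, capacities, and the scaling factor are illustrative, and distances are rounded because the network simplex solver expects integral weights.

```python
import math
import networkx as nx

providers = {"p1": {"pos": (0, 0), "cap": 2}, "p2": {"pos": (5, 5), "cap": 1}}
customers = {"c1": (1, 0), "c2": (0, 2), "c3": (4, 5)}

G = nx.DiGraph()
for p, info in providers.items():
    G.add_edge("source", p, capacity=info["cap"], weight=0)
for c, cpos in customers.items():
    G.add_edge(c, "sink", capacity=1, weight=0)
    for p, info in providers.items():
        d = math.dist(info["pos"], cpos)
        # Scale and round: min-cost flow in networkx expects integer weights.
        G.add_edge(p, c, capacity=1, weight=round(d * 1000))

flow = nx.max_flow_min_cost(G, "source", "sink")
assignment = {c: p for p in providers for c in customers if flow[p].get(c, 0) > 0}
print(assignment)  # e.g. {'c1': 'p1', 'c2': 'p1', 'c3': 'p2'}
```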

36 citations


Proceedings ArticleDOI
06 Jun 2010
TL;DR: A new ranking problem, durable top-k search, is proposed for databases of versioned objects that have different valid instances along a history; the proposed solutions include a technique based on a shared execution paradigm that is more efficient than an NRA-based adaptation, as well as a special indexing technique for archived data.
Abstract: We propose and study a new ranking problem in versioned databases. Consider a database of versioned objects which have different valid instances along a history (e.g., documents in a web archive). Durable top-k search finds the set of objects that are consistently in the top-k results of a query (e.g., a keyword query) throughout a given time interval (e.g., from June 2008 to May 2009). Existing work on temporal top-k queries mainly focuses on finding the most representative top-k elements within a time interval. Such methods are not readily applicable to durable top-k queries. To address this need, we propose two techniques that compute the durable top-k result. The first is adapted from the classic top-k rank aggregation algorithm NRA. The second technique is based on a shared execution paradigm and is more efficient than the first approach. In addition, we propose a special indexing technique for archived data. The index, coupled with a space partitioning technique, improves performance even further. We use data from Wikipedia and the Internet Archive to demonstrate the efficiency and effectiveness of our solutions.
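To make the problem definition concrete, here is a brute-force baseline (not the NRA-based or shared-execution methods proposed in the paper): compute the top-k at every version in the query interval and keep only the objects present in all of them. The scores and object ids are made up.

```python
def durable_topk(scores_per_version, k):
    """scores_per_version: list of {object_id: score} dicts, one per version in
    the query interval. Returns the objects in the top-k of every version."""
    durable = None
    for scores in scores_per_version:
        topk = set(sorted(scores, key=scores.get, reverse=True)[:k])
        durable = topk if durable is None else durable & topk
    return durable or set()

versions = [
    {"a": 0.9, "b": 0.8, "c": 0.3},
    {"a": 0.7, "b": 0.9, "c": 0.6},
    {"a": 0.8, "c": 0.9, "b": 0.5},
]
print(durable_topk(versions, k=2))  # {'a'}: in the top-2 of all three versions
```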

31 citations


Journal ArticleDOI
01 Sep 2010
TL;DR: This work develops preprocessing and indexing methods for phrases, paired with new search techniques for the top-k most interesting phrases in ad-hoc subsets of a corpus, and evaluates them on New York Times news articles.
Abstract: Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize the analysis of interesting phrases. These include named entities, important quotations, market slogans, and other multi-word phrases that are prominent in a dynamically derived ad-hoc subset of the corpus, e.g., being frequent in the subset but relatively infrequent in the overall corpus. We develop preprocessing and indexing methods for phrases, paired with new search techniques for the top-k most interesting phrases in ad-hoc subsets of the corpus. Our framework is evaluated using a large-scale real-world corpus of New York Times news articles.
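One plausible, simplified reading of "interesting" as described above is a relative-frequency ratio: frequent in the ad-hoc subset, relatively infrequent in the overall corpus. The sketch below scores phrases that way; it is not the paper's indexing framework, and the scoring function and data are assumptions.

```python
from collections import Counter

def top_interesting_phrases(subset_docs, corpus_docs, k=5):
    """Rank phrases by relative frequency in the subset vs. the full corpus.
    Assumes subset documents come from the corpus, so every subset phrase
    also has a nonzero corpus count."""
    subset = Counter(p for doc in subset_docs for p in doc)
    corpus = Counter(p for doc in corpus_docs for p in doc)
    total_sub, total_all = sum(subset.values()), sum(corpus.values())
    score = {p: (subset[p] / total_sub) / (corpus[p] / total_all) for p in subset}
    return sorted(score, key=score.get, reverse=True)[:k]

# Documents are pre-tokenized into candidate phrases (illustrative data).
corpus = [["stock market", "interest rate"], ["credit crunch", "stock market"],
          ["world cup", "stock market"]]
subset = [["credit crunch", "stock market"], ["credit crunch", "interest rate"]]
print(top_interesting_phrases(subset, corpus, k=2))
```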

30 citations


Journal ArticleDOI
01 Sep 2010
TL;DR: Two new methods for skyline evaluation in multidimensional data with partially ordered attribute domains are proposed, one inspired by the lattice theorem combined with an off-the-shelf skyline algorithm and one by column stores; both are up to an order of magnitude more efficient than previous work and scale well with different problem parameters.
Abstract: Although there has been a considerable body of work on skyline evaluation in multidimensional data with totally ordered attribute domains, there are only a few methods that consider attributes with partially ordered domains. Existing work maps each partially ordered domain to a total order and then adapts algorithms for totally ordered domains to solve the problem. Nevertheless, these methods either use stronger notions of dominance, which generate false positives, or require expensive dominance checks. In this paper, we propose two new methods which do not have these drawbacks. The first method uses an appropriate mapping of a partial order to a total order, inspired by the lattice theorem, together with an off-the-shelf skyline algorithm. The second technique uses an appropriate storage and indexing approach, inspired by column stores, which enables efficient verification of whether a pair of objects is incomparable. We demonstrate that both our methods are up to an order of magnitude more efficient than previous work and scale well with different problem parameters, such as the complexity of the partial orders.
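Neither of the paper's two methods is shown here; the sketch below is only a plain block-nested-loops skyline with an explicit dominance test over partially ordered attribute domains, to make the setting (and the role of incomparable values) concrete. The attributes and partial orders are illustrative.

```python
def dominates(a, b, better):
    """a dominates b if it is at least as good on every attribute (equal or
    strictly better under that attribute's partial order) and strictly better
    on at least one; incomparable value pairs never contribute to dominance."""
    strictly = False
    for i, rel in enumerate(better):
        if a[i] == b[i]:
            continue
        if (a[i], b[i]) in rel:
            strictly = True
        else:
            return False  # b is better here, or the two values are incomparable
    return strictly

def skyline(points, better):
    return [p for p in points
            if not any(dominates(q, p, better) for q in points if q != p)]

# Attribute 0: price (total order, lower is better). Attribute 1: brand
# preference, a partial order where only 'A' beats 'C'; 'B' is incomparable.
price_better = {(x, y) for x in range(1, 6) for y in range(1, 6) if x < y}
brand_better = {("A", "C")}
hotels = [(1, "C"), (2, "A"), (2, "B"), (3, "C")]
print(skyline(hotels, [price_better, brand_better]))  # drops only (3, 'C')
```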

25 citations


Book ChapterDOI
13 Dec 2010
TL;DR: This paper proposes a new AC-OT construction, secure in the standard model, that supports policies in disjunctive form directly, without the duplication issue of the previous construction.
Abstract: Oblivious Transfer with Access Control (AC-OT) is a protocol which allows a user to obtain a database record with a credential satisfying the access policy of the record, while the database server learns nothing about the record or the credential. The only AC-OT construction that supports policies in disjunctive form requires duplication of records in the database, each with a different conjunction of attributes (representing one possible criterion for accessing the record). In this paper, we propose a new AC-OT construction secure in the standard model. It supports policies in disjunctive form directly, without the above duplication issue. Due to the duplication issue in the previous construction, the size of an encrypted record is O(∏_{i=1}^{t} n_i) for a CNF policy (A_{1,1} ∨ ... ∨ A_{1,n_1}) ∧ ... ∧ (A_{t,1} ∨ ... ∨ A_{t,n_t}), and O(C(n, k)) (n choose k) for a k-of-n threshold gate. In our construction, the encrypted record size is reduced to O(∑_{i=1}^{t} n_i) for the CNF case and O(n) for the threshold case.
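A small worked illustration of the size reduction stated above, comparing the product of clause sizes (record duplication in the previous construction) with their sum (the proposed construction); the clause sizes are made up.

```python
from math import prod

# CNF policy with clause sizes n_1..n_t, e.g. (3 attributes OR'd) AND (4) AND (2).
clause_sizes = [3, 4, 2]

duplication_cost = prod(clause_sizes)  # previous construction: one copy of the
                                       # record per conjunction of attributes = 24
new_cost = sum(clause_sizes)           # proposed construction: 3 + 4 + 2 = 9
print(duplication_cost, new_cost)
```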

23 citations


Proceedings ArticleDOI
23 May 2010
TL;DR: To the authors' knowledge, this is the first optimal and distributed algorithm to solve the 1-median (Fermat node) problem; it saves 30%-85% of the energy compared to previously proposed techniques.
Abstract: We present an optimal distributed algorithm to adapt the placement of a single operator in high communication cost networks, such as a wireless sensor network. Our parameter-free algorithm finds the optimal node to host the operator with minimum communication cost overhead. Three techniques, proposed here, make this feature possible: 1) identifying the special, and most frequent, case where no flooding is needed; otherwise, 2) limiting the neighborhood to be flooded; and 3) variable-speed flooding and eavesdropping. When no flooding is needed, the communication cost overhead for adapting the operator placement is negligible. In addition, our algorithm does not require any extra communication cost while the query is executed. In our experiments we show that for the remaining cases our algorithm saves 30%-85% of the energy compared to previously proposed techniques. To our knowledge, this is the first optimal and distributed algorithm to solve the 1-median (Fermat node) problem.
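For contrast only, the sketch below computes the 1-median (Fermat node) objective by centralized brute force over a small sensor graph, rather than with the distributed, parameter-free algorithm described above. The graph, data rates, and the assumption that the query result is shipped to a sink at unit rate are illustrative.

```python
import networkx as nx

def best_operator_host(G, sources, sink, rate):
    """Pick the node minimizing the sum over input streams of (stream rate x hop
    distance to the candidate host) plus the hop distance from the host to the
    sink; a brute-force 1-median computation over all nodes."""
    best, best_cost = None, float("inf")
    for host in G.nodes:
        dist = nx.single_source_shortest_path_length(G, host)
        cost = sum(rate[s] * dist[s] for s in sources) + dist[sink]
        if cost < best_cost:
            best, best_cost = host, cost
    return best, best_cost

G = nx.grid_2d_graph(4, 4)  # a 4x4 sensor grid (illustrative topology)
sources = [(0, 0), (3, 0), (0, 3)]
rate = {(0, 0): 2.0, (3, 0): 1.0, (0, 3): 1.0}
print(best_operator_host(G, sources, sink=(3, 3), rate=rate))
```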

15 citations


Book ChapterDOI
15 Dec 2010
TL;DR: This paper considers predicate encryption for testing whether the Hamming distance between an attribute X and a target V of length m is equal to (or less than) a threshold t; the proposed schemes achieve ciphertext and token sizes of O(m) for the equality version and a ciphertext size of O(m^{t_max}) for the inequality version.
Abstract: In this paper, we consider the problem of predicate encryption and focus on the predicate for testing whether the Hamming distance between the attribute X of a data item and a target V is equal to (or less than) a threshold t, where X and V are of length m. Existing solutions either do not provide attribute protection or produce a big ciphertext of size O(2^m). For the equality version of the problem, we provide a scheme which is match-concealing (MC) secure and the sizes of the ciphertext and token are both O(m). For the inequality version of the problem, we give a practical scheme, also achieving MC security, which produces a ciphertext with size O(m^{t_max}) if the maximum value of t, t_max, is known in advance and is a constant. We also show how to update the ciphertext if the user wants to increase t_max without constructing the ciphertext from scratch.
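The sketch below spells out only the predicate being tested (Hamming distance equal to, or at most, a threshold t over strings of length m), not the encryption scheme or its security properties; the bit strings are illustrative.

```python
def hamming_distance(x, v):
    """Number of positions at which the two equal-length strings differ."""
    assert len(x) == len(v)
    return sum(a != b for a, b in zip(x, v))

def predicate_equal(x, v, t):
    return hamming_distance(x, v) == t     # the "equality" version

def predicate_at_most(x, v, t):
    return hamming_distance(x, v) <= t     # the "inequality" version

x, v = "10110", "10011"
print(hamming_distance(x, v), predicate_equal(x, v, 2), predicate_at_most(x, v, 3))
# 2 True True
```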

14 citations


Journal ArticleDOI
01 Apr 2010
TL;DR: This paper considers the continuous assignment problem (CAP), where an optimal assignment must be constantly maintained between mobile users and a set of servers, and proposes an algorithm that utilizes the geometric characteristics of the problem and significantly accelerates the initial assignment computation and its subsequent maintenance.
Abstract: Consider a set of servers and a set of users, where each server has a coverage region (i.e., an area of service) and a capacity (i.e., a maximum number of users it can serve). Our task is to assign every user to one server subject to the coverage and capacity constraints. To offer the highest quality of service, we wish to minimize the average distance between users and their assigned server. This is an instance of a well-studied problem in operations research, termed optimal assignment. Even though there exist several solutions for the static case (where user locations are fixed), there is currently no method for dynamic settings. In this paper, we consider the continuous assignment problem (CAP), where an optimal assignment must be constantly maintained between mobile users and a set of servers. The fact that the users are mobile necessitates real-time reassignment so that the quality of service remains high (i.e., their distance from their assigned servers is minimized). The large scale and the time-critical nature of targeted applications require fast CAP solutions. We propose an algorithm that utilizes the geometric characteristics of the problem and significantly accelerates the initial assignment computation and its subsequent maintenance. Our method applies to different cost functions (e.g., average squared distance) and to any Minkowski distance metric (e.g., Euclidean, L1 norm, etc.).
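The last sentence above mentions support for different cost functions and any Minkowski distance metric; the snippet below just spells out those ingredients (an L_p distance and a total assignment cost, optionally squared) for made-up user/server pairs. It is not the paper's assignment-maintenance algorithm.

```python
def minkowski(u, v, p):
    """L_p distance between two points (p=2: Euclidean, p=1: Manhattan)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def assignment_cost(pairs, p=2, squared=False):
    """Total cost of a user -> server assignment under an L_p metric,
    optionally using squared distances as the per-pair cost."""
    costs = (minkowski(u, s, p) for u, s in pairs)
    return sum(c * c for c in costs) if squared else sum(costs)

pairs = [((0, 0), (1, 1)), ((2, 0), (2, 3))]  # (user location, assigned server)
print(assignment_cost(pairs, p=2), assignment_cost(pairs, p=1, squared=True))
```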

11 citations


Journal ArticleDOI
TL;DR: This paper models the structural alignment of proteins as a combinatorial problem and proposes a data-mining approach that treats each geometric-hashing bin as a coincidence group and mines it for frequent patterns, a well-studied technique in data mining.
Abstract: Comparing the 3D structures of proteins is an important but computationally hard problem in bioinformatics. In this paper, we propose studying the problem when much less information is available and fewer assumptions are made. We model the structural alignment of proteins as a combinatorial problem. In the problem, each protein is simply a set of points in the 3D space, without sequence order information, and the objective is to discover all large enough alignments for any subset of the input. We propose a data-mining approach for this problem. We first perform geometric hashing of the structures such that points with similar locations in the 3D space are hashed into the same bin in the hash table. The novelty is that we consider each bin as a coincidence group and mine for frequent patterns, which is a well-studied technique in data mining. We observe that these frequent patterns are already potentially large alignments. Then a simple heuristic is used to extend the alignments if possible. We implemented the algorithm and tested it using real protein structures. The results were compared with those of existing tools. They showed that the algorithm is capable of finding conserved substructures that do not preserve sequence order, especially those existing in protein interfaces. The algorithm can also identify conserved substructures of functionally similar structures within a mixture with dissimilar ones. The running time of the program was smaller than or comparable to that of the existing tools.
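A minimal sketch of the geometric-hashing step described above: 3D points are bucketed by discretizing their coordinates so that points with similar locations fall into the same bin. The cell size and coordinates are illustrative, and the frequent-pattern mining and alignment-extension steps are not shown.

```python
from collections import defaultdict

def geometric_hash(points, cell_size):
    """Bucket 3D points into grid cells; nearby points share a bin."""
    bins = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        key = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        bins[key].append(idx)
    return bins

# Two atoms close together and one far away (coordinates are made up).
atoms = [(1.2, 0.4, 3.1), (1.4, 0.5, 3.3), (9.8, 7.7, 0.2)]
for cell, members in geometric_hash(atoms, cell_size=2.0).items():
    print(cell, members)  # the first two points land in the same bin
```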