scispace - formally typeset
Search or ask a question

Showing papers by "Nikos Mamoulis published in 2012"


Journal ArticleDOI
01 Nov 2012
TL;DR: This paper combines ideas from state-of-the-art spatial distance join and set similarity join methods and proposes efficient algorithms that take into account both spatial and textual constraints and proposes a batch processing technique which boosts the performance of the approaches.
Abstract: Given a collection of objects that carry both spatial and textual information, a spatio-textual similarity join retrieves the pairs of objects that are spatially close and textually similar. As an example, consider a social network with spatially and textually tagged persons (i.e., their locations and profiles). A useful task (for friendship recommendation) would be to find pairs of persons that are spatially close and their profiles have a large overlap (i.e., they have common interests). Another application is data de-duplication (e.g., finding photographs which are spatially close to each other and high overlap in their descriptive tags). Despite the importance of this operation, there is very little previous work that studies its efficient evaluation and in fact under a different definition; only the best match for each object is identified. In this paper, we combine ideas from state-of-the-art spatial distance join and set similarity join methods and propose efficient algorithms that take into account both spatial and textual constraints. Besides, we propose a batch processing technique which boosts the performance of our approaches. An experimental evaluation using real and synthetic datasets shows that our optimized techniques are orders of magnitude faster than base-line solutions.

117 citations


Posted Content
TL;DR: In this article, the authors proposed an anonymization technique termed disassociation that preserves the original terms but hides the fact that two or more different terms appear in the same record.
Abstract: In this work, we focus on protection against identity disclosure in the publication of sparse multidimensional data. Existing multidimensional anonymization techniquesa) protect the privacy of users either by altering the set of quasi-identifiers of the original data (e.g., by generalization or suppression) or by adding noise (e.g., using differential privacy) and/or (b) assume a clear distinction between sensitive and non-sensitive information and sever the possible linkage. In many real world applications the above techniques are not applicable. For instance, consider web search query logs. Suppressing or generalizing anonymization methods would remove the most valuable information in the dataset: the original query terms. Additionally, web search query logs contain millions of query terms which cannot be categorized as sensitive or non-sensitive since a term may be sensitive for a user and non-sensitive for another. Motivated by this observation, we propose an anonymization technique termed disassociation that preserves the original terms but hides the fact that two or more different terms appear in the same record. We protect the users' privacy by disassociating record terms that participate in identifying combinations. This way the adversary cannot associate with high probability a record with a rare combination of terms. To the best of our knowledge, our proposal is the first to employ such a technique to provide protection against identity disclosure. We propose an anonymization algorithm based on our approach and evaluate its performance on real and synthetic datasets, comparing it against other state-of-the-art methods based on generalization and differential privacy.

77 citations


Proceedings ArticleDOI
01 Apr 2012
TL;DR: This work presents a framework for efficiently modeling and querying uncertain spatio-temporal data with three major advantages over previous work, to model possible object trajectories by stochastic processes.
Abstract: The problem of modeling and managing uncertain data has received a great deal of interest, due to its manifold applications in spatial, temporal, multimedia and sensor databases. There exists a wide range of work covering spatial uncertainty in the static (snapshot) case, where only one point of time is considered. In contrast, the problem of modeling and querying uncertain spatio-temporal data has only been treated as a simple extension of the spatial case, disregarding time dependencies between consecutive timestamps. In this work, we present a framework for efficiently modeling and querying uncertain spatio-temporal data. The key idea of our approach is to model possible object trajectories by stochastic processes. This approach has three major advantages over previous work. First it allows answering queries in accordance with the possible worlds model. Second, dependencies between object locations at consecutive points in time are taken into account. And third it is possible to reduce all queries on this model to simple matrix multiplications. Based on these concepts we propose efficient solutions for different probabilistic spatio-temporal queries. In an experimental evaluation we show that our approaches are several order of magnitudes faster than state-of-the-art competitors.

77 citations


Journal ArticleDOI
01 Jun 2012
TL;DR: An anonymization technique termed disassociation is proposed that preserves the original terms but hides the fact that two or more different terms appear in the same record, which is the first to employ such a technique to provide protection against identity disclosure in the publication of sparse multidimensional data.
Abstract: In this work, we focus on protection against identity disclosure in the publication of sparse multidimensional data. Existing multidimensional anonymization techniques (a) protect the privacy of users either by altering the set of quasi-identifiers of the original data (e.g., by generalization or suppression) or by adding noise (e.g., using differential privacy) and/or (b) assume a clear distinction between sensitive and non-sensitive information and sever the possible linkage. In many real world applications the above techniques are not applicable. For instance, consider web search query logs. Suppressing or generalizing anonymization methods would remove the most valuable information in the dataset: the original query terms. Additionally, web search query logs contain millions of query terms which cannot be categorized as sensitive or non-sensitive since a term may be sensitive for a user and non-sensitive for another. Motivated by this observation, we propose an anonymization technique termed disassociation that preserves the original terms but hides the fact that two or more different terms appear in the same record. We protect the users' privacy by disassociating record terms that participate in identifying combinations. This way the adversary cannot associate with high probability a record with a rare combination of terms. To the best of our knowledge, our proposal is the first to employ such a technique to provide protection against identity disclosure. We propose an anonymization algorithm based on our approach and evaluate its performance on real and synthetic datasets, comparing it against other state-of-the-art methods based on generalization and differential privacy.

70 citations


Proceedings ArticleDOI
29 Oct 2012
TL;DR: This work proposes novel approximation techniques in order to probabilistically bound the uncertain movement of objects and experimentally shows that it accelerates the existing, scan-based approach by orders of magnitude.
Abstract: The advances in sensing and telecommunication technologies allow the collection and management of vast amounts of spatio-temporal data combining location and time information.Due to physical and resource limitations of data collection devices (e.g., RFID readers, GPS receivers and other sensors) data are typically collected only at discrete points of time. In-between these discrete time instances, the positions of tracked moving objects are uncertain. In this work, we propose novel approximation techniques in order to probabilistically bound the uncertain movement of objects; these techniques allow for efficient and effective filtering during query evaluation using an hierarchical index structure.To the best of our knowledge, this is the first approach that supports query evaluation on very large uncertain spatio-temporal databases, adhering to possible worlds semantics. We experimentally show that it accelerates the existing, scan-based approach by orders of magnitude.

29 citations


Book ChapterDOI
29 Oct 2012
TL;DR: In this paper, the authors leverage the huge number of threads available on Graphics Processing Units (GPUs) to speed up composite-order bilinear pairing computation, achieving a record of 8.7ms per pairing on a 80-bit security level.
Abstract: Recently, composite-order bilinear pairing has been shown to be useful in many cryptographic constructions. However, it is time-costly to evaluate. This is because the composite order should be at least 1024bit and, hence, the elliptic curve group order n and base field become too large, rendering the bilinear pairing algorithm itself too slow to be practical (e.g., the Miller loop is Ω(n)). Thus, composite-order computation easily becomes the bottleneck of a cryptographic construction, especially, in the case where many pairings need to be evaluated at the same time. The existing solution to this problem that converts composite-order pairings to prime-order ones is only valid for certain constructions. In this paper, we leverage the huge number of threads available on Graphics Processing Units (GPUs) to speed up composite-order pairing computation. We investigate suitable SIMD algorithms for base/extension field, elliptic curve and bilinear pairing computation as well as mapping these algorithms into GPUs with careful considerations. Experimental results show that our method achieves a record of 8.7ms per pairing on a 80bit security level, which is a 20-fold speedup compared to the state-of-the-art CPU implementation. This result also opens the road to adopting higher security levels and using rich-resource parallel platforms, which for example are available in cloud computing. For example, we can achieve a record of 7 ×10−6 USD per pairing on the Amazon cloud computing environment.

19 citations