Showing papers by "Bin Yao" published in 2013


Proceedings Article
08 Apr 2013
TL;DR: New SNN methods are designed that provide a customizable tradeoff between efficiency and communication cost and are as secure as the encryption scheme E used to encrypt the query and the database, where E can be any well-established encryption scheme.
Abstract: In this paper, we investigate the secure nearest neighbor (SNN) problem, in which a client issues an encrypted query point E(q) to a cloud service provider and asks for an encrypted data point in E(D) (the encrypted database) that is closest to the query point, without allowing the server to learn the plaintexts of the data or the query (and its result). We show that efficient attacks exist for existing SNN methods [21], [15], even though they were claimed to be secure in standard security models (such as indistinguishability under chosen plaintext or ciphertext attacks). We also establish a relationship between the SNN problem and the order-preserving encryption (OPE) problem from the cryptography field [6], [5], and we show that SNN is at least as hard as OPE. Since it is impossible to construct secure OPE schemes in standard security models [6], [5], our results imply that one cannot expect to find the exact (encrypted) nearest neighbor based on only E(q) and E(D). Given this hardness result, we design new SNN methods by asking the server, given only E(q) and E(D), to return a relevant (encrypted) partition E(G) from E(D) (i.e., G ⊆ D), such that E(G) is guaranteed to contain the answer for the SNN query. Our methods provide a customizable tradeoff between efficiency and communication cost, and they are as secure as the encryption scheme E used to encrypt the query and the database, where E can be any well-established encryption scheme.

219 citations


Journal Article
TL;DR: This work investigates range queries augmented with a string similarity search predicate in both Euclidean space and road networks and proposes a novel exact method, RSASSOL, which significantly outperforms the baseline algorithm in practice.
Abstract: This work deals with approximate string search in large spatial databases. Specifically, we investigate range queries augmented with a string similarity search predicate in both Euclidean space and road networks. We dub this query the spatial approximate string (SAS) query. In Euclidean space, we propose an approximate solution, the MHR-tree, which embeds min-wise signatures into an R-tree. The min-wise signature for an index node u keeps a concise representation of the union of q-grams from strings under the subtree of u. We analyze the pruning functionality of such signatures based on the set resemblance between the query string and the q-grams from the subtrees of index nodes. We also discuss how to estimate the selectivity of a SAS query in Euclidean space, for which we present a novel adaptive algorithm to find balanced partitions using both the spatial and string information stored in the tree. For queries on road networks, we propose a novel exact method, RSASSOL, which significantly outperforms the baseline algorithm in practice. RSASSOL combines q-gram-based inverted lists with reference-node-based pruning. Extensive experiments on large real data sets demonstrate the efficiency and effectiveness of our approaches.
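The min-wise signature idea in the abstract above can be illustrated with a minimal Python sketch. It is not the MHR-tree itself: the padding characters, the hash construction, and the function names (`qgrams`, `minwise_signature`, `merge_signatures`, `estimate_resemblance`) are assumptions made for illustration only. The sketch shows two standard properties that such an index would rely on: the signature of a union of q-gram sets is the element-wise minimum of the individual signatures, and the fraction of matching signature positions estimates set resemblance.

```python
import hashlib


def qgrams(s, q=2):
    """Return the set of q-grams of s, padded with '#' (front) and '$' (back)."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def _hash(gram, seed):
    """Deterministic 64-bit hash of a q-gram under a given seed (illustrative choice)."""
    data = f"{seed}:{gram}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minwise_signature(grams, num_hashes=64):
    """For each seeded hash function, keep the minimum hash value over the set."""
    return [min(_hash(g, seed) for g in grams) for seed in range(num_hashes)]


def merge_signatures(sig_a, sig_b):
    """Signature of the union of two q-gram sets: element-wise minimum."""
    return [min(a, b) for a, b in zip(sig_a, sig_b)]


def estimate_resemblance(sig_a, sig_b):
    """Fraction of positions where the two signatures agree."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# A hypothetical index node covering two strings, and a query string.
node_sig = merge_signatures(
    minwise_signature(qgrams("theatre")),
    minwise_signature(qgrams("theater")),
)
query_sig = minwise_signature(qgrams("theatre"))
print(estimate_resemblance(node_sig, query_sig))
```

Because a parent signature can be built from child signatures alone, it could be maintained bottom-up in a tree without revisiting the underlying strings. The pruning rule that the paper derives from set resemblance is not reproduced here.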

45 citations


Book Chapter
Yue Yin1, Bin Yao1, Yao Shen1, Minyi Guo1, Changliang Xu2 
13 Oct 2013
TL;DR: This study presents a novel tree-based index scheme that incorporates and extends the functionality of Hadoop to create a fully parallel index system, using the MapReduce framework to create an index, publish the index meta information, and write it into a meta table.
Abstract: In this study, we present a novel tree-based index scheme for efficiently indexing and serving large datasets in the cloud. It incorporates and extends the functionality of Hadoop to create a fully parallel index system. Our new scheme can be summarized as follows. First, we leverage the MapReduce framework to create an index, then publish the index meta information and write it into a meta table. Second, we use the meta information to help the system adopt an efficient method to handle a given query. Finally, we optimize the system by using a cache mechanism. We conduct extensive experiments on a Hadoop cluster to demonstrate the scalability, availability and efficiency of the proposed index framework.

Posted Content
TL;DR: This work develops targeted solutions to the CSPTRQ problem and demonstrates the efficiency and effectiveness of the proposed methods through extensive experiments.
Abstract: This paper studies the constrained-space probabilistic threshold range query (CSPTRQ) for moving objects. We differentiate two kinds of CSPTRQs: implicit and explicit ones. Specifically, for each moving object $o$, we assume that $o$ cannot be located in some specific areas. We model its location as a closed region, $u$, together with a probability density function, and model a query range, $R$, as an arbitrary polygon. An implicit CSPTRQ can be reduced to a search (over all the $u$) that returns a set of objects that have probabilities higher than a probability threshold $p_t$ of being located in $R$, where $0\leq p_t\leq 1$. In contrast, an explicit CSPTRQ returns a set of tuples in the form of ($o$, $p$) such that $p\geq p_t$, where $p$ is the probability of $o$ being located in $R$. A straightforward adaptation of existing methods is inefficient due to its weak pruning/validating capability. In order to efficiently process such queries, we propose targeted solutions, in which three main ideas are incorporated: (1) swapping the order of geometric operations based on the computation duality; (2) pruning unrelated objects in the early stages using the location unreachability; and (3) computing the probability using the multi-step mechanism. Extensive experimental results demonstrate the efficiency and effectiveness of the proposed algorithms.
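The explicit query semantics described above can be made concrete with a small Python sketch under strong simplifying assumptions that are not taken from the paper: each uncertainty region $u$ and the query range $R$ are axis-aligned rectangles, the probability density is uniform over $u$, and there are no restricted areas. Under these assumptions $p$ reduces to the ratio of the area of $u \cap R$ to the area of $u$. The names `Rect`, `probability_in_range`, and `explicit_csptrq` are hypothetical, and none of the paper's pruning or multi-step techniques are modeled.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle (a simplifying stand-in for u and R)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def area(r):
    """Area of a rectangle; degenerate or inverted rectangles have area 0."""
    return max(0.0, r.xmax - r.xmin) * max(0.0, r.ymax - r.ymin)


def intersection(a, b):
    """Intersection rectangle; inverted bounds mean the inputs do not overlap."""
    return Rect(
        max(a.xmin, b.xmin),
        max(a.ymin, b.ymin),
        min(a.xmax, b.xmax),
        min(a.ymax, b.ymax),
    )


def probability_in_range(u, query):
    """p = area(u ∩ R) / area(u), valid only for a uniform pdf over u."""
    total = area(u)
    if total == 0.0:
        return 0.0
    return area(intersection(u, query)) / total


def explicit_csptrq(objects, query, p_t):
    """Return (object id, p) tuples with p >= p_t by scanning every object."""
    results = []
    for obj_id, u in objects.items():
        p = probability_in_range(u, query)
        if p >= p_t:
            results.append((obj_id, p))
    return results


objects = {"a": Rect(0, 0, 2, 2), "b": Rect(5, 5, 6, 6)}
query = Rect(1, 0, 3, 2)
print(explicit_csptrq(objects, query, 0.5))
```

An implicit query in this simplified setting would return only the object identifiers. The linear scan shown here corresponds to the kind of straightforward baseline that the abstract describes as inefficient.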