Showing papers by "Bin Yao" published in 2013

Open Access

Proceedings Article

Bin Yao¹, Feifei Li², Xiaokui Xiao³

Shanghai Jiao Tong University¹, University of Utah², Nanyang Technological University³

08 Apr 2013

TL;DR: New SNN methods are designed that provide a customizable tradeoff between efficiency and communication cost and are as secure as the encryption scheme E used to encrypt the query and the database, where E can be any well-established encryption scheme.

Abstract: In this paper, we investigate the secure nearest neighbor (SNN) problem, in which a client issues an encrypted query point E(q) to a cloud service provider and asks for an encrypted data point in E(D) (the encrypted database) that is closest to the query point, without allowing the server to learn the plaintexts of the data or the query (and its result). We show that efficient attacks exist for existing SNN methods [21], [15], even though they were claimed to be secure in standard security models (such as indistinguishability under chosen plaintext or ciphertext attacks). We also establish a relationship between the SNN problem and the order-preserving encryption (OPE) problem from the cryptography field [6], [5], and we show that SNN is at least as hard as OPE. Since it is impossible to construct secure OPE schemes in standard security models [6], [5], our results imply that one cannot expect to find the exact (encrypted) nearest neighbor based on only E(q) and E(D). Given this hardness result, we design new SNN methods by asking the server, given only E(q) and E(D), to return a relevant (encrypted) partition E(G) from E(D) (i.e., G ⊆ D), such that E(G) is guaranteed to contain the answer for the SNN query. Our methods provide a customizable tradeoff between efficiency and communication cost, and they are as secure as the encryption scheme E used to encrypt the query and the database, where E can be any well-established encryption scheme.

219 citations
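
The partition idea in the abstract above can be illustrated with a one-dimensional toy. This is only a sketch under stated assumptions, not the paper's construction: the function names, the XOR stand-in for E (which is not secure), and the choice to let the client hold the partition boundaries are all hypothetical. Points are sorted and grouped; the boundary between two adjacent groups is the midpoint between their neighboring points (the 1D Voronoi boundary), so the group whose interval contains q always contains a nearest neighbor of q.

```python
import bisect
import json


def toy_encrypt(obj, key):
    """Placeholder for E. XOR with one byte is NOT secure; it only
    stands in for a real, well-established cipher."""
    return bytes(b ^ key for b in json.dumps(obj).encode())


def toy_decrypt(blob, key):
    return json.loads(bytes(b ^ key for b in blob).decode())


def build_partitions(points, group_size, key):
    """Data owner: group sorted points and encrypt each group.

    Returns (boundaries, encrypted_groups). Larger groups mean fewer
    boundaries to keep but more ciphertext shipped per query.
    """
    pts = sorted(points)
    groups = [pts[i:i + group_size] for i in range(0, len(pts), group_size)]
    # Midpoint between the last point of one group and the first of the next.
    boundaries = [(a[-1] + b[0]) / 2 for a, b in zip(groups, groups[1:])]
    return boundaries, [toy_encrypt(g, key) for g in groups]


def client_query(q, boundaries, encrypted_groups, key):
    """Client: pick the group covering q, fetch it, decrypt, search locally."""
    idx = bisect.bisect_right(boundaries, q)
    group = toy_decrypt(encrypted_groups[idx], key)  # the server only sees ciphertext
    return min(group, key=lambda p: abs(p - q))


if __name__ == "__main__":
    bounds, enc = build_partitions([1, 4, 9, 15, 22, 30, 41], 3, key=0x5A)
    print(client_query(13, bounds, enc, key=0x5A))
```

The `group_size` parameter is where the tradeoff mentioned in the abstract shows up in this toy: the returned partition grows with it, while the number of boundaries shrinks.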

Journal Article

Spatial Approximate String Search

Feifei Li¹, Bin Yao², Mingwang Tang¹, Marios Hadjieleftheriou³

University of Utah¹, Shanghai Jiao Tong University², AT&T Labs³

01 Jun 2013 - IEEE Transactions on Knowledge and Data Engineering

TL;DR: This work investigates range queries augmented with a string similarity search predicate in both Euclidean space and road networks and proposes a novel exact method, RSASSOL, which significantly outperforms the baseline algorithm in practice.

Abstract: This work deals with approximate string search in large spatial databases. Specifically, we investigate range queries augmented with a string similarity search predicate in both Euclidean space and road networks. We dub this query the spatial approximate string (SAS) query. In Euclidean space, we propose an approximate solution, the MHR-tree, which embeds min-wise signatures into an R-tree. The min-wise signature for an index node u keeps a concise representation of the union of q-grams from strings under the subtree of u. We analyze the pruning functionality of such signatures based on the set resemblance between the query string and the q-grams from the subtrees of index nodes. We also discuss how to estimate the selectivity of a SAS query in Euclidean space, for which we present a novel adaptive algorithm to find balanced partitions using both the spatial and string information stored in the tree. For queries on road networks, we propose a novel exact method, RSASSOL, which significantly outperforms the baseline algorithm in practice. RSASSOL combines q-gram-based inverted lists with reference-node-based pruning. Extensive experiments on large real data sets demonstrate the efficiency and effectiveness of our approaches.

45 citations
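
The min-wise signatures mentioned in the abstract above can be sketched as follows. This is a minimal illustration of standard min-wise hashing over q-gram sets, not the MHR-tree itself: the function names, the padding characters, the salted SHA-1 hash family, and the signature length `k` are all assumptions. The property worth noticing is that the signature of a union of q-gram sets is the element-wise minimum of the individual signatures, which is what would let an index node summarize its whole subtree from its children's signatures.

```python
import hashlib


def qgrams(s, q=2):
    """Set of q-grams of s, padded with '#' and '$' (a common convention)."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def _hash(seed, gram):
    # One hash function per seed, built from SHA-1 with a salt prefix.
    digest = hashlib.sha1(f"{seed}:{gram}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minwise_signature(grams, k=64):
    """k minimum hash values, one per hash function."""
    return [min(_hash(seed, g) for g in grams) for seed in range(k)]


def merge_signatures(signatures):
    """Signature of the union of the underlying sets (element-wise min)."""
    return [min(values) for values in zip(*signatures)]


def estimate_resemblance(sig_a, sig_b):
    """Fraction of matching positions, an estimate of |A ∩ B| / |A ∪ B|."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    leaf_sigs = [minwise_signature(qgrams(s)) for s in ("theatre", "theater")]
    node_sig = merge_signatures(leaf_sigs)  # summary for a parent node
    print(estimate_resemblance(leaf_sigs[0], leaf_sigs[1]))
```

How the resemblance estimate is turned into a pruning rule is specific to the paper and is not reproduced here.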

Book Chapter

A Generic Tree-Like Index Framework in the Cloud

Yue Yin¹, Bin Yao¹, Yao Shen¹, Minyi Guo¹, Changliang Xu²

Shanghai Jiao Tong University¹, Alibaba Group²

13 Oct 2013

TL;DR: This study presents a novel tree-based index scheme that incorporates and extends the functionality of Hadoop to create a fully parallel index system; it uses the MapReduce framework to create an index, then publishes the index meta information and writes it into a meta table.

Abstract: In this study, we present a novel tree-based index scheme for efficiently indexing and serving large datasets in the cloud. It incorporates and extends the functionality of Hadoop to create a fully parallel index system. Our new scheme can be summarized as follows. First, we leverage the MapReduce framework to create an index, then publish the index meta information and write it into a meta table. Second, we use the meta information to help the system adopt an efficient method to handle a given query. Finally, we optimize the system by using a cache mechanism. We conduct extensive experiments on a Hadoop cluster to demonstrate the scalability, availability and efficiency of the proposed index framework.
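
The three steps in the abstract above (build the index with MapReduce, publish meta information, answer queries through the meta table with a cache) can be mimicked in plain Python. This is a single-process sketch under heavy assumptions, not the framework described in the chapter: it does not use Hadoop, a sorted list stands in for each tree-like index, and every name here (`map_phase`, `reduce_phase`, `make_lookup`, the key-range partitioning) is hypothetical.

```python
import bisect
from collections import defaultdict
from functools import lru_cache


def map_phase(records, num_partitions, key_space):
    """Map: route each (key, value) record to a partition by key range."""
    width = max(1, key_space // num_partitions)
    for key, value in records:
        yield min(key // width, num_partitions - 1), (key, value)


def reduce_phase(mapped):
    """Reduce: build one sorted shard per partition plus a meta table.

    The meta table maps partition id -> (smallest key, largest key).
    """
    shards = defaultdict(list)
    for pid, kv in mapped:
        shards[pid].append(kv)
    index = {pid: sorted(kvs, key=lambda kv: kv[0]) for pid, kvs in shards.items()}
    meta_table = {pid: (kvs[0][0], kvs[-1][0]) for pid, kvs in index.items()}
    return index, meta_table


def make_lookup(index, meta_table):
    """Return a cached point-lookup function that consults the meta table."""

    @lru_cache(maxsize=1024)  # stand-in for the cache mechanism
    def lookup(key):
        for pid, (lo, hi) in meta_table.items():
            if lo <= key <= hi:
                shard = index[pid]
                i = bisect.bisect_left(shard, (key,))
                if i < len(shard) and shard[i][0] == key:
                    return shard[i][1]
        return None

    return lookup


if __name__ == "__main__":
    records = [(5, "a"), (42, "b"), (77, "c"), (99, "d")]
    index, meta = reduce_phase(map_phase(records, num_partitions=4, key_space=100))
    lookup = make_lookup(index, meta)
    print(meta, lookup(42))
```

Only partitions whose key range covers the requested key are searched, which is the role the meta information plays in this toy.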

Posted Content

Explicit and Implicit Constrained-Space Probabilistic Threshold Range Queries for Moving Objects

Zhi-Jie Wang, Bin Yao, Minyi Guo

01 Nov 2013 - arXiv: Databases

TL;DR: This work develops targeted solutions to the CSPTRQ problem and demonstrates the efficiency and effectiveness of the proposed methods through extensive experiments.

Abstract: This paper studies the constrained-space probabilistic threshold range query (CSPTRQ) for moving objects. We differentiate two kinds of CSPTRQs: implicit and explicit ones. Specifically, for each moving object $o$, we assume that $o$ cannot be located in some specific areas; we model its location as a closed region, $u$, together with a probability density function, and model a query range, $R$, as an arbitrary polygon. An implicit CSPTRQ can be reduced to a search (over all the $u$) that returns a set of objects that have probabilities higher than a probability threshold $p_t$ to be located in $R$, where $0\leq p_t\leq 1$. In contrast, an explicit CSPTRQ returns a set of tuples in the form of ($o$, $p$) such that $p\geq p_t$, where $p$ is the probability of $o$ being located in $R$. A straightforward adaptation of existing methods is inefficient due to its weak pruning/validating capability. In order to efficiently process such queries, we propose targeted solutions, in which three main ideas are incorporated: (1) swapping the order of geometric operations based on the computation duality; (2) pruning unrelated objects in the early stages using the location unreachability; and (3) computing the probability using the multi-step mechanism. Extensive experimental results demonstrate the efficiency and effectiveness of the proposed algorithms.
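
The difference between the implicit and explicit query forms defined above can be shown with a brute-force baseline. This is a sketch under assumptions and deliberately omits the paper's three optimizations: each object's probability density is discretized into weighted sample points (a hypothetical representation), restricted areas and the query range $R$ are given as polygons, both query forms use $p \geq p_t$ for simplicity, and all function names are illustrative.

```python
def point_in_polygon(pt, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices in order."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def probability_in_range(samples, restricted, query_polygon):
    """Probability that an object lies in R, given weighted location samples.

    Samples falling in any restricted polygon get zero mass, and the
    remaining weights are renormalized.
    """
    allowed = [(p, w) for p, w in samples
               if not any(point_in_polygon(p, r) for r in restricted)]
    total = sum(w for _, w in allowed)
    if total == 0:
        return 0.0
    return sum(w for p, w in allowed if point_in_polygon(p, query_polygon)) / total


def explicit_csptrq(objects, restricted, query_polygon, p_t):
    """Explicit form: (object id, probability) pairs with p >= p_t."""
    pairs = ((oid, probability_in_range(s, restricted, query_polygon))
             for oid, s in objects.items())
    return [(oid, p) for oid, p in pairs if p >= p_t]


def implicit_csptrq(objects, restricted, query_polygon, p_t):
    """Implicit form: only the qualifying object ids."""
    return [oid for oid, _ in explicit_csptrq(objects, restricted, query_polygon, p_t)]
```

A pruning-based method would avoid computing the exact probability for objects that clearly cannot qualify; this baseline evaluates every object in full.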
