Bin Yao

Other affiliations: Shenzhen University, Florida State University

Bio: Bin Yao is an academic researcher from Shanghai Jiao Tong University. The author has contributed to research in topics including spatial queries and tree data structures. The author has an h-index of 20 and has co-authored 79 publications receiving 1,540 citations. Previous affiliations of Bin Yao include Shenzhen University and Florida State University.

Papers published on a yearly basis (2009 to 2022)

Papers


Proceedings Article

Simba: Efficient In-Memory Spatial Analytics

Dong Xie¹, Feifei Li¹, Bin Yao², Gefei Li², Liang Zhou², Minyi Guo²

University of Utah¹, Shanghai Jiao Tong University²

14 Jun 2016

TL;DR: Simba is a system for scalable and efficient in-memory spatial query processing and analytics over big spatial data; it extends the Spark SQL engine to support rich spatial queries and analytics through both SQL and the DataFrame API.

Abstract: Large spatial data sets have become ubiquitous. As a result, it is critical to provide fast, scalable, and high-throughput spatial queries and analytics for numerous applications in location-based services (LBS). Traditional spatial databases and spatial analytics systems are disk-based and optimized for IO efficiency. Increasingly, however, data are stored and processed in memory to achieve low latency, and CPU time becomes the new bottleneck. We present the Simba (Spatial In-Memory Big data Analytics) system, which offers scalable and efficient in-memory spatial query processing and analytics for big spatial data. Simba is based on Spark and runs over a cluster of commodity machines. In particular, Simba extends the Spark SQL engine to support rich spatial queries and analytics through both SQL and the DataFrame API. It introduces indexes over RDDs in order to work with big spatial data and complex spatial operations. Lastly, Simba implements an effective query optimizer, which leverages its indexes and novel spatial-aware optimizations, to achieve both low latency and high throughput. Extensive experiments over large data sets demonstrate Simba's superior performance compared with other spatial analytics systems.

228 citations
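The abstract mentions indexes over RDDs that let spatial queries skip irrelevant data. The single-process Python sketch below illustrates the general two-level idea under stated assumptions: a hypothetical uniform grid plays the role of a global index over partitions, and each partition keeps its points sorted by x as a stand-in local index. It is not Simba's implementation, which runs on Spark and uses its own partitioning and index structures; all names here are hypothetical.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field

Point = tuple[float, float]
Box = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass
class Partition:
    bbox: Box
    points: list[Point] = field(default_factory=list)
    xs: list[float] = field(default_factory=list)

    def build_local_index(self) -> None:
        # Stand-in local index: points sorted by x, plus the x keys for binary search.
        self.points.sort()
        self.xs = [p[0] for p in self.points]


def build_partitions(points: list[Point], cells: int, extent: float) -> list[Partition]:
    """Global level: a hypothetical uniform cells x cells grid over [0, extent]^2."""
    size = extent / cells
    parts: dict[tuple[int, int], Partition] = {}
    for x, y in points:
        i = min(int(x // size), cells - 1)
        j = min(int(y // size), cells - 1)
        box = (i * size, j * size, (i + 1) * size, (j + 1) * size)
        parts.setdefault((i, j), Partition(box)).points.append((x, y))
    for part in parts.values():
        part.build_local_index()
    return list(parts.values())


def range_query(parts: list[Partition], lo: Point, hi: Point) -> list[Point]:
    """Return all points p with lo <= p <= hi (component-wise), sorted."""
    result: list[Point] = []
    for part in parts:
        bx0, by0, bx1, by1 = part.bbox
        # Global pruning: skip partitions whose box misses the query rectangle.
        if bx1 < lo[0] or bx0 > hi[0] or by1 < lo[1] or by0 > hi[1]:
            continue
        # Local search: binary-search the x range, then filter on y.
        start = bisect_left(part.xs, lo[0])
        end = bisect_right(part.xs, hi[0])
        result.extend(p for p in part.points[start:end] if lo[1] <= p[1] <= hi[1])
    return sorted(result)
```

Pruning partitions by bounding box before any local search is what lets a selective range query avoid scanning most of the data.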

Proceedings Article

Secure nearest neighbor revisited

Bin Yao¹, Feifei Li², Xiaokui Xiao³

Shanghai Jiao Tong University¹, University of Utah², Nanyang Technological University³

08 Apr 2013

TL;DR: New SNN methods are designed that provide a customizable trade-off between efficiency and communication cost and are as secure as the encryption scheme E used to encrypt the query and the database, where E can be any well-established encryption scheme.

Abstract: In this paper, we investigate the secure nearest neighbor (SNN) problem, in which a client issues an encrypted query point E(q) to a cloud service provider and asks for an encrypted data point in E(D) (the encrypted database) that is closest to the query point, without allowing the server to learn the plaintexts of the data or the query (and its result). We show that efficient attacks exist for existing SNN methods [21], [15], even though they were claimed to be secure in standard security models (such as indistinguishability under chosen-plaintext or chosen-ciphertext attacks). We also establish a relationship between the SNN problem and the order-preserving encryption (OPE) problem from the cryptography field [6], [5], and we show that SNN is at least as hard as OPE. Since it is impossible to construct secure OPE schemes in standard security models [6], [5], our results imply that one cannot expect to find the exact (encrypted) nearest neighbor based only on E(q) and E(D). Given this hardness result, we design new SNN methods that ask the server, given only E(q) and E(D), to return a relevant (encrypted) partition E(G) from E(D) (i.e., G ⊆ D), such that E(G) is guaranteed to contain the answer to the SNN query. Our methods provide a customizable trade-off between efficiency and communication cost, and they are as secure as the encryption scheme E used to encrypt the query and the database, where E can be any well-established encryption scheme.

219 citations
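The idea of returning a partition G that is guaranteed to contain the answer can be illustrated in plaintext. This sketch is an assumption-laden illustration, not the paper's construction: encryption is omitted entirely, the space is cut into a hypothetical uniform grid, and each cell's group keeps every point that could be the nearest neighbor of some query inside that cell, using a conservative distance bound. How the server selects E(G) without learning q is not shown; the cell lookup here happens in the clear.

```python
import math

Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def min_dist(p: Point, r: Rect) -> float:
    """Smallest distance from p to any point of rectangle r."""
    dx = max(r[0] - p[0], 0.0, p[0] - r[2])
    dy = max(r[1] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)


def max_dist(p: Point, r: Rect) -> float:
    """Largest distance from p to any point of rectangle r (reached at a corner)."""
    dx = max(abs(p[0] - r[0]), abs(p[0] - r[2]))
    dy = max(abs(p[1] - r[1]), abs(p[1] - r[3]))
    return math.hypot(dx, dy)


def candidate_group(data: list[Point], cell: Rect) -> list[Point]:
    """Superset G of all points that can be the nearest neighbor of some query in cell."""
    # For any query q in the cell, its nearest neighbor is no farther than `bound`.
    bound = min(max_dist(p, cell) for p in data)
    # A point whose closest approach to the cell exceeds `bound` can never win.
    return [p for p in data if min_dist(p, cell) <= bound]


def build_groups(data: list[Point], cells: int, extent: float) -> dict[tuple[int, int], list[Point]]:
    """Hypothetical data-owner step: one candidate group per grid cell of [0, extent]^2."""
    size = extent / cells
    return {
        (i, j): candidate_group(data, (i * size, j * size, (i + 1) * size, (j + 1) * size))
        for i in range(cells)
        for j in range(cells)
    }


def nearest(q: Point, groups: dict[tuple[int, int], list[Point]], cells: int, extent: float) -> Point:
    size = extent / cells
    cell = (min(int(q[0] // size), cells - 1), min(int(q[1] // size), cells - 1))
    group = groups[cell]  # in a real protocol the client would receive E(G) and decrypt it
    return min(group, key=lambda p: math.dist(p, q))
```

Larger cells mean fewer, bigger groups (more communication per query), while smaller cells shrink each group; this mirrors, in a simplified way, the efficiency versus communication trade-off the abstract describes.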

Proceedings Article

Optimal location queries in road network databases

Xiaokui Xiao¹, Bin Yao², Feifei Li²

Nanyang Technological University¹, Florida State University²

11 Apr 2011

TL;DR: A unified framework is proposed that addresses three variants of OL queries with important practical applications, and the framework is instantiated with several novel query processing algorithms.

Abstract: Optimal location (OL) queries are a type of spatial query particularly useful for the strategic planning of resources. Given a set of existing facilities and a set of clients, an OL query asks for a location at which to build a new facility that optimizes a certain cost metric (defined based on the distances between the clients and the facilities). Several techniques have been proposed to address OL queries, assuming that all clients and facilities reside in an L_p space. In practice, however, movements between spatial locations are usually confined by the underlying road network, and hence the actual distance between two locations can differ significantly from their L_p distance. Motivated by this deficiency of existing techniques, this paper presents the first study on OL queries in road networks. We propose a unified framework that addresses three variants of OL queries that find important applications in practice, and we instantiate the framework with several novel query processing algorithms. We demonstrate the efficiency of our solutions through extensive experiments with real data.

111 citations
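To make the role of road-network distance concrete, here is a brute-force Python sketch of one plausible OL variant, assumed here to be a MinSum objective: choose the candidate site that minimizes the total network distance from each client to its nearest facility. As a simplification, clients, facilities, and candidate sites sit on vertices of an undirected graph, and the function names are hypothetical. This exhaustive search is only a baseline for intuition, not the paper's framework or algorithms.

```python
import heapq
from typing import Optional

Graph = dict[str, list[tuple[str, float]]]


def dijkstra(graph: Graph, source: str) -> dict[str, float]:
    """Shortest road-network distance from source to every reachable vertex."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def min_sum_location(
    graph: Graph, clients: list[str], facilities: list[str], candidates: list[str]
) -> Optional[tuple[str, float]]:
    """Return (site, total cost) of the best candidate, or None if there are no candidates."""
    inf = float("inf")
    # One Dijkstra per distinct client vertex; the graph is undirected, so d(c, x) == d(x, c).
    from_client = {c: dijkstra(graph, c) for c in set(clients)}
    # Network distance from each client to its nearest existing facility.
    current = {
        c: min((dist.get(f, inf) for f in facilities), default=inf)
        for c, dist in from_client.items()
    }
    best = None
    for site in candidates:
        # A client switches to the new facility only if it is closer than its current one.
        cost = sum(min(current[c], from_client[c].get(site, inf)) for c in clients)
        if best is None or cost < best[1]:
            best = (site, cost)
    return best
```

Because every distance comes from Dijkstra over the road graph, two sites that are close in straight-line terms can receive very different costs if the roads between them are indirect.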

Proceedings Article

Trichromatic Online Matching in Real-Time Spatial Crowdsourcing

Tianshu Song¹, Yongxin Tong¹, Libin Wang¹, Jieying She², Bin Yao³, Lei Chen², Ke Xu¹

Beihang University¹, Hong Kong University of Science and Technology², Shanghai Jiao Tong University³

19 Apr 2017

TL;DR: This paper formally defines a novel dynamic online task assignment problem, called trichromatic online matching in real-time spatial crowdsourcing (TOM), which is proven to be NP-hard, and presents a threshold-based randomized algorithm that not only guarantees a tighter competitive ratio but also includes an adaptive optimization technique that can quickly learn the optimal threshold for the randomized algorithm.

Abstract: The prevalence of mobile Internet techniques and Online-To-Offline (O2O) business models has led to the emergence of various spatial crowdsourcing (SC) platforms in our daily life. A core issue of SC is to assign real-time tasks to suitable crowd workers. Existing approaches usually focus on matching two types of objects, tasks and workers, or assume static offline scenarios, where the spatio-temporal information of all tasks and workers is known in advance. Recently, some newly emerging O2O applications have introduced new challenges: SC platforms need to assign three types of objects (tasks, workers, and workplaces) and support dynamic real-time online scenarios, which existing solutions cannot handle. In this paper, based on the aforementioned challenges, we formally define a novel dynamic online task assignment problem, called the trichromatic online matching in real-time spatial crowdsourcing (TOM) problem, which is proven to be NP-hard. We first devise an efficient greedy online algorithm. However, the greedy algorithm can easily be trapped in locally optimal solutions. We then present a threshold-based randomized algorithm that not only guarantees a tighter competitive ratio but also includes an adaptive optimization technique, which can quickly learn the optimal threshold for the randomized algorithm. Finally, we verify the effectiveness and efficiency of the proposed methods through extensive experiments on real and synthetic datasets.

95 citations
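The greedy online strategy, and where a threshold fits into it, can be sketched as follows. Everything specific here is an assumption for illustration: objects arrive one at a time and wait indefinitely (no deadlines), the utility of a (worker, workplace, task) triple is a hypothetical `max_travel` minus the route length worker -> workplace -> task, and a triple is committed only if its utility reaches `threshold`. With `threshold=0` this is plain greedy; the paper's randomized algorithm, and the adaptive learning of its threshold, are not reproduced.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    kind: str  # "task", "worker" or "workplace"
    name: str
    loc: tuple[float, float]


def route_length(worker: Obj, place: Obj, task: Obj) -> float:
    """Hypothetical cost: the worker goes to the workplace, then serves the task."""
    return math.dist(worker.loc, place.loc) + math.dist(place.loc, task.loc)


def online_match(stream: list[Obj], max_travel: float, threshold: float = 0.0) -> list[tuple[str, ...]]:
    pools: dict[str, list[Obj]] = {"task": [], "worker": [], "workplace": []}
    matches = []
    for obj in stream:
        pools[obj.kind].append(obj)
        # Only triples containing the new arrival can be new, so fix its slot to [obj].
        pick = {kind: ([obj] if kind == obj.kind else pool) for kind, pool in pools.items()}
        best, best_utility = None, -math.inf
        for task in pick["task"]:
            for worker in pick["worker"]:
                for place in pick["workplace"]:
                    utility = max_travel - route_length(worker, place, task)
                    if utility > best_utility:
                        best, best_utility = (worker, place, task), utility
        # Commit irrevocably (online setting) only if the best triple clears the threshold.
        if best is not None and best_utility >= threshold:
            for chosen in best:
                pools[chosen.kind].remove(chosen)
            matches.append(tuple(chosen.name for chosen in best))
    return matches
```

Raising the threshold makes the algorithm hold out for better triples at the risk of never matching some objects, which is the tension a randomized, learned threshold is meant to balance.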

Proceedings Article

K nearest neighbor queries and kNN-Joins in large relational databases (almost) for free

Bin Yao¹, Feifei Li¹, Piyush Kumar¹

Florida State University¹

01 Mar 2010

TL;DR: This work designs algorithms that can be implemented with SQL operators without changes to the database engine, enabling the query optimizer to understand them and generate the “best” query plan.

Abstract: The k nearest neighbors (kNN) query, which finds the k nearest neighbors of a query point, and the kNN-Join, which does so for a set of query points, are fundamental problems in many application domains. Many previous efforts to solve these problems focused on spatial databases or stand-alone systems, where changes to the database engine may be required; this may limit their application to large data sets stored in a relational database management system. Furthermore, these methods may not automatically optimize kNN queries or kNN-Joins when additional query conditions are specified. In this work, we study both the kNN query and the kNN-Join in a relational database, possibly augmented with additional query conditions. We search for relational algorithms that require no changes to the database engine. The straightforward solution uses a user-defined function (UDF), which a query optimizer cannot optimize. We design algorithms that can be implemented with SQL operators without changes to the database engine, enabling the query optimizer to understand them and generate the “best” query plan. Using only a small constant number of random shifts for databases in any fixed dimension, our approach is guaranteed to find the approximate kNN with only a logarithmic number of page accesses in expectation, with a constant approximation ratio, and it can be extended to find the exact kNN efficiently in any fixed dimension. Our design paradigm easily supports the kNN-Join and updates. Extensive experiments on large real and synthetic data sets confirm the efficiency and practicality of our approach.

90 citations
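The random-shift idea can be illustrated with Z-order (Morton) keys: sort points by the interleaved bits of their shifted coordinates, locate the query in each sorted order, and take nearby entries as candidates. This sketch assumes 2-D non-negative integer coordinates below 2^(bits-1), uses sorted Python lists where a relational implementation could index the key column with a standard B+-tree, and draws shift vectors uniformly at random; it does not reproduce the paper's exact construction or its approximation guarantee. The class and parameter names are hypothetical.

```python
import math
import random
from bisect import bisect_left

Point = tuple[int, int]


def z_value(x: int, y: int, bits: int) -> int:
    """Morton key: interleave the low `bits` bits of x and y."""
    z = 0
    for b in range(bits):
        z |= ((x >> b) & 1) << (2 * b)
        z |= ((y >> b) & 1) << (2 * b + 1)
    return z


class ShiftedZIndex:
    def __init__(self, points: list[Point], bits: int = 16, shifts: int = 3, seed: int = 0) -> None:
        rng = random.Random(seed)
        limit = 1 << (bits - 1)  # coordinates and shifts both stay below this, so sums fit in `bits` bits
        self.bits = bits
        self.points = points
        # The first copy is unshifted; the others use random shift vectors.
        self.shifts = [(0, 0)] + [(rng.randrange(limit), rng.randrange(limit)) for _ in range(shifts - 1)]
        # One sorted (key, point id) table per shift, playing the role of an indexed column.
        self.tables = [
            sorted((z_value(x + sx, y + sy, bits), i) for i, (x, y) in enumerate(points))
            for sx, sy in self.shifts
        ]

    def knn(self, q: Point, k: int) -> list[Point]:
        """Approximate kNN: union k Z-order neighbours on each side of q in every shifted copy."""
        candidates: set[int] = set()
        for (sx, sy), table in zip(self.shifts, self.tables):
            key = z_value(q[0] + sx, q[1] + sy, self.bits)
            pos = bisect_left(table, (key, -1))
            candidates.update(i for _, i in table[max(0, pos - k): pos + k])
        # Refine the small candidate set with exact distances.
        ranked = sorted(candidates, key=lambda i: math.dist(self.points[i], q))
        return [self.points[i] for i in ranked[:k]]
```

Each lookup is a binary search plus a short scan, which is why the approach maps onto ordinary indexed range scans; adding shifts trades extra lookups for a better chance that the true neighbours land near the query in at least one ordering.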


Cited by


What is Twitter


Rizal Setya Perdana

01 Jan 2013

1,098 citations

Journal Article

Toward Efficient Multi-Keyword Fuzzy Search Over Encrypted Outsourced Data With Accuracy Improvement

Zhangjie Fu¹, Xinle Wu¹, Chaowen Guan², Xingming Sun¹, Kui Ren²

Nanjing University of Information Science and Technology¹, University at Buffalo²

28 Jul 2016, IEEE Transactions on Information Forensics and Security

TL;DR: A new method of keyword transformation based on the uni-gram is developed, which simultaneously improves accuracy and makes it possible to handle other spelling mistakes; the keyword weight is also considered when selecting an adequate matching file set.

Abstract: Keyword-based search over encrypted outsourced data has become an important tool in the current cloud computing scenario. Most existing techniques focus on multi-keyword exact match or single-keyword fuzzy search. However, these techniques are of less practical significance in real-world applications than multi-keyword fuzzy search over encrypted data. The first attempt to construct such a multi-keyword fuzzy search scheme was reported by Wang et al., who used locality-sensitive hashing functions and Bloom filtering to meet the goal of multi-keyword fuzzy search. Nevertheless, Wang's scheme was only effective for a one-letter mistake in a keyword and was not effective for other common spelling mistakes. Moreover, Wang's scheme was vulnerable to server out-of-order problems during the ranking process and did not consider the keyword weight. In this paper, we propose an efficient multi-keyword fuzzy ranked search scheme based on Wang et al.'s scheme that is able to address the aforementioned problems. First, we develop a new method of keyword transformation based on the uni-gram, which simultaneously improves accuracy and makes it possible to handle other spelling mistakes. In addition, keywords with the same root can be queried using a stemming algorithm. Furthermore, we consider the keyword weight when selecting an adequate matching file set. Experiments using real-world data show that our scheme is practically efficient and achieves high accuracy.

464 citations
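Why a uni-gram keyword transformation tolerates more spelling mistakes than bi-grams can be shown in a few lines. This sketch assumes each letter is tagged with its occurrence count (so "eye" becomes {e1, y1, e2}) and compares keywords with Jaccard similarity; a full scheme built on Wang et al.'s design would additionally hash such representations into Bloom filters with locality-sensitive hashing and encrypt the index, none of which is shown here. The function names are hypothetical.

```python
from collections import Counter


def unigram_set(word: str) -> set[str]:
    """Letters tagged with their occurrence number, e.g. 'eye' -> {'e1', 'y1', 'e2'}."""
    seen = Counter()
    grams = set()
    for ch in word.lower():
        seen[ch] += 1
        grams.add(f"{ch}{seen[ch]}")
    return grams


def bigram_set(word: str) -> set[str]:
    """Adjacent letter pairs, e.g. 'eye' -> {'ey', 'ye'}."""
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

For the transposition "netwrok", the uni-gram sets of "network" and "netwrok" are identical (Jaccard 1.0), whereas their bi-gram sets share only 3 of 9 distinct pairs (Jaccard 1/3).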

Book

Mathematical programming

Michael J. Todd

01 Jan 1997

437 citations

Proceedings Article

Privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking

Wenhai Sun¹, Bing Wang¹, Ning Cao², Ming Li³, Wenjing Lou¹, Y. Thomas Hou¹, Hui Li⁴

Virginia Tech¹, Worcester Polytechnic Institute², Utah State University³, Xidian University⁴

08 May 2013

TL;DR: This paper presents a verifiable privacy-preserving multi-keyword text search (MTS) scheme with similarity-based ranking to address the problem of secure search over encrypted data, and proposes two secure index schemes to meet stringent privacy requirements under strong threat models.

Abstract: With the increasing popularity of cloud computing, huge amounts of documents are outsourced to the cloud for reduced management cost and ease of access. Although encryption helps protect user data confidentiality, it makes well-functioning yet practically efficient secure search over encrypted data a challenging problem. In this paper, we present a privacy-preserving multi-keyword text search (MTS) scheme with similarity-based ranking to address this problem. To support multi-keyword search and search result ranking, we propose to build the search index based on term frequency and the vector space model with the cosine similarity measure to achieve higher search result accuracy. To improve search efficiency, we propose a tree-based index structure and various adaptation methods for the multi-dimensional (MD) algorithm so that the practical search efficiency is much better than that of linear search. To further enhance search privacy, we propose two secure index schemes to meet stringent privacy requirements under strong threat models, i.e., the known ciphertext model and the known background model. Finally, we demonstrate the effectiveness and efficiency of the proposed schemes through extensive experimental evaluation.

349 citations
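Ranking documents by cosine similarity over term-frequency vectors, the core of the vector-space approach described above, can be sketched in plaintext. The weighting here (log-scaled TF times a smoothed IDF) is an assumption, and the paper's encrypted, tree-based index and secure schemes are not modeled; the sketch only shows what the ranking computes. Function names are hypothetical.

```python
import math
from collections import Counter

Vector = dict[str, float]


def tfidf_vectors(docs: dict[str, list[str]]) -> tuple[dict[str, Vector], Vector]:
    n = len(docs)
    df = Counter(term for terms in docs.values() for term in set(terms))
    # Smoothed IDF (assumption): +1 keeps terms that occur in every document from vanishing.
    idf = {t: math.log(n / c) + 1.0 for t, c in df.items()}
    vectors = {
        name: {t: (1.0 + math.log(c)) * idf[t] for t, c in Counter(terms).items()}
        for name, terms in docs.items()
    }
    return vectors, idf


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(docs: dict[str, list[str]], query: list[str], top_k: int) -> list[str]:
    """Names of the top_k documents by cosine similarity, dropping zero scores."""
    vectors, idf = tfidf_vectors(docs)
    q = {t: idf[t] for t in set(query) if t in idf}
    scored = sorted(((cosine(q, v), name) for name, v in vectors.items()), reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]
```

Cosine normalization means a short document that matches the query is ranked above a longer one with the same match, because the shared term carries a larger share of its vector.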

Journal Article

A framework for protecting worker location privacy in spatial crowdsourcing

Hien To¹, Gabriel Ghinita², Cyrus Shahabi¹

University of Southern California¹, University of Massachusetts Boston²

01 Jun 2014

TL;DR: This paper argues that existing location privacy techniques are not sufficient for SC and proposes a mechanism based on differential privacy and geocasting that achieves effective SC services while offering privacy guarantees to workers.

Abstract: Spatial Crowdsourcing (SC) is a transformative platform that engages individuals, groups, and communities in the act of collecting, analyzing, and disseminating environmental, social, and other spatio-temporal information. The objective of SC is to outsource a set of spatio-temporal tasks to a set of workers, i.e., individuals with mobile devices who perform the tasks by physically traveling to specified locations of interest. However, current solutions require the workers, who in many cases are simply volunteering for a cause, to disclose their locations to untrustworthy entities. In this paper, we introduce a framework for protecting the location privacy of workers participating in SC tasks. We argue that existing location privacy techniques are not sufficient for SC, and we propose a mechanism based on differential privacy and geocasting that achieves effective SC services while offering privacy guarantees to workers. We investigate analytical models and task assignment strategies that balance multiple crucial aspects of SC functionality, such as task completion rate, worker travel distance, and system overhead. Extensive experimental results on real-world datasets show that the proposed technique protects workers' location privacy without incurring significant penalties in performance metrics.

343 citations
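The differential-privacy ingredient can be illustrated with the standard Laplace mechanism applied to a grid of worker counts. This is a simplified assumption, not the paper's framework: the grid is flat and uniform, each worker is counted in exactly one cell (so adding or removing one worker changes the count vector by 1 in L1 norm), and geocasting and task assignment are omitted. Every cell is released, including empty ones, because publishing only non-empty cells would itself leak information. Function names are hypothetical.

```python
import random
from collections import Counter


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) noise as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_grid_counts(
    workers: list[tuple[float, float]],
    extent: float,
    cells: int,
    epsilon: float,
    seed: int = 0,
) -> dict[tuple[int, int], float]:
    """Release epsilon-DP noisy worker counts for every cell of a cells x cells grid over [0, extent]^2."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    size = extent / cells
    counts = Counter(
        (min(int(x // size), cells - 1), min(int(y // size), cells - 1)) for x, y in workers
    )
    scale = 1.0 / epsilon  # sensitivity 1 divided by the privacy budget
    return {
        (i, j): counts[(i, j)] + laplace(scale, rng)
        for i in range(cells)
        for j in range(cells)
    }
```

A smaller `epsilon` gives stronger privacy but larger noise (scale 1/epsilon), which is the kind of utility cost the abstract reports as modest in practice.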
