Top 7 papers published by Bin Yao from Shanghai Jiao Tong University in 2016

Proceedings Article

Simba: Efficient In-Memory Spatial Analytics

Dong Xie¹, Feifei Li¹, Bin Yao², Gefei Li², Liang Zhou², Minyi Guo²

University of Utah¹, Shanghai Jiao Tong University²

14 Jun 2016

TL;DR: Simba is a scalable and efficient in-memory spatial query processing and analytics system for big spatial data. It extends the Spark SQL engine to support rich spatial queries and analytics through both SQL and the DataFrame API.

Abstract: Large spatial data becomes ubiquitous. As a result, it is critical to provide fast, scalable, and high-throughput spatial queries and analytics for numerous applications in location-based services (LBS). Traditional spatial databases and spatial analytics systems are disk-based and optimized for IO efficiency. But increasingly, data are stored and processed in memory to achieve low latency, and CPU time becomes the new bottleneck. We present the Simba (Spatial In-Memory Big data Analytics) system that offers scalable and efficient in-memory spatial query processing and analytics for big spatial data. Simba is based on Spark and runs over a cluster of commodity machines. In particular, Simba extends the Spark SQL engine to support rich spatial queries and analytics through both SQL and the DataFrame API. It introduces indexes over RDDs in order to work with big spatial data and complex spatial operations. Lastly, Simba implements an effective query optimizer, which leverages its indexes and novel spatial-aware optimizations, to achieve both low latency and high throughput. Extensive experiments over large data sets demonstrate Simba's superior performance compared against other spatial analytics systems.

228 citations

Proceedings Article

Practical private shortest path computation based on Oblivious Storage

Dong Xie¹, Guanru Li¹, Bin Yao¹, Xuan Wei¹, Xiaokui Xiao², Yunjun Gao³, Minyi Guo¹

Shanghai Jiao Tong University¹, Nanyang Technological University², Zhejiang University³

16 May 2016

TL;DR: This paper introduces a general system model based on the concept of Oblivious Storage (OS), which can deal with queries requiring strong privacy properties, and proposes a new oblivious shuffle algorithm to optimize an existing OS scheme.

Abstract: As location-based services (LBSs) become popular, location-dependent queries have raised serious privacy concerns since they may disclose sensitive information in query processing. Among typical queries supported by LBSs, shortest path queries may reveal information about not only current locations of the clients, but also their potential destinations and travel plans. Unfortunately, existing methods for private shortest path computation suffer from issues of weak privacy property, low performance or poor scalability. In this paper, we aim at a strong privacy guarantee, where the adversary cannot infer almost any information about the queries, with better performance and scalability. To achieve this goal, we introduce a general system model based on the concept of Oblivious Storage (OS), which can deal with queries requiring strong privacy properties. Furthermore, we propose a new oblivious shuffle algorithm to optimize an existing OS scheme. By making trade-offs between query performance, scalability and privacy properties, we design different schemes for private shortest path computation. Eventually, we comprehensively evaluate our schemes upon real road networks in a practical environment and show their efficiency.

28 citations

Journal Article

Efficient R-Tree Based Indexing Scheme for Server-Centric Cloud Storage System

Yang Hong¹, Qiwei Tang¹, Xiaofeng Gao¹, Bin Yao¹, Guihai Chen¹, Shaojie Tang²

Shanghai Jiao Tong University¹, University of Texas at Dallas²

01 Jun 2016 - IEEE Transactions on Knowledge and Data Engineering

TL;DR: It is proved theoretically that RT-HCN is both space-efficient and query-efficient, by which each node actually maintains a tolerable number of global indices while high concurrent queries can be processed within accepted overhead.

Abstract: Cloud storage systems pose new challenges to the community to support efficient concurrent querying tasks for various data-intensive applications, where indices always hold important positions. In this paper, we explore a practical method to construct a two-layer indexing scheme for multi-dimensional data in diverse server-centric cloud storage systems. We first propose RT-HCN, an indexing scheme integrating R-tree based indexing structure and HCN-based routing protocol. RT-HCN organizes storage and compute nodes into an HCN overlay, one of the newly proposed server-centric data center topologies. Based on the properties of HCN, we design a specific index mapping technique to maintain layered global indices and corresponding query processing algorithms to support efficient query tasks. Then, we expand the idea of RT-HCN onto another server-centric data center topology, DCell, discovering a potential generalized and feasible way of deploying two-layer indexing schemes on other server-centric networks. Furthermore, we prove theoretically that RT-HCN is both space-efficient and query-efficient, by which each node actually maintains a tolerable number of global indices while high concurrent queries can be processed within accepted overhead. We finally conduct targeted experiments on Amazon's EC2 platforms, comparing our design with RT-CAN, a similar indexing scheme for traditional P2P networks. The results validate the query efficiency, especially the speedup of point query of RT-HCN, depicting its potential applicability in future data centers.

17 citations

Proceedings Article

Simba: spatial in-memory big data analysis

Dong Xie¹, Feifei Li¹, Bin Yao², Gefei Li², Zhongpu Chen², Liang Zhou², Minyi Guo²

University of Utah¹, Shanghai Jiao Tong University²

31 Oct 2016

TL;DR: The Simba (Spatial In-Memory Big data Analytics) system, which offers scalable and efficient in-memory spatial query processing and analytics for big spatial data, is presented.

Abstract: We present the Simba (Spatial In-Memory Big data Analytics) system, which offers scalable and efficient in-memory spatial query processing and analytics for big spatial data. Simba natively extends the Spark SQL engine to support rich spatial queries and analytics through both SQL and DataFrame API. It enables the construction of indexes over RDDs inside the engine in order to work with big spatial data and complex spatial operations. Simba also comes with an effective query optimizer, which leverages its indexes and novel spatial-aware optimizations, to achieve both low latency and high throughput in big spatial data analysis. This demonstration proposal describes key ideas in the design of Simba, and presents a demonstration plan.

16 citations

Journal Article

Exact and approximate flexible aggregate similarity search

Feifei Li¹, Ke Yi², Yufei Tao³, Bin Yao⁴, Yang Li⁴, Dong Xie⁴, Min Wang

University of Utah¹, Hong Kong University of Science and Technology², The Chinese University of Hong Kong³, Shanghai Jiao Tong University⁴

01 Jun 2016

TL;DR: This paper proposes an added flexibility to the query definition, where the similarity is an aggregation over the distances between p and any subset of $\phi M$ objects in Q for some support $0 < \phi \le 1$, and calls this new definition flexible aggregate similarity search.

Abstract: Aggregate similarity search, also known as aggregate nearest-neighbor (ANN) query, finds many useful applications in spatial and multimedia databases. Given a group Q of M query objects, it retrieves from a database the objects most similar to Q, where the similarity is an aggregation (e.g., $\mathrm{sum}$, $\max$) of the distances between each retrieved object p and all the objects in Q. In this paper, we propose an added flexibility to the query definition, where the similarity is an aggregation over the distances between p and any subset of $\phi M$ objects in Q for some support $0 < \phi \le 1$.
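
The flexible aggregate query defined above can be illustrated with a brute-force reference implementation. This is only a sketch of the query semantics, not the exact or approximate algorithms of the paper. It assumes 2-D points, Euclidean distance, and a subset size of ceil(phi * M); the function name `flexible_aggregate_nn` is hypothetical. For monotone aggregates such as sum and max, the best subset of Q for a candidate p is simply the phi * M query objects closest to p, so the sketch only needs the smallest distances.

```python
import heapq
import math


def flexible_aggregate_nn(points, queries, phi, agg="sum"):
    """Brute-force flexible aggregate nearest neighbor (illustrative only).

    Returns (best_point, best_score), where the score of a point p is the
    aggregate (sum or max) of its distances to the ceil(phi * M) query
    objects closest to p. Ties keep the earliest point in `points`.
    """
    if not 0 < phi <= 1:
        raise ValueError("phi must satisfy 0 < phi <= 1")
    if agg not in ("sum", "max"):
        raise ValueError("agg must be 'sum' or 'max'")

    # Assumed rounding rule: subset size is ceil(phi * M); the small epsilon
    # guards against floating-point noise such as 0.7 * 10 > 7.
    k = max(1, math.ceil(phi * len(queries) - 1e-9))
    combine = sum if agg == "sum" else max

    best_point, best_score = None, math.inf
    for p in points:
        # The k smallest distances form the optimal subset for sum and max.
        nearest = heapq.nsmallest(k, (math.dist(p, q) for q in queries))
        score = combine(nearest)
        if score < best_score:
            best_point, best_score = p, score
    return best_point, best_score
```

With phi = 1 this reduces to the classical aggregate nearest-neighbor query. A smaller phi lets a candidate ignore the query objects that are far from it, which can change the answer.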

11 citations

Journal Article

SMe: explicit & implicit constrained-space probabilistic threshold range queries for moving objects

Zhi-Jie Wang¹, Bin Yao¹, Reynold Cheng², Xiaofeng Gao¹, Lei Zou³, Haibing Guan¹, Minyi Guo¹

Shanghai Jiao Tong University¹, University of Hong Kong², Peking University³

01 Jan 2016 - Geoinformatica

TL;DR: The paper differentiates two forms of CSPTRQs, explicit and implicit ones; its central idea is to swap the order of geometric operations and to compute the appearance probability in a multi-step manner.

Abstract: This paper studies the constrained-space probabilistic threshold range query (CSPTRQ) for moving objects, where objects move in a constrained-space (i.e., objects are forbidden to be located in some specific areas), and objects' locations are uncertain. We differentiate two forms of CSPTRQs: explicit and implicit ones. Specifically, for each moving object o, we model its location uncertainty as a closed region, u, together with a probability density function. We also model a query range, R, as an arbitrary polygon. An explicit query can be reduced to a search (over all the u) that returns a set of tuples in the form of (o, p) such that p ≥ pt, where p is the probability of o being located in R, and 0 ≤ pt ≤ 1 is a given probabilistic threshold. In contrast, an implicit query returns only a set of objects (without attaching the specific probability information), whose probabilities of being located in R are higher than pt. The CSPTRQ is a variant of the traditional probabilistic threshold range query (PTRQ). As objects moving in a constrained-space are common, clearly, it can also find many applications. At first sight, our problem can be easily tackled by extending existing methods used to answer the PTRQ. Unfortunately, those classical techniques are not well suited to our problem, due to a set of new challenges. Another method used to answer the constrained-space probabilistic range query (CSPRQ) can be easily extended to tackle our problem, but a simple adaptation of this method is inefficient, due to its weak pruning/validating capability. To solve our problem, we develop targeted solutions that are easy-to-understand and also easy-to-implement. Our central idea is to swap the order of geometric operations and to compute the appearance probability in a multi-step manner. We demonstrate the efficiency and effectiveness of the proposed methods through extensive experiments. Meanwhile, from the experimental results, we further perceive the difference between explicit and implicit queries; this finding is interesting and also meaningful especially for the topics of other types of probabilistic threshold queries.
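
One way to picture the difference between the two query forms is the sketch below. It is not the paper's method: the helper names `exact_probability` and `bounds` are hypothetical callables supplied by the caller, the explicit query is assumed to use the condition p >= pt, and the implicit query uses "strictly higher than pt" as worded in the abstract. The assumption being illustrated is that an implicit query can accept or reject an object from cheap lower and upper probability bounds alone, whereas an explicit query must compute the exact probability of every object it reports.

```python
def explicit_query(objects, pt, exact_probability):
    """Return (object, p) pairs with p >= pt; every reported p is exact."""
    result = []
    for o in objects:
        p = exact_probability(o)  # always needed: p is part of the answer
        if p >= pt:
            result.append((o, p))
    return result


def implicit_query(objects, pt, bounds, exact_probability):
    """Return objects whose probability of lying in R is higher than pt.

    `bounds(o)` is assumed to return a cheap (lower, upper) bound pair on
    the appearance probability of o.
    """
    result = []
    for o in objects:
        lower, upper = bounds(o)
        if lower > pt:
            result.append(o)  # validated without the exact probability
        elif upper <= pt:
            continue  # pruned without the exact probability
        elif exact_probability(o) > pt:
            result.append(o)  # undecided by bounds: fall back to exact value
    return result
```

Under these assumptions the implicit form pays for an exact probability only when the bounds straddle the threshold.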

8 citations

Proceedings Article

Indexing and Querying A Large Database of Typed Intervals

Jianqiu Xu¹, Hua Lu, Bin Yao²

Nanjing University of Aeronautics and Astronautics¹, Shanghai Jiao Tong University²

01 Jan 2016

TL;DR: A new structure to manage typed intervals, based on the standard interval tree, is developed together with efficient query algorithms; experiments verify its performance advantage over alternative methods.

Abstract: Assume that a database stores a set of intervals, each of which defines start and end points, a weight and a type. Typed intervals enrich the data representation and support applications involving different kinds of data intervals. Given a query time and type, the system reports k intervals that intersect the time, contain the type and have the largest weight. We develop a new structure to manage typed intervals based on the standard interval tree and propose efficient query algorithms. Experiments with synthetic datasets are conducted to verify the performance advantage of our solution over alternative methods.
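
The query described in this abstract can be stated as a brute-force baseline. The sketch below only spells out the query semantics with a linear scan; it is not the interval-tree-based structure the paper develops. It assumes closed intervals, a single type label per interval matched by equality (the abstract's "contain the type" may mean something richer), and hypothetical names `TypedInterval` and `top_k_typed`.

```python
import heapq
from typing import NamedTuple


class TypedInterval(NamedTuple):
    start: float
    end: float
    weight: float
    kind: str  # the interval's type label (assumed to be a single label)


def top_k_typed(intervals, t, kind, k):
    """Return up to k intervals that contain time t, match `kind`,
    and have the largest weights, in descending weight order."""
    matches = (
        iv for iv in intervals
        if iv.start <= t <= iv.end and iv.kind == kind
    )
    return heapq.nlargest(k, matches, key=lambda iv: iv.weight)
```

A linear scan costs time proportional to the number of stored intervals per query, which is the kind of cost an index over typed intervals aims to avoid.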

3 citations
