
Showing papers by "Bin Yao" published in 2015


Journal ArticleDOI
TL;DR: A new dynamic network optimizer called OFScheduler is proposed for heterogeneous clusters to relieve network traffic during the execution of MapReduce jobs by reducing bandwidth competition, balancing the workload of network links, and increasing bandwidth utilization.
Abstract: MapReduce is a popular programming paradigm in cloud computing due to its excellent scalability for processing large-scale data. However, MapReduce performs poorly in heterogeneous clusters. One of the reasons is that Hadoop's built-in load balancing algorithm for the Map function leads to excessive network traffic. We propose a new dynamic network optimizer called OFScheduler for heterogeneous clusters to relieve network traffic during the execution of MapReduce jobs. The optimizer focuses on reducing bandwidth competition, balancing the workload of network links, and increasing bandwidth utilization. It tags different types of traffic and utilizes OpenFlow to dynamically adjust the transfer of flows. We instantiate a simulator and an OpenFlow testbed for evaluation. The simulation results demonstrate that the proposed optimizer significantly increases bandwidth utilization and improves the performance of MapReduce by 24% to 63% for most jobs in a multi-path heterogeneous cluster. The experimental results show that the proposed optimizer can be deployed in a real environment.

27 citations
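
As an illustration only: the abstract describes OFScheduler tagging different traffic types and using OpenFlow to move flows between links, and a minimal sketch of that tag-then-rebalance idea might look like the following. All names here (TrafficClass, Flow, Path, schedule) are hypothetical and not from the paper; a real deployment would install the resulting assignments as OpenFlow rules on switches rather than compute them in memory.

```python
# Hypothetical sketch of OFScheduler's core idea: tag MapReduce flows by
# type and greedily move them onto the least-loaded candidate path.
# None of these names come from the paper; they are illustrative only.
from dataclasses import dataclass
from enum import Enum

class TrafficClass(Enum):
    SHUFFLE = 1         # map -> reduce transfers (latency-sensitive)
    LOAD_BALANCING = 2  # block movement toward slow nodes
    BULK = 3            # HDFS replication and other background traffic

@dataclass
class Flow:
    flow_id: str
    cls: TrafficClass
    rate_mbps: float

@dataclass
class Path:
    path_id: str
    capacity_mbps: float
    load_mbps: float = 0.0

    def utilization(self) -> float:
        return self.load_mbps / self.capacity_mbps

def schedule(flows: list[Flow], paths: list[Path]) -> dict[str, str]:
    """Assign each flow to the currently least-utilized path,
    placing latency-sensitive shuffle traffic first."""
    assignment = {}
    for flow in sorted(flows, key=lambda f: f.cls.value):
        target = min(paths, key=Path.utilization)
        target.load_mbps += flow.rate_mbps
        assignment[flow.flow_id] = target.path_id
    return assignment

if __name__ == "__main__":
    paths = [Path("p1", 1000.0), Path("p2", 1000.0)]
    flows = [Flow("f1", TrafficClass.SHUFFLE, 300.0),
             Flow("f2", TrafficClass.BULK, 500.0),
             Flow("f3", TrafficClass.SHUFFLE, 200.0)]
    print(schedule(flows, paths))
```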


Proceedings ArticleDOI
Hao Lin, Jingyu Zhou, Bin Yao, Minyi Guo, Jie Li
04 May 2015
TL;DR: This work proposes a column-wise compression approach for well-formatted log streams, where each log entry can be independently compressed or decompressed for analysis, and shows that this scheme outperforms traditional compression methods in decompression time while achieving a competitive compression ratio.
Abstract: Nowadays massive log streams are generated by many Internet and cloud services. Storing log streams consumes a large amount of disk space and incurs high cost. Traditional compression methods can be applied to reduce storage cost, but they are inefficient for log analysis, because fetching relevant log entries from compressed data often requires retrieving and decompressing large blocks of data. We propose a column-wise compression approach for well-formatted log streams, where each log entry can be independently compressed or decompressed for analysis. Specifically, we separate a log entry into several columns and compress each column with a different model. We have implemented our approach as a library and integrated it into two applications, a log search system and a log joining system. Experimental results show that our compression scheme outperforms traditional compression methods in decompression time and has a competitive compression ratio. For log search, our approach achieves better query times than traditional compression algorithms in both in-core and out-of-core cases. For joining log streams, our approach achieves the same join quality with only 30% of the memory required by uncompressed streams.

19 citations
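
To make the column-wise idea concrete, here is a minimal sketch (an assumption about how such a scheme could be laid out, not the authors' library): each space-separated field of a log line becomes a column, and each column is compressed separately, so reading one field does not require decompressing the others. The paper compresses each column with a model tuned to that column; plain zlib stands in for those models here.

```python
# Minimal sketch of column-wise log compression: each field of a
# well-formatted log entry goes into its own column, and columns are
# compressed independently so one column can be decoded without the rest.
# Field layout and the per-column "model" (zlib) are assumptions.
import zlib

LOGS = [
    "1433116800 GET /index.html 200",
    "1433116801 GET /logo.png 200",
    "1433116802 POST /login 302",
]

def compress_columns(lines: list[str]) -> list[bytes]:
    columns = list(zip(*(line.split(" ") for line in lines)))
    # One compressed blob per column; similar values compress well together.
    return [zlib.compress("\n".join(col).encode()) for col in columns]

def decompress_column(blobs: list[bytes], idx: int) -> list[str]:
    # Only the requested column is decompressed -- the key property
    # for fast selective log analysis.
    return zlib.decompress(blobs[idx]).decode().split("\n")

blobs = compress_columns(LOGS)
print(decompress_column(blobs, 1))  # ['GET', 'GET', 'POST']
```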


Journal ArticleDOI
01 Jun 2015
TL;DR: This paper studies the problem of kCP query processing in general metric spaces, namely Metric kCP (MkCP) search, proposes several efficient algorithms using dynamic disk-based metric indexes, and derives a node-based cost model for MkCP retrieval.
Abstract: Given two object sets P and Q, a k-closest pair (kCP) query finds the k closest object pairs from P × Q. This operation is common in many real-life applications such as GIS, data mining, and recommender systems. Although it has received much attention in the Euclidean space, there is little prior work on the metric space. In this paper, we study the problem of kCP query processing in general metric spaces, namely Metric kCP (MkCP) search, and propose several efficient algorithms using dynamic disk-based metric indexes (e.g., the M-tree), which can be applied to any type of data as long as a certain metric distance is defined and satisfies the triangle inequality. Our approaches follow depth-first and/or best-first traversal paradigms, employ effective pruning rules based on metric space properties and the counting information preserved in the metric index, take advantage of aggressive pruning and compensation to further boost query efficiency, and derive a node-based cost model for MkCP retrieval. In addition, we extend our techniques to tackle two interesting variants of MkCP queries. Extensive experiments with both real and synthetic data sets demonstrate the performance of our proposed algorithms, the effectiveness of our developed pruning rules, and the accuracy of our presented cost model.

17 citations
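
The paper's pruning rules rely on metric space properties, chiefly the triangle inequality. The toy sketch below (not the paper's M-tree algorithms) shows the essence: distances to a single pivot give the lower bound |d(p,v) − d(q,v)| ≤ d(p,q), which lets many exact distance computations be skipped once k candidate pairs are known.

```python
# Toy sketch of metric k-closest-pair search with triangle-inequality
# pruning. The paper's algorithms work over disk-based M-trees; here a
# single pivot stands in for the index: |d(p,v) - d(q,v)| is a cheap
# lower bound on d(p,q), so many distance computations can be skipped.
import heapq

def dist(a, b):  # any metric works; Euclidean here for the demo
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def metric_kcp(P, Q, k):
    pivot = P[0]                        # arbitrary pivot object
    dp = {id(p): dist(p, pivot) for p in P}
    dq = {id(q): dist(q, pivot) for q in Q}
    heap = []                           # max-heap of k best pairs (negated)
    for p in P:
        for q in Q:
            lb = abs(dp[id(p)] - dq[id(q)])   # triangle inequality
            if len(heap) == k and lb >= -heap[0][0]:
                continue                # pruned: cannot beat current k-th
            d = dist(p, q)
            if len(heap) < k:
                heapq.heappush(heap, (-d, p, q))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, p, q))
    return sorted((-nd, p, q) for nd, p, q in heap)

P = [(0, 0), (5, 5), (9, 1)]
Q = [(1, 0), (8, 8), (4, 4)]
print(metric_kcp(P, Q, k=2))
```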


Journal ArticleDOI
TL;DR: This paper studies the PRQ over objects moving in a constrained 2D space where objects are forbidden to be located in some specific areas, and uses a strategy called pre-approximation to reduce the initial problem to a highly simplified version, which makes the remaining steps easy to tackle.
Abstract: The probabilistic range query (PRQ) over uncertain moving objects has attracted much attention in recent years. Most existing works focus on the PRQ for objects moving freely in two-dimensional (2D) space. In contrast, this paper studies the PRQ over objects moving in a constrained 2D space where objects are forbidden to be located in some specific areas. We dub it the constrained space probabilistic range query (CSPRQ). We analyze its unique properties and show that processing the CSPRQ with a straightforward solution is infeasible. The key idea of our solution is a strategy called pre-approximation that reduces the initial problem to a highly simplified version, making the remaining steps easy to tackle; the strategy itself is simple and easy to implement. Motivated by the cost analysis, we further optimize our solution. The optimizations are mainly based on two insights: (i) the number of effective subdivisions is no more than 1; and (ii) an entity with a larger span is more likely to subdivide a single region. We demonstrate the effectiveness and efficiency of our proposed approaches through extensive experiments under various settings, and highlight an extra finding: the precomputation-based method suffers a non-trivial preprocessing time, which offers an important indication for future research.

16 citations
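
As a toy illustration of the CSPRQ semantics (not the paper's pre-approximation algorithm), the sketch below estimates, by Monte-Carlo sampling, the probability that an uncertain object whose location is uniform over a disk lies inside a query rectangle, after discarding the probability mass that falls into forbidden areas. All shapes and parameters are invented for the demo.

```python
# Toy Monte-Carlo illustration of the CSPRQ semantics (not the paper's
# pre-approximation algorithm): an object's location is uniform over a
# disk, mass falling in forbidden rectangles is cut away, and we ask for
# the probability that the object lies inside the query rectangle.
import random

def in_rect(pt, rect):
    (x, y), (x1, y1, x2, y2) = pt, rect
    return x1 <= x <= x2 and y1 <= y <= y2

def appearance_prob(center, radius, forbidden, query, n=100_000):
    hits = valid = 0
    for _ in range(n):
        # rejection-sample a point uniformly inside the uncertainty disk
        x = random.uniform(-radius, radius)
        y = random.uniform(-radius, radius)
        if x * x + y * y > radius * radius:
            continue
        pt = (center[0] + x, center[1] + y)
        if any(in_rect(pt, f) for f in forbidden):
            continue                # mass in forbidden areas is excluded
        valid += 1
        hits += in_rect(pt, query)
    return hits / valid if valid else 0.0

# object centered at (0,0), forbidden strip on the right, query = left half
print(appearance_prob((0, 0), 1.0, [(0.5, -1, 1, 1)], (-1, -1, 0, 1)))
```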


Proceedings ArticleDOI
25 May 2015
TL;DR: This work proposes a hybrid approach for generating proofs of cloud search results, which models search indices as sets and search operations as set intersections, and builds proofs based on RSA accumulators and aggregated membership and non-membership witnesses.
Abstract: As cloud computing has become prominent, the need for searching cloud data has grown increasingly urgent. However, cloud search may be incorrect due to errors by cloud providers and attacks from other malicious tenants. Previous work on verifiable computing returns results with probabilistically checkable proofs, but it targets applications other than search and requires a large computation overhead. We propose a hybrid approach for generating proofs of cloud search results. Specifically, we model search indices as sets and search operations as set intersections, and build proofs based on RSA accumulators and aggregated membership and non-membership witnesses. Because generating witnesses for large sets is computationally expensive, we employ interval-based witnesses for fast proof generation. To reduce proof size, our hybrid method uses Bloom filters when the set difference is large. Evaluation on real datasets shows that our hybrid approach generates proofs in 0.197s on average, up to 83.2% faster than previous work, with a smaller proof size. Experiments also show our approach allows incremental updates at constant cost.

2 citations
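
The proof machinery rests on RSA accumulators. A toy sketch of the membership-witness idea follows; the modulus is tiny and its factorization is visible, so this is purely illustrative, whereas a real system (and the paper's interval-based and Bloom-filter optimizations) needs a large RSA modulus of unknown factorization.

```python
# Toy RSA-accumulator membership proof -- the primitive the paper builds
# its search-result proofs on. INSECURE demo parameters: the factors of
# N are visible here, while a real deployment requires a large RSA
# modulus whose factorization nobody knows.
import hashlib

N = 2953 * 3373   # demo modulus (both factors are prime)
G = 65537         # public base

def hash_to_prime(item: str) -> int:
    """Map an item to a prime (linear scan from a hash; fine for a demo)."""
    x = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % 100_000
    x += 2
    while any(x % d == 0 for d in range(2, int(x ** 0.5) + 1)):
        x += 1
    return x

def accumulate(items):
    acc = G
    for it in items:
        acc = pow(acc, hash_to_prime(it), N)
    return acc

def witness(items, member):
    """Accumulating everything except `member` yields its membership witness."""
    return accumulate([it for it in items if it != member])

def verify(acc, member, wit):
    return pow(wit, hash_to_prime(member), N) == acc

docs = ["doc1", "doc7", "doc9"]
acc = accumulate(docs)
w = witness(docs, "doc7")
print(verify(acc, "doc7", w))   # True
print(verify(acc, "doc2", w))   # False (almost surely)
```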


Patent
02 Sep 2015
TL;DR: In this paper, a spatial-data-based secure range query method is presented in which, when the client acquires an ID (identifier) from the server, the ID is decrypted and then re-encrypted before the client returns data to the server.
Abstract: A spatial-data-based secure range query method is characterized in that when the client acquires an ID (identifier) from the server, the ID is decrypted and is re-encrypted before the client returns data to the server. The method has the advantages that query efficiency is ensured, data encryption is implemented, the data access pattern is hidden and protected, and the risk of information leakage is greatly decreased.
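
The patent abstract does not specify a cipher, but the decrypt-then-re-encrypt step it describes can be sketched with any randomized encryption scheme. The example below uses Fernet from the Python `cryptography` package as an assumed stand-in: because each encryption draws a fresh IV, the re-encrypted ID the client returns is unlinkable to the ciphertext it fetched, which is how the access pattern stays hidden.

```python
# Minimal sketch of the decrypt-then-re-encrypt step from the patent
# abstract. Fernet is an assumed stand-in cipher; the patent does not
# name one. Randomized encryption makes repeated accesses to the same
# ID look different to the server, hiding the data access pattern.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # held by the client only
f = Fernet(key)

stored_id = f.encrypt(b"node-42")        # what the server stores

# Client-side round trip during a range query:
plain_id = f.decrypt(stored_id)          # client decrypts the fetched ID
fresh_id = f.encrypt(plain_id)           # ...and re-encrypts before returning

assert f.decrypt(fresh_id) == plain_id
print(stored_id != fresh_id)             # True: server cannot link the two
```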