
Showing papers by "Yongrui Qin published in 2013"


Proceedings ArticleDOI
15 Dec 2013
TL;DR: This work proposes a new data placement strategy, named OPTAS, which optimizes both the map and shuffle phases to reduce their total time, and proves that the globally optimal DPI can be found as the first local optimal DPI whose total time stops decreasing, further pruning the search space.
Abstract: The data placement strategy greatly affects the efficiency of MapReduce. The current strategy takes only the map phase into account to optimize the map time, but the ignored shuffle phase may significantly increase the total running time of many jobs. We propose a new data placement strategy, named OPTAS, which optimizes both the map and shuffle phases to reduce their total time. However, the huge search space makes it difficult to find an optimal data placement instance (DPI) rapidly. To address this problem, we propose an algorithm that prunes most of the search space and finds an optimal result quickly. The search space is first segmented in ascending order of potential map time. Within each segment, we propose an efficient method to construct a local optimal DPI with the minimal total time of both the map and shuffle phases. To find the global optimal DPI, we scan the local optimal DPIs in order. We have proven that the global optimal DPI is the first local optimal DPI whose total time stops decreasing, which further prunes the search space. In practice, at most fourteen local optimal DPIs are scanned among tens of thousands of segments with this pruning strategy. Extensive experiments with real trace data verify both the theoretical analysis of our pruning strategy and construction method and the optimality of OPTAS. The best improvements obtained in our experiments exceed 40% compared with the existing strategy used by MapReduce.
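The scan-and-stop rule described in the abstract can be sketched as follows. This is a minimal illustration, assuming segments are pre-sorted by ascending potential map time; `local_optimal_total` is a hypothetical stand-in for the paper's per-segment DPI construction, not its actual interface:

```python
def find_global_optimal(segments, local_optimal_total):
    """Scan per-segment local optima in map-time order; stop once the
    total (map + shuffle) time stops decreasing."""
    best_seg, best_time = None, float("inf")
    for seg in segments:
        t = local_optimal_total(seg)
        if t < best_time:
            best_seg, best_time = seg, t
        else:
            # Total time stopped decreasing: per the paper's claim, the
            # first such local optimum is globally optimal, so the rest
            # of the search space can be pruned.
            break
    return best_seg, best_time
```

With a unimodal cost profile such as `[5, 4, 3, 6, 9]` and the identity as a toy cost function, the scan stops at the fourth segment and returns the third, matching the claim that only a handful of local optima need be examined.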

5 citations


Book ChapterDOI
26 Aug 2013
TL;DR: The proposed placement algorithm is validated through a set of experiments and the results show that the algorithm can effectively place XML data on air and significantly improve the overall access efficiency.
Abstract: Existing data placement algorithms for wireless data broadcast generally assume that the clients' queries are already known and that the distribution of their access frequencies can be obtained a priori. Unfortunately, these assumptions are not realistic in most real-life applications, because new mobile clients may join at any time and clients may be reluctant to disclose their queries due to privacy concerns. In this paper, we study the data placement problem of periodic XML data broadcast in mobile wireless environments. This is an important issue, particularly as XML becomes prevalent in today's ubiquitous Web and mobile computing devices. Taking advantage of the structured characteristics of XML data, we are able to generate effective broadcast programs based purely on the XML data on the server, without any knowledge of the clients' access patterns. This not only distinguishes our work from previous studies but also gives it broader applicability. We discuss structural sharing in XML data, which forms the basis of our novel data placement algorithm. The proposed placement algorithm is validated through a set of experiments, and the results show that our algorithm can effectively place XML data on air and significantly improve overall access efficiency.
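The key observation behind structural sharing is that many XML nodes share the same root-to-node path, so useful placement statistics can be computed from the server-side data alone, with no knowledge of client queries. A minimal sketch of that observation (not the paper's algorithm) is to count how often each path occurs:

```python
import xml.etree.ElementTree as ET
from collections import Counter

def path_counts(xml_text):
    """Count occurrences of each root-to-node path in an XML document."""
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        counts[path] += 1          # nodes sharing a path share structure
        for child in node:
            walk(child, path)

    walk(root, "")
    return counts

doc = "<site><item><name/></item><item><name/></item></site>"
print(path_counts(doc))
# '/site/item' and '/site/item/name' each occur twice: structural sharing
```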

4 citations


Book ChapterDOI
13 Oct 2013
TL;DR: A comprehensive data model is proposed, which is suitable for a wide range of application scenarios, and a path coding scheme is proposed to significantly compress massive data by aggregating the path sequences, the positions and the time intervals.
Abstract: Radio Frequency Identification (RFID) is widely used to track and trace objects in supply chain management. However, the massive amounts of uncertain data produced by RFID readers are not suitable for direct use in RFID applications. Following a thorough analysis of the key features of RFID objects, this paper proposes a new framework for effectively and efficiently processing uncertain RFID data and supporting a variety of queries for tracking and tracing RFID objects. In particular, we propose an adaptive cleaning method that adjusts the size of the smoothing window according to the varying rates of uncertain data, employs different strategies to process uncertain readings, and distinguishes different types of uncertain data according to the positions where they appear. We propose a comprehensive data model that is suitable for a wide range of application scenarios. In addition, a path coding scheme is proposed to significantly compress massive data by aggregating path sequences, positions, and time intervals. Experimental evaluations show that our approach is effective and efficient in terms of compression and traceability queries.
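The general idea behind aggregating path sequences, positions, and time intervals can be illustrated with a toy sketch (the paper's actual coding scheme is not reproduced here): consecutive readings of an object at the same location collapse into a single segment carrying the location and its time interval.

```python
def compress_path(readings):
    """Merge consecutive readings at the same location into
    (location, first_seen, last_seen) segments.

    readings: list of (location, timestamp) pairs in time order.
    """
    path = []
    for loc, ts in readings:
        if path and path[-1][0] == loc:
            # Same location as the previous reading: extend its interval.
            prev = path[-1]
            path[-1] = (loc, prev[1], ts)
        else:
            path.append((loc, ts, ts))
    return path
```

For example, readings `[("dock", 1), ("dock", 5), ("shelf", 9)]` compress to `[("dock", 1, 5), ("shelf", 9, 9)]`: repeated readings at a position cost nothing extra, which is why such aggregation compresses massive RFID streams well.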

3 citations


Journal ArticleDOI
TL;DR: A novel unsupervised hashing method, named maximum variance hashing, is proposed, which aims to maximize the total variance of the hash codes while preserving the local structure of the training data and is extended using anchor graphs to reduce the computational cost.
Abstract: With the explosive growth of the data volume in modern applications such as web search and multimedia retrieval, hashing is becoming increasingly important for efficient nearest neighbor (similar item) search. Recently, a number of data-dependent methods have been developed, reflecting the great potential of learning for hashing. Inspired by the classic nonlinear dimensionality reduction algorithm—maximum variance unfolding, we propose a novel unsupervised hashing method, named maximum variance hashing, in this work. The idea is to maximize the total variance of the hash codes while preserving the local structure of the training data. To solve the derived optimization problem, we propose a column generation algorithm, which directly learns the binary-valued hash functions. We then extend it using anchor graphs to reduce the computational cost. Experiments on large-scale image datasets demonstrate that the proposed method outperforms state-of-the-art hashing methods in many cases.
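The variance-maximizing objective can be illustrated with a toy baseline, essentially PCA hashing: project the data onto its top-variance directions and binarize. This is only a sketch of the objective; the paper's column generation algorithm, which additionally preserves local structure, is not reproduced here.

```python
import numpy as np

def pca_hash(X, n_bits):
    """Toy variance-maximizing hashing: project onto the top-variance
    directions (via SVD of the centered data) and threshold at zero."""
    Xc = X - X.mean(axis=0)                   # center the data
    # Right singular vectors give the maximum-variance directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ Vt[:n_bits].T                 # top-n_bits projections
    return (proj > 0).astype(np.uint8)        # one bit per direction

codes = pca_hash(np.random.randn(100, 16), n_bits=8)
# codes is a (100, 8) array of 0/1 hash bits
```

Thresholding high-variance projections tends to split the data into balanced halves per bit, which is the intuition for why maximizing code variance yields informative, compact codes.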

2 citations


Proceedings ArticleDOI
29 Jun 2013
TL;DR: This demo presents a use case from the energy management domain and compares two semantic matching scenarios: exact and approximate, and illustrates how a large number of exact matching event subscriptions are needed to match heterogeneous power consumption events.
Abstract: This demo presents a use case from the energy management domain. It builds upon previous work on approximate semantic matching of heterogeneous events and compares two semantic matching scenarios: exact and approximate. It illustrates how a large number of exact matching event subscriptions are needed to match heterogeneous power consumption events, and then demonstrates that only a small number of approximate semantic matching subscriptions are needed, though possibly at the cost of lower true positive/negative rates. The demo is delivered via the COLLIDER approximate event processing engine, currently under development at DERI.

1 citation