
Showing papers by "Yongrui Qin published in 2013"


Proceedings ArticleDOI
15 Dec 2013
TL;DR: This work proposes a new data placement strategy, named OPTAS, which optimizes both the map and shuffle phases to reduce their total time, and proves that the globally optimal DPI can be found as the first local optimal DPI whose total time stops decreasing, further pruning the search space.
Abstract: The data placement strategy greatly affects the efficiency of MapReduce. The current strategy takes only the map phase into account to optimize the map time, but the ignored shuffle phase may significantly increase the total running time of many jobs. We propose a new data placement strategy, named OPTAS, which optimizes both the map and shuffle phases to reduce their total time. However, the huge search space makes it difficult to find an optimal data placement instance (DPI) rapidly. To address this problem, we propose an algorithm that prunes most of the search space and finds an optimal result quickly. The search space is first segmented in ascending order of potential map time. Within each segment, we propose an efficient method to construct a local optimal DPI with the minimal total time of both the map and shuffle phases. To find the global optimal DPI, we scan the local optimal DPIs in order. We have proven that the global optimal DPI is the first local optimal DPI whose total time stops decreasing, which further prunes the search space. In practice, at most fourteen local optimal DPIs are scanned among tens of thousands of segments with this pruning strategy. Extensive experiments with real trace data verify both the theoretical analysis of our pruning strategy and construction method and the optimality of OPTAS. The best improvements obtained in our experiments exceed 40% compared with the existing strategy used by MapReduce.
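The scan-and-stop rule described in the abstract can be sketched as follows. This is a minimal illustration, assuming segments are pre-sorted by ascending potential map time; `local_optimal_total` is a hypothetical stand-in for the paper's per-segment DPI construction, not its actual interface:

```python
def find_global_optimal(segments, local_optimal_total):
    """Scan per-segment local optima in map-time order; stop once the
    total (map + shuffle) time stops decreasing."""
    best_seg, best_time = None, float("inf")
    for seg in segments:
        t = local_optimal_total(seg)
        if t < best_time:
            best_seg, best_time = seg, t
        else:
            # Total time stopped decreasing: per the paper's claim, the
            # first such local optimum is globally optimal, so the rest
            # of the search space can be pruned.
            break
    return best_seg, best_time
```

With a unimodal cost profile such as `[5, 4, 3, 6, 9]` and the identity as a toy cost function, the scan stops at the fourth segment and returns the third, matching the claim that only a handful of local optima need be examined.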

5 citations


Book ChapterDOI
26 Aug 2013
TL;DR: The proposed placement algorithm is validated through a set of experiments and the results show that the algorithm can effectively place XML data on air and significantly improve the overall access efficiency.
Abstract: Existing data placement algorithms for wireless data broadcast generally assume that the clients' queries are already known and that the distribution of their access frequencies can be obtained a priori. Unfortunately, these assumptions are not realistic in most real-life applications, because new mobile clients may join at any time and clients may be reluctant to disclose their queries due to privacy concerns. In this paper, we study the data placement problem of periodic XML data broadcast in mobile wireless environments. This is an important issue, particularly as XML becomes prevalent in today's ubiquitous Web and mobile computing devices. Taking advantage of the structured characteristics of XML data, we are able to generate effective broadcast programs based purely on the XML data on the server, without any knowledge of the clients' access patterns. This not only distinguishes our work from previous studies but also gives it broader applicability. We discuss structural sharing in XML data, which forms the basis of our novel data placement algorithm. The proposed placement algorithm is validated through a set of experiments, and the results show that our algorithm can effectively place XML data on air and significantly improve overall access efficiency.
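The key observation behind structural sharing is that many XML nodes share the same root-to-node path, so useful placement statistics can be computed from the server-side data alone, with no knowledge of client queries. A minimal sketch of that observation (not the paper's algorithm) is to count how often each path occurs:

```python
import xml.etree.ElementTree as ET
from collections import Counter

def path_counts(xml_text):
    """Count occurrences of each root-to-node path in an XML document."""
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        counts[path] += 1          # nodes sharing a path share structure
        for child in node:
            walk(child, path)

    walk(root, "")
    return counts

doc = "<site><item><name/></item><item><name/></item></site>"
print(path_counts(doc))
# '/site/item' and '/site/item/name' each occur twice: structural sharing
```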

4 citations


Book ChapterDOI
13 Oct 2013
TL;DR: A comprehensive data model is proposed, which is suitable for a wide range of application scenarios, and a path coding scheme is proposed to significantly compress massive data by aggregating the path sequences, the positions and the time intervals.
Abstract: Radio Frequency Identification (RFID) is widely used to track and trace objects in supply chain management. However, the massive amounts of uncertain data produced by RFID readers are not suitable for direct use in RFID applications. Following a thorough analysis of the key features of RFID objects, this paper proposes a new framework for effectively and efficiently processing uncertain RFID data and supporting a variety of queries for tracking and tracing RFID objects. In particular, we propose an adaptive cleaning method that adjusts the size of the smoothing window according to the varying rates of uncertain data, employs different strategies to process uncertain readings, and distinguishes different types of uncertain data according to the positions where they appear. We propose a comprehensive data model that is suitable for a wide range of application scenarios. In addition, a path coding scheme is proposed to significantly compress massive data by aggregating path sequences, positions, and time intervals. Experimental evaluations show that our approach is effective and efficient in terms of compression and traceability queries.
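The general idea behind aggregating path sequences, positions, and time intervals can be illustrated with a toy sketch (the paper's actual coding scheme is not reproduced here): consecutive readings of an object at the same location collapse into a single segment carrying the location and its time interval.

```python
def compress_path(readings):
    """Merge consecutive readings at the same location into
    (location, first_seen, last_seen) segments.

    readings: list of (location, timestamp) pairs in time order.
    """
    path = []
    for loc, ts in readings:
        if path and path[-1][0] == loc:
            # Same location as the previous reading: extend its interval.
            prev = path[-1]
            path[-1] = (loc, prev[1], ts)
        else:
            path.append((loc, ts, ts))
    return path
```

For example, readings `[("dock", 1), ("dock", 5), ("shelf", 9)]` compress to `[("dock", 1, 5), ("shelf", 9, 9)]`: repeated readings at a position cost nothing extra, which is why such aggregation compresses massive RFID streams well.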

3 citations


Journal ArticleDOI
TL;DR: A novel unsupervised hashing method, named maximum variance hashing, is proposed, which aims to maximize the total variance of the hash codes while preserving the local structure of the training data and is extended using anchor graphs to reduce the computational cost.
Abstract: With the explosive growth of the data volume in modern applications such as web search and multimedia retrieval, hashing is becoming increasingly important for efficient nearest neighbor (similar item) search. Recently, a number of data-dependent methods have been developed, reflecting the great potential of learning for hashing. Inspired by the classic nonlinear dimensionality reduction algorithm—maximum variance unfolding, we propose a novel unsupervised hashing method, named maximum variance hashing, in this work. The idea is to maximize the total variance of the hash codes while preserving the local structure of the training data. To solve the derived optimization problem, we propose a column generation algorithm, which directly learns the binary-valued hash functions. We then extend it using anchor graphs to reduce the computational cost. Experiments on large-scale image datasets demonstrate that the proposed method outperforms state-of-the-art hashing methods in many cases.
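The variance-maximizing objective can be illustrated with a toy baseline, essentially PCA hashing: project the data onto its top-variance directions and binarize. This is only a sketch of the objective; the paper's column generation algorithm, which additionally preserves local structure, is not reproduced here.

```python
import numpy as np

def pca_hash(X, n_bits):
    """Toy variance-maximizing hashing: project onto the top-variance
    directions (via SVD of the centered data) and threshold at zero."""
    Xc = X - X.mean(axis=0)                   # center the data
    # Right singular vectors give the maximum-variance directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ Vt[:n_bits].T                 # top-n_bits projections
    return (proj > 0).astype(np.uint8)        # one bit per direction

codes = pca_hash(np.random.randn(100, 16), n_bits=8)
# codes is a (100, 8) array of 0/1 hash bits
```

Thresholding high-variance projections tends to split the data into balanced halves per bit, which is the intuition for why maximizing code variance yields informative, compact codes.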

2 citations


Proceedings ArticleDOI
29 Jun 2013
TL;DR: This demo presents a use case from the energy management domain and compares two semantic matching scenarios: exact and approximate, and illustrates how a large number of exact matching event subscriptions are needed to match heterogeneous power consumption events.
Abstract: This demo presents a use case from the energy management domain. It builds upon previous work on approximate semantic matching of heterogeneous events and compares two semantic matching scenarios: exact and approximate. It illustrates how a large number of exact matching event subscriptions are needed to match heterogeneous power consumption events, and then demonstrates that only a small number of approximate semantic matching subscriptions are needed, though possibly at the cost of lower true positive/negative rates. The demo is delivered via the COLLIDER approximate event processing engine, currently under development at DERI.

1 citation