Home
/
Authors
/
Yulai Yuan

Author

Yulai Yuan

Bio: Yulai Yuan is an academic researcher from Tsinghua University. The author has contributed to research in topics: Grid & Grid computing. The author has an hindex of 8, co-authored 11 publications receiving 247 citations.

Topics: Grid, Grid computing, Scheduling (computing), Job scheduler, Data grid ...read more

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Adaptive Workload Prediction of Grid Performance in Confidence Windows

[...]

Yongwei Wu¹, Kai Hwang², Yulai Yuan¹, Weiming Zheng¹•Institutions (2)

Tsinghua University¹, University of Southern California²

01 Jul 2010-IEEE Transactions on Parallel and Distributed Systems

TL;DR: A new adaptive hybrid method (AHModel) for load prediction guided by trained confidence windows that was proved especially effective to predict large workload that demands very long execution time, such as exceeding 4 hours on the Grid5000 over 5,000 processors.

...read moreread less

Abstract: Predicting grid performance is a complex task because heterogeneous resource nodes are involved in a distributed environment. Long execution workload on a grid is even harder to predict due to heavy load fluctuations. In this paper, we use Kalman filter to minimize the prediction errors. We apply Savitzky-Golay filter to train a sequence of confidence windows. The purpose is to smooth the prediction process from being disturbed by load fluctuations. We present a new adaptive hybrid method (AHModel) for load prediction guided by trained confidence windows. We test the effectiveness of this new prediction scheme with real-life workload traces on the AuverGrid and Grid5000 in France. Both theoretical and experimental results are reported in this paper. As the lookahead span increases from 10 to 50 steps (5 minutes per step), the AHModel predicts the grid workload with a mean-square error (MSE) of 0.04-0.73 percent, compared with 2.54-30.2 percent in using the static point value autoregression (AR) prediction method. The significant gain in prediction accuracy makes the new model very attractive to predict Grid performance. The model was proved especially effective to predict large workload that demands very long execution time, such as exceeding 4 hours on the Grid5000 over 5,000 processors. With minor changes of some system parameters, the AHModel can apply to other computational grids as well. At the end, we discuss extended research issues and tool development for Grid performance prediction.

...read moreread less

71 citations

Proceedings Article•DOI•

Load prediction using hybrid model for computational grid

[...]

Yongwei Wu¹, Yulai Yuan¹, Guangwen Yang¹, Weimin Zheng¹•Institutions (1)

Tsinghua University¹

19 Sep 2007

TL;DR: A new hybrid model is presented, which predicts the n-step-ahead load status by using interval values and integrates autoregressive (AR) model with confidence interval estimations to forecast the future load of a system.

...read moreread less

Abstract: Due to the dynamic nature of grid environments, schedule algorithms always need assistance of a long-time-ahead load prediction to make decisions on how to use grid resources efficiently. In this paper, we present and evaluate a new hybrid model, which predicts the n-step-ahead load status by using interval values. This model integrates autoregressive (AR) model with confidence interval estimations to forecast the future load of a system. Meanwhile, two filtering technologies from signal processing field are also introduced into this model to eliminate data noise and enhance prediction accuracy. The results of experiments conducted on a real grid environment demonstrate that this new model is more capable of predicting n-step-ahead load in a computational grid than previous works. The proposed hybrid model performs well on prediction advance time for up to 50 minutes, with significant less prediction errors than conventional AR model. It also achieves an interval length acceptable for task scheduler.

...read moreread less

54 citations

Journal Article•DOI•

Job failures in high performance computing systems: A large-scale empirical study

[...]

Yulai Yuan¹, Yongwei Wu¹, Qiuping Wang¹, Guangwen Yang¹, Weimin Zheng¹ - Show less +1 more•Institutions (1)

Tsinghua University¹

01 Jan 2012-Computers & Mathematics With Applications

TL;DR: An empirical study on job failures of 10 public workload data sets collected from 8 large-scale HPCs all over the world finds evidence that failed jobs' lifetime accuracy always follows the ''bathtub curve'' and job failures exhibit strong locality properties that can support the prediction offailed jobs' occurrence and runtime.

...read moreread less

Abstract: The growing complexity and size of High Performance Computing systems (HPCs) lead to frequent job failures, which may cause significant performance degradation. In order to provide high performance and reliable computing services, an in-depth understanding of the characteristics of HPC job failures is essential. In this paper, we present an empirical study on job failures of 10 public workload data sets collected from 8 large-scale HPCs all over the world. Multiple analysis methods are applied to provide a comprehensive and in-depth understanding of job failures. In order to facilitate design, testing and management of HPCs, we study properties of job failures from the following four aspects: proportion in workload and resource consumption, submission inter-arrival time, locality, and runtime. Our analysis results show that job failure rates are significant in most HPCs, and on average, a failed job often consumes more computational resources than a successful job. We also observe that the submission inter-arrival time of failed jobs is better fit by Generalized Pareto and Lognormal distributions, and the probability of failed job submission follows a ''V'' shape: decreasing during the first 100 seconds right after the submission of the last failed job and increasing afterward. The majority of job failures come from a small number of users and applications, and furthermore these users are the primary factor related to job failures compared with these applications. We find evidence that failed jobs' lifetime accuracy (runtime / request time) always follows the ''bathtub curve''. Moreover, job failures exhibit strong locality properties that can support the prediction of failed jobs' occurrence and runtime. Most of these findings are new contributions from the research community, and some findings also reveal important properties of job failures that were misunderstood or poorly understood before. The wide range of studies in this paper can directly and thoroughly facilitate fault tolerant, scheduling, workload modeling, etc. in HPCs, and lead to better system utility while reducing costs.

...read moreread less

30 citations

Proceedings Article•DOI•

Dynamic Data Replication based on Local Optimization Principle in Data Grid

[...]

Yulai Yuan¹, Yongwei Wu¹, Guangwen Yang¹, Feng Yu²•Institutions (2)

Tsinghua University¹, French Institute for Research in Computer Science and Automation²

16 Aug 2007

TL;DR: A new dynamic replication strategy based on the principle of local optimization is proposed, taking into account two important issues which bound the replication: storage capability of different nodes and the bandwidth between these nodes.

...read moreread less

Abstract: Efficient data access is one way of improving the performance of the data grid. In order to speed up the data access and reduce bandwidth consumption, data grid replicates essential data in multiple locations. This paper studies data replication strategy in data grid, taking into account two important issues which bound the replication: storage capability of different nodes and the bandwidth between these nodes. We propose a new dynamic replication strategy based on the principle of local optimization. The data grid can achieve the global data access optimization through the interaction of the local optimization in the local optimization areas.

...read moreread less

29 citations

Proceedings Article•DOI•

Adaptive Hybrid Model for Long Term Load Prediction in Computational Grid

[...]

Yulai Yuan¹, Yongwei Wu¹, Guangwen Yang¹, Weimin Zheng¹•Institutions (1)

Tsinghua University¹

19 May 2008

TL;DR: The results of the experiments demonstrate that the adaptive hybrid model (AHModel) outperforms the widely used autoregressive (AR) model in long term load prediction significantly, and it also achieves obvious reduction in prediction mean square error comparing with HModel which uses fixed parameter value.

...read moreread less

Abstract: Long term load prediction can assist task scheduling and load balancing greatly in distributed environment such as computational grid. Due to the dynamic property of grid environment, fixed-parameter prediction model can not exert its forecast capability completely. In this paper we first observe and analyze parameters' impact on prediction accuracy for our previous long term load prediction hybrid model (HModel) in detail. And then, a parameter-level adaptive method based on previous analysis is proposed in order to make HModel adapt to the time-varying characteristics of load in computational grid. The results of the experiments demonstrate that our adaptive hybrid model (AHModel) outperforms the widely used autoregressive (AR) model in long term load prediction significantly, and it also achieves obvious reduction in prediction mean square error comparing with HModel which uses fixed parameter value.

...read moreread less

25 citations

Cited by

PDF

Open Access

More filters

Book•

Distributed and Cloud Computing: From Parallel Processing to the Internet of Things

[...]

Kai Hwang, Jack Dongarra, Geoffrey C. Fox

31 Oct 2011

TL;DR: This book will teach you how to create high-performance, scalable, reliable systems, providing comprehensive coverage of distributed and cloud computing, including: Facilitating management, debugging, migration, and disaster recovery through virtualization

...read moreread less

Abstract: From the leading minds in the field, Distributed and Cloud Computing is the first modern, up-to-date distributed systems textbook Starting with an overview of modern distributed models, the book exposes the design principles, systems architecture, and innovative applications of parallel, distributed, and cloud computing systems It will teach you how to create high-performance, scalable, reliable systems, providing comprehensive coverage of distributed and cloud computing, including: Facilitating management, debugging, migration, and disaster recovery through virtualization Clustered systems for research or ecommerce applications Designing systems as web services Social networking systems using peer-to-peer computing Principles of cloud computing using examples from open-source and commercial applications Using examples from open-source and commercial vendors, the text describes cloud-based systems for research, e-commerce, social networking and more Complete coverage of modern distributed computing technology including clusters, the grid, service-oriented architecture, massively parallel processors, peer-to-peer networking, and cloud computing Includes case studies from the leading distributed computing vendors: Amazon, Microsoft, Google, and more Designed to meet the needs of students taking a distributed systems course, each chapter includes exercises and further reading, with lecture slides and solutions available online

...read moreread less

307 citations

Journal Article•DOI•

Survey on prediction models of applications for resources provisioning in cloud

[...]

Maryam Amiri¹, Leyli Mohammad-Khanli¹•Institutions (1)

University of Tabriz¹

15 Mar 2017-Journal of Network and Computer Applications

TL;DR: A taxonomy for application prediction models is presented that investigates main characteristics and challenges of the different models and discusses open research issues and future trends of the application prediction.

...read moreread less

168 citations

Proceedings Article•DOI•

Host load prediction in a Google compute cloud with a Bayesian model

[...]

Sheng Di¹, Derrick Kondo¹, Walfredo Cirne²•Institutions (2)

French Institute for Research in Computer Science and Automation¹, Google²

10 Nov 2012

TL;DR: A prediction method based on Bayes model to predict themean load over a long-term time interval, as well as the mean load in consecutive future time intervals, and improves the load prediction accuracy by 5.6 -- 50% compared to other state-of-the-art methods based on moving averages, auto-regression, and/or noise filters.

...read moreread less

Abstract: Prediction of host load in Cloud systems is critical for achieving service-level agreements. However, accurate prediction of host load in Clouds is extremely challenging because it fluctuates drastically at small timescales. We design a prediction method based on Bayes model to predict the mean load over a long-term time interval, as well as the mean load in consecutive future time intervals. We identify novel predictive features of host load that capture the expectation, predictability, trends and patterns of host load. We also determine the most effective combinations of these features for prediction. We evaluate our method using a detailed one-month trace of a Google data center with thousands of machines. Experiments show that the Bayes method achieves high accuracy with a mean squared error of 0.0014. Moreover, the Bayes method improves the load prediction accuracy by 5.6 -- 50% compared to other state-of-the-art methods based on moving averages, auto-regression, and/or noise filters.

...read moreread less

151 citations

Journal Article•DOI•

Resource provision algorithms in cloud computing

[...]

Jiangtao Zhang, Hejiao Huang, Xuan Wang

01 Apr 2016-Journal of Network and Computer Applications

TL;DR: The resource provision algorithms are surveyed from a point view of top-do-down according to their objectives and VM placement phase and several tightly related topics, i.e., virtual machine migration, forecast methods, stability and availability are discussed.

...read moreread less

143 citations

Journal Article•DOI•

Multi-objective scheduling of many tasks in cloud platforms

[...]

Fan Zhang¹, Fan Zhang², Junwei Cao¹, Keqin Li³, Samee U. Khan⁴, Kai Hwang⁵ - Show less +2 more•Institutions (5)

Tsinghua University¹, Massachusetts Institute of Technology², State University of New York System³, North Dakota State University⁴, University of Southern California⁵

01 Jul 2014-Future Generation Computer Systems

TL;DR: This work proposes a new method to generate suboptimal or sufficiently good schedules for smooth multitask workflows on cloud platforms and proves the suboptimality through mathematical analysis.

...read moreread less

133 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49

Collapse