Journal Article

Efficient resource prediction model for small and medium scale cloud data centers

01 Jan 2020-Journal of Intelligent and Fuzzy Systems (IOS Press)-Vol. 39, Iss: 3, pp 4731-4747
TL;DR: A regressive predictive analysis, i.e., a multi-output random forest regressor, is proposed to forecast the resource usage and power utilization of virtual machines; the results show that the proposed approach yields better predictions than single-output prediction methods for future resource demand from end users.
Abstract: Data centers are central to the modern industrial business world, and small and medium-scale data centers (SMSDCs), which are involved in high-performance computing, are a growing part of that infrastructure. The extensively enhanced SMSDC infrastructure comprises a diverse set of connected devices that deliver resources to end users. The high-certainty workloads of end users and resource over-provisioning result in high power consumption in SMSDCs, and these are pivotal factors contributing to the high carbon footprint of SMSDCs. CO2 emissions are higher from SMSDCs than from hyperscale data centers (HSDCs). An exorbitant amount of electricity is used by 8.6 million data centers worldwide, and this usage is expected to increase by up to 13% in 2030. The power requirement of the SMSDC domain is expected to be 5% of global power production. However, the power consumption of SMSDCs changes annually. To aid SMSDCs, machine learning prediction is deployed. A literature review indicates that many studies have focused on the recurring issues of HSDCs rather than those of SMSDCs. Herein, a regressive predictive analysis, i.e., a multi-output random forest regressor, is proposed to forecast the resource usage and power utilization of virtual machines. These predictions help diminish the power utilization of SMSDCs while reducing their CO2 emissions. The obtained results show that the proposed approach yields better predictions than other single-output prediction methods for future resource demand from end users.
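To illustrate what "multi-output" means for a forest regressor, the following is a minimal pure-Python sketch, not the authors' implementation. It bags depth-1 regression trees (stumps), each of which predicts two targets at once by choosing the split that minimizes the squared error summed over both targets. The single feature (a load level) and the two targets (CPU usage and power draw) are made-up illustrative data, and all function names are hypothetical.

```python
import random

def fit_stump(X, Y):
    """Fit a depth-1 regression tree that predicts every output at once."""
    n_out = len(Y[0])
    rows = list(range(len(X)))

    def mean(idx):
        return [sum(Y[i][k] for i in idx) / len(idx) for k in range(n_out)]

    def sse(idx, mu):
        # Squared error summed over ALL outputs: this is the multi-output part.
        return sum((Y[i][k] - mu[k]) ** 2 for i in idx for k in range(n_out))

    overall = mean(rows)
    best = (float("inf"), None, None, overall, overall)
    for j in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted({x[j] for x in X})[:-1]:
            left = [i for i in rows if X[i][j] <= t]
            right = [i for i in rows if X[i][j] > t]
            mu_l, mu_r = mean(left), mean(right)
            score = sse(left, mu_l) + sse(right, mu_r)
            if score < best[0]:
                best = (score, j, t, mu_l, mu_r)
    return best[1:]  # (feature index, threshold, left mean, right mean)

def predict_stump(stump, x):
    j, t, mu_l, mu_r = stump
    if j is None or x[j] <= t:  # j is None when no split was possible
        return mu_l
    return mu_r

def fit_forest(X, Y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(fit_stump([X[i] for i in idx], [Y[i] for i in idx]))
    return forest

def predict_forest(forest, x):
    """Average the per-tree output vectors."""
    preds = [predict_stump(s, x) for s in forest]
    return [sum(p[k] for p in preds) / len(preds) for k in range(len(preds[0]))]

# Illustrative data only: one feature (load level), two targets (CPU %, watts).
X = [[float(v)] for v in range(1, 11)]
Y = [[10.0, 100.0] if v <= 5 else [80.0, 300.0] for v in range(1, 11)]

forest = fit_forest(X, Y)
low = predict_forest(forest, [2.0])
high = predict_forest(forest, [9.0])
print(low, high)
```

A single-output baseline would train one such forest per target. The multi-output variant shares each split across targets, which is the property the abstract contrasts against single-output methods.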
Citations
Journal Article
TL;DR: A framework for a predictive optimization approach for delivering data center services to end users is proposed; it aims to increase resource usage, minimize power utilization, and curtail the carbon footprint.

2 citations

Book Chapter
01 Jan 2010
Abstract: In Chap. 3, we introduced several statistical techniques for the analysis of data, most of which were descriptive or exploratory. But, we also got our first glimpse of another form of statistical analysis known as Inferential Statistics. Inferential statistics is how statisticians use inductive reasoning to move from the specific, the data contained in a sample, to the general, inferring characteristics of the population from which the sample was taken.

1 citation

Proceedings Article
01 Aug 2022
TL;DR: In this article, the authors adopt a neural network model to solve the problem of resource prediction in the cloud environment, which is nonlinear, non-stationary, noisy, and uncertain.
Abstract: With the rapid development of cloud computing and data centers, the demand for cloud computing resources is constantly changing, and resource allocation has gradually become one of the most challenging problems in the cloud environment. Oversupply increases energy waste and costs, while undersupply, on the other hand, leads to violations of Service Level Agreements (SLAs) and reduces Quality of Service (QoS). Therefore, in order to cope with the changing cloud resource requirements, resource prediction models are required to provide reasonable accuracy. Traditional statistical models are simple in structure, and most of them can only deal with stationary series. In view of the relatively complex sequence of resource occupation in the cloud environment, which is nonlinear, non-stationary, noisy, and uncertain, it is difficult to capture its stable state. In this paper, we adopt a neural network model to solve this problem. First, the sequence is decomposed by variational mode decomposition (VMD), and multiple sub-sequences are preprocessed by sliding windows. The Gated Recurrent Unit (GRU) based on attention mechanism is used to preliminarily obtain the data characteristic information, and global features are extracted with Temporal Convolutional Network (TCN) by dilation convolution. Then, the comprehensive feature information extracted from the two networks with different structures is fused to predict the resource occupancy rate at a future time. For three datasets with different characteristics, the model reduces the value of the MAPE indicator by 0.442%, 4.764% and 2.331%. Experimental results show that the proposed method improves the accuracy of resource prediction, and is superior to existing models in MAPE, RMSE, and R2 evaluation criteria.
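The sliding-window preprocessing and the three evaluation criteria named in this abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the paper's code: the window width, the sample series, and the function names are all hypothetical, and the VMD, GRU, and TCN stages are not reproduced here.

```python
import math

def sliding_windows(series, width):
    """Turn a series into (input window, next value) training pairs."""
    return [(series[i:i + width], series[i + width])
            for i in range(len(series) - width)]

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical resource-occupancy readings, windowed with width 3.
pairs = sliding_windows([1, 2, 3, 4, 5], 3)
print(pairs)
print(mape([100, 200], [110, 180]), rmse([1, 2, 3], [1, 2, 3]), r2([1, 2, 3], [1, 2, 3]))
```

Lower MAPE and RMSE and a higher R2 indicate better predictions, which is the sense in which the abstract reports improvements on these criteria.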
Journal Article
TL;DR: In this article, the authors propose a framework for a predictive optimization approach for delivering data center services to end users, in which a Multi-Output (MO) Random Forest Regressor (RFR) concurrently predicts the multiple-resource utilization of Virtual Machines (VMs).
Abstract: The cloud paradigm has become omnipresent in the Industry 4.0 business world. Over the last decade, control measures for power utilization among the proliferating Hyper-Scale Data Centers (HSDCs) have been elucidated. However, the lack of attention to regulating power in Small and Medium-Scale Data Centers (SMSDCs) has resulted in excessive power drainage in small and medium-scale cloud data centers. The crucial factors behind the excessive power utilization of SMSDCs include the provision of excessive resources and high-certainty tasks. The majority of previously reported studies focused on problems associated with hyper-scale data centers and did not examine the issues prevalent in small and medium-scale cloud data centers. This paper proposes a framework for a predictive optimization approach for delivering data center services to end users. In the first phase, the Multi-Output (MO) Random Forest Regressor (RFR) (MO-RFR) concurrently predicts the multiple-resource utilization of Virtual Machines (VMs). In the second phase, the outcome of the predictive framework is used by the Multi-Objective Particle Swarm Optimization (MO-PSO) framework to resolve the virtual machine placement problem and to achieve better physical machine consolidation. The proposed multi-prediction-based MO-PSO aims to increase resource usage, minimize power utilization, and curtail the carbon footprint. The efficacy of the proposed approach was appraised via performance metrics and actual workload traces. The results show that the proposed method outperforms the baseline approaches.
References
Journal Article
01 Oct 2001
TL;DR: Internal estimates monitor error, strength, and correlation, and these are used to show the response to increasing the number of features used in the splitting; the ideas are also applicable to regression.
Abstract: Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, aaa, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.

79,257 citations

Book
01 Jan 1975
TL;DR: This book covers multiple regression/correlation analysis, from bivariate correlation and regression through diagnostics, interactions, missing data, and alternative regression models, with appendices on the mathematical basis for multiple regression/correlation.
Abstract: Contents: Preface. Introduction. Bivariate Correlation and Regression. Multiple Regression/Correlation With Two or More Independent Variables. Data Visualization, Exploration, and Assumption Checking: Diagnosing and Solving Regression Problems I. Data-Analytic Strategies Using Multiple Regression/Correlation. Quantitative Scales, Curvilinear Relationships, and Transformations. Interactions Among Continuous Variables. Categorical or Nominal Independent Variables. Interactions With Categorical Variables. Outliers and Multicollinearity: Diagnosing and Solving Regression Problems II. Missing Data. Multiple Regression/Correlation and Causal Models. Alternative Regression Models: Logistic, Poisson Regression, and the Generalized Linear Model. Random Coefficient Regression and Multilevel Models. Longitudinal Regression Methods. Multiple Dependent Variables: Set Correlation. Appendices: The Mathematical Basis for Multiple Regression/Correlation and Identification of the Inverse Matrix Elements. Determination of the Inverse Matrix and Applications Thereof.

29,764 citations

Journal Article
TL;DR: Algorithmic modeling has developed rapidly in fields outside statistics, both in theory and practice, and can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets.
Abstract: There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.

2,948 citations

Journal Article
TL;DR: Various classification algorithms and the recent attempt for improving classification accuracy—ensembles of classifiers are described.
Abstract: Supervised classification is one of the tasks most frequently carried out by so-called Intelligent Systems. Thus, a large number of techniques have been developed based on Artificial Intelligence (Logic-based techniques, Perceptron-based techniques) and Statistics (Bayesian Networks, Instance-based techniques). The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features. The resulting classifier is then used to assign class labels to the testing instances where the values of the predictor features are known, but the value of the class label is unknown. This paper describes various classification algorithms and the recent attempt for improving classification accuracy--ensembles of classifiers.

1,127 citations

Journal Article
TL;DR: This study discusses several frequently-used evaluation measures for feature selection, and surveys supervised, unsupervised, and semi-supervised feature selection methods, which are widely applied in machine learning problems, such as classification and clustering.

1,057 citations