
Showing papers by "Marcus Fontoura" published in 2017


Proceedings ArticleDOI
14 Oct 2017
TL;DR: This paper presents an extensive characterization of Microsoft Azure's VM workload, including distributions of the VMs' lifetime, deployment size, and resource consumption, and introduces Resource Central, a system that collects VM telemetry, learns these behaviors offline, and provides predictions online to various resource managers via a general client-side library.
Abstract: Cloud research to date has lacked data on the characteristics of the production virtual machine (VM) workloads of large cloud providers. A thorough understanding of these characteristics can inform the providers' resource management systems, e.g. VM scheduler, power manager, server health manager. In this paper, we first introduce an extensive characterization of Microsoft Azure's VM workload, including distributions of the VMs' lifetime, deployment size, and resource consumption. We then show that certain VM behaviors are fairly consistent over multiple lifetimes, i.e. history is an accurate predictor of future behavior. Based on this observation, we next introduce Resource Central (RC), a system that collects VM telemetry, learns these behaviors offline, and provides predictions online to various resource managers via a general client-side library. As an example of RC's online use, we modify Azure's VM scheduler to leverage predictions in oversubscribing servers (with oversubscribable VM types), while retaining high VM performance. Using real VM traces, we then show that the prediction-informed schedules increase utilization and prevent physical resource exhaustion. We conclude that providers can exploit their workloads' characteristics and machine learning to improve resource management substantially.

479 citations
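
The prediction-informed oversubscription mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give the scheduler's actual rule, so everything here is an assumption made for illustration: the `VM` and `Server` types, the `predicted_peak_util` field (standing in for a prediction that a system such as RC might supply), the `max_oversubscription` ratio, and the admission test itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VM:
    cores: int                  # virtual cores allocated to the VM
    predicted_peak_util: float  # hypothetical prediction, fraction in [0, 1]


@dataclass
class Server:
    physical_cores: int
    max_oversubscription: float = 1.25  # assumed cap on allocated/physical cores
    vms: List[VM] = field(default_factory=list)

    def can_host(self, vm: VM) -> bool:
        """Assumed admission rule: allow oversubscription of allocated cores,
        but only while predicted peak demand still fits in physical cores."""
        candidate = self.vms + [vm]
        allocated = sum(v.cores for v in candidate)
        predicted_demand = sum(v.cores * v.predicted_peak_util for v in candidate)
        within_cap = allocated <= self.physical_cores * self.max_oversubscription
        fits_physically = predicted_demand <= self.physical_cores
        return within_cap and fits_physically

    def place(self, vm: VM) -> bool:
        """Place the VM if the admission rule allows it."""
        if self.can_host(vm):
            self.vms.append(vm)
            return True
        return False
```

In this sketch the first condition bounds how far allocation may exceed physical capacity, and the second uses the predictions to guard against physical resource exhaustion, which is the trade-off the abstract describes.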


Patent
03 Feb 2017
TL;DR: A system receives a request to deploy a virtual machine on a node from a plurality of nodes running a plurality of virtual machines in a cloud computing system, and selects one of the nodes having a hard disk drive (HDD) input output operations per second (IOPS) value less than an observed HDD IOPS value for the plurality of nodes.
Abstract: A system receives a request to deploy a virtual machine on a node from a plurality of nodes running a plurality of virtual machines in a cloud computing system. The system selects one of the plurality of nodes having a hard disk drive (HDD) input output operations per second (IOPS) value less than an observed HDD IOPS value for the plurality of nodes running the plurality of virtual machines. The system receives a predicted HDD IOPS value for the virtual machine and determines a new HDD IOPS value for the selected node based on the HDD IOPS value for the selected node and the predicted HDD IOPS value for the virtual machine. The system instantiates the virtual machine on the selected node when the new HDD IOPS value for the selected node is less than or equal to the observed HDD IOPS value for the plurality of nodes.

7 citations
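
The placement steps in the patent abstract above can be sketched as follows. This is a hedged illustration, not the patented implementation: the `Node` type and `place_vm` function are hypothetical names, the observed fleet-wide value is taken as an input because the abstract does not say how it is computed, the new node value is assumed to be the sum of the node's current value and the VM's predicted value, and trying the next node when a candidate fails the check is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    name: str
    hdd_iops: float  # HDD IOPS value currently attributed to this node


def place_vm(
    nodes: Sequence[Node],
    observed_iops: float,      # observed HDD IOPS value for the set of nodes
    predicted_vm_iops: float,  # predicted HDD IOPS value for the new VM
) -> Optional[Node]:
    """Return the node chosen for the VM, or None if no node qualifies."""
    for node in nodes:
        # Step 1: consider only nodes below the observed fleet-wide value.
        if node.hdd_iops >= observed_iops:
            continue
        # Step 2: estimate the node's value after adding the VM
        # (simple addition is an assumption of this sketch).
        new_iops = node.hdd_iops + predicted_vm_iops
        # Step 3: instantiate only if the new value stays within the observed value.
        if new_iops <= observed_iops:
            node.hdd_iops = new_iops
            return node
    return None
```

For example, with nodes at 900, 400, and 100 IOPS, an observed value of 500, and a predicted VM load of 150, the first node is skipped at step 1, the second fails step 3 (550 exceeds 500), and the VM lands on the third node.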