Proceedings ArticleDOI

Managing energy and server resources in hosting centers

21 Oct 2001-Vol. 35, Iss: 5, pp 103-116
TL;DR: Experimental results from a prototype confirm that the system adapts to offered load and resource availability, and can reduce server energy usage by 29% or more for a typical Web workload.
Abstract: Internet hosting centers serve multiple service sites from a common hardware base. This paper presents the design and implementation of an architecture for resource management in a hosting center operating system, with an emphasis on energy as a driving resource management issue for large server clusters. The goals are to provision server resources for co-hosted services in a way that automatically adapts to offered load, improve the energy efficiency of server clusters by dynamically resizing the active server set, and respond to power supply disruptions or thermal events by degrading service in accordance with negotiated Service Level Agreements (SLAs).Our system is based on an economic approach to managing shared server resources, in which services "bid" for resources as a function of delivered performance. The system continuously monitors load and plans resource allotments by estimating the value of their effects on service performance. A greedy resource allocation algorithm adjusts resource prices to balance supply and demand, allocating resources to their most efficient use. A reconfigurable server switching infrastructure directs request traffic to the servers assigned to each service. Experimental results from a prototype confirm that the system adapts to offered load and resource availability, and can reduce server energy usage by 29% or more for a typical Web workload.

Summary (4 min read)

1. INTRODUCTION

  • Hosting centers face familiar operating system challenges common to any shared computing resource.
  • Several research projects address these challenges for networks of servers; Section 7 surveys related research.
  • This paper investigates the policies for allocating resources in a hosting center, with a principal focus on energy management.
  • Section 5 outlines the Muse prototype, and Section 6 presents experimental results.
  • Section 7 sets their approach in context with related work.

3. OVERVIEW OF MUSE

  • Switches dynamically redirect incoming request traffic to eligible servers.
  • Each hosted service appears to external clients as a single virtual server, whose power grows and shrinks with request load and available resources.
  • The "brain" of the hosting center OS is a policy service that dynamically reallocates server resources and reconfigures the network to respond to variations in observed load, resource availability, and service value.

3.2 Redirecting Switches

  • The authors' system uses reconfigurable server switches as a mechanism to support the resource assignments planned by the executive.
  • Muse switches maintain an active set of servers selected to serve requests for each network-addressable service.
  • The switches are dynamically reconfigurable to change the active set for each service.
  • Since servers may be shared, the active sets of different services may overlap.
  • The switches may use load status to balance incoming request traffic across the servers in the active set.
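
As a concrete illustration of the active-set idea in the bullets above, the sketch below keeps a per-service list of servers (possibly shared across services) and routes each request to the least-loaded member. The class names and the least-connections policy are illustrative assumptions; the paper leaves the switch's balancing policy open.

```python
# Minimal sketch of a redirecting switch's per-service active sets.
# Class names and the least-connections selection rule are assumptions.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Server:
    ip: str
    port: int
    open_connections: int = 0     # load status reported back to the switch

@dataclass
class Service:
    name: str
    active_set: List[Server] = field(default_factory=list)  # may overlap with other services

    def route(self) -> Server:
        # Balance an incoming request across the active set using load status.
        target = min(self.active_set, key=lambda s: s.open_connections)
        target.open_connections += 1
        return target

# Example: two services whose active sets overlap on one shared server.
shared = Server("10.0.0.3", 8080)
s0 = Service("s0", [Server("10.0.0.1", 8080), shared])
s1 = Service("s1", [Server("10.0.0.2", 8080), shared])
print(s0.route().ip, s1.route().ip)
```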

3.4 Energy-Conscious Provisioning

  • One potential concern with this approach is that power transitions (e.g., disk spin-down) may reduce the lifetime of disks on servers that store data locally, although it may extend the lifetime of other components.
  • One solution is to keep servers stateless and leave the network storage tier powered.
  • A second concern is that power transitions impose a time lag ranging from several seconds to a minute.

4. THE RESOURCE ECONOMY

  • The executive's goal of maximizing resource efficiency corresponds to maximizing profit in this economic formulation.
  • Crucially, customer utility is defined in terms of delivered performance, e.g., as described in Section 4.1 below.
  • The number of resource units consumed to achieve that performance is known internally to the system, which plans its resource assignments by estimating the value of the resulting performance effects as described in Section 4.3.
  • Section 4.4 addresses the problem of obtaining stable and responsive load estimates from bursty measures.

4.1 Bids and Penalties

  • Since utilization is an excellent predictor of service quality, the SLA can specify penalties in a direct and general way by defining a maximum target utilization level ρtarget for the allotted resource.
  • Suppose that ρi is the portion of its allotment μi that customer i uses during some interval, which is easily measured.
  • The penaltyi function could increase with the degree of the shortfall ri/μi, or with the resulting queuing delays or stretch factor [47].
  • The center must balance the revenue increase from overbooking against the risk of incurring a penalty.
  • Similar tradeoffs are made for seat reservations on American Airlines [37] .
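
To make the bid-and-penalty structure above concrete, here is a minimal sketch. The tariff follows the example in Section 4.1 of the paper (5 cents per thousand hits per minute up to 10K hpm, then 1 cent per additional thousand, capped at $1 per minute); the specific penalty shape, which charges the center when measured utilization exceeds ρtarget, is an assumption for illustration.

```python
# Sketch of one customer's utility: a bid that grows with delivered throughput
# (hits per minute) minus a penalty when the allotted resource is driven past
# the SLA's target utilization. The penalty shape and the $0.10 scale factor
# are assumptions; the tariff numbers follow the example in Section 4.1.

def bid(hpm: float) -> float:
    """5 cents per 1000 hpm up to 10K hpm, then 1 cent per additional 1000 hpm,
    capped at one dollar per minute."""
    dollars = 0.05 * min(hpm, 10_000) / 1000
    dollars += 0.01 * max(hpm - 10_000, 0) / 1000
    return min(dollars, 1.00)

def penalty(used_units: float, allotted_units: float, rho_target: float = 0.9) -> float:
    """Charge the center when measured utilization exceeds rho_target (illustrative)."""
    utilization = used_units / allotted_units if allotted_units > 0 else 1.0
    return 0.10 * max(utilization - rho_target, 0.0)

def utility(hpm: float, used_units: float, allotted_units: float) -> float:
    return bid(hpm) - penalty(used_units, allotted_units)

print(round(bid(400), 4))                       # 0.02 -> the "2 cents" example
print(round(utility(12_000, 9.5, 10.0), 4))     # bid minus a small overshoot penalty
```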

4.3 Estimating Performance Effects

  • The optimization problem is in general nonlinear unless the utility functions are also linear.
  • It is interesting to note that linear utility functions correspond almost directly to a priority-based scheme, with a customer's priority given by the slope of its utility function scaled by its average per-request resource demand.
  • The relative priority of a customer declines as its allotment approaches its full resource demand; this property allocates resources fairly among customers with equivalent priority.

4.4 Feedback and Stability

  • Fortunately, persistent load swells and troughs caused by shifts in population tend to occur on the scale of hours rather than seconds, although service popularity may increase rapidly due to advertising.
  • Even during steady load, this bursty signal varies by as much as 40% between samples.
  • For comparison, the authors plot the output from an EWMA filter with a heavily damped a = 7/8, the filter used to estimate TCP round-trip time [21] .
  • The flop-flip filter is less agile than the EWMA or flip-flop; its signal is shifted slightly to the right on the graph.
  • The step effect reduces the number of unproductive reallocations in the executive, and yields stable, responsive behavior in their environment.
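
The filtering discussion above can be made concrete with a small sketch: a heavily damped EWMA (gain 7/8, as used for TCP round-trip-time estimation) plus a step filter in the spirit of the "flop-flip" behavior, which holds a flat estimate and only steps when the smoothed signal has clearly shifted. The tolerance and the exact switching rule below are assumptions; the paper's tuned filter differs in detail.

```python
# Hedged sketch of load smoothing: an EWMA plus a "flop-flip"-style step filter
# that holds its estimate until the smoothed signal has moved far enough, then
# steps to the new level. Thresholds here are illustrative assumptions.
import random

class EWMA:
    def __init__(self, gain: float = 7 / 8):
        self.gain = gain          # heavily damped, as for TCP RTT estimation
        self.value = None

    def update(self, sample: float) -> float:
        if self.value is None:
            self.value = sample
        else:
            self.value = self.gain * self.value + (1 - self.gain) * sample
        return self.value

class StepFilter:
    """Hold a flat estimate; step to the smoothed value only on a persistent shift."""
    def __init__(self, tolerance: float = 0.15):
        self.smoother = EWMA()
        self.tolerance = tolerance
        self.estimate = None

    def update(self, sample: float) -> float:
        smoothed = self.smoother.update(sample)
        if self.estimate is None or abs(smoothed - self.estimate) > self.tolerance * self.estimate:
            self.estimate = smoothed      # step: accept the new load level
        return self.estimate

# Bursty signal varying ~40% between samples around a level that shifts upward.
random.seed(0)
step = StepFilter()
for t in range(60):
    level = 100 if t < 30 else 160
    est = step.update(level * random.uniform(0.8, 1.2))
print(round(est, 1))
```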

4.5 Pricing

  • An important limitation of their framework as defined here is that customers do not respond to the price signals by adjusting their bids or switching suppliers as the resource congestion level varies.
  • Thus, the system is "economic" but not "microeconomic".
  • Utility functions are revealed in full to the supplier and the bids are "sealed", i.e., they are not known to other customers.
  • While this allows a computationally efficient "market", there is no meaningful competition among customers or suppliers.
  • The authors approach could incorporate competition by allowing customers to change their bid functions in real time; if the new utility function meets the concavity assumption then the system will quickly converge on a new utility-maximizing resource assignment.

4.6 Multiple Resources

  • The authors approach could extend to manage multiple resources given support in the server node OS for enforcing assignments of disk, memory, and network bandwidth [44, 40, 30] .
  • Economic problems involving multiple complementary goods are often intractable, but the problem is simplified in Muse because customer utility functions specify value indirectly in terms of delivered performance; the resource allotment to achieve that performance need not be visible to customers.
  • Only adjustments to the bottleneck resource (the resource class with the highest utilization) are likely to affect performance for services with many concurrent requests.
  • Once the bottleneck resource is provisioned correctly, other resources may be safely reclaimed to bring their utilization to ρtarget without affecting estimated performance or revenue.
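
A hedged sketch of the bottleneck rule described above: grow only the most-utilized resource class and reclaim surplus from the others until each runs near ρtarget. The resource names, numbers, and the proportional sizing rule are illustrative assumptions.

```python
# Illustrative sketch: grow only the bottleneck resource (highest utilization),
# then shrink the others just enough to bring their utilization back toward the
# target. All figures are made up for the example.

RHO_TARGET = 0.9

def rebalance(usage: dict, allotment: dict) -> dict:
    """usage and allotment map resource class -> units; returns new allotments."""
    utilization = {r: usage[r] / allotment[r] for r in allotment}
    bottleneck = max(utilization, key=utilization.get)
    new = dict(allotment)
    # Grow the bottleneck so it runs at the target utilization.
    new[bottleneck] = usage[bottleneck] / RHO_TARGET
    # Reclaim surplus from non-bottleneck resources down to the same target.
    for r in allotment:
        if r != bottleneck:
            new[r] = usage[r] / RHO_TARGET
    return new

print(rebalance({"cpu": 9.5, "disk": 2.0, "net": 3.0},
                {"cpu": 10.0, "disk": 8.0, "net": 6.0}))
```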

5. PROTOTYPE

  • The Muse prototype includes a user-level executive server and two loadable kernel modules for the FreeBSD operating system, implementing a host-based redirecting server switch and load monitoring extensions for the servers.
  • In addition, the prototype uses the Resource Containers kernel modules from Rice University [10, 8] as a mechanism to allocate resources to service classes on individual servers.
  • The following subsections discuss key elements of the prototype in more detail.

5.1 Monitoring and Estimation

  • If a service is not reaching the target utilization level for its resource allotment (ρi < ρtarget), but smoothed queue lengths exceed a threshold, then the monitor assumes that the service is I/O bound.
  • The authors' prototype adapts by dropping ρtarget for that service to give it a larger share of each node and avoid contention for I/O.
  • Once the lowered ρtarget is reached, the system gradually raises it back to the default, unless more queued requests appear.
  • Queued requests may also result from transient load bursts or sluggish action by the executive to increase the allotment when load is in an uptrend.
  • The adjustment serves to temporarily make the executive more aggressive.
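
The monitor heuristic above might look like the following sketch: lower a service's ρtarget when it is under its utilization target yet requests still queue (suggesting an I/O bound), and drift back toward the default once queues drain. All constants are illustrative assumptions; the prototype's actual thresholds are not given here.

```python
# Hedged sketch of the monitor heuristic: temporarily lower rho_target for a
# service that looks I/O bound, making the allocator more aggressive for it,
# then relax back toward the default. Constants are illustrative only.

DEFAULT_TARGET = 0.9
FLOOR = 0.5
QUEUE_THRESHOLD = 10.0
STEP_DOWN = 0.1
STEP_UP = 0.02

def adjust_target(rho: float, smoothed_queue: float, target: float) -> float:
    if rho < target and smoothed_queue > QUEUE_THRESHOLD:
        return max(target - STEP_DOWN, FLOOR)          # more aggressive
    return min(target + STEP_UP, DEFAULT_TARGET)       # relax back to the default

target = DEFAULT_TARGET
for rho, queue in [(0.6, 25.0), (0.6, 25.0), (0.7, 4.0), (0.8, 1.0)]:
    target = adjust_target(rho, queue, target)
    print(round(target, 2))
```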

5.2 The Executive

  • The cluster adapts only when the executive is functioning, but servers and switches run autonomously if the executive fails.
  • The executive may be restarted at any time; it observes the cluster to obtain adequately smoothed measures before taking action.

5.3 The Request Redirector

  • The redirector also gathers load information (packet and connection counts) from the redirected connections, and maintains active server sets for each registered service endpoint.
  • The active sets are lists of (IPaddr, TCPport) pairs, allowing an individual server to be shared by multiple services.
  • Active set membership is controlled by the executive and its actuator, which connects to the redirector through a TCP socket and issues commands using a simple protocol to add or remove servers.
  • This allows the executive to dynamically reconfigure the sets for each service, for example, to direct request traffic away from retired (off-power) servers and toward recruited servers.
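
A sketch of what the executive-to-redirector control path might look like, assuming a line-oriented text protocol over the TCP socket mentioned above. The command names, syntax, and port are invented for illustration; the paper only states that the actuator issues add/remove commands over a simple TCP protocol.

```python
# Sketch of an actuator driving the redirector over TCP with simple text
# commands that add or remove (IPaddr, TCPport) members of a service's active
# set. The command syntax and control port are assumptions, not the paper's.
import socket

REDIRECTOR = ("127.0.0.1", 9000)   # hypothetical control endpoint

def send_command(command: str) -> str:
    with socket.create_connection(REDIRECTOR, timeout=2.0) as sock:
        sock.sendall((command + "\n").encode())
        return sock.recv(256).decode().strip()

def add_server(service: str, ip: str, port: int) -> str:
    return send_command(f"ADD {service} {ip}:{port}")

def remove_server(service: str, ip: str, port: int) -> str:
    return send_command(f"REMOVE {service} {ip}:{port}")

if __name__ == "__main__":
    try:
        # Recruit one server for service s0 and retire another (off-power) server.
        print(add_server("s0", "10.0.0.7", 8080))
        print(remove_server("s0", "10.0.0.2", 8080))
    except OSError as err:
        print("no redirector listening on the control port:", err)
```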

6. EXPERIMENTAL RESULTS

  • This section presents experimental results from the Muse prototype to show the behavior of dynamic server resource management.
  • The authors experiments use a combination of synthetic traffic and real request traces.
  • SURGE generates highly bursty traffic characteristic of Web workloads, with heavy-tailed object size distributions so that per-request service demand is highly variable.
  • These factors stress the prototype's load estimation and resource allocation algorithms.
  • Also, the authors can generate synthetic load "waves" with any amplitude and period by modulating the number of generators.
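
One way to realize the load "waves" mentioned above is to modulate the number of active load generators on a schedule; the sinusoidal schedule below is an illustrative assumption, since the paper does not specify the waveform.

```python
# Sketch of the load-wave idea: vary how many synthetic-load generators are
# active over time to produce swells of a chosen amplitude and period.
# The sinusoidal schedule is an assumption for illustration.
import math

def generators_active(t_seconds: float, base: int = 10, amplitude: int = 8,
                      period_seconds: float = 600.0) -> int:
    phase = 2 * math.pi * t_seconds / period_seconds
    return max(0, round(base + amplitude * math.sin(phase)))

for t in range(0, 601, 150):
    print(t, generators_active(t))
```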

6.2 Allocation Under Constraint

  • The second experiment in Figure 9 is identical to the first, except that the utility functions are reversed so that the fixed-load service s0 bids higher than s1.
  • The behavior is similar when there are sufficient resources to handle the load.
  • During the s1 swells the executive preserves the allocation for the higher-bidding s0, doling out most of the remaining resources to s1.

7. RELATED WORK

  • Previous research on power management focuses on mobile systems, which are battery-constrained.
  • One aspect the authors have not investigated in the server context is the role of application-specific adaptation to resource constraints [31, 17, 30, 2] .
  • Most approaches used exponential moving averages of point samples or average resource utilization over some sample period [23, 27, 2, 9, 8] .
  • Kim and Noble have compared several statistical methods for estimating available bandwidth on a noisy link [25] ; their study laid the groundwork for the "flop-flip" filter in Muse.


Managing Energy and Server Resources in Hosting Centers
Jeffrey S. Chase, Darrell C. Anderson, Prachi N. Thakar, Amin M. Vahdat *
Department of Computer Science
Duke University
{chase, anderson, prachi, vahdat}@cs.duke.edu
Ronald P. Doyle†
Application Integration and Middleware
IBM Research Triangle Park
rdoyle@us.ibm.com

* This work is supported by the U.S. National Science Foundation through grants CCR-00-82912 and EIA-9972879. Anderson is supported by a U.S. Department of Education GAANN fellowship. Vahdat is supported by NSF CAREER award CCR-9984328.
† R. Doyle is also a PhD student in the Computer Science department at Duke University.
ABSTRACT

Internet hosting centers serve multiple service sites from a common hardware base. This paper presents the design and implementation of an architecture for resource management in a hosting center operating system, with an emphasis on energy as a driving resource management issue for large server clusters. The goals are to provision server resources for co-hosted services in a way that automatically adapts to offered load, improve the energy efficiency of server clusters by dynamically resizing the active server set, and respond to power supply disruptions or thermal events by degrading service in accordance with negotiated Service Level Agreements (SLAs). Our system is based on an economic approach to managing shared server resources, in which services "bid" for resources as a function of delivered performance. The system continuously monitors load and plans resource allotments by estimating the value of their effects on service performance. A greedy resource allocation algorithm adjusts resource prices to balance supply and demand, allocating resources to their most efficient use. A reconfigurable server switching infrastructure directs request traffic to the servers assigned to each service. Experimental results from a prototype confirm that the system adapts to offered load and resource availability, and can reduce server energy usage by 29% or more for a typical Web workload.
1. INTRODUCTION

The Internet buildout is driving a shift toward server-based computing. Internet-based services host content and applications in data centers for networked access from diverse client devices. Service providers are adding new data center capacity for Web hosting, application services, outsourced storage, electronic markets,
and other network services. Many of these services are co-hosted in shared data centers managed by third-party hosting providers. Managed hosting in shared centers offers economies of scale and a potential for dynamic capacity provisioning to respond to request traffic, quality-of-service specifications, and network conditions. The services lease resources from the hosting provider on a "pay as you go" basis; the provider multiplexes the shared resources (e.g., servers, storage, and network bandwidth) to insulate its customers from demand surges and capital costs for excess capacity.

Hosting centers face familiar operating system challenges common to any shared computing resource. The center's operating system must provide a uniform and secure execution environment, isolate services from unintended consequences of resource sharing, share resources fairly in a way that reflects priority, and degrade gracefully in the event of failures or unexpected demand surges. Several research projects address these challenges for networks of servers; Section 7 surveys related research.

This paper investigates the policies for allocating resources in a hosting center, with a principal focus on energy management. We present the design and implementation of a flexible resource management architecture, called Muse, that controls server allocation and routing of requests to selected servers through a reconfigurable switching infrastructure. Muse is based on an economic model in which customers "bid" for resources as a function of service volume and quality. We show that this approach enables adaptive resource provisioning in accordance with flexible Service Level Agreements (SLAs) specifying dynamic tradeoffs of service quality and cost. In addition, Muse promotes energy efficiency of Internet server clusters by balancing the cost of resources (e.g., energy) against the benefit realized by employing them. We show how this energy-conscious provisioning allows the center to automatically adjust on-power capacity to scale with load, yielding a significant energy savings for typical Internet service workloads.

This paper is organized as follows. Section 2 outlines the motivation for adaptive resource provisioning and the importance of energy in resource management for large server clusters. Section 3 gives an overview of the Muse architecture. Section 4 presents the economic resource allocation scheme in detail, including the bidding model, load estimation techniques, and a greedy resource allocation algorithm that we call Maximize Service Revenue and Profit (MSRP) to set resource prices and match resource supply and demand. Section 5 outlines the Muse prototype, and Section 6 presents experimental results. Section 7 sets our approach in context with related work.

[Figure 1: Adaptive resource provisioning in Muse.]
2. MOTIVATION

Hosting utilities provision server resources for their customers, co-hosted server applications or services, according to their resource demands at their expected loads. Since outsourced hosting is a competitive business, hosting centers must manage their resources efficiently. Rather than overprovisioning for the worst-case load, efficient admission control and capacity planning policies may be designed to limit rather than eliminate the risk of failing to meet demand [8, 2]. An efficient resource management scheme would automatically allocate to each service the minimal server resources needed for acceptable service quality, leaving surplus resources free to deploy elsewhere. Provisioning choices must adapt to changes in load as they occur, and respond gracefully to unanticipated demand surges or resource failures. For these reasons managing server resources automatically is a difficult challenge.

This paper describes a system for adaptive resource management that incorporates power and energy as primary resources of a hosting center. A data center is effectively a large distributed computer comprising an "ecosystem" of heterogeneous components including edge servers, application servers, databases, and storage, assembled in a building with an infrastructure to distribute power, communications, and cooling. Viewed from the outside, the center is a "black box" with common external connections to the Internet and the electrical grid; it generates responses to a stream of requests from the network, while consuming power and producing waste heat. Energy management is critical in this context for several inter-related reasons:

  • Data centers are vulnerable to overloading of the thermal system due to cooling failures, external conditions, or high service load. Muse can respond to these threats by automatically scaling back power demand (and therefore waste heat), rather than shutting down or risking damage to expensive components. Similarly, power supply disruptions may force the service to reduce power draw, e.g., to run longer on limited backup power. We refer to this as browndown, a new partial failure mode specific to "data center computers". The effect of browndown is to create a resource shortage, forcing the center to degrade service quality.

  • Power supply (including backup power) and thermal systems are a significant share of capital outlay for a hosting center. Capacity planning for the worst case increases this cost. Muse enables the center to size for expected loads and scale back power (browndown) when exceptional conditions exceed energy or thermal limits. This is analogous to dynamic thermal management for individual servers [13].

  • The Internet service infrastructure is a major energy consumer, and its energy demands are growing rapidly. One projection is that US data centers will consume 22 TWh of electricity in 2003 for servers, storage, switches, power conditioning, and cooling [29]. This energy would cost $2B annually at a common price of $100 per MWh; price peaks of $500 per MWh have been common on the California spot market. Energy will make up a growing share of operating costs as administration for these centers is increasingly automated [5, 12]. Moreover, generating this electricity would release about 12M tons of new CO2 annually. Some areas are zoning against data centers to protect their local power systems [29]. Improving energy efficiency for data centers will yield important social and environmental benefits, in addition to reducing costs.

  • Fine-grained balancing of service quality and resource usage, including power, gives businesses control over quality and price tradeoffs in negotiating SLAs. For example, energy suppliers offer cheaper power to customers who can reduce consumption on demand. We show how SLAs for a hosting utility may directly specify similar tradeoffs of price and quality. For example, customers might pay a lower hosting rate to allow for degraded service during power disruptions.

This paper responds to these needs with a resource management architecture that adaptively provisions resources in a hosting center to (1) avoid inefficient use of energy and server resources, (2) provide a capability to adaptively scale back power demand, and (3) respond to unexpected resource constraints in a way that minimizes the impact of service disruption.
3. OVERVIEW OF MUSE

Muse is an operating system for a hosting center. The components of the hosting center are highly specialized, governed by their own internal operating systems and interacting at high levels of abstraction. The role of the center's operating system is to establish and coordinate these interactions, supplementing the operating systems of the individual components.

Figure 1 depicts the four elements of the Muse architecture:

  • Generic server appliances. Pools of shared servers act together to serve the request load of each co-hosted service. Server resources are generic and interchangeable.

  • Reconfigurable network switching fabric. Switches dynamically redirect incoming request traffic to eligible servers. Each hosted service appears to external clients as a single virtual server, whose power grows and shrinks with request load and available resources.

  • Load monitoring and estimation modules. Server operating systems and switches continuously monitor load; the system combines periodic load summaries to detect load shifts and to estimate aggregate service performance.

  • The executive. The "brain" of the hosting center OS is a policy service that dynamically reallocates server resources and reconfigures the network to respond to variations in observed load, resource availability, and service value.

[Figure 2: Request rate for the www.ibm.com site over February 5-11, 2001.]

[Figure 3: Request rate for the World Cup site for May-June 1998.]

This section gives an overview of each of these components, and outlines how the combination can improve the energy efficiency of server clusters as a natural outgrowth of adaptive resource provisioning. Various aspects of Muse are related to many other systems; we leave a full treatment of related work to Section 7.
3.1 Services and Servers

Each co-hosted service consists of a body of data and software, such as a Web application. As in other cluster-based services, a given request could be handled by any of several servers running the software, improving scalability and fault-tolerance [7, 36]. Servers may be grouped into pools with different software configurations, but they may be reconfigured and redeployed from one pool to another. To enable this reconfiguration, servers are stateless, e.g., they are backed by shared network storage. The switches balance request traffic across the servers, so it is acceptable for servers to have different processing speeds.

Muse allocates to each service a suitable share of the server resources that it needs to serve its load, relying on support for resource principals such as resource containers [10, 9] in the server operating systems to ensure performance isolation on shared servers.
3.2 Redirecting Switches

Large Web sites utilize commercial redirecting server switches to spread request traffic across servers using a variety of policies. Our system uses reconfigurable server switches as a mechanism to support the resource assignments planned by the executive. Muse switches maintain an active set of servers selected to serve requests for each network-addressable service. The switches are dynamically reconfigurable to change the active set for each service. Since servers may be shared, the active sets of different services may overlap. The switches may use load status to balance incoming request traffic across the servers in the active set.
3.3 Adaptive Resource Provisioning

Servers and switches in a Muse hosting center monitor load conditions and report load measures to the executive. The executive gathers load information, periodically determines new resource assignments, and issues commands to the servers and switches to adjust resource allotments and active sets for each service. The executive employs an economic framework to manage resource allocation and provisioning in a way that maximizes resource efficiency and minimizes unproductive costs. This defines a simple, flexible, and powerful means to quantify the value tradeoffs embodied in SLAs. Section 4 describes the resource economy in detail.

One benefit of the economic framework is that it defines a metric for the center to determine when it is or is not worthwhile to deploy resources to improve service quality. This enables a center to adjust resource allocations in a way that responds to load shifts across multiple services. Typical Internet service loads vary by factors of three or more through the day and through the week. For example, Figure 2 shows request rates for the www.ibm.com site over a typical week starting on a Monday and ending on a Sunday. The trace shows a consistent pattern of load shifts by day, with a weekday 4PM EST peak of roughly 260% of daily minimum load at 6AM EST, and a traffic drop over the weekend. A qualitatively similar pattern appears in other Internet service traces, with daily peak-to-trough ratios as high as 11:1 and significant seasonal variations [20]. To illustrate a more extreme seasonal load fluctuation, Figure 3 depicts accesses to the World Cup site [6] through May and June, 1998. In May the peak request rate is 30 requests per second, but it surges to nearly 2000 requests per second in June. Adaptive resource provisioning can respond to these load variations to multiplex shared server resources.
3.4 Energy-Conscious Provisioning

This ability to dynamically adjust server allocations enables the system to improve energy-efficiency by matching a cluster's energy consumption to the aggregate request load and resource demand. Energy-conscious provisioning configures switches to concentrate request load on a minimal active set of servers for the current aggregate load level. Active servers always run near a configured utilization threshold, while the excess servers transition to low-power idle states to reduce the energy cost of maintaining surplus capacity during periods of light load. Energy savings from the off-power servers is compounded in the cooling system, which consumes power to remove the energy dissipated in the servers as waste heat. Thus energy-conscious provisioning can also reduce fixed capacity costs for cooling, since the cooling for short periods of peak load may extend over longer intervals.
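
As a back-of-the-envelope sketch of this policy, the following estimates the minimal active set for a given aggregate load and the idle power avoided by powering the remaining servers down. The per-server capacity, utilization threshold, and idle-watts figures are illustrative assumptions, not measurements from the paper.

```python
# Sketch of energy-conscious provisioning: size the active set so that active
# servers run near a configured utilization threshold, and estimate the idle
# power avoided by powering down the rest. Capacity and power figures are
# illustrative assumptions.
import math

def active_set_size(aggregate_load: float, per_server_capacity: float,
                    utilization_threshold: float = 0.75) -> int:
    usable = per_server_capacity * utilization_threshold
    return max(1, math.ceil(aggregate_load / usable))

def idle_power_saved(total_servers: int, active: int, idle_watts: float = 95.0) -> float:
    return (total_servers - active) * idle_watts

load = 2600.0        # requests/sec across all services (illustrative)
capacity = 400.0     # requests/sec one server can sustain (illustrative)
n = active_set_size(load, capacity)
print(n, idle_power_saved(total_servers=16, active=n), "watts avoided")
```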
Table 1: Power draw in watts for various server platforms and operating systems.

Architecture        Machine              Disks   Operating System   Boot   Max   Idle   Hibernate
PIII 866MHz         SuperMicro 370-DER   1       FreeBSD 4.0        136    120   93     -
PIII 866MHz         SuperMicro 370-DER   1       Windows 2000       134    120   98     5.5
PIII 450MHz         ASUS P2BLS           1       FreeBSD 4.0        66     55    35     4
PIII Xeon 733MHz    PowerEdge 4400       8       FreeBSD 4.0        278    270   225    -
PIII 500MHz         PowerEdge 2400       3       FreeBSD 4.0        130    128   95     2.5
PIII 500MHz         PowerEdge 2400       3       Windows 2000       127    120   98     2.5
PIII 500MHz         PowerEdge 2400       3       Solaris 2.7        129    124   127    2.5

[Figure 4: A comparison of power draw for two experiments serving a synthetic load swell on two servers.]
serving a synthetic load swell on two servers.
energy-efficient at low CPU utilization due to fixed energy costs for
the power supply. To illustrate, Figure 4 shows request throughput
for two separate experiments serving a load swell on two servers
through a redirecting switch, and server power draw as measured
by a Brand Electronics 21-1850/(21 digital power meter. The exper-
imental configuration and synthetic Web workload are similar to
other experiments reported in Section 6. Only one of the servers is
powered and active at the start of each experiment. In the first ex-
periment, the second server powers on at t = 80, creating a power
spike as it boots, and joins the active set at t = 130. In the second
experiment, the second server powers on later at time t = 160 and
joins the active set at t = 210. Note that from t = 130 to t = 160
the second configuration serves the same request load as the first,
but the first configuration draws over 80 watts while the second
draws about 50, an energy savings of 37%. Section 4.1 discusses
the effects on latency and throughput.
The effect is qualitatively similar on other server/OS combinations.
Table 1 gives the power draw for a selection of systems, all of which
pull over 60% of their peak power when idle, even when the OS
halts the CPU between interrupts in the idle loop. This is because
the power supply transformers impose a fixed energy cost---even if
the system is idle--to maintain charged capacity to respond rapidly
to demand when it occurs. When these systems are active their
power draw is roughly linear with system load, ranging from the
base idle draw rate up to some peak. In today's servers the CPU
is the next most significant power consumer (e.g., up to 38 watts
for a 600 MHz Intel Pentium-III); power draw from memory and
network devices is negligible in comparison, and disks consume
from 50-250 watts per terabyte of capacity (this number is dropping
as disk densities increase) [12].
This experiment shows that the simplest and most effective way to reduce energy consumption in large server clusters is to turn servers off, concentrating load on a subset of the servers. In large clusters this is more effective than adaptively controlling power draw in the server CPUs, which consume only a portion of server energy. Muse defines both a mechanism to achieve this, the reconfigurable switching architecture, and policies for adaptively varying the active server sets and the number of on-power servers. Recent industry initiatives for advanced power management allow the executive to initiate power transitions remotely (see Section 5.2). An increasing number of servers on the market support these features.

One potential concern with this approach is that power transitions (e.g., disk spin-down) may reduce the lifetime of disks on servers that store data locally, although it may extend the lifetime of other components. One solution is to keep servers stateless and leave the network storage tier powered. A second concern is that power transitions impose a time lag ranging from several seconds to a minute. Neither of these issues is significant if power transitions are damped to occur only in response to daily load shifts, such as those illustrated in Figure 2 and Figure 3. For example, typical modern disk drives are rated for 30,000 to 40,000 start/stop cycles.
4. THE RESOURCE ECONOMY

This section details the system's framework for allocating resources to competing co-hosted services (customers). The basic challenge is to determine the resource demand of each customer at its current request load level, and to allocate resources to their most efficient and productive use. Resources are left idle if the marginal cost to use them (e.g., energy) exceeds the marginal value of deploying them to improve performance at current load levels.

To simplify the discussion, we consider a single server pool with a common unit of hosting service resource. This unit could represent CPU time or a combined measure reflecting a share of aggregate CPU, memory, and storage resources. Section 4.6 discusses the problem of provisioning multiple resource classes.

Consider a hosting center with a total of μmax discrete units of hosting resource at time t. The average cost to provide one unit of resource per unit of time is given by cost(t), which may account for factors such as equipment wear or energy prices through time. All resource units are assigned equivalent (average) cost at any time. This represents the variable cost of service; the center pays the cost for a resource unit only when it allocates that unit to a customer. The cost and available resource μmax may change at any time (e.g., due to browndown).

Each customer i is associated with a utility function Ui(t, μi) to model the value of allocating μi resource units to i at time t. Since hosting is a contracted service with economic value, there is an economic basis for evaluating utility [35].

[Figure 5: Trading off CPU allocation, throughput, and latency for a synthetic Web workload.]
We consider each customer's utility function as reflecting its "bid" for the service volume and service quality resulting from its resource allotment at any given time. Our intent is that utility derives directly from negotiated or offered prices in a usage-based pricing system for hosting service, and may be compared directly to cost. We use the economic terms price, revenue, profit, and loss to refer to utility values and their differences from cost. However, our approach does not depend on any specific pricing model, and the utility functions may represent other criteria for establishing customer priority (see Section 4.5). Without loss of generality suppose that cost and utility are denominated in "dollars" ($).
The executive's goal of maximizing resource efficiency corresponds to maximizing profit in this economic formulation. Crucially, customer utility is defined in terms of delivered performance, e.g., as described in Section 4.1 below. The number of resource units consumed to achieve that performance is known internally to the system, which plans its resource assignments by estimating the value of the resulting performance effects as described in Section 4.3. Section 4.4 addresses the problem of obtaining stable and responsive load estimates from bursty measures.
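
The allocation principle can be sketched as a simple greedy loop: repeatedly grant one unit to the customer with the highest estimated marginal utility, and stop (leaving units idle) once no marginal utility exceeds the per-unit cost. This is a deliberate simplification of the paper's MSRP algorithm, which also derives resource prices; the concave utility functions below are made up for illustration.

```python
# Simplified sketch of the allocation principle behind MSRP: grant one resource
# unit at a time to the customer whose estimated marginal utility is highest,
# and stop when no marginal utility exceeds the per-unit cost (leaving those
# units idle). Price-setting details are omitted; utilities are illustrative.

def allocate(utilities, total_units: int, unit_cost: float):
    """utilities: list of functions U_i(mu) giving estimated $ value of mu units."""
    allotment = [0] * len(utilities)
    for _ in range(total_units):
        # Marginal value of one more unit for each customer.
        gains = [u(a + 1) - u(a) for u, a in zip(utilities, allotment)]
        best = max(range(len(gains)), key=lambda i: gains[i])
        if gains[best] <= unit_cost:
            break                      # cheaper to leave the remaining units idle
        allotment[best] += 1
    return allotment

# Two concave (diminishing-returns) bids, denominated in dollars per unit time.
u0 = lambda mu: 1.00 * (1 - 0.8 ** mu)
u1 = lambda mu: 0.60 * (1 - 0.7 ** mu)
print(allocate([u0, u1], total_units=20, unit_cost=0.01))
```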
4.1 Bids and Penalties

While several measures could act as a basis for utility, we select a simple measure that naturally captures the key variables for services with many small requests and stable average-case service demand per request. Each customer's bid for each unit of time is a function bidi of its delivered request throughput λi, measured in hits per minute ($/hpm). As a simple example, a customer might bid 5 cents per thousand hpm up to 10K hpm, then 1 cent for each additional thousand hpm up to a maximum of one dollar per minute. If 400 requests are served during a minute, then the customer's bid is 2 cents for that minute.
It is left to the system to determine the resources needed to achieve a given throughput level so that it may determine the value of deploying resources for service i. The delivered throughput is some function of the allotment and of the active client load through time: λi(t, μi). This function encapsulates user population changes, popularity shifts, the resource-intensity of the service, and user think time. The throughput can be measured for the current t and μi, but the system must also predict the value of changes to the allotment; Section 4.3 discusses techniques to approximate λi(t, μi) from continuous performance measures. The system computes bids by composing bidi and λi: the revenue from service i at time t with resource allotment μi is bidi(λi(t, μi)).
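
A small sketch of this composition: estimate the revenue of a candidate allotment by composing the bid schedule with an estimated throughput curve λi(t, μi). The saturating throughput model below is an assumption standing in for the measurement-driven estimators of Section 4.3; the tariff reuses the example above.

```python
# Sketch of valuing a candidate allotment: revenue(mu) = bid(lambda(t, mu)).
# The linear-then-saturating throughput model is an illustrative assumption.

def estimated_throughput(mu: float, demand_hpm: float = 30_000.0,
                         hpm_per_unit: float = 4_000.0) -> float:
    """Throughput rises with the allotment until the client population saturates."""
    return min(mu * hpm_per_unit, demand_hpm)

def bid(hpm: float) -> float:
    """Example tariff from the text: 5 cents/1000 hpm to 10K, then 1 cent/1000, capped at $1."""
    return min(0.05 * min(hpm, 10_000) / 1000
               + 0.01 * max(hpm - 10_000, 0) / 1000, 1.00)

def estimated_revenue(mu: float) -> float:
    return bid(estimated_throughput(mu))

for mu in (1, 3, 5, 8, 12):
    print(mu, round(estimated_revenue(mu), 3))
```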
Note that bid prices based on $/hpm reflect service quality as well as load. That is, the center may increase its profit by improving service quality to deliver better throughput at a given load level. Figure 5 illustrates this effect with measurements from a single CPU-bound service running a synthetic Web workload with a fixed client population (see Section 6). The top graph shows the CPU allotment increasing as a step function through time, and the corresponding CPU utilization. (Note that a service may temporarily exceed its allotment on a server with idle resources.) The bottom graph shows the effect on throughput and latency. Figure 5 illustrates the improved service quality (lower latency) from reduced queuing delays as more resources are deployed. As latency drops, the throughput λ increases. This is because the user population issues requests faster when response latencies are lower, a well-known phenomenon modeled by closed or finite population queuing systems. Better service may also reduce the number of users who "balk" and abandon the service. However, as the latency approaches zero, throughput improvements level off because user think time increasingly dominates the request generation rate from each user. At this point, increasing the allotment further yields no added value; thus the center has an incentive to deploy surplus resources to another service, or to leave them idle to reduce cost.
SLAs for hosting service may also impose specific bounds on service quality, e.g., latency. To incorporate this factor into the utility functions, we may include a penalty term for failure to deliver service levels stipulated in an SLA. For example, suppose customer i leases a virtual server with a specific resource reservation of ri units from the hosting center at a fixed rate. This corresponds to a flat bidi at the fixed rate, or a fixed minimum bid if the bidi also includes a load-varying charge as described above to motivate the center to allot μi > ri when the customer's load demands it. If the request load for service i is too low to benefit from its full reservation, then the center may overbook the resource and deliver μi < ri to increase its profit by deploying its resources elsewhere: this occurs when λi(t, μi) ≈ λi(t, ri) for μi < ri at some time t. However, since the center collects i's fixed minimum bid whether or not it allots any resource to i, the bidi may not create sufficient incentive to deliver the full reservation on demand. Thus a penalty is needed.

Since utilization is an excellent predictor of service quality (see Figure 5), the SLA can specify penalties in a direct and general way by defining a maximum target utilization level ρtarget for the allotted resource. Suppose that ρi is the portion of its allotment μi that customer i uses during some interval, which is easily measured.

Citations
Journal ArticleDOI
TL;DR: An architectural framework and principles for energy-efficient Cloud computing are defined and the proposed energy-aware allocation heuristics provision data center resources to client applications in a way that improves energy efficiency of the data center, while delivering the negotiated Quality of Service (QoS).

2,511 citations


Cites background or methods from "Managing energy and server resource..."

  • ...As in [8], two power saving techniques are applied: switching power of computing nodes on / off and Dynamic Voltage and Frequency Scaling (DVFS)....

  • ...Despite the variable nature of the workload, unlike [8], the resource usage data are not approximated, which results in potentially inefficient decisions due to fluctuations....

  • ...[8] have considered the problem of energy-efficient management of homogeneous resources in Internet hosting centers....

Proceedings ArticleDOI
09 Jun 2007
TL;DR: This paper presents the aggregate power usage characteristics of large collections of servers for different classes of applications over a period of approximately six months, and uses the modelling framework to estimate the potential of power management schemes to reduce peak power and energy usage.
Abstract: Large-scale Internet services require a computing infrastructure that can be appropriately described as a warehouse-sized computing system. The cost of building datacenter facilities capable of delivering a given power capacity to such a computer can rival the recurring energy consumption costs themselves. Therefore, there are strong economic incentives to operate facilities as close as possible to maximum capacity, so that the non-recurring facility costs can be best amortized. That is difficult to achieve in practice because of uncertainties in equipment power ratings and because power consumption tends to vary significantly with the actual computing activity. Effective power provisioning strategies are needed to determine how much computing equipment can be safely and efficiently hosted within a given power budget. In this paper we present the aggregate power usage characteristics of large collections of servers (up to 15 thousand) for different classes of applications over a period of approximately six months. Those observations allow us to evaluate opportunities for maximizing the use of the deployed power capacity of datacenters, and assess the risks of over-subscribing it. We find that even in well-tuned applications there is a noticeable gap (7 - 16%) between achieved and theoretical aggregate peak power usage at the cluster level (thousands of servers). The gap grows to almost 40% in whole datacenters. This headroom can be used to deploy additional compute equipment within the same power budget with minimal risk of exceeding it. We use our modeling framework to estimate the potential of power management schemes to reduce peak power and energy usage. We find that the opportunities for power and energy savings are significant, but greater at the cluster-level (thousands of servers) than at the rack-level (tens). Finally we argue that systems need to be power efficient across the activity range, and not only at peak performance levels.

2,047 citations


Cites background from "Managing energy and server resource..."

  • ...[6] treat energy as a resource to be scheduled by a hosting center’s management infrastructure, and propose a scheme that can reduce energy us-...


Book
Luiz Andre Barroso, Urs Hoelzle
01 Jan 2008
TL;DR: The architecture of WSCs is described, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base are described.
Abstract: As computation continues to move into the cloud, the computing platform of interest no longer resembles a pizza box or a refrigerator, but a warehouse full of computers. These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of co-located servers. Large portions of the hardware and software resources in these facilities must work in concert to efficiently deliver good levels of Internet service performance, something that can only be achieved by a holistic approach to their design and deployment. In other words, we must treat the datacenter itself as one massive warehouse-scale computer (WSC). We describe the architecture of WSCs, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base. We hope it will be useful to architects and programmers of today's WSCs, as well as those of future many-core platforms which may one day implement the equivalent of today's WSCs on a single board. Table of Contents: Introduction / Workloads and Software Infrastructure / Hardware Building Blocks / Datacenter Basics / Energy and Power Efficiency / Modeling Costs / Dealing with Failures and Repairs / Closing Remarks

1,938 citations

Journal ArticleDOI
Carl A. Waldspurger
09 Dec 2002
TL;DR: Several novel ESX Server mechanisms and policies for managing memory are introduced, including a ballooning technique that reclaims the pages considered least valuable by the operating system running in a virtual machine, and an idle memory tax that achieves efficient memory utilization.
Abstract: VMware ESX Server is a thin software layer designed to multiplex hardware resources efficiently among virtual machines running unmodified commodity operating systems. This paper introduces several novel ESX Server mechanisms and policies for managing memory. A ballooning technique reclaims the pages considered least valuable by the operating system running in a virtual machine. An idle memory tax achieves efficient memory utilization while maintaining performance isolation guarantees. Content-based page sharing and hot I/O page remapping exploit transparent page remapping to eliminate redundancy and reduce copying overheads. These techniques are combined to efficiently support virtual machine workloads that overcommit memory.

1,528 citations

Proceedings ArticleDOI
07 Mar 2009
TL;DR: The PowerNap concept, an energy-conservation approach where the entire system transitions rapidly between a high-performance active state and a near-zero-power idle state in response to instantaneous load, is proposed and the Redundant Array for Inexpensive Load Sharing (RAILS) is introduced.
Abstract: Data center power consumption is growing to unprecedented levels: the EPA estimates U.S. data centers will consume 100 billion kilowatt hours annually by 2011. Much of this energy is wasted in idle systems: in typical deployments, server utilization is below 30%, but idle servers still consume 60% of their peak power draw. Typical idle periods, though frequent, last seconds or less, confounding simple energy-conservation approaches. In this paper, we propose PowerNap, an energy-conservation approach where the entire system transitions rapidly between a high-performance active state and a near-zero-power idle state in response to instantaneous load. Rather than requiring fine-grained power-performance states and complex load-proportional operation from each system component, PowerNap instead calls for minimizing idle power and transition time, which are simpler optimization goals. Based on the PowerNap concept, we develop requirements and outline mechanisms to eliminate idle power waste in enterprise blade servers. Because PowerNap operates in low-efficiency regions of current blade center power supplies, we introduce the Redundant Array for Inexpensive Load Sharing (RAILS), a power provisioning approach that provides high conversion efficiency across the entire range of PowerNap's power demands. Using utilization traces collected from enterprise-scale commercial deployments, we demonstrate that, together, PowerNap and RAILS reduce average server power consumption by 74%.

1,002 citations


Cites background from "Managing energy and server resource..."

  • ...dynamic voltage and frequency scaling (DVFS) nearly eliminate processor power consumption in idle systems, present-day servers still dissipate about 60% as much power when idle as when fully loaded [4,6,13]....

References
Journal ArticleDOI
01 Aug 1988
TL;DR: The measurements and the reports of beta testers suggest that the final product is fairly good at dealing with congested conditions on the Internet, and an algorithm recently developed by Phil Karn of Bell Communications Research is described in a soon-to-be-published RFC.
Abstract: In October of '86, the Internet had the first of what became a series of 'congestion collapses'. During this period, the data throughput from LBL to UC Berkeley (sites separated by 400 yards and three IMP hops) dropped from 32 Kbps to 40 bps. Mike Karels and I were fascinated by this sudden factor-of-thousand drop in bandwidth and embarked on an investigation of why things had gotten so bad. We wondered, in particular, if the 4.3BSD (Berkeley UNIX) TCP was mis-behaving or if it could be tuned to work better under abysmal network conditions. The answer to both of these questions was "yes". Since that time, we have put seven new algorithms into the 4BSD TCP: round-trip-time variance estimation; exponential retransmit timer backoff; slow-start; more aggressive receiver ack policy; dynamic window sizing on congestion; Karn's clamped retransmit backoff; fast retransmit. Our measurements and the reports of beta testers suggest that the final product is fairly good at dealing with congested conditions on the Internet. This paper is a brief description of (i) - (v) and the rationale behind them. (vi) is an algorithm recently developed by Phil Karn of Bell Communications Research, described in [KP87]. (viii) is described in a soon-to-be-published RFC. Algorithms (i) - (v) spring from one observation: The flow on a TCP connection (or ISO TP-4 or Xerox NS SPP connection) should obey a 'conservation of packets' principle. And, if this principle were obeyed, congestion collapse would become the exception rather than the rule. Thus congestion control involves finding places that violate conservation and fixing them. By 'conservation of packets' I mean that for a connection 'in equilibrium', i.e., running stably with a full window of data in transit, the packet flow is what a physicist would call 'conservative': a new packet isn't put into the network until an old packet leaves. The physics of flow predicts that systems with this property should be robust in the face of congestion. Observation of the Internet suggests that it was not particularly robust. Why the discrepancy? There are only three ways for packet conservation to fail: the connection doesn't get to equilibrium, or a sender injects a new packet before an old packet has exited, or the equilibrium can't be reached because of resource limits along the path. In the following sections, we treat each of these in turn.

5,620 citations

Proceedings ArticleDOI
01 Jun 1998
TL;DR: This paper applies a number of observations of Web server usage to create a realistic Web workload generation tool which mimics a set of real users accessing a server and addresses the technical challenges to satisfying this large set of simultaneous constraints on the properties of the reference stream.
Abstract: One role for workload generation is as a means for understanding how servers and networks respond to variation in load. This enables management and capacity planning based on current and projected usage. This paper applies a number of observations of Web server usage to create a realistic Web workload generation tool which mimics a set of real users accessing a server. The tool, called Surge (Scalable URL Reference Generator) generates references matching empirical measurements of 1) server file size distribution; 2) request size distribution; 3) relative file popularity; 4) embedded file references; 5) temporal locality of reference; and 6) idle periods of individual users. This paper reviews the essential elements required in the generation of a representative Web workload. It also addresses the technical challenges to satisfying this large set of simultaneous constraints on the properties of the reference stream, the solutions we adopted, and their associated accuracy. Finally, we present evidence that Surge exercises servers in a manner significantly different from other Web server benchmarks.

1,549 citations

Journal ArticleDOI
TL;DR: Observations on the patterns and characteristics of wide-area Internet traffic, as recorded by MCI's OC-3 traffic monitors are presented, revealing the characteristics of the traffic in terms of packet sizes, flow duration, volume, and percentage composition by protocol and application.
Abstract: The Internet is rapidly growing in number of users, traffic levels, and topological complexity. At the same time it is increasingly driven by economic competition. These developments render the characterization of network usage and workloads more difficult, and yet more critical. Few recent studies have been published reporting Internet backbone traffic usage and characteristics. At MCI, we have implemented a high-performance, low-cost monitoring system that can capture traffic and perform analyses. We have deployed this monitoring tool on OC-3 trunks within the Internet MCI's backbone and also within the NSF-sponsored vBNS. This article presents observations on the patterns and characteristics of wide-area Internet traffic, as recorded by MCI's OC-3 traffic monitors. We report on measurements from two OC-3 trunks in MCI's commercial Internet backbone over two time ranges (24-hour and 7-day) in the presence of up to 240,000 flows. We reveal the characteristics of the traffic in terms of packet sizes, flow duration, volume, and percentage composition by protocol and application, as well as patterns seen over the two time scales.

1,180 citations

Proceedings ArticleDOI
20 Jan 2001
TL;DR: This work investigates dynamic thermal management as a technique to control CPU power dissipation and explores the tradeoffs between several mechanisms for responding to periods of thermal trauma and the effects of hardware and software implementations.
Abstract: With the increasing clock rate and transistor count of today's microprocessors, power dissipation is becoming a critical component of system design complexity. Thermal and power-delivery issues are becoming especially critical for high-performance computing systems. In this work, we investigate dynamic thermal management as a technique to control CPU power dissipation. With the increasing usage of clock gating techniques, the average power dissipation typically seen by common applications is becoming much less than the chip's rated maximum power dissipation. However system designers still must design thermal heat sinks to withstand the worse-case scenario. We define and investigate the major components of any dynamic thermal management scheme. Specifically we explore the tradeoffs between several mechanisms for responding to periods of thermal trauma and we consider the effects of hardware and software implementations. With approximate dynamic thermal management, the CPU can be designed for a much lower maximum power rating, with minimal performance impact for typical applications.

882 citations


"Managing energy and server resource..." refers background in this paper

  • ..., cooling failure) in a manner similar to dynamic thermal management [13] within an individual server....

  • ...thermal management for individual servers [13]....

Proceedings ArticleDOI
01 Oct 1997
TL;DR: The design of Odyssey is described, a prototype implementing application-aware adaptation, and how it supports concurrent execution of diverse mobile applications, and agility is identified as a key attribute of adaptive systems.
Abstract: In this paper we show that application-aware adaptation, a collaborative partnership between the operating system and applications, offers the most general and effective approach to mobile information access. We describe the design of Odyssey, a prototype implementing this approach, and show how it supports concurrent execution of diverse mobile applications. We identify agility as a key attribute of adaptive systems, and describe how to quantify and measure it. We present the results of our evaluation of Odyssey, indicating performance improvements up to a factor of 5 on a benchmark of three applications concurrently using remote services over a network with highly variable bandwidth.

827 citations


"Managing energy and server resource..." refers background in this paper

  • ...A good load estimator must balance stability—to avoid overreacting to transient load fluctuations—with agility to adapt quickly to meaningful load shifts [31, 25]....

  • ...One aspect we have not investigated in the server context is the role of application-specific adaptation to resource constraints [31, 17, 30, 2]....