
Showing papers by "David Irwin published in 2016"


Proceedings ArticleDOI
18 Apr 2016
TL;DR: Flint is designed, which is based on Spark and includes automated checkpointing and server selection policies that support batch and interactive applications and dynamically adapt to application characteristics, and yields cost savings of up to 90% compared to using on-demand servers.
Abstract: Cloud providers now offer transient servers, which they may revoke at any time, for significantly lower prices than on-demand servers, which they cannot revoke. The low price of transient servers is particularly attractive for executing an emerging class of workload, which we call Batch-Interactive Data-Intensive (BIDI), that is becoming increasingly important for data analytics. BIDI workloads require large sets of servers to cache massive datasets in memory to enable low latency operation. In this paper, we illustrate the challenges of executing BIDI workloads on transient servers, where revocations (akin to failures) are the common case. To address these challenges, we design Flint, which is based on Spark and includes automated checkpointing and server selection policies that i) support batch and interactive applications and ii) dynamically adapt to application characteristics. We evaluate a prototype of Flint using EC2 spot instances, and show that it yields cost savings of up to 90% compared to using on-demand servers, while increasing running time by
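The revocation-aware checkpointing problem above can be illustrated with the classic Young approximation for checkpoint intervals, which treats revocations like failures. This is a simplified sketch, not Flint's actual adaptive policy, and the numbers are illustrative:

```python
import math

def optimal_checkpoint_interval(checkpoint_cost_s: float, mttr_s: float) -> float:
    """Young's first-order approximation of the optimal interval between
    checkpoints, given the cost of writing one checkpoint and the mean
    time to revocation (treated like a mean time to failure)."""
    return math.sqrt(2 * checkpoint_cost_s * mttr_s)

# e.g., a 30s checkpoint on spot servers revoked every 2 hours on average
interval = optimal_checkpoint_interval(30, 2 * 3600)
```

A system like Flint must additionally adapt this interval per market, since different spot markets have very different revocation rates.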

91 citations


Proceedings Article
20 Jun 2016
TL;DR: It is argued that sophisticated bidding strategies, in practice, do not provide any advantages over simple strategies for multiple reasons.
Abstract: Cloud providers have begun to allow users to bid for surplus servers on a spot market. These servers are allocated if a user's bid price is higher than their market price and revoked otherwise. Thus, analyzing price data to derive optimal bidding strategies has become a popular research topic. In this paper, we argue that sophisticated bidding strategies, in practice, do not provide any advantages over simple strategies for multiple reasons. First, due to price characteristics, there are a wide range of bid prices that yield the optimal cost and availability. Second, given the large number of spot markets, there is always a market with available surplus resources. Thus, if resources become unavailable due to a price spike, users need not wait until the spike subsides, but can instead provision a new spot resource elsewhere and migrate to it. Third, current spot market rules enable users to place maximum bids for resources without any penalty. Given bidding's irrelevance, users can adopt trivial bidding strategies and focus instead on modifying applications to efficiently seek out and migrate to the lowest cost resources.
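The paper's argument reduces bidding to a trivial policy: bid the on-demand price as an effective maximum, run in the cheapest spot market, and migrate when prices spike. A minimal sketch of that selection step, with hypothetical market names and prices:

```python
def pick_market(spot_prices: dict, on_demand_price: float) -> str:
    """Trivial strategy in the spirit of the paper: treat the on-demand
    price as the maximum bid and simply run in the cheapest spot market;
    if that market's price spikes, re-run this selection and migrate."""
    cheapest = min(spot_prices, key=spot_prices.get)
    # only worth using spot if it actually undercuts on-demand
    return cheapest if spot_prices[cheapest] < on_demand_price else "on-demand"

markets = {"us-east-1a": 0.12, "us-east-1b": 0.04, "us-west-2a": 0.07}
choice = pick_market(markets, on_demand_price=0.10)   # -> "us-east-1b"
```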

45 citations


Proceedings ArticleDOI
16 Nov 2016
TL;DR: SunSpot is able to localize a solar-powered home to a small region of interest that is near the smallest possible area given the energy data resolution, e.g., within a ~500m and ~28km radius for per-second and per-minute resolution, respectively.
Abstract: Homeowners are increasingly deploying grid-tied solar systems due to the rapid decline in solar module prices. The energy produced by these solar-powered homes is monitored by utilities and third parties using networked energy meters, which record and transmit energy data at fine-grained intervals. Such energy data is considered anonymous if it is not associated with identifying account information, e.g., a name and address. Thus, energy data from these "anonymous" homes is often not handled securely: it is routinely transmitted over the Internet in plaintext, stored unencrypted in the cloud, shared with third-party energy analytics companies, and even made publicly available over the Internet. Extensive prior work has shown that energy consumption data is vulnerable to multiple attacks, which analyze it to reveal a range of sensitive private information about occupant activities. However, these attacks are useless without knowledge of a home's location. Our key insight is that solar energy data is not anonymous: since every location on Earth has a unique solar signature, it embeds detailed location information. To explore the severity and extent of this privacy threat, we design SunSpot to localize "anonymous" solar-powered homes using their solar energy data. We evaluate SunSpot on publicly-available energy data from 14 homes with rooftop solar. We find that SunSpot is able to localize a solar-powered home to a small region of interest that is near the smallest possible area given the energy data resolution, e.g., within a ~500m and ~28km radius for per-second and per-minute resolution, respectively. SunSpot then identifies solar-powered homes within this region using crowd-sourced image processing of satellite data before applying additional filters to identify a specific home.
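A small taste of why solar data embeds location: the UTC time at which generation peaks (solar noon) directly constrains longitude. The sketch below ignores the equation-of-time correction that a real system like SunSpot would have to model, and the input time is illustrative:

```python
def estimate_longitude(solar_noon_utc_hours: float) -> float:
    """Rough longitude estimate (degrees, east positive) from the UTC time
    at which solar output peaks. Earth rotates 15 degrees per hour, so each
    hour of offset from 12:00 UTC shifts the estimate by 15 degrees.
    Ignores the equation-of-time correction (up to ~16 minutes)."""
    return (12.0 - solar_noon_utc_hours) * 15.0

# a trace peaking at 16:50 UTC suggests a home near 72.5 degrees west
lon = estimate_longitude(16 + 50 / 60)
```

Latitude can be constrained similarly from day length and seasonal variation, which is why per-second data pins a home down so tightly.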

37 citations


Proceedings ArticleDOI
27 Jun 2016
TL;DR: This work incorporates market-based probing into SpotLight, an information service that enables cloud applications to query this and other data, and uses it to monitor the availability of more than 4500 distinct server types across 9 geographical regions in Amazon's Elastic Compute Cloud over a 3 month period.
Abstract: Infrastructure-as-a-Service cloud platforms are incredibly complex: they rent hundreds of different types of servers across multiple geographical regions under a wide range of contract types that offer varying tradeoffs between risk and cost. Unfortunately, the internal dynamics of cloud platforms are opaque along several dimensions. For example, while the risk of servers not being available when requested is critical in optimizing the cloud's risk-cost tradeoffs, it is not typically made visible to users. Thus, inspired by prior work on Internet bandwidth probing, we propose actively probing cloud platforms to explicitly learn such information, where each "probe" is a request for a particular type of server. We model the relationships between different contract types to develop a market-based probing policy, which leverages the insight that real-time prices in cloud spot markets loosely correlate with the supply (and availability) of fixed-price on-demand servers. That is, the higher the spot price for a server, the more likely the corresponding fixed-price on-demand server is not available. We incorporate market-based probing into SpotLight, an information service that enables cloud applications to query this and other data, and use it to monitor the availability of more than 4500 distinct server types across 9 geographical regions in Amazon's Elastic Compute Cloud over a 3 month period. We analyze this data to reveal interesting observations about the platform's internal dynamics. We then show how SpotLight enables two recently proposed derivative cloud services to select a better mix of servers to host applications, which improves their availability from ~70-90% to near 100% in practice.
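The market-based insight above lends itself to a simple prioritization heuristic: probe first where the spot price is closest to (or above) the on-demand price, since those markets are the most likely to have unavailable on-demand servers. This is an illustrative sketch with made-up prices, not SpotLight's actual policy:

```python
def probe_priority(spot_price: float, on_demand_price: float) -> float:
    """Higher ratio of spot to on-demand price suggests tighter supply,
    making that market a more informative probe target."""
    return spot_price / on_demand_price

def rank_probes(markets: dict) -> list:
    """Order markets by descending probe priority.
    markets maps name -> (spot_price, on_demand_price)."""
    return sorted(markets, key=lambda m: probe_priority(*markets[m]), reverse=True)

ranked = rank_probes({"us-east": (0.9, 1.0), "us-west": (0.2, 1.0)})
```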

33 citations


Proceedings ArticleDOI
13 Nov 2016
TL;DR: This work presents policies for partitioning a variable amount of idle capacity into classes with different transient guarantees to maximize performance and value, and shows that this approach can increase the aggregate revenue from idle server capacity by up to ∼6.5× compared to existing approaches.
Abstract: To prevent rejecting requests, cloud platforms typically provision for their peak demand. Thus, a platform's idle capacity can be significant, as demand varies widely over multiple time scales, e.g., daily and seasonally. To reduce waste, platforms have begun to offer this idle capacity in the form of transient servers, which they may unilaterally revoke, for much lower prices---~50-90% less---than on-demand servers, which they cannot revoke. However, transient servers' revocation characteristics---their volatility and predictability---influence their performance, since they affect the overhead of fault-tolerance mechanisms applications use to handle revocations. Unfortunately, current cloud platforms offer no guarantees on revocation characteristics, which makes it difficult for users to optimally configure (and correctly value) transient servers. To address the problem, we propose the abstraction of a transient guarantee, which offers probabilistic assurances on revocation characteristics. Transient guarantees have numerous benefits: they increase the performance of transient servers, enable users to optimally use and correctly value them, and permit platforms to control their freedom to revoke them. We present policies for partitioning a variable amount of idle capacity into classes with different transient guarantees to maximize performance and value. We then implement and evaluate these policies on job traces from a production Google cluster. We show that our approach can increase the aggregate revenue from idle server capacity by up to ~6.5X compared to existing approaches.
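Partitioning idle capacity into guarantee classes can be sketched as a simple proportional split; the class names and fractions below are assumptions for the example, not the paper's revenue-maximizing policy:

```python
def partition_capacity(idle_servers: int, fractions: dict) -> dict:
    """Split current idle capacity into transient-guarantee classes
    according to fixed fractions. A real policy would instead choose the
    split to maximize revenue given demand and revocation forecasts."""
    alloc, remaining = {}, idle_servers
    items = list(fractions.items())
    for name, frac in items[:-1]:
        alloc[name] = int(idle_servers * frac)
        remaining -= alloc[name]
    # rounding leftover lands in the final (weakest-guarantee) class
    alloc[items[-1][0]] = remaining
    return alloc

classes = partition_capacity(100, {"strong": 0.2, "medium": 0.3, "weak": 0.5})
```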

33 citations


Proceedings ArticleDOI
01 Jan 2016
TL;DR: This paper designs a solar-powered EV charging station in a parking lot of a car-share service and formulates a Linear Programming approach to charge EVs that maximizes the utilization of solar energy while maintaining similar battery levels for all cars.
Abstract: Electric vehicles (EV) are growing in popularity as a credible alternative to gas-powered vehicles. These vehicles require their batteries to be "fueled up" for operation. While EV charging has traditionally been grid-based, the use of solar-powered chargers has emerged as an interesting opportunity. These chargers provide clean electricity to electric-powered cars that are themselves pollution free, resulting in positive environmental effects. In this paper, we design a solar-powered EV charging station in a parking lot of a car-share service. In such a car-share service, rental pick-up and drop-off times are known. We formulate a Linear Programming approach to charge EVs that maximizes the utilization of solar energy while maintaining similar battery levels for all cars. We evaluate the performance of our algorithm on real-world and synthetically derived datasets to show that it fairly distributes the available electric charge among candidate EVs across seasons with variable demand profiles. Further, we reduce the disparity in battery charge levels by 60% compared to a best-effort charging policy. Moreover, we show that the 80th percentile of EVs have at least a 75% battery level at the end of their charging session. Finally, we demonstrate the feasibility of our charging station and show that a solar installation proportional to the size of a parking lot adequately apportions available solar energy generated to the EVs serviced.
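The joint objective of using all solar energy while equalizing battery levels can be approximated greedily by water-filling: in each interval, direct the next quantum of solar energy to the EV with the lowest battery level. This is a sketch of the objective, not the paper's LP formulation, and the numbers are illustrative:

```python
def allocate_solar(solar_kwh: float, batteries: dict, capacity_kwh: float,
                   quantum: float = 0.1) -> dict:
    """Greedy water-filling: repeatedly give a small quantum of solar
    energy to the lowest-charged EV, which both uses the available solar
    energy and pulls battery levels toward equality."""
    levels = dict(batteries)
    for _ in range(int(round(solar_kwh / quantum))):
        low = min(levels, key=levels.get)
        if levels[low] + quantum > capacity_kwh:
            break  # every battery is effectively full
        levels[low] += quantum
    return levels

# 2 kWh of solar shared between two cars at 5 and 6 kWh (10 kWh packs)
result = allocate_solar(2.0, {"carA": 5.0, "carB": 6.0}, 10.0)
```

An LP, as in the paper, can additionally exploit known rental pick-up times to prioritize cars that leave soonest.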

32 citations


Proceedings ArticleDOI
16 Nov 2016
TL;DR: This paper conducts a wide-ranging analysis of the city's gas and electric data to gain insights into the energy consumption of both individual homes and the city as a whole and demonstrates how city-scale smart meter datasets can answer a variety of questions on building energy consumption.
Abstract: Understanding the energy usage of buildings is crucial for policy-making, energy planning, and achieving sustainable development. Unfortunately, instrumenting buildings to collect energy usage data is difficult, and all publicly available datasets typically include only a few hundred homes within a region. Due to their relatively small size, these datasets provide limited insight and are insufficient for analyses that require a larger representation, such as an entire city or town. In recent years, utility companies have installed advanced electric and gas meters, i.e., "smart meters", that enable energy data collection on a massive scale. In this paper, we analyze such a dataset from a utility company that includes energy data from 14,836 smart meters covering a small city. We conduct a wide-ranging analysis of the city's gas and electric data to gain insights into the energy consumption of both individual homes and the city as a whole. In doing so, we demonstrate how city-scale smart meter datasets can answer a variety of questions on building energy consumption, such as the impact of weather on energy usage, the correlation between the size and age of a building and its energy usage, the impact of increasing levels of renewable penetration, etc. For example, we show that extreme weather events significantly increase energy usage, e.g., by 36% and 11.5% on hot summer and cold winter days, respectively. As another example, we observe that 700 homes are highly energy inefficient, as their energy demand variability is twice that of the aggregate grid demand. Finally, we study the impact of increasing levels of renewable integration in homes and show that solar penetration rates higher than 20% of demand increase the risk of over-generation and may impact utility operations.

27 citations


Proceedings ArticleDOI
01 Nov 2016
TL;DR: SmartSim, a publicly-available device-accurate smart home energy trace generator, is developed and integrated with NILM-TK, a publicly-available toolkit for Non-Intrusive Load Monitoring (NILM), and compared with traces from a real home to show they yield similar quantitative and qualitative results for representative energy analytics.
Abstract: Utilities have deployed tens of millions of smart meters, which record and transmit home energy usage at fine-grained intervals. These deployments are motivating researchers to develop new energy analytics that mine smart meter data to learn insights into home energy usage and behavior. Unfortunately, a significant barrier to evaluating energy analytics is the overhead of instrumenting homes to collect aggregate energy usage data and data from each device. As a result, researchers typically evaluate their analytics on only a small number of homes, and cannot rigorously vary a home's characteristics to determine what attributes of its energy usage affect accuracy. To address the problem, we develop SmartSim, a publicly-available device-accurate smart home energy trace generator. SmartSim generates energy usage traces for devices by combining a device energy model, which captures its pattern of energy usage when active, with a device usage model, which specifies its frequency, duration, and time of activity. SmartSim then generates aggregate energy data for a simulated home by combining the data from each device. We integrate SmartSim with NILM-TK, a publicly-available toolkit for Non-Intrusive Load Monitoring (NILM), and compare its synthetically generated traces with traces from a real home to show they yield similar quantitative and qualitative results for representative energy analytics.
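The two-model composition described above (a device energy model plus a device usage model, summed into an aggregate home trace) can be sketched minimally. SmartSim's actual energy models capture richer time-varying patterns than the flat draw assumed here, and the devices and wattages are illustrative:

```python
def device_trace(active_power_w: float, on_intervals: list, horizon: int) -> list:
    """Per-second power trace for one device: the usage model is a list of
    (start, end) on-intervals; the energy model is simplified here to a
    flat active power draw."""
    trace = [0.0] * horizon
    for start, end in on_intervals:
        for t in range(start, min(end, horizon)):
            trace[t] = active_power_w
    return trace

def aggregate(traces: list) -> list:
    """Aggregate (smart meter) trace: sum the per-device traces at each step."""
    return [sum(vals) for vals in zip(*traces)]

fridge = device_trace(120.0, [(0, 10)], 20)
kettle = device_trace(1500.0, [(5, 8)], 20)
home = aggregate([fridge, kettle])   # home[6] == 1620.0
```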

23 citations


Proceedings ArticleDOI
12 Mar 2016
TL;DR: A detailed analysis of a state-of-the-art 15MW green multi-tenant data center that incorporates many of the technological advances used in commercial data centers is presented, revealing the benefits of optimizations, and insights into how the various effectiveness metrics change with the seasons and increasing capacity usage are provided.
Abstract: Data centers are an indispensable part of today's IT infrastructure. To keep pace with modern computing needs, data centers continue to grow in scale and consume increasing amounts of power. While prior work on data centers has led to significant improvements in their energy-efficiency, detailed measurements from these facilities' operations are not widely available, as data center design is often considered part of a company's competitive advantage. However, such detailed measurements are critical to the research community in motivating and evaluating new energy-efficiency optimizations. In this paper, we present a detailed analysis of a state-of-the-art 15MW green multi-tenant data center that incorporates many of the technological advances used in commercial data centers. We analyze the data center's computing load and its impact on power, water, and carbon usage using standard effectiveness metrics, including PUE, WUE, and CUE. Our results reveal the benefits of optimizations, such as free cooling, and provide insights into how the various effectiveness metrics change with the seasons and increasing capacity usage. More broadly, our PUE, WUE, and CUE analysis validate the green design of this LEED Platinum data center.
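The effectiveness metrics used in the analysis are simple ratios over IT energy; the example values below are illustrative, not measurements from the paper's facility:

```python
def pue(total_facility_kwh: float, it_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy over IT energy
    (1.0 is the ideal; lower is better)."""
    return total_facility_kwh / it_kwh

def wue(water_liters: float, it_kwh: float) -> float:
    """Water Usage Effectiveness: liters of water consumed per kWh of IT energy."""
    return water_liters / it_kwh

def cue(co2e_kg: float, it_kwh: float) -> float:
    """Carbon Usage Effectiveness: kg of CO2-equivalent per kWh of IT energy."""
    return co2e_kg / it_kwh

# a facility drawing 1800 kWh to deliver 1500 kWh of IT load has PUE 1.2
efficiency = pue(1800.0, 1500.0)
```

Seasonal effects such as free cooling show up directly in these ratios, which is why the paper tracks them across the year.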

17 citations


Journal ArticleDOI
TL;DR: This article proposes an alternative structure where nearby homes explicitly share energy with each other to balance local energy harvesting and demand in microgrids, and develops a novel energy sharing approach to determine which homes should share energy, and when to minimize system-wide energy transmission losses in the microgrid.
Abstract: Renewable energy (e.g., solar energy) is an attractive option to provide green energy to homes. Unfortunately, the intermittent nature of renewable energy results in a mismatch between when these sources generate energy and when homes demand it. This mismatch reduces the efficiency of using harvested energy by either (i) requiring batteries to store surplus energy, which typically incurs ∼ 20% energy conversion losses, or (ii) using net metering to transmit surplus energy via the electric grid’s AC lines, which severely limits the maximum percentage of renewable penetration possible. In this article, we propose an alternative structure where nearby homes explicitly share energy with each other to balance local energy harvesting and demand in microgrids. We develop a novel energy sharing approach to determine which homes should share energy, and when to minimize system-wide energy transmission losses in the microgrid. We evaluate our approach in simulation using real traces of solar energy harvesting and home consumption data from a deployment in Amherst, MA. We show that our system (i) reduces the energy loss on the AC line by 64% without requiring large batteries, (ii) performance scales up with larger battery capacities, and (iii) is robust to different energy consumption patterns and energy prediction accuracy in the microgrid.
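The core matching decision (which surplus home shares with which deficit home) can be sketched as a greedy pairing that prefers the closest pairs, using distance as a proxy for transmission loss. The paper's policy is more sophisticated, and the homes and distances below are made up:

```python
def match_shares(surplus: list, deficit: list, dist: dict) -> list:
    """Greedily pair surplus homes with deficit homes, closest pairs first,
    so shared energy travels the shortest distance (lowest loss).
    dist maps surplus home -> {deficit home -> distance}."""
    pairs = sorted((dist[s][d], s, d) for s in surplus for d in deficit)
    matched, used_s, used_d = [], set(), set()
    for _, s, d in pairs:
        if s not in used_s and d not in used_d:
            matched.append((s, d))
            used_s.add(s)
            used_d.add(d)
    return matched

matches = match_shares(["A", "B"], ["X", "Y"],
                       {"A": {"X": 1, "Y": 5}, "B": {"X": 2, "Y": 1}})
```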

15 citations


Proceedings ArticleDOI
21 Jun 2016
TL;DR: A Non-Intrusive Model Derivation (NIMD) algorithm is presented to automate modeling of residential electric loads using concepts from power systems, statistics, and machine learning to show that models derived via NIMD are comparable in accuracy to models built by experts and closely approximate the ground truth data.
Abstract: A variety of energy management and analytics techniques rely on models of the power usage of a device over time. Unfortunately, the models employed by these techniques are often highly simplistic, such as modeling devices as simply being on with a fixed power usage or off and consuming little power. As we show, even the power usage of relatively simple devices exhibits much more complexity than a simple on and off state. To address the problem, we present a Non-Intrusive Model Derivation (NIMD) algorithm to automate modeling of residential electric loads using concepts from power systems, statistics, and machine learning. NIMD automatically derives a compact representation of the time-varying power usage of any residential electrical load, including both the device's energy usage and its pattern of usage over time. Such models are useful for a variety of analytics techniques, such as Non-Intrusive Load Monitoring, that have relied on simple on-off models in the past. We evaluate the accuracy of our models by comparing them with both actual ground truth data, and against models that have been designed manually by human experts. We show that models derived via NIMD are comparable in accuracy to models built by experts and closely approximate the ground truth data.
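A first step toward such models is collapsing a raw power trace into a small set of representative power states, rather than assuming a binary on/off model. The online grouping below is a toy sketch of that idea (NIMD itself combines power systems, statistics, and machine learning), and the tolerance and readings are illustrative:

```python
def derive_states(readings: list, tolerance: float = 10.0) -> list:
    """Collapse a raw power trace (watts) into representative power states
    by grouping each reading with an existing state whose running mean is
    within `tolerance` watts, creating a new state otherwise."""
    states = []  # each state is [sum_of_readings, count]
    for w in readings:
        for s in states:
            if abs(w - s[0] / s[1]) <= tolerance:
                s[0] += w
                s[1] += 1
                break
        else:
            states.append([w, 1])
    return sorted(s[0] / s[1] for s in states)

# an off state near 0 W and an active state near 150 W
levels = derive_states([0, 1, 2, 148, 152, 150, 0, 1])
```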

Proceedings Article
20 Jun 2016
TL;DR: This work argues that price volatility will significantly decrease the value of spot servers as the spot market matures, and proposes a more sustainable alternative that offers a variable amount of idle capacity to users for a fixed price, but with transient guarantees.
Abstract: Computational spot markets enable users to bid on servers, and then continuously allocate them to the highest bidder: if a user is "out bid" for a server, the market revokes it and re-allocates it to the new highest bidder. Spot markets are common when trading commodities to balance real-time supply and demand--cloud platforms use them to sell their idle capacity, which varies over time. However, server-time differs from other commodities in that it is "stateful": losing a spot server incurs an overhead that decreases the useful work it performs. Thus, variations in the spot price actually affect the inherent value of server-time bought in the spot market. As the spot market matures, we argue that price volatility will significantly decrease the value of spot servers. Thus, somewhat counter-intuitively, spot markets may not maximize the value of idle server capacity. To address the problem, we propose a more sustainable alternative that offers a variable amount of idle capacity to users for a fixed price, but with transient guarantees.

Proceedings ArticleDOI
12 Mar 2016
TL;DR: The results demonstrate the importance of energy-agile design when considering the benefits of using variable power, and show that GreenSort requires 31% more time and energy to complete when power varies based on real-time electricity prices versus when it is constant.
Abstract: Computing researchers have long focused on improving energy-efficiency under the implicit assumption that all energy is created equal. Yet, this assumption is actually incorrect: energy's cost and carbon footprint vary substantially over time. As a result, consuming energy inefficiently when it is cheap and clean may sometimes be preferable to consuming it efficiently when it is expensive and dirty. Green datacenters adapt their energy usage to optimize for such variations, as reflected in changing electricity prices or renewable energy output. Thus, we introduce energy-agility as a new metric to evaluate green datacenter applications. To illustrate fundamental tradeoffs in energy-agile design, we develop GreenSort, a distributed sorting system optimized for energy-agility. GreenSort is representative of the long-running, massively-parallel, data-intensive tasks that are common in datacenters and amenable to delays from power variations. Our results demonstrate the importance of energy-agile design when considering the benefits of using variable power. For example, we show that GreenSort requires 31% more time and energy to complete when power varies based on real-time electricity prices versus when it is constant. Thus, in this case, real-time prices should be at least 31% lower than fixed prices to warrant using them.

Proceedings ArticleDOI
01 Jan 2016
TL;DR: This work proposes AutoPlug, a system that automatically identifies and tracks the devices plugged into smart outlets in real time without user intervention, and achieves ∼90% identification accuracy on real data collected from 13 distinct device types, while also detecting when a device changes outlets with an accuracy >90%.
Abstract: Low-cost network-connected smart outlets are now available for monitoring, controlling, and scheduling the energy usage of electrical devices. As a result, such smart outlets are being integrated into automated home management systems, which remotely control them by analyzing and interpreting their data. However, to effectively interpret data and control devices, the system must know the type of device that is plugged into each smart outlet. Existing systems require users to manually input and maintain the outlet metadata that associates a device type with a smart outlet. Such manual operation is time-consuming and error-prone: users must initially inventory all outlet-to-device mappings, enter them into the management system, and then update this metadata every time a new device is plugged in or moves to a new outlet. Inaccurate metadata may cause systems to misinterpret data or issue incorrect control actions. To address the problem, we propose AutoPlug, a system that automatically identifies and tracks the devices plugged into smart outlets in real time without user intervention. AutoPlug combines machine learning techniques with time-series analysis of device energy data in real time to accurately identify and track devices on startup, and as they move from outlet-to-outlet. We show that AutoPlug achieves ∼90% identification accuracy on real data collected from 13 distinct device types, while also detecting when a device changes outlets with an accuracy >90%. We implement an AutoPlug prototype on a Raspberry Pi and deploy it live in a real home for a period of 20 days. We show that its performance enables it to monitor up to 25 outlets, while detecting new devices or changes in devices with <50s latency.
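The identification step can be pictured as matching simple features of an outlet's recent energy trace against known device signatures. AutoPlug itself uses richer machine learning and time-series analysis; the nearest-signature matcher below, with its two-feature signatures (mean and peak power), is a deliberately simplified sketch:

```python
def identify_device(features: tuple, signatures: dict) -> str:
    """Return the known device whose signature is closest (Euclidean
    distance) to the observed features of the outlet's energy trace."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(signatures, key=lambda name: dist(features, signatures[name]))

# hypothetical (mean W, peak W) signatures for two device types
signatures = {"fridge": (120.0, 300.0), "kettle": (1500.0, 1800.0)}
label = identify_device((130.0, 310.0), signatures)   # -> "fridge"
```

Tracking devices as they move between outlets then reduces to re-running identification whenever an outlet's trace stops matching its current label.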

Proceedings ArticleDOI
21 Jun 2016
TL;DR: The prototype SDS system, called SunShade, includes two new mechanisms that enable programmatic solar flow control: one that enforces an absolute limit on solar output, and one that enforces a relative limit on solar output as a fraction of the current maximum power point.
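The two flow-control mechanisms reduce to simple clamping of the inverter's output; the sketch below shows their composition as pure functions, with illustrative wattages (SunShade implements them against real inverter hardware):

```python
def apply_limits(mpp_watts: float, absolute_cap_w: float = None,
                 relative_cap: float = None) -> float:
    """Clamp solar output to an absolute watt cap and/or to a fraction of
    the current maximum power point (MPP), whichever is lower."""
    out = mpp_watts
    if relative_cap is not None:
        out = min(out, relative_cap * mpp_watts)
    if absolute_cap_w is not None:
        out = min(out, absolute_cap_w)
    return out

# a 5 kW panel under a 60% relative cap and a 2.5 kW absolute cap
power = apply_limits(5000.0, absolute_cap_w=2500.0, relative_cap=0.6)
```

The relative cap tracks weather (it scales with the MPP), while the absolute cap gives the grid a hard ceiling, which is why both mechanisms are useful.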
Abstract: Since the electric grid was not designed to support large-scale solar generation, current policies place hard caps on the number of solar systems that connect to the grid. Unfortunately, users are starting to hit these caps, which is restricting solar's natural growth. Software-defined solar (SDS) systems address the problem by dynamically regulating the power they inject into the grid, similar to TCP, to maximize the grid's available solar capacity, maintain grid stability, and fairly share the grid's solar capacity among users. By dynamically regulating solar "flows," SDS systems remove the need for policies that artificially cap solar systems, enabling any SDS system to freely connect to the grid. Our prototype SDS system, called SunShade, includes two new mechanisms that enable programmatic solar flow control: one that enforces an absolute limit on solar output, and one that enforces a relative limit on solar output as a fraction of the current maximum power point. We have implemented both mechanisms, and conducted a preliminary evaluation with an emulated solar panel using real weather traces with different insolation and temperature levels.