Temporal data mining approaches for sustainable chiller management in data centers

doi:10.1145/1989734.1989738


Journal Article•DOI•


Debprakash Patnaik¹, Manish Marwah², Ratnesh Sharma³, Naren Ramakrishnan¹•Institutions (3)

Virginia Tech¹, Hewlett-Packard², Princeton University³

15 Jul 2011-ACM Transactions on Intelligent Systems and Technology (ACM)-Vol. 2, Iss: 4, pp 34

TL;DR: Three key ingredients of CAMAS---motif mining, association analysis, and dynamic Bayesian network inference---that help bridge the gap between low-level, raw, sensor streams, and the high-level operating regions and features needed for an operator to efficiently manage the data center are demonstrated.


Abstract: Practically every large IT organization hosts data centers---a mix of computing elements, storage systems, networking, power, and cooling infrastructure---operated either in-house or outsourced to major vendors. A significant element of modern data centers is their cooling infrastructure, whose efficient and sustainable operation is a key ingredient to the “always-on” capability of data centers. We describe the design and implementation of CAMAS (Chiller Advisory and MAnagement System), a temporal data mining solution to mine and manage chiller installations. CAMAS embodies a set of algorithms for processing multivariate time-series data and characterizes sustainability measures of the patterns mined. We demonstrate three key ingredients of CAMAS---motif mining, association analysis, and dynamic Bayesian network inference---that help bridge the gap between low-level, raw, sensor streams, and the high-level operating regions and features needed for an operator to efficiently manage the data center. The effectiveness of CAMAS is demonstrated by its application to a real-life production data center managed by HP.


Citations


Journal Article•DOI•

A review of data mining technologies in building energy systems: Load prediction, pattern identification, fault detection and diagnosis

Yang Zhao¹, Chaobo Zhang¹, Yiwen Zhang¹, Zihao Wang¹, Junyang Li¹•Institutions (1)

Zhejiang University¹

01 Apr 2020

TL;DR: A comprehensive literature review of the applications of data mining technologies in building energy systems, with suggestions for future research towards effective and efficient data mining solutions.

Abstract: With the advent of the era of big data, buildings have become not only energy-intensive but also data-intensive. Data mining technologies have been widely utilized to release the values of massive amounts of building operation data with an aim of improving the operation performance of building energy systems. This paper aims at making a comprehensive literature review of the applications of data mining technologies in this domain. In general, data mining technologies can be classified into two categories, i.e., supervised data mining technologies and unsupervised data mining technologies. In this field, supervised data mining technologies are usually utilized for building energy load prediction and fault detection/diagnosis. And unsupervised data mining technologies are usually utilized for building operation pattern identification and fault detection/diagnosis. Comprehensive discussions are made about the strengths and shortcomings of the data mining-based methods. Based on this review, suggestions for future research are proposed towards effective and efficient data mining solutions for building energy systems.

157 citations

Journal Article•DOI•

Unsupervised data analytics in mining big building operational data for energy efficiency enhancement: A review


Cheng Fan¹, Fu Xiao², Zhengdao Li¹, Jiayuan Wang¹•Institutions (2)

Shenzhen University¹, Hong Kong Polytechnic University²

15 Jan 2018-Energy and Buildings

TL;DR: A comprehensive review on the current utilization of unsupervised data analytics in mining massive building operational data is provided, according to their knowledge representations and applications.


157 citations

Journal Article•DOI•

Temporal knowledge discovery in big BAS data for building energy management


Cheng Fan¹, Fu Xiao¹, Henrik Madsen, Dan Wang¹•Institutions (1)

Hong Kong Polytechnic University¹

15 Dec 2015-Energy and Buildings

TL;DR: A time series data mining methodology for temporal knowledge discovery in big BAS data to identify dynamics, patterns and anomalies in building operations, derive temporal association rules within and between subsystems, assess building system performance and spot opportunities in energy conservation.


123 citations

Proceedings Article•DOI•

Strip, bind, and search: a method for identifying abnormal energy consumption in buildings

Romain Fontugne¹, Jorge Ortiz², Nicolas Tremblay³, Pierre Borgnat³, Patrick Flandrin³, Kensuke Fukuda⁴, David E. Culler², Hiroshi Esaki¹•Institutions (4)

University of Tokyo¹, University of California, Berkeley², École normale supérieure de Lyon³, National Institute of Informatics⁴

08 Apr 2013

TL;DR: A new approach called Strip, Bind and Search (SBS) is presented: a method for uncovering abnormal equipment behavior and in-concert usage patterns which, in many cases, reveals misbehavior corresponding to inefficient device usage that leads to energy waste.

Abstract: A typical large building contains thousands of sensors, monitoring the HVAC system, lighting, and other operational sub-systems. With the increased push for operational efficiency, operators are relying more on historical data processing to uncover opportunities for energy-savings. However, they are overwhelmed with the deluge of data and seek more efficient ways to identify potential problems. In this paper, we present a new approach called the Strip, Bind and Search (SBS); a method for uncovering abnormal equipment behavior and in-concert usage patterns. SBS uncovers relationships between devices and constructs a model for their usage pattern relative to other devices. It then flags deviations from the model. We run SBS on a set of building sensor traces; each containing hundred sensors reporting data flows over 18 weeks from two separate buildings with fundamentally different infrastructures. We demonstrate that, in many cases, SBS uncovers misbehavior corresponding to inefficient device usage that leads to energy waste. The average waste uncovered is as high as 2500 kWh per device.

73 citations

Cites background from "Temporal data mining approaches for..."

...State machines can model the operation of HVAC systems [22] and permit to predict or detect the abnormal behavior of HVAC’s components [3]....
[...]

Proceedings Article•

Fine-grained photovoltaic output prediction using a Bayesian ensemble


Prithwish Chakraborty¹, Manish Marwah², Martin Arlitt², Naren Ramakrishnan¹•Institutions (2)

Virginia Tech¹, Hewlett-Packard²

22 Jul 2012

TL;DR: A novel Bayesian ensemble methodology involving three diverse predictors that captures the sequentiality implicit in PV generation and uses motifs mined from historical data to estimate the most likely mixture weights using a stream prediction methodology is described.

Abstract: Local and distributed power generation is increasingly reliant on renewable power sources, e.g., solar (photovoltaic or PV) and wind energy. The integration of such sources into the power grid is challenging, however, due to their variable and intermittent energy output. To effectively use them on a large scale, it is essential to be able to predict power generation at a fine-grained level. We describe a novel Bayesian ensemble methodology involving three diverse predictors. Each predictor estimates mixing coefficients for integrating PV generation output profiles but captures fundamentally different characteristics. Two of them employ classical parameterized (naive Bayes) and non-parametric (nearest neighbor) methods to model the relationship between weather forecasts and PV output. The third predictor captures the sequentiality implicit in PV generation and uses motifs mined from historical data to estimate the most likely mixture weights using a stream prediction methodology. We demonstrate the success and superiority of our methods on real PV data from two locations that exhibit diverse weather conditions. Predictions from our model can be harnessed to optimize scheduling of delay tolerant workloads, e.g., in a data center.

45 citations

Cites background from "Temporal data mining approaches for..."

...Our goal is to predict photovoltaic (PV) power generation from i) historic PV power generation data, and, ii) available weather forecast data....
[...]
...Related Work Comprehensive surveys on time series prediction (Brockwell and Davis 2002; Montgomery, Jennings, and Kulahci 2008) exist that provide overviews of classical methods from ARMA to modeling heteroskedasticity (we implement some of these in this paper for comparison purposes)....
[...]


References


Proceedings Article•DOI•

A symbolic representation of time series, with implications for streaming algorithms


Jessica Lin¹, Eamonn Keogh¹, Stefano Lonardi¹, Bill Chiu¹•Institutions (1)

University of California, Riverside¹

13 Jun 2003

TL;DR: A new symbolic representation of time series is introduced that is unique in that it allows dimensionality/numerosity reduction, and it also allows distance measures to be defined on the symbolic approach that lower bound corresponding distance measures defined on the original series.

Abstract: The parallel explosions of interest in streaming data, and data mining of time series have had surprisingly little intersection. This is in spite of the fact that time series data are typically streaming data. The main reason for this apparent paradox is the fact that the vast majority of work on streaming data explicitly assumes that the data is discrete, whereas the vast majority of time series data is real valued. Many researchers have also considered transforming real valued time series into symbolic representations, noting that such representations would potentially allow researchers to avail of the wealth of data structures and algorithms from the text processing and bioinformatics communities, in addition to allowing formerly "batch-only" problems to be tackled by the streaming community. While many symbolic representations of time series have been introduced over the past decades, they all suffer from three fatal flaws. Firstly, the dimensionality of the symbolic representation is the same as the original data, and virtually all data mining algorithms scale poorly with dimensionality. Secondly, although distance measures can be defined on the symbolic approaches, these distance measures have little correlation with distance measures defined on the original time series. Finally, most of these symbolic approaches require one to have access to all the data, before creating the symbolic representation. This last feature explicitly thwarts efforts to use the representations with streaming algorithms. In this work we introduce a new symbolic representation of time series. Our representation is unique in that it allows dimensionality/numerosity reduction, and it also allows distance measures to be defined on the symbolic approach that lower bound corresponding distance measures defined on the original series. As we shall demonstrate, this latter feature is particularly exciting because it allows one to run certain data mining algorithms on the efficiently manipulated symbolic representation, while producing identical results to the algorithms that operate on the original data. Finally, our representation allows the real valued data to be converted in a streaming fashion, with only an infinitesimal time and space overhead. We will demonstrate the utility of our representation on the classic data mining tasks of clustering, classification, query by content and anomaly detection.

1,922 citations

Journal Article•DOI•

Discovery of Frequent Episodes in Event Sequences


Heikki Mannila¹, Hannu Toivonen¹, A. Inkeri Verkamo¹•Institutions (1)

University of Helsinki¹

31 Jan 1997-Data Mining and Knowledge Discovery

TL;DR: This work gives efficient algorithms for the discovery of all frequent episodes from a given class of episodes and presents detailed experimental results; the methods are in use in telecommunication alarm management.

Abstract: Sequences of events describing the behavior and actions of users or systems can be collected in several domains. An episode is a collection of events that occur relatively close to each other in a given partial order. We consider the problem of discovering frequently occurring episodes in a sequence. Once such episodes are known, one can produce rules for describing or predicting the behavior of the sequence. We give efficient algorithms for the discovery of all frequent episodes from a given class of episodes, and present detailed experimental results. The methods are in use in telecommunication alarm management.
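The notion of an episode as events occurring close together in a given order can be illustrated with a small window-counting sketch. This is a simplified illustration, not the authors' algorithm: it handles only serial (totally ordered) episodes, assumes integer timestamps, treats the frequency as the fraction of sliding windows that contain the episode, and uses hypothetical function names (`occurs_serial`, `serial_episode_frequency`).

```python
def occurs_serial(window_events, episode):
    """Return True if the event types in `episode` appear in order in the window."""
    position = 0
    for _, event_type in window_events:
        if event_type == episode[position]:
            position += 1
            if position == len(episode):
                return True
    return False


def serial_episode_frequency(events, episode, width):
    """Fraction of sliding windows [t, t + width) that contain the serial episode.

    `events` is a list of (integer_time, event_type) pairs. Windows start at
    every integer time that lets the window overlap the sequence (an assumed,
    simplified boundary rule).
    """
    if not events or not episode:
        return 0.0
    events = sorted(events)
    first, last = events[0][0], events[-1][0]
    starts = range(first - width + 1, last + 1)
    hits = 0
    for start in starts:
        window = [e for e in events if start <= e[0] < start + width]
        if occurs_serial(window, episode):
            hits += 1
    return hits / len(starts)


# Toy alarm sequence: is "A followed by B" frequent within 3 time units?
alarms = [(1, "A"), (2, "B"), (3, "A"), (5, "B")]
print(serial_episode_frequency(alarms, ("A", "B"), width=3))
```

An episode would be reported as frequent when this fraction meets a user-chosen threshold; the brute-force rescan of every window is chosen here for clarity, not efficiency.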


1,593 citations

"Temporal data mining approaches for..." refers background in this paper

...A contrasting framework, referred to as frequent episode discovery, is an event-based framework that is most applicable to symbolic data that is not uniformly sampled [Laxman et al. 2005, 2008; Mannila et al. 1997; Patnaik et al. 2008]....
[...]

Journal Article•DOI•

Experiencing SAX: a novel symbolic representation of time series


Jessica Lin¹, Eamonn Keogh², Li Wei², Stefano Lonardi²•Institutions (2)

George Mason University¹, University of California, Riverside²

01 Oct 2007-Data Mining and Knowledge Discovery

TL;DR: This work demonstrates the utility of a new symbolic representation of time series that allows dimensionality/numerosity reduction and also allows distance measures to be defined on the symbolic approach that lower bound corresponding distance measures defined on the original series.

Abstract: Many high level representations of time series have been proposed for data mining, including Fourier transforms, wavelets, eigenwaves, piecewise polynomial models, etc. Many researchers have also considered symbolic representations of time series, noting that such representations would potentially allow researchers to avail of the wealth of data structures and algorithms from the text processing and bioinformatics communities. While many symbolic representations of time series have been introduced over the past decades, they all suffer from two fatal flaws. First, the dimensionality of the symbolic representation is the same as the original data, and virtually all data mining algorithms scale poorly with dimensionality. Second, although distance measures can be defined on the symbolic approaches, these distance measures have little correlation with distance measures defined on the original time series. In this work we formulate a new symbolic representation of time series. Our representation is unique in that it allows dimensionality/numerosity reduction, and it also allows distance measures to be defined on the symbolic approach that lower bound corresponding distance measures defined on the original series. As we shall demonstrate, this latter feature is particularly exciting because it allows one to run certain data mining algorithms on the efficiently manipulated symbolic representation, while producing identical results to the algorithms that operate on the original data. In particular, we will demonstrate the utility of our representation on various data mining tasks of clustering, classification, query by content, anomaly detection, motif discovery, and visualization.

1,452 citations

"Temporal data mining approaches for..." refers background in this paper

...Experiencing SAX: A novel symbolic representation of time series....
[...]
...SAX [Lin et al. 2007] performs a piece-wise aggregate approximation (the aggregate refers to the notion of modeling the given single time series by a linear combination of multiple time-series, each expressed as a box basis function) and symbolize the resulting representation so that techniques from discrete algorithms can be adapted toward querying, matching, and mining the time series....
[...]
...As the work closest to ours, we explicitly focus on the SAX representation, which also provides some significant advantages for mining motifs....
[...]
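The two steps named in the excerpts above, piecewise aggregate approximation followed by symbolization, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the reference SAX implementation: the series length is assumed to be divisible by the number of segments, the series is z-normalized first, and symbol breakpoints are taken as equiprobable quantiles of the standard normal distribution. The function names `paa` and `sax` are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def paa(series, n_segments):
    """Piecewise aggregate approximation: the mean of each equal-width segment."""
    n = len(series)
    if n % n_segments != 0:
        # Assumption of this sketch; real implementations handle remainders.
        raise ValueError("len(series) must be divisible by n_segments")
    width = n // n_segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(n_segments)]


def sax(series, n_segments, alphabet="abcd"):
    """Z-normalize, reduce with PAA, then map each segment mean to a symbol."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        normalized = [0.0] * len(series)
    else:
        normalized = [(x - mu) / sigma for x in series]
    k = len(alphabet)
    # Breakpoints split the standard normal into k equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    word = []
    for value in paa(normalized, n_segments):
        # The symbol index is the number of breakpoints the value exceeds.
        index = sum(value > b for b in breakpoints)
        word.append(alphabet[index])
    return "".join(word)


# Eight readings reduced to a four-symbol word.
print(sax([1, 1, 2, 2, 8, 8, 9, 9], n_segments=4))  # prints "aadd"
```

Once a series is a short string, discrete techniques such as substring matching can be applied to it, which is the property the excerpts point to for motif mining.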

Journal Article•DOI•

Querying and mining of time series data: experimental comparison of representations and distance measures

Hui Ding¹, Goce Trajcevski¹, Peter Scheuermann¹, Xiaoyue Wang², Eamonn Keogh²•Institutions (2)

Northwestern University¹, University of California, Riverside²

01 Aug 2008

TL;DR: An extensive set of time series experiments are conducted re-implementing 8 different representation methods and 9 similarity measures and their variants and testing their effectiveness on 38 time series data sets from a wide variety of application domains to provide a unified validation of some of the existing achievements.

Abstract: The last decade has witnessed a tremendous growth of interest in applications that deal with querying and mining of time series data. Numerous representation methods for dimensionality reduction and similarity measures geared towards time series have been introduced. Each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive set of time series experiments re-implementing 8 different representation methods and 9 similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this paper, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. Our experiments have provided both a unified validation of some of the existing achievements, and in some cases, suggested that certain claims in the literature may be unduly optimistic.

1,387 citations

Proceedings Article•DOI•

Diagnosing network-wide traffic anomalies


Anukool Lakhina¹, Mark Crovella¹, Christophe Diot²•Institutions (2)

Boston University¹, Intel²

30 Aug 2004

TL;DR: A general method based on a separation of the high-dimensional space occupied by a set of network traffic measurements into disjoint subspaces corresponding to normal and anomalous network conditions to diagnose anomalies is proposed.


Abstract: Anomalies are unusual and significant changes in a network's traffic levels, which can often span multiple links. Diagnosing anomalies is critical for both network operators and end users. It is a difficult problem because one must extract and interpret anomalous patterns from large amounts of high-dimensional, noisy data. In this paper we propose a general method to diagnose anomalies. This method is based on a separation of the high-dimensional space occupied by a set of network traffic measurements into disjoint subspaces corresponding to normal and anomalous network conditions. We show that this separation can be performed effectively by Principal Component Analysis. Using only simple traffic measurements from links, we study volume anomalies and show that the method can: (1) accurately detect when a volume anomaly is occurring; (2) correctly identify the underlying origin-destination (OD) flow which is the source of the anomaly; and (3) accurately estimate the amount of traffic involved in the anomalous OD flow. We evaluate the method's ability to diagnose (i.e., detect, identify, and quantify) both existing and synthetically injected volume anomalies in real traffic from two backbone networks. Our method consistently diagnoses the largest volume anomalies, and does so with a very low false alarm rate.
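The subspace separation described in the abstract can be sketched in a few lines. This is a deliberately reduced illustration, not the paper's procedure: it assumes the normal subspace is spanned by a single principal axis (found here by power iteration on the centered measurements), scores each time step by the squared norm of its residual outside that axis, and leaves the detection threshold to the caller. All function names are hypothetical, and the starting vector for power iteration is assumed not to be orthogonal to the leading axis.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so every link has zero-mean traffic."""
    n, m = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in rows]


def top_component(rows, iterations=200):
    """Leading principal axis of centered rows, found by power iteration."""
    m = len(rows[0])
    vector = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        # Apply X^T X to the vector without forming the covariance matrix.
        scores = [sum(r[j] * vector[j] for j in range(m)) for r in rows]
        nxt = [sum(scores[i] * rows[i][j] for i in range(len(rows)))
               for j in range(m)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break
        vector = [v / norm for v in nxt]
    return vector


def squared_prediction_errors(rows):
    """Squared norm of each row's residual outside the one-axis normal subspace."""
    centered = center_columns(rows)
    axis = top_component(centered)
    errors = []
    for row in centered:
        score = sum(row[j] * axis[j] for j in range(len(axis)))
        residual = [row[j] - score * axis[j] for j in range(len(axis))]
        errors.append(sum(v * v for v in residual))
    return errors


# Three links whose traffic normally scales together; the last row has an
# extra burst on the second link only.
traffic = [[s, 2.0 * s, 3.0 * s] for s in (5, 15, 8, 12, 20, 2, 18, 10)]
traffic.append([10.0, 26.0, 30.0])
errors = squared_prediction_errors(traffic)
print(errors.index(max(errors)))  # index of the most anomalous time step
```

A large residual means the measurement is poorly explained by the dominant traffic pattern. Note that a sufficiently large anomaly can itself distort the fitted axis, so this toy example keeps the burst small relative to normal variation.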


1,157 citations