Journal ArticleDOI

Temporal data mining approaches for sustainable chiller management in data centers

TL;DR: Demonstrates three key ingredients of CAMAS (motif mining, association analysis, and dynamic Bayesian network inference) that help bridge the gap between low-level, raw sensor streams and the high-level operating regions and features an operator needs to manage the data center efficiently.
Abstract: Practically every large IT organization hosts data centers (a mix of computing elements, storage systems, networking, power, and cooling infrastructure), operated either in-house or outsourced to major vendors. A significant element of modern data centers is their cooling infrastructure, whose efficient and sustainable operation is a key ingredient of the “always-on” capability of data centers. We describe the design and implementation of CAMAS (Chiller Advisory and MAnagement System), a temporal data mining solution to mine and manage chiller installations. CAMAS embodies a set of algorithms for processing multivariate time-series data and characterizes sustainability measures of the patterns mined. We demonstrate three key ingredients of CAMAS (motif mining, association analysis, and dynamic Bayesian network inference) that help bridge the gap between low-level, raw sensor streams and the high-level operating regions and features needed for an operator to efficiently manage the data center. The effectiveness of CAMAS is demonstrated by its application to a real-life production data center managed by HP.
Citations
Journal ArticleDOI
Yang Zhao, Chaobo Zhang, Yiwen Zhang, Zihao Wang, Junyang Li
01 Apr 2020
TL;DR: A comprehensive literature review of the applications of data mining technologies in this domain, with suggestions for future research towards effective and efficient data mining solutions for building energy systems.
Abstract: With the advent of the era of big data, buildings have become not only energy-intensive but also data-intensive. Data mining technologies have been widely used to unlock the value of massive amounts of building operation data, with the aim of improving the operation performance of building energy systems. This paper presents a comprehensive literature review of the applications of data mining technologies in this domain. In general, data mining technologies can be classified into two categories: supervised and unsupervised. In this field, supervised data mining technologies are usually used for building energy load prediction and fault detection/diagnosis, while unsupervised data mining technologies are usually used for building operation pattern identification and fault detection/diagnosis. The strengths and shortcomings of data mining-based methods are discussed comprehensively. Based on this review, suggestions for future research are proposed towards effective and efficient data mining solutions for building energy systems.

157 citations

Journal ArticleDOI
TL;DR: A comprehensive review on the current utilization of unsupervised data analytics in mining massive building operational data is provided, according to their knowledge representations and applications.

157 citations

Journal ArticleDOI
TL;DR: A time series data mining methodology for temporal knowledge discovery in big BAS data to identify dynamics, patterns and anomalies in building operations, derive temporal association rules within and between subsystems, assess building system performance and spot opportunities in energy conservation.

123 citations

Proceedings ArticleDOI
08 Apr 2013
TL;DR: A new approach called Strip, Bind and Search (SBS) is presented: a method for uncovering abnormal equipment behavior and in-concert usage patterns, which reveals misbehavior corresponding to inefficient device usage that leads to energy waste.
Abstract: A typical large building contains thousands of sensors, monitoring the HVAC system, lighting, and other operational sub-systems. With the increased push for operational efficiency, operators are relying more on historical data processing to uncover opportunities for energy savings. However, they are overwhelmed by the deluge of data and seek more efficient ways to identify potential problems. In this paper, we present a new approach called Strip, Bind and Search (SBS), a method for uncovering abnormal equipment behavior and in-concert usage patterns. SBS uncovers relationships between devices and constructs a model for their usage pattern relative to other devices. It then flags deviations from the model. We run SBS on a set of building sensor traces, each containing a hundred sensors reporting data flows over 18 weeks, from two separate buildings with fundamentally different infrastructures. We demonstrate that, in many cases, SBS uncovers misbehavior corresponding to inefficient device usage that leads to energy waste. The average waste uncovered is as high as 2500 kWh per device.

73 citations


Cites background from "Temporal data mining approaches for..."

  • ...State machines can model the operation of HVAC systems [22] and permit to predict or detect the abnormal behavior of HVAC’s components [3]....

    [...]

Proceedings Article
22 Jul 2012
TL;DR: A novel Bayesian ensemble methodology involving three diverse predictors that captures the sequentiality implicit in PV generation and uses motifs mined from historical data to estimate the most likely mixture weights using a stream prediction methodology is described.
Abstract: Local and distributed power generation is increasingly reliant on renewable power sources, e.g., solar (photovoltaic or PV) and wind energy. The integration of such sources into the power grid is challenging, however, due to their variable and intermittent energy output. To effectively use them on a large scale, it is essential to be able to predict power generation at a fine-grained level. We describe a novel Bayesian ensemble methodology involving three diverse predictors. Each predictor estimates mixing coefficients for integrating PV generation output profiles but captures fundamentally different characteristics. Two of them employ classical parameterized (naive Bayes) and non-parametric (nearest neighbor) methods to model the relationship between weather forecasts and PV output. The third predictor captures the sequentiality implicit in PV generation and uses motifs mined from historical data to estimate the most likely mixture weights using a stream prediction methodology. We demonstrate the success and superiority of our methods on real PV data from two locations that exhibit diverse weather conditions. Predictions from our model can be harnessed to optimize scheduling of delay-tolerant workloads, e.g., in a data center.

45 citations


Cites background from "Temporal data mining approaches for..."

  • ...Our goal is to predict photovoltaic (PV) power generation from i) historic PV power generation data, and, ii) available weather forecast data....

    [...]

  • ...Related Work Comprehensive surveys on time series prediction (Brockwell and Davis 2002; Montgomery, Jennings, and Kulahci 2008) exist that provide overviews of classical methods from ARMA to modeling heteroskedasticity (we implement some of these in this paper for comparison purposes)....

    [...]

References
Proceedings ArticleDOI
13 Jun 2003
TL;DR: A new symbolic representation of time series is introduced that is unique in that it allows dimensionality/numerosity reduction, and it also allows distance measures to be defined on the symbolic approach that lower bound corresponding distance measures defined on the original series.
Abstract: The parallel explosions of interest in streaming data, and data mining of time series have had surprisingly little intersection. This is in spite of the fact that time series data are typically streaming data. The main reason for this apparent paradox is the fact that the vast majority of work on streaming data explicitly assumes that the data is discrete, whereas the vast majority of time series data is real valued. Many researchers have also considered transforming real valued time series into symbolic representations, noting that such representations would potentially allow researchers to avail of the wealth of data structures and algorithms from the text processing and bioinformatics communities, in addition to allowing formerly "batch-only" problems to be tackled by the streaming community. While many symbolic representations of time series have been introduced over the past decades, they all suffer from three fatal flaws. Firstly, the dimensionality of the symbolic representation is the same as the original data, and virtually all data mining algorithms scale poorly with dimensionality. Secondly, although distance measures can be defined on the symbolic approaches, these distance measures have little correlation with distance measures defined on the original time series. Finally, most of these symbolic approaches require one to have access to all the data, before creating the symbolic representation. This last feature explicitly thwarts efforts to use the representations with streaming algorithms. In this work we introduce a new symbolic representation of time series. Our representation is unique in that it allows dimensionality/numerosity reduction, and it also allows distance measures to be defined on the symbolic approach that lower bound corresponding distance measures defined on the original series. As we shall demonstrate, this latter feature is particularly exciting because it allows one to run certain data mining algorithms on the efficiently manipulated symbolic representation, while producing identical results to the algorithms that operate on the original data. Finally, our representation allows the real valued data to be converted in a streaming fashion, with only an infinitesimal time and space overhead. We will demonstrate the utility of our representation on the classic data mining tasks of clustering, classification, query by content and anomaly detection.

1,922 citations

Journal ArticleDOI
TL;DR: This work gives efficient algorithms for the discovery of all frequent episodes from a given class of episodes and presents detailed experimental results; the methods are in use in telecommunication alarm management.
Abstract: Sequences of events describing the behavior and actions of users or systems can be collected in several domains. An episode is a collection of events that occur relatively close to each other in a given partial order. We consider the problem of discovering frequently occurring episodes in a sequence. Once such episodes are known, one can produce rules for describing or predicting the behavior of the sequence. We give efficient algorithms for the discovery of all frequent episodes from a given class of episodes, and present detailed experimental results. The methods are in use in telecommunication alarm management.

1,593 citations


"Temporal data mining approaches for..." refers background in this paper

  • ...A contrasting framework, referred to as frequent episode discovery, is an event-based framework that is most applicable to symbolic data that is not uniformly sampled [Laxman et al. 2005, 2008; Mannila et al. 1997; Patnaik et al. 2008]....

    [...]

Journal ArticleDOI
TL;DR: The utility of a new symbolic representation of time series is demonstrated; the representation allows dimensionality/numerosity reduction, and it also allows distance measures to be defined on the symbolic approach that lower bound corresponding distance measures defined on the original series.
Abstract: Many high level representations of time series have been proposed for data mining, including Fourier transforms, wavelets, eigenwaves, piecewise polynomial models, etc. Many researchers have also considered symbolic representations of time series, noting that such representations would potentially allow researchers to avail of the wealth of data structures and algorithms from the text processing and bioinformatics communities. While many symbolic representations of time series have been introduced over the past decades, they all suffer from two fatal flaws. First, the dimensionality of the symbolic representation is the same as the original data, and virtually all data mining algorithms scale poorly with dimensionality. Second, although distance measures can be defined on the symbolic approaches, these distance measures have little correlation with distance measures defined on the original time series. In this work we formulate a new symbolic representation of time series. Our representation is unique in that it allows dimensionality/numerosity reduction, and it also allows distance measures to be defined on the symbolic approach that lower bound corresponding distance measures defined on the original series. As we shall demonstrate, this latter feature is particularly exciting because it allows one to run certain data mining algorithms on the efficiently manipulated symbolic representation, while producing identical results to the algorithms that operate on the original data. In particular, we will demonstrate the utility of our representation on various data mining tasks of clustering, classification, query by content, anomaly detection, motif discovery, and visualization.

1,452 citations


"Temporal data mining approaches for..." refers background in this paper

  • ...Experiencing SAX: A novel symbolic representation of time series....

    [...]

  • ...SAX [Lin et al. 2007] performs a piece-wise aggregate approximation (the aggregate refers to the notion of modeling the given single time series by a linear combination of multiple time-series, each expressed as a box basis function) and symbolize the resulting representation so that techniques from discrete algorithms can be adapted toward querying, matching, and mining the time series....

    [...]

  • ...As the work closest to ours, we explicitly focus on the SAX representation, which also provides some significant advantages for mining motifs....

    [...]
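
The piecewise aggregate approximation and symbolization steps described in the excerpts above can be illustrated with a minimal sketch. It assumes the commonly described SAX recipe: z-normalize the series, average it over equal-width frames, then map each frame mean to a letter using breakpoints that split a standard normal distribution into equiprobable regions. The function name `sax`, its parameters, and the default four-letter alphabet are illustrative choices, not taken from the cited papers or from CAMAS.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX-style word (hypothetical helper)."""
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")

    # Step 1: z-normalize so the Gaussian breakpoints below are meaningful.
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]

    # Step 2: piecewise aggregate approximation (PAA), i.e. the mean of
    # each of n_segments contiguous, (nearly) equal-width frames.
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    paa = [mean(z[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]

    # Step 3: symbolize. Breakpoints cut N(0, 1) into len(alphabet)
    # equiprobable regions; each frame mean gets its region's letter.
    k = len(alphabet)
    cuts = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    return "".join(alphabet[bisect_right(cuts, v)] for v in paa)


print(sax(list(range(8)), 4))  # -> abcd
```

Here an eight-point ramp is reduced to a four-letter word, which shows both the reduction in length and the move to a discrete alphabet on which string-based matching and motif search can operate.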

Journal ArticleDOI
01 Aug 2008
TL;DR: An extensive set of time series experiments are conducted re-implementing 8 different representation methods and 9 similarity measures and their variants and testing their effectiveness on 38 time series data sets from a wide variety of application domains to provide a unified validation of some of the existing achievements.
Abstract: The last decade has witnessed tremendous growth of interest in applications that deal with querying and mining of time series data. Numerous representation methods for dimensionality reduction and similarity measures geared towards time series have been introduced. Each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive set of time series experiments re-implementing 8 different representation methods and 9 similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this paper, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. Our experiments have provided both a unified validation of some of the existing achievements, and in some cases, suggested that certain claims in the literature may be unduly optimistic.

1,387 citations

Proceedings ArticleDOI
30 Aug 2004
TL;DR: A general method based on a separation of the high-dimensional space occupied by a set of network traffic measurements into disjoint subspaces corresponding to normal and anomalous network conditions to diagnose anomalies is proposed.
Abstract: Anomalies are unusual and significant changes in a network's traffic levels, which can often span multiple links. Diagnosing anomalies is critical for both network operators and end users. It is a difficult problem because one must extract and interpret anomalous patterns from large amounts of high-dimensional, noisy data. In this paper we propose a general method to diagnose anomalies. This method is based on a separation of the high-dimensional space occupied by a set of network traffic measurements into disjoint subspaces corresponding to normal and anomalous network conditions. We show that this separation can be performed effectively by Principal Component Analysis. Using only simple traffic measurements from links, we study volume anomalies and show that the method can: (1) accurately detect when a volume anomaly is occurring; (2) correctly identify the underlying origin-destination (OD) flow which is the source of the anomaly; and (3) accurately estimate the amount of traffic involved in the anomalous OD flow. We evaluate the method's ability to diagnose (i.e., detect, identify, and quantify) both existing and synthetically injected volume anomalies in real traffic from two backbone networks. Our method consistently diagnoses the largest volume anomalies, and does so with a very low false alarm rate.

1,157 citations