What contributions have the authors mentioned in the paper "Data science for building energy management: a review"?

This paper reviews how Data Science has been applied to address the most difficult problems faced by practitioners in the field of Energy Management, especially in the building sector.

What future work have the authors mentioned in the paper "Data science for building energy management: a review"?

In the near future, Big Data techniques will expand these possibilities and democratize them. This will enhance energy awareness, since users will have access to more data and be able to understand their own energy consumption habits.

What are the common techniques used for descriptive reports of energy loads?

Fuzzy rules (which have been widely used for HVAC control) can also be used for descriptive reports of energy loads, since they offer a robust representation in the context of high imprecision and uncertainty.

What are the techniques traditionally used for this task?

The techniques traditionally used for this task are classification, clustering, and pattern analysis (mostly by means of association rules).

What are the main technologies that are expected to have a significant impact on Energy Efficiency and Management?

Apart from Big Data, other technologies that are expected to have a significant impact on Energy Efficiency and Management include Smart metering, the Internet of Things and Cloud computing.

What was the first study to use the ISPC algorithm?

The ISPC algorithm (Incremental Summarization and Pattern Characterization) was used by De Silva et al. [52] to structure stream data into a data warehouse based on key dimensions for enabling a rapid interim summarization.

What other techniques have been used to extract predicted operation rules?

Xiao and Fan [35] used cluster analysis to identify daily power consumption patterns, whereas Morbitzer et al. [36] applied clustering to analyze simulation results for performance predictions in order to extract predicted operation rules.

What is the main advantage of cloud computing?

Cloud computing enables continuous and transparent updates and improvements, which are readily available to customers.

What are some examples of methods that could be applied to Building Energy Management?

Examples include methods with more accurate results, methods capable of handling temporal data or data streams, etc., which could feasibly be applied to Building Energy Management.

What are the main advantages of using classification models?

In addition, classification models are effective tools that can be used to predict building user comfort under different environmental conditions [28].

Why are some industries reluctant to embrace cloud computing?

Because of security constraints and privacy concerns, some industries are still reluctant to embrace cloud computing and cloud technologies in general.

What are the common techniques used for predicting energy loads?

Techniques such as association rules, in all their variants, are certainly underrepresented when modelling and predicting energy loads.

What were the main techniques used by Jiang et al.?

Classification techniques were also used by Jiang et al. [64], who created a new automatic feature analysis method using wavelet techniques and combining multiple classifiers to identify fraud in electricity distribution networks.

What is the main reason why companies are reluctant to embrace cloud computing?

For most companies, cloud computing seems a plausible choice since they can avoid scalability problems, and reduce deployment costs and time.

What is the popular method of detecting fraud in electricity companies?

Filho et al. [63] described a method to fight fraud in electricity companies. It uses a classification algorithm based on decision trees to pre-select potentially fraudulent customers, who then undergo on-site inspection to identify fraud or faulty measurement equipment.

How can the authors detect faults in buildings?

By continuously monitoring the building, it is possible to detect when a fault has happened (typically an anomalous event) and how it affects other equipment (by means of correlation analysis).

What other techniques were used to assist decision-making and optimize building design?

The same authors also applied classification and regression techniques coupled with building indoor daylight methods to assist decision-making and optimize building design [38].

(Open Access) Data science for building energy management: a review (2017) | Miguel Molina-Solana

Data Science for Building Energy Management: a review

Miguel Molina-Solana (a,b), María Ros (a,∗), M. Dolores Ruiz, Juan Gómez-Romero, M.J. Martin-Bautista

Department of Computer Science and Artificial Intelligence, Universidad de Granada

Data Science Institute, Imperial College London

Abstract

The energy consumption of residential and commercial buildings has risen steadily in recent years, an increase largely due to their HVAC systems. Expected energy loads, transportation, and storage as well as user behavior influence the quantity and quality of the energy consumed daily in buildings. However, technology is now available that can accurately monitor, collect, and store the huge amount of data involved in this process. Furthermore, this technology is capable of analyzing and exploiting such data in meaningful ways. Not surprisingly, the use of data science techniques to increase energy efficiency is currently attracting a great deal of attention and interest. This paper reviews how Data Science has been applied to address the most difficult problems faced by practitioners in the field of Energy Management, especially in the building sector. The work also discusses the challenges and opportunities that will arise with the advent of fully connected devices and new computational technologies.

1. Introduction

There is a general consensus in the world today that human activities are having a negative impact on the environment and have accelerated both global warming and climate change. These environmental threats have been intensified by the emissions produced by the energy required for the lighting and HVAC (heating, ventilation and air-conditioning) systems in buildings. According to the International Energy Agency (IEA), residential and commercial buildings are responsible for up to 32% of the total final energy consumption. In fact, in most IEA countries, they account for approximately 40% of the primary energy consumption. Similar statistics are given by the World Business Council for Sustainable Development (WBCSD) within the framework of its Energy Efficiency in Buildings (EEB) project. Also provided is a comprehensive review [1] of the state of the art in building energy use (with a primary focus on energy demand).

These data indicate that inefficient energy management in aging buildings combined with rising construction activity in developed countries will cause energy consumption to soar in the near future and heighten the negative impacts associated with this consumption. Moreover, variable energy costs call for the implementation of more intelligent strategies to adapt and reduce energy consumption as well as to find alternative and sustainable energy sources. The relevance of these issues is clearly reflected in the research priorities of the European Union, as stated in its Horizon 2020 Societal Challenge “Secure, Clean and Efficient Energy”. This work program targets a significant reduction in energy consumption by 2020 in the transportation and building sectors, both of which have great potential for energy savings.

Increasing energy efficiency is a two-fold process. Not only does it involve the use of affordable energy sources, but also the improvement of current energy management procedures and infrastructures. The latter includes the optimization of energy generation and transportation based on user demand [2], one of the most important issues for energy companies. In this regard, computer-aided approaches have recently come into the spotlight. More specifically, increased data awareness in companies has led to the development of solutions based on Data Mining, a research area that studies how to automatically discover non-trivial knowledge from data, and Data Science, which encompasses a wide range of techniques and more complex datasets.

∗ Corresponding author

Email addresses: miguelmolina@imperial.ac.uk (Miguel Molina-Solana), marosiz@decsai.ugr.es (María Ros), mdruiz@decsai.ugr.es (M. Dolores Ruiz), jgomez@decsai.ugr.es (Juan Gómez-Romero), mbautis@decsai.ugr.es (M.J. Martin-Bautista)

http://www.wbcsd.org/web/eeb.htm

Preprint submitted to Renewable & Sustainable Energy Reviews June 25, 2017

In the area of building energy management, Data Science is now used to address problems such as the following: (i) the prediction of energy demand in order to adapt production and distribution; (ii) the analysis of building operations as well as of equipment status and failures to optimize operation and maintenance costs; (iii) the detection of energy consumption patterns to create customized commercial offers and to detect fraud. This requires collecting data pertaining to building operation and user behavior. These data must also be interpreted to implement adapted energy management policies. The information collected may come from very heterogeneous sources, ranging from on-site sensors (located in the equipment and in the immediate environment) to external parameters (e.g. weather, energy costs, etc.). These advances have also signified a shift in the perception of who owns these data and who benefits from them [3]. Customers are increasingly aware of the importance of their actions and the value of the data that they generate. In this sense, they have become actors with a key role in the energy efficiency landscape.

This paper reviews different data science techniques and explains how they have been employed to deal with the difficult challenges faced by building energy management. As reflected in recent literature on the topic, classification and clustering methods are frequently used for this purpose, but there is still room for improvement in relatively underexplored areas, such as frequent and temporal pattern discovery for load prediction. Also discussed are future trends in Data Science, which will lead to new methods and tools capable of the more intelligent processing of large amounts of data collected from multiple distributed devices. Although there are other reviews on automatic techniques for building efficiency assessment [4, 5], and on classification methods for load and energy consumption prediction [6], this work examines and discusses a broader set of data science techniques, and their applications to the different aspects of building energy management.

The paper is structured as follows. After an introduction to data science techniques (Section 2), Section 3 summarizes recent work in Energy Data Science and situates it in the context of the current requirements and needs of building energy managers. Section 4 discusses the data science techniques employed in various fields related to building energy management. Finally, Section 5 provides an overview of new approaches that are expected to lead to research advances, and concludes with recommendations and guidelines for the future.

2. Data Science

Over the years, technological tools have benefited a wide range of domains, and Energy Efficiency and Management is no exception. Developments in various areas of Information and Communications Technology (ICT), such as Control and Automation, Smart Metering, Real-time Monitoring, and Data Science, have had a tremendous impact on this field. As is well known, Data Science builds systems and algorithms to discover knowledge, detect patterns, and generate useful insights and predictions from large-scale data. It encompasses the whole data analysis process, which begins with data extraction and cleaning, and extends to data analysis, description and summarization. The result is the prediction of new values and their visualization. Data Science thus involves mathematical and statistical analysis, combined with information technology tools.

However, deriving insights from data is not only achieved by using such techniques. The expert must also manage and interpret the data in order to obtain valuable knowledge. As shown in Figure 1, the process starts with the collection of raw data. After that, it is necessary to clean the data, and select the subset that has the relevant information. For that purpose, the expert applies filters to the data or formulates queries that will eliminate irrelevant information. This is also the step at which additional sources of information might be integrated and fused with the original data to provide further knowledge. Once the data are prepared for use, an exploratory analysis (including visualization tools) can help decide which methods or algorithms are most effective to obtain the desired knowledge. The final process will lead to a set of results that guide the decision-making, which again, might rely on visualisation.

Figure 1: Data science process. (Boxes in the diagram: Data Processing; Collection of Raw Data; Data Cleaning; Data Pre-Processing; Data Filtering; Data Querying; Data Selection; Data Aggregation; Exploratory Analysis & Visualization of Data; Models & Algorithms; Results: Data description & prediction; Visualization of Results; Reports; Decision Making; Revision.)

Based on the preliminary outcomes, the whole process might need to be tuned to obtain better results. This could entail setting new parameter values or adding/discarding new sets of data. Since such decisions cannot be made automatically, the participation of the expert in the analysis of the results is a crucial factor.
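The cleaning, filtering and aggregation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the readings, the rule that negative values are invalid, and the function names are all hypothetical and do not come from the paper.

```python
from statistics import mean

# Hypothetical raw meter readings: (hour of day, kWh). None marks a failed reading.
raw = [(0, 1.2), (1, None), (2, 1.1), (9, 4.8), (10, 5.1), (11, -3.0), (12, 5.0)]

def clean(readings):
    """Data cleaning: drop missing values and (assumed invalid) negative values."""
    return [(h, v) for h, v in readings if v is not None and v >= 0]

def select_hours(readings, start, end):
    """Data filtering: keep readings whose hour satisfies start <= hour < end."""
    return [(h, v) for h, v in readings if start <= h < end]

def aggregate(readings):
    """Data aggregation: mean consumption over the selected readings."""
    return mean(v for _, v in readings)

cleaned = clean(raw)
working_hours = select_hours(cleaned, 9, 18)
average_load = aggregate(working_hours)
print(len(cleaned), round(average_load, 3))
```

In practice the expert would revisit the cleaning rule and the selected time window after inspecting the results, which is the revision loop of Figure 1.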

From a more technical perspective, Data Science comprises a set of techniques and tools which pursue different goals and depart from different situations. Some of the most popular techniques are classification, clustering, regression and association rule mining. Although these techniques have been the most frequently applied in Energy Efficiency and Management, others, which are not so well known (e.g. sequence analysis and anomaly detection), are also useful in providing solutions for building energy problems.

Classification. When classifying a set of objects, the objective is to predict the class of each one on the basis of their attributes. Decision trees (i.e. a kind of flowchart for the classification of new data) are a common way of performing and visualizing that classification [7]. Decision trees can be generated by many different algorithms, though the most well known are CLS, ID3, C4.5, C5.0, and CART. Random Forest is another classification technique that constructs a set of decision trees and then predicts the class by aggregating the values obtained with each tree (e.g. by using the mode or mean). This method corrects overfitting (when the models from the learning algorithm perform very well on the training set, at the cost of an increased error on the validation set), a common practical difficulty in decision trees. Support Vector Machine (SVM) [8] is a technique that is also used for classification. SVMs perform classification tasks by constructing a hyperplane (or a set of hyperplanes) in a multidimensional space to separate the data (regarded as points in the space) into classes. Once the hyperplane is constructed, it classifies new examples according to the previously specified decision boundaries.

Bayesian classification, genetic algorithms, and neural networks have also been employed in classification tasks. Various approaches use probabilistic classifiers based on Bayes’ theorem, although these make strong independence assumptions between the variables involved [9]. Class prediction with genetic programming algorithms [10] is based on chromosome-like structures that can be combined and/or mutated with other chromosomes to create new individuals. Neural Networks (NNs) are able to predict new observations from existing ones by means of interconnected elements called neurons [11]. The main advantage of NNs is that they are robust and tolerant of errors. A self-organizing map (SOM) is a type of artificial neural network that is trained by unsupervised learning to produce low-dimensional views of high-dimensional data. Another well-known classification method is that of k-Nearest Neighbors, which classifies an object by the majority vote of its k neighbors. In other words, an object is assigned to a category based on the category of its k nearest neighbors [12].
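As an illustration of the majority-vote idea behind k-Nearest Neighbors, the following minimal sketch classifies a new observation from a handful of labelled examples. The features (outdoor temperature, occupancy), the class labels and the function name `knn_classify` are hypothetical choices made for this example, not data from the reviewed studies.

```python
from collections import Counter
from math import dist

# Hypothetical training data: (outdoor temperature in degrees C, occupants) -> load class.
training = [
    ((5.0, 10), "high"), ((7.0, 12), "high"), ((6.0, 9), "high"),
    ((20.0, 3), "low"), ((22.0, 2), "low"), ((19.0, 4), "low"),
]

def knn_classify(point, examples, k=3):
    """Assign the class that wins the majority vote among the k nearest examples."""
    nearest = sorted(examples, key=lambda ex: dist(point, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((6.5, 11), training))
print(knn_classify((21.0, 3), training))
```

Euclidean distance on unscaled features is used here for brevity; with real data, features measured in different units would normally be normalized first.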

Regression. The main objective of regression analysis is to numerically estimate the relationship between variables. This involves ascertaining whether variables are independent. When they are not, it is then necessary to discover the type of dependence of their relation [13]. Regression analysis is widely used in prediction and forecasting, as well as to understand how the value of a dependent variable changes when one independent variable is varied while the others remain fixed. Linear and non-linear (polynomial, logistic, etc.) regression methods are mainly used for this purpose. In linear regression, the model assumes that the dependent variable is a linear combination of the parameters. Examples of linear regression methods are linear least squares, Bayesian linear regression, and generalized linear models (GLM). Nevertheless, linear models often do not provide a good fit to reality, and then non-linear models are required. In this case, classification-based techniques, such as support vector regression or k-Nearest Neighbors, can also be used for regression. In particular, ARMA (Autoregressive Moving Average) or ARIMA (Autoregressive Integrated Moving Average) are capable of predicting the future values of time series, based on past values. The relationship between variables can also be statistically measured by means of the standard deviation, Pearson correlation, and other correlation coefficients.
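To make linear least squares and the Pearson correlation concrete, the sketch below fits a straight line to a small synthetic dataset in which load falls linearly with outdoor temperature. The data and function names are assumptions made purely for illustration.

```python
from math import sqrt

def linear_least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope*x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Synthetic data: heating load = 50 - 2 * temperature (no noise).
temps = [0.0, 5.0, 10.0, 15.0, 20.0]
loads = [50.0, 40.0, 30.0, 20.0, 10.0]

slope, intercept = linear_least_squares(temps, loads)
r = pearson(temps, loads)
print(slope, intercept, r)
```

Because the synthetic data are perfectly linear, the fit recovers the generating line and the correlation is -1; real load data would give a weaker correlation and non-zero residuals.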

Clustering. Clustering is the separation of objects into groups (clusters) based on their degree of similarity [14]. It is unsupervised, because there is no previous knowledge of the classes to which the objects can be assigned. Depending on the criterion used to measure similarity, there are different models of cluster analysis: (i) connectivity models, based on distance connectivity (e.g. hierarchical clustering); (ii) centroid models, which are constructed by assigning objects to the nearest cluster center (e.g. k-means or k-medians); (iii) distribution models using statistical distributions (e.g. expectation-maximization algorithm); (iv) density models where clusters are defined based on high-density areas in the data set; (v) graph-based models in which the data are expressed as graphs. A further distinction can be made between hierarchical and non-hierarchical models. Hierarchical models take the form of a hierarchy of clusters (e.g. hierarchical tree or agglomerative hierarchical clustering), whereas non-hierarchical models use a flat organization with no relations between clusters: they group a set of units into a pre-determined number of groups, using an iterative algorithm that optimizes a chosen criterion.

Clustering techniques are often a first step in a classification problem when there is no information about the classes. In an initial phase, clustering is used to identify groups of objects with similar features. Classification techniques are then applied to assign new objects to these groups. When there is no previous information about the objects, clustering techniques can also be used for classification purposes.
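The centroid model can be illustrated with a minimal one-dimensional k-means sketch. The daily consumption values, the initial centers and the fixed iteration count are hypothetical; production implementations add convergence checks and better initialization.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on scalar values: assign to nearest center, then recompute centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical daily consumption (kWh) for six days: three low days, three high days.
daily_kwh = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, clusters = kmeans_1d(daily_kwh, centers=[0.0, 6.0])
print(centers, clusters)
```

The number of groups (two) is fixed in advance, which matches the description of non-hierarchical models above.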

Association rules (ARs). Association rules are a useful tool for the representation of new information extracted from raw data and comprehensively expressed for decision-making in the form of implication rules of the type A → B [15]. These rules depict the frequent co-occurrence of attributes with a high reliability in a database. For example, “most transactions containing beer also contain diapers” is an association rule that could be found in a supermarket database. The Apriori algorithm and its adaptations (e.g. generalized rule induction algorithm) are the most widely used, though there are others, such as the FP-Growth and ECLAT algorithms, which improve scalability in very large datasets [16, 17]. Association rules now have more sophisticated versions that not only capture correlations but other kinds of association as well. Examples include the following: (i) generalized ARs, which use a concept hierarchy to obtain rules relating the different granularities of items; (ii) quantitative ARs, which deal with categorical and quantitative data; (iii) gradual dependence rules, which capture data tendencies by obtaining rules of the type “the more/less A → the more/less B”; (iv) sequential rules, which identify relationships between items while considering some ordering criterion (e.g. time).
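The "frequent co-occurrence" and "high reliability" of a rule A → B are usually quantified by its support and confidence. The sketch below computes both for the beer and diapers example on a tiny, made-up transaction database; it does not implement Apriori itself, only the two measures that such algorithms threshold on.

```python
# Hypothetical supermarket transactions, each a set of purchased items.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "bread"},
    {"milk", "bread"},
    {"diapers", "milk"},
]

def support(itemset, database):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in database) / len(database)

def confidence(antecedent, consequent, database):
    """Estimated probability of the consequent given the antecedent."""
    return support(antecedent | consequent, database) / support(antecedent, database)

rule_support = support({"beer", "diapers"}, transactions)
rule_confidence = confidence({"beer"}, {"diapers"}, transactions)
print(rule_support, rule_confidence)
```

Here the rule beer → diapers holds in two of the five transactions (support 0.4) and in two of the three transactions that contain beer (confidence about 0.67).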

Sequence discovery. Sequence discovery comprises techniques that identify statistically relevant patterns in data whose values are distributed in order [18]. Frequent problems in sequence analysis include the following: (i) the extraction of sequence information using techniques such as Motif Mining (MM); (ii) the detection of frequently occurring patterns; (iii) the search for similar sequences with a time lag by means of autocorrelation methods such as the ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function); (iv) the recovery of missing sequence members. Many of the other previously explained techniques are also capable of dealing with this kind of data.
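As a small illustration of the autocorrelation function, the sketch below computes the sample ACF of a synthetic series that alternates between a low and a high load, so that it repeats every two steps. The series and the function name are assumptions for this example; the estimator is the common one that divides the lagged covariance sum by the total sum of squares.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    total = sum((x - mean) ** 2 for x in series)
    lagged = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return lagged / total

# Synthetic load alternating between night (1) and day (3): period of two steps.
load = [1, 3, 1, 3, 1, 3, 1, 3]

acf_lag1 = autocorrelation(load, 1)
acf_lag2 = autocorrelation(load, 2)
print(acf_lag1, acf_lag2)
```

The strongly negative value at lag 1 and the strongly positive value at lag 2 reveal the two-step periodicity, which is how the ACF is used to find similar subsequences separated by a time lag.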

Anomaly or outlier detection. The objective of detecting anomalies is to identify items, events, or observations that deviate from expected patterns or from the usual behavior of other data items [19]. The discovery of anomalous items is crucial in the resolution of bank fraud, medical diagnoses, errors in data transmission, noise, etc. Since the previously described techniques are based on the identification/classification of similar items, most frequent patterns, etc., variations of these methods can also be employed for anomaly discovery. Methods used for this purpose are the following: density-based techniques, correlation, clustering, searching deviations from association rules, and combinations of diverse techniques using, for example, feature bagging or score normalization.
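One very simple density-based criterion flags a reading as anomalous when too few other readings lie close to it. The sketch below applies that idea to scalar meter readings; the radius, the neighbor threshold and the data are hypothetical, and established density-based methods are considerably more refined.

```python
def density_outliers(values, radius, min_neighbors):
    """Return the values that have fewer than min_neighbors other values within radius."""
    outliers = []
    for i, v in enumerate(values):
        neighbors = sum(
            1 for j, w in enumerate(values) if j != i and abs(v - w) <= radius
        )
        if neighbors < min_neighbors:
            outliers.append(v)
    return outliers

# Hypothetical hourly readings (kWh) with one suspicious spike.
readings = [2.0, 2.1, 1.9, 2.2, 9.5, 2.0, 1.8]
anomalies = density_outliers(readings, radius=0.5, min_neighbors=2)
print(anomalies)
```

Both parameters need tuning to the data at hand: a radius that is too small marks normal readings as anomalous, while one that is too large hides genuine spikes.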

Time series analysis. Time series analysis is performed on time-series data (i.e. data points that are recorded over time) in order to model data and then use the model to predict or monitor future values of the time series [20]. The most frequently used methods include the following: (i) methods for exploratory analysis (e.g. autocorrelation, trend analysis, wavelets, etc.); (ii) prediction and forecasting techniques (e.g. regression methods, signal estimation, etc.); (iii) classification methods which assign a category to patterns in the series; (iv) segmentation which aims to identify a sequence of points sharing specific properties (e.g. ARMA or ARIMA).
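The idea of predicting future values from past ones can be shown with the simplest autoregressive model, AR(1), which is the building block of ARMA and ARIMA. The sketch below estimates the single coefficient by least squares and makes a one-step forecast. It assumes the series has already been expressed as deviations from a baseline (so no intercept is fitted), and the data are synthetic.

```python
def fit_ar1(series):
    """Least-squares estimate of phi in the model x[t] = phi * x[t-1] (no intercept)."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den

def forecast_next(series, phi):
    """One-step-ahead forecast from the last observed value."""
    return phi * series[-1]

# Synthetic deviations from a baseline load that halve at every step.
deviations = [8.0, 4.0, 2.0, 1.0]

phi = fit_ar1(deviations)
prediction = forecast_next(deviations, phi)
print(phi, prediction)
```

Full ARMA and ARIMA models add moving-average terms and differencing on top of this autoregressive part, and are normally fitted with a dedicated statistics library rather than by hand.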

Most of the previously mentioned techniques have a fuzzy extension that allows them to process imprecise and uncertain data in various domains [21]. Fuzzy logic allows a non-strict representation of object membership to a set, thus avoiding the problem of hard boundaries that are often present in basic techniques, such as clustering and classification methods. For example, fuzzy k-means is a clustering method that has proved effective in many scenarios since it permits the assignment of data elements to one or more clusters [22]. Fuzzy approaches also allow a more human-friendly representation of the extracted knowledge, since fuzzy association rules are easier to interpret than purely numerical rules [23].
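The difference between hard and fuzzy cluster assignment can be illustrated with the membership formula of fuzzy c-means, in which a point belongs to every cluster to a degree that depends on its relative distance to each center. The sketch below evaluates that formula for fixed, hypothetical centers; it omits the iterative update of the centers, and the fuzzifier value m = 2 is a conventional choice rather than one taken from the paper.

```python
def fuzzy_memberships(value, centers, m=2.0):
    """Fuzzy c-means membership degrees of a scalar value for distinct cluster centers."""
    distances = [abs(value - c) for c in centers]
    if 0.0 in distances:
        # The value coincides with a center: full membership in that cluster.
        return [1.0 if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]

# Hypothetical cluster centers for "low" and "high" daily consumption (kWh).
centers = [1.0, 5.0]
memberships = fuzzy_memberships(2.0, centers)
print(memberships)
```

A reading of 2.0 kWh is therefore mostly, but not exclusively, a member of the low-consumption cluster, whereas plain k-means would assign it to that cluster outright.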

3. Applications of Data Science for Building Energy Management

Data science techniques have been frequently used to support and improve basic aspects of Energy Efficiency and Management. Accordingly, this section focuses on applications of Data Science that are capable of doing the following: (1) predicting the energy demand required for the efficient operation of a building; (2) optimizing building operation; (3) enabling building retrofitting; (4) verifying the operational status and failures of building equipment and networks; (5) analyzing the economic and commercial impact of user energy consumption; (6) detecting and preventing energy fraud.

3.1. Prediction of building energy load

Energy demand, or energy load, refers to the amount of energy required at a certain time instant or interval. In particular, HVAC systems focus on thermal loads, which refer to the quantity of heating and cooling energy that must be added or removed from the building to keep its occupants comfortable. Thermal loads can be classified as internal loads, when heat transfer/influence is produced by elements (e.g. lighting,
