This item was submitted to Loughborough’s Institutional Repository

(https://dspace.lboro.ac.uk/) by the author and is made available under the

following Creative Commons Licence conditions.

For the full text of this licence, please go to:

http://creativecommons.org/licenses/by-nc-nd/2.5/

A Methodology for Modeling HVAC Components using

Evolving Fuzzy Rules

P. P. Angelov, V. I. Hanby, R. A. Buswell and J. A. Wright

Abstract--A methodology for the evolutionary construction of fuzzy rule-based (EFRB) models is proposed in this paper. The resulting models are transparent, and existing expert knowledge can easily be incorporated into the model, both at the initialisation stage and during its generation. A further advantage is the low computational effort needed to generate the model output. A new encoding mechanism allows the structure and parameters of the fuzzy rule base to be estimated from training data without establishing the complete rule list. It encodes rule indices and therefore significantly reduces the computational load. The rules are extracted from the data without using a priori information about the inherent model structure. This makes EFRB models as flexible as other types of 'black-box' models (neural networks, polynomial models, etc.) and at the same time significantly more transparent, especially when only a small subset of all possible rules is considered. The approach is applied to the modelling of components of heating, ventilating and air-conditioning (HVAC) systems. EFRB models have potential applications in simulation, control, and fault detection and diagnosis.

Keywords—Fuzzy Logic, Modelling, Genetic

Algorithms, HVAC, Component Modelling.

I. INTRODUCTION

The computational demands of combined HVAC system and building simulations can be considerable, as these are often used for energy predictions over annual operational cycles. Fuzzy rule-based models, like other black-box approaches, have the potential to reduce the computational demand of the simulation by mapping the inputs and outputs of the system components directly. In reality, the processes associated with some typical components, such as boilers or compressors, can be too complex to be readily described by analytical methods, so polynomial representations and, more recently, neural networks [2] based on test data are employed. One disadvantage of black-box methods is their lack of transparency. Models using fuzzy rules can offer a high degree of transparency, but traditionally require a priori knowledge and subjective estimation to establish the rule base, i.e. the model structure. In many cases this is a complex and ambiguous process [6].

Recently genetic algorithms (GA) and neural

networks have been used to extract an appropriate

rule base from data [1], [3]-[5], [9], [12]-[13]. In [3]

and [13] they are used for adjustment of parameters

of membership functions only (parameter

identification). In another group of papers the

linguistic labels are assigned at the end of the

identification process only [4]-[5], which makes

these models quite close to neural networks,

including the limited interpretability.

The limitation of these approaches is that the exhaustive list of fuzzy rules is usually considered [9],[12]. The length of the chromosome is then determined by all possible combinations of linguistic variables, which hampers the solution of problems with realistic dimensions and makes the models practically uninterpretable.

An effective encoding approach [1] is used in this paper. It significantly reduces the length of the chromosome, helped by the use of a real-coded GA. This permits simultaneous parameter and structure identification, as well as the application of the approach to problems with realistic dimensions. Unlike most black-box models and some fuzzy rule-based approaches, the knowledge extracted from the data is fully interpretable and can help in understanding the nature of the process being modelled. Expert knowledge can be added both at the initialisation step and during the identification process. Hybrid models composed of both fuzzy rules and crisp equations or inequalities can also be considered. The approach is applied to the modelling of axial fans as components of an air-conditioning system.

II. EVOLVING FUZZY RULES

Generally, fuzzy rule-based models consist of a number of rules of the following type, called the Mamdani type [6]:

$$\text{IF } (x_1 \text{ is } X_1) \text{ AND } \dots \text{ AND } (x_n \text{ is } X_n) \text{ THEN } (y \text{ is } Y) \qquad (1)$$

where $x_i$ is a fuzzy linguistic input variable;

$y$ is the output variable;

$Y$ is the fuzzy linguistic term of $y$, $Y \in \{Y_1, Y_2, \dots, Y_{m_0}\}$;

$X_i$ is the fuzzy linguistic term or label of the i-th input variable, $X_i \in \{X_i^1, X_i^2, \dots, X_i^{m_i}\}$.

A fuzzy set and its membership function define

each linguistic label. Different types of membership

functions are possible: Gaussian, triangular,

trapezoidal, etc. [6].
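To make the role of the membership functions concrete, the following minimal Python sketch evaluates a Gaussian and a triangular membership function and combines antecedent memberships into a rule firing strength. The function names, the parameterisation (centre and width, or three corner points with left < peak < right) and the use of the minimum for AND are illustrative assumptions; the paper does not prescribe them.

```python
import math

def gaussian_mf(x: float, centre: float, sigma: float) -> float:
    """Gaussian membership: 1 at the centre, decaying smoothly with distance."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)

def triangular_mf(x: float, left: float, peak: float, right: float) -> float:
    """Triangular membership: 0 outside (left, right), 1 at the peak.

    Assumes left < peak < right.
    """
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def rule_firing_strength(memberships):
    """Firing strength of a rule of type (1), with AND taken as the minimum.

    The minimum is a common choice for Mamdani rules, assumed here.
    """
    return min(memberships)

# Hypothetical rule with two antecedents evaluated at x1 = 2.5 and x2 = 5.0.
strength = rule_firing_strength([
    triangular_mf(2.5, 0.0, 5.0, 10.0),
    gaussian_mf(5.0, 5.0, 1.0),
])
```

With these inputs the first antecedent is half satisfied and the second fully satisfied, so the rule fires with the smaller of the two degrees.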

For a specified number of linguistic variables and labels, it is possible to determine the number of all possible fuzzy rules (R) that can be formulated from them. Even if only combinations in which every variable participates are considered, this number can be extremely high for real problems because of the so-called curse of dimensionality [6],[9],[12]:

$$R = \prod_{j=1}^{n+1} m_j \qquad (2)$$

where $m_j$ is the number of linguistic terms of the j-th linguistic variable.
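As a quick illustration of how fast R grows, the short Python sketch below evaluates the product in (2). The function name and the problem size (five inputs and one output with seven labels each) are made up for the example.

```python
from math import prod

def count_possible_rules(terms_per_variable):
    """Number of rules in which every variable participates, as in (2)."""
    return prod(terms_per_variable)

# Hypothetical problem: five inputs and one output, seven labels each.
print(count_possible_rules([7] * 6))  # 117649
```

Even this moderate problem yields more than a hundred thousand candidate rules, which is why a complete rule list is impractical to encode or to interpret.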

It would be impossible to interpret such a model, even if it were generated automatically. In practice, a significantly smaller number of rules (r) can be used because of information redundancy [6]:

$$r \ll R \qquad (3)$$

Extraction of a set of rules has generally followed one of two approaches: neural networks or GA. We explore the second.

A GA can be considered a guided stochastic search technique that imitates the process of natural selection. GAs are particularly appropriate for the problem considered here [7] because of their robustness, their independence of the model structure and their capacity to escape local minima.

The GA probes a set of trial points at every iteration. The trial set, called the population, consists of several chromosomes, each of which comprises a number of genes. Each problem variable is coded into a gene. A modified version of the original binary GA [7], called the real-coded GA or evolutionary algorithm [10], makes the problem definition more compact by representing each variable by a single gene:

TABLE I

REAL-VALUED CHROMOSOME

| $x_1^i$ | $x_2^i$ | … | $x_n^i$ |

where i = 1, 2, ..., P; P denotes the population size.

Some of the chromosomes from the current epoch are selected for reproduction. Two operations are usually applied to produce new chromosomes: crossover and mutation. In the standard binary-coded GA [7], mutation flips a bit from 0 to 1 or vice versa. Different mutation schemes exist for the real-coded GA [10].

III. FUZZY RULES ENCODING

Applying an evolutionary technique to extract the fuzzy model requires an appropriate encoding of the fuzzy rules and their parameters. Encoding all possible fuzzy rules into the chromosome, as in [9],[12], is time-consuming and can become intractable for problems with realistic dimensions (more than 5 inputs and 7-9 linguistic labels). We propose to encode only the indices of the rules that participate in the fuzzy model. Their number is significantly smaller: normally some tens of rules are used, and these can be interpreted. Different encoding schemes could be used. The basic requirement is non-ambiguity (uniqueness) in coding and decoding. We introduce a simple encoding procedure that assigns an index to every possible rule, so that each fuzzy rule is represented by a positive integer. The genotype of the chromosome considered in our approach consists of two parts: the indices of the rules that participate in the fuzzy model, and their parameters:

TABLE II

GENOTYPE: LEFT PART REPRESENTS INDICES OF RULES; RIGHT PART REPRESENTS MEMBERSHIP FUNCTIONS’ PARAMETERS

| $I_1$ | $I_2$ | … | $I_K$ | $p_{11}$ | $p_{12}$ | … | $p_{(n+1)m_i}$ |

A two-stage coding scheme is adopted in this paper. First, we translate each linguistic label into a base-$L$ digit, where $L = \max_{i=1}^{n+1}(m_i)$ is the maximal number of labels over all linguistic variables. 0 is assigned to the label at one extreme, 1 to the next, and so on. Second, we transform the resulting sequence of base-$L$ digits (the codes of the labels) into a positive decimal integer, which is the index of the fuzzy rule considered:

$$I = (t_1^L, t_2^L, \dots, t_{n+1}^L)_{10} + 1 \qquad (4)$$

where $I$ denotes the index of the fuzzy rule;

$t_j$, j = 1, 2, ..., n+1, is the code of the label, $t_j \in [0; m_j - 1]$.

As an example, the encoding of the following rule could be considered:

$$\text{IF } (LV_1 \text{ is High}) \text{ AND } (LV_2 \text{ is Very Low}) \text{ AND } (LV_3 \text{ is Low}) \text{ THEN } (LV_4 \text{ is Medium}) \qquad (5)$$

This rule contains 4 linguistic variables (3 inputs and 1 output), so n=3. Let the first input and the output have 3 linguistic terms (Low, Medium and High) and all other variables have 5 linguistic terms (Very Low, Low, Medium, High and Very High). First, the codes of the linguistic terms used are determined:

$$L = \max(3,5,5,3) = 5; \quad a_1^5 = 2; \quad a_2^5 = 0; \quad a_3^5 = 1; \quad a_4^5 = 1 \qquad (6)$$

The index of fuzzy rule (5) is determined by transformation to a decimal integer according to (4):

$$I = (2_5\,0_5\,1_5\,1_5)_{10} + 1 = 2 \cdot 5^3 + 1 \cdot 5^1 + 1 \cdot 5^0 + 1 = 257 \qquad (7)$$

Decoding is the inverse of coding. First, the codes of the linguistic labels are determined from the index of the rule as the residuals of successive integer divisions by L:

$$\begin{aligned} [(257-1)/5] &= 51, & \mathrm{Res}((257-1),\,5) &= 1; \\ [51/5] &= 10, & \mathrm{Res}(51,\,5) &= 1; \\ [10/5] &= 2, & \mathrm{Res}(10,\,5) &= 0; \\ [2/5] &= 0, & \mathrm{Res}(2,\,5) &= 2; \end{aligned} \qquad (8)$$

where Res(.) denotes the residual (remainder) of integer division and [.] denotes its integer quotient.

The residuals, read in reverse order (2, 0, 1, 1), uniquely determine fuzzy rule (5) from the index 257.
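The coding and decoding steps of (4), (7) and (8) can be sketched in a few lines of Python. The function names are illustrative and not taken from the paper; the label codes are assumed to be supplied in the order $t_1, \dots, t_{n+1}$.

```python
def encode_rule(codes, base):
    """Map label codes (t_1, ..., t_{n+1}) to a rule index as in (4)."""
    index = 0
    for code in codes:
        if not 0 <= code < base:
            raise ValueError(f"label code {code} is outside [0, {base - 1}]")
        index = index * base + code  # append one base-L digit
    return index + 1  # shift by one so that 0 stays free as the 'empty' index

def decode_rule(index, base, n_variables):
    """Recover the label codes from a rule index by repeated division, as in (8)."""
    value = index - 1
    codes = []
    for _ in range(n_variables):
        value, residual = divmod(value, base)
        codes.append(residual)
    return codes[::-1]  # residuals are produced last digit first

# Rule (5): High = 2, Very Low = 0, Low = 1, Medium = 1, with L = 5.
print(encode_rule([2, 0, 1, 1], 5))  # 257
print(decode_rule(257, 5, 4))        # [2, 0, 1, 1]
```

Because every sequence of codes maps to exactly one integer and back, the scheme meets the non-ambiguity requirement stated above.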

This effective encoding mechanism makes it possible to treat the parameters of the fuzzy membership functions as unknowns as well. The real-valued GA also shortens the chromosome representation and contributes to its compactness. Encoding the fuzzy rule parameters together with their structure in the same chromosome allows fine-tuning of the fuzzy rule-based models generated. It makes the identification process more flexible and less dependent on subjectivity in structure determination. At the same time, a certain degree of influence on the model structure remains possible. It can be exerted by defining parameters such as the number of linguistic terms (m), the maximal number of fuzzy rules considered (K), the pre-defined level of correlation (r) and the type of the membership functions, as well as by using a priori knowledge in the initialisation. The final number of fuzzy rules in the model can be smaller than K, because some indices may coincide and because zeros may appear as rule indices (zero is reserved as an 'empty' index). Therefore K defines the upper bound of k, the number of fuzzy rules used. At the same time, K can be significantly smaller than R, the number of all possible rules. In practice, some tens of rules are enough to reach a pre-defined level of correlation, and such a number of rules is still interpretable. The recommended number of linguistic labels (m) is 7±2 (5, 7 or 9), as this is close to human perception [6]. Correlation values higher than 0.95 are regarded as an acceptable level of closeness between the experimental and model outputs.
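The relation between K and k can be illustrated with a small Python sketch that extracts the rules actually used from the index part of a chromosome. The function name and the gene values are made up for the example.

```python
def active_rule_indices(index_genes):
    """Return the distinct non-empty rule indices, in first-seen order."""
    seen = set()
    active = []
    for index in index_genes:
        if index == 0 or index in seen:  # skip 'empty' and repeated genes
            continue
        seen.add(index)
        active.append(index)
    return active

genes = [257, 0, 12, 257, 40, 0]   # K = 6 index genes (made-up values)
print(active_rule_indices(genes))  # [257, 12, 40], so k = 3 <= K
```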

IV. EVOLUTIONARY SEARCH PROCEDURE

A numerical solution of both the parameter and the structural identification problems is sought by an evolutionary search procedure. The identification problem can be formulated as follows: determine the fuzzy rules (represented by their indices) and their parameters so as to minimise the deviation between the model and the experimental outputs (measured by their correlation):

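$$r \to \max \qquad (9)$$

subject to (1),

$$1 \le I_i \le R; \quad i = 1, 2, \dots, K$$

$$(j-1)\Delta \le p_{lj} \le (j+1)\Delta; \quad l = 1, 2, \dots, n+1; \quad j = 1, 2, \dots, m_l$$

where $\Delta = \dfrac{\overline{LV}_l - \underline{LV}_l}{m_l + 1}$; $\underline{LV}_l$ is the lower bound of the l-th linguistic variable; $\overline{LV}_l$ is its upper bound.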

It is important to note that fuzzy model (1) is

considered as one of the constraints.
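The objective r in (9) can be computed from the measured and predicted outputs. The Python sketch below assumes that r is the Pearson correlation coefficient, which the paper does not state explicitly; the function name and the handling of constant sequences are likewise illustrative choices.

```python
import math

def correlation(measured, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(measured)
    mean_y = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((y - mean_y) * (p - mean_p) for y, p in zip(measured, predicted))
    var_y = sum((y - mean_y) ** 2 for y in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_y == 0 or var_p == 0:
        return 0.0  # undefined for a constant sequence; treated here as no fit
    return cov / math.sqrt(var_y * var_p)
```

A candidate model would be accepted once this value exceeds the pre-defined level, for example 0.95.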

An evolutionary algorithm is applied for the numerical solution of this problem because it better matches its specifics:

- The vector of unknowns consists of integer values (indices of fuzzy rules) and real values (parameters of the membership functions), not binary ones;

- A real-valued GA implies a shorter chromosome, which allows simultaneous parameter and structure identification as well as the solution of problems with realistic dimensions.

The basic algorithm applied to our problem could be

represented by the following pseudo-code.

Begin
    Epoch := 0;
    Initialise (randomly or using a priori information) a population of chromosomes (Table II);
    While (Fitness < r)
        Decode fuzzy model by rule indices as in (8);
        Calculate outputs y similarly to (1);
        Evaluate Fitness as in (9);
        Perform crossover and mutation;
        Perform selection and reproduction using Fitness;
        Epoch := Epoch + 1;
    end
End.

where

r is a pre-defined desired correlation value;

Epoch denotes number of epochs.
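A compact, runnable variant of this loop is sketched below in Python. It is not the authors' implementation: the truncation selection with elitism, the `max_epochs` guard, the callables `new_chromosome` and `vary`, and the toy fitness function are all assumptions introduced for illustration. In a real application the fitness function would decode the rule indices, evaluate the fuzzy model and return the correlation r.

```python
import random

def evolve(fitness, new_chromosome, vary, pop_size=20, target=0.95,
           max_epochs=200, seed=0):
    """Evolve a population until the best fitness reaches the target."""
    rng = random.Random(seed)
    population = [new_chromosome(rng) for _ in range(pop_size)]
    best = max(population, key=fitness)
    epoch = 0
    while fitness(best) < target and epoch < max_epochs:
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, pop_size // 2)]        # selection
        children = [vary(rng.choice(parents), rng.choice(parents), rng)
                    for _ in range(pop_size - 1)]        # crossover + mutation
        population = [ranked[0]] + children              # elitism keeps the best
        best = max(population, key=fitness)
        epoch += 1
    return best, fitness(best), epoch

# Toy problem (made up): two rule indices in 0..20 and one parameter in [0, 1].
def new_chromosome(rng):
    return [rng.randint(0, 20), rng.randint(0, 20), rng.random()]

def vary(a, b, rng):
    child = a[:1] + b[1:]                                # one-point crossover
    k = rng.randrange(2)                                 # mutate one index by +/-1
    child[k] = min(20, max(0, child[k] + rng.choice((-1, 1))))
    child[2] = min(1.0, max(0.0, child[2] + rng.gauss(0.0, 0.05)))
    return child

def toy_fitness(chrom):
    # Made-up objective: best when both indices equal 7 and the parameter is 0.5.
    error = abs(chrom[0] - 7) + abs(chrom[1] - 7) + abs(chrom[2] - 0.5)
    return 1.0 / (1.0 + error)

best, score, epochs = evolve(toy_fitness, new_chromosome, vary)
```

Keeping the best chromosome in every epoch means the best fitness never decreases, and the epoch limit prevents an endless loop when the desired correlation cannot be reached.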

The algorithm is initialised with a set of chromosomes (the population), which is randomly seeded or defined on the basis of a priori expert knowledge and previous experience, if available. Child chromosomes are produced using modified recombination, mutation and selection operations. Recombination and mutation are performed separately for the two parts of the chromosome (Table II) because of the specifics of the problem: one part represents the indices of the rules and consists of integer values, while the other represents the parameters of the fuzzy sets and consists of real values. Mutation of the first part, which contains the integer values, is performed with a mutation step size [11] equal to 1, so that it again produces an integer. Selection is performed on whole chromosomes because both parts contribute to the fitness value.
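The separate treatment of the two chromosome parts can be sketched as follows in Python. The one-point crossover per part, the per-gene mutation rate, the Gaussian perturbation of the real genes and the clipping to bounds are assumptions chosen for illustration; only the unit step for integer genes comes from the text.

```python
import random

def crossover_two_part(a, b, n_rules, rng):
    """One-point crossover applied separately to the index and parameter parts."""
    cut_i = rng.randint(0, n_rules)        # cut inside the integer part
    cut_p = rng.randint(n_rules, len(a))   # cut inside the real part
    return a[:cut_i] + b[cut_i:n_rules] + a[n_rules:cut_p] + b[cut_p:]

def mutate_two_part(chrom, n_rules, max_index, bounds, rng, rate=0.1, sigma=0.05):
    """Mutate integer genes by a unit step and real genes by Gaussian noise."""
    child = list(chrom)
    for g in range(len(child)):
        if rng.random() >= rate:
            continue                        # this gene is left unchanged
        if g < n_rules:
            step = rng.choice((-1, 1))      # mutation step of size 1
            child[g] = min(max_index, max(0, child[g] + step))
        else:
            low, high = bounds[g - n_rules]
            noise = rng.gauss(0.0, sigma * (high - low))
            child[g] = min(high, max(low, child[g] + noise))
    return child

rng = random.Random(1)
parent_a = [1, 2, 3, 0.1, 0.2]   # three rule indices, two parameters (made up)
parent_b = [7, 8, 9, 0.8, 0.9]
child = crossover_two_part(parent_a, parent_b, 3, rng)
mutant = mutate_two_part(child, 3, 10, [(0.0, 1.0), (0.0, 1.0)], rng, rate=1.0)
```

Because the cuts never cross the boundary between the two parts, integer genes are only ever exchanged with integer genes and real genes with real genes.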

It has been demonstrated to the authors that when the number of rules is restricted, not all the fuzzy linguistic terms (FLT) may be represented, and hence “holes” in the input/output space can appear. While overlapping Gaussian membership functions can reduce the effect on the model, gaps in the coverage of the model are still undesirable. To reduce the likelihood of holes being present, a penalty function has been introduced that checks the population of solutions and penalises them in proportion to the number of holes present. The

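penalty function is implemented in the fitness function $f$. Its error term is

$$\exp\left(-\sum_{i=1}^{n} (y_i - \hat{y}_i)^2\right),$$

and it is reduced by two penalty quantities: the number of holes, and the spread of the residuals, given by

$$\max\left\{\max_{i=1}^{n}(y_i - \hat{y}_i) - \min_{i=1}^{n}(y_i - \hat{y}_i)\right\}$$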

In the example used in this paper, the penalty function ensured that the model gave complete coverage of the data; without it, one or more holes were usually present in the optimal solution. In addition, if the number of rules is too small to ensure complete coverage of the input/output space by the FLT, the penalty function still allows feasible solutions to be derived, but favours the solutions with the fewest holes.
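One way to count holes is sketched below in Python, under the assumption that a hole is a fuzzy linguistic term that no rule in the model refers to. This reading, the function name and the example rules are illustrative and not taken from the paper.

```python
def count_holes(rule_codes, terms_per_variable):
    """Count linguistic terms that no rule in the model refers to.

    rule_codes: one list of label codes per rule, as produced by decoding.
    terms_per_variable: number of linguistic terms m_j of each variable.
    """
    holes = 0
    for position, n_terms in enumerate(terms_per_variable):
        used = {codes[position] for codes in rule_codes}
        holes += sum(1 for label in range(n_terms) if label not in used)
    return holes

# Two made-up rules over variables with 3, 5, 5 and 3 terms.
rules = [[2, 0, 1, 1], [0, 4, 3, 0]]
print(count_holes(rules, [3, 5, 5, 3]))  # 8
```

A count of this kind could then be used to rank two solutions with similar errors, favouring the one that leaves fewer terms unused.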

V. MODELLING A BOILER

The practical problem considered is the modelling of a small boiler that could be used to supply medium-temperature hot water to a heating system.

A first-principles model of a natural gas fired boiler with a nominal rating of 300 kW was used to generate training data covering a typical range of operation. The boiler was assumed to operate at a constant water mass flow rate and a flow and return water temperature difference of 15 K. Control of boiler operation is typically based on the return water temperature, so the firing rate and return water temperature were excited to generate the input data. For use in HVAC system simulation, the model inputs were varied from the maximum firing rate down to 10% of that rate, and over return water temperatures between 20 °C and 100 °C. The resulting FRB model is therefore valid when the boiler is firing and the water mass flow rate through the boiler is a constant 3.8 kg s$^{-1}$, both for normal operating conditions and for “start-up” operation with the heating fluid at ambient temperature.

From the data, two models were generated. The first predicted the gross efficiency of the boiler, taking the boiler load and return water temperature as inputs. The second modelled the flow temperature as a function of the return water temperature and the control signal to the boiler (percentage of the maximum firing rate). The latter model allows the component to be incorporated into a subsystem performance simulation, while the former can be used in conjunction with it to generate predictions of fuel consumption for energy analysis, such as annual energy cost predictions. Figure 1 illustrates this hybrid approach to the problem solution. “FP” refers to “First Principles”, meaning here a directly calculable algebraic relationship based on the physical relationship of the variables.