
IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 9, NO. 3, JUNE 2005
Cooperative Coevolution of Artificial Neural Network Ensembles for Pattern Classification
Nicolás García-Pedrajas, Member, IEEE, César Hervás-Martínez, Member, IEEE, and Domingo Ortiz-Boyer
Abstract—This paper presents a cooperative coevolutive approach for designing neural network ensembles. Cooperative coevolution is a recent paradigm in evolutionary computation that allows the effective modeling of cooperative environments. Although, in theory, a single neural network with a sufficient number of neurons in the hidden layer would suffice to solve any problem, in practice many real-world problems are too hard for the appropriate network to be constructed. In such problems, neural network ensembles are a successful alternative. Nevertheless, the design of neural network ensembles is a complex task. In this paper, we propose a general framework for designing neural network ensembles by means of cooperative coevolution. The proposed model has two main objectives: first, the improvement of the combination of the trained individual networks; second, the cooperative evolution of such networks, encouraging collaboration among them instead of a separate training of each network. In order to favor the cooperation of the networks, each network is evaluated throughout the evolutionary process using a multiobjective method. For each network, different objectives are defined, considering not only its performance on the given problem, but also its cooperation with the rest of the networks. In addition, a population of ensembles is evolved, improving the combination of networks and obtaining subsets of networks that form ensembles performing better than the combination of all the evolved networks. The proposed model is applied to ten real-world classification problems of very different natures, from the UCI machine learning repository and the Proben1 benchmark set. On all of them, the performance of the model is better than the performance of standard ensembles in terms of generalization error. Moreover, the size of the obtained ensembles is also smaller.
Index Terms—Classification, cooperative coevolution, multiobjective optimization, neural network ensembles.
I. INTRODUCTION
NEURAL network ensembles [1] are receiving increasing attention in recent neural network research, due to their interesting features. They are a powerful tool, especially when facing complex problems. Network ensembles are usually made up of a linear combination of several networks that have been trained using the same data, although the actual sample used by each network to learn can be different. Each network within the ensemble has a potentially different weight in the output of the ensemble. Several works have shown [1] that the network ensemble generally has a smaller generalization error than that obtained with a single network, and also that the variance of the ensemble is less than the variance of a single network.

Manuscript received April 28, 2003; revised April 27, 2004. This work was supported in part by the Spanish Comisión Interministerial de Ciencia y Tecnología under Project TIC2002-04036-C05-02 and in part by FEDER funds.
The authors are with the Department of Computing and Numerical Analysis, University of Córdoba, Córdoba E-14071, Spain (e-mail: npedrajas@uco.es; chervas@uco.es; dortiz@uco.es).
Digital Object Identifier 10.1109/TEVC.2005.844158
The output $y(\mathbf{x})$ of a typical ensemble [2] with $M$ constituent networks, when an input pattern $\mathbf{x}$ is presented, is

$$y(\mathbf{x}) = \sum_{i=1}^{M} w_i\, y_i(\mathbf{x}) \qquad (1)$$

where $y_i(\mathbf{x})$ is the output of network $i$ and $w_i$ is the weight associated to that network. If the networks have more than one output, a different weight is usually assigned to each output. Ensembles of neural networks have some of the advantages of large networks without their problems of long training time and risk of overfitting. For more detailed descriptions of ensembles, the reader is referred to [3]–[7].
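As an illustration of (1), the following minimal Python sketch (function and variable names are ours, not the paper's) computes the weighted output of an ensemble for one input pattern:

```python
import numpy as np

def ensemble_output(member_outputs, weights):
    """Weighted combination of the member networks' outputs, as in (1).

    member_outputs: shape (n_networks, n_outputs), one row per network,
                    already computed for a single input pattern x.
    weights: shape (n_networks,), the weight w_i of each network.
    """
    member_outputs = np.asarray(member_outputs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights @ member_outputs  # sum_i w_i * y_i(x)

# Three networks with two outputs each:
print(ensemble_output([[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]],
                      [0.5, 0.3, 0.2]))  # -> [0.7 0.3]
```

When the networks have more than one output, a separate weight per output can be accommodated by making `weights` a matrix and combining columnwise.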
Although there is no clear distinction between the different kinds of multinet networks [2], [8]–[10], we follow the distinction of [11]: in an ensemble, several redundant approximations to the same function are combined by some method, while in a modular system the task is decomposed into a number of simpler components. Nevertheless, our approach incorporates an implicit decomposition that is provided by the use of cooperative coevolution [12]–[14].

This combination of several networks that cooperate in solving a given task has other important advantages [11], [15], among them the following.

• They can perform more complex tasks than any of their subcomponents [16].
• They can make an overall system easier to understand and modify.
• They are more robust than a single network.
In most cases, neural networks in an ensemble are designed independently or sequentially, so the advantages of interaction and cooperation among the individual networks are not exploited. Earlier works separate the design and learning process of the individual networks from the combination of the trained networks. In this work, we propose a framework for designing ensembles where the training and combination of the individual networks are carried out together, in order to get more cooperative networks and more effective combinations of them.

The new framework presented in this work for designing and evolving neural network ensembles uses and benefits from two different paradigms: cooperative coevolution and multiobjective optimization. The design of neural network ensembles implies making many decisions that have a major impact on the performance of the ensembles. The most important decisions that we must face when designing an ensemble are the following.
• The method for designing and training the individual networks.
• The method of combining the individual networks, and the mechanism for obtaining individual weights for each network, if such is the case.
• The measures of performance of the individual networks.
• The methods for encouraging diversity among the members of the ensembles and how to measure such diversity.
• The method of selection of patterns that are used by each network to learn.
• Whether to include regularization terms and their form.
Techniques using multiple models usually consist of two independent phases: model generation and model combination [6]. The disadvantage of this approach is that the combination is not considered during the generation of the models. With this approach, the possible interactions among the trained networks cannot be exploited until the combination stage [15], and the benefits that can be obtained from these interactions during the learning stage are lost.

However, several researchers [17], [18] have recently shown that some information about cooperation is useful for obtaining better ensembles. This new approach opens a wide field where the design and training of the different networks must be interdependent.
In this paper, we present a new model for cooperatively evolving the individual networks and their combinations. We have three main aims in the design of our model.

1) Improving the combination of networks. Some recent works have shown [17], [19] that the combination of a subset of all the trained networks can be better than the combination of all the networks.
2) Improving the introduction of correction terms for discouraging correlation, reducing mutual information, or similar ideas, as has been suggested by several authors.
3) Improving the diversity among the networks of the ensemble.
The simultaneous evolution of all the networks has been shown to be useful in some recent papers. The most common approach is the modification of the error term in the back-propagation algorithm to take into account the correlation among the networks of the ensemble, e.g., [15] and [20]–[22]. Liu et al. [18] evolved the population of networks minimizing the mutual information [23] among the individual networks. Liu and Yao [21] modified the standard back-propagation algorithm, adding a correction term that forces the networks to be negatively correlated. Nevertheless, these works are centered only on obtaining more diverse networks in the ensemble, and some recent results have shown that it is not clear that the use of a diversity term has a beneficial effect on the ensemble [24]. So, we have opted for considering diversity as one among many other interesting objectives.
Opitz and Shavlik [25] developed a model closer to cooperative coevolution. They evolved a population of networks by means of a genetic algorithm and combined the networks in an ensemble with a linear combination. Competition among the networks is encouraged with a diversity term added to the fitness of each network. More recently, Zhou et al. analyzed the relationship between the ensemble and its component neural networks [17]. This study revealed that it may be better to ensemble a subset of the neural networks instead of all of them. In order to select this subset of possibly better performing networks, they applied a genetic algorithm that evolved the weight of each network in the ensemble. Their results support our approach of evolving a population of ensembles, each one being a combination of some of the evolved networks. A recent work by Bakker and Heskes [19] corroborates the results of Zhou et al.

Moreover, the selection and training of the individual classifiers is thought to be an issue as critical as the combination method [26], [27]. Zhou et al. [17] have shown that a combination of some of the networks may be better than a combination of all the networks, and that a genetic algorithm [28] can be used for obtaining that subset of networks.
We propose a model that makes use of these ideas by means of the cooperative evolution of the networks that form the ensemble. Our model relies on two central ideas: the coevolution of different subpopulations of diverse networks, and the evolution of the best combinations of these networks. Cooperative coevolution [12], [29] is a recent paradigm in the field of evolutionary computation that has shown a natural tendency to evolve diverse populations.

Our cooperative model is focused on improving two aspects of the design and training of an ensemble: the evolution of more cooperative networks, and the combination of such networks. The use of cooperative coevolution allows us to obtain more diverse networks, without introducing diversity terms that can bias the learning process, and to improve the collaborative features of the networks. Cooperative coevolution also offers a framework for the combination of networks that has proved useful in other models of neural networks, e.g., modular neural networks [30].
The second basic idea of our model is the introduction of multiobjective optimization in the evaluation of the fitness of the networks. The performance of the network is one of its most important aspects, but not the only interesting one. The evaluation of different objectives for each network allows a more accurate estimation of the goodness of a network. Additionally, the definition of many objectives allows the inclusion of some useful measures applied in other models, such as negative correlation [21] or mutual information [18]. Multiobjective evaluation of modular networks obtained good results in a previous work [31].
The multiobjective approach improves the following features of the design of the network ensembles.

• The measures of performance of the individual networks. We can evaluate the performance of the networks from different points of view.
• The methods for encouraging diversity among the members of ensembles and how to measure such diversity. We can estimate the diversity of networks with different measures.
• Whether to include regularization terms and their form. Instead of adding a regularization term [32] to the error function, which may seriously bias the learning process, we can add an objective of regularization that will encourage less complex networks without biasing the evolutionary process.

Fig. 1. Populations of ensembles and networks. Each element of the ensemble is a reference to an individual of the corresponding subpopulation of networks, together with its associated weight.
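To make the multiobjective evaluation concrete, the sketch below shows a Pareto-dominance test over a vector of per-network objectives. The objective names are illustrative placeholders of ours; the paper's actual objective set is defined in Section V.

```python
def dominates(a, b):
    """Pareto dominance for maximization: a dominates b if it is no worse
    in every objective and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

# Illustrative objective vectors: (accuracy, cooperation, diversity, parsimony).
net_a = (0.91, 0.40, 0.35, 0.80)
net_b = (0.89, 0.38, 0.30, 0.75)
print(dominates(net_a, net_b))  # -> True
print(dominates(net_b, net_a))  # -> False
```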
The rest of the paper is organized as follows. Section II describes the proposed model of cooperative ensembles. Sections III and IV state all the aspects of the network and ensemble populations and their evolution. Section V explains the multiobjective evaluation of the individuals. Section VI describes the experimental setup, and Section VII shows the results of the application of our model to ten real-world problems. A comparison with standard ensemble methods is carried out in Section VIII. Sections IX and X show a comprehensive analysis of several aspects of the evolved ensembles. Finally, Section XI states the conclusions of our work and the most important lines for future research.
II. COOPERATIVE ENSEMBLE OF NEURAL NETWORKS
Evolutionary computation [28], [33] is a set of global optimization techniques that have been widely used in the last few years for training and automatically designing neural networks. Some efforts have been made to design modular [34] neural networks with these techniques (e.g., [35]), but the design of network ensembles by means of evolutionary computation has only been focused on some of its aspects [21], [25], [36], and not on the whole process.

Cooperative coevolution [37] is a recent paradigm in the area of evolutionary computation, based on the evolution of coadapted subcomponents without external interaction. In cooperative coevolution, a number of species are evolved together. Cooperation among individuals is encouraged by rewarding the individuals for their joint effort to solve a target problem. Work in this paradigm has shown that cooperative coevolutionary models present many interesting features, such as specialization through genetic isolation, generalization, and efficiency [29]. Cooperative coevolution approaches the design of modular systems in a natural way, as modularity is part of the model. Other models need some a priori knowledge to decompose the problem by hand. In many cases, either this knowledge is not available or it is not clear how to decompose the problem.

So, the cooperative coevolutionary model offers a very natural way of modeling the evolution of cooperative parts. This is the case of neural network ensembles, where the accuracy of the individual networks is not enough to assure a good performance. Cooperation among individual networks is also needed in order to improve the performance significantly.
Our cooperative model is based on two separate populations that evolve cooperatively. A model sharing some of these basic ideas has already been successfully applied to the evolution of modular neural networks [31]. These two populations are the following.

• Population of networks: This population consists of a number of independent subpopulations of networks. The independent evolution of subpopulations is an effective way of keeping the networks of different subpopulations diverse. The absence of genetic material exchange among subpopulations also tends to produce more diverse networks whose combination is more effective. Every subpopulation is evolved using evolutionary programming.
• Population of ensembles: Each member of the population of ensembles is an ensemble formed by one network from every network subpopulation. Each network has an associated weight. The population of ensembles keeps track of the best combinations of networks, selecting the subsets of networks that are promising for the final ensemble.

The two populations evolve cooperatively. Each generation of the whole system consists of a generation of the network population followed by a generation of the ensemble population. The relationship between the two populations can be seen in Fig. 1.
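The structure of Fig. 1 and the alternation of the two populations can be summarized in the following sketch (class and function names are ours, and the evolutionary operators are passed in as stubs): each ensemble stores one network index per subpopulation together with its weight, and a system generation evolves the networks first and the ensembles second.

```python
import random
from dataclasses import dataclass

@dataclass
class Ensemble:
    members: list[int]    # members[s]: index of the chosen network in subpopulation s
    weights: list[float]  # weight associated with each member network

def system_generation(subpops, ensembles, evolve_networks, evolve_ensembles):
    """One generation of the whole system: a generation of the network
    population followed by a generation of the ensemble population."""
    subpops = [evolve_networks(sp) for sp in subpops]
    ensembles = evolve_ensembles(ensembles, subpops)
    return subpops, ensembles

# A random initial ensemble over 5 subpopulations of 20 networks each:
n_subpops, subpop_size = 5, 20
e = Ensemble(members=[random.randrange(subpop_size) for _ in range(n_subpops)],
             weights=[1.0] * n_subpops)
print(e)
```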
The second basis of our model is the use of multiobjective optimization in the evaluation of the fitness of the individual networks. We have quoted previous works that agree that the learning process of the networks must take into account the cooperation among the networks in order to obtain better ensembles. The implicit cooperation among the subpopulations in cooperative coevolution helps this learning process, but it is necessary to enforce cooperation to assure good results. The evaluation of several objectives for each network allows the model to encourage such cooperation, rewarding the networks not only for their performance in solving the given problem, but also for other aspects, such as whether they are different from other networks, whether they are useful in the ensembles, or anything else considered relevant by the designer.

Additionally, every network is subject to back-propagation training throughout its evolution with a certain probability. In this way, the network is allowed to learn from the training set, but it is also prevented from being too similar to the rest of the networks, by means of its evaluation using different objectives. The back-propagation algorithm is implemented as a mutation operator.
As we stated in Section I, many decisions must be made in order to design an ensemble of neural networks. In the next sections, we explain in depth all the aspects of our model and the decisions made, following the ideas we have already introduced.
III. NETWORK POPULATION
Our basic network is a generalized multilayer perceptron (GMLP), as defined in [38]. It consists of an input layer, an output layer, and a number of hidden nodes interconnected among them.

Given a GMLP with $N$ inputs, $M$ hidden nodes, and $O$ outputs, and with $\mathbf{x}$ and $\mathbf{y}$ being the input and output vectors, respectively, it is defined by the equations [38]

$$h_i = x_i,\ 1 \le i \le N; \qquad h_i = f\Big(\sum_{j=1}^{i-1} w_{ij} h_j\Big),\ N < i \le N\!+\!M\!+\!O; \qquad y_i = h_{N+M+i},\ 1 \le i \le O \qquad (2)$$

where $w_{ij}$ is the weight of the connection from node $j$ to node $i$. The representation of a GMLP can be seen in Fig. 2: the $i$th node, provided it is not an input node, has connections from every $j$th node, $j < i$.

Fig. 2. Model of a GMLP.

The main advantage of using a GMLP is the parsimony of the evolved networks. Its structure allows the definition of very complex surfaces with fewer nodes than a standard multilayer perceptron with one or two hidden layers.
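A minimal sketch of the GMLP forward pass of (2), assuming a logistic activation (the activation function is not fixed by the text above): nodes are ordered inputs first, then hidden nodes, then outputs, and every non-input node gathers connections from all preceding nodes.

```python
import numpy as np

def gmlp_forward(x, w, n_inputs, n_outputs):
    """Forward pass of a generalized multilayer perceptron, as in (2).

    x: input vector of length n_inputs.
    w: strictly lower-triangular weight matrix; w[i, j] is the weight of
       the connection from node j to node i (zero where absent).
    Returns the activations of the last n_outputs nodes.
    """
    f = lambda s: 1.0 / (1.0 + np.exp(-s))  # logistic activation (assumed)
    n_nodes = w.shape[0]
    h = np.zeros(n_nodes)
    h[:n_inputs] = x
    for i in range(n_inputs, n_nodes):      # hidden and output nodes
        h[i] = f(w[i, :i] @ h[:i])          # connections from every node j < i
    return h[-n_outputs:]

# Example: 2 inputs, 2 hidden nodes, 1 output -> 5 nodes in total.
rng = np.random.default_rng(0)
w = np.tril(rng.normal(size=(5, 5)), k=-1)  # sparse in practice; dense here
print(gmlp_forward(np.array([0.5, -1.0]), w, n_inputs=2, n_outputs=1))
```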
The network population is formed by several subpopulations. Each subpopulation consists of a fixed number of networks, codified directly as shown in Fig. 2. These networks are not fully connected: when a network is initialized, each connection is created with a given probability. The population is subject to operations of replication and mutation. Crossover is not used, due to potential disadvantages [39] that it has for evolving neural networks. With these features, the algorithm falls in the class of evolutionary programming [40].
A. Evolution of Networks

The algorithm for the evolution of the subpopulations of networks is similar to other models proposed in the literature, such as GNARL [39] or EPNet [35]. The steps for generating the new subpopulations are the following.

• Networks of the initial subpopulation are created randomly. The number of nodes of the network is drawn from a uniform distribution over a fixed interval, and each node is created with a number of connections likewise drawn from a uniform distribution.
• The new subpopulation is generated by replicating the best fraction of the previous subpopulation. The remaining networks are removed and replaced by mutated copies of networks selected by roulette selection from the best individuals.
• There are two types of mutation: parametric and structural. The severity of structural mutation is determined by the relative fitness, $F_r$, of the network. Given a network $\nu$, its relative fitness is defined as

$$F_r(\nu) = e^{-\alpha F(\nu)} \qquad (3)$$

where $F(\nu)$ is the fitness value of network $\nu$, and $\alpha$ is a parameter that must be chosen by the expert. In our experiments, a fixed value of $\alpha$ was used.
Parametric mutation consists of the modification of the weights of the network without modifying its topology. Many parametric mutation operators have been suggested in the literature: random modification of the weights [12], simulated annealing [35], and back-propagation [35], among others. In this paper, we use the back-propagation algorithm [38] as the mutation operator. This algorithm is performed for a few iterations with a low value of the learning coefficient (kept fixed in our experiments). Parametric mutation is always carried out after structural mutation, as it does not modify the structure of the network.
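The use of back-propagation as a parametric mutation operator can be pictured as follows. This is a sketch under our own naming: `backprop_epoch` stands for one iteration of the back-propagation algorithm of [38], which is not reproduced here, and the iteration count and learning rate are placeholders rather than the paper's values.

```python
def parametric_mutation(network, train_data, backprop_epoch,
                        iterations=5, learning_rate=0.01):
    """Back-propagation as a mutation operator: a few iterations with a
    low learning coefficient, applied after structural mutation.
    The topology of the network is left unchanged."""
    for _ in range(iterations):
        backprop_epoch(network, train_data, learning_rate)
    return network

# Dummy demo: count how many times the training step is applied.
calls = []
parametric_mutation(object(), None, lambda net, data, lr: calls.append(lr))
print(len(calls))  # -> 5
```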
Structural mutation is more complex, because it implies a modification of the structure of the network. The behavioral link between parents and their offspring must be enforced to avoid generational gaps that produce inconsistency in the evolution [35], [39]. There are four different structural mutations.

• Addition of a node: The node is added with no connections, to enforce the behavioral link with its parent.
• Deletion of a node: A node is selected randomly and deleted together with its connections.
• Addition of a connection: A connection is added, with weight 0, to a randomly selected node. There are three types of connections: from an input node, from another hidden node, and to an output node. The selection of the type of connection to add is made according to the relative number of each type of node: input, output, and hidden. Otherwise, when there is a significant difference in the number of these three types, the number of connections of each type may end up highly biased.
• Deletion of a connection: A connection is selected, following the same criterion as for the addition of connections, and removed.
All of the above mutations can be made in a single mutation operation over the network. For each mutation, there is a minimum value, $\Delta_{min}$, and a maximum value, $\Delta_{max}$. The number of elements (nodes or connections) involved in the mutation is calculated as

$$\Delta = \lfloor \Delta_{min} + F_r(\Delta_{max} - \Delta_{min}) \rfloor \qquad (4)$$

So, before making a mutation, the number of elements $\Delta$ is calculated. If $\Delta = 0$, the mutation is not actually carried out. The values of the network mutation parameters used in all of our experiments are shown in Table I.

TABLE I. PARAMETERS OF NETWORK STRUCTURAL MUTATIONS COMMON TO ALL THE EXPERIMENTS
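Combining (3) and (4), the severity computation can be sketched as below. This assumes the exponential relative fitness reconstructed in (3) above and treats $\alpha$ and the per-mutation bounds as free parameters; a result of $\Delta = 0$ means the mutation is skipped.

```python
import math

def mutation_size(fitness, alpha, delta_min, delta_max):
    """Number of elements (nodes or connections) touched by one structural
    mutation. Fitter networks get a smaller relative fitness F_r and hence
    less severe mutations; Delta == 0 means the mutation is not applied."""
    f_r = math.exp(-alpha * fitness)                        # as in (3)
    return int(delta_min + f_r * (delta_max - delta_min))   # as in (4)

for fitness in (0.1, 1.0, 5.0):
    print(fitness, mutation_size(fitness, alpha=1.0, delta_min=0, delta_max=5))
# -> sizes 4, 1, 0: the fittest network's mutation is skipped
```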
There is no migration among subpopulations. So, each subpopulation must develop different behaviors of its networks, that is, different species of networks, in order to compete with the other subpopulations for conquering its own niche, and to cooperate to form ensembles with high fitness values. This will help the diversity among networks of different subpopulations.

For the initialization of the weights of the networks, we used the method suggested by Le Cun [2], [41]. The weights are obtained from a uniform distribution within the interval $(-2.4/n, 2.4/n)$, where $n$ is the number of inputs to the network.
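A sketch of this initialization, with the interval as reconstructed above from the classical rule attributed to Le Cun (the 2.4 constant is our reading of [2], [41]):

```python
import numpy as np

def init_weights(shape, n_inputs, seed=None):
    """Weights drawn uniformly from (-2.4/n, 2.4/n), where n is the
    number of inputs to the network."""
    rng = np.random.default_rng(seed)
    bound = 2.4 / n_inputs
    return rng.uniform(-bound, bound, size=shape)

print(init_weights((2, 3), n_inputs=10, seed=0))  # values in (-0.24, 0.24)
```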
The whole evolutionary process for a network in a generation is illustrated in Fig. 3(a). The figure shows the possible evolution of a network during one generation of the evolutionary process.
IV. ENSEMBLE POPULATION
The ensemble population is formed by a fixed number of ensembles. Each ensemble is the combination of one network from each subpopulation of networks, with an associated weight. The relationship between the two populations has been shown in Fig. 1. It is important to note that, as the chromosome that represents the ensemble is ordered, the permutation problem [39], which is so important in network evolution, cannot appear.

A. Evolution of Ensembles

The ensemble population is evolved using the steady-state genetic algorithm [42], [43]. It has been shown that this model exhibits higher variance [44] and is a more aggressive and selective selection strategy [45] than the standard genetic algorithm.
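One steady-state step might look as follows. This is a generic sketch of the steady-state scheme of [42], [43] with simplified operators of our own, not the paper's exact ensemble operators: a single offspring is produced per step and inserted in place of the worst individual, rather than renewing the whole generation at once.

```python
import random

def steady_state_step(population, fitness, crossover, mutate):
    """One steady-state step: breed a single offspring and, if it is
    better, insert it in place of the worst individual."""
    a, b = random.sample(population, 2)   # simplified parent selection
    child = mutate(crossover(a, b))
    worst = min(range(len(population)), key=lambda i: fitness(population[i]))
    if fitness(child) > fitness(population[worst]):
        population[worst] = child
    return population

# Toy demo on bit-lists; fitness is the number of ones.
random.seed(0)
pop = [[random.randint(0, 1) for _ in range(8)] for _ in range(6)]
fit = lambda ind: sum(ind)
cross = lambda a, b: a[:4] + b[4:]
mut = lambda ind: [bit ^ (random.random() < 0.1) for bit in ind]
for _ in range(50):
    pop = steady_state_step(pop, fit, cross, mut)
print(max(map(fit, pop)))
```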
