
IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 9, NO. 3, JUNE 2005
Cooperative Coevolution of Artificial Neural Network Ensembles for Pattern Classification
Nicolás García-Pedrajas, Member, IEEE, César Hervás-Martínez, Member, IEEE, and Domingo Ortiz-Boyer
Abstract—This paper presents a cooperative coevolutive approach for designing neural network ensembles. Cooperative coevolution is a recent paradigm in evolutionary computation that allows the effective modeling of cooperative environments. Although, in theory, a single neural network with a sufficient number of neurons in the hidden layer would suffice to solve any problem, in practice many real-world problems are too hard for the appropriate network to be constructed. In such problems, neural network ensembles are a successful alternative. Nevertheless, the design of neural network ensembles is a complex task. In this paper, we propose a general framework for designing neural network ensembles by means of cooperative coevolution. The proposed model has two main objectives: first, the improvement of the combination of the trained individual networks; second, the cooperative evolution of such networks, encouraging collaboration among them instead of a separate training of each network. In order to favor the cooperation of the networks, each network is evaluated throughout the evolutionary process using a multiobjective method. For each network, different objectives are defined, considering not only its performance on the given problem, but also its cooperation with the rest of the networks. In addition, a population of ensembles is evolved, improving the combination of networks and obtaining subsets of networks that form ensembles performing better than the combination of all the evolved networks. The proposed model is applied to ten real-world classification problems of very different natures, from the UCI machine learning repository and the Proben1 benchmark set. On all of them, the performance of the model is better than the performance of standard ensembles in terms of generalization error. Moreover, the size of the obtained ensembles is also smaller.
Index Terms—Classification, cooperative coevolution, multiobjective optimization, neural network ensembles.
I. INTRODUCTION
NEURAL network ensembles [1] are receiving increasing attention in recent neural network research, due to their interesting features. They are a powerful tool, especially when facing complex problems. Network ensembles are usually made up of a linear combination of several networks that have been trained using the same data, although the actual sample used by each network to learn can be different. Each network within the ensemble has a potentially different weight in the output of the ensemble. Several works have shown [1] that the network ensemble generally has a smaller generalization error than that obtained with a single network, and also that the variance of the ensemble is less than the variance of a single network.

Manuscript received April 28, 2003; revised April 27, 2004. This work was supported in part by the Spanish Comisión Interministerial de Ciencia y Tecnología under Project TIC2002-04036-C05-02 and in part by FEDER funds.
The authors are with the Department of Computing and Numerical Analysis, University of Córdoba, Córdoba E-14071, Spain (e-mail: npedrajas@uco.es; chervas@uco.es; dortiz@uco.es).
Digital Object Identifier 10.1109/TEVC.2005.844158
The output $y(\mathbf{x})$ of a typical ensemble [2] with $M$ constituent networks, when an input pattern $\mathbf{x}$ is presented, is

$$y(\mathbf{x}) = \sum_{i=1}^{M} w_i\, y_i(\mathbf{x}) \qquad (1)$$

where $y_i(\mathbf{x})$ is the output of network $i$ and $w_i$ is the weight associated to that network. If the networks have more than one output, a different weight is usually assigned to each output. Ensembles of neural networks have some of the advantages of large networks without their problems of long training time and risk of overfitting. For more detailed descriptions of ensembles, the reader is referred to [3]–[7].
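As an illustration of (1), the following minimal Python sketch (function and variable names are ours, not the paper's) computes the weighted output of an ensemble for one input pattern:

```python
import numpy as np

def ensemble_output(member_outputs, weights):
    """Weighted combination of the member networks' outputs, as in (1).

    member_outputs: shape (n_networks, n_outputs), one row per network,
                    already computed for a single input pattern x.
    weights: shape (n_networks,), the weight w_i of each network.
    """
    member_outputs = np.asarray(member_outputs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights @ member_outputs  # sum_i w_i * y_i(x)

# Three networks with two outputs each:
print(ensemble_output([[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]],
                      [0.5, 0.3, 0.2]))  # -> [0.7 0.3]
```

When the networks have more than one output, a separate weight per output can be accommodated by making `weights` a matrix and combining columnwise.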
Although there is no clear distinction between the different kinds of multinet networks [2], [8]–[10], we follow the distinction of [11]: in an ensemble, several redundant approximations to the same function are combined by some method, while in a modular system the task is decomposed into a number of simpler components. Nevertheless, our approach incorporates an implicit decomposition that is provided by the use of cooperative coevolution [12]–[14].

This combination of several networks that cooperate in solving a given task has other important advantages [11], [15], among them the following.

• They can perform more complex tasks than any of their subcomponents [16].
• They can make an overall system easier to understand and modify.
• They are more robust than a single network.
In most cases, neural networks in an ensemble are designed independently or sequentially, so the advantages of interaction and cooperation among the individual networks are not exploited. Earlier works separate the design and learning process of the individual networks from the combination of the trained networks. In this work, we propose a framework for designing ensembles where the training and combination of the individual networks are carried out together, in order to get more cooperative networks and more effective combinations of them.

The new framework presented in this work for designing and evolving neural network ensembles uses and benefits from two different paradigms: cooperative coevolution and multiobjective optimization. The design of neural network ensembles implies making many decisions that have a major impact on the performance of the ensembles. The most important decisions that we must face when designing an ensemble are the following.
• The method for designing and training the individual networks.
• The method of combining the individual networks, and the mechanism for obtaining individual weights for each network, if such is the case.
• The measures of performance of the individual networks.
• The methods for encouraging diversity among the members of the ensembles and how to measure such diversity.
• The method of selection of patterns that are used by each network to learn.
• Whether to include regularization terms and their form.
Techniques using multiple models usually consist of two independent phases: model generation and model combination [6]. The disadvantage of this approach is that the combination is not considered during the generation of the models. With this approach, the possible interactions among the trained networks cannot be exploited until the combination stage [15], and the benefits that can be obtained from these interactions during the learning stage are lost.

However, several researchers [17], [18] have recently shown that some information about cooperation is useful for obtaining better ensembles. This new approach opens a wide field where the design and training of the different networks must be interdependent.
In this paper, we present a new model for cooperatively evolving the individual networks and their combinations. We have three main aims in the design of our model.

1) Improving the combination of networks. Some recent works have shown [17], [19] that the combination of a subset of all the trained networks can be better than the combination of all the networks.
2) Improving the introduction of correction terms for discouraging correlation, reducing mutual information, or similar ideas, as has been suggested by several authors.
3) Improving the diversity among the networks of the ensemble.
The simultaneous evolution of all the networks has been shown to be useful in some recent papers. The most common approach is the modification of the error term in the back-propagation algorithm to take into account the correlation among the networks of the ensemble, e.g., [15] and [20]–[22]. Liu et al. [18] evolved the population of networks minimizing the mutual information [23] among the individual networks. Liu and Yao [21] modified the standard back-propagation algorithm, adding a correction term that forces the networks to be negatively correlated. Nevertheless, these works are centered only on obtaining more diverse networks in the ensemble, and some recent results have shown that it is not clear that the use of a diversity term has a beneficial effect on the ensemble [24]. So, we have opted for considering diversity as one among many other interesting objectives.
Opitz and Shavlik [25] developed a model closer to cooperative coevolution. They evolved a population of networks by means of a genetic algorithm and combined the networks in an ensemble with a linear combination. Competition among the networks is encouraged with a diversity term added to the fitness of each network. More recently, Zhou et al. analyzed the relationship between the ensemble and its component neural networks [17]. This study revealed that it may be better to ensemble a subset of the neural networks instead of all of them. In order to select this subset of possibly better performing networks, they applied a genetic algorithm that evolved the weight of each network in the ensemble. Their results support our approach of evolving a population of ensembles, each one being a combination of some of the evolved networks. A recent work by Bakker and Heskes [19] corroborates the results of Zhou et al.

Moreover, the selection and training of the individual classifiers is thought to be an issue as critical as the combination method [26], [27]. Zhou et al. [17] have shown that a combination of some of the networks may be better than a combination of all the networks, and that a genetic algorithm [28] can be used for obtaining that subset of networks.
We propose a model that makes use of these ideas by means of the cooperative evolution of the networks that form the ensemble. Our model relies on two central ideas: the coevolution of different subpopulations of diverse networks, and the evolution of the best combinations of these networks. Cooperative coevolution [12], [29] is a recent paradigm in the field of evolutionary computation that has shown a natural tendency to evolve diverse populations.

Our cooperative model is focused on improving two aspects of the design and training of an ensemble: the evolution of more cooperative networks, and the combination of such networks. The use of cooperative coevolution allows us to obtain more diverse networks, without introducing diversity terms that can bias the learning process, and to improve the collaborative features of the networks. Cooperative coevolution also offers a framework for the combination of networks that has proved useful in other models of neural networks, e.g., modular neural networks [30].
The second basic idea of our model is the introduction of multiobjective optimization in the evaluation of the fitness of the networks. The performance of the network is one of its most important aspects, but not the only interesting one. The evaluation of different objectives for each network allows a more accurate estimation of the goodness of a network. Additionally, the definition of many objectives allows the inclusion of some useful measures applied in other models, such as negative correlation [21] or mutual information [18]. Multiobjective evaluation of modular networks obtained good results in a previous work [31].
The multiobjective approach improves the following features of the design of the network ensembles.

• The measures of performance of the individual networks. We can evaluate the performance of the networks from different points of view.
• The methods for encouraging diversity among the members of ensembles and how to measure such diversity. We can estimate the diversity of networks with different measures.
• Whether to include regularization terms and their form. Instead of adding a regularization term [32] to the error function, which may seriously bias the learning process, we can add an objective of regularization that will encourage less complex networks without biasing the evolutionary process.

Fig. 1. Populations of ensembles and networks. Each element of the ensemble is a reference to an individual of the corresponding subpopulation of networks, together with its associated weight.
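To make the multiobjective evaluation concrete, the sketch below shows a Pareto-dominance test over a vector of per-network objectives. The objective names are illustrative placeholders of ours; the paper's actual objective set is defined in Section V.

```python
def dominates(a, b):
    """Pareto dominance for maximization: a dominates b if it is no worse
    in every objective and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

# Illustrative objective vectors: (accuracy, cooperation, diversity, parsimony).
net_a = (0.91, 0.40, 0.35, 0.80)
net_b = (0.89, 0.38, 0.30, 0.75)
print(dominates(net_a, net_b))  # -> True
print(dominates(net_b, net_a))  # -> False
```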
The rest of the paper is organized as follows. Section II describes the proposed model of cooperative ensembles. Sections III and IV state all the aspects of the network and ensemble populations and their evolution. Section V explains the multiobjective evaluation of the individuals. Section VI describes the experimental setup, and Section VII shows the results of the application of our model to ten real-world problems. A comparison with standard ensemble methods is carried out in Section VIII. Sections IX and X show a comprehensive analysis of several aspects of the evolved ensembles. Finally, Section XI states the conclusions of our work and the most important lines for future research.
II. COOPERATIVE ENSEMBLE OF NEURAL NETWORKS
Evolutionary computation [28], [33] is a set of global optimization techniques that have been widely used in the last few years for training and automatically designing neural networks. Some efforts have been made to design modular [34] neural networks with these techniques (e.g., [35]), but the design of network ensembles by means of evolutionary computation has only been focused on some of its aspects [21], [25], [36], and not on the whole process.

Cooperative coevolution [37] is a recent paradigm in the area of evolutionary computation, based on the evolution of coadapted subcomponents without external interaction. In cooperative coevolution, a number of species are evolved together. Cooperation among individuals is encouraged by rewarding the individuals for their joint effort to solve a target problem. Work in this paradigm has shown that cooperative coevolutionary models present many interesting features, such as specialization through genetic isolation, generalization, and efficiency [29]. Cooperative coevolution approaches the design of modular systems in a natural way, as modularity is part of the model. Other models need some a priori knowledge to decompose the problem by hand. In many cases, either this knowledge is not available or it is not clear how to decompose the problem.

So, the cooperative coevolutionary model offers a very natural way of modeling the evolution of cooperative parts. This is the case of neural network ensembles, where the accuracy of the individual networks is not enough to assure a good performance. Cooperation among individual networks is also needed in order to improve the performance significantly.
Our cooperative model is based on two separate populations that evolve cooperatively. A model sharing some of these basic ideas has already been successfully applied to the evolution of modular neural networks [31]. These two populations are the following.

• Population of networks: This population consists of a number of independent subpopulations of networks. The independent evolution of subpopulations is an effective way of keeping the networks of different subpopulations diverse. The absence of genetic material exchange among subpopulations also tends to produce more diverse networks whose combination is more effective. Every subpopulation is evolved using evolutionary programming.
• Population of ensembles: Each member of the population of ensembles is an ensemble formed by one network from every network subpopulation. Each network has an associated weight. The population of ensembles keeps track of the best combinations of networks, selecting the subsets of networks that are promising for the final ensemble.

The two populations evolve cooperatively. Each generation of the whole system consists of a generation of the network population followed by a generation of the ensemble population. The relationship between the two populations can be seen in Fig. 1.
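The structure of Fig. 1 and the alternation of the two populations can be summarized in the following sketch (class and function names are ours, and the evolutionary operators are passed in as stubs): each ensemble stores one network index per subpopulation together with its weight, and a system generation evolves the networks first and the ensembles second.

```python
import random
from dataclasses import dataclass

@dataclass
class Ensemble:
    members: list[int]    # members[s]: index of the chosen network in subpopulation s
    weights: list[float]  # weight associated with each member network

def system_generation(subpops, ensembles, evolve_networks, evolve_ensembles):
    """One generation of the whole system: a generation of the network
    population followed by a generation of the ensemble population."""
    subpops = [evolve_networks(sp) for sp in subpops]
    ensembles = evolve_ensembles(ensembles, subpops)
    return subpops, ensembles

# A random initial ensemble over 5 subpopulations of 20 networks each:
n_subpops, subpop_size = 5, 20
e = Ensemble(members=[random.randrange(subpop_size) for _ in range(n_subpops)],
             weights=[1.0] * n_subpops)
print(e)
```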
The second basis of our model is the use of multiobjective optimization in the evaluation of the fitness of the individual networks. We have quoted previous works that agree that the learning process of the networks must take into account the cooperation among the networks in order to obtain better ensembles. The implicit cooperation among the subpopulations in cooperative coevolution helps this learning process, but it is necessary to enforce cooperation to assure good results. The evaluation of several objectives for each network allows the model to encourage such cooperation, rewarding the networks not only for their performance in solving the given problem, but also for other aspects, such as whether they are different from other networks, whether they are useful in the ensembles, or anything else considered relevant by the designer.

Additionally, every network is subject to back-propagation training throughout its evolution with a certain probability. In this way, the network is allowed to learn from the training set, but it is also prevented from being too similar to the rest of the networks, by means of its evaluation using different objectives. The back-propagation algorithm is implemented as a mutation operator.
As we stated in Section I, many decisions must be made in order to design an ensemble of neural networks. In the next sections, we explain in depth all the aspects of our model and the decisions made, following the ideas we have already introduced.
III. NETWORK POPULATION
Our basic network is a generalized multilayer perceptron (GMLP), as defined in [38]. It consists of an input layer, an output layer, and a number of hidden nodes interconnected among them.

Given a GMLP with $N$ inputs, $M$ hidden nodes, and $O$ outputs, and with $\mathbf{x}$ and $\mathbf{y}$ being the input and output vectors, respectively, it is defined by the equations [38]

$$h_i = x_i,\ 1 \le i \le N; \qquad h_i = f\Big(\sum_{j=1}^{i-1} w_{ij} h_j\Big),\ N < i \le N\!+\!M\!+\!O; \qquad y_i = h_{N+M+i},\ 1 \le i \le O \qquad (2)$$

where $w_{ij}$ is the weight of the connection from node $j$ to node $i$. The representation of a GMLP can be seen in Fig. 2: the $i$th node, provided it is not an input node, has connections from every $j$th node, $j < i$.

Fig. 2. Model of a GMLP.

The main advantage of using a GMLP is the parsimony of the evolved networks. Its structure allows the definition of very complex surfaces with fewer nodes than a standard multilayer perceptron with one or two hidden layers.
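A minimal sketch of the GMLP forward pass of (2), assuming a logistic activation (the activation function is not fixed by the text above): nodes are ordered inputs first, then hidden nodes, then outputs, and every non-input node gathers connections from all preceding nodes.

```python
import numpy as np

def gmlp_forward(x, w, n_inputs, n_outputs):
    """Forward pass of a generalized multilayer perceptron, as in (2).

    x: input vector of length n_inputs.
    w: strictly lower-triangular weight matrix; w[i, j] is the weight of
       the connection from node j to node i (zero where absent).
    Returns the activations of the last n_outputs nodes.
    """
    f = lambda s: 1.0 / (1.0 + np.exp(-s))  # logistic activation (assumed)
    n_nodes = w.shape[0]
    h = np.zeros(n_nodes)
    h[:n_inputs] = x
    for i in range(n_inputs, n_nodes):      # hidden and output nodes
        h[i] = f(w[i, :i] @ h[:i])          # connections from every node j < i
    return h[-n_outputs:]

# Example: 2 inputs, 2 hidden nodes, 1 output -> 5 nodes in total.
rng = np.random.default_rng(0)
w = np.tril(rng.normal(size=(5, 5)), k=-1)  # sparse in practice; dense here
print(gmlp_forward(np.array([0.5, -1.0]), w, n_inputs=2, n_outputs=1))
```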
The network population is formed by several subpopulations. Each subpopulation consists of a fixed number of networks, codified directly as shown in Fig. 2. These networks are not fully connected: when a network is initialized, each connection is created with a given probability. The population is subject to operations of replication and mutation. Crossover is not used, due to potential disadvantages [39] that it has for evolving neural networks. With these features, the algorithm falls in the class of evolutionary programming [40].
A. Evolution of Networks

The algorithm for the evolution of the subpopulations of networks is similar to other models proposed in the literature, such as GNARL [39] or EPNet [35]. The steps for generating the new subpopulations are the following.

• Networks of the initial subpopulation are created randomly. The number of nodes of the network is drawn from a uniform distribution over a fixed interval, and each node is created with a number of connections likewise drawn from a uniform distribution.
• The new subpopulation is generated by replicating the best fraction of the previous subpopulation. The remaining networks are removed and replaced by mutated copies of networks selected by roulette selection from the best individuals.
• There are two types of mutation: parametric and structural. The severity of structural mutation is determined by the relative fitness, $F_r$, of the network. Given a network $\nu$, its relative fitness is defined as

$$F_r(\nu) = e^{-\alpha F(\nu)} \qquad (3)$$

where $F(\nu)$ is the fitness value of network $\nu$, and $\alpha$ is a parameter that must be chosen by the expert. In our experiments, a fixed value of $\alpha$ was used.
Parametric mutation consists of the modification of the weights of the network without modifying its topology. Many parametric mutation operators have been suggested in the literature: random modification of the weights [12], simulated annealing [35], and back-propagation [35], among others. In this paper, we use the back-propagation algorithm [38] as the mutation operator. This algorithm is performed for a few iterations with a low value of the learning coefficient (kept fixed in our experiments). Parametric mutation is always carried out after structural mutation, as it does not modify the structure of the network.
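The use of back-propagation as a parametric mutation operator can be pictured as follows. This is a sketch under our own naming: `backprop_epoch` stands for one iteration of the back-propagation algorithm of [38], which is not reproduced here, and the iteration count and learning rate are placeholders rather than the paper's values.

```python
def parametric_mutation(network, train_data, backprop_epoch,
                        iterations=5, learning_rate=0.01):
    """Back-propagation as a mutation operator: a few iterations with a
    low learning coefficient, applied after structural mutation.
    The topology of the network is left unchanged."""
    for _ in range(iterations):
        backprop_epoch(network, train_data, learning_rate)
    return network

# Dummy demo: count how many times the training step is applied.
calls = []
parametric_mutation(object(), None, lambda net, data, lr: calls.append(lr))
print(len(calls))  # -> 5
```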
Structural mutation is more complex, because it implies a modification of the structure of the network. The behavioral link between parents and their offspring must be enforced to avoid generational gaps that produce inconsistency in the evolution [35], [39]. There are four different structural mutations.

• Addition of a node: The node is added with no connections, to enforce the behavioral link with its parent.
• Deletion of a node: A node is selected randomly and deleted together with its connections.
• Addition of a connection: A connection is added, with weight 0, to a randomly selected node. There are three types of connections: from an input node, from another hidden node, and to an output node. The selection of the type of connection to add is made according to the relative number of each type of node: input, output, and hidden. Otherwise, when there is a significant difference in the number of these three types, the number of connections of each type may end up highly biased.
• Deletion of a connection: A connection is selected, following the same criterion as for the addition of connections, and removed.
All of the above mutations can be made in a single mutation operation over the network. For each mutation, there is a minimum value, $\Delta_{min}$, and a maximum value, $\Delta_{max}$. The number of elements (nodes or connections) involved in the mutation is calculated as

$$\Delta = \lfloor \Delta_{min} + F_r(\Delta_{max} - \Delta_{min}) \rfloor \qquad (4)$$

So, before making a mutation, the number of elements $\Delta$ is calculated. If $\Delta = 0$, the mutation is not actually carried out. The values of the network mutation parameters used in all of our experiments are shown in Table I.

TABLE I. PARAMETERS OF NETWORK STRUCTURAL MUTATIONS COMMON TO ALL THE EXPERIMENTS
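Combining (3) and (4), the severity computation can be sketched as below. This assumes the exponential relative fitness reconstructed in (3) above and treats $\alpha$ and the per-mutation bounds as free parameters; a result of $\Delta = 0$ means the mutation is skipped.

```python
import math

def mutation_size(fitness, alpha, delta_min, delta_max):
    """Number of elements (nodes or connections) touched by one structural
    mutation. Fitter networks get a smaller relative fitness F_r and hence
    less severe mutations; Delta == 0 means the mutation is not applied."""
    f_r = math.exp(-alpha * fitness)                        # as in (3)
    return int(delta_min + f_r * (delta_max - delta_min))   # as in (4)

for fitness in (0.1, 1.0, 5.0):
    print(fitness, mutation_size(fitness, alpha=1.0, delta_min=0, delta_max=5))
# -> sizes 4, 1, 0: the fittest network's mutation is skipped
```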
There is no migration among subpopulations. So, each subpopulation must develop different behaviors of its networks, that is, different species of networks, in order to compete with the other subpopulations for conquering its own niche, and to cooperate to form ensembles with high fitness values. This will help the diversity among networks of different subpopulations.

For the initialization of the weights of the networks, we used the method suggested by Le Cun [2], [41]. The weights are obtained from a uniform distribution within the interval $(-2.4/n, 2.4/n)$, where $n$ is the number of inputs to the network.
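A sketch of this initialization, with the interval as reconstructed above from the classical rule attributed to Le Cun (the 2.4 constant is our reading of [2], [41]):

```python
import numpy as np

def init_weights(shape, n_inputs, seed=None):
    """Weights drawn uniformly from (-2.4/n, 2.4/n), where n is the
    number of inputs to the network."""
    rng = np.random.default_rng(seed)
    bound = 2.4 / n_inputs
    return rng.uniform(-bound, bound, size=shape)

print(init_weights((2, 3), n_inputs=10, seed=0))  # values in (-0.24, 0.24)
```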
The whole evolutionary process for a network in a generation is illustrated in Fig. 3(a). The figure shows the possible evolution of a network during one generation of the evolutionary process.
IV. ENSEMBLE POPULATION
The ensemble population is formed by a fixed number of ensembles. Each ensemble is the combination of one network from each subpopulation of networks, with an associated weight. The relationship between the two populations has been shown in Fig. 1. It is important to note that, as the chromosome that represents the ensemble is ordered, the permutation problem [39], which is so important in network evolution, cannot appear.

A. Evolution of Ensembles

The ensemble population is evolved using the steady-state genetic algorithm [42], [43]. It has been shown that this model exhibits higher variance [44] and is a more aggressive and selective selection strategy [45] than the standard genetic algorithm.
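One steady-state step might look as follows. This is a generic sketch of the steady-state scheme of [42], [43] with simplified operators of our own, not the paper's exact ensemble operators: a single offspring is produced per step and inserted in place of the worst individual, rather than renewing the whole generation at once.

```python
import random

def steady_state_step(population, fitness, crossover, mutate):
    """One steady-state step: breed a single offspring and, if it is
    better, insert it in place of the worst individual."""
    a, b = random.sample(population, 2)   # simplified parent selection
    child = mutate(crossover(a, b))
    worst = min(range(len(population)), key=lambda i: fitness(population[i]))
    if fitness(child) > fitness(population[worst]):
        population[worst] = child
    return population

# Toy demo on bit-lists; fitness is the number of ones.
random.seed(0)
pop = [[random.randint(0, 1) for _ in range(8)] for _ in range(6)]
fit = lambda ind: sum(ind)
cross = lambda a, b: a[:4] + b[4:]
mut = lambda ind: [bit ^ (random.random() < 0.1) for bit in ind]
for _ in range(50):
    pop = steady_state_step(pop, fit, cross, mut)
print(max(map(fit, pop)))
```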
