Constructive Genetic Algorithm for Clustering Problems

Luiz Antonio Nogueira Lorena lorena@lac.inpe.br
LAC-Instituto Nacional de Pesquisas Espaciais, Av. dos Astronautas 1758 - Caixa
Postal 515, 12201-970 São José dos Campos-SP, Brazil
João Carlos Furtado jcarlosf@dinf.unisc.br
Universidade de Santa Cruz do Sul, Av. Independência 2293, 96815-900 Santa Cruz do
Sul, Brazil
Abstract
Genetic algorithms (GAs) have recently been accepted as powerful approaches to
solving optimization problems. It is also well-accepted that building block construc-
tion (schemata formation and conservation) has a positive influence on GA behav-
ior. Schemata are usually indirectly evaluated through a derived structure. We intro-
duce a new approach called the Constructive Genetic Algorithm (CGA), which allows
for schemata evaluation and the provision of other new features to the GA. Problems
are modeled as bi-objective optimization problems that consider the evaluation of two
fitness functions. This double fitness process, called fg-fitness, evaluates schemata and
structures on a common basis. Evolution is conducted considering an adaptive rejection
threshold that contemplates both objectives and attributes a rank to each individual in
the population. The population is dynamic in size and composed of schemata and
structures. Recombination preserves good schemata, and mutation is applied to structures
to diversify the population. The CGA is applied to two clustering problems in
graphs. The representations of schemata and structures use a binary digit alphabet and are
based on assignment (greedy) heuristics that provide a clearly distinguished represen-
tation for the problems. The clustering problems studied are the classical p-median
and the capacitated p-median. Good results are shown for problem instances taken
from the literature.
Keywords
Genetic algorithms, clustering problems, p-median problems, capacitated p-median
problem.
1 Introduction
Genetic algorithms (GAs) have been recognized as powerful approaches to solving
optimization problems (Bäck and Schwefel, 1993; Davis, 1991; De Jong, 1975; Goldberg,
1989; Holland, 1975; Lorena and Lopes, 1996, 1997; Michalewicz, 1996; Mitchell, 1996).
The foundation of such algorithms is the controlled evolution of a structured popula-
tion.
The GA works on a set of variables called structures. When applying them to opti-
mization problems, the first step is to define a coding scheme that allows a one-to-one
mapping between solutions and structures. The following string can represent a
structure $s_k = (s_{k1}, s_{k2}, \ldots, s_{kn})$, where n is the number of variables in
the problem. A fitness
function assigns a numeric value to each member of the current population (a collection
of structures). Selection (like tournament or biased roulette wheel) is used together
with crossover and mutation operators. The best structure is kept after a predefined
number of generations (Goldberg, 1989; Holland, 1975; Michalewicz, 1996).

© 2001 by the Massachusetts Institute of Technology. Evolutionary Computation 9(3): xxx-xxx

Holland (1975) put forward the Building Block Hypothesis (schema formation and
conservation) as a theoretical basis for the GA mechanism. In his view, avoiding
disruption of good schemata is the basis for the good behavior of a GA. However, a major
problem with building blocks is that schemata are evaluated indirectly via evaluation
of their instances (structures). Goldberg and collaborators (Goldberg et al., 1989, 1993;
Kargupta, 1995) introduced the messy-GA that allows variable length strings and looks
for the construction and preservation of good building blocks.
The Constructive Genetic Algorithm (CGA) is proposed here as an alternative to the
traditional GA approach (Holland, 1975), particularly in that CGA directly evaluates
schemata. The population, initially formed only by schemata, is built, generation after
generation, by directly searching for a population of well-adapted structures and also
for good schemata.
Some steps in the CGA are notably different from a classical GA. The CGA works
with a dynamic population, initially composed of schemata, which is enlarged after
the use of recombination operators, or made smaller along the generations, guided by
an evolution parameter. Schemata recombination diversifies the population thereby
generating new schemata or structures. At the time of its creation, each schema or
structure receives a rank used in the evolution analysis. Structures represent feasible
solutions, undergo mutation, and are compared to the best solution found so far, which
is always retained. Another main difference between a classical GA and a CGA is the
new fg-fitness process.
The CGA will be explained in detail in Sections 2 and 3. We will explain the method
with examples based on the clustering applications.
Clustering problems generally appear in classification of data for some purpose
like storage and retrieval or data analysis. Any clustering algorithm will attempt to
determine some inherent or natural grouping in the data, using distance or similarity
measures between individual data (Spath, 1980; Zupan, 1982). In this paper, we use
graphs to examine the application of CGA to two clustering problems: the classical
p-median problem (PMP) and the capacitated p-median problem (CPMP).
The PMP is a classical location problem. The objective is to locate p facilities (me-
dians) so as to minimize the sum of the distances from each demand vertex to its near-
est facility (Hakimi, 1964, 1965). The problem is well known to be NP-hard (Garey
and Johnson, 1979), and several heuristics have been developed for p-median prob-
lems (Densham and Rushton, 1992; Goodchild and Noronha, 1983; Rolland et al., 1997;
Rosing and ReVelle, 1997; Rosing et al., 1998; Teitz and Bart, 1968). More complete
approaches explore a search tree (Beasley, 1993; Christofides and Beasley, 1982;
Efroymson and Ray, 1966; Galvão and Raggi, 1989; Jarvinen et al., 1972; Neebe, 1978). Other
approaches consider Lagrangian relaxation and subgradient optimization in a primal-
dual viewpoint (Beasley, 1993; Senne and Lorena, 2000).
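
As a compact restatement of the PMP objective described above, the problem can be sketched in symbols. The notation is introduced here only for illustration: $V$ is the vertex set, $S$ is the chosen set of $p$ medians, and $d_{ij}$ is assumed to be the distance between vertices $i$ and $j$.

```latex
\min_{S \subseteq V,\; |S| = p} \;\; \sum_{j \in V} \; \min_{i \in S} d_{ij}
```

The inner minimum expresses that each demand vertex is served by its nearest median; a median contributes zero because its distance to itself is zero.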
The CPMP considers capacities for the service to be given by each median. The to-
tal service demanded by the vertices identified by p-median clusters cannot exceed the
service capacity. Apparently, the CPMP was not as intensively studied as the classical
PMP. Similar problems appeared in Klein and Aronson (1991), Maniezzo et al. (1998),
Mulvey and Beck (1984), and Osman and Christofides (1994).
This paper is organized as follows. The CGA description is divided into two sections.
In Section 2, we present aspects of modeling to be considered when solving a
problem using CGA. Modeling involves definitions of the schema and structure
representations and the consideration of the problems at issue as bi-objective optimization
problems. The evolution process is also described in this section. Section 3 describes
the CGA operators: selection, recombination, and mutation, as well as the definition of
an initial population and a CGA pseudocode. Section 4 shows computational results
using instances from the literature. We conclude in Section 5 with a summary of the
CGA performance and characteristics.

Figure 1: A 3-median solution.

2 CGA Modeling
In this section, we describe the modeling phase of the CGA. The clustering problems
are formulated as bi-objective optimization problems. Two fitness functions are defined
on the space of all schemata and structures that can be obtained using a specific repre-
sentation. The evolution process considers the two objectives on an adaptive rejection
threshold, which gives ranks to individuals in the population and yields a dynamic
population.
2.1 Structure and Schema Representation
Very simple structure and schema representations are adopted for PMP and CPMP.
They use a binary alphabet, and assignment heuristics make a clear and independent
connection with the two clustering problems. The use of the same kind of representa-
tion allows the remaining steps in CGA to be valid for both problems.
Suppose a given graph G=(V,E). A clustering problem in graphs can be stated as
the search for partitions on the vertex set V in a (generally) predefined number of clus-
ters, optimizing some measure on combinations of vertices and/or edge weights. The
problems considered in this paper are clustering problems in graphs.
A typical problem instance is composed of n demand points (vertices) $V = \{1, \ldots, n\}$
and a distance (weight) matrix $[\mu_{jl}]$ indicating distances between pairs of vertices
such that $\mu_{jl} \ge 0$, $\mu_{jj} = 0$, and $\mu_{jl} = \mu_{lj}$ for all $j, l \in V$.
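
The three conditions on the distance matrix (non-negative entries, zero diagonal, symmetry) are easy to check before running any heuristic. The following is a minimal sketch, not part of the original paper; the function name `is_valid_distance_matrix` and the small example matrix are hypothetical.

```python
def is_valid_distance_matrix(dist):
    """Return True if dist is square, non-negative, zero on the diagonal, and symmetric."""
    n = len(dist)
    if any(len(row) != n for row in dist):
        return False
    for j in range(n):
        if dist[j][j] != 0:
            return False
        for l in range(n):
            if dist[j][l] < 0 or dist[j][l] != dist[l][j]:
                return False
    return True


# Hypothetical 3-vertex instance, not taken from the paper.
example = [
    [0, 2, 5],
    [2, 0, 3],
    [5, 3, 0],
]
print(is_valid_distance_matrix(example))
```

An asymmetric matrix such as `[[0, 1], [2, 0]]` fails the check.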
To define the representation, some vertices are elected as the seeds, i.e., the initial
vertices in clusters that in some way attract the other vertices that participate in the
representation.
Figure 2: A 3-median schema.
Starting with an example for the PMP, consider the following 3-median solution
for a complete graph G=(V,E) instance with 10 vertices (see Figure 1).
A partition on the index vertex set V is then made, yielding two blocks, the seed
set and the non-seed set, where the seeds are the medians. For the 3-median solution of
Figure 1, $V_1(s_k) = \{1, 9, 10\}$ is the seed set, and
$V_0(s_k) = \{2, 3, 4, 5, 6, 7, 8\}$ is the non-seed set. Vertices 1, 9, and 10 are the
medians, and the others are assigned to a median.
The 3-median structure for the example will be
$s_k = (1, 0, 0, 0, 0, 0, 0, 0, 1, 1)$, where each position $s_{kj}$ in $s_k$, receiving
labels 1 or 0, means that vertex j belongs to sets $V_1(s_k)$ or $V_0(s_k)$, respectively.
Structure $s_k$ is not completely defined, as we do not know the non-median
assignments. For the clustering problems, after the initial seed identification, an
Assignment Heuristic is employed to assign non-seed vertices to clusters. For the PMP,
each non-median vertex is assigned to the nearest identified median. Algorithm AH1
(shown below) formalizes the assignments.
AH1
Read $s_k$, $V_1(s_k) = \{1, 2, \ldots, |V_1(s_k)|\}$, $V_0(s_k) = \{1, 2, \ldots, |V_0(s_k)|\}$;
For $i = 1$ to $|V_1(s_k)|$ do
    $C_i(s_k) := \{i\}$;
end for
For $j = 1$ to $|V_0(s_k)|$ do
    $r := \mathrm{index}\,\min_{i = 1, \ldots, |V_1(s_k)|} \{\mu_{ij}\}$;
    $C_r(s_k) := C_r(s_k) \cup \{j\}$;
end for
After the application of AH1, exactly $p = |V_1(s_k)|$ clusters are identified (for the
example in Figure 1, $C_1(s_k) = \{1, 8\}$, $C_2(s_k) = \{2, 4, 5, 9\}$, and
$C_3(s_k) = \{3, 6, 7, 10\}$) corresponding to the median set $V_1(s_k) = \{1, 9, 10\}$.
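
The nearest-median assignment performed by AH1 can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function name `ah1`, the 0-based vertex numbering, and the six-vertex instance on a line are assumptions made here, since the distance matrix behind Figure 1 is not given in the text. Positions labeled `"#"` are simply skipped, so the same function also handles schemata.

```python
def ah1(s, dist):
    """Assign each vertex labeled 0 to its nearest vertex labeled 1.

    s    -- list of labels (1, 0, or "#"), one per vertex, 0-based here
    dist -- square distance matrix, dist[i][j] between vertices i and j
    Returns a dict mapping each median to the sorted members of its cluster.
    """
    medians = [j for j, label in enumerate(s) if label == 1]
    clusters = {i: [i] for i in medians}  # each cluster starts with its seed
    for j, label in enumerate(s):
        if label == 0:
            # Ties are broken in favor of the median with the lowest index.
            nearest = min(medians, key=lambda i: dist[i][j])
            clusters[nearest].append(j)
    return {i: sorted(members) for i, members in clusters.items()}


# Hypothetical instance: six vertices on a line (not the instance of Figure 1).
coords = [0, 1, 2, 10, 11, 20]
dist = [[abs(a - b) for b in coords] for a in coords]
structure = [1, 0, 0, 1, 0, 1]
schema = [1, "#", 0, 1, 0, 1]
print(ah1(structure, dist))
print(ah1(schema, dist))
```

In the schema, vertex 1 carries the `"#"` label and therefore does not appear in any cluster, while the number of clusters still equals the number of medians.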
The CGA works directly with schemata. A 3-median schema for the example can
be $s_k = (1, \#, 0, 0, 0, \#, 0, 0, 1, 1)$, as Figure 2 clarifies. The defined sets are:
$V_1(s_k) = \{1, 9, 10\}$, $V_0(s_k) = \{3, 4, 5, 7, 8\}$, and the new set
$V_\#(s_k) = \{2, 6\}$. $V_\#(s_k) = V \setminus (V_1(s_k) \cup V_0(s_k))$
is formed by vertices not considered by the 3-median schema. The “do
not care” label # will be used to distinguish this condition. Observe that the number
of medians is the same on schemata and structures, determining the name p-median
schema (a condition that can be relaxed).
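
The mapping from a 0/1/# string to the three vertex sets can be made concrete with a short Python sketch. The helper name `decode` and the use of character strings for the labels are choices made here for illustration; the two strings are the structure and schema of the running 3-median example.

```python
def decode(s):
    """Split a 0/1/# string into the vertex sets V1, V0, and V# (1-based vertices)."""
    v1 = {j for j, label in enumerate(s, start=1) if label == "1"}
    v0 = {j for j, label in enumerate(s, start=1) if label == "0"}
    v_hash = {j for j, label in enumerate(s, start=1) if label == "#"}
    return v1, v0, v_hash


structure = "1000000011"  # the 3-median structure of the example
schema = "1#000#0011"     # the 3-median schema of the example

print(decode(structure))
print(decode(schema))
```

A string is a structure exactly when its third set is empty; for the schema above the third set is {2, 6}, matching the example.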
The same heuristic AH1 is then used to make the assignments, giving a clear and
unique representation. The clusters identified in Figure 2 are $C_1(s_k) = \{1, 8\}$,
$C_2(s_k) = \{4, 5, 9\}$, and $C_3(s_k) = \{3, 7, 10\}$.
The structure and schema $s_k$ defined above can also be used on the CPMP
representation. The only modification is that now the clusters have capacities. Heuristic
AH2 is the corresponding assignment heuristic in this case. Assume, for instance, that
the cluster capacities are the same (a condition that can also be relaxed).
Each non-median vertex is assigned to its nearest median if the cluster capacity is
not violated.
AH2
Read $s_k$, $Q$, $V_1(s_k) = \{1, 2, \ldots, |V_1(s_k)|\}$, $V_0(s_k) = \{1, 2, \ldots, |V_0(s_k)|\}$;
demands $q_j$, $j = 1, \ldots, |V_0(s_k)|$, and $q_i$, $i = 1, \ldots, |V_1(s_k)|$,
For $i = 1$ to $|V_1(s_k)|$ do
    $Q_i := Q - q_i$, $C_i(s_k) := \{i\}$;
end for
For $j = 1$ to $|V_0(s_k)|$ do
    $r := \mathrm{index}\,\min_{i = 1, \ldots, |V_1(s_k)|} \{\mu_{ij} \mid Q_i - q_j \ge 0\}$;
    $Q_r := Q_r - q_j$, $C_r(s_k) := C_r(s_k) \cup \{j\}$;
end for
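
A capacitated assignment in the spirit of AH2 can be sketched as follows. This is a hedged illustration, not the authors' code: the names `ah2`, `demand`, and `capacity`, the 0-based numbering, and the toy instance are all assumptions made here. Two further assumptions are that each median's own demand is charged against its cluster capacity, and that a vertex with no feasible median is returned in a separate `unassigned` list, a case the pseudocode does not address.

```python
def ah2(s, dist, demand, capacity):
    """Assign each vertex labeled 0 to the nearest median with enough spare capacity.

    s        -- list of labels (1, 0, or "#"), one per vertex, 0-based here
    dist     -- square distance matrix
    demand   -- demand of each vertex
    capacity -- common capacity of every cluster
    Returns (clusters, unassigned).
    """
    medians = [j for j, label in enumerate(s) if label == 1]
    # Assumption: a median's own demand uses part of its cluster capacity.
    residual = {i: capacity - demand[i] for i in medians}
    clusters = {i: [i] for i in medians}
    unassigned = []
    for j, label in enumerate(s):
        if label != 0:
            continue
        feasible = [i for i in medians if residual[i] - demand[j] >= 0]
        if not feasible:
            # Assumption: the pseudocode leaves this case open.
            unassigned.append(j)
            continue
        r = min(feasible, key=lambda i: dist[i][j])
        residual[r] -= demand[j]
        clusters[r].append(j)
    return clusters, unassigned


# Hypothetical instance: six vertices on a line, unit demands, capacity 2.
coords = [0, 1, 2, 10, 11, 20]
dist = [[abs(a - b) for b in coords] for a in coords]
structure = [1, 0, 0, 1, 0, 1]
demand = [1] * len(coords)
print(ah2(structure, dist, demand, capacity=2))
```

With capacity 2, vertex 2 cannot join its nearest median (vertex 0, already full) and is pushed to the next feasible one, which shows how the order of processing affects the clusters.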
Define X as the set of all structures and schemata that can be generated by the
0-1-# string representation. For the PMP, the assignment heuristic AH1 makes X
complete, in the sense that an optimal solution to the problem is always in
X. The same is not true of heuristic AH2 for the CPMP (capacity constraints may not be
correctly represented).
The assignment heuristics can be defined in various ways, and the more elaborate
they are, the better the solutions they find to the clustering problems (although
generally at the cost of longer computational times).
2.2 The Bi-Objective Optimization Problem
The CGA is proposed to address the problem of evaluating schemata and structures on
a common basis. Whereas in other evolutionary algorithms the evaluation of individuals
is based on a single function (the fitness function), in CGA this process relies on two

References
Garey, M. R. and Johnson, D. S. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., San Francisco.
Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, Massachusetts.
Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, Michigan.