What is the problem of partitioning the vertex set of a given graph into a pre-fixed number of clusters?

It is the problem of partitioning the vertex set of a given graph into a pre-fixed number of clusters such that the sum of the vertex weights in each cluster lies between a lower and an upper limit, while the sum of the edge weights inside the clusters is maximized (or, equivalently, the sum of the edge weights between clusters is minimized).
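To make the objective concrete, the sketch below scores a candidate partition. It assumes vertices numbered from 0, edge weights stored in a dictionary keyed by vertex pairs, and a single lower/upper weight limit shared by all clusters; the function name `evaluate_partition` is hypothetical. Because every edge is either inside a cluster or between clusters, the two sums always add up to the total edge weight, which is why maximizing one is equivalent to minimizing the other.

```python
from typing import Dict, List, Sequence, Tuple


def evaluate_partition(
    clusters: List[List[int]],
    vertex_weight: Sequence[float],
    edge_weight: Dict[Tuple[int, int], float],
    lower: float,
    upper: float,
) -> Tuple[float, float, bool]:
    """Score a partition for the weight-bounded clustering problem.

    Returns (weight of edges inside clusters, weight of edges between
    clusters, whether every cluster's vertex weight lies in [lower, upper]).
    Assumption: one common pair of limits for all clusters.
    """
    cluster_of = {v: c for c, members in enumerate(clusters) for v in members}
    inside = outside = 0.0
    for (u, v), w in edge_weight.items():
        if cluster_of[u] == cluster_of[v]:
            inside += w
        else:
            outside += w
    within_limits = all(
        lower <= sum(vertex_weight[v] for v in members) <= upper
        for members in clusters
    )
    return inside, outside, within_limits


# Hypothetical 4-vertex instance: unit vertex weights, limits [1, 2].
w = {(0, 1): 5.0, (2, 3): 4.0, (1, 2): 1.0, (0, 3): 2.0}
print(evaluate_partition([[0, 1], [2, 3]], [1, 1, 1, 1], w, 1, 2))  # (9.0, 3.0, True)
```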

What is the definition of a clustering problem in graphs?

A clustering problem in graphs can be stated as the search for partitions on the vertex set V in a (generally) predefined number of clusters, optimizing some measure on combinations of vertices and/or edge weights.

What are the two purposes of the evolution process?

The authors pursue two purposes in the evolution process: to obtain structures, that is, solutions to the g-maximization objective of the BOP (bi-objective optimization problem), and to ensure that these structures are the best solutions to the interval-minimization objective of the BOP.

How does the population size change with the evolution parameter?

After the initial generations, the population grows until it reaches an upper limit (in general controlled by storage conditions), and it then shrinks for higher values of the evolution parameter (see Figure 4).

What are the heuristics H.OC, H1+F1, and H1+B1?

Heuristic H.OC is a simple constructive heuristic, while H1+F1 and H1+B1 start from the H.OC solution and apply permutations using the “first improvement” and “best improvement” strategies, respectively.
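The two strategies differ only in when a move is accepted. The sketch below is a generic local search that illustrates the distinction; it is not the H1+F1 or H1+B1 procedure itself, whose permutation neighbourhood is not described here, and the `neighbours` and `cost` functions are hypothetical inputs.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

S = TypeVar("S")


def local_search(
    start: S,
    neighbours: Callable[[S], Iterable[S]],
    cost: Callable[[S], float],
    strategy: str = "first",
) -> S:
    """Descend from start until no neighbour has a lower cost.

    "first": move to the first improving neighbour encountered.
    "best":  scan the whole neighbourhood and move to the best neighbour.
    """
    current = start
    while True:
        chosen: Optional[S] = None
        threshold = cost(current)
        for candidate in neighbours(current):
            c = cost(candidate)
            if c < threshold:
                chosen, threshold = candidate, c
                if strategy == "first":
                    break  # first improvement: stop scanning immediately
        if chosen is None:
            return current  # local optimum: no improving neighbour
        current = chosen


# Hypothetical toy problem: minimise (x - 7)^2 over integers, steps -1, +1, +2.
def step(x: int) -> List[int]:
    return [x - 1, x + 1, x + 2]


print(local_search(0, step, lambda x: (x - 7) ** 2, "first"))  # 7
print(local_search(0, step, lambda x: (x - 7) ** 2, "best"))   # 7
```

Both strategies reach the same optimum on this toy problem; best improvement takes larger steps per iteration but evaluates the full neighbourhood each time.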

What is the definition of the schema and structure?

Modeling involves definitions of the schema and structure representations and the consideration of the problems at issue as bi-objective optimization problems.

Why are the computational times for both algorithms not comparable?

The computational times (Table 1) for the two algorithms are not comparable because different machines were used, although the IBM RISC/6000 could be considered faster than a Pentium 166 MHz.

What is the expected value of gmax?

For the 3-median example of Section 2.1, the random structure of Figure 3 gives $g_{\max} = g(0, 1, 0, 0, 0, 1, 1, 0, 0, 0) = 32$, and for $d = 0.1$, the expected interval length is $d\,g_{\max} = 0.1 \times 32 = 3.2$.

(Open Access) Constructive Genetic Algorithm for Clustering Problems (2001) | Luiz Antonio Nogueira Lorena

Q: What have the authors contributed in "Constructive genetic algorithm for clustering problems" ?

The authors introduce a new approach called the Constructive Genetic Algorithm (CGA), which allows for schemata evaluation and the provision of other new features to the GA. The clustering problems studied are the classical p-median and the capacitated p-median.

Constructive Genetic Algorithm for Clustering Problems

Luiz Antonio Nogueira Lorena lorena@lac.inpe.br

LAC-Instituto Nacional de Pesquisas Espaciais, Av. dos Astronautas 1758 - Caixa Postal 515, 12201-970 São José dos Campos-SP, Brazil

João Carlos Furtado jcarlosf@dinf.unisc.br

Universidade de Santa Cruz do Sul, Av. Independência 2293, 96815-900 Santa Cruz do Sul, Brazil

Abstract

Genetic algorithms (GAs) have recently been accepted as powerful approaches to solving optimization problems. It is also well accepted that building block construction (schemata formation and conservation) has a positive influence on GA behavior. Schemata are usually evaluated indirectly, through a derived structure. We introduce a new approach called the Constructive Genetic Algorithm (CGA), which allows for schemata evaluation and provides other new features to the GA. Problems are modeled as bi-objective optimization problems that consider the evaluation of two fitness functions. This double fitness process, called fg-fitness, evaluates schemata and structures on a common basis. Evolution is conducted considering an adaptive rejection threshold that contemplates both objectives and attributes a rank to each individual in the population. The population is dynamic in size and composed of schemata and structures. Recombination preserves good schemata, and mutation is applied to structures to diversify the population. The CGA is applied to two clustering problems in graphs. The representations of schemata and structures use a binary digit alphabet and are based on assignment (greedy) heuristics that provide a clearly distinguished representation for the problems. The clustering problems studied are the classical p-median and the capacitated p-median problems. Good results are shown for problem instances taken from the literature.

Keywords

Genetic algorithms, clustering problems, p-median problems, capacitated p-median problem.

1 Introduction

Genetic algorithms (GAs) have been recognized as powerful approaches to solving optimization problems (Bäck and Schwefel, 1993; Davis, 1991; De Jong, 1975; Goldberg, 1989; Holland, 1975; Lorena and Lopes, 1996, 1997; Michalewicz, 1996; Mitchell, 1996). The foundation of such algorithms is the controlled evolution of a structured population.

© 2001 by the Massachusetts Institute of Technology. Evolutionary Computation 9(3): xxx-xxx

L. A. N. Lorena and J. C. Furtado

The GA works on a set of variables called structures. When applying a GA to optimization problems, the first step is to define a coding scheme that allows a one-to-one mapping between solutions and structures. A structure can be represented by the string $s = (s_1, s_2, \ldots, s_n)$, where $n$ is the number of variables in the problem. A fitness function assigns a numeric value to each member of the current population (a collection of structures). Selection (such as tournament or biased roulette-wheel selection) is used together with crossover and mutation operators. The best structure is kept after a predefined number of generations (Goldberg, 1989; Holland, 1975; Michalewicz, 1996).
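The classical loop described above can be sketched as follows. This is a generic textbook GA on binary strings (tournament selection, one-point crossover, bit-flip mutation, best structure retained), not the CGA; the OneMax fitness, the helper names, and all parameter values are illustrative assumptions.

```python
import random
from typing import Callable, List

Structure = List[int]  # binary string s = (s_1, ..., s_n)


def tournament(pop: List[Structure], fitness: Callable[[Structure], float],
               k: int, rng: random.Random) -> Structure:
    """Tournament selection: the fittest of k randomly drawn structures."""
    return max(rng.sample(pop, k), key=fitness)


def one_point_crossover(a: Structure, b: Structure, rng: random.Random) -> Structure:
    cut = rng.randint(1, len(a) - 1)  # cut point strictly inside the string
    return a[:cut] + b[cut:]


def bit_flip(s: Structure, rate: float, rng: random.Random) -> Structure:
    return [1 - bit if rng.random() < rate else bit for bit in s]


def simple_ga(fitness: Callable[[Structure], float], n: int, pop_size: int = 30,
              generations: int = 60, k: int = 3, mutation_rate: float = 0.02,
              seed: int = 0) -> Structure:
    """Generational GA; parameter defaults are illustrative assumptions."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        pop = [
            bit_flip(one_point_crossover(tournament(pop, fitness, k, rng),
                                         tournament(pop, fitness, k, rng), rng),
                     mutation_rate, rng)
            for _ in range(pop_size)
        ]
        champion = max(pop, key=fitness)
        if fitness(champion) > fitness(best):
            best = champion  # retain the best structure found so far
    return best


# Hypothetical OneMax fitness: the number of 1s in the structure.
best = simple_ga(sum, n=20)
print(best, sum(best))
```

Note that every individual here is a complete structure; the CGA, by contrast, also evaluates schemata directly.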

Holland (1975) put forward the Building Block Hypothesis (schema formation and conservation) as a theoretical basis for the GA mechanism. In his view, avoiding the disruption of good schemata is the basis for the good behavior of a GA. However, a major problem with building blocks is that schemata are evaluated only indirectly, via the evaluation of their instances (structures). Goldberg and collaborators (Goldberg et al., 1989, 1993; Kargupta, 1995) introduced the messy GA, which allows variable-length strings and seeks to construct and preserve good building blocks.

The Constructive Genetic Algorithm (CGA) is proposed here as an alternative to the traditional GA approach (Holland, 1975), particularly in that the CGA evaluates schemata directly. The population, initially formed only by schemata, is built generation after generation by directly searching for a population of well-adapted structures and also for good schemata.

Some steps in the CGA differ notably from those of a classical GA. The CGA works with a dynamic population, initially composed of schemata, which is enlarged by the recombination operators or reduced along the generations, guided by an evolution parameter. Schemata recombination diversifies the population, generating new schemata or structures. At the time of its creation, each schema or structure receives a rank used in the evolution analysis. Structures represent feasible solutions, undergo mutation, and are compared to the best solution found so far, which is always retained. Another main difference between a classical GA and the CGA is the new fg-fitness process.

The CGA is explained in detail in Sections 2 and 3, using examples based on the clustering applications.

Clustering problems generally appear in the classification of data for some purpose, such as storage and retrieval or data analysis. Any clustering algorithm attempts to determine some inherent or natural grouping in the data, using distance or similarity measures between individual data items (Spath, 1980; Zupan, 1982). In this paper, we use graphs to examine the application of the CGA to two clustering problems: the classical p-median problem (PMP) and the capacitated p-median problem (CPMP).

The PMP is a classical location problem. The objective is to locate p facilities (medians) so as to minimize the sum of the distances from each demand vertex to its nearest facility (Hakimi, 1964, 1965). The problem is well known to be NP-hard (Garey and Johnson, 1979), and several heuristics have been developed for p-median problems (Densham and Rushton, 1992; Goodchild and Noronha, 1983; Rolland et al., 1997; Rosing and ReVelle, 1997; Rosing et al., 1998; Teitz and Bart, 1968). More complete approaches explore a search tree (Beasley, 1993; Christofides and Beasley, 1982; Efroymson and Ray, 1966; Galvão and Raggi, 1989; Jarvinen et al., 1972; Neebe, 1978). Other approaches consider Lagrangian relaxation and subgradient optimization from a primal-dual viewpoint (Beasley, 1993; Senne and Lorena, 2000).
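The PMP objective can be written directly as code. The sketch below assumes a full distance matrix indexed from 0 (the text labels vertices from 1) and uses hypothetical function names. The exact solver simply enumerates all $\binom{n}{p}$ median sets, which illustrates why exact search quickly becomes impractical for this NP-hard problem and why heuristics are used.

```python
from itertools import combinations
from typing import Sequence, Tuple


def p_median_cost(dist: Sequence[Sequence[float]], medians: Sequence[int]) -> float:
    """Sum, over all vertices, of the distance to the nearest median."""
    return sum(min(row[m] for m in medians) for row in dist)


def brute_force_pmp(dist: Sequence[Sequence[float]], p: int) -> Tuple[Tuple[int, ...], float]:
    """Exact PMP by trying every median set of size p; tiny instances only."""
    best = min(combinations(range(len(dist)), p),
               key=lambda ms: p_median_cost(dist, ms))
    return best, p_median_cost(dist, best)


# Hypothetical instance: four vertices on a line at 0, 1, 10, 11; d = |a - b|.
pos = [0, 1, 10, 11]
dist = [[abs(a - b) for b in pos] for a in pos]
print(brute_force_pmp(dist, 2))  # ((0, 2), 2)
```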

The CPMP considers capacities for the service provided by each median. The total service demanded by the vertices in a p-median cluster cannot exceed the service capacity of that cluster. Apparently, the CPMP has not been studied as intensively as the classical PMP. Similar problems appeared in Klein and Aronson (1991), Maniezzo et al. (1998), Mulvey and Beck (1984), and Osman and Christofides (1994).
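The capacity condition adds a feasibility check on top of the PMP cost. In the sketch below, `assignment[j]` gives the median serving vertex `j` (0-based indices); it assumes one common capacity for all clusters and that a median serves itself, so its own demand counts towards its cluster load. The function name is hypothetical.

```python
from typing import Dict, Sequence, Tuple


def cpmp_evaluate(dist: Sequence[Sequence[float]], demand: Sequence[float],
                  capacity: float, assignment: Sequence[int]) -> Tuple[float, bool]:
    """Return (total distance, capacity-feasible) for a given assignment.

    Assumptions: common capacity; a median's own demand is part of its load.
    """
    cost = sum(dist[j][m] for j, m in enumerate(assignment))
    load: Dict[int, float] = {}
    for j, m in enumerate(assignment):
        load[m] = load.get(m, 0) + demand[j]
    return cost, all(total <= capacity for total in load.values())


pos = [0, 1, 10, 11]
dist = [[abs(a - b) for b in pos] for a in pos]
print(cpmp_evaluate(dist, [1, 1, 1, 1], 2, [0, 0, 2, 2]))  # (2, True)
print(cpmp_evaluate(dist, [1, 1, 1, 1], 2, [0, 0, 0, 2]))  # (12, False)
```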

This paper is organized as follows. The CGA description is divided into two sections. In Section 2, we present aspects of modeling to be considered when solving a problem using the CGA. Modeling involves definitions of the schema and structure representations and the consideration of the problems at issue as bi-objective optimization problems. The evolution process is also described in this section. Section 3 describes the CGA operators (selection, recombination, and mutation), as well as the definition of an initial population and a CGA pseudocode. Section 4 shows computational results using instances from the literature. We conclude in Section 5 with a summary of the CGA performance and characteristics.

Figure 1: A 3-median solution.

2 CGA Modeling

In this section, we describe the modeling phase of the CGA. The clustering problems are formulated as bi-objective optimization problems. Two fitness functions are defined on the space of all schemata and structures that can be obtained using a specific representation. The evolution process considers the two objectives through an adaptive rejection threshold, which gives ranks to individuals in the population and yields a dynamic population.

2.1 Structure and Schema Representation

Very simple structure and schema representations are adopted for the PMP and CPMP. They use a binary alphabet, and assignment heuristics make a clear and independent connection with the two clustering problems. Using the same kind of representation allows the remaining steps of the CGA to be valid for both problems.

Consider a graph $G = (V, E)$. A clustering problem in graphs can be stated as the search for partitions of the vertex set $V$ into a (generally) predefined number of clusters, optimizing some measure on combinations of vertices and/or edge weights. The problems considered in this paper are clustering problems in graphs.

A typical problem instance is composed of $n$ demand points (vertices) $V = \{1, \ldots, n\}$ and a distance (weight) matrix $[d_{jl}]_{n \times n}$ indicating distances between pairs of vertices, such that $d_{jl} = d_{lj} \ge 0$, and $d_{jj} = 0$ for all $j, l \in V$.

To define the representation, some vertices are elected as seeds, i.e., the initial vertices of clusters that in some way attract the other vertices participating in the representation.


Figure 2: A 3-median schema.

Starting with an example for the PMP, consider the following 3-median solution for a complete graph $G(V, E)$ instance with 10 vertices (see Figure 1).

A partition of the vertex index set $V$ is then made, yielding two blocks, the seed set and the non-seed set, where the seeds are the medians. For the 3-median solution of Figure 1, $S(s) = \{1, 9, 10\}$ is the seed set, and $NS(s) = \{2, 3, 4, 5, 6, 7, 8\}$ is the non-seed set. Vertices 1, 9, and 10 are the medians, and the others are assigned to a median.

The 3-median structure for the example is $s = (1, 0, 0, 0, 0, 0, 0, 0, 1, 1)$, where each position $s_j$, $j = 1, \ldots, n$, receiving label 1 or 0, means that vertex $j$ belongs to set $S(s)$ or $NS(s)$, respectively.
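Decoding a structure into its seed and non-seed sets is a single pass over the string. This minimal sketch keeps the text's 1-based vertex labels; the function name `decode_structure` is hypothetical.

```python
from typing import Sequence, Set, Tuple


def decode_structure(s: Sequence[int]) -> Tuple[Set[int], Set[int]]:
    """Split a 0/1 structure into (S(s), NS(s)) using 1-based vertex labels."""
    seeds = {j for j, bit in enumerate(s, start=1) if bit == 1}
    non_seeds = {j for j, bit in enumerate(s, start=1) if bit == 0}
    return seeds, non_seeds


# The 3-median structure of Figure 1.
seeds, non_seeds = decode_structure((1, 0, 0, 0, 0, 0, 0, 0, 1, 1))
print(seeds)      # {1, 9, 10}
print(non_seeds)  # {2, 3, 4, 5, 6, 7, 8}
```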

Structure $s$ is not completely defined, as we do not know the assignments of the non-median vertices. For the clustering problems, after the initial seed identification, an Assignment Heuristic is employed to assign non-seed vertices to clusters. For the PMP, each non-median vertex is assigned to the nearest identified median. Algorithm AH1 (shown below) formalizes the assignments.

AH1
    Read the seed set S(s) and the non-seed set NS(s);
    For each median m in S(s)
        C(m) := {m};
    end for
    For each vertex j in NS(s)
        index := argmin over m in S(s) of d_jm;
        C(index) := C(index) ∪ {j};
    end for
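AH1 translates almost line for line into Python. This sketch is an illustration, not the authors' code: it assumes a 0-based distance matrix while keeping the text's 1-based vertex labels, breaks distance ties in favour of the lowest-labelled median, and skips any label other than 0 or 1. The function name `ah1` is hypothetical.

```python
from typing import Dict, List, Sequence, Union

Label = Union[int, str]


def ah1(dist: Sequence[Sequence[float]], s: Sequence[Label]) -> Dict[int, List[int]]:
    """Nearest-median assignment (a sketch of AH1).

    Vertex labels are 1-based as in the text; dist is a 0-based n x n matrix,
    so d_jm is dist[j - 1][m - 1]. Assumption: ties go to the first median.
    """
    seeds = [j for j, bit in enumerate(s, start=1) if bit == 1]
    clusters = {m: [m] for m in seeds}  # C(m) := {m}
    for j, bit in enumerate(s, start=1):
        if bit != 0:  # only non-seed (0-labelled) vertices are assigned
            continue
        index = min(seeds, key=lambda m: dist[j - 1][m - 1])
        clusters[index].append(j)  # C(index) := C(index) ∪ {j}
    return clusters


# Hypothetical 6-vertex instance on a line; vertices 2 and 5 are the medians.
pos = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in pos] for a in pos]
print(ah1(dist, (0, 1, 0, 0, 1, 0)))  # {2: [2, 1, 3], 5: [5, 4, 6]}
```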

After the application of AH1, exactly $|S(s)|$ clusters are identified (for the example in Figure 1, the three clusters shown there, corresponding to the median set $S(s) = \{1, 9, 10\}$).

The CGA works directly with schemata. A 3-median schema for the example is a string $s$ over the labels 1, 0, and #, as Figure 2 clarifies. Its defined sets are the seed set $S(s)$ and the non-seed set $NS(s)$, together with the new set $\#(s) = V - (S(s) \cup NS(s))$, which is formed by the vertices not considered by the 3-median schema. The “do not care” label # is used to distinguish this condition. Observe that the number of medians is the same in schemata and structures, hence the name p-median schema (a condition that can be relaxed).
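The three sets of a schema, and the usual relation between a schema and its instances (structures that agree with it on every position not labelled #), can be sketched as follows. The schema used in the example is hypothetical, not the one in Figure 2, and both function names are illustrative assumptions.

```python
from typing import Sequence, Set, Tuple, Union

Label = Union[int, str]  # 1, 0, or the "do not care" label "#"


def decode_schema(s: Sequence[Label]) -> Tuple[Set[int], Set[int], Set[int]]:
    """Return (S(s), NS(s), #(s)) with 1-based vertex labels."""
    seeds = {j for j, x in enumerate(s, start=1) if x == 1}
    non_seeds = {j for j, x in enumerate(s, start=1) if x == 0}
    dont_care = set(range(1, len(s) + 1)) - (seeds | non_seeds)  # V - (S ∪ NS)
    return seeds, non_seeds, dont_care


def is_instance(structure: Sequence[int], schema: Sequence[Label]) -> bool:
    """True if the structure matches the schema on every non-# position."""
    return len(structure) == len(schema) and all(
        x == "#" or x == y for x, y in zip(schema, structure)
    )


schema = (1, "#", 0, "#", 1)  # hypothetical 2-median schema on 5 vertices
print(decode_schema(schema))             # ({1, 5}, {3}, {2, 4})
print(is_instance((1, 0, 0, 1, 1), schema))  # True
print(is_instance((0, 0, 0, 0, 1), schema))  # False
```

A schema is a p-median schema when `len(decode_schema(s)[0]) == p`.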

The same heuristic AH1 is then used to make the assignments, giving a clear and unique representation. The three clusters it identifies for the schema are shown in Figure 2.

The structure and schema defined above can also be used for the CPMP representation. The only modification is that the clusters now have capacities. Heuristic AH2 is the corresponding assignment heuristic in this case. Assume, for instance, that the cluster capacities are all the same (a condition that can also be relaxed). Each non-median vertex is assigned to its nearest median if the cluster capacity is not violated.

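AH2
    Read the seed set S(s) and the non-seed set NS(s);
    Read the demand q_j of each vertex j and the common cluster capacity Q;
    For each median m in S(s)
        C(m) := {m};
        r(m) := Q - q_m;   (remaining capacity of C(m))
    end for
    For each vertex j in NS(s)
        index := argmin { d_jm : m in S(s), r(m) >= q_j };
        C(index) := C(index) ∪ {j};
        r(index) := r(index) - q_j;
    end for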
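A capacitated variant of the AH1 sketch follows. It is an illustration under stated assumptions, not the authors' implementation: non-seed vertices are processed in increasing label order, all clusters share one capacity, a median's own demand counts towards its load, and a vertex for which no median has enough remaining capacity is returned in an `unassigned` list, since this excerpt does not say how AH2 handles that case. The function name `ah2` is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def ah2(dist: Sequence[Sequence[float]], s: Sequence[int], demand: Sequence[float],
        capacity: float) -> Tuple[Dict[int, List[int]], List[int]]:
    """Capacitated nearest-median assignment (a sketch of AH2).

    Returns (clusters, unassigned); labels are 1-based, dist is 0-based.
    """
    seeds = [j for j, bit in enumerate(s, start=1) if bit == 1]
    clusters = {m: [m] for m in seeds}
    room = {m: capacity - demand[m - 1] for m in seeds}  # remaining capacity
    unassigned: List[int] = []
    for j, bit in enumerate(s, start=1):
        if bit != 0:
            continue
        feasible = [m for m in seeds if room[m] >= demand[j - 1]]
        if not feasible:
            unassigned.append(j)  # assumption: leave the vertex unassigned
            continue
        index = min(feasible, key=lambda m: dist[j - 1][m - 1])
        clusters[index].append(j)
        room[index] -= demand[j - 1]
    return clusters, unassigned


pos = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in pos] for a in pos]
print(ah2(dist, (0, 1, 0, 0, 1, 0), [1] * 6, 3))  # ({2: [2, 1, 3], 5: [5, 4, 6]}, [])
print(ah2(dist, (0, 1, 0, 0, 1, 0), [1] * 6, 2))  # ({2: [2, 1], 5: [5, 3]}, [4, 6])
```

The second call shows how a greedy, order-dependent assignment can fail to place every vertex even though the medians are reasonable, which is consistent with the remark below that AH2 may not represent the capacity constraints correctly.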

Define $X$ as the set of all structures and schemata that can be generated by the $\{0, 1, \#\}$ string representation. For the PMP, the assignment heuristic AH1 makes $X$ complete, in the sense that an optimal solution to the problem is always in $X$. The same is not true for heuristic AH2 on the CPMP (capacity constraints may not be correctly represented).

The assignment heuristics can be defined in various ways; the more elaborate they are, the better the solutions they find to the clustering problems (although generally at the cost of increased computational time).

2.2 The Bi-Objective Optimization Problem

The CGA is proposed to address the problem of evaluating schemata and structures on a common basis. While in other evolutionary algorithms individuals are evaluated with a single function (the fitness function), in the CGA this process relies on two


References

Garey, M. R. and Johnson, D. S. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman.

Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley.

Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press.
