Particle Swarm Optimization

James Kennedy¹ and Russell Eberhart²

¹Washington, DC 20212
kennedyjim@bls.gov

²Purdue School of Engineering and Technology
Indianapolis, IN 46202-5160
eberhart@engr.iupui.edu
ABSTRACT

A concept for the optimization of nonlinear functions using particle swarm methodology is introduced. The evolution of several paradigms is outlined, and an implementation of one of the paradigms is discussed. Benchmark testing of the paradigm is described, and applications, including nonlinear function optimization and neural network training, are proposed. The relationships between particle swarm optimization and both artificial life and genetic algorithms are described.
1 INTRODUCTION
This paper introduces a method for optimization of continuous nonlinear functions. The method was discovered through simulation of a simplified social model; thus the social metaphor is discussed, though the algorithm stands without metaphorical support. This paper describes the particle swarm optimization concept in terms of its precursors, briefly reviewing the stages of its development from social simulation to optimizer. Discussed next are a few paradigms that implement the concept. Finally, the implementation of one paradigm is discussed in more detail, followed by results obtained from applications and tests upon which the paradigm has been shown to perform successfully.

Particle swarm optimization has roots in two main component methodologies. Perhaps more obvious are its ties to artificial life (A-life) in general, and to bird flocking, fish schooling, and swarming theory in particular. It is also related, however, to evolutionary computation, and has ties to both genetic algorithms and evolutionary programming. These relationships are briefly reviewed in the paper.

Particle swarm optimization as developed by the authors comprises a very simple concept, and paradigms can be implemented in a few lines of computer code. It requires only primitive mathematical operators, and is computationally inexpensive in terms of both memory requirements and speed. Early testing has found the implementation to be effective with several kinds of problems. This paper discusses application of the algorithm to the training of artificial neural network weights. Particle swarm optimization has also been demonstrated to perform well on genetic algorithm test functions. This paper discusses the performance on Schaffer's f6 function, as described in Davis [1].
2 SIMULATING SOCIAL BEHAVIOR
A number of scientists have created computer simulations of various interpretations of the movement of organisms in a bird flock or fish school. Notably, Reynolds [8] and Heppner and Grenander [4] presented simulations of bird flocking. Reynolds was intrigued by the aesthetics of bird flocking choreography, and Heppner, a zoologist, was interested in discovering the underlying rules that enabled large numbers of birds to flock synchronously, often changing direction suddenly, scattering and regrouping, etc. Both of these scientists had the insight that local processes, such as those modeled by cellular automata, might underlie the unpredictable group dynamics of bird social behavior. Both models relied heavily on manipulation of inter-individual distances; that is, the synchrony of flocking behavior was thought to be a function of birds' efforts to maintain an optimum distance between themselves and their neighbors.

0-7803-2768-3/95/$4.00 © 1995 IEEE
It does not seem a too-large leap of logic to suppose that the same rules underlie animal social behavior, including herds, schools, and flocks, and that of humans. As sociobiologist E. O. Wilson [9] has written, in reference to fish schooling, "In theory at least, individual members of the school can profit from the discoveries and previous experience of all other members of the school during the search for food. This advantage can become decisive, outweighing the disadvantages of competition for food items, whenever the resource is unpredictably distributed in patches" (p. 209). This statement suggests that social sharing of information among conspecifics offers an evolutionary advantage; this hypothesis was fundamental to the development of particle swarm optimization.
One motive for developing the simulation was to model human social behavior, which is of course not identical to fish schooling or bird flocking. One important difference is its abstractness. Birds and fish adjust their physical movement to avoid predators, seek food and mates, optimize environmental parameters such as temperature, etc. Humans adjust not only physical movement but cognitive or experiential variables as well. We do not usually walk in step and turn in unison (though some fascinating research in human conformity shows that we are capable of it); rather, we tend to adjust our beliefs and attitudes to conform with those of our social peers.
This is a major distinction in terms of contriving a computer simulation, for at least one obvious reason: collision. Two individuals can hold identical attitudes and beliefs without banging together, but two birds cannot occupy the same position in space without colliding. It seems reasonable, in discussing human social behavior, to map the concept of change into the bird/fish analog of movement. This is consistent with the classic Aristotelian view of qualitative and quantitative change as types of movement. Thus, besides moving through three-dimensional physical space, and avoiding collisions, humans change in abstract multidimensional space, collision-free. Physical space of course affects informational inputs, but it is arguably a trivial component of psychological experience. Humans learn to avoid physical collision by an early age, but navigation of n-dimensional psychosocial space requires decades of practice, and many of us never seem to acquire quite all the skills we need!
3 PRECURSORS: THE ETIOLOGY OF PARTICLE SWARM OPTIMIZATION
The particle swarm optimizer is probably best presented by explaining its conceptual development. As mentioned above, the algorithm began as a simulation of a simplified social milieu. Agents were thought of as collision-proof birds, and the original intent was to graphically simulate the graceful but unpredictable choreography of a bird flock.
3.1 Nearest Neighbor Velocity Matching and Craziness
A satisfying simulation was rather quickly written, which relied on two props: nearest-neighbor velocity matching and "craziness." A population of birds was randomly initialized with a position for each on a torus pixel grid and with X and Y velocities. At each iteration a loop in the program determined, for each agent (a more appropriate term than bird), which other agent was its nearest neighbor, then assigned that agent's X and Y velocities to the agent in focus. Essentially this simple rule created a synchrony of movement.
Unfortunately, the flock quickly settled on a unanimous, unchanging direction. Therefore, a stochastic variable called craziness was introduced. At each iteration some change was added to randomly chosen X and Y velocities. This introduced enough variation into the system to give the simulation an interesting and "lifelike" appearance, though of course the variation was wholly artificial.
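The two props can be sketched in a few lines. The following is an illustrative reconstruction, not the authors' original code; the array layout, the uniform craziness distribution, and its magnitude are assumptions:

```python
import numpy as np

def step_flock(pos, vel, craziness=1.0, rng=np.random.default_rng(0)):
    """One iteration: each agent copies its nearest neighbor's velocity,
    then a random 'craziness' perturbation is added to every velocity."""
    # pairwise distances; an agent is never its own nearest neighbor
    d = np.linalg.norm(pos[:, None] - pos[None, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nearest = d.argmin(axis=1)
    vel = vel[nearest]                                     # velocity matching
    vel = vel + rng.uniform(-craziness, craziness, vel.shape)  # craziness
    return pos + vel, vel
```

With craziness set to zero, the flock collapses onto a shared direction exactly as described above; the random term is what keeps the motion lifelike.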
3.2 The Cornfield Vector
Heppner's bird simulations had a feature which introduced a dynamic force into the simulation. His birds flocked around a "roost," a position on the pixel screen that attracted them until they finally landed there. This eliminated the need for a variable like craziness, as the simulation took on a life of its own. While the idea of a roost was intriguing, it led to another question which seemed even more stimulating. Heppner's birds knew where their roost was, but in real life birds land on any tree or telephone wire that meets their immediate needs. Even more importantly, bird flocks land where there is food. How do they find food? Anyone who has ever put out a bird feeder knows that within hours a great number of birds will likely find it, even though they had no previous knowledge of its location, appearance, etc. It seems possible that something about the flock dynamic enables members of the flock to capitalize on one another's knowledge, as in Wilson's quote above.
The second variation of the simulation defined a "cornfield vector," a two-dimensional vector of XY coordinates on the pixel plane. Each agent was programmed to evaluate its present position in terms of the equation:

Eval = sqrt((presentx - 100)^2) + sqrt((presenty - 100)^2)

so that at the (100,100) position the value was zero.
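Written out directly (a sketch, using the paper's presentx/presenty names; note each square root of a square reduces to an absolute value):

```python
import math

def eval_cornfield(presentx, presenty):
    # zero exactly at the simulated cornfield (100, 100), growing with distance
    return math.sqrt((presentx - 100) ** 2) + math.sqrt((presenty - 100) ** 2)
```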
Each agent "remembered" the best value and the XY position which had resulted in that value. The value was called pbest[] and the positions pbestx[] and pbesty[] (brackets indicate that these are arrays, with number of elements = number of agents). As each agent moved through the pixel space evaluating positions, its X and Y velocities were adjusted in a simple manner. If it was to the right of its pbestx, then its X velocity (call it vx) was adjusted negatively by a random amount weighted by a parameter of the system: vx[] = vx[] - rand()*p_increment. If it was to the left of pbestx, rand()*p_increment was added to vx[]. Similarly, Y velocities vy[] were adjusted up and down, depending on whether the agent was above or below pbesty.
Secondly, each agent "knew" the globally best position that one member of the flock had found, and its value. This was accomplished by simply assigning the array index of the agent with the best value to a variable called gbest, so that pbestx[gbest] was the group's best X position, and pbesty[gbest] its best Y position, and this information was available to all flock members. Again, each member's vx[] and vy[] were adjusted as follows, where g_increment is a system parameter:

if presentx[] > pbestx[gbest] then vx[] = vx[] - rand()*g_increment
if presentx[] < pbestx[gbest] then vx[] = vx[] + rand()*g_increment
if presenty[] > pbesty[gbest] then vy[] = vy[] - rand()*g_increment
if presenty[] < pbesty[gbest] then vy[] = vy[] + rand()*g_increment
In the simulation, a circle marked the (100,100) position on the pixel field, and agents were represented as colored points. Thus an observer could watch the flocking agents circle around until they found the simulated cornfield. The results were surprising. With p_increment and g_increment set relatively high, the flock seemed to be sucked violently into the cornfield. In a very few iterations the entire flock, usually 15 to 30 individuals, was seen to be clustered within the tiny circle surrounding the goal. With p_increment and g_increment set low, the flock swirled around the goal, realistically approaching it, swinging out rhythmically with subgroups synchronized, and finally "landing" on the target.
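The pbest and gbest rules of this section combine into a single per-iteration update. The sketch below is an illustrative reconstruction; the evaluation function, increment values, and list-based layout are assumptions, not the authors' code:

```python
import random

def f(x, y):
    """Cornfield-style evaluation (sketch): zero at (100, 100)."""
    return abs(x - 100) + abs(y - 100)

def step(px, py, vx, vy, pbx, pby, pbest, gbest, p_inc=0.5, g_inc=0.5):
    """One iteration of the Section 3.2 rules; returns the new gbest index."""
    for i in range(len(px)):
        # nudge velocity back toward the agent's own best position...
        vx[i] += -random.random() * p_inc if px[i] > pbx[i] else random.random() * p_inc
        vy[i] += -random.random() * p_inc if py[i] > pby[i] else random.random() * p_inc
        # ...and toward the best position any flock member has found
        vx[i] += -random.random() * g_inc if px[i] > pbx[gbest] else random.random() * g_inc
        vy[i] += -random.random() * g_inc if py[i] > pby[gbest] else random.random() * g_inc
        px[i] += vx[i]
        py[i] += vy[i]
        val = f(px[i], py[i])
        if val < pbest[i]:  # remember each agent's personal best
            pbest[i], pbx[i], pby[i] = val, px[i], py[i]
    return min(range(len(px)), key=lambda i: pbest[i])
```

Iterating step() and feeding the returned index back in as gbest reproduces the flock-to-cornfield behavior described above.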

3.3 Eliminating Ancillary Variables
Once it was clear that the paradigm could optimize simple, two-dimensional, linear functions, it was important to identify the parts of the paradigm that are necessary for the task. For instance, the authors quickly found that the algorithm works just as well, and looks just as realistic, without craziness, so it was removed. Next it was shown that optimization actually occurs slightly faster when nearest-neighbor velocity matching is removed, though the visual effect is changed. The flock is now a swarm, but it is well able to find the cornfield.
The variables pbest and gbest and their increments are both necessary. Conceptually pbest resembles autobiographical memory, as each individual remembers its own experience (though only one fact about it), and the velocity adjustment associated with pbest has been called "simple nostalgia" in that the individual tends to return to the place that most satisfied it in the past. On the other hand, gbest is conceptually similar to publicized knowledge, or a group norm or standard, which individuals seek to attain. In the simulations, a high value of p_increment relative to g_increment results in excessive wandering of isolated individuals through the problem space, while the reverse (relatively high g_increment) results in the flock rushing prematurely toward local minima. Approximately equal values of the two increments seem to result in the most effective search of the problem domain.
3.4 Multidimensional Search
While the algorithm seems to impressively model a flock searching for a cornfield, most interesting optimization problems are neither linear nor two-dimensional. Since one of the authors' objectives is to model social behavior, which is multidimensional and collision-free, it seemed a simple step to change presentx and presenty (and of course vx[] and vy[]) from one-dimensional arrays to D x N matrices, where D is any number of dimensions and N is the number of agents.
Multidimensional experiments were performed, using a nonlinear, multidimensional problem: adjusting weights to train a feedforward multilayer perceptron neural network (NN). One of the authors' first experiments involved training weights for a three-layer NN solving the exclusive-or (XOR) problem. This problem requires two input and one output processing elements (PEs), plus some number of hidden PEs. Besides connections from the previous layer, the hidden and output PE layers each has a bias PE associated with it. Thus a 2,3,1 NN requires optimization of 13 parameters. This problem was approached by flying the agents through 13-dimensional space until an average sum-squared error per PE criterion was met. The algorithm performed very well on this problem. The thirteen-dimensional XOR network was trained, to an e < 0.05 criterion, in an average of 30.7 iterations with 20 agents. More complex NN architectures took longer, of course, but results, discussed in Section 5: Results and Early Applications, were still very good.
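A hedged sketch of this experiment: the 13 weights of a 2,3,1 network (with bias PEs) are treated as a particle's position, and the swarm is flown through weight space. The network layout, initialization range, iteration budget, and velocity clamp below are illustrative assumptions, and the velocity rule used is the simplified one given later in Section 3.6, not necessarily the exact variant used in the original experiment:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0.0, 1.0, 1.0, 0.0])  # XOR truth table

def xor_error(w):
    """Average squared output error of a 2,3,1 network; w has 13 entries."""
    W1 = w[:9].reshape(3, 3)   # 3 hidden PEs x (2 inputs + 1 bias)
    W2 = w[9:]                 # 1 output PE x (3 hidden + 1 bias)
    h = sigmoid(X @ W1[:, :2].T + W1[:, 2])
    y = sigmoid(h @ W2[:3] + W2[3])
    return float(np.mean((y - T) ** 2))

rng = np.random.default_rng(7)
n_agents, dim = 20, 13
x = rng.uniform(-1, 1, (n_agents, dim))   # particle positions = weight vectors
v = np.zeros((n_agents, dim))
pbx = x.copy()
pb = np.array([xor_error(w) for w in x])
for _ in range(1000):
    g = int(pb.argmin())                  # index of the group's best particle
    v += (2 * rng.random((n_agents, dim)) * (pbx - x)
          + 2 * rng.random((n_agents, dim)) * (pbx[g] - x))
    v = np.clip(v, -4, 4)                 # Vmax clamp (an assumption) bounds velocities
    x += v
    errs = np.array([xor_error(w) for w in x])
    better = errs < pb
    pb[better] = errs[better]
    pbx[better] = x[better]
```

After the loop, pb.min() holds the best average squared error found; with this setup the swarm typically drives it well below the random-initialization level.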
3.5 Acceleration by Distance
Though the algorithm worked well, there was something aesthetically displeasing and hard to understand about it. Velocity adjustments were based on a crude inequality test: if presentx > bestx, make it smaller; if presentx < bestx, make it bigger. Some experimentation revealed that further revising the algorithm made it easier to understand and improved its performance. Rather than simply testing the sign of the inequality, velocities were adjusted according to their difference, per dimension, from best locations:

vx[][] = vx[][] + rand()*p_increment*(pbestx[][] - presentx[][])

(Note the parameters vx and presentx have two sets of brackets because they are now matrices of agents by dimensions; increment and bestx could also have a g instead of p at their beginnings.)
3.6 Current Simplified Version
It was soon realized that there is no good way to guess whether p_increment or g_increment should be larger. Thus, these terms were also stripped out of the algorithm. The stochastic factor was multiplied by 2 to give it a mean of 1, so that agents would "overfly" the target about half the time. This version outperforms the previous versions. Further research will show whether there is an optimum value for the constant currently set at 2, whether the value should be evolved for each problem, or whether the value can be determined from some knowledge of a particular problem. The current simplified particle swarm optimizer now adjusts velocities by the following formula:

vx[][] = vx[][] + 2*rand()*(pbestx[][] - presentx[][]) + 2*rand()*(pbestx[][gbest] - presentx[][])
3.7 Other Experiments

Other variations on the algorithm were tried, but none seemed to improve on the current simplified version. For instance, it is apparent that the agent is propelled toward a weighted average of the two "best" points in the problem space. One version of the algorithm reduced the two terms to one, which was the point on each dimension midway between pbest and gbest positions. This version had an unfortunate tendency, however, to converge on that point whether it was an optimum or not. Apparently the two stochastic "kicks" are a necessary part of the process.
Another version considered using two types of agents, conceived as "explorers" and "settlers." Explorers used the inequality test, which tended to cause them to overrun the target by a large distance, while settlers used the difference term. The hypothesis was that explorers would extrapolate outside the "known" region of the problem domain, and the settlers would hill-climb or micro-explore regions that had been found to be good. Again, this method showed no improvement over the current simplified version. Occam's razor slashed again.
Another version that was tested removed the momentum of vx[][]. The new adjustment was:

vx[][] = 2*rand()*(pbestx[][] - presentx[][]) + 2*rand()*(pbestx[][gbest] - presentx[][])
This version, though simplified, turned out to be quite ineffective at finding global optima.
4 SWARMS AND PARTICLES
As was described in Section 3.3, it became obvious during the simplification of the paradigm that the behavior of the population of agents is now more like a swarm than a flock. The term swarm has a basis in the literature. In particular, the authors use the term in accordance with a paper by Millonas [6], who developed his models for applications in artificial life, and articulated five basic principles of swarm intelligence. First is the proximity principle: the population should be able to carry out simple space and time computations. Second is the quality principle: the population should be able to respond to quality factors in the environment. Third is the principle of diverse response: the population should not commit its activities along excessively narrow channels. Fourth is the principle of stability: the population should not change its mode of behavior every time the environment changes. Fifth is the principle of adaptability: the population must be able to change behavior mode when it is worth the computational price.

REFERENCES

Holland, J. H., Adaptation in Natural and Artificial Systems.
Eberhart, R. and Kennedy, J., "A new optimizer using particle swarm theory."
Reynolds, C. W., "Flocks, herds and schools: A distributed behavioral model."
Davis, L. (ed.), Handbook of Genetic Algorithms.