Journal ArticleDOI

Adaptive step size random search

01 Jun 1968-IEEE Transactions on Automatic Control (IEEE)-Vol. 13, Iss: 3, pp 270-276
TL;DR: A practical adaptive step size random search algorithm is proposed, and experimental experience shows the superiority of random search over other methods for sufficiently high dimension.
Abstract: Fixed step size random search for minimization of functions of several parameters is described and compared with the fixed step size gradient method for a particular surface. A theoretical technique, using the optimum step size at each step, is analyzed. A practical adaptive step size random search algorithm is then proposed, and experimental experience is reported that shows the superiority of random search over other methods for sufficiently high dimension.


Adaptive Step Size Random Search
Abstract—Fixed step size random search for minimization of functions of several parameters is described and compared with the fixed step size gradient method for a particular surface. A theoretical technique, using the optimum step size at each step, is analyzed. A practical adaptive step size random search algorithm is then proposed, and experimental experience is reported that shows the superiority of random search over other methods for sufficiently high dimension.
INTRODUCTION

THE PROBLEM of locating the minimum of a function of several variables is one that arises frequently in many areas of technology, particularly in the design of adaptive control and communication systems. The problem is: given the quality function Q(X), where X is a vector of adjustable parameters x_1, ..., x_n, find the value of X that minimizes Q. The following assumptions are made.

1) Q is unimodal. If it is not, a global search can be carried out first to partition the parameter space into regions where Q is unimodal.

2) The structure of the function Q(X) is completely unknown. The only way that information can be obtained is by evaluating Q at specific points. This means, for example, that derivatives of Q are not directly measurable (if they exist at all).

3) The only significant cost involved in the operation of a search procedure results from evaluating Q. Therefore, the fewer function evaluations required, the more desirable is the procedure.

Of course, any strategy suggested for solving the preceding problem can be evaluated only for a specific surface or class of surfaces. Analyses in this paper will be restricted to hyperspherical surfaces, and experimental results will be given for other surfaces as well.

Besides the many deterministic minimization algorithms developed, dating back to such classical methods as steepest descent, the Newton-Raphson method, and other gradient procedures, Brooks[1] and Rastrigin[2],[3] have suggested randomized search strategies.
Manuscript received October 30, 1967; revised January 19, 1968. This work was partially supported by the Army Research Office-Durham under Contract DA-31-124-ARO-D-292, by a National Science Foundation Graduate Fellowship, and by a Research Grant from the Bendix Corporation. It made use of computer facilities supported in part by the National Science Foundation under Grant NSF-GP-570.
Rastrigin has compared a fixed step size random search (FSSRS) method with a fixed step size gradient method and concluded that under certain circumstances FSSRS is superior. It is clear, however, that if the step size of the random search method were optimum at each step, even better performance would result. In this paper, a hypothetical random search method that uses the optimum step size at each point will be analyzed for a hyperspherical surface. An adaptive step size random search (ASSRS) method will then be proposed that approximates the performance of the optimum step size random search (OSSRS) procedure.
FIXED STEP SIZE RANDOM SEARCH (FSSRS)
The algorithm for FSSRS is

X_{i+1} = X_i − a_i ΔX_i + ΔX_{i+1}    (1)

where X_i is the position in state space at the ith instant and ΔX_i is a random vector of length s, which is distributed uniformly over the hypersphere of radius s whose center is at the origin. The coefficient a_i is given by

a_i = 0 if Q_i ≤ Q_{i−1}^+,   a_i = 1 if Q_i > Q_{i−1}^+    (2)

where

Q_{i−1}^+ = min_{j=1,2,...,i−1} Q_j

is the smallest value of the quality function to be observed in the first i − 1 steps. The coefficient a_i serves to negate the effect of an unsuccessful step. Rastrigin[2],[3] has analyzed this algorithm for s = 1 and the function considered in his papers.
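For concreteness, the following is a minimal Python sketch of FSSRS as defined by (1) and (2). It is an illustration added to this transcription, not code from the paper; the function names, default values, and the fixed evaluation budget used as a stopping rule are assumptions.

```python
import numpy as np

def random_step(n, s, rng):
    """A random vector of length s with uniformly distributed direction (the Delta-X of eq. (1))."""
    v = rng.standard_normal(n)
    return s * v / np.linalg.norm(v)

def fssrs(Q, x0, s=0.1, max_evals=10_000, rng=None):
    """Fixed step size random search: a trial step is kept only if it improves on the best
    quality value observed so far (a_i = 0); otherwise it is undone (a_i = 1)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    q_best = Q(x)                          # quality at the starting point
    for _ in range(max_evals):
        x_trial = x + random_step(len(x), s, rng)
        q_trial = Q(x_trial)
        if q_trial <= q_best:              # successful step: keep it
            x, q_best = x_trial, q_trial
        # unsuccessful step: remain at x (the step is negated)
    return x, q_best
```

For instance, fssrs(lambda x: float(np.sum(x**2)), np.ones(10), s=0.3) runs the search on the hyperspherical surface considered below.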
To motivate the development of a random search algorithm that adapts the step size s to the situation, the performance of FSSRS as a function of s is considered. Attention will be restricted to the function

Q(X) = Σ_{i=1}^{n} x_i²    (3)

which is smooth at its extremum and is more representative of problems with a minimum square error criterion than the function considered by Rastrigin. Otherwise, the analysis in this section is the same as Rastrigin's.

M. A. Schumer is with the Research Div., Raytheon Co., Waltham, Mass. 02154. K. Steiglitz is with the Dept. of Electrical Engineering, Princeton University, Princeton, N.J. 08540.

SCHUMER
AI\TD
STEIGLITZ: ADAPTIVE STEP
SIZE
RANDOM
SEARCH
271
Fig. 1. A cross section of parameter space.
Consider the plane formed by the displacement vector ΔX and the gradient vector through the starting point A (Fig. 1). φ is the angle between the displacement vector and the negative gradient direction. φ_0 is the largest value of φ for which there is an improvement as the result of the step ΔX. For the assumed uniform distribution of displacement, the probability density for φ, considering φ only on [0, π] (due to symmetry), is (see Rastrigin and Mutseniyeks[4] and entries 858.45 and 858.46 of [5])

p(φ) = sin^{n−2}φ / (2 ∫_0^{π/2} sin^{n−2}ψ dψ) = [Γ(n − 1) / (2^{n−2} Γ²((n − 1)/2))] sin^{n−2}φ.    (4)
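The following short Monte Carlo check is an illustration added to this transcription, not part of the paper: it samples uniformly distributed step directions and verifies that the angle to a fixed axis follows the sin^{n−2}φ density of (4). The dimension, sample size, and seed are arbitrary.

```python
import numpy as np

n, trials = 10, 200_000
rng = np.random.default_rng(0)
u = rng.standard_normal((trials, n))
u /= np.linalg.norm(u, axis=1, keepdims=True)       # uniform directions on the unit hypersphere
phi = np.arccos(np.clip(u[:, 0], -1.0, 1.0))        # angle to a fixed axis (e.g., the negative gradient)

hist, edges = np.histogram(phi, bins=50, range=(0.0, np.pi), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
theory = np.sin(centers) ** (n - 2)
theory /= theory.sum() * (centers[1] - centers[0])  # normalize as in (4)
print(float(np.max(np.abs(hist - theory))))         # small deviation for large `trials`
```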
Search loss is defined as twice the ratio of the quality function to the expected value of the improvement per function evaluation. The search loss for the fixed step size gradient technique is then

2(n + 1) / (η(2 − η))    (5)

where η = s/ρ, the ratio of step size to distance to the minimum. The search loss for FSSRS is found to be

L_r(n, η) = 2 ∫_0^π sin^{n−2}φ dφ / ∫_0^{φ_0} (2η cos φ − η²) sin^{n−2}φ dφ    (6)

and φ_0 is equal to cos⁻¹(η/2). Equations (5) and (6) are derived in Appendix I.
Fig. 2. Tradeoff between FSSRS and fixed step size gradient technique. The gradient method is superior above the boundary and FSSRS is superior below the boundary.

Fig. 2 shows the relative behavior of the random search and gradient methods for different values of η⁻¹ and dimension n. Above the boundary random search is superior (has a smaller search loss) to the gradient method, while below the boundary the gradient technique is superior. For n less than 4, the gradient technique is always superior, but for higher dimension random search is superior for small η.
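As an added illustration (not from the paper), the following sketch evaluates both search losses numerically, using the expressions (5)-(7) as reconstructed above with simple numerical integration in place of closed forms; the chosen step-size ratio and dimensions are arbitrary.

```python
import numpy as np

def improvement(n, eta, pts=20_000):
    """Normalized expected improvement of FSSRS on Q = rho^2, eq. (7), by numerical integration."""
    phi = np.linspace(0.0, np.pi, pts)
    w = np.sin(phi) ** (n - 2)
    phi0 = np.arccos(eta / 2.0)
    gain = np.where(phi <= phi0, (2.0 * eta * np.cos(phi) - eta**2) * w, 0.0)
    return gain.sum() / w.sum()

def loss_gradient(n, eta):
    return 2.0 * (n + 1) / (eta * (2.0 - eta))      # eq. (5)

def loss_fssrs(n, eta):
    return 2.0 / improvement(n, eta)                # eq. (6), using L_r = 2/I

eta = 0.1                                           # small step relative to distance to the minimum
for n in (2, 10, 50):
    print(n, round(loss_gradient(n, eta), 1), round(loss_fssrs(n, eta), 1))
# For this small eta, the gradient method wins at low n and FSSRS wins at higher n, as in Fig. 2.
```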
OPTIMUM STEP SIZE RANDOM SEARCH (OSSRS)
If the step size for FSSRS is very small, the probability of improvement is approximately one half, but the improvement is very small for a successful step, and this results in a small average improvement. On the other hand, if the step size is made too large, the step will overshoot the minimum and the probability of improvement will be extremely small, also resulting in a very small average improvement. Somewhere between these extremes lies an optimum step size, i.e., a step size for which the probability of improvement of the quality function is not one half, but lies between zero and one half.
The expected improvement, normalized by the present value of Q, i.e., I = −E{ΔQ}/Q, is equal to 2/L_r(n, η) and is given by

I(n, η) = ∫_0^{φ_0} (2η cos φ − η²) sin^{n−2}φ dφ / (2 ∫_0^{π/2} sin^{n−2}φ dφ).    (7)
To maximize I, the right-hand side of (7) is differentiated with respect to η and set equal to zero, with the following equation for the optimal value of η resulting:

η_opt = ∫_0^{φ_0} cos φ sin^{n−2}φ dφ / ∫_0^{φ_0} sin^{n−2}φ dφ,   φ_0 = cos⁻¹(η_opt/2).    (8)

Upon making the appropriate approximations, the following asymptotic expressions are found for large n (see Appendix II):

272
IEEE
TRANSACTIONS
ON
AUTOMATIC
CONTROL,
JUNE
1968
η_opt ≈ 1.225/√n    (9)

I_opt ≈ 0.406/n.    (10)
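A brute-force numerical check of (9) and (10) is given below; it is an added illustration, it relies on the reconstructed integral (7), and the grid of η values and the dimensions tried are arbitrary.

```python
import numpy as np

def improvement(n, eta, pts=20_000):
    """Normalized expected improvement I(n, eta) of eq. (7), by numerical integration."""
    phi = np.linspace(0.0, np.pi, pts)
    w = np.sin(phi) ** (n - 2)
    phi0 = np.arccos(eta / 2.0)
    gain = np.where(phi <= phi0, (2.0 * eta * np.cos(phi) - eta**2) * w, 0.0)
    return gain.sum() / w.sum()

for n in (10, 40, 160):
    etas = np.linspace(0.01, 1.0, 500)
    vals = np.array([improvement(n, e) for e in etas])
    k = int(np.argmax(vals))
    # columns: n, numerically optimal eta, 1.225/sqrt(n), n * I_opt (tends toward ~0.406)
    print(n, round(etas[k], 3), round(1.225 / np.sqrt(n), 3), round(n * vals[k], 3))
```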
If the normalized expected improvement I is proportional to 1/n (see (10)), and if the normalized improvements are independent, as is true for OSSRS, then the average number of function evaluations for a fixed desired accuracy (relative to the starting value) is asymptotically linear in n, as is now demonstrated. Suppose that the search starts at a point at which the value of the quality function is Q_0, and that it is desired to terminate when the quality function reaches a final value of Q_f. The value Q_j of the quality function after j steps can be expressed recursively as

Q_j = Q_{j−1}(1 − i_j)    (11)

where i_j is the normalized improvement at the jth step. Thus

Q_M = Q_0 ∏_{j=1}^{M} (1 − i_j).    (12)

Taking the expected value of both sides, in light of the preceding assumptions, results in

E{Q_M} = Q_0 (1 − k/n)^M    (13)

where k is the constant of proportionality. Solving for the value of M for which E{Q_M} is equal to the desired final value Q_f results in

M = log(Q_f/Q_0) / log(1 − k/n).

Thus the asymptotic expression for large n becomes

M ≈ (constant) · n    (14)

where the constant is equal to (−1/k) log(Q_f/Q_0).
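As a rough illustrative calculation (added here; it assumes natural logarithms and the OSSRS constant k ≈ 0.406 from (10)), reducing the quality function by eight orders of magnitude would take on the order of

$$ M \approx \frac{n}{k}\,\ln\frac{Q_0}{Q_f} = \frac{\ln 10^{8}}{0.406}\, n \approx 45\, n $$

function evaluations, the same order of magnitude as the experimentally observed ICALL ≈ 80n reported for ASSRS below.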
PRACTICAL ALGORITHM FOR ADAPTIVE STEP SIZE RANDOM SEARCH (ASSRS)
OSSRS is a theoretical model and the optimum step size cannot be found without additional experimentation. One way to construct a practical algorithm would be to try numerous exploratory random steps from the same point, each with the same step size, and to repeat this procedure for a number of different step sizes. From these results, the optimum step size could be estimated and this estimate used. However, none of the intermediate exploratory steps would produce any improvement. In the ASSRS algorithm, no attempt is made to estimate the optimum step size accurately.
Fig. 3. Flow diagram for ASSRS. I1 counts the total number of iterations through the loop; I2 counts the number of successive failures.
Instead, the optimum is tracked in an approximate fashion, and each step is both exploratory and able to produce an improvement. A nominal value, s, for the step size is chosen before each iteration. A random step of size s is taken and a random step of size s(1 + a) is taken (1 > a > 0), and the resultant normalized improvements are compared. The step size that produces the larger improvement is chosen as the nominal step size for the next iteration. If neither step causes an improvement, the step size remains unchanged; and if this occurs for some number of iterations, the step size is reduced. Thus on the average the algorithm adjusts to the direction of the best step size. In addition, each time some large number of iterations has passed, a step with nominal step size is compared, in the same manner, with a step of much larger size. Again, the step size that produces the larger improvement is chosen as the new nominal step size. This test serves as a deterrent against the possibility that the step size has inadvertently become too small. It is also helpful in the case in which the I vs. step size curve has more than a single local maximum. In such a case the search procedure could be chasing a small local maximum; and a large change in step size would make it possible to detect and begin adaptation to a higher local maximum. A flow diagram for the ASSRS algorithm is shown in Fig. 3.
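The following Python sketch of the step-size adaptation just described is an illustration added to this transcription, not the authors' code. The parameter names, the failure limit, the shrink factor, the large-step schedule, and the fixed evaluation budget used in place of the stopping criterion of Fig. 3 are all assumed values.

```python
import numpy as np

def random_step(n, size, rng):
    """A random vector of the given length with uniformly distributed direction."""
    v = rng.standard_normal(n)
    return size * v / np.linalg.norm(v)

def assrs(Q, x0, s=0.5, a=0.5, shrink=0.5, fail_limit=30,
          big_every=100, big_factor=10.0, max_evals=20_000, rng=None):
    """Adaptive step size random search in the spirit of Fig. 3 (parameter values are illustrative)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    q = Q(x)
    evals, fails, iters = 1, 0, 0
    while evals + 2 <= max_evals:
        iters += 1
        # normally compare step sizes s and s*(1 + a); occasionally try a much larger
        # step instead, as a guard against s becoming inadvertently too small
        s_alt = s * big_factor if iters % big_every == 0 else s * (1.0 + a)
        trials = []
        for size in (s, s_alt):
            xt = x + random_step(len(x), size, rng)
            qt = Q(xt)
            evals += 1
            trials.append((q - qt, size, xt, qt))   # (improvement, step size, point, value)
        impr, size, xt, qt = max(trials, key=lambda t: t[0])
        if impr > 0.0:
            x, q, s, fails = xt, qt, size, 0        # accept the better step and adopt its size
        else:
            fails += 1                              # neither step improved: stay put
            if fails >= fail_limit:
                s *= shrink                         # too many successive failures: reduce s
                fails = 0
    return x, q
```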

SCHUMER
AND
STEIGLITZ:
ADAPTIVE
STEP SIZE
RANDOM
SEARCH
273
EXPERIMENTAL RESULTS
The ASSRS algorithm was tested on the IBM 7094 computer for a number of test functions. The results of these experiments are presented here.

The quality function Q = ρ² was tested to see how the ASSRS algorithm compared to the Newton-Raphson method and to OSSRS. For each dimension from 1 to 40, fifteen independent trials were run. The stopping criterion was Q < 10⁻⁸ and the starting point was (1, 1, ..., 1). The resulting average number of required function evaluations is well described by ICALL = 80n.¹
Since derivatives are not available, partial derivatives must be measured approximately by taking finite differences, and the number of function evaluations per iteration for the Newton-Raphson method can be found as follows. To approximate the gradient vector, a minimum of n finite differences are needed. To find the diagonal terms of the Hessian matrix, 2n additional function evaluations are needed to estimate the partial derivatives at a second point. The Hessian matrix is symmetric, so that h_ij = h_ji. Thus one half of the off-diagonal terms, or (n² − n)/2 more second partial derivatives, are required. This requires n² − n additional function evaluations. Also, one more function evaluation occurs with the final move. Thus the total number of function evaluations required per iteration of the Newton-Raphson method is n + 2n + (n² − n) + 1 = (n + 1)². Assuming that the partial derivatives can be determined exactly by finite differences, only one iteration is required for this function and the number of function evaluations required for the Newton-Raphson method is (n + 1)².
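The itemized count can be checked in a few lines of code (an added illustration; the function name is arbitrary):

```python
def newton_raphson_evals_per_iteration(n: int) -> int:
    """Function evaluations per finite-difference Newton-Raphson iteration, as itemized in the text:
    n for the gradient, 2n for the diagonal Hessian terms, n**2 - n for the off-diagonal terms,
    and 1 for the final move."""
    return n + 2 * n + (n**2 - n) + 1

assert all(newton_raphson_evals_per_iteration(n) == (n + 1) ** 2 for n in range(1, 101))
```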
The results for ASSRS and for the Newton-Raphson method are shown in Fig. 4. Extrapolating these curves, their intersection is at n = 78, beyond which dimension ASSRS is superior. If the fact that the partial derivatives cannot be measured exactly (which means that additional iterations are required) is taken into account, the Newton-Raphson curve becomes higher and the intersection with the ASSRS curve occurs at a smaller value of n. Also, if a lesser degree of accuracy is required, the ASSRS curve has a smaller slope while the Newton-Raphson curve is unchanged. This also results in a smaller value of n at the intersection of the curves, and ASSRS is then superior at a smaller value of dimension.
As an indication of the variance in ICALL associated with ASSRS, the standard deviation for n = 100 was found to be 502, as compared with the mean of 7677.

On the basis of the experimental results, the average normalized improvement per step, I, was calculated for Q = ρ² and was found to be asymptotic to 0.2725/n. Thus, although the value of k is smaller than for OSSRS, the asymptotic form of I is still k/n; and the number of function calls was found to be asymptotically proportional to n, as it is for OSSRS [see (14)].
¹ The variable ICALL represents the number of function evaluations performed during a given minimization procedure.
Fig. 4. Average number of function evaluations vs. dimension for Q = Σ_{i=1}^n x_i², for ASSRS (circles) and the Newton-Raphson method (squares).
Fig. 5. Average number of function evaluations vs. dimension for Q = Σ_{i=1}^n x_i⁴, for ASSRS (circles), the Newton-Raphson method (squares), and the simplex method (triangles).
Fig. 5 shows the results for

Q = Σ_{i=1}^{n} x_i⁴

averaged for 22 independent experiments for each dimension from 1 through 40. Again the starting point was (1, 1, ..., 1). The stopping criterion was Q < 0.5 × 10⁻⁸. The Newton-Raphson method is again assumed to be able to measure derivatives without error, but the fact that Q is not quadratic is taken into account by multiplying the number of iterations required by (n + 1)² in order to find the number of function calls.
The results for the simplex method are those of Nelder and Mead,[6] who found that, for n = 1 through 10, the number of function evaluations needed for the simplex method is well described by 3.16(n + 1)^2.11. This formula was extrapolated to higher dimensions and plotted along with the curves for ASSRS and Newton-Raphson.

274
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, JUNE
1968
Fig. 6. Average number of function evaluations vs. dimension for ASSRS for Q = Σ_{i=1}^n a_i x_i².
In this case, ASSRS is superior to Newton-Raphson for n > 2 and to the simplex method for n > 10. For ASSRS, ICALL is again proportional to n, and I is asymptotic to 0.427/n.
The quadratic form

Q = Σ_{i=1}^{n} a_i x_i²

was also used as a test function, with the coefficients a_1, a_2, ..., a_n randomly chosen from a uniform distribution on [0.1, 1]. The minimization was repeated twenty times for each dimension from 1 to 20 and six times for n = 100. The search was terminated when Q was less than one one-thousandth of its initial value. Fig. 6 shows the resulting average number of function evaluations required as a function of dimension. The result for 100 dimensions was ICALL = 3396. Again, ICALL was found to be approximately linear in n.
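This experiment can be reproduced in outline with the ASSRS sketch given earlier (an added illustration; it assumes the `assrs` function defined above is in scope, and the seed and dimension are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
a = rng.uniform(0.1, 1.0, size=n)          # coefficients a_i drawn uniformly from [0.1, 1]
Q = lambda x: float(np.sum(a * x**2))      # the quadratic-form test function
x0 = np.ones(n)
x, q = assrs(Q, x0, rng=rng)               # assumes the assrs() sketch defined above
print(q < 1e-3 * Q(x0))                    # was Q reduced below 0.001 of its initial value?
```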
Rastrigin[3] has done some work with random search algorithms that adapt only to the best direction with a fixed step size. The problem with this method is that convergence is only guaranteed to within a distance of one step size to the minimum, so that if a high accuracy is desired, the steps must be small, thus requiring many function evaluations. Adapting the step size seems to be more fruitful than adapting to the direction. A combination of the two methods, however, would seem to be in order and should be the subject of future investigations.
ASSRS was also tested using Rosenbrock's function Q = 100(x_2 − x_1²)² + (1 − x_1)², and although it converged, ASSRS was inferior to Rosenbrock's method[7] and to Powell's method[8] for this function. It should, therefore, be noted that ASSRS is not very effective as a ridge follower, but shows its superiority in multidimensional problems without narrow valleys or ridges. Combining directional adaptation with step size adaptation may result in removing this limitation.
CONCLUSION
In this paper the problem of minimizing a function of several parameters by the method of random search has been discussed. The fixed step size random search algorithm has been described and compared to the simple gradient technique on the basis of search loss. It has been shown that for n > 4, FSSRS is superior for sufficiently small η. Optimum step size random search was introduced and its performance investigated for hyperspherical surfaces.
A practical algorithm for adaptive step size random search has been described and compared with deterministic methods. For the functions

Q = Σ_{i=1}^{n} x_i²

Q = Σ_{i=1}^{n} x_i⁴

Q = Σ_{i=1}^{n} a_i x_i²

the number of function evaluations required for a desired accuracy for the deterministic methods increases at a rate that is proportional to at least the second power of n, and the computation time increases as n³. This is true for other classical methods besides the Newton-Raphson and simplex methods. However, for ASSRS the number of required function evaluations is proportional to n, and the computation time is proportional to n². The computation time could be made proportional to n if parallel computations were used. Thus the conclusion is reached that, despite its simplicity, adaptive random search is an attractive technique for problems with large numbers of dimensions.
APPENDIX I
EVALUATION OF SEARCH LOSS
The search loss for the fixed step size gradient method is found as follows. The gradient of Q is given by 2X; thus the step of size s to be taken is

ΔX = −sX/ρ

where ρ = |X| is the distance to the minimum. This results in a change in Q

ΔQ = s² − 2sρ    (15)

which is negative as long as ρ > s/2. To determine the correct descent direction, n measurements of the quality function are made, i.e., one in each of the n coordinate directions with a step size much smaller than s. Also, one function evaluation is made corresponding to the actual move of size s. Thus a total of n + 1 function evaluations are necessary for the improvement given by (15). Therefore, the search loss for the gradient method is

2Q(n + 1)/(2sρ − s²) = 2(n + 1)/(η(2 − η))

which is (5).

Citations
Journal ArticleDOI
TL;DR: Two general convergence proofs for random search algorithms are given and how these extend those available for specific variants of the conceptual algorithm studied here are shown.
Abstract: We give two general convergence proofs for random search algorithms. We review the literature and show how our results extend those available for specific variants of the conceptual algorithm studied here. We then exploit the convergence results to examine convergence rates and to actually design implementable methods. Finally we report on some computational experience.

1,550 citations


Cites background or methods from "Adaptive step size random search"

  • ...where p is the optimal step size, see [12]....

  • ...5 [12])....

  • ...Since the expected decrease in the function value is p2p [12] we see that the expected step in the direction of the solution is cp/n....

  • ...This "linearity" was first observed by Schumer and Steiglitz [12], the algorithm that they propose has K ≈ 80....

  • ...Schumer and Steiglitz [12] introduce adaptive step size methods; here Pk is increased or decreased depending on the number of successes or failures in finding lower values of f on S in the preceding iterations....

Journal ArticleDOI
TL;DR: The numerical performance of the BGA is demonstrated on a test suite of multimodal functions and the number of function evaluations needed to locate the optimum scales only as n ln(n) where n is the number of parameters.
Abstract: In this paper a new genetic algorithm called the Breeder Genetic Algorithm (BGA) is introduced. The BGA is based on artificial selection similar to that used by human breeders. A predictive model for the BGA is presented that is derived from quantitative genetics. The model is used to predict the behavior of the BGA for simple test functions. Different mutation schemes are compared by computing the expected progress to the solution. The numerical performance of the BGA is demonstrated on a test suite of multimodal functions. The number of function evaluations needed to locate the optimum scales only as n ln(n) where n is the number of parameters. Results up to n = 1000 are reported.

1,267 citations


Cites background from "Adaptive step size random search"

  • ...For rh = 1.225r/√n the expected progress was computed for large n and small r in (Schumer & Steiglitz, 1968) E(n; r)r = 0.2n (21). We now turn to normal distributed mutation....

  • ...The optimal normalized average progress for uniform distributed mutation decreases exponentially with n. The reason for this behavior is the well-known fact that the volume of the unit sphere in n dimensions goes to zero for n → ∞. Better results can be obtained if the uniform distribution is restricted to a hypersphere with radius rh. For rh = 1.225r/√n the expected progress was computed for large n and small r in Schumer and Steiglitz ......

Journal ArticleDOI
TL;DR: In this paper, the authors present a set of 175 benchmark functions for unconstrained optimization problems with diverse properties in terms of modality, separability, and valley landscape, which can be used for validation of new optimization in the future.
Abstract: Test functions are important to validate and compare the performance of optimization algorithms. There have been many test or benchmark functions reported in the literature; however, there is no standard list or set of benchmark functions. Ideally, test functions should have diverse properties so that can be truly useful to test new algorithms in an unbiased way. For this purpose, we have reviewed and compiled a rich set of 175 benchmark functions for unconstrained optimization problems with diverse properties in terms of modality, separability, and valley landscape. This is by far the most complete set of functions so far in the literature, and it can be expected that this complete set of functions can be used for validation of new optimization in the future.

944 citations

Journal ArticleDOI
TL;DR: In this paper, the authors present a set of 175 benchmark functions for unconstrained optimisation problems with diverse properties in terms of modality, separability, and valley landscape.
Abstract: Test functions are important to validate and compare the performance of optimisation algorithms. There have been many test or benchmark functions reported in the literature; however, there is no standard list or set of benchmark functions. Ideally, test functions should have diverse properties to be truly useful to test new algorithms in an unbiased way. For this purpose, we have reviewed and compiled a rich set of 175 benchmark functions for unconstrained optimisation problems with diverse properties in terms of modality, separability, and valley landscape. This is by far the most complete set of functions so far in the literature, and it can be expected that this complete set of functions can be used for validation of new optimisation in the future.

876 citations

Posted Content
TL;DR: The Square Attack is a score-based black-box attack that does not rely on local gradient information and thus is not affected by gradient masking, and can outperform gradient-based white-box attacks on the standard benchmarks achieving a new state-of-the-art in terms of the success rate.
Abstract: We propose the Square Attack, a score-based black-box $l_2$- and $l_\infty$-adversarial attack that does not rely on local gradient information and thus is not affected by gradient masking. Square Attack is based on a randomized search scheme which selects localized square-shaped updates at random positions so that at each iteration the perturbation is situated approximately at the boundary of the feasible set. Our method is significantly more query efficient and achieves a higher success rate compared to the state-of-the-art methods, especially in the untargeted setting. In particular, on ImageNet we improve the average query efficiency in the untargeted setting for various deep networks by a factor of at least $1.8$ and up to $3$ compared to the recent state-of-the-art $l_\infty$-attack of Al-Dujaili & O'Reilly. Moreover, although our attack is black-box, it can also outperform gradient-based white-box attacks on the standard benchmarks achieving a new state-of-the-art in terms of the success rate. The code of our attack is available at this https URL.

362 citations


Cites background from "Adaptive step size random search"

  • ...The Square Attack exploits random search [46,48] which is one of the simplest approaches for blackbox optimization....


  • ...Many variants of random search have been introduced [38,48,47], which differ mainly in how the random perturbation is chosen at each iteration (the original...


References
Journal ArticleDOI
TL;DR: A method is described for the minimization of a function of n variables, which depends on the comparison of function values at the (n + 1) vertices of a general simplex, followed by the replacement of the vertex with the highest value by another point.
Abstract: A method is described for the minimization of a function of n variables, which depends on the comparison of function values at the (n + 1) vertices of a general simplex, followed by the replacement of the vertex with the highest value by another point. The simplex adapts itself to the local landscape, and contracts on to the final minimum. The method is shown to be effective and computationally compact. A procedure is given for the estimation of the Hessian matrix in the neighbourhood of the minimum, needed in statistical estimation problems.

27,271 citations

Journal ArticleDOI
TL;DR: In the design of experiments for the purpose of seeking maxima, random methods have been shown to have an important place in the consideration of the experimenter as discussed by the authors, and the rationale, application, and relative merits of random methods are discussed.
Abstract: In the design of experiments for the purpose of seeking maxima, random methods are shown to have an important place in the consideration of the experimenter. For a rather large class of experimental situations, an elementary probability formulation leads to an exact statement for the number of trials required in the experiment. The rationale, application, and relative merits of random methods are discussed.

326 citations

Journal ArticleDOI
TL;DR: An iterative method which is not unlike the conjugate gradient method of Hestenes and Stiefel (1952), and which finds stationary values of a general function, which has second-order convergence.
Abstract: Eighteen months ago Rosenbrock (1960) published a paper in this journal on finding the greatest or least value of a function of several variables. A number of methods were listed and they all have first-order convergence. Six months ago Martin and Tee (1961) published a paper in which they mentioned gradient methods which have second-order convergence for finding the minimum of a quadratic positive definite function. In this paper will be described an iterative method which is not unlike the conjugate gradient method of Hestenes and Stiefel (1952), and which finds stationary values of a general function. It has second-order convergence, so near a stationary value it converges more quickly than Rosenbrock's variation of the steepest descents method and, although each iteration is rather longer because the method is applicable to a general function, the rate of convergence is comparable to that of the more powerful of the gradient methods described by Martin and Tee.

191 citations