Journal ArticleDOI

Adaptive step size random search

01 Jun 1968-IEEE Transactions on Automatic Control (IEEE)-Vol. 13, Iss: 3, pp 270-276
TL;DR: A practical adaptive step size random search algorithm is proposed, and experimental experience shows the superiority of random search over other methods for sufficiently high dimension.
Abstract: Fixed step size random search for minimization of functions of several parameters is described and compared with the fixed step size gradient method for a particular surface. A theoretical technique, using the optimum step size at each step, is analyzed. A practical adaptive step size random search algorithm is then proposed, and experimental experience is reported that shows the superiority of random search over other methods for sufficiently high dimension.


Adaptive Step Size Random Search
Abstract—Fixed step size random search for minimization of functions of several parameters is described and compared with the fixed step size gradient method for a particular surface. A theoretical technique, using the optimum step size at each step, is analyzed. A practical adaptive step size random search algorithm is then proposed, and experimental experience is reported that shows the superiority of random search over other methods for sufficiently high dimension.
INTRODUCTION

THE PROBLEM of locating the minimum of a function of several variables is one that arises frequently in many areas of technology, particularly in the design of adaptive control and communication systems. The problem is: given the quality function Q(X), where X is a vector of adjustable parameters x_1, ..., x_n, find the value of X that minimizes Q. The following assumptions are made.

1) Q is unimodal. If it is not, a global search can be carried out first to partition the parameter space into regions where Q is unimodal.

2) The structure of the function Q(X) is completely unknown. The only way that information can be obtained is by evaluating Q at specific points. This means, for example, that derivatives of Q are not directly measurable (if they exist at all).

3) The only significant cost involved in the operation of a search procedure results from evaluating Q. Therefore, the fewer function evaluations required, the more desirable is the procedure.

Of course, any strategy suggested for solving the preceding problem can be evaluated only for a specific surface or class of surfaces. Analyses in this paper will be restricted to hyperspherical surfaces, and experimental results will be given for other surfaces as well.

Besides the many deterministic minimization algorithms developed, dating back to such classical methods as steepest descent, the Newton-Raphson method, and other gradient procedures, Brooks[1] and Rastrigin[2],[3] have suggested randomized search strategies.
Manuscript received October 30, 1967; revised January 19, 1968. This work was partially supported by the Army Research Office-Durham under Contract DA-31-124-ARO-D-292, by a National Science Foundation Graduate Fellowship, and by a Research Grant from the Bendix Corporation. It made use of computer facilities supported in part by the National Science Foundation under Grant NSF-GP-570.
Rastrigin has compared a fixed step size random search (FSSRS) method with a fixed step size gradient method and concluded that under certain circumstances FSSRS is superior. It is clear, however, that if the step size of the random search method were optimum at each step, even better performance would result. In this paper, a hypothetical random search method that uses the optimum step size at each point will be analyzed for a hyperspherical surface. An adaptive step size random search (ASSRS) method will then be proposed that approximates the performance of the optimum step size random search (OSSRS) procedure.
FIXED STEP SIZE RANDOM SEARCH (FSSRS)
The algorithm for FSSRS is

X_{i+1} = X_i − a_i ΔX_i + ΔX_{i+1}    (1)

where X_i is the position in state space at the ith instant and ΔX_i is a random vector of length s, which is distributed uniformly over the hypersphere of radius s whose center is at the origin. The coefficient a_i is given by

a_i = 0 if Q_i ≤ Q_{i−1}^+,   a_i = 1 if Q_i > Q_{i−1}^+    (2)

where

Q_{i−1}^+ = min_{j=1,2,...,i−1} Q_j

is the smallest value of the quality function to be observed in the first i − 1 steps. The coefficient a_i serves to negate the effect of an unsuccessful step. Rastrigin[2],[3] has analyzed this algorithm for s = 1 and the function considered in his papers.
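For concreteness, the following is a minimal Python sketch of FSSRS as defined by (1) and (2). It is an illustration added to this transcription, not code from the paper; the function names, default values, and the fixed evaluation budget used as a stopping rule are assumptions.

```python
import numpy as np

def random_step(n, s, rng):
    """A random vector of length s with uniformly distributed direction (the Delta-X of eq. (1))."""
    v = rng.standard_normal(n)
    return s * v / np.linalg.norm(v)

def fssrs(Q, x0, s=0.1, max_evals=10_000, rng=None):
    """Fixed step size random search: a trial step is kept only if it improves on the best
    quality value observed so far (a_i = 0); otherwise it is undone (a_i = 1)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    q_best = Q(x)                          # quality at the starting point
    for _ in range(max_evals):
        x_trial = x + random_step(len(x), s, rng)
        q_trial = Q(x_trial)
        if q_trial <= q_best:              # successful step: keep it
            x, q_best = x_trial, q_trial
        # unsuccessful step: remain at x (the step is negated)
    return x, q_best
```

For instance, fssrs(lambda x: float(np.sum(x**2)), np.ones(10), s=0.3) runs the search on the hyperspherical surface considered below.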
To motivate the development of a random search algorithm that adapts the step size s to the situation, the performance of FSSRS as a function of s is considered. Attention will be restricted to the function

Q(X) = Σ_{i=1}^{n} x_i²    (3)

which is smooth at its extremum and is more representative of problems with a minimum square error criterion than the function considered by Rastrigin. Otherwise, the analysis in this section is the same as Rastrigin's.

M. A. Schumer is with the Research Div., Raytheon Co., Waltham, Mass. 02154. K. Steiglitz is with the Dept. of Electrical Engineering, Princeton University, Princeton, N.J. 08540.

SCHUMER
AI\TD
STEIGLITZ: ADAPTIVE STEP
SIZE
RANDOM
SEARCH
271
Fig. 1. A cross section of parameter space.
Consider the plane formed by the displacement vector ΔX and the gradient vector through the starting point A (Fig. 1). φ is the angle between the displacement vector and the negative gradient direction. φ_0 is the largest value of φ for which there is an improvement as the result of the step ΔX. For the assumed uniform distribution of displacement, the probability density for φ, considering φ only on [0, π] (due to symmetry), is (see Rastrigin and Mutseniyeks[4] and entries 858.45 and 858.46 of [5])

p(φ) = sin^{n−2}φ / (2 ∫_0^{π/2} sin^{n−2}ψ dψ) = [Γ(n − 1) / (2^{n−2} Γ²((n − 1)/2))] sin^{n−2}φ.    (4)
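The following short Monte Carlo check is an illustration added to this transcription, not part of the paper: it samples uniformly distributed step directions and verifies that the angle to a fixed axis follows the sin^{n−2}φ density of (4). The dimension, sample size, and seed are arbitrary.

```python
import numpy as np

n, trials = 10, 200_000
rng = np.random.default_rng(0)
u = rng.standard_normal((trials, n))
u /= np.linalg.norm(u, axis=1, keepdims=True)       # uniform directions on the unit hypersphere
phi = np.arccos(np.clip(u[:, 0], -1.0, 1.0))        # angle to a fixed axis (e.g., the negative gradient)

hist, edges = np.histogram(phi, bins=50, range=(0.0, np.pi), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
theory = np.sin(centers) ** (n - 2)
theory /= theory.sum() * (centers[1] - centers[0])  # normalize as in (4)
print(float(np.max(np.abs(hist - theory))))         # small deviation for large `trials`
```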
Search loss is defined as twice the ratio of the quality function to the expected value of the improvement per function evaluation. The search loss for the fixed step size gradient technique is then

2(n + 1) / (η(2 − η))    (5)

where η = s/ρ, the ratio of step size to distance to the minimum. The search loss for FSSRS is found to be

L_r(n, η) = 2 ∫_0^π sin^{n−2}φ dφ / ∫_0^{φ_0} (2η cos φ − η²) sin^{n−2}φ dφ    (6)

and φ_0 is equal to cos⁻¹(η/2). Equations (5) and (6) are derived in Appendix I.
Fig. 2. Tradeoff between FSSRS and fixed step size gradient technique. The gradient method is superior above the boundary and FSSRS is superior below the boundary.

Fig. 2 shows the relative behavior of the random search and gradient methods for different values of η⁻¹ and dimension n. Above the boundary random search is superior (has a smaller search loss) to the gradient method, while below the boundary the gradient technique is superior. For n less than 4, the gradient technique is always superior, but for higher dimension random search is superior for small η.
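As an added illustration (not from the paper), the following sketch evaluates both search losses numerically, using the expressions (5)-(7) as reconstructed above with simple numerical integration in place of closed forms; the chosen step-size ratio and dimensions are arbitrary.

```python
import numpy as np

def improvement(n, eta, pts=20_000):
    """Normalized expected improvement of FSSRS on Q = rho^2, eq. (7), by numerical integration."""
    phi = np.linspace(0.0, np.pi, pts)
    w = np.sin(phi) ** (n - 2)
    phi0 = np.arccos(eta / 2.0)
    gain = np.where(phi <= phi0, (2.0 * eta * np.cos(phi) - eta**2) * w, 0.0)
    return gain.sum() / w.sum()

def loss_gradient(n, eta):
    return 2.0 * (n + 1) / (eta * (2.0 - eta))      # eq. (5)

def loss_fssrs(n, eta):
    return 2.0 / improvement(n, eta)                # eq. (6), using L_r = 2/I

eta = 0.1                                           # small step relative to distance to the minimum
for n in (2, 10, 50):
    print(n, round(loss_gradient(n, eta), 1), round(loss_fssrs(n, eta), 1))
# For this small eta, the gradient method wins at low n and FSSRS wins at higher n, as in Fig. 2.
```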
OPTIMUM STEP SIZE RANDOM SEARCH (OSSRS)
If the step size for FSSRS is very small, the probability of improvement is approximately one half, but the improvement is very small for a successful step, and this results in a small average improvement. On the other hand, if the step size is made too large, the step will overshoot the minimum and the probability of improvement will be extremely small, also resulting in a very small average improvement. Somewhere between these extremes lies an optimum step size, i.e., a step size for which the probability of improvement of the quality function is not one half, but lies between zero and one half.
The expected improvement, normalized by the present value of Q, i.e., I = −E{ΔQ}/Q, is equal to 2/L_r(n, η) and is given by

I(n, η) = ∫_0^{φ_0} (2η cos φ − η²) sin^{n−2}φ dφ / (2 ∫_0^{π/2} sin^{n−2}φ dφ).    (7)
To maximize I, the right-hand side of (7) is differentiated with respect to η and set equal to zero, with the following equation for the optimal value of η resulting:

η_opt = ∫_0^{φ_0} cos φ sin^{n−2}φ dφ / ∫_0^{φ_0} sin^{n−2}φ dφ,   φ_0 = cos⁻¹(η_opt/2).    (8)

Upon making the appropriate approximations, the following asymptotic expressions are found for large n (see Appendix II):

272
IEEE
TRANSACTIONS
ON
AUTOMATIC
CONTROL,
JUNE
1968
η_opt ≈ 1.225/√n    (9)

I_opt ≈ 0.406/n.    (10)
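A brute-force numerical check of (9) and (10) is given below; it is an added illustration, it relies on the reconstructed integral (7), and the grid of η values and the dimensions tried are arbitrary.

```python
import numpy as np

def improvement(n, eta, pts=20_000):
    """Normalized expected improvement I(n, eta) of eq. (7), by numerical integration."""
    phi = np.linspace(0.0, np.pi, pts)
    w = np.sin(phi) ** (n - 2)
    phi0 = np.arccos(eta / 2.0)
    gain = np.where(phi <= phi0, (2.0 * eta * np.cos(phi) - eta**2) * w, 0.0)
    return gain.sum() / w.sum()

for n in (10, 40, 160):
    etas = np.linspace(0.01, 1.0, 500)
    vals = np.array([improvement(n, e) for e in etas])
    k = int(np.argmax(vals))
    # columns: n, numerically optimal eta, 1.225/sqrt(n), n * I_opt (tends toward ~0.406)
    print(n, round(etas[k], 3), round(1.225 / np.sqrt(n), 3), round(n * vals[k], 3))
```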
If the normalized expected improvement I is proportional to 1/n (see (10)), and if the normalized improvements are independent, as is true for OSSRS, then the average number of function evaluations for a fixed desired accuracy (relative to the starting value) is asymptotically linear in n, as is now demonstrated. Suppose that the search starts at a point at which the value of the quality function is Q_0, and that it is desired to terminate when the quality function reaches a final value of Q_f. The value Q_j of the quality function after j steps can be expressed recursively as

Q_j = Q_{j−1}(1 − i_j)    (11)

where i_j is the normalized improvement at the jth step. Thus

Q_M = Q_0 ∏_{j=1}^{M} (1 − i_j).    (12)

Taking the expected value of both sides, in light of the preceding assumptions, results in

E{Q_M} = Q_0 (1 − k/n)^M    (13)

where k is the constant of proportionality. Solving for the value of M for which E{Q_M} is equal to the desired final value Q_f results in

M = log(Q_f/Q_0) / log(1 − k/n).

Thus the asymptotic expression for large n becomes

M ≈ (constant) · n    (14)

where the constant is equal to (−1/k) log(Q_f/Q_0).
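As a rough illustrative calculation (added here; it assumes natural logarithms and the OSSRS constant k ≈ 0.406 from (10)), reducing the quality function by eight orders of magnitude would take on the order of

$$ M \approx \frac{n}{k}\,\ln\frac{Q_0}{Q_f} = \frac{\ln 10^{8}}{0.406}\, n \approx 45\, n $$

function evaluations, the same order of magnitude as the experimentally observed ICALL ≈ 80n reported for ASSRS below.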
PRACTICAL ALGORITHM FOR ADAPTIVE STEP SIZE RANDOM SEARCH (ASSRS)
OSSRS is a theoretical model and the optimum step size cannot be found without additional experimentation. One way to construct a practical algorithm would be to try numerous exploratory random steps from the same point, each with the same step size, and to repeat this procedure for a number of different step sizes. From these results, the optimum step size could be estimated and this estimate used. However, none of the intermediate exploratory steps would produce any improvement. In the ASSRS algorithm, no attempt is made to estimate the optimum step size accurately.
Fig. 3. Flow diagram for ASSRS. I1 counts the total number of iterations through the loop; I2 counts the number of successive failures.
Instead, the optimum is tracked in an approximate fashion, and each step is both exploratory and able to produce an improvement. A nominal value, s, for the step size is chosen before each iteration. A random step of size s is taken and a random step of size s(1 + a) is taken (1 > a > 0), and the resultant normalized improvements are compared. The step size that produces the larger improvement is chosen as the nominal step size for the next iteration. If neither step causes an improvement, the step size remains unchanged; and if this occurs for some number of iterations, the step size is reduced. Thus on the average the algorithm adjusts to the direction of the best step size. In addition, each time some large number of iterations has passed, a step with nominal step size is compared, in the same manner, with a step of much larger size. Again, the step size that produces the larger improvement is chosen as the new nominal step size. This test serves as a deterrent against the possibility that the step size has inadvertently become too small. It is also helpful in the case in which the I vs. step size curve has more than a single local maximum. In such a case the search procedure could be chasing a small local maximum; and a large change in step size would make it possible to detect and begin adaptation to a higher local maximum. A flow diagram for the ASSRS algorithm is shown in Fig. 3.
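The following Python sketch of the step-size adaptation just described is an illustration added to this transcription, not the authors' code. The parameter names, the failure limit, the shrink factor, the large-step schedule, and the fixed evaluation budget used in place of the stopping criterion of Fig. 3 are all assumed values.

```python
import numpy as np

def random_step(n, size, rng):
    """A random vector of the given length with uniformly distributed direction."""
    v = rng.standard_normal(n)
    return size * v / np.linalg.norm(v)

def assrs(Q, x0, s=0.5, a=0.5, shrink=0.5, fail_limit=30,
          big_every=100, big_factor=10.0, max_evals=20_000, rng=None):
    """Adaptive step size random search in the spirit of Fig. 3 (parameter values are illustrative)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    q = Q(x)
    evals, fails, iters = 1, 0, 0
    while evals + 2 <= max_evals:
        iters += 1
        # normally compare step sizes s and s*(1 + a); occasionally try a much larger
        # step instead, as a guard against s becoming inadvertently too small
        s_alt = s * big_factor if iters % big_every == 0 else s * (1.0 + a)
        trials = []
        for size in (s, s_alt):
            xt = x + random_step(len(x), size, rng)
            qt = Q(xt)
            evals += 1
            trials.append((q - qt, size, xt, qt))   # (improvement, step size, point, value)
        impr, size, xt, qt = max(trials, key=lambda t: t[0])
        if impr > 0.0:
            x, q, s, fails = xt, qt, size, 0        # accept the better step and adopt its size
        else:
            fails += 1                              # neither step improved: stay put
            if fails >= fail_limit:
                s *= shrink                         # too many successive failures: reduce s
                fails = 0
    return x, q
```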

SCHUMER
AND
STEIGLITZ:
ADAPTIVE
STEP SIZE
RANDOM
SEARCH
273
EXPERIMENTAL RESULTS
The ASSRS algorithm was tested on the IBM 7094 computer for a number of test functions. The results of these experiments are presented here.

The quality function Q = ρ² was tested to see how the ASSRS algorithm compared to the Newton-Raphson method and to OSSRS. For each dimension from 1 to 40, fifteen independent trials were run. The stopping criterion was Q < 10⁻⁸ and the starting point was (1, 1, ..., 1). The resulting average number of required function evaluations is well described by ICALL = 80n.¹
Since derivatives are not available, partial derivatives must be measured approximately by taking finite differences, and the number of function evaluations per iteration for the Newton-Raphson method can be found as follows. To approximate the gradient vector, a minimum of n finite differences are needed. To find the diagonal terms of the Hessian matrix, 2n additional function evaluations are needed to estimate the partial derivatives at a second point. The Hessian matrix is symmetric, so that h_ij = h_ji. Thus one half of the off-diagonal terms, or (n² − n)/2 more second partial derivatives, are required. This requires n² − n additional function evaluations. Also, one more function evaluation occurs with the final move. Thus the total number of function evaluations required per iteration of the Newton-Raphson method is n + 2n + (n² − n) + 1 = (n + 1)². Assuming that the partial derivatives can be determined exactly by finite differences, only one iteration is required for this function and the number of function evaluations required for the Newton-Raphson method is (n + 1)².
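The itemized count can be checked in a few lines of code (an added illustration; the function name is arbitrary):

```python
def newton_raphson_evals_per_iteration(n: int) -> int:
    """Function evaluations per finite-difference Newton-Raphson iteration, as itemized in the text:
    n for the gradient, 2n for the diagonal Hessian terms, n**2 - n for the off-diagonal terms,
    and 1 for the final move."""
    return n + 2 * n + (n**2 - n) + 1

assert all(newton_raphson_evals_per_iteration(n) == (n + 1) ** 2 for n in range(1, 101))
```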
The results for ASSRS and for the Newton-Raphson method are shown in Fig. 4. Extrapolating these curves, their intersection is at n = 78, beyond which dimension ASSRS is superior. If the fact that the partial derivatives cannot be measured exactly (which means that additional iterations are required) is taken into account, the Newton-Raphson curve becomes higher and the intersection with the ASSRS curve occurs at a smaller value of n. Also, if a lesser degree of accuracy is required, the ASSRS curve has a smaller slope while the Newton-Raphson curve is unchanged. This also results in a smaller value of n at the intersection of the curves, and ASSRS is then superior at a smaller value of dimension.
As an indication of the variance in ICALL associated with ASSRS, the standard deviation for n = 100 was found to be 502, as compared with the mean of 7677.

On the basis of the experimental results, the average normalized improvement per step, I, was calculated for Q = ρ² and was found to be asymptotic to 0.2725/n. Thus, although the value of k is smaller than for OSSRS, the asymptotic form of I is still k/n; and the number of function calls was found to be asymptotically proportional to n, as it is for OSSRS [see (14)].
¹ The variable ICALL represents the number of function evaluations performed during a given minimization procedure.
Fig. 4. Average number of function evaluations vs. dimension for Q = Σ_{i=1}^n x_i², for ASSRS (circles) and the Newton-Raphson method (squares).
Fig. 5. Average number of function evaluations vs. dimension for Q = Σ_{i=1}^n x_i⁴, for ASSRS (circles), the Newton-Raphson method (squares), and the simplex method (triangles).
Fig. 5 shows the results for

Q = Σ_{i=1}^{n} x_i⁴

averaged for 22 independent experiments for each dimension from 1 through 40. Again the starting point was (1, 1, ..., 1). The stopping criterion was Q < 0.5 × 10⁻⁸. The Newton-Raphson method is again assumed to be able to measure derivatives without error, but the fact that Q is not quadratic is taken into account by multiplying the number of iterations required by (n + 1)² in order to find the number of function calls.
The results for the simplex method are those of Nelder and Mead,[6] who found that, for n = 1 through 10, the number of function evaluations needed for the simplex method is well described by 3.16(n + 1)^2.11. This formula was extrapolated to higher dimensions and plotted along with the curves for ASSRS and Newton-Raphson.

274
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, JUNE
1968
Fig. 6. Average number of function evaluations vs. dimension for ASSRS for Q = Σ_{i=1}^n a_i x_i².
In this case, ASSRS is superior to Newton-Raphson for n > 2 and to the simplex method for n > 10. For ASSRS, ICALL is again proportional to n, and I is asymptotic to 0.427/n.
The quadratic form

Q = Σ_{i=1}^{n} a_i x_i²

was also used as a test function, with the coefficients a_1, a_2, ..., a_n randomly chosen from a uniform distribution on [0.1, 1]. The minimization was repeated twenty times for each dimension from 1 to 20 and six times for n = 100. The search was terminated when Q was less than one one-thousandth of its initial value. Fig. 6 shows the resulting average number of function evaluations required as a function of dimension. The result for 100 dimensions was ICALL = 3396. Again, ICALL was found to be approximately linear in n.
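This experiment can be reproduced in outline with the ASSRS sketch given earlier (an added illustration; it assumes the `assrs` function defined above is in scope, and the seed and dimension are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
a = rng.uniform(0.1, 1.0, size=n)          # coefficients a_i drawn uniformly from [0.1, 1]
Q = lambda x: float(np.sum(a * x**2))      # the quadratic-form test function
x0 = np.ones(n)
x, q = assrs(Q, x0, rng=rng)               # assumes the assrs() sketch defined above
print(q < 1e-3 * Q(x0))                    # was Q reduced below 0.001 of its initial value?
```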
Rastrigin[3] has done some work with random search algorithms that adapt only to the best direction with a fixed step size. The problem with this method is that convergence is only guaranteed to within a distance of one step size to the minimum, so that if a high accuracy is desired, the steps must be small, thus requiring many function evaluations. Adapting the step size seems to be more fruitful than adapting to the direction. A combination of the two methods, however, would seem to be in order and should be the subject of future investigations.
ASSRS was also tested using Rosenbrock's function Q = 100(x_2 − x_1²)² + (1 − x_1)², and although it converged, ASSRS was inferior to Rosenbrock's method[7] and to Powell's method[8] for this function. It should, therefore, be noted that ASSRS is not very effective as a ridge follower, but shows its superiority in multidimensional problems without narrow valleys or ridges. Combining directional adaptation with step size adaptation may result in removing this limitation.
CONCLUSION
In this paper the problem of minimizing a function of several parameters by the method of random search has been discussed. The fixed step size random search algorithm has been described and compared to the simple gradient technique on the basis of search loss. It has been shown that for n > 4, FSSRS is superior for sufficiently small η. Optimum step size random search was introduced and its performance investigated for hyperspherical surfaces.
A practical algorithm for adaptive step size random search has been described and compared with deterministic methods. For the functions

Q = Σ_{i=1}^{n} x_i²

Q = Σ_{i=1}^{n} x_i⁴

Q = Σ_{i=1}^{n} a_i x_i²

the number of function evaluations required for a desired accuracy for the deterministic methods increases at a rate that is proportional to at least the second power of n, and the computation time increases as n³. This is true for other classical methods besides the Newton-Raphson and simplex methods. However, for ASSRS the number of required function evaluations is proportional to n, and the computation time is proportional to n². The computation time could be made proportional to n if parallel computations were used. Thus the conclusion is reached that, despite its simplicity, adaptive random search is an attractive technique for problems with large numbers of dimensions.
APPENDIX I
EVALUATION OF SEARCH LOSS
The search loss for the fixed step size gradient method is found as follows. The gradient of Q is given by 2X; thus the step of size s to be taken is

ΔX = −sX/ρ

where ρ = |X| is the distance to the minimum. This results in a change in Q

ΔQ = s² − 2sρ    (15)

which is negative as long as ρ > s/2. To determine the correct descent direction, n measurements of the quality function are made, i.e., one in each of the n coordinate directions with a step size much smaller than s. Also, one function evaluation is made corresponding to the actual move of size s. Thus a total of n + 1 function evaluations are necessary for the improvement given by (15). Therefore, the search loss for the gradient method is

2Q(n + 1)/(2sρ − s²) = 2(n + 1)/(η(2 − η))

which is (5).

Citations
Journal ArticleDOI
TL;DR: Two general convergence proofs for random search algorithms are given and how these extend those available for specific variants of the conceptual algorithm studied here are shown.
Abstract: We give two general convergence proofs for random search algorithms. We review the literature and show how our results extend those available for specific variants of the conceptual algorithm studied here. We then exploit the convergence results to examine convergence rates and to actually design implementable methods. Finally we report on some computational experience.

1,550 citations


Cites background or methods from "Adaptive step size random search"

  • ...where p is the optimal step size, see [12]....

  • ...5 [12])....

  • ...Since the expected decrease in the function value is p2p [12] we see that the expected step in the direction of the solution is cp/n....

  • ...This "linearity" was first observed by Schumer and Steiglitz [12], the algorithm that they propose has K ≈ 80....

  • ...Schumer and Steiglitz [12] introduce adaptive step size methods; here Pk is increased or decreased depending on the number of successes or failures in finding lower values of f on S in the preceding iterations....

Journal ArticleDOI
TL;DR: The numerical performance of the BGA is demonstrated on a test suite of multimodal functions and the number of function evaluations needed to locate the optimum scales only as n ln(n) where n is the number of parameters.
Abstract: In this paper a new genetic algorithm called the Breeder Genetic Algorithm (BGA) is introduced. The BGA is based on artificial selection similar to that used by human breeders. A predictive model for the BGA is presented that is derived from quantitative genetics. The model is used to predict the behavior of the BGA for simple test functions. Different mutation schemes are compared by computing the expected progress to the solution. The numerical performance of the BGA is demonstrated on a test suite of multimodal functions. The number of function evaluations needed to locate the optimum scales only as n ln(n) where n is the number of parameters. Results up to n = 1000 are reported.

1,267 citations


Cites background from "Adaptive step size random search"

  • ...For rh = 1.225r/√n the expected progress was computed for large n and small r in (Schumer & Steiglitz, 1968) E(n; r)r = 0.2n (21). We now turn to normal distributed mutation....

  • ...The optimal normalized average progress for uniform distributed mutation decreases exponentially with n. The reason for this behavior is the well-known fact that the volume of the unit sphere in n dimensions goes to zero for n → ∞. Better results can be obtained if the uniform distribution is restricted to a hypersphere with radius rh. For rh = 1.225r/√n the expected progress was computed for large n and small r in Schumer and Steiglitz ......

Journal ArticleDOI
TL;DR: In this paper, the authors present a set of 175 benchmark functions for unconstrained optimization problems with diverse properties in terms of modality, separability, and valley landscape, which can be used for validation of new optimization in the future.
Abstract: Test functions are important to validate and compare the performance of optimization algorithms. There have been many test or benchmark functions reported in the literature; however, there is no standard list or set of benchmark functions. Ideally, test functions should have diverse properties so that can be truly useful to test new algorithms in an unbiased way. For this purpose, we have reviewed and compiled a rich set of 175 benchmark functions for unconstrained optimization problems with diverse properties in terms of modality, separability, and valley landscape. This is by far the most complete set of functions so far in the literature, and it can be expected that this complete set of functions can be used for validation of new optimization in the future.

944 citations

Journal ArticleDOI
TL;DR: In this paper, the authors present a set of 175 benchmark functions for unconstrained optimisation problems with diverse properties in terms of modality, separability, and valley landscape.
Abstract: Test functions are important to validate and compare the performance of optimisation algorithms. There have been many test or benchmark functions reported in the literature; however, there is no standard list or set of benchmark functions. Ideally, test functions should have diverse properties to be truly useful to test new algorithms in an unbiased way. For this purpose, we have reviewed and compiled a rich set of 175 benchmark functions for unconstrained optimisation problems with diverse properties in terms of modality, separability, and valley landscape. This is by far the most complete set of functions so far in the literature, and it can be expected that this complete set of functions can be used for validation of new optimisation in the future.

876 citations

Posted Content
TL;DR: The Square Attack is a score-based black-box attack that does not rely on local gradient information and thus is not affected by gradient masking, and can outperform gradient-based white-box attacks on the standard benchmarks achieving a new state-of-the-art in terms of the success rate.
Abstract: We propose the Square Attack, a score-based black-box $l_2$- and $l_\infty$-adversarial attack that does not rely on local gradient information and thus is not affected by gradient masking. Square Attack is based on a randomized search scheme which selects localized square-shaped updates at random positions so that at each iteration the perturbation is situated approximately at the boundary of the feasible set. Our method is significantly more query efficient and achieves a higher success rate compared to the state-of-the-art methods, especially in the untargeted setting. In particular, on ImageNet we improve the average query efficiency in the untargeted setting for various deep networks by a factor of at least $1.8$ and up to $3$ compared to the recent state-of-the-art $l_\infty$-attack of Al-Dujaili & O'Reilly. Moreover, although our attack is black-box, it can also outperform gradient-based white-box attacks on the standard benchmarks achieving a new state-of-the-art in terms of the success rate. The code of our attack is available at this https URL.

362 citations


Cites background from "Adaptive step size random search"

  • ...The Square Attack exploits random search [46,48] which is one of the simplest approaches for blackbox optimization....


  • ...Many variants of random search have been introduced [38,48,47], which differ mainly in how the random perturbation is chosen at each iteration (the original...


References
Journal ArticleDOI
TL;DR: A method is described for the minimization of a function of n variables, which depends on the comparison of function values at the (n + 1) vertices of a general simplex, followed by the replacement of the vertex with the highest value by another point.
Abstract: A method is described for the minimization of a function of n variables, which depends on the comparison of function values at the (n + 1) vertices of a general simplex, followed by the replacement of the vertex with the highest value by another point. The simplex adapts itself to the local landscape, and contracts on to the final minimum. The method is shown to be effective and computationally compact. A procedure is given for the estimation of the Hessian matrix in the neighbourhood of the minimum, needed in statistical estimation problems.

27,271 citations

Journal ArticleDOI
TL;DR: In the design of experiments for the purpose of seeking maxima, random methods have been shown to have an important place in the consideration of the experimenter as discussed by the authors, and the rationale, application, and relative merits of random methods are discussed.
Abstract: In the design of experiments for the purpose of seeking maxima, random methods are shown to have an important place in the consideration of the experimenter. For a rather large class of experimental situations, an elementary probability formulation leads to an exact statement for the number of trials required in the experiment. The rationale, application, and relative merits of random methods are discussed.

326 citations

Journal ArticleDOI
TL;DR: An iterative method which is not unlike the conjugate gradient method of Hestenes and Stiefel (1952), and which finds stationary values of a general function, which has second-order convergence.
Abstract: Eighteen months ago Rosenbrock (1960) published a paper in this journal on finding the greatest or least value of a function of several variables. A number of methods were listed and they all have first-order convergence. Six months ago Martin and Tee (1961) published a paper in which they mentioned gradient methods which have second-order convergence for finding the minimum of a quadratic positive definite function. In this paper will be described an iterative method which is not unlike the conjugate gradient method of Hestenes and Stiefel (1952), and which finds stationary values of a general function. It has second-order convergence, so near a stationary value it converges more quickly than Rosenbrock's variation of the steepest descents method and, although each iteration is rather longer because the method is applicable to a general function, the rate of convergence is comparable to that of the more powerful of the gradient methods described by Martin and Tee.

191 citations