IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 1, NO. 1, APRIL 1997 67
No Free Lunch Theorems for Optimization
David H. Wolpert and William G. Macready
Abstract—A framework is developed to explore the connection between effective optimization algorithms and the problems they are solving. A number of "no free lunch" (NFL) theorems are presented which establish that for any algorithm, any elevated performance over one class of problems is offset by performance over another class. These theorems result in a geometric interpretation of what it means for an algorithm to be well suited to an optimization problem. Applications of the NFL theorems to information-theoretic aspects of optimization and benchmark measures of performance are also presented. Other issues addressed include time-varying optimization problems and a priori "head-to-head" minimax distinctions between optimization algorithms, distinctions that result despite the NFL theorems' enforcing of a type of uniformity over all algorithms.
Index Terms—Evolutionary algorithms, information theory, optimization.
I. INTRODUCTION
THE past few decades have seen an increased interest
in general-purpose "black-box" optimization algorithms
that exploit limited knowledge concerning the optimization
problem on which they are run. In large part these algorithms
have drawn inspiration from optimization processes that occur
in nature. In particular, the two most popular black-box
optimization strategies, evolutionary algorithms [1]–[3] and
simulated annealing [4], mimic processes in natural selection
and statistical mechanics, respectively.
In light of this interest in general-purpose optimization
algorithms, it has become important to understand the relationship between how well an algorithm a performs and the optimization problem f on which it is run. In this paper
we present a formal analysis that contributes toward such
an understanding by addressing questions like the following:
given the abundance of black-box optimization algorithms and
of optimization problems, how can we best match algorithms
to problems (i.e., how best can we relax the black-box nature
of the algorithms and have them exploit some knowledge
concerning the optimization problem)? In particular, while
serious optimization practitioners almost always perform such
matching, it is usually on a heuristic basis; can such matching
be formally analyzed? More generally, what is the underlying
mathematical “skeleton” of optimization theory before the
“flesh” of the probability distributions of a particular context
and set of optimization problems are imposed? What can
Manuscript received August 15, 1996; revised December 30, 1996. This
work was supported by the Santa Fe Institute and TXN Inc.
D. H. Wolpert is with IBM Almaden Research Center, San Jose, CA 95120-
6099 USA.
W. G. Macready was with Santa Fe Institute, Santa Fe, NM 87501 USA.
He is now with IBM Almaden Research Center, San Jose, CA 95120-6099
USA.
Publisher Item Identifier S 1089-778X(97)03422-X.
information theory and Bayesian analysis contribute to an
understanding of these issues? How a priori generalizable are
the performance results of a certain algorithm on a certain
class of problems to its performance on other classes of
problems? How should we even measure such generalization?
How should we assess the performance of algorithms on
problems so that we may programmatically compare those
algorithms?
Broadly speaking, we take two approaches to these ques-
tions. First, we investigate what a priori restrictions there are
on the performance of one or more algorithms as one runs
over the set of all optimization problems. Our second approach
is to instead focus on a particular problem and consider the
effects of running over all algorithms. In the current paper
we present results from both types of analyses but concentrate
largely on the first approach. The reader is referred to the
companion paper [5] for more types of analysis involving the
second approach.
We begin in Section II by introducing the necessary nota-
tion. Also discussed in this section is the model of computation
we adopt, its limitations, and the reasons we chose it.
One might expect that there are pairs of search algorithms a_1 and a_2 such that a_1 performs better than a_2 on average, even if a_2 sometimes outperforms a_1. As an example, one might expect that hill climbing usually outperforms hill descending if one's goal is to find a maximum of the cost function. One might also expect it would outperform a random search in such a context.
One of the main results of this paper is that such expecta-
tions are incorrect. We prove two “no free lunch” (NFL) the-
orems in Section III that demonstrate this and more generally
illuminate the connection between algorithms and problems.
Roughly speaking, we show that for both static and time-
dependent optimization problems, the average performance
of any pair of algorithms across all possible problems is
identical. This means in particular that if some algorithm a_1's performance is superior to that of another algorithm a_2 over some set of optimization problems, then the reverse must be true over the set of all other optimization problems. (The reader is urged to read this section carefully for a precise statement of these theorems.) This is true even if one of the algorithms is random; any algorithm a_1 performs worse than randomly just as readily (over the set of all optimization problems) as it performs better than randomly. Possible objections to these results are addressed in Sections III-A and III-B.
In Section IV we present a geometric interpretation of the
NFL theorems. In particular, we show that an algorithm’s
average performance is determined by how “aligned” it is
with the underlying probability distribution over optimization
problems on which it is run. This section is critical for an
understanding of how the NFL results are consistent with the
well-accepted fact that many search algorithms that do not take
into account knowledge concerning the cost function work
well in practice.
Section V-A demonstrates that the NFL theorems allow
one to answer a number of what would otherwise seem to
be intractable questions. The implications of these answers
for measures of algorithm performance and of how best to
compare optimization algorithms are explored in Section V-B.
In Section VI we discuss some of the ways in which,
despite the NFL theorems, algorithms can have a priori
distinctions that hold even if nothing is specified concerning
the optimization problems. In particular, we show that there
can be “head-to-head” minimax distinctions between a pair of
algorithms, i.e., that when considering one function at a time,
a pair of algorithms may be distinguishable, even if they are
not when one looks over all functions.
In Section VII we present an introduction to the alternative
approach to the formal analysis of optimization in which
problems are held fixed and one looks at properties across
the space of algorithms. Since these results hold in general,
they hold for any and all optimization problems and thus
are independent of the types of problems one is more or
less likely to encounter in the real world. In particular,
these results show that there is no a priori justification for
using a search algorithm’s observed behavior to date on a
particular cost function to predict its future behavior on that
function. In fact when choosing between algorithms based on
their observed performance it does not suffice to make an
assumption about the cost function; some (currently poorly
understood) assumptions are also being made about how the
algorithms in question are related to each other and to the
cost function. In addition to presenting results not found in
[5], this section serves as an introduction to the perspective
adopted in [5].
We conclude in Section VIII with a brief discussion, a
summary of results, and a short list of open problems.
We have confined all proofs to appendixes to facilitate the
flow of the paper. A more detailed, and substantially longer,
version of this paper, a version that also analyzes some issues
not addressed in this paper, can be found in [6].
II. PRELIMINARIES
We restrict attention to combinatorial optimization in which the search space X, though perhaps quite large, is finite. We further assume that the space of possible "cost" values Y is also finite. These restrictions are automatically met for optimization algorithms run on digital computers where typically Y is some 32 or 64 bit representation of the real numbers.
The size of the spaces X and Y are indicated by |X| and |Y|, respectively. An optimization problem f (sometimes called a "cost function" or an "objective function" or an "energy function") is represented as a mapping f : X → Y, and F = Y^X indicates the space of all possible problems. F is of size |Y|^|X|—a large but finite number. In addition to static f, we are also interested in optimization problems that depend explicitly on time. The extra notation required for such time-dependent problems will be introduced as needed.
It is common in the optimization community to adopt
an oracle-based view of computation. In this view, when
assessing the performance of algorithms, results are stated
in terms of the number of function evaluations required to
find a given solution. Practically though, many optimization
algorithms are wasteful of function evaluations. In particular,
many algorithms do not remember where they have already
searched and therefore often revisit the same points. Although
any algorithm that is wasteful in this fashion can be made
more efficient simply by remembering where it has been (cf.
tabu search [7], [8]), many real-world algorithms elect not to
employ this stratagem. From the point of view of the oracle-
based performance measures, these revisits are “artifacts”
distorting the apparent relationship between many such real-
world algorithms.
This difficulty is exacerbated by the fact that the amount
of revisiting that occurs is a complicated function of both
the algorithm and the optimization problem and therefore
cannot be simply “filtered out” of a mathematical analysis.
Accordingly, we have elected to circumvent the problem
entirely by comparing algorithms based on the number of
distinct function evaluations they have performed. Note that
this does not mean that we cannot compare algorithms that
are wasteful of evaluations—it simply means that we compare
algorithms by counting only their number of distinct calls to
the oracle.
We call a time-ordered set of m distinct visited points a "sample" of size m. Samples are denoted by d_m ≡ {(d_m^x(1), d_m^y(1)), ..., (d_m^x(m), d_m^y(m))}. The points in a sample are ordered according to the time at which they were generated. Thus d_m^x(i) indicates the X value of the ith successive element in a sample of size m and d_m^y(i) is its associated cost or Y value. d_m^y ≡ {d_m^y(1), ..., d_m^y(m)} will be used to indicate the ordered set of cost values. The space of all samples of size m is D_m = (X × Y)^m (so d_m ∈ D_m) and the set of all possible samples of arbitrary size is D ≡ ∪_{m≥0} D_m.
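To make the notation concrete, here is a minimal Python sketch (our illustration, with our own function names, not code from the paper) representing a sample d_m as a time-ordered list of distinct (x, y) pairs:

```python
# A sample d_m is a time-ordered list of m distinct visited points, each pairing
# an X value d_m^x(i) with its cost d_m^y(i). Function names here are ours.
def make_sample(f, xs):
    """Build d_m by querying the oracle f at the distinct points xs, in visit order."""
    assert len(set(xs)) == len(xs), "points in a sample must be distinct"
    return [(x, f(x)) for x in xs]

def d_y(sample):
    """The ordered set of cost values d_m^y."""
    return [y for _, y in sample]
```

For example, d_y(make_sample(lambda x: x * x, [2, 0, 1])) is [4, 0, 1]: the costs in visit order, not sorted.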
As an important clarification of this definition, consider a hill-descending algorithm. This is the algorithm that examines a set of neighboring points in X and moves to the one having the lowest cost. The process is then iterated from the newly chosen point. (Often, implementations of hill descending stop when they reach a local minimum, but they can easily be extended to run longer by randomly jumping to a new unvisited point once the neighborhood of a local minimum has been exhausted.) The point to note is that because a sample contains all the previous points at which the oracle was consulted, it includes the Y values of all the neighbors of the current point, and not only the lowest cost one that the algorithm moves to. This must be taken into account when counting the value of m.
An optimization algorithm a is represented as a mapping from previously visited sets of points to a single new (i.e., previously unvisited) point in X. Formally, a : d ∈ D → {x | x ∉ d^x}. Given our decision to only measure distinct function evaluations even if an algorithm revisits previously
searched points, our definition of an algorithm includes all
common black-box optimization techniques like simulated an-
nealing and evolutionary algorithms. (Techniques like branch
and bound [9] are not included since they rely explicitly on
the cost structure of partial solutions.)
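The definition of an algorithm as a map from samples to unvisited points can be sketched directly. The following toy illustration is ours, not code from the paper; the iteration loop counts only distinct oracle calls, as the text prescribes:

```python
# An algorithm a maps the current sample to a single previously unvisited point
# in X; iterating it for m distinct oracle calls yields the sample d_m.
def run(algorithm, f, X, m):
    sample = []
    for _ in range(m):
        x = algorithm(sample, X)
        assert x not in {xv for xv, _ in sample}   # each call is to a distinct point
        sample.append((x, f(x)))                   # one distinct call to the oracle
    return sample

# One trivial deterministic instance: always query the smallest unvisited point.
def enumerate_search(sample, X):
    visited = {xv for xv, _ in sample}
    return min(x for x in X if x not in visited)
```

Here run(enumerate_search, f, range(10), 3) queries x = 0, 1, 2 regardless of the observed costs; a hill descender would instead use the costs in the sample to pick its next point.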
As defined above, a search algorithm is deterministic; every sample maps to a unique new point. Of course, essentially all algorithms implemented on computers are deterministic,[1] and in this our definition is not restrictive. Nonetheless, it is worth noting that all of our results are extensible to nondeterministic algorithms, where the new point is chosen stochastically from the set of unvisited points. This point is returned to later.
Under the oracle-based model of computation any measure of the performance of an algorithm after m iterations is a function of the sample d_m. Such performance measures will be indicated by Φ(d_m^y). As an example, if we are trying to find a minimum of f, then a reasonable measure of the performance of a might be the value of the lowest Y value in d_m^y: Φ(d_m^y) = min_i {d_m^y(i) : i = 1, ..., m}. Note that measures of performance based on factors other than d_m^y (e.g., wall clock time) are outside the scope of our results.
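As a concrete instance of such a measure, the best-cost-so-far criterion just described can be written as (our sketch):

```python
# Phi(d_m^y): the lowest cost value observed among the m distinct evaluations.
# Appropriate when the goal is to minimize f; it depends on d_m^y only.
def phi(d_y_values):
    return min(d_y_values)
```

For instance, phi([5, 2, 7]) is 2, regardless of the order in which those costs were observed.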
We shall cast all of our results in terms of probability
theory. We do so for three reasons. First, it allows simple
generalization of our results to stochastic algorithms. Second,
even when the setting is deterministic, probability theory
provides a simple consistent framework in which to carry out
proofs. The third reason for using probability theory is perhaps
the most interesting. A crucial factor in the probabilistic
framework is the distribution P(f) = P(f(x_1), ..., f(x_{|X|})). This distribution, defined over F, gives the probability that each f ∈ F is the actual optimization problem at hand. An approach based on this distribution has the immediate advantage that often knowledge of a problem is statistical in nature and this information may be easily encodable in P(f). For example, Markov or Gibbs random field descriptions [10] of families of optimization problems express P(f) exactly.
Exploiting P(f), however, also has advantages even when we are presented with a single uniquely specified cost function.
One such advantage is the fact that although it may be fully
specified, many aspects of the cost function are effectively
unknown (e.g., we certainly do not know the extrema of the
function). It is in many ways most appropriate to have this
effective ignorance reflected in the analysis as a probability
distribution. More generally, optimization practitioners usually
act as though the cost function is partially unknown, in that the
same algorithm is used for all cost functions in a class of such
functions (e.g., in the class of all traveling salesman problems
having certain characteristics). In so doing, the practitioner
implicitly acknowledges that distinctions between the cost
functions in that class are irrelevant or at least unexploitable.
In this sense, even though we are presented with a single
particular problem from that class, we act as though we are
presented with a probability distribution over cost functions,
a distribution that is nonzero only for members of that class
of cost functions. P(f) is thus a prior specification of the class of the optimization problem at hand, with different classes of problems corresponding to different choices of what algorithms we will use, and giving rise to different distributions P(f).
[1] In particular, note that pseudorandom number generators are deterministic given a seed.
Given our choice to use probability theory, the performance of an algorithm a iterated m times on a cost function f is measured with P(d_m^y | f, m, a). This is the conditional probability of obtaining a particular sample d_m under the stated conditions. From P(d_m^y | f, m, a) performance measures Φ(d_m^y) can be found easily.
In the next section we analyze P(d_m^y | f, m, a) and in particular how it varies with the algorithm a. Before proceeding
with that analysis, however, it is worth briefly noting that there
are other formal approaches to the issues investigated in this
paper. Perhaps the most prominent of these is the field of com-
putational complexity. Unlike the approach taken in this paper,
computational complexity largely ignores the statistical nature
of search and concentrates instead on computational issues.
Much, though by no means all, of computational complexity is
concerned with physically unrealizable computational devices
(e.g., Turing machines) and the worst-case resource usage
required to find optimal solutions. In contrast, the analysis
in this paper does not concern itself with the computational
engine used by the search algorithm, but rather concentrates
exclusively on the underlying statistical nature of the search
problem. The current probabilistic approach is complementary
to computational complexity. Future work involves combining
our analysis of the statistical nature of search with practical
concerns for computational resources.
III. THE NFL THEOREMS
In this section we analyze the connection between algo-
rithms and cost functions. We have dubbed the associated
results NFL theorems because they demonstrate that if an
algorithm performs well on a certain class of problems then
it necessarily pays for that with degraded performance on the
set of all remaining problems. Additionally, the name em-
phasizes a parallel with similar results in supervised learning
[11], [12].
The precise question addressed in this section is: "How does the set of problems F_1 ⊂ F for which algorithm a_1 performs better than algorithm a_2 compare to the set F_2 ⊂ F for which the reverse is true?" To address this question we compare the sum over all f of P(d_m^y | f, m, a_1) to the sum over all f of P(d_m^y | f, m, a_2). This comparison constitutes a major result of this paper: P(d_m^y | f, m, a) is independent of a when averaged over all cost functions.
Theorem 1: For any pair of algorithms a_1 and a_2,
∑_f P(d_m^y | f, m, a_1) = ∑_f P(d_m^y | f, m, a_2).
A proof of this result is found in Appendix A. An immediate corollary of this result is that for any performance measure Φ(d_m^y), the average over all f of P(Φ(d_m^y) | f, m, a) is independent of a. The precise way that the sample is mapped to a performance measure is unimportant.
This theorem explicitly demonstrates that what an algorithm
gains in performance on one class of problems is necessarily
offset by its performance on the remaining problems; that is the only way that all algorithms can have the same f-averaged performance.
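Theorem 1 can be checked by brute force on a tiny instance. The sketch below is our construction, not the paper's: it enumerates every cost function f : X → Y for |X| = 3 and |Y| = 2, runs two different deterministic non-revisiting algorithms, and confirms that the histogram of samples d_2^y over all functions is identical for both:

```python
from itertools import product
from collections import Counter

X = (0, 1, 2)          # a tiny search space
Y = (0, 1)             # a tiny space of cost values

def run(algorithm, f, m):
    """Return d_m^y after m distinct oracle calls."""
    sample = []
    for _ in range(m):
        x = algorithm(sample)
        sample.append((x, f[x]))
    return tuple(y for _, y in sample)

def forward(sample):
    """Visit 0, 1, 2, ... regardless of the observed costs."""
    visited = {x for x, _ in sample}
    return min(x for x in X if x not in visited)

def adaptive(sample):
    """Start in the middle; let the last observed cost steer the next query."""
    visited = {x for x, _ in sample}
    unvisited = [x for x in X if x not in visited]
    if not sample:
        return 1
    return unvisited[0] if sample[-1][1] == 0 else unvisited[-1]

# Histogram of d_2^y over ALL |Y|^|X| = 8 cost functions, one per algorithm.
hist = {alg: Counter(run(alg, dict(zip(X, fv)), 2)
                     for fv in product(Y, repeat=len(X)))
        for alg in (forward, adaptive)}
assert hist[forward] == hist[adaptive]   # same f-averaged behavior: no free lunch
```

The cost-adaptive algorithm produces different samples than the oblivious one on individual functions, yet the two multisets of outcomes coincide once every f is counted.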
A result analogous to Theorem 1 holds for a class of time-dependent cost functions. The time-dependent functions we consider begin with an initial cost function f_1 that is present at the sampling of the first X value. Before the beginning of each subsequent iteration of the optimization algorithm, the cost function is deformed to a new function, as specified by a mapping T.[2] We indicate this mapping with the notation T_i. So the function present during the ith iteration is f_i, with f_{i+1} = T_i(f_i). T_i is assumed to be a (potentially i-dependent) bijection between F and F. We impose bijectivity because if it did not hold, the evolution of cost functions could narrow in on a region of f's for which some algorithms may perform better than others. This would constitute an a priori bias in favor of those algorithms, a bias whose analysis we wish to defer to future work.
How best to assess the quality of an algorithm's performance on time-dependent cost functions is not clear. Here we consider two schemes based on manipulations of the definition of the sample. In scheme 1 the particular Y value d_m^y(j) corresponding to a particular x value d_m^x(j) is given by the cost function that was present when d_m^x(j) was sampled. In contrast, for scheme 2 we imagine a sample D_m^y given by the values from the present cost function for each of the x values in d_m^x. Formally if d_m^x = {d_m^x(1), ..., d_m^x(m)}, then in scheme 1 we have d_m^y = {f_1(d_m^x(1)), ..., f_m(d_m^x(m))}, and in scheme 2 we have D_m^y = {f_m(d_m^x(1)), ..., f_m(d_m^x(m))} where f_m is the final cost function.
In some situations it may be that the members of the sample "live" for a long time, compared to the time scale of the dynamics of the cost function. In such situations it may be appropriate to judge the quality of the search algorithm by D_m^y; all those previous elements of the sample are still "alive" at time m, and therefore their current cost is of interest. On the other hand, if members of the sample live for only a short time on the time scale of the dynamics of the cost function, one may instead be concerned with things like how well the "living" member(s) of the sample track the changing cost function. In such situations, it may make more sense to judge the quality of the algorithm with the d_m^y sample.
Results similar to Theorem 1 can be derived for both schemes. By analogy with that theorem, we average over all possible ways a cost function may be time dependent, i.e., we average over all T (rather than over all f). Thus we consider ∑_T P(d_m^y | f_1, T, m, a) where f_1 is the initial cost function. Since T only takes effect for m > 1, and since f_1 is fixed, there are a priori distinctions between algorithms as far as the first member of the sample is concerned. After redefining samples, however, to only contain those elements added after the first iteration of the algorithm, we arrive at the following result, proven in Appendix B.
[2] An obvious restriction would be to require that T does not vary with time, so that it is a mapping simply from F to F. An analysis for T's limited in this way is beyond the scope of this paper.
Theorem 2: For all d_m^y, D_m^y, m > 1, algorithms a_1 and a_2, and initial cost functions f_1,
∑_T P(d_m^y | f_1, T, m, a_1) = ∑_T P(d_m^y | f_1, T, m, a_2),
and
∑_T P(D_m^y | f_1, T, m, a_1) = ∑_T P(D_m^y | f_1, T, m, a_2).
So, in particular, if one algorithm outperforms another for
certain kinds of cost function dynamics, then the reverse must
be true on the set of all other cost function dynamics.
Although this particular result is similar to the NFL result
for the static case, in general the time-dependent situation
is more subtle. In particular, with time dependence there
are situations in which there can be a priori distinctions
between algorithms even for those members of the sample
arising after the first. For example, in general there will be
distinctions between algorithms when considering the quantity ∑_T P(d_m^y | f, T, m, a). To see this, consider the case where X is a set of contiguous integers and for all iterations T is a shift operator, replacing f(x) by f(x − 1) for all x [with the value at the smallest x wrapping around to the largest]. For such a case we can construct algorithms which behave differently a priori. For example, take a to be the algorithm that first samples f at x_1, next at x_1 + 1, and so on, regardless of the values in the sample. Then for any T, d_m^y is always made up of identical Y values. Accordingly, ∑_T P(d_m^y | f, T, m, a) is nonzero only for d_m^y for which all values d_m^y(i) are identical. Other search algorithms, even for the same shift T, do not have this restriction on Y values. This constitutes an a priori distinction between algorithms.
A. Implications of the NFL Theorems
As emphasized above, the NFL theorems mean that if an
algorithm does particularly well on average for one class of
problems then it must do worse on average over the remaining
problems. In particular, if an algorithm performs better than
random search on some class of problems then it must
perform worse than random search on the remaining problems.
Thus comparisons reporting the performance of a particular
algorithm with a particular parameter setting on a few sample
problems are of limited utility. While such results do indicate
behavior on the narrow range of problems considered, one
should be very wary of trying to generalize those results to
other problems.
Note, however, that the NFL theorems need not be viewed as a way of comparing function classes F_1 and F_2 (or classes of evolution operators T_1 and T_2, as the case might be). They can be viewed instead as a statement concerning any algorithm's performance when f is not fixed, under the uniform prior over cost functions, P(f) = 1/|F|. If we wish instead to analyze performance where f is not fixed, as in this alternative interpretation of the NFL theorems, but in contrast with the NFL case f is now chosen from a nonuniform prior, then we must analyze explicitly the sum

P(d_m^y | m, a) = ∑_f P(d_m^y | f, m, a) P(f).   (1)

Since it is certainly true that any class of problems faced by
a practitioner will not have a flat prior, what are the practical
implications of the NFL theorems when viewed as a statement
concerning an algorithm's performance for nonfixed f? This
question is taken up in greater detail in Section IV but we
offer a few comments here.
First, if the practitioner has knowledge of problem charac-
teristics but does not incorporate them into the optimization
algorithm, then P(f) is effectively uniform. (Recall that P(f) can be viewed as a statement concerning the practitioner's
choice of optimization algorithms.) In such a case, the NFL
theorems establish that there are no formal assurances that the
algorithm chosen will be at all effective.
Second, while most classes of problems will certainly have
some structure which, if known, might be exploitable, the
simple existence of that structure does not justify choice
of a particular algorithm; that structure must be known and
reflected directly in the choice of algorithm to serve as such a
justification. In other words, the simple existence of structure
per se, absent a specification of that structure, cannot provide a
basis for preferring one algorithm over another. Formally, this
is established by the existence of NFL-type theorems in which
rather than average over specific cost functions f, one averages over specific "kinds of structure," i.e., theorems in which one averages P(d_m^y | m, a) over distributions P(f). That such theorems hold when one averages over all P(f) means that the indistinguishability of algorithms associated with uniform P(f) is not some pathological, outlier case. Rather, uniform P(f) is a "typical" distribution as far as indistinguishability of algorithms is concerned. The simple fact that the P(f) at hand is nonuniform cannot serve to determine one's choice of optimization algorithm.
Finally, it is important to emphasize that even if one is considering the case where f is not fixed, performing the associated average according to a uniform P(f) is not essential for NFL to hold. NFL can also be demonstrated for a range of nonuniform priors. For example, any prior of the form ∏_{x∈X} P(f(x)) (where P(y = f(x)) is the distribution of Y values) will also give NFL theorems. The f-average can also enforce correlations between costs at different X values and NFL-like results will still be obtained. For example, if costs are rank ordered (with ties broken in some arbitrary way) and we sum only over all cost functions given by permutations of those orderings, then NFL remains valid.
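The rank-ordering claim can likewise be verified numerically on a small instance (our sketch, not the paper's): fix one multiset of cost values, sum over all cost functions obtained by permuting the assignment of those values to points, and observe that the histogram of d_2^y is again algorithm-independent:

```python
from itertools import permutations
from collections import Counter

X = (0, 1, 2)
values = (0, 1, 2)     # one fixed rank-ordering of costs

def run(algorithm, f, m):
    """Return d_m^y after m distinct oracle calls."""
    sample = []
    for _ in range(m):
        x = algorithm(sample)
        sample.append((x, f[x]))
    return tuple(y for _, y in sample)

def forward(sample):
    """Oblivious sweep: visit 0, 1, 2, ... regardless of costs."""
    visited = {x for x, _ in sample}
    return min(x for x in X if x not in visited)

def adaptive(sample):
    """Start in the middle; let the last observed cost steer the next query."""
    visited = {x for x, _ in sample}
    unvisited = [x for x in X if x not in visited]
    if not sample:
        return 1
    return unvisited[0] if sample[-1][1] == 0 else unvisited[-1]

# Sum only over the 3! permutations of the fixed cost values.
hist = {alg: Counter(run(alg, dict(zip(X, p)), 2) for p in permutations(values))
        for alg in (forward, adaptive)}
assert hist[forward] == hist[adaptive]
```

Restricting the average to this permutation-closed set of six functions, rather than all of F, leaves the two histograms identical, consistent with the text's claim.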
The choice of uniform P(f) was motivated more by theoretical than by pragmatic concerns, as a way of analyzing the theoretical structure of optimization. Nevertheless, the cautionary observations presented above make clear that an analysis of the uniform P(f) case has a number of ramifications for practitioners.
B. Stochastic Optimization Algorithms
Thus far we have considered the case in which algorithms
are deterministic. What is the situation for stochastic algo-
rithms? As it turns out, NFL results hold even for these
algorithms.
The proof is straightforward. Let σ be a stochastic "nonpotentially revisiting" algorithm. Formally, this means that σ is a mapping taking any sample d to a d-dependent distribution over X that equals zero for all x ∈ d^x. In this sense σ is what in the statistics community is known as a "hyper-parameter," specifying the function P(d_{m+1}^x(m + 1) = x | d_m, σ) for all m and d_m. One can now reproduce the derivation of the NFL result for deterministic algorithms, only with a replaced by σ throughout. In so doing, all steps in the proof remain valid. This establishes that NFL results apply to stochastic algorithms as well as deterministic ones.
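A stochastic non-potentially-revisiting algorithm of this kind is easy to sketch (our illustration); the essential property is that the induced distribution over X assigns zero probability to every already-visited point:

```python
import random

# sigma maps a sample d to a d-dependent distribution over X that is zero on all
# points already in d^x. Here that distribution is uniform over the unvisited
# points, i.e., sigma is random search without revisits.
def sigma(sample, X, rng):
    visited = {x for x, _ in sample}
    return rng.choice([x for x in X if x not in visited])
```

Any other rule for weighting the unvisited points would serve equally well for the purposes of the proof sketched above.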
IV. A GEOMETRIC PERSPECTIVE ON THE NFL THEOREMS
Intuitively, the NFL theorem illustrates that if knowledge of f, perhaps specified through P(f), is not incorporated into a, then there are no formal assurances that a will be effective. Rather, in this case effective optimization relies on a fortuitous matching between f and a. This point is formally established by viewing the NFL theorem from a geometric perspective.
Consider the space F of all possible cost functions. As previously discussed in regard to (1), the probability of obtaining some d_m^y is
P(d_m^y | m, a) = ∑_f P(d_m^y | f, m, a) P(f)
where P(f) is the prior probability that the optimization problem at hand has cost function f. This sum over functions can be viewed as an inner product in F. Defining the F-space vectors v and p by their components v(f) ≡ P(d_m^y | f, m, a) and p(f) ≡ P(f), respectively,

P(d_m^y | m, a) = v · p.   (2)
This equation provides a geometric interpretation of the optimization process. d_m^y can be viewed as fixed to the sample that is desired, usually one with a low cost value, and m is a measure of the computational resources that can be afforded. Any knowledge of the properties of the cost function goes into the prior over cost functions P(f). Then (2) says the performance of an algorithm is determined by the magnitude of its projection onto p, i.e., by how aligned v is with the problems p. Alternatively, by averaging over d_m^y, it is easy to see that E(d_m^y | m, a) is an inner product between p and E(d_m^y | f, m, a). The expectation of any performance measure Φ(d_m^y) can be written similarly.
In any of these cases, P(f) or p must "match" or be aligned with a to get the desired behavior. This need for matching provides a new perspective on how certain algorithms can perform well in practice on specific kinds of problems. For example, it means that the years of research into the traveling salesman problem (TSP) have resulted in algorithms aligned with the (implicit) P(f) describing traveling salesman problems of interest to TSP researchers.
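Equation (2) can be made concrete with a small numerical sketch (ours, not the paper's): for a fixed deterministic algorithm, v has a 1 in the component of every f that would generate the target d_m^y, and its inner product with the prior p gives P(d_m^y | m, a). A prior aligned with the algorithm drives that projection up:

```python
from itertools import product

X = (0, 1, 2)
Y = (0, 1)
F = [dict(zip(X, fv)) for fv in product(Y, repeat=len(X))]   # all |Y|^|X| = 8 functions

def run(f, m):
    """A fixed deterministic algorithm: visit 0, 1, 2, ... in order."""
    return tuple(f[x] for x in range(m))

target = (0, 0)                   # the desired sample d_2^y

# Components of the F-space vectors in (2): v(f) = P(d_m^y | f, m, a), p(f) = P(f).
v = [1.0 if run(f, 2) == target else 0.0 for f in F]
p_uniform = [1.0 / len(F)] * len(F)
p_aligned = [0.5 if f[0] == 0 and f[1] == 0 else 0.0 for f in F]

def dot(u, w):
    return sum(a * b for a, b in zip(u, w))

# Under the uniform prior the projection is small; under a prior concentrated on
# functions this algorithm does well on, the projection is maximal.
assert abs(dot(v, p_uniform) - 0.25) < 1e-12
assert abs(dot(v, p_aligned) - 1.0) < 1e-12
```

The choice of target and the aligned prior are of course contrived for illustration; the point is only that performance is literally an inner product of v with p.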
Taking the geometric view, the NFL result that ∑_f P(d_m^y | f, m, a) is independent of a has the interpretation that for any particular d_m^y and m, all algorithms have the same projection onto the uniform P(f), represented by the diagonal vector 1. Formally, v · 1 = c(d_m^y, m), where c(d_m^y, m) is some constant depending only upon d_m^y and m. For deterministic algorithms, the components of v (i.e., the

TL;DR: This chapter contains sections titled: References Artificial Intelligence through a Simulation of Evolution Natural Automata and Prosthetic Devices and Artificial intelligence through a simulation of Evolution natural automata and prosthetic devices.