
Fast Calculation of Multiobjective Probability of
Improvement and Expected Improvement Criteria for
Pareto Optimization
Ivo Couckuyt · Dirk Deschrijver · Tom Dhaene

Department of Information Technology (INTEC), Ghent University-iMinds, Ghent, Belgium
E-mail: ivo.couckuyt@ugent.be

Published 01 Nov 2014 · Vol. 60, Iss. 3, pp. 575-594
Abstract The use of Surrogate Based Optimization (SBO) is widespread in engineering design to reduce the number of computationally expensive simulations. However, "real-world" problems often consist of multiple, conflicting objectives leading to a set of competitive solutions (the Pareto front). The objectives are often aggregated into a single cost function to reduce the computational cost, though a better approach is to use multiobjective optimization methods to directly identify a set of Pareto-optimal solutions, which can be used by the designer to make more efficient design decisions (instead of weighting and aggregating the costs upfront). Most of the work in multiobjective optimization is focused on MultiObjective Evolutionary Algorithms (MOEAs). While MOEAs are well-suited to handle large, intractable design spaces, they typically require thousands of expensive simulations, which is prohibitively expensive for the problems under study. Therefore, the use of surrogate models in multiobjective optimization, denoted as MultiObjective Surrogate-Based Optimization (MOSBO), may prove to be even more worthwhile than SBO methods to expedite the optimization of computationally expensive systems. In this paper, the authors propose the Efficient Multiobjective Optimization (EMO) algorithm, which uses Kriging models and multiobjective versions of the Probability of Improvement (PoI) and Expected Improvement (EI) criteria to identify the Pareto front with a minimal number of expensive simulations. The EMO algorithm is applied to multiple standard benchmark problems and compared against the well-known NSGA-II, SPEA2 and SMS-EMOA multiobjective optimization methods.
Keywords multiobjective optimization · expected improvement · probability of
improvement · hypervolume · Kriging · Gaussian Process
1 Introduction
Surrogate modeling techniques, also known as metamodeling, are rapidly becoming popular in the engineering community to speed up complex, computationally expensive design problems [37, 22]. Surrogate models, or metamodels, are mathematical approximation models that mimic the behavior of computationally expensive simulation codes, such as mechanical or electrical finite element simulations, or computational fluid dynamic simulations. This paper deals with the use of surrogate models for expediting the optimization of time-consuming (black-box) problems of a deterministic nature, in contrast to stochastic simulation.
While several uses of surrogate modeling can be distinguished, this work is concerned with the integration of surrogate models into the optimization process, often denoted as Surrogate Based Optimization (SBO) or Metamodel-Assisted Optimization (MAO). SBO methods typically generate surrogate models on the fly that are only accurate in certain regions of the input space, e.g., around potentially optimal regions. The generated surrogate models can then be used to intelligently guide the optimization process to the global optimum.
The focus of this work is the global SBO method based on the Probability of Improvement (PoI) and Expected Improvement (EI), popularized by Jones et al. [25]. These statistical criteria guide the selection of new data points in such a way that the objective function is optimized, while minimizing the number of expensive simulations. The advantage of EI and PoI is that, besides the prediction (mean), the uncertainty (variance) of the surrogate model is taken into account as well, providing a balance between exploration^1 and exploitation^2. Most often EI or PoI is used in conjunction with the Kriging surrogate model (Gaussian Processes) [27], which provides by construction a prediction of the mean as well as the variance, but other surrogate models are also possible, such as Radial Basis Functions (RBF), Support Vector Regression (SVR) [13], etc.
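For a single objective, these criteria have well-known closed forms (restated here for the reader's convenience; they follow [25] and are not specific to this paper's notation). With $f_{\min}$ the best objective value observed so far, $\mu(\mathbf{x})$ and $s(\mathbf{x})$ the Kriging prediction mean and standard deviation, and $\phi$, $\Phi$ the standard normal density and distribution functions (minimization assumed),

$$\mathrm{PoI}(\mathbf{x}) = \Phi\!\left(\frac{f_{\min} - \mu(\mathbf{x})}{s(\mathbf{x})}\right),$$

$$\mathrm{EI}(\mathbf{x}) = \left(f_{\min} - \mu(\mathbf{x})\right)\Phi\!\left(\frac{f_{\min} - \mu(\mathbf{x})}{s(\mathbf{x})}\right) + s(\mathbf{x})\,\phi\!\left(\frac{f_{\min} - \mu(\mathbf{x})}{s(\mathbf{x})}\right).$$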
The single-objective SBO problem is well described in the literature; however, most (if not all) real-world problems actually consist of multiple, conflicting objectives leading to a set of Pareto-optimal solutions. Often the objectives are aggregated into a single cost function, e.g., using a weighted sum, that can be optimized by standard optimization techniques. Subsequently, by repeating this process many times using varying starting conditions, e.g., different sets of weights, several solutions on the Pareto front can be found. On the other hand, a multiobjective optimization method can optimize the different objective functions simultaneously, and try to find the Pareto front in just a single run. Examples of such methods are primarily the MultiObjective Evolutionary Algorithms (MOEAs), e.g., the "Non-dominated Sorting Genetic Algorithm II" (NSGA-II; [14]), the "Strength Pareto Evolutionary Algorithm 2" (SPEA2; [44]) and the "S-Metric Selection Evolutionary MultiObjective Algorithm" (SMS-EMOA; [5]).
Unfortunately, MOEAs typically require a massive amount of function evaluations, which is infeasible for computationally expensive simulators. Hence, it is vital to economize on the number of function evaluations, e.g., by using surrogate models. MultiObjective Surrogate-based Optimization (MOSBO) methods only appeared quite recently in the literature. Most work is focused on integrating surrogate models in MOEAs [41]. Gaspar et al. [21] use neural networks either to approximate the fitness function or as a local approximation technique to generate search points more efficiently. Voutchkov et al. [35] apply the NSGA-II algorithm to Kriging models instead of the expensive simulator. For an overview of available techniques and approaches the reader is referred to [30, 42].

^1 Improving the overall accuracy of the surrogate model (space-filling).
^2 Enhancing the accuracy of the surrogate model solely in the region of the (current) optimum.
While the PoI and EI approach is well-developed and used for single-objective SBO, its use in MOSBO is not widespread. Single-objective versions of EI and PoI are utilized by Knowles et al. [28, 29] to solve MOSBO problems. This approach, known as ParEGO, uses Kriging and EI to optimize a weighted sum of objective functions. By randomizing the weights every iteration, several solutions along the Pareto front can be identified. More recently, Keane [26] proposed multiobjective versions of PoI and a Euclidean distance-based EI. At the same time, Emmerich et al. [17] proposed the hypervolume-based EI criterion. Similarly to a weighted sum, the multiobjective versions of EI and PoI aggregate information from the surrogate models into a single cost function, balancing between exploration^1 and exploitation^3. Unfortunately, Keane gives formulae for only two objective functions, as the statistical criteria become rather cumbersome and complex for a higher number of objective functions. Similarly, while Emmerich et al. [16] describe formulae for an arbitrary number of dimensions for the hypervolume-based EI, the computational cost increases at least exponentially with the number of objectives and, hence, it has only been applied to two objectives.

^3 Improving or augmenting the Pareto front.
The key contribution of this paper is the Efficient Multiobjective Optimization (EMO) algorithm, which is a much more efficient method of evaluating multiobjective versions of the PoI and EI criteria for multiobjective optimization problems. In fact, the problem at hand is similar to calculating the hypervolume (a Pareto set quality estimator) [45], as will be shown below, and, hence, hypervolume algorithms can be adapted to aid in the evaluation of the statistical criteria. Moreover, a new statistical criterion is proposed, based on the hypervolume-based EI, which is significantly cheaper to compute while still delivering promising results.
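For reference, the hypervolume indicator mentioned above has the following standard definition (a standard result from the literature, not a contribution of this paper): given a Pareto set $\mathcal{P}$ (minimization) and a reference point $\mathbf{r}$ dominated by every member of $\mathcal{P}$,

$$H(\mathcal{P}) = \Lambda\left(\bigcup_{\mathbf{p} \in \mathcal{P}} \left\{ \mathbf{y} : \mathbf{p} \preceq \mathbf{y} \preceq \mathbf{r} \right\}\right),$$

where $\Lambda(\cdot)$ denotes the Lebesgue measure; a larger hypervolume indicates a better approximation of the Pareto front.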
In section 2 the Kriging surrogate model is briefly discussed. In section 3, an
overview of the EMO algorithm is given, including general expressions for the PoI
and several variants of EI. Subsequently, a fundamental part needed for the calculation
of the statistical criteria is discussed in section 3.4. Afterwards, in section 4 the EMO
algorithm is tested on several functions from the DTLZ benchmark suite [15]. Lastly,
in section 5 conclusions and future work are discussed.
2 Kriging
Kriging is a popular surrogate model to approximate deterministic noise-free data,
and has proven to be very useful for tasks such as optimization [25], design space
exploration, visualization, prototyping, and sensitivity analysis [37].
A thorough mathematical treatment of Kriging is given in [33, 19]. Basically, Kriging is a two-step process: first a regression function $h(\mathbf{x})$ is constructed, and, subsequently, a centered Gaussian process $Z$ with variance $\sigma^2$ and correlation matrix $\Psi$ is constructed through the residuals,

$$Y(\mathbf{x}) = h(\mathbf{x}) + Z(\mathbf{x}). \qquad (1)$$
Consider a set of $n$ samples $(\mathbf{x}_1, \ldots, \mathbf{x}_n)^\top$ in $d$ dimensions (see Equation 2) and associated function values $\mathbf{y} = (y_1, \ldots, y_n)^\top$, where $(\cdot)^\top$ denotes the transpose of a vector or matrix,

$$X = (\mathbf{x}_1, \ldots, \mathbf{x}_n)^\top = \begin{pmatrix} x_{1,1} & \cdots & x_{1,d} \\ \vdots & \ddots & \vdots \\ x_{n,1} & \cdots & x_{n,d} \end{pmatrix}. \qquad (2)$$
Essentially, the regression part is encoded in the $n \times p$ model matrix $F$ using basis functions $b_i(\mathbf{x})$ for $i = 1 \ldots p$,

$$F = \begin{pmatrix} b_1(\mathbf{x}_1) & b_2(\mathbf{x}_1) & \cdots & b_p(\mathbf{x}_1) \\ \vdots & \vdots & \ddots & \vdots \\ b_1(\mathbf{x}_n) & b_2(\mathbf{x}_n) & \cdots & b_p(\mathbf{x}_n) \end{pmatrix},$$
while the stochastic process is mainly defined by the $n \times n$ correlation matrix $\Psi$,

$$\Psi = \begin{pmatrix} \psi(\mathbf{x}_1, \mathbf{x}_1) & \cdots & \psi(\mathbf{x}_1, \mathbf{x}_n) \\ \vdots & \ddots & \vdots \\ \psi(\mathbf{x}_n, \mathbf{x}_1) & \cdots & \psi(\mathbf{x}_n, \mathbf{x}_n) \end{pmatrix},$$
where $\psi(\cdot, \cdot)$ is the correlation function, parameterized by a set of hyperparameters $\theta$. The choice of correlation function is crucial to obtain good accuracy. This paper focuses on the Matérn correlation function [34], with $\nu = 3/2$,

$$\psi(\mathbf{x}, \mathbf{x}')_{\text{Matérn }\nu=3/2} = \left(1 + \sqrt{3}\,l\right)\exp\left(-\sqrt{3}\,l\right),$$

with $l = \sqrt{\sum_{i=1}^{d} \theta_i \left(x_i - x'_i\right)^2}$. In addition, the popular Gaussian correlation function is also used,

$$\psi(\mathbf{x}, \mathbf{x}')_{\text{Gauss}} = \exp\left(-\sum_{i=1}^{d} \theta_i \left|x_i - x'_i\right|^2\right).$$
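As an illustration, the two correlation functions and the matrix $\Psi$ translate directly into code. This is a minimal sketch written for this text (not the authors' implementation); it assumes NumPy and uses plain loops for clarity:

```python
import numpy as np

def matern32_corr(x1, x2, theta):
    # Matérn (nu = 3/2) correlation between two points, as defined above
    l = np.sqrt(np.sum(theta * (x1 - x2) ** 2))
    return (1.0 + np.sqrt(3.0) * l) * np.exp(-np.sqrt(3.0) * l)

def gauss_corr(x1, x2, theta):
    # Gaussian correlation between two points, as defined above
    return np.exp(-np.sum(theta * np.abs(x1 - x2) ** 2))

def corr_matrix(X, theta, psi=matern32_corr):
    # n x n correlation matrix Psi over the sample matrix X of Equation (2)
    n = X.shape[0]
    Psi = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            Psi[i, j] = psi(X[i], X[j], theta)
    return Psi
```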
The hyperparameters $\theta$ are identified by Maximum Likelihood Estimation (MLE). In particular, the negative concentrated log-likelihood is minimized,

$$\operatorname*{argmin}_{\theta}\; \frac{n}{2}\ln(\sigma^2) + \frac{1}{2}\ln(|\Psi|),$$

where $\sigma^2 = \frac{1}{n}(\mathbf{y} - F\alpha)^\top \Psi^{-1} (\mathbf{y} - F\alpha)$. Subsequently, the prediction mean and prediction variance of Kriging are derived, respectively, as,
$$\mu(\mathbf{x}) = M\alpha + r(\mathbf{x}) \cdot \Psi^{-1} \cdot (\mathbf{y} - F\alpha), \qquad (3)$$

$$s^2(\mathbf{x}) = \sigma^2 \left( 1 - r(\mathbf{x})\Psi^{-1}r(\mathbf{x})^\top + \frac{\left(1 - F^\top \Psi^{-1} r(\mathbf{x})^\top\right)^2}{F^\top \Psi^{-1} F} \right), \qquad (4)$$

where $M = \left( b_1(\mathbf{x})\; b_2(\mathbf{x})\; \ldots\; b_p(\mathbf{x}) \right)$ is the model matrix of the predicting point $\mathbf{x}$, $\alpha$ is a $p \times 1$ vector denoting the coefficients of the regression function, determined by Generalized Least Squares (GLS), and $r(\mathbf{x})$ is a $1 \times n$ vector of correlations between the point $\mathbf{x}$ and the samples $X$.
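To make Equations (3) and (4) concrete, the following sketch evaluates them for the simplest regression function, a constant ($p = 1$, $b_1(\mathbf{x}) = 1$, so $F$ is a column of ones and $M = (1)$). It reuses `corr_matrix` and `matern32_corr` from the sketch above and is illustrative only, not the authors' implementation:

```python
import numpy as np

def kriging_predict(x, X, y, theta, psi=matern32_corr):
    # Prediction mean (3) and variance (4) for a constant regression function.
    # A practical implementation would factorize Psi once (Cholesky) and reuse
    # it across predictions instead of forming an explicit inverse.
    n = X.shape[0]
    Psi_inv = np.linalg.inv(corr_matrix(X, theta, psi))
    f = np.ones(n)                 # model matrix F as a vector (constant basis)
    r = np.array([psi(x, X[i], theta) for i in range(n)])  # correlations r(x)

    alpha = (f @ Psi_inv @ y) / (f @ Psi_inv @ f)  # regression coefficient (GLS)
    resid = y - f * alpha
    sigma2 = (resid @ Psi_inv @ resid) / n         # MLE of the process variance

    mu = alpha + r @ Psi_inv @ resid               # Equation (3), with M*alpha = alpha
    s2 = sigma2 * (1.0 - r @ Psi_inv @ r
                   + (1.0 - f @ Psi_inv @ r) ** 2 / (f @ Psi_inv @ f))  # Equation (4)
    return mu, s2
```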

Fig. 1: Flow chart of the Efficient Multiobjective Optimization (EMO) algorithm.
3 Efficient Multiobjective Optimization (EMO)
3.1 Overview
A flow chart of the EMO algorithm is shown in Figure 1. First an initial set of points $X$ is generated and evaluated on the expensive objective functions $f_j(\mathbf{x})$, for $j = 1 \ldots m$. Each objective function $f_j(\mathbf{x})$ is then approximated by a Kriging model. Based on the Kriging models, useful criteria can be constructed that help in identifying Pareto-optimal solutions. After a new point is selected, it is evaluated on the expensive objective functions $f_j(\mathbf{x})$, the Kriging models are updated with this new information, and this process is repeated in an iterative fashion until some stopping criterion is met.
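The loop in Figure 1 can be summarized in a few lines of code. The sketch below is a schematic rendering of that flow chart, not the authors' code: it uses a fixed hyperparameter vector (the paper re-estimates $\theta$ by MLE) and a cheap random search as a stand-in for a proper optimizer of the criterion:

```python
import numpy as np

rng = np.random.default_rng(0)

def emo_loop(objectives, bounds, criterion, n_init=20, budget=100):
    # objectives: list of m expensive functions f_j(x); bounds: (d, 2) array.
    # criterion(x, models, Y): any scalar multiobjective infill criterion,
    # e.g., a multiobjective PoI or EI variant built on the Kriging models.
    d, m = bounds.shape[0], len(objectives)
    theta = np.ones(d)  # fixed for simplicity; normally re-fitted by MLE
    sample = lambda k: bounds[:, 0] + rng.random((k, d)) * (bounds[:, 1] - bounds[:, 0])

    X = sample(n_init)                                    # initial design
    Y = np.array([[f(x) for f in objectives] for x in X])
    while len(X) < budget:
        # one Kriging model per objective, using the predictor sketched above
        models = [lambda x, j=j: kriging_predict(x, X, Y[:, j], theta)
                  for j in range(m)]
        cand = sample(1024)                               # random-search stand-in
        x_new = max(cand, key=lambda x: criterion(x, models, Y))
        y_new = [f(x_new) for f in objectives]            # expensive evaluations
        X, Y = np.vstack([X, x_new]), np.vstack([Y, y_new])
    return X, Y  # the non-dominated subset of Y approximates the Pareto front
```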
Of particular interest are the Probability of Improvement (PoI) and Expected Improvement (EI) statistical criteria, which are widely used for single-objective optimization [24, 10]. Hence, it may be useful to extend the concept of the PoI and EI directly to multiobjective optimization. Multiobjective versions of the PoI and EI are defined for an arbitrary number of objective functions in sections 3.2 and 3.3.
For ease of notation in the forthcoming sections, the output of all the Kriging models can be considered as mutually independent Gaussian random variables $Y_j(\mathbf{x})$,

$$Y_j(\mathbf{x}) \sim N\!\left(\mu_j(\mathbf{x}), s_j^2(\mathbf{x})\right) \quad \text{for } j = 1 \ldots m. \qquad (5)$$

The associated probability density function $\phi_j$ and cumulative distribution function $\Phi_j$ of $Y_j(\mathbf{x})$ are compactly denoted as,

$$\phi_j[y_j] \triangleq \phi_j[y_j; \mu_j(\mathbf{x}), s_j^2(\mathbf{x})], \qquad (6)$$

$$\Phi_j[y_j] \triangleq \Phi_j[y_j; \mu_j(\mathbf{x}), s_j^2(\mathbf{x})]. \qquad (7)$$
Given a set of $n$ points $X$ as in (2), a Pareto set $\mathcal{P}$ can be constructed that comprises $v \leq n$ Pareto-optimal (non-dominated) solutions,

References (selection)

Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6(2), 182–197 (2002)

Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press (2006)

Jones, D.R., Schonlau, M., Welch, W.J.: Efficient global optimization of expensive black-box functions. Journal of Global Optimization 13(4), 455–492 (1998)

Sacks, J., Welch, W.J., Mitchell, T.J., Wynn, H.P.: Design and analysis of computer experiments. Statistical Science 4(4), 409–423 (1989)

Zitzler, E., Thiele, L., Laumanns, M., Fonseca, C.M., Grunert da Fonseca, V.: Performance assessment of multiobjective optimizers: an analysis and review. IEEE Transactions on Evolutionary Computation 7(2), 117–132 (2003)
Frequently Asked Questions

Q1. What are the contributions in "Fast calculation of multiobjective probability of improvement and expected improvement criteria for Pareto optimization"?

In this paper, the authors propose the Efficient Multiobjective Optimization (EMO) algorithm which uses Kriging models and multiobjective versions of the Probability of Improvement (PoI) and Expected Improvement (EI) criteria to identify the Pareto front with a minimal number of expensive simulations.

Future work will focus more on exploring the key benefits of the EMO algorithm on various industrial applications and benchmark problems. In addition, future work will focus on minimizing the number of cells and on an iterative update scheme for the cells, which will be considerably more efficient than recalculating the cells almost every iteration.

The key contribution of this paper is the Efficient Multiobjective Optimization (EMO) algorithm which is a much more efficient method of evaluating multiobjective versions of the PoI and EI criteria for multiobjective optimization problems. 

This paper deals with the use of surrogate models for expediting the optimization of time-consuming (black-box) problems of a deterministic nature, in contrast to stochastic simulation. 

In order to evaluate these statistical criteria efficiently, one or more integrals need to be evaluated over an integration area A. As A is non-rectangular and often irregularly shaped, especially for a higher number of objective functions, the integral must first be decomposed into a sum of k integrals over rectangular cells. 
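The value of rectangular cells is that, because the Kriging outputs are modeled as mutually independent Gaussians (Equations (5)-(7)), the integral over a single hyperrectangular cell $[\mathbf{l}, \mathbf{u}]$ factorizes into one-dimensional terms. This is a standard consequence of independence, sketched here to make the decomposition concrete:

$$\int_{l_1}^{u_1} \cdots \int_{l_m}^{u_m} \prod_{j=1}^{m} \phi_j[y_j] \, \mathrm{d}y_m \cdots \mathrm{d}y_1 = \prod_{j=1}^{m} \left( \Phi_j[u_j] - \Phi_j[l_j] \right),$$

so an integral over $A$ becomes a sum of $k$ such closed-form products.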

The first run is configured with a population size of 25 and a maximum number of generations of 10 (total sample budget 250), and the second run is configured with a population size of 50 and a maximum number of generations of 50 (total sample budget 2500).

The construction of the Kriging models and the thorough optimization of the statistical criteria make the EMO algorithm more expensive than SMS-EMOA.

While the cells can be chosen to disjointedly cover the integration area A, the algorithm described in section 3.4 decomposes the region A into overlapping cells.

Similarly to the hypervolume-based PoI, Keane [26] defines the EI as the product of the PoI P[I] and a Euclidean distance-based improvement function.

The authors propose to decompose the integration area into as few cells as possible using an efficient computer algorithm, i.e., each cell encompasses a large part of the integration area.

The integration area A of P[I] corresponds to the non-dominated region and, hence, a closed-form expression of the hypervolume-based PoI can be derived from the same set of cells used to evaluate P[I], see Figure 2b, namely,

$$P_{\mathrm{hv}}[I] = \left( \sum_{k=1}^{q} \pm \mathrm{Vol}(\boldsymbol{\mu}, \mathbf{l}_k, \mathbf{u}_k) \right) \cdot P[I], \qquad (18)$$

where

$$\mathrm{Vol}(\boldsymbol{\mu}, \mathbf{l}, \mathbf{u}) = \begin{cases} \prod_{j=1}^{m} \left( u_j - \max(l_j, \mu_j(\mathbf{x})) \right) & \text{if } u_j > \mu_j(\mathbf{x}) \text{ for } j = 1 \ldots m, \\ 0 & \text{otherwise.} \end{cases}$$

A plot of the practical computation time and the number of cells is shown in Figures 5a and 5b, applying the adapted WFG algorithm to sets of Pareto points randomly drawn from the first quadrant of a unit sphere (taking the mean values of 1000 repetitions).

A good theoretical overview of different types of EI is given by [36], including work on scalar improvement functions [26,16] as well as using the single-objective EI in a multiobjective setting [28,23]. 

These “statistical criteria” guide the selection of new data points in such a way that the objective function is optimized, while minimizing the number of expensive simulations. 

The exclusive hypervolume (or hypervolume contribution, see Figure 2b) of a Pareto set $\mathcal{P}$ relative to a point $\mathbf{p}$ is defined as

$$H_{\mathrm{exc}}(\mathbf{p}, \mathcal{P}) = H(\mathcal{P} \cup \{\mathbf{p}\}) - H(\mathcal{P}). \qquad (13)$$

$H_{\mathrm{exc}}$ measures the contribution (or improvement) of the point $\mathbf{p}$ to the Pareto set $\mathcal{P}$ and, hence, can also be used to define a scalar improvement function, namely,

$$I(\mathbf{p}, \mathcal{P}) = \begin{cases} H_{\mathrm{exc}}(\mathbf{p}, \mathcal{P}) & \text{if } \mathbf{p} \text{ is not dominated by } \mathcal{P}, \\ 0 & \text{otherwise.} \end{cases}$$
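Equation (13) and the improvement function are straightforward to express in code given any hypervolume routine $H$. The sketch below was written for this text (it is not taken from the paper) and uses a simple Monte Carlo estimate of $H$ so that it stays self-contained; an exact algorithm such as WFG would be used in practice:

```python
import numpy as np

rng = np.random.default_rng(0)

def hypervolume(P, ref, n_mc=100_000):
    # Monte Carlo estimate of H(P) for minimization w.r.t. reference point ref:
    # fraction of the box [min(P), ref] dominated by P, times the box volume.
    P = np.asarray(P, dtype=float)
    lo = P.min(axis=0)
    pts = lo + rng.random((n_mc, P.shape[1])) * (ref - lo)
    dominated = (pts[:, None, :] >= P[None, :, :]).all(axis=2).any(axis=1)
    return dominated.mean() * np.prod(ref - lo)

def exclusive_hypervolume(p, P, ref):
    # Equation (13): the contribution of point p to the Pareto set P
    return hypervolume(np.vstack([P, p]), ref) - hypervolume(P, ref)

def improvement(p, P, ref):
    # Scalar improvement function: Hexc(p, P) if p is non-dominated, else 0
    p = np.asarray(p, dtype=float)
    is_dominated = any((q <= p).all() and (q < p).any() for q in np.asarray(P))
    return 0.0 if is_dominated else exclusive_hypervolume(p, P, ref)
```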