
HAL Id: hal-01685848
https://hal.archives-ouvertes.fr/hal-01685848
Submitted on 16 Jan 2018
Automatic selection for general surrogate models
Malek Ben Salem, Lionel Tomaso
To cite this version:
Malek Ben Salem, Lionel Tomaso. Automatic selection for general surrogate models. Structural and Multidisciplinary Optimization, Springer Verlag (Germany), 2018. ⟨hal-01685848⟩

Automatic selection for general surrogate models
Malek Ben Salem · Lionel Tomaso
Received: date / Accepted: date
Abstract In design engineering problems, the use of surrogate models (also called metamodels) instead of expensive simulations has become very popular. Surrogate models include individual models (regression, kriging, neural networks, ...) or a combination of individual models, often called an aggregation or ensemble. Since different surrogate types with various tunings are available, users often struggle to choose the most suitable one for a given problem. Thus, there is great interest in automatic selection algorithms. In this paper, we introduce a universal criterion that can be applied to any type of surrogate model. It is composed of three complementary components measuring the quality of general surrogate models: internal accuracy (on design points), predictive performance (cross-validation) and a roughness penalty. Based on this criterion, we propose two automatic selection algorithms. The first selection scheme finds the optimal ensemble of a set of given surrogate models. The second selection scheme further explores the space of surrogate models by using an evolutionary algorithm where each individual is a surrogate model. Finally, the performances of the algorithms are illustrated on 15 classical test functions and compared to different individual surrogate models. The results show the efficiency of our approach. In particular, we observe that the three components of the proposed criterion act together to improve accuracy and limit over-fitting.

Malek Ben Salem is funded by a CIFRE grant from the ANSYS company, subsidized by the French National Association for Research and Technology (ANRT, CIFRE grant number 2014/1349).

M. Ben Salem
Ecole des mines de St-Etienne - ANSYS, Inc., Villeurbanne
Tel.: +33 - ...
Fax.: +33 - ...
E-mail: firstname.ben-salem[@]emse.fr

L. Tomaso
ANSYS, Inc., Villeurbanne
Tel.: +33 - ...
Fax.: +33 - ...
E-mail: firstname.name[@]ansys.com
Keywords Surrogate modeling · Multiple surrogate models · Surrogate model selection · Cross-validation errors
1 Introduction
Computer simulations are an efficient tool to study complex physical behaviors. However, high-fidelity simulations are generally computationally expensive. Therefore, surrogate models, also known as metamodels or response surfaces, are often used instead. They provide an approximation of a response of interest based on a limited number of expensive simulations. There are several methods of construction of such approximations. Among the popular surrogate model types, we can cite for example kriging (Matheron, 1963), support vector machines (SVM) (Smola and Schölkopf, 2004), moving least squares (Lancaster and Salkauskas, 1981) and Multivariate Adaptive Regression Splines (MARS) (Friedman, 1991). Generally, a metamodel family comes with several possible tunings. At the same time, there is no universal optimal surrogate for all problems. Users often face difficulties in selecting the most suitable surrogate for their problem. Thus, there is great interest in automatic model selection algorithms. The main purpose is to choose the surrogate that provides the best prediction performance over the whole parametric space.
In the literature, this problem is generally studied along
three different approaches.
1) The first approach consists in using algorithms to optimize the settings of a particular surrogate model type.
For instance, (Chen et al, 2004; Lessmann et al, 2006) work on SVM, (Zhang et al, 2000) on neural networks, and (Tomioka et al, 2007) deal with least squares regression.
2) A second approach consists in considering multiple surrogates or ensembles. Automatic surrogate selection then becomes a weighting problem: often, the selected model is a weighted sum of different surrogate models. For example, (Viana et al, 2009; Zhou et al, 2011; Acar and Rais-Rohani, 2009; Goel et al, 2007) discuss different ways to build such aggregations.
3) The last approach consists in selecting a good member
among different types of surrogate models with different
settings. We refer for instance to the works of (Gorissen
et al, 2009; Shi et al, 2012; Zhou and Jiang, 2016).
The main objective of our paper is to propose a new, relevant surrogate model selection algorithm that can handle different types of surrogates. To achieve this goal, we define a universal criterion that can evaluate the accuracy of any surrogate model.
The paper is organized as follows. We introduce and discuss in Section 2 our criterion, called the Penalized Predictive Score (PPS). We show in Section 3 that the PPS is suitable for optimizing the weights of surrogate model ensembles. In Section 4, we present an evolutionary selection algorithm that explores the space of surrogate models, called PPS Genetic Aggregations (PPS-GA). Finally, the performances of the algorithms on 15 test cases are displayed in Section 5. The results show the efficiency of the PPS, the complementary role of its three components and the relevance of the proposed selection algorithms.
2 Penalized Predictive Score (PPS)
2.1 Definition
Assessing the quality of a surrogate is very challenging. It is desirable to use an independent set to assess the predictive capabilities of a given method, but this is computationally expensive in practice. One can also estimate the errors by computing them on the design points; unfortunately, a small MSE on design points does not imply good predictive capabilities. Therefore, resampling techniques such as Cross-Validation (CV) (Stone, 1974) or bootstrap (Efron and Tibshirani, 1993) are generally used. Such techniques reduce the bias of the estimation. Nevertheless, they do not prevent over-parameterized models. We introduce a criterion that does: the Penalized Predictive Score (PPS, Equation (1)). It combines three components:
a) The internal accuracy (or fit): we use the mean squared error (MSE) on design points.
b) The predictive capability: we propose to use the 10F-CV PRESS errors.
c) A roughness penalty: we propose to use the Bending Energy Functional (BEF) (Duchon, 1977).
$$\mathrm{PPS}(m, Z_n) = \underbrace{\alpha\,\widehat{R}_{\ell_2, Z_n}(m)}_{a} + \underbrace{\beta\,R_{10CV}(m)}_{b} + \underbrace{\gamma\,E_n(m)}_{c} \qquad (1)$$

Here, as described below, $\widehat{R}_{\ell_2, Z_n}(m)$ denotes the MSE criterion, $R_{10CV}(m)$ the 10-fold cross-validation estimate of the errors and $E_n(m)$ a roughness penalty. Further, $\alpha$, $\beta$, $\gamma$ are weights in $\mathbb{R}_+$. In all our implementations, we use $\alpha = 2\beta$ and $\beta = 2\gamma$.
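As an illustration (not the authors' code), once its three components have been estimated the criterion is a plain weighted sum. The function name `pps` and the concrete weights $(\alpha, \beta, \gamma) = (4, 2, 1)$, which satisfy $\alpha = 2\beta$ and $\beta = 2\gamma$, are our own choices:

```python
def pps(mse, cv_error, bending_energy, alpha=4.0, beta=2.0, gamma=1.0):
    """Penalized Predictive Score (Equation (1)): weighted sum of internal
    accuracy (MSE on design points), 10-fold CV error, and a roughness
    (bending energy) penalty. The paper imposes alpha = 2*beta, beta = 2*gamma."""
    return alpha * mse + beta * cv_error + gamma * bending_energy

# Example: an interpolator (zero internal MSE, high bending energy)
# scores worse than a smoother, slightly biased fit.
print(pps(mse=0.0, cv_error=0.30, bending_energy=5.0))   # 5.6
print(pps(mse=0.02, cv_error=0.25, bending_energy=0.5))  # 1.08
```

The penalty term is what separates the two models here: on internal accuracy alone, the interpolator would win.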
2.2 Internal accuracy
Let $\Omega = [0, 1]^d$ be the parametric space of dimension $d$. $X_n = (x_1, \dots, x_n)^\top \in \Omega^n$ and $Y_n = (y_1, \dots, y_n)^\top \in \mathbb{R}^n$ form the set of design points $Z_n = (X_n, Y_n)$, where $y_i = f(x_i)$ for $i = 1, \dots, n$ and $f : \Omega \to \mathbb{R}$ is an expensive-to-evaluate function. A surrogate model $\widehat{m}_{|Z_n} : \Omega \to \mathbb{R}$ is used to replace $f$ based on the design $Z_n$. We call the construction method a "surrogate model builder": if $m$ is a surrogate model builder, then we build the surrogate model $\widehat{m}_{|Z_n} : \Omega \to \mathbb{R}$ based on the design $Z_n$.
The assessment of the performance of a surrogate model is extremely important in practice (Hastie et al, 2009). It relies on evaluating, on the set of design points, the prediction capabilities of the surrogate model. It is generally based on a contrast function (or loss function) that measures the errors between the predicted and the true models. A typical choice is the squared error $\ell_2(x, y) = (x - y)^2$. The integral form of the MSE is the $\ell_2$ risk over the whole parametric space:

$$R_{\ell_2, Z_n}(m) = \int_\Omega \ell_2\big(\widehat{m}_{|Z_n}(x), f(x)\big)\, dx \qquad (2)$$

Since $f$ is unknown, we can only use an approximation to estimate this risk. Ideally, the performance of the surrogate model would be evaluated on an extra set of points. However, generating such a set is sometimes computationally expensive. Therefore, one uses the empirical distribution associated with the set of design points. Computing the mean squared error (MSE) (Equation (3)) on the set of design points for the surrogate model $\widehat{m}_{|Z_n}$ gives an empirical approximation of $R_{\ell_2, Z_n}(m)$ defined in Equation (2):

$$\widehat{R}_{\ell_2, Z_n}(m) = \frac{1}{n} \sum_{i=1}^{n} \ell_2\big(\widehat{m}_{|Z_n}(x_i), y_i\big) = \frac{1}{n} \sum_{i=1}^{n} \big(\widehat{m}_{|Z_n}(x_i) - y_i\big)^2 \qquad (3)$$
Note that computing the MSE on the set of design points is a biased estimate of the error over the whole space. In fact, for any interpolating surrogate model $m$, $\widehat{R}_{\ell_2, Z_n}(m) = 0$. This does not necessarily mean that the surrogate model fits the real function in the whole space.
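A minimal numpy sketch of Equation (3); the helper name `mse_design` is ours, and the surrogate is assumed to be a vectorized callable:

```python
import numpy as np

def mse_design(surrogate, X, y):
    """Empirical l2 risk of Equation (3): mean squared error of the
    surrogate evaluated on the design points themselves."""
    residuals = surrogate(X) - y
    return float(np.mean(residuals ** 2))

# An interpolating surrogate has zero internal error by construction,
# which says nothing about its behavior away from the design points.
X = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
y = np.sin(X).ravel()
interpolator = lambda X: np.sin(X).ravel()   # exact on the design
print(mse_design(interpolator, X, y))        # 0.0
```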
2.3 Predictive capabilities
On the one hand, the use of design points to estimate the errors yields an optimistic result (Arlot and Celisse, 2010). On the other hand, using a validation set can be expensive. Therefore, it is convenient to use re-sampling techniques such as Cross-Validation (CV) (Stone, 1974) and bootstrap (Efron and Tibshirani, 1993) to estimate the prediction errors. Resampling techniques estimate the errors by using subsets of the design points to build several sub-surrogate models. For instance, computing the Leave-One-Out Cross-Validation (LOO-CV) errors of a surrogate model $\widehat{m}_{|Z_n}$ consists in computing the error at an observation $(x_i, y_i)$ based on the surrogate model $\widehat{m}_{|Z_{n,i}}$ built on the subset of all the design points except the $i$-th one ($Z_{n,i} = (x_j, y_j)_{j \neq i}$). In the same way, $k$-fold cross-validation (kF-CV) consists in dividing the data into $k$ subsets. Each subset plays the role of validation set while the remaining $k - 1$ subsets are used together as the training set. If $k$ is the number of folds, for $i \in \{1, \dots, k\}$ let $Z^{(i)} \in \mathcal{P}(Z_n)$ be a subset of $Z_n$ such that $\bigcup_{i=1}^{k} Z^{(i)} = Z_n$. The kF-CV estimate of the $\ell_2$ errors (Equation (4)) is obtained by computing the loss at each point of the $i$-th fold $Z^{(i)}$ compared to the prediction of the surrogate model built on the remaining folds ($Z_n \setminus Z^{(i)}$):

$$R_{kCV}(m) = \frac{1}{n} \sum_{i=1}^{k} \sum_{(x_0, y_0) \in Z^{(i)}} \ell_2\big(\widehat{m}_{|Z_n \setminus Z^{(i)}}(x_0), y_0\big) \qquad (4)$$

where $(x, y) \in Z_n \setminus Z^{(i)}$ if and only if $(x, y) \in Z_n$ and $(x, y) \notin Z^{(i)}$.
(Queipo et al, 2005) pointed out that the main advantage of CV is that it provides a nearly unbiased estimate. Further, (Kohavi, 1995) studied Cross-Validation and bootstrap performances on large datasets and recommended using stratified 10-fold cross-validation. (James et al, 2013) stated that kF-CV with k = 5 or k = 10 yields test error estimates that suffer neither from excessively high bias nor from very high variance.
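The kF-CV estimate of Equation (4) can be sketched from scratch as follows. All names here are illustrative, `builder` stands in for any "surrogate model builder" that returns a fitted predictor, and the least-squares builder is just one concrete example:

```python
import numpy as np

def kfold_cv_error(builder, X, y, k=10, seed=0):
    """k-fold CV estimate of the l2 error (Equation (4)): each fold Z^(i)
    is predicted by a sub-surrogate rebuilt on the remaining folds."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, min(k, n))
    sq_err = 0.0
    for fold in folds:
        train = np.setdiff1d(idx, fold)          # Z_n \ Z^(i)
        model = builder(X[train], y[train])      # sub-surrogate model
        sq_err += np.sum((model(X[fold]) - y[fold]) ** 2)
    return sq_err / n

# Example builder: ordinary least-squares linear regression with intercept
def linear_builder(X, y):
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xq: np.hstack([Xq, np.ones((len(Xq), 1))]) @ coef

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(40, 2))
y = 3 * X[:, 0] - X[:, 1] + 0.5
print(kfold_cv_error(linear_builder, X, y, k=10))  # ~0 for exactly linear data
```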
2.4 Penalization
Penalties are used in several model selection frameworks in order to prevent over-fitting. Selection criteria such as the Bayesian Information Criterion (BIC) (Schwarz et al, 1978) or the Akaike Information Criterion (AIC) (Akaike, 1974) penalize models by their degrees of freedom. Most penalties are designed for a particular family of surrogates. Here, we are interested in universal methods, so we prefer to deal with the smoothness of the surrogate model rather than with its structural complexity. For instance, (Nguyen et al, 2011) introduce a criterion called the Linear Reference Model (LRM). It scores a surrogate model by computing the deviation between its predictions and a local linear model $\widehat{lrm}$. The LRM is computed over a set of $N$ points $x^{(k)}$, $k = 1, \dots, N$ (Equation (5)):

$$R_{LRM}(m) = \frac{1}{N} \sum_{k=1}^{N} \ell_2\big(\widehat{m}_{|Z_n}(x^{(k)}), \widehat{lrm}(x^{(k)})\big) \qquad (5)$$

Computationally, this last criterion requires the construction of a Delaunay tessellation (Watson, 1981) to compute $\widehat{lrm}$. The computational cost of such a construction in high dimension is too expensive. We suggest instead a criterion that penalizes the roughness of surrogate models: the thin plate spline (TPS) (Duchon, 1977) Bending Energy Functional (BEF). It is a penalty based on second-order partial derivatives. For a dimension $d$, the roughness penalty $E_n$ is the integral of the squared terms of the Hessian (Equation (6)):

$$E_n(\hat{f}) = \int_\Omega \sum_{i=1}^{d} \sum_{j=1}^{d} \left( \frac{\partial^2 \hat{f}}{\partial x_i \partial x_j} \right)^2 dx \qquad (6)$$

The LRM can be used in place of the BEF in the selection criterion PPS. It penalizes the deviation from a linear model regardless of its roughness, and it still gives good predictive capabilities. Nevertheless, some rough surrogates may then be selected.
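The integral in Equation (6) is rarely available in closed form for a black-box surrogate. One possible numerical sketch (ours, not the paper's implementation) estimates it by Monte Carlo sampling of $[0,1]^d$ with finite-difference second derivatives:

```python
import numpy as np

def bending_energy(surrogate, d, n_mc=2000, h=1e-3, seed=0):
    """Monte Carlo estimate of the Bending Energy Functional (Equation (6))
    on [0, 1]^d, with second derivatives by central finite differences."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(h, 1 - h, size=(n_mc, d))  # keep stencils inside the domain
    total = 0.0
    for i in range(d):
        for j in range(d):
            ei = np.zeros(d); ei[i] = h
            ej = np.zeros(d); ej[j] = h
            if i == j:
                dij = (surrogate(X + ei) - 2 * surrogate(X) + surrogate(X - ei)) / h**2
            else:
                dij = (surrogate(X + ei + ej) - surrogate(X + ei - ej)
                       - surrogate(X - ei + ej) + surrogate(X - ei - ej)) / (4 * h**2)
            total += np.mean(dij ** 2)   # Monte Carlo average over [0,1]^d
    return total

# A linear surrogate has zero bending energy; a quadratic one does not:
# for f(x) = x_0^2, the only nonzero Hessian term is 2, so E_n = 4.
linear = lambda X: 2 * X[:, 0] - X[:, 1]
quad = lambda X: X[:, 0] ** 2
print(round(bending_energy(linear, d=2), 6))  # 0.0
print(round(bending_energy(quad, d=2), 2))    # 4.0
```

The cost is $O(n_{mc} \cdot d^2)$ surrogate evaluations, which is negligible next to running the simulator itself.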
3 Surrogate model ensemble: PPS-OS
3.1 Overview
Surrogate model selection consists in selecting a surrogate model among a collection of them. This means that we evaluate the performances of several surrogate models and then choose one of them. (Acar and Rais-Rohani, 2009) stated that this practice has some shortcomings, as it does not take full advantage of the resources devoted to constructing the different metamodels. In fact, it is possible to consider a weighted combination of surrogates without any significant extra computational cost. These combinations are called ensembles, aggregations or multiple surrogates.
(Forrester and Keane, 2009) show that these aggregation methods drastically improve the performances of the surrogate models. In general, ensembles require small computational resources compared to the cost of the simulations (Queipo et al, 2005). The general form of an aggregation of $p$ surrogate models $\widehat{m}^{(i)}_{|Z_n}$, for $i = 1, \dots, p$, is given in Equation (7):

$$\widehat{A}_{|Z_n}(x) = \sum_{i=1}^{p} w_i(x)\, \widehat{m}^{(i)}_{|Z_n}(x) \qquad (7)$$
For instance, (Zerpa et al, 2005) considered a local combination, called a weighted average model, where the weights are based on the local expected variances of the surrogate models. (Goel et al, 2007) extended the use of ensembles to the identification of regions with high errors. They also presented several heuristics to weight ensembles.
However, (Gorissen et al, 2009) used a simple average ensemble (all weights equal). (Müller and Piché, 2011) proposed to weight the aggregation using Dempster-Shafer theory, where the error estimates are used as basic probability assignments. (Viana et al, 2009) proposed to use an ensemble of surrogate models that minimizes the CV errors. In fact, if for $i = 1, \dots, p$, $v_i$ is the vector of CV errors of the surrogate model $\widehat{m}^{(i)}_{|Z_n}$, the CV error of the aggregation is $W^\top C\, W$, where the elements of the matrix $C$ are $c_{ij} = \langle v_i, v_j \rangle$. The weights are selected to minimize the CV error of the aggregation under the constraint $\sum_{i=1}^{p} w_i = 1$. The optimal weighted surrogate (OWS) is obtained using the weights of Equation (8):

$$W = \frac{C^{-1} \mathbf{1}}{\mathbf{1}^\top C^{-1} \mathbf{1}} \qquad (8)$$

(Viana et al, 2009) noticed that the solution may include negative values. They stated that this additional freedom in the weights estimation amplifies errors. In fact, the matrix $C$ is only an approximation of the covariance of the errors of the surrogate models. To overcome the problem, the authors suggested using only the diagonal elements of $C$. The weights are then $w_i = c_{ii}^{-1} / \sum_{k=1}^{p} c_{kk}^{-1}$. This formulation is close to the weights of the PRESS weighted surrogate (PWS) given in (Goel et al, 2007) (Equation (9)), with $\alpha = 0$ and $\beta = -1$:

$$w_i = \frac{\left( c_{ii} + \frac{\alpha}{n} \sum_{j=1}^{p} c_{jj} \right)^{\beta}}{\sum_{k=1}^{p} \left( c_{kk} + \frac{\alpha}{n} \sum_{j=1}^{p} c_{jj} \right)^{\beta}} \qquad (9)$$
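Equation (8) and the diagonal variant can be sketched as follows. The helper names are ours, and the matrix `C` is a toy Gram matrix of CV-error vectors, not data from the paper:

```python
import numpy as np

def ows_weights(C):
    """Optimal weighted surrogate (Equation (8)): W = C^{-1} 1 / (1^T C^{-1} 1),
    which minimizes the aggregation CV error W^T C W under sum(W) = 1."""
    ones = np.ones(C.shape[0])
    w = np.linalg.solve(C, ones)
    return w / (ones @ w)

def diag_weights(C):
    """Safer variant of (Viana et al, 2009): keep only diag(C), giving
    w_i = c_ii^{-1} / sum_k c_kk^{-1} (non-negative by construction)."""
    inv = 1.0 / np.diag(C)
    return inv / inv.sum()

C = np.array([[4.0, 1.0],
              [1.0, 1.0]])    # hypothetical Gram matrix c_ij = <v_i, v_j>
print(ows_weights(C))    # [0. 1.] -- full solve puts all weight on model 2
print(diag_weights(C))   # [0.2 0.8] -- diagonal variant hedges between them
```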
3.2 PPS-optimal ensemble
Let us consider a set of $p$ surrogate models $(\widehat{m}^{(1)}_{|Z_n}, \dots, \widehat{m}^{(p)}_{|Z_n})$. Let $A$ be an aggregation of these surrogate models weighted by the vector $W = (w_1, \dots, w_p)$ (Equation (10)):

$$\widehat{A}(x) = \sum_{k=1}^{p} w_k\, \widehat{m}^{(k)}_{|Z_n}(x) \qquad (10)$$

In our formulation, we compute the weights of the aggregation by optimizing the PPS of the aggregation under the constraint $\sum_{i=1}^{p} w_i = 1$. The PPS-optimal aggregation is then the aggregation whose weights solve the optimization Problem (11):

$$\min_W \; \mathrm{PPS}(A, Z_n) \quad \text{s.t.} \quad \sum_{i=1}^{p} w_i = 1 \qquad (11)$$

For each $k \in \{1, \dots, p\}$, let:
- $e_k$ be the vector of errors on design points of the surrogate model $\widehat{m}^{(k)}_{|Z_n}$;
- $v_k$ be the vector of cross-validation errors of the surrogate model $\widehat{m}^{(k)}_{|Z_n}$.
Notice then that the vector of design-point errors of the aggregation is $\sum_{k} w_k e_k$, so its MSE is a quadratic form of the weights:

$$\widehat{R}_{\ell_2, Z_n}(\widehat{A}) = W^\top E\, W \qquad (12)$$

where the elements of $E$ are $E_{ij} = \langle e_i, e_j \rangle$. Similarly, the cross-validation error of the aggregation is also a quadratic form of the weights (Equation (13)), where $C$ is the matrix defined in the previous section:

$$R_{CV}(\widehat{A}) = W^\top C\, W \qquad (13)$$

Last, the energy functional is also a quadratic form of the weights (Equation (14)):

$$E_n(\widehat{A}) = \int_\Omega \sum_{i=1}^{d} \sum_{j=1}^{d} \left( \sum_{k=1}^{p} w_k \frac{\partial^2 \widehat{m}^{(k)}_{|Z_n}(x)}{\partial x_i \partial x_j} \right)^2 dx = W^\top K\, W \qquad (14)$$

where $K = \left[ k_{kl} = \sum_{i=1}^{d} \sum_{j=1}^{d} \int_\Omega \frac{\partial^2 \widehat{m}^{(k)}_{|Z_n}(x)}{\partial x_i \partial x_j} \, \frac{\partial^2 \widehat{m}^{(l)}_{|Z_n}(x)}{\partial x_i \partial x_j} \, dx \right]$.

Let $R = \alpha E + \beta C + \gamma K$. The PPS of the aggregation is then a quadratic form of the weights $W$: $\mathrm{PPS}(\widehat{A}) = W^\top R\, W$. The PPS-optimal aggregation is the aggregation that minimizes the PPS under the constraint $\sum_{i=1}^{p} w_i = 1$. The solution is given in Equation (15):

$$W^\star = \frac{R^{-1} \mathbf{1}}{\mathbf{1}^\top R^{-1} \mathbf{1}} \qquad (15)$$
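A sketch of the PPS-optimal ensemble of Equation (15), with hypothetical quadratic-form matrices `E`, `C`, `K` and the $\alpha = 2\beta$, $\beta = 2\gamma$ weighting used in the paper:

```python
import numpy as np

def pps_optimal_weights(E, C, K, alpha=4.0, beta=2.0, gamma=1.0):
    """PPS-optimal ensemble (Equation (15)): minimize W^T R W with
    R = alpha*E + beta*C + gamma*K, under the constraint sum(W) = 1."""
    R = alpha * E + beta * C + gamma * K
    ones = np.ones(R.shape[0])
    w = np.linalg.solve(R, ones)
    return w / (ones @ w)

# Hypothetical 3-model example (diagonal matrices for readability):
# design-point errors E, CV errors C, bending energies K.
E = np.diag([0.0, 0.1, 0.05])   # model 1 interpolates: zero internal MSE ...
C = np.diag([0.9, 0.2, 0.3])    # ... but has the worst CV error
K = np.diag([8.0, 0.5, 1.0])    # ... and by far the largest bending energy
w = pps_optimal_weights(E, C, K)
print(np.round(w, 3), round(w.sum(), 6))
```

On this toy example the rough interpolating model receives the smallest weight, illustrating how the CV and roughness components counterbalance internal accuracy alone.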

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control.
Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman & Hall.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics.