
HAL Id: hal-01685848
https://hal.archives-ouvertes.fr/hal-01685848
Submitted on 16 Jan 2018
Automatic selection for general surrogate models
Malek Ben Salem, Lionel Tomaso
To cite this version:
Malek Ben Salem, Lionel Tomaso. Automatic selection for general surrogate models. Structural and Multidisciplinary Optimization, Springer Verlag (Germany), 2018. ⟨hal-01685848⟩

Automatic selection for general surrogate models
Malek Ben Salem · Lionel Tomaso
Received: date / Accepted: date
Abstract In design engineering problems, the use of surrogate models (also called metamodels) instead of expensive simulations has become very popular. Surrogate models include individual models (regression, kriging, neural networks, ...) or a combination of individual models, often called an aggregation or ensemble. Since different surrogate types with various tunings are available, users often struggle to choose the most suitable one for a given problem. Thus, there is great interest in automatic selection algorithms. In this paper, we introduce a universal criterion that can be applied to any type of surrogate model. It is composed of three complementary components measuring the quality of general surrogate models: internal accuracy (on design points), predictive performance (cross-validation) and a roughness penalty. Based on this criterion, we propose two automatic selection algorithms. The first selection scheme finds the optimal ensemble of a set of given surrogate models. The second selection scheme further explores the space of surrogate models by using an evolutionary algorithm where each individual is a surrogate model. Finally, the performances of the algorithms are illustrated on 15 classical test functions and compared to different individual surrogate models. The results show the efficiency of our approach. In particular, we observe that the three components of the proposed criterion act together to improve accuracy and limit over-fitting.

Malek Ben Salem is funded by a CIFRE grant from the ANSYS company, subsidized by the French National Association for Research and Technology (ANRT, CIFRE grant number 2014/1349).

M. Ben Salem
Ecole des mines de St-Etienne - ANSYS, Inc., Villeurbanne
Tel.: +33 - ...
Fax.: +33 - ...
E-mail: firstname.ben-salem[@]emse.fr

L. Tomaso
ANSYS, Inc., Villeurbanne
Tel.: +33 - ...
Fax.: +33 - ...
E-mail: firstname.name[@]ansys.com
Keywords Surrogate modeling · Multiple surrogate models · Surrogate model selection · Cross-validation errors
1 Introduction
Computer simulations are an efficient tool to study complex physical behaviors. However, high-fidelity simulations are generally computationally expensive. Therefore, surrogate models, also known as metamodels or response surfaces, are often used instead. They provide an approximation of a response of interest based on a limited number of expensive simulations. There are several methods of construction of such approximations. Among the popular surrogate model types, we can cite for example kriging (Matheron, 1963), support vector machines (SVM) (Smola and Schölkopf, 2004), moving least squares (Lancaster and Salkauskas, 1981) and Multivariate Adaptive Regression Splines (MARS) (Friedman, 1991). Generally, a metamodel family comes with several possible tunings. At the same time, there is no universal optimal surrogate for all problems. Users often face difficulties in selecting the most suitable surrogate for their problem. Thus, there is great interest in automatic model selection algorithms. The main purpose is to choose the surrogate that provides the best prediction performance over the whole parametric space.
In the literature, this problem is generally studied along
three different approaches.
1) The first approach consists in using algorithms to optimize the settings of a particular surrogate model type.
For instance, (Chen et al, 2004; Lessmann et al, 2006) work on SVM, (Zhang et al, 2000) on neural networks, and (Tomioka et al, 2007) deal with least squares regression.
2) A second approach consists in considering multiple surrogates or ensembles. Automatic surrogate selection then becomes a weighting problem: often, the selected model is a weighted sum of different surrogate models. For example, (Viana et al, 2009; Zhou et al, 2011; Acar and Rais-Rohani, 2009; Goel et al, 2007) discuss different ways to build such aggregations.
3) The last approach consists in selecting a good member
among different types of surrogate models with different
settings. We refer for instance to the works of (Gorissen
et al, 2009; Shi et al, 2012; Zhou and Jiang, 2016).
The main objective of our paper is to propose a new, relevant surrogate model selection algorithm that can handle different types of surrogates. To achieve this goal, we define a universal criterion that can evaluate the accuracy of any surrogate model.
The paper is organized as follows. We introduce and discuss in Section 2 our criterion, called the Penalized Predictive Score (PPS). We show in Section 3 that the PPS is suitable for optimizing the weights of surrogate model ensembles. In Section 4, we present an evolutionary selection algorithm that explores the space of surrogate models, called PPS Genetic Aggregations (PPS-GA). Finally, the performances of the algorithms on 15 test cases are displayed in Section 5. The results show the efficiency of the PPS, the complementary role of its three components and the relevance of the proposed selection algorithms.
2 Penalized Predictive Score (PPS)
2.1 Definition
Assessing the quality of a surrogate is very challenging. It is desirable to use an independent set to assess the predictive capabilities of a given method, but this is computationally expensive in practice. One can also estimate the errors by computing them on the design points; unfortunately, a small MSE on design points does not imply good predictive capabilities. Therefore, resampling techniques such as Cross-Validation (CV) (Stone, 1974) or bootstrap (Efron and Tibshirani, 1993) are generally used. Such techniques reduce the bias of the estimation. Nevertheless, they do not prevent over-parameterized models. We introduce a criterion that does: the Penalized Predictive Score (PPS, Equation (1)). It combines three components:
a) The internal accuracy (or fit): we use the mean squared error (MSE) on design points.
b) The predictive capability: we propose to use the 10F-CV PRESS errors.
c) A roughness penalty: we propose to use the Bending Energy Functional (BEF) (Duchon, 1977).
$$\mathrm{PPS}(m, Z_n) = \underbrace{\alpha\,\widehat{R}_{\ell_2, Z_n}(m)}_{a} + \underbrace{\beta\,R_{10CV}(m)}_{b} + \underbrace{\gamma\,E_n(m)}_{c} \qquad (1)$$

Here, as described below, $\widehat{R}_{\ell_2, Z_n}(m)$ denotes the MSE criterion, $R_{10CV}(m)$ the 10-fold cross-validation estimate of the errors and $E_n(m)$ a roughness penalty. Further, $\alpha$, $\beta$, $\gamma$ are weights in $\mathbb{R}_+$. In all our implementations, we use $\alpha = 2\beta$ and $\beta = 2\gamma$.
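As an illustration (not the authors' code), once its three components have been estimated the criterion is a plain weighted sum. The function name `pps` and the concrete weights $(\alpha, \beta, \gamma) = (4, 2, 1)$, which satisfy $\alpha = 2\beta$ and $\beta = 2\gamma$, are our own choices:

```python
def pps(mse, cv_error, bending_energy, alpha=4.0, beta=2.0, gamma=1.0):
    """Penalized Predictive Score (Equation (1)): weighted sum of internal
    accuracy (MSE on design points), 10-fold CV error, and a roughness
    (bending energy) penalty. The paper imposes alpha = 2*beta, beta = 2*gamma."""
    return alpha * mse + beta * cv_error + gamma * bending_energy

# Example: an interpolator (zero internal MSE, high bending energy)
# scores worse than a smoother, slightly biased fit.
print(pps(mse=0.0, cv_error=0.30, bending_energy=5.0))   # 5.6
print(pps(mse=0.02, cv_error=0.25, bending_energy=0.5))  # 1.08
```

The penalty term is what separates the two models here: on internal accuracy alone, the interpolator would win.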
2.2 Internal accuracy
Let $\Omega = [0, 1]^d$ be the parametric space of dimension $d$. $X_n = (x_1, \dots, x_n)^\top \in \Omega^n$ and $Y_n = (y_1, \dots, y_n)^\top \in \mathbb{R}^n$ form the set of design points $Z_n = (X_n, Y_n)$, where $y_i = f(x_i)$ for $i = 1, \dots, n$ and $f : \Omega \to \mathbb{R}$ is an expensive-to-evaluate function. A surrogate model $\widehat{m}_{|Z_n} : \Omega \to \mathbb{R}$ is used to replace $f$ based on the design $Z_n$. We call the construction method a "surrogate model builder": if $m$ is a surrogate model builder, then we build the surrogate model $\widehat{m}_{|Z_n} : \Omega \to \mathbb{R}$ based on the design $Z_n$.
The assessment of the performance of a surrogate model is extremely important in practice (Hastie et al, 2009). It relies on evaluating, on the set of design points, the prediction capabilities of the surrogate model. It is generally based on a contrast function (or loss function) that measures the errors between the predicted and the true models. A typical choice is the squared error $\ell_2(x, y) = (x - y)^2$. The integral form of the MSE is the $\ell_2$ risk over the whole parametric space:

$$R_{\ell_2, Z_n}(m) = \int_\Omega \ell_2\big(\widehat{m}_{|Z_n}(x), f(x)\big)\, dx \qquad (2)$$

Since $f$ is unknown, we can only use an approximation to estimate this risk. Ideally, the performance of the surrogate model would be evaluated on an extra set of points. However, generating such a set is sometimes computationally expensive. Therefore, one uses the empirical distribution associated with the set of design points. Computing the mean squared error (MSE) (Equation (3)) on the set of design points for the surrogate model $\widehat{m}_{|Z_n}$ gives an empirical approximation of $R_{\ell_2, Z_n}(m)$ defined in Equation (2):

$$\widehat{R}_{\ell_2, Z_n}(m) = \frac{1}{n} \sum_{i=1}^{n} \ell_2\big(\widehat{m}_{|Z_n}(x_i), y_i\big) = \frac{1}{n} \sum_{i=1}^{n} \big(\widehat{m}_{|Z_n}(x_i) - y_i\big)^2 \qquad (3)$$
Note that computing the MSE on the set of design points is a biased estimate of the error over the whole space. In fact, for any interpolating surrogate model $m$, $\widehat{R}_{\ell_2, Z_n}(m) = 0$. This does not necessarily mean that the surrogate model fits the real function in the whole space.
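A minimal numpy sketch of Equation (3); the helper name `mse_design` is ours, and the surrogate is assumed to be a vectorized callable:

```python
import numpy as np

def mse_design(surrogate, X, y):
    """Empirical l2 risk of Equation (3): mean squared error of the
    surrogate evaluated on the design points themselves."""
    residuals = surrogate(X) - y
    return float(np.mean(residuals ** 2))

# An interpolating surrogate has zero internal error by construction,
# which says nothing about its behavior away from the design points.
X = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
y = np.sin(X).ravel()
interpolator = lambda X: np.sin(X).ravel()   # exact on the design
print(mse_design(interpolator, X, y))        # 0.0
```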
2.3 Predictive capabilities
On the one hand, the use of design points to estimate the errors yields an optimistic result (Arlot and Celisse, 2010). On the other hand, using a validation set can be expensive. Therefore, it is convenient to use re-sampling techniques such as Cross-Validation (CV) (Stone, 1974) and bootstrap (Efron and Tibshirani, 1993) to estimate the prediction errors. Resampling techniques estimate the errors by using subsets of the design points to build several sub-surrogate models. For instance, computing the Leave-One-Out Cross-Validation (LOO-CV) errors of a surrogate model $\widehat{m}_{|Z_n}$ consists in computing the error at an observation $(x_i, y_i)$ based on the surrogate model $\widehat{m}_{|Z_{n,i}}$ built on the subset of all the design points except the $i$-th one ($Z_{n,i} = (x_j, y_j)_{j \neq i}$). In the same way, $k$-fold cross-validation (kF-CV) consists in dividing the data into $k$ subsets. Each subset plays the role of validation set while the remaining $k - 1$ subsets are used together as the training set. If $k$ is the number of folds, for $i \in \{1, \dots, k\}$ let $Z^{(i)} \in \mathcal{P}(Z_n)$ be a subset of $Z_n$ such that $\bigcup_{i=1}^{k} Z^{(i)} = Z_n$. The kF-CV estimate of the $\ell_2$ errors (Equation (4)) is obtained by computing the loss at each point of the $i$-th fold $Z^{(i)}$ compared to the prediction of the surrogate model built on the remaining folds ($Z_n \setminus Z^{(i)}$):

$$R_{kCV}(m) = \frac{1}{n} \sum_{i=1}^{k} \sum_{(x_0, y_0) \in Z^{(i)}} \ell_2\big(\widehat{m}_{|Z_n \setminus Z^{(i)}}(x_0), y_0\big) \qquad (4)$$

where $(x, y) \in Z_n \setminus Z^{(i)}$ if and only if $(x, y) \in Z_n$ and $(x, y) \notin Z^{(i)}$.
(Queipo et al, 2005) pointed out that the main advantage of CV is that it provides a nearly unbiased estimate. Further, (Kohavi, 1995) studied Cross-Validation and bootstrap performances on large datasets and recommended using stratified 10-fold cross-validation. (James et al, 2013) stated that kF-CV with k = 5 or k = 10 yields test error estimates that suffer neither from excessively high bias nor from very high variance.
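The kF-CV estimate of Equation (4) can be sketched from scratch as follows. All names here are illustrative, `builder` stands in for any "surrogate model builder" that returns a fitted predictor, and the least-squares builder is just one concrete example:

```python
import numpy as np

def kfold_cv_error(builder, X, y, k=10, seed=0):
    """k-fold CV estimate of the l2 error (Equation (4)): each fold Z^(i)
    is predicted by a sub-surrogate rebuilt on the remaining folds."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, min(k, n))
    sq_err = 0.0
    for fold in folds:
        train = np.setdiff1d(idx, fold)          # Z_n \ Z^(i)
        model = builder(X[train], y[train])      # sub-surrogate model
        sq_err += np.sum((model(X[fold]) - y[fold]) ** 2)
    return sq_err / n

# Example builder: ordinary least-squares linear regression with intercept
def linear_builder(X, y):
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xq: np.hstack([Xq, np.ones((len(Xq), 1))]) @ coef

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(40, 2))
y = 3 * X[:, 0] - X[:, 1] + 0.5
print(kfold_cv_error(linear_builder, X, y, k=10))  # ~0 for exactly linear data
```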
2.4 Penalization
Penalties are used in several model selection frameworks in order to prevent over-fitting. Selection criteria such as the Bayesian Information Criterion (BIC) (Schwarz et al, 1978) or the Akaike Information Criterion (AIC) (Akaike, 1974) penalize models by their degrees of freedom. Most penalties are designed for a particular family of surrogates. Here, we are interested in universal methods, so we prefer to deal with the smoothness of the surrogate model rather than with its structural complexity. For instance, (Nguyen et al, 2011) introduce a criterion called the Linear Reference Model (LRM). It scores a surrogate model by computing the deviation between its predictions and a local linear model $\widehat{lrm}$. The LRM is computed over a set of $N$ points $x^{(k)}$, $k = 1, \dots, N$ (Equation (5)):

$$R_{LRM}(m) = \frac{1}{N} \sum_{k=1}^{N} \ell_2\big(\widehat{m}_{|Z_n}(x^{(k)}), \widehat{lrm}(x^{(k)})\big) \qquad (5)$$

Computationally, this last criterion requires the construction of a Delaunay tessellation (Watson, 1981) to compute $\widehat{lrm}$. The computational cost of such a construction in high dimension is too expensive. We suggest instead a criterion that penalizes the roughness of surrogate models: the thin plate spline (TPS) (Duchon, 1977) Bending Energy Functional (BEF). It is a penalty based on second-order partial derivatives. For a dimension $d$, the roughness penalty $E_n$ is the integral of the squared terms of the Hessian (Equation (6)):

$$E_n(\hat{f}) = \int_\Omega \sum_{i=1}^{d} \sum_{j=1}^{d} \left( \frac{\partial^2 \hat{f}}{\partial x_i \partial x_j} \right)^2 dx \qquad (6)$$

The LRM can be used in place of the BEF in the selection criterion PPS. It penalizes the deviation from a linear model regardless of its roughness, and it still gives good predictive capabilities. Nevertheless, some rough surrogates may then be selected.
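The integral in Equation (6) is rarely available in closed form for a black-box surrogate. One possible numerical sketch (ours, not the paper's implementation) estimates it by Monte Carlo sampling of $[0,1]^d$ with finite-difference second derivatives:

```python
import numpy as np

def bending_energy(surrogate, d, n_mc=2000, h=1e-3, seed=0):
    """Monte Carlo estimate of the Bending Energy Functional (Equation (6))
    on [0, 1]^d, with second derivatives by central finite differences."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(h, 1 - h, size=(n_mc, d))  # keep stencils inside the domain
    total = 0.0
    for i in range(d):
        for j in range(d):
            ei = np.zeros(d); ei[i] = h
            ej = np.zeros(d); ej[j] = h
            if i == j:
                dij = (surrogate(X + ei) - 2 * surrogate(X) + surrogate(X - ei)) / h**2
            else:
                dij = (surrogate(X + ei + ej) - surrogate(X + ei - ej)
                       - surrogate(X - ei + ej) + surrogate(X - ei - ej)) / (4 * h**2)
            total += np.mean(dij ** 2)   # Monte Carlo average over [0,1]^d
    return total

# A linear surrogate has zero bending energy; a quadratic one does not:
# for f(x) = x_0^2, the only nonzero Hessian term is 2, so E_n = 4.
linear = lambda X: 2 * X[:, 0] - X[:, 1]
quad = lambda X: X[:, 0] ** 2
print(round(bending_energy(linear, d=2), 6))  # 0.0
print(round(bending_energy(quad, d=2), 2))    # 4.0
```

The cost is $O(n_{mc} \cdot d^2)$ surrogate evaluations, which is negligible next to running the simulator itself.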
3 Surrogate model ensemble: PPS-OS
3.1 Overview
Surrogate model selection consists in selecting a surrogate model among a collection of them. This means that we evaluate the performances of several surrogate models and then choose one of them. (Acar and Rais-Rohani, 2009) stated that this practice has some shortcomings, as it does not take full advantage of the resources devoted to constructing the different metamodels. In fact, it is possible to consider a weighted combination of surrogates without any significant extra computational cost. These combinations are called ensembles, aggregations or multiple surrogates.
(Forrester and Keane, 2009) show that these aggregation methods drastically improve the performances of the surrogate models. In general, ensembles require small computational resources compared to the cost of the simulations (Queipo et al, 2005). The general form of an aggregation of $p$ surrogate models $\widehat{m}^{(i)}_{|Z_n}$, for $i = 1, \dots, p$, is given in Equation (7):

$$\widehat{A}_{|Z_n}(x) = \sum_{i=1}^{p} w_i(x)\, \widehat{m}^{(i)}_{|Z_n}(x) \qquad (7)$$
For instance, (Zerpa et al, 2005) considered a local combination, called a weighted average model, where the weights are based on the local expected variances of the surrogate models. (Goel et al, 2007) extended the use of ensembles to the identification of regions with high errors. They also presented several heuristics to weight ensembles.
However, (Gorissen et al, 2009) used a simple average ensemble (all weights equal). (Müller and Piché, 2011) proposed to weight the aggregation using Dempster-Shafer theory, where the error estimates are used as basic probability assignments. (Viana et al, 2009) proposed to use an ensemble of surrogate models that minimizes the CV errors. In fact, if for $i = 1, \dots, p$, $v_i$ is the vector of CV errors of the surrogate model $\widehat{m}^{(i)}_{|Z_n}$, the CV error of the aggregation is $W^\top C\, W$, where the elements of the matrix $C$ are $c_{ij} = \langle v_i, v_j \rangle$. The weights are selected to minimize the CV error of the aggregation under the constraint $\sum_{i=1}^{p} w_i = 1$. The optimal weighted surrogate (OWS) is obtained using the weights of Equation (8):

$$W = \frac{C^{-1} \mathbf{1}}{\mathbf{1}^\top C^{-1} \mathbf{1}} \qquad (8)$$

(Viana et al, 2009) noticed that the solution may include negative values. They stated that this additional freedom in the weights estimation amplifies errors. In fact, the matrix $C$ is only an approximation of the covariance of the errors of the surrogate models. To overcome the problem, the authors suggested using only the diagonal elements of $C$. The weights are then $w_i = c_{ii}^{-1} / \sum_{k=1}^{p} c_{kk}^{-1}$. This formulation is close to the weights of the PRESS weighted surrogate (PWS) given in (Goel et al, 2007) (Equation (9)), with $\alpha = 0$ and $\beta = -1$:

$$w_i = \frac{\left( c_{ii} + \frac{\alpha}{n} \sum_{j=1}^{p} c_{jj} \right)^{\beta}}{\sum_{k=1}^{p} \left( c_{kk} + \frac{\alpha}{n} \sum_{j=1}^{p} c_{jj} \right)^{\beta}} \qquad (9)$$
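Equation (8) and the diagonal variant can be sketched as follows. The helper names are ours, and the matrix `C` is a toy Gram matrix of CV-error vectors, not data from the paper:

```python
import numpy as np

def ows_weights(C):
    """Optimal weighted surrogate (Equation (8)): W = C^{-1} 1 / (1^T C^{-1} 1),
    which minimizes the aggregation CV error W^T C W under sum(W) = 1."""
    ones = np.ones(C.shape[0])
    w = np.linalg.solve(C, ones)
    return w / (ones @ w)

def diag_weights(C):
    """Safer variant of (Viana et al, 2009): keep only diag(C), giving
    w_i = c_ii^{-1} / sum_k c_kk^{-1} (non-negative by construction)."""
    inv = 1.0 / np.diag(C)
    return inv / inv.sum()

C = np.array([[4.0, 1.0],
              [1.0, 1.0]])    # hypothetical Gram matrix c_ij = <v_i, v_j>
print(ows_weights(C))    # [0. 1.] -- full solve puts all weight on model 2
print(diag_weights(C))   # [0.2 0.8] -- diagonal variant hedges between them
```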
3.2 PPS-optimal ensemble
Let us consider a set of $p$ surrogate models $(\widehat{m}^{(1)}_{|Z_n}, \dots, \widehat{m}^{(p)}_{|Z_n})$. Let $A$ be an aggregation of these surrogate models weighted by the vector $W = (w_1, \dots, w_p)$ (Equation (10)):

$$\widehat{A}(x) = \sum_{k=1}^{p} w_k\, \widehat{m}^{(k)}_{|Z_n}(x) \qquad (10)$$

In our formulation, we compute the weights of the aggregation by optimizing the PPS of the aggregation under the constraint $\sum_{i=1}^{p} w_i = 1$. The PPS-optimal aggregation is then the aggregation whose weights solve the optimization Problem (11):

$$\min_W \; \mathrm{PPS}(A, Z_n) \quad \text{s.t.} \quad \sum_{i=1}^{p} w_i = 1 \qquad (11)$$

For each $k \in \{1, \dots, p\}$, let:
- $e_k$ be the vector of errors on design points of the surrogate model $\widehat{m}^{(k)}_{|Z_n}$;
- $v_k$ be the vector of cross-validation errors of the surrogate model $\widehat{m}^{(k)}_{|Z_n}$.
Notice then that the vector of design-point errors of the aggregation is $\sum_{k} w_k e_k$, so its MSE is a quadratic form of the weights:

$$\widehat{R}_{\ell_2, Z_n}(\widehat{A}) = W^\top E\, W \qquad (12)$$

where the elements of $E$ are $E_{ij} = \langle e_i, e_j \rangle$. Similarly, the cross-validation error of the aggregation is also a quadratic form of the weights (Equation (13)), where $C$ is the matrix defined in the previous section:

$$R_{CV}(\widehat{A}) = W^\top C\, W \qquad (13)$$

Last, the energy functional is also a quadratic form of the weights (Equation (14)):

$$E_n(\widehat{A}) = \int_\Omega \sum_{i=1}^{d} \sum_{j=1}^{d} \left( \sum_{k=1}^{p} w_k \frac{\partial^2 \widehat{m}^{(k)}_{|Z_n}(x)}{\partial x_i \partial x_j} \right)^2 dx = W^\top K\, W \qquad (14)$$

where $K = \left[ k_{kl} = \sum_{i=1}^{d} \sum_{j=1}^{d} \int_\Omega \frac{\partial^2 \widehat{m}^{(k)}_{|Z_n}(x)}{\partial x_i \partial x_j} \, \frac{\partial^2 \widehat{m}^{(l)}_{|Z_n}(x)}{\partial x_i \partial x_j} \, dx \right]$.

Let $R = \alpha E + \beta C + \gamma K$. The PPS of the aggregation is then a quadratic form of the weights $W$: $\mathrm{PPS}(\widehat{A}) = W^\top R\, W$. The PPS-optimal aggregation is the aggregation that minimizes the PPS under the constraint $\sum_{i=1}^{p} w_i = 1$. The solution is given in Equation (15):

$$W^\star = \frac{R^{-1} \mathbf{1}}{\mathbf{1}^\top R^{-1} \mathbf{1}} \qquad (15)$$
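A sketch of the PPS-optimal ensemble of Equation (15), with hypothetical quadratic-form matrices `E`, `C`, `K` and the $\alpha = 2\beta$, $\beta = 2\gamma$ weighting used in the paper:

```python
import numpy as np

def pps_optimal_weights(E, C, K, alpha=4.0, beta=2.0, gamma=1.0):
    """PPS-optimal ensemble (Equation (15)): minimize W^T R W with
    R = alpha*E + beta*C + gamma*K, under the constraint sum(W) = 1."""
    R = alpha * E + beta * C + gamma * K
    ones = np.ones(R.shape[0])
    w = np.linalg.solve(R, ones)
    return w / (ones @ w)

# Hypothetical 3-model example (diagonal matrices for readability):
# design-point errors E, CV errors C, bending energies K.
E = np.diag([0.0, 0.1, 0.05])   # model 1 interpolates: zero internal MSE ...
C = np.diag([0.9, 0.2, 0.3])    # ... but has the worst CV error
K = np.diag([8.0, 0.5, 1.0])    # ... and by far the largest bending energy
w = pps_optimal_weights(E, C, K)
print(np.round(w, 3), round(w.sum(), 6))
```

On this toy example the rough interpolating model receives the smallest weight, illustrating how the CV and roughness components counterbalance internal accuracy alone.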

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control.
Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman & Hall.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics.