
A Computationally Stable Approach to Gaussian
Process Interpolation of Deterministic
Computer Simulation Data
Pritam RANJAN
Department of Mathematics and Statistics
Acadia University
Wolfville, NS, B4P 2R6, Canada
(pritam.ranjan@acadiau.ca)

Ronald HAYNES
Department of Mathematics and Statistics
Memorial University
St. John's, NL, A1C 5S7, Canada
(rhaynes@mun.ca)

Richard KARSTEN
Department of Mathematics and Statistics
Acadia University
Wolfville, NS, B4P 2R6, Canada
(richard.karsten@acadiau.ca)
For many expensive deterministic computer simulators, the outputs do not have replication error and the desired metamodel (or statistical emulator) is an interpolator of the observed data. Realizations of Gaussian spatial processes (GP) are commonly used to model such simulator outputs. Fitting a GP model to n data points requires the computation of the inverse and determinant of n × n correlation matrices, R, that are sometimes computationally unstable due to near-singularity of R. This happens if any pair of design points are very close together in the input space. The popular approach to overcome near-singularity is to introduce a small nugget (or jitter) parameter in the model that is estimated along with other model parameters. The inclusion of a nugget in the model often causes unnecessary over-smoothing of the data. In this article, we propose a lower bound on the nugget that minimizes the over-smoothing and an iterative regularization approach to construct a predictor that further improves the interpolation accuracy. We also show that the proposed predictor converges to the GP interpolator.
KEY WORDS: Computer experiment; Matrix inverse approximation; Regularization.
1. INTRODUCTION
Computer simulators are often used to model complex physical and engineering processes that are either too expensive or time consuming to observe. A simulator is said to be deterministic if replicate runs at the same inputs yield identical responses. For the last few decades, deterministic simulators have been widely used to model physical processes. For instance, Kumar and Davidson (1978) used deterministic simulation models for comparing the performance of highly concurrent computers; Su et al. (1996) used generalized linear regression models to design a lamp filament via a deterministic finite-element computer code; Aslett et al. (1998) discussed an optimization problem for a deterministic circuit simulator; and several deterministic simulators are being used for analyzing biochemical networks (see Bergmann and Sauro 2008 for references). On the other hand, there are cases where stochastic (nondeterministic) simulators are preferred due to unavoidable biases (e.g., Poole and Raftery 2000). In spite of the recent interest in stochastic simulators, deterministic simulators are still being actively used. For instance, Medina, Moreno, and Royo (2005) demonstrated the preference of deterministic traffic simulators over their stochastic counterparts. In this article, we assume that the simulator under consideration is deterministic up to working precision and that the scientist is confident about the validity of the simulator.
Sacks et al. (1989) proposed modeling (or emulating) such an expensive deterministic simulator as a realization of a Gaussian stochastic process (GP). An emulator of a deterministic simulator is desired to be an interpolator of the observed data (e.g., Sacks et al. 1989; Van Beers and Kleijnen 2004). For the problem that motivated this work, the objective is to emulate the average extractable tidal power as a function of the turbine locations in the Bay of Fundy, Nova Scotia, Canada. The deterministic computer simulator for the tidal power model is a numerical solver of a complex system of partial differential equations, and we accept the simulator as a valid representation of the tidal power.
In this article, we discuss a computational issue in building the GP based emulator for a deterministic simulator. Fitting a GP model to n data points using either a maximum likelihood technique or a Bayesian approach requires the computation of the determinant and inverse of several n × n correlation matrices, R. Although the correlation matrices are positive definite by definition, near-singularity (also referred to as ill-conditioning) of these matrices is a common problem in fitting GP models. Ababou, Bagtzoglou, and Wood (1994) studied the relationship of a uniform grid to the ill-conditioning and
quality of model fit for various covariance models. Barton and Salagame (1997) studied the effect of experimental design on the ill-conditioning of kriging models. Jones, Schonlau, and Welch (1998) used the singular value decomposition to overcome the near-singularity of R. Booker (2000) used the sum of independent GPs to overcome near-singularity for multistage adaptive designs in kriging models. A more popular solution to overcome near-singularity is to introduce a nugget or jitter parameter, δ, in the model (e.g., Sacks et al. 1989; Neal 1997; Booker et al. 1999; Santner, Williams, and Notz 2003; Gramacy and Lee 2008) that is estimated along with other model parameters. However, adding a nugget to the model introduces additional smoothing in the predictor and as a result the predictor is no longer an interpolator.
Here, we first propose a lower bound on the nugget (δ_lb) that minimizes the additional over-smoothing. Second, an iterative approach is developed to enable the construction of a new predictor that further improves the interpolation as well as the prediction (at unsampled design points) accuracy. We also show that the proposed predictor converges to an interpolator. Although an arbitrary nugget δ ∈ (0, 1) can be used in the iterative approach, the rate of convergence (i.e., the number of iterations required to reach a certain tolerance) depends on the magnitude of the nugget. To this effect, the proposed lower bound δ_lb significantly reduces the number of iterations required. This feature is particularly desirable for implementation.
The article is organized as follows. Section 2 presents the tidal power modeling example. In Section 3, we review the GP model, a computational issue in fitting the model, and the popular approach to overcome near-singularity. Section 4 presents the new lower bound for the nugget that is required to achieve well-conditioned correlation matrices and minimize unnecessary over-smoothing. In Section 5, we develop the iterative approach for constructing a more accurate predictor. Several examples are presented in Section 6 to illustrate the performance of our proposed predictor over the one obtained using the popular approach. Finally, we conclude the paper with some remarks on the numerical issues and recommendations for practitioners in Section 7.
2. MOTIVATING EXAMPLE
The Bay of Fundy, located between New Brunswick and Nova Scotia, Canada, with a small portion touching Maine, U.S.A., is world famous for its high tides. In the upper portion of the Bay of Fundy [see Figure 1(a)], the difference in water level between high tide and low tide can be as much as 17 meters. The high tides in this region are a result of a resonance, with the natural period of the Bay of Fundy very close to the period of the principal lunar tide. This results in very regular tides in the Bay of Fundy with a high tide every 12.42 hours. The incredible energy in these tides has meant that the region has significant potential for extracting tidal power [Greenberg 1979; Karsten et al. 2008 (hereafter KMLH)].
Though the notion of harnessing tidal power from the Bay of Fundy is not new, earlier proposed methods of harvesting the much needed green electrical energy involved building a barrage or dam. This method was considered infeasible for a variety of economic and environmental reasons. Recently, there has been rapid technological development of in-stream tidal turbines. These devices act much like wind turbines, with individual turbines placed in regions of strong tidal currents. Individual turbines can be up to 20 m in diameter and can produce over 1 MW of power. Ideally, these turbines would produce a predictable and renewable source of power with less of an impact on the environment than a dam. KMLH examined the power potential of farms of such turbines across the Minas Passage [Figure 1(b)], where the tidal currents are strongest. They found that the potential extractable power is much higher than previous estimates and that the environmental impacts of extracting power can be greatly reduced by extracting only a portion of the maximum power available. The simulations in KMLH did not represent individual turbines and left open the question of how to optimally place turbines. In this article, we emulate the KMLH numerical model to examine the placement of turbines to maximize the power output.
We numerically simulate the tides as in KMLH by solving the 2D shallow water equations using the Finite-Volume Coastal Ocean Model (FVCOM) with a triangular grid on the upper Bay of Fundy [see Figure 1(a)]. Since the grid triangles differ in size and orientation, the ith turbine was modeled on the set of all triangular elements whose centers lie within 250 m of (x_i, y_i). A possible turbine location is shown in Figure 1(b). The triangular grid was developed by David Greenberg and colleagues at the Bedford Institute of Oceanography, NS, Canada. The details of FVCOM can be found in the article by Chen, Beardsley, and Cowles (2006).

Figure 1. Panel (a) shows the triangular grid used in the FVCOM model for simulating tides in the upper Bay of Fundy. The small box in the center surrounds the Minas Passage shown in (b). The shaded triangles in the center of (b) represent a possible turbine location.
Using this setup, the estimate of the electric power that can be harnessed through a turbine at a particular location (x, y) over a tidal cycle T = 12.42 hr is obtained by the simulator in KMLH. The average tidal power at the location (x, y) is given by

P̄(x, y) = (1/T) ∫_0^T P(t; x, y) dt,

where P(t; x, y) is the extractable power output at time t and location (x, y). The process is deterministic up to the machine precision, and the main objective is to emulate P̄(x, y).
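As a small illustration of this time average, the sketch below approximates P̄ with the trapezoidal rule. The power series used here is synthetic; in the application, P(t; x, y) would come from an FVCOM run at the candidate turbine location.

    import numpy as np

    # Approximate P_bar(x, y) = (1/T) * integral_0^T P(t; x, y) dt by the
    # trapezoidal rule over one tidal cycle. P_t below is a hypothetical
    # stand-in for the simulator output P(t; x, y).
    T = 12.42                                  # tidal period (hours)
    t = np.linspace(0.0, T, 200)               # time grid over one cycle
    P_t = np.maximum(0.0, np.sin(2 * np.pi * t / T)) ** 3  # synthetic P(t; x, y)

    P_bar = np.trapz(P_t, t) / T               # average extractable power
    print(P_bar)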
It turns out that the GP model fitted to the simulator output at n = 100 points (chosen using a space-filling design criterion) is not an interpolator and results in an over-smoothed emulator (see Example 4 for details). This is undesirable as the ocean modelers are interested in an emulator that interpolates their simulator. This emulator will be used to obtain estimates of both the maximizer of the power function (i.e., the location where to place the turbine) and the extractable power at this location. The manufacturing and installation cost of the initial prototype turbine is very high (roughly 20 million dollars). Since the over-smoothed emulator can underestimate the maximum extractable power, a good approximation of the attainable power function can be helpful in saving the cost of a few turbines. Example 4 shows that the proposed approach leads to a more accurate estimate of the maximum extractable power.
3. BACKGROUND REVIEW
3.1 Gaussian Process Model
Let the ith input and output of the computer simulator be denoted by a d-dimensional vector, x_i = (x_{i1}, ..., x_{id}), and the univariate response, y_i = y(x_i), respectively. The experiment design D_0 = {x_1, ..., x_n} is the set of n input trials. The outputs of the simulation trials are held in the n-dimensional vector Y = y(D_0) = (y_1, y_2, ..., y_n)'. The simulator output, y(x_i), is modeled as

y(x_i) = μ + z(x_i),  i = 1, ..., n,    (1)
where μ is the overall mean, and z(x_i) is a GP with E(z(x_i)) = 0, Var(z(x_i)) = σ_z^2, and Cov(z(x_i), z(x_j)) = σ_z^2 R_{ij}. In general, y(D_0) has a multivariate normal distribution, N_n(1_n μ, Σ), where Σ = V(D_0 | y(D_0)) = σ_z^2 R, and 1_n is an n × 1 vector of all ones (see Sacks et al. 1989 and Jones, Schonlau, and Welch 1998 for details). Although there are several choices for the correlation function, we focus on the Gaussian correlation because of its properties like smoothness (or differentiability in the mean squared sense) and popularity in other areas like machine learning (radial basis kernels) and geostatistics (kriging). For a detailed discussion on correlation functions see the works of Stein (1999), Santner, Williams, and Notz (2003), and Rasmussen and Williams (2006). The Gaussian correlation function is a special case (p_k = 2 for all k) of the power exponential correlation family

R_{ij} = corr(z(x_i), z(x_j)) = ∏_{k=1}^{d} exp{−θ_k |x_{ik} − x_{jk}|^{p_k}}  for all i, j,    (2)

where θ = (θ_1, ..., θ_d) is the vector of hyperparameters, and p_k ∈ (0, 2] is the smoothness parameter. As discussed in Section 7, the results developed in this article may vary slightly when other correlation structures are used instead of the Gaussian correlation.
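For concreteness, here is a minimal sketch (in Python/NumPy; not the authors' code, which used Matlab built-ins) of building R from (2). The helper name corr_matrix is ours and is reused in the sketches that follow.

    import numpy as np

    def corr_matrix(X, theta, p=2.0):
        # Power exponential correlation matrix from (2).
        # X     : (n, d) array of design points in [0, 1]^d
        # theta : (d,) vector of positive hyperparameters
        # p     : smoothness parameter in (0, 2]; p = 2 gives the Gaussian correlation
        diff = np.abs(X[:, None, :] - X[None, :, :])       # pairwise |x_ik - x_jk|
        return np.exp(-np.sum(theta * diff ** p, axis=2))  # prod over k of exp{-theta_k |.|^p_k}

    # Example: correlation matrix for 5 random points in [0, 1]^2
    rng = np.random.default_rng(1)
    R = corr_matrix(rng.random((5, 2)), theta=np.array([2.0, 2.0]))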
We use the GP model with Gaussian correlation function to predict responses at any unsampled design point x*; however, the theory developed here is also valid for other correlation structures in the power exponential family (see Section 7 for more details). Following the maximum likelihood approach, the best linear unbiased predictor (BLUP) at x* is

ŷ(x*) = μ̂ + r'R^{-1}(Y − 1_n μ̂) = [(1 − r'R^{-1}1_n)/(1_n'R^{-1}1_n) 1_n' + r'] R^{-1} Y,    (3)
with mean squared error

s^2(x*) = σ_z^2 (1 − 2C'r + C'RC) = σ_z^2 [1 − r'R^{-1}r + (1 − 1_n'R^{-1}r)^2 / (1_n'R^{-1}1_n)],    (4)

where r = (r_1(x*), ..., r_n(x*))', r_i(x*) = corr(z(x*), z(x_i)), and C is such that ŷ(x*) = C'Y. In practice, the parameters μ, σ_z^2, and θ are replaced with estimates (see Sacks et al. 1989; Santner, Williams, and Notz 2003, for details).
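A minimal sketch of computing (3) and (4) at a single unsampled point, assuming the corr_matrix helper sketched in Section 3.1 and plug-in values for μ, σ_z^2, and θ:

    import numpy as np

    def blup(x_star, X, Y, theta, mu, sigma2_z, p=2.0):
        # BLUP (3) and its mean squared error (4) with plug-in parameters.
        R = corr_matrix(X, theta, p)
        ones = np.ones(len(X))
        r = np.exp(-np.sum(theta * np.abs(X - x_star) ** p, axis=1))  # r_i(x*) from (2)
        Rinv_Y = np.linalg.solve(R, Y)   # solves avoid forming R^{-1} explicitly
        Rinv_1 = np.linalg.solve(R, ones)
        Rinv_r = np.linalg.solve(R, r)
        y_hat = mu + r @ (Rinv_Y - mu * Rinv_1)                        # equation (3)
        s2 = sigma2_z * (1.0 - r @ Rinv_r
                         + (1.0 - ones @ Rinv_r) ** 2 / (ones @ Rinv_1))  # equation (4)
        return y_hat, s2

When R is near-singular, the three linear solves above are exactly the computations that become unstable, which is the issue taken up in Section 3.2.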
3.2 A Computational Issue in Model Fitting
Fitting a GP model (1)–(4) to a dataset with n observations in a d-dimensional input space requires numerous evaluations of the log-likelihood function for several realizations of the parameter vector (θ_1, ..., θ_d; μ, σ_z^2). The closed form estimators of μ and σ_z^2, given by

μ̂(θ) = (1_n'R^{-1}1_n)^{-1} (1_n'R^{-1}Y)  and  σ̂_z^2(θ) = (Y − 1_n μ̂(θ))' R^{-1} (Y − 1_n μ̂(θ)) / n,    (5)

are often used to obtain the profile log-likelihood

−2 log L_p ∝ log(|R|) + n log[(Y − 1_n μ̂(θ))' R^{-1} (Y − 1_n μ̂(θ))]    (6)

for estimating the hyperparameters θ = (θ_1, ..., θ_d), where |R| denotes the determinant of R. Recall from (2) that the correlation matrix R depends on θ and the design points.
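A sketch of one evaluation of (5)–(6) for a candidate θ, again assuming the corr_matrix helper from Section 3.1; this is the computation that is repeated many times during optimization and that fails when R is numerically singular:

    import numpy as np

    def neg2_profile_loglik(theta, X, Y):
        # -2 log L_p of (6), up to an additive constant, with mu and sigma_z^2
        # profiled out via (5). Raises LinAlgError when R is numerically singular.
        R = corr_matrix(X, theta)
        L = np.linalg.cholesky(R)                  # fails for near-singular R

        def solve(b):
            return np.linalg.solve(L.T, np.linalg.solve(L, b))

        ones = np.ones(len(Y))
        Rinv_1, Rinv_Y = solve(ones), solve(Y)
        mu_hat = (ones @ Rinv_Y) / (ones @ Rinv_1)  # muhat(theta) in (5)
        resid = Y - mu_hat * ones
        log_det_R = 2.0 * np.sum(np.log(np.diag(L)))
        return log_det_R + len(Y) * np.log(resid @ solve(resid))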
An n × n matrix R is said to be near-singular (or, ill-conditioned) if its condition number κ(R) = ‖R‖ · ‖R^{-1}‖ is too large (see Section 4 for details on "how large is large?"), where ‖·‖ denotes a matrix norm (we will use the L_2-norm). Although these correlation matrices are positive definite by definition, computation of |R| and R^{-1} can sometimes be unstable due to ill-conditioning. This prohibits precise computation of the likelihood and hence the parameter estimates.
Ill-conditioning of R often occurs if any pair of design points are very close in the input space, or the θ_k are close to zero, that is, ∑_{k=1}^{d} θ_k |x_{ik} − x_{jk}|^{p_k} ≈ 0. The distances between neighboring points in space-filling designs with large n (sample size) and small d (input dimension) can be very small. Near-singularity is more common in the sequential design setup (e.g., expected improvement based designs; see Jones, Schonlau, and Welch 1998; Schonlau, Welch, and Jones 1998; Oakley 2004; Huang et al. 2006; Ranjan, Bingham, and Michailidis 2008; Taddy et al. 2009), where the follow-up points tend to "pile up" near the prespecified features of interest like the global maximum, contours, quantiles, and so on.
3.3 The Popular Approach
A popular approach to overcome the ill-conditioning of R is to introduce a nugget, 0 < δ < 1, in the model, and replace the ill-conditioned R with a well-conditioned R_δ = R + δI that has a smaller condition number (see Section 4 for details) as compared to that of R. Equivalently, one can introduce an independent white-noise process in the model

y(x_i) = μ + z(x_i) + ε_i,  i = 1, ..., n,

where the ε_i are iid N(0, σ_ε^2). That is, Var(Y) = V(D_0 | y(D_0)) = σ_z^2 R + σ_ε^2 I = σ_z^2 (R + δI) for δ = σ_ε^2/σ_z^2. The value of the nugget is bounded above, δ < 1, to ensure that the numerical uncertainty is smaller than the process uncertainty. The resulting BLUP is given by
ŷ_δ(x) = [(1 − r'(R + δI)^{-1}1_n)/(1_n'(R + δI)^{-1}1_n) 1_n' + r'] (R + δI)^{-1} Y,    (7)

and the associated mean squared error s_δ^2(x) is

s_δ^2(x) = σ_z^2 (1 − 2C_δ'r + C_δ'RC_δ),    (8)

where C_δ is such that ŷ_δ(x) = C_δ'Y.
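In code, the popular approach amounts to rerunning the BLUP computation with R replaced by R_δ. A sketch under the same plug-in assumptions as before (corr_matrix from Section 3.1; the helper name blup_nugget is ours). Note that the MSE (8) uses R itself in C_δ'RC_δ, so (8) is not simply (4) with R_δ substituted everywhere:

    import numpy as np

    def blup_nugget(x_star, X, Y, theta, sigma2_z, delta, p=2.0):
        # BLUP (7) and MSE (8) with R_delta = R + delta * I.
        n = len(X)
        R = corr_matrix(X, theta, p)
        R_delta = R + delta * np.eye(n)
        ones = np.ones(n)
        r = np.exp(-np.sum(theta * np.abs(X - x_star) ** p, axis=1))
        Rdinv_Y = np.linalg.solve(R_delta, Y)
        Rdinv_1 = np.linalg.solve(R_delta, ones)
        Rdinv_r = np.linalg.solve(R_delta, r)
        w = (1.0 - ones @ Rdinv_r) / (ones @ Rdinv_1)
        C = w * Rdinv_1 + Rdinv_r          # C_delta, so that y_hat = C' Y
        y_hat = C @ Y                      # equation (7)
        s2 = sigma2_z * (1.0 - 2.0 * C @ r + C @ (R @ C))  # equation (8)
        return y_hat, s2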
Theoretically, it is straightforward to see that the use of a positive nugget in the GP model produces a non-interpolator. Jones, Schonlau, and Welch (1998) showed that the GP fit given by (3) and (4) is an interpolator because for 1 ≤ j ≤ n, r'R^{-1} = e_j', where e_j is the jth unit vector, r = (r_1(x_j), ..., r_n(x_j))' and r_i(x_j) = corr(z(x_i), z(x_j)). If we use a δ (> 0) in the model (i.e., replace R with R_δ), then r'R_δ^{-1} ≠ e_j' and thus ŷ(x_j) ≠ y_j and ŝ^2(x_j) ≠ 0. From a practitioner's viewpoint, one could sacrifice exact interpolation if the interpolation accuracy of the fit is within the desired tolerance, but it is not always achievable (see Section 6 for illustrations).
The nugget parameter δ is often estimated along with the other model parameters. However, one of the major concerns in the optimization is that the likelihood (modified by replacing R with R_δ) computation fails if the candidate nugget δ ∈ (0, 1) is not large enough to overcome the ill-conditioning of R_δ. To avoid this problem in the optimization, it is common to fix an ad hoc boundary value on the nugget parameter. The resulting maximum likelihood estimate is often close to this boundary value and the fit is not an interpolator of the observed data (i.e., the interpolation error is more than the desired tolerance). Even if the estimated nugget is not near the boundary, the use of a nugget in the model in this manner may introduce unnecessary over-smoothing from a practical standpoint (Section 6 presents several illustrations). In the next section, we propose a lower bound on the nugget that minimizes the unnecessary over-smoothing.
4. CHOOSING THE NUGGET
Recall from Section 3.2 that an n × n matrix R is said to be ill-conditioned or near-singular if its condition number κ(R) is too large. Thus, we intend to find δ such that κ(R_δ) is smaller than a certain threshold. Our main objectives here are to compute the condition number of R_δ and the threshold that classifies R_δ as well-behaved.

Let λ_1 ≤ λ_2 ≤ ··· ≤ λ_n be the eigenvalues of R. Then, in the L_2-norm, κ(R) = λ_n/λ_1 (Golub and Van Loan 1996). The addition of δ along the main diagonal of R shifts all of the eigenvalues of R by δ. That is, the eigenvalues of R_δ = R + δI are λ_i + δ, i = 1, ..., n, where λ_i is the ith smallest eigenvalue of R. Thus, R_δ is well-conditioned if

log(κ(R_δ)) ≤ a  ⟺  (λ_n + δ)/(λ_1 + δ) ≤ e^a  ⟺  δ ≥ λ_n(κ(R) − e^a)/(κ(R)(e^a − 1)) = δ_lb,

where κ(R) = λ_n/λ_1 and e^a is the desired threshold for κ(R_δ). Note that δ_lb is a function of the design points and the hyperparameter θ.
The closed form expressions for the eigenvalues, and hence the condition number, of a Gaussian correlation matrix R in (2), for arbitrary θ and design {x_1, ..., x_n}, are, to our knowledge, yet unknown. If x ∈ (−∞, ∞)^d and x_k ~ N(0, σ_x^2), closed form expressions for the expected eigenvalues of R are known (see section 4.3 in the book by Rasmussen and Williams 2006). In our case, x ∈ [0, 1]^d, and the design points are often chosen using a space-filling criterion (e.g., Latin hypercube with properties like maximin distance, minimum correlation, OA; uniform designs, and so on). In such cases, one may assume, at most, x_k ~ U(0, 1) for k = 1, ..., d. In fact, the objectives of building efficient emulators for computer simulators often include estimating prespecified process features of interest, and sequential designs (e.g., expected improvement based designs) are preferred to achieve such goals. In such designs, the follow-up points tend to "pile up" near the feature of interest. The distributions of such design points are not uniform and can be nontrivial to represent in analytical expressions. Hence, it is almost surely infeasible to obtain closed form expressions for the eigenvalues of such R in general. Of course, one can compute these quantities numerically. We use Matlab's built-in function eig to compute the maximum eigenvalue of R and cond to calculate the condition number κ(R) = λ_n/λ_1 in the expression of δ_lb.
Another important component of the proposed lower bound is the threshold for getting well-behaved nonsingular correlation matrices. As one would suspect, the near-singularity of such a correlation matrix depends on n, d, the distribution of {x_1, ..., x_n} ∈ [0, 1]^d, and θ ∈ (0, ∞)^d. We now present the key steps of the simulation algorithm used for estimating the threshold under a specific design framework. The results are averaged over the distribution of {x_1, ..., x_n} and θ, and thus it is sufficient to find the threshold of κ(R).

Figure 2. The contours in the left panel show the proportion of correlation matrices flagged as near-singular. The contours in the right panel display average log(κ(R)) values. The shaded region in the left panel corresponds to log(κ(R)) > 25. The online version of this figure is in color.
For several combinations of n and d, we generate 5000 correlation matrices where the design points {x_1, ..., x_n} follow the maximin Latin hypercube sampling scheme (Stein 1987) and the θ_k are chosen from an exponential distribution with mean 1. Recall from Section 3.2 that a near-singular (or ill-conditioned) correlation matrix has a large condition number, and κ(R) is inversely proportional to θ. Consequently, we focused on small values of θ in simulating R. These correlation matrices are used to compute the proportion of matrices that are near-singular (see the contours in the left panel of Figure 2). We used Matlab's built-in function lhsdesign to generate the design points and chol (which computes the Cholesky factorization) to check whether or not a matrix R was near-singular under the working precision. (For a positive definite well-behaved matrix R, [U, p] = chol(R) produces an upper triangular matrix U satisfying U'U = R and p is zero. If R is not positive definite, then p is a positive integer.) We also computed the condition numbers of these 5000 correlation matrices (using Matlab's built-in function cond). The right panel of Figure 2 presents the contours of the average of log(κ(R)) for different combinations of n and d.
From Figure 2, it is clear that a = 25 can be used as the threshold for log(κ(R_δ)) of a well-behaved correlation matrix R_δ. Also note that the proportion of near-singular cases, denoted by the contours in the left panel of Figure 2, decreases rapidly as the input dimension increases. This is somewhat intuitive because the volume of the void (or unexplored region) increases exponentially with the dimension, and a really large space-filling design is needed to jeopardize the conditioning of the correlation matrices in a high-dimensional input space. For other design schemes (e.g., sequential designs), one can follow these steps to estimate the threshold for the condition number of well-behaved correlation matrices.
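The key steps of this simulation translate directly into code. The sketch below uses a plain (unoptimized) Latin hypercube in place of the maximin LHS used in the paper, and flags near-singularity by the failure of a Cholesky factorization, mirroring the chol-based check; corr_matrix is the helper from Section 3.1:

    import numpy as np

    rng = np.random.default_rng(0)

    def random_lhs(n, d):
        # Plain Latin hypercube sample in [0, 1]^d (the paper used maximin LHS).
        return (np.argsort(rng.random((n, d)), axis=0) + rng.random((n, d))) / n

    n, d, reps = 100, 2, 5000
    n_singular, log_kappas = 0, []
    for _ in range(reps):
        X = random_lhs(n, d)
        theta = rng.exponential(scale=1.0, size=d)  # theta_k ~ Exp(mean 1)
        R = corr_matrix(X, theta)                   # Gaussian correlation from (2)
        try:
            np.linalg.cholesky(R)                   # near-singularity check
        except np.linalg.LinAlgError:
            n_singular += 1
        log_kappas.append(np.log(np.linalg.cond(R)))

    # Proportion flagged near-singular and average log condition number:
    # the two quantities contoured in Figure 2 for this (n, d).
    print(n_singular / reps, np.mean(log_kappas))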
The lower bound on the nugget is only a sufficient condition and not a necessary one for R_δ to be well-conditioned. For instance, a correlation matrix with 100 design points in (0, 1)^2 chosen using a space-filling criterion may lead to a well-behaved R if θ is very large. If the correlation matrix is well-conditioned, R should be used instead of R_δ, that is,

δ_lb = max{ λ_n(κ(R) − e^a)/(κ(R)(e^a − 1)), 0 }.    (9)
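A direct transcription of (9), with the threshold a = 25 suggested by Figure 2. The guard for a numerically nonpositive smallest eigenvalue is our addition: in floating point, λ_1 can round to zero or below for severely ill-conditioned R, in which case the sketch returns the κ → ∞ limit of (9), namely λ_n/(e^a − 1):

    import numpy as np

    def nugget_lower_bound(R, a=25.0):
        # Lower bound (9) on the nugget; returns 0 when R is already
        # well-conditioned, i.e., log(kappa(R)) <= a.
        eigvals = np.linalg.eigvalsh(R)   # R symmetric; eigenvalues ascending
        lam_min, lam_max = eigvals[0], eigvals[-1]
        ea = np.exp(a)
        if lam_min <= 0.0:                # numerically singular: kappa -> infinity
            return lam_max / (ea - 1.0)
        kappa = lam_max / lam_min         # kappa(R) = lambda_n / lambda_1
        return max(lam_max * (kappa - ea) / (kappa * (ea - 1.0)), 0.0)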
That is, when R is well-behaved our approach allows δ_lb to be zero, and hence a more accurate surrogate can be obtained as compared to the popular approach (Section 3.3), where a nonzero nugget is forced into the model, which may lead to undesirable over-smoothing. This could be of concern in a high-dimensional input space, because the proportion of near-singular cases decreases as the input dimension increases. Example 3 demonstrates the performance of the proposed methodology over the popular approach for an eight-dimensional simulator.

Although the use of δ_lb in the GP model minimizes the over-smoothing, δ_lb may not be small enough to achieve the desired interpolation accuracy (see Examples 1 and 2 for illustrations), and choosing δ < δ_lb may lead to an ill-conditioned R_δ. This may not be a big issue if one believes that the simulator is somewhat noisy and/or the statistical emulator is biased due to misspecification in the correlation structure or model assumptions. In such cases, a little smoothing might be a good idea. However, controlling the amount of smoothing is a nontrivial task and requires more attention. On the other hand, over-smoothing is undesirable if the experimenter believes that the computer simulator is deterministic and the statistician is confident about the choice of the emulator (we consider the GP model with Gaussian correlation structure). Under these assumptions, we now propose a new predictor that can achieve the desired level of interpolation accuracy.
5. NEW ITERATIVE APPROACH
In this section, we propose a predictor that is based on the iterative use of a nugget δ ∈ (0, 1). This approach does not
