
A Computationally Stable Approach to Gaussian
Process Interpolation of Deterministic
Computer Simulation Data
Pritam RANJAN
Department of Mathematics and Statistics
Acadia University
Wolfville, NS, B4P 2R6, Canada
(pritam.ranjan@acadiau.ca)

Ronald HAYNES
Department of Mathematics and Statistics
Memorial University
St. John's, NL, A1C 5S7, Canada
(rhaynes@mun.ca)

Richard KARSTEN
Department of Mathematics and Statistics
Acadia University
Wolfville, NS, B4P 2R6, Canada
(richard.karsten@acadiau.ca)
For many expensive deterministic computer simulators, the outputs do not have replication error and the desired metamodel (or statistical emulator) is an interpolator of the observed data. Realizations of Gaussian spatial processes (GP) are commonly used to model such simulator outputs. Fitting a GP model to n data points requires the computation of the inverse and determinant of n × n correlation matrices, R, that are sometimes computationally unstable due to near-singularity of R. This happens if any pair of design points are very close together in the input space. The popular approach to overcome near-singularity is to introduce a small nugget (or jitter) parameter in the model that is estimated along with other model parameters. The inclusion of a nugget in the model often causes unnecessary over-smoothing of the data. In this article, we propose a lower bound on the nugget that minimizes the over-smoothing and an iterative regularization approach to construct a predictor that further improves the interpolation accuracy. We also show that the proposed predictor converges to the GP interpolator.
KEY WORDS: Computer experiment; Matrix inverse approximation; Regularization.
1. INTRODUCTION
Computer simulators are often used to model complex physical and engineering processes that are either too expensive or time consuming to observe. A simulator is said to be deterministic if replicate runs at the same inputs yield identical responses. For the last few decades, deterministic simulators have been widely used to model physical processes. For instance, Kumar and Davidson (1978) used deterministic simulation models for comparing the performance of highly concurrent computers; Su et al. (1996) used generalized linear regression models to design a lamp filament via a deterministic finite-element computer code; Aslett et al. (1998) discussed an optimization problem for a deterministic circuit simulator; and several deterministic simulators are being used for analyzing biochemical networks (see Bergmann and Sauro 2008 for references). On the other hand, there are cases where stochastic (nondeterministic) simulators are preferred due to unavoidable biases (e.g., Poole and Raftery 2000). In spite of the recent interest in stochastic simulators, deterministic simulators are still being actively used. For instance, Medina, Moreno, and Royo (2005) demonstrated the preference of deterministic traffic simulators over their stochastic counterparts. In this article, we assume that the simulator under consideration is deterministic up to working precision and that the scientist is confident about the validity of the simulator.
Sacks et al. (1989) proposed modeling (or emulating) such an expensive deterministic simulator as a realization of a Gaussian stochastic process (GP). An emulator of a deterministic simulator is desired to be an interpolator of the observed data (e.g., Sacks et al. 1989; Van Beers and Kleijnen 2004). For the problem that motivated this work, the objective is to emulate the average extractable tidal power as a function of the turbine locations in the Bay of Fundy, Nova Scotia, Canada. The deterministic computer simulator for the tidal power model is a numerical solver of a complex system of partial differential equations, and we accept the simulator as a valid representation of the tidal power.
In this article, we discuss a computational issue in building the GP based emulator for a deterministic simulator. Fitting a GP model to n data points using either a maximum likelihood technique or a Bayesian approach requires the computation of the determinant and inverse of several n × n correlation matrices, R. Although the correlation matrices are positive definite by definition, near-singularity (also referred to as ill-conditioning) of these matrices is a common problem in fitting GP models. Ababou, Bagtzoglou, and Wood (1994) studied the relationship of a uniform grid to the ill-conditioning and
quality of model fit for various covariance models. Barton and Salagame (1997) studied the effect of experimental design on the ill-conditioning of kriging models. Jones, Schonlau, and Welch (1998) used the singular value decomposition to overcome the near-singularity of R. Booker (2000) used the sum of independent GPs to overcome near-singularity for multistage adaptive designs in kriging models. A more popular solution to overcome near-singularity is to introduce a nugget or jitter parameter, δ, in the model (e.g., Sacks et al. 1989; Neal 1997; Booker et al. 1999; Santner, Williams, and Notz 2003; Gramacy and Lee 2008) that is estimated along with other model parameters. However, adding a nugget to the model introduces additional smoothing in the predictor and as a result the predictor is no longer an interpolator.
Here, we first propose a lower bound on the nugget (δ_lb) that minimizes the additional over-smoothing. Second, an iterative approach is developed to enable the construction of a new predictor that further improves the interpolation as well as the prediction (at unsampled design points) accuracy. We also show that the proposed predictor converges to an interpolator. Although an arbitrary nugget δ ∈ (0, 1) can be used in the iterative approach, the rate of convergence (i.e., the number of iterations required to reach a certain tolerance) depends on the magnitude of the nugget. To this effect, the proposed lower bound δ_lb significantly reduces the number of iterations required. This feature is particularly desirable for implementation.
The article is organized as follows. Section 2 presents the tidal power modeling example. In Section 3, we review the GP model, a computational issue in fitting the model, and the popular approach to overcome near-singularity. Section 4 presents the new lower bound for the nugget that is required to achieve well-conditioned correlation matrices and minimize unnecessary over-smoothing. In Section 5, we develop the iterative approach for constructing a more accurate predictor. Several examples are presented in Section 6 to illustrate the performance of our proposed predictor over the one obtained using the popular approach. Finally, we conclude the paper with some remarks on the numerical issues and recommendations for practitioners in Section 7.
2. MOTIVATING EXAMPLE
The Bay of Fundy, located between New Brunswick and Nova Scotia, Canada, with a small portion touching Maine, U.S.A., is world famous for its high tides. In the upper portion of the Bay of Fundy [see Figure 1(a)], the difference in water level between high tide and low tide can be as much as 17 meters. The high tides in this region are a result of a resonance, with the natural period of the Bay of Fundy very close to the period of the principal lunar tide. This results in very regular tides in the Bay of Fundy with a high tide every 12.42 hours. The incredible energy in these tides has meant that the region has significant potential for extracting tidal power [Greenberg 1979; Karsten et al. 2008 (hereafter KMLH)].
Though the notion of harnessing tidal power from the Bay of Fundy is not new, earlier proposed methods of harvesting the much needed green electrical energy involved building a barrage or dam. This method was considered infeasible for a variety of economic and environmental reasons. Recently, there has been rapid technological development of in-stream tidal turbines. These devices act much like wind turbines, with individual turbines placed in regions of strong tidal currents. Individual turbines can be up to 20 m in diameter and can produce over 1 MW of power. Ideally, these turbines would produce a predictable and renewable source of power with less of an impact on the environment than a dam. KMLH examined the power potential of farms of such turbines across the Minas Passage [Figure 1(b)], where the tidal currents are strongest. They found that the potential extractable power is much higher than previous estimates and that the environmental impacts of extracting power can be greatly reduced by extracting only a portion of the maximum power available. The simulations in KMLH did not represent individual turbines and left open the question of how to optimally place turbines. In this article, we emulate the KMLH numerical model to examine the placement of turbines to maximize the power output.
We numerically simulate the tides as in KMLH by solving the 2D shallow water equations using the Finite-Volume Coastal Ocean Model (FVCOM) with a triangular grid on the upper Bay of Fundy [see Figure 1(a)]. Since the grid triangles differ in size and orientation, the ith turbine was modeled on the set of all triangular elements whose centers lie within 250 m of (x_i, y_i). A possible turbine location is shown in Figure 1(b). The triangular grid was developed by David Greenberg and colleagues at the Bedford Institute of Oceanography, NS, Canada. The details of FVCOM can be found in the article by Chen, Beardsley, and Cowles (2006).

Figure 1. Panel (a) shows the triangular grid used in the FVCOM model for simulating tides in the upper Bay of Fundy. The small box in the center surrounds the Minas Passage shown in (b). The shaded triangles in the center of (b) represent a possible turbine location.
Using this setup, the estimate of the electric power that can be harnessed through a turbine at a particular location (x, y) over a tidal cycle T = 12.42 hr is obtained by the simulator in KMLH. The average tidal power at the location (x, y) is given by

P̄(x, y) = (1/T) ∫_0^T P(t; x, y) dt,

where P(t; x, y) is the extractable power output at time t and location (x, y). The process is deterministic up to the machine precision, and the main objective is to emulate P̄(x, y).
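As a small illustration of this time average, the sketch below approximates P̄ with the trapezoidal rule. The power series used here is synthetic; in the application, P(t; x, y) would come from an FVCOM run at the candidate turbine location.

    import numpy as np

    # Approximate P_bar(x, y) = (1/T) * integral_0^T P(t; x, y) dt by the
    # trapezoidal rule over one tidal cycle. P_t below is a hypothetical
    # stand-in for the simulator output P(t; x, y).
    T = 12.42                                  # tidal period (hours)
    t = np.linspace(0.0, T, 200)               # time grid over one cycle
    P_t = np.maximum(0.0, np.sin(2 * np.pi * t / T)) ** 3  # synthetic P(t; x, y)

    P_bar = np.trapz(P_t, t) / T               # average extractable power
    print(P_bar)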
It turns out that the GP model fitted to the simulator output at n = 100 points (chosen using a space-filling design criterion) is not an interpolator and results in an over-smoothed emulator (see Example 4 for details). This is undesirable as the ocean modelers are interested in an emulator that interpolates their simulator. This emulator will be used to obtain estimates of both the maximizer of the power function (i.e., the location where to place the turbine) and the extractable power at this location. The manufacturing and installation cost of the initial prototype turbine is very high (roughly 20 million dollars). Since the over-smoothed emulator can underestimate the maximum extractable power, a good approximation of the attainable power function can be helpful in saving the cost of a few turbines. Example 4 shows that the proposed approach leads to a more accurate estimate of the maximum extractable power.
3. BACKGROUND REVIEW
3.1 Gaussian Process Model
Let the ith input and output of the computer simulator be denoted by a d-dimensional vector, x_i = (x_{i1}, ..., x_{id}), and the univariate response, y_i = y(x_i), respectively. The experiment design D_0 = {x_1, ..., x_n} is the set of n input trials. The outputs of the simulation trials are held in the n-dimensional vector Y = y(D_0) = (y_1, y_2, ..., y_n)'. The simulator output, y(x_i), is modeled as

y(x_i) = μ + z(x_i),  i = 1, ..., n,    (1)
where μ is the overall mean, and z(x_i) is a GP with E(z(x_i)) = 0, Var(z(x_i)) = σ_z^2, and Cov(z(x_i), z(x_j)) = σ_z^2 R_{ij}. In general, y(D_0) has a multivariate normal distribution, N_n(1_n μ, Σ), where Σ = V(D_0 | y(D_0)) = σ_z^2 R, and 1_n is an n × 1 vector of all ones (see Sacks et al. 1989 and Jones, Schonlau, and Welch 1998 for details). Although there are several choices for the correlation function, we focus on the Gaussian correlation because of its properties like smoothness (or differentiability in the mean squared sense) and popularity in other areas like machine learning (radial basis kernels) and geostatistics (kriging). For a detailed discussion on correlation functions see the works of Stein (1999), Santner, Williams, and Notz (2003), and Rasmussen and Williams (2006). The Gaussian correlation function is a special case (p_k = 2 for all k) of the power exponential correlation family

R_{ij} = corr(z(x_i), z(x_j)) = ∏_{k=1}^{d} exp{−θ_k |x_{ik} − x_{jk}|^{p_k}}  for all i, j,    (2)

where θ = (θ_1, ..., θ_d) is the vector of hyperparameters, and p_k ∈ (0, 2] is the smoothness parameter. As discussed in Section 7, the results developed in this article may vary slightly when other correlation structures are used instead of the Gaussian correlation.
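For concreteness, here is a minimal sketch (in Python/NumPy; not the authors' code, which used Matlab built-ins) of building R from (2). The helper name corr_matrix is ours and is reused in the sketches that follow.

    import numpy as np

    def corr_matrix(X, theta, p=2.0):
        # Power exponential correlation matrix from (2).
        # X     : (n, d) array of design points in [0, 1]^d
        # theta : (d,) vector of positive hyperparameters
        # p     : smoothness parameter in (0, 2]; p = 2 gives the Gaussian correlation
        diff = np.abs(X[:, None, :] - X[None, :, :])       # pairwise |x_ik - x_jk|
        return np.exp(-np.sum(theta * diff ** p, axis=2))  # prod over k of exp{-theta_k |.|^p_k}

    # Example: correlation matrix for 5 random points in [0, 1]^2
    rng = np.random.default_rng(1)
    R = corr_matrix(rng.random((5, 2)), theta=np.array([2.0, 2.0]))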
We use the GP model with Gaussian correlation function to predict responses at any unsampled design point x*; however, the theory developed here is also valid for other correlation structures in the power exponential family (see Section 7 for more details). Following the maximum likelihood approach, the best linear unbiased predictor (BLUP) at x* is

ŷ(x*) = μ̂ + r'R^{-1}(Y − 1_n μ̂) = [(1 − r'R^{-1}1_n)/(1_n'R^{-1}1_n) 1_n' + r'] R^{-1} Y,    (3)
with mean squared error

s^2(x*) = σ_z^2 (1 − 2C'r + C'RC) = σ_z^2 [1 − r'R^{-1}r + (1 − 1_n'R^{-1}r)^2 / (1_n'R^{-1}1_n)],    (4)

where r = (r_1(x*), ..., r_n(x*))', r_i(x*) = corr(z(x*), z(x_i)), and C is such that ŷ(x*) = C'Y. In practice, the parameters μ, σ_z^2, and θ are replaced with estimates (see Sacks et al. 1989; Santner, Williams, and Notz 2003, for details).
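A minimal sketch of computing (3) and (4) at a single unsampled point, assuming the corr_matrix helper sketched in Section 3.1 and plug-in values for μ, σ_z^2, and θ:

    import numpy as np

    def blup(x_star, X, Y, theta, mu, sigma2_z, p=2.0):
        # BLUP (3) and its mean squared error (4) with plug-in parameters.
        R = corr_matrix(X, theta, p)
        ones = np.ones(len(X))
        r = np.exp(-np.sum(theta * np.abs(X - x_star) ** p, axis=1))  # r_i(x*) from (2)
        Rinv_Y = np.linalg.solve(R, Y)   # solves avoid forming R^{-1} explicitly
        Rinv_1 = np.linalg.solve(R, ones)
        Rinv_r = np.linalg.solve(R, r)
        y_hat = mu + r @ (Rinv_Y - mu * Rinv_1)                        # equation (3)
        s2 = sigma2_z * (1.0 - r @ Rinv_r
                         + (1.0 - ones @ Rinv_r) ** 2 / (ones @ Rinv_1))  # equation (4)
        return y_hat, s2

When R is near-singular, the three linear solves above are exactly the computations that become unstable, which is the issue taken up in Section 3.2.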
3.2 A Computational Issue in Model Fitting
Fitting a GP model (1)–(4) to a dataset with n observations in a d-dimensional input space requires numerous evaluations of the log-likelihood function for several realizations of the parameter vector (θ_1, ..., θ_d; μ, σ_z^2). The closed form estimators of μ and σ_z^2, given by

μ̂(θ) = (1_n'R^{-1}1_n)^{-1} (1_n'R^{-1}Y)  and  σ̂_z^2(θ) = (Y − 1_n μ̂(θ))' R^{-1} (Y − 1_n μ̂(θ)) / n,    (5)

are often used to obtain the profile log-likelihood

−2 log L_p ∝ log(|R|) + n log[(Y − 1_n μ̂(θ))' R^{-1} (Y − 1_n μ̂(θ))]    (6)

for estimating the hyperparameters θ = (θ_1, ..., θ_d), where |R| denotes the determinant of R. Recall from (2) that the correlation matrix R depends on θ and the design points.
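A sketch of one evaluation of (5)–(6) for a candidate θ, again assuming the corr_matrix helper from Section 3.1; this is the computation that is repeated many times during optimization and that fails when R is numerically singular:

    import numpy as np

    def neg2_profile_loglik(theta, X, Y):
        # -2 log L_p of (6), up to an additive constant, with mu and sigma_z^2
        # profiled out via (5). Raises LinAlgError when R is numerically singular.
        R = corr_matrix(X, theta)
        L = np.linalg.cholesky(R)                  # fails for near-singular R

        def solve(b):
            return np.linalg.solve(L.T, np.linalg.solve(L, b))

        ones = np.ones(len(Y))
        Rinv_1, Rinv_Y = solve(ones), solve(Y)
        mu_hat = (ones @ Rinv_Y) / (ones @ Rinv_1)  # muhat(theta) in (5)
        resid = Y - mu_hat * ones
        log_det_R = 2.0 * np.sum(np.log(np.diag(L)))
        return log_det_R + len(Y) * np.log(resid @ solve(resid))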
An n × n matrix R is said to be near-singular (or, ill-conditioned) if its condition number κ(R) = ‖R‖ · ‖R^{-1}‖ is too large (see Section 4 for details on "how large is large?"), where ‖·‖ denotes a matrix norm (we will use the L_2-norm). Although these correlation matrices are positive definite by definition, computation of |R| and R^{-1} can sometimes be unstable due to ill-conditioning. This prohibits precise computation of the likelihood and hence the parameter estimates.
Ill-conditioning of R often occurs if any pair of design points are very close in the input space, or the θ_k are close to zero, that is, ∑_{k=1}^{d} θ_k |x_{ik} − x_{jk}|^{p_k} ≈ 0. The distances between neighboring points in space-filling designs with large n (sample size) and small d (input dimension) can be very small. Near-singularity is more common in the sequential design setup (e.g., expected improvement based designs; see Jones, Schonlau, and Welch 1998; Schonlau, Welch, and Jones 1998; Oakley 2004; Huang et al. 2006; Ranjan, Bingham, and Michailidis 2008; Taddy et al. 2009), where the follow-up points tend to "pile up" near the prespecified features of interest like the global maximum, contours, quantiles, and so on.
3.3 The Popular Approach
A popular approach to overcome the ill-conditioning of R is to introduce a nugget, 0 < δ < 1, in the model, and replace the ill-conditioned R with a well-conditioned R_δ = R + δI that has a smaller condition number (see Section 4 for details) as compared to that of R. Equivalently, one can introduce an independent white-noise process in the model

y(x_i) = μ + z(x_i) + ε_i,  i = 1, ..., n,

where the ε_i are iid N(0, σ_ε^2). That is, Var(Y) = V(D_0 | y(D_0)) = σ_z^2 R + σ_ε^2 I = σ_z^2 (R + δI) for δ = σ_ε^2/σ_z^2. The value of the nugget is bounded above, δ < 1, to ensure that the numerical uncertainty is smaller than the process uncertainty. The resulting BLUP is given by
ŷ_δ(x) = [(1 − r'(R + δI)^{-1}1_n)/(1_n'(R + δI)^{-1}1_n) 1_n' + r'] (R + δI)^{-1} Y,    (7)

and the associated mean squared error s_δ^2(x) is

s_δ^2(x) = σ_z^2 (1 − 2C_δ'r + C_δ'RC_δ),    (8)

where C_δ is such that ŷ_δ(x) = C_δ'Y.
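In code, the popular approach amounts to rerunning the BLUP computation with R replaced by R_δ. A sketch under the same plug-in assumptions as before (corr_matrix from Section 3.1; the helper name blup_nugget is ours). Note that the MSE (8) uses R itself in C_δ'RC_δ, so (8) is not simply (4) with R_δ substituted everywhere:

    import numpy as np

    def blup_nugget(x_star, X, Y, theta, sigma2_z, delta, p=2.0):
        # BLUP (7) and MSE (8) with R_delta = R + delta * I.
        n = len(X)
        R = corr_matrix(X, theta, p)
        R_delta = R + delta * np.eye(n)
        ones = np.ones(n)
        r = np.exp(-np.sum(theta * np.abs(X - x_star) ** p, axis=1))
        Rdinv_Y = np.linalg.solve(R_delta, Y)
        Rdinv_1 = np.linalg.solve(R_delta, ones)
        Rdinv_r = np.linalg.solve(R_delta, r)
        w = (1.0 - ones @ Rdinv_r) / (ones @ Rdinv_1)
        C = w * Rdinv_1 + Rdinv_r          # C_delta, so that y_hat = C' Y
        y_hat = C @ Y                      # equation (7)
        s2 = sigma2_z * (1.0 - 2.0 * C @ r + C @ (R @ C))  # equation (8)
        return y_hat, s2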
Theoretically, it is straightforward to see that the use of a positive nugget in the GP model produces a non-interpolator. Jones, Schonlau, and Welch (1998) showed that the GP fit given by (3) and (4) is an interpolator because for 1 ≤ j ≤ n, r'R^{-1} = e_j', where e_j is the jth unit vector, r = (r_1(x_j), ..., r_n(x_j))' and r_i(x_j) = corr(z(x_i), z(x_j)). If we use a δ (> 0) in the model (i.e., replace R with R_δ), then r'R_δ^{-1} ≠ e_j' and thus ŷ(x_j) ≠ y_j and ŝ^2(x_j) ≠ 0. From a practitioner's viewpoint, one could sacrifice exact interpolation if the interpolation accuracy of the fit is within the desired tolerance, but it is not always achievable (see Section 6 for illustrations).
The nugget parameter δ is often estimated along with the other model parameters. However, one of the major concerns in the optimization is that the likelihood (modified by replacing R with R_δ) computation fails if the candidate nugget δ ∈ (0, 1) is not large enough to overcome the ill-conditioning of R_δ. To avoid this problem in the optimization, it is common to fix an ad hoc boundary value on the nugget parameter. The resulting maximum likelihood estimate is often close to this boundary value and the fit is not an interpolator of the observed data (i.e., the interpolation error is more than the desired tolerance). Even if the estimated nugget is not near the boundary, the use of a nugget in the model in this manner may introduce unnecessary over-smoothing from a practical standpoint (Section 6 presents several illustrations). In the next section, we propose a lower bound on the nugget that minimizes the unnecessary over-smoothing.
4. CHOOSING THE NUGGET
Recall from Section 3.2 that an n × n matrix R is said to be ill-conditioned or near-singular if its condition number κ(R) is too large. Thus, we intend to find δ such that κ(R_δ) is smaller than a certain threshold. Our main objectives here are to compute the condition number of R_δ and the threshold that classifies R_δ as well-behaved.

Let λ_1 ≤ λ_2 ≤ ··· ≤ λ_n be the eigenvalues of R. Then, in the L_2-norm, κ(R) = λ_n/λ_1 (Golub and Van Loan 1996). The addition of δ along the main diagonal of R shifts all of the eigenvalues of R by δ. That is, the eigenvalues of R_δ = R + δI are λ_i + δ, i = 1, ..., n, where λ_i is the ith smallest eigenvalue of R. Thus, R_δ is well-conditioned if

log(κ(R_δ)) ≤ a  ⟺  (λ_n + δ)/(λ_1 + δ) ≤ e^a  ⟺  δ ≥ λ_n(κ(R) − e^a)/(κ(R)(e^a − 1)) = δ_lb,

where κ(R) = λ_n/λ_1 and e^a is the desired threshold for κ(R_δ). Note that δ_lb is a function of the design points and the hyperparameter θ.
The closed form expressions for the eigenvalues, and hence the condition number, of a Gaussian correlation matrix R in (2), for arbitrary θ and design {x_1, ..., x_n}, are, to our knowledge, yet unknown. If x ∈ (−∞, ∞)^d and x_k ~ N(0, σ_x^2), closed form expressions for the expected eigenvalues of R are known (see section 4.3 in the book by Rasmussen and Williams 2006). In our case, x ∈ [0, 1]^d, and the design points are often chosen using a space-filling criterion (e.g., Latin hypercube with properties like maximin distance, minimum correlation, OA; uniform designs, and so on). In such cases, one may assume, at most, x_k ~ U(0, 1) for k = 1, ..., d. In fact, the objectives of building efficient emulators for computer simulators often include estimating prespecified process features of interest, and sequential designs (e.g., expected improvement based designs) are preferred to achieve such goals. In such designs, the follow-up points tend to "pile up" near the feature of interest. The distributions of such design points are not uniform and can be nontrivial to represent in analytical expressions. Hence, it is almost surely infeasible to obtain closed form expressions for the eigenvalues of such R in general. Of course, one can compute these quantities numerically. We use Matlab's built-in function eig to compute the maximum eigenvalue of R and cond to calculate the condition number κ(R) = λ_n/λ_1 in the expression of δ_lb.
Another important component of the proposed lower bound is the threshold for getting well-behaved nonsingular correlation matrices. As one would suspect, the near-singularity of such a correlation matrix depends on n, d, the distribution of {x_1, ..., x_n} ∈ [0, 1]^d, and θ ∈ (0, ∞)^d. We now present the key steps of the simulation algorithm used for estimating the threshold under a specific design framework. The results are averaged over the distribution of {x_1, ..., x_n} and θ, and thus it is sufficient to find the threshold of κ(R).

Figure 2. The contours in the left panel show the proportion of correlation matrices flagged as near-singular. The contours in the right panel display average log(κ(R)) values. The shaded region in the left panel corresponds to log(κ(R)) > 25. The online version of this figure is in color.
For several combinations of n and d, we generate 5000 correlation matrices where the design points {x_1, ..., x_n} follow the maximin Latin hypercube sampling scheme (Stein 1987) and the θ_k are chosen from an exponential distribution with mean 1. Recall from Section 3.2 that a near-singular (or ill-conditioned) correlation matrix has a large condition number, and κ(R) is inversely proportional to θ. Consequently, we focused on small values of θ in simulating R. These correlation matrices are used to compute the proportion of matrices that are near-singular (see the contours in the left panel of Figure 2). We used Matlab's built-in function lhsdesign to generate the design points and chol (which computes the Cholesky factorization) to check whether or not a matrix R was near-singular under the working precision. (For a positive definite well-behaved matrix R, [U, p] = chol(R) produces an upper triangular matrix U satisfying U'U = R and p is zero. If R is not positive definite, then p is a positive integer.) We also computed the condition numbers of these 5000 correlation matrices (using Matlab's built-in function cond). The right panel of Figure 2 presents the contours of the average of log(κ(R)) for different combinations of n and d.
From Figure 2, it is clear that a = 25 can be used as the threshold for log(κ(R_δ)) of a well-behaved correlation matrix R_δ. Also note that the proportion of near-singular cases, denoted by the contours in the left panel of Figure 2, decreases rapidly as the input dimension increases. This is somewhat intuitive because the volume of the void (or unexplored region) increases exponentially with the dimension, and a really large space-filling design is needed to jeopardize the conditioning of the correlation matrices in a high-dimensional input space. For other design schemes (e.g., sequential designs), one can follow these steps to estimate the threshold for the condition number of well-behaved correlation matrices.
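The key steps of this simulation translate directly into code. The sketch below uses a plain (unoptimized) Latin hypercube in place of the maximin LHS used in the paper, and flags near-singularity by the failure of a Cholesky factorization, mirroring the chol-based check; corr_matrix is the helper from Section 3.1:

    import numpy as np

    rng = np.random.default_rng(0)

    def random_lhs(n, d):
        # Plain Latin hypercube sample in [0, 1]^d (the paper used maximin LHS).
        return (np.argsort(rng.random((n, d)), axis=0) + rng.random((n, d))) / n

    n, d, reps = 100, 2, 5000
    n_singular, log_kappas = 0, []
    for _ in range(reps):
        X = random_lhs(n, d)
        theta = rng.exponential(scale=1.0, size=d)  # theta_k ~ Exp(mean 1)
        R = corr_matrix(X, theta)                   # Gaussian correlation from (2)
        try:
            np.linalg.cholesky(R)                   # near-singularity check
        except np.linalg.LinAlgError:
            n_singular += 1
        log_kappas.append(np.log(np.linalg.cond(R)))

    # Proportion flagged near-singular and average log condition number:
    # the two quantities contoured in Figure 2 for this (n, d).
    print(n_singular / reps, np.mean(log_kappas))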
The lower bound on the nugget is only a sufficient condition and not a necessary one for R_δ to be well-conditioned. For instance, a correlation matrix with 100 design points in (0, 1)^2 chosen using a space-filling criterion may lead to a well-behaved R if θ is very large. If the correlation matrix is well-conditioned, R should be used instead of R_δ, that is,

δ_lb = max{ λ_n(κ(R) − e^a)/(κ(R)(e^a − 1)), 0 }.    (9)
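A direct transcription of (9), with the threshold a = 25 suggested by Figure 2. The guard for a numerically nonpositive smallest eigenvalue is our addition: in floating point, λ_1 can round to zero or below for severely ill-conditioned R, in which case the sketch returns the κ → ∞ limit of (9), namely λ_n/(e^a − 1):

    import numpy as np

    def nugget_lower_bound(R, a=25.0):
        # Lower bound (9) on the nugget; returns 0 when R is already
        # well-conditioned, i.e., log(kappa(R)) <= a.
        eigvals = np.linalg.eigvalsh(R)   # R symmetric; eigenvalues ascending
        lam_min, lam_max = eigvals[0], eigvals[-1]
        ea = np.exp(a)
        if lam_min <= 0.0:                # numerically singular: kappa -> infinity
            return lam_max / (ea - 1.0)
        kappa = lam_max / lam_min         # kappa(R) = lambda_n / lambda_1
        return max(lam_max * (kappa - ea) / (kappa * (ea - 1.0)), 0.0)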
That is, when R is well-behaved our approach allows δ_lb to be zero, and hence a more accurate surrogate can be obtained as compared to the popular approach (Section 3.3), where a nonzero nugget is forced into the model, which may lead to undesirable over-smoothing. This could be of concern in a high-dimensional input space, because the proportion of near-singular cases decreases as the input dimension increases. Example 3 demonstrates the performance of the proposed methodology over the popular approach for an eight-dimensional simulator.

Although the use of δ_lb in the GP model minimizes the over-smoothing, δ_lb may not be small enough to achieve the desired interpolation accuracy (see Examples 1 and 2 for illustrations), and choosing δ < δ_lb may lead to an ill-conditioned R_δ. This may not be a big issue if one believes that the simulator is somewhat noisy and/or the statistical emulator is biased due to misspecification in the correlation structure or model assumptions. In such cases, a little smoothing might be a good idea. However, controlling the amount of smoothing is a nontrivial task and requires more attention. On the other hand, over-smoothing is undesirable if the experimenter believes that the computer simulator is deterministic and the statistician is confident about the choice of the emulator (we consider the GP model with Gaussian correlation structure). Under these assumptions, we now propose a new predictor that can achieve the desired level of interpolation accuracy.
5. NEW ITERATIVE APPROACH
In this section, we propose a predictor that is based on the iterative use of a nugget δ ∈ (0, 1). This approach does not
