A Computationally Stable Approach to Gaussian Process Interpolation of Deterministic Computer Simulation Data
Citations
Advances in surrogate based modeling, feasibility analysis, and optimization: A review
Local Gaussian process approximation for large computer experiments
laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R
The effect of the nugget on Gaussian process emulators of computer models
Composite Gaussian process models for emulating expensive functions
Related Papers (5)
The design and analysis of computer experiments
Efficient Global Optimization of Expensive Black-Box Functions
Frequently Asked Questions (21)
Q2. What problem arises when fitting a GP model to n data points?
Fitting a GP model to n data points using either a maximum likelihood technique or a Bayesian approach requires the computation of the determinant and inverse of several n × n correlation matrices, R. Although the correlation matrices are positive definite by definition, near-singularity (also referred to as ill-conditioning) of these matrices is a common problem in fitting GP models.
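The sketch below shows what a maximum likelihood fit has to evaluate many times. It is a minimal Python/NumPy example that assumes a constant-mean GP with the Gaussian correlation, with β and σ² profiled out. This is a common textbook form and is not necessarily the exact likelihood the authors used. A single Cholesky factorization supplies both the determinant and the solves with R, and that factorization is the step that breaks down when R is near-singular. The function names are illustrative.

```python
import numpy as np

def gaussian_corr(X, theta):
    """Gaussian correlation R_ij = exp(-sum_k theta_k (x_ik - x_jk)^2); X has shape (n, d)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                      # treat a 1-D array as n points in one dimension
    diff = X[:, None, :] - X[None, :, :]    # pairwise differences, shape (n, n, d)
    return np.exp(-np.sum(np.asarray(theta) * diff**2, axis=-1))

def neg2_profile_loglik(X, y, theta, nugget=0.0):
    """-2 * profile log-likelihood (constants dropped) of a constant-mean GP."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    R = gaussian_corr(X, theta) + nugget * np.eye(n)
    L = np.linalg.cholesky(R)               # raises LinAlgError if R is not numerically positive definite

    def solve(b):
        # R^{-1} b via solves with the Cholesky factors L and L^T
        return np.linalg.solve(L.T, np.linalg.solve(L, b))

    one = np.ones(n)
    beta = (one @ solve(y)) / (one @ solve(one))     # generalized least squares estimate of the mean
    resid = y - beta
    sigma2 = (resid @ solve(resid)) / n              # profiled process variance
    logdet = 2.0 * np.sum(np.log(np.diag(L)))        # log|R| from the Cholesky diagonal
    return n * np.log(sigma2) + logdet

X = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * X)
print(neg2_profile_loglik(X, y, theta=[500.0]))      # well-conditioned R
print(np.linalg.cond(gaussian_corr(X, [0.01])))      # very large: R is near-singular
```

For small θ the Cholesky step can fail outright. Even when it succeeds, the log-determinant and solves are dominated by rounding error.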
Q3. What is the deterministic computer simulator for the tidal power model?
The deterministic computer simulator for the tidal power model is a numerical solver of a complex system of partial differential equations, and the authors accept the simulator as a valid representation of the tidal power.
Q4. Why does an accurate emulator matter for estimating the maximum extractable power?
Since the over-smoothed emulator can underestimate the maximum extractable power, a good approximation of the attainable power function can be helpful in saving the cost of a few turbines.
Q5. Why did the authors use the squared exponential correlation in the GP model?
The authors used the squared exponential correlation ($p_k = p = 2$ for all $k$) in the GP model because of its popularity and good theoretical properties.
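The sketch below shows the power-exponential correlation family that this choice comes from. Setting $p_k = 2$ for every k gives the squared exponential correlation. Values $0 < p_k \le 2$ keep the result a valid (positive semidefinite) correlation matrix. The function name and input checks are illustrative assumptions.

```python
import numpy as np

def power_exp_corr(X, theta, p):
    """R_ij = exp(-sum_k theta_k |x_ik - x_jk|^p_k); p_k = 2 for all k is the squared exponential."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                                   # n points in one dimension
    d = X.shape[1]
    theta = np.broadcast_to(np.asarray(theta, dtype=float), (d,))
    p = np.broadcast_to(np.asarray(p, dtype=float), (d,))
    if np.any(theta < 0) or np.any((p <= 0) | (p > 2)):
        raise ValueError("need theta_k >= 0 and 0 < p_k <= 2")
    absdiff = np.abs(X[:, None, :] - X[None, :, :])      # shape (n, n, d)
    return np.exp(-np.sum(theta * absdiff**p, axis=-1))

X = np.array([[0.0, 0.0], [0.5, 0.1], [1.0, 1.0]])
print(power_exp_corr(X, theta=[1.0, 2.0], p=2))          # squared exponential case
```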
Q6. How does the numerical stability of computing $R^{-1}_{\delta,M}$ depend on M?
Although the numerical stability of computing $R^{-1}_{\delta,M}$ does not change with M, computing its determinant $|R^{-1}_{\delta,M}|$ can become less numerically stable as M increases.
Q7. What is the common approach to overcoming near-singularity?
The popular approach to overcoming near-singularity is to introduce a small nugget (or jitter) parameter into the model, which is estimated along with the other model parameters.
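The sketch below illustrates the side effect of a nugget. It assumes a constant-mean GP whose predictor is $\hat y(x) = \hat\beta + r(x)^\top R_\delta^{-1}(y - \mathbf{1}\hat\beta)$, with $R_\delta = R + \delta I$ and $r(x)$ computed without the nugget. At a design point, $R R_\delta^{-1} = I - \delta R_\delta^{-1}$. The residual there is therefore $-\delta\,(R_\delta^{-1}(y - \mathbf{1}\hat\beta))_i$, so the fit stops interpolating and the mismatch grows with δ. The test function and settings are hypothetical.

```python
import numpy as np

def gaussian_corr(A, B, theta):
    """Cross-correlation matrix between rows of A (m, d) and rows of B (n, d)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-np.sum(np.asarray(theta) * diff**2, axis=-1))

def gp_predict(X, y, Xnew, theta, delta):
    """Constant-mean GP predictor with the nugget delta added to R only (assumed form)."""
    n = len(y)
    R_delta = gaussian_corr(X, X, theta) + delta * np.eye(n)
    one = np.ones(n)
    beta = (one @ np.linalg.solve(R_delta, y)) / (one @ np.linalg.solve(R_delta, one))
    t = np.linalg.solve(R_delta, y - beta)
    return beta + gaussian_corr(Xnew, X, theta) @ t   # r(x) uses the nugget-free correlation

X = np.linspace(0.0, 1.0, 10)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
for delta in [0.0, 1e-3, 1e-1]:
    err = np.max(np.abs(gp_predict(X, y, X, [100.0], delta) - y))
    print(f"delta = {delta:g}: max |yhat(x_i) - y_i| = {err:.2e}")
```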
Q8. How many correlation matrices were generated for each combination of n and d?
For several combinations of n and d, the authors generate 5000 correlation matrices in which the design points $\{x_1, \dots, x_n\}$ follow the maximin Latin hypercube sampling scheme (Stein 1987) and the $\theta_k$ are drawn from an exponential distribution with mean 1.
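The paper's designs are maximin Latin hypercubes. As a stand-in, the sketch below draws random Latin hypercubes and keeps the one with the largest minimum inter-point distance. This is a simple best-of-K heuristic and not necessarily the construction the authors used. It then samples $\theta_k$ from an exponential distribution with mean 1. All names are hypothetical.

```python
import numpy as np

def random_lhs(n, d, rng):
    """Random Latin hypercube in [0,1]^d: each column has exactly one point in each of n equal bins."""
    perms = np.column_stack([rng.permutation(n) for _ in range(d)])   # bin index per point and dimension
    return (perms + rng.uniform(size=(n, d))) / n                      # jitter uniformly within each bin

def min_pairwise_dist(X):
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt(np.sum(diff**2, axis=-1))
    return dist[np.triu_indices(len(X), k=1)].min()

def approx_maximin_lhs(n, d, rng, n_candidates=100):
    """Best-of-K heuristic (assumed): keep the candidate LHS with the largest minimum distance."""
    return max((random_lhs(n, d, rng) for _ in range(n_candidates)), key=min_pairwise_dist)

rng = np.random.default_rng(0)
n, d = 20, 3
X = approx_maximin_lhs(n, d, rng)
theta = rng.exponential(scale=1.0, size=d)    # theta_k ~ Exponential with mean 1
print(min_pairwise_dist(X), theta)
```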
Q9. How did Jones, Schonlau and Welch overcome near-singularity?
Jones, Schonlau, and Welch (1998) used the singular value decomposition to overcome the near-singularity of R. Booker (2000) used a sum of independent GPs to overcome near-singularity for multistage adaptive designs in kriging models.
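A common way to use the SVD here is a truncated pseudo-inverse: singular values below a relative cutoff are discarded before solving $Rt = w$. The sketch below assumes that variant with an arbitrary cutoff of $10^{-10}$. It is not claimed to be the exact procedure of Jones, Schonlau, and Welch (1998).

```python
import numpy as np

def svd_solve(R, w, rel_tol=1e-10):
    """Solve R t = w with a truncated SVD (pseudo-inverse), dropping singular values
    smaller than rel_tol times the largest one."""
    U, s, Vt = np.linalg.svd(R)                 # singular values s are in decreasing order
    keep = s > rel_tol * s[0]
    coeffs = (U[:, keep].T @ w) / s[keep]       # components of w along the retained directions
    return Vt[keep].T @ coeffs

x = np.linspace(0.0, 1.0, 10)
w = np.sin(2 * np.pi * x)
R_bad = np.exp(-0.01 * (x[:, None] - x[None, :])**2)    # nearly singular Gaussian correlation
t = svd_solve(R_bad, w)
print(np.max(np.abs(R_bad @ t - w)))                     # size of the mismatch at the design points
```

Discarding directions makes the solve stable. The price is that the fit can only reproduce the data within the retained subspace, so exact interpolation is generally lost when R is near-singular.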
Q10. How do the authors recommend fitting a GP model to such a dataset?
In conclusion, when fitting a GP model to a dataset obtained from a deterministic computer model with nearly singular correlation matrices, the authors recommend using $\delta_{lb}$, the lower bound on the nugget, together with the iterative approach, choosing the number of iterations M according to the desired interpolation accuracy.
Q11. How much more expensive would a realistic 3D flow model be?
A realistic model with a triangular grid of 20 m sides and 10 vertical layers to capture 3D flow would increase the computational expense by a factor of 5120. Each individual simulator run would then cost roughly 10 times more than generating the entire dataset examined here.
Q12. When is a correlation matrix well behaved?
For instance, a correlation matrix built from 100 design points in $(0,1)^2$ chosen using a space-filling criterion may lead to a well-behaved R if θ is very large.
Q13. How do the authors seek an interpolator of the simulator?
To find an interpolator of the simulator (up to a certain accuracy), their objective is to find $t^* = f(R_\delta, w)$ that approximates $t = R^{-1}w$ better than $\tilde{t} = R_\delta^{-1} w$, the solution suggested by the popular nugget approach.
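One way to build such a $t^*$ uses only solves with $R_\delta = R + \delta I$. Since $R = R_\delta - \delta I$, we have $R^{-1} = \sum_{i \ge 1} \delta^{i-1} R_\delta^{-i}$. The series converges because the eigenvalues of $\delta R_\delta^{-1}$ are $\delta/(\lambda_j + \delta) < 1$. Truncating after M terms gives the recursion in the sketch below. We assume this truncated series is what the notation $R^{-1}_{\delta,M}$ refers to; under that assumption, M = 1 reproduces $\tilde t$. Each step is equivalent to classical iterative refinement, $t \leftarrow t + R_\delta^{-1}(w - Rt)$.

```python
import numpy as np

def iterative_solve(R, w, delta, M):
    """Approximate t = R^{-1} w by t_M = sum_{i=1}^{M} delta^(i-1) R_delta^(-i) w, R_delta = R + delta*I.
    Recursion: t_1 = R_delta^{-1} w and t_{i+1} = R_delta^{-1} (w + delta * t_i)."""
    R_delta = R + delta * np.eye(len(w))
    L = np.linalg.cholesky(R_delta)              # only the well-conditioned R_delta is factorized

    def solve(b):
        return np.linalg.solve(L.T, np.linalg.solve(L, b))

    t = solve(w)                                 # M = 1: the usual nugget solution t~
    for _ in range(M - 1):
        t = solve(w + delta * t)
    return t

x = np.linspace(0.0, 1.0, 10)
R = np.exp(-100.0 * (x[:, None] - x[None, :])**2)     # well-conditioned, so an exact solve is a fair reference
w = np.sin(2 * np.pi * x)
t_exact = np.linalg.solve(R, w)
for M in [1, 2, 5, 10]:
    print(M, np.linalg.norm(iterative_solve(R, w, 0.1, M) - t_exact))
```

The error shrinks by a factor of at most $\delta/(\lambda_{\min} + \delta)$ per iteration. Directions with very small eigenvalues therefore converge slowly, which is why M trades accuracy against cost.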
Q14. How is the ill-conditioning of R usually handled?
A popular approach to overcome the ill-conditioning of R is to introduce a nugget $0 < \delta < 1$ into the model and replace the ill-conditioned R with the well-conditioned $R_\delta = R + \delta I$.
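A standard eigenvalue argument shows why the nugget helps. Adding δI shifts every eigenvalue of the symmetric positive definite matrix R by δ and leaves the eigenvectors unchanged:

```latex
R = Q\Lambda Q^\top \;\Longrightarrow\; R_\delta = R + \delta I = Q(\Lambda + \delta I)Q^\top,
\qquad
\kappa(R_\delta) = \frac{\lambda_{\max} + \delta}{\lambda_{\min} + \delta}
= \frac{\kappa(R)\,\lambda_{\min} + \delta}{\lambda_{\min} + \delta} < \kappa(R)
\quad \text{whenever } \delta > 0 \text{ and } \kappa(R) > 1.
```

Take the hypothetical values $\lambda_{\max} = 50$, $\lambda_{\min} = 10^{-12}$ and $\delta = 10^{-6}$. Then $\kappa(R) = 5 \times 10^{13}$, while $\kappa(R_\delta) \approx 50/10^{-6} = 5 \times 10^{7}$.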
Q15. How long did each run take on the ACEnet mahone cluster?
Each of the runs presented here required approximately one hour to run on four processors in parallel on the Atlantic Computational Excellence network (ACEnet) mahone cluster.
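Q16. Why are near-singular correlation matrices less common in higher dimensions?
This is expected because near-singular correlation matrices become less likely as the dimensionality of the input space increases. A plausible reason is that, for a fixed number of design points, points spread over more dimensions tend to be farther apart, so their pairwise correlations are smaller.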
Q17. Is there a closed-form expression for the condition number of a Gaussian correlation matrix?
To the authors' knowledge, closed-form expressions for the eigenvalues, and hence the condition number, of a Gaussian correlation matrix R in (2) for arbitrary θ and design $\{x_1, \dots, x_n\}$ are not yet known.
Q18. What is the new lower bound for the nugget?
Section 4 presents the new lower bound for the nugget that is required to achieve well-conditioned correlation matrices and minimize unnecessary over-smoothing.
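A bound of this kind can be derived from the eigenvalue shift. Require $\kappa(R + \delta I) = (\lambda_{\max} + \delta)/(\lambda_{\min} + \delta) \le e^a$ and solve for δ. This gives $\delta \ge (\lambda_{\max} - e^a \lambda_{\min})/(e^a - 1)$, which is equivalent to $\lambda_{\max}(\kappa(R) - e^a)/(\kappa(R)(e^a - 1))$. The sketch below implements this derivation. The default threshold $a = 25$ is an assumption, and the result is not necessarily identical to the bound and threshold in Section 4.

```python
import numpy as np

def nugget_lower_bound(R, a=25.0):
    """Smallest delta >= 0 such that cond(R + delta*I) <= exp(a), for symmetric positive definite R.
    Solves (lam_max + delta) / (lam_min + delta) <= exp(a) for delta."""
    lam = np.linalg.eigvalsh(R)                  # eigenvalues in ascending order
    lam_min, lam_max = lam[0], lam[-1]           # a tiny negative lam_min from rounding only enlarges the bound
    bound = (lam_max - np.exp(a) * lam_min) / (np.exp(a) - 1.0)
    return max(0.0, float(bound))               # no nugget needed if R already meets the threshold

x = np.linspace(0.0, 1.0, 20)
R = np.exp(-0.5 * (x[:, None] - x[None, :])**2)    # badly conditioned example
delta_lb = nugget_lower_bound(R, a=10.0)
print(delta_lb, np.linalg.cond(R + delta_lb * np.eye(len(x))))
```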
Q19. How is the number of iterations M in $\hat{y}_{\delta_{lb},M}(x)$ chosen?
The number of iterations M in $\hat{y}_{\delta_{lb},M}(x)$ and $\hat{s}^2_{\delta_{lb},M}(x)$ depends on the desired interpolation accuracy, and one can build stopping rules for attaining the prespecified accuracy in (14).
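The sketch below shows a simple stopping rule under two assumptions. First, $\hat\beta$ is held fixed, so $w = y - \mathbf{1}\hat\beta$. Second, the predictor at a design point is $\hat\beta + (Rt)_i$, with r computed without the nugget. The interpolation error at the design points is then $Rt - w$, and the iterations continue until its largest entry falls below a tolerance. The tolerance and `max_iter` are illustrative, and the accuracy criterion in (14) may be defined differently.

```python
import numpy as np

def iterate_until_accurate(R, w, delta, tol, max_iter=100):
    """Iterate t_{i+1} = R_delta^{-1}(w + delta * t_i), R_delta = R + delta*I, until the
    interpolation error at the design points, max_i |(R t)_i - w_i|, is at most tol.
    Returns the final t and the number of iterations M used."""
    R_delta = R + delta * np.eye(len(w))
    L = np.linalg.cholesky(R_delta)

    def solve(b):
        return np.linalg.solve(L.T, np.linalg.solve(L, b))

    t = solve(w)            # t_1, the plain nugget solution
    M = 1
    while np.max(np.abs(R @ t - w)) > tol and M < max_iter:
        t = solve(w + delta * t)
        M += 1
    return t, M

x = np.linspace(0.0, 1.0, 10)
R = np.exp(-100.0 * (x[:, None] - x[None, :])**2)
w = np.sin(2 * np.pi * x)
t, M = iterate_until_accurate(R, w, delta=0.1, tol=1e-8)
print(M, np.max(np.abs(R @ t - w)))
```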
Q20. How did earlier proposals plan to harvest tidal energy from the Bay of Fundy?
Though the notion of harnessing tidal power from the Bay of Fundy is not new, earlier proposed methods of harvesting this much-needed green electrical energy involved building a barrage or dam.
Q21. What is the proportion of near-singular cases in Figure 2?
Also note that the proportion of near-singular cases, denoted by the contours in the left panel of Figure 2, decreases rapidly as the input dimension increases.
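The trend can be reproduced on a small scale. The sketch below counts how often $\kappa(R)$ exceeds an assumed threshold of $e^{25}$ for n = 20 points in d = 1, ..., 5 dimensions. It uses random (not maximin) Latin hypercubes, draws $\theta_k$ from an exponential distribution with mean 1, and runs 200 replicates instead of 5000. These choices are assumptions made for speed, so the output will not reproduce Figure 2 exactly.

```python
import numpy as np

def random_lhs(n, d, rng):
    """Random Latin hypercube in [0,1]^d (one point per bin in every dimension)."""
    perms = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (perms + rng.uniform(size=(n, d))) / n

def near_singular_fraction(n, d, reps, rng, a=25.0):
    """Fraction of Gaussian correlation matrices with cond(R) > exp(a),
    using random LHS designs and theta_k ~ Exponential(mean 1)."""
    count = 0
    for _ in range(reps):
        X = random_lhs(n, d, rng)
        theta = rng.exponential(1.0, size=d)
        diff = X[:, None, :] - X[None, :, :]
        R = np.exp(-np.sum(theta * diff**2, axis=-1))
        if np.linalg.cond(R) > np.exp(a):
            count += 1
    return count / reps

rng = np.random.default_rng(1)
props = [near_singular_fraction(20, d, 200, rng) for d in range(1, 6)]
for d, p in zip(range(1, 6), props):
    print(f"d = {d}: proportion near-singular = {p:.2f}")
```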