Author
Pritam Ranjan
Other affiliations: Simon Fraser University, Acadia University, Indian Institute of Management Ahmedabad
Bio: Pritam Ranjan is an academic researcher at the Indian Institute of Management Indore. The author's research topics include Gaussian processes and computer experiments. The author has an h-index of 14 and has co-authored 64 publications receiving 1,038 citations. Previous affiliations of Pritam Ranjan include Simon Fraser University and Acadia University.
Papers
TL;DR: A sequential methodology for estimating a contour from a complex computer code using a stochastic process model as a surrogate for the computer simulator is developed and applied to exploration of a contour for a network queuing system.
Abstract: Computer simulation often is used to study complex physical and engineering processes. Although a computer simulator often can be viewed as an inexpensive way to gain insight into a system, it still can be computationally costly. Much of the recent work on the design and analysis of computer experiments has focused on scenarios where the goal is to fit a response surface or to optimize a process. In this article we develop a sequential methodology for estimating a contour from a complex computer code. The approach uses a stochastic process model as a surrogate for the computer simulator. The surrogate model and associated uncertainty are key components in a new criterion used to identify the computer trials aimed specifically at improving the contour estimate. The proposed approach is applied to exploration of a contour for a network queuing system. Issues related to practical implementation of the proposed approach also are addressed.
253 citations
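The sequential strategy above can be sketched in a few lines of code. The snippet below is only a minimal illustration, not the paper's exact improvement criterion: it fits a Gaussian process surrogate to a cheap stand-in simulator and repeatedly adds the candidate point where the contour level is most plausible and the prediction most uncertain. The toy simulator, the contour level a, and the variance-weighted score are assumptions made for illustration.

```python
# Minimal sketch of GP-based sequential contour estimation (illustrative criterion,
# not the paper's exact expected-improvement-for-contours formulation).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)

def simulator(x):                      # stand-in for an expensive computer code
    return np.sin(3 * x[:, 0]) + 0.5 * x[:, 1] ** 2

a = 0.8                                # contour level of interest: {x : y(x) = a}
X = rng.uniform(0, 2, size=(10, 2))    # small initial design
y = simulator(X)

cand = rng.uniform(0, 2, size=(2000, 2))   # candidate set for the sequential search
for _ in range(15):                        # sequential design loop
    gp = GaussianProcessRegressor(ConstantKernel() * RBF([0.5, 0.5]),
                                  normalize_y=True).fit(X, y)
    mu, sd = gp.predict(cand, return_std=True)
    sd = np.maximum(sd, 1e-12)
    # Score candidates by predictive density at the contour level times the predictive
    # sd: large where the contour is plausible and the emulator is uncertain.
    score = sd * norm.pdf((a - mu) / sd)
    x_new = cand[np.argmax(score)][None, :]
    X = np.vstack([X, x_new])
    y = np.append(y, simulator(x_new))

gp = GaussianProcessRegressor(ConstantKernel() * RBF([0.5, 0.5]), normalize_y=True).fit(X, y)
mu = gp.predict(cand)
print("candidates the emulator places within 0.05 of the contour:", int(np.sum(np.abs(mu - a) < 0.05)))
```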
TL;DR: In this paper, the authors propose a lower bound on the nugget that minimizes the over-smoothing and an iterative regularization approach to construct a predictor that further improves the interpolation accuracy.
Abstract: For many expensive deterministic computer simulators, the outputs do not have replication error and the desired metamodel (or statistical emulator) is an interpolator of the observed data. Realizations of Gaussian spatial processes (GP) are commonly used to model such simulator outputs. Fitting a GP model to n data points requires the computation of the inverse and determinant of n×n correlation matrices, R, that are sometimes computationally unstable due to near-singularity of R. This happens if any pair of design points are very close together in the input space. The popular approach to overcome near-singularity is to introduce a small nugget (or jitter) parameter in the model that is estimated along with other model parameters. The inclusion of a nugget in the model often causes unnecessary over-smoothing of the data. In this article, we propose a lower bound on the nugget that minimizes the over-smoothing and an iterative regularization approach to construct a predictor that further improves the interpolation accuracy.
99 citations
Posted Content
TL;DR: A lower bound on the nugget that minimizes the over-smoothing is proposed, along with an iterative regularization approach to construct a predictor that further improves the interpolation accuracy.
Abstract: For many expensive deterministic computer simulators, the outputs do not have replication error and the desired metamodel (or statistical emulator) is an interpolator of the observed data. Realizations of Gaussian spatial processes (GP) are commonly used to model such simulator outputs. Fitting a GP model to $n$ data points requires the computation of the inverse and determinant of $n \times n$ correlation matrices, $R$, that are sometimes computationally unstable due to near-singularity of $R$. This happens if any pair of design points are very close together in the input space. The popular approach to overcome near-singularity is to introduce a small nugget (or jitter) parameter in the model that is estimated along with other model parameters. The inclusion of a nugget in the model often causes unnecessary over-smoothing of the data. In this paper, we propose a lower bound on the nugget that minimizes the over-smoothing and an iterative regularization approach to construct a predictor that further improves the interpolation accuracy. We also show that the proposed predictor converges to the GP interpolator.
89 citations
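As a rough illustration of the idea in the two entries above, the sketch below computes the smallest nugget delta for which the condition number of R + delta*I stays below a chosen target. This is a generic linear-algebra bound in the same spirit as the paper's lower bound, not necessarily the exact expression derived there; the Gaussian correlation, the design points, and the target condition number kappa_max are assumptions.

```python
# Sketch: the smallest nugget delta such that cond(R + delta * I) <= kappa_max.
# Generic bound of the same flavor as the paper's; not necessarily its exact expression.
import numpy as np

def gaussian_corr(X, beta=5.0):
    """Gaussian correlation matrix R_ij = exp(-beta * ||x_i - x_j||^2)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-beta * d2)

def nugget_lower_bound(R, kappa_max=1e8):
    """Smallest delta >= 0 with (lam_max + delta) / (lam_min + delta) <= kappa_max."""
    lam = np.linalg.eigvalsh(R)                  # eigenvalues of the symmetric R, ascending
    lam_min, lam_max = lam[0], lam[-1]
    delta = (lam_max - kappa_max * lam_min) / (kappa_max - 1.0)
    return max(delta, 0.0)

# Two nearly coincident design points make R near-singular.
X = np.array([[0.10, 0.20], [0.1000001, 0.2000001], [0.8, 0.9], [0.4, 0.6]])
R = gaussian_corr(X)
print("cond(R)            :", np.linalg.cond(R))
delta = nugget_lower_bound(R)
print("nugget lower bound :", delta)
print("cond(R + delta*I)  :", np.linalg.cond(R + delta * np.eye(len(X))))
```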
TL;DR: This paper implements a slightly modified version of the model proposed by Ranjan et al. (2011) in the R package GPfit, with a novel parameterization of the spatial correlation function and a clustering-based multi-start gradient-based optimization algorithm that together yield robust optimization, typically faster than the genetic algorithm based approach.
Abstract: Gaussian process (GP) models are commonly used statistical metamodels for emulating expensive computer simulators. Fitting a GP model can be numerically unstable if any pair of design points in the input space are close together. Ranjan, Haynes, and Karsten (2011) proposed a computationally stable approach for fitting GP models to deterministic computer simulators. They used a genetic algorithm based approach that is robust but computationally intensive for maximizing the likelihood. This paper implements a slightly modified version of the model proposed by Ranjan et al. (2011) in the R package GPfit. A novel parameterization of the spatial correlation function and a clustering-based multi-start gradient-based optimization algorithm yield robust optimization that is typically faster than the genetic algorithm based approach. We present two examples with R codes to illustrate the usage of the main functions in GPfit. Several test functions are used for performance comparison with the popular R package mlegp. We also use GPfit for a real application, i.e., for emulating the tidal kinetic energy model for the Bay of Fundy, Nova Scotia, Canada. GPfit is free software, distributed under the General Public License, and available from the Comprehensive R Archive Network.
86 citations
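Since GPfit is an R package, the sketch below is only a Python-flavoured analogue of the fitting strategy described above: profile the GP log-likelihood over a correlation parameter and optimize it with a clustering-based multi-start gradient search. The 10^beta parameterization, the fixed small nugget, and the clustering settings are simplifying assumptions, not GPfit's actual implementation.

```python
# Clustering-based multi-start gradient optimization of a profiled GP likelihood
# (illustrative analogue of the strategy above; not the GPfit code itself).
import numpy as np
from scipy.optimize import minimize
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.uniform(size=(25, 1))
y = np.sin(6 * X[:, 0]) + 0.1 * X[:, 0]

def neg_profile_loglik(log10_beta, X, y, nugget=1e-6):
    """-2 * profiled log-likelihood (up to a constant) of a constant-mean GP."""
    beta = 10.0 ** np.atleast_1d(log10_beta)[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    R = np.exp(-beta * d2) + nugget * np.eye(len(y))      # Gaussian correlation + tiny nugget
    one = np.ones(len(y))
    Rinv_y, Rinv_1 = np.linalg.solve(R, y), np.linalg.solve(R, one)
    mu_hat = one @ Rinv_y / (one @ Rinv_1)                # profiled constant mean
    resid = y - mu_hat
    sigma2_hat = resid @ np.linalg.solve(R, resid) / len(y)
    _, logdet = np.linalg.slogdet(R)
    return len(y) * np.log(sigma2_hat) + logdet

# Multi-start: score many random starts, keep the best ones, cluster them,
# then run a gradient-based optimizer (L-BFGS-B) from each cluster centre.
starts = rng.uniform(-2, 4, size=(200, 1))
scores = np.array([neg_profile_loglik(s, X, y) for s in starts])
best = starts[np.argsort(scores)[:20]]
centres = KMeans(n_clusters=4, n_init=10, random_state=0).fit(best).cluster_centers_
fits = [minimize(neg_profile_loglik, c, args=(X, y), method="L-BFGS-B",
                 bounds=[(-2.0, 4.0)]) for c in centres]
best_fit = min(fits, key=lambda r: r.fun)
print("estimated log10(beta):", best_fit.x[0], " deviance:", best_fit.fun)
```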
TL;DR: In this article, a combination of response surface modeling, expected improvement, and the augmented Lagrangian numerical optimization framework is proposed to solve the problem of constrained black-box optimization.
Abstract: Constrained blackbox optimization is a difficult problem, with most approaches coming from the mathematical programming literature. The statistical literature is sparse, especially in addressing problems with nontrivial constraints. This situation is unfortunate because statistical methods have many attractive properties: global scope, handling noisy objectives, sensitivity analysis, and so forth. To narrow that gap, we propose a combination of response surface modeling, expected improvement, and the augmented Lagrangian numerical optimization framework. This hybrid approach allows the statistical model to think globally and the augmented Lagrangian to act locally. We focus on problems where the constraints are the primary bottleneck, requiring expensive simulation to evaluate and substantial modeling effort to map out. In that context, our hybridization presents a simple yet effective solution that allows existing objective-oriented statistical approaches, like those based on Gaussian process surrogates ...
66 citations
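A stripped-down sketch of the hybrid described above is given below: separate GP surrogates for the objective and a single constraint, an augmented Lagrangian formed from their posterior means (standing in for the paper's expected-improvement machinery), and the classic multiplier and penalty updates. The toy problem and all settings are assumptions made for illustration.

```python
# Simplified augmented-Lagrangian loop on GP surrogates (posterior means only,
# not the full expected-improvement composite of the paper).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(2)
f = lambda x: (x[:, 0] - 0.3) ** 2 + (x[:, 1] - 0.7) ** 2     # objective (minimize)
c = lambda x: 0.6 - x[:, 0] - x[:, 1]                         # constraint c(x) <= 0

X = rng.uniform(size=(8, 2))
yf, yc = f(X), c(X)
lam, rho = 0.0, 1.0                                           # AL multiplier and penalty
cand = rng.uniform(size=(3000, 2))

for _ in range(20):
    gpf = GaussianProcessRegressor(ConstantKernel() * RBF([0.3, 0.3]), normalize_y=True).fit(X, yf)
    gpc = GaussianProcessRegressor(ConstantKernel() * RBF([0.3, 0.3]), normalize_y=True).fit(X, yc)
    mf, mc = gpf.predict(cand), gpc.predict(cand)
    # Augmented Lagrangian built from the surrogate posterior means.
    al = mf + lam * mc + (1.0 / (2.0 * rho)) * np.maximum(mc, 0.0) ** 2
    x_new = cand[np.argmin(al)][None, :]
    X = np.vstack([X, x_new])
    yf, yc = np.append(yf, f(x_new)), np.append(yc, c(x_new))
    # Classic AL updates: move the multiplier, shrink the penalty if still infeasible.
    lam = max(0.0, lam + yc[-1] / rho)
    if yc[-1] > 0:
        rho *= 0.5

feasible = yc <= 0
print("best feasible objective found:", yf[feasible].min())
```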
Cited by
Journal Article
TL;DR: This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiments.
Abstract: THE DESIGN AND ANALYSIS OF EXPERIMENTS. By Oscar Kempthorne. New York, John Wiley and Sons, Inc., 1952. 631 pp. $8.50. This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiments. It is necessary to have some facility with algebraic notation and manipulation to be able to use the volume intelligently. The problems are presented from the theoretical point of view, without such practical examples as would be helpful for those not acquainted with mathematics. The mathematical justification for the techniques is given. As a somewhat advanced treatment of the design and analysis of experiments, this volume will be interesting and helpful for many who approach statistics theoretically as well as practically. With emphasis on the "why," and with description given broadly, the author relates the subject matter to the general theory of statistics and to the general problem of experimental inference. MARGARET J. ROBERTSON
12,326 citations
01 Jan 2016
TL;DR: This review paper introduces Bayesian optimization, highlights some of its methodological aspects, and showcases a wide range of applications.
Abstract: Big Data applications are typically associated with systems involving large numbers of users, massive complex software systems, and large-scale heterogeneous computing and storage architectures. The construction of such systems involves many distributed design choices. The end products (e.g., recommendation systems, medical analysis tools, real-time game engines, speech recognizers) thus involve many tunable configuration parameters. These parameters are often specified and hard-coded into the software by various developers or teams. If optimized jointly, these parameters can result in significant improvements. Bayesian optimization is a powerful tool for the joint optimization of design choices that is gaining great popularity in recent years. It promises greater automation so as to increase both product quality and human productivity. This review paper introduces Bayesian optimization, highlights some of its methodological aspects, and showcases a wide range of applications.
2,341 citations
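For readers unfamiliar with the method surveyed above, the sketch below shows a minimal Bayesian optimization loop: a GP surrogate plus the expected-improvement acquisition on a toy one-dimensional function. The test function and settings are assumptions made purely for illustration.

```python
# Minimal Bayesian optimization loop: GP surrogate + expected improvement (EI).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(3)
f = lambda x: np.sin(3 * x) + 0.3 * x ** 2 - 0.5 * x      # expensive black box (minimize)

X = rng.uniform(-2, 3, size=(4, 1))
y = f(X).ravel()
grid = np.linspace(-2, 3, 500)[:, None]

for _ in range(15):
    gp = GaussianProcessRegressor(ConstantKernel() * RBF(0.5), normalize_y=True).fit(X, y)
    mu, sd = gp.predict(grid, return_std=True)
    sd = np.maximum(sd, 1e-12)
    z = (y.min() - mu) / sd
    ei = (y.min() - mu) * norm.cdf(z) + sd * norm.pdf(z)   # expected improvement (minimization)
    x_new = grid[np.argmax(ei)][None, :]
    X = np.vstack([X, x_new])
    y = np.append(y, f(x_new).ravel())

print("best observed value:", y.min(), "at x =", X[np.argmin(y), 0])
```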
TL;DR: An iterative approach based on Monte Carlo simulation and a Kriging metamodel is proposed to assess the reliability of structures more efficiently; the method is shown to be very efficient, as the probability of failure obtained with AK-MCS is very accurate for only a small number of calls to the performance function.
Abstract: An important challenge in structural reliability is to keep to a minimum the number of calls to the numerical models. Engineering problems involve more and more complex computer codes, and the evaluation of the probability of failure may require very time-consuming computations. Metamodels are used to reduce these computation times. To assess reliability, the most popular approach remains the numerous variants of response surfaces. Polynomial chaos [1] and support vector machines [2] are also possibilities and have gained consideration among researchers in recent decades. Recently, however, Kriging, which originated in geostatistics, has emerged in reliability analysis. Widespread in optimisation, Kriging has just started to appear in uncertainty propagation [3] and reliability [4], [5] studies. It presents interesting characteristics such as exact interpolation and a local index of uncertainty on the prediction which can be used in active learning methods. The aim of this paper is to propose an iterative approach based on Monte Carlo simulation and a Kriging metamodel to assess the reliability of structures in a more efficient way. The method is called AK-MCS for Active learning reliability method combining Kriging and Monte Carlo Simulation. It is shown to be very efficient, as the probability of failure obtained with AK-MCS is very accurate, and this for only a small number of calls to the performance function. Several examples from the literature are performed to illustrate the methodology and to prove its efficiency, particularly for problems dealing with high non-linearity, non-differentiability, non-convex and non-connected domains of failure, and high dimensionality.
796 citations
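The sketch below illustrates the AK-MCS idea described above: a Kriging (GP) surrogate of the performance function is refined adaptively on a fixed Monte Carlo population using the U learning function, with the usual min U >= 2 stopping rule, after which the failure probability is read off the surrogate. The toy performance function and sample sizes are assumptions.

```python
# AK-MCS-style active learning: GP surrogate + U learning function on a fixed MC population.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(4)
def g(x):                                   # toy limit state: failure when g(x) <= 0
    return 3.0 - x[:, 0] ** 2 - x[:, 1]

mc_pop = rng.standard_normal((20000, 2))    # Monte Carlo population of the random inputs
idx = rng.choice(len(mc_pop), size=12, replace=False)
X, y = mc_pop[idx], g(mc_pop[idx])          # small initial design drawn from the population

for _ in range(60):
    gp = GaussianProcessRegressor(ConstantKernel() * RBF([1.0, 1.0]), normalize_y=True).fit(X, y)
    mu, sd = gp.predict(mc_pop, return_std=True)
    U = np.abs(mu) / np.maximum(sd, 1e-12)  # small U = close to the limit state and uncertain
    if U.min() >= 2.0:                      # AK-MCS-type stopping rule
        break
    x_new = mc_pop[np.argmin(U)][None, :]
    X = np.vstack([X, x_new])
    y = np.append(y, g(x_new))

pf = np.mean(mu <= 0)                       # failure probability from the surrogate mean
print("estimated P(failure):", pf, "with", len(y), "calls to g")
```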
Book
01 Jan 1987
653 citations
TL;DR: This paper develops an efficient reliability analysis method that accurately characterizes the limit state throughout the random variable space and is both accurate for any arbitrarily shaped limit state and computationally efficient even for expensive response functions.
Abstract: Many engineering applications are characterized by implicit response functions that are expensive to evaluate and sometimes nonlinear in their behavior, making reliability analysis difficult. This paper develops an efficient reliability analysis method that accurately characterizes the limit state throughout the random variable space. The method begins with a Gaussian process model built from a very small number of samples, and then adaptively chooses where to generate subsequent samples to ensure that the model is accurate in the vicinity of the limit state. The resulting Gaussian process model is then sampled using multimodal adaptive importance sampling to calculate the probability of exceeding (or failing to exceed) the response level of interest. By locating multiple points on or near the limit state, more complex and nonlinear limit states can be modeled, leading to more accurate probability integration. By concentrating the samples in the area where accuracy is important (i.e., in the vicinity of the limit state), only a small number of true function evaluations are required to build a quality surrogate model. The resulting method is both accurate for any arbitrarily shaped limit state and computationally efficient even for expensive response functions. This new method is applied to a collection of example problems, including one that analyzes the reliability of a microelectromechanical system device that currently available methods have difficulty solving either accurately or efficiently.
581 citations
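The final stage of the method above, sampling the surrogate with multimodal adaptive importance sampling, is hinted at in the simplified sketch below: once a surrogate of the limit state is available, a Gaussian-mixture proposal is centred on surrogate points near g = 0 and the failure probability is estimated by reweighting. This is only a crude stand-in for the paper's algorithm; the toy limit state, the surrogate design, and the mixture settings are all assumptions.

```python
# Simplified mixture importance sampling of a GP surrogate near the limit state
# (illustration only; not the paper's multimodal adaptive importance sampling).
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(5)
def g(x):                                         # toy limit state with two failure lobes
    return 2.5 - np.abs(x[:, 0]) - 0.5 * x[:, 1]

X = rng.uniform(-4, 4, size=(60, 2))              # cheap space-filling-ish design
gp = GaussianProcessRegressor(ConstantKernel() * RBF([1.0, 1.0]), normalize_y=True).fit(X, g(X))

# Centre a mixture proposal on design points the surrogate places near g = 0.
mu_hat = gp.predict(X)
centres = X[np.argsort(np.abs(mu_hat))[:5]]
p = multivariate_normal(mean=[0, 0], cov=np.eye(2))             # nominal input density
comps = [multivariate_normal(mean=c, cov=np.eye(2)) for c in centres]

n = 20000
samples = np.vstack([comp.rvs(size=n // len(comps), random_state=100 + i)
                     for i, comp in enumerate(comps)])
q_pdf = np.mean([comp.pdf(samples) for comp in comps], axis=0)  # equal-weight mixture density
weights = p.pdf(samples) / q_pdf
fail = gp.predict(samples) <= 0                                 # failure per the surrogate
pf_is = np.mean(fail * weights)
print("importance-sampling estimate of P(failure):", pf_is)
```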