Showing papers in "Technometrics in 2016"


Journal ArticleDOI
TL;DR: In this article, the authors showed that the computation of distance covariance and distance correlation of real-valued random variables can be done in O(n log n) time, and that their new unbiased estimator of squared distance covariance is a U-statistic.
Abstract: Distance covariance and distance correlation have been widely adopted in measuring dependence of a pair of random variables or random vectors. If the computation of distance covariance and distance correlation is implemented directly according to its definition, then its computational complexity is O(n²), which is a disadvantage compared to other faster methods. In this article we show that the computation of distance covariance and distance correlation of real-valued random variables can be implemented by an O(n log n) algorithm and this is comparable to other computationally efficient algorithms. The new formula we derive for an unbiased estimator for squared distance covariance turns out to be a U-statistic. This fact implies some nice asymptotic properties that were derived before via more complex methods. We apply the fast computing algorithm to some synthetic data. Our work will make distance correlation applicable to a much wider class of problems. A supplementary file to this article is available online.

117 citations
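As a point of reference for the fast algorithm described above, here is a minimal sketch (Python/NumPy, not from the article) of the direct O(n²) computation of squared sample distance covariance from its definition; the article's O(n log n) algorithm and unbiased U-statistic estimator are not reproduced here.

import numpy as np

def distance_covariance_sq_naive(x, y):
    """Squared sample distance covariance of two 1-D samples, O(n^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise distance matrices
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-centering: subtract row and column means, add back the grand mean
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    # Biased (V-statistic) version of squared distance covariance
    return (A * B).mean()

Squared distance correlation then follows as distance_covariance_sq_naive(x, y) divided by the square root of the product of the two marginal quantities.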


Journal ArticleDOI
Elizabeth D. Schifano1, Jing Wu1, Chun Wang1, Jun Yan1, Ming-Hui Chen1 
TL;DR: In this article, the authors present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data.
Abstract: We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness of fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches...

115 citations
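A minimal sketch of the online-updating idea for the linear-model case described above, assuming data arrive in chunks (X_k, y_k); only the accumulated sufficient statistics X'X and X'y are kept, so no historical data need be stored. This is illustrative only and does not reproduce the article's full framework (estimating equations, rank-deficiency handling for subset designs, predictive residual tests).

import numpy as np

class OnlineOLS:
    def __init__(self, p):
        self.xtx = np.zeros((p, p))
        self.xty = np.zeros(p)

    def update(self, X_k, y_k):
        # A new data chunk arrives: fold it into the sufficient statistics.
        self.xtx += X_k.T @ X_k
        self.xty += X_k.T @ y_k

    def coef(self):
        # Pseudo-inverse tolerates rank deficiency in the accumulated design.
        return np.linalg.pinv(self.xtx) @ self.xty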


Journal ArticleDOI
TL;DR: A new modeling, monitoring, and diagnosis framework is proposed for phase-I analysis of multichannel profiles, under the assumption that different profile channels have similar structure; the proposed approach shows good performance in identifying change-points in various situations compared with some existing methods.
Abstract: Process monitoring and fault diagnosis using profile data remains an important and challenging problem in statistical process control (SPC). Although the analysis of profile data has been extensively studied in the SPC literature, the challenges associated with monitoring and diagnosis of multichannel (multiple) nonlinear profiles are yet to be addressed. Motivated by an application in multioperation forging processes, we propose a new modeling, monitoring, and diagnosis framework for phase-I analysis of multichannel profiles. The proposed framework is developed under the assumption that different profile channels have similar structure so that we can gain strength by borrowing information from all channels. The multidimensional functional principal component analysis is incorporated into change-point models to construct monitoring statistics. Simulation results show that the proposed approach has good performance in identifying change-points in various situations compared with some existing methods. The ...

102 citations
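One building block named in the abstract, sketched in isolation as an illustration only: functional principal component analysis on profiles discretized to a common grid (one row per profile, an assumption of this sketch), computed via the SVD of the centered data matrix. The multichannel extension and the change-point monitoring statistics from the article are not shown.

import numpy as np

def functional_pca(profiles, n_components=3):
    """profiles: (n_profiles, n_grid_points) array on a common grid."""
    Y = profiles - profiles.mean(axis=0)          # center each grid point
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # FPC scores per profile
    components = Vt[:n_components]                     # discretized eigenfunctions
    return scores, components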


Journal ArticleDOI
TL;DR: In this article, a combination of response surface modeling, expected improvement, and the augmented Lagrangian numerical optimization framework is proposed to solve the problem of constrained black-box optimization.
Abstract: Constrained blackbox optimization is a difficult problem, with most approaches coming from the mathematical programming literature. The statistical literature is sparse, especially in addressing problems with nontrivial constraints. This situation is unfortunate because statistical methods have many attractive properties: global scope, handling noisy objectives, sensitivity analysis, and so forth. To narrow that gap, we propose a combination of response surface modeling, expected improvement, and the augmented Lagrangian numerical optimization framework. This hybrid approach allows the statistical model to think globally and the augmented Lagrangian to act locally. We focus on problems where the constraints are the primary bottleneck, requiring expensive simulation to evaluate and substantial modeling effort to map out. In that context, our hybridization presents a simple yet effective solution that allows existing objective-oriented statistical approaches, like those based on Gaussian process surrogates ...

85 citations
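A minimal sketch of an augmented Lagrangian outer loop for inequality constraints c_j(x) <= 0, with a generic local optimizer (SciPy's Nelder-Mead) standing in for the expected-improvement search over Gaussian process surrogates that the article uses for the inner subproblem. The multiplier and penalty updates below follow a standard augmented Lagrangian recipe and are illustrative, not the authors' exact schedule; function names and parameters are assumptions of this sketch.

import numpy as np
from scipy.optimize import minimize

def auglag_sketch(f, cons, x0, n_outer=20, rho=1.0):
    lam = np.zeros(len(cons))
    x = np.asarray(x0, dtype=float)
    for _ in range(n_outer):
        def AL(z):
            c = np.array([cj(z) for cj in cons])
            # Augmented Lagrangian: objective + multipliers + quadratic penalty
            return f(z) + lam @ c + np.sum(np.maximum(c, 0.0) ** 2) / (2.0 * rho)
        x = minimize(AL, x, method="Nelder-Mead").x   # inner subproblem
        c = np.array([cj(x) for cj in cons])
        lam = np.maximum(0.0, lam + c / rho)          # multiplier update
        if np.any(c > 0):                             # still infeasible:
            rho *= 0.5                                # tighten the penalty
    return x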


Journal ArticleDOI
TL;DR: A new nonparametric methodology for monitoring location parameters when only a small reference dataset is available and the key idea is to construct a series of conditionally distribution-free test statistics in the sense that their distributions are free of the underlying distribution given the empirical distribution functions.
Abstract: Monitoring multivariate quality variables or data streams remains an important and challenging problem in statistical process control (SPC). Although the multivariate SPC has been extensively studied in the literature, designing distribution-free control schemes is still challenging and yet to be addressed well. This article develops a new nonparametric methodology for monitoring location parameters when only a small reference dataset is available. The key idea is to construct a series of conditionally distribution-free test statistics in the sense that their distributions are free of the underlying distribution given the empirical distribution functions. The conditional probability that the charting statistic exceeds the control limit at present given that there is no alarm before the current time point can be guaranteed to attain a specified false alarm rate. The success of the proposed method lies in the use of data-dependent control limits, which are determined based on the observations online rather...

68 citations


Journal ArticleDOI
TL;DR: It is demonstrated that a three-level compromise plan with a small proportion of units allocated to the middle stress is, in general, a good strategy for ADT allocation, and the penalties of using nonoptimum allocation rules are addressed.
Abstract: The optimum allocation problem in accelerated degradation tests (ADTs) is an important task for reliability analysts. Several researchers have attempted to address this decision problem, but their results have been based only on specific degradation models. Therefore, they lack a unified approach toward general degradation models. This study proposes a class of exponential dispersion (ED) degradation models to overcome this difficulty. Assuming that the underlying degradation path comes from the ED class, we analytically derive the optimum allocation rules (by minimizing the asymptotic variance of the estimated q-quantile of the product's lifetime) for two-level and three-level ADT allocation problems, whether or not the testing stress levels are pre-fixed. For a three-level allocation problem, we show that all test units should be allocated into two out of three stresses, depending on certain specific conditions. Two examples are used to illustrate the proposed procedure. Furthermore, the penalties of using nonopt...

54 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a statistical approach that explicitly accounts for the space-time dependence of the data for annual global 3D temperature fields in an initial condition ensemble, which can be used to instantaneously reproduce the temperature fields with a substantial saving in storage and time.
Abstract: One of the main challenges when working with modern climate model ensembles is the increasingly larger size of the data produced, and the consequent difficulty in storing large amounts of spatio-temporally resolved information. Many compression algorithms can be used to mitigate this problem, but since they are designed to compress generic scientific datasets, they do not account for the nature of climate model output and they compress only individual simulations. In this work, we propose a different, statistics-based approach that explicitly accounts for the space-time dependence of the data for annual global three-dimensional temperature fields in an initial condition ensemble. The set of estimated parameters is small (compared to the data size) and can be regarded as a summary of the essential structure of the ensemble output; therefore, it can be used to instantaneously reproduce the temperature fields in an ensemble with a substantial saving in storage and time. The statistical model exploits the gri...

47 citations


Journal ArticleDOI
TL;DR: In this article, the authors show that searching the space radially, that is, continuously along rays emanating from the predictive location of interest, is a far thriftier alternative to the exhaustive and discrete search subroutine involved in building such local designs, which may be overly conservative.
Abstract: Recent implementations of local approximate Gaussian process models have pushed computational boundaries for nonlinear, nonparametric prediction problems, particularly when deployed as emulators for computer experiments. Their flavor of spatially independent computation accommodates massive parallelization, meaning that they can handle designs two or more orders of magnitude larger than previously. However, accomplishing that feat can still require massive computational horsepower. Here we aim to ease that burden. We study how predictive variance is reduced as local designs are built up for prediction. We then observe how the exhaustive and discrete nature of an important search subroutine involved in building such local designs may be overly conservative. Rather, we suggest that searching the space radially, that is, continuously along rays emanating from the predictive location of interest, is a far thriftier alternative. Our empirical work demonstrates that ray-based search yields predictors with accur...

45 citations


Journal ArticleDOI
TL;DR: In this article, a new sparse PCA algorithm is presented, which is robust against outliers, based on the ROBPCA algorithm that generates robust but nonsparse loadings.
Abstract: A new sparse PCA algorithm is presented, which is robust against outliers. The approach is based on the ROBPCA algorithm that generates robust but nonsparse loadings. The construction of the new ROSPCA method is detailed, as well as a selection criterion for the sparsity parameter. An extensive simulation study and a real data example are performed, showing that it is capable of accurately finding the sparse structure of datasets, even when challenging outliers are present. In comparison with a projection pursuit-based algorithm, ROSPCA demonstrates superior robustness properties and comparable sparsity estimation capability, as well as significantly faster computation time.

39 citations


Journal ArticleDOI
TL;DR: An algorithm based on swarm intelligence is proposed to find E(s2)-optimal SSDs, with optimality verified by showing that they attain the theoretical lower bounds found in previous literature; the algorithm consistently produces SSDs that are at least as efficient as those from the traditional CP exchange method.
Abstract: Supersaturated designs (SSDs) are often used to reduce the number of experimental runs in screening experiments with a large number of factors. As more factors are used in the study, the search for an optimal SSD becomes increasingly challenging because of the large number of feasible selections of factor-level settings. This article tackles this discrete optimization problem via an algorithm based on swarm intelligence. Using the commonly used E(s2) criterion as an illustrative example, we propose an algorithm to find E(s2)-optimal SSDs and verify optimality by showing that they attain the theoretical lower bounds found in previous literature. We show that, in terms of computational effort and frequency of finding the E(s2)-optimal SSD, our algorithm consistently produces SSDs that are at least as efficient as those from the traditional CP exchange method, and that it also has good potential for finding D3-, D4-, and D5-optimal SSDs. Supplementary materials for this article are available online.

36 citations
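For concreteness, a minimal sketch of the E(s2) criterion itself (not the swarm-intelligence search from the article): for an n-by-m two-level supersaturated design X with entries +1/-1, E(s2) is the average of the squared off-diagonal entries of X'X.

import numpy as np

def e_s2(X):
    """E(s^2) criterion of a two-level design matrix with entries +/-1."""
    XtX = X.T @ X
    m = X.shape[1]
    off = XtX[np.triu_indices(m, k=1)]       # off-diagonal inner products s_ij
    return np.sum(off ** 2) / (m * (m - 1) / 2)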


Journal ArticleDOI
TL;DR: A probabilistic spatial-temporal model is described for analyzing local wind fields, constructed from measurements taken from a large number of turbines in a wind farm rather than from data aggregated into a single time series; the two modeling elements are found to benefit short-term wind speed forecasts.
Abstract: Turbine operations in a wind farm benefit from an understanding of the near-ground behavior of wind speeds. This article describes a probabilistic spatial-temporal model for analyzing local wind fields. Our model is constructed based on measurements taken from a large number of turbines in a wind farm, as opposed to aggregating the data into a single time-series. The model incorporates both temporal and spatial characteristics of wind speed data: in addition to using a time epoch mechanism to model temporal nonstationarity, our model identifies an informative neighborhood of turbines that are spatially related, and consequently, constructs an ensemble-like predictor using the data associated with the neighboring turbines. Using actual wind data measured at 200 wind turbines in a wind farm, we found that the two modeling elements benefit short-term wind speed forecasts. We also investigate the use of regime switching to account for the effect of wind direction and the use of geostrophic wind to account for...

Journal ArticleDOI
TL;DR: An order-constrained version of ℓ1-regularized regression (Lasso) is proposed, and it is shown how to solve it efficiently using the well-known pool adjacent violators algorithm as its proximal operator.
Abstract: We consider regression scenarios where it is natural to impose an order constraint on the coefficients. We propose an order-constrained version of l1-regularized regression (Lasso) for this problem, and show how to solve it efficiently using the well-known pool adjacent violators algorithm as its proximal operator. The main application of this idea is to time-lagged regression, where we predict an outcome at time t from features at the previous K time points. In this setting, it is natural to assume that the coefficients decay as we move farther away from t, and hence the order constraint is reasonable. Potential application areas include financial time series and prediction of dynamic patient outcomes based on clinical measurements. We illustrate this idea on real and simulated data.
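The pool adjacent violators algorithm (PAVA) mentioned above is a standard routine; the following is a minimal Python sketch of it, projecting a vector onto nondecreasing sequences in least squares (reverse the input to enforce the decaying order natural for time-lagged coefficients). How PAVA is embedded as the proximal operator inside the l1-regularized solver is not reproduced here.

def pava(y):
    """Least-squares projection of y onto nondecreasing sequences (equal weights)."""
    blocks = []                     # each block stored as [sum, count]
    for v in map(float, y):
        blocks.append([v, 1])
        # Merge while the previous block mean exceeds the current one
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)     # expand each pooled block to its mean
    return out

# Example: pava([3.0, 1.0, 2.0]) returns [2.0, 2.0, 2.0].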

Journal ArticleDOI
TL;DR: This article develops orthogonal blocking schemes for definitive screening designs, which are quite flexible in that the numbers of blocks may vary from two to the number of factors, and block sizes need not be equal.
Abstract: In earlier work, Jones and Nachtsheim proposed a new class of screening designs called definitive screening designs. As originally presented, these designs are three-level designs for quantitative factors that provide estimates of main effects that are unbiased by any second-order effect and require only one more than twice as many runs as there are factors. Definitive screening designs avoid direct confounding of any pair of second-order effects, and, for designs that have more than five factors, project to efficient response surface designs for any two or three factors. Recently, Jones and Nachtsheim expanded the applicability of these designs by showing how to include any number of two-level categorical factors. However, methods for blocking definitive screening designs have not been addressed. In this article we develop orthogonal blocking schemes for definitive screening designs. We separately consider the cases where all of the factors are quantitative and where there is a mix of quantitative and tw...

Journal ArticleDOI
TL;DR: The proposed designs are a kind of sliced Latin hypercube design with points clustered in the design region and possess good uniformity for each slice, which helps measure the similarities among responses of different level-combinations of the qualitative variables.
Abstract: Computer experiments have received a great deal of attention in many fields of science and technology. Most literature assumes that all the input variables are quantitative. However, researchers often encounter computer experiments involving both qualitative and quantitative variables (BQQV). In this article, a new interface on design and analysis for computer experiments with BQQV is proposed. The new designs are one kind of sliced Latin hypercube designs with points clustered in the design region and possess good uniformity for each slice. For computer experiments with BQQV, such designs help to measure the similarities among responses of different level-combinations in the qualitative variables. An adaptive analysis strategy intended for the proposed designs is developed. The proposed strategy allows us to automatically extract information from useful auxiliary responses to increase the precision of prediction for the target response. The interface between the proposed design and the analysis strategy ...

Journal ArticleDOI
TL;DR: An IM-based technique is employed to marginalize out the unknown parameters, yielding prior-free probabilistic prediction of future observables, which is expected to be a useful tool for practitioners.
Abstract: Prediction of future observations is a fundamental problem in statistics. Here we present a general approach based on the recently developed inferential model (IM) framework. We employ an IM-based technique to marginalize out the unknown parameters, yielding prior-free probabilistic prediction of future observables. Verifiable sufficient conditions are given for validity of our IM for prediction, and a variety of examples demonstrate the proposed method’s performance. Thanks to its generality and ease of implementation, we expect that our IM-based method for prediction will be a useful tool for practitioners. Supplementary materials for this article are available online.

Journal ArticleDOI
TL;DR: In this paper, the authors exploit the independent-increments structure of maximum likelihood estimators to produce complementary plots with greater interpretability, and suggest a simple likelihood-based procedure that allows for automated threshold selection.
Abstract: To model the tail of a distribution, one has to define the threshold above or below which an extreme value model produces a suitable fit. Parameter stability plots, whereby one plots maximum likelihood estimates of supposedly threshold-independent parameters against threshold, form one of the main tools for threshold selection by practitioners, principally due to their simplicity. However, one repeated criticism of these plots is their lack of interpretability, with pointwise confidence intervals being strongly dependent across the range of thresholds. In this article, we exploit the independent-increments structure of maximum likelihood estimators to produce complementary plots with greater interpretability, and suggest a simple likelihood-based procedure that allows for automated threshold selection. Supplementary materials for this article are available online.
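A minimal sketch of the raw material behind a parameter-stability plot: fit a generalized Pareto distribution to the exceedances over a grid of candidate thresholds and record the estimates (here via scipy.stats.genpareto with the location fixed at zero). The article's contribution, exploiting the independent-increments structure of the estimators and automating the threshold choice, is not reproduced; the threshold grid is an input chosen by the user.

import numpy as np
from scipy.stats import genpareto

def gpd_estimates_over_thresholds(data, thresholds):
    """Return an array of (threshold, shape, scale) GPD fits to exceedances."""
    data = np.asarray(data, dtype=float)
    est = []
    for u in thresholds:
        exc = data[data > u] - u                      # exceedances over u
        shape, _, scale = genpareto.fit(exc, floc=0.0)
        est.append((u, shape, scale))
    return np.array(est)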

Journal ArticleDOI
TL;DR: An efficient iterative algorithm is introduced that orthogonalizes a design matrix by adding new rows and then solves the original problem by embedding the augmented design in a missing-data framework; it is considerably faster than competing methods when n is much larger than p.
Abstract: We introduce an efficient iterative algorithm, intended for various least squares problems, based on a design of experiments perspective. The algorithm, called orthogonalizing EM (OEM), works for ordinary least squares (OLS) and can be easily extended to penalized least squares. The main idea of the procedure is to orthogonalize a design matrix by adding new rows and then solve the original problem by embedding the augmented design in a missing data framework. We establish several attractive theoretical properties concerning OEM. For the OLS with a singular regression matrix, an OEM sequence converges to the Moore-Penrose generalized inverse-based least squares estimator. For ordinary and penalized least squares with various penalties, it converges to a point having grouping coherence for fully aliased regression matrices. Convergence and the convergence rate of the algorithm are examined. Finally, we demonstrate that OEM is highly efficient for large-scale least squares and penalized least squares proble...
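A minimal sketch of the OEM-style update for ordinary least squares, under the reading that the implicit row augmentation turns X'X into a multiple d of the identity, which yields the closed-form iteration below. The penalized variants and the convergence results in the article are not shown, and choosing d as the largest eigenvalue of X'X is one illustrative option.

import numpy as np

def oem_ols(X, y, n_iter=500):
    xtx = X.T @ X
    xty = X.T @ y
    d = np.linalg.eigvalsh(xtx).max()    # any d >= largest eigenvalue of X'X works
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Iterate beta <- (X'y + (dI - X'X) beta) / d
        beta = (xty + d * beta - xtx @ beta) / d
    return beta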

Journal ArticleDOI
TL;DR: A new approach is proposed to screen for active factorial effects from unreplicated factorial experiments; it uses the potential outcomes framework, is based on sequential posterior predictive model checks, and has the ability to broaden the standard definition of active effects and to link their definition to the population of interest.
Abstract: Unreplicated factorial designs have been widely used in scientific and industrial settings, when it is important to distinguish “active” or real factorial effects from “inactive” or noise factorial effects used to estimate residual or “error” terms. We propose a new approach to screen for active factorial effects from such experiments that uses the potential outcomes framework and is based on sequential posterior predictive model checks. One advantage of the proposed method is its ability to broaden the standard definition of active effects and to link their definition to the population of interest. Another important aspect of this approach is its conceptual connection to Fisherian randomization tests. Extensive simulation studies are conducted, which demonstrate the superiority of the proposed approach over existing ones in the situations considered.

Journal ArticleDOI
TL;DR: New quantile function estimators are introduced for spatial and temporal data, with a fused adaptive Lasso penalty to accommodate the dependence in space and time; they are suited to applications with features ordered in time or space and without replicated observations.
Abstract: Quantile functions are important in characterizing the entire probability distribution of a random variable, especially when the tail of a skewed distribution is of interest. This article introduces new quantile function estimators for spatial and temporal data with a fused adaptive Lasso penalty to accommodate the dependence in space and time. This method penalizes the difference among neighboring quantiles, hence it is desirable for applications with features ordered in time or space without replicated observations. The theoretical properties are investigated and the performances of the proposed methods are evaluated by simulations. The proposed method is applied to particulate matter (PM) data from the Community Multiscale Air Quality (CMAQ) model to characterize the upper quantiles, which are crucial for studying spatial association between PM concentrations and adverse human health effects.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a data-augmentation algorithm for truncated field return data with only returned failures available, based on an idea of revealing the hidden unobserved lifetimes.
Abstract: Field data are an important source of reliability information for many commercial products. Because field data are often collected by the maintenance department, information on failed and returned units is well maintained. Nevertheless, information on unreturned units is generally unavailable. The unavailability leads to truncation in the lifetime data. This study proposes a data-augmentation algorithm for this type of truncated field return data, with only returned failures available. The algorithm is based on an idea of revealing the hidden unobserved lifetimes. Theoretical justifications of the procedure for augmenting the hidden unobserved lifetimes are given. Moreover, the algorithm is iterative in nature. Asymptotic properties of the estimators from the iterations are investigated. Both point estimation and the information matrix of the parameters can be directly obtained from the algorithm. In addition, a by-product of the algorithm is a nonparametric estimator of the installation time distribution. An ...

Journal ArticleDOI
TL;DR: This article develops a Bayesian statistical calibration approach that is well suited to challenging, high-dimensional calibration problems and leverages recent ideas from Bayesian additive regression tree models to construct a random basis representation of the simulator outputs and observational data.
Abstract: Complex natural phenomena are increasingly investigated by the use of a complex computer simulator. To leverage the advantages of simulators, observational data need to be incorporated in a probabilistic framework so that uncertainties can be quantified. A popular framework for such experiments is the statistical computer model calibration experiment. A limitation often encountered in current statistical approaches for such experiments is the difficulty in modeling high-dimensional observational datasets and simulator outputs as well as high-dimensional inputs. As the complexity of simulators seems to only grow, this challenge will continue unabated. In this article, we develop a Bayesian statistical calibration approach that is ideally suited for such challenging calibration problems. Our approach leverages recent ideas from Bayesian additive regression tree models to construct a random basis representation of the simulator outputs and observational data. The approach can flexibly handle high-dimensional...

Journal Article
TL;DR: A data-augmentation algorithm for truncated field return data with only returned failures available is proposed, based on an idea of revealing the hidden unobserved lifetimes.
Abstract: Supplementary material to "Augmenting the Unreturned for Field Data With Information on Returned Failures Only"

Journal Article
TL;DR: This approach is based on a block-splitting variant of the alternating directions method of multipliers, carefully reconfigured to handle very large random feature matrices under memory constraints, while exploiting hybrid parallelism typically found in modern clusters of multicore machines.
Abstract: Supplementary material to "High-Performance Kernel Machines With Implicit Distributed Optimization and Randomization"

Journal ArticleDOI
TL;DR: In this paper, a sliced orthogonal array-based Latin hypercube design is proposed to achieve one- and two-dimensional uniformity, which can be used for uncertainty quantification of computer models, cross-validation, and efficient allocation of computing resources.
Abstract: We propose an approach for constructing a new type of design, called a sliced orthogonal array-based Latin hypercube design. This approach exploits a slicing structure of orthogonal arrays with strength two and makes use of sliced random permutations. Such a design achieves one- and two-dimensional uniformity and can be divided into smaller Latin hypercube designs with one-dimensional uniformity. Sampling properties of the proposed designs are derived. Examples are given for illustrating the construction method and corroborating the derived theoretical results. Potential applications of the constructed designs include uncertainty quantification of computer models, computer models with qualitative and quantitative factors, cross-validation and efficient allocation of computing resources. Supplementary materials for this article are available online.

Journal ArticleDOI
TL;DR: This article proposes a class of monotonic regression models, which consists of functional analysis of variance (FANOVA) decomposition components modeled with Bernstein polynomial bases for estimating quantiles as a function of multiple inputs.
Abstract: Quantile regression is an important tool to determine the quality level of service, product, and operation systems via stochastic simulation. It is frequently known that the quantiles of the output distribution are monotonic functions of certain inputs to the simulation model. Because there is typically high variability in estimation of tail quantiles, it can be valuable to incorporate this information in quantile modeling. However, the existing literature on monotone quantile regression with multiple inputs is sparse. In this article, we propose a class of monotonic regression models, which consists of functional analysis of variance (FANOVA) decomposition components modeled with Bernstein polynomial bases for estimating quantiles as a function of multiple inputs. The polynomial degrees of the bases for the model and the FANOVA components included in the model are selected by a greedy algorithm. Real examples demonstrate the advantages of incorporating the monotonicity assumption in quantile regression a...
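As a small illustration of one ingredient, here is a Bernstein polynomial basis on [0, 1]: a fit whose coefficients are nondecreasing in the index k is automatically nondecreasing in x, which is the kind of monotonicity constraint the FANOVA components described above can exploit. The quantile loss, the greedy degree and component selection, and the multivariate decomposition from the article are not reproduced; the function name and degree are assumptions of this sketch.

import numpy as np
from scipy.special import comb

def bernstein_basis(x, degree):
    """Evaluate the Bernstein basis B_{k,degree}(x) for x in [0, 1]."""
    x = np.asarray(x, dtype=float)[:, None]
    k = np.arange(degree + 1)[None, :]
    # B_{k,d}(x) = C(d, k) x^k (1 - x)^(d - k)
    return comb(degree, k) * x ** k * (1.0 - x) ** (degree - k)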

Journal ArticleDOI
TL;DR: A self-starting exponentially weighted moving average (EWMA) control scheme based on a parametric bootstrap method is proposed; it is useful in rare-event studies during the start-up stage of a monitoring process and has good in-control and out-of-control performance under various situations.
Abstract: In this article, we consider the problem of monitoring Poisson rates when the population sizes are time-varying and the nominal value of the process parameter is unavailable. Almost all previous control schemes for the detection of increases in the Poisson rate in Phase II are constructed based on assumed knowledge of the process parameters, for example, the expectation of the count of a rare event when the process of interest is in control. In practice, however, this parameter is usually unknown and not able to be estimated with a sufficiently large number of reference samples. A self-starting exponentially weighted moving average (EWMA) control scheme based on a parametric bootstrap method is proposed. The success of the proposed method lies in the use of probability control limits, which are determined based on the observations during rather than before monitoring. Simulation studies show that our proposed scheme has good in-control and out-of-control performance under various situations. In particular...
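A minimal sketch of the EWMA recursion applied to observed rates X_t / n_t for Poisson counts with time-varying population sizes n_t. The article's key ingredients, the self-starting parameter estimation and the bootstrap-based probability control limits, are deliberately not reproduced here; the smoothing constant lam is an illustrative choice.

import numpy as np

def ewma_rates(counts, sizes, lam=0.1):
    """EWMA of the observed rates counts[t] / sizes[t]."""
    rates = np.asarray(counts, dtype=float) / np.asarray(sizes, dtype=float)
    z = np.empty_like(rates)
    z[0] = rates[0]
    for t in range(1, len(rates)):
        # Standard EWMA recursion: Z_t = lam * rate_t + (1 - lam) * Z_{t-1}
        z[t] = lam * rates[t] + (1.0 - lam) * z[t - 1]
    return z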

Journal ArticleDOI
TL;DR: In this paper, an online in situ method is presented for identifying a reduced set of time steps of a simulation to save, as an alternative to the usual practice of saving evenly spaced time steps with spacing defined by the budget for storage and transfer.
Abstract: As computer simulations continue to grow in size and complexity, they present a particularly challenging class of big data problems. Many application areas are moving toward exascale computing systems, systems that perform 10^18 FLOPS (FLoating-point Operations Per Second)—a billion billion calculations per second. Simulations at this scale can generate output that exceeds both the storage capacity and the bandwidth available for transfer to storage, making post-processing and analysis challenging. One approach is to embed some analyses in the simulation while the simulation is running—a strategy often called in situ analysis—to reduce the need for transfer to storage. Another strategy is to save only a reduced set of time steps rather than the full simulation. Typically the selected time steps are evenly spaced, where the spacing can be defined by the budget for storage and transfer. This article combines these two ideas to introduce an online in situ method for identifying a reduced set of time steps of ...

Journal ArticleDOI
TL;DR: The so-called bootstrap Metropolis–Hastings (BMH) algorithm is proposed, which provides a general framework for how to tame powerful MCMC methods to be used for big data analysis, that is, to replace the full data log-likelihood by a Monte Carlo average of the log- likelihoods that are calculated in parallel from multiple bootstrap samples.
Abstract: Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data of complex structures. However, their computer-intensive nature, which typically requires a large number of iterations and a complete scan of the full dataset for each iteration, precludes their use for big data analysis. In this article, we propose the so-called bootstrap Metropolis–Hastings (BMH) algorithm that provides a general framework for how to tame powerful MCMC methods to be used for big data analysis, that is, to replace the full data log-likelihood by a Monte Carlo average of the log-likelihoods that are calculated in parallel from multiple bootstrap samples. The BMH algorithm possesses an embarrassingly parallel structure and avoids repeated scans of the full dataset in iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can be generally more efficient as it can asymptotically integrate the whole data information into a single simulation r...
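A minimal sketch of the core substitution described above: a random-walk Metropolis sampler whose log-likelihood is an average over k bootstrap samples of the data (drawn once up front here, and evaluated sequentially rather than in parallel). The function names, the number of bootstrap samples, and the proposal scale are illustrative assumptions; the article's BMH algorithm includes refinements not reproduced in this sketch.

import numpy as np

def bmh_sketch(loglik, data, theta0, k=10, n_iter=5000, step=0.1, seed=None):
    """loglik(theta, sample) -> float; data: 1-D NumPy array."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # Bootstrap samples drawn once up front (the paper evaluates them in parallel)
    boot = [data[rng.integers(0, n, size=n)] for _ in range(k)]

    def avg_loglik(theta):
        # Monte Carlo average of bootstrap-sample log-likelihoods
        return np.mean([loglik(theta, b) for b in boot])

    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    cur = avg_loglik(theta)
    draws = []
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.shape)
        new = avg_loglik(prop)
        if np.log(rng.uniform()) < new - cur:   # symmetric proposal: MH ratio
            theta, cur = prop, new
        draws.append(theta.copy())
    return np.array(draws)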

Journal Article
TL;DR: A self-starting exponentially weighted moving average (EWMA) control scheme based on a parametric bootstrap method is proposed for monitoring Poisson count data with varying population sizes.
Abstract: Supplementary material to "Self-Starting Monitoring Scheme for Poisson Count Data With Varying Population Sizes"

Journal ArticleDOI
TL;DR: In this article, a block-splitting variant of the alternating directions method of multipliers is proposed to handle very large random feature matrices under memory constraints, while exploiting hybrid parallelism typically found in modern clusters of multicore machines.
Abstract: We propose a framework for massive-scale training of kernel-based statistical models, based on combining distributed convex optimization with randomization techniques. Our approach is based on a block-splitting variant of the alternating directions method of multipliers, carefully reconfigured to handle very large random feature matrices under memory constraints, while exploiting hybrid parallelism typically found in modern clusters of multicore machines. Our high-performance implementation supports a variety of statistical learning tasks by enabling several loss functions, regularization schemes, kernels, and layers of randomized approximations for both dense and sparse datasets, in an extensible framework. We evaluate our implementation on large-scale model construction tasks and provide a comparison against existing sequential and parallel libraries. Supplementary materials for this article are available online.
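One ingredient mentioned above, sketched in isolation: random Fourier features approximating a Gaussian (RBF) kernel, so that a kernel model can be fit as a linear model on the resulting feature matrix. The distributed block-splitting ADMM solver described in the article is not reproduced; an ordinary least-squares solve stands in, and the function name and parameters are assumptions of this sketch.

import numpy as np

def random_fourier_features(X, n_features=500, gamma=1.0, seed=None):
    """Features Z with Z Z' approximating the kernel exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Usage: Z = random_fourier_features(X_train); then fit a (regularized) linear
# model on Z, e.g. np.linalg.lstsq(Z, y_train, rcond=None), instead of working
# with the n-by-n kernel matrix.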