
Showing papers in "Technometrics in 2017"


Journal ArticleDOI
TL;DR: In this article, a Gaussian process emulator is used to approximate the expected utility as a function of a single design coordinate in a series of conditional optimization steps to find multi-variable designs without resorting to asymptotic approximations to the posterior distribution or expected utility.
Abstract: The construction of decision-theoretical Bayesian designs for realistically complex nonlinear models is computationally challenging, as it requires the optimization of analytically intractable expected utility functions over high-dimensional design spaces. We provide the most general solution to date for this problem through a novel approximate coordinate exchange algorithm. This methodology uses a Gaussian process emulator to approximate the expected utility as a function of a single design coordinate in a series of conditional optimization steps. It has flexibility to address problems for any choice of utility function and for a wide range of statistical models with different numbers of variables, numbers of runs and randomization restrictions. In contrast to existing approaches to Bayesian design, the method can find multi-variable designs in large numbers of runs without resorting to asymptotic approximations to the posterior distribution or expected utility. The methodology is demonstrated on...
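
The sketch below illustrates the coordinate-wise emulation idea under stated assumptions: a toy Monte Carlo "expected utility" (a maximin spread criterion with added noise) stands in for a real decision-theoretic utility, and scikit-learn's GaussianProcessRegressor serves as the one-dimensional emulator. It is a minimal illustration of the algorithmic skeleton, not the authors' implementation.

```python
# Sketch of an approximate-coordinate-exchange-style loop: a 1-D Gaussian
# process emulator approximates a noisy Monte Carlo "expected utility" along
# one design coordinate at a time.  The utility below is a toy stand-in.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

def noisy_expected_utility(design):
    """Toy 'expected utility': maximin spread of the runs, plus MC-like noise."""
    dists = np.linalg.norm(design[:, None, :] - design[None, :, :], axis=-1)
    spread = dists[np.triu_indices(len(design), k=1)].min()
    return spread + rng.normal(scale=0.01)

n_runs, n_vars, n_candidates = 8, 2, 15
design = rng.uniform(size=(n_runs, n_vars))
best_u = noisy_expected_utility(design)

for sweep in range(3):                          # a few coordinate sweeps
    for i in range(n_runs):
        for j in range(n_vars):
            # Evaluate the noisy utility at a few values of this coordinate.
            xs = np.linspace(0.0, 1.0, n_candidates)
            us = []
            for x in xs:
                trial = design.copy()
                trial[i, j] = x
                us.append(noisy_expected_utility(trial))
            # Emulate utility vs. coordinate with a 1-D GP; take its maximizer.
            gp = GaussianProcessRegressor(
                kernel=RBF(length_scale=0.2) + WhiteKernel(noise_level=1e-4),
                normalize_y=True)
            gp.fit(xs.reshape(-1, 1), np.asarray(us))
            grid = np.linspace(0.0, 1.0, 201).reshape(-1, 1)
            x_star = grid[np.argmax(gp.predict(grid)), 0]
            # Accept the proposed coordinate only if it improves the utility.
            trial = design.copy()
            trial[i, j] = x_star
            u_star = noisy_expected_utility(trial)
            if u_star > best_u:
                design, best_u = trial, u_star

print("final design:\n", np.round(design, 3))
```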

104 citations


Journal ArticleDOI
TL;DR: A novel methodology for anomaly detection in noisy images with smooth backgrounds, named smooth-sparse decomposition, exploits regularized high-dimensional regression to decompose an image and separate anomalous regions by solving a large-scale optimization problem.
Abstract: In various manufacturing applications such as steel, composites, and textile production, anomaly detection in noisy images is of special importance. Although there are several methods for image den...
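
A minimal sketch of the smooth-plus-sparse idea on a one-dimensional profile: the background is estimated by Gaussian smoothing and the anomaly by soft-thresholding the residual, alternating a few times. The smoothing bandwidth and threshold are arbitrary illustrative choices, not the regularized high-dimensional regression formulation of the paper.

```python
# Minimal smooth-plus-sparse decomposition of a noisy 1-D profile: the smooth
# background comes from Gaussian filtering, the sparse anomaly from
# soft-thresholding the residual, alternated a few times.
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 500)
background = np.sin(2 * np.pi * x)             # smooth truth
anomaly = np.zeros_like(x)
anomaly[220:240] = 1.5                         # localized defect
y = background + anomaly + rng.normal(scale=0.1, size=x.size)

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

a = np.zeros_like(y)                           # sparse anomaly estimate
for _ in range(10):
    s = gaussian_filter1d(y - a, sigma=15)     # smooth background estimate
    a = soft_threshold(y - s, lam=0.4)         # sparse residual estimate

flagged = np.flatnonzero(np.abs(a) > 0)
print("number of flagged points:", flagged.size)
```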

84 citations


Journal ArticleDOI
TL;DR: This work proposes an additive Gaussian process model for computer experiments with qualitative and quantitative factors; the unrestrictive correlation structure for the qualitative factors, obtained through the hypersphere decomposition, gives the model extra flexibility for modeling the complex systems encountered in computer experiments.
Abstract: Computer experiments with qualitative and quantitative factors occur frequently in various applications in science and engineering. Analysis of such experiments is not yet completely resolved. In this work, we propose an additive Gaussian process model for computer experiments with qualitative and quantitative factors. The proposed method considers an additive correlation structure for qualitative factors, and assumes that the correlation function for each qualitative factor and the correlation function of quantitative factors are multiplicative. It inherits the flexibility of unrestrictive correlation structure for qualitative factors by using the hypersphere decomposition, embracing more flexibility in modeling the complex systems of computer experiments. The merits of the proposed method are illustrated by several numerical examples and a real data application. Supplementary materials for this article are available online.
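
The snippet below sketches, under illustrative parameter choices, how such a correlation structure can be assembled: each qualitative factor gets an unrestricted level-correlation matrix built from hypersphere-decomposition angles, that matrix is multiplied elementwise by a Gaussian correlation in the quantitative inputs, and the per-factor products are combined additively. It is a construction sketch only, not the authors' estimation procedure.

```python
# Additive qualitative/quantitative correlation structure: for each qualitative
# factor, a level-correlation matrix T = L L^T is parameterized by hypersphere
# angles (positive definite with unit diagonal) and multiplied by a Gaussian
# correlation in the quantitative inputs; the products are then averaged.
import numpy as np

def hypersphere_corr(angles, m):
    """Build an m x m correlation matrix from m(m-1)/2 hypersphere angles."""
    L = np.zeros((m, m))
    L[0, 0] = 1.0
    idx = 0
    for r in range(1, m):
        th = angles[idx:idx + r]
        idx += r
        prod = 1.0
        for c in range(r):
            L[r, c] = np.cos(th[c]) * prod
            prod *= np.sin(th[c])
        L[r, r] = prod
    return L @ L.T

def gauss_corr(x1, x2, theta):
    d2 = np.sum(theta * (x1[:, None, :] - x2[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2)

# toy data: 2 quantitative inputs, 2 qualitative factors with 3 levels each
rng = np.random.default_rng(2)
n = 6
X = rng.uniform(size=(n, 2))
Z = rng.integers(0, 3, size=(n, 2))

theta = np.array([2.0, 0.5])                 # quantitative length-scale weights
angles = [rng.uniform(0.2, np.pi - 0.2, size=3) for _ in range(2)]  # 3 angles
                                             # parameterize each 3x3 matrix
K = np.zeros((n, n))
Rx = gauss_corr(X, X, theta)
for q in range(2):
    T = hypersphere_corr(angles[q], 3)       # level correlations, factor q
    K += T[np.ix_(Z[:, q], Z[:, q])] * Rx    # multiplicative term
K /= 2.0                                     # average of the additive terms

print("combined correlation matrix is PD:", np.all(np.linalg.eigvalsh(K) > 0))
```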

59 citations


Journal ArticleDOI
TL;DR: This work considers partitioning a spatial region into disjoint sets using hierarchical clustering of observations and finite differences as a measure of dissimilarity and proposes a nonstationary Gaussian process model across the clusters, which allows the computational burden of model fitting to be distributed across multiple cores and nodes.
Abstract: Modern digital data production methods, such as computer simulation and remote sensing, have vastly increased the size and complexity of data collected over spatial domains. Analysis of these large spatial datasets for scientific inquiry is typically carried out using the Gaussian process. However, nonstationary behavior and computational requirements for large spatial datasets can prohibit efficient implementation of Gaussian process models. To perform computationally feasible inference for large spatial data, we consider partitioning a spatial region into disjoint sets using hierarchical clustering of observations and finite differences as a measure of dissimilarity. Intuitively, directions with large finite differences indicate directions of rapid increase or decrease and are, therefore, appropriate for partitioning the spatial region. Spatial contiguity of the resulting clusters is enforced by only clustering Voronoi neighbors. Following spatial clustering, we propose a nonstationary Gaussian process ...
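
A small sketch of contiguity-constrained clustering, assuming Delaunay neighbors (which coincide with Voronoi neighbors) define the connectivity graph and a simple average absolute finite difference to neighbors serves as the clustering feature; the feature definition and the number of clusters are illustrative choices rather than the paper's exact dissimilarity measure.

```python
# Partition a spatial domain into contiguous clusters by agglomerative
# clustering of local finite differences, allowing merges only between
# Voronoi (Delaunay) neighbors via a connectivity matrix.
import numpy as np
from scipy.spatial import Delaunay
from scipy.sparse import lil_matrix
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(3)
coords = rng.uniform(size=(400, 2))
# toy nonstationary response: rough on the left half, smooth on the right
y = np.where(coords[:, 0] < 0.5,
             np.sin(20 * coords[:, 1]) + rng.normal(scale=0.3, size=400),
             coords[:, 1])

# Delaunay triangulation: neighbors in it are exactly Voronoi neighbors.
tri = Delaunay(coords)
conn = lil_matrix((len(coords), len(coords)), dtype=int)
for simplex in tri.simplices:
    for a in simplex:
        for b in simplex:
            if a != b:
                conn[a, b] = 1

# Feature: average absolute finite difference of y to Voronoi neighbors.
feat = np.zeros(len(coords))
rows, cols = conn.nonzero()
for a, b in zip(rows, cols):
    feat[a] += abs(y[a] - y[b])
feat /= np.maximum(conn.sum(axis=1).A1, 1)

labels = AgglomerativeClustering(
    n_clusters=4, connectivity=conn.tocsr(), linkage="ward"
).fit_predict(feat.reshape(-1, 1))
print("cluster sizes:", np.bincount(labels))
```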

52 citations


Journal ArticleDOI
TL;DR: In this article, the authors model the individual component lifetime by a gamma distribution and fit the data using the inverse Gaussian (IG) distribution, a useful lifetime model for failures caused by degradation.
Abstract: Because of the exponential distribution assumption, many reliability databases recorded data in an aggregate way. Instead of individual failure times, each aggregate data point is a summation of a series of collective failures representing the cumulative operating time of one component position from system commencement to the last component replacement. The data format is different from traditional lifetime data and the statistical inference is challenging. We first model the individual component lifetime by a gamma distribution. Confidence intervals for the gamma shape parameter can be constructed using a scaled χ2 approximation to a modified ratio of the geometric mean to the arithmetic mean, while confidence intervals for the gamma rate and mean parameters, as well as quantiles, are obtained using the generalized pivotal quantity method. We then fit the data using the inverse Gaussian (IG) distribution, a useful lifetime model for failures caused by degradation. Procedures for point estimation and inte...

49 citations


Journal ArticleDOI
TL;DR: This paper showed that DSDs have high power for detecting all the main effects, as well as one two-factor interaction or one quadratic effect, as long as the true effects are much larger than the error standard deviation.
Abstract: Since their introduction by Jones and Nachtsheim in 2011, definitive screening designs (DSDs) have seen application in fields as diverse as bio-manufacturing, green energy production, and laser etching. One barrier to their routine adoption for screening is due to the difficulties practitioners experience in model selection when both main effects and second-order effects are active. Jones and Nachtsheim showed that for six or more factors, DSDs project to designs in any three factors that can fit a full quadratic model. In addition, they showed that DSDs have high power for detecting all the main effects as well as one two-factor interaction or one quadratic effect as long as the true effects are much larger than the error standard deviation. However, simulation studies of model selection strategies applied to DSDs can disappoint by failing to identify the correct set of active second-order effects when there are more than a few such effects. Standard model selection strategies such as stepwise re...

47 citations


Journal ArticleDOI
TL;DR: A new vine copula model is proposed to accommodate the newly discovered dependence structure, and it is shown that the new model can predict the effectiveness of early-warning more accurately than the others.
Abstract: Internet-based computer information systems play critical roles in many aspects of modern society. However, these systems are constantly under cyber attacks that can cause catastrophic consequences. To defend these systems effectively, it is necessary to measure and predict the effectiveness of cyber defense mechanisms. In this article, we investigate how to measure and predict the effectiveness of an important cyber defense mechanism that is known as early-warning. This turns out to be a challenging problem because we must accommodate the dependence among certain four-dimensional time series. In the course of using a dataset to demonstrate the prediction methodology, we discovered a new nonexchangeable and rotationally symmetric dependence structure, which may be of independent value. We propose a new vine copula model to accommodate the newly discovered dependence structure, and show that the new model can predict the effectiveness of early-warning more accurately than the others. We also discus...

45 citations


Journal ArticleDOI
TL;DR: A new tensor partial least-squares algorithm is proposed and the corresponding population interpretation is established, building a connection with the notion of sufficient dimension reduction and yielding the asymptotic consistency of the PLS estimator.
Abstract: Partial least squares (PLS) is a prominent solution for dimension reduction and high-dimensional regressions. Recent prevalence of multidimensional tensor data has led to several tensor versions of the PLS algorithms. However, none offers a population model and interpretation, and statistical properties of the associated parameters remain intractable. In this article, we first propose a new tensor partial least-squares algorithm, then establish the corresponding population interpretation. This population investigation allows us to gain new insight on how the PLS achieves effective dimension reduction, to build connection with the notion of sufficient dimension reduction, and to obtain the asymptotic consistency of the PLS estimator. We compare our method, both analytically and numerically, with some alternative solutions. We also illustrate the efficacy of the new method on simulations and two neuroimaging data analyses. Supplementary materials for this article are available online.

43 citations


Journal ArticleDOI
TL;DR: This work characterizes the influence of dependence structures on system reliability and component importance in coherent systems with discrete marginal distributions, and extends the framework to coherent multi-state systems.
Abstract: System reliability and component importance are of great interest in reliability modeling, especially when the components within the system are dependent. We characterize the influence of dependence structures on system reliability and component importance in coherent systems with discrete marginal distributions. The effects of dependence are captured through copula theory. We extend our framework to coherent multi-state systems. Applications of the derived results are demonstrated using a Gaussian copula, which yields simple interpretations. Simulations and two examples are presented to demonstrate the importance of modeling dependence when estimating system reliability and ranking component importance. Proofs, algorithms, code, and data are provided in supplementary materials available online.
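
As a quick illustration of why dependence matters, the simulation below compares series-system reliability under independence with reliability under a Gaussian copula for the component states; the marginal reliabilities and the copula correlation are made-up numbers, and the calculation is not taken from the article.

```python
# Series-system reliability with dependent component states generated through
# a Gaussian copula, compared with the independence assumption.
import numpy as np
from scipy.stats import norm, multivariate_normal

p = np.array([0.95, 0.90, 0.92])        # marginal component reliabilities
rho = 0.6                               # common pairwise copula correlation
Sigma = rho * np.ones((3, 3)) + (1 - rho) * np.eye(3)

z = multivariate_normal(mean=np.zeros(3), cov=Sigma).rvs(size=200_000,
                                                         random_state=4)
u = norm.cdf(z)                         # copula samples with uniform margins
working = u < p                         # component i works with probability p[i]
series_dep = working.all(axis=1).mean()
series_indep = np.prod(p)

print(f"series reliability, independent components : {series_indep:.4f}")
print(f"series reliability, Gaussian copula rho={rho}: {series_dep:.4f}")
```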

37 citations


Journal ArticleDOI
TL;DR: A new covariance function is investigated that is shown to offer superior prediction compared to the more common covariances for computer simulations of real physical systems and is named a lifted Brownian covariance.
Abstract: Gaussian processes have become a standard framework for modeling deterministic computer simulations and producing predictions of the response surface. This article investigates a new covariance function that is shown to offer superior prediction compared to the more common covariances for computer simulations of real physical systems. This is demonstrated via a gamut of realistic examples. A simple, closed-form expression for the covariance is derived as a limiting form of a Brownian-like covariance model as it is extended to some hypothetical higher-dimensional input domain, and so we term it a lifted Brownian covariance. This covariance has connections with the multiquadric kernel. Through analysis of the kriging model, this article offers some theoretical comparisons between the proposed covariance model and existing covariance models. The major emphasis of the theory is explaining why the proposed covariance is superior to its traditional counterparts for many computer simulations of real phys...

32 citations


Journal ArticleDOI
TL;DR: New methods and algorithms are proposed that construct and compute a set of summary statistics, termed the Box–Cox information array, and that can be extremely efficient and fast even when multiple models are considered.
Abstract: The Box–Cox transformation is an important technique in linear regression when assumptions of a regression model are seriously violated. The technique has been widely accepted and extensively applied since it was first proposed. Based on the maximum likelihood approach, previous methods and algorithms for the Box–Cox transformation are mostly developed for small or moderate data. These methods and algorithms cannot be applied to big data because of the memory and storage capacity barriers. To overcome these difficulties, the present article proposes new methods and algorithms, where the basic idea is to construct and compute a set of summary statistics, which is termed the Box–Cox information array in the article. By the properties of the maximum likelihood approach, the Box–Cox information array is the only quantity that needs to be computed while reading the data. Once the Box–Cox information array is obtained, the optimal power transformation as well as the corresponding estimates...
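
The sketch below illustrates the summary-statistic idea under simple assumptions: for a grid of λ values, the cross-products needed to profile the Box–Cox log-likelihood are accumulated in one pass over simulated data chunks, and the optimal λ is then found from the accumulated quantities alone. The grid, chunking scheme, and stored quantities are illustrative and not the article's Box–Cox information array.

```python
# One-pass accumulation of the cross-products needed to profile the Box-Cox
# log-likelihood over a grid of lambda values, streaming the data in chunks.
import numpy as np

def boxcox_transform(y, lam):
    return np.log(y) if abs(lam) < 1e-12 else (y ** lam - 1.0) / lam

lambdas = np.linspace(-1.0, 2.0, 31)
p = 3                                        # number of regression columns

# accumulators ("summary statistics"): X'X, X'z(lam), z(lam)'z(lam), sum log y, n
XtX = np.zeros((p, p))
Xtz = np.zeros((len(lambdas), p))
ztz = np.zeros(len(lambdas))
sum_log_y, n = 0.0, 0

rng = np.random.default_rng(6)
def chunks():
    """Stand-in for reading a large file in blocks."""
    for _ in range(20):
        x = rng.uniform(size=(1_000, 2))
        y = np.exp(1.0 + x @ np.array([0.8, -0.5])
                   + rng.normal(scale=0.2, size=1_000))
        yield np.column_stack([np.ones(1_000), x]), y

for X, y in chunks():
    XtX += X.T @ X
    sum_log_y += np.log(y).sum()
    n += len(y)
    for k, lam in enumerate(lambdas):
        z = boxcox_transform(y, lam)
        Xtz[k] += X.T @ z
        ztz[k] += z @ z

# Profile log-likelihood (up to an additive constant) for each lambda,
# computed from the accumulated summaries only.
loglik = np.empty(len(lambdas))
for k in range(len(lambdas)):
    beta = np.linalg.solve(XtX, Xtz[k])
    rss = ztz[k] - Xtz[k] @ beta
    loglik[k] = -0.5 * n * np.log(rss / n) + (lambdas[k] - 1.0) * sum_log_y

print("lambda maximizing the profile likelihood:", lambdas[np.argmax(loglik)])
```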

Journal ArticleDOI
TL;DR: A Box–Ljung-type test statistic, formed after calculating the distance covariance function among pairs of observations, is suggested; it converges to a normal random variable under mild regularity conditions.
Abstract: We consider the problem of testing pairwise dependence for stationary time series. For this, we suggest the use of a Box–Ljung-type test statistic that is formed after calculating the distance covariance function among pairs of observations. The distance covariance function is a suitable measure for detecting dependencies between observations as it is based on the distance between the characteristic function of the joint distribution of the random variables and the product of the marginals. We show that, under the null hypothesis of independence and under mild regularity conditions, the test statistic converges to a normal random variable. The results are complemented by several examples. This article has supplementary material online.
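
For concreteness, the following computes the squared sample distance covariance between a series and its lagged values and sums it over several lags in a Box–Ljung-like fashion; the normalization and lag weights are simplified relative to the paper's statistic, and no null calibration is attempted.

```python
# Sample distance covariance between a series and its lagged values, summed
# over several lags in the spirit of a Box-Ljung-type statistic.
import numpy as np

def distance_covariance_sq(x, y):
    """Squared sample distance covariance of two 1-D samples."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    return (A * B).mean()

def dcov_lag_statistic(x, max_lag=10):
    """Sum of (n - j) * dCov^2 between x_t and x_{t-j}, j = 1..max_lag."""
    x = np.asarray(x, float)
    n = len(x)
    stat = 0.0
    for j in range(1, max_lag + 1):
        stat += (n - j) * distance_covariance_sq(x[j:], x[:-j])
    return stat

rng = np.random.default_rng(7)
iid = rng.normal(size=500)
ar1 = np.empty(500)
ar1[0] = rng.normal()
for t in range(1, 500):
    ar1[t] = 0.6 * ar1[t - 1] + rng.normal()

print("statistic, iid noise :", round(dcov_lag_statistic(iid), 3))
print("statistic, AR(1)     :", round(dcov_lag_statistic(ar1), 3))
```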

Journal ArticleDOI
TL;DR: This article proposes an optimization scheme that sequentially adds new computer runs by following two criteria, called EQI and EQIE, and shows that it outperforms the expected improvement (EI) criterion, which works for single-accuracy experiments.
Abstract: Computer experiments based on mathematical models are powerful tools for understanding physical processes. This article addresses the problem of kriging-based optimization for deterministic computer experiments with tunable accuracy. Our approach is to use multi-fidelity computer experiments with increasing accuracy levels and a nonstationary Gaussian process model. We propose an optimization scheme that sequentially adds new computer runs by following two criteria. The first criterion, called EQI, scores candidate inputs with given level of accuracy, and the second criterion, called EQIE, scores candidate combinations of inputs and accuracy. From simulation results and a real example using finite element analysis, our method outperforms the expected improvement (EI) criterion that works for single-accuracy experiments. Supplementary materials for this article are available online.
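
The expected improvement criterion used as the single-accuracy baseline has a standard closed form, EI(x) = (f_min − μ(x))Φ(z) + σ(x)φ(z) with z = (f_min − μ(x))/σ(x); the helper below evaluates it from a Gaussian process's predictive mean and standard deviation (the numbers in the usage lines are made up).

```python
# Expected improvement (EI) for minimization, the single-accuracy baseline
# criterion mentioned in the abstract, computed from a GP's predictive
# mean and standard deviation at candidate inputs.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_min):
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    ei = np.zeros_like(mu)
    ok = sigma > 0
    z = (f_min - mu[ok]) / sigma[ok]
    ei[ok] = (f_min - mu[ok]) * norm.cdf(z) + sigma[ok] * norm.pdf(z)
    return ei

# usage with made-up predictive summaries at three candidate inputs
mu = np.array([1.2, 0.9, 1.5])
sd = np.array([0.30, 0.05, 0.60])
print(expected_improvement(mu, sd, f_min=1.0))
```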

Journal Article
TL;DR: This article shows how to take advantage of the special structure of the DSD to obtain the most clear-cut analytical results possible.

Journal ArticleDOI
TL;DR: This work develops an imperfect-maintenance model by taking a physically meaningful approach, and promotes a bootstrapping approach to approximating the distribution of a test statistic via simulated data.
Abstract: Maintenance actions can be classified, according to their efficiency, into three categories: perfect maintenance, imperfect maintenance, and minimal maintenance. To date, the literature on imperfect maintenance is voluminous, and many models have been developed to treat imperfect maintenance. Yet, there are two important problems in the community of maintenance that still remain wide open: how to give practical grounds for an imperfect-maintenance model, and how to test the fit of a real dataset to an imperfect-maintenance model. Motivated by these two pending problems, this work develops an imperfect-maintenance model by taking a physically meaningful approach. For the practical implementation of the developed model, we advance two methods, called QMI method and spacing-likelihood algorithm, to estimate involved unknown parameters. The two methods complete each other and are widely applicable. To offer a practical guide for testing fit to an imperfect-maintenance model, this work promotes a boots...
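
As a generic illustration of the bootstrap approach to approximating a test statistic's null distribution, the sketch below uses a Weibull model and a Kolmogorov–Smirnov statistic purely as stand-ins; the article's imperfect-maintenance model and its test statistic are not reproduced here.

```python
# Generic parametric bootstrap of a goodness-of-fit statistic's null
# distribution: fit a model, compute the statistic, then recompute it on
# datasets simulated from the fitted model.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
data = stats.weibull_min.rvs(1.5, scale=100.0, size=60, random_state=8)

def fit_and_stat(x):
    shape, loc, scale = stats.weibull_min.fit(x, floc=0)
    d = stats.kstest(x, "weibull_min", args=(shape, 0, scale)).statistic
    return (shape, scale), d

(shape_hat, scale_hat), d_obs = fit_and_stat(data)

B, exceed = 500, 0
for _ in range(B):
    boot = stats.weibull_min.rvs(shape_hat, scale=scale_hat,
                                 size=len(data), random_state=rng)
    _, d_b = fit_and_stat(boot)
    exceed += int(d_b >= d_obs)

print(f"observed KS = {d_obs:.3f}, bootstrap p-value = {(exceed + 1) / (B + 1):.3f}")
```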

Journal ArticleDOI
TL;DR: This article presents a batch sequential experiment design method that uses sliced full factorial-based Latin hypercube designs (sFFLHDs), an extension of the concept of sliced orthogonal array-based Latin hypercube designs (OALHDs), and shows that the approach has good sampling and fitting qualities through both empirical studies and theoretical arguments.
Abstract: When fitting complex models, such as finite element or discrete event simulations, the experiment design should exhibit desirable properties of both projectivity and orthogonality. To reduce experimental effort, sequential design strategies allow experimenters to collect data only until some measure of prediction precision is reached. In this article, we present a batch sequential experiment design method that uses sliced full factorial-based Latin hypercube designs (sFFLHDs), which are an extension to the concept of sliced orthogonal array-based Latin hypercube designs (OALHDs). At all stages of the sequential design, good univariate stratification is achieved. The structure of the FFLHDs also tends to produce uniformity in higher dimensions, especially at certain stages of the design. We show that our batch sequential design approach has good sampling and fitting qualities through both empirical studies and theoretical arguments. Supplementary materials are available online.

Journal ArticleDOI
TL;DR: A case study of a natural history model that has been used to characterize UK bowel cancer incidence is presented, and sensitivity to different discrepancy specifications is investigated at little computational cost.
Abstract: We calibrate a stochastic computer simulation model of “moderate” computational expense. The simulator is an imperfect representation of reality, and we recognize this discrepancy to ensure a reliable calibration. The calibration model combines a Gaussian process emulator of the likelihood surface with importance sampling. Changing the discrepancy specification changes only the importance weights, which lets us investigate sensitivity to different discrepancy specifications at little computational cost. We present a case study of a natural history model that has been used to characterize UK bowel cancer incidence. Datasets and computer code are provided as supplementary material.

Journal ArticleDOI
TL;DR: A new approach to the multiple testing problem is introduced, its advantages over existing methods are demonstrated, and its performance is illustrated in an application to semiconductor wafer fabrication.
Abstract: Motivated by applications to root-cause identification of faults in multistage manufacturing processes that involve a large number of tools or equipment at each stage, we consider multiple testing in regression models whose outputs represent the quality characteristics of a multistage manufacturing process. Because of the large number of input variables that correspond to the tools or equipments used, this falls in the framework of regression modeling in the modern era of big data. On the other hand, with quick fault detection and diagnosis followed by tool rectification, sparsity can be assumed in the regression model. We introduce a new approach to address the multiple testing problem and demonstrate its advantages over existing methods. We also illustrate its performance in an application to semiconductor wafer fabrication that motivated this development. Supplementary materials for this article are available online.

Journal ArticleDOI
TL;DR: A new method for selecting design points over nonconvex regions that is based on the application of multidimensional scaling to the geodesic distance is proposed.
Abstract: Modeling a response over a nonconvex design region is a common problem in diverse areas such as engineering and geophysics. The tools available to model and design for such responses are limited and have received little attention. We propose a new method for selecting design points over nonconvex regions that is based on the application of multidimensional scaling to the geodesic distance. Optimal designs for prediction are described, with special emphasis on Gaussian process models, followed by a simulation study and an application in glaciology. Supplementary materials for this article are available online.
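
A compact sketch of the pipeline on an assumed L-shaped region: geodesic distances are approximated by shortest paths through a radius-neighbor graph, metric multidimensional scaling embeds the candidates, and a greedy maximin exchange picks design points in the embedded space. The region, graph radius, and selection rule are illustrative, and the greedy step is a simple surrogate for the optimal-design criteria discussed in the article.

```python
# Design over a nonconvex (L-shaped) region: approximate geodesic distances
# with shortest paths on a neighbor graph, embed them by metric MDS, then
# greedily pick an approximately maximin design in the embedding.
import numpy as np
from sklearn.neighbors import radius_neighbors_graph
from scipy.sparse.csgraph import shortest_path
from sklearn.manifold import MDS

rng = np.random.default_rng(9)
# candidate points in an L-shaped region (unit square minus upper-right block)
cand = rng.uniform(size=(800, 2))
cand = cand[~((cand[:, 0] > 0.5) & (cand[:, 1] > 0.5))]

# geodesic distances: shortest paths through a radius-neighbor graph
graph = radius_neighbors_graph(cand, radius=0.08, mode="distance")
geo = shortest_path(graph, method="D", directed=False)
assert np.isfinite(geo).all(), "graph disconnected: increase the radius"

# metric MDS embedding of the geodesic distances
emb = MDS(n_components=2, dissimilarity="precomputed",
          random_state=0).fit_transform(geo)

# greedy maximin selection of 10 design points in the embedded space
n_design = 10
chosen = [int(np.argmin(np.abs(emb).sum(axis=1)))]   # start near the centroid
for _ in range(n_design - 1):
    d = np.min(np.linalg.norm(emb[:, None, :] - emb[chosen][None, :, :],
                              axis=-1), axis=1)
    chosen.append(int(np.argmax(d)))

print("selected design points (original coordinates):")
print(np.round(cand[chosen], 3))
```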

Journal ArticleDOI
TL;DR: Compound criteria, which combine the inference criteria with traditional point estimation criteria, are used and the designs obtained are shown to compromise between point estimation and inference.
Abstract: It is increasingly recognized that many industrial and engineering experiments use split-plot or other multi-stratum structures. Much recent work has concentrated on finding optimum, or near-optimum, designs for estimating the fixed effects parameters in multi-stratum designs. However, often inference, such as hypothesis testing or interval estimation, will also be required and for inference to be unbiased in the presence of model uncertainty requires pure error estimates of the variance components. Most optimal designs provide few, if any, pure error degrees of freedom. Gilmour and Trinca (2012) introduced design optimality criteria for inference in the context of completely randomized and block designs. Here these criteria are used stratum-by-stratum to obtain multi-stratum designs. It is shown that these designs have better properties for performing inference than standard optimum designs. Compound criteria, which combine the inference criteria with traditional point estimation criteria, are al...

Journal ArticleDOI
TL;DR: In this article, the authors propose a procedure for estimating the component lifetime distribution using the aggregated event data from a fleet of systems, where the observed data are a collection of superpositions of renewal processes (SRP), one for each system in the fleet.
Abstract: Maintenance data can be used to make inferences about the lifetime distribution of system components. Typically, a fleet contains multiple systems. Within each system, there is a set of nominally identical replaceable components of particular interest (e.g., 2 automobile headlights, 8 dual in-line memory module (DIMM) modules in a computing server, 16 cylinders in a locomotive engine). For each component replacement event, there is system-level information that a component was replaced, but no information on which particular component was replaced. Thus, the observed data are a collection of superpositions of renewal processes (SRP), one for each system in the fleet. This article proposes a procedure for estimating the component lifetime distribution using the aggregated event data from a fleet of systems. We show how to compute the likelihood function for the collection of SRPs and provide suggestions for efficient computations. We compare performance of this incomplete-data maximum likelihood (M...
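
The simulation below merely illustrates the observed data structure assumed in the article: each system carries several nominally identical components following their own renewal processes (Weibull lifetimes here, with arbitrary parameters), but only the pooled replacement times per system are recorded, without component labels.

```python
# Simulate superpositions of renewal processes: per system, several components
# each follow a renewal process, but only the merged (superposed) replacement
# times are observed, without component labels.
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
n_systems, n_components, horizon = 5, 8, 1_000.0
shape, scale = 2.0, 400.0                      # Weibull component lifetimes

fleet_events = []
for _ in range(n_systems):
    pooled = []
    for _ in range(n_components):
        t = 0.0
        while True:                            # one component's renewal process
            t += stats.weibull_min.rvs(shape, scale=scale, random_state=rng)
            if t > horizon:
                break
            pooled.append(t)                   # replacement time, label dropped
    fleet_events.append(np.sort(pooled))       # superposed process per system

for i, events in enumerate(fleet_events):
    print(f"system {i}: {len(events)} replacements, first few at",
          np.round(events[:3], 1))
```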

Journal ArticleDOI
TL;DR: Bayesian D-optimal designs, designs created algorithmically to optimize estimation capacity over various model spaces, and orthogonal designs are compared by estimation-based criteria and simulation.
Abstract: This article presents a comparison of criteria used to characterize two-level designs for screening purposes. To articulate the relationships among criteria, we focus on 7-factor designs with 16–32 runs and 11-factor designs with 20–48 runs. Screening based on selected designs for each of the run sizes considered is studied with simulation using a forward selection procedure and the Dantzig selector. This article compares Bayesian D-optimal designs, designs created algorithmically to optimize estimation capacity over various model spaces, and orthogonal designs by estimation-based criteria and simulation. In this way, we furnish both general insights regarding various design approaches, as well as a guide to make a choice among a few final candidate designs. Supplementary materials for this article are available online.

Journal ArticleDOI
TL;DR: A Bayesian approach is proposed that first imposes a normal prior on the large space of linear coefficients and then applies the Markov chain Monte Carlo algorithm to generate posterior samples for predictions, from which Bayesian credible intervals can be obtained to assess prediction uncertainty.
Abstract: A numerical method, called overcomplete basis surrogate method (OBSM), was recently proposed, which employs overcomplete basis functions to achieve sparse representations. While the method can handle nonstationary response without the need of inverting large covariance matrices, it lacks the capability to quantify uncertainty in predictions. We address this issue by proposing a Bayesian approach that first imposes a normal prior on the large space of linear coefficients, then applies the Markov chain Monte Carlo (MCMC) algorithm to generate posterior samples for predictions. From these samples, Bayesian credible intervals can then be obtained to assess prediction uncertainty. A key application for the proposed method is the efficient construction of sequential designs. Several sequential design procedures with different infill criteria are proposed based on the generated posterior samples. Numerical studies show that the proposed schemes are capable of solving problems of positive point identifica...
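
A minimal sketch of the Bayesian formulation, with two simplifications noted up front: the noise variance is treated as known so the posterior is conjugate and can be sampled directly (no MCMC), and the overcomplete basis is a made-up set of Gaussian bumps at two widths. Pointwise credible intervals for the fitted curve are then read off the posterior draws.

```python
# Normal prior on the coefficients of an overcomplete basis, posterior samples
# of the coefficients, and pointwise credible intervals for predictions.
# Conjugate (known noise variance) sampling is used here instead of MCMC.
import numpy as np

rng = np.random.default_rng(11)
x = np.linspace(0, 1, 80)
y = np.sin(6 * x) + np.where(x > 0.6, 1.0, 0.0) + rng.normal(scale=0.1, size=x.size)

# overcomplete basis: Gaussian bumps at many centers and two widths
centers = np.linspace(0, 1, 40)
widths = np.array([0.03, 0.15])
B = np.hstack([np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * w ** 2))
               for w in widths])               # 80 x 80 design matrix

tau2, sigma2 = 1.0, 0.1 ** 2                   # prior and noise variances
A = B.T @ B / sigma2 + np.eye(B.shape[1]) / tau2
cov = np.linalg.inv(A)
cov = (cov + cov.T) / 2                        # enforce symmetry numerically
mean = cov @ B.T @ y / sigma2

draws = rng.multivariate_normal(mean, cov, size=2_000)   # posterior samples
pred = draws @ B.T                                       # predictions per draw
lo, hi = np.percentile(pred, [2.5, 97.5], axis=0)

print("average 95% credible interval width:", round(np.mean(hi - lo), 3))
```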

Journal ArticleDOI
TL;DR: A general class of models to describe replacement events in a multi-level repairable system by extending the TRP model is proposed and procedures for parameter estimation and the prediction of future events based on historical data are developed.
Abstract: A repairable system is a system that can be restored to an operational state after a repair event. The system may experience multiple events over time that are called recurrent events. To model the recurrent event data, the renewal process (RP), the nonhomogeneous Poisson process (NHPP), and the trend-renewal process (TRP) are often used. Compared to the RP and NHPP, the TRP is more flexible for modeling, because it includes both RP and NHPP as special cases. However, for a multi-level system (e.g., system, subsystem, and component levels), the original TRP model may not be adequate if the repair is effected by a subsystem replacement and if subsystem-level replacement events affect the rate of occurrence of the component-level replacement events. In this article, we propose a general class of models to describe replacement events in a multi-level repairable system by extending the TRP model. We also develop procedures for parameter estimation and the prediction of future events based on historical...

Journal ArticleDOI
TL;DR: Using the probability of agreement to quantify the similarity of populations can give a realistic assessment of whether the systems have reliabilities that are sufficiently similar, for practical purposes, to be treated as a homogeneous population.
Abstract: Combining information from different populations to improve precision, simplify future predictions, or improve underlying understanding of relationships can be advantageous when considering the reliability of several related sets of systems. Using the probability of agreement to quantify the similarity of populations gives a realistic assessment of whether the systems have reliabilities that are sufficiently similar for practical purposes to be treated as a homogeneous population. The new method is described and illustrated with an example involving two generations of a complex system, where the reliability is modeled using either a logistic or probit regression model. Note that supplementary materials including code, datasets, and added discussion are available online.

Journal Article
TL;DR: This work proposes an interval estimation method for the quantiles of an IG distribution based on the generalized pivotal quantity method, and develops procedures for point estimation and interval estimation of parameters.
Abstract: Supplementary material to "Estimation of Field Reliability Based on Aggregate Lifetime Data"


Journal ArticleDOI
TL;DR: Two new estimators, inspired by the minimum distance estimation and the M-estimation in the linear regression, are proposed for the GPD parameters and are shown to perform well for all values of k under small and moderate sample sizes.
Abstract: The generalized Pareto distribution (GPD) is widely used for extreme values over a threshold. Most existing methods for parameter estimation either perform unsatisfactorily when the shape parameter k is larger than 0.5, or they suffer from heavy computation as the sample size increases. In view of the fact that k > 0.5 is occasionally seen in numerous applications, including two illustrative examples used in this study, we remedy the deficiencies of existing methods by proposing two new estimators for the GPD parameters. The new estimators are inspired by the minimum distance estimation and the M-estimation in the linear regression. Through comprehensive simulation, the estimators are shown to perform well for all values of k under small and moderate sample sizes. They are comparable to the existing methods for k ≤ 0.5.
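
To illustrate the minimum-distance idea that the new estimators build on, the sketch below fits the GPD by minimizing a Cramér–von Mises distance and compares the result with scipy's maximum-likelihood fit on simulated data with shape 0.8; this is a generic illustration, not the article's two estimators.

```python
# Generic minimum-distance (Cramer-von Mises) estimation of the generalized
# Pareto distribution, alongside scipy's maximum-likelihood fit, on simulated
# heavy-tailed exceedances (shape k = 0.8, scipy parameter `c`).
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(12)
x = stats.genpareto.rvs(c=0.8, scale=2.0, size=300, random_state=rng)

def cvm_objective(params, data):
    c, scale = params
    if scale <= 0:
        return np.inf
    u = stats.genpareto.cdf(np.sort(data), c=c, scale=scale)
    n = len(data)
    i = np.arange(1, n + 1)
    return 1.0 / (12 * n) + np.sum((u - (2 * i - 1) / (2 * n)) ** 2)

res = optimize.minimize(cvm_objective, x0=[0.1, 1.0], args=(x,),
                        method="Nelder-Mead")
mle_c, _, mle_scale = stats.genpareto.fit(x, floc=0)

print("minimum-distance estimate (shape, scale):", np.round(res.x, 3))
print("maximum-likelihood estimate (shape, scale):",
      (round(mle_c, 3), round(mle_scale, 3)))
```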

Journal ArticleDOI
TL;DR: In this article, the authors developed an automated and optimized detection procedure that mimics these operations, which reduces the number of images requiring expert visual evaluation by filtering out outliers and eliminating human-factors variability.
Abstract: Nondestructive evaluation (NDE) techniques are widely used to detect flaws in critical components of systems like aircraft engines, nuclear power plants, and oil pipelines to prevent catastrophic events. Many modern NDE systems generate image data. In some applications, an experienced inspector performs the tedious task of visually examining every image to provide accurate conclusions about the existence of flaws. This approach is labor-intensive and can cause misses due to operator ennui. Automated evaluation methods seek to eliminate human-factors variability and improve throughput. Simple methods based on peak amplitude in an image are sometimes employed and a trained-operator-controlled refinement that uses a dynamic threshold based on signal-to-noise ratio (SNR) has also been implemented. We develop an automated and optimized detection procedure that mimics these operations. The primary goal of our methodology is to reduce the number of images requiring expert visual evaluation by filtering o...
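
A minimal version of the SNR-screening step described above: background statistics are estimated from the image border, a per-pixel SNR is formed, and the image is passed to an inspector only if the peak SNR exceeds a threshold. The synthetic image, background region, and threshold are all arbitrary choices for illustration.

```python
# Minimal SNR-based screen for an NDE-style image: estimate noise from the
# border, flag the image for expert review only if the peak signal-to-noise
# ratio exceeds a chosen threshold.
import numpy as np

rng = np.random.default_rng(13)
img = rng.normal(loc=10.0, scale=1.0, size=(128, 128))
img[60:68, 60:68] += 8.0                      # synthetic flaw indication

border = np.concatenate([img[:8].ravel(), img[-8:].ravel(),
                         img[:, :8].ravel(), img[:, -8:].ravel()])
mu, sd = border.mean(), border.std()

snr = (img - mu) / sd
peak_snr = snr.max()
threshold = 5.0                               # illustrative threshold

print(f"peak SNR = {peak_snr:.1f} ->",
      "send to inspector" if peak_snr > threshold else "filter out")
```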

Journal Article
TL;DR: A fast algorithm for constructing efficient two-level foldover (EFD) designs is presented; the resulting designs have equal or greater efficiency for estimating the ME model than competitive designs in the literature, and the algorithmic approach allows the fast construction of designs with many more factors and/or runs.
Abstract: Supplementary material to "Benefits and Fast Construction of Efficient Two-Level Foldover Designs"