
Showing papers in "Technometrics in 2013"


Journal ArticleDOI
TL;DR: In this article, a quantile-based criterion for the sequential design of experiments, in the fashion of the classical expected improvement criterion, which allows an elegant treatment of heterogeneous response precisions, is proposed.
Abstract: This article addresses the issue of kriging-based optimization of stochastic simulators. Many of these simulators depend on factors that tune the level of precision of the response, the gain in accuracy being at a price of computational time. The contribution of this work is two-fold: first, we propose a quantile-based criterion for the sequential design of experiments, in the fashion of the classical expected improvement criterion, which allows an elegant treatment of heterogeneous response precisions. Second, we present a procedure for the allocation of the computational time given to each measurement, allowing a better distribution of the computational effort and increased efficiency. Finally, the optimization method is applied to an original application in nuclear criticality safety. This article has supplementary material available online. The proposed criterion is available in the R package DiceOptim.
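A rough sketch of a quantile-based improvement score of this kind is shown below, assuming a GP emulator has already supplied predictive means and standard deviations at candidate points; it is a simplified plug-in version with an assumed future-noise variance `tau2`, not the exact criterion implemented in DiceOptim.

```python
import numpy as np
from scipy.stats import norm

def quantile_improvement(mu, sd, tau2, q_best, beta=0.9):
    """Improvement of the predicted beta-quantile over the best quantile so far.

    mu, sd : GP posterior mean and standard deviation at candidate points
             (minimization is assumed)
    tau2   : assumed noise variance of a future measurement
    q_best : lowest beta-quantile observed so far
    """
    s = np.sqrt(sd**2 + tau2)
    q = mu + norm.ppf(beta) * s          # pessimistic beta-quantile of the prediction
    z = (q_best - q) / s
    # expected-improvement-style score applied to the quantile (plug-in approximation)
    return (q_best - q) * norm.cdf(z) + s * norm.pdf(z)

# example: the summaries (mu, sd) would come from a fitted heteroscedastic GP emulator
mu = np.array([0.20, 0.00, -0.10])
sd = np.array([0.05, 0.30, 0.20])
scores = quantile_improvement(mu, sd, tau2=0.01, q_best=0.15)
```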

192 citations


Journal ArticleDOI
TL;DR: Nonseparable covariance structures for Gaussian process emulators are developed, based on the linear model of coregionalization and convolution methods, finding that only emulators with nonseparable covariance structures have sufficient flexibility both to give good predictions and to represent joint uncertainty about the simulator outputs appropriately.
Abstract: The Gaussian process regression model is a popular type of “emulator” used as a fast surrogate for computationally expensive simulators (deterministic computer models). For simulators with multivariate output, common practice is to specify a separable covariance structure for the Gaussian process. Though computationally convenient, this can be too restrictive, leading to poor performance of the emulator, particularly when the different simulator outputs represent different physical quantities. Also, treating the simulator outputs as independent can lead to inappropriate representations of joint uncertainty. We develop nonseparable covariance structures for Gaussian process emulators, based on the linear model of coregionalization and convolution methods. Using two case studies, we compare the performance of these covariance structures both with standard separable covariance structures and with emulators that assume independence between the outputs. In each case study, we find that only emulators with nonseparable covariance structures have sufficient flexibility both to give good predictions and to represent joint uncertainty about the simulator outputs appropriately.
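A minimal sketch of the linear model of coregionalization, one of the two nonseparable constructions mentioned above, with illustrative kernels and coregionalization matrices rather than those fitted in the case studies:

```python
import numpy as np

def rbf(X1, X2, lengthscale):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def lmc_covariance(X1, X2, B_list, lengthscales):
    """K((x, i), (x', j)) = sum_k B_k[i, j] * k_k(x, x'), with outputs varying slowest."""
    return sum(np.kron(B, rbf(X1, X2, ls))        # B_k: p x p positive semidefinite
               for B, ls in zip(B_list, lengthscales))

# example: 2 outputs built from 2 latent processes with different lengthscales
X = np.random.rand(5, 3)
B1 = np.array([[1.0, 0.8], [0.8, 1.0]])
B2 = np.array([[0.5, -0.2], [-0.2, 0.7]])
K = lmc_covariance(X, X, [B1, B2], lengthscales=[0.3, 1.0])   # shape (10, 10)
```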

165 citations


Journal ArticleDOI
TL;DR: This work introduces a computationally scalable method for detecting small anomalous areas in a large, time-dependent computer network, motivated by the challenge of identifying intruders operating inside enterprise-sized computer networks.
Abstract: We introduce a computationally scalable method for detecting small anomalous areas in a large, time-dependent computer network, motivated by the challenge of identifying intruders operating inside enterprise-sized computer networks. Time-series of communications between computers are used to detect anomalies, and are modeled using Markov models that capture the bursty, often human-caused behavior that dominates a large subset of the time-series. Anomalies in these time-series are common, and the network intrusions we seek involve coincident anomalies over multiple connected pairs of computers. We show empirically that each time-series is nearly always independent of the time-series of other pairs of communicating computers. This independence is used to build models of normal activity in local areas from the models of the individual time-series, and these local areas are designed to detect the types of intrusions we are interested in. We define a locality statistic calculated by testing for deviations from...
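A toy version of the per-edge modeling idea, assuming a binarized activity series per communicating pair of computers; the windowing and scoring choices are illustrative and much simpler than the locality statistics developed in the article.

```python
import numpy as np

def fit_markov(x):
    """x: 0/1 array of activity per time bin. Returns a smoothed 2x2 transition matrix."""
    counts = np.ones((2, 2))                      # add-one smoothing
    for a, b in zip(x[:-1], x[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def loglik(x, P):
    return sum(np.log(P[a, b]) for a, b in zip(x[:-1], x[1:]))

history = (np.random.rand(500) < 0.1).astype(int)   # mostly quiet edge
P = fit_markov(history)
window = (np.random.rand(50) < 0.5).astype(int)     # unusually bursty new window
score = loglik(window, P)                           # low log-likelihood => candidate anomaly
```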

136 citations


Journal ArticleDOI
TL;DR: A framework for sequential design and analysis of a pair of high-accuracy and low-accuracy computer codes is proposed, and a nested relationship between the two scenario sets makes it easier to model and calibrate the difference between the two sources.
Abstract: A growing trend in engineering and science is to use multiple computer codes with different levels of accuracy to study the same complex system. We propose a framework for sequential design and analysis of a pair of high-accuracy and low-accuracy computer codes. It first runs the two codes with a pair of nested Latin hypercube designs (NLHDs). Data from the initial experiment are used to fit a prediction model. If the accuracy of the fitted model is less than a prespecified threshold, the two codes are evaluated again with input values chosen in an elaborate fashion so that their expanded scenario sets still form a pair of NLHDs. The nested relationship between the two scenario sets makes it easier to model and calibrate the difference between the two sources. If necessary, this augmentation process can be repeated a number of times until the prediction model based on all available data has reasonable accuracy. The effectiveness of the proposed method is illustrated with several examples. Matlab codes are...

128 citations


Journal ArticleDOI
TL;DR: In this article, a method for principal component analysis that is sparse and robust at the same time is proposed, where the sparsity delivers principal components that have loadings on a small number of variables, making them easier to interpret.
Abstract: A method for principal component analysis is proposed that is sparse and robust at the same time. The sparsity delivers principal components that have loadings on a small number of variables, making them easier to interpret. The robustness makes the analysis resistant to outlying observations. The principal components correspond to directions that maximize a robust measure of the variance, with an additional penalty term to take sparseness into account. We propose an algorithm to compute the sparse and robust principal components. The algorithm computes the components sequentially, and thus it can handle datasets with more variables than observations. The method is applied on several real data examples, and diagnostic plots for detecting outliers and for selecting the degree of sparsity are provided. A simulation experiment studies the effect on statistical efficiency by requiring both robustness and sparsity. Supplementary materials are available online on the journal web site.
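The sketch below only illustrates the objective being maximized (a robust spread measure of the projected data minus an L1 penalty), using a crude random search in place of the article's sequential algorithm; the MAD scale and penalty weight are illustrative choices.

```python
import numpy as np

def mad(x):
    return 1.4826 * np.median(np.abs(x - np.median(x)))

def sparse_robust_direction(X, lam=0.2, n_cand=2000, seed=0):
    """Pick the best of many random unit directions under a robust-variance minus L1 score."""
    rng = np.random.default_rng(seed)
    best, best_val = None, -np.inf
    for _ in range(n_cand):
        a = rng.laplace(size=X.shape[1])          # heavy-tailed draws favor sparse-ish directions
        a /= np.linalg.norm(a)
        val = mad(X @ a) ** 2 - lam * np.abs(a).sum()
        if val > best_val:
            best, best_val = a, val
    return best

X = np.random.standard_t(df=3, size=(100, 10))    # outlier-prone data
a1 = sparse_robust_direction(X)                   # first sparse, robust "component"
```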

90 citations


Journal ArticleDOI
TL;DR: This work combines field observations and model runs from deterministic multifidelity computer simulators to build a predictive model for the real process, which can be used to perform sensitivity analysis for the system, solve inverse problems, and make predictions.
Abstract: Computer simulators are widely used to describe and explore physical processes. In some cases, several simulators are available, each with a different degree of fidelity, for this task. In this work, we combine field observations and model runs from deterministic multifidelity computer simulators to build a predictive model for the real process. The resulting model can be used to perform sensitivity analysis for the system, solve inverse problems, and make predictions. Our approach is Bayesian and is illustrated through a simple example, as well as a real application in predictive science at the Center for Radiative Shock Hydrodynamics at the University of Michigan. The Matlab code that is used for the analyses is available from the online supplementary materials.

88 citations


Journal ArticleDOI
TL;DR: A general method for prediction using failure-time data with dynamic covariate information to make field-failure predictions is provided and a metric is defined to quantify the improvements in prediction accuracy obtained by using dynamic information.
Abstract: Modern technological developments, such as smart chips, sensors, and wireless networks, have changed many data-collection processes. For example, there are more and more products being produced with automatic data-collecting devices that track how and under which environments the products are being used. Although there is a tremendous amount of dynamic data being collected, there has been little research on using such data to provide more accurate reliability information for products and systems. Motivated by a warranty-prediction application, this article focuses on using failure-time data with dynamic covariate information to make field-failure predictions. We provide a general method for prediction using failure-time data with dynamic covariate information. The dynamic covariate information is incorporated into the failure-time distribution through a cumulative exposure model. We develop a procedure to predict field-failure returns up to a specified future time. This procedure accounts for unit-to-unit...
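A small sketch of a cumulative exposure model of the kind described above, assuming a Weibull baseline and a single hypothetical covariate process; the parameter values are placeholders, not estimates from the warranty application.

```python
import numpy as np

def cumulative_exposure(times, x, beta):
    """u(t) = integral_0^t exp(beta * x(s)) ds on a grid (trapezoid rule)."""
    rate = np.exp(beta * x)
    increments = 0.5 * (rate[1:] + rate[:-1]) * np.diff(times)
    return np.concatenate([[0.0], np.cumsum(increments)])

def failure_prob(times, x, beta, shape, scale):
    """P(T <= t) under a Weibull baseline applied to cumulative exposure."""
    u = cumulative_exposure(times, x, beta)
    return 1.0 - np.exp(-(u / scale) ** shape)

t = np.linspace(0, 100, 201)
usage = 1.0 + 0.5 * np.sin(t / 10.0)              # hypothetical dynamic covariate history
F = failure_prob(t, usage, beta=0.8, shape=2.0, scale=50.0)
```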

69 citations


Journal ArticleDOI
TL;DR: In this paper, a new distribution-free Phase I control chart for retrospectively monitoring multivariate data is developed, which can be applied to individual or subgrouped data for detection of location shifts with an arbitrary pattern (e.g., isolated, transitory, sustained, progressive, etc.).
Abstract: In this study, a new distribution-free Phase I control chart for retrospectively monitoring multivariate data is developed. The suggested approach, based on the multivariate signed ranks, can be applied to individual or subgrouped data for detection of location shifts with an arbitrary pattern (e.g., isolated, transitory, sustained, progressive, etc.). The procedure is complemented with a LASSO-based post-signal diagnostic method for identification of the shifted variables. A simulation study shows that the method compares favorably with parametric control charts when the process is normally distributed, and largely outperforms other multivariate nonparametric control charts when the process distribution is skewed or heavy-tailed. An R package can be found in the supplementary material.

58 citations


Journal ArticleDOI
TL;DR: This article develops a GP-based inverse method that allows for the direct estimation of the derivative of a one-dimensional curve by viewing this procedure as an inverse problem.
Abstract: Gaussian process (GP) models provide nonparametric methods to fit continuous curves observed with noise. In this article, we develop a GP-based inverse method that allows for the direct estimation of the derivative of a one-dimensional curve. In principle, a GP model may be fit to the data directly, with the derivatives obtained by means of differentiation of the correlation function. However, it is known that this approach can be inadequate due to loss of information when differentiating. We present a new method of obtaining the derivative process by viewing this procedure as an inverse problem. We use the properties of a GP to obtain a computationally efficient fit. We illustrate our method with simulated data as well as apply it to an important cosmological application. We include a discussion on model comparison techniques for assessing the quality of the fit of this alternative method. Supplementary materials for this article are available online.
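For context, the sketch below implements the "direct" route mentioned in the abstract: fit a GP and differentiate the RBF covariance to get the derivative's posterior mean. It is the baseline the article improves on, not the inverse-problem method itself, and the hyperparameters are fixed by hand.

```python
import numpy as np

def rbf(a, b, ell):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

def gp_derivative_mean(x, y, xstar, ell=0.3, sigma2=1e-2):
    K = rbf(x, x, ell) + sigma2 * np.eye(len(x))
    alpha = np.linalg.solve(K, y)
    # d/dx* k(x*, x_i) = -((x* - x_i) / ell^2) * k(x*, x_i)
    dk = -(xstar[:, None] - x[None, :]) / ell**2 * rbf(xstar, x, ell)
    return dk @ alpha

x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + 0.05 * np.random.randn(40)
xs = np.linspace(0, 1, 100)
dfdx = gp_derivative_mean(x, y, xs)               # compare against 2*pi*cos(2*pi*xs)
```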

46 citations


Journal ArticleDOI
TL;DR: A Bayesian approach for assessing the reliability of multicomponent systems is proposed; the models allow evaluation of system, subsystem, and component reliability using multilevel information.
Abstract: We propose a Bayesian approach for assessing the reliability of multicomponent systems. Our models allow us to evaluate system, subsystem, and component reliability using multilevel information. Data are collected over time, and include binary, lifetime, and degradation data. We illustrate the methodology through two examples and discuss extensions. Supplementary materials are available online.
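A toy illustration of multilevel information flowing upward, assuming independent components in series and only binary pass/fail data; the article's models are far richer (lifetime and degradation data, subsystem structure), none of which is shown here.

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical (successes, trials) pass/fail data for two components
comp_data = [(48, 50), (29, 30)]
# independent Beta(1 + s, 1 + n - s) posteriors under uniform priors
draws = [rng.beta(1 + s, 1 + n - s, size=10_000) for s, n in comp_data]
system_rel = draws[0] * draws[1]                  # series system: both components must work
lo, hi = np.percentile(system_rel, [2.5, 97.5])   # 95% credible interval for system reliability
```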

42 citations


Journal ArticleDOI
TL;DR: A robust-DA design approach is developed that combines the best of the standard empirical DOE approach with the suggested design strategy to protect against the possibility that the analyst might omit a key explanatory variable, leading to an incorrect DA model.
Abstract: Dimensional analysis (DA) is a fundamental method in the engineering and physical sciences for analytically reducing the number of experimental variables affecting a given phenomenon prior to experimentation. Two powerful advantages associated with the method relative to standard design of experiments (DOE) approaches are (a) a priori dimension reduction and (b) scalability of results. The latter advantage permits the experimenter to effectively extrapolate results to similar experimental systems of differing scale. Unfortunately, DA experiments are underused because very few statisticians are familiar with them. In this article, we first provide an overview of DA and give basic recommendations for designing DA experiments. Next, we consider various risks associated with the DA approach, foremost among them the possibility that the analyst might omit a key explanatory variable, leading to an incorrect DA model. When this happens, the DA model will fail and experimentation will be largely wasted. To ...
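The a priori dimension reduction comes from Buckingham-Pi reasoning: dimensionless groups span the null space of the dimension matrix. The sketch below uses the textbook drag-on-a-sphere example, not a system from the article.

```python
import numpy as np
from scipy.linalg import null_space

variables = ["F", "v", "d", "rho", "mu"]          # force, velocity, diameter, density, viscosity
# rows are M, L, T; columns are the dimension exponents of each variable
D = np.array([[ 1,  0, 0,  1,  1],
              [ 1,  1, 1, -3, -1],
              [-2, -1, 0,  0, -1]])
pi_basis = null_space(D)                          # 5 x 2: two dimensionless groups expected
print(pi_basis.round(3))                          # each column: exponents of one Pi group
```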

Journal ArticleDOI
TL;DR: In this article, the authors used the approximate large-sample variance covariance matrix of the parameters of a mixed effects linear regression model for repeated measures degradation data to assess the effect of sample size on estimation precision of both degradation and failure-time distribution quantiles.
Abstract: Repeated measures degradation studies are used to assess product or component reliability when there are few or even no failures expected during a study. Such studies are often used to assess the shelf life of materials, components, and products. We show how to evaluate the properties of proposed test plans. Such evaluations are needed to identify statistically efficient tests. We consider test plans for applications where parameters related to the degradation distribution or the related lifetime distribution are to be estimated. We use the approximate large-sample variance–covariance matrix of the parameters of a mixed effects linear regression model for repeated measures degradation data to assess the effect of sample size (number of units and number of measurements within the units) on estimation precision of both degradation and failure-time distribution quantiles. We also illustrate the complementary use of simulation-based methods for evaluating and comparing test plans. These test-planning methods ...

Journal ArticleDOI
Matt Taddy1
TL;DR: In this article, the authors present a short case study in text analysis: the scoring of Twitter posts for positive, negative, or neutral sentiment directed toward particular US politicians, and outline a new technique for predicting both generic and subject-specific document sentiment through the use of variable interactions in multinomial inverse regression.
Abstract: This article presents a short case study in text analysis: the scoring of Twitter posts for positive, negative, or neutral sentiment directed toward particular US politicians. The study requires selection of a subsample of representative posts for sentiment scoring, a common and costly aspect of sentiment mining. As a general contribution, our application is preceded by a proposed algorithm for maximizing sampling efficiency. In particular, we outline and illustrate greedy selection of documents to build designs that are D-optimal in a topic-factor decomposition of the original text. The strategy is applied to our motivating dataset of political posts, and we outline a new technique for predicting both generic and subject-specific document sentiment through the use of variable interactions in multinomial inverse regression. Results are presented for analysis of 2.1 million Twitter posts collected around February 2012. Computer codes and data are provided as supplementary material online.
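A hedged sketch of greedy D-optimal subsampling in a factor space, with a truncated SVD of term counts standing in for the article's topic-factor decomposition; the document counts and the number of factors are arbitrary.

```python
import numpy as np

def greedy_d_optimal(F, k, ridge=1e-6):
    """F: n x p factor scores; greedily choose k rows maximizing log det of the information matrix."""
    n, p = F.shape
    chosen, M = [], ridge * np.eye(p)
    for _ in range(k):
        best_i, best_val = None, -np.inf
        for i in range(n):
            if i in chosen:
                continue
            val = np.linalg.slogdet(M + np.outer(F[i], F[i]))[1]
            if val > best_val:
                best_i, best_val = i, val
        chosen.append(best_i)
        M += np.outer(F[best_i], F[best_i])
    return chosen

counts = np.random.poisson(1.0, size=(200, 50))   # hypothetical document-term counts
U, s, Vt = np.linalg.svd(counts.astype(float), full_matrices=False)
F = U[:, :5] * s[:5]                              # 5 "topic" factors (SVD stand-in)
to_score = greedy_d_optimal(F, k=20)              # documents to send for sentiment scoring
```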

Journal ArticleDOI
TL;DR: This work presents a computationally efficient approach to estimating the calibration parameters using a criterion that measures discrepancy between the computer model output and field data and is able to estimate calibration parameters for large and nonstationary data.
Abstract: Computer models enable scientists to investigate real-world phenomena in a virtual laboratory using computer experiments. Statistical calibration enables scientists to incorporate field data in this analysis. However, the practical application is hardly straightforward for data structures such as spatial-temporal fields, which are usually large or not well represented by a stationary process model. We present a computationally efficient approach to estimating the calibration parameters using a criterion that measures discrepancy between the computer model output and field data. One can then construct empirical distributions for the calibration parameters and propose new computer model trials using sequential design. The approach is relatively simple to implement using existing algorithms and is able to estimate calibration parameters for large and nonstationary data. Supplementary R code is available online.
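The core idea, stripped of emulation, empirical distributions, and sequential design, is to pick calibration parameters that minimize a discrepancy between simulator output and field data; the toy simulator and criterion below are placeholders, not the article's.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def simulator(x, theta):                          # hypothetical cheap simulator
    return np.sin(theta * x)

x_field = np.linspace(0, 3, 40)
y_field = np.sin(1.7 * x_field) + 0.05 * np.random.randn(40)   # synthetic field data

def discrepancy(theta):
    return np.mean((simulator(x_field, theta) - y_field) ** 2)

res = minimize_scalar(discrepancy, bounds=(0.5, 3.0), method="bounded")
theta_hat = res.x                                 # should land near 1.7
```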

Journal ArticleDOI
TL;DR: The connection between and equivalence of three sparse linear discriminant analysis methods is revealed and it is shown that, for any sequence of penalization parameters, the normalized solutions of DSDA equal the normalized solutions of the other two methods at different penalization parameters.
Abstract: In this article, we reveal the connection between and equivalence of three sparse linear discriminant analysis methods: the l1-Fisher’s discriminant analysis proposed by Wu et al. in 2008, the sparse optimal scoring proposed by Clemmensen et al. in 2011, and the direct sparse discriminant analysis (DSDA) proposed by Mai et al. in 2012. It is shown that, for any sequence of penalization parameters, the normalized solutions of DSDA equal the normalized solutions of the other two methods at different penalization parameters. A prostate cancer dataset is used to demonstrate the theory.
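As commonly described, DSDA amounts to an l1-penalized least-squares fit of a specially coded class label; the sketch below uses that coding with an off-the-shelf lasso solver purely to illustrate why normalized solutions of such methods can coincide, and is not taken from the article.

```python
import numpy as np
from sklearn.linear_model import Lasso

def dsda_direction(X, y01, alpha=0.1):
    """Sparse discriminant direction from an l1-penalized regression on coded labels."""
    n = len(y01)
    n1, n2 = (y01 == 0).sum(), (y01 == 1).sum()
    z = np.where(y01 == 0, -n / n1, n / n2)       # class coding used in DSDA-style fits
    beta = Lasso(alpha=alpha, fit_intercept=True).fit(X, z).coef_
    return beta / np.linalg.norm(beta) if np.any(beta) else beta

X = np.random.randn(100, 30)
y = (np.random.rand(100) < 0.5).astype(int)
X[y == 1] += 0.8                                  # mean shift for class 1
direction = dsda_direction(X, y)                  # normalized, sparse discriminant vector
```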

Journal ArticleDOI
TL;DR: It is found that the EnKF, which is motivated by the mean and covariance relationship between the model inputs and outputs, can be directly adapted to Bayesian computer model calibration, producing an approximate posterior ensemble of the calibration parameters.
Abstract: Computer model calibration is the process of determining input parameter settings to a computational model that are consistent with physical observations. This is often quite challenging due to the computational demands of running the model. In this article, we use the ensemble Kalman filter (EnKF) for computer model calibration. The EnKF has proven effective in quantifying uncertainty in data assimilation problems such as weather forecasting and ocean modeling. We find that the EnKF can be directly adapted to Bayesian computer model calibration. It is motivated by the mean and covariance relationship between the model inputs and outputs, producing an approximate posterior ensemble of the calibration parameters. While this approach may not fully capture effects due to nonlinearities in the computer model response, its computational efficiency makes it a viable choice for exploratory analyses, design problems, or problems with large numbers of model runs, inputs, and outputs.
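A bare-bones EnKF-style update for calibration parameters, assuming an existing ensemble of parameter draws and matching simulator outputs; the shapes, the perturbed-observation variant, and the observation-error variance are illustrative choices rather than the article's implementation.

```python
import numpy as np

def enkf_update(theta, sim_out, y_obs, obs_var, seed=None):
    """theta: (m, p) parameter ensemble; sim_out: (m, d) matching simulator outputs."""
    rng = np.random.default_rng(seed)
    p = theta.shape[1]
    joint = np.hstack([theta, sim_out])                   # (m, p + d)
    C = np.cov(joint, rowvar=False)                       # joint sample covariance
    C_ty, C_yy = C[:p, p:], C[p:, p:]
    K = C_ty @ np.linalg.inv(C_yy + obs_var * np.eye(C_yy.shape[0]))
    y_pert = y_obs + rng.normal(0.0, np.sqrt(obs_var), size=sim_out.shape)
    return theta + (y_pert - sim_out) @ K.T               # shifted ensemble as approximate posterior

theta0 = np.random.randn(100, 3)                  # 100-member ensemble, 3 calibration parameters
outputs = np.random.randn(100, 4)                 # would come from simulator runs at theta0
theta1 = enkf_update(theta0, outputs, y_obs=np.zeros(4), obs_var=0.1)
```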

Journal ArticleDOI
TL;DR: This article demonstrates how the Pareto front multiple objective optimization approach can be used to select a best allocation of new data to collect from among many different possible data sources with the goal of maximally reducing the width of the credible intervals of system and two subsystem reliability estimates.
Abstract: This article demonstrates how the Pareto front multiple objective optimization approach can be used to select a best allocation of new data to collect from among many different possible data sources with the goal of maximally reducing the width of the credible intervals of system and two subsystem reliability estimates. The method provides a streamlined decision-making process by identifying a set of noninferior or admissible allocations either from a given set of candidate choices or through a global optimization search and then using graphical methods for selecting the best allocation from the set of contending choices based on the specific goals of the study. The approach allows for an easy assessment of the tradeoffs between criteria and the robustness of different choices to different prioritization of experiment objectives. This is important for decision makers to make a defensible choice of a best allocation that matches their priorities as well as to quantify the anticipated advantages of their ch...
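The non-dominated ("Pareto front") filter underlying the approach can be sketched directly; the criteria below are hypothetical credible-interval widths to be minimized, not results from the article.

```python
import numpy as np

def pareto_front(crit):
    """crit: (n, k) criteria matrix, smaller is better. Returns indices of non-dominated rows."""
    keep = []
    for i in range(crit.shape[0]):
        dominated = np.any(np.all(crit <= crit[i], axis=1) &
                           np.any(crit < crit[i], axis=1))
        if not dominated:
            keep.append(i)
    return keep

widths = np.random.rand(50, 3)        # e.g., credible-interval widths: system + two subsystems
admissible = pareto_front(widths)     # candidate allocations worth plotting and comparing
```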

Journal ArticleDOI
TL;DR: It is found that even conventional classifiers exhibit improved performance when the input data have a matched-pair structure, and an example of a “dipole” algorithm to directly exploit this structured input is developed.
Abstract: Following an analogous distinction in statistical hypothesis testing and motivated by chemical plume detection in hyperspectral imagery, we investigate machine-learning algorithms where the training set is comprised of matched pairs. We find that even conventional classifiers exhibit improved performance when the input data have a matched-pair structure, and we develop an example of a “dipole” algorithm to directly exploit this structured input. In some scenarios, matched pairs can be generated from independent samples, with the effect of not only doubling the nominal size of the training set, but of providing the matched-pair structure that leads to better learning. The creation of matched pairs from a dataset of interest also permits a kind of transductive learning, which is found for the plume detection problem to exhibit improved performance. Supplementary materials for this article are available online.

Journal ArticleDOI
TL;DR: This article suggests solving the image registration (IR) problem locally, by first studying the local properties of the mapping transformation, so that singularities can be preserved and the ill-posed nature of the problem (the mapping transformation is not well defined at certain places, including where the true image intensity surface is straight) is handled properly.
Abstract: Image registration (IR) aims to map one image to another of the same scene. It is a fundamental task in many imaging applications. Most existing IR methods assume that the mapping transformation has a parametric form or satisfies certain regularity conditions (e.g., it is a smooth function with continuous first or higher order derivatives). They often estimate the mapping transformation globally by solving a global minimization/maximization problem. Such global smoothing methods usually cannot preserve singularities (e.g., discontinuities) and other features of the mapping transformation well. Further, the ill-posed nature of the IR problem, namely, the mapping transformation is not well defined at certain places, including the place where the true image intensity surface is straight, is not handled properly by such methods. In this article, we suggest solving the IR problem locally, by first studying the local properties of a mapping transformation. To this end, some concepts for describing such local prop...

Journal ArticleDOI
TL;DR: Two statistical approaches that test for differences in mean curves and provide simultaneous confidence bands for the difference are presented: a B-Spline basis approach and a Bayesian hierarchical Gaussian process approach.
Abstract: At Los Alamos National Laboratory, engineers conduct experiments to evaluate how well detonators and high explosives work. The experimental unit, often called an “onionskin,” is a hemisphere consisting of a detonator and a booster pellet surrounded by high explosive material. When the detonator explodes, a streak camera mounted above the pole of the hemisphere records when the shock wave arrives at the surface. The output from the camera is a two-dimensional image that is transformed into a curve that shows the arrival time as a function of polar angle. The statistical challenge is to characterize the population of arrival time curves and to compare the baseline population of onionskins to a new population. The engineering goal is to manufacture a new population of onionskins that generate arrival time curves with the same shape as the baseline. We present two statistical approaches that test for differences in mean curves and provide simultaneous confidence bands for the difference: (i) a B-Spline basis ...

Journal ArticleDOI
TL;DR: This work describes how the analysis can be enhanced to model the effect of tool angle and to allow angle estimation for a tool mark left at a crime scene; with sufficient development, such methods may lead to more defensible forensic analyses.
Abstract: In forensics, fingerprints can be used to uniquely identify suspects in a crime. Similarly, a tool mark left at a crime scene can be used to identify the tool that was used. However, the current practice of identifying matching tool marks involves visual inspection of marks by forensic experts, which can be a very subjective process. As a result, declared matches are often successfully challenged in court. Hence, law enforcement agencies are particularly interested in encouraging research in more objective approaches. Our analysis is based on comparisons of profilometry data, essentially depth contours of a tool mark surface taken along a linear path. Chumbley et al. pointed out that the angle of incidence between the tool and the marked surface can have a substantial impact on the tool mark and on the effectiveness of both manual and algorithmic matching procedures. To better address this problem, we describe how the analysis can be enhanced to model the effect of tool angle and allow for angle estimation...

Journal ArticleDOI
TL;DR: The relationship between minimax designs and the classical set covering location problem (SCLP) in operations research, which is a binary linear program, is established and it is proved that the set of minimax distances is the set of discontinuities of the function that maps the covering radius to the optimal objective function value.
Abstract: The problem of choosing a design that is representative of a finite candidate set is an important problem in computer experiments. The minimax criterion measures the degree of representativeness because it is the maximum distance of a candidate point to the design. This article proposes a method for finding minimax designs for finite design regions. We establish the relationship between minimax designs and the classical set covering location problem (SCLP) in operations research, which is a binary linear program. In particular, we prove that the set of minimax distances is the set of discontinuities of the function that maps the covering radius to the optimal objective function value. We show that solving the SCLP at the points of discontinuities, which can be determined, gives minimax designs. These results are employed to design an efficient procedure for finding minimax designs for small-sized candidate sets. A heuristic procedure is proposed to generate near-minimax designs for large candidate sets. S...
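For contrast with the SCLP formulation, a cheap greedy baseline for near-minimax designs over a finite candidate set is the classic farthest-point (k-center) rule sketched below; it is offered only as a baseline, not as the article's method.

```python
import numpy as np

def greedy_minimax(cand, n_design, start=0):
    """Farthest-point (k-center) heuristic; returns design indices and the covering radius."""
    d = np.linalg.norm(cand[:, None, :] - cand[None, :, :], axis=-1)
    design = [start]
    nearest = d[start].copy()                     # distance of each candidate to the design
    for _ in range(n_design - 1):
        nxt = int(np.argmax(nearest))             # candidate farthest from the current design
        design.append(nxt)
        nearest = np.minimum(nearest, d[nxt])
    return design, nearest.max()

cand = np.random.rand(500, 2)                     # finite candidate set
idx, radius = greedy_minimax(cand, n_design=10)
```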

Journal ArticleDOI
TL;DR: A random-effects model is proposed to allow for variation in the misclassification rates within the populations of conforming and nonconforming parts and to find the asymptotic standard deviation of the estimators with a selected plan size and assumed parameter values for both the standard and the conditional sampling plans.
Abstract: In manufacturing, we often use a binary measurement system (BMS) for 100% inspection to protect customers from receiving nonconforming product. We can assess the performance of a BMS by estimating the consumer's and producer's risks, the two misclassification rates. Here, we consider assessment plans and their analysis when a gold standard system (GSS) is available for the assessment study but is too expensive for everyday use. We propose a random-effects model to allow for variation in the misclassification rates within the populations of conforming and nonconforming parts. One possibility, here denoted the standard plan, is to randomly sample n parts and measure them once with the GSS and r times with the inspection system. We provide a simple analysis and planning advice for standard plans. In practice, the misclassification rates are often low and the underlying process has high capability. This combination of conditions makes the assessment of the BMS challenging. We show that we need a very large nu...
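A quick simulation of the "standard plan" described above, assuming fixed misclassification rates (whereas the article's random-effects model lets them vary across parts); all rates and sample sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 2000, 5                                    # parts, repeated BMS measurements per part
p_nonconforming = 0.02
alpha_true, beta_true = 0.01, 0.05                # P(reject | conforming), P(accept | nonconforming)

conforming = rng.random(n) > p_nonconforming      # gold-standard classification (measured once)
p_reject = np.where(conforming, alpha_true, 1 - beta_true)
rejects = rng.binomial(r, p_reject)               # BMS rejections among r repeated measurements

alpha_hat = rejects[conforming].sum() / (r * conforming.sum())            # producer's risk estimate
beta_hat = (r - rejects)[~conforming].sum() / (r * (~conforming).sum())   # consumer's risk estimate
```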

Journal ArticleDOI
TL;DR: A parallel expectation–maximization (EM) algorithm for multivariate Gaussian mixture models is developed and used to perform model-based clustering of a large climate dataset; a new, faster variant is also presented.
Abstract: We develop a parallel expectation–maximization (EM) algorithm for multivariate Gaussian mixture models and use it to perform model-based clustering of a large climate dataset. Three variants of the EM algorithm are reformulated in parallel and a new variant that is faster is presented. All are implemented using the single program, multiple data programming model, which is able to take advantage of the combined collective memory of large distributed computer architectures to process larger datasets. Displays of the estimated mixture model rather than the data allow us to explore multivariate relationships in a way that scales to arbitrary size data. We study the performance of our methodology on simulated data and apply our methodology to a high-resolution climate dataset produced by the community atmosphere model (CAM5). This article has supplementary material online.
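A serial, vectorized EM pass for a Gaussian mixture is sketched below for reference; the article's contribution is the parallel, distributed reformulation, which is not shown, and the spherical-covariance restriction here is only for brevity.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, k, replace=False)]
    var = np.full(k, X.var())
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities
        dens = np.column_stack([w[j] * multivariate_normal.pdf(X, mu[j], var[j] * np.eye(d))
                                for j in range(k)])
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weights, means, spherical variances
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r.T @ X) / nk[:, None]
        for j in range(k):
            var[j] = (r[:, j] * ((X - mu[j]) ** 2).sum(axis=1)).sum() / (nk[j] * d)
    return w, mu, var

X = np.vstack([np.random.randn(200, 2), np.random.randn(200, 2) + 4])
w, mu, var = em_gmm(X, k=2)
```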

Journal ArticleDOI
TL;DR: In this paper, a nonparametric regression procedure called BSML (Basis Selection from Multiple Libraries) is proposed for estimating a complex function by a linear combination of basis functions adaptively selected from multiple libraries.
Abstract: New nonparametric regression procedures called BSML (Basis Selection from Multiple Libraries) are proposed in this article for estimating a complex function by a linear combination of basis functions adaptively selected from multiple libraries. Different classes of basis functions are chosen to model various features of the function, for example, truncated constants can model change points in the function, while polynomial spline representers may be used to model smooth components. The generalized cross-validation (GCV) and covariance inflation criteria are used to balance goodness of fit and model complexity where the model complexity is estimated adaptively by either the generalized degrees of freedom or covariance penalty. The cross-validation (CV) method is also considered for model selection. Spatially adaptive regression and model selection in multivariate nonparametric regression will be used to illustrate the flexibility and efficiency of the BSML procedures. Extensive simulations show that the BS...
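A heavily simplified greedy analogue of selecting basis functions from multiple libraries: truncated constants for change points plus polynomial terms for the smooth trend, scored by plain GCV. The libraries, step limit, and penalty are illustrative, not the BSML procedures themselves.

```python
import numpy as np

def gcv(y, yhat, df):
    n = len(y)
    return np.mean((y - yhat) ** 2) / (1 - df / n) ** 2

def greedy_select(x, y, n_steps=8):
    smooth = [np.ones_like(x), x, x**2, x**3]                          # polynomial library
    jumps = [(x > t).astype(float) for t in np.linspace(0.1, 0.9, 9)]  # truncated-constant library
    B = np.column_stack(smooth + jumps)
    chosen, best_score = [0], np.inf
    for _ in range(n_steps):
        candidates = []
        for j in range(B.shape[1]):
            cols = sorted(set(chosen + [j]))
            coef, *_ = np.linalg.lstsq(B[:, cols], y, rcond=None)
            candidates.append((gcv(y, B[:, cols] @ coef, df=len(cols)), j))
        score, j = min(candidates)
        if score >= best_score:
            break                                  # no GCV improvement: stop adding basis functions
        best_score, chosen = score, sorted(set(chosen + [j]))
    return chosen, best_score

x = np.linspace(0, 1, 200)
y = np.sin(3 * x) + 0.8 * (x > 0.6) + 0.1 * np.random.randn(200)       # smooth trend + change point
basis_idx, score = greedy_select(x, y)
```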

Journal ArticleDOI
TL;DR: In this paper, the authors present a methodology that is robust to the variance structure of the response of a given compound in quantitative high throughput screening (qHTS) assays.
Abstract: Quantitative high throughput screening (qHTS) assays use cells or tissues to screen thousands of compounds in a short period of time. Data generated from qHTS assays are then evaluated using nonlinear regression models, such as the Hill model, and decisions regarding toxicity are made using the estimates of the parameters of the model. For any given compound, the variability in the observed response may either be constant across dose groups (homoscedasticity) or vary with dose (heteroscedasticity). Since thousands of compounds are simultaneously evaluated in a qHTS assay, it is not practically feasible for an investigator to perform residual analysis to determine the variance structure before performing statistical inferences on each compound. Since it is well known that the variance structure plays an important role in the analysis of linear and nonlinear regression models, it is therefore important to have practically useful and easy to interpret methodology that is robust to the variance structure. Fur...
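For orientation, the Hill model mentioned above can be fitted to a single compound's dose-response data with ordinary nonlinear least squares, which implicitly assumes constant variance; the robust, variance-structure-agnostic analysis the article develops is not reproduced here, and the doses and responses are simulated.

```python
import numpy as np
from scipy.optimize import curve_fit

def hill(dose, bottom, top, ec50, slope):
    return bottom + (top - bottom) / (1.0 + (ec50 / np.maximum(dose, 1e-12)) ** slope)

dose = np.array([0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0])          # hypothetical doses
resp = hill(dose, 0.0, 100.0, 1.0, 1.2) + np.random.normal(0, 5, size=dose.size)
p0 = [resp.min(), resp.max(), np.median(dose), 1.0]                    # rough starting values
params, cov = curve_fit(hill, dose, resp, p0=p0, maxfev=10000)
```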

Journal ArticleDOI
TL;DR: Commentary based on the article "Experimental Design for Engineering Dimensional Analysis" by Mark C. Albrecht, Christopher J. Nachtsheim, Thomas A. Albrecht, and R. Dennis Cook, published in the journal Technometrics, volume 55, issue 3.
Abstract: Commentary based on the article "Experimental Design for Engineering Dimensional Analysis" by Mark C. Albrecht, Christopher J. Nachtsheim, Thomas A. Albrecht, and R. Dennis Cook, published in the journal Technometrics, volume 55, issue 3.

Journal ArticleDOI
TL;DR: A modification to the approximation is proposed that guarantees nonnegativity of the approximated density and is much simpler and faster to compute than the original DoIt approximation.
Abstract: Recently, a new deterministic approximation method, known as design of experiments-based interpolation technique (DoIt), was proposed for Bayesian computation. A major weakness of this method is that the approximated posterior density can become negative. In this technical note, a modification to the approximation is proposed that guarantees nonnegativity of the approximated density. Surprisingly, the new approximation is much simpler and faster to compute than the original DoIt approximation. This article has supplementary materials online.

Journal Article
TL;DR: This article describes preliminary test estimation (PTE)-based methodology that is robust to the variance structure as well as any potential outliers and influential observations in quantitative high throughput screening assays.
Abstract: This is supplementary material for the article Robust Analysis of High Throughput Screening (HTS) Assay Data.

Journal ArticleDOI
TL;DR: A new approach to testing the hypothesis that two sampled distributions are simply location and scale changes of one another is introduced, applicable to both paired data and two-sample data, based on the empirical characteristic function.
Abstract: Motivated by two applications in the mining industry, we introduce a new approach to testing the hypothesis that two sampled distributions are simply location and scale changes of one another. The test, applicable to both paired data and two-sample data, is based on the empirical characteristic function. More conventional techniques founded on the empirical distribution function suffer from serious drawbacks when used to test for location-scale families. In the motivating applications, knowing that the distributions differ only in location and scale has significant operational and economic advantages, enabling protocols for one type of data to be applied directly to another. Supplementary material in the form of Matlab code is available online.
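A sketch of the basic ingredient: standardize each sample for location and scale, compare empirical characteristic functions on a frequency grid, and integrate a weighted squared difference. The robust standardization, grid, and weight are illustrative; the article's test statistic and its calibration are more refined.

```python
import numpy as np

def ecf(x, t):
    """Empirical characteristic function of sample x evaluated at frequencies t."""
    return np.exp(1j * np.outer(t, x)).mean(axis=1)

def location_scale_discrepancy(x, y, t=np.linspace(-3, 3, 121)):
    xs = (x - np.median(x)) / (np.percentile(x, 75) - np.percentile(x, 25))
    ys = (y - np.median(y)) / (np.percentile(y, 75) - np.percentile(y, 25))
    diff = np.abs(ecf(xs, t) - ecf(ys, t)) ** 2
    return (diff * np.exp(-t**2)).sum() * (t[1] - t[0])   # weighted L2-type distance

x = np.random.gamma(2.0, 1.0, 300)
y = 5.0 + 2.0 * np.random.gamma(2.0, 1.0, 300)    # same family, different location and scale
stat = location_scale_discrepancy(x, y)           # small under H0; calibrate, e.g., by permutation
```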