
Showing papers in "Technometrics in 2006"


Journal ArticleDOI
TL;DR: This introductory textbook continues to teach the philosophy of design and analysis of experiments as well as the “nuts and bolts” in a way that is accessible to both students and industrial practitioners and finds clear and well-motivated examples, excellent discussions of underlying statistical concepts and practical guidelines for experimentation.
Abstract: (2006). Statistics for Experimenters: Design, Innovation and Discovery. Technometrics: Vol. 48, No. 2, pp. 303-304.

834 citations


Journal ArticleDOI
TL;DR: Introduction to Linear Models and Statistical Inference is not meant to compete with these texts—rather, its audience is primarily those taking a statistics course within a mathematics department.
Abstract: of the simple linear regression model. Multiple linear regression for two variables is discussed in Chapter 8, and that for more than two variables is covered in Chapter 9. Chapter 10, on model building, is perhaps the book's strongest chapter. The authors provide one of the most intuitive discussions of variable transformations that I have seen. Nice presentations of indicator variables, variable selection, and influence diagnostics are also provided. The final chapter covers a wide variety of topics, including analysis of variance models, logistic regression, and robust regression. The coverage of regression is not matrix-based, but optional linear algebra sections at the end of each chapter are useful for one wishing to use matrices. In general, the writing is clear and conceptual. A good number of exercises (about 20 on average) are provided at the end of each chapter. The exercises emphasize derivations and computations. It is difficult to name direct comparison texts. Certainly, the text by Ott and Longnecker (2001) would be more suitable for a statistical methods course for an interdisciplinary audience. The regression texts of Montgomery, Peck, and Vining (2001) and Mendenhall and Sincich (2003) are more comprehensive in their regression treatment than the reviewed text. However, Introduction to Linear Models and Statistical Inference is not meant to compete with these texts; rather, its audience is primarily those taking a statistics course within a mathematics department.

802 citations


Journal ArticleDOI
TL;DR: The book is a well-organized, extensive treatment of crossed and nested unbalanced random models; its main drawback is that it deals only with completely random univariate models, a shortcoming the authors acknowledge in the Preface while suggesting that a future work covering these topics may be forthcoming.
Abstract: Classification Without Interaction”), and 13 (“Two-Way Crossed Classification With Interaction”). Every chapter contains two or more numerical examples, with the exception of Chapters 14 (“Three-Way and Higher-Order Crossed Classifications”) and 17 (“General r-Way Nested Classification”), which contain only one example each. Examples appear in the estimation, confidence interval, and hypothesis testing sections. The distribution of estimators is discussed only for the models in Chapters 11 and 15 (“Two-Way Nested Classification”). Chapters 11, 13, 15, and 16 (“Three-Way Nested Classification”) contain information on design considerations involving unbalanced experiments. The appendixes contain basic theoretical and methodological results useful in the development of unbalanced random models as well as information on the capabilities of widely available software. Packages discussed are SAS, SPSS, BMDP, S–PLUS, GENSTAT, and BUGS. The book is well organized and focused. It contains extensive coverage of crossed and nested unbalanced models. Because of the number of topics, the depth of coverage is occasionally limited. This is only a minor issue, since a substantial number of references is always given. The organization of the book and the presentation of the material make difficult subject matter easier to follow. The main drawback of the book is that it deals only with completely random univariate models. Given the volume of information in the book, however, this is understandable. The authors point out this shortcoming in the Preface and suggest that a future work covering these topics may be forthcoming. For the application-oriented practitioner, a small disadvantage is that a number of the estimation approaches discussed, while interesting, cannot be found in the more commonly used statistical software packages. Regardless, the book makes an excellent resource for anyone working with unbalanced random models.

713 citations


Journal ArticleDOI
TL;DR: I think that texts which have a large number of programs should write them in a comment-verbose rather than a comment-terse fashion, to help students who have never programmed understand the steps in each program.
Abstract: this would appeal to the applications-oriented user. In a small class this can be used, but I think additional material is required to explain concepts. One cannot expect a text that combines so much material to easily cover everything. There are a few areas that one might emphasize more in teaching. For example, there is too much emphasis on μ and not enough on ε. Analysis of assumptions is discussed in one section for simple linear regression but not for multiple regression or analysis of variance. I found some material to be well explained and the explanation of other material to be a bit brief. For example, the four types of residual plots are discussed, including one to assess independence. The lack of independence is discussed (in one paragraph) without describing potential causes of dependence or giving examples of when it might occur. The lag plot code is given with little comment on what is actually done. I doubt that many students would understand what plot(tmp[-n], tmp[-1]) actually does without further comment. Additional description of the idea of a lag plot appears in an earlier section of the text, but as part of a problem. Referring to the problem number would help, as would additional details on what the code actually does. I think that texts which have a large number of programs should write them in a comment-verbose rather than a comment-terse fashion. Students who have never programmed need as many reminders as possible to help them understand the steps in a program. More comments would greatly improve the book's usefulness as an R aid.
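To make the lag-plot idea concrete, here is a short, comment-verbose sketch of what the quoted one-liner constructs. It is written in Python with numpy and matplotlib rather than R, the data are simulated, and the names tmp and n simply echo the snippet quoted in the review; it is an illustration, not code from the book.

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulate an autocorrelated series so the lag plot shows visible structure.
rng = np.random.default_rng(0)
n = 200
eps = rng.normal(size=n)
tmp = np.empty(n)
tmp[0] = eps[0]
for t in range(1, n):
    # AR(1)-type dependence: each value carries over part of the previous one.
    tmp[t] = 0.7 * tmp[t - 1] + eps[t]

# A lag plot pairs each observation with the one that follows it:
#   x-axis: tmp[0], ..., tmp[n-2]   (the series with its last value dropped)
#   y-axis: tmp[1], ..., tmp[n-1]   (the series with its first value dropped)
# This is what the R idiom plot(tmp[-n], tmp[-1]) does.
x_lagged = tmp[:-1]   # drop the last value
y_current = tmp[1:]   # drop the first value

plt.scatter(x_lagged, y_current, s=10)
plt.xlabel("tmp[t]")
plt.ylabel("tmp[t+1]")
plt.title("Lag plot: dependence shows up as a visible trend")
plt.show()
```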

570 citations


Journal ArticleDOI
TL;DR: This book is for social scientists, but I had no difficulty imagining my own oil exploration application within the framework of geographically weighted regression (GWR), and the first chapter nicely explains what is unique in this book.
Abstract: Being newly immersed in the upstream part of the oil business, I just recently had my first work session with data in ARC–GIS®. The project involves subsurface geographical modeling. Obviously I had considerable interest in discovering if the methodology in this book would enhance my modeling capabilities. The book is for social scientists, but I had no difficulty imagining my own important oil exploration application within the framework of geographically weighted regression (GWR). The first chapter nicely explains what is unique in this book. A standard regression model using geographically oriented data (the example is housing prices across all of England) is a global representation of a spatial relationship, an average that does not account for any local differences. In y = f(x), imagine a whole family of f's that are indexed by spatial location. That is the focus of this book. It is about one form of local spatial modeling, which is GWR. A more general resource for this topic is the earlier book by Fotheringham and Wegener (2000), which escaped the notice of Technometrics. Imagine a display of model parameters in a geographical information system (GIS) and you will understand the focus for this book. The authors note, “only where there is no significant spatial variation in a measured relationship can global models be accepted” (p. 10). The second chapter develops the basis of GWR. It analyzes the housing sales prices versus the 33 boroughs in London and begins by fitting a conventional multiple regression model versus housing characteristics. The GWR is motivated by differences in the regression models fitted separately by borough. The GWR is a spatial moving-window approach with all data distances weighted versus a specific data point using a weighting function and a bandwidth. A GIS can then be used to evaluate the spatial dependency of the parameters. As in kriging, local standard errors also are calculated. The chapter also provides all the math. Chapter 3 comprises several further considerations: parameters that are globally constant, outliers, and spatial heteroscedasticity. The first issue leads to hypothesis tests for model comparison using an Akaike information criterion (AIC). Local outliers are hard to detect. Studentized (deletion) residuals are recommended. The outliers can be plotted geographically. Robust regression is suggested as a less computationally intensive alternative. Heteroscedasticity is harder to handle. Chapter 4 adds statistical inference to the capabilities of GWR: both a confidence interval approach using local likelihood and an AIC method. Four additional methodology chapters present various extensions of GWR. Chapter 5 considers the relationship between GWR and spatial autocorrelation, and includes a combined version of GWR and spatial regression using some complex hybrid models. Chapter 6 examines the relationship of scale and zoning problems in spatial analysis to GWR. Chapter 7 introduces the use of initial exploratory data analysis using geographically weighted statistics, which are based on the idea of using a kernel around each data point to create weights. Univariate statistics and correlation coefficients are defined for exploring local patterns in data. A final set of extensions in Chapter 8 discusses regression models with non-Gaussian errors, logistic regression, local principal components analysis, and local probability density estimation. The methods all use some kind of distributional model.
The million-dollar question for me is always, “What about software?” The authors have a stand-alone program, GWR 3, available on CD-ROM by contacting the authors. Basically the drill with GWR 3 is to gather your data, use Excel to transform and reformat the data for GWR 3, use GWR 3 to produce a set of coefficients, and feed those coefficients to your favorite GIS to produce your maps. Forty pages of discussion about using the software are provided. A final epilogue chapter also discusses embedding GWR in R or Matlab and includes some references to people who have done that type of work. I probably would not have read this book if I had not happened to have it in my briefcase on a visit with the exploration technologists. Though inclusive of appropriate mathematical development, this material is readily approachable because of the many illustrations and the pages and pages of GIS displays. The authors unabashedly present much of the material as their developmental work, so GWR offers a lot of opportunity for research and further development through novel applications and extensions.
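As a rough illustration of the moving-window weighting idea described in the review, here is a minimal numpy sketch: a Gaussian kernel of distance with a fixed bandwidth weights the observations around each target location, and a weighted least squares fit gives location-specific coefficients. The data, bandwidth, and locations are all made up; this is not the GWR 3 implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: n observations with coordinates, one predictor, one response.
n = 300
coords = rng.uniform(0, 10, size=(n, 2))          # spatial locations
x = rng.normal(size=n)
# The "true" slope drifts across space, which a single global fit would average away.
true_slope = 1.0 + 0.3 * coords[:, 0]
y = 2.0 + true_slope * x + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x])               # design matrix with intercept

def gwr_coefficients(target, bandwidth=2.0):
    """Weighted least squares fit centred on one target location."""
    d = np.linalg.norm(coords - target, axis=1)    # distances to the target point
    w = np.exp(-0.5 * (d / bandwidth) ** 2)        # Gaussian kernel weights
    W = np.diag(w)
    # Solve (X'WX) beta = X'Wy for the local intercept and slope.
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Local slopes at two locations differ, reflecting the spatial drift.
print(gwr_coefficients(np.array([1.0, 5.0])))
print(gwr_coefficients(np.array([9.0, 5.0])))
```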

545 citations


Journal ArticleDOI
TL;DR: The multilevel model is highly effective for predictions at both levels of the model, but could easily be misinterpreted for causal inference.
Abstract: Multilevel (hierarchical) modeling is a generalization of linear and generalized linear modeling in which regression coefficients are themselves given a model, whose parameters are also estimated from data. We illustrate the strengths and limitations of multilevel modeling through an example of the prediction of home radon levels in U.S. counties. The multilevel model is highly effective for predictions at both levels of the model, but could easily be misinterpreted for causal inference.
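A minimal sketch of the varying-intercept idea behind this abstract, with variance components assumed known for simplicity: each county intercept estimate is a precision-weighted compromise between that county's data and the overall mean. The numbers and county structure are simulated, not the radon data, and this is only one small piece of the article's full multilevel framework.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: houses grouped into counties, each county with its own intercept.
n_counties = 20
houses_per_county = rng.integers(2, 40, size=n_counties)   # very unequal sample sizes
mu_alpha, sigma_alpha = 1.5, 0.3     # distribution of county intercepts (level 2)
sigma_y = 0.8                        # within-county (level 1) noise

county_alpha = rng.normal(mu_alpha, sigma_alpha, size=n_counties)
ybar = np.array([
    rng.normal(a, sigma_y, size=m).mean()
    for a, m in zip(county_alpha, houses_per_county)
])

# With known variance components, the multilevel estimate of each county intercept is a
# precision-weighted average of the county mean and the overall mean; counties with few
# houses are pulled ("shrunk") toward mu_alpha, which is what makes the predictions stable.
prec_data = houses_per_county / sigma_y**2
prec_prior = 1.0 / sigma_alpha**2
alpha_hat = (prec_data * ybar + prec_prior * mu_alpha) / (prec_data + prec_prior)

for m, raw, pooled in zip(houses_per_county[:5], ybar[:5], alpha_hat[:5]):
    print(f"n={m:2d}  county mean={raw:5.2f}  multilevel estimate={pooled:5.2f}")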

466 citations


Journal ArticleDOI
TL;DR: This reading book is your chosen book to accompany you when in your free time, in your lonely, this kind of book can help you to heal the lonely and get or add the inspirations to be more inoperative.
Abstract: The design of experiments for engineers and scientists that we provide for you will be ultimate to give preference. This reading book is your chosen book to accompany you when in your free time, in your lonely. This kind of book can help you to heal the lonely and get or add the inspirations to be more inoperative. Yeah, book as the widow of the world can be very inspiring manners. As here, this book is also created by an inspiring author that can make influences of you to do more.

338 citations


Journal ArticleDOI
TL;DR: In this article, the authors present an approach to defining explained variance (R2) at each level of the multilevel model, rather than attempting to create a single summary measure of fit, based on comparing variances in a single fitted model rather than with a null model.
Abstract: Explained variance (R2) is a familiar summary of the fit of a linear regression and has been generalized in various ways to multilevel (hierarchical) models. The multilevel models that we consider in this article are characterized by hierarchical data structures in which individuals are grouped into units (which themselves might be further grouped into larger units), and variables are measured on individuals and each grouping unit. The models are based on regression relationships at different levels, with the first level corresponding to the individual data and subsequent levels corresponding to between-group regressions of individual predictor effects on grouping unit variables. We present an approach to defining R2 at each level of the multilevel model, rather than attempting to create a single summary measure of fit. Our method is based on comparing variances in a single fitted model rather than with a null model. In simple regression, our measure generalizes the classical adjusted R2. We also discuss ...
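The following is a simplified numpy sketch of the level-by-level idea: at each level, the variance of that level's errors is compared with the variance of the quantities modeled at that level. The group structure, predictor, and moment-based "fit" are all illustrative stand-ins, not the article's estimation procedure.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two-level structure: group intercepts depend on a group-level predictor u
# (in the radon application, county-level soil uranium), and observations
# vary around their group intercept.
n_groups, per_group = 30, 25
u = rng.normal(size=n_groups)                        # group-level predictor
alpha = 1.0 + 0.8 * u + rng.normal(scale=0.5, size=n_groups)
group = np.repeat(np.arange(n_groups), per_group)
y = alpha[group] + rng.normal(scale=1.2, size=n_groups * per_group)

# Stand-in for the fitted multilevel model: group means at level 1,
# and a level-2 regression of those means on u.
ybar_g = np.array([y[group == g].mean() for g in range(n_groups)])
U = np.column_stack([np.ones(n_groups), u])
gamma = np.linalg.lstsq(U, ybar_g, rcond=None)[0]    # level-2 coefficients

# At each level, R2 compares the variance of that level's errors with the
# variance of the quantities being modelled at that level (no null model needed).
level1_errors = y - ybar_g[group]
R2_level1 = 1 - level1_errors.var() / y.var()

level2_errors = ybar_g - U @ gamma
R2_level2 = 1 - level2_errors.var() / ybar_g.var()

print(f"R2 at the observation level: {R2_level1:.2f}")
print(f"R2 at the group level:       {R2_level2:.2f}")
```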

294 citations


Journal ArticleDOI
TL;DR: A method is proposed for finding exact designs for experiments that uses a criterion allowing for uncertainty in the link function, the linear predictor, or the model parameters, together with a design search.
Abstract: Standard factorial designs sometimes may be inadequate for experiments that aim to estimate a generalized linear model, for example, for describing a binary response in terms of several variables. A method is proposed for finding exact designs for such experiments that uses a criterion allowing for uncertainty in the link function, the linear predictor, or the model parameters, together with a design search. Designs are assessed and compared by simulation of the distribution of efficiencies relative to locally optimal designs over a space of possible models. Exact designs are investigated for two applications, and their advantages over factorial and central composite designs are demonstrated.
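One ingredient of such comparisons is that the information matrix of a generalized linear model, and hence any D-type criterion, depends on the unknown parameters. The sketch below computes the D-criterion for a two-factor logistic model over a spread of parameter values and compares two candidate exact designs; it compares against a 2^2 factorial rather than against locally optimal designs, and the designs, prior spread, and model are illustrative only.

```python
import numpy as np
from itertools import product

def logistic_information(design, beta):
    """Fisher information X' W X for a logistic (binary) GLM at parameters beta."""
    X = np.column_stack([np.ones(len(design)), design])
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    W = np.diag(p * (1 - p))           # GLM weights depend on the unknown beta
    return X.T @ W @ X

def d_criterion(design, beta):
    return np.linalg.det(logistic_information(design, beta))

# Two candidate exact designs on two factors scaled to [-1, 1]:
factorial = np.array(list(product([-1.0, 1.0], repeat=2)))   # 2^2 factorial
shifted = factorial * 0.5 + 0.4                               # a smaller, shifted design

# Because the information matrix depends on beta, efficiency is assessed over a
# range of plausible parameter values rather than a single guess.
rng = np.random.default_rng(4)
p_dim = 3                                                     # number of model parameters
ratios = []
for _ in range(1000):
    beta = rng.normal([0.0, 1.0, 1.0], 0.5)                   # draw from a prior-like spread
    ratios.append((d_criterion(shifted, beta) / d_criterion(factorial, beta)) ** (1 / p_dim))

print(f"median D-efficiency of the shifted design relative to the factorial: {np.median(ratios):.2f}")
```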

176 citations


Journal ArticleDOI
TL;DR: A new, simple method is introduced for identifying active factors in computer screening experiments; it requires only the generation of a new inert variable in the analysis and uses the posterior distribution of the inert factor as a reference distribution against which the importance of the experimental factors can be assessed.
Abstract: In many situations, simulation of complex phenomena requires a large number of inputs and is computationally expensive. Identifying the inputs that most impact the system so that these factors can be further investigated can be a critical step in the scientific endeavor. In computer experiments, it is common to use a Gaussian spatial process to model the output of the simulator. In this article we introduce a new, simple method for identifying active factors in computer screening experiments. The approach is Bayesian and only requires the generation of a new inert variable in the analysis; however, in the spirit of frequentist hypothesis testing, the posterior distribution of the inert factor is used as a reference distribution against which the importance of the experimental factors can be assessed. The methodology is demonstrated on an application in material science, a computer experiment from the literature, and simulated examples.

172 citations



Journal ArticleDOI
TL;DR: An attractive alternative to traditional charting methods when monitoring for a step change in the mean vector is an unknown-parameter likelihood ratio test for a change in mean of p-variate normal data, able to control the run behavior despite the lack of a large Phase I sample.
Abstract: Multivariate statistical process control (SPC) carries out ongoing checks to ensure that a process is in control. These checks on the process are traditionally done by T2, multivariate cusum, and multivariate exponentially weighted moving average control charts. These traditional SPC charts assume that the in-control true parameters are known exactly and use these assumed true values to set the control limits. The reality, however, is that true parameter values are seldom if ever known exactly; rather, they are commonly estimated from a Phase I sample. It is increasingly recognized that this Phase I study needs to involve large samples if the parameter estimates are to provide run behavior matching that of the known-parameter situation. But, apart from the general undesirability of large and thus expensive studies preliminary to actual charting, some industrial settings have a paucity of relevant data for estimating the process parameters. An attractive alternative to traditional charting methods when mon...
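For reference, here is a minimal numpy sketch of the traditional setup whose limitations motivate the article: a Hotelling T2 statistic computed with a mean vector and covariance matrix estimated from a Phase I sample. The sample sizes and shift are simulated and arbitrary; the article's unknown-parameter likelihood ratio chart is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(5)
p = 3                                   # number of quality characteristics

# Phase I: a (possibly small) in-control sample used only to estimate parameters.
phase1 = rng.multivariate_normal(np.zeros(p), np.eye(p), size=30)
mu_hat = phase1.mean(axis=0)
S_inv = np.linalg.inv(np.cov(phase1, rowvar=False))

def t2(x):
    """Hotelling T2 statistic for a new observation, using Phase I estimates."""
    d = x - mu_hat
    return float(d @ S_inv @ d)

# Phase II monitoring: the chart signals when T2 exceeds its control limit.
# With estimated (rather than known) parameters, the run-length behaviour of any such
# limit can differ substantially from the known-parameter case unless Phase I is large,
# which is the problem the unknown-parameter likelihood ratio approach addresses.
in_control = rng.multivariate_normal(np.zeros(p), np.eye(p), size=5)
shifted = rng.multivariate_normal(np.full(p, 1.5), np.eye(p), size=5)
print([round(t2(x), 1) for x in in_control])
print([round(t2(x), 1) for x in shifted])
```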

Journal ArticleDOI
TL;DR: This is the first book I have seen that focuses on the analysis of asymmetric data, and it is a good introduction to the topic; the mathematics is accessible at an undergraduate level, requiring little more than a good grasp of linear algebra and statistics.
Abstract: multivariate analysis. Network analysis is a very hot topic these days, with the interest in various types of link analysis and the purported applications to fraud detection and counterterrorism. The chapter on network analysis is one of the shortest, which is a shame since it is an area in which asymmetric data are ubiquitous. Of course, page constraints will always force these types of choices on authors. The book covers many different models, and one criticism is that there is little discussion of how one chooses the model appropriate to the data. Partly this is the nature of the beast; model selection is difficult and often guided by heuristics or personal preference. By covering the many different models that have been developed in the literature, the book provides an entry into the literature and an illustration of the various methodologies that have been developed. This is the first book I have seen that focuses on this important area, and it is a good introduction to the topic. The mathematics is accessible at an undergraduate level, requiring little more than a good grasp of linear algebra and statistics. The topic is extremely timely, and I predict that there will be more books on the topic, both general treatises like this one, and ones specializing in a particular aspect of asymmetric data analysis. The authors have done a service to the statistical community by leading the way with this interesting and useful book.

Journal ArticleDOI
TL;DR: This book examines the theory of statistical modeling with generalized linear models and provides a full description of the use of GLIM4 for model fitting, with detailed discussions of many examples.
Abstract: Chapter 1 provides a general introduction to GLIM4 and the features of which particular use can be made, especially the presentation graphics. Chapter 2 gives a detailed discussion of the population modeling process, with a full discussion of frequentist, Bayesian, and likelihood inferential approaches to simple models. Chapter 3 discusses the normal models, including regression models and analysis of variance. It also introduces factorial designs and unbalanced cross-classification designs. Chapter 4 discusses binomial models for binary response data, along with construction of contingency tables from binary data. Chapter 5 discusses multinomial and Poisson models as well as their relations and covers model fitting for cross-classified counts and multicategory responses in detail. Chapter 6 discusses survival models, including exponential, gamma, and Weibull distributions for survival data, and introduces the Kaplan–Meier estimator and Cox proportional hazards models. Chapter 7 discusses finite mixtures of distributions with maximum likelihood and kernel density estimates. Chapter 8 discusses random-effects models with conjugate, normal, and arbitrary random effects. Chapter 9 discusses variance component models with shared random effects arising through variance component or repeated-measures structure. In addition to the theoretical analyses for model fitting and practical examples for illustration of the use of GLIM4, the authors also provide readers with datasets for the examples and GLIM4 programs for their analyses. The programs are written in a very concise form and should be easy to use and understand. The datasets and programs used in this book can be downloaded from the Oxford University Press website. Overall, this book examines the theory of statistical modeling with generalized linear models. It also provides a full description of the use of GLIM4 for model fitting, with detailed discussions of many examples. This book is an ideal textbook for graduate and advanced undergraduate courses in statistics, in conjunction with interactive use of GLIM. It may also be used as a self-teaching manual by researchers and students in applied statistics and other quantitative disciplines, such as biology, medicine, agriculture, industry, and social sciences. Readers may refer to McCullagh and Nelder (1989) for the detailed principles of generalized linear models and to Francis, Green, and Payne (1993) for a full GLIM4 manual.

Journal ArticleDOI
TL;DR: A simple heuristic is proposed for constructing robust experimental designs for multivariate generalized linear models; clustering, with its simplicity and minimal computation needs, is demonstrated to outperform more complex and sophisticated methods.
Abstract: A simple heuristic is proposed for constructing robust experimental designs for multivariate generalized linear models. The method is based on clustering a set of local optimal designs. A method for finding local D-optimal designs using available resources is also introduced. Clustering, with its simplicity and minimal computation needs, is demonstrated to outperform more complex and sophisticated methods.
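A toy version of the clustering idea for the simplest case, a one-variable logistic model: generate locally D-optimal design points for a spread of plausible parameter values, then cluster the pooled points to obtain the support points of a single robust design. The parameter ranges, number of clusters, and the use of the standard two-point locally D-optimal design (equal weight where the linear predictor equals approximately +/-1.5434) are assumptions for illustration, not the article's algorithm.

```python
import numpy as np

rng = np.random.default_rng(13)

# For a one-variable logistic model logit(p) = b0 + b1*x, the standard two-point
# locally D-optimal design places equal weight where b0 + b1*x = +/-1.5434
# (response probabilities of roughly 0.18 and 0.82).
def local_optimal_points(b0, b1):
    return np.array([(-1.5434 - b0) / b1, (1.5434 - b0) / b1])

# Uncertainty about the parameters: a set of plausible (b0, b1) values.
draws = np.column_stack([rng.uniform(-1.0, 1.0, 50), rng.uniform(0.5, 2.0, 50)])
pool = np.concatenate([local_optimal_points(b0, b1) for b0, b1 in draws])

# Cluster the pooled locally optimal points (a few Lloyd's k-means iterations);
# the cluster centres become the support points of one robust design.
k = 4
centres = np.sort(rng.choice(pool, size=k, replace=False))
for _ in range(20):
    labels = np.argmin(np.abs(pool[:, None] - centres[None, :]), axis=1)
    centres = np.array([pool[labels == j].mean() if np.any(labels == j) else centres[j]
                        for j in range(k)])

print("robust design support points:", np.round(np.sort(centres), 2))
```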

Journal ArticleDOI
TL;DR: A Bayesian criterion based on the estimation precision of a distribution quantile at a specified use condition is used to find optimum accelerated life test plans with censored data from a log-location-scale distribution.
Abstract: This article describes Bayesian methods for accelerated life test planning with one accelerating variable, when the acceleration model is linear in the parameters, based on censored data from a log-location-scale distribution. We use a Bayesian criterion based on estimation precision of a distribution quantile at a specified use condition to find optimum test plans. We also show how to compute optimized compromise plans that satisfy practical constraints. A large-sample normal approximation provides an easy-to-interpret yet useful simplification to this planning problem. We present a numerical example using the Weibull distribution with type I censoring to illustrate the methods and to examine the effects of the prior distribution, censoring, and sample size. The general equivalence theorem is used to verify that the numerically optimized test plans are globally optimum. The resulting optimum plans are also evaluated using simulation.

Journal ArticleDOI
TL;DR: Control chart performance is considered when the objective is to detect small or large changes in μ or increases in σ, and combinations of EWMA control charts that include a chart based on squared deviations from target give good overall performance whether or not these charts have the adaptive feature.
Abstract: An exponentially weighted moving average (EWMA) control chart for monitoring the process mean μ may be slow to detect large shifts in μ when the EWMA tuning parameter λ is small. An additional problem, sometimes called the inertia problem, is that the EWMA statistic may be in a disadvantageous position on the wrong side of the target when a shift in μ occurs, which may significantly delay detection of a shift in μ. Options for improving the performance of the EWMA chart include using the EWMA chart in combination with a Shewhart chart or in combination with an EWMA chart based on squared deviations from target. The EWMA chart based on squared deviations from target is designed to detect increases in the process standard deviation σ, but it is also very effective for detecting large shifts inμ. Capizzi and Masarotto recently proposed the option of an adaptive EWMA control chart in which λ is a function of the data. With the adaptive feature, the EWMA chart behaves like a standard EWMA chart when the curren...
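To illustrate the charts being combined, here is a minimal numpy sketch of the EWMA recursion and of an EWMA of squared deviations from target. The tuning constant, shift size, and targets are arbitrary, and the adaptive (data-dependent λ) feature of Capizzi and Masarotto's chart is not implemented.

```python
import numpy as np

def ewma(x, lam, start=0.0):
    """Standard EWMA recursion  z_t = lam * x_t + (1 - lam) * z_{t-1}."""
    z = np.empty_like(x, dtype=float)
    prev = start
    for t, xt in enumerate(x):
        prev = lam * xt + (1 - lam) * prev
        z[t] = prev
    return z

rng = np.random.default_rng(6)
# In-control data followed by a large step shift in the mean at t = 50.
x = np.concatenate([rng.normal(0, 1, 50), rng.normal(3, 1, 20)])

lam = 0.05
z_mean = ewma(x, lam)                         # EWMA of the observations (monitors mu)
z_sq = ewma((x - 0.0) ** 2, lam, start=1.0)   # EWMA of squared deviations from target
                                              # (aimed at sigma, but reacts to big mean shifts)

# A small lam makes z_mean slow to respond to the 3-sigma shift, while the
# squared-deviation chart picks up the large shift quickly, which is the reason
# for combining the two charts.
print("mean-chart EWMA just after the shift:   ", np.round(z_mean[50:55], 2))
print("squared-deviation EWMA after the shift: ", np.round(z_sq[50:55], 2))
```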

Journal ArticleDOI
TL;DR: This book can help statisticians write simple as well as complex Fortran routines and presents links to commonly used commercial programs, such as Windows and Excel.
Abstract: The book presents the basics of modern Fortran in a remarkably clear style. The exposition is free of computer jargon. The authors define the terms that they use and present excellent examples of codes for statistical applications. I am very impressed with the exposition. Most technical people like to use a higher-level language than Fortran 95, such as S or Matlab, claiming that such languages are faster than Fortran. Yet I have seen my colleagues working for hours trying to debug their “easy” Matlab programs, while I can more quickly modify Fortran code that I either have written or have found in IMSL or open sources for Fortran legacy routines to do what they are trying to do. Several years ago, one of my friends in Stockholm was finishing a paper for a conference that used a bispectral analysis of some signals. I asked my friend why he did not use my Fortran code. He said that it was easier for him to write his own in Matlab. I asked him to give me his data and his parameter settings so that I could do a bispectral analysis of his signal using my code. He did so, and when I showed him my results, he noticed that the estimates that I got were different from those that he got from his Matlab code. He told me that I had a bug in my code, and I replied that my code was carefully developed over a 10-year period, and that I had checked the output with that generated by an earlier bispectrum program written by a friend. After I returned to Austin, he sent me a note telling me that he found a comma out of place in his program and that my results were correct. It is important for data analysts to control the computations that they wish to run. All commercial statistical packages have some errors that may take a while to fix. I test statistical routines using simulations that I write in Fortran. Simulations are not only useful for testing the accuracy of a routine, but they can also show where a method breaks down. This book can help statisticians write simple as well as complex Fortran routines. It also presents links to commonly used commercial programs, such as Windows and Excel.

Journal ArticleDOI
TL;DR: This article proposes a new method, called sparse SIR, that combines the shrinkage idea of the lasso with SIR to produce both accurate and sparse solutions.
Abstract: Sliced inverse regression (SIR) is an innovative and effective method for dimension reduction and data visualization of high-dimensional problems. It replaces the original variables with low-dimensional linear combinations of predictors without any loss of regression information and without the need to prespecify a model or an error distribution. However, it suffers from the fact that each SIR component is a linear combination of all the original predictors; thus, it is often difficult to interpret the extracted components. By representing SIR as a regression-type optimization problem, we propose in this article a new method, called sparse SIR, that combines the shrinkage idea of the lasso with SIR to produce both accurate and sparse solutions. The efficacy of the proposed method is verified by simulation, and a real data example is given.
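The sketch below runs classical SIR on simulated single-index data and then applies a crude soft-threshold to the leading direction as a stand-in for the sparsity step; the article's actual regression-type lasso formulation is more involved, and all data and settings here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

# Single-index data: y depends on the 10 predictors only through a sparse combination.
n, p = 500, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:2] = [1.0, -1.0]
y = np.sinh(X @ beta_true) + rng.normal(scale=0.2, size=n)

# --- Classical SIR: slice y, average the standardized predictors within slices ---
L = np.linalg.cholesky(np.cov(X, rowvar=False))
Z = (X - X.mean(axis=0)) @ np.linalg.inv(L).T          # whitened predictors

H = 10                                                 # number of slices
slices = np.array_split(np.argsort(y), H)
M = np.zeros((p, p))
for idx in slices:
    zbar = Z[idx].mean(axis=0)
    M += (len(idx) / n) * np.outer(zbar, zbar)         # weighted cov. of slice means

eigvals, eigvecs = np.linalg.eigh(M)
eta = eigvecs[:, -1]                                   # leading SIR direction (whitened scale)
beta_sir = np.linalg.inv(L.T) @ eta                    # back to the original predictor scale
beta_sir /= np.linalg.norm(beta_sir)

# --- A crude "sparse" step (stand-in for the paper's lasso formulation) ---
# Soft-threshold small coefficients so the direction involves few predictors.
lam = 0.1
beta_sparse = np.sign(beta_sir) * np.maximum(np.abs(beta_sir) - lam, 0.0)

print("SIR direction:   ", np.round(beta_sir, 2))
print("sparse direction:", np.round(beta_sparse, 2))
```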

Journal ArticleDOI
TL;DR: This book has done a good job of elaborating select statistical tools and concepts that are useful for Six Sigma practitioners, and it could serve as a stepping stone to more formal statistical references for practitioners interested in further details of the inner workings of a statistical approach or whose tasks require nonstandard assumptions.
Abstract: Chapters 1–3: The book starts in Chapter 1 with discussions of data visualization and how to make intuitive inferences by using various types of plots and charts. This chapter also includes discussions of standard deviation versus standard error, experimental assumptions, data collection, and measurement and environmental conditions as sources of errors. Chapter 2 provides a discussion of distribution analysis and how to visualize the distribution of the data through histograms. Several common distributions and their properties are discussed and illustrated. The chapter also discusses the concepts of population average and sample average, the central limit theorem, and degrees of freedom. In Chapter 3, the concept of statistical inference is discussed. The chapter starts with an interesting example to illustrate the need for making inferences about parameters of the population of interest. The major focus of this chapter is the different hypothesis tests commonly used in practice. I feel that this chapter would have been more complete if it also discussed how to incorporate practical considerations into these tests. For example, see Limentani et al. (2005), regarding the establishment of practical thresholds for equivalence testing of means in contrast to the conventional t-test, which may detect differences that may not be practically relevant. Chapters 4–7: These chapters discuss the topics of statistical design and analysis of experiments. Chapter 4 starts with an introduction to design of experiments and the concept of analysis of variance. The ubiquitous factorial and fractional factorial designs are discussed in Chapter 5. For empirical model building, regression analysis is discussed separately in Chapter 6. Chapter 7 discusses the use of response surface designs for modeling curvature. Two classic designs are discussed: Box–Behnken and central composite designs. Chapters 8–9: These chapters discuss the DMAIC (Define, Measure, Analyze, Improve, and Control) methodology. The author argues that the statistical tools described in the previous chapters are useful once the problem has been defined. Chapter 8 summarizes one of the essential elements of the Six Sigma process, namely the DMAIC methodology, to assist users in defining the tasks at hand to achieve tangible and sustainable business improvement. A case study is presented in Chapter 9. One statistical tool used in this case study is control charting, which is not discussed in the earlier chapters. Given that control charting is a widely used tool, I think that it would be beneficial for readers to have a separate chapter on this subject. In summary, I think this book has done a good job of elaborating select statistical tools and concepts that are useful for Six Sigma practitioners. The statistical concepts are illustrated and practical examples are given that will help readers with no formal background in statistics to understand the discussions. The material on statistical concepts is provided efficiently in less than 200 pages. The book obviously cannot cover everything that practitioners may use. However, at least in my experience, the methods selected in the book are the ones frequently used in industrial settings. This book could be used as a stepping stone for practitioners to more formal statistical references if they are interested in further details of the inner workings of a statistical approach and/or if their tasks require nonstandard assumptions.

Journal ArticleDOI
TL;DR: It is shown that adaptive one-factor-at-a-time provides a large fraction of the potential improvements if experimental error is not large compared with the main effects and that this degree of improvement is more than that provided by resolution III fractional factorial designs if interactions are not small compared with main effects.
Abstract: This article concerns adaptive experimentation as a means for making improvements in design of engineering systems. A simple method for experimentation, called “adaptive one-factor-at-a-time,” is described. A mathematical model is proposed and theorems are proven concerning the expected value of the improvement provided and the probability that factor effects will be exploited. It is shown that adaptive one-factor-at-a-time provides a large fraction of the potential improvements if experimental error is not large compared with the main effects and that this degree of improvement is more than that provided by resolution III fractional factorial designs if interactions are not small compared with main effects. The theorems also establish that the method exploits two-factor interactions when they are large and exploits main effects if interactions are small. A case study on design of electric-powered aircraft supports these results.
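A minimal sketch of the adaptive one-factor-at-a-time procedure on a made-up two-level system: starting from a random design point, each factor is toggled in turn and the change is kept only if the observed (noisy) response improves. The response function, error level, and factor count are hypothetical, not the article's aircraft case study.

```python
import numpy as np

rng = np.random.default_rng(8)

def observe(x, sigma_e=0.5):
    """Noisy response of a hypothetical 5-factor two-level system (larger is better)."""
    main = np.array([1.0, 0.8, -0.6, 0.4, 0.2])
    interaction = 0.5 * x[0] * x[1]                   # one two-factor interaction
    return main @ x + interaction + rng.normal(scale=sigma_e)

# Adaptive one-factor-at-a-time: start somewhere, toggle one factor per run,
# and keep the new level only if the observed response improved.
k = 5
x = rng.choice([-1, 1], size=k).astype(float)         # random starting design point
best_y = observe(x)
for j in range(k):
    trial = x.copy()
    trial[j] = -trial[j]                              # flip factor j
    y = observe(trial)
    if y > best_y:                                    # exploit the apparent improvement
        x, best_y = trial, y

print("final settings:", x, " final observed response:", round(best_y, 2))
```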

Journal ArticleDOI
TL;DR: It is shown that the prior distribution can be induced from a functional prior on the underlying transfer function; a new class of design criteria is also proposed, and its connections with the minimum aberration criterion are established.
Abstract: Specifying a prior distribution for the large number of parameters in the statistical model is a critical step in a Bayesian approach to the design and analysis of experiments. This article shows that the prior distribution can be induced from a functional prior on the underlying transfer function. The functional prior requires specification of only a few hyperparameters and thus can be easily implemented in practice. The usefulness of the approach is demonstrated through the analysis of some experiments. The article also proposes a new class of design criteria and establishes their connections with the minimum aberration criterion.

Journal ArticleDOI
TL;DR: The parallel GA (PGA) proposed in this paper runs a number of GAs in parallel without allowing each GA to fully converge, and consolidates the information from all the individual GAs in the end.
Abstract: The need to identify a few important variables that affect a certain outcome of interest commonly arises in various industrial engineering applications. The genetic algorithm (GA) appears to be a natural tool for solving such a problem. In this article we first demonstrate that the GA is actually not a particularly effective variable selection tool, and then propose a very simple modification. Our idea is to run a number of GAs in parallel without allowing each GA to fully converge, and to consolidate the information from all the individual GAs in the end. We call the resulting algorithm the parallel genetic algorithm (PGA). Using a number of both simulated and real examples, we show that the PGA is an interesting as well as highly competitive and easy-to-use variable selection tool.
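Here is a toy numpy sketch of the idea: several deliberately under-converged genetic algorithms search for good variable subsets (BIC of a linear fit as the fitness here), and their best subsets are consolidated by counting how often each variable is selected. The GA settings, fitness choice, and majority-vote rule are illustrative assumptions, and the runs are executed sequentially rather than in parallel for simplicity.

```python
import numpy as np

rng = np.random.default_rng(9)

# Data with only 3 of 20 predictors active.
n, p = 120, 20
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.8 * X[:, 5] - 0.6 * X[:, 12] + rng.normal(scale=1.0, size=n)

def fitness(mask):
    """Negative BIC of the linear model using the selected columns (higher is better)."""
    k = int(mask.sum())
    if k == 0:
        return -np.inf
    Xs = np.column_stack([np.ones(n), X[:, mask]])
    resid = y - Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]
    return -(n * np.log(resid @ resid / n) + (k + 1) * np.log(n))

def short_ga(generations=15, pop_size=30, mutate=0.05):
    """One deliberately under-converged GA over subsets; returns its best subset."""
    pop = rng.random((pop_size, p)) < 0.2
    for _ in range(generations):
        fit = np.array([fitness(ind) for ind in pop])
        parents = pop[np.argsort(fit)[-pop_size // 2:]]          # truncation selection
        children = []
        while len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, p)                              # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(p) < mutate                       # mutation
            children.append(child)
        pop = np.array(children)
    fit = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(fit)]

# PGA-style consolidation: run several short GAs and count how often each
# variable appears in the best subsets, then keep the frequent ones.
runs = 20
counts = sum(short_ga() for _ in range(runs))
selected = np.where(counts >= runs * 0.5)[0]
print("selection frequencies:", counts)
print("variables kept:", selected)
```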

Journal ArticleDOI
TL;DR: A generalization of the univariate g-and-h distribution to the multivariate situation is considered with the aim of providing a flexible family of multivariate distributions that incorporate skewness and kurtosis, giving rise to a family of distributions in which the quantiles rather than the densities are the foci of attention.
Abstract: In this article we consider a generalization of the univariate g-and-h distribution to the multivariate situation with the aim of providing a flexible family of multivariate distributions that incorporate skewness and kurtosis. The approach is to modify the underlying random variables and their quantiles, directly giving rise to a family of distributions in which the quantiles rather than the densities are the foci of attention. Using the ideas of multivariate quantiles, we show how to fit multivariate data to our multivariate g-and-h distribution. This provides a more flexible family than the skew-normal and skew-elliptical distributions when quantiles are of principal interest. Unlike those families, the distribution of quadratic forms from the multivariate g-and-h distribution depends on the underlying skewness. We illustrate our methods on Australian athletes data, as well as on some wind speed data from the northwest Pacific.
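As a simple construction in the spirit of the abstract, the sketch below applies the univariate Tukey g-and-h transform coordinatewise to correlated standard normals; the parameter values and correlation are made up, and this is not necessarily the authors' exact multivariate definition or fitting method.

```python
import numpy as np

def g_and_h(z, g, h):
    """Tukey g-and-h transform of a standard normal value z.
    g controls skewness, h controls heaviness of the tails."""
    skew = (np.exp(g * z) - 1) / g if g != 0 else z
    return skew * np.exp(h * z**2 / 2)

rng = np.random.default_rng(10)

# Start from correlated standard normals and transform each coordinate.
corr = np.array([[1.0, 0.6],
                 [0.6, 1.0]])
Z = rng.multivariate_normal(np.zeros(2), corr, size=5000)
Y = np.column_stack([
    g_and_h(Z[:, 0], g=0.5, h=0.1),    # right-skewed, heavier tails
    g_and_h(Z[:, 1], g=-0.3, h=0.05),  # left-skewed, mildly heavy tails
])

# Because the transform acts on quantiles, sample quantiles are the natural summary.
print("5%, 50%, 95% quantiles of each coordinate:")
print(np.round(np.quantile(Y, [0.05, 0.5, 0.95], axis=0), 2))
```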

Journal ArticleDOI
TL;DR: New data reduction methods based on the discrete wavelet transform to handle potentially large and complicated nonstationary data curves are presented and their competitiveness with the existing engineering data compression and statistical data denoising methods is demonstrated.
Abstract: This article presents new data reduction methods based on the discrete wavelet transform to handle potentially large and complicated nonstationary data curves. The methods minimize objective functions to balance the trade-off between data reduction and modeling accuracy. Theoretic investigations provide the optimality of the methods and the large-sample distribution of a closed-form estimate of the thresholding parameter. An upper bound of errors in signal approximation (or estimation) is derived. Based on evaluation studies with popular testing curves and real-life datasets, the proposed methods demonstrate their competitiveness with the existing engineering data compression and statistical data denoising methods for achieving the data reduction goals. Further experimentation with a tree-based classification procedure for identifying process fault classes illustrates the potential of the data reduction tools. Extension of the engineering scalogram to the reduced-size semiconductor fabrication data leads ...
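The sketch below shows the generic wavelet-reduction ingredient that the abstract builds on: a Haar discrete wavelet transform followed by hard thresholding of small detail coefficients, trading reconstruction accuracy for a smaller set of retained coefficients. The test curve and threshold are made up, and the article's objective-function method for choosing the threshold is not implemented.

```python
import numpy as np

def haar_dwt(x):
    """Full Haar discrete wavelet transform of a signal whose length is a power of 2."""
    coeffs = []
    approx = np.asarray(x, dtype=float)
    while len(approx) > 1:
        even, odd = approx[0::2], approx[1::2]
        coeffs.append((even - odd) / np.sqrt(2))      # detail coefficients at this level
        approx = (even + odd) / np.sqrt(2)            # running approximation
    return approx, coeffs

def haar_idwt(approx, coeffs):
    """Invert the transform above."""
    x = approx
    for detail in reversed(coeffs):
        even = (x + detail) / np.sqrt(2)
        odd = (x - detail) / np.sqrt(2)
        x = np.empty(2 * len(x))
        x[0::2], x[1::2] = even, odd
    return x

# A nonstationary test curve plus noise, reduced by hard-thresholding small details.
rng = np.random.default_rng(11)
t = np.linspace(0, 1, 1024)
signal = np.where(t < 0.5, np.sin(8 * np.pi * t), 0.3 * np.sign(np.sin(32 * np.pi * t)))
data = signal + rng.normal(scale=0.1, size=t.size)

approx, details = haar_dwt(data)
threshold = 0.3                                       # trade-off: reduction vs. accuracy
kept = [np.where(np.abs(d) > threshold, d, 0.0) for d in details]
reconstructed = haar_idwt(approx, kept)

n_kept = sum(int(np.count_nonzero(d)) for d in kept) + approx.size
rmse = np.sqrt(np.mean((reconstructed - data) ** 2))
print(f"coefficients kept: {n_kept} of {data.size}, reconstruction RMSE: {rmse:.3f}")
```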

Journal ArticleDOI
TL;DR: This article discusses point estimation of the parameters in a linear measurement error (errors in variables) model when the variances in the measurement errors on both axes vary between observations.
Abstract: This article discusses point estimation of the parameters in a linear measurement error (errors in variables) model when the variances in the measurement errors on both axes vary between observations. A compendium of existing and new regression methods is presented. Application of these methods to real data cases shows that the coefficients of the regression lines depend on the method selected. Guidelines for choosing a suitable regression method are provided.

Journal ArticleDOI
TL;DR: This entry refers to Analysis of Clinical Trials Using SAS®: A Practical Guide (Technometrics, Vol. 48, No. 2, p. 311).
Abstract: (2006). Analysis of Clinical Trials Using SAS®: A Practical Guide. Technometrics: Vol. 48, No. 2, pp. 311-311.

Journal ArticleDOI
TL;DR: This work applies four pooling designs to a public dataset and evaluates each method by determining how well the design criteria are met and whether the methods are able to find many diverse active compounds.
Abstract: Discovery of a new drug involves screening large chemical libraries to identify new and diverse active compounds. Screening efficiency can be improved by testing compounds in pools. We consider two criteria for designing pools: optimal coverage of the chemical space and minimal collision between compounds. We apply four pooling designs to a public dataset. We evaluate each method by determining how well the design criteria are met and whether the methods are able to find many diverse active compounds. One pooling design emerges as a winner, but all designed pools clearly outperform randomly created pools.

Journal ArticleDOI
TL;DR: This work proposes a general strategy for adapting variable selection tuning parameters that effectively estimates the tuning parameters so that the selection method avoids overfitting and underfitting.
Abstract: Many variable selection methods for linear regression depend critically on tuning parameters that control the performance of the method, for example, “entry” and “stay” significance levels in forward and backward selection. However, most methods do not adapt the tuning parameters to particular datasets. We propose a general strategy for adapting variable selection tuning parameters that effectively estimates the tuning parameters so that the selection method avoids overfitting and underfitting. The strategy is based on the principle that overfitting and underfitting can be directly observed in estimates of the error variance after adding controlled amounts of additional independent noise to the response variable, then running a variable selection method. It is related to the simulation technique SIMEX found in the measurement error literature. We focus on forward selection because of its simplicity and ability to handle large numbers of explanatory variables. Monte Carlo studies show that the new method c...
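The following is a small numpy sketch of the diagnostic that motivates the strategy: run forward selection (here with a simple F-to-enter rule), then add independent noise of known variance to the response and rerun it. A well-tuned procedure's estimated error variance should rise by roughly the added variance, whereas an overfitting one absorbs part of the noise. The thresholds, data, and selection rule are illustrative assumptions, not the article's adaptive algorithm or its SIMEX connection.

```python
import numpy as np

rng = np.random.default_rng(12)

n, p = 100, 30
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=1.0, size=n)   # true error variance 1

def forward_selection(y, f_enter):
    """Greedy forward selection: add the best variable while its F-to-enter exceeds f_enter.
    Returns the selected columns and the estimated error variance of the final model."""
    selected = []
    Xs = np.ones((n, 1))
    rss = float(((y - y.mean()) ** 2).sum())
    while True:
        best = None
        for j in range(p):
            if j in selected:
                continue
            Xc = np.column_stack([Xs, X[:, j]])
            resid = y - Xc @ np.linalg.lstsq(Xc, y, rcond=None)[0]
            rss_new = float(resid @ resid)
            f_stat = (rss - rss_new) / (rss_new / (n - Xc.shape[1]))
            if best is None or f_stat > best[0]:
                best = (f_stat, j, rss_new)
        if best is None or best[0] < f_enter:
            break
        _, j, rss = best
        selected.append(j)
        Xs = np.column_stack([Xs, X[:, j]])
    return selected, rss / (n - Xs.shape[1])

# Add independent noise with known variance tau^2 and compare estimated error variances.
tau2 = 1.0
y_noisy = y + rng.normal(scale=np.sqrt(tau2), size=n)

for label, f_enter in [("strict (f_enter=4.0)", 4.0), ("lenient (f_enter=0.5)", 0.5)]:
    sel0, s2_0 = forward_selection(y, f_enter)
    sel1, s2_1 = forward_selection(y_noisy, f_enter)
    print(f"{label}: {len(sel0)} -> {len(sel1)} variables, "
          f"variance increase {s2_1 - s2_0:.2f} (added noise = {tau2:.2f})")
```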