Journal Article

# The Design and Analysis of Experiments

01 Jun 1953 - Yale Journal of Biology and Medicine - Vol. 25, Iss. 6, p. 550

TL;DR: This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment.

Abstract: THE DESIGN AND ANALYSIS OF EXPERIMENTS. By Oscar Kempthorne. New York, John Wiley and Sons, Inc., 1952. 631 pp. $8.50. This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment. It is necessary to have some facility with algebraic notation and manipulation to be able to use the volume intelligently. The problems are presented from the theoretical point of view, without such practical examples as would be helpful for those not acquainted with mathematics. The mathematical justification for the techniques is given. As a somewhat advanced treatment of the design and analysis of experiments, this volume will be interesting and helpful for many who approach statistics theoretically as well as practically. With emphasis on the "why," and with description given broadly, the author relates the subject matter to the general theory of statistics and to the general problem of experimental inference. MARGARET J. ROBERTSON

##### Citations

TL;DR: This article describes a non-parametric method for multivariate analysis of variance, based on sums of squared distances and permutation tests, as an alternative to traditional multivariate analogues whose assumptions are too stringent for most ecological multivariate data sets.

Abstract: Hypothesis-testing methods for multivariate data are needed to make rigorous probability statements about the effects of factors and their interactions in experiments. Analysis of variance is particularly powerful for the analysis of univariate data. The traditional multivariate analogues, however, are too stringent in their assumptions for most ecological multivariate data sets. Non-parametric methods, based on permutation tests, are preferable. This paper describes a new non-parametric method for multivariate analysis of variance, after McArdle and Anderson (in press). It is given here, with several applications in ecology, to provide an alternative and perhaps more intuitive formulation for ANOVA (based on sums of squared distances) to complement the description provided by McArdle and Anderson (in press) for the analysis of any linear model. It is an improvement on previous non-parametric methods because it allows a direct additive partitioning of variation for complex models. It does this while maintaining the flexibility and lack of formal assumptions of other non-parametric methods. The test statistic is a multivariate analogue to Fisher's F-ratio and is calculated directly from any symmetric distance or dissimilarity matrix. P-values are then obtained using permutations. Some examples of the method are given for tests involving several factors, including factorial and hierarchical (nested) designs and tests of interactions.

12,328 citations
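
The abstract above describes the test statistic only in words. As a rough illustration, the one-way case can be sketched as follows, assuming the usual distance-based partition: the total sum of squares is the sum of squared pairwise distances divided by N, and the within-group term is the same quantity computed inside each group and divided by that group's size. The function names (`pseudo_f`, `permanova`), the toy data, and the `extreme / (n_perm + 1)` p-value convention are illustrative assumptions, not taken from the paper.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """Pseudo-F ratio for a one-way design from a symmetric distance matrix."""
    n_total = len(labels)
    groups = sorted(set(labels))
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2
                   for i, j in combinations(range(n_total), 2)) / n_total
    # Within-group sum of squares: the same quantity inside each group.
    ss_within = 0.0
    for g in groups:
        members = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(dist[i][j] ** 2
                         for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within  # additive partition of variation
    df_among = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_among / df_among) / (ss_within / df_within)


def permanova(dist, labels, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    f_obs = pseudo_f(dist, labels)
    shuffled = list(labels)
    extreme = 1  # the observed labelling counts as one arrangement
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign group labels at random
        if pseudo_f(dist, shuffled) >= f_obs:
            extreme += 1
    return f_obs, extreme / (n_perm + 1)


# Toy data (hypothetical): one variable, two groups of three observations.
values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = ["A", "A", "A", "B", "B", "B"]
dist = [[abs(a - b) for b in values] for a in values]
f_obs, p_value = permanova(dist, labels)
print(f_obs, p_value)
```

For this one-dimensional toy data with absolute differences as distances, the pseudo-F coincides with the classical univariate F-ratio (54), which is a convenient sanity check. Any other symmetric dissimilarity matrix can be passed in the same way.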

TL;DR: This paper introduces the reader to a response surface methodology that is especially good at modeling the nonlinear, multimodal functions that often occur in engineering and shows how these approximating functions can be used to construct an efficient global optimization algorithm with a credible stopping rule.

Abstract: In many engineering optimization problems, the number of function evaluations is severely limited by time or cost. These problems pose a special challenge to the field of global optimization, since existing methods often require more function evaluations than can be comfortably afforded. One way to address this challenge is to fit response surfaces to data collected by evaluating the objective and constraint functions at a few points. These surfaces can then be used for visualization, tradeoff analysis, and optimization. In this paper, we introduce the reader to a response surface methodology that is especially good at modeling the nonlinear, multimodal functions that often occur in engineering. We then show how these approximating functions can be used to construct an efficient global optimization algorithm with a credible stopping rule. The key to using response surfaces for global optimization lies in balancing the need to exploit the approximating surface (by sampling where it is minimized) with the need to improve the approximation (by sampling where prediction error may be high). Striking this balance requires solving certain auxiliary problems which have previously been considered intractable, but we show how these computational obstacles can be overcome.

6,914 citations

### Cites background from "The Design and Analysis of Experime..."

...Most recently, the focus has been on developing accurate approximations to expensive computer codes and then using these approximations for visualization and optimization [13, 18, 29, 36]....

[...]
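
The abstract of this entry speaks of balancing exploitation of the approximating surface against sampling where prediction error may be high. One common way to express that balance is an expected-improvement criterion. The sketch below assumes a minimization problem and a predictor that returns a mean `y_hat` and a standard error `s` at a candidate point; the function name and arguments are hypothetical, and the abstract itself does not state this formula.

```python
import math


def expected_improvement(y_hat, s, f_min):
    """Expected improvement over the best observed value f_min (minimization).

    y_hat: predicted value at the candidate point
    s:     standard error of that prediction
    f_min: smallest objective value observed so far
    """
    if s <= 0.0:
        return 0.0  # no uncertainty, so no expected gain from sampling here
    z = (f_min - y_hat) / s
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    # First term rewards a low prediction, second term rewards high uncertainty.
    return (f_min - y_hat) * cdf + s * pdf


# Two candidates with the same (poor) prediction but different uncertainty.
ei_uncertain = expected_improvement(y_hat=1.0, s=2.0, f_min=0.0)
ei_confident = expected_improvement(y_hat=1.0, s=0.5, f_min=0.0)
print(ei_uncertain > ei_confident)
```

The first term grows where the surface predicts a low value and the second grows where the prediction is uncertain, so maximizing the sum trades the two off without a hand-tuned weight.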

TL;DR: This paper models the deterministic output of a computer experiment as the realization of a stochastic process, which provides a statistical basis for designing experiments (choosing the inputs) for efficient prediction and yields estimates of prediction uncertainty.

Abstract: Many scientific phenomena are now investigated by complex computer models or codes. A computer experiment is a number of runs of the code with various inputs. A feature of many computer experiments is that the output is deterministic--rerunning the code with the same inputs gives identical observations. Often, the codes are computationally expensive to run, and a common objective of an experiment is to fit a cheaper predictor of the output to the data. Our approach is to model the deterministic output as the realization of a stochastic process, thereby providing a statistical basis for designing experiments (choosing the inputs) for efficient prediction. With this model, estimates of uncertainty of predictions are also available. Recent work in this area is reviewed, a number of applications are discussed, and we demonstrate our methodology with an example.

6,583 citations
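
To make the idea of treating deterministic output as a realization of a stochastic process more concrete, here is a minimal one-dimensional sketch. It assumes a zero-mean process with unit variance and a Gaussian correlation function with a fixed parameter `theta`; these simplifications, the helper names, and the sine test function are all illustrative assumptions rather than the paper's procedure, which the abstract does not spell out.

```python
import math


def corr(a, b, theta):
    """Gaussian correlation between two scalar inputs (assumed form)."""
    return math.exp(-theta * (a - b) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_predictor(xs, ys, theta=10.0):
    """Return predict(x_new) -> (mean, relative variance) from the code runs."""
    big_r = [[corr(a, b, theta) for b in xs] for a in xs]
    weights = solve(big_r, ys)  # R^{-1} y, computed once

    def predict(x_new):
        r = [corr(x_new, a, theta) for a in xs]
        mean = sum(ri * wi for ri, wi in zip(r, weights))
        r_inv_r = solve(big_r, r)
        # Uncertainty shrinks to zero at inputs that were actually run.
        var = max(0.0, 1.0 - sum(ri * vi for ri, vi in zip(r, r_inv_r)))
        return mean, var

    return predict


# Stand-in for an expensive deterministic code, run at five inputs.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [math.sin(2.0 * math.pi * x) for x in xs]
predict = fit_predictor(xs, ys)
print(predict(0.25), predict(0.125))
```

Because the output is deterministic, the predictor reproduces the observed runs and reports essentially zero variance there, while reporting positive variance between design points. That variance is what a design criterion could use when choosing further inputs.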

01 Jan 1977

TL;DR: This introductory statistics textbook covers data collection and description, probability, inference on central values, variances, and proportions, regression and model building, design of experiments and ANOVA, and communicating the results of a study.

Abstract: PART I: INTRODUCTION 1. WHAT IS STATISTICS? Introduction / Why Study Statistics? / Some Current Applications of Statistics / What Do Statisticians Do? / Quality and Process Improvement / A Note to the Student / Summary / Supplementary Exercises

PART II: COLLECTING THE DATA 2. USING SURVEYS AND SCIENTIFIC STUDIES TO COLLECT DATA Introduction / Surveys / Scientific Studies / Observational Studies / Data Management: Preparing Data for Summarization and Analysis / Summary

PART III: SUMMARIZING DATA 3. DATA DESCRIPTION Introduction / Describing Data on a Single Variable: Graphical Methods / Describing Data on a Single Variable: Measures of Central Tendency / Describing Data on a Single Variable: Measures of Variability / The Box Plot / Summarizing Data from More Than One Variable / Calculators, Computers, and Software Systems / Summary / Key Formulas / Supplementary Exercises

PART IV: TOOLS AND CONCEPTS 4. PROBABILITY AND PROBABILITY DISTRIBUTIONS How Probability Can Be Used in Making Inferences / Finding the Probability of an Event / Basic Event Relations and Probability Laws / Conditional Probability and Independence / Bayes's Formula / Variables: Discrete and Continuous / Probability Distributions for Discrete Random Variables / A Useful Discrete Random Variable: The Binomial / Probability Distributions for Continuous Random Variables / A Useful Continuous Random Variable: The Normal Distribution / Random Sampling / Sampling Distributions / Normal Approximation to the Binomial / Summary / Key Formulas / Supplementary Exercises

PART V: ANALYZING DATA: CENTRAL VALUES, VARIANCES, AND PROPORTIONS 5. INFERENCES ON A POPULATION CENTRAL VALUE Introduction and Case Study / Estimation of μ / Choosing the Sample Size for Estimating μ / A Statistical Test for μ / Choosing the Sample Size for Testing μ / The Level of Significance of a Statistical Test / Inferences about μ for Normal Population, σ Unknown / Inferences about the Population Median / Summary / Key Formulas / Supplementary Exercises 6. COMPARING TWO POPULATION CENTRAL VALUES Introduction and Case Study / Inferences about μ1 - μ2: Independent Samples / A Nonparametric Alternative: The Wilcoxon Rank Sum Test / Inferences about μ1 - μ2: Paired Data / A Nonparametric Alternative: The Wilcoxon Signed-Rank Test / Choosing Sample Sizes for Inferences about μ1 - μ2 / Summary / Key Formulas / Supplementary Exercises 7. INFERENCES ABOUT POPULATION VARIANCES Introduction and Case Study / Estimation and Tests for a Population Variance / Estimation and Tests for Comparing Two Population Variances / Tests for Comparing k > 2 Population Variances / Summary / Key Formulas / Supplementary Exercises 8. INFERENCES ABOUT POPULATION CENTRAL VALUES Introduction and Case Study / A Statistical Test About More Than Two Population Variances / Checking on the Assumptions / Alternative When Assumptions are Violated: Transformations / A Nonparametric Alternative: The Kruskal-Wallis Test / Summary / Key Formulas / Supplementary Exercises 9. MULTIPLE COMPARISONS Introduction and Case Study / Planned Comparisons Among Treatments: Linear Contrasts / Which Error Rate Is Controlled / Multiple Comparisons with the Best Treatment / Comparison of Treatments to a Control / Pairwise Comparison on All Treatments / Summary / Key Formulas / Supplementary Exercises 10. CATEGORICAL DATA Introduction and Case Study / Inferences about a Population Proportion p / Comparing Two Population Proportions p1 - p2 / Probability Distributions for Discrete Random Variables / The Multinomial Experiment and Chi-Square Goodness-of-Fit Test / The Chi-Square Test of Homogeneity of Proportions / The Chi-Square Test of Independence of Two Nominal Level Variables / Fisher's Exact Test, a Permutation Test / Measures of Association / Combining Sets of Contingency Tables / Summary / Key Formulas / Supplementary Exercises

PART VI: ANALYZING DATA: REGRESSION METHODS, MODEL BUILDING 11. SIMPLE LINEAR REGRESSION AND CORRELATION Linear Regression and the Method of Least Squares / Transformations to Linearize Data / Correlation / A Look Ahead: Multiple Regression / Summary of Key Formulas. Supplementary Exercises. 12. INFERENCES RELATED TO LINEAR REGRESSION AND CORRELATION Introduction and Case Study / Diagnostics for Detecting Violations of Model Conditions / Inferences about the Intercept and Slope of the Regression Line / Inferences about the Population Mean for a Specified Value of the Explanatory Variable / Predictions and Prediction Intervals / Examining Lack of Fit in the Model / The Inverse Regression Problem (Calibration): Predicting Values for x for a Specified Value of y / Summary / Key Formulas / Supplementary Exercises 13. MULTIPLE REGRESSION AND THE GENERAL LINEAR MODEL Introduction and Case Study / The General Linear Model / Least Squares Estimates of Parameters in the General Linear Model / Inferences about the Parameters in the General Linear Model / Inferences about the Population Mean and Predictions from the General Linear Model / Comparing the Slope of Several Regression Lines / Logistic Regression / Matrix Formulation of the General Linear Model / Summary / Key Formulas / Supplementary Exercises 14. BUILDING REGRESSION MODELS WITH DIAGNOSTICS Introduction and Case Study / Selecting the Variables (Step 1) / Model Formulation (Step 2) / Checking Model Conditions (Step 3) / Summary / Key Formulas / Supplementary Exercises

PART VII: ANALYZING DATA: DESIGN OF EXPERIMENTS AND ANOVA 15. DESIGN CONCEPTS FOR EXPERIMENTS AND STUDIES Experiments, Treatments, Experimental Units, Blocking, Randomization, and Measurement Units / How Many Replications? / Studies for Comparing Means versus Studies for Comparing Variances / Summary / Key Formulas / Supplementary Exercises 16. ANALYSIS OF VARIANCE FOR STANDARD DESIGNS Introduction and Case Study / Completely Randomized Design with Single Factor / Randomized Block Design / Latin Square Design / Factorial Experiments in a Completely Randomized Design / The Estimation of Treatment Differences and Planned Comparisons in the Treatment Means / Checking Model Conditions / Alternative Analyses: Transformation and Friedman's Rank-Based Test / Summary / Key Formulas / Supplementary Exercises 17. ANALYSIS OF COVARIANCE Introduction and Case Study / A Completely Randomized Design with One Covariate / The Extrapolation Problem / Multiple Covariates and More Complicated Designs / Summary / Key Formulas / Supplementary Exercises 18. ANALYSIS OF VARIANCE FOR SOME UNBALANCED DESIGNS Introduction and Case Study / A Randomized Block Design with One or More Missing Observations / A Latin Square Design with Missing Data / Incomplete Block Designs / Summary / Key Formulas / Supplementary Exercises 19. ANALYSIS OF VARIANCE FOR SOME FIXED EFFECTS, RANDOM EFFECTS, AND MIXED EFFECTS MODELS Introduction and Case Study / A One-Factor Experiment with Random Treatment Effects / Extensions of Random-Effects Models / A Mixed Model: Experiments with Both Fixed and Random Treatment Effects / Models with Nested Factors / Rules for Obtaining Expected Mean Squares / Summary / Key Formulas / Supplementary Exercises 20. SPLIT-PLOT DESIGNS AND EXPERIMENTS WITH REPEATED MEASURES Introduction and Case Study / Split-Plot Designs / Single-Factor Experiments with Repeated Measures / Two-Factor Experiments with Repeated Measures on One of the Factors / Crossover Design / Summary / Key Formulas / Supplementary Exercises

PART VIII: COMMUNICATING AND DOCUMENTING THE RESULTS OF A STUDY OR EXPERIMENT 21. COMMUNICATING AND DOCUMENTING THE RESULTS OF A STUDY OR EXPERIMENT Introduction / The Difficulty of Good Communication / Communication Hurdles: Graphical Distortions / Communication Hurdles: Biased Samples / Communication Hurdles: Sample Size / The Statistical Report / Documentation and Storage of Results / Summary / Supplementary Exercises

5,674 citations

TL;DR: In this article, the authors use a particular model for causal inference (Holland and Rubin 1983; Rubin 1974) to critique the discussions of other writers on causation and causal inference.

Abstract: Problems involving causal inference have dogged at the heels of statistics since its earliest days. Correlation does not imply causation, and yet causal conclusions drawn from a carefully designed experiment are often valid. What can a statistical model say about causation? This question is addressed by using a particular model for causal inference (Holland and Rubin 1983; Rubin 1974) to critique the discussions of other writers on causation and causal inference. These include selected philosophers, medical researchers, statisticians, econometricians, and proponents of causal modeling.

4,845 citations

##### References

01 Jan 1963

TL;DR: A survey drawn from social science research which deals with correlational, ex post facto, true experimental, and quasi-experimental designs and makes methodological recommendations is presented in this article.

Abstract: A survey drawn from social-science research which deals with correlational, ex post facto, true experimental, and quasi-experimental designs and makes methodological recommendations. Bibliogs.

10,916 citations
