Showing papers on "Statistical hypothesis testing" published in 2003


Book
13 Mar 2003
TL;DR: This book covers measures of accuracy for binary tests, comparison of binary tests and regression analysis, the receiver operating characteristic (ROC) curve and its estimation, covariate effects on continuous and ordinal tests, incomplete data and imperfect reference tests, and study design and hypothesis testing.
Abstract: 1. Introduction 2. Measures of Accuracy for Binary Tests 3. Comparing Binary Tests and Regression Analysis 4. The Receiver Operating Characteristic Curve 5. Estimating the ROC Curve 6. Covariate Effects on Continuous and Ordinal Tests 7. Incomplete Data and Imperfect Reference Tests 8. Study Design and Hypothesis Testing 9. More Topics and Conclusions References/Bibliography Index

2,289 citations


Journal Article
Matthew D. Moran
01 Feb 2003 - Oikos
TL;DR: The author argues that the sequential Bonferroni correction has mathematical, logical, and practical flaws, and that it should be rejected as a solution to the problem of multiple statistical tests in ecological studies.
Abstract: Interpretation of results that include multiple statistical tests has been an issue of great concern for some time in the ecological literature. The basic problem is that when multiple tests are undertaken, each at the same significance level (α), the probability of achieving at least one significant result is greater than that significance level (Zaykin et al. 2002). Therefore, there is an increased probability of rejecting a null hypothesis when it would be inappropriate to do so. The typical solution to this problem has been lowering the α values for the table (i.e. establishing a table-wide significance level) and therefore reducing the probability of a spurious result. Specifically, the most common procedure has been the application of the sequential Bonferroni adjustment (Holm 1979, Miller 1981, Rice 1989). Arguments in this essay address the problems of adjusting probability values for tables of multiple statistical tests, and more specifically argue for rejection of the sequential Bonferroni as a solution to this problem. Since the influential publication of Rice (1989), the sequential Bonferroni correction has become the primary method of addressing the problem of multiple statistical tests in ecological research. The sequential Bonferroni adjusts the table-wide p-value to keep it constant at 0.05, and subsequently reduces the probability of a spurious result. Although other methods exist for addressing tables of multiple statistical tests, the sequential Bonferroni has become the most commonly utilized process. However, this method has several flaws ranging from mathematical to logical to practical that argue for rejecting this method in ecological studies.

1,475 citations
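
The multiple-testing problem described in the abstract above, and the sequential Bonferroni (Holm 1979) procedure the essay argues against, can be illustrated with a minimal Python sketch. It is not taken from the essay: the function names are hypothetical, and `familywise_error` assumes the tests are independent and all null hypotheses are true.

```python
def familywise_error(alpha: float, m: int) -> float:
    """P(at least one false rejection) for m independent tests, all nulls true."""
    return 1.0 - (1.0 - alpha) ** m


def holm_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Holm's sequential Bonferroni: step down through the sorted p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test is retained, all larger p-values are retained too
    return reject


if __name__ == "__main__":
    print(round(familywise_error(0.05, 10), 3))
    print(holm_reject([0.001, 0.04, 0.03, 0.012]))
```

With 10 tests at α = 0.05, `familywise_error(0.05, 10)` is about 0.40, which is the kind of inflation the abstract refers to.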


Journal Article
TL;DR: This work examines the commonly used tests on the parameters in the random effects meta-regression with one covariate and proposes new test statistics, based on an improved estimator of the variance of the parameter estimates, that hold the nominal significance level better in a simulation study.
Abstract: The explanation of heterogeneity plays an important role in meta-analysis. The random effects meta-regression model allows the inclusion of trial-specific covariates which may explain a part of the heterogeneity. We examine the commonly used tests on the parameters in the random effects meta-regression with one covariate and propose some new test statistics based on an improved estimator of the variance of the parameter estimates. The approximation of the distribution of the newly proposed tests is based on some theoretical considerations. Moreover, the newly proposed tests can easily be extended to the case of more than one covariate. In a simulation study, we compare the tests with regard to their actual significance level and we consider the log relative risk as the parameter of interest. Our simulation study reflects the meta-analysis of the efficacy of a vaccine for the prevention of tuberculosis originally discussed in Berkey et al. The simulation study shows that the newly proposed tests are superior to the commonly used test in holding the nominal significance level.

1,249 citations


Journal Article
TL;DR: It is found that Bonferroni-related tests offer little improvement over Bonferroni, while the permutation method offers substantial improvement over the random field method for low smoothness and low degrees of freedom.
Abstract: Functional neuroimaging data embodies a massive multiple testing problem, where 100 000 correlated test statistics must be assessed. The familywise error rate, the chance of any false positives is ...

1,146 citations


Book
01 Jan 2003
TL;DR: In this paper, an empirical Bayes analysis plan for this situation is developed, using a local version of the false discovery rate to examine the inference issues, and two genomics problems are used as examples to show the importance of correctly choosing the null hypothesis.
Abstract: Current scientific techniques in genomics and image processing routinely produce hypothesis testing problems with hundreds or thousands of cases to consider simultaneously. This poses new difficulties for the statistician, but also opens new opportunities. In particular, it allows empirical estimation of an appropriate null hypothesis. The empirical null may be considerably more dispersed than the usual theoretical null distribution that would be used for any one case considered separately. An empirical Bayes analysis plan for this situation is developed, using a local version of the false discovery rate to examine the inference issues. Two genomics problems are used as examples to show the importance of correctly choosing the null hypothesis.

930 citations


Proceedings Article
09 Dec 2003
TL;DR: This paper presents G-means, an improved algorithm for learning k while clustering, based on a statistical test for the hypothesis that a subset of data follows a Gaussian distribution; in experiments it works well, and better than a recent method based on the BIC penalty for model complexity.
Abstract: When clustering a dataset, the right number k of clusters to use is often not obvious, and choosing k automatically is a hard algorithmic problem. In this paper we present an improved algorithm for learning k while clustering. The G-means algorithm is based on a statistical test for the hypothesis that a subset of data follows a Gaussian distribution. G-means runs k-means with increasing k in a hierarchical fashion until the test accepts the hypothesis that the data assigned to each k-means center are Gaussian. Two key advantages are that the hypothesis test does not limit the covariance of the data and does not compute a full covariance matrix. Additionally, G-means only requires one intuitive parameter, the standard statistical significance level α. We present results from experiments showing that the algorithm works well, and better than a recent method based on the BIC penalty for model complexity. In these experiments, we show that the BIC is ineffective as a scoring function, since it does not penalize strongly enough the model's complexity.

887 citations
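
The abstract of the G-means entry above does not say which Gaussianity test is used. As a rough sketch only, the following assumes an Anderson-Darling test on one-dimensional data, with Stephens' small-sample adjustment and the commonly tabulated 5% critical value of 0.752. The function names are hypothetical, and the clustering loop itself is omitted.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(sample):
    """Adjusted Anderson-Darling statistic for normality (mean and variance estimated)."""
    n = len(sample)
    mu, sigma = mean(sample), stdev(sample)
    std_normal = NormalDist()
    # Standardize, map through the normal CDF, and clamp away from 0 and 1
    # so that the logarithms below stay finite.
    cdf = sorted(
        min(max(std_normal.cdf((x - mu) / sigma), 1e-12), 1 - 1e-12)
        for x in sample
    )
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    # Small-sample adjustment (assumed here; other variants exist).
    return a2 * (1 + 0.75 / n + 2.25 / n ** 2)


def looks_gaussian(sample, critical_value=0.752):
    """True if the test does not reject normality at the assumed critical value."""
    return anderson_darling_normal(sample) <= critical_value
```

In a G-means-style loop, a cluster whose points fail such a check would be a candidate for splitting before k-means is run again with a larger k.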


Journal Article
TL;DR: The paper concerns the tests of Bai and Perron (1998) for multiple structural changes under very general conditions on the data and the errors: tests of the null hypothesis of no change vs. a pre-specified number of changes or an arbitrary number of changes (up to some maximum), and a procedure for testing ℓ changes vs. ℓ + 1 changes.
Abstract: Bai and Perron (1998), henceforth BP, considered estimating multiple structural changes in a linear model. The results are obtained under a general framework of partial structural changes which allows a subset of the parameters not to change. Methods to efficiently compute estimates are discussed in Bai and Perron (2003). BP also addressed the problem of testing for multiple structural changes under very general conditions on the data and the errors: they considered a type test for the null hypothesis of no change vs. a pre-specified number of changes and also vs. an alternative of an arbitrary number of changes (up to some maximum), as well as a procedure that allows one to test the null hypothesis of, say, ℓ changes, vs. the alternative hypothesis of ℓ + 1 changes. The latter is particularly useful in that it allows a specific to general modeling strategy to consistently determine the appropriate number of changes in the data. The tests can be constructed allowing different serial correlation in the errors, different distribution for the data and the errors across segments or imposing a common structure.

768 citations


Journal Article
TL;DR: As a potential alternative to standard null hypothesis significance testing, methods for graphical presentation of data--particularly condition means and their corresponding confidence intervals--for a wide range of factorial designs used in experimental psychology are described.
Abstract: As a potential alternative to standard null hypothesis significance testing, we describe methods for graphical presentation of data--particularly condition means and their corresponding confidence intervals--for a wide range of factorial designs used in experimental psychology. We describe and illustrate confidence intervals specifically appropriate for between-subject versus within-subject factors. For designs involving more than two levels of a factor, we describe the use of contrasts for graphical illustration of theoretically meaningful components of main effects and interactions. These graphical techniques lend themselves to a natural and straightforward assessment of statistical power. ... To the extent that a variety of informative means of constructing inferences from data are made available and clearly understood, researchers will increase their likelihood of forming appropriate conclusions and communicating effectively with their audiences. A number of years ago, we advocated and described computational approaches to the use of confidence intervals as part of a graphical approach to data interpretation (Loftus & Masson, 1994; see also, Loftus, 2002). The power and effectiveness of graphical data presentation is undeniable (Tufte, 1983) and is common in all forms of scientific communication in experimental psychology and in other fields. In many instances, however, plots of descriptive statistics (typically means) are not accompanied by any indication of variability or stability associated with those descriptive statistics. The diligent reader, then, is forced to refer to a dreary accompanying recital of significance tests to determine how the pattern of means should be interpreted.

754 citations


Journal Article
TL;DR: In this paper, a stepwise multiple testing procedure is proposed to asymptotically control the familywise error rate at a desired level, which implicitly captures the joint dependence structure of the test statistics, which results in increased ability to detect alternative hypotheses.
Abstract: It is common in econometric applications that several hypothesis tests are carried out at the same time. The problem then becomes how to decide which hypotheses to reject, accounting for the multitude of tests. In this paper, we suggest a stepwise multiple testing procedure which asymptotically controls the familywise error rate at a desired level. Compared to related single-step methods, our procedure is more powerful in the sense that it often will reject more false hypotheses. Unlike some stepwise methods, our method implicitly captures the joint dependence structure of the test statistics, which results in increased ability to detect alternative hypotheses. We prove our method asymptotically controls the familywise error rate under minimal assumptions. Some simulation studies show the improvements of our methods over previous proposals. We also provide an application to a set of real data.

619 citations



Book
23 Dec 2003
TL;DR: A book on moderated multiple regression (MMR) covering moderator variables in social science theory and practice, the use of MMR, the homogeneity of error variance assumption, low statistical power and proposed remedies, complex MMR models, and assessing practical significance.
Abstract: 1. What Is a Moderator Variable and Why Should We Care? Why Should We Study Moderator Variables? Distinction between Moderator and Mediator Variables Importance of A Priori Rationale in Investigating Moderating Effects Conclusions 2. Moderated Multiple Regression What Is MMR? Endorsement of MMR as an Appropriate Technique Pervasive Use of MMR in the Social Sciences: Literature Review Conclusions 3. Performing and Interpreting Moderated Multiple Regression Analysis Using Computer Programs Research Scenario Data Set Conducting an MMR Analysis Using Computer Programs: Two Steps Output Interpretation Conclusions 4. Homogeneity of Error Variance Assumption What Is the Homogeneity of Error Variance Assumption? Two Distinct Assumptions: Homoscedasticity and Homogeneity of Error Variance Is It a Big Deal to Violate the Assumption? Violation of the Assumption in Published Research How to Check If the Homogeneity Assumption Is Violated What to Do When the Homogeneity of Error Variance Assumption Is Violated ALTMMR: Computer Program to Check Assumption Compliance and Compute Alternative Statistics If Needed Conclusions 5. MMR's Low-Power Problem Statistical Inferences and Power Controversy Over Null Hypothesis Significance Testing Factors Affecting the Power of All Inferential Tests Factors Affecting the Power of MMR Effect Sizes and Power in Published Research Implications of Small Observed Effect Sizes for Social Science Research Conclusions 6. Light at the End of the Tunnel: How to Solve the Low-Power Problem How to Minimize the Impact of Factors Affecting the Power of All Inferential Tests How to Minimize the Impact of Factors Affecting the Power of MMR Conclusions 7. Computing Statistical Power Usefulness of Computing Statistical Power Empirically Based Programs Theory-Based Program Relative Impact of the Factors Affecting Power Conclusions 8. Complex MMR Models MMR Analyses Including a Moderator Variable with More Than Two Levels Linear Interactions and Non-linear Effects: Friends or Foes? Testing and Interpreting Three-Way and Higher-Order Interaction Effects Conclusions 9. Further Issues in the Interpretation of Moderating Effects Is the Moderating Effect Practically Significant? The Signed Coefficient Rule for Interpreting Moderating Effects The Importance of Identifying Criterion and Predictor A Priori Conclusions 10. Summary and Conclusions Moderators and Social Science Theory and Practice Use of Moderated Multiple Regression Homogeneity of Error Variance Assumption Low Statistical Power and Proposed Remedies Complex MMR Models Assessing Practical Significance Conclusions Appendix A. Computation of Bartlett's (1937) M Statistic Appendix B. Computation of James's (1951) J Statistic Appendix C. Computation of Alexander's (Alexander & Govern, 1994) A Statistic Appendix D. Computation of Modified f² Appendix E. Theory-Based Power Approximation References Name Index Subject Index

Book
01 Jan 2003
TL;DR: In this article, a stepwise multiple testing procedure that asymptotically controls the familywise error rate is proposed, which implicitly captures the joint dependence structure of the test statistics, which results in increased ability to detect false hypotheses.
Abstract: In econometric applications, often several hypothesis tests are carried out at once. The problem then becomes how to decide which hypotheses to reject, accounting for the multitude of tests. This paper suggests a stepwise multiple testing procedure that asymptotically controls the familywise error rate. Compared to related single-step methods, the procedure is more powerful and often will reject more false hypotheses. In addition, we advocate the use of studentization when feasible. Unlike some stepwise methods, the method implicitly captures the joint dependence structure of the test statistics, which results in increased ability to detect false hypotheses. The methodology is presented in the context of comparing several strategies to a common benchmark. However, our ideas can easily be extended to other contexts where multiple tests occur. Some simulation studies show the improvements of our methods over previous proposals. We also provide an application to a set of real data.

Book
06 Jan 2003
TL;DR: An introductory statistics text covering probability, summarizing data, sampling distributions and confidence intervals, hypothesis testing, least squares regression and Pearson's correlation, bootstrap methods, comparisons of independent and dependent groups, ANOVA, multiple comparisons, outlier detection in multivariate data, and rank-based and nonparametric methods.
Abstract: Introduction Probability and Related Concepts Summarizing Data Sampling Distributions and Confidence Intervals Hypothesis Testing Least Squares Regression and Pearson's Correlation Basic Bootstrap Methods Comparing Two Independent Groups One-Way Anova Two-Way Anova Comparing Dependent Groups Multiple Comparisons Detecting Outliers in Multivariate Data More Regression Methods Rank-Based and Nonparametric Methods

Posted Content
TL;DR: This letter corrects a typographical error in the test statistic of the Jobson-Korkie test of equal Sharpe ratios, which is widely used in the performance evaluation literature, and shows that the statistic can be simplified without loss of its statistical properties.
Abstract: The Jobson-Korkie-test of equal Sharpe Ratios is widely used in the performance evaluation literature. This letter has two purposes: First, it corrects a typographical error in the test statistic. Second, it shows that the test statistic can be simplified without loss of its statistical properties.

Journal Article
TL;DR: This paper adopts an anomaly detection approach by detecting possible intrusions based on program or user profiles built from normal usage data using a scheme that can be justified from the perspective of hypothesis testing.

Journal Article
TL;DR: In this article, the power of the parametric t-test for trend detection is estimated by Monte Carlo simulation for various probability distributions and compared with the non-parametric Mann-Kendall test.
Abstract: The existence of a trend in a hydrological time series is detected by statistical tests. The power of a test is the probability that it will reject a null hypothesis when it is false. In this study, the power of the parametric t-test for trend detection is estimated by Monte Carlo simulation for various probability distributions and compared with the power of the non-parametric Mann-Kendall test. The t-test has less power than the non-parametric test when the probability distribution is skewed. However, for moderately skewed distributions the power ratio is close to one. Annual streamflow records in various regions of Turkey are analyzed by the two tests to compare their powers in detecting a trend.
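
A Monte Carlo power estimate of the kind described above can be sketched as follows. This is an illustrative sketch rather than the study's procedure: it covers only the Mann-Kendall test (without tie correction), and the exponential noise, series length, trend slope, and two-sided 5% critical value of 1.96 are assumptions chosen for the example.

```python
import math
import random


def mann_kendall_z(series):
    """Mann-Kendall trend statistic Z with continuity correction (no tie correction)."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def estimated_power(slope, n=30, trials=500, z_crit=1.96, seed=0):
    """Fraction of simulated series in which 'no trend' is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Linear trend plus skewed (exponential) noise; an illustrative choice.
        series = [slope * t + rng.expovariate(1.0) for t in range(n)]
        if abs(mann_kendall_z(series)) > z_crit:
            rejections += 1
    return rejections / trials
```

Calling `estimated_power(0.0)` estimates the false rejection rate under no trend, while larger slopes give the power against that alternative; the same loop could be repeated with a t-test on the regression slope to form the power ratio the abstract mentions.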

Book
04 Jun 2003
TL;DR: A book on the analysis of microarray gene expression data, covering image processing, elementary statistics and hypothesis testing, ANOVA, experiment design, multiple comparisons, visualization and cluster analysis, pre-processing and normalization, selection of differentially regulated genes, functional analysis, and commercial software.
Abstract: PREFACE INTRODUCTION Bioinformatics - An Emerging Discipline The Building Blocks of Genomic Information Expression of Genetic Information The Need for Microarrays MICROARRAYS Microarrays - Tools for Gene Expression Analysis Fabrication of Microarrays Applications of Microarrays Challenges in Using Microarrays in Gene Expression Studies Sources of Variability IMAGE PROCESSING Introduction Basic Elements of Digital Imaging Microarray Image Processing Image Processing of cDNA Microarrays Image Processing of Affymetrix Microarrays ELEMENTS OF STATISTICS Introduction Some Basic Terms Elementary Statistics Probabilities Bayes' Theorem Probability Distributions Central Limit Theorem Are Replicates Useful? Summary Solved Problems Exercises STATISTICAL HYPOTHESIS TESTING Introduction The framework Hypothesis Testing and Significance "I Do Not Believe God Does Not Exist" An Algorithm for Hypothesis Testing Errors in Hypothesis Testing Solved Problems CLASSICAL APPROACHES TO DATA ANALYSIS Introduction Tests Involving a Single Sample Tests Involving Two Samples Exercises ANALYSIS OF VARIANCE - ANOVA Introduction One-Way ANOVA Two-Way ANOVA Quality Control Exercises EXPERIMENT DESIGN The Concept of Experiment Design Comparing Varieties Improving the Production Process Principles of Experimental Design Guidelines for Experimental Design A Short Synthesis of Statistical Experiment Designs Some Microarray Specific Experiment Designs MULTIPLE COMPARISONS Introduction The Problem of Multiple Comparisons A More Precise Argument Corrections for Multiple Comparisons ANALYSIS AND VISUALIZATION TOOLS Introduction Box Plots Gene Pies Scatter Plots Histograms Time Series Principal Component Analysis (PCA) Independent Component Analysis (ICA) CLUSTER ANALYSIS Introduction Metric Distances Hierarchical Clustering k-Means Clustering Kohonen Maps (SOFM) DATA PRE-PROCESSING AND NORMALIZATION Introduction General Pre-Processing Techniques Normalization Issues Specific to cDNA Data Normalization Issues Specific to Affymetrix Data Other Approaches to the Normalization of Affymetrix Data Useful Pre-Processing and Normalization Sequences Appendix METHODS FOR SELECTING DIFFERENTIALLY REGULATED GENES Introduction Criteria Fold Change Unusual Ratio Hypothesis Testing, Corrections for Multiple Comparisons and Resampling ANOVA Noise Sampling Model Based Maximum Likelihood Estimation Methods Affymetrix Comparison Calls Other Methods Appendix FUNCTIONAL ANALYSIS AND BIOLOGICAL INTERPRETATION OF MICROARRAY DATA Introduction The Gene Ontology Other Related Resources Translating Lists of Differentially Regulated Genes into Biological Knowledge Onto-Express Summary FOCUSED MICROARRAYS - COMPARISON AND SELECTION Introduction Criteria for Array Selection Onto-Compare Some Comparisons COMMERCIAL APPLICATIONS Introduction Significance Testing Among Groups Using GeneSight Statistical Analysis of Microarray Data Using S-PLUS and Insightful ArrayAnalyzer SAS Software for Genomics Spotfire's Decision Site THE ROAD AHEAD What Next? Molecular Diagnosis Gene Regulatory Networks Conclusions REFERENCES

Book Chapter
20 Jul 2003
TL;DR: This paper demonstrates a non-parametric technique for estimation of statistical significance in the context of discriminative analysis, which adopts permutation tests, first developed in classical statistics for hypothesis testing, to estimate how likely it is to obtain the observed classification performance, as measured by testing on a hold-out set or cross-validation, by chance.
Abstract: Estimating statistical significance of detected differences between two groups of medical scans is a challenging problem due to the high dimensionality of the data and the relatively small number of training examples. In this paper, we demonstrate a non-parametric technique for estimation of statistical significance in the context of discriminative analysis (i.e., training a classifier function to label new examples into one of two groups). Our approach adopts permutation tests, first developed in classical statistics for hypothesis testing, to estimate how likely we are to obtain the observed classification performance, as measured by testing on a hold-out set or cross-validation, by chance. We demonstrate the method on examples of both structural and functional neuroimaging studies.
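
The permutation idea in the abstract above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the one-dimensional nearest-class-mean classifier and leave-one-out cross-validation are stand-ins chosen to keep the example self-contained, and the function names are hypothetical.

```python
import random


def loo_accuracy(xs, labels):
    """Leave-one-out accuracy of a nearest-class-mean classifier on 1-D data."""
    correct = 0
    for i in range(len(xs)):
        means = {}
        for cls in set(labels):
            vals = [x for j, (x, y) in enumerate(zip(xs, labels)) if j != i and y == cls]
            if vals:
                means[cls] = sum(vals) / len(vals)
        predicted = min(means, key=lambda c: abs(xs[i] - means[c]))
        correct += predicted == labels[i]
    return correct / len(xs)


def permutation_p_value(xs, labels, n_perm=1000, seed=0):
    """Estimate how often shuffled labels classify at least as well as the real ones."""
    rng = random.Random(seed)
    observed = loo_accuracy(xs, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between data and labels
        if loo_accuracy(xs, shuffled) >= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimate strictly positive.
    return (at_least_as_good + 1) / (n_perm + 1)
```

A small p-value here means that the observed cross-validated accuracy is rarely matched when the group labels carry no information, which is the sense of "by chance" used in the abstract.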

Journal Article
TL;DR: This article describes three families of effect size estimators and their use in situations of general and specific interest to experimenting psychologists, with the emphasis on correlation (r-type) effect size indicators.
Abstract: This article describes three families of effect size estimators and their use in situations of general and specific interest to experimenting psychologists. The situations discussed include both between- and within-group (repeated measures) designs. Also described is the counternull statistic, which is useful in preventing common errors of interpretation in null hypothesis significance testing. The emphasis is on correlation (r-type) effect size indicators, but a wide variety of difference-type and ratio-type effect size estimators are also described.
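
The abstract does not give formulas for the counternull statistic. The sketch below assumes the usual definition of the counternull as the effect size that is as well supported by the data as the null value. For effect sizes with a symmetric reference distribution that is twice the observed value minus the null value; for r it is obtained here by doubling the equivalent standardized difference d = 2r / sqrt(1 - r²) (a conversion that assumes equal group sizes) and converting back. The function names are hypothetical.

```python
import math


def counternull_symmetric(es_obs, es_null=0.0):
    """Counternull for an effect size with a symmetric reference distribution."""
    return 2.0 * es_obs - es_null


def counternull_r(r):
    """Counternull of a correlation r against a null of zero (assumed conversion via d)."""
    # Doubling d = 2r / sqrt(1 - r^2) and converting back gives sqrt(4r^2 / (1 + 3r^2)).
    return math.copysign(math.sqrt(4 * r * r / (1 + 3 * r * r)), r)
```

Under these assumptions an observed r of 0.5 has a counternull of roughly 0.76, a reminder that a non-significant result is as consistent with that larger effect as with no effect at all.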

Journal Article
TL;DR: GeneMerge is a web-based and standalone program written in PERL that returns a range of functional and genomic data for a given set of study genes and provides statistical rank scores for over-representation of particular functions or categories in the data set.
Abstract: Summary: GeneMerge is a web-based and standalone program written in PERL that returns a range of functional and genomic data for a given set of study genes and provides statistical rank scores for over-representation of particular functions or categories in the data set. Functional or categorical data of all kinds can be analyzed with GeneMerge, facilitating regulatory and metabolic pathway analysis, tests of population genetic hypotheses, cross-experiment comparisons, and tests of chromosomal clustering, among others. GeneMerge can perform analyses on a wide variety of genomic data quickly and easily and facilitates both data mining and hypothesis testing. Availability: GeneMerge is available free of charge for academic use over the web and for download from: http://www.oeb.harvard.edu/hartl/lab/publications/GeneMerge.html.
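
The abstract does not state how the over-representation scores are computed. One common way to score over-representation of a category in a study set is a hypergeometric tail probability, sketched below in Python as an assumption rather than as a description of GeneMerge itself; the function and parameter names are hypothetical.

```python
from math import comb


def overrepresentation_p(population, annotated, study, hits):
    """P(X >= hits) when `study` genes are drawn without replacement from a
    population of `population` genes, `annotated` of which carry the category."""
    total = comb(population, study)
    upper = min(annotated, study)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, study - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

When many categories are scored this way for the same study set, the resulting p-values would normally need a multiple-testing correction before they are interpreted.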

Journal Article
TL;DR: A feature selection method that can be applied directly to models that are linear with respect to their parameters, and indirectly to others, and is independent of the target machine is described.
Abstract: We describe a feature selection method that can be applied directly to models that are linear with respect to their parameters, and indirectly to others. It is independent of the target machine. It is closely related to classical statistical hypothesis tests, but it is more intuitive, hence more suitable for use by engineers who are not statistics experts. Furthermore, some assumptions of classical tests are relaxed. The method has been used successfully in a number of applications that are briefly described.

Journal Article
Rogelio Oliva
TL;DR: This paper posits that model calibration, the process of estimating the model parameters (structure) to obtain a match between observed and simulated structures and behaviors, is a stringent test of a hypothesis linking structure to behavior, and proposes a framework to use calibration as a form of model testing.

Journal Article
TL;DR: The authors argue that Fisher's evidential p value is incompatible with the Type I error rate, α, of Neyman-Pearson statistical orthodoxy, and that the distinction between evidence (p's) and error (α's) reflects the fundamental differences between Fisher's ideas on significance testing and inductive inference and Neyman-Pearson's views on hypothesis testing and inductive behavior.
Abstract: Confusion surrounding the reporting and interpretation of results of classical statistical tests is widespread among applied researchers, most of whom erroneously believe that such tests are prescribed by a single coherent theory of statistical inference. This is not the case: Classical statistical testing is an anonymous hybrid of the competing and frequently contradictory approaches formulated by R. A. Fisher on the one hand, and Jerzy Neyman and Egon Pearson on the other. In particular, there is a widespread failure to appreciate the incompatibility of Fisher's evidential p value with the Type I error rate, α, of Neyman-Pearson statistical orthodoxy. The distinction between evidence (p's) and error (α's) is not trivial. Instead, it reflects the fundamental differences between Fisher's ideas on significance testing and inductive inference, and Neyman-Pearson's views on hypothesis testing and inductive behavior. The emphasis of the article is to expose this incompatibility, but we also briefly note a pos...

Journal ArticleDOI
TL;DR: In this paper, a Bayesian framework for exploratory data analysis based on posterior predictive checks is presented, which can be used to create reference distributions for EDA graphs, and how this approach resolves some theoretical problems in Bayesian data analysis.
Abstract: Summary. Exploratory data analysis (EDA) and Bayesian inference (or, more generally, complex statistical modeling), which are generally considered as unrelated statistical paradigms, can be particularly effective in combination. In this paper, we present a Bayesian framework for EDA based on posterior predictive checks. We explain how posterior predictive simulations can be used to create reference distributions for EDA graphs, and how this approach resolves some theoretical problems in Bayesian data analysis. We show how the generalization of Bayesian inference to include replicated data y^rep and replicated parameters θ^rep follows a long tradition of generalizations in Bayesian theory. On the theoretical level, we present a predictive Bayesian formulation of goodness-of-fit testing, distinguishing between p-values (posterior probabilities that specified antisymmetric discrepancy measures will exceed 0) and u-values (data summaries with uniform sampling distributions). We explain that p-values, unlike u-values, are Bayesian probability statements in that they condition on observed data. Having reviewed the general theoretical framework, we discuss the implications for statistical graphics and exploratory data analysis, with the goal being to unify exploratory data analysis with more formal statistical methods based on probability models. We interpret various graphical displays as posterior predictive checks and discuss how Bayesian inference can be used to determine reference distributions. The goal of this work is not to downgrade descriptive statistics, or to suggest they be replaced by Bayesian modeling, but rather to suggest how exploratory data analysis fits into the probability-modeling paradigm. We conclude with a discussion of the implications for practical Bayesian inference. In particular, we anticipate that Bayesian software can be generalized to draw simulations of replicated data and parameters from their posterior predictive distribution, and these can in turn be used to calibrate EDA graphs.
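
The posterior predictive p-value described in this abstract can be illustrated with a minimal sketch. It is not the paper's code: the normal model with known sigma, the flat prior on theta, and the function name `posterior_predictive_pvalue` are all assumptions made for illustration. The function draws theta from its posterior, simulates replicated data sets, and reports how often the discrepancy on replicated data is at least as large as on the observed data.

```python
import numpy as np

def posterior_predictive_pvalue(y, discrepancy, sigma=1.0, n_draws=4000, seed=0):
    """Posterior predictive p-value for a toy normal model (illustrative only).

    Assumes y_i ~ Normal(theta, sigma^2) with known sigma and a flat prior on
    theta, so that theta | y ~ Normal(mean(y), sigma^2 / n).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size
    # Draw theta from its posterior, then one replicated data set per draw.
    theta = rng.normal(y.mean(), sigma / np.sqrt(n), size=n_draws)
    y_rep = rng.normal(theta[:, None], sigma, size=(n_draws, n))
    # Discrepancy on each replicated data set versus the observed data.
    t_rep = np.apply_along_axis(discrepancy, 1, y_rep)
    return float(np.mean(t_rep >= discrepancy(y)))

# Hypothetical data: five ordinary points and one gross outlier.
y_outlier = [0.1, -0.3, 0.2, 0.0, -0.1, 8.0]
p_outlier = posterior_predictive_pvalue(y_outlier, np.max)

# Hypothetical data that the model describes reasonably well.
y_regular = np.linspace(-1.5, 1.5, 10)
p_regular = posterior_predictive_pvalue(y_regular, np.max)
```

A p-value near 0 or 1 signals that the chosen discrepancy (here the sample maximum) is poorly reproduced by the model. The replicated values `t_rep` could equally be plotted as a reference distribution for a graphical check.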

Journal ArticleDOI
TL;DR: A collection of 1,220,000 simulated benchmark data sets, generated under 51 different cluster models and the null hypothesis, is presented for use in power evaluations and in comparing the power of the spatial scan statistic, the maximized excess events test and the nonparametric M statistic.

MonographDOI
11 Jul 2003
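TL;DR: This monograph surveys common sources of error in statistical practice in three parts (foundations, hypothesis testing and estimation, and building a model), with chapters on reporting and interpreting results and on graphics, the latter including five rules for avoiding bad graphics, one rule for three-dimensional graphics, and two rules each for subgroup information and text elements.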
Abstract: Preface. PART I FOUNDATIONS. 1. Sources of Error. Prescription. Fundamental Concepts. Ad Hoc, Post Hoc Hypotheses. 2. Hypotheses: The Why of Your Research. Prescription. What Is a Hypothesis? How precise must a hypothesis be? Found Data. Null hypothesis. Neyman-Pearson Theory. Deduction and Induction. Losses. Decisions. To Learn More. 3. Collecting Data. Preparation. Measuring Devices. Determining Sample Size. Fundamental Assumptions. Experimental Design. Four Guidelines. Are Experiments Really Necessary? To Learn More. PART II HYPOTHESIS TESTING AND ESTIMATION. 4. Estimation. Prevention. Desirable and Not-So-Desirable Estimators. Interval Estimates. Improved Results. Summary. To Learn More. 5. Testing Hypotheses: Choosing a Test Statistic. Comparing Means of Two Populations. Comparing Variances. Comparing the Means of K Samples. Higher-Order Experimental Designs. Contingency Tables. Inferior Tests. Multiple Tests. Before You Draw Conclusions. Summary. To Learn More. 6. Strengths and Limitations of Some Miscellaneous Statistical Procedures. Bootstrap. Bayesian Methodology. Meta-Analysis. Permutation Tests. To Learn More. 7. Reporting Your Results. Fundamentals. Tables. Standard Error. p-Values. Confidence Intervals. Recognizing and Reporting Biases. Reporting Power. Drawing Conclusions. Summary. To Learn More. 8. Interpreting Reports. With A Grain of Salt. Rates and Percentages. Interpreting Computer Printouts. 9. Graphics. The Soccer Data. Five Rules for Avoiding Bad Graphics. One Rule for Correct Usage of Three-Dimensional Graphics. The Misunderstood Pie Chart. Two Rules for Effective Display of Subgroup Information. Two Rules for Text Elements in Graphics. Multidimensional Displays. Choosing Graphical Displays. Summary. To Learn More. PART III BUILDING A MODEL. 10. Univariate Regression. Model Selection. Estimating Coefficients. Further Considerations. Summary. To Learn More. 11. Alternate Methods of Regression. Linear vs. Nonlinear Regression. Least Absolute Deviation Regression. Errors-in-Variables Regression. Quantile Regression. The Ecological Fallacy. Nonsense Regression. Summary. To Learn More. 12. Multivariable Regression. Caveats. Factor Analysis. General Linearized Models. Reporting Your Results. A Conjecture. Decision Trees. Building a Successful Model. To Learn More. 13. Validation. Methods of Validation. Measures of Predictive Success. Long-Term Stability. To Learn More. Appendix A. Appendix B. Glossary, Grouped by Related but Distinct Terms. Bibliography. Author Index. Subject Index.

Journal ArticleDOI
TL;DR: In this paper, the authors show that classical score test statistics, frequently advocated in practice, cannot be used in this context, but that well-chosen one-sided counterparts could be used instead.
Abstract: Whenever inference for variance components is required, the choice between one-sided and two-sided tests is crucial. This choice is usually driven by whether or not negative variance components are permitted. For two-sided tests, classical inferential procedures can be followed, based on likelihood ratios, score statistics, or Wald statistics. For one-sided tests, however, one-sided test statistics need to be developed, and their null distribution derived. While this has received considerable attention in the context of the likelihood ratio test, there appears to be much confusion about the related problem for the score test. The aim of this paper is to illustrate that classical (two-sided) score test statistics, frequently advocated in practice, cannot be used in this context, but that well-chosen one-sided counterparts could be used instead. The relation with likelihood ratio tests will be established, and all results are illustrated in an analysis of continuous longitudinal data using linear mixed models.
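
To make the one-sided versus two-sided distinction concrete, here is a minimal sketch of how the p-value changes when a single variance component is tested on the boundary of its parameter space. It assumes the null distribution is a 50:50 mixture of a point mass at zero and a chi-square with one degree of freedom. That reference distribution is commonly used for this situation but is not stated in the abstract, and the function names are hypothetical.

```python
import math

def chi2_1_sf(x):
    """Survival function P(X > x) of a chi-square with 1 degree of freedom."""
    # For X = Z^2 with Z standard normal, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0)) if x > 0 else 1.0

def two_sided_pvalue(stat):
    """Classical p-value, referring the statistic to chi-square(1)."""
    return chi2_1_sf(stat)

def one_sided_pvalue(stat):
    """p-value under the assumed 50:50 mixture of chi2_0 and chi2_1."""
    if stat <= 0:
        return 1.0  # the point mass at zero covers a non-positive statistic
    return 0.5 * chi2_1_sf(stat)

# The same statistic gives half the p-value under the one-sided mixture.
stat = 2.705543
p_two = two_sided_pvalue(stat)
p_one = one_sided_pvalue(stat)
```

Under this assumption, referring a one-sided statistic to the plain chi-square(1) distribution doubles the p-value and makes the test conservative.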

Journal ArticleDOI
TL;DR: A decision theoretic formulation of product partition models (PPMs) is presented that allows a formal treatment of different decision problems such as estimation or hypothesis testing and clustering methods simultaneously, and an algorithm is proposed that yields Bayes estimates of the quantities of interest and the groups of experimental units.
Abstract: Summary. We present a decision theoretic formulation of product partition models (PPMs) that allows a formal treatment of different decision problems such as estimation or hypothesis testing and clustering methods simultaneously. A key observation in our construction is the fact that PPMs can be formulated in the context of model selection. The underlying partition structure in these models is closely related to that arising in connection with Dirichlet processes. This allows a straightforward adaptation of some computational strategies (originally devised for nonparametric Bayesian problems) to our framework. The resulting algorithms are more flexible than other competing alternatives that are used for problems involving PPMs. We propose an algorithm that yields Bayes estimates of the quantities of interest and the groups of experimental units. We explore the application of our methods to the detection of outliers in normal and Student t regression models, with clustering structure equivalent to that induced by a Dirichlet process prior. We also discuss the sensitivity of the results considering different prior distributions for the partitions.

Journal ArticleDOI
TL;DR: An alternative statistical significance test, based on Monte Carlo procedures, is presented; it produces the equivalent of an approximate randomization test for the null hypothesis that the actual distribution of responding is rectangular, and it is shown to be superior to the chi-square test.
Abstract: The authors demonstrated that the most common statistical significance test used with r(WG)-type interrater agreement indexes in applied psychology, based on the chi-square distribution, is flawed and inaccurate. The chi-square test is shown to be extremely conservative even for modest, standard significance levels (e.g., .05). The authors present an alternative statistical significance test, based on Monte Carlo procedures, that produces the equivalent of an approximate randomization test for the null hypothesis that the actual distribution of responding is rectangular and demonstrate its superiority to the chi-square test. Finally, the authors provide tables of critical values and offer downloadable software to implement the approximate randomization test for r(WG)-type and for average deviation (AD)-type interrater agreement indexes. The implications of these results for studying a broad range of interrater agreement problems in applied psychology are discussed.
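
The Monte Carlo idea in this abstract can be sketched as follows. This is not the authors' downloadable software. It assumes the common single-item definition r_WG = 1 - s^2 / sigma_EU^2, where sigma_EU^2 = (A^2 - 1) / 12 is the variance of a rectangular (discrete uniform) distribution over A response options, and the function names are hypothetical.

```python
import numpy as np

def rwg(ratings, n_options):
    """Single-item r_WG along the last axis (assumed common definition)."""
    s2 = np.var(ratings, axis=-1, ddof=1)
    sigma2_eu = (n_options ** 2 - 1) / 12.0  # variance of uniform on 1..A
    return 1.0 - s2 / sigma2_eu

def rwg_monte_carlo_pvalue(ratings, n_options, n_sims=10000, seed=0):
    """Approximate p-value for H0: raters respond uniformly over 1..A."""
    rng = np.random.default_rng(seed)
    ratings = np.asarray(ratings)
    observed = float(rwg(ratings, n_options))
    # Simulate groups of the same size whose raters respond at random.
    sims = rng.integers(1, n_options + 1, size=(n_sims, ratings.size))
    null = rwg(sims, n_options)
    # Share of simulated groups agreeing at least as strongly as observed
    # (the small tolerance guards against floating-point ties).
    p_value = float(np.mean(null >= observed - 1e-12))
    return observed, p_value

# Hypothetical ratings from ten raters on a 5-point scale.
ratings = [4, 4, 4, 5, 4, 4, 4, 4, 5, 4]
observed, p_value = rwg_monte_carlo_pvalue(ratings, n_options=5)
```

Because the null distribution is simulated for the actual group size and number of response options, the test does not depend on a chi-square approximation.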

Journal ArticleDOI
TL;DR: In this paper, a modified Bartlett statistical test is proposed to provide a more rational basis for rejecting the null hypothesis of stationarity in the correlated case, and the accompanying rejection criteria are determined from simulated correlated sample functions.
Abstract: Stationarity or statistical homogeneity is an important prerequisite for subsequent statistical analysis on a given section of a soil profile to be valid. The estimation of important soil statistics such as the variance is likely to be biased if the profile is not properly demarcated into stationary sections. Existing classical statistical tests are inadequate even for simple identification of stationarity in the variance because the spatial variations of soil properties are generally correlated with each other. In this paper, a modified Bartlett statistical test is proposed to provide a more rational basis for rejecting the null hypothesis of stationarity in the correlated case. The accompanying rejection criteria are determined from simulated correlated sample functions and summarized into a convenient form for practical use. A statistical-based soil boundary identification procedure is then developed using the modified Bartlett test statistic. Based on the analysis of a piezocone sounding record, two a...