
Robust misinterpretation of confidence intervals

TL;DR: Although all six statements were false, both researchers and students endorsed, on average, more than three of them, a gross misunderstanding suggesting that many researchers do not know the correct interpretation of a CI.
Abstract: Null hypothesis significance testing (NHST) is undoubtedly the most common inferential technique used to justify claims in the social sciences. However, even staunch defenders of NHST agree that its outcomes are often misinterpreted. Confidence intervals (CIs) have frequently been proposed as a more useful alternative to NHST, and their use is strongly encouraged in the APA Manual. Nevertheless, little is known about how researchers interpret CIs. In this study, 120 researchers and 442 students—all in the field of psychology—were asked to assess the truth value of six particular statements involving different interpretations of a CI. Although all six statements were false, both researchers and students endorsed, on average, more than three statements, indicating a gross misunderstanding of CIs. Self-declared experience with statistics was not related to researchers’ performance, and, even more surprisingly, researchers hardly outperformed the students, even though the students had not received any education on statistical inference whatsoever. Our findings suggest that many researchers do not know the correct interpretation of a CI. The misunderstandings surrounding p-values and CIs are particularly unfortunate because they constitute the main tools by which psychologists draw conclusions from data.
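The property the six statements misconstrue is the frequentist one: "95%" describes the long-run hit rate of the interval-constructing procedure across repeated samples, not the probability that any single computed interval contains the parameter. A minimal simulation can make the distinction concrete (standard-library Python; the known-sigma z-interval, sample size, and parameter values are arbitrary choices for illustration, not from the study):

```python
import random
from statistics import NormalDist, mean

# Known-sigma z-interval: the setting where the 95% CI has an exact,
# uncontroversial frequentist reading.
MU, SIGMA, N = 10.0, 2.0, 20            # true parameter and sample size (arbitrary)
Z = NormalDist().inv_cdf(0.975)         # ~1.96
HALF_WIDTH = Z * SIGMA / N ** 0.5

random.seed(1)
covered = 0
REPS = 20_000
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    m = mean(sample)
    # Each computed interval either contains MU or it does not; "95%"
    # describes the procedure's long-run hit rate, not this one interval.
    if m - HALF_WIDTH <= MU <= m + HALF_WIDTH:
        covered += 1

print(f"coverage over {REPS} repetitions: {covered / REPS:.3f}")  # close to 0.950
```

Statements that attach the 95% to one realized interval (the kind the participants endorsed) are exactly what this long-run reading does not license.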
Citations
Journal ArticleDOI
TL;DR: Ten prominent advantages of the Bayesian approach are outlined, and several objections to Bayesian hypothesis testing are countered.
Abstract: Bayesian parameter estimation and Bayesian hypothesis testing present attractive alternatives to classical inference using confidence intervals and p values. In part I of this series we outline ten prominent advantages of the Bayesian approach. Many of these advantages translate to concrete opportunities for pragmatic researchers. For instance, Bayesian hypothesis testing allows researchers to quantify evidence and monitor its progression as data come in, without needing to know the intention with which the data were collected. We end by countering several objections to Bayesian hypothesis testing. Part II of this series discusses JASP, a free and open source software program that makes it easy to conduct Bayesian estimation and testing for a range of popular statistical scenarios (Wagenmakers et al. this issue).
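The claim that one can "quantify evidence and monitor its progression as data come in" is easiest to see in the simplest case, a binomial rate: with a uniform prior on the rate under H1, the marginal likelihood has a closed form, so the Bayes factor can be recomputed after every observation. The sketch below is a standard-library illustration with hypothetical data, not the JASP implementation:

```python
from math import comb

def bf10_binomial(k: int, n: int) -> float:
    """Bayes factor for H1: theta ~ Uniform(0, 1) against H0: theta = 0.5,
    given k successes in n Bernoulli trials. Under the uniform prior the
    marginal likelihood integrates to 1 / (n + 1)."""
    m1 = 1.0 / (n + 1)          # marginal likelihood under H1
    m0 = comb(n, k) * 0.5 ** n  # likelihood under the point null
    return m1 / m0

# Monitoring evidence as (hypothetical) data arrive, one trial at a time;
# unlike a p-value, the Bayes factor may be inspected at every step.
data = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1]
successes = 0
for i, x in enumerate(data, start=1):
    successes += x
    print(i, round(bf10_binomial(successes, i), 3))
```

Because the Bayes factor depends only on the observed data and the models, not on a sampling plan, stopping the stream early or late does not invalidate the quantity being reported.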

940 citations

Journal ArticleDOI
TL;DR: It is shown in a number of examples that CIs do not necessarily have any of the properties commonly ascribed to them and can lead to unjustified or arbitrary inferences, and it is suggested that other theories of interval estimation be used instead.
Abstract: Interval estimates – estimates of parameters that include an allowance for sampling uncertainty – have long been touted as a key component of statistical analyses. There are several kinds of interval estimates, but the most popular are confidence intervals (CIs): intervals that contain the true parameter value in some known proportion of repeated samples, on average. The width of confidence intervals is thought to index the precision of an estimate; CIs are thought to be a guide to which parameter values are plausible or reasonable; and the confidence coefficient of the interval (e.g., 95 %) is thought to index the plausibility that the true parameter is included in the interval. We show in a number of examples that CIs do not necessarily have any of these properties, and can lead to unjustified or arbitrary inferences. For this reason, we caution against relying upon confidence interval theory to justify interval estimates, and suggest that other theories of interval estimation should be used instead.
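One such example can be reproduced in a few lines, in the spirit of the authors' uniform-sampling illustration: two observations drawn from Uniform(θ − ½, θ + ½), with the interval from the smaller to the larger used as a 50% CI. Overall coverage is exactly 50%, yet conditional on the observed width the interval can be nearly worthless or certain to contain θ, so neither the width nor the confidence coefficient behaves as the folk interpretation says. A minimal simulation (standard-library Python; θ and the width cut-offs are arbitrary):

```python
import random

random.seed(7)
THETA = 0.0
REPS = 100_000

hits = narrow_hits = narrow_total = wide_hits = wide_total = 0
for _ in range(REPS):
    x1 = random.uniform(THETA - 0.5, THETA + 0.5)
    x2 = random.uniform(THETA - 0.5, THETA + 0.5)
    lo, hi = min(x1, x2), max(x1, x2)
    width = hi - lo
    hit = lo <= THETA <= hi     # covers iff the two points straddle THETA
    hits += hit
    if width < 0.1:             # narrow interval: looks "precise"...
        narrow_total += 1
        narrow_hits += hit
    elif width > 0.5:           # wide interval: provably must contain THETA
        wide_total += 1
        wide_hits += hit

print("overall coverage:        ", hits / REPS)                  # ~0.50 by construction
print("coverage | width < 0.1:  ", narrow_hits / narrow_total)   # far below 0.50
print("coverage | width > 0.5:  ", wide_hits / wide_total)       # exactly 1.0
```

This is the paper's "precision fallacy" in miniature: here the narrowest intervals are precisely the ones least likely to contain the parameter, even though the procedure's advertised 50% coverage is exactly right.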

408 citations


Cites background from "Robust misinterpretation of confide..."

  • ...Authors choosing to report CIs have a responsibility to keep their readers from invalid inferences, because it is almost certain that without a warning readers will misinterpret them (Hoekstra et al., 2014)....


  • ...Recent work has shown that this misunderstanding is pervasive among researchers, who likely learned it from textbooks, instructors, and confidence interval proponents (Hoekstra et al., 2014)....


Journal ArticleDOI
25 Feb 2019
TL;DR: Ordinal variables, although extremely common in psychology, are almost exclusively analyzed with statistical models that falsely assume them to be metric, a practice that can lead to distorted effect estimates.
Abstract: Ordinal variables, although extremely common in psychology, are almost exclusively analyzed with statistical models that falsely assume them to be metric. This practice can lead to distorted effect...

287 citations

Journal ArticleDOI
TL;DR: It is shown that the use of the beta-binomial model makes it possible to determine accurate credible intervals even in data which exhibit substantial overdispersion, and Bayesian inference methods are used for estimating the posterior distribution of the parameters of the psychometric function.

275 citations

Journal ArticleDOI
TL;DR: The authors explore the concept of statistical evidence and how it can be quantified using the Bayes factor, and discuss the philosophical issues inherent in its use.

228 citations

References
Journal ArticleDOI
Jacob Cohen1
TL;DR: The authors reviewed the problems with null hypothesis significance testing, including near universal misinterpretation of p as the probability that H₀ is false, the misinterpretation that its complement is the probability of successful replication, and the mistaken assumption that if one rejects H₀ one thereby affirms the theory that led to the test.
Abstract: After 4 decades of severe criticism, the ritual of null hypothesis significance testing (mechanical dichotomous decisions around a sacred .05 criterion) still persists. This article reviews the problems with this practice, including near universal misinterpretation of p as the probability that H₀ is false, the misinterpretation that its complement is the probability of successful replication, and the mistaken assumption that if one rejects H₀ one thereby affirms the theory that led to the test. Exploratory data analysis and the use of graphic methods, a steady improvement in and a movement toward standardization in measurement, an emphasis on estimating effect sizes using confidence intervals, and the informed use of available statistical methods are suggested. For generalization, psychologists must finally rely, as has been done in all the older sciences, on replication. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
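Cohen's "near universal misinterpretation" of p as the probability that H₀ is false can be shown to fail with a short simulation: when half of the tested hypotheses are truly null and studies are modestly powered, far more than 5% of the significant results come from true nulls. The numbers below (effect size, n, base rate) are arbitrary illustrations, not from the article:

```python
import random
from statistics import NormalDist

random.seed(3)
nd = NormalDist()
N, EFFECT, REPS = 10, 0.5, 20_000   # per-study n, true effect under H1 (arbitrary)

def significant(mu: float) -> bool:
    """One simulated study: two-sided z-test of H0: mu = 0, known sigma = 1."""
    xbar = random.gauss(mu, 1 / N ** 0.5)        # sampling distribution of the mean
    p = 2 * (1 - nd.cdf(abs(xbar) * N ** 0.5))   # two-sided p-value
    return p < 0.05

null_sig = sum(significant(0.0) for _ in range(REPS))     # true nulls
alt_sig = sum(significant(EFFECT) for _ in range(REPS))   # real effects

# With a 50/50 base rate and roughly 35% power, about 1 in 8 significant
# results comes from a true null: p < .05 does not mean P(H0 | data) = .05.
print(null_sig / (null_sig + alt_sig))
```

The p-value is P(data at least this extreme | H₀), a statement about data given the hypothesis; converting it into a statement about the hypothesis requires the base rate and power that the test alone does not supply.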

3,838 citations


"Robust misinterpretation of confide..." refers background in this paper

  • ...It has been suggested that the common misinterpretations of NHST arise in part because its results are erroneously given a Bayesian interpretation, such as when the p-value is misinterpreted as the probability that the null hypothesis is true (e.g., Cohen, 1994; Dienes, 2011; Falk & Greenbaum, 1995)....



  • ...…NHST has been criticized for many reasons, including its inability to provide the answers that researchers are interested in (e.g., Berkson, 1942; Cohen, 1994), its violation of the likelihood principle (e.g., Berger & Wolpert, 1988; Wagenmakers, 2007), its tendency to overestimate the evidence…...


Journal ArticleDOI
TL;DR: The Board of Scientific Affairs of the American Psychological Association (APA) convened the Task Force on Statistical Inference (TFSI) to elucidate controversial issues surrounding applications of significance testing in psychology journals and its alternatives, including alternative underlying models and data transformation.
Abstract: In the light of continuing debate over the applications of significance testing in psychology journals and following the publication of Cohen's (1994) article, the Board of Scientific Affairs (BSA) of the American Psychological Association (APA) convened a committee called the Task Force on Statistical Inference (TFSI) whose charge was "to elucidate some of the controversial issues surrounding applications of statistics including significance testing and its alternatives; alternative underlying models and data transformation; and newer methods made possible by powerful computers" (BSA, personal communication, February 28, 1996). Robert Rosenthal, Robert Abelson, and Jacob Cohen (cochairs) met initially and agreed on the desirability of having several types of specialists on the task force: statisticians, teachers of statistics, journal editors, authors of statistics books, computer experts, and wise elders. Nine individuals were subsequently invited to join and all agreed. These were Leona Aiken, Mark Appelbaum, Gwyneth Boodoo, David A. Kenny, Helena Kraemer, Donald Rubin, Bruce Thompson, Howard Wainer, and Leland Wilkinson. In addition, Lee Cronbach, Paul Meehl, Frederick Mosteller and John Tukey served as Senior Advisors to the Task Force and commented on written materials.

2,706 citations

Journal ArticleDOI
TL;DR: Post-experiment power calculations, often advocated as an aid to interpreting statistically nonsignificant results, are fundamentally flawed; the authors document that the problem is extensive and present arguments to demonstrate the flaw in the logic.
Abstract: It is well known that statistical power calculations can be valuable in planning an experiment. There is also a large literature advocating that power calculations be made whenever one performs a statistical test of a hypothesis and one obtains a statistically nonsignificant result. Advocates of such post-experiment power calculations claim the calculations should be used to aid in the interpretation of the experimental results. This approach, which appears in various forms, is fundamentally flawed. We document that the problem is extensive and present arguments to demonstrate the flaw in the logic.
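The core of the flaw can be exhibited directly: for a z-test, "observed power" (power computed by plugging in the observed effect as if it were the true one) is a strictly decreasing function of the p-value, so it can never add information beyond p itself, and a result at exactly p = .05 always yields observed power of about .50. A standard-library sketch (two-sided z-test; an illustration, not code from the article):

```python
from statistics import NormalDist

nd = NormalDist()
Z_CRIT = nd.inv_cdf(0.975)  # two-sided alpha = .05

def observed_power(p: float) -> float:
    """Post-hoc power of a two-sided z-test, computed by treating the
    observed z as if it were the true standardized effect."""
    z_obs = nd.inv_cdf(1 - p / 2)
    return (1 - nd.cdf(Z_CRIT - z_obs)) + nd.cdf(-Z_CRIT - z_obs)

# Observed power is a one-to-one transform of p: same p, same "power",
# no matter what the data were.
for p in (0.40, 0.20, 0.05, 0.01):
    print(f"p = {p:.2f}  ->  observed power = {observed_power(p):.3f}")
```

Since every nonsignificant p maps to an observed power below roughly .50, "the result was nonsignificant but observed power was low" merely restates the nonsignificance; it is not an independent diagnosis of the study's sensitivity.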

1,611 citations


"Robust misinterpretation of confide..." refers background in this paper

  • ...Hoenig and Heisey (2001) stated that it is “surely prevalent that researchers interpret confidence intervals as if they were Bayesian credibility regions” (p. 5), but they did this without referring to data to back up this claim....


  • ...Although, for example, Hoenig and Heisey (2001) predicted that a Bayesian interpretation of CIs would be prevalent, the items that can clearly be considered Bayesian statements (1–4) do not seem to be preferred over item 6, which is clearly a frequentist statement....


Journal ArticleDOI
TL;DR: The manual provides stronger standards for maintaining participant confidentiality and for reducing bias in language describing participants, and suggests that researchers avoid derogatory labels, such as using “minority” for “non-white” populations.
Abstract: Similar to previous editions, the Publication Manual of the American Psychological Association (APA), Sixth Edition provides guidelines on all aspects of writing style and formatting for writers, e

1,447 citations

Book
01 Jan 1967
TL;DR: The twelfth edition of INTRODUCTION TO PROBABILITY AND STATISTICS has been used by hundreds of thousands of students since its first edition.
Abstract: Used by hundreds of thousands of students since its first edition, INTRODUCTION TO PROBABILITY AND STATISTICS continues to blend the best of its proven coverage with new innovations. While retaining the straightforward presentation and traditional outline for descriptive and inferential statistics, the Twelfth Edition incorporates exciting new learning aids like MyPersonal Trainer, MyApplet, and MyTip to ensure that students learn and understand the relevance of the material. The book takes advantage of modern technology, including computational software and interactive visual tools, to facilitate statistical reasoning as well as the understanding and interpretation of statistical results. In addition to showing how to apply statistical procedures, the authors explain how to meaningfully describe real sets of data, what the statistical tests mean in terms of their practical applications, how to evaluate the validity of the assumptions behind statistical tests, and what to do when statistical assumptions have been violated. This new edition retains the statistical integrity, examples, exercises and exposition that have made it a market leader, and builds upon this tradition of excellence with new technology integration.

1,164 citations