
Robust misinterpretation of confidence intervals

TL;DR: Although all six statements were false, both researchers and students endorsed, on average, more than three of them, a gross misunderstanding suggesting that many researchers do not know the correct interpretation of a CI.
Abstract: Null hypothesis significance testing (NHST) is undoubtedly the most common inferential technique used to justify claims in the social sciences. However, even staunch defenders of NHST agree that its outcomes are often misinterpreted. Confidence intervals (CIs) have frequently been proposed as a more useful alternative to NHST, and their use is strongly encouraged in the APA Manual. Nevertheless, little is known about how researchers interpret CIs. In this study, 120 researchers and 442 students—all in the field of psychology—were asked to assess the truth value of six particular statements involving different interpretations of a CI. Although all six statements were false, both researchers and students endorsed, on average, more than three statements, indicating a gross misunderstanding of CIs. Self-declared experience with statistics was not related to researchers’ performance, and, even more surprisingly, researchers hardly outperformed the students, even though the students had not received any education on statistical inference whatsoever. Our findings suggest that many researchers do not know the correct interpretation of a CI. The misunderstandings surrounding p-values and CIs are particularly unfortunate because they constitute the main tools by which psychologists draw conclusions from data.
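The property the six statements misconstrue is the frequentist one: "95%" describes the long-run hit rate of the interval-constructing procedure across repeated samples, not the probability that any single computed interval contains the parameter. A minimal simulation can make the distinction concrete (standard-library Python; the known-sigma z-interval, sample size, and parameter values are arbitrary choices for illustration, not from the study):

```python
import random
from statistics import NormalDist, mean

# Known-sigma z-interval: the setting where the 95% CI has an exact,
# uncontroversial frequentist reading.
MU, SIGMA, N = 10.0, 2.0, 20            # true parameter and sample size (arbitrary)
Z = NormalDist().inv_cdf(0.975)         # ~1.96
HALF_WIDTH = Z * SIGMA / N ** 0.5

random.seed(1)
covered = 0
REPS = 20_000
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    m = mean(sample)
    # Each computed interval either contains MU or it does not; "95%"
    # describes the procedure's long-run hit rate, not this one interval.
    if m - HALF_WIDTH <= MU <= m + HALF_WIDTH:
        covered += 1

print(f"coverage over {REPS} repetitions: {covered / REPS:.3f}")  # close to 0.950
```

Statements that attach the 95% to one realized interval (the kind the participants endorsed) are exactly what this long-run reading does not license.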
Citations
Journal ArticleDOI
TL;DR: Ten prominent advantages of the Bayesian approach are outlined, and several objections to Bayesian hypothesis testing are countered.
Abstract: Bayesian parameter estimation and Bayesian hypothesis testing present attractive alternatives to classical inference using confidence intervals and p values. In part I of this series we outline ten prominent advantages of the Bayesian approach. Many of these advantages translate to concrete opportunities for pragmatic researchers. For instance, Bayesian hypothesis testing allows researchers to quantify evidence and monitor its progression as data come in, without needing to know the intention with which the data were collected. We end by countering several objections to Bayesian hypothesis testing. Part II of this series discusses JASP, a free and open source software program that makes it easy to conduct Bayesian estimation and testing for a range of popular statistical scenarios (Wagenmakers et al. this issue).
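The claim that one can "quantify evidence and monitor its progression as data come in" is easiest to see in the simplest case, a binomial rate: with a uniform prior on the rate under H1, the marginal likelihood has a closed form, so the Bayes factor can be recomputed after every observation. The sketch below is a standard-library illustration with hypothetical data, not the JASP implementation:

```python
from math import comb

def bf10_binomial(k: int, n: int) -> float:
    """Bayes factor for H1: theta ~ Uniform(0, 1) against H0: theta = 0.5,
    given k successes in n Bernoulli trials. Under the uniform prior the
    marginal likelihood integrates to 1 / (n + 1)."""
    m1 = 1.0 / (n + 1)          # marginal likelihood under H1
    m0 = comb(n, k) * 0.5 ** n  # likelihood under the point null
    return m1 / m0

# Monitoring evidence as (hypothetical) data arrive, one trial at a time;
# unlike a p-value, the Bayes factor may be inspected at every step.
data = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1]
successes = 0
for i, x in enumerate(data, start=1):
    successes += x
    print(i, round(bf10_binomial(successes, i), 3))
```

Because the Bayes factor depends only on the observed data and the models, not on a sampling plan, stopping the stream early or late does not invalidate the quantity being reported.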

940 citations

Journal ArticleDOI
TL;DR: It is shown in a number of examples that CIs do not necessarily have any of the properties commonly ascribed to them and can lead to unjustified or arbitrary inferences, and it is suggested that other theories of interval estimation be used instead.
Abstract: Interval estimates – estimates of parameters that include an allowance for sampling uncertainty – have long been touted as a key component of statistical analyses. There are several kinds of interval estimates, but the most popular are confidence intervals (CIs): intervals that contain the true parameter value in some known proportion of repeated samples, on average. The width of confidence intervals is thought to index the precision of an estimate; CIs are thought to be a guide to which parameter values are plausible or reasonable; and the confidence coefficient of the interval (e.g., 95 %) is thought to index the plausibility that the true parameter is included in the interval. We show in a number of examples that CIs do not necessarily have any of these properties, and can lead to unjustified or arbitrary inferences. For this reason, we caution against relying upon confidence interval theory to justify interval estimates, and suggest that other theories of interval estimation should be used instead.
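One such example can be reproduced in a few lines, in the spirit of the authors' uniform-sampling illustration: two observations drawn from Uniform(θ − ½, θ + ½), with the interval from the smaller to the larger used as a 50% CI. Overall coverage is exactly 50%, yet conditional on the observed width the interval can be nearly worthless or certain to contain θ, so neither the width nor the confidence coefficient behaves as the folk interpretation says. A minimal simulation (standard-library Python; θ and the width cut-offs are arbitrary):

```python
import random

random.seed(7)
THETA = 0.0
REPS = 100_000

hits = narrow_hits = narrow_total = wide_hits = wide_total = 0
for _ in range(REPS):
    x1 = random.uniform(THETA - 0.5, THETA + 0.5)
    x2 = random.uniform(THETA - 0.5, THETA + 0.5)
    lo, hi = min(x1, x2), max(x1, x2)
    width = hi - lo
    hit = lo <= THETA <= hi     # covers iff the two points straddle THETA
    hits += hit
    if width < 0.1:             # narrow interval: looks "precise"...
        narrow_total += 1
        narrow_hits += hit
    elif width > 0.5:           # wide interval: provably must contain THETA
        wide_total += 1
        wide_hits += hit

print("overall coverage:        ", hits / REPS)                  # ~0.50 by construction
print("coverage | width < 0.1:  ", narrow_hits / narrow_total)   # far below 0.50
print("coverage | width > 0.5:  ", wide_hits / wide_total)       # exactly 1.0
```

This is the paper's "precision fallacy" in miniature: here the narrowest intervals are precisely the ones least likely to contain the parameter, even though the procedure's advertised 50% coverage is exactly right.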

408 citations


Cites background from "Robust misinterpretation of confide..."

  • ...Authors choosing to report CIs have a responsibility to keep their readers from invalid inferences, because it is almost certain that without a warning readers will misinterpret them (Hoekstra et al., 2014)....


  • ...Recent work has shown that this misunderstanding is pervasive among researchers, who likely learned it from textbooks, instructors, and confidence interval proponents (Hoekstra et al., 2014)....


Journal ArticleDOI
25 Feb 2019
TL;DR: Ordinal variables, although extremely common in psychology, are almost exclusively analyzed with statistical models that falsely assume them to be metric, a practice that can lead to distorted effect estimates.
Abstract: Ordinal variables, although extremely common in psychology, are almost exclusively analyzed with statistical models that falsely assume them to be metric. This practice can lead to distorted effect...

287 citations

Journal ArticleDOI
TL;DR: It is shown that the use of the beta-binomial model makes it possible to determine accurate credible intervals even in data which exhibit substantial overdispersion, and Bayesian inference methods are used for estimating the posterior distribution of the parameters of the psychometric function.

275 citations

Journal ArticleDOI
TL;DR: The authors explore the concept of statistical evidence and how it can be quantified using the Bayes factor, and discuss the philosophical issues inherent in its use.

228 citations

References
Journal ArticleDOI
Jacob Cohen1
TL;DR: The authors reviewed the problems with null hypothesis significance testing, including near universal misinterpretation of p as the probability that H₀ is false, the misinterpretation that its complement is the probability of successful replication, and the mistaken assumption that if one rejects H₀ one thereby affirms the theory that led to the test.
Abstract: After 4 decades of severe criticism, the ritual of null hypothesis significance testing (mechanical dichotomous decisions around a sacred .05 criterion) still persists. This article reviews the problems with this practice, including near universal misinterpretation of p as the probability that H₀ is false, the misinterpretation that its complement is the probability of successful replication, and the mistaken assumption that if one rejects H₀ one thereby affirms the theory that led to the test. Exploratory data analysis and the use of graphic methods, a steady improvement in and a movement toward standardization in measurement, an emphasis on estimating effect sizes using confidence intervals, and the informed use of available statistical methods are suggested. For generalization, psychologists must finally rely, as has been done in all the older sciences, on replication. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
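Cohen's "near universal misinterpretation" of p as the probability that H₀ is false can be shown to fail with a short simulation: when half of the tested hypotheses are truly null and studies are modestly powered, far more than 5% of the significant results come from true nulls. The numbers below (effect size, n, base rate) are arbitrary illustrations, not from the article:

```python
import random
from statistics import NormalDist

random.seed(3)
nd = NormalDist()
N, EFFECT, REPS = 10, 0.5, 20_000   # per-study n, true effect under H1 (arbitrary)

def significant(mu: float) -> bool:
    """One simulated study: two-sided z-test of H0: mu = 0, known sigma = 1."""
    xbar = random.gauss(mu, 1 / N ** 0.5)        # sampling distribution of the mean
    p = 2 * (1 - nd.cdf(abs(xbar) * N ** 0.5))   # two-sided p-value
    return p < 0.05

null_sig = sum(significant(0.0) for _ in range(REPS))     # true nulls
alt_sig = sum(significant(EFFECT) for _ in range(REPS))   # real effects

# With a 50/50 base rate and roughly 35% power, about 1 in 8 significant
# results comes from a true null: p < .05 does not mean P(H0 | data) = .05.
print(null_sig / (null_sig + alt_sig))
```

The p-value is P(data at least this extreme | H₀), a statement about data given the hypothesis; converting it into a statement about the hypothesis requires the base rate and power that the test alone does not supply.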

3,838 citations


"Robust misinterpretation of confide..." refers background in this paper

  • ...It has been suggested that the common misinterpretations of NHST arise in part because its results are erroneously given a Bayesian interpretation, such as when the p-value is misinterpreted as the probability that the null hypothesis is true (e.g., Cohen, 1994; Dienes, 2011; Falk & Greenbaum, 1995)....



  • ...…NHST has been criticized for many reasons, including its inability to provide the answers that researchers are interested in (e.g., Berkson, 1942; Cohen, 1994), its violation of the likelihood principle (e.g., Berger & Wolpert, 1988; Wagenmakers, 2007), its tendency to overestimate the evidence…...


Journal ArticleDOI
TL;DR: The Board of Scientific Affairs of the American Psychological Association (APA) convened the Task Force on Statistical Inference (TFSI) to elucidate controversial issues surrounding applications of significance testing in psychology journals and its alternatives, including alternative underlying models and data transformation.
Abstract: In the light of continuing debate over the applications of significance testing in psychology journals and following the publication of Cohen's (1994) article, the Board of Scientific Affairs (BSA) of the American Psychological Association (APA) convened a committee called the Task Force on Statistical Inference (TFSI) whose charge was "to elucidate some of the controversial issues surrounding applications of statistics including significance testing and its alternatives; alternative underlying models and data transformation; and newer methods made possible by powerful computers" (BSA, personal communication, February 28, 1996). Robert Rosenthal, Robert Abelson, and Jacob Cohen (cochairs) met initially and agreed on the desirability of having several types of specialists on the task force: statisticians, teachers of statistics, journal editors, authors of statistics books, computer experts, and wise elders. Nine individuals were subsequently invited to join and all agreed. These were Leona Aiken, Mark Appelbaum, Gwyneth Boodoo, David A. Kenny, Helena Kraemer, Donald Rubin, Bruce Thompson, Howard Wainer, and Leland Wilkinson. In addition, Lee Cronbach, Paul Meehl, Frederick Mosteller and John Tukey served as Senior Advisors to the Task Force and commented on written materials.

2,706 citations

Journal ArticleDOI
TL;DR: Post-experiment power calculations, often advocated as an aid to interpreting statistically nonsignificant results, are fundamentally flawed; the authors document that the problem is extensive and present arguments to demonstrate the flaw in the logic.
Abstract: It is well known that statistical power calculations can be valuable in planning an experiment. There is also a large literature advocating that power calculations be made whenever one performs a statistical test of a hypothesis and one obtains a statistically nonsignificant result. Advocates of such post-experiment power calculations claim the calculations should be used to aid in the interpretation of the experimental results. This approach, which appears in various forms, is fundamentally flawed. We document that the problem is extensive and present arguments to demonstrate the flaw in the logic.
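The core of the flaw can be exhibited directly: for a z-test, "observed power" (power computed by plugging in the observed effect as if it were the true one) is a strictly decreasing function of the p-value, so it can never add information beyond p itself, and a result at exactly p = .05 always yields observed power of about .50. A standard-library sketch (two-sided z-test; an illustration, not code from the article):

```python
from statistics import NormalDist

nd = NormalDist()
Z_CRIT = nd.inv_cdf(0.975)  # two-sided alpha = .05

def observed_power(p: float) -> float:
    """Post-hoc power of a two-sided z-test, computed by treating the
    observed z as if it were the true standardized effect."""
    z_obs = nd.inv_cdf(1 - p / 2)
    return (1 - nd.cdf(Z_CRIT - z_obs)) + nd.cdf(-Z_CRIT - z_obs)

# Observed power is a one-to-one transform of p: same p, same "power",
# no matter what the data were.
for p in (0.40, 0.20, 0.05, 0.01):
    print(f"p = {p:.2f}  ->  observed power = {observed_power(p):.3f}")
```

Since every nonsignificant p maps to an observed power below roughly .50, "the result was nonsignificant but observed power was low" merely restates the nonsignificance; it is not an independent diagnosis of the study's sensitivity.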

1,611 citations


"Robust misinterpretation of confide..." refers background in this paper

  • ...Hoenig and Heisey (2001) stated that it is “surely prevalent that researchers interpret confidence intervals as if they were Bayesian credibility regions” (p. 5), but they did this without referring to data to back up this claim....


  • ...Although, for example, Hoenig and Heisey (2001) predicted that a Bayesian interpretation of CIs would be prevalent, the items that can clearly be considered Bayesian statements (1–4) do not seem to be preferred over item 6, which is clearly a frequentist statement....


Journal ArticleDOI
TL;DR: The manual provides stronger standards for maintaining participant confidentiality and for reducing bias in language describing participants, and suggests that researchers avoid derogatory labels, such as using “minority” for “non-white” populations.
Abstract: Similar to previous editions, the Publication Manual of the American Psychological Association (APA), Sixth Edition provides guidelines on all aspects of writing style and formatting for writers, e

1,447 citations

Book
01 Jan 1967
TL;DR: The twelfth edition of INTRODUCTION TO PROBABILITY AND STATISTICS has been used by hundreds of thousands of students since its first edition.
Abstract: Used by hundreds of thousands of students since its first edition, INTRODUCTION TO PROBABILITY AND STATISTICS continues to blend the best of its proven coverage with new innovations. While retaining the straightforward presentation and traditional outline for descriptive and inferential statistics, the Twelfth Edition incorporates exciting new learning aids like MyPersonal Trainer, MyApplet, and MyTip to ensure that students learn and understand the relevance of the material. The book takes advantage of modern technology, including computational software and interactive visual tools, to facilitate statistical reasoning as well as the understanding and interpretation of statistical results. In addition to showing how to apply statistical procedures, the authors explain how to meaningfully describe real sets of data, what the statistical tests mean in terms of their practical applications, how to evaluate the validity of the assumptions behind statistical tests, and what to do when statistical assumptions have been violated. This new edition retains the statistical integrity, examples, exercises and exposition that have made it a market leader, and builds upon this tradition of excellence with new technology integration.

1,164 citations