Journal ArticleDOI

Power failure: why small sample size undermines the reliability of neuroscience

TL;DR: It is shown that the average statistical power of studies in the neurosciences is very low, and the consequences include overestimates of effect size and low reproducibility of results.
Abstract: A study with low statistical power has a reduced chance of detecting a true effect, but it is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect. Here, we show that the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful. Improving reproducibility in neuroscience is a key priority and requires attention to well-established but often ignored methodological principles.
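The link the abstract draws between low power and unreliable significant findings can be made concrete with the positive predictive value (PPV): the probability that a statistically significant result reflects a true effect, given the power (1 − β), the significance threshold α, and the pre-study odds R that a probed effect is real. The sketch below is a minimal illustration of that relationship; the example values for R and power are hypothetical, not figures from the paper.

```python
# Minimal sketch: positive predictive value (PPV) as a function of power.
# PPV = power * R / (power * R + alpha), where R is the pre-study odds that a
# probed effect is real. Example values are hypothetical.

def ppv(power: float, alpha: float = 0.05, prestudy_odds: float = 0.25) -> float:
    """Probability that a statistically significant result reflects a true effect."""
    return (power * prestudy_odds) / (power * prestudy_odds + alpha)

if __name__ == "__main__":
    for power in (0.8, 0.5, 0.2):
        print(f"power={power:.1f}  PPV={ppv(power):.2f}")
```

With these hypothetical pre-study odds, dropping power from 0.8 to 0.2 cuts the PPV from 0.80 to 0.50, which is the mechanism behind the claim that low power reduces the likelihood that a significant result reflects a true effect.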


Citations
Journal ArticleDOI
28 Aug 2015-Science
TL;DR: A large-scale assessment suggests that experimental reproducibility in psychology leaves a lot to be desired, and correlational tests suggest that replication success was better predicted by the strength of original evidence than by characteristics of the original and replication teams.
Abstract: Reproducibility is a defining feature of science, but the extent to which it characterizes current research is unknown. We conducted replications of 100 experimental and correlational studies published in three psychology journals using high-powered designs and original materials when available. Replication effects were half the magnitude of original effects, representing a substantial decline. Ninety-seven percent of original studies had statistically significant results. Thirty-six percent of replications had statistically significant results; 47% of original effect sizes were in the 95% confidence interval of the replication effect size; 39% of effects were subjectively rated to have replicated the original result; and if no bias in original results is assumed, combining original and replication results left 68% with statistically significant effects. Correlational tests suggest that replication success was better predicted by the strength of original evidence than by characteristics of the original and replication teams.
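One mechanism behind the finding that replication effects were half the magnitude of original effects is that effects selected for statistical significance in modestly powered studies are overestimated and regress toward the truth on replication. The simulation below illustrates that selection effect only; the true effect size, sample sizes, and number of simulated studies are hypothetical and not drawn from the Reproducibility Project data.

```python
# Sketch: effect-size inflation from selecting significant originals, and the
# shrinkage seen on replication. All parameters are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d, n_orig, n_rep, n_studies = 0.2, 30, 60, 20000

def cohens_d(a, b):
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled

orig_d, rep_d = [], []
for _ in range(n_studies):
    a = rng.normal(true_d, 1, n_orig)
    b = rng.normal(0.0, 1, n_orig)
    if stats.ttest_ind(a, b).pvalue < 0.05:      # "published" originals only
        orig_d.append(cohens_d(a, b))
        a2 = rng.normal(true_d, 1, n_rep)        # replication of the same true effect
        b2 = rng.normal(0.0, 1, n_rep)
        rep_d.append(cohens_d(a2, b2))

print(f"mean original d (significant only): {np.mean(orig_d):.2f}")
print(f"mean replication d:                 {np.mean(rep_d):.2f}")
```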

5,532 citations


Cites background from "Power failure: why small sample siz..."

  • ...inflated effect sizes due to publication, selection, reporting, or other biases (9, 12-23)....


  • ...problematic practices include selective reporting, selective analysis, and insufficient specification of the conditions necessary or sufficient to obtain the results (12-23)....


  • ...publication bias favoring positive results together produce a literature with upwardly biased effect sizes (14, 16, 32, 33)....


Journal ArticleDOI
TL;DR: It is found that the most common software packages for fMRI analysis (SPM, FSL, AFNI) can result in false-positive rates of up to 70%.
Abstract: The most widely used task functional magnetic resonance imaging (fMRI) analyses use parametric statistical methods that depend on a variety of assumptions. In this work, we use real resting-state data and a total of 3 million random task group analyses to compute empirical familywise error rates for the fMRI software packages SPM, FSL, and AFNI, as well as a nonparametric permutation method. For a nominal familywise error rate of 5%, the parametric statistical methods are shown to be conservative for voxelwise inference and invalid for clusterwise inference. Our results suggest that the principal cause of the invalid cluster inferences is spatial autocorrelation functions that do not follow the assumed Gaussian shape. By comparison, the nonparametric permutation test is found to produce nominal results for voxelwise as well as clusterwise inference. These findings speak to the need of validating the statistical methods being used in the field of neuroimaging.
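The contrast the abstract draws between parametric and nonparametric inference can be illustrated with a max-statistic permutation test, a standard way for a permutation procedure to control the familywise error rate across many tests. The sketch below applies it to simulated null data at a handful of "voxels"; it is a toy stand-in, not the SPM/FSL/AFNI pipelines or the resting-state analyses evaluated in the paper.

```python
# Sketch: max-statistic permutation test controlling familywise error (FWE)
# across several tests. Toy null data, not a neuroimaging pipeline.
import numpy as np

rng = np.random.default_rng(1)
n_per_group, n_voxels, n_perm = 20, 50, 2000

group_a = rng.normal(0, 1, (n_per_group, n_voxels))   # null data: no true effect
group_b = rng.normal(0, 1, (n_per_group, n_voxels))
data = np.vstack([group_a, group_b])
labels = np.array([0] * n_per_group + [1] * n_per_group)

def max_t(data, labels):
    a, b = data[labels == 0], data[labels == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.max(np.abs(a.mean(axis=0) - b.mean(axis=0)) / se)

observed = max_t(data, labels)
null_max = np.array([max_t(data, rng.permutation(labels)) for _ in range(n_perm)])

# FWE-corrected p-value for the strongest voxel: how often a random relabelling
# produces an equally extreme maximum statistic.
p_fwe = (1 + np.sum(null_max >= observed)) / (1 + n_perm)
print(f"FWE-corrected p for the maximum statistic: {p_fwe:.3f}")
```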

2,946 citations


Cites methods from "Power failure: why small sample siz..."

  • ...In addition to unreliable statistical methods, the neuroimaging field also suffers from studies having low statistical power [42, 43]....


Journal ArticleDOI
TL;DR: This paper presents a generic framework for permutation inference for complex general linear models (GLMs) when the errors are exchangeable and/or have a symmetric distribution, and shows that, even in the presence of nuisance effects, these permutation inferences are powerful while providing excellent control of false positives in a wide range of common and relevant imaging research scenarios.

2,756 citations


Cites background from "Power failure: why small sample siz..."

  • ...There are many reasons why larger samples are more appropriate (see Button et al. (2013) for a recent review), and in what concerns permutation methods, larger samples allow smaller p-values, improve the variance estimates for each VG (which are embodied in the weighting matrix under restricted exchangeability), and allow finer control over the familywise error rate....

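The excerpt above notes that, for permutation methods, larger samples allow smaller p-values: the smallest attainable p-value is bounded below by one over the number of distinct relabellings. The short sketch below just evaluates that bound for a two-group design; the group sizes are hypothetical.

```python
# Sketch: smallest attainable p-value in a two-sample permutation test,
# bounded by 1 / (number of distinct group relabellings). Sizes are hypothetical.
from math import comb

for n_per_group in (4, 8, 16):
    n_relabellings = comb(2 * n_per_group, n_per_group)
    print(f"n={n_per_group} per group: {n_relabellings} relabellings, "
          f"min p ~ {1 / n_relabellings:.2e}")
```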

Journal ArticleDOI
TL;DR: This work argues for the adoption of measures to optimize key elements of the scientific process: methods, reporting and dissemination, reproducibility, evaluation and incentives, in the hope that this will facilitate action toward improving the transparency, reproducibility and efficiency of scientific research.
Abstract: Improving the reliability and efficiency of scientific research will increase the credibility of the published scientific literature and accelerate discovery. Here we argue for the adoption of measures to optimize key elements of the scientific process: methods, reporting and dissemination, reproducibility, evaluation and incentives. There is some evidence from both simulations and empirical studies supporting the likely effectiveness of these measures, but their broad adoption by researchers, institutions, funders and journals will require iterative evaluation and improvement. We discuss the goals of these measures, and how they can be implemented, in the hope that this will facilitate action toward improving the transparency, reproducibility and efficiency of scientific research.

1,951 citations

Journal ArticleDOI
TL;DR: Misinterpretation and abuse of statistical tests, confidence intervals, and statistical power have been decried for decades yet remain rampant, as discussed by the authors; there are no interpretations of these concepts that are at once simple, intuitive, correct, and foolproof. Instead, correct use and interpretation of these statistics requires an attention to detail that seems to tax the patience of working scientists.
Abstract: Misinterpretation and abuse of statistical tests, confidence intervals, and statistical power have been decried for decades, yet remain rampant. A key problem is that there are no interpretations of these concepts that are at once simple, intuitive, correct, and foolproof. Instead, correct use and interpretation of these statistics requires an attention to detail which seems to tax the patience of working scientists. This high cognitive demand has led to an epidemic of shortcut definitions and interpretations that are simply wrong, sometimes disastrously so, and yet these misinterpretations dominate much of the scientific literature. In light of this problem, we provide definitions and a discussion of basic statistics that are more general and critical than typically found in traditional introductory expositions. Our goal is to provide a resource for instructors, researchers, and consumers of statistics whose knowledge of statistical theory and technique may be limited but who wish to avoid and spot misinterpretations. We emphasize how violation of often unstated analysis protocols (such as selecting analyses for presentation based on the P values they produce) can lead to small P values even if the declared test hypothesis is correct, and can lead to large P values even if that hypothesis is incorrect. We then provide an explanatory list of 25 misinterpretations of P values, confidence intervals, and power. We conclude with guidelines for improving statistical interpretation and reporting.
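One of the points the abstract emphasizes, that selecting analyses for presentation based on the P values they produce can yield small P values even when the test hypothesis is correct, is easy to demonstrate by simulation. The sketch below reports the smallest of several p-values computed on pure-null data; the number of candidate analyses and simulations are hypothetical.

```python
# Sketch: reporting the best of several p-values from null data inflates the
# false-positive rate well above the nominal 5%. Parameters are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_sims, n_outcomes, n_per_group = 5000, 10, 25

false_positives = 0
for _ in range(n_sims):
    # Ten independent outcome measures, no true group difference in any of them.
    a = rng.normal(0, 1, (n_per_group, n_outcomes))
    b = rng.normal(0, 1, (n_per_group, n_outcomes))
    pvals = stats.ttest_ind(a, b, axis=0).pvalue
    if pvals.min() < 0.05:        # "selective reporting": publish the best p-value
        false_positives += 1

print(f"rate of at least one p < 0.05 under the null: {false_positives / n_sims:.2f}")
```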

1,584 citations

References
Journal ArticleDOI
TL;DR: G*Power 3 provides improved effect size calculators and graphic options, supports both distribution-based and design-based input modes, and offers all types of power analyses in which users might be interested.
Abstract: G*Power (Erdfelder, Faul, & Buchner, 1996) was designed as a general stand-alone power analysis program for statistical tests commonly used in social and behavioral research. G*Power 3 is a major extension of, and improvement over, the previous versions. It runs on widely used computer platforms (i.e., Windows XP, Windows Vista, and Mac OS X 10.4) and covers many different statistical tests of the t, F, and χ2 test families. In addition, it includes power analyses for z tests and some exact tests. G*Power 3 provides improved effect size calculators and graphic options, supports both distribution-based and design-based input modes, and offers all types of power analyses in which users might be interested. Like its predecessors, G*Power 3 is free.
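G*Power itself is a stand-alone GUI application, but the same kind of a priori power analysis can be sketched in Python with statsmodels; the effect size, alpha, and target power below are hypothetical example inputs, not values from the article.

```python
# Sketch: a priori sample-size calculation for an independent-samples t test,
# analogous to the kind of analysis G*Power performs. Inputs are hypothetical.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                    alternative="two-sided")
print(f"required n per group for d=0.5, alpha=0.05, power=0.80: {n_per_group:.1f}")
```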

40,195 citations

Journal ArticleDOI
TL;DR: The four articles in this special section on meta-analysis illustrate some of the complexities entailed in meta-analysis methods and contribute both to advancing this methodology and to highlighting the increasing complexities that can befuddle researchers.
Abstract: During the past 30 years, meta-analysis has been an indispensable tool for revealing the hidden meaning of our research literatures. The four articles in this special section on meta-analysis illus...

20,272 citations

Journal ArticleDOI
TL;DR: Most of the papers surveyed did not report using randomisation or blinding to reduce bias in animal selection and outcome assessment, consistent with reviews of many research areas, including clinical studies, published in recent years.
Abstract (excerpt): ...animals used (i.e., species/strain, sex, and age/weight). Most of the papers surveyed did not report using randomisation (87%) or blinding (86%) to reduce bias in animal selection and outcome assessment. Only 70% of the publications that used statistical methods fully described them and presented the results with a measure of precision or variability [5]. These findings are a cause for concern and are consistent with reviews of many research areas, including clinical studies, published in recent years [2-22].
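Randomisation and blinding, the two safeguards most often missing in the surveyed papers, are straightforward to implement in analysis code. The sketch below randomly allocates animals to groups and assigns masked group codes so that outcome assessment can be done blind; the group names, sample size, and labels are hypothetical.

```python
# Sketch: random allocation of animals to treatment groups, with masked group
# codes so outcomes can be assessed blind. Names and sizes are hypothetical.
import random

random.seed(42)                      # record the seed so the allocation is auditable
animal_ids = [f"animal_{i:02d}" for i in range(1, 21)]
groups = ["treatment", "control"]

shuffled = random.sample(animal_ids, len(animal_ids))
allocation = {aid: groups[i % 2] for i, aid in enumerate(shuffled)}

# Blinding: the assessor only ever sees the masked code, not the group label.
masked_codes = {g: code for g, code in zip(groups, ("A", "B"))}
blinded = {aid: masked_codes[g] for aid, g in allocation.items()}

print(blinded)        # given to the outcome assessor
# `allocation` (the key) is kept sealed until the analysis is complete.
```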

6,271 citations

15 Aug 2006
TL;DR: In this paper, the authors discuss the implications of these problems for the conduct and interpretation of research and suggest that claimed research findings may often be simply accurate measures of the prevailing bias.
Abstract: There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser pre-selection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.
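The framework the abstract describes reduces to a closed-form expression for the post-study probability that a claimed finding is true. The sketch below implements the PPV formula with a bias term u following Ioannidis (2005); the example values for power, pre-study odds, and bias are hypothetical.

```python
# Sketch: post-study probability that a claimed finding is true (PPV), with a
# bias term u, following the framework in Ioannidis (2005). Inputs are examples.

def ppv_with_bias(power: float, R: float, alpha: float = 0.05, u: float = 0.0) -> float:
    """R: pre-study odds of a true relationship; u: fraction of analyses biased."""
    beta = 1 - power
    numerator = power * R + u * beta * R
    denominator = R + alpha - beta * R + u - u * alpha + u * beta * R
    return numerator / denominator

if __name__ == "__main__":
    for u in (0.0, 0.2, 0.4):
        print(f"bias u={u:.1f}: PPV={ppv_with_bias(power=0.5, R=0.25, u=u):.2f}")
```

With these hypothetical inputs, increasing the bias term steadily lowers the PPV, matching the abstract's claim that greater flexibility, interest, and prejudice make a research finding less likely to be true.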

5,003 citations

Journal ArticleDOI
01 Aug 2005-Chance
TL;DR: In this paper, the authors discuss the implications of these problems for the conduct and interpretation of research and conclude that the probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and the ratio of true to no relationships among the relationships probed in each scientific field.
Abstract: There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research. It can be proven that most claimed research findings are false.

4,999 citations