Journal ArticleDOI

Power failure: why small sample size undermines the reliability of neuroscience

TL;DR: It is shown that the average statistical power of studies in the neurosciences is very low, and the consequences include overestimates of effect size and low reproducibility of results.
Abstract: A study with low statistical power has a reduced chance of detecting a true effect, but it is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect. Here, we show that the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful. Improving reproducibility in neuroscience is a key priority and requires attention to well-established but often ignored methodological principles.
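The link the abstract draws between low power and unreliable significant findings can be made concrete with the positive predictive value (PPV): the probability that a statistically significant result reflects a true effect, given the power (1 − β), the significance threshold α, and the pre-study odds R that a probed effect is real. The sketch below is a minimal illustration of that relationship; the example values for R and power are hypothetical, not figures from the paper.

```python
# Minimal sketch: positive predictive value (PPV) as a function of power.
# PPV = power * R / (power * R + alpha), where R is the pre-study odds that a
# probed effect is real. Example values are hypothetical.

def ppv(power: float, alpha: float = 0.05, prestudy_odds: float = 0.25) -> float:
    """Probability that a statistically significant result reflects a true effect."""
    return (power * prestudy_odds) / (power * prestudy_odds + alpha)

if __name__ == "__main__":
    for power in (0.8, 0.5, 0.2):
        print(f"power={power:.1f}  PPV={ppv(power):.2f}")
```

With these hypothetical pre-study odds, dropping power from 0.8 to 0.2 cuts the PPV from 0.80 to 0.50, which is the mechanism behind the claim that low power reduces the likelihood that a significant result reflects a true effect.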


Citations
Journal ArticleDOI
28 Aug 2015-Science
TL;DR: A large-scale assessment suggests that experimental reproducibility in psychology leaves a lot to be desired, and correlational tests suggest that replication success was better predicted by the strength of original evidence than by characteristics of the original and replication teams.
Abstract: Reproducibility is a defining feature of science, but the extent to which it characterizes current research is unknown. We conducted replications of 100 experimental and correlational studies published in three psychology journals using high-powered designs and original materials when available. Replication effects were half the magnitude of original effects, representing a substantial decline. Ninety-seven percent of original studies had statistically significant results. Thirty-six percent of replications had statistically significant results; 47% of original effect sizes were in the 95% confidence interval of the replication effect size; 39% of effects were subjectively rated to have replicated the original result; and if no bias in original results is assumed, combining original and replication results left 68% with statistically significant effects. Correlational tests suggest that replication success was better predicted by the strength of original evidence than by characteristics of the original and replication teams.
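One mechanism behind the finding that replication effects were half the magnitude of original effects is that effects selected for statistical significance in modestly powered studies are overestimated and regress toward the truth on replication. The simulation below illustrates that selection effect only; the true effect size, sample sizes, and number of simulated studies are hypothetical and not drawn from the Reproducibility Project data.

```python
# Sketch: effect-size inflation from selecting significant originals, and the
# shrinkage seen on replication. All parameters are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d, n_orig, n_rep, n_studies = 0.2, 30, 60, 20000

def cohens_d(a, b):
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled

orig_d, rep_d = [], []
for _ in range(n_studies):
    a = rng.normal(true_d, 1, n_orig)
    b = rng.normal(0.0, 1, n_orig)
    if stats.ttest_ind(a, b).pvalue < 0.05:      # "published" originals only
        orig_d.append(cohens_d(a, b))
        a2 = rng.normal(true_d, 1, n_rep)        # replication of the same true effect
        b2 = rng.normal(0.0, 1, n_rep)
        rep_d.append(cohens_d(a2, b2))

print(f"mean original d (significant only): {np.mean(orig_d):.2f}")
print(f"mean replication d:                 {np.mean(rep_d):.2f}")
```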

5,532 citations


Cites background from "Power failure: why small sample siz..."

  • ...inflated effect sizes due to publication, selection, reporting, or other biases (9, 12-23)....


  • ...problematic practices include selective reporting, selective analysis, and insufficient specification of the conditions necessary or sufficient to obtain the results (12-23)....


  • ...publication bias favoring positive results together produce a literature with upwardly biased effect sizes (14, 16, 32, 33)....


Journal ArticleDOI
TL;DR: It is found that the most common software packages for fMRI analysis (SPM, FSL, AFNI) can result in false-positive rates of up to 70%.
Abstract: The most widely used task functional magnetic resonance imaging (fMRI) analyses use parametric statistical methods that depend on a variety of assumptions. In this work, we use real resting-state data and a total of 3 million random task group analyses to compute empirical familywise error rates for the fMRI software packages SPM, FSL, and AFNI, as well as a nonparametric permutation method. For a nominal familywise error rate of 5%, the parametric statistical methods are shown to be conservative for voxelwise inference and invalid for clusterwise inference. Our results suggest that the principal cause of the invalid cluster inferences is spatial autocorrelation functions that do not follow the assumed Gaussian shape. By comparison, the nonparametric permutation test is found to produce nominal results for voxelwise as well as clusterwise inference. These findings speak to the need of validating the statistical methods being used in the field of neuroimaging.
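The contrast the abstract draws between parametric and nonparametric inference can be illustrated with a max-statistic permutation test, a standard way for a permutation procedure to control the familywise error rate across many tests. The sketch below applies it to simulated null data at a handful of "voxels"; it is a toy stand-in, not the SPM/FSL/AFNI pipelines or the resting-state analyses evaluated in the paper.

```python
# Sketch: max-statistic permutation test controlling familywise error (FWE)
# across several tests. Toy null data, not a neuroimaging pipeline.
import numpy as np

rng = np.random.default_rng(1)
n_per_group, n_voxels, n_perm = 20, 50, 2000

group_a = rng.normal(0, 1, (n_per_group, n_voxels))   # null data: no true effect
group_b = rng.normal(0, 1, (n_per_group, n_voxels))
data = np.vstack([group_a, group_b])
labels = np.array([0] * n_per_group + [1] * n_per_group)

def max_t(data, labels):
    a, b = data[labels == 0], data[labels == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.max(np.abs(a.mean(axis=0) - b.mean(axis=0)) / se)

observed = max_t(data, labels)
null_max = np.array([max_t(data, rng.permutation(labels)) for _ in range(n_perm)])

# FWE-corrected p-value for the strongest voxel: how often a random relabelling
# produces an equally extreme maximum statistic.
p_fwe = (1 + np.sum(null_max >= observed)) / (1 + n_perm)
print(f"FWE-corrected p for the maximum statistic: {p_fwe:.3f}")
```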

2,946 citations


Cites methods from "Power failure: why small sample siz..."

  • ...In addition to unreliable statistical methods, the neuroimaging field also suffers from studies having low statistical power [42, 43]....


Journal ArticleDOI
TL;DR: This paper presents a generic framework for permutation inference for complex general linear models (GLMs) when the errors are exchangeable and/or have a symmetric distribution, and shows that, even in the presence of nuisance effects, these permutation inferences are powerful while providing excellent control of false positives in a wide range of common and relevant imaging research scenarios.

2,756 citations


Cites background from "Power failure: why small sample siz..."

  • ...There are many reasons why larger samples are more appropriate (see Button et al. (2013) for a recent review), and in what concerns permutation methods, larger samples allow smaller p-values, improve the variance estimates for each VG (which are embodied in the weighting matrix under restricted exchangeability), and allow finer control over the familywise error rate....

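The excerpt above notes that, for permutation methods, larger samples allow smaller p-values: the smallest attainable p-value is bounded below by one over the number of distinct relabellings. The short sketch below just evaluates that bound for a two-group design; the group sizes are hypothetical.

```python
# Sketch: smallest attainable p-value in a two-sample permutation test,
# bounded by 1 / (number of distinct group relabellings). Sizes are hypothetical.
from math import comb

for n_per_group in (4, 8, 16):
    n_relabellings = comb(2 * n_per_group, n_per_group)
    print(f"n={n_per_group} per group: {n_relabellings} relabellings, "
          f"min p ~ {1 / n_relabellings:.2e}")
```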

Journal ArticleDOI
TL;DR: This work argues for the adoption of measures to optimize key elements of the scientific process: methods, reporting and dissemination, reproducibility, evaluation and incentives, in the hope that this will facilitate action toward improving the transparency, reproducibility and efficiency of scientific research.
Abstract: Improving the reliability and efficiency of scientific research will increase the credibility of the published scientific literature and accelerate discovery. Here we argue for the adoption of measures to optimize key elements of the scientific process: methods, reporting and dissemination, reproducibility, evaluation and incentives. There is some evidence from both simulations and empirical studies supporting the likely effectiveness of these measures, but their broad adoption by researchers, institutions, funders and journals will require iterative evaluation and improvement. We discuss the goals of these measures, and how they can be implemented, in the hope that this will facilitate action toward improving the transparency, reproducibility and efficiency of scientific research.

1,951 citations

Journal ArticleDOI
TL;DR: Misinterpretation and abuse of statistical tests, confidence intervals, and statistical power have been decried for decades yet remain rampant, as discussed by the authors; there are no interpretations of these concepts that are at once simple, intuitive, correct, and foolproof. Instead, correct use and interpretation of these statistics requires an attention to detail that seems to tax the patience of working scientists.
Abstract: Misinterpretation and abuse of statistical tests, confidence intervals, and statistical power have been decried for decades, yet remain rampant. A key problem is that there are no interpretations of these concepts that are at once simple, intuitive, correct, and foolproof. Instead, correct use and interpretation of these statistics requires an attention to detail which seems to tax the patience of working scientists. This high cognitive demand has led to an epidemic of shortcut definitions and interpretations that are simply wrong, sometimes disastrously so, and yet these misinterpretations dominate much of the scientific literature. In light of this problem, we provide definitions and a discussion of basic statistics that are more general and critical than typically found in traditional introductory expositions. Our goal is to provide a resource for instructors, researchers, and consumers of statistics whose knowledge of statistical theory and technique may be limited but who wish to avoid and spot misinterpretations. We emphasize how violation of often unstated analysis protocols (such as selecting analyses for presentation based on the P values they produce) can lead to small P values even if the declared test hypothesis is correct, and can lead to large P values even if that hypothesis is incorrect. We then provide an explanatory list of 25 misinterpretations of P values, confidence intervals, and power. We conclude with guidelines for improving statistical interpretation and reporting.
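One of the points the abstract emphasizes, that selecting analyses for presentation based on the P values they produce can yield small P values even when the test hypothesis is correct, is easy to demonstrate by simulation. The sketch below reports the smallest of several p-values computed on pure-null data; the number of candidate analyses and simulations are hypothetical.

```python
# Sketch: reporting the best of several p-values from null data inflates the
# false-positive rate well above the nominal 5%. Parameters are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_sims, n_outcomes, n_per_group = 5000, 10, 25

false_positives = 0
for _ in range(n_sims):
    # Ten independent outcome measures, no true group difference in any of them.
    a = rng.normal(0, 1, (n_per_group, n_outcomes))
    b = rng.normal(0, 1, (n_per_group, n_outcomes))
    pvals = stats.ttest_ind(a, b, axis=0).pvalue
    if pvals.min() < 0.05:        # "selective reporting": publish the best p-value
        false_positives += 1

print(f"rate of at least one p < 0.05 under the null: {false_positives / n_sims:.2f}")
```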

1,584 citations

References
Journal ArticleDOI
TL;DR: G*Power 3 provides improved effect size calculators and graphic options, supports both distribution-based and design-based input modes, and offers all types of power analyses in which users might be interested.
Abstract: G*Power (Erdfelder, Faul, & Buchner, 1996) was designed as a general stand-alone power analysis program for statistical tests commonly used in social and behavioral research. G*Power 3 is a major extension of, and improvement over, the previous versions. It runs on widely used computer platforms (i.e., Windows XP, Windows Vista, and Mac OS X 10.4) and covers many different statistical tests of the t, F, and χ2 test families. In addition, it includes power analyses for z tests and some exact tests. G*Power 3 provides improved effect size calculators and graphic options, supports both distribution-based and design-based input modes, and offers all types of power analyses in which users might be interested. Like its predecessors, G*Power 3 is free.
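G*Power itself is a stand-alone GUI application, but the same kind of a priori power analysis can be sketched in Python with statsmodels; the effect size, alpha, and target power below are hypothetical example inputs, not values from the article.

```python
# Sketch: a priori sample-size calculation for an independent-samples t test,
# analogous to the kind of analysis G*Power performs. Inputs are hypothetical.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                    alternative="two-sided")
print(f"required n per group for d=0.5, alpha=0.05, power=0.80: {n_per_group:.1f}")
```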

40,195 citations

Journal ArticleDOI
TL;DR: The four articles in this special section on meta-analysis illustrate some of the complexities entailed in meta-analysis methods and contribute both to advancing this methodology and to highlighting the increasing complexities that can befuddle researchers.
Abstract: During the past 30 years, meta-analysis has been an indispensable tool for revealing the hidden meaning of our research literatures. The four articles in this special section on meta-analysis illus...

20,272 citations

Journal ArticleDOI
TL;DR: Most of the papers surveyed did not report using randomisation or blinding to reduce bias in animal selection and outcome assessment, consistent with reviews of many research areas, including clinical studies, published in recent years.
Abstract (excerpt): ...animals used (i.e., species/strain, sex, and age/weight). Most of the papers surveyed did not report using randomisation (87%) or blinding (86%) to reduce bias in animal selection and outcome assessment. Only 70% of the publications that used statistical methods fully described them and presented the results with a measure of precision or variability [5]. These findings are a cause for concern and are consistent with reviews of many research areas, including clinical studies, published in recent years [2-22].
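Randomisation and blinding, the two safeguards most often missing in the surveyed papers, are straightforward to implement in analysis code. The sketch below randomly allocates animals to groups and assigns masked group codes so that outcome assessment can be done blind; the group names, sample size, and labels are hypothetical.

```python
# Sketch: random allocation of animals to treatment groups, with masked group
# codes so outcomes can be assessed blind. Names and sizes are hypothetical.
import random

random.seed(42)                      # record the seed so the allocation is auditable
animal_ids = [f"animal_{i:02d}" for i in range(1, 21)]
groups = ["treatment", "control"]

shuffled = random.sample(animal_ids, len(animal_ids))
allocation = {aid: groups[i % 2] for i, aid in enumerate(shuffled)}

# Blinding: the assessor only ever sees the masked code, not the group label.
masked_codes = {g: code for g, code in zip(groups, ("A", "B"))}
blinded = {aid: masked_codes[g] for aid, g in allocation.items()}

print(blinded)        # given to the outcome assessor
# `allocation` (the key) is kept sealed until the analysis is complete.
```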

6,271 citations

15 Aug 2006
TL;DR: In this paper, the authors discuss the implications of these problems for the conduct and interpretation of research and suggest that claimed research findings may often be simply accurate measures of the prevailing bias.
Abstract: There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser pre-selection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.
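The framework the abstract describes reduces to a closed-form expression for the post-study probability that a claimed finding is true. The sketch below implements the PPV formula with a bias term u following Ioannidis (2005); the example values for power, pre-study odds, and bias are hypothetical.

```python
# Sketch: post-study probability that a claimed finding is true (PPV), with a
# bias term u, following the framework in Ioannidis (2005). Inputs are examples.

def ppv_with_bias(power: float, R: float, alpha: float = 0.05, u: float = 0.0) -> float:
    """R: pre-study odds of a true relationship; u: fraction of analyses biased."""
    beta = 1 - power
    numerator = power * R + u * beta * R
    denominator = R + alpha - beta * R + u - u * alpha + u * beta * R
    return numerator / denominator

if __name__ == "__main__":
    for u in (0.0, 0.2, 0.4):
        print(f"bias u={u:.1f}: PPV={ppv_with_bias(power=0.5, R=0.25, u=u):.2f}")
```

With these hypothetical inputs, increasing the bias term steadily lowers the PPV, matching the abstract's claim that greater flexibility, interest, and prejudice make a research finding less likely to be true.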

5,003 citations

Journal ArticleDOI
01 Aug 2005-Chance
TL;DR: In this paper, the authors discuss the implications of these problems for the conduct and interpretation of research and conclude that the probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and the ratio of true to no relationships among the relationships probed in each scientific field.
Abstract: There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research. It can be proven that most claimed research findings are false.

4,999 citations