Journal ArticleDOI

Excess significance bias in the literature on brain volume abnormalities.

01 Aug 2011-Archives of General Psychiatry (American Medical Association)-Vol. 68, Iss: 8, pp 773-780
TL;DR: The literature on brain volume abnormalities contains too many studies with statistically significant results, suggesting strong biases in the literature, with selective outcome reporting and selective analysis reporting as possible explanations.
Abstract: Context: Many studies report volume abnormalities in diverse brain structures in patients with various mental health conditions. Objective: To evaluate whether there is evidence for an excess number of statistically significant results in studies of brain volume abnormalities that suggests the presence of bias in the literature. Data Sources: PubMed (articles published from January 2006 to December 2009). Study Selection: Recent meta-analyses of brain volume abnormalities in participants with various mental health conditions vs control participants, with 6 or more data sets included, excluding voxel-based morphometry. Data Extraction: Standardized effect sizes were extracted for each data set, and it was noted whether the results were "positive" (P < .05). Data Synthesis: From 8 articles, 41 meta-analyses with 461 data sets were evaluated (median, 10 data sets per meta-analysis) pertaining to 7 conditions. Twenty-one of the 41 meta-analyses had found statistically significant associations, and 142 of 461 (31%) data sets had positive results. Even if the summary effect sizes of the meta-analyses were unbiased, the expected number of positive results would have been only 78.5, compared with the observed number of 142 (P < .001). Conclusion: There are too many studies with statistically significant results in the literature on brain volume abnormalities. This pattern suggests strong biases in the literature, with selective outcome reporting and selective analysis reporting being possible explanations.
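The excess significance test behind the abstract compares the observed number of "positive" (statistically significant) data sets with the number expected from the studies' power to detect the meta-analytic summary effect. A minimal sketch in Python, using a one-sided normal approximation to the binomial test; the per-study power values below are illustrative, not the paper's actual estimates:

```python
import math

# Hypothetical per-study powers to detect the meta-analytic summary effect
# (illustrative: 461 data sets, each with power ~0.17).
powers = [0.17] * 461
observed = 142  # observed number of significant ("positive") data sets

# Expected positives = sum of powers; variance from independent Bernoulli trials.
expected = sum(powers)
variance = sum(p * (1 - p) for p in powers)

# One-sided test: are there more positives than power alone predicts?
z = (observed - expected) / math.sqrt(variance)
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print(f"expected {expected:.1f}, observed {observed}, p = {p_value:.2g}")
```

A large excess of observed over expected positives, as here, is the paper's signature of reporting bias.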


Citations
Journal ArticleDOI
TL;DR: It is shown that the average statistical power of studies in the neurosciences is very low, and the consequences include overestimates of effect size and low reproducibility of results.
Abstract: A study with low statistical power has a reduced chance of detecting a true effect, but it is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect. Here, we show that the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful. Improving reproducibility in neuroscience is a key priority and requires attention to well-established but often ignored methodological principles.
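The "winner's curse" mechanism described above, where significant results from underpowered studies overestimate the true effect, can be shown with a small simulation. All numbers below are invented for illustration (a small true effect, small groups, a z-threshold of 1.96):

```python
import random
import statistics

random.seed(1)
true_effect = 0.2                  # small true standardized effect
n_per_group = 20                   # small samples -> low power
se = (2 / n_per_group) ** 0.5      # SE of a two-group mean difference (d scale)

# Simulate many studies; keep only those reaching |z| > 1.96.
estimates, significant = [], []
for _ in range(20000):
    d_hat = random.gauss(true_effect, se)
    estimates.append(d_hat)
    if abs(d_hat) / se > 1.96:
        significant.append(d_hat)

print(statistics.mean(estimates))    # near 0.2: estimation is unbiased overall
print(statistics.mean(significant))  # well above 0.2: inflated by selection
```

Only the significant studies, which tend to be published, carry inflated effect sizes, which is one of the consequences the paper describes.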

5,683 citations

Journal ArticleDOI
01 Apr 2014-BMJ
TL;DR: Evidence does not support the argument that vitamin D-only supplementation increases bone mineral density or reduces the risk of fractures or falls in older people; highly convincing evidence of a clear role of vitamin D does not exist for any outcome, but associations with a selection of outcomes are probable.
Abstract: Objective To evaluate the breadth, validity, and presence of biases of the associations of vitamin D with diverse outcomes.

791 citations

Journal ArticleDOI
TL;DR: Brain loss in schizophrenia is related to a combination of (early) neurodevelopmental processes-reflected in intracranial volume reduction-as well as illness progression.
Abstract: Although structural brain alterations in schizophrenia have been demonstrated extensively, their quantitative distribution has not been studied over the last 14 years despite advances in neuroimaging. Moreover, a volumetric meta-analysis has not been conducted in antipsychotic-naive patients. Therefore, meta-analysis on cross-sectional volumetric brain alterations in both medicated and antipsychotic-naive patients was conducted. Three hundred seventeen studies published from September 1, 1998 to January 1, 2012 comprising over 9000 patients were selected for meta-analysis, including 33 studies in antipsychotic-naive patients. In addition to effect sizes, potential modifying factors such as duration of illness, sex composition, current antipsychotic dose, and intelligence quotient matching status of participants were extracted where available. In the sample of medicated schizophrenia patients (n = 8327), intracranial and total brain volume was significantly decreased by 2.0% (effect size d = -0.17) and 2.6% (d = -0.30), respectively. Largest effect sizes were observed for gray matter structures, with effect sizes ranging from -0.22 to -0.58. In the sample of antipsychotic-naive patients (n = 771), volume reductions in caudate nucleus (d = -0.38) and thalamus (d = -0.68) were more pronounced than in medicated patients. White matter volume was decreased to a similar extent in both groups, while gray matter loss was less extensive in antipsychotic-naive patients. Gray matter reduction was associated with longer duration of illness and higher dose of antipsychotic medication at time of scanning. Therefore, brain loss in schizophrenia is related to a combination of (early) neurodevelopmental processes-reflected in intracranial volume reduction-as well as illness progression.

771 citations

Journal ArticleDOI
TL;DR: Significant gaps in methods reporting among fMRI studies are documented, and improved methodological descriptions in research reports would yield significant benefits for the field.

436 citations


Additional excerpts

  • ...Similarly, studies of putative brain volume abnormalities in patients with mental health disorders report far more positive results than would be expected given their power to detect such effects, likely reflecting the selective reporting of favorable analysis outcomes (Ioannidis, 2011)....


Journal ArticleDOI
TL;DR: Schizophrenia is characterized by progressive gray matter volume decreases and lateral ventricular volume increases, and some of these neuroanatomical alterations may be associated with antipsychotic treatment.

431 citations


Cites background from "Excess significance bias in the lit..."

  • ...In general, ROI analyses can also be affected by publication biases: researchers could perform several exploratory analyses but report only those which yielded significant results (Ioannidis, 2011; Radua and Mataix-Cols, 2012)....


References
Journal ArticleDOI
04 Sep 2003-BMJ
TL;DR: The standard test for heterogeneity is susceptible to the number of trials included in a meta-analysis; the authors develop a new quantity, I², which they believe gives a better measure of the consistency between trials in a meta-analysis.
Abstract: Cochrane Reviews have recently started including the quantity I² to help readers assess the consistency of the results of studies in meta-analyses. What does this new quantity mean, and why is assessment of heterogeneity so important to clinical practice? Systematic reviews and meta-analyses can provide convincing and reliable evidence relevant to many aspects of medicine and health care.1 Their value is especially clear when the results of the studies they include show clinically important effects of similar magnitude. However, the conclusions are less clear when the included studies have differing results. In an attempt to establish whether studies are consistent, reports of meta-analyses commonly present a statistical test of heterogeneity. The test seeks to determine whether there are genuine differences underlying the results of the studies (heterogeneity), or whether the variation in findings is compatible with chance alone (homogeneity). However, the test is susceptible to the number of trials included in the meta-analysis. We have developed a new quantity, I², which we believe gives a better measure of the consistency between trials in a meta-analysis. Assessment of the consistency of effects across studies is an essential part of meta-analysis. Unless we know how consistent the results of studies are, we cannot determine the generalisability of the findings of the meta-analysis. Indeed, several hierarchical systems for grading evidence state that the results of studies must be consistent or homogeneous to obtain the highest grading.2–4 Tests for heterogeneity are commonly used to decide on methods for combining studies and for concluding consistency or inconsistency of findings.5 6 But what does the test achieve in practice, and how should the resulting P values be interpreted? A test for heterogeneity examines the null hypothesis that all studies are evaluating the same effect. The usual test statistic …
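The heterogeneity test discussed here is based on Cochran's Q: the weighted sum of squared deviations of each study's effect from the fixed-effect pooled mean, where each study is weighted by the inverse of its variance. Under homogeneity, Q follows a chi-squared distribution with k − 1 degrees of freedom, which is why the test depends on the number of trials. A minimal sketch with made-up effect sizes:

```python
def cochran_q(effects, variances):
    """Cochran's Q: inverse-variance-weighted squared deviations
    from the fixed-effect pooled mean."""
    w = [1 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    return sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))

# Illustrative effects and variances from five hypothetical trials.
effects = [0.10, 0.30, 0.35, 0.60, 0.45]
variances = [0.04, 0.05, 0.03, 0.06, 0.05]

q = cochran_q(effects, variances)
df = len(effects) - 1  # compare q to chi-squared with df degrees of freedom
print(q, df)
```

Adding more small trials raises df and changes the test's sensitivity even when the underlying inconsistency is unchanged, which motivates the I² measure.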

45,105 citations

Journal ArticleDOI
TL;DR: It is concluded that H and I², which can usually be calculated for published meta-analyses, are particularly useful summaries of the impact of heterogeneity, and one or both should be presented in published meta-analyses in preference to the test for heterogeneity.
Abstract: The extent of heterogeneity in a meta-analysis partly determines the difficulty in drawing overall conclusions. This extent may be measured by estimating a between-study variance, but interpretation is then specific to a particular treatment effect metric. A test for the existence of heterogeneity exists, but depends on the number of studies in the meta-analysis. We develop measures of the impact of heterogeneity on a meta-analysis, from mathematical criteria, that are independent of the number of studies and the treatment effect metric. We derive and propose three suitable statistics: H is the square root of the chi-squared heterogeneity statistic divided by its degrees of freedom; R is the ratio of the standard error of the underlying mean from a random effects meta-analysis to the standard error of a fixed effect meta-analytic estimate; and I² is a transformation of H that describes the proportion of total variation in study estimates that is due to heterogeneity. We discuss interpretation, interval estimates and other properties of these measures and examine them in five example data sets showing different amounts of heterogeneity. We conclude that H and I², which can usually be calculated for published meta-analyses, are particularly useful summaries of the impact of heterogeneity. One or both should be presented in published meta-analyses in preference to the test for heterogeneity.
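H and I² have closed forms in terms of Cochran's Q and the number of studies k: H = sqrt(Q / (k − 1)) and I² = (Q − (k − 1)) / Q, truncated so they never fall below 1 and 0 respectively. A small sketch; the Q value below is hypothetical:

```python
def heterogeneity_measures(q, k):
    """H and I^2 from Cochran's Q with k studies (Higgins & Thompson)."""
    df = k - 1
    h = max((q / df) ** 0.5, 1.0)                    # H is truncated at 1
    i2 = max((q - df) / q, 0.0) if q > 0 else 0.0    # proportion of variation
    return h, i2                                     # due to heterogeneity

h, i2 = heterogeneity_measures(q=30.0, k=11)  # hypothetical Q from 11 studies
print(h, i2)  # H = sqrt(3) ~ 1.73, I^2 = 2/3 ~ 0.67
```

Because I² is a proportion, it can be compared across meta-analyses regardless of the number of studies or the effect metric, which is the paper's main argument for reporting it.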

25,460 citations

15 Aug 2006
TL;DR: The author argues that for most study designs and settings a research claim is more likely to be false than true, discusses the implications for the conduct and interpretation of research, and suggests that claimed research findings may often be simply accurate measures of the prevailing bias.
Abstract: There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser pre-selection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.
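The framework's central quantity is the post-study probability that a claimed finding is true (the positive predictive value, PPV). Ignoring bias, the paper's formula reduces to PPV = (1 − β)R / ((1 − β)R + α), where 1 − β is power, α the significance threshold, and R the pre-study odds that a probed relationship is true. A sketch with illustrative inputs:

```python
def ppv(power, alpha, r):
    """Post-study probability that a claimed significant finding is true,
    absent bias (Ioannidis 2005): PPV = power*R / (power*R + alpha)."""
    return power * r / (power * r + alpha)

# With 1-in-10 pre-study odds and alpha = .05:
high = ppv(power=0.80, alpha=0.05, r=0.1)  # ~0.62
low = ppv(power=0.20, alpha=0.05, r=0.1)   # ~0.29
print(high, low)
```

Halving power here drags the PPV well below one half, illustrating the claim that low power makes a "significant" finding more likely false than true.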

5,003 citations

Journal ArticleDOI
TL;DR: The problem of making a combined estimate had been discussed previously by Cochran (1937) and Yates and Cochran (1938) for agricultural experiments, and by Bliss (1952) for bioassays in different laboratories; this paper treats the subject in more general terms.
Abstract: When we are trying to make the best estimate of some quantity μ that is available from the research conducted to date, the problem of combining results from different experiments is encountered. The problem is often troublesome, particularly if the individual estimates were made by different workers using different procedures. This paper discusses one of the simpler aspects of the problem, in which there is sufficient uniformity of experimental methods that the ith experiment provides an estimate xi of μ, and an estimate si of the standard error of xi. The experiments may be, for example, determinations of a physical or astronomical constant by different scientists, or bioassays carried out in different laboratories, or agricultural field experiments laid out in different parts of a region. The quantity xi may be a simple mean of the observations, as in a physical determination, or the difference between the means of two treatments, as in a comparative experiment, or a median lethal dose, or a regression coefficient. The problem of making a combined estimate has been discussed previously by Cochran (1937) and Yates and Cochran (1938) for agricultural experiments, and by Bliss (1952) for bioassays in different laboratories. The last two papers give recommendations for the practical worker. My purposes in treating the subject again are to discuss it in more general terms, to take account of some recent theoretical research, and, I hope, to bring the practical recommendations to the attention of some biologists who are not acquainted with the previous papers. The basic issue with which this paper deals is as follows. The simplest method of combining estimates made in a number of different experiments is to take the arithmetic mean of the estimates. If, however, the experiments vary in size, or appear to be of different precision, the investigator may wonder whether some kind of weighted mean would be more precise. This paper gives recommendations about the kinds of weighted mean that are appropriate, the situations in which they …
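The standard answer to the weighting question posed above is the inverse-variance weighted mean: weighting each estimate by the reciprocal of its squared standard error minimizes the variance of the combined estimate. A minimal sketch with invented numbers:

```python
def combined_estimate(estimates, std_errors):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1 / se ** 2 for se in std_errors]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, estimates)) / total
    return mean, (1 / total) ** 0.5  # SE of the combined estimate

# Three hypothetical experiments estimating the same quantity.
m, se = combined_estimate([1.2, 0.9, 1.1], [0.2, 0.4, 0.25])
print(m, se)
```

The combined standard error is smaller than any single experiment's, and the more precise experiments dominate the mean, which is the behavior Cochran's recommendations formalize.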

4,335 citations

Journal ArticleDOI

3,250 citations