The Difference Between “Significant” and “Not Significant” is not Itself Statistically Significant
Summary
1 Introduction
- A common statistical error is to summarize comparisons by statistical significance and then draw a sharp distinction between significant and non-significant results.
- Summarizing by statistical significance has a number of pitfalls, most of which are covered in standard statistics courses, but one of which the authors believe is less well known.
- That less well-known pitfall is that changes in statistical significance are often not themselves statistically significant.
- If the estimated effect of a drug is to decrease blood pressure by 0.10 with a standard error of 0.03, this would be statistically significant but probably not important in practice (or so the authors suppose, given their general knowledge that blood pressure values are typically around 100).
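The blood-pressure example can be checked with a few lines of arithmetic. The sketch below assumes a normal approximation for the estimate; the function name `z_and_p` is illustrative and not from the paper.

```python
import math


def z_and_p(estimate, std_error):
    """Return the z-score and the two-sided normal p-value for an estimate."""
    z = estimate / std_error
    # P(|Z| > |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Effect of 0.10 with standard error 0.03, as in the example above.
z, p = z_and_p(0.10, 0.03)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

The estimate is more than three standard errors from zero, so it is clearly statistically significant, yet a change of 0.10 on a scale where typical values are around 100 is tiny. Significance here says nothing about practical importance.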
2 Theoretical example: comparing the results of two exper-
- The first study is statistically significant at the 1% level, and the second is not at all statistically significant, being only one standard error away from 0.
- Both find a positive effect but with very different magnitudes.
- In fact, a third study with a much larger sample finds an effect size much closer to that of the second study, but because of its larger sample size it attains significance.
- Declarations of statistical significance are often associated with decision making.
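The comparison described in this section can be sketched numerically. The (estimate, standard error) pairs below are assumed for illustration and do not appear in the summary above: they are chosen so that the first study is 2.5 standard errors from zero, the second is one standard error from zero, and the third has a small effect measured precisely. The studies are assumed independent, and the function names are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a z-score under a normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


def compare(est_a, se_a, est_b, se_b):
    """Compare two independent estimates; returns (diff, se_diff, z, p)."""
    diff = est_a - est_b
    # Variances of independent estimates add.
    se_diff = math.sqrt(se_a**2 + se_b**2)
    z = diff / se_diff
    return diff, se_diff, z, two_sided_p(z)


# Illustrative (estimate, standard error) pairs; assumed, not from the summary.
studies = {"first": (25.0, 10.0), "second": (10.0, 10.0), "third": (2.5, 1.0)}

for name, (est, se) in studies.items():
    print(f"{name}: z = {est / se:.2f}, p = {two_sided_p(est / se):.3f}")

for a, b in [("first", "second"), ("first", "third"), ("second", "third")]:
    diff, se_diff, z, p = compare(*studies[a], *studies[b])
    print(f"{a} - {b}: diff = {diff:.1f}, se = {se_diff:.1f}, z = {z:.2f}, p = {p:.3f}")
```

Under these assumed numbers, the difference between the "significant" first study and the "non-significant" second study is 15 with a standard error of about 14, roughly one standard error from zero. The first and third studies share the same z-score, yet their difference is more than two standard errors from zero.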
3 Applied example: homosexuality and the number of older
- The paper “Biological versus nonbiological older brothers and men’s sexual orientation” (Bogaert, 2006) appeared recently in the Proceedings of the National Academy of Sciences and was picked up by several leading science news organizations (Bower, 2006; Motluk, 2006; Staedter, 2006).
- Only the number of biological older brothers reared with the participant, and not any other sibling characteristic including the number of nonbiological brothers reared with the participant, was significantly related to sexual orientation.
- The conclusions appear to be based on a comparison of significance (for the coefficient of the number of older brothers) with nonsignificance (for the other coefficients), even though the differences between the coefficients do not appear to be statistically significant.
- (Again, the authors cannot be certain, but they strongly suspect so from the graph and the table.)
- Given that the 95% confidence level is standard (and the authors are pretty sure the paper would not have been published had the results not been statistically significant at that level), it is appropriate that the rule should be applied consistently to hypotheses consistent with the data.
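Applying the rule consistently means testing the difference between two coefficients directly, rather than comparing their separate significance levels. A minimal sketch follows, with one caveat: coefficients from the same regression are generally correlated, so the standard error of their difference includes a covariance term. The numbers are hypothetical and are not values from Bogaert (2006) or Blanchard and Bogaert (1996); `coef_difference` and its parameters are illustrative names.

```python
import math


def coef_difference(b1, se1, b2, se2, cov12=0.0):
    """Test whether two coefficients differ; returns (diff, se_diff, z, p).

    cov12 is the estimated covariance between the two coefficient estimates
    (zero only if they are independent).
    """
    diff = b1 - b2
    se_diff = math.sqrt(se1**2 + se2**2 - 2.0 * cov12)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation
    return diff, se_diff, z, p


# Hypothetical coefficients (NOT values from the papers discussed):
# coefficient A is 3 SEs from zero, coefficient B is 0.5 SEs from zero.
diff, se_diff, z, p = coef_difference(0.30, 0.10, 0.10, 0.20)
print(f"diff = {diff:.2f}, se = {se_diff:.3f}, z = {z:.2f}, p = {p:.2f}")
```

In this made-up case one coefficient is clearly significant and the other is not, but their difference is less than one standard error from zero, so the data do not distinguish the two effects.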
4 Applied example: health effects of low-frequency electro-
- The issue of comparisons between significance and non-significance is of even more concern in the increasingly common setting where there are a large number of comparisons.
- The researchers used this sort of display to hypothesize that one process was occurring at 255, 285, and 315 Hz (where effects were highly significant), another at 135 and 225 Hz (where effects were only moderately significant), and so forth.
- The estimates are all of relative calcium efflux, so that an effect of 0.1, for example, corresponds to a 10% increase compared to the control condition.
- At the very least, it is more informative to show the estimated treatment effect and standard error at each frequency, as in Figure 2b.
- The authors' simple hierarchical model is not intended to be definitive, merely a model that the authors believe improves upon the separate judgments of statistical significance for each experiment.
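The idea of a simple hierarchical model can be sketched as partial pooling: each experiment's estimate is pulled toward a common mean, more strongly when its standard error is large. The sketch below assumes the model y_j ~ N(theta_j, se_j^2), theta_j ~ N(mu, tau^2), and plugs in point estimates of mu and tau^2 from the DerSimonian-Laird moment estimator used in meta-analysis. That estimator is a stand-in chosen for brevity, not the authors' fitting method, and the data are hypothetical.

```python
import math


def partial_pool(estimates, std_errors):
    """Shrink per-experiment estimates toward a common mean.

    Returns (mu, tau2, pooled), where pooled[j] is the approximate posterior
    mean of theta_j given plug-in estimates of mu and tau2.
    """
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two experiments")
    w = [1.0 / se**2 for se in std_errors]
    sw = sum(w)
    mu_fixed = sum(wi * y for wi, y in zip(w, estimates)) / sw
    # Cochran's Q and the DerSimonian-Laird estimate of between-group variance.
    q = sum(wi * (y - mu_fixed) ** 2 for wi, y in zip(w, estimates))
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Re-estimate the common mean with random-effects weights.
    w_re = [1.0 / (se**2 + tau2) for se in std_errors]
    mu = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    pooled = []
    for y, se in zip(estimates, std_errors):
        shrink = tau2 / (tau2 + se**2)  # weight kept on the raw estimate
        pooled.append(mu + shrink * (y - mu))
    return mu, tau2, pooled


# Hypothetical relative effects and standard errors at six frequencies.
effects = [0.02, 0.05, 0.20, 0.12, 0.30, 0.08]
ses = [0.04, 0.05, 0.06, 0.05, 0.07, 0.04]
mu, tau2, pooled = partial_pool(effects, ses)
for y, se, est in zip(effects, ses, pooled):
    print(f"raw = {y:.2f} (se {se:.2f}) -> pooled = {est:.3f}")
```

No estimate is set to zero for being "non-significant", and none is taken at face value for being "significant"; every estimate is shrunk by an amount that depends on its own uncertainty.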
Citations
Cites background from "The Difference Between “Significant..."
...Difference in significance does not imply significantly different (Gelman & Stern, 2006)....
References
"The Difference Between “Significant..." refers background in this paper
...The article referred back to Blanchard and Bogaert (1996), which had the graph and table shown in Figure 1, along with the following summary: Significant beta coefficients differ statistically from zero and, when positive, indicate a…... [The American Statistician, November 2006, Vol. 60, No. 4, p. 329]
...We were curious about this—why older brothers and not older sisters? The article referred back to Blanchard and Bogaert (1996), which had the graph and table shown in Figure 1, along with the following summary:...
...From Blanchard and Bogaert (1996): (a) mean numbers of older and younger brothers and sisters for 302 homosexual men and 302 matched heterosexual men, (b) logistic regression of sexual orientation on family variables from these data....
Frequently Asked Questions (4)
Q2. How can the effects at each frequency be estimated?
The multilevel analysis can be seen as a way to estimate the effects at each frequency j, without setting apparently “non-significant” results to zero.
Q3. How can a large number of related experiments be handled in a single data analysis?
Another way to handle the large number of related experiments in a single data analysis is to fit a multilevel model of the sort used in meta-analysis.
Q4. What common mistake did the researchers in the chick-brain experiment make?
The researchers in the chick-brain experiment made the common mistake of using statistical significance as a criterion for separating the estimates of different effects, an approach that does not make sense.