Cryptic multiple hypotheses testing in linear models: overestimated effect sizes and the winner's curse.

doi:10.1007/S00265-010-1038-5

Open Access, Journal Article


Wolfgang Forstmeier, +1 more

01 Jan 2011, Behavioral Ecology and Sociobiology, Vol. 65, Iss. 1, pp. 47-55


TLDR

Full model tests and P value adjustments can be used as a guide to how frequently type I errors arise by sampling variation alone. The authors favour the presentation of full models, since these best reflect the range of predictors investigated and also ensure a balanced representation of non-significant results.

Abstract:

Fitting generalised linear models (GLMs) with more than one predictor has become the standard method of analysis in evolutionary and behavioural research. Often, GLMs are used for exploratory data analysis, where one starts with a complex full model including interaction terms and then simplifies by removing non-significant terms. While this approach can be useful, it is problematic if significant effects are interpreted as if they arose from a single a priori hypothesis test. This is because model selection involves cryptic multiple hypothesis testing, a fact that has only rarely been acknowledged or quantified. We show that the probability of finding at least one ‘significant’ effect is high, even if all null hypotheses are true (e.g. 40% when starting with four predictors and their two-way interactions). This probability is close to theoretical expectations when the sample size (N) is large relative to the number of predictors including interactions (k). In contrast, type I error rates strongly exceed even those expectations when model simplification is applied to models that are over-fitted before simplification (low N/k ratio). The increase in false-positive results arises primarily from an overestimation of effect sizes among significant predictors, leading to upward-biased effect sizes that often cannot be reproduced in follow-up studies (‘the winner's curse’). Despite having their own problems, full model tests and P value adjustments can be used as a guide to how frequently type I errors arise by sampling variation alone. We favour the presentation of full models, since they best reflect the range of predictors investigated and ensure a balanced representation also of non-significant results.
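The 40% figure quoted in the abstract can be checked with a short calculation. The sketch below is illustrative and not taken from the paper. It assumes the tests are independent and each is run at alpha = 0.05, so the chance of at least one 'significant' term among k true nulls is 1 - (1 - alpha)^k. It also shows Holm's step-down procedure as one example of a P value adjustment; the abstract does not say which adjustment the authors used. The function names `n_terms`, `familywise_error` and `holm_adjust` are hypothetical.

```python
from math import comb


def n_terms(n_predictors: int) -> int:
    """Count main effects plus all two-way interactions."""
    return n_predictors + comb(n_predictors, 2)


def familywise_error(k: int, alpha: float = 0.05) -> float:
    """P(at least one p < alpha) over k tests when every null is true.

    Assumption: the k tests are independent.
    """
    return 1.0 - (1.0 - alpha) ** k


def holm_adjust(p_values):
    """Holm step-down adjusted P values, returned in the input order."""
    m = len(p_values)
    # Visit P values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest P value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


k = n_terms(4)  # 4 main effects + 6 two-way interactions = 10 terms
print(k, round(familywise_error(k), 3))  # prints: 10 0.401
```

With four predictors the full model holds 10 terms, and the independence approximation gives about 0.40, in line with the abstract's example. According to the abstract, observed type I error rates exceed this expectation when the N/k ratio is low, so the formula is better read as a baseline than as an upper bound.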



