Open Access · Journal Article

Cryptic multiple hypotheses testing in linear models: overestimated effect sizes and the winner's curse.

TLDR
Full model tests and P value adjustments can be used as a guide to how frequently type I errors arise by sampling variation alone. The authors favour presenting full models, since these best reflect the range of predictors investigated and ensure that non-significant results are also represented.
Abstract
Fitting generalised linear models (GLMs) with more than one predictor has become the standard method of analysis in evolutionary and behavioural research. Often, GLMs are used for exploratory data analysis, where one starts with a complex full model including interaction terms and then simplifies by removing non-significant terms. While this approach can be useful, it is problematic if significant effects are interpreted as if they arose from a single a priori hypothesis test. This is because model selection involves cryptic multiple hypothesis testing, a fact that has only rarely been acknowledged or quantified. We show that the probability of finding at least one ‘significant’ effect is high, even if all null hypotheses are true (e.g. 40% when starting with four predictors and their two-way interactions). This probability is close to theoretical expectations when the sample size (N) is large relative to the number of predictors including interactions (k). In contrast, type I error rates strongly exceed even those expectations when model simplification is applied to models that are over-fitted before simplification (low N/k ratio). The increase in false-positive results arises primarily from an overestimation of effect sizes among significant predictors, leading to upward-biased effect sizes that often cannot be reproduced in follow-up studies (‘the winner's curse’). Despite having their own problems, full model tests and P value adjustments can be used as a guide to how frequently type I errors arise by sampling variation alone. We favour the presentation of full models, since they best reflect the range of predictors investigated and ensure a balanced representation also of non-significant results.
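The 40% figure quoted in the abstract can be checked with a few lines of arithmetic, and the winner's curse can be illustrated with a toy simulation. The sketch below is illustrative only and is not the authors' simulation code. It assumes that the k tests are independent and each run at α = 0.05, so that the chance of at least one false positive is 1 − (1 − α)^k, and it replaces the GLM setting with a single effect estimated by a z-test with known unit variance. The function names (`n_terms`, `familywise_error_rate`, `winners_curse`) and the parameter values (`true_effect=0.2`, `n=20`) are hypothetical choices made for this example.

```python
import math
import random
import statistics


def n_terms(n_predictors: int) -> int:
    """Number of main effects plus all two-way interactions."""
    return n_predictors + math.comb(n_predictors, 2)


def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """P(at least one p < alpha) across k tests when every null is true.

    Assumption: the k tests are independent.
    """
    return 1.0 - (1.0 - alpha) ** k


def winners_curse(true_effect: float = 0.2, n: int = 20,
                  n_studies: int = 5000, seed: int = 1):
    """Toy simulation: mean estimate over all studies vs. significant ones only.

    Assumption: each study estimates the effect as the mean of n observations
    with known unit variance, so the estimate is Normal(true_effect, 1/sqrt(n)).
    """
    rng = random.Random(seed)
    z_crit = 1.959964  # two-sided 5% critical value of the standard normal
    se = 1.0 / math.sqrt(n)
    all_estimates, significant_estimates = [], []
    for _ in range(n_studies):
        estimate = rng.gauss(true_effect, se)
        all_estimates.append(estimate)
        if abs(estimate) / se > z_crit:  # "significant" at the 5% level
            significant_estimates.append(estimate)
    return (statistics.mean(all_estimates),
            statistics.mean(significant_estimates))


if __name__ == "__main__":
    k = n_terms(4)  # 4 main effects + 6 two-way interactions = 10
    print(k, round(familywise_error_rate(k), 3))  # 10 0.401
    mean_all, mean_significant = winners_curse()
    print(f"mean of all estimates:         {mean_all:.3f}")
    print(f"mean of significant estimates: {mean_significant:.3f}")
```

Under these assumptions, four predictors and their two-way interactions give k = 10 terms and a familywise error rate of about 0.40, matching the abstract's example. In the simulation, the estimates averaged over all studies stay close to the true effect, while the estimates that pass the significance filter are on average well above it, which is the selection effect the abstract calls the winner's curse.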

