Open Access · Journal Article · DOI

Events per variable (EPV) and the relative performance of different strategies for estimating the out-of-sample validity of logistic regression models

TLDR
In this article, the authors examine the effect of the number of events per variable (EPV) on the relative performance of three different methods for assessing the predictive accuracy of a logistic regression model: apparent performance in the analysis sample, split-sample validation, and optimism correction using bootstrap methods.
Abstract
We conducted an extensive set of empirical analyses to examine the effect of the number of events per variable (EPV) on the relative performance of three different methods for assessing the predictive accuracy of a logistic regression model: apparent performance in the analysis sample, split-sample validation, and optimism correction using bootstrap methods. Using a single dataset of patients hospitalized with heart failure, we compared the estimates of discriminatory performance from these methods to those for a very large independent validation sample arising from the same population. As anticipated, the apparent performance was optimistically biased, with the degree of optimism diminishing as the number of events per variable increased. Differences between the bootstrap-corrected approach and the use of an independent validation sample were minimal once the number of events per variable was at least 20. Split-sample assessment resulted in too pessimistic and highly uncertain estimates of model performance. Apparent performance estimates had lower mean squared error compared to split-sample estimates, but the lowest mean squared error was obtained by bootstrap-corrected optimism estimates. For bias, variance, and mean squared error of the performance estimates, the penalty incurred by using split-sample validation was equivalent to reducing the sample size by a proportion equivalent to the proportion of the sample that was withheld for model validation. In conclusion, split-sample validation is inefficient and apparent performance is too optimistic for internal validation of regression-based prediction models. Modern validation methods, such as bootstrap-based optimism correction, are preferable. While these findings may be unsurprising to many statisticians, the results of the current study reinforce what should be considered good statistical practice in the development and validation of clinical prediction models.
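The abstract compares apparent performance with Harrell-style bootstrap optimism correction. A minimal sketch of that correction, using simulated data (the dataset, coefficients, and sample size here are hypothetical, not the heart-failure data from the study): fit on the full sample for the apparent AUC, then estimate optimism as the average gap between each bootstrap model's performance on its own resample and on the original sample.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical simulated data: 500 patients, 5 predictors
n, p = 500, 5
X = rng.normal(size=(n, p))
beta = np.array([0.8, -0.5, 0.3, 0.0, 0.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta - 0.5))))

def fit_auc(X_fit, y_fit, X_eval, y_eval):
    """Fit a logistic model on one sample, return its AUC on another."""
    model = LogisticRegression().fit(X_fit, y_fit)
    return roc_auc_score(y_eval, model.predict_proba(X_eval)[:, 1])

# Apparent performance: develop and evaluate on the same sample
apparent = fit_auc(X, y, X, y)

# Bootstrap optimism correction
B = 200
optimism = []
for _ in range(B):
    idx = rng.integers(0, n, size=n)            # resample with replacement
    boot_apparent = fit_auc(X[idx], y[idx], X[idx], y[idx])
    boot_test = fit_auc(X[idx], y[idx], X, y)   # evaluate on original sample
    optimism.append(boot_apparent - boot_test)

corrected = apparent - np.mean(optimism)
print(f"apparent AUC {apparent:.3f}, optimism-corrected AUC {corrected:.3f}")
```

With EPV this high the estimated optimism is small, matching the abstract's finding that the bootstrap-corrected estimate tracks independent validation closely once EPV reaches about 20.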



Citations
Journal Article · DOI

Prediction models need appropriate internal, internal-external, and external validation

TL;DR: It was recently confirmed that a split-sample approach with 50% held out leads to models with suboptimal performance, that is, models that are unstable and on average perform the same as models developed with half the sample size; the authors therefore strongly advise against random split-sample approaches in small development samples.
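The claim above, that developing on a random 50% split behaves like halving the sample size, can be sketched with simulated data (all sample sizes and coefficients here are hypothetical): develop models on the full sample and on a 50% split, then score both on a large independent sample from the same population.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
beta = np.array([0.8, -0.5, 0.3, 0.0, 0.0])

def simulate(n):
    """Draw n observations from a hypothetical logistic data-generating model."""
    X = rng.normal(size=(n, 5))
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta - 0.5))))
    return X, y

# Large independent validation sample from the same population
X_val, y_val = simulate(50_000)

def out_of_sample_auc(X_dev, y_dev):
    """Develop a model and measure its AUC on the independent sample."""
    model = LogisticRegression().fit(X_dev, y_dev)
    return roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

# Small development sample: full-sample development vs 50% split
X, y = simulate(200)
auc_full = out_of_sample_auc(X, y)              # all 200 observations
auc_half = out_of_sample_auc(X[:100], y[:100])  # 50% withheld for validation

print(f"full-sample AUC {auc_full:.3f}, half-sample AUC {auc_half:.3f}")
```

Averaged over many repetitions, the half-sample models show the lower and more variable out-of-sample performance described above; a single draw, as here, illustrates the setup rather than proving the point.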
Journal Article · DOI

PROBAST : A Tool to Assess Risk of Bias and Applicability of Prediction Model Studies: Explanation and Elaboration

TL;DR: The rationale behind the domains and signaling questions, how to use them, and how to reach domain-level and overall judgments about ROB and applicability of primary studies to a review question are described.
Journal Article · DOI

Minimum sample size for developing a multivariable prediction model: PART II ‐ binary and time‐to‐event outcomes

TL;DR: The minimum values of n and E (and subsequently the minimum number of events per predictor parameter, EPP) should be calculated to meet three criteria: small optimism in predictor effect estimates, defined by a global shrinkage factor of ≥0.9; reduced overfitting conditional on a chosen number of predictor parameters p; and prespecification of the model's anticipated Cox‐Snell R2.
References
Journal Article · DOI

Random Forests

TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
Book

An introduction to the bootstrap

TL;DR: This book presents bootstrap methods for estimation using simple arguments, along with Minitab macros for implementing these methods and examples of their use for estimation purposes.
Book

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

TL;DR: In this paper, the authors describe the important ideas in these areas in a common conceptual framework, and the emphasis is on concepts rather than mathematics, with a liberal use of color graphics.
Journal Article · DOI

Classification and regression trees

TL;DR: This article gives an introduction to the subject of classification and regression trees by reviewing some widely available algorithms and comparing their capabilities, strengths, and weaknesses in two examples.