Journal Article

Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors

Frank E. Harrell and two co-authors
29 Feb 1996, Vol. 15, Iss. 4, pp. 361-387
TL;DR: The article discusses an easily interpretable index of predictive discrimination and methods for assessing the calibration of predicted survival probabilities; such methods are particularly needed for binary, ordinal, and time-to-event outcomes.
Abstract
Multivariable regression models are powerful tools that are used frequently in studies of clinical outcomes. These models can use a mixture of categorical and continuous variables and can handle partially observed (censored) responses. However, uncritical application of modelling techniques can result in models that poorly fit the dataset at hand, or, even more likely, inaccurately predict outcomes on new subjects. One must know how to measure qualities of a model's fit in order to avoid poorly fitted or overfitted models. Measurement of predictive accuracy can be difficult for survival time data in the presence of censoring. We discuss an easily interpretable index of predictive discrimination as well as methods for assessing calibration of predicted survival probabilities. Both types of predictive accuracy should be validated in an unbiased way, using bootstrapping or cross-validation, before predictions are used in a new data series. We discuss some of the hazards of poorly fitted and overfitted regression models and present one modelling strategy that avoids many of the problems discussed. The methods described are applicable to all regression models, but are particularly needed for binary, ordinal, and time-to-event outcomes. Methods are illustrated with a survival analysis in prostate cancer using Cox regression.
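The abstract does not name its discrimination index. Assuming it is the concordance probability (c index) extended to censored data, a minimal pure-Python sketch of the computation follows. A pair of subjects is usable when the subject with the shorter follow-up actually had the event; the index is the fraction of usable pairs in which that subject also received the higher predicted risk, with tied predictions counted as one half. The name `concordance_index` is illustrative, and pairs with tied follow-up times are simply skipped, which is a simplifying assumption rather than a rule taken from the article.

```python
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[int],
                      risk: Sequence[float]) -> float:
    """Concordance probability for right-censored data (illustrative sketch).

    times  -- observed follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    risk   -- predicted risk score (higher means expected to fail sooner)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure in a pair
        for j in range(n):
            # Usable pair: subject i fails strictly before j's follow-up ends.
            # Assumption: pairs with tied times are skipped for simplicity.
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Usage: 5 usable pairs, 3 ranked correctly.
print(concordance_index([1, 2, 3, 4], [1, 1, 0, 1], [4, 1, 2, 3]))  # 0.6
```

For a binary outcome without censoring, the same pair counting reduces to the area under the ROC curve.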

Citations
Journal Article

Prediction of Coronary Heart Disease Using Risk Factor Categories

TL;DR: A simple coronary disease prediction algorithm was developed using categorical variables, which allows physicians to predict multivariate CHD risk in patients without overt CHD.
Journal Article

Predictive habitat distribution models in ecology

TL;DR: A review of predictive habitat distribution modeling is presented, which shows that a wide array of models has been developed to cover aspects as diverse as biogeography, conservation biology, climate change research, and habitat or species management.
Journal Article

General Cardiovascular Risk Profile for Use in Primary Care: The Framingham Heart Study

TL;DR: A sex-specific multivariable risk factor algorithm can be conveniently used to assess general CVD risk and risk of individual CVD events (coronary, cerebrovascular, and peripheral arterial disease and heart failure) and can be used to quantify risk and to guide preventive care.
Journal Article

Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond

TL;DR: Two new measures are introduced, one based on integrated sensitivity and specificity and the other on reclassification tables; they offer information beyond the AUC and are proposed as complements to the AUC when assessing the performance of newer biomarkers.
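The TL;DR gives no formulas. As a sketch, assuming the commonly used definitions of the integrated discrimination improvement (IDI: change in mean predicted risk among events minus the change among non-events) and of the categorical net reclassification improvement (NRI: net proportion of events moved to a higher risk category plus net proportion of non-events moved to a lower one), both measures can be computed from old-model and new-model predicted probabilities. The function names and the example cutoff are hypothetical.

```python
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def idi(p_old: Sequence[float], p_new: Sequence[float], outcome: Sequence[int]) -> float:
    """Integrated discrimination improvement (assumed standard definition)."""
    ev_old = [p for p, y in zip(p_old, outcome) if y == 1]
    ev_new = [p for p, y in zip(p_new, outcome) if y == 1]
    ne_old = [p for p, y in zip(p_old, outcome) if y == 0]
    ne_new = [p for p, y in zip(p_new, outcome) if y == 0]
    # Gain in mean risk for events minus gain in mean risk for non-events.
    return (_mean(ev_new) - _mean(ev_old)) - (_mean(ne_new) - _mean(ne_old))


def risk_category(p: float, cutoffs: Sequence[float]) -> int:
    """Index of the risk category containing p, given ascending cutoffs."""
    return sum(1 for c in cutoffs if p >= c)


def nri(p_old: Sequence[float], p_new: Sequence[float], outcome: Sequence[int],
        cutoffs: Sequence[float]) -> float:
    """Categorical net reclassification improvement (assumed standard definition)."""
    up_ev = down_ev = up_ne = down_ne = n_ev = n_ne = 0
    for po, pn, y in zip(p_old, p_new, outcome):
        move = risk_category(pn, cutoffs) - risk_category(po, cutoffs)
        if y == 1:
            n_ev += 1
            up_ev += int(move > 0)
            down_ev += int(move < 0)
        else:
            n_ne += 1
            up_ne += int(move > 0)
            down_ne += int(move < 0)
    # Events should move up; non-events should move down.
    return (up_ev - down_ev) / n_ev + (down_ne - up_ne) / n_ne
```

Both functions take the same three aligned sequences, so the two measures can be reported side by side for one pair of models.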
Journal Article

Validation of Clinical Classification Schemes for Predicting Stroke: Results From the National Registry of Atrial Fibrillation

TL;DR: Two existing classification schemes, and especially a new stroke risk index, CHADS, can quantify the risk of stroke for patients who have AF and may aid in the selection of antithrombotic therapy.
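The TL;DR names the index only as CHADS. Assuming the standard CHADS2 point assignment (one point each for congestive heart failure, hypertension, age 75 or older, and diabetes, and two points for prior stroke or transient ischemic attack), the score is a simple sum of points, which is what makes it easy to apply at the bedside. The `Patient` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record holding the five CHADS2 components.
    congestive_heart_failure: bool
    hypertension: bool
    age: int
    diabetes: bool
    prior_stroke_or_tia: bool


def chads2(p: Patient) -> int:
    """Sum of points under the assumed standard CHADS2 weights (range 0 to 6)."""
    score = 0
    score += 1 if p.congestive_heart_failure else 0  # C
    score += 1 if p.hypertension else 0              # H
    score += 1 if p.age >= 75 else 0                 # A
    score += 1 if p.diabetes else 0                  # D
    score += 2 if p.prior_stroke_or_tia else 0       # S2: counts double
    return score


print(chads2(Patient(False, True, 78, False, True)))  # 4
```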
References
Book

An introduction to the bootstrap

TL;DR: This book presents bootstrap methods for estimation using simple arguments, with Minitab macros for implementing them and examples of their use.
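As an illustration of the basic nonparametric bootstrap (a generic sketch, not code from the book), the function below estimates the standard error of a statistic by resampling the data with replacement and taking the standard deviation of the replicated statistic. The function name, the default of 1000 replicates, and the fixed seed are assumptions chosen for reproducibility. The same resampling idea underlies the bootstrap validation recommended in the abstract, where a model is refitted on each resample.

```python
import random
import statistics
from typing import Callable, Sequence


def bootstrap_se(data: Sequence[float],
                 stat: Callable[[Sequence[float]], float],
                 n_boot: int = 1000, seed: int = 0) -> float:
    """Bootstrap estimate of the standard error of `stat` (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Draw n observations with replacement from the original sample.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(stat(sample))
    # The spread of the replicates estimates the sampling variability of stat.
    return statistics.stdev(replicates)


se = bootstrap_se([float(v) for v in range(1, 21)], statistics.mean)
print(round(se, 2))
```

For the mean of the values 1 to 20, the result should land close to the population standard deviation divided by the square root of n, roughly 1.29.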
Book

Applied Logistic Regression

TL;DR: Hosmer and Lemeshow provide an accessible introduction to the logistic regression model, incorporating advances of the last decade and covering a variety of software packages for the analysis of data sets.
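A minimal sketch of maximum-likelihood fitting for a logistic model with a single predictor, using Newton-Raphson iterations. This is a standard fitting approach, not code from the book; `fit_logistic` is a hypothetical name, and the sketch assumes the data are not perfectly separated, since otherwise the estimates diverge.

```python
import math
from typing import Sequence, Tuple


def fit_logistic(x: Sequence[float], y: Sequence[int],
                 n_iter: int = 25) -> Tuple[float, float]:
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson (illustrative sketch)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0            # score vector U
        h00 = h01 = h11 = 0.0    # information matrix I (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve I * delta = U for the 2x2 system.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
print(fit_logistic(x, y))
```

At convergence the score equations hold: the residuals y - p sum to zero, and so do the residuals weighted by x.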
Journal Article

The meaning and use of the area under a receiver operating characteristic (ROC) curve.

James A. Hanley and one co-author
01 Apr 1982
TL;DR: The paper presents a representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, and shows that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject.
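The pairwise interpretation translates directly into code. The sketch below, with a hypothetical function name, counts the fraction of (diseased, non-diseased) pairs in which the diseased subject has the higher score, scoring ties as one half; this equals the Mann-Whitney U statistic divided by the number of pairs.

```python
from typing import Sequence


def auc_probability(scores_diseased: Sequence[float],
                    scores_healthy: Sequence[float]) -> float:
    """Fraction of (diseased, non-diseased) pairs ranked correctly; ties count 1/2."""
    total = 0.0
    for d in scores_diseased:
        for h in scores_healthy:
            if d > h:
                total += 1.0   # diseased subject rated with greater suspicion
            elif d == h:
                total += 0.5   # tie: half credit
    return total / (len(scores_diseased) * len(scores_healthy))


print(auc_probability([0.8, 0.6, 0.4], [0.5, 0.3]))  # 5 of 6 pairs correct
```

The double loop is O(n1 * n0); for large samples, a rank-based computation of the same statistic is faster.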
Book

Principal Component Analysis

TL;DR: This book covers principal component analysis (PCA), including graphical representation of data, PCA for time series and other non-independent data, and generalizations and adaptations of the method.
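To make the idea concrete, the sketch below finds the first principal component, the unit-length direction along which centered data vary most, as the leading eigenvector of the sample covariance matrix. It uses plain power iteration and assumes that the largest eigenvalue is unique and that the starting vector is not orthogonal to its eigenvector; in practice a library SVD routine would normally be used instead. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def first_principal_component(rows: Sequence[Sequence[float]],
                              n_iter: int = 200) -> List[float]:
    """Leading eigenvector of the sample covariance matrix via power iteration."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p  # assumed not orthogonal to the leading eigenvector
    for _ in range(n_iter):
        # Multiply by the covariance matrix and renormalize to unit length.
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Points on the line y = 2x: the component should point along (1, 2)/sqrt(5).
print(first_principal_component([[t, 2 * t] for t in range(1, 6)]))
```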
Journal Article

Robust Locally Weighted Regression and Smoothing Scatterplots

TL;DR: Robust locally weighted regression is a method for smoothing a scatterplot in which the fitted value at x_k is the value of a polynomial fitted to the data by weighted least squares, where the weight for (x_i, y_i) is large if x_i is close to x_k and small if it is not.
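The sketch below implements only the non-robust first pass of this smoother, using local linear (degree 1) fits and tricube weights; the robustness iterations, which downweight points with large residuals and refit, are omitted. The `frac` parameter (fraction of points in each neighbourhood) and the function names are assumptions, and the x values are assumed to be mostly distinct.

```python
from typing import List, Sequence


def tricube(u: float) -> float:
    """Tricube weight: (1 - |u|^3)^3 for |u| < 1, else 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess_fit(x: Sequence[float], y: Sequence[float],
               frac: float = 0.5) -> List[float]:
    """Local linear fit at each x_k with tricube weights (no robustness step)."""
    n = len(x)
    r = max(3, int(round(frac * n)))  # neighbours per local fit (self included)
    fitted = []
    for xk in x:
        # Bandwidth: distance from x_k to its r-th nearest point.
        h = sorted(abs(xi - xk) for xi in x)[r - 1]
        if h == 0:
            raise ValueError("too many tied x values for this frac")
        w = [tricube((xi - xk) / h) for xi in x]
        # Weighted least squares for y = a + b * x around x_k.
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        b = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + b * (xk - mx))
    return fitted
```

Because each local fit is linear, data that lie exactly on a straight line are reproduced unchanged, which is a convenient sanity check.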