Journal ArticleDOI

Why Stepdown Procedures in Variable Selection

Nathan Mantel
01 Aug 1970, Vol. 12, Iss. 3, pp. 621-625
TL;DR: The author sets out the advantages of the stepdown variable selection scheme, in which independent variables are successively discarded one at a time from the original full set, and notes that these advantages, while known to workers in the field, are not appreciated by the statistical community in general.
Abstract
Recent reviews have dealt with the subject of which variables to select and which to discard in multiple regression problems. Lindley (1968) emphasized that the method to be employed in any analysis should be related to the use intended for the finally fitted regression. In the report by Beale et al. (1967), the emphasis is on selecting the best subset for any specified number of retained independent variables.

Here we will be concerned with pointing out the advantages of the variable selection scheme in which independent variables are successively discarded one at a time from the original full set. While these advantages are not unknown to workers in this field, they are not appreciated by the statistical community in general. For the purposes of this demonstration it is assumed that we are in the nonsingular case, so that the number of observations exceeds the number of regressor variables.

Let us begin by considering economy of effort. Suppose that we were using a step-up regression procedure, ignoring for the while its theoretical deficiencies (to be discussed later). We should then first fit k simple regressions, one for each of the k regressor variables considered, selecting the single most significant individual regressor variable. Having made this selection, we would proceed with k - 1 additional fits to determine which of the remaining variables, in conjunction with the first selected, yielded the greatest reduction in residual variation. This process is continued so as to provide a successive selection and ordering of variables. We may even require the ordering of all k variables, leaving for later decision what critical juncture is to be employed in determining which of the k variables to retain and which to reject. If we do so, we shall have made a total of k(k + 1)/2 fits, albeit they may have differed greatly in their degree of complexity. A complete stepdown regression procedure, however, requires but k fits, as will now be indicated.
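The step-up count of k(k + 1)/2 fits can be checked by simply counting regressions in a greedy forward selection. The sketch below is illustrative only: the function names (`rss`, `step_up_order`), the simulated data, the inclusion of an intercept, and the use of residual sum of squares as the selection criterion are all assumptions made for the example, not details taken from the article.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of y regressed on an intercept plus X[:, cols]."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def step_up_order(X, y):
    """Order all k regressors by greedy step-up selection and count the fits made."""
    k = X.shape[1]
    selected, remaining, n_fits = [], list(range(k)), 0
    while remaining:
        scores = {}
        for j in remaining:
            # One regression fit per candidate variable at this stage.
            scores[j] = rss(X, y, selected + [j])
            n_fits += 1
        best = min(scores, key=scores.get)  # greatest reduction in residual variation
        selected.append(best)
        remaining.remove(best)
    return selected, n_fits

# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n, k = 50, 6
X = rng.normal(size=(n, k))
y = X @ rng.normal(size=k) + rng.normal(size=n)

order, n_fits = step_up_order(X, y)
print("fits made:", n_fits, "k(k+1)/2 =", k * (k + 1) // 2)
```

The stages require k, k - 1, ..., 1 fits respectively, which sum to k(k + 1)/2.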
Suppose we have done a multiple regression on all k variables and wish to consider the k possible multiple regressions on all sets of k - 1 variables, that is, where one variable has been deleted. The results for these k possible multiple regressions are implicit in the initial k-variable regression, provided we have secured the inverse matrix, or at least its diagonal, necessary for testing the significance of the fitted partial regression coefficients. The case
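One way to see why the k deleted-variable regressions are implicit in the full fit is a standard least-squares identity: deleting variable j raises the residual sum of squares by b_j^2 / c_jj, where b_j is the fitted coefficient and c_jj is the j-th diagonal element of the inverse of X'X. The sketch below checks this numerically. It assumes simulated data and, for brevity, a model without an intercept; the variable names are hypothetical, and the identity is offered as a plausible reading of the passage rather than as the article's own derivation.

```python
import numpy as np

# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(1)
n, k = 40, 5
X = rng.normal(size=(n, k))
y = X @ rng.normal(size=k) + rng.normal(size=n)

def rss(A, y):
    """Residual sum of squares from a least-squares fit of y on the columns of A."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

# A single fit on all k variables, keeping the inverse matrix.
C = np.linalg.inv(X.T @ X)
b = C @ (X.T @ y)
resid_full = y - X @ b
rss_full = float(resid_full @ resid_full)

# Residual sum of squares implied for each (k - 1)-variable regression,
# using only the coefficients and the diagonal of the inverse.
implied = rss_full + b**2 / np.diag(C)

# The same quantities obtained the expensive way, by k separate refits.
direct = np.array([rss(np.delete(X, j, axis=1), y) for j in range(k)])

print(np.allclose(implied, direct))
drop = int(np.argmin(implied))  # the variable whose deletion costs least
```

Under these assumptions, each stage of a stepdown procedure needs only one fit to identify the next variable to discard, which is consistent with the total of k fits quoted in the abstract.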


Citations
Journal ArticleDOI

Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors

TL;DR: An easily interpretable index of predictive discrimination and methods for assessing calibration of predicted survival probabilities are discussed; these are particularly needed for binary, ordinal, and time-to-event outcomes.
Book ChapterDOI

Prognostic/Clinical Prediction Models: Multivariable Prognostic Models: Issues in Developing Models, Evaluating Assumptions and Adequacy, and Measuring and Reducing Errors

TL;DR: An easily interpretable index of predictive discrimination as well as methods for assessing calibration of predicted survival probabilities are discussed, which are applicable to all regression models, but are particularly needed for binary, ordinal, and time-to-event outcomes.

Tutorial in biostatistics multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors

TL;DR: An easily interpretable index of predictive discrimination as well as methods for assessing calibration of predicted survival probabilities are discussed, which are applicable to all regression models, but are particularly needed for binary, ordinal, and time-to-event outcomes.
BookDOI

Regression Modeling Strategies

TL;DR: Regression models are frequently used to develop diagnostic, prognostic, and health resource utilization models in clinical, health services, outcomes, pharmacoeconomic, and epidemiologic research, and in a multitude of non-health-related areas.
Journal ArticleDOI

Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration.

TL;DR: In virtually all medical domains, diagnostic and prognostic multivariable prediction models are being developed, validated, updated, and implemented with the aim to assist doctors and individuals in estimating probabilities and potentially influence their decision making.
References
Journal ArticleDOI

The discarding of variables in multivariate analysis.

TL;DR: Cut-off rules are developed that enable us to find the best solution to both problems by partial enumeration by maximizing the multiple correlation between the selected variables and the dependent variable.
Journal ArticleDOI

The Choice of Variables in Multiple Regression

TL;DR: This paper is concerned with the analysis of data from a multiple regression of a single variable, y, on a set of independent variables, x1, x2, ..., xr.
Journal ArticleDOI

The Best Sub-Set in Multiple Regression Analysis

TL;DR: A procedure is given for comparing all sub-sets in multiple regression analysis and thereby obtaining the best sub-set of a given size in the sense of the minimum residual sum of squares.