An Introduction to Classification and Regression Tree (CART) Analysis

Open Access



TLDR

A common goal of many clinical research studies is the development of a reliable clinical decision rule, which can be used to classify new patients into clinically important categories. Traditional statistical methods are cumbersome to use, or of limited utility, for these classification problems, for a number of reasons.

Abstract:

Introduction

A common goal of many clinical research studies is the development of a reliable clinical decision rule, which can be used to classify new patients into clinically important categories. Examples of such clinical decision rules include triage rules, whether used in the out-of-hospital setting or in the emergency department, and rules used to classify patients into various risk categories so that appropriate decisions can be made regarding treatment or hospitalization. Traditional statistical methods are cumbersome to use, or of limited utility, in addressing these types of classification problems.

There are a number of reasons for these difficulties. First, there are generally many possible "predictor" variables, which makes the task of variable selection difficult. Traditional statistical methods are poorly suited for this sort of multiple comparison. Second, the predictor variables are rarely nicely distributed. Many clinical variables are not normally distributed, and different groups of patients may have markedly different degrees of variation or variance. Third, complex interactions or patterns may exist in the data. For example, the value of one variable (e.g., age) may substantially affect the importance of another variable (e.g., weight). These types of interactions are generally difficult to model, and virtually impossible to model when the number of interactions and variables becomes substantial. Fourth, the results of traditional methods may be difficult to use. For example, a multivariate logistic regression model yields a probability of disease, which can be calculated using the regression coefficients and the characteristics of the patient, yet such models are rarely utilized in clinical practice. Clinicians generally do not think in terms of probability but rather in terms of categories, such as "low risk" versus "high risk."

Regardless of the statistical methodology being used, the creation of a clinical decision rule requires a relatively large dataset. For each patient in the dataset, one variable (the dependent variable) records whether or not that patient had the condition which we hope to predict accurately in future patients. Examples might include significant injury after trauma, myocardial infarction, or subarachnoid hemorrhage in the setting of headache. In addition, other variables record the values of patient characteristics which we believe might help us to predict the value of the dependent variable. For example, if one hopes to predict the presence of subarachnoid hemorrhage, a possible predictor variable might be whether or not the patient's headache was sudden in onset; another possible …

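To make the fourth difficulty concrete, the following minimal sketch shows how a logistic regression model turns regression coefficients and patient characteristics into a probability, and how that probability might then be collapsed into the "low risk" and "high risk" categories clinicians tend to use. The coefficients, the variable names (age_years, sudden_onset), and the 0.10 threshold are all hypothetical values invented for illustration; none of them come from the paper or from any fitted model.

```python
import math

# Hypothetical coefficients, invented for illustration only (not from any study).
INTERCEPT = -4.0
COEFFICIENTS = {"age_years": 0.03, "sudden_onset": 2.0}


def predicted_probability(patient):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = INTERCEPT + sum(
        COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-linear))


def risk_category(probability, threshold=0.10):
    """Collapse a probability into one of two categories.

    The threshold is an assumption; in practice it would be chosen
    on clinical grounds.
    """
    return "high risk" if probability >= threshold else "low risk"


# A hypothetical patient: 50 years old, headache sudden in onset (coded 1).
patient = {"age_years": 50, "sudden_onset": 1}
p = predicted_probability(patient)
print(f"probability = {p:.3f}, category = {risk_category(p)}")
```

The second function is the step that a regression model leaves to the user: the model itself stops at a probability, and someone still has to decide where the boundary between categories lies.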

