
Comparison between CART Regression Trees and Linear Regression (Comparación entre árboles de regresión CART y regresión lineal)

Juan Sepúlveda, +1 more
- Vol. 6, Iss: 2, pp 175-195
TLDR
The predictive performance of linear regression and CART is compared through simulation. When the correct linear regression model is fitted to the data, the prediction error of linear regression is always lower than that of CART.
Abstract
Linear regression is the most widely used method in statistics for predicting values of continuous variables because it is easy to interpret, but in many situations the model's assumptions are not met, and some users tend to force them, which leads to erroneous conclusions. CART regression trees are an alternative that requires no assumptions about the data to be analyzed and whose results are also easy to interpret. This work compares the predictive performance of linear regression and CART through simulation. In general, it was found that when the correct linear regression model is fitted to the data, the prediction error of linear regression is always lower than that of CART. It was also found that when a linear regression model is fitted incorrectly to the data, the prediction error of CART is lower than that of linear regression only when a sufficiently large amount of data is available.


Comparison between CART Regression Trees and Linear Regression
Juan Felipe Díaz Sepúlveda
Universidad Nacional de Colombia
Facultad de Ciencias, Escuela de Estadística
Medellín, Colombia
2012


Thesis submitted in partial fulfillment of the requirements for the degree of:
Magister en Ciencias - Estadística (Master of Science in Statistics)
Advisor:
Ph.D. Juan Carlos Correa Morales


Summary
Linear regression is the most widely used method in statistics for predicting values of continuous variables because it is easy to interpret, but in many situations the assumptions required to apply the model are not met, and some users tend to force them, which leads to erroneous conclusions. CART regression trees are a regression alternative that requires no assumptions about the data to be analyzed and whose results are easy to interpret. This work compares linear regression and CART at the predictive level through simulation. In general, it was found that when the correct linear regression model is fitted to the data, the prediction error of linear regression is always lower than that of CART. It was also found that when a linear regression model is fitted incorrectly to the data, the prediction error of CART is lower than that of linear regression only when a sufficiently large amount of data is available.
Keywords: Simulation, Prediction error, Linear regression, CART classification and regression trees.
Abstract
Linear regression is the statistical method most often used to predict values of continuous variables because it is easy to interpret, but in many situations the model's assumptions are not met, and some users tend to force them, which leads to erroneous conclusions. CART regression trees are an alternative regression method that requires no assumptions about the data to be analyzed and whose results are easy to interpret. In this paper we compare the predictive performance of CART and linear regression through simulation. In general, it was found that when the correct linear regression model is fitted to the data, the linear regression prediction error is always smaller than the CART prediction error. We also found that when a linear regression model is fitted incorrectly to the data, the CART prediction error is smaller than the linear regression prediction error only when a sufficiently large amount of data is available.
Keywords: Simulation, Prediction error, Linear Regression, CART: Classification and Regression Trees.
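
As a rough illustration of the kind of simulation comparison the abstract describes, the following Python sketch fits a simple linear regression and a small CART-style regression tree to simulated data and compares their mean squared prediction error on an independent test sample. Everything specific here is an assumption made for illustration and is not taken from the thesis: the data-generating functions, the sample sizes, the noise level, the tree depth and minimum leaf size, and the helper names `fit_line`, `fit_tree` and `simulate`. The tree is a simplified, unpruned single-predictor variant, not the author's implementation.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (one predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return lambda x: b0 + b1 * x


def fit_tree(xs, ys, min_leaf=10, max_depth=4):
    """CART-style regression tree on one predictor: greedy binary splits
    that minimise the within-node sum of squared errors (no pruning)."""
    pairs = sorted(zip(xs, ys))

    def sse(sum_y, sum_y2, n):
        return sum_y2 - sum_y * sum_y / n

    def build(node, depth):
        n = len(node)
        total = sum(y for _, y in node)
        total2 = sum(y * y for _, y in node)
        if depth >= max_depth or n < 2 * min_leaf:
            return total / n  # leaf: predict the node mean
        best = None
        left, left2 = 0.0, 0.0
        for i in range(1, n):  # candidate split: node[:i] vs node[i:]
            y = node[i - 1][1]
            left += y
            left2 += y * y
            if i < min_leaf or n - i < min_leaf or node[i - 1][0] == node[i][0]:
                continue
            cost = sse(left, left2, i) + sse(total - left, total2 - left2, n - i)
            if best is None or cost < best[0]:
                best = (cost, i)
        if best is None:
            return total / n
        i = best[1]
        threshold = (node[i - 1][0] + node[i][0]) / 2
        return (threshold, build(node[:i], depth + 1), build(node[i:], depth + 1))

    tree = build(pairs, 0)

    def predict(x):
        node = tree
        while isinstance(node, tuple):  # internal nodes are tuples, leaves are floats
            threshold, low, high = node
            node = low if x <= threshold else high
        return node

    return predict


def simulate(n_train, truth, seed=0, n_test=2000, noise_sd=1.0):
    """Return test-sample mean squared prediction error for both methods."""
    rng = random.Random(seed)

    def draw(n):
        xs = [rng.uniform(0, 10) for _ in range(n)]
        ys = [truth(x) + rng.gauss(0, noise_sd) for x in xs]
        return xs, ys

    x_tr, y_tr = draw(n_train)
    x_te, y_te = draw(n_test)
    models = {"linear": fit_line(x_tr, y_tr), "cart": fit_tree(x_tr, y_tr)}
    return {
        name: sum((y - f(x)) ** 2 for x, y in zip(x_te, y_te)) / n_test
        for name, f in models.items()
    }


if __name__ == "__main__":
    # Assumed scenarios: a truly linear mean, and a quadratic mean to which
    # a straight line is (incorrectly) fitted.
    scenarios = {
        "linear truth": lambda x: 2 + 3 * x,
        "quadratic truth": lambda x: (x - 5) ** 2,
    }
    for label, truth in scenarios.items():
        for n in (30, 200, 1000):
            print(label, n, simulate(n, truth))
```

Varying `n_train` and the `truth` function gives a quick feel for how the two prediction errors move relative to each other. A single seed per setting is used here for brevity; a fuller study would average over many replications.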
