Comparison between CART Regression Trees and Linear Regression

Juan Sepúlveda, +1 more
- Vol. 6, Iss: 2, pp 175-195
TLDR
The predictive performance of linear regression and CART is compared through simulation. When the correct linear regression model is fitted to the data, the prediction error of linear regression is always lower than that of CART.
Abstract
Linear regression is the most widely used method in statistics for predicting values of continuous variables because it is easy to interpret, but in many situations the model's assumptions are not met, and some users force the model onto the data anyway, which leads to erroneous conclusions. CART regression trees are an alternative that requires no assumptions about the data being analyzed and whose results are also easy to interpret. This work compares the predictive performance of linear regression and CART through simulation. In general, when the correct linear regression model is fitted to the data, the prediction error of linear regression is always lower than that of CART. When a linear regression model is fitted erroneously to the data, the prediction error of CART is lower than that of linear regression only when a sufficiently large amount of data is available.

Comparison between CART Regression Trees and Linear Regression
Juan Felipe Díaz Sepúlveda
Universidad Nacional de Colombia
Facultad de Ciencias, Escuela de Estadística
Medellín, Colombia
2012


Thesis submitted in partial fulfillment of the requirements for the degree of:
Magister en Ciencias - Estadística (Master of Science in Statistics)
Advisor:
Ph.D. Juan Carlos Correa Morales
Abstract
Linear regression is the most widely used statistical method for predicting values of continuous
variables because it is easy to interpret, but in many situations the assumptions required to
apply the model are not met, and some users tend to force them, which leads to erroneous
conclusions. CART regression trees are a regression alternative that requires no assumptions
about the data to be analyzed and whose results are easy to interpret. In this work, linear
regression and CART are compared at the predictive level through simulation. In general, it
was found that when the correct linear regression model is fitted to the data, the prediction
error of linear regression is always lower than that of CART. It was also found that when a
linear regression model is fitted erroneously to the data, the prediction error of CART is lower
than that of linear regression only when the amount of data is sufficiently large.
Keywords: Simulation, Prediction error, Linear Regression, CART: Classification and Regression
Trees.
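
The kind of comparison described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the design used in the thesis: the mean functions, sample sizes, noise level, seed, and the helper names `fit_line`, `fit_tree`, and `simulate` are all hypothetical, and the tree is a simplified one-predictor regression tree that stops on a minimum leaf size and is not pruned, unlike full CART.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns a predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_tree(xs, ys, min_leaf=10):
    """Greedy regression tree on one predictor with squared-error splits.

    Simplification (assumption): growth stops only when a child would have
    fewer than min_leaf points; there is no cost-complexity pruning.
    """
    pairs = sorted(zip(xs, ys))

    def build(seg):
        n = len(seg)
        total = sum(y for _, y in seg)
        best_gain, best_k = 0.0, None
        left = 0.0
        for k in range(1, n):  # candidate split: seg[:k] | seg[k:]
            left += seg[k - 1][1]
            if k < min_leaf or n - k < min_leaf or seg[k - 1][0] == seg[k][0]:
                continue
            # Reduction in the sum of squared errors achieved by this split.
            gain = left ** 2 / k + (total - left) ** 2 / (n - k) - total ** 2 / n
            if gain > best_gain:
                best_gain, best_k = gain, k
        if best_k is None:
            return total / n  # leaf: predict the mean response
        threshold = (seg[best_k - 1][0] + seg[best_k][0]) / 2
        return threshold, build(seg[:best_k]), build(seg[best_k:])

    root = build(pairs)

    def predict(x):
        node = root
        while isinstance(node, tuple):
            threshold, left_child, right_child = node
            node = left_child if x <= threshold else right_child
        return node

    return predict


def simulate(mean_fn, n_train, n_test=2000, sigma=1.0, seed=0):
    """Return (linear MSE, tree MSE) on an independent test sample."""
    rng = random.Random(seed)

    def draw(n):
        xs = [rng.uniform(0, 10) for _ in range(n)]
        ys = [mean_fn(x) + rng.gauss(0, sigma) for x in xs]
        return xs, ys

    x_train, y_train = draw(n_train)
    x_test, y_test = draw(n_test)

    def mse(model):
        return sum((model(x) - y) ** 2 for x, y in zip(x_test, y_test)) / n_test

    return mse(fit_line(x_train, y_train)), mse(fit_tree(x_train, y_train))


# Scenario 1: the fitted straight line is the correct model.
lin_ok, tree_ok = simulate(lambda x: 1 + 2 * x, n_train=200)
# Scenario 2: a straight line is fitted to a non-linear mean function.
lin_bad, tree_bad = simulate(lambda x: 5 * math.sin(x), n_train=500)

print(f"correct model:      linear={lin_ok:.2f}  tree={tree_ok:.2f}")
print(f"misspecified model: linear={lin_bad:.2f}  tree={tree_bad:.2f}")
```

The sketch only contrasts the two situations named in the abstract at one sample size each. Studying how the comparison depends on the amount of data would require repeating `simulate` over a range of `n_train` values and over many seeds, and the outcome will depend on how strongly the linear model is misspecified.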
