Open Access Proceedings Article

Robust decision trees: removing outliers from databases

George H. John
pp. 174-179
TL;DR: This paper examines C4.5, a decision tree algorithm that is already quite robust (few algorithms have been shown to consistently achieve higher accuracy), and extends its pruning method to fully remove the effect of outliers, yielding improvements on many databases.
Abstract
Finding and removing outliers is an important problem in data mining. Errors in large databases can be extremely common, so an important property of a data mining algorithm is robustness with respect to errors in the database. Most sophisticated methods in machine learning address this problem to some extent, but not fully, and can be improved by addressing the problem more directly. In this paper we examine C4.5, a decision tree algorithm that is already quite robust; few algorithms have been shown to consistently achieve higher accuracy. C4.5 incorporates a pruning scheme that partially addresses the outlier removal problem. In our ROBUST-C4.5 algorithm we extend the pruning method to fully remove the effect of outliers, and this results in improvement on many databases.
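The abstract's filter-and-retrain idea can be sketched as a short loop: build a pruned tree, discard the training instances the pruned tree misclassifies (treating them as likely outliers), and refit until the remaining data is classified consistently. The sketch below uses scikit-learn's CART implementation with cost-complexity pruning (`ccp_alpha`) as a stand-in for C4.5's pessimistic pruning; the function name, parameters, and stopping rule are illustrative assumptions, not taken from the paper.

```python
# Sketch of a ROBUST-C4.5-style outlier filter: prune, drop
# misclassified training instances, and retrain until stable.
# scikit-learn's CART tree stands in for C4.5 here (assumption);
# ccp_alpha plays the role of C4.5's pruning confidence level.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def robust_tree_filter(X, y, ccp_alpha=0.1, max_iter=10):
    """Return (tree, keep): a tree fit on the retained instances,
    and a boolean mask marking which instances were kept."""
    X, y = np.asarray(X), np.asarray(y)
    keep = np.ones(len(y), dtype=bool)
    tree = None
    for _ in range(max_iter):
        # Fit a cost-complexity-pruned tree on the retained data.
        tree = DecisionTreeClassifier(ccp_alpha=ccp_alpha, random_state=0)
        tree.fit(X[keep], y[keep])
        correct = tree.predict(X[keep]) == y[keep]
        if correct.all():
            break  # pruned tree is consistent with remaining data
        # Drop the instances the pruned tree misclassifies.
        idx = np.flatnonzero(keep)
        keep[idx[~correct]] = False
    return tree, keep

# Toy usage: 16 points in two well-separated clusters, with one
# label flipped in the middle of the first cluster. The pruned
# tree refuses to isolate the flipped point, so it is filtered out.
X = [[i] for i in list(range(8)) + list(range(10, 18))]
y = [0] * 8 + [1] * 8
y[2] = 1  # inject a single label error
tree, keep = robust_tree_filter(X, y, ccp_alpha=0.1)
```

With `ccp_alpha=0.1`, the subtree that would carve out the single flipped point is pruned away, that point is misclassified and removed, and the second fit classifies the remaining 15 instances perfectly. The key design choice, mirroring the abstract, is that pruning alone leaves the outlier in the training set; the explicit removal step is what "fully removes" its effect on later refits.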



Citations
Book

Data Mining: Practical Machine Learning Tools and Techniques

TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Journal ArticleDOI

A Survey of Outlier Detection Methodologies

TL;DR: A survey of contemporary outlier detection techniques that identifies their respective motivations and distinguishes their advantages and disadvantages in a comparative review.
Journal ArticleDOI

Classification in the Presence of Label Noise: A Survey

TL;DR: In this survey, label noise consists of mislabeled instances; no additional information, such as confidences on labels, is assumed to be available.
Journal ArticleDOI

Identifying mislabeled training data

TL;DR: This paper uses a set of learning algorithms to create classifiers that serve as noise filters for the training data and suggests that for situations in which there is a paucity of data, consensus filters are preferred, whereas majority vote filters are preferable for situations with an abundance of data.
Journal ArticleDOI

Class noise vs. attribute noise: a quantitative study of their impacts

TL;DR: A systematic evaluation of the effect of noise in machine learning that separates noise into two categories, class noise and attribute noise, and investigates the relationship between attribute noise and classification accuracy, the impact of noise at different attributes, and possible solutions for handling attribute noise.
References
Book

C4.5: Programs for Machine Learning

TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, starting from simple core learning methods and showing how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Book

Classification and regression trees

Leo Breiman
TL;DR: This monograph focuses on the methodology used to construct tree-structured rules, covering the use of trees as a data analysis method and, in a more mathematical framework, proving some of their fundamental properties.
Journal ArticleDOI

Generalized Additive Models.

Book

Robust Regression and Outlier Detection

TL;DR: This book presents the statistical treatment of outliers and robust estimation, beginning with the one-dimensional location problem and extending to robust regression.