Open Access Proceedings Article

Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning

Mark Hall
pp. 359-366
TLDR
This article describes a fast, correlation-based filter algorithm that can be applied to continuous and discrete problems; it often outperforms the ReliefF attribute estimator when used as a preprocessing step for naive Bayes, instance-based learning, decision trees, locally weighted regression, and model trees.
Abstract
Algorithms for feature selection fall into two broad categories: wrappers that use the learning algorithm itself to evaluate the usefulness of features and filters that evaluate features according to heuristics based on general characteristics of the data. For application to large databases, filters have proven to be more practical than wrappers because they are much faster. However, most existing filter algorithms only work with discrete classification problems. This paper describes a fast, correlation-based filter algorithm that can be applied to continuous and discrete problems. The algorithm often outperforms the well-known ReliefF attribute estimator when used as a preprocessing step for naive Bayes, instance-based learning, decision trees, locally weighted regression, and model trees. It performs more feature selection than ReliefF does—reducing the data dimensionality by fifty percent in most cases. Also, decision and model trees built from the preprocessed data are often significantly smaller.
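To make the filter concrete, below is a minimal sketch of correlation-based subset selection in the spirit of the paper. It assumes the feature-class correlations (corr_cf) and pairwise feature-feature correlations (corr_ff) have already been computed; those names, and the functions cfs_merit and forward_select, are illustrative rather than taken from the paper, and the paper's actual correlation measures and search strategy may differ.

import numpy as np

def cfs_merit(corr_cf, corr_ff, subset):
    # Merit of a subset: reward high average feature-class correlation,
    # penalise high average feature-feature correlation (redundancy).
    k = len(subset)
    r_cf = np.mean([corr_cf[i] for i in subset])
    r_ff = 0.0
    if k > 1:
        r_ff = np.mean([corr_ff[i][j] for i in subset for j in subset if i < j])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def forward_select(corr_cf, corr_ff, n_features):
    # Greedy forward search: repeatedly add the feature that most
    # improves the merit; stop when no addition helps.
    selected, best = [], float("-inf")
    while len(selected) < n_features:
        merit, f = max((cfs_merit(corr_cf, corr_ff, selected + [g]), g)
                       for g in range(n_features) if g not in selected)
        if merit <= best:
            break
        selected.append(f)
        best = merit
    return selected

For numeric data, absolute Pearson correlation is one plausible choice for both correlation inputs; for discrete problems an entropy-based measure such as symmetrical uncertainty is a typical substitute.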


Citations
Book

Data Mining: Practical Machine Learning Tools and Techniques

TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Journal Article

Toward integrating feature selection algorithms for classification and clustering

TL;DR: With the categorizing framework, the efforts toward building an integrated system for intelligent feature selection are continued, and an illustrative example is presented to show how existing feature selection algorithms can be integrated into a meta-algorithm that can take advantage of individual algorithms.
Proceedings Article

Feature selection for high-dimensional data: a fast correlation-based filter solution

TL;DR: A novel concept, predominant correlation, is introduced, and a fast filter method is proposed which can identify relevant features as well as redundancy among relevant features without pairwise correlation analysis.
Journal Article

Efficient Feature Selection via Analysis of Relevance and Redundancy

TL;DR: It is shown that feature relevance alone is insufficient for efficient feature selection of high-dimensional data, and a new framework is introduced that decouples relevance analysis and redundancy analysis.
Journal Article

Unsupervised feature selection using feature similarity

TL;DR: An unsupervised feature selection algorithm, suitable for data sets large in both dimension and size, that measures similarity between features to remove redundancy among them; it needs no search and is fast.
References
Journal Article

Wrappers for feature subset selection

TL;DR: The wrapper method searches for an optimal feature subset tailored to a particular algorithm and domain; the wrapper approach is compared to induction without feature subset selection and to Relief, a filter approach to feature subset selection.

Correlation-based Feature Selection for Machine Learning

Mark Hall
TL;DR: This thesis addresses the problem of feature selection for machine learning through a correlation-based approach with CFS (Correlation-based Feature Selection), an algorithm that couples a subset evaluation formula with an appropriate correlation measure and a heuristic search strategy.
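The subset evaluation formula the TL;DR refers to is CFS's merit heuristic, which for a subset S of k features takes the form

\mathrm{Merit}_S = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}

where \overline{r_{cf}} is the mean feature-class correlation over S and \overline{r_{ff}} is the mean feature-feature inter-correlation; the heuristic favours subsets whose features predict the class while being minimally redundant with one another.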
Book Chapter

A Practical Approach to Feature Selection

TL;DR: Comparison with other feature selection algorithms shows Relief's advantages in terms of learning time and the accuracy of the learned concept, suggesting Relief's practicality.
Book Chapter

Estimating attributes: analysis and extensions of RELIEF

TL;DR: In the context of machine learning from examples, this paper deals with the problem of estimating the quality of attributes with and without dependencies among them; the basic RELIEF algorithm is analysed and extended to deal with noisy, incomplete, and multi-class data sets.
Journal Article

Locally Weighted Learning

TL;DR: The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, assessing predictions, handling noisy data and outliers, improving the quality of predictions by tuning fit parameters, and applications of locally weighted learning.