Open Access Proceedings Article

Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning

Mark Hall
pp. 359-366
TLDR
This article describes a fast, correlation-based filter algorithm that can be applied to continuous and discrete problems; it often outperforms the ReliefF attribute estimator when used as a preprocessing step for naive Bayes, instance-based learning, decision trees, locally weighted regression, and model trees.
Abstract
Algorithms for feature selection fall into two broad categories: wrappers that use the learning algorithm itself to evaluate the usefulness of features and filters that evaluate features according to heuristics based on general characteristics of the data. For application to large databases, filters have proven to be more practical than wrappers because they are much faster. However, most existing filter algorithms only work with discrete classification problems. This paper describes a fast, correlation-based filter algorithm that can be applied to continuous and discrete problems. The algorithm often outperforms the well-known ReliefF attribute estimator when used as a preprocessing step for naive Bayes, instance-based learning, decision trees, locally weighted regression, and model trees. It performs more feature selection than ReliefF does—reducing the data dimensionality by fifty percent in most cases. Also, decision and model trees built from the preprocessed data are often significantly smaller.
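To make the filter concrete, below is a minimal sketch of correlation-based subset selection in the spirit of the paper. It assumes the feature-class correlations (corr_cf) and pairwise feature-feature correlations (corr_ff) have already been computed; those names, and the functions cfs_merit and forward_select, are illustrative rather than taken from the paper, and the paper's actual correlation measures and search strategy may differ.

import numpy as np

def cfs_merit(corr_cf, corr_ff, subset):
    # Merit of a subset: reward high average feature-class correlation,
    # penalise high average feature-feature correlation (redundancy).
    k = len(subset)
    r_cf = np.mean([corr_cf[i] for i in subset])
    r_ff = 0.0
    if k > 1:
        r_ff = np.mean([corr_ff[i][j] for i in subset for j in subset if i < j])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def forward_select(corr_cf, corr_ff, n_features):
    # Greedy forward search: repeatedly add the feature that most
    # improves the merit; stop when no addition helps.
    selected, best = [], float("-inf")
    while len(selected) < n_features:
        merit, f = max((cfs_merit(corr_cf, corr_ff, selected + [g]), g)
                       for g in range(n_features) if g not in selected)
        if merit <= best:
            break
        selected.append(f)
        best = merit
    return selected

For numeric data, absolute Pearson correlation is one plausible choice for both correlation inputs; for discrete problems an entropy-based measure such as symmetrical uncertainty is a typical substitute.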


Citations
Book

Data Mining: Practical Machine Learning Tools and Techniques

TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Journal Article

Toward integrating feature selection algorithms for classification and clustering

TL;DR: With the categorizing framework, the efforts toward building an integrated system for intelligent feature selection are continued, and an illustrative example is presented to show how existing feature selection algorithms can be integrated into a meta-algorithm that can take advantage of individual algorithms.
Proceedings Article

Feature selection for high-dimensional data: a fast correlation-based filter solution

TL;DR: A novel concept, predominant correlation, is introduced, and a fast filter method is proposed which can identify relevant features as well as redundancy among relevant features without pairwise correlation analysis.
Journal Article

Efficient Feature Selection via Analysis of Relevance and Redundancy

TL;DR: It is shown that feature relevance alone is insufficient for efficient feature selection of high-dimensional data, and a new framework is introduced that decouples relevance analysis and redundancy analysis.
Journal Article

Unsupervised feature selection using feature similarity

TL;DR: An unsupervised feature selection algorithm, suitable for data sets large in both dimension and size, that measures similarity between features to remove redundancy among them; it needs no search and is fast.
References
Journal Article

Wrappers for feature subset selection

TL;DR: The wrapper method searches for an optimal feature subset tailored to a particular algorithm and domain; the wrapper approach is compared to induction without feature subset selection and to Relief, a filter approach to feature subset selection.

Correlation-based Feature Selection for Machine Learning

Mark Hall
TL;DR: This thesis addresses the problem of feature selection for machine learning through a correlation-based approach with CFS (Correlation-based Feature Selection), an algorithm that couples a subset evaluation formula with an appropriate correlation measure and a heuristic search strategy.
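The subset evaluation formula the TL;DR refers to is CFS's merit heuristic, which for a subset S of k features takes the form

\mathrm{Merit}_S = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}

where \overline{r_{cf}} is the mean feature-class correlation over S and \overline{r_{ff}} is the mean feature-feature inter-correlation; the heuristic favours subsets whose features predict the class while being minimally redundant with one another.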
Book Chapter

A Practical Approach to Feature Selection

TL;DR: Comparison with other feature selection algorithms shows Relief's advantages in terms of learning time and the accuracy of the learned concept, suggesting Relief's practicality.
Book Chapter

Estimating attributes: analysis and extensions of RELIEF

TL;DR: In the context of machine learning from examples, this paper deals with the problem of estimating the quality of attributes with and without dependencies among them; the basic RELIEF algorithm is analysed and extended to deal with noisy, incomplete, and multi-class data sets.
Journal Article

Locally Weighted Learning

TL;DR: The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, assessing predictions, handling noisy data and outliers, improving the quality of predictions by tuning fit parameters, and applications of locally weighted learning.