Open AccessProceedings Article
Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning
Mark Hall
- pp 359-366
TLDR
In this article, a fast, correlation-based filter algorithm that can be applied to continuous and discrete problems is described, which often outperforms the ReliefF attribute estimator when used as a preprocessing step for naive Bayes, instance-based learning, decision trees, locally weighted regression, and model trees.Abstract:
Algorithms for feature selection fall into two broad categories: wrappers that use the learning algorithm itself to evaluate the usefulness of features and filters that evaluate features according to heuristics based on general characteristics of the data. For application to large databases, filters have proven to be more practical than wrappers because they are much faster. However, most existing filter algorithms only work with discrete classification problems. This paper describes a fast, correlation-based filter algorithm that can be applied to continuous and discrete problems. The algorithm often outperforms the well-known ReliefF attribute estimator when used as a preprocessing step for naive Bayes, instance-based learning, decision trees, locally weighted regression, and model trees. It performs more feature selection than ReliefF does—reducing the data dimensionality by fifty percent in most cases. Also, decision and model trees built from the preprocessed data are often significantly smaller.read more
Citations
More filters
Book
Data Mining: Practical Machine Learning Tools and Techniques
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Journal ArticleDOI
Toward integrating feature selection algorithms for classification and clustering
TL;DR: With the categorizing framework, the efforts toward-building an integrated system for intelligent feature selection are continued, and an illustrative example is presented to show how existing feature selection algorithms can be integrated into a meta algorithm that can take advantage of individual algorithms.
Proceedings Article
Feature selection for high-dimensional data: a fast correlation-based filter solution
TL;DR: A novel concept, predominant correlation, is introduced, and a fast filter method is proposed which can identify relevant features as well as redundancy among relevant features without pairwise correlation analysis.
Journal Article
Efficient Feature Selection via Analysis of Relevance and Redundancy
TL;DR: It is shown that feature relevance alone is insufficient for efficient feature selection of high-dimensional data, and a new framework is introduced that decouples relevance analysis and redundancy analysis.
Journal ArticleDOI
Unsupervised feature selection using feature similarity
TL;DR: An unsupervised feature selection algorithm suitable for data sets, large in both dimension and size, based on measuring similarity between features whereby redundancy therein is removed, which does not need any search and is fast.
References
More filters
Journal ArticleDOI
Wrappers for feature subset selection
Ron Kohavi,George H. John +1 more
TL;DR: The wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain and compares the wrapper approach to induction without feature subset selection and to Relief, a filter approach tofeature subset selection.
Correlation-based Feature Selection for Machine Learning
TL;DR: This thesis addresses the problem of feature selection for machine learning through a correlation based approach with CFS (Correlation based Feature Selection), an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy.
Book ChapterDOI
A Practical Approach to Feature Selection
Kenji Kira,Larry A. Rendell +1 more
TL;DR: Comparison with other feature selection algorithms shows Relief's advantages in terms of learning time and the accuracy of the learned concept, suggesting Relief's practicality.
Book ChapterDOI
Estimating attributes: analysis and extensions of RELIEF
TL;DR: In the context of machine learning from examples this paper deals with the problem of estimating the quality of attributes with and without dependencies among them and is analysed and extended to deal with noisy, incomplete, and multi-class data sets.
Journal ArticleDOI
Locally Weighted Learning
TL;DR: The survey discusses distance functions, smoothing parameters, weighting functions, local model structures, regularization of the estimates and bias, assessing predictions, handling noisy data and outliers, improving the quality of predictions by tuning fit parameters, and applications of locally weighted learning.