Journal Article

A review of feature selection methods on synthetic data

TLDR
Several synthetic datasets are used to review the performance of feature selection methods in the presence of a growing number of irrelevant features, noise in the data, redundancy and interaction between attributes, and a small ratio between the number of samples and the number of features.
Abstract
With the advent of high dimensionality, adequate identification of the relevant features of the data has become indispensable in real-world scenarios. In this context, the importance of feature selection is beyond doubt and different methods have been developed. However, with such a vast body of algorithms available, choosing an adequate feature selection method is not easy, and it is necessary to check their effectiveness in different situations. The assessment of relevant features is difficult in real datasets, however, so an interesting option is to use artificial data. In this paper, several synthetic datasets are employed for this purpose, aiming at reviewing the performance of feature selection methods in the presence of a growing number of irrelevant features, noise in the data, redundancy and interaction between attributes, as well as a small ratio between the number of samples and the number of features. Seven filters, two embedded methods, and two wrappers are applied over eleven synthetic datasets, tested by four classifiers, so as to be able to choose a robust method, paving the way for its application to real datasets.
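The idea of evaluating a selector on artificial data, where the relevant features are known in advance, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not one of the paper's eleven datasets or eleven methods: the generator `make_synthetic`, the class rule (the sign of the sum of the relevant features), the univariate correlation filter in `rank_features`, and the `success_rate` score are all hypothetical.

```python
import math
import random


def make_synthetic(n_samples, n_relevant, n_irrelevant, seed=0):
    """Hypothetical generator: the first n_relevant columns determine the class,
    the remaining n_irrelevant columns are pure Gaussian noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        row = [rng.gauss(0.0, 1.0) for _ in range(n_relevant + n_irrelevant)]
        X.append(row)
        # Assumed class rule: label is 1 when the relevant features sum to > 0.
        y.append(1 if sum(row[:n_relevant]) > 0 else 0)
    return X, y


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def rank_features(X, y):
    """Simple univariate filter: rank features by |correlation with the class|."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def success_rate(ranking, relevant, k):
    """Fraction of the known relevant features found among the top-k ranked ones."""
    return len(set(ranking[:k]) & relevant) / len(relevant)


# 3 relevant features hidden among 20 irrelevant ones.
X, y = make_synthetic(n_samples=1000, n_relevant=3, n_irrelevant=20, seed=0)
ranking = rank_features(X, y)
rate = success_rate(ranking, relevant={0, 1, 2}, k=3)
print("top-3 features:", ranking[:3], "success rate:", rate)
```

Because the ground truth is known, the selector can be scored directly on which features it picks rather than only through classifier accuracy. A univariate filter like this one scores each feature in isolation, so it would not be expected to detect features that are relevant only through interaction with others.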


Citations
Journal Article

Applications of machine learning to machine fault diagnosis: A review and roadmap

TL;DR: A review and roadmap to systematically cover the development of IFD following the progress of machine learning theories and offer a future perspective is presented.
Journal Article

Relief-based feature selection: Introduction and review.

TL;DR: This work broadly examines types of feature selection and defines RBAs, and introduces the original Relief algorithm and associated concepts, emphasizing the intuition behind how it works, how feature weights generated by the algorithm can be interpreted, and why it is sensitive to feature interactions without evaluating combinations of features.
Journal Article

A review of microarray datasets and applied feature selection methods

TL;DR: An experimental evaluation on the most representative datasets using well-known feature selection methods is presented, bearing in mind that the aim is not to provide the best feature selection method, but to facilitate their comparative study by the research community.
Journal Article

Feature selection using Joint Mutual Information Maximisation

TL;DR: Two new feature selection methods based on joint mutual information are proposed, namely JMIM and NJMIM, which alleviate the problem of overestimation of the feature significance, as demonstrated both theoretically and experimentally.
Journal Article

Feature Selection: A literature Review

TL;DR: The concepts of feature relevance, general procedures, evaluation criteria, and the characteristics of feature selection are introduced, and guidelines are provided for users to select a feature selection algorithm without knowing the details of each algorithm.
References
Book

C4.5: Programs for Machine Learning

TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Book

Data Mining: Practical Machine Learning Tools and Techniques

TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Book

Genetic Algorithms

Book

Classification and regression trees

Leo Breiman
TL;DR: This monograph focuses on the methodology used to construct tree-structured rules, covering the use of trees as a data analysis method and, in a more mathematical framework, proving some of their fundamental properties.