Journal Article
A review of feature selection methods on synthetic data
TL;DR: Several synthetic datasets are used to review the performance of feature selection methods in the presence of a growing number of irrelevant features, noise in the data, redundancy and interaction between attributes, and a small ratio between the number of samples and the number of features.
Abstract:
With the advent of high dimensionality, adequate identification of the relevant features of the data has become indispensable in real-world scenarios. In this context, the importance of feature selection is beyond doubt and different methods have been developed. However, with such a vast body of algorithms available, choosing an adequate feature selection method is not an easy question to answer, and it is necessary to check their effectiveness in different situations. Nevertheless, the assessment of relevant features is difficult in real datasets, so an interesting option is to use artificial data. In this paper, several synthetic datasets are employed for this purpose, aiming at reviewing the performance of feature selection methods in the presence of a growing number of irrelevant features, noise in the data, redundancy and interaction between attributes, as well as a small ratio between the number of samples and the number of features. Seven filters, two embedded methods, and two wrappers are applied over eleven synthetic datasets and tested by four classifiers, so as to be able to choose a robust method, paving the way for its application to real datasets.
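The evaluation idea in the abstract (synthetic data whose relevant features are known in advance, so that a selector's output can be checked directly) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the generator `make_synthetic`, the majority-vote labelling rule, the use of mutual information as a stand-in filter, and the `selection_success` score are hypothetical and are not the datasets, filters, or scoring used in the paper.

```python
import math
import random


def make_synthetic(n_samples, n_relevant, n_irrelevant, noise=0.0, seed=0):
    """Binary dataset whose label depends only on the first n_relevant features.

    Hypothetical generator: the label is the majority vote of the relevant
    features, flipped with probability `noise`; the rest are random bits.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        row = [rng.randint(0, 1) for _ in range(n_relevant + n_irrelevant)]
        label = int(2 * sum(row[:n_relevant]) > n_relevant)
        if rng.random() < noise:
            label = 1 - label  # label noise
        X.append(row)
        y.append(label)
    return X, y


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint, px, py = {}, {}, {}
    for a, b in zip(xs, ys):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def rank_features(X, y):
    """Univariate filter: feature indices sorted by decreasing MI with the label."""
    scores = [mutual_information([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def selection_success(selected, relevant):
    """Fraction of the truly relevant features that appear in the selection."""
    return len(set(selected) & set(relevant)) / len(relevant)


if __name__ == "__main__":
    # 3 relevant features hidden among 20 irrelevant ones, 5% label noise.
    X, y = make_synthetic(1000, n_relevant=3, n_irrelevant=20, noise=0.05)
    top = rank_features(X, y)[:3]
    print("selected:", top, "success:", selection_success(top, [0, 1, 2]))
```

Because the generator fixes which features matter, the score can be computed without training a classifier. Varying `n_irrelevant`, `noise`, or `n_samples` in this sketch mimics, in a very simplified way, the stress conditions listed in the abstract; a univariate filter like this one would not be expected to handle feature interaction or redundancy.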
Citations
Journal Article
Applications of machine learning to machine fault diagnosis: A review and roadmap
TL;DR: A review and roadmap is presented that systematically covers the development of intelligent fault diagnosis (IFD) following the progress of machine learning theories and offers a future perspective.
Journal Article
Relief-based feature selection: Introduction and review.
TL;DR: This work broadly examines types of feature selection and defines RBAs, and introduces the original Relief algorithm and associated concepts, emphasizing the intuition behind how it works, how feature weights generated by the algorithm can be interpreted, and why it is sensitive to feature interactions without evaluating combinations of features.
Journal Article
A review of microarray datasets and applied feature selection methods
Verónica Bolón-Canedo, Noelia Sánchez-Maroño, Amparo Alonso-Betanzos, José Manuel Benítez, Francisco Herrera
TL;DR: An experimental evaluation on the most representative datasets using well-known feature selection methods is presented, bearing in mind that the aim is not to provide the best feature selection method, but to facilitate their comparative study by the research community.
Journal Article
Feature selection using Joint Mutual Information Maximisation
TL;DR: Two new feature selection methods based on joint mutual information, JMIM and NJMIM, are proposed; they alleviate the problem of overestimating feature significance, as demonstrated both theoretically and experimentally.
Journal Article
Feature Selection: A literature Review
Vipin Kumar, Sonajharia Minz
TL;DR: The concepts of feature relevance, general procedures, evaluation criteria, and the characteristics of feature selection are introduced, and guidelines are provided so that users can select a feature selection algorithm without knowing the details of each algorithm.