Journal Article
Recent advances in feature selection and its applications
TL;DR: This review paper presents a selection of challenges of particular current interest, such as feature selection for high-dimensional small sample size data, large-scale data, and secure feature selection, as well as some representative applications of feature selection.
Abstract:
Feature selection is one of the key problems in machine learning and data mining. In this review paper, a brief historical background of the field is given, followed by a selection of challenges of particular current interest, such as feature selection for high-dimensional small sample size data, large-scale data, and secure feature selection. Along with these challenges, some hot topics in feature selection have emerged, e.g., stable feature selection, multi-view feature selection, distributed feature selection, multi-label feature selection, online feature selection, and adversarial feature selection. Recent advances on these topics are then surveyed. For each topic, the existing problems are analyzed, and current solutions to these problems are presented and discussed. Beyond these topics, some representative applications of feature selection are also introduced, such as applications in bioinformatics, social media, and multimedia retrieval.
Citations
Journal Article
Binary grasshopper optimisation algorithm approaches for feature selection problems
Majdi Mafarja, Ibrahim Aljarah, Hossam Faris, Abdelaziz I. Hammouri, Ala' M. Al-Zoubi, Seyedali Mirjalili
TL;DR: Binary variants of the recent Grasshopper Optimisation Algorithm are proposed in this work and employed to select the optimal feature subset for classification within a wrapper-based framework. The comparative results show the superior performance of the BGOA and BGOA-M methods compared to other similar techniques in the literature.
Journal Article
Heart Disease Identification Method Using Machine Learning Classification in E-Healthcare
TL;DR: The experimental results show that the proposed feature selection algorithm (FCMIM) is feasible with a support vector machine classifier for designing a high-level intelligent system to identify heart disease, and that it achieved good accuracy compared to previously proposed methods.
Journal Article
A recent overview of the state-of-the-art elements of text classification
TL;DR: Six baseline elements of text classification including data collection, data analysis for labelling, feature construction and weighing, feature selection and projection, training of a classification model, and solution evaluation are described.
Journal Article
Stability of feature selection algorithm: A review
TL;DR: An overview of feature selection techniques and of the instability of feature selection algorithms is provided, and some of the solutions that can handle the different sources of instability are presented.
Journal Article
Manifold regularized discriminative feature selection for multi-label learning
TL;DR: A low-dimensional embedding is constructed based on the original feature space to fit the label distribution for capturing the label correlations locally, which is also constrained using the label information in consideration of the co-occurrence relationships of label pairs, and the convergence is guaranteed.
References
Journal Article
Regression Shrinkage and Selection via the Lasso
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Journal Article
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
Journal Article
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Journal Article
Induction of Decision Trees
TL;DR: This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, describes one such system, ID3, in detail, and discusses a reported shortcoming of the basic algorithm.