
Showing papers on "Feature selection published in 2015"


Proceedings ArticleDOI
10 Dec 2015
TL;DR: Experimental results indicate that the proposed feature selection method based on a mutual information criterion improves the performance of the machine learning models in terms of prediction accuracy and training time.
Abstract: In recent years, machine learning models such as support vector machines (SVM) and artificial neural networks (ANN) have predicted reservoir properties more effectively than traditional empirical methods. Nevertheless, these models suffer when faced with uncertain data, a common characteristic of well log datasets; sources of this uncertainty include missing scales, data interpretation issues and measurement errors. Feature selection aims to select the subset of features relevant to the property being predicted. In this paper a feature selection method based on a mutual information criterion is proposed; its strength lies in choosing the threshold for the typical greedy forward selection procedure according to a statistically sound criterion. Experimental results indicate that the proposed method improves the performance of the machine learning models in terms of prediction accuracy and training time.

825 citations
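For readers who want to experiment with the general idea, a minimal Python sketch of mutual-information-based selection with a fixed cut-off is shown below; the threshold value and the scikit-learn estimator are placeholders, not the statistically derived criterion the paper proposes.

```python
# A minimal sketch (not the authors' exact criterion): rank features by their
# estimated mutual information with the target and keep those above a threshold.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def select_by_mutual_info(X, y, threshold=0.05):
    """Return indices of features whose estimated MI with y exceeds `threshold`.

    `threshold` is a hand-set placeholder here; the paper derives it from a
    statistical criterion instead of fixing it manually.
    """
    mi = mutual_info_regression(X, y, random_state=0)
    order = np.argsort(mi)[::-1]                  # greedy: most informative first
    return [j for j in order if mi[j] > threshold]

# Toy example with synthetic well-log-like data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.1, size=200)
print(select_by_mutual_info(X, y))
```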


Journal ArticleDOI
TL;DR: Various ways of performing dimensionality reduction on high-dimensional microarray data are summarised to provide a clearer idea of when to use each one of them for saving computational time and resources.
Abstract: We summarise various ways of performing dimensionality reduction on high-dimensional microarray data. Many different feature selection and feature extraction methods exist and they are being widely used. All these methods aim to remove redundant and irrelevant features so that classification of new instances will be more accurate. A popular source of data is microarrays, a biological platform for gathering gene expressions. Analysing microarrays can be difficult due to the size of the data they provide. In addition, the complicated relations among the different genes make analysis more difficult, and removing excess features can improve the quality of the results. We present some of the most popular methods for selecting significant features and provide a comparison between them. Their advantages and disadvantages are outlined in order to provide a clearer idea of when to use each one of them for saving computational time and resources.

749 citations


Journal ArticleDOI
TL;DR: This work presents a review of the state of the art of information-theoretic feature selection methods, and describes a unifying theoretical framework which can retrofit successful heuristic criteria, indicating the approximations made by each method.
Abstract: In this work we present a review of the state of the art of information-theoretic feature selection methods. The concepts of feature relevance, redundancy and complementarity (synergy), as well as the Markov blanket, are clearly defined. The problem of optimal feature selection is defined. A unifying theoretical framework is described, which can retrofit successful heuristic criteria, indicating the approximations made by each method. A number of open problems in the field are presented.

684 citations
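Most criteria surveyed in this line of work can be written as special cases of a single linear form (standard notation summarised here, not quoted from the entry):

J(X_k) = I(X_k; Y) - \beta \sum_{X_j \in S} I(X_k; X_j) + \gamma \sum_{X_j \in S} I(X_k; X_j \mid Y)

where S is the set of already selected features; for example, \beta = 1/|S|, \gamma = 0 recovers mRMR, while \beta = \gamma = 1/|S| recovers JMI.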


Journal ArticleDOI
TL;DR: A novel deep-learning-based feature-selection method is proposed, which formulates the feature-selection problem as a feature reconstruction problem, and an iterative algorithm is developed to adapt the DBN to produce the required reconstruction weights.
Abstract: With the popular use of high-resolution satellite images, more and more research efforts have been placed on remote sensing scene classification/recognition. In scene classification, effective feature selection can significantly boost the final performance. In this letter, a novel deep-learning-based feature-selection method is proposed, which formulates the feature-selection problem as a feature reconstruction problem. Note that the popular deep-learning technique, i.e., the deep belief network (DBN), achieves feature abstraction by minimizing the reconstruction error over the whole feature set, and features with smaller reconstruction errors would hold more feature intrinsics for image representation. Therefore, the proposed method selects features that are more reconstructible as the discriminative features. Specifically, an iterative algorithm is developed to adapt the DBN to produce the required reconstruction weights. In the experiments, 2800 remote sensing scene images of seven categories are collected for performance evaluation. Experimental results demonstrate the effectiveness of the proposed method.

637 citations


Proceedings ArticleDOI
25 May 2015
TL;DR: This review considers most of the commonly used FS techniques, including standard filter, wrapper, and embedded methods, and provides insight into FS for recent hybrid approaches and other advanced topics.
Abstract: Feature selection (FS) methods can be used in data pre-processing to achieve efficient data reduction. This is useful for finding accurate data models. Since exhaustive search for optimal feature subset is infeasible in most cases, many search strategies have been proposed in literature. The usual applications of FS are in classification, clustering, and regression tasks. This review considers most of the commonly used FS techniques. Particular emphasis is on the application aspects. In addition to standard filter, wrapper, and embedded methods, we also provide insight into FS for recent hybrid approaches and other advanced topics.

610 citations


Journal ArticleDOI
11 Dec 2015-Sensors
TL;DR: A review of different classification techniques used to recognize human activities from wearable inertial sensor data shows that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms.
Abstract: This paper presents a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three inertial sensor units were used in this study and were worn by healthy subjects at key points of upper/lower body limbs (chest, right thigh and left ankle). Three main steps describe the activity recognition process: sensors' placement, data pre-processing and data classification. Four supervised classification techniques, namely k-Nearest Neighbor (k-NN), Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and Random Forest (RF), as well as three unsupervised classification techniques, namely k-Means, Gaussian Mixture Models (GMM) and Hidden Markov Model (HMM), are compared in terms of correct classification rate, F-measure, recall, precision, and specificity. Raw data and extracted features are used separately as inputs of each classifier. The feature selection is performed using a wrapper approach based on the RF algorithm. Based on our experiments, the results obtained show that the k-NN classifier provides the best performance compared to other supervised classification algorithms, whereas the HMM classifier is the one that gives the best results among unsupervised classification algorithms. This comparison highlights which approach gives better performance in both supervised and unsupervised contexts. It should be noted that the obtained results are limited to the context of this study, which concerns the classification of the main daily living human activities using three wearable accelerometers placed at the chest, right shank and left ankle of the subject.

549 citations


Journal ArticleDOI
TL;DR: In this article, the authors introduce the knockoff filter, a new variable selection procedure for controlling the false discovery rate (FDR) in the statistical linear model whenever there are at least as many observations as variables.
Abstract: In many fields of science, we observe a response variable together with a large number of potential explanatory variables, and would like to be able to discover which variables are truly associated with the response. At the same time, we need to know that the false discovery rate (FDR)—the expected fraction of false discoveries among all discoveries—is not too high, in order to assure the scientist that most of the discoveries are indeed true and replicable. This paper introduces the knockoff filter, a new variable selection procedure controlling the FDR in the statistical linear model whenever there are at least as many observations as variables. This method achieves exact FDR control in finite sample settings no matter the design or covariates, the number of variables in the model, or the amplitudes of the unknown regression coefficients, and does not require any knowledge of the noise level. As the name suggests, the method operates by manufacturing knockoff variables that are cheap—their construction does not require any new data—and are designed to mimic the correlation structure found within the existing variables, in a way that allows for accurate FDR control, beyond what is possible with permutation-based methods. The method of knockoffs is very general and flexible, and can work with a broad class of test statistics. We test the method in combination with statistics from the Lasso for sparse regression, and obtain empirical results showing that the resulting method has far more power than existing selection rules when the proportion of null variables is high.

525 citations
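To make the selection step concrete, the following Python sketch computes the data-dependent threshold from precomputed knockoff statistics W_j (for instance Lasso coefficient differences); constructing the knockoff variables themselves, which is the heart of the method, is not shown here.

```python
# A minimal sketch of the knockoff selection step, assuming the statistics W_j
# have already been computed from the original and knockoff variables.
import numpy as np

def knockoff_threshold(W, q=0.10, plus=False):
    """Return the data-dependent threshold T for a target FDR level q."""
    W = np.asarray(W, dtype=float)
    offset = 1.0 if plus else 0.0            # the "knockoff+" variant adds 1 to the numerator
    ts = np.sort(np.unique(np.abs(W[W != 0])))
    for t in ts:
        fdp_hat = (offset + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
        if fdp_hat <= q:
            return t
    return np.inf                            # no threshold achieves the target level

W = np.array([4.2, -0.3, 2.1, 0.0, -1.5, 3.3, 0.4])
T = knockoff_threshold(W, q=0.2)
selected = np.where(W >= T)[0]               # variables with statistics above the threshold
print(T, selected)
```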


Journal ArticleDOI
TL;DR: It is demonstrated that the selection of optimal neighborhoods for individual 3D points significantly improves the results of 3D scene analysis, and that the selection of adequate feature subsets may even further increase the quality of the derived results while significantly reducing both processing time and memory consumption.
Abstract: 3D scene analysis in terms of automatically assigning 3D points a respective semantic label has become a topic of great importance in photogrammetry, remote sensing, computer vision and robotics. In this paper, we address the issue of how to increase the distinctiveness of geometric features and select the most relevant ones among these for 3D scene analysis. We present a new, fully automated and versatile framework composed of four components: (i) neighborhood selection, (ii) feature extraction, (iii) feature selection and (iv) classification. For each component, we consider a variety of approaches which allow applicability in terms of simplicity, efficiency and reproducibility, so that end-users can easily apply the different components and do not require expert knowledge in the respective domains. In a detailed evaluation involving 7 neighborhood definitions, 21 geometric features, 7 approaches for feature selection, 10 classifiers and 2 benchmark datasets, we demonstrate that the selection of optimal neighborhoods for individual 3D points significantly improves the results of 3D scene analysis. Additionally, we show that the selection of adequate feature subsets may even further increase the quality of the derived results while significantly reducing both processing time and memory consumption.

513 citations


Journal ArticleDOI
TL;DR: Adaptive branch-site random effects likelihood (aBSREL), whose key innovation is variable parametric complexity chosen with an information theoretic criterion, delivers statistical performance matching or exceeding best-in-class existing approaches, while running an order of magnitude faster.
Abstract: Over the past two decades, comparative sequence analysis using codon-substitution models has been honed into a powerful and popular approach for detecting signatures of natural selection from molecular data. A substantial body of work has focused on developing a class of “branch-site” models which permit selective pressures on sequences, quantified by the ω ratio, to vary among both codon sites and individual branches in the phylogeny. We develop and present a method in this class, adaptive branch-site random effects likelihood (aBSREL), whose key innovation is variable parametric complexity chosen with an information theoretic criterion. By applying models of different complexity to different branches in the phylogeny, aBSREL delivers statistical performance matching or exceeding best-in-class existing approaches, while running an order of magnitude faster. Based on simulated data analysis, we offer guidelines for what extent and strength of diversifying positive selection can be detected reliably and suggest that there is a natural limit on the optimal parametric complexity for “branch-site” models. An aBSREL analysis of 8,893 Euteleostomes gene alignments demonstrates that over 80% of branches in typical gene phylogenies can be adequately modeled with a single ω ratio model, that is, current models are unnecessarily complicated. However, there are a relatively small number of key branches, whose identities are derived from the data using a model selection procedure, for which it is essential to accurately model evolutionary complexity.

501 citations


Journal ArticleDOI
TL;DR: Two new feature selection methods based on joint mutual information, namely JMIM and NJMIM, are proposed; they alleviate the problem of overestimation of the feature significance, as demonstrated both theoretically and experimentally.
Abstract: Two new feature selection methods are proposed based on joint mutual information. The methods use joint mutual information with a maximum-of-the-minimum criterion. The methods address the problem of selection of redundant and irrelevant features. The methods are evaluated using eleven public data sets and five competing methods. The proposed JMIM method outperforms the five competing methods in terms of accuracy. Feature selection is used in many application areas relevant to expert and intelligent systems, such as data mining and machine learning, image processing, anomaly detection, bioinformatics and natural language processing. Feature selection based on information theory is a popular approach due to its computational efficiency, scalability in terms of the dataset dimensionality, and independence from the classifier. Common drawbacks of this approach are the lack of information about the interaction between the features and the classifier, and the selection of redundant and irrelevant features. The latter is due to the limitations of the employed goal functions leading to overestimation of the feature significance. To address this problem, this article introduces two new nonlinear feature selection methods, namely Joint Mutual Information Maximisation (JMIM) and Normalised Joint Mutual Information Maximisation (NJMIM); both these methods use mutual information and the 'maximum of the minimum' criterion, which alleviates the problem of overestimation of the feature significance as demonstrated both theoretically and experimentally. The proposed methods are compared using eleven publicly available datasets with five competing methods. The results demonstrate that the JMIM method outperforms the other methods on most tested public datasets, reducing the relative average classification error by almost 6% in comparison to the next best performing method. The statistical significance of the results is confirmed by the ANOVA test. Moreover, this method produces the best trade-off between accuracy and stability.

496 citations
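A rough Python sketch of the JMIM selection rule on integer-coded (discretised) features is given below; it follows our reading of the "maximum of the minimum" criterion and is not the authors' implementation.

```python
# At each step, pick the candidate feature whose *worst* joint mutual information
# I((X_k, X_s); Y) over the already-selected features is largest.
import numpy as np
from sklearn.metrics import mutual_info_score

def joint_mi(a, b, y):
    """I((A, B); Y) for discrete vectors, by encoding the pair (A, B) as one variable."""
    pair = a.astype(np.int64) * (int(b.max()) + 1) + b.astype(np.int64)
    return mutual_info_score(pair, y)

def jmim(X, y, n_select):
    """X must contain discretised (integer-coded) features."""
    n_features = X.shape[1]
    mi = [mutual_info_score(X[:, j], y) for j in range(n_features)]
    selected = [int(np.argmax(mi))]                 # start with the most relevant feature
    while len(selected) < min(n_select, n_features):
        best, best_score = None, -np.inf
        for k in range(n_features):
            if k in selected:
                continue
            score = min(joint_mi(X[:, k], X[:, s], y) for s in selected)
            if score > best_score:
                best, best_score = k, score
        selected.append(best)
    return selected
```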


Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed approaches not only speed up the training and incremental learning processes of AdaBoost, but also yield better or competitive vehicle classification accuracies compared with several state-of-the-art methods, showing their potential for real-time applications.

Proceedings ArticleDOI
01 Nov 2015
TL;DR: Wang et al. as mentioned in this paper proposed an end-to-end hierarchical architecture for skeleton-based action recognition with CNN, which represented a skeleton sequence as a matrix by concatenating the joint coordinates in each instant and arranging those vector representations in a chronological order.
Abstract: Temporal dynamics of postures over time is crucial for sequence-based action recognition. Human actions can be represented by the corresponding motions of the articulated skeleton. Most of the existing approaches for skeleton-based action recognition model the spatial-temporal evolution of actions based on hand-crafted features. As a kind of hierarchically adaptive filter banks, the Convolutional Neural Network (CNN) performs well in representation learning. In this paper, we propose an end-to-end hierarchical architecture for skeleton-based action recognition with CNN. Firstly, we represent a skeleton sequence as a matrix by concatenating the joint coordinates in each instant and arranging those vector representations in chronological order. Then the matrix is quantified into an image and normalized to handle the variable-length problem. The final image is fed into a CNN model for feature extraction and recognition. For the specific structure of such images, the simple max-pooling plays an important role in spatial feature selection as well as temporal frequency adjustment, which can obtain more discriminative joint information for different actions and meanwhile address the variable-frequency problem. Experimental results demonstrate that our method achieves state-of-the-art performance with high computational efficiency, especially surpassing the existing result by more than 15 percentage points on the challenging ChaLearn gesture recognition dataset.
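The representation step can be illustrated with a small NumPy sketch; this is our simplification of the description above, and resizing variable-length sequences to a fixed image size is omitted.

```python
# Joint coordinates at each time step are concatenated into one row, the rows are
# stacked chronologically, and the matrix is min-max quantified to an 8-bit image.
import numpy as np

def skeleton_to_image(joints):                     # joints: (T, J, 3) array of xyz coordinates
    frames = joints.reshape(joints.shape[0], -1)   # (T, 3J): one row per time step
    lo, hi = frames.min(), frames.max()
    img = 255.0 * (frames - lo) / (hi - lo + 1e-12)
    return img.astype(np.uint8)

seq = np.random.default_rng(0).normal(size=(40, 20, 3))   # 40 frames, 20 joints
print(skeleton_to_image(seq).shape)                        # (40, 60)
```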

Journal ArticleDOI
TL;DR: A new feature selection approach that is based on the integration of a genetic algorithm and particle swarm optimization is proposed and is able to automatically select the most informative features in terms of classification accuracy within an acceptable CPU processing time.
Abstract: A new feature selection approach that is based on the integration of a genetic algorithm and particle swarm optimization is proposed. The overall accuracy of a support vector machine classifier on validation samples is used as a fitness value. The new approach is carried out on the well-known Indian Pines hyperspectral data set. Results confirm that the new approach is able to automatically select the most informative features in terms of classification accuracy within an acceptable CPU processing time without requiring the number of desired features to be set a priori by users. Furthermore, the usefulness of the proposed method is also tested for road detection. Results confirm that the proposed method is capable of discriminating between road and background pixels and performs better than the other approaches used for comparison in terms of performance metrics.

Journal ArticleDOI
TL;DR: The VSURF algorithm returns two subsets of variables: a subset of important variables, including some redundancy, which can be relevant for interpretation, and a smaller subset corresponding to a model that avoids redundancy and focuses more closely on the prediction objective.
Abstract: This paper describes the R package VSURF. Based on random forests, and for both regression and classification problems, it returns two subsets of variables. The first is a subset of important variables, including some redundancy, which can be relevant for interpretation; the second is a smaller subset corresponding to a model that avoids redundancy and focuses more closely on the prediction objective. The two-stage strategy is based on a preliminary ranking of the explanatory variables using the random forests permutation-based score of importance and proceeds using a stepwise forward strategy for variable introduction. The two proposals can be obtained automatically using data-driven default values, good enough to provide interesting results, but can also be tuned by the user. The algorithm is illustrated on a simulated example and its applications to real datasets are presented.

Journal ArticleDOI
TL;DR: An improved algorithm, SVM-RFE + CBR, is proposed by incorporating the correlation bias reduction (CBR) strategy into the feature elimination procedure; it outperforms the original SVM-RFE and other typical algorithms.
Abstract: Support vector machine recursive feature elimination (SVM-RFE) is a powerful feature selection algorithm. However, when the candidate feature set contains highly correlated features, the ranking criterion of SVM-RFE will be biased, which would hinder the application of SVM-RFE on gas sensor data. In this paper, the linear and nonlinear SVM-RFE algorithms are studied. After investigating the correlation bias, an improved algorithm SVM-RFE + CBR is proposed by incorporating the correlation bias reduction (CBR) strategy into the feature elimination procedure. Experiments are conducted on a synthetic dataset and two breath analysis datasets, one of which contains temperature modulated sensors. Large and comprehensive sets of transient features are extracted from the sensor responses. The classification accuracy with feature selection proves the efficacy of the proposed SVM-RFE + CBR. It outperforms the original SVM-RFE and other typical algorithms. An ensemble method is further studied to improve the stability of the proposed method. By statistically analyzing the features’ rankings, some knowledge is obtained, which can guide future design of e-noses and feature extraction algorithms.
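As a point of reference, plain linear SVM-RFE is available off the shelf in scikit-learn; the sketch below shows that baseline only, without the correlation bias reduction step that is the paper's contribution.

```python
# Baseline linear SVM-RFE with scikit-learn (no CBR strategy).
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=40, n_informative=5, random_state=0)
svm = SVC(kernel="linear", C=1.0)                   # a linear kernel exposes coef_ for ranking
rfe = RFE(estimator=svm, n_features_to_select=5, step=1)
rfe.fit(X, y)
print(rfe.ranking_)                                 # rank 1 marks the retained features
```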

Journal ArticleDOI
TL;DR: The Decoding Toolbox (TDT) is introduced which represents a user-friendly, powerful and flexible package for multivariate analysis of functional brain imaging data and offers a promising option for researchers who want to employ multivariate analyses of brain activity patterns.
Abstract: The multivariate analysis of brain signals has recently sparked a great amount of interest, yet accessible and versatile tools to carry out decoding analyses are scarce. Here we introduce The Decoding Toolbox (TDT) which represents a user-friendly, powerful and flexible package for multivariate analysis of functional brain imaging data. TDT is written in Matlab and equipped with an interface to the widely used brain data analysis package SPM. The toolbox allows running fast whole-brain analyses, region-of-interest analyses and searchlight analyses, using machine learning classifiers, pattern correlation analysis, or representational similarity analysis. It offers automatic creation and visualization of diverse cross-validation schemes, feature scaling, nested parameter selection, a variety of feature selection methods, multiclass capabilities, and pattern reconstruction from classifier weights. While basic users can implement a generic analysis in one line of code, advanced users can extend the toolbox to their needs or exploit the structure to combine it with external high-performance classification toolboxes. The toolbox comes with an example data set which can be used to try out the various analysis methods. Taken together, TDT offers a promising option for researchers who want to employ multivariate analyses of brain activity patterns.

Proceedings Article
06 Jul 2015
TL;DR: In this article, the authors investigate the robustness of feature selection methods, including LASSO, ridge regression and the elastic net, under attack, and show that they can be significantly compromised, highlighting the need for specific countermeasures.
Abstract: Learning in adversarial settings is becoming an important task for application domains where attackers may inject malicious data into the training set to subvert normal operation of data-driven technologies. Feature selection has been widely used in machine learning for security applications to improve generalization and computational efficiency, although it is not clear whether its use may be beneficial or even counterproductive when training data are poisoned by intelligent attackers. In this work, we shed light on this issue by providing a framework to investigate the robustness of popular feature selection methods, including LASSO, ridge regression and the elastic net. Our results on malware detection show that feature selection methods can be significantly compromised under attack (we can reduce LASSO to almost random choices of feature sets by careful insertion of less than 5% poisoned training samples), highlighting the need for specific countermeasures.

Journal ArticleDOI
TL;DR: A new feature-selection approach based on the cuttlefish optimization algorithm for intrusion detection systems (IDSs) gives a higher detection rate and accuracy rate with a lower false alarm rate, compared with the results obtained using all features.
Abstract: A modified version of the cuttlefish algorithm is discussed. The proposed model can be used as a novel feature-selection model. The cuttlefish algorithm is used as a search strategy to find an optimal subset of features. A decision tree is used to evaluate the quality of the selected features. Data pre-processing for feature selection is also examined in the paper. This paper presents a new feature-selection approach based on the cuttlefish optimization algorithm which is used for intrusion detection systems (IDSs). Because IDSs deal with a large amount of data, one of the crucial tasks of IDSs is to keep the best quality of features that represent the whole data and remove the redundant and irrelevant features. The proposed model uses the cuttlefish algorithm (CFA) as a search strategy to ascertain the optimal subset of features and the decision tree (DT) classifier as a judgement on the selected features that are produced by the CFA. The KDD Cup 99 dataset is used to evaluate the proposed model. The results show that the feature subset obtained by using the CFA gives a higher detection rate and accuracy rate with a lower false alarm rate, compared with the results obtained using all features.

Journal ArticleDOI
TL;DR: Tackling software data issues, including redundancy, correlation, feature irrelevance and missing samples, with the proposed combined learning model resulted in remarkable classification performance, paving the way for successful quality control.
Abstract: Context: Several issues hinder software defect data, including redundancy, correlation, feature irrelevance and missing samples. It is also hard to ensure a balanced distribution between data pertaining to defective and non-defective software. In most experimental cases, data related to the latter software class is dominantly present in the dataset. Objective: The objectives of this paper are to demonstrate the positive effects of combining feature selection and ensemble learning on the performance of defect classification. Along with efficient feature selection, a new two-variant (with and without feature selection) ensemble learning algorithm is proposed to provide robustness to both data imbalance and feature redundancy. Method: We carefully combine selected ensemble learning models with efficient feature selection to address these issues and mitigate their effects on the defect classification performance. Results: Forward selection showed that only a few features contribute to high area under the receiver-operating curve (AUC). On the tested datasets, the greedy forward selection (GFS) method outperformed other feature selection techniques such as Pearson’s correlation. This suggests that features are highly unstable. However, ensemble learners like random forests and the proposed algorithm, average probability ensemble (APE), are not as affected by poor features as in the case of weighted support vector machines (W-SVMs). Moreover, the APE model combined with greedy forward selection (enhanced APE) achieved AUC values of approximately 1.0 for the NASA datasets: PC2, PC4, and MC1. Conclusion: This paper shows that features of a software dataset must be carefully selected for accurate classification of defective components. Furthermore, tackling the software data issues mentioned above with the proposed combined learning model resulted in remarkable classification performance, paving the way for successful quality control.
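A hedged Python sketch of greedy forward selection driven by cross-validated AUC, in the spirit of the GFS step described above, is shown next; a plain random forest stands in for the paper's APE ensemble, which is not reproduced here.

```python
# Greedy forward selection maximising cross-validated ROC AUC.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def greedy_forward_auc(X, y, max_features=5, cv=5):
    selected, best_auc = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        scores = {}
        for j in remaining:
            cols = selected + [j]
            clf = RandomForestClassifier(n_estimators=100, random_state=0)
            scores[j] = cross_val_score(clf, X[:, cols], y, cv=cv, scoring="roc_auc").mean()
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_auc:              # stop once AUC no longer improves
            break
        best_auc = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best_auc
```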

Journal ArticleDOI
TL;DR: The main contribution is the strategy of using several different feature models in a single pool, together with feature selection to optimize the fault diagnosis system; robust performance estimation techniques usually not encountered in the context of engineering are also employed.
Abstract: Distinct feature extraction methods are simultaneously used to describe bearing faults. This approach produces a large number of heterogeneous features that augment discriminative information but, at the same time, create irrelevant and redundant information. A subsequent feature selection phase filters out the most discriminative features. The feature models are based on the complex envelope spectrum, statistical time- and frequency-domain parameters, and wavelet packet analysis. Feature selection is achieved by conventional search of the feature space by greedy methods. For the final fault diagnosis, the k-nearest neighbor classifier, feedforward net, and support vector machine are used. Performance criteria are the estimated error rate and the area under the receiver operating characteristic curve (AUC-ROC). Experimental results are shown for the Case Western Reserve University Bearing Data. The main contribution of this paper is the strategy to use several different feature models in a single pool, together with feature selection to optimize the fault diagnosis system. Moreover, robust performance estimation techniques usually not encountered in the context of engineering are employed.

Journal ArticleDOI
TL;DR: Supervised feature selection of a SNP subset in the G-BLUP framework provided a flexible, generalisable and computationally efficient alternative to Bayes C, but careful evaluation of predictive performance is required when supervised feature selection has been used.
Abstract: In this study, we investigated the effect of five feature selection approaches on the performance of a mixed model (G-BLUP) and a Bayesian (Bayes C) prediction method. We predicted height, high density lipoprotein cholesterol (HDL) and body mass index (BMI) within 2,186 Croatian and into 810 UK individuals using genome-wide SNP data. Using all SNP information Bayes C and G-BLUP had similar predictive performance across all traits within the Croatian data, and for the highly polygenic traits height and BMI when predicting into the UK data. Bayes C outperformed G-BLUP in the prediction of HDL, which is influenced by loci of moderate size, in the UK data. Supervised feature selection of a SNP subset in the G-BLUP framework provided a flexible, generalisable and computationally efficient alternative to Bayes C; but careful evaluation of predictive performance is required when supervised feature selection has been used.

Journal ArticleDOI
TL;DR: A novel approach that employs Binary-coded discrete Fully Informed Particle Swarm optimization and Extreme Learning Machines (ELM) to develop fast and accurate IVS algorithms that are particularly suited for rainfall–runoff modeling applications characterized by high nonlinearity in the catchment dynamics.

Proceedings ArticleDOI
07 Dec 2015
TL;DR: A feature selection method is proposed that exploits the convergence properties of power series of matrices and introduces the concept of infinite feature selection (Inf-FS), which permits the investigation of the importance (relevance and redundancy) of a feature when injected into an arbitrary set of cues.
Abstract: Filter-based feature selection has become crucial in many classification settings, especially object recognition, recently faced with feature learning strategies that originate thousands of cues. In this paper, we propose a feature selection method exploiting the convergence properties of power series of matrices, and introducing the concept of infinite feature selection (Inf-FS). Considering a selection of features as a path among feature distributions and letting these paths tend to an infinite number permits the investigation of the importance (relevance and redundancy) of a feature when injected into an arbitrary set of cues. Ranking the importance individuates candidate features which turn out to be effective from a classification point of view, as proved by a thorough experimental section. The Inf-FS has been tested on thirteen diverse benchmarks, comparing against filters, embedded methods, and wrappers; in all cases it achieves top performance, notably on the classification tasks of PASCAL VOC 2007-2012.
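The closed-form summation at the core of the idea is easy to sketch in Python; the affinity construction below (mixing feature spread and low correlation) is one plausible choice rather than the exact weighting used in the paper.

```python
# Sum an infinite, geometrically damped series of walks over a feature-affinity
# graph in closed form and rank features by their accumulated score.
import numpy as np

def inf_fs_scores(X, alpha_mix=0.5, damping=0.9):
    stds = X.std(axis=0)
    sigma = np.maximum.outer(stds, stds)
    sigma = sigma / (sigma.max() + 1e-12)                  # pairwise spread, scaled to [0, 1]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    A = alpha_mix * sigma + (1 - alpha_mix) * (1 - corr)   # feature-affinity graph
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    a = damping / rho                                      # keep the series convergent (a < 1/rho)
    S = np.linalg.inv(np.eye(A.shape[0]) - a * A) - np.eye(A.shape[0])   # sum_{l>=1} (aA)^l
    return S.sum(axis=1)                                   # higher score = more important feature

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 8))
print(np.argsort(inf_fs_scores(X))[::-1])                  # features ranked by Inf-FS-style score
```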

Journal ArticleDOI
TL;DR: A novel feature selection algorithm based on Ant Colony Optimization (ACO) called Advanced Binary ACO (ABACO), is presented and simulation results verify that the algorithm provides a suitable feature subset with good classification accuracy using a smaller feature set than competing feature selection methods.

Journal ArticleDOI
TL;DR: The promising results of the proposed sleep staging framework suggest that it may be a valuable alternative to existing automatic methods and that it could accelerate visual scoring by providing a robust starting hypnogram that can be further fine-tuned by expert inspection.

Journal ArticleDOI
TL;DR: The origins and importance of feature selection are discussed and recent contributions in a range of applications are outlined, from DNA microarray analysis to face recognition.
Abstract: The explosion of big data has posed important challenges to researchers. Feature selection is paramount when dealing with high-dimensional datasets. We review the state-of-the-art and recent contributions in feature selection. The emerging challenges in feature selection are identified and discussed. In an era of growing data complexity and volume and the advent of big data, feature selection has a key role to play in helping reduce high-dimensionality in machine learning problems. We discuss the origins and importance of feature selection and outline recent contributions in a range of applications, from DNA microarray analysis to face recognition. Recent years have witnessed the creation of vast datasets and it seems clear that these will only continue to grow in size and number. This new big data scenario offers both opportunities and challenges to feature selection researchers, as there is a growing need for scalable yet efficient feature selection methods, given that existing methods are likely to prove inadequate.

Journal ArticleDOI
TL;DR: Experimental analysis on synthetic and real-world data demonstrates that the proposed regularized self-representation (RSR) model can effectively identify the representative features, outperforming many state-of-the-art unsupervised feature selection methods in terms of clustering accuracy, redundancy reduction and classification accuracy.

Journal ArticleDOI
TL;DR: Algorithms for stably and efficiently fitting models with nonconvex penalties such as SCAD and MCP are described, along with simulation results and real data examples comparing and contrasting the statistical properties of these methods.
Abstract: Penalized regression is an attractive framework for variable selection problems. Often, variables possess a grouping structure, and the relevant selection problem is that of selecting groups, not individual variables. The group lasso has been proposed as a way of extending the ideas of the lasso to the problem of group selection. Nonconvex penalties such as SCAD and MCP have been proposed and shown to have several advantages over the lasso; these penalties may also be extended to the group selection problem, giving rise to group SCAD and group MCP methods. Here, we describe algorithms for fitting these models stably and efficiently. In addition, we present simulation results and real data examples comparing and contrasting the statistical properties of these methods.
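For reference, the group-penalised objective these algorithms fit has the generic form (standard definitions, summarised here rather than quoted from the abstract):

\min_{\beta} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \sum_{g} p_{\lambda}\!\left( \lVert \beta_g \rVert_2 \right)

where the groups g partition the coefficients; taking p_\lambda(t) = \lambda \sqrt{|g|}\, t gives the group lasso, while replacing p_\lambda with the SCAD or MCP penalty gives group SCAD and group MCP.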

Posted Content
TL;DR: Zhang et al. as discussed by the authors introduced a variation of maxout activation, called Max-Feature-Map (MFM), into each convolutional layer of CNN to separate noisy and informative signals and play the role of feature selection between two feature maps.
Abstract: The volume of convolutional neural network (CNN) models proposed for face recognition has been continuously growing larger to better fit large amount of training data. When training data are obtained from internet, the labels are likely to be ambiguous and inaccurate. This paper presents a Light CNN framework to learn a compact embedding on the large-scale face data with massive noisy labels. First, we introduce a variation of maxout activation, called Max-Feature-Map (MFM), into each convolutional layer of CNN. Different from maxout activation that uses many feature maps to linearly approximate an arbitrary convex activation function, MFM does so via a competitive relationship. MFM can not only separate noisy and informative signals but also play the role of feature selection between two feature maps. Second, three networks are carefully designed to obtain better performance meanwhile reducing the number of parameters and computational costs. Lastly, a semantic bootstrapping method is proposed to make the prediction of the networks more consistent with noisy labels. Experimental results show that the proposed framework can utilize large-scale noisy data to learn a Light model that is efficient in computational costs and storage spaces. The learned single network with a 256-D representation achieves state-of-the-art results on various face benchmarks without fine-tuning. The code is released on this https URL.
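The MFM operation itself is compact enough to sketch in a few lines of PyTorch; this is a generic rendering of the described activation, not the released code.

```python
# Max-Feature-Map (MFM): split the channel dimension in half and take the
# element-wise maximum of the two halves, acting as a competitive activation
# that also performs feature selection between the two groups of feature maps.
import torch
import torch.nn as nn

class MFM(nn.Module):
    def forward(self, x):                       # x: (N, 2C, H, W)
        a, b = torch.chunk(x, 2, dim=1)         # two groups of C feature maps
        return torch.max(a, b)                  # (N, C, H, W)

layer = nn.Sequential(nn.Conv2d(3, 64, kernel_size=3, padding=1), MFM())
out = layer(torch.randn(1, 3, 32, 32))
print(out.shape)                                # torch.Size([1, 32, 32, 32])
```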

Journal ArticleDOI
TL;DR: The proposed technique uses alpha profiling to reduce the time complexity, while irrelevant features are discarded using an ensemble of Filtered, Correlation and Consistency based feature selection techniques; it outperforms other published techniques in terms of accuracy, false positive rate and detection time.
Abstract: Alpha profiling reduces the number of comparisons by 85.76%. Optimal features (21 out of 41) are suggested. Features are reduced by 48.78%. Beta profiling is used to reduce the size of the training dataset by 7.83%. Network traffic profiling and feature selection reduce space and time complexity. Accuracy of 98.66% and a false positive rate of 1.74% are achieved in 2.43 s. Anomaly-based Intrusion Detection Systems (IDS) learn normal and anomalous behavior by analyzing network traffic in various benchmark datasets. Common challenges for IDSs are large amounts of data to process, low detection rates and high rates of false alarms. In this paper, a technique based on the Online Sequential Extreme Learning Machine (OS-ELM) is presented for intrusion detection. The proposed technique uses alpha profiling to reduce the time complexity while irrelevant features are discarded using an ensemble of Filtered, Correlation and Consistency based feature selection techniques. Instead of sampling, beta profiling is used to reduce the size of the training dataset. For performance evaluation of the proposed technique the standard NSL-KDD 2009 (Network Security Laboratory-Knowledge Discovery and Data Mining) dataset is used. The time and space complexity of the proposed technique are also discussed. The experimental results yielded an accuracy of 98.66% with a false positive rate of 1.74% and a detection time of 2.43 s for the binary-class NSL-KDD dataset. The proposed IDS achieves 97.67% accuracy with a 1.74% false positive rate in 2.65 s of detection time for the multi-class NSL-KDD dataset. The Kyoto University benchmark dataset is also used to test the proposed IDS. Accuracy of 96.37% with a false positive rate of 5.76% is yielded by the proposed technique. The proposed technique outperforms other published techniques in terms of accuracy, false positive rate and detection time. Based on the experimental results achieved, we conclude that the proposed technique is an efficient method for network intrusion detection.