
Showing papers on "Feature selection" published in 2012


Journal ArticleDOI
TL;DR: A review of available methods for variable selection within Partial Least Squares Regression, one of the many modeling approaches for high-throughput data, aimed at clarifying the characteristics of the methods and providing a basis for selecting an appropriate method for one's own use.

1,180 citations


Journal Article
TL;DR: Overall it is concluded that the JMI criterion provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples.
Abstract: We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: "what are the implicit statistical assumptions of feature selection criteria based on mutual information?". To answer this, we adopt a different strategy than is usual in the feature selection literature--instead of trying to define a criterion, we derive one, directly from a clearly specified objective function: the conditional likelihood of the training labels. While many hand-designed heuristic criteria try to optimize a definition of feature 'relevancy' and 'redundancy', our approach leads to a probabilistic framework which naturally incorporates these concepts. As a result we can unify the numerous criteria published over the last two decades, and show them to be low-order approximations to the exact (but intractable) optimisation problem. The primary contribution is to show that common heuristics for information based feature selection (including Markov Blanket algorithms as a special case) are approximate iterative maximisers of the conditional likelihood. A large empirical study provides strong evidence to favour certain classes of criteria, in particular those that balance the relative size of the relevancy/redundancy terms. Overall we conclude that the JMI criterion (Yang and Moody, 1999; Meyer et al., 2008) provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples.

1,058 citations
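The greedy JMI rule is easy to state: having selected a set S, add the feature f maximizing the sum over s in S of I(X_f, X_s; Y). A minimal sketch for integer-coded (discrete) features follows; the plug-in MI estimator and the integer pair-encoding are our simplifications, not the authors' code.

```python
import numpy as np

def mutual_info(x, y):
    """Plug-in estimate of I(x; y) for integer-coded variables."""
    joint = np.zeros((int(x.max()) + 1, int(y.max()) + 1))
    np.add.at(joint, (x, y), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def jmi_select(X, y, k):
    """Greedy JMI: next feature maximizes sum_{s in S} I(X_f, X_s; y)."""
    p = X.shape[1]
    selected = [int(np.argmax([mutual_info(X[:, j], y) for j in range(p)]))]
    while len(selected) < k:
        scores = {}
        for f in set(range(p)) - set(selected):
            # encode the pair (X_f, X_s) as a single integer-valued variable
            scores[f] = sum(mutual_info(X[:, f] * (X[:, s].max() + 1) + X[:, s], y)
                            for s in selected)
        selected.append(max(scores, key=scores.get))
    return selected
```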


Posted Content
TL;DR: This work considers the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previous work that attacks these issues separately and shows classification performance often much better than using standard selection and hyperparameter optimization methods.
Abstract: Many different machine learning algorithms exist; taking into account each algorithm's hyperparameters, there is a staggeringly large number of possible alternatives overall. We consider the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previous work that addresses these issues in isolation. We show that this problem can be addressed by a fully automated approach, leveraging recent innovations in Bayesian optimization. Specifically, we consider a wide range of feature selection techniques (combining 3 search and 8 evaluator methods) and all classification approaches implemented in WEKA, spanning 2 ensemble methods, 10 meta-methods, 27 base classifiers, and hyperparameter settings for each classifier. On each of 21 popular datasets from the UCI repository, the KDD Cup 09, variants of the MNIST dataset and CIFAR-10, we show classification performance often much better than using standard selection/hyperparameter optimization methods. We hope that our approach will help non-expert users to more effectively identify machine learning algorithms and hyperparameter settings appropriate to their applications, and hence to achieve improved performance.

1,004 citations
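The combined algorithm selection and hyperparameter optimization problem can be illustrated compactly. The paper solves it with Bayesian optimization over WEKA's full model space; the sketch below substitutes plain random search over two scikit-learn classifiers, so both the search strategy and the space are stand-ins.

```python
import random
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# joint space: (algorithm, hyperparameter grid) pairs -- placeholders
SPACE = [
    (RandomForestClassifier, {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}),
    (SVC, {"C": [0.1, 1.0, 10.0], "gamma": ["scale", "auto"]}),
]

def cash_random_search(X, y, budget=20, seed=0):
    """Jointly sample an algorithm and its hyperparameters; keep the best CV score."""
    rng = random.Random(seed)
    best_cfg, best_score = None, -np.inf
    for _ in range(budget):
        algo, grid = rng.choice(SPACE)
        params = {k: rng.choice(v) for k, v in grid.items()}
        score = cross_val_score(algo(**params), X, y, cv=3).mean()
        if score > best_score:
            best_cfg, best_score = (algo.__name__, params), score
    return best_cfg, best_score
```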


Journal ArticleDOI
TL;DR: The FBCSP algorithm performed best among the submitted algorithms, yielding mean kappa values of 0.569 and 0.600 across all subjects in Datasets 2a and 2b of the BCI Competition IV.
Abstract: The Common Spatial Pattern (CSP) algorithm is an effective and popular method for classifying 2-class motor imagery electroencephalogram (EEG) data, but its effectiveness depends on the subject-specific frequency band. This paper presents the Filter Bank Common Spatial Pattern (FBCSP) algorithm to optimize the subject-specific frequency band for CSP on Datasets 2a and 2b of the Brain-Computer Interface (BCI) Competition IV. Dataset 2a comprised 4 classes of 22-channel EEG data from 9 subjects, and Dataset 2b comprised 2 classes of 3 bipolar-channel EEG data from 9 subjects. Multi-class extensions to FBCSP are also presented to handle the 4-class EEG data in Dataset 2a, namely, Divide-and-Conquer (DC), Pair-Wise (PW), and One-Versus-Rest (OVR) approaches. Two feature selection algorithms are also presented to select discriminative CSP features on Dataset 2b, namely, the Mutual Information-based Best Individual Feature (MIBIF) algorithm, and the Mutual Information-based Rough Set Reduction (MIRSR) algorithm. The single-trial classification accuracies were presented using 10x10-fold cross-validations on the training data and session-to-session transfer on the evaluation data from both datasets. Disclosure of the test data labels after the BCI Competition IV showed that the FBCSP algorithm performed best among the submitted algorithms, yielding mean kappa values of 0.569 and 0.600 across all subjects in Datasets 2a and 2b respectively.

862 citations
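The core CSP step that FBCSP applies per filter-bank band reduces to a generalized eigendecomposition of the two class-average covariance matrices, followed by log-variance features. A bare-bones sketch under that reading (the band-pass filter bank and the MIBIF/MIRSR selection stages are omitted):

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_pairs=2):
    """Spatial filters for 2-class EEG; trials_*: (n_trials, n_channels, n_samples)."""
    def mean_cov(trials):
        covs = [t @ t.T / np.trace(t @ t.T) for t in trials]
        return np.mean(covs, axis=0)
    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    # generalized eigenproblem Ca w = lambda (Ca + Cb) w;
    # eigenvectors at both ends of the spectrum discriminate the classes best
    vals, vecs = eigh(Ca, Ca + Cb)
    order = np.argsort(vals)
    picks = np.r_[order[:n_pairs], order[-n_pairs:]]
    return vecs[:, picks].T  # rows are spatial filters

def log_var_features(W, trial):
    """Normalized log-variance of the spatially filtered signals."""
    v = (W @ trial).var(axis=1)
    return np.log(v / v.sum())
```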


Journal ArticleDOI
TL;DR: In this article, a sure independence screening procedure based on distance correlation (DC-SIS) is proposed for ultrahigh-dimensional data analysis; the procedure can be used directly to screen grouped predictor variables and multivariate response variables.
Abstract: This article is concerned with screening features in ultrahigh-dimensional data analysis, which has become increasingly important in diverse scientific fields. We develop a sure independence screening procedure based on the distance correlation (DC-SIS). The DC-SIS can be implemented as easily as the sure independence screening (SIS) procedure based on the Pearson correlation proposed by Fan and Lv. However, the DC-SIS can significantly improve the SIS. Fan and Lv established the sure screening property for the SIS based on linear models, but the sure screening property is valid for the DC-SIS under more general settings, including linear models. Furthermore, the implementation of the DC-SIS does not require model specification (e.g., linear model or generalized linear model) for responses or predictors. This is a very appealing property in ultrahigh-dimensional data analysis. Moreover, the DC-SIS can be used directly to screen grouped predictor variables and multivariate response variables. We establish ...

641 citations
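Because the screening statistic is simply the marginal distance correlation between each predictor and the response, a compact sketch is possible; the O(n^2) pairwise-distance estimator below is the textbook one, not the authors' implementation.

```python
import numpy as np

def dist_corr(x, y):
    """Empirical distance correlation between two 1-d samples."""
    def doubly_centered(a):
        d = np.abs(a[:, None] - a[None, :])
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    A, B = doubly_centered(x), doubly_centered(y)
    dcov2 = max((A * B).mean(), 0.0)          # guard tiny negative round-off
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(dcov2 / denom) if denom > 0 else 0.0

def dc_sis(X, y, d):
    """Keep the d predictors with the largest distance correlation with y."""
    scores = np.array([dist_corr(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:d]
```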


Posted Content
TL;DR: This paper proposes to accelerate the computation of the l2,1-norm regularized regression model by reformulating it as two equivalent smooth convex optimization problems, which are then solved via Nesterov's method, an optimal first-order black-box method for smooth convex optimization.
Abstract: The problem of joint feature selection across a group of related tasks has applications in many areas including biomedical informatics and computer vision. We consider the l2,1-norm regularized regression model for joint feature selection from multiple tasks, which can be derived in the probabilistic framework by assuming a suitable prior from the exponential family. One appealing feature of the l2,1-norm regularization is that it encourages multiple predictors to share similar sparsity patterns. However, the resulting optimization problem is challenging to solve due to the non-smoothness of the l2,1-norm regularization. In this paper, we propose to accelerate the computation by reformulating it as two equivalent smooth convex optimization problems which are then solved via Nesterov's method, an optimal first-order black-box method for smooth convex optimization. A key building block in solving the reformulations is the Euclidean projection. We show that the Euclidean projection for the first reformulation can be analytically computed, while the Euclidean projection for the second one can be computed in linear time. Empirical evaluations on several data sets verify the efficiency of the proposed algorithms.

630 citations
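The key computational facts are that the smooth loss has an easy gradient and the l2,1 penalty has a closed-form proximal operator (row-wise group soft thresholding). A sketch of an accelerated proximal-gradient solver in that spirit; the paper's actual reformulations and Euclidean projections differ.

```python
import numpy as np

def prox_l21(W, t):
    """Row-wise group soft thresholding: the proximal operator of t * ||W||_{2,1}."""
    norms = np.maximum(np.linalg.norm(W, axis=1, keepdims=True), 1e-12)
    return W * np.maximum(0.0, 1.0 - t / norms)

def l21_multitask(X, Y, lam, iters=200):
    """Accelerated proximal gradient for min_W ||XW - Y||_F^2 + lam * ||W||_{2,1}."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    Z, theta = W.copy(), 1.0
    L = 2 * np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    for _ in range(iters):
        W_new = prox_l21(Z - 2 * X.T @ (X @ Z - Y) / L, lam / L)
        theta_new = (1 + np.sqrt(1 + 4 * theta ** 2)) / 2
        Z = W_new + (theta - 1) / theta_new * (W_new - W)
        W, theta = W_new, theta_new
    return W
```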


Journal ArticleDOI
01 Jun 2012 - Genomics
TL;DR: This article systematically reviews the applications and recent progress of RF for genomic data, including prediction and classification, variable selection, pathway analysis, genetic association and epistasis detection, and unsupervised learning.

625 citations
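For the variable-selection use case, ranking features by RF importance takes only a few lines; the scikit-learn implementation and impurity-based importances below are our choice of illustration, while the review also covers permutation importance and R packages.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rf_top_features(X, y, k=20, seed=0):
    """Rank features by impurity-based RF importance and keep the top k."""
    rf = RandomForestClassifier(n_estimators=500, random_state=seed).fit(X, y)
    return np.argsort(rf.feature_importances_)[::-1][:k]
```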


Journal ArticleDOI
TL;DR: This paper proposes a general methodology, namely multi-modal multi-task (M3T) learning, to jointly predict multiple variables from multi-modal data, which can achieve better performance on both regression and classification tasks than conventional learning methods.

580 citations


Journal ArticleDOI
TL;DR: This survey focuses on filter feature selection methods for informative feature discovery in gene expression microarray (GEM) analysis, which is also known as differentially expressed genes (DEGs) discovery, gene prioritization, or biomarker discovery, and presents them in a unified framework.
Abstract: A plenitude of feature selection (FS) methods is available in the literature, most of them arising from the need to analyze data of very high dimension, usually hundreds or thousands of variables. Such data sets are now available in various application areas like combinatorial chemistry, text mining, multivariate imaging, or bioinformatics. As a generally accepted rule, these methods are grouped into filters, wrappers, and embedded methods. More recently, a new group of methods has been added to the general framework of FS: ensemble techniques. The focus in this survey is on filter feature selection methods for informative feature discovery in gene expression microarray (GEM) analysis, which is also known as differentially expressed genes (DEGs) discovery, gene prioritization, or biomarker discovery. We present them in a unified framework, using standardized notations in order to reveal their technical details and to highlight their common characteristics as well as their particularities.

500 citations


Book ChapterDOI
01 Jan 2012
TL;DR: The Random Forest technique, which includes an ensemble of decision trees and incorporates feature selection and interactions naturally in the learning process, is a popular choice because it is nonparametric, interpretable, efficient, and has high prediction accuracy for many types of data.
Abstract: Modern biology has experienced an increased use of machine learning techniques for large scale and complex biological data analysis. In the area of Bioinformatics, the Random Forest (RF) [6] technique, which includes an ensemble of decision trees and incorporates feature selection and interactions naturally in the learning process, is a popular choice. It is nonparametric, interpretable, efficient, and has high prediction accuracy for many types of data. Recent work in computational biology has seen an increased use of RF, owing to its unique advantages in dealing with small sample size, high-dimensional feature space, and complex data structures.

497 citations


Proceedings Article
22 Jul 2012
TL;DR: A new unsupervised learning algorithm, namely Nonnegative Discriminative Feature Selection (NDFS), which exploits the discriminative information and feature correlation simultaneously to select a better feature subset.
Abstract: In this paper, a new unsupervised learning algorithm, namely Nonnegative Discriminative Feature Selection (NDFS), is proposed. To exploit the discriminative information in unsupervised scenarios, we perform spectral clustering to learn the cluster labels of the input samples, during which the feature selection is performed simultaneously. The joint learning of the cluster labels and feature selection matrix enables NDFS to select the most discriminative features. To learn more accurate cluster labels, a nonnegative constraint is explicitly imposed on the class indicators. To reduce the redundant or even noisy features, an l2,1-norm minimization constraint is added into the objective function, which guarantees that the feature selection matrix is sparse in rows. Our algorithm exploits the discriminative information and feature correlation simultaneously to select a better feature subset. A simple yet efficient iterative algorithm is designed to optimize the proposed objective function. Experimental results on different real-world datasets demonstrate the encouraging performance of our algorithm over state-of-the-art methods.

Journal ArticleDOI
TL;DR: A novel nearest neighbor-based feature weighting algorithm, which learns a feature weighting vector by maximizing the expected leave-one-out classification accuracy with a regularization term, is proposed.
Abstract: Feature selection is of considerable importance in data mining and machine learning, especially for high dimensional data. In this paper, we propose a novel nearest neighbor-based feature weighting algorithm, which learns a feature weighting vector by maximizing the expected leave-one-out classification accuracy with a regularization term. The algorithm makes no parametric assumptions about the distribution of the data and scales naturally to multiclass problems. Experiments conducted on artificial and real data sets demonstrate that the proposed algorithm is largely insensitive to the increase in the number of irrelevant features and performs better than the state-of-the-art methods in most cases.

Journal ArticleDOI
TL;DR: Empirical results show that the selected, reduced attribute set yields an IDS that is efficient and effective for network intrusion detection.

Journal Article
TL;DR: This work introduces a framework for feature selection based on dependence maximization between the selected features and the labels of an estimation problem, using the Hilbert-Schmidt Independence Criterion, and shows that a number of existing feature selectors are special cases of this framework.
Abstract: We introduce a framework for feature selection based on dependence maximization between the selected features and the labels of an estimation problem, using the Hilbert-Schmidt Independence Criterion. The key idea is that good features should be highly dependent on the labels. Our approach leads to a greedy procedure for feature selection. We show that a number of existing feature selectors are special cases of this framework. Experiments on both artificial and real-world data show that our feature selector works well in practice.
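A sketch of the forward-greedy variant of this procedure: score candidate subsets by the biased HSIC estimator tr(KHLH)/(n-1)^2, with a Gaussian kernel on features and a linear kernel on binary +/-1 labels. The kernel choices and the forward direction are assumptions; the paper also covers backward elimination.

```python
import numpy as np

def hsic(K, L):
    """Biased HSIC estimate tr(KHLH) / (n-1)^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def rbf_kernel(X, gamma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def hsic_forward_select(X, y, k):
    """Greedily add the feature that most increases HSIC with the labels."""
    L = np.outer(y, y).astype(float)           # linear kernel on +/-1 labels
    selected = []
    for _ in range(k):
        rest = [f for f in range(X.shape[1]) if f not in selected]
        scores = [hsic(rbf_kernel(X[:, selected + [f]]), L) for f in rest]
        selected.append(rest[int(np.argmax(scores))])
    return selected
```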

Journal ArticleDOI
TL;DR: The core idea is to enlarge the distance between different classes under the conceptual framework of LSR, and a technique called ε-dragging is introduced to force the regression targets of different classes to move along opposite directions so that the distances between classes are enlarged.
Abstract: This paper presents a framework of discriminative least squares regression (LSR) for multiclass classification and feature selection. The core idea is to enlarge the distance between different classes under the conceptual framework of LSR. First, a technique called ε-dragging is introduced to force the regression targets of different classes to move along opposite directions such that the distances between classes can be enlarged. Then, the ε-draggings are integrated into the LSR model for multiclass classification. Our learning framework, referred to as discriminative LSR, has a compact model form, where there is no need to train two-class machines that are independent of each other. With its compact form, this model can be naturally extended for feature selection. This goal is achieved via the L2,1 matrix norm, generating a sparse learning model for feature selection. The model for multiclass classification and its extension for feature selection are finally solved elegantly and efficiently. Experimental evaluation over a range of benchmark datasets indicates the validity of our method.
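One plausible reading of the alternating scheme: with one-hot targets Y and dragging directions B (+1 for the true class, -1 otherwise), iterate between a ridge-regression solve for W given the relaxed targets T = Y + B * M and a closed-form nonnegative update of M. A sketch under that reading, not the released code:

```python
import numpy as np

def dlsr_fit(X, y, lam=1.0, iters=20):
    """Alternate a ridge solve for W with a closed-form nonnegative update of M."""
    n, p = X.shape
    C = int(y.max()) + 1
    Y = np.eye(C)[y]                          # one-hot class targets
    B = np.where(Y == 1, 1.0, -1.0)           # drag true class up, others down
    M = np.zeros((n, C))                      # nonnegative dragging amounts
    Xb = np.hstack([X, np.ones((n, 1))])      # absorb the bias into W
    for _ in range(iters):
        T = Y + B * M                          # relaxed regression targets
        W = np.linalg.solve(Xb.T @ Xb + lam * np.eye(p + 1), Xb.T @ T)
        M = np.maximum(B * (Xb @ W - Y), 0.0)  # closed-form update for M
    return W

def dlsr_predict(X, W):
    return (np.hstack([X, np.ones((len(X), 1))]) @ W).argmax(axis=1)
```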

Proceedings ArticleDOI
01 Jan 2012
TL;DR: This work proposes a novel method for re-identification that learns a selection and weighting of mid-level semantic attributes to describe people, an attribute-centric, parts-based feature representation that differs from and complements existing low-level features that rely purely on bottom-up statistics for feature selection.
Abstract: Visually identifying a target individual reliably in a crowded environment observed by a distributed camera network is critical to a variety of tasks in managing business information, border control, and crime prevention. Automatic re-identification of a human candidate from public space CCTV video is challenging due to spatiotemporal visual feature variations and strong visual similarity between different people, compounded by low-resolution and poor quality video data. In this work, we propose a novel method for re-identification that learns a selection and weighting of mid-level semantic attributes to describe people. Specifically, the model learns an attribute-centric, parts-based feature representation. This differs from and complements existing low-level features for re-identification that rely purely on bottom-up statistics for feature selection, which are limited in reliably discriminating and identifying the visual appearance of target people appearing in different camera views under certain degrees of occlusion due to crowdedness. Our experiments demonstrate the effectiveness of our approach compared to existing feature representations when applied to benchmarking datasets.

Proceedings ArticleDOI
22 Aug 2012
TL;DR: The wrapper approach combines the exploratory power of the bats with the speed of the Optimum-Path Forest classifier to find the set of features that maximizes accuracy on a validation set.
Abstract: Feature selection aims to find the most important information in a given set of features. As this task can be cast as an optimization problem, the combinatorial growth of the space of possible solutions can make an exhaustive search infeasible. In this paper we propose a new nature-inspired feature selection technique based on the behaviour of bats, which had not previously been applied in this context. The wrapper approach combines the exploratory power of the bats with the speed of the Optimum-Path Forest classifier to find the set of features that maximizes accuracy on a validation set. Experiments conducted on five public datasets demonstrate that the proposed approach can outperform some well-known swarm-based techniques.
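A rough wrapper sketch in the same spirit: binary masks evolve by a frequency-weighted velocity update and sigmoid binarization, and each mask is scored by cross-validated accuracy. The paper pairs the bat dynamics with an Optimum-Path Forest classifier; a k-NN stands in here, and the update rule is a common binary-swarm simplification rather than the paper's exact dynamics.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y):
    """Wrapper objective: CV accuracy of a k-NN on the masked features."""
    if not mask.any():
        return 0.0
    return cross_val_score(KNeighborsClassifier(3), X[:, mask], y, cv=3).mean()

def bat_select(X, y, n_bats=10, iters=30, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pos = rng.random((n_bats, d)) > 0.5        # binary positions = feature masks
    vel = np.zeros((n_bats, d))
    fit = np.array([fitness(p, X, y) for p in pos])
    best = pos[fit.argmax()].copy()
    for _ in range(iters):
        freq = rng.random((n_bats, 1))          # per-bat pulse frequency
        vel += (pos.astype(float) - best) * freq
        cand = 1 / (1 + np.exp(-vel)) > rng.random((n_bats, d))  # sigmoid binarize
        for i in range(n_bats):
            f = fitness(cand[i], X, y)
            if f >= fit[i]:                     # greedy acceptance
                pos[i], fit[i] = cand[i], f
        best = pos[fit.argmax()].copy()
    return np.flatnonzero(best)
```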

Journal ArticleDOI
TL;DR: In this article, the authors consider situations where they are not only interested in sparsity, but where some structural prior knowledge is available as well, and show that the $\ell_1$-norm can then be extended to structured norms built on either disjoint or overlapping groups of variables.
Abstract: Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. While naturally cast as a combinatorial optimization problem, variable or feature selection admits a convex relaxation through the regularization by the $\ell_1$-norm. In this paper, we consider situations where we are not only interested in sparsity, but where some structural prior knowledge is available as well. We show that the $\ell_1$-norm can then be extended to structured norms built on either disjoint or overlapping groups of variables, leading to a flexible framework that can deal with various structures. We present applications to unsupervised learning, for structured sparse principal component analysis and hierarchical dictionary learning, and to supervised learning in the context of non-linear variable selection.

Journal ArticleDOI
TL;DR: In this article, the effects of the all-combinations model-building strategy, ad hoc strategies, and model averaging on parameter estimates and variable selection were investigated for the Cormack-Jolly-Seber data type.
Abstract: One challenge an analyst often encounters when dealing with complex mark–recapture models is how to limit the number of a priori models. While all possible combinations of model structures on the different parameters (e.g., ϕ, p) can be considered, such a strategy often results in a burdensome number of models, leading to the use of ad hoc strategies to reduce the number of models constructed. For the Cormack–Jolly–Seber data type, one example of an ad hoc strategy is to hold a general ϕ model structure constant while investigating model structures on p, and then to hold the resulting best structure on p constant and investigate structures on ϕ. Many comparable strategies exist. The effect of following ad hoc strategies on parameter estimates as well as for variable selection and whether model averaging can ameliorate any problems are unknown. By means of a simulation study, we have investigated this informational gap by comparing the all-combinations model building strategy with two ad hoc strategies and with truth, as well as considering the results of model averaging. We found that model selection strategy had little effect on parameter estimator bias and precision and that model averaging did improve bias and precision slightly. In terms of variable selection (i.e., cumulative Akaike’s information criterion weights), model sets based on ad hoc strategies did not perform as well as those based on all combinations, as less important variables often had higher weights with the former than with the all possible combinations strategy. Increased sample size resulted in increased variable weights, with an infinite sample size resulting in all variable weights equaling 1 for variables with any predictive influence. Thus, the distinction between statistical importance (dependent on sample size) and biological importance must be recognized when utilizing cumulative weights. We recommend that all-combinations model strategy and model averaging be used. However, if an ad hoc strategy is relied upon to reduce the computational demand, parameter estimates will generally be comparable to the all-combinations strategy, but variable weights will not correspond to the all-combinations strategy.
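The cumulative variable weights at issue are simple to compute: convert each model's AIC to an Akaike weight, then sum the weights of all models containing the variable. A small worked sketch with made-up models and AIC values (the model set and numbers are placeholders, not from the paper):

```python
import numpy as np

def akaike_weights(aic):
    """Akaike weights: w_i proportional to exp(-0.5 * delta AIC_i)."""
    delta = np.asarray(aic, dtype=float) - np.min(aic)
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# hypothetical model set: (variables in the model, AIC)
models = [({"phi_sex", "p_time"}, 102.3), ({"phi_sex"}, 104.1), ({"p_time"}, 101.9)]
w = akaike_weights([aic for _, aic in models])
cumulative = {v: sum(wi for (vars_, _), wi in zip(models, w) if v in vars_)
              for v in {"phi_sex", "p_time"}}
print(cumulative)   # cumulative weight = summed weights of models containing v
```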

Proceedings ArticleDOI
Xudong Cao1, Yichen Wei1, Fang Wen1, Jian Sun1
16 Jun 2012
TL;DR: This paper presents a very efficient, highly accurate, “Explicit Shape Regression” approach for face alignment that significantly outperforms the state-of-the-art in terms of both accuracy and efficiency.
Abstract: We present a very efficient, highly accurate, “Explicit Shape Regression” approach for face alignment. Unlike previous regression-based approaches, we directly learn a vectorial regression function to infer the whole facial shape (a set of facial landmarks) from the image and explicitly minimize the alignment errors over the training data. The inherent shape constraint is naturally encoded into the regressor in a cascaded learning framework and applied from coarse to fine during the test, without using a fixed parametric shape model as in most previous methods. To make the regression more effective and efficient, we design a two-level boosted regression, shape-indexed features and a correlation-based feature selection method. This combination enables us to learn accurate models from large training data in a short time (20 minutes for 2,000 training images), and run regression extremely fast in test (15 ms for an 87-landmark shape). Experiments on challenging data show that our approach significantly outperforms the state-of-the-art in terms of both accuracy and efficiency.

Proceedings ArticleDOI
12 Aug 2012
TL;DR: This paper proposes a Robust Multi-Task Feature Learning algorithm (rMTFL) which simultaneously captures a common set of features among relevant tasks and identifies outlier tasks, and provides a detailed theoretical analysis on the proposed rMTFL formulation.
Abstract: Multi-task learning (MTL) aims to improve the performance of multiple related tasks by exploiting the intrinsic relationships among them. Recently, multi-task feature learning algorithms have received increasing attention and they have been successfully applied to many applications involving high dimensional data. However, they assume that all tasks share a common set of features, which is too restrictive and may not hold in real-world applications, since outlier tasks often exist. In this paper, we propose a Robust Multi-Task Feature Learning algorithm (rMTFL) which simultaneously captures a common set of features among relevant tasks and identifies outlier tasks. Specifically, we decompose the weight (model) matrix for all tasks into two components. We impose the well-known group Lasso penalty on row groups of the first component for capturing the shared features among relevant tasks. To simultaneously identify the outlier tasks, we impose the same group Lasso penalty but on column groups of the second component. We propose to employ the accelerated gradient descent to efficiently solve the optimization problem in rMTFL, and show that the proposed algorithm is scalable to large-size problems. In addition, we provide a detailed theoretical analysis on the proposed rMTFL formulation. Specifically, we present a theoretical bound to measure how well our proposed rMTFL approximates the true evaluation, and provide bounds to measure the error between the estimated weights of rMTFL and the underlying true weights. Moreover, by assuming that the underlying true weights are above the noise level, we present a sound theoretical result to show how to obtain the underlying true shared features and outlier tasks (sparsity patterns). Empirical studies on both synthetic and real-world data demonstrate that our proposed rMTFL is capable of simultaneously capturing shared features among tasks and identifying outlier tasks.
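The decomposition is the heart of the method: W = P + Q, with a row-group penalty on P (shared features) and a column-group penalty on Q (outlier tasks). A plain proximal-gradient sketch of that structure; the paper uses an accelerated gradient method, which we omit here for brevity.

```python
import numpy as np

def group_prox(W, t, axis):
    """Soft threshold the rows (axis=1) or columns (axis=0) of W as groups."""
    norms = np.maximum(np.linalg.norm(W, axis=axis, keepdims=True), 1e-12)
    return W * np.maximum(0.0, 1.0 - t / norms)

def rmtfl(X, Y, lam1, lam2, iters=300):
    p, m = X.shape[1], Y.shape[1]
    P, Q = np.zeros((p, m)), np.zeros((p, m))
    step = 1.0 / (4 * np.linalg.norm(X, 2) ** 2)   # joint Lipschitz bound
    for _ in range(iters):
        G = 2 * X.T @ (X @ (P + Q) - Y)            # shared gradient of the loss
        P = group_prox(P - step * G, step * lam1, axis=1)  # rows: shared features
        Q = group_prox(Q - step * G, step * lam2, axis=0)  # cols: outlier tasks
    return P, Q
```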

Proceedings ArticleDOI
TL;DR: This paper proposes efficient algorithms for group sparse optimization with mixed l2,1-regularization, which arises from the reconstruction of group sparse signals in compressive sensing, and the group Lasso problem in statistics and machine learning.
Abstract: This paper proposes efficient algorithms for group sparse optimization with mixed l2,1-regularization, which arises from the reconstruction of group sparse signals in compressive sensing, and the group Lasso problem in statistics and machine learning. It is known that encoding the group information in addition to sparsity can often lead to better signal recovery/feature selection. The l2,1-regularization promotes group sparsity, but the resulting problem, due to the mixed-norm structure and possible grouping irregularity, is considered more difficult to solve than the conventional l1-regularized problem. Our approach is based on a variable splitting strategy and the classic alternating direction method (ADM). Two algorithms are presented, one derived from the primal and the other from the dual of the l2,1-regularized problem. The convergence of the proposed algorithms is guaranteed by the existing ADM theory. General group configurations such as overlapping groups and incomplete covers can be easily handled by our approach. Computational results show that on random problems the proposed ADM algorithms exhibit good efficiency, and strong stability and robustness.
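For disjoint groups, the splitting x = z makes both ADM subproblems easy: a linear solve for x and per-group soft thresholding for z. A minimal primal-ADM sketch under that simplification (the paper's algorithms also handle overlapping groups and a dual variant):

```python
import numpy as np

def group_shrink(v, t, groups):
    """Per-group soft thresholding (prox of t * sum_g ||x_g||_2)."""
    z = v.copy()
    for g in groups:
        nrm = np.linalg.norm(v[g])
        z[g] = 0.0 if nrm <= t else (1 - t / nrm) * v[g]
    return z

def admm_group_lasso(A, b, lam, groups, rho=1.0, iters=300):
    """Split x = z; alternate a linear solve, a shrinkage, and a dual update."""
    p = A.shape[1]
    x, z, u = np.zeros(p), np.zeros(p), np.zeros(p)
    M = np.linalg.inv(2 * A.T @ A + rho * np.eye(p))   # cached x-update system
    Atb2 = 2 * A.T @ b
    for _ in range(iters):
        x = M @ (Atb2 + rho * (z - u))
        z = group_shrink(x + u, lam / rho, groups)
        u += x - z
    return z
```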

Journal ArticleDOI
TL;DR: It is shown that the most accurate characterizations are achieved by using prior knowledge of where to expect neurodegeneration (hippocampus and parahippocampal gyrus) and that feature selection does improve the classification accuracies, but the improvement depends on the method adopted.

Proceedings ArticleDOI
16 Jun 2012
TL;DR: This paper shows that learning more adaptive receptive fields increases performance even with a significantly smaller codebook size at the coding layer, and adopts the idea of over-completeness to learn the optimal pooling parameters.
Abstract: In this paper we examine the effect of receptive field designs on classification accuracy in the commonly adopted pipeline of image classification. While existing algorithms usually use manually defined spatial regions for pooling, we show that learning more adaptive receptive fields increases performance even with a significantly smaller codebook size at the coding layer. To learn the optimal pooling parameters, we adopt the idea of over-completeness by starting with a large number of receptive field candidates, and train a classifier with structured sparsity to only use a sparse subset of all the features. An efficient algorithm based on incremental feature selection and retraining is proposed for fast learning. With this method, we achieve the best published performance on the CIFAR-10 dataset, using a much lower dimensional feature space than previous methods.

Journal ArticleDOI
TL;DR: Theoretically, it is shown that the constrained L0 likelihood and its computational surrogate are optimal in that they achieve feature selection consistency and sharp parameter estimation, under one necessary condition required for any method to be selection consistent and to achieve sharp parameter estimation.
Abstract: In high-dimensional data analysis, feature selection becomes one effective means for dimension reduction, which proceeds with parameter estimation. Concerning accuracy of selection and estimation, we study nonconvex constrained and regularized likelihoods in the presence of nuisance parameters. Theoretically, we show that the constrained L0 likelihood and its computational surrogate are optimal in that they achieve feature selection consistency and sharp parameter estimation, under one necessary condition required for any method to be selection consistent and to achieve sharp parameter estimation. It permits up to exponentially many candidate features. Computationally, we develop difference convex methods to implement the computational surrogate through primal and dual subproblems. These results establish a central role of L0 constrained and regularized likelihoods in feature selection and parameter estimation involving selection. As applications of the general method and theory, we perform feature selection...

Journal ArticleDOI
TL;DR: This article gives a selective review of group selection methods for variable selection, covering methodological developments, theoretical properties, and computational algorithms.
Abstract: Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome wide association studies. We also highlight some issues that require further study.

Journal ArticleDOI
TL;DR: This work assesses an alternative to MCMC based on a simple variational approximation to retain useful features of Bayesian variable selection at a reduced cost and illustrates how these results guide the use of variational inference for a genome-wide association study with thousands of samples and hundreds of thousands of variables.
Abstract: The Bayesian approach to variable selection in regression is a powerful tool for tackling many scientific problems. Inference for variable selection models is usually implemented using Markov chain Monte Carlo (MCMC). Because MCMC can impose a high computational cost in studies with a large number of variables, we assess an alternative to MCMC based on a simple variational approximation. Our aim is to retain useful features of Bayesian variable selection at a reduced cost. Using simulations designed to mimic genetic association studies, we show that this simple variational approximation yields posterior inferences in some settings that closely match exact values. In less restrictive (and more realistic) conditions, we show that posterior probabilities of inclusion for individual variables are often incorrect, but variational estimates of other useful quantities, including posterior distributions of the hyperparameters, are remarkably accurate. We illustrate how these results guide the use of variational inference for a genome-wide association study with thousands of samples and hundreds of thousands of variables.
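The coordinate-ascent form of such a spike-and-slab approximation is short: each variable keeps a posterior inclusion probability alpha_j and a conditional mean mu_j, updated in turn against the current residual. The sketch below follows the general form of these updates with the variance and prior log-odds hyperparameters held fixed, which is a simplification of the paper's setup.

```python
import numpy as np

def varbvs_sketch(X, y, sigma2=1.0, sb2=1.0, logodds=-3.0, iters=50):
    """Coordinate-ascent variational updates for spike-and-slab regression."""
    n, p = X.shape
    xx = (X ** 2).sum(axis=0)
    alpha = np.full(p, 0.5)                 # posterior inclusion probabilities
    mu = np.zeros(p)                        # posterior means given inclusion
    r = X @ (alpha * mu)                    # current fitted values
    for _ in range(iters):
        for j in range(p):
            r -= X[:, j] * alpha[j] * mu[j]           # remove j's contribution
            s2 = sigma2 / (xx[j] + 1.0 / sb2)         # posterior variance if included
            mu[j] = s2 / sigma2 * X[:, j] @ (y - r)
            lo = logodds + 0.5 * (np.log(s2 / (sb2 * sigma2)) + mu[j] ** 2 / s2)
            alpha[j] = 1.0 / (1.0 + np.exp(-lo))      # updated inclusion probability
            r += X[:, j] * alpha[j] * mu[j]           # add it back
    return alpha, mu
```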

Journal ArticleDOI
TL;DR: This study proposes a novel filter-based probabilistic feature selection method for text classification, namely the distinguishing feature selector (DFS), and compares it with well-known filter approaches including chi square, information gain, Gini index, and deviation from the Poisson distribution.
Abstract: High dimensionality of the feature space is one of the most important concerns in text classification problems due to processing time and accuracy considerations. Selection of distinctive features is therefore essential for text classification. This study proposes a novel filter based probabilistic feature selection method, namely distinguishing feature selector (DFS), for text classification. The proposed method is compared with well-known filter approaches including chi square, information gain, Gini index and deviation from Poisson distribution. The comparison is carried out for different datasets, classification algorithms, and success measures. Experimental results explicitly indicate that DFS offers a competitive performance with respect to the abovementioned approaches in terms of classification accuracy, dimension reduction rate and processing time.
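The filter pipeline being compared is standard: score terms, keep the top k, train a classifier. A toy sketch with chi-square scoring, one of the baselines named above; the corpus and the scikit-learn stack are placeholders, and DFS's own scoring function is not reproduced here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# toy corpus standing in for the paper's benchmark datasets
docs = ["win cash now", "meeting agenda attached", "cheap pills online",
        "project status update", "free lottery win", "quarterly report draft"]
labels = [1, 0, 1, 0, 1, 0]

# score terms with chi-square, keep the top 5, then train a classifier
pipe = make_pipeline(CountVectorizer(), SelectKBest(chi2, k=5), MultinomialNB())
pipe.fit(docs, labels)
print(pipe.predict(["free cash lottery"]))
```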

Journal ArticleDOI
TL;DR: The lightweight IDS has been developed by using a wrapper based feature selection algorithm that maximizes the specificity and sensitivity of the IDS as well as by employing a neural ensemble decision tree iterative procedure to evolve optimal features.
Abstract: The objective of this paper is to construct a lightweight Intrusion Detection System (IDS) aimed at detecting anomalies in networks. The crucial part of building a lightweight IDS depends on preprocessing of network data, identifying important features, and the design of an efficient learning algorithm that classifies normal and anomalous patterns. Therefore, in this work the design of the IDS is investigated from these three perspectives. The goals of this paper are (i) removing redundant instances so that the learning algorithm is not biased, (ii) identifying a suitable subset of features by employing a wrapper-based feature selection algorithm, and (iii) realizing the proposed IDS with a neurotree to achieve better detection accuracy. The lightweight IDS has been developed by using a wrapper-based feature selection algorithm that maximizes the specificity and sensitivity of the IDS, as well as by employing a neural ensemble decision tree iterative procedure to evolve optimal features. An extensive experimental evaluation of the proposed approach with a family of six decision tree classifiers, namely Decision Stump, C4.5, Naive Bayes Tree, Random Forest, Random Tree and Representative Tree models, is presented for the detection of anomalous network patterns.

Posted Content
TL;DR: TIGRESS (Trustful Inference of Gene REgulation using Stability Selection) formulates gene regulatory network inference as sparse regression and combines least angle regression (LARS) with stability selection, ranking among the top methods in the DREAM5 gene network reconstruction challenge.
Abstract: Inferring the structure of gene regulatory networks (GRN) from gene expression data has many applications, from the elucidation of complex biological processes to the identification of potential drug targets. It is however a notoriously difficult problem, for which the many existing methods reach limited accuracy. In this paper, we formulate GRN inference as a sparse regression problem and investigate the performance of a popular feature selection method, least angle regression (LARS) combined with stability selection. We introduce a novel, robust and accurate scoring technique for stability selection, which improves the performance of feature selection with LARS. The resulting method, which we call TIGRESS (Trustful Inference of Gene REgulation using Stability Selection), was ranked among the top methods in the DREAM5 gene network reconstruction challenge. We investigate in depth the influence of the various parameters of the method and show that a fine parameter tuning can lead to significant improvements and state-of-the-art performance for GRN inference. TIGRESS reaches state-of-the-art performance on benchmark data. This study confirms the potential of feature selection techniques for GRN inference. Code and data are available on this http URL. Running TIGRESS online is possible on GenePattern: this http URL.
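Stability selection with LARS reduces, per target gene, to counting how often each candidate regulator enters the LARS path across randomly reweighted half-samples. A simplified sketch of that scoring loop; TIGRESS's actual area-based scoring and per-gene aggregation are richer than this frequency count.

```python
import numpy as np
from sklearn.linear_model import lars_path

def stability_scores(X, y, n_resample=100, L=5, seed=0):
    """Frequency with which each regulator enters the first L LARS steps."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    freq = np.zeros(p)
    for _ in range(n_resample):
        idx = rng.choice(n, n // 2, replace=False)     # random half-sample
        w = rng.uniform(0.2, 1.0, p)                   # random feature reweighting
        _, _, coefs = lars_path(X[idx] * w, y[idx], max_iter=L)
        freq += (np.abs(coefs).max(axis=1) > 0)        # feature entered the path
    return freq / n_resample
```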