
Showing papers on "Feature selection" published in 2012


Journal ArticleDOI
TL;DR: A review of available methods for variable selection within Partial Least Squares Regression, one of the many modeling approaches for high-throughput data, aimed at clarifying the characteristics of the methods and providing a basis for selecting an appropriate method for one's own use.

1,180 citations


Journal Article
TL;DR: Overall it is concluded that the JMI criterion provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples.
Abstract: We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: "what are the implicit statistical assumptions of feature selection criteria based on mutual information?". To answer this, we adopt a different strategy than is usual in the feature selection literature--instead of trying to define a criterion, we derive one, directly from a clearly specified objective function: the conditional likelihood of the training labels. While many hand-designed heuristic criteria try to optimize a definition of feature 'relevancy' and 'redundancy', our approach leads to a probabilistic framework which naturally incorporates these concepts. As a result we can unify the numerous criteria published over the last two decades, and show them to be low-order approximations to the exact (but intractable) optimisation problem. The primary contribution is to show that common heuristics for information based feature selection (including Markov Blanket algorithms as a special case) are approximate iterative maximisers of the conditional likelihood. A large empirical study provides strong evidence to favour certain classes of criteria, in particular those that balance the relative size of the relevancy/redundancy terms. Overall we conclude that the JMI criterion (Yang and Moody, 1999; Meyer et al., 2008) provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples.

1,058 citations
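The greedy JMI rule is easy to state: having selected a set S, add the feature f maximizing the sum over s in S of I(X_f, X_s; Y). A minimal sketch for integer-coded (discrete) features follows; the plug-in MI estimator and the integer pair-encoding are our simplifications, not the authors' code.

```python
import numpy as np

def mutual_info(x, y):
    """Plug-in estimate of I(x; y) for integer-coded variables."""
    joint = np.zeros((int(x.max()) + 1, int(y.max()) + 1))
    np.add.at(joint, (x, y), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def jmi_select(X, y, k):
    """Greedy JMI: next feature maximizes sum_{s in S} I(X_f, X_s; y)."""
    p = X.shape[1]
    selected = [int(np.argmax([mutual_info(X[:, j], y) for j in range(p)]))]
    while len(selected) < k:
        scores = {}
        for f in set(range(p)) - set(selected):
            # encode the pair (X_f, X_s) as a single integer-valued variable
            scores[f] = sum(mutual_info(X[:, f] * (X[:, s].max() + 1) + X[:, s], y)
                            for s in selected)
        selected.append(max(scores, key=scores.get))
    return selected
```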


Posted Content
TL;DR: This work considers the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previous work that attacks these issues separately and shows classification performance often much better than using standard selection and hyperparameter optimization methods.
Abstract: Many different machine learning algorithms exist; taking into account each algorithm's hyperparameters, there is a staggeringly large number of possible alternatives overall. We consider the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previous work that addresses these issues in isolation. We show that this problem can be addressed by a fully automated approach, leveraging recent innovations in Bayesian optimization. Specifically, we consider a wide range of feature selection techniques (combining 3 search and 8 evaluator methods) and all classification approaches implemented in WEKA, spanning 2 ensemble methods, 10 meta-methods, 27 base classifiers, and hyperparameter settings for each classifier. On each of 21 popular datasets from the UCI repository, the KDD Cup 09, variants of the MNIST dataset and CIFAR-10, we show classification performance often much better than using standard selection/hyperparameter optimization methods. We hope that our approach will help non-expert users to more effectively identify machine learning algorithms and hyperparameter settings appropriate to their applications, and hence to achieve improved performance.

1,004 citations
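The combined algorithm selection and hyperparameter optimization problem can be illustrated compactly. The paper solves it with Bayesian optimization over WEKA's full model space; the sketch below substitutes plain random search over two scikit-learn classifiers, so both the search strategy and the space are stand-ins.

```python
import random
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# joint space: (algorithm, hyperparameter grid) pairs -- placeholders
SPACE = [
    (RandomForestClassifier, {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}),
    (SVC, {"C": [0.1, 1.0, 10.0], "gamma": ["scale", "auto"]}),
]

def cash_random_search(X, y, budget=20, seed=0):
    """Jointly sample an algorithm and its hyperparameters; keep the best CV score."""
    rng = random.Random(seed)
    best_cfg, best_score = None, -np.inf
    for _ in range(budget):
        algo, grid = rng.choice(SPACE)
        params = {k: rng.choice(v) for k, v in grid.items()}
        score = cross_val_score(algo(**params), X, y, cv=3).mean()
        if score > best_score:
            best_cfg, best_score = (algo.__name__, params), score
    return best_cfg, best_score
```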


Journal ArticleDOI
TL;DR: The FBCSP algorithm performed best among the submitted algorithms, yielding mean kappa values of 0.569 and 0.600 across all subjects in Datasets 2a and 2b of the BCI Competition IV.
Abstract: The Common Spatial Pattern (CSP) algorithm is an effective and popular method for classifying 2-class motor imagery electroencephalogram (EEG) data, but its effectiveness depends on the subject-specific frequency band. This paper presents the Filter Bank Common Spatial Pattern (FBCSP) algorithm to optimize the subject-specific frequency band for CSP on Datasets 2a and 2b of the Brain-Computer Interface (BCI) Competition IV. Dataset 2a comprised 4 classes of 22-channel EEG data from 9 subjects, and Dataset 2b comprised 2 classes of 3 bipolar-channel EEG data from 9 subjects. Multi-class extensions to FBCSP are also presented to handle the 4-class EEG data in Dataset 2a, namely, Divide-and-Conquer (DC), Pair-Wise (PW), and One-Versus-Rest (OVR) approaches. Two feature selection algorithms are also presented to select discriminative CSP features on Dataset 2b, namely, the Mutual Information-based Best Individual Feature (MIBIF) algorithm, and the Mutual Information-based Rough Set Reduction (MIRSR) algorithm. The single-trial classification accuracies were presented using 10x10-fold cross-validations on the training data and session-to-session transfer on the evaluation data from both datasets. Disclosure of the test data labels after the BCI Competition IV showed that the FBCSP algorithm performed best among the submitted algorithms, yielding mean kappa values of 0.569 and 0.600 across all subjects in Datasets 2a and 2b respectively.

862 citations
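The core CSP step that FBCSP applies per filter-bank band reduces to a generalized eigendecomposition of the two class-average covariance matrices, followed by log-variance features. A bare-bones sketch under that reading (the band-pass filter bank and the MIBIF/MIRSR selection stages are omitted):

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_pairs=2):
    """Spatial filters for 2-class EEG; trials_*: (n_trials, n_channels, n_samples)."""
    def mean_cov(trials):
        covs = [t @ t.T / np.trace(t @ t.T) for t in trials]
        return np.mean(covs, axis=0)
    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    # generalized eigenproblem Ca w = lambda (Ca + Cb) w;
    # eigenvectors at both ends of the spectrum discriminate the classes best
    vals, vecs = eigh(Ca, Ca + Cb)
    order = np.argsort(vals)
    picks = np.r_[order[:n_pairs], order[-n_pairs:]]
    return vecs[:, picks].T  # rows are spatial filters

def log_var_features(W, trial):
    """Normalized log-variance of the spatially filtered signals."""
    v = (W @ trial).var(axis=1)
    return np.log(v / v.sum())
```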


Journal ArticleDOI
TL;DR: In this article, a sure independence screening procedure based on distance correlation (DC-SIS) is proposed for ultrahigh-dimensional data analysis; the procedure can be used directly to screen grouped predictor variables and multivariate response variables.
Abstract: This article is concerned with screening features in ultrahigh-dimensional data analysis, which has become increasingly important in diverse scientific fields. We develop a sure independence screening procedure based on the distance correlation (DC-SIS). The DC-SIS can be implemented as easily as the sure independence screening (SIS) procedure based on the Pearson correlation proposed by Fan and Lv. However, the DC-SIS can significantly improve the SIS. Fan and Lv established the sure screening property for the SIS based on linear models, but the sure screening property is valid for the DC-SIS under more general settings, including linear models. Furthermore, the implementation of the DC-SIS does not require model specification (e.g., linear model or generalized linear model) for responses or predictors. This is a very appealing property in ultrahigh-dimensional data analysis. Moreover, the DC-SIS can be used directly to screen grouped predictor variables and multivariate response variables. We establish ...

641 citations
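Because the screening statistic is simply the marginal distance correlation between each predictor and the response, a compact sketch is possible; the O(n^2) pairwise-distance estimator below is the textbook one, not the authors' implementation.

```python
import numpy as np

def dist_corr(x, y):
    """Empirical distance correlation between two 1-d samples."""
    def doubly_centered(a):
        d = np.abs(a[:, None] - a[None, :])
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    A, B = doubly_centered(x), doubly_centered(y)
    dcov2 = max((A * B).mean(), 0.0)          # guard tiny negative round-off
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(dcov2 / denom) if denom > 0 else 0.0

def dc_sis(X, y, d):
    """Keep the d predictors with the largest distance correlation with y."""
    scores = np.array([dist_corr(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:d]
```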


Posted Content
TL;DR: This paper proposes to accelerate the computation of the l2,1-norm regularized regression model by reformulating it as two equivalent smooth convex optimization problems, which are then solved via Nesterov's method, an optimal first-order black-box method for smooth convex optimization.
Abstract: The problem of joint feature selection across a group of related tasks has applications in many areas including biomedical informatics and computer vision. We consider the l2,1-norm regularized regression model for joint feature selection from multiple tasks, which can be derived in the probabilistic framework by assuming a suitable prior from the exponential family. One appealing feature of the l2,1-norm regularization is that it encourages multiple predictors to share similar sparsity patterns. However, the resulting optimization problem is challenging to solve due to the non-smoothness of the l2,1-norm regularization. In this paper, we propose to accelerate the computation by reformulating it as two equivalent smooth convex optimization problems which are then solved via Nesterov's method, an optimal first-order black-box method for smooth convex optimization. A key building block in solving the reformulations is the Euclidean projection. We show that the Euclidean projection for the first reformulation can be analytically computed, while the Euclidean projection for the second one can be computed in linear time. Empirical evaluations on several data sets verify the efficiency of the proposed algorithms.

630 citations
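The key computational facts are that the smooth loss has an easy gradient and the l2,1 penalty has a closed-form proximal operator (row-wise group soft thresholding). A sketch of an accelerated proximal-gradient solver in that spirit; the paper's actual reformulations and Euclidean projections differ.

```python
import numpy as np

def prox_l21(W, t):
    """Row-wise group soft thresholding: the proximal operator of t * ||W||_{2,1}."""
    norms = np.maximum(np.linalg.norm(W, axis=1, keepdims=True), 1e-12)
    return W * np.maximum(0.0, 1.0 - t / norms)

def l21_multitask(X, Y, lam, iters=200):
    """Accelerated proximal gradient for min_W ||XW - Y||_F^2 + lam * ||W||_{2,1}."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    Z, theta = W.copy(), 1.0
    L = 2 * np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    for _ in range(iters):
        W_new = prox_l21(Z - 2 * X.T @ (X @ Z - Y) / L, lam / L)
        theta_new = (1 + np.sqrt(1 + 4 * theta ** 2)) / 2
        Z = W_new + (theta - 1) / theta_new * (W_new - W)
        W, theta = W_new, theta_new
    return W
```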


Journal ArticleDOI
01 Jun 2012 - Genomics
TL;DR: This article systematically reviews the applications and recent progress of RF for genomic data, including prediction and classification, variable selection, pathway analysis, genetic association and epistasis detection, and unsupervised learning.

625 citations
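For the variable-selection use case, ranking features by RF importance takes only a few lines; the scikit-learn implementation and impurity-based importances below are our choice of illustration, while the review also covers permutation importance and R packages.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rf_top_features(X, y, k=20, seed=0):
    """Rank features by impurity-based RF importance and keep the top k."""
    rf = RandomForestClassifier(n_estimators=500, random_state=seed).fit(X, y)
    return np.argsort(rf.feature_importances_)[::-1][:k]
```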


Journal ArticleDOI
TL;DR: This paper proposes a general methodology, namely multi-modal multi-task (M3T) learning, to jointly predict multiple variables from multi-modal data, which can achieve better performance on both regression and classification tasks than conventional learning methods.

580 citations


Journal ArticleDOI
TL;DR: This survey focuses on filter feature selection methods for informative feature discovery in gene expression microarray (GEM) analysis, which is also known as differentially expressed genes (DEGs) discovery, gene prioritization, or biomarker discovery, and presents them in a unified framework.
Abstract: A plenitude of feature selection (FS) methods is available in the literature, most of them arising from the need to analyze data of very high dimension, usually hundreds or thousands of variables. Such data sets are now available in various application areas like combinatorial chemistry, text mining, multivariate imaging, or bioinformatics. As a generally accepted rule, these methods are grouped into filters, wrappers, and embedded methods. More recently, a new group of methods has been added to the general framework of FS: ensemble techniques. The focus in this survey is on filter feature selection methods for informative feature discovery in gene expression microarray (GEM) analysis, which is also known as differentially expressed genes (DEGs) discovery, gene prioritization, or biomarker discovery. We present them in a unified framework, using standardized notations in order to reveal their technical details and to highlight their common characteristics as well as their particularities.

500 citations


Book ChapterDOI
01 Jan 2012
TL;DR: The Random Forest technique, which includes an ensemble of decision trees and incorporates feature selection and interactions naturally in the learning process, is a popular choice because it is nonparametric, interpretable, efficient, and has high prediction accuracy for many types of data.
Abstract: Modern biology has experienced an increased use of machine learning techniques for large scale and complex biological data analysis. In the area of Bioinformatics, the Random Forest (RF) [6] technique, which includes an ensemble of decision trees and incorporates feature selection and interactions naturally in the learning process, is a popular choice. It is nonparametric, interpretable, efficient, and has high prediction accuracy for many types of data. Recent work in computational biology has seen an increased use of RF, owing to its unique advantages in dealing with small sample size, high-dimensional feature space, and complex data structures.

497 citations


Proceedings Article
22 Jul 2012
TL;DR: A new unsupervised learning algorithm, namely Nonnegative Discriminative Feature Selection (NDFS), which exploits the discriminative information and feature correlation simultaneously to select a better feature subset.
Abstract: In this paper, a new unsupervised learning algorithm, namely Nonnegative Discriminative Feature Selection (NDFS), is proposed. To exploit the discriminative information in unsupervised scenarios, we perform spectral clustering to learn the cluster labels of the input samples, during which the feature selection is performed simultaneously. The joint learning of the cluster labels and feature selection matrix enables NDFS to select the most discriminative features. To learn more accurate cluster labels, a nonnegative constraint is explicitly imposed on the class indicators. To reduce the redundant or even noisy features, an l2,1-norm minimization constraint is added into the objective function, which guarantees that the feature selection matrix is sparse in rows. Our algorithm exploits the discriminative information and feature correlation simultaneously to select a better feature subset. A simple yet efficient iterative algorithm is designed to optimize the proposed objective function. Experimental results on different real-world datasets demonstrate the encouraging performance of our algorithm over state-of-the-art methods.

Journal ArticleDOI
TL;DR: A novel nearest neighbor-based feature weighting algorithm, which learns a feature weighting vector by maximizing the expected leave-one-out classification accuracy with a regularization term, is proposed.
Abstract: Feature selection is of considerable importance in data mining and machine learning, especially for high dimensional data. In this paper, we propose a novel nearest neighbor-based feature weighting algorithm, which learns a feature weighting vector by maximizing the expected leave-one-out classification accuracy with a regularization term. The algorithm makes no parametric assumptions about the distribution of the data and scales naturally to multiclass problems. Experiments conducted on artificial and real data sets demonstrate that the proposed algorithm is largely insensitive to the increase in the number of irrelevant features and performs better than the state-of-the-art methods in most cases.

Journal ArticleDOI
TL;DR: Empirical results show that the selected, reduced attribute set yields an IDS that is efficient and effective for network intrusion detection.

Journal Article
TL;DR: This work introduces a framework for feature selection based on dependence maximization between the selected features and the labels of an estimation problem, using the Hilbert-Schmidt Independence Criterion, and shows that a number of existing feature selectors are special cases of this framework.
Abstract: We introduce a framework for feature selection based on dependence maximization between the selected features and the labels of an estimation problem, using the Hilbert-Schmidt Independence Criterion. The key idea is that good features should be highly dependent on the labels. Our approach leads to a greedy procedure for feature selection. We show that a number of existing feature selectors are special cases of this framework. Experiments on both artificial and real-world data show that our feature selector works well in practice.
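A sketch of the forward-greedy variant of this procedure: score candidate subsets by the biased HSIC estimator tr(KHLH)/(n-1)^2, with a Gaussian kernel on features and a linear kernel on binary +/-1 labels. The kernel choices and the forward direction are assumptions; the paper also covers backward elimination.

```python
import numpy as np

def hsic(K, L):
    """Biased HSIC estimate tr(KHLH) / (n-1)^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def rbf_kernel(X, gamma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def hsic_forward_select(X, y, k):
    """Greedily add the feature that most increases HSIC with the labels."""
    L = np.outer(y, y).astype(float)           # linear kernel on +/-1 labels
    selected = []
    for _ in range(k):
        rest = [f for f in range(X.shape[1]) if f not in selected]
        scores = [hsic(rbf_kernel(X[:, selected + [f]]), L) for f in rest]
        selected.append(rest[int(np.argmax(scores))])
    return selected
```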

Journal ArticleDOI
TL;DR: The core idea is to enlarge the distance between different classes under the conceptual framework of LSR, and a technique called ε-dragging is introduced to force the regression targets of different classes to move along opposite directions so that the distances between classes are enlarged.
Abstract: This paper presents a framework of discriminative least squares regression (LSR) for multiclass classification and feature selection. The core idea is to enlarge the distance between different classes under the conceptual framework of LSR. First, a technique called ε-dragging is introduced to force the regression targets of different classes to move along opposite directions such that the distances between classes can be enlarged. Then, the ε-draggings are integrated into the LSR model for multiclass classification. Our learning framework, referred to as discriminative LSR, has a compact model form, where there is no need to train two-class machines that are independent of each other. With its compact form, this model can be naturally extended for feature selection. This goal is achieved via the L2,1 matrix norm, generating a sparse learning model for feature selection. The model for multiclass classification and its extension for feature selection are finally solved elegantly and efficiently. Experimental evaluation over a range of benchmark datasets indicates the validity of our method.
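One plausible reading of the alternating scheme: with one-hot targets Y and dragging directions B (+1 for the true class, -1 otherwise), iterate between a ridge-regression solve for W given the relaxed targets T = Y + B * M and a closed-form nonnegative update of M. A sketch under that reading, not the released code:

```python
import numpy as np

def dlsr_fit(X, y, lam=1.0, iters=20):
    """Alternate a ridge solve for W with a closed-form nonnegative update of M."""
    n, p = X.shape
    C = int(y.max()) + 1
    Y = np.eye(C)[y]                          # one-hot class targets
    B = np.where(Y == 1, 1.0, -1.0)           # drag true class up, others down
    M = np.zeros((n, C))                      # nonnegative dragging amounts
    Xb = np.hstack([X, np.ones((n, 1))])      # absorb the bias into W
    for _ in range(iters):
        T = Y + B * M                          # relaxed regression targets
        W = np.linalg.solve(Xb.T @ Xb + lam * np.eye(p + 1), Xb.T @ T)
        M = np.maximum(B * (Xb @ W - Y), 0.0)  # closed-form update for M
    return W

def dlsr_predict(X, W):
    return (np.hstack([X, np.ones((len(X), 1))]) @ W).argmax(axis=1)
```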

Proceedings ArticleDOI
01 Jan 2012
TL;DR: This work proposes a novel method for re-identification that learns a selection and weighting of mid-level semantic attributes to describe people, an attribute-centric, parts-based feature representation that differs from and complements existing low-level features that rely purely on bottom-up statistics for feature selection.
Abstract: Visually identifying a target individual reliably in a crowded environment observed by a distributed camera network is critical to a variety of tasks in managing business information, border control, and crime prevention. Automatic re-identification of a human candidate from public space CCTV video is challenging due to spatiotemporal visual feature variations and strong visual similarity between different people, compounded by low-resolution and poor quality video data. In this work, we propose a novel method for re-identification that learns a selection and weighting of mid-level semantic attributes to describe people. Specifically, the model learns an attribute-centric, parts-based feature representation. This differs from and complements existing low-level features for re-identification that rely purely on bottom-up statistics for feature selection, which are limited in reliably discriminating and identifying the visual appearance of target people appearing in different camera views under certain degrees of occlusion due to crowdedness. Our experiments demonstrate the effectiveness of our approach compared to existing feature representations when applied to benchmarking datasets.

Proceedings ArticleDOI
22 Aug 2012
TL;DR: The wrapper approach combines the exploratory power of the bats with the speed of the Optimum-Path Forest classifier to find the set of features that maximizes accuracy on a validation set.
Abstract: Feature selection aims to find the most important information in a given set of features. As this task can be cast as an optimization problem, the combinatorial growth of the space of possible solutions can make an exhaustive search infeasible. In this paper we propose a new nature-inspired feature selection technique based on the behaviour of bats, which had not previously been applied in this context. The wrapper approach combines the exploratory power of the bats with the speed of the Optimum-Path Forest classifier to find the set of features that maximizes accuracy on a validation set. Experiments conducted on five public datasets demonstrate that the proposed approach can outperform some well-known swarm-based techniques.
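A rough wrapper sketch in the same spirit: binary masks evolve by a frequency-weighted velocity update and sigmoid binarization, and each mask is scored by cross-validated accuracy. The paper pairs the bat dynamics with an Optimum-Path Forest classifier; a k-NN stands in here, and the update rule is a common binary-swarm simplification rather than the paper's exact dynamics.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y):
    """Wrapper objective: CV accuracy of a k-NN on the masked features."""
    if not mask.any():
        return 0.0
    return cross_val_score(KNeighborsClassifier(3), X[:, mask], y, cv=3).mean()

def bat_select(X, y, n_bats=10, iters=30, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pos = rng.random((n_bats, d)) > 0.5        # binary positions = feature masks
    vel = np.zeros((n_bats, d))
    fit = np.array([fitness(p, X, y) for p in pos])
    best = pos[fit.argmax()].copy()
    for _ in range(iters):
        freq = rng.random((n_bats, 1))          # per-bat pulse frequency
        vel += (pos.astype(float) - best) * freq
        cand = 1 / (1 + np.exp(-vel)) > rng.random((n_bats, d))  # sigmoid binarize
        for i in range(n_bats):
            f = fitness(cand[i], X, y)
            if f >= fit[i]:                     # greedy acceptance
                pos[i], fit[i] = cand[i], f
        best = pos[fit.argmax()].copy()
    return np.flatnonzero(best)
```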

Journal ArticleDOI
TL;DR: In this article, the authors consider situations where they are not only interested in sparsity, but where some structural prior knowledge is available as well, and show that the $\ell_1$-norm can then be extended to structured norms built on either disjoint or overlapping groups of variables.
Abstract: Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. While naturally cast as a combinatorial optimization problem, variable or feature selection admits a convex relaxation through the regularization by the $\ell_1$-norm. In this paper, we consider situations where we are not only interested in sparsity, but where some structural prior knowledge is available as well. We show that the $\ell_1$-norm can then be extended to structured norms built on either disjoint or overlapping groups of variables, leading to a flexible framework that can deal with various structures. We present applications to unsupervised learning, for structured sparse principal component analysis and hierarchical dictionary learning, and to supervised learning in the context of non-linear variable selection.

Journal ArticleDOI
TL;DR: In this article, the effects of the all-combinations model-building strategy, ad hoc strategies, and model averaging on parameter estimates and variable selection were investigated for the Cormack-Jolly-Seber data type.
Abstract: One challenge an analyst often encounters when dealing with complex mark–recapture models is how to limit the number of a priori models. While all possible combinations of model structures on the different parameters (e.g., ϕ, p) can be considered, such a strategy often results in a burdensome number of models, leading to the use of ad hoc strategies to reduce the number of models constructed. For the Cormack–Jolly–Seber data type, one example of an ad hoc strategy is to hold a general ϕ model structure constant while investigating model structures on p, and then to hold the resulting best structure on p constant and investigate structures on ϕ. Many comparable strategies exist. The effect of following ad hoc strategies on parameter estimates as well as for variable selection and whether model averaging can ameliorate any problems are unknown. By means of a simulation study, we have investigated this informational gap by comparing the all-combinations model building strategy with two ad hoc strategies and with truth, as well as considering the results of model averaging. We found that model selection strategy had little effect on parameter estimator bias and precision and that model averaging did improve bias and precision slightly. In terms of variable selection (i.e., cumulative Akaike’s information criterion weights), model sets based on ad hoc strategies did not perform as well as those based on all combinations, as less important variables often had higher weights with the former than with the all possible combinations strategy. Increased sample size resulted in increased variable weights, with an infinite sample size resulting in all variable weights equaling 1 for variables with any predictive influence. Thus, the distinction between statistical importance (dependent on sample size) and biological importance must be recognized when utilizing cumulative weights. We recommend that all-combinations model strategy and model averaging be used. However, if an ad hoc strategy is relied upon to reduce the computational demand, parameter estimates will generally be comparable to the all-combinations strategy, but variable weights will not correspond to the all-combinations strategy.
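The cumulative variable weights at issue are simple to compute: convert each model's AIC to an Akaike weight, then sum the weights of all models containing the variable. A small worked sketch with made-up models and AIC values (the model set and numbers are placeholders, not from the paper):

```python
import numpy as np

def akaike_weights(aic):
    """Akaike weights: w_i proportional to exp(-0.5 * delta AIC_i)."""
    delta = np.asarray(aic, dtype=float) - np.min(aic)
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# hypothetical model set: (variables in the model, AIC)
models = [({"phi_sex", "p_time"}, 102.3), ({"phi_sex"}, 104.1), ({"p_time"}, 101.9)]
w = akaike_weights([aic for _, aic in models])
cumulative = {v: sum(wi for (vars_, _), wi in zip(models, w) if v in vars_)
              for v in {"phi_sex", "p_time"}}
print(cumulative)   # cumulative weight = summed weights of models containing v
```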

Proceedings ArticleDOI
Xudong Cao1, Yichen Wei1, Fang Wen1, Jian Sun1
16 Jun 2012
TL;DR: This paper presents a very efficient, highly accurate, “Explicit Shape Regression” approach for face alignment that significantly outperforms the state-of-the-art in terms of both accuracy and efficiency.
Abstract: We present a very efficient, highly accurate, “Explicit Shape Regression” approach for face alignment. Unlike previous regression-based approaches, we directly learn a vectorial regression function to infer the whole facial shape (a set of facial landmarks) from the image and explicitly minimize the alignment errors over the training data. The inherent shape constraint is naturally encoded into the regressor in a cascaded learning framework and applied from coarse to fine during the test, without using a fixed parametric shape model as in most previous methods. To make the regression more effective and efficient, we design a two-level boosted regression, shape-indexed features and a correlation-based feature selection method. This combination enables us to learn accurate models from large training data in a short time (20 minutes for 2,000 training images), and run regression extremely fast in test (15 ms for an 87-landmark shape). Experiments on challenging data show that our approach significantly outperforms the state-of-the-art in terms of both accuracy and efficiency.

Proceedings ArticleDOI
12 Aug 2012
TL;DR: This paper proposes a Robust Multi-Task Feature Learning algorithm (rMTFL) which simultaneously captures a common set of features among relevant tasks and identifies outlier tasks, and provides a detailed theoretical analysis on the proposed rMTFL formulation.
Abstract: Multi-task learning (MTL) aims to improve the performance of multiple related tasks by exploiting the intrinsic relationships among them. Recently, multi-task feature learning algorithms have received increasing attention and they have been successfully applied to many applications involving high dimensional data. However, they assume that all tasks share a common set of features, which is too restrictive and may not hold in real-world applications, since outlier tasks often exist. In this paper, we propose a Robust Multi-Task Feature Learning algorithm (rMTFL) which simultaneously captures a common set of features among relevant tasks and identifies outlier tasks. Specifically, we decompose the weight (model) matrix for all tasks into two components. We impose the well-known group Lasso penalty on row groups of the first component for capturing the shared features among relevant tasks. To simultaneously identify the outlier tasks, we impose the same group Lasso penalty but on column groups of the second component. We propose to employ the accelerated gradient descent to efficiently solve the optimization problem in rMTFL, and show that the proposed algorithm is scalable to large-size problems. In addition, we provide a detailed theoretical analysis on the proposed rMTFL formulation. Specifically, we present a theoretical bound to measure how well our proposed rMTFL approximates the true evaluation, and provide bounds to measure the error between the estimated weights of rMTFL and the underlying true weights. Moreover, by assuming that the underlying true weights are above the noise level, we present a sound theoretical result to show how to obtain the underlying true shared features and outlier tasks (sparsity patterns). Empirical studies on both synthetic and real-world data demonstrate that our proposed rMTFL is capable of simultaneously capturing shared features among tasks and identifying outlier tasks.
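The decomposition is the heart of the method: W = P + Q, with a row-group penalty on P (shared features) and a column-group penalty on Q (outlier tasks). A plain proximal-gradient sketch of that structure; the paper uses an accelerated gradient method, which we omit here for brevity.

```python
import numpy as np

def group_prox(W, t, axis):
    """Soft threshold the rows (axis=1) or columns (axis=0) of W as groups."""
    norms = np.maximum(np.linalg.norm(W, axis=axis, keepdims=True), 1e-12)
    return W * np.maximum(0.0, 1.0 - t / norms)

def rmtfl(X, Y, lam1, lam2, iters=300):
    p, m = X.shape[1], Y.shape[1]
    P, Q = np.zeros((p, m)), np.zeros((p, m))
    step = 1.0 / (4 * np.linalg.norm(X, 2) ** 2)   # joint Lipschitz bound
    for _ in range(iters):
        G = 2 * X.T @ (X @ (P + Q) - Y)            # shared gradient of the loss
        P = group_prox(P - step * G, step * lam1, axis=1)  # rows: shared features
        Q = group_prox(Q - step * G, step * lam2, axis=0)  # cols: outlier tasks
    return P, Q
```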

Proceedings ArticleDOI
TL;DR: This paper proposes efficient algorithms for group sparse optimization with mixed l2,1-regularization, which arises from the reconstruction of group sparse signals in compressive sensing, and the group Lasso problem in statistics and machine learning.
Abstract: This paper proposes efficient algorithms for group sparse optimization with mixed l2,1-regularization, which arises from the reconstruction of group sparse signals in compressive sensing, and the group Lasso problem in statistics and machine learning. It is known that encoding the group information in addition to sparsity can often lead to better signal recovery/feature selection. The l2,1-regularization promotes group sparsity, but the resulting problem, due to the mixed-norm structure and possible grouping irregularity, is considered more difficult to solve than the conventional l1-regularized problem. Our approach is based on a variable splitting strategy and the classic alternating direction method (ADM). Two algorithms are presented, one derived from the primal and the other from the dual of the l2,1-regularized problem. The convergence of the proposed algorithms is guaranteed by the existing ADM theory. General group configurations such as overlapping groups and incomplete covers can be easily handled by our approach. Computational results show that on random problems the proposed ADM algorithms exhibit good efficiency, and strong stability and robustness.
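For disjoint groups, the splitting x = z makes both ADM subproblems easy: a linear solve for x and per-group soft thresholding for z. A minimal primal-ADM sketch under that simplification (the paper's algorithms also handle overlapping groups and a dual variant):

```python
import numpy as np

def group_shrink(v, t, groups):
    """Per-group soft thresholding (prox of t * sum_g ||x_g||_2)."""
    z = v.copy()
    for g in groups:
        nrm = np.linalg.norm(v[g])
        z[g] = 0.0 if nrm <= t else (1 - t / nrm) * v[g]
    return z

def admm_group_lasso(A, b, lam, groups, rho=1.0, iters=300):
    """Split x = z; alternate a linear solve, a shrinkage, and a dual update."""
    p = A.shape[1]
    x, z, u = np.zeros(p), np.zeros(p), np.zeros(p)
    M = np.linalg.inv(2 * A.T @ A + rho * np.eye(p))   # cached x-update system
    Atb2 = 2 * A.T @ b
    for _ in range(iters):
        x = M @ (Atb2 + rho * (z - u))
        z = group_shrink(x + u, lam / rho, groups)
        u += x - z
    return z
```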

Journal ArticleDOI
TL;DR: It is shown that the most accurate characterizations are achieved by using prior knowledge of where to expect neurodegeneration (hippocampus and parahippocampal gyrus) and that feature selection does improve the classification accuracies, but the improvement depends on the method adopted.

Proceedings ArticleDOI
16 Jun 2012
TL;DR: This paper shows that learning more adaptive receptive fields increases performance even with a significantly smaller codebook size at the coding layer, and adopts the idea of over-completeness to learn the optimal pooling parameters.
Abstract: In this paper we examine the effect of receptive field designs on classification accuracy in the commonly adopted pipeline of image classification. While existing algorithms usually use manually defined spatial regions for pooling, we show that learning more adaptive receptive fields increases performance even with a significantly smaller codebook size at the coding layer. To learn the optimal pooling parameters, we adopt the idea of over-completeness by starting with a large number of receptive field candidates, and train a classifier with structured sparsity to only use a sparse subset of all the features. An efficient algorithm based on incremental feature selection and retraining is proposed for fast learning. With this method, we achieve the best published performance on the CIFAR-10 dataset, using a much lower dimensional feature space than previous methods.

Journal ArticleDOI
TL;DR: Theoretically, it is shown that the constrained L0 likelihood and its computational surrogate are optimal in that they achieve feature selection consistency and sharp parameter estimation, under one necessary condition required for any method to be selection consistent and to achieve sharp parameter estimation.
Abstract: In high-dimensional data analysis, feature selection becomes one effective means for dimension reduction, which proceeds with parameter estimation. Concerning accuracy of selection and estimation, we study nonconvex constrained and regularized likelihoods in the presence of nuisance parameters. Theoretically, we show that the constrained L0 likelihood and its computational surrogate are optimal in that they achieve feature selection consistency and sharp parameter estimation, under one necessary condition required for any method to be selection consistent and to achieve sharp parameter estimation. It permits up to exponentially many candidate features. Computationally, we develop difference convex methods to implement the computational surrogate through primal and dual subproblems. These results establish a central role of L0 constrained and regularized likelihoods in feature selection and parameter estimation involving selection. As applications of the general method and theory, we perform feature selection...

Journal ArticleDOI
TL;DR: This article gives a selective review of group selection methods for variable selection, covering methodological developments, theoretical properties, and computational algorithms.
Abstract: Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome wide association studies. We also highlight some issues that require further study.

Journal ArticleDOI
TL;DR: This work assesses an alternative to MCMC based on a simple variational approximation to retain useful features of Bayesian variable selection at a reduced cost and illustrates how these results guide the use of variational inference for a genome-wide association study with thousands of samples and hundreds of thousands of variables.
Abstract: The Bayesian approach to variable selection in regression is a powerful tool for tackling many scientific problems. Inference for variable selection models is usually implemented using Markov chain Monte Carlo (MCMC). Because MCMC can impose a high computational cost in studies with a large number of variables, we assess an alternative to MCMC based on a simple variational approximation. Our aim is to retain useful features of Bayesian variable selection at a reduced cost. Using simulations designed to mimic genetic association studies, we show that this simple variational approximation yields posterior inferences in some settings that closely match exact values. In less restrictive (and more realistic) conditions, we show that posterior probabilities of inclusion for individual variables are often incorrect, but variational estimates of other useful quantities, including posterior distributions of the hyperparameters, are remarkably accurate. We illustrate how these results guide the use of variational inference for a genome-wide association study with thousands of samples and hundreds of thousands of variables.
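The coordinate-ascent form of such a spike-and-slab approximation is short: each variable keeps a posterior inclusion probability alpha_j and a conditional mean mu_j, updated in turn against the current residual. The sketch below follows the general form of these updates with the variance and prior log-odds hyperparameters held fixed, which is a simplification of the paper's setup.

```python
import numpy as np

def varbvs_sketch(X, y, sigma2=1.0, sb2=1.0, logodds=-3.0, iters=50):
    """Coordinate-ascent variational updates for spike-and-slab regression."""
    n, p = X.shape
    xx = (X ** 2).sum(axis=0)
    alpha = np.full(p, 0.5)                 # posterior inclusion probabilities
    mu = np.zeros(p)                        # posterior means given inclusion
    r = X @ (alpha * mu)                    # current fitted values
    for _ in range(iters):
        for j in range(p):
            r -= X[:, j] * alpha[j] * mu[j]           # remove j's contribution
            s2 = sigma2 / (xx[j] + 1.0 / sb2)         # posterior variance if included
            mu[j] = s2 / sigma2 * X[:, j] @ (y - r)
            lo = logodds + 0.5 * (np.log(s2 / (sb2 * sigma2)) + mu[j] ** 2 / s2)
            alpha[j] = 1.0 / (1.0 + np.exp(-lo))      # updated inclusion probability
            r += X[:, j] * alpha[j] * mu[j]           # add it back
    return alpha, mu
```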

Journal ArticleDOI
TL;DR: This study proposes a novel filter-based probabilistic feature selection method for text classification, namely the distinguishing feature selector (DFS), and compares it with well-known filter approaches including chi square, information gain, Gini index, and deviation from the Poisson distribution.
Abstract: High dimensionality of the feature space is one of the most important concerns in text classification problems due to processing time and accuracy considerations. Selection of distinctive features is therefore essential for text classification. This study proposes a novel filter based probabilistic feature selection method, namely distinguishing feature selector (DFS), for text classification. The proposed method is compared with well-known filter approaches including chi square, information gain, Gini index and deviation from Poisson distribution. The comparison is carried out for different datasets, classification algorithms, and success measures. Experimental results explicitly indicate that DFS offers a competitive performance with respect to the abovementioned approaches in terms of classification accuracy, dimension reduction rate and processing time.
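The filter pipeline being compared is standard: score terms, keep the top k, train a classifier. A toy sketch with chi-square scoring, one of the baselines named above; the corpus and the scikit-learn stack are placeholders, and DFS's own scoring function is not reproduced here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# toy corpus standing in for the paper's benchmark datasets
docs = ["win cash now", "meeting agenda attached", "cheap pills online",
        "project status update", "free lottery win", "quarterly report draft"]
labels = [1, 0, 1, 0, 1, 0]

# score terms with chi-square, keep the top 5, then train a classifier
pipe = make_pipeline(CountVectorizer(), SelectKBest(chi2, k=5), MultinomialNB())
pipe.fit(docs, labels)
print(pipe.predict(["free cash lottery"]))
```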

Journal ArticleDOI
TL;DR: The lightweight IDS has been developed by using a wrapper based feature selection algorithm that maximizes the specificity and sensitivity of the IDS as well as by employing a neural ensemble decision tree iterative procedure to evolve optimal features.
Abstract: The objective of this paper is to construct a lightweight Intrusion Detection System (IDS) aimed at detecting anomalies in networks. The crucial part of building a lightweight IDS depends on preprocessing of network data, identifying important features, and the design of an efficient learning algorithm that classifies normal and anomalous patterns. Therefore, in this work the design of the IDS is investigated from these three perspectives. The goals of this paper are (i) removing redundant instances so that the learning algorithm is not biased, (ii) identifying a suitable subset of features by employing a wrapper-based feature selection algorithm, and (iii) realizing the proposed IDS with a neurotree to achieve better detection accuracy. The lightweight IDS has been developed by using a wrapper-based feature selection algorithm that maximizes the specificity and sensitivity of the IDS, as well as by employing a neural ensemble decision tree iterative procedure to evolve optimal features. An extensive experimental evaluation of the proposed approach with a family of six decision tree classifiers, namely Decision Stump, C4.5, Naive Bayes Tree, Random Forest, Random Tree and Representative Tree models, is presented for the detection of anomalous network patterns.

Posted Content
TL;DR: TIGRESS (Trustful Inference of Gene REgulation using Stability Selection) formulates gene regulatory network inference as sparse regression and combines least angle regression (LARS) with stability selection, ranking among the top methods in the DREAM5 gene network reconstruction challenge.
Abstract: Inferring the structure of gene regulatory networks (GRN) from gene expression data has many applications, from the elucidation of complex biological processes to the identification of potential drug targets. It is however a notoriously difficult problem, for which the many existing methods reach limited accuracy. In this paper, we formulate GRN inference as a sparse regression problem and investigate the performance of a popular feature selection method, least angle regression (LARS) combined with stability selection. We introduce a novel, robust and accurate scoring technique for stability selection, which improves the performance of feature selection with LARS. The resulting method, which we call TIGRESS (Trustful Inference of Gene REgulation using Stability Selection), was ranked among the top methods in the DREAM5 gene network reconstruction challenge. We investigate in depth the influence of the various parameters of the method and show that a fine parameter tuning can lead to significant improvements and state-of-the-art performance for GRN inference. TIGRESS reaches state-of-the-art performance on benchmark data. This study confirms the potential of feature selection techniques for GRN inference. Code and data are available on this http URL. Running TIGRESS online is possible on GenePattern: this http URL.
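Stability selection with LARS reduces, per target gene, to counting how often each candidate regulator enters the LARS path across randomly reweighted half-samples. A simplified sketch of that scoring loop; TIGRESS's actual area-based scoring and per-gene aggregation are richer than this frequency count.

```python
import numpy as np
from sklearn.linear_model import lars_path

def stability_scores(X, y, n_resample=100, L=5, seed=0):
    """Frequency with which each regulator enters the first L LARS steps."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    freq = np.zeros(p)
    for _ in range(n_resample):
        idx = rng.choice(n, n // 2, replace=False)     # random half-sample
        w = rng.uniform(0.2, 1.0, p)                   # random feature reweighting
        _, _, coefs = lars_path(X[idx] * w, y[idx], max_iter=L)
        freq += (np.abs(coefs).max(axis=1) > 0)        # feature entered the path
    return freq / n_resample
```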