Showing papers on "Feature selection published in 2009"

PDF

Open Access

Journal Article•DOI•

Normalized Mutual Information Feature Selection

[...]

Pablo A. Estevez¹, M. Tesmer, Claudio A. Perez¹, Jacek M. Zurada²•Institutions (2)

University of Chile¹, University of Louisville²

01 Feb 2009-IEEE Transactions on Neural Networks

TL;DR: A filter method of feature selection based on mutual information, called normalized mutual information feature selection (NMIFS), is presented and is combined with a genetic algorithm to form a hybrid filter/wrapper method called GAMIFS.

...read moreread less

Abstract: A filter method of feature selection based on mutual information, called normalized mutual information feature selection (NMIFS), is presented. NMIFS is an enhancement over Battiti's MIFS, MIFS-U, and mRMR methods. The average normalized mutual information is proposed as a measure of redundancy among features. NMIFS outperformed MIFS, MIFS-U, and mRMR on several artificial and benchmark data sets without requiring a user-defined parameter. In addition, NMIFS is combined with a genetic algorithm to form a hybrid filter/wrapper method called GAMIFS. This includes an initialization procedure and a mutation operator based on NMIFS to speed up the convergence of the genetic algorithm. GAMIFS overcomes the limitations of incremental search algorithms that are unable to find dependencies between groups of features.

...read moreread less

989 citations

Journal Article•DOI•

A review of Bayesian variable selection methods: what, how and which

[...]

Robert B. O'Hara, Mikko J. Sillanpää

01 Mar 2009-Bayesian Analysis

TL;DR: The results suggest that SSVS, reversible jump MCMC and adaptive shrinkage methods can all work well, but the choice of which method is better will depend on the priors that are used, and also on how they are implemented.

...read moreread less

Abstract: The selection of variables in regression problems has occupied the minds of many statisticians. Several Bayesian variable selection methods have been developed, and we concentrate on the following methods: Kuo & Mallick, Gibbs Variable Selection (GVS), Stochastic Search Variable Selection (SSVS), adaptive shrinkage with Jeffreys' prior or a Laplacian prior, and reversible jump MCMC. We review these methods, in the context of their different properties. We then implement the methods in BUGS, using both real and simulated data as examples, and investigate how the different methods perform in practice. Our results suggest that SSVS, reversible jump MCMC and adaptive shrinkage methods can all work well, but the choice of which method is better will depend on the priors that are used, and also on how they are implemented.

...read moreread less

740 citations

Journal Article•DOI•

A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data

[...]

Bjoern H. Menze¹, B. Michael Kelm¹, Ralf Masuch, Uwe Himmelreich², Peter Bachert³, Wolfgang Petrich⁴, Wolfgang Petrich⁵, Fred A. Hamprecht⁵, Fred A. Hamprecht¹ - Show less +5 more•Institutions (5)

Interdisciplinary Center for Scientific Computing¹, Katholieke Universiteit Leuven², German Cancer Research Center³, Hoffmann-La Roche⁴, Heidelberg University⁵

10 Jul 2009-BMC Bioinformatics

TL;DR: The Gini importance of the random forest provided superior means for measuring feature relevance on spectral data, but – on an optimal subset of features – the regularized classifiers might be preferable over the random Forest classifier, in spite of their limitation to model linear dependencies only.

...read moreread less

Abstract: Regularized regression methods such as principal component or partial least squares regression perform well in learning tasks on high dimensional spectral data, but cannot explicitly eliminate irrelevant features. The random forest classifier with its associated Gini feature importance, on the other hand, allows for an explicit feature elimination, but may not be optimally adapted to spectral data due to the topology of its constituent classification trees which are based on orthogonal splits in feature space. We propose to combine the best of both approaches, and evaluated the joint use of a feature selection based on a recursive feature elimination using the Gini importance of random forests' together with regularized classification methods on spectral data sets from medical diagnostics, chemotaxonomy, biomedical analytics, food science, and synthetically modified spectral data. Here, a feature selection using the Gini feature importance with a regularized classification by discriminant partial least squares regression performed as well as or better than a filtering according to different univariate statistical tests, or using regression coefficients in a backward feature elimination. It outperformed the direct application of the random forest classifier, or the direct application of the regularized classifiers on the full set of features. The Gini importance of the random forest provided superior means for measuring feature relevance on spectral data, but – on an optimal subset of features – the regularized classifiers might be preferable over the random forest classifier, in spite of their limitation to model linear dependencies only. A feature selection based on Gini importance, however, may precede a regularized linear classification to identify this optimal subset of features, and to earn a double benefit of both dimensionality reduction and the elimination of noise from the classification task.

...read moreread less

726 citations

Journal Article•DOI•

L1 Penalized Estimation in the Cox Proportional Hazards Model

[...]

Jelle J. Goeman¹•Institutions (1)

Leiden University Medical Center¹

24 Nov 2009-Biometrical Journal

TL;DR: A novel algorithm that efficiently computes L1 penalized (lasso) estimates of parameters in high‐dimensional models, based on a combination of gradient ascent optimization with the Newton–Raphson algorithm, which is described for a general likelihood function.

...read moreread less

Abstract: This article presents a novel algorithm that efficiently computes L(1) penalized (lasso) estimates of parameters in high-dimensional models. The lasso has the property that it simultaneously performs variable selection and shrinkage, which makes it very useful for finding interpretable prediction rules in high-dimensional data. The new algorithm is based on a combination of gradient ascent optimization with the Newton-Raphson algorithm. It is described for a general likelihood function and can be applied in generalized linear models and other models with an L(1) penalty. The algorithm is demonstrated in the Cox proportional hazards model, predicting survival of breast cancer patients using gene expression data, and its performance is compared with competing approaches. An R package, penalized, that implements the method, is available on CRAN.

...read moreread less

719 citations

Journal Article•DOI•

Environmental Sound Recognition With Time–Frequency Audio Features

[...]

Selina Chu¹, Shrikanth S. Narayanan¹, C.-C.J. Kuo¹•Institutions (1)

University of Southern California¹

01 Aug 2009-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: An empirical feature analysis for audio environment characterization is performed and a matching pursuit algorithm is proposed to use to obtain effective time-frequency features to yield higher recognition accuracy for environmental sounds.

...read moreread less

Abstract: The paper considers the task of recognizing environmental sounds for the understanding of a scene or context surrounding an audio sensor. A variety of features have been proposed for audio recognition, including the popular Mel-frequency cepstral coefficients (MFCCs) which describe the audio spectral shape. Environmental sounds, such as chirpings of insects and sounds of rain which are typically noise-like with a broad flat spectrum, may include strong temporal domain signatures. However, only few temporal-domain features have been developed to characterize such diverse audio signals previously. Here, we perform an empirical feature analysis for audio environment characterization and propose to use the matching pursuit (MP) algorithm to obtain effective time-frequency features. The MP-based method utilizes a dictionary of atoms for feature selection, resulting in a flexible, intuitive and physically interpretable set of features. The MP-based feature is adopted to supplement the MFCC features to yield higher recognition accuracy for environmental sounds. Extensive experiments are conducted to demonstrate the effectiveness of these joint features for unstructured environmental sound classification, including listening tests to study human recognition capabilities. Our recognition system has shown to produce comparable performance as human listeners.

...read moreread less

626 citations

A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data

[...]

Bjoern H. Menze, Bernd Michael Kelm, Ralf Masuch, Uwe Himmerlreich, Peter Bachert, Wolfgang Petrich, Fred A. Hamprecht - Show less +3 more

01 Jul 2009

597 citations

Journal Article•DOI•

The composite absolute penalties family for grouped and hierarchical variable selection

[...]

Peng Zhao, Guilherme Rocha, Bin Yu

02 Sep 2009-arXiv: Statistics Theory

TL;DR: CAP is shown to improve on the predictive performance of the LASSO in a series of simulated experiments, including cases with $p\gg n$ and possibly mis-specified groupings, and iCAP is seen to be parsimonious in the experiments.

...read moreread less

Abstract: Extracting useful information from high-dimensional data is an important focus of today's statistical research and practice. Penalized loss function minimization has been shown to be effective for this task both theoretically and empirically. With the virtues of both regularization and sparsity, the $L_1$-penalized squared error minimization method Lasso has been popular in regression models and beyond. In this paper, we combine different norms including $L_1$ to form an intelligent penalty in order to add side information to the fitting of a regression or classification model to obtain reasonable estimates. Specifically, we introduce the Composite Absolute Penalties (CAP) family, which allows given grouping and hierarchical relationships between the predictors to be expressed. CAP penalties are built by defining groups and combining the properties of norm penalties at the across-group and within-group levels. Grouped selection occurs for nonoverlapping groups. Hierarchical variable selection is reached by defining groups with particular overlapping patterns. We propose using the BLASSO and cross-validation to compute CAP estimates in general. For a subfamily of CAP estimates involving only the $L_1$ and $L_{\infty}$ norms, we introduce the iCAP algorithm to trace the entire regularization path for the grouped selection problem. Within this subfamily, unbiased estimates of the degrees of freedom (df) are derived so that the regularization parameter is selected without cross-validation. CAP is shown to improve on the predictive performance of the LASSO in a series of simulated experiments, including cases with $p\gg n$ and possibly mis-specified groupings. When the complexity of a model is properly calculated, iCAP is seen to be parsimonious in the experiments.

...read moreread less

592 citations

Journal Article•DOI•

The composite absolute penalties family for grouped and hierarchical variable selection

[...]

Peng Zhao, Guilherme Rocha, Bin Yu

01 Dec 2009-Annals of Statistics

TL;DR: In this paper, the Composite Absolute Penalties (CAP) family is proposed to combine different norms including L1 to form an intelligent penalty in order to add side information to the fitting of a regression or classification model to obtain reasonable estimates.

...read moreread less

Abstract: Extracting useful information from high-dimensional data is an important focus of today’s statistical research and practice. Penalized loss function minimization has been shown to be effective for this task both theoretically and empirically. With the virtues of both regularization and sparsity, the L1-penalized squared error minimization method Lasso has been popular in regression models and beyond. In this paper, we combine different norms including L1 to form an intelligent penalty in order to add side information to the fitting of a regression or classification model to obtain reasonable estimates. Specifically, we introduce the Composite Absolute Penalties (CAP) family, which allows given grouping and hierarchical relationships between the predictors to be expressed. CAP penalties are built by defining groups and combining the properties of norm penalties at the across-group and within-group levels. Grouped selection occurs for nonoverlapping groups. Hierarchical variable selection is reached by defining groups with particular overlapping patterns. We propose using the BLASSO and cross-validation to compute CAP estimates in general. For a subfamily of CAP estimates involving only the L1 and L∞ norms, we introduce the iCAP algorithm to trace the entire regularization path for the grouped selection problem. Within this subfamily, unbiased estimates of the degrees of freedom (df) are derived so that the regularization parameter is selected without cross-validation. CAP is shown to improve on the predictive performance of the LASSO in a series of simulated experiments, including cases with p≫n and possibly mis-specified groupings. When the complexity of a model is properly calculated, iCAP is seen to be parsimonious in the experiments.

...read moreread less

572 citations

Journal Article•DOI•

High-dimensional variable selection

[...]

Larry Wasserman¹, Kathryn Roeder•Institutions (1)

Carnegie Mellon University¹

01 Jan 2009-Annals of Statistics

TL;DR: This paper looks at the error rates and power of some multi-stage regression methods and considers three screening methods: the lasso, marginal regression, and forward stepwise regression.

...read moreread less

Abstract: This paper explores the following question: what kind of statistical guarantees can be given when doing variable selection in high dimensional models? In particular, we look at the error rates and power of some multi-stage regression methods. In the first stage we fit a set of candidate models. In the second stage we select one model by cross-validation. In the third stage we use hypothesis testing to eliminate some variables. We refer to the first two stages as "screening" and the last stage as "cleaning." We consider three screening methods: the lasso, marginal regression, and forward stepwise regression. Our method gives consistent variable selection under certain conditions.

...read moreread less

570 citations

Journal Article•DOI•

Feature selection for text classification with Naïve Bayes

[...]

Jingnian Chen¹, Houkuan Huang¹, Shengfeng Tian¹, Youli Qu¹•Institutions (1)

Beijing Jiaotong University¹

01 Apr 2009-Expert Systems With Applications

TL;DR: Two feature evaluation metrics for the Naive Bayesian classifier applied on multi-class text datasets are presented: Multi-class Odds Ratio (MOR), and Class Discriminating Measure (CDM).

...read moreread less

Abstract: As an important preprocessing technology in text classification, feature selection can improve the scalability, efficiency and accuracy of a text classifier. In general, a good feature selection method should consider domain and algorithm characteristics. As the Naive Bayesian classifier is very simple and efficient and highly sensitive to feature selection, so the research of feature selection specially for it is significant. This paper presents two feature evaluation metrics for the Naive Bayesian classifier applied on multi-class text datasets: Multi-class Odds Ratio (MOR), and Class Discriminating Measure (CDM). Experiments of text classification with Naive Bayesian classifiers were carried out on two multi-class texts collections. As the results indicate, CDM and MOR gain obviously better selecting effect than other feature selection approaches.

...read moreread less

534 citations

Journal Article•DOI•

New Approaches to Fuzzy-Rough Feature Selection

[...]

Richard Jensen¹, Qiang Shen¹•Institutions (1)

Aberystwyth University¹

01 Aug 2009-IEEE Transactions on Fuzzy Systems

TL;DR: Three new approaches to fuzzy-rough FS-based on fuzzy similarity relations based on crisp discernibility matrices are proposed and utilized and initial experimentation shows that the methods greatly reduce dimensionality while preserving classification accuracy.

...read moreread less

Abstract: There has been great interest in developing methodologies that are capable of dealing with imprecision and uncertainty. The large amount of research currently being carried out in fuzzy and rough sets is representative of this. Many deep relationships have been established, and recent studies have concluded as to the complementary nature of the two methodologies. Therefore, it is desirable to extend and hybridize the underlying concepts to deal with additional aspects of data imperfection. Such developments offer a high degree of flexibility and provide robust solutions and advanced tools for data analysis. Fuzzy-rough set-based feature (FS) selection has been shown to be highly useful at reducing data dimensionality but possesses several problems that render it ineffective for large datasets. This paper proposes three new approaches to fuzzy-rough FS-based on fuzzy similarity relations. In particular, a fuzzy extension to crisp discernibility matrices is proposed and utilized. Initial experimentation shows that the methods greatly reduce dimensionality while preserving classification accuracy.

...read moreread less

Proceedings Article•

Multi-task feature learning via efficient l 2, 1 -norm minimization

[...]

Jun Liu¹, Shuiwang Ji¹, Jieping Ye¹•Institutions (1)

Arizona State University¹

18 Jun 2009

TL;DR: In this paper, the authors consider the l2, 1-norm regularized regression model for joint feature selection from multiple tasks, which can be derived in the probabilistic framework by assuming a suitable prior from the exponential family, and propose to accelerate the computation by reformulating it as two equivalent smooth convex optimization problems which are then solved via the Nesterov's method.

...read moreread less

Abstract: The problem of joint feature selection across a group of related tasks has applications in many areas including biomedical informatics and computer vision. We consider the l2, 1-norm regularized regression model for joint feature selection from multiple tasks, which can be derived in the probabilistic framework by assuming a suitable prior from the exponential family. One appealing feature of the l2, 1-norm regularization is that it encourages multiple predictors to share similar sparsity patterns. However, the resulting optimization problem is challenging to solve due to the non-smoothness of the l2, 1-norm regularization. In this paper, we propose to accelerate the computation by reformulating it as two equivalent smooth convex optimization problems which are then solved via the Nesterov's method---an optimal first-order black-box method for smooth convex optimization. A key building block in solving the reformulations is the Euclidean projection. We show that the Euclidean projection for the first reformulation can be analytically computed, while the Euclidean projection for the second one can be computed in linear time. Empirical evaluations on several data sets verify the efficiency of the proposed algorithms.

...read moreread less

Robust biomarker identification for cancer diagnosis using ensemble feature selection methods

[...]

Thomas Abeel, Thibault Helleputte, Yves Van de Peer, Pierre Dupont, Yvan Saeys - Show less +1 more

01 Jan 2009

TL;DR: It is shown that the robustness of SVMs for biomarker discovery can be substantially increased by using ensemble feature selection techniques, while at the same time improving upon classification performances.

...read moreread less

Abstract: Motivation: Biomarker discovery is an important topic in biomedical applications of computational biology, including applications such as gene and SNP selection from high dimensional data. Surprisingly, the stability with respect to sampling variation or robustness of such selection processes has received attention only recently. However, robustness of biomarkers is an important issue, as it may greatly influence subsequent biological validations. In addition, a more robust set of markers may strengthen the confidence of an expert in the results of a selection method. Results: Our first contribution is a general framework for the analysis of the robustness of a biomarker selection algorithm. Secondly, we conducted a large-scale analysis of the recently introduced concept of ensemble feature selection, where multiple feature selections are combined in order to increase the robustness of the final set of selected features. We focus on selection methods that are embedded in the estimation of support vector machines (SVMs). SVMs are powerful classification models that have shown state-ofthe-art performance on several diagnosis and prognosis tasks on biological data. Their feature selection extensions also offered good results for gene selection tasks. We show that the robustness of SVMs for biomarker discovery can be substantially increased by using ensemble feature selection techniques, while at the same time improving upon classification performances. The proposed methodology is evaluated on four microarray data sets showing increases of up to almost 30% in robustness of the selected biomarkers, along with an improvement of about 15% in classification performance. The stability improvement with ensemble methods is particularly noticeable for small signature sizes (a few tens of genes), which is most relevant for the design of a diagnosis or prognosis model from a gene signature.

...read moreread less

Proceedings Article•DOI•

More generality in efficient multiple kernel learning

[...]

Manik Varma¹, Bodla Rakesh Babu²•Institutions (2)

Microsoft¹, International Institute of Information Technology, Hyderabad²

14 Jun 2009

TL;DR: It is observed that existing MKL formulations can be extended to learn general kernel combinations subject to general regularization while retaining all the efficiency of existing large scale optimization algorithms.

...read moreread less

Abstract: Recent advances in Multiple Kernel Learning (MKL) have positioned it as an attractive tool for tackling many supervised learning tasks. The development of efficient gradient descent based optimization schemes has made it possible to tackle large scale problems. Simultaneously, MKL based algorithms have achieved very good results on challenging real world applications. Yet, despite their successes, MKL approaches are limited in that they focus on learning a linear combination of given base kernels.In this paper, we observe that existing MKL formulations can be extended to learn general kernel combinations subject to general regularization. This can be achieved while retaining all the efficiency of existing large scale optimization algorithms. To highlight the advantages of generalized kernel learning, we tackle feature selection problems on benchmark vision and UCI databases. It is demonstrated that the proposed formulation can lead to better results not only as compared to traditional MKL but also as compared to state-of-the-art wrapper and filter methods for feature selection.

...read moreread less

Journal Article•DOI•

Feature selection for multi-label naive Bayes classification

[...]

Min-Ling Zhang¹, José M. Peña², Víctor Robles²•Institutions (2)

Nanjing University¹, Technical University of Madrid²

01 Sep 2009-Information Sciences

TL;DR: This paper proposes a method called Mlnb which adapts the traditional naive Bayes classifiers to deal with multi-label instances and achieves comparable performance to other well-established multi- label learning algorithms.

...read moreread less

Journal Article•

Ultrahigh Dimensional Feature Selection: Beyond The Linear Model

[...]

Jianqing Fan¹, Richard J. Samworth², Yichao Wu²•Institutions (2)

Princeton University¹, University of Cambridge²

01 Dec 2009-Journal of Machine Learning Research

TL;DR: This paper extends ISIS, without explicit definition of residuals, to a general pseudo-likelihood framework, which includes generalized linear models as a special case and improves ISIS by allowing feature deletion in the iterative process.

...read moreread less

Abstract: Variable selection in high-dimensional space characterizes many contemporary problems in scientific discovery and decision making. Many frequently-used techniques are based on independence screening; examples include correlation ranking (Fan & Lv, 2008) or feature selection using a two-sample t-test in high-dimensional classification (Tibshirani et al., 2003). Within the context of the linear model, Fan & Lv (2008) showed that this simple correlation ranking possesses a sure independence screening property under certain conditions and that its revision, called iteratively sure independent screening (ISIS), is needed when the features are marginally unrelated but jointly related to the response variable. In this paper, we extend ISIS, without explicit definition of residuals, to a general pseudo-likelihood framework, which includes generalized linear models as a special case. Even in the least-squares setting, the new method improves ISIS by allowing feature deletion in the iterative process. Our technique allows us to select important features in high-dimensional classification where the popularly used two-sample t-method fails. A new technique is introduced to reduce the false selection rate in the feature screening stage. Several simulated and two real data examples are presented to illustrate the methodology.

...read moreread less

Journal Article•DOI•

Shrinkage tuning parameter selection with a diverging number of parameters

[...]

Hansheng Wang¹, Bo Li², Chenlei Leng³•Institutions (3)

Peking University¹, Tsinghua University², National University of Singapore³

01 Jun 2009-Journal of The Royal Statistical Society Series B-statistical Methodology

TL;DR: In this article, the authors further enlarge the scope of applicability of the traditional Bayesian information criterion type criteria to the situation with a diverging number of parameters for both unpenalized and penalized estimators.

...read moreread less

Abstract: Contemporary statistical research frequently deals with problems involving a diverging number of parameters. For those problems, various shrinkage methods (e.g. the lasso and smoothly clipped absolute deviation) are found to be particularly useful for variable selection. Nevertheless, the desirable performances of those shrinkage methods heavily hinge on an appropriate selection of the tuning parameters. With a fixed predictor dimension, Wang and co-worker have demonstrated that the tuning parameters selected by a Bayesian information criterion type criterion can identify the true model consistently. In this work, similar results are further extended to the situation with a diverging number of parameters for both unpenalized and penalized estimators. Consequently, our theoretical results further enlarge not only the scope of applicability of the traditional Bayesian information criterion type criteria but also that of those shrinkage estimation methods.

...read moreread less

Journal Article•DOI•

A wrapper method for feature selection using Support Vector Machines

[...]

Sebastián Maldonado¹, Richard Weber¹•Institutions (1)

University of Chile¹

01 Jun 2009-Information Sciences

TL;DR: A novel wrapper Algorithm for Feature Selection, using Support Vector Machines with kernel functions, based on a sequential backward selection, using the number of errors in a validation subset as the measure to decide which feature to remove in each iteration.

...read moreread less

Journal Article•DOI•

Disease state prediction from resting state functional connectivity

[...]

R. Cameron Craddock¹, R. Cameron Craddock², Paul E. Holtzheimer², Xiaoping Hu¹, Helen S. Mayberg² - Show less +1 more•Institutions (2)

Georgia Institute of Technology¹, Emory University²

01 Dec 2009-Magnetic Resonance in Medicine

TL;DR: A support vector classifier was trained that reliably distinguishes healthy volunteers from clinically depressed patients and two feature selection algorithms were implemented that incorporate reliability information into the feature selection process.

...read moreread less

Abstract: The application of multivoxel pattern analysis methods has attracted increasing attention, particularly for brain state prediction and real-time functional MRI applications. Support vector classification is the most popular of these techniques, owing to reports that it has better prediction accuracy and is less sensitive to noise. Support vector classification was applied to learn functional connectivity patterns that distinguish patients with depression from healthy volunteers. In addition, two feature selection algorithms were implemented (one filter method, one wrapper method) that incorporate reliability information into the feature selection process. These reliability feature selections methods were compared to two previously proposed feature selection methods. A support vector classifier was trained that reliably distinguishes healthy volunteers from clinically depressed patients. The reliability feature selection methods outperformed previously utilized methods. The proposed framework for applying support vector classification to functional connectivity data is applicable to other disease states beyond major depression.

...read moreread less

Proceedings Article•DOI•

Recognising action as clouds of space-time interest points

[...]

Matteo Bregonzio¹, Shaogang Gong¹, Tao Xiang¹•Institutions (1)

Queen Mary University of London¹

20 Jun 2009

TL;DR: This paper proposes a novel action recognition approach which differs significantly from previous interest points based approaches in that only the global spatiotemporal distribution of the interest points are exploited.

...read moreread less

Abstract: Much of recent action recognition research is based on space-time interest points extracted from video using a Bag of Words (BOW) representation. It mainly relies on the discriminative power of individual local space-time descriptors, whilst ignoring potentially valuable information about the global spatio-temporal distribution of interest points. In this paper, we propose a novel action recognition approach which differs significantly from previous interest points based approaches in that only the global spatiotemporal distribution of the interest points are exploited. This is achieved through extracting holistic features from clouds of interest points accumulated over multiple temporal scales followed by automatic feature selection. Our approach avoids the non-trivial problems of selecting the optimal space-time descriptor, clustering algorithm for constructing a codebook, and selecting codebook size faced by previous interest points based methods. Our model is able to capture smooth motions, robust to view changes and occlusions at a low computation cost. Experiments using the KTH and WEIZMANN datasets demonstrate that our approach outperforms most existing methods.

...read moreread less

Journal Article•DOI•

Text feature selection using ant colony optimization

[...]

Mehdi Hosseinzadeh Aghdam¹, Nasser Ghasem-Aghaee¹, Mohammad Ehsan Basiri¹•Institutions (1)

University of Isfahan¹

01 Apr 2009-Expert Systems With Applications

TL;DR: This work presents a novel feature selection algorithm that is based on ant colony optimization that is inspired by observation on real ants in their search for the shortest paths to food sources and shows the superiority of the proposed algorithm on Reuters-21578 dataset.

...read moreread less

Abstract: Feature selection and feature extraction are the most important steps in classification systems. Feature selection is commonly used to reduce dimensionality of datasets with tens or hundreds of thousands of features which would be impossible to process further. One of the problems in which feature selection is essential is text categorization. A major problem of text categorization is the high dimensionality of the feature space; therefore, feature selection is the most important step in text categorization. At present there are many methods to deal with text feature selection. To improve the performance of text categorization, we present a novel feature selection algorithm that is based on ant colony optimization. Ant colony optimization algorithm is inspired by observation on real ants in their search for the shortest paths to food sources. Proposed algorithm is easily implemented and because of use of a simple classifier in that, its computational complexity is very low. The performance of proposed algorithm is compared to the performance of genetic algorithm, information gain and CHI on the task of feature selection in Reuters-21578 dataset. Simulation results on Reuters-21578 dataset show the superiority of the proposed algorithm.

...read moreread less

Journal Article•DOI•

Feature selection with dynamic mutual information

[...]

Huawen Liu¹, Jigui Sun¹, Lei Liu¹, Huijie Zhang²•Institutions (2)

Jilin University¹, Northeast Normal University²

01 Jul 2009-Pattern Recognition

TL;DR: A new feature selection algorithm based on dynamic mutual information, which is only estimated on unlabeled instances is proposed, which can bring most information measurements in previous algorithms together.

...read moreread less

Journal Article•DOI•

Sparse Canonical Correlation Analysis with Application to Genomic Data Integration

[...]

Elena Parkhomenko, David Tritchler, Joseph Beyene

06 Jan 2009-Statistical Applications in Genetics and Molecular Biology

TL;DR: This paper presents Sparse Canonical Correlation Analysis (SCCA) which examines the relationships between two types of variables and provides sparse solutions that include only small subsets of variables of each type by maximizing the correlation between the subsetsOf variables of different types while performing variable selection.

...read moreread less

Abstract: Large scale genomic studies with multiple phenotypic or genotypic measures may require the identification of complex multivariate relationships. In multivariate analysis a common way to inspect the relationship between two sets of variables based on their correlation is canonical correlation analysis, which determines linear combinations of all variables of each type with maximal correlation between the two linear combinations. However, in high dimensional data analysis, when the number of variables under consideration exceeds tens of thousands, linear combinations of the entire sets of features may lack biological plausibility and interpretability. In addition, insufficient sample size may lead to computational problems, inaccurate estimates of parameters and non-generalizable results. These problems may be solved by selecting sparse subsets of variables, i.e. obtaining sparse loadings in the linear combinations of variables of each type. In this paper we present Sparse Canonical Correlation Analysis (SCCA) which examines the relationships between two types of variables and provides sparse solutions that include only small subsets of variables of each type by maximizing the correlation between the subsets of variables of different types while performing variable selection. We also present an extension of SCCA--adaptive SCCA. We evaluate their properties using simulated data and illustrate practical use by applying both methods to the study of natural variation in human gene expression.

...read moreread less

Book Chapter•DOI•

On the Use of the Adjusted Rand Index as a Metric for Evaluating Supervised Classification

[...]

Jorge M. Santos¹, Mark J. Embrechts²•Institutions (2)

Instituto Superior de Engenharia do Porto¹, Rensselaer Polytechnic Institute²

02 Oct 2009

TL;DR: This paper investigates the usability of this clustering validation measure in supervised classification problems by two different approaches: as a performance measure and in feature selection.

...read moreread less

Abstract: The Adjusted Rand Index (ARI) is frequently used in cluster validation since it is a measure of agreement between two partitions: one given by the clustering process and the other defined by external criteria. In this paper we investigate the usability of this clustering validation measure in supervised classification problems by two different approaches: as a performance measure and in feature selection. Since ARI measures the relation between pairs of dataset elements not using information from classes (labels) it can be used to detect problems with the classification algorithm specially when combined with conventional performance measures. Instead, if we use the class information, we can apply ARI also to perform feature selection. We present the results of several experiments where we have applied ARI both as a performance measure and for feature selection showing the validity of this index for the given tasks.

...read moreread less

Journal Article•DOI•

Performance of several variable-selection methods applied to real ecological data.

[...]

Paul A. Murtaugh¹•Institutions (1)

Oregon State University¹

01 Oct 2009-Ecology Letters

TL;DR: It is argued that there is no 'best' method of variable selection and that any of the regression-based approaches discussed here is capable of yielding useful predictive models.

...read moreread less

Abstract: I evaluated the predictive ability of statistical models obtained by applying seven methods of variable selection to 12 ecological and environmental data sets. Cross-validation, involving repeated splits of each data set into training and validation subsets, was used to obtain honest estimates of predictive ability that could be fairly compared among methods. There was surprisingly little difference in predictive ability among five methods based on multiple linear regression. Stepwise methods performed similarly to exhaustive algorithms for subset selection, and the choice of criterion for comparing models (Akaike's information criterion, Schwarz's Bayesian information criterion or F statistics) had little effect on predictive ability. For most of the data sets, two methods based on regression trees yielded models with substantially lower predictive ability. I argue that there is no 'best' method of variable selection and that any of the regression-based approaches discussed here is capable of yielding useful predictive models.

...read moreread less

Journal Article•DOI•

A group bridge approach for variable selection

[...]

Jian Huang¹, Shuange Ma², Huiliang Xie³, Cun-Hui Zhang⁴•Institutions (4)

University of Iowa¹, Yale University², University of Miami³, Rutgers University⁴

01 Jun 2009-Biometrika

TL;DR: The proposed group bridge approach is a penalized regularization method that uses a specially designed group bridge penalty that has the oracle group selection property, in that it can correctly select important groups with probability converging to one.

...read moreread less

Abstract: In multiple regression problems when covariates can be naturally grouped, it is important to carry out feature selection at the group and within-group individual variable levels simultaneously. The existing methods, including the lasso and group lasso, are designed for either variable selection or group selection, but not for both. We propose a group bridge approach that is capable of simultaneous selection at both the group and within-group individual variable levels. The proposed approach is a penalized regularization method that uses a specially designed group bridge penalty. It has the oracle group selection property, in that it can correctly select important groups with probability converging to one. In contrast, the group lasso and group least angle regression methods in general do not possess such an oracle property in group selection. Simulation studies indicate that the group bridge has superior performance in group and individual variable selection relative to several existing methods.

...read moreread less

Proceedings Article•DOI•

Steganalysis by subtractive pixel adjacency matrix

[...]

Tomáš Pevný, Patrick Bas, Jessica Fridrich¹•Institutions (1)

Binghamton University¹

07 Sep 2009

TL;DR: A method for detection of steganographic methods that embed in the spatial domain by adding a low-amplitude independent stego signal, an example of which is least significant bit (LSB) matching.

...read moreread less

Abstract: This paper presents a novel method for detection of steganographic methods that embed in the spatial domain by adding a low-amplitude independent stego signal, an example of which is LSB matching. First, arguments are provided for modeling differences between adjacent pixels using first-order and second-order Markov chains. Subsets of sample transition probability matrices are then used as features for a steganalyzer implemented by support vector machines. The accuracy of the presented steganalyzer is evaluated on LSB matching and four different databases. The steganalyzer achieves superior accuracy with respect to prior art and provides stable results across various cover sources. Since the feature set based on second-order Markov chain is high-dimensional, we address the issue of curse of dimensionality using a feature selection algorithm and show that the curse did not occur in our experiments.

...read moreread less

Journal Article•DOI•

Fault detection and diagnosis in process data using one-class support vector machines

[...]

Sankar Mahadevan¹, Sirish L. Shah¹•Institutions (1)

University of Alberta¹

01 Dec 2009-Journal of Process Control

TL;DR: It is shown that the proposed algorithm outperformed PCA and DPCA both in terms of detection and diagnosis of faults.

...read moreread less

Journal Article•DOI•

Performance of feature-selection methods in the classification of high-dimension data

[...]

Jianping Hua¹, Waibhav Tembe¹, Edward R. Dougherty¹•Institutions (1)

Translational Genomics Research Institute¹

01 Mar 2009-Pattern Recognition

TL;DR: This study compares some basic feature- selection methods in settings involving thousands of features, using both model-based synthetic data and real data and evaluates the performances of feature-selection algorithms for different distribution models and classifiers.

...read moreread less

Journal Article•DOI•

Discriminant Saliency, the Detection of Suspicious Coincidences, and Applications to Visual Recognition

[...]

Dashan Gao¹, Sunhyoung Han², Nuno Vasconcelos²•Institutions (2)

General Electric¹, University of California, San Diego²

01 Jun 2009-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: It is shown that Barlow's principle of inference by the detection of suspicious coincidences enables computationally efficient saliency measures which are nearly optimal for classification.

...read moreread less

Abstract: A discriminant formulation of top-down visual saliency, intrinsically connected to the recognition problem, is proposed. The new formulation is shown to be closely related to a number of classical principles for the organization of perceptual systems, including infomax, inference by detection of suspicious coincidences, classification with minimal uncertainty, and classification with minimum probability of error. The implementation of these principles with computational parsimony, by exploitation of the statistics of natural images, is investigated. It is shown that Barlow's principle of inference by the detection of suspicious coincidences enables computationally efficient saliency measures which are nearly optimal for classification. This principle is adopted for the solution of the two fundamental problems in discriminant saliency, feature selection and saliency detection. The resulting saliency detector is shown to have a number of interesting properties, and act effectively as a focus of attention mechanism for the selection of interest points according to their relevance for visual recognition. Experimental evidence shows that the selected points have good performance with respect to 1) the ability to localize objects embedded in significant amounts of clutter, 2) the ability to capture information relevant for image classification, and 3) the richness of the set of visual attributes that can be considered salient.

...read moreread less

Collapse