Journal ArticleDOI
Robust biomarker identification for cancer diagnosis with ensemble feature selection methods
TLDR
Saeys et al. propose a large-scale analysis of ensemble feature selection, in which multiple feature selections are combined to increase the robustness of the final set of selected features.
Abstract
Motivation: Biomarker discovery is an important topic in biomedical applications of computational biology, including applications such as gene and SNP selection from high-dimensional data. Surprisingly, the stability with respect to sampling variation or robustness of such selection processes has received attention only recently. However, robustness of biomarkers is an important issue, as it may greatly influence subsequent biological validations. In addition, a more robust set of markers may strengthen the confidence of an expert in the results of a selection method.
Results: Our first contribution is a general framework for analyzing the robustness of a biomarker selection algorithm. Second, we conducted a large-scale analysis of the recently introduced concept of ensemble feature selection, where multiple feature selections are combined in order to increase the robustness of the final set of selected features. We focus on selection methods that are embedded in the estimation of support vector machines (SVMs). SVMs are powerful classification models that have shown state-of-the-art performance on several diagnosis and prognosis tasks on biological data, and their feature selection extensions have also yielded good results for gene selection tasks. We show that the robustness of SVMs for biomarker discovery can be substantially increased by using ensemble feature selection techniques, while at the same time improving classification performance. The proposed methodology is evaluated on four microarray datasets, showing increases of up to almost 30% in the robustness of the selected biomarkers, along with an improvement of ~15% in classification performance. The stability improvement with ensemble methods is particularly noticeable for small signature sizes (a few tens of genes), which is most relevant for the design of a diagnosis or prognosis model from a gene signature.
Contact: yvan.saeys@psb.ugent.be
Supplementary information: Supplementary data are available at Bioinformatics online.
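The bagging-style approach summarized in the abstract can be sketched in a few lines: train a linear model on each bootstrap sample, rank features by absolute weight, aggregate the ranks, and measure how stable the resulting signature is across samplings. This is a minimal illustration under my own assumptions, not the authors' implementation: it substitutes a Pegasos-style subgradient trainer for a full SVM solver, uses mean-rank aggregation, and measures robustness with a simple Jaccard overlap; all function names and parameters here are hypothetical.

```python
import numpy as np

def linear_svm_weights(X, y, lam=0.01, epochs=200, lr=0.1):
    # Full-batch subgradient descent on the L2-regularised hinge loss;
    # a rough stand-in for a proper SVM solver (assumption, not the paper's setup).
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        viol = margins < 1  # points inside the margin contribute to the gradient
        grad = lam * w - (X[viol] * y[viol, None]).sum(axis=0) / n
        w -= lr * grad
    return w

def ensemble_feature_ranking(X, y, n_bootstraps=40, seed=0):
    # Train one linear model per bootstrap sample, rank features by |weight|,
    # and aggregate by averaging ranks (linear rank aggregation).
    rng = np.random.default_rng(seed)
    n, d = X.shape
    rank_sum = np.zeros(d)
    for _ in range(n_bootstraps):
        idx = rng.integers(0, n, size=n)            # sample with replacement
        w = linear_svm_weights(X[idx], y[idx])
        ranks = np.argsort(np.argsort(-np.abs(w)))  # rank 0 = most important
        rank_sum += ranks
    return np.argsort(rank_sum)  # features ordered by mean rank, best first

def jaccard_stability(sig_a, sig_b):
    # Robustness of a signature: overlap between the feature sets selected
    # from two different samplings of the data (Jaccard index).
    a, b = set(sig_a), set(sig_b)
    return len(a & b) / len(a | b)
```

On synthetic data where only a couple of features carry the class signal, the informative features consistently land at the top of the aggregated ranking, and `jaccard_stability` applied to the top-k signatures from two resamplings gives a robustness score in [0, 1].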
Citations
Journal ArticleDOI
A survey on feature selection methods
Girish Chandrashekar, Ferat Sahin
TL;DR: The objective is to provide a generic introduction to variable elimination that can be applied to a wide array of machine learning problems, focusing on Filter, Wrapper and Embedded methods.
Journal ArticleDOI
Feature Selection: A Data Perspective
TL;DR: This survey revisits feature selection research from a data perspective and reviews representative feature selection algorithms for conventional data, structured data, heterogeneous data and streaming data, and categorizes them into four main groups: similarity-based, information-theoretical-based, sparse-learning-based and statistical-based.
Journal ArticleDOI
Disentangling type 2 diabetes and metformin treatment signatures in the human gut microbiota
Kristoffer Forslund, Falk Hildebrand, Trine Nielsen, Gwen Falony, Shinichi Sunagawa, Edi Prifti, Sara Vieira-Silva, Valborg Gudmundsdottir, Helle Krogh Pedersen, Manimozhiyan Arumugam, Karsten Kristiansen, Anita Y. Voigt, Henrik Vestergaard, Rajna Hercog, Paul I. Costea, Jens Roat Kultima, Junhua Li, Torben Jørgensen, Florence Levenez, Joël Doré, H. Bjørn Nielsen, Søren Brunak, Jeroen Raes, Torben Hansen, Jun Wang, S. Dusko Ehrlich, Peer Bork, Oluf Pedersen, et al.
TL;DR: A unified signature of gut microbiome shifts in T2D with a depletion of butyrate-producing taxa is reported, highlighting the need to disentangle gut microbiota signatures of specific human diseases from those of medication.
Journal ArticleDOI
Feature selection in machine learning: A new perspective
TL;DR: This study discusses several frequently-used evaluation measures for feature selection, and surveys supervised, unsupervised, and semi-supervised feature selection methods, which are widely applied in machine learning problems, such as classification and clustering.
Journal ArticleDOI
Applications of Support Vector Machine (SVM) Learning in Cancer Genomics.
TL;DR: The recent progress of SVMs in cancer genomic studies is reviewed, along with the strengths of SVM learning and its future prospects in cancer genomic applications.
References
Journal ArticleDOI
An introduction to variable and feature selection
Isabelle Guyon, André Elisseeff
TL;DR: The contributions of this special issue cover a wide range of aspects of variable selection: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.
Journal ArticleDOI
Bootstrap Methods: Another Look at the Jackknife
TL;DR: In this article, the authors discuss the problem of estimating the sampling distribution of a pre-specified random variable R(X, F) on the basis of the observed data x.
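As a concrete illustration of the resampling idea described in this reference (my own sketch, not from the paper): the sampling distribution of a statistic R(X, F) is approximated by repeatedly drawing resamples of the observed data with replacement and re-evaluating the statistic on each. The function name and parameters below are hypothetical.

```python
import numpy as np

def bootstrap_distribution(x, statistic, n_resamples=2000, seed=0):
    # Draw resamples of x with replacement and evaluate the statistic on
    # each, approximating the statistic's sampling distribution.
    rng = np.random.default_rng(seed)
    return np.array([statistic(rng.choice(x, size=len(x), replace=True))
                     for _ in range(n_resamples)])

# Example: bootstrap estimate of the standard error of the sample median.
x = np.random.default_rng(42).exponential(size=100)
reps = bootstrap_distribution(x, np.median)
se = reps.std()  # spread of the bootstrap replicates
```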
Journal ArticleDOI
Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.
Todd R. Golub, Donna K. Slonim, Pablo Tamayo, Christine Huard, Michelle Gaasenbeek, Jill P. Mesirov, Hilary A. Coller, Mignon L. Loh, James R. Downing, Michael A. Caligiuri, Clara D. Bloomfield, Eric S. Lander, et al.
TL;DR: A generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case and suggests a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.
Proceedings ArticleDOI
A training algorithm for optimal margin classifiers
TL;DR: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented, applicable to a wide variety of classification functions, including Perceptrons, polynomials, and Radial Basis Functions.