Robust biomarker identification for cancer diagnosis with ensemble feature selection methods

doi:10.1093/BIOINFORMATICS/BTP630

Thomas Abeel and four co-authors

01 Feb 2010

Bioinformatics, Vol. 26, Iss. 3, pp. 392-398

TLDR

The authors present a large-scale analysis of ensemble feature selection, in which multiple feature selections are combined to increase the robustness of the final set of selected features.
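Here is a minimal sketch of the idea. It assumes bootstrap resampling and rank-sum aggregation, neither of which the summary above specifies. The ensemble runs a base selector on many bootstrap samples of the training data and keeps the features with the best total rank. To keep the example self-contained, the base selector is a hypothetical absolute difference of class means. It stands in for the SVM-embedded selectors the paper studies. All function names and parameters (`n_bags`, `seed`) are illustrative.

```python
import random
from statistics import mean


def mean_diff_scores(X, y):
    """Hypothetical base selector: |mean(class 1) - mean(class 0)| per feature.
    Stands in for an SVM-based (embedded) feature ranking."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        if not pos or not neg:
            scores.append(0.0)  # degenerate bootstrap sample containing one class
        else:
            scores.append(abs(mean(pos) - mean(neg)))
    return scores


def ranks_from_scores(scores):
    """Convert scores to ranks, where rank 1 is the highest score."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks


def ensemble_select(X, y, k, n_bags=40, seed=0):
    """Return the indices of the k features with the lowest summed rank
    across n_bags bootstrap runs of the base selector."""
    rng = random.Random(seed)
    n = len(X)
    rank_sum = [0] * len(X[0])
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        for j, r in enumerate(ranks_from_scores(mean_diff_scores(Xb, yb))):
            rank_sum[j] += r
    # Linear (rank-sum) aggregation: lowest total = most consistently top-ranked
    consensus = sorted(range(len(rank_sum)), key=lambda j: rank_sum[j])
    return consensus[:k]
```

Swapping `mean_diff_scores` for an SVM-based ranking, such as one derived from linear SVM feature weights, would bring the sketch closer to the setting in the abstract. Summing ranks rather than raw scores keeps the bootstrap runs on a common scale.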

Abstract:

Motivation: Biomarker discovery is an important topic in biomedical applications of computational biology, including gene and SNP selection from high-dimensional data. Surprisingly, the stability of such selection processes with respect to sampling variation, that is, their robustness, has received attention only recently. Yet robustness of biomarkers is an important issue, as it may greatly influence subsequent biological validation. In addition, a more robust set of markers may strengthen an expert's confidence in the results of a selection method.

Results: Our first contribution is a general framework for analyzing the robustness of a biomarker selection algorithm. Second, we conducted a large-scale analysis of the recently introduced concept of ensemble feature selection, in which multiple feature selections are combined to increase the robustness of the final set of selected features. We focus on selection methods that are embedded in the estimation of support vector machines (SVMs). SVMs are powerful classification models that have shown state-of-the-art performance on several diagnosis and prognosis tasks on biological data, and their feature selection extensions have also given good results for gene selection tasks. We show that the robustness of SVMs for biomarker discovery can be substantially increased by using ensemble feature selection techniques, while classification performance improves at the same time. The proposed methodology is evaluated on four microarray datasets, showing increases of up to almost 30% in the robustness of the selected biomarkers, along with an improvement of ~15% in classification performance. The stability improvement with ensemble methods is particularly noticeable for small signature sizes (a few tens of genes), which are the most relevant for designing a diagnosis or prognosis model from a gene signature.
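The framework above measures how much a selector's output changes under sampling variation. The abstract does not name the stability measure. This sketch assumes the Kuncheva consistency index, averaged over all pairs of signatures, which is a standard choice for comparing gene signatures of equal size. The function names `kuncheva_index` and `stability` are hypothetical. In practice, the signatures would come from running one selector on several subsamples of the same dataset.

```python
from itertools import combinations


def kuncheva_index(sig_a, sig_b, n_features):
    """Kuncheva consistency index between two equal-size feature subsets.

    1.0 means identical subsets, values near 0 mean the overlap is what two
    random subsets of that size would share, and negative values mean less
    overlap than chance."""
    a, b = set(sig_a), set(sig_b)
    k = len(a)
    if len(b) != k:
        raise ValueError("signatures must have the same size")
    if k == 0 or k >= n_features:
        raise ValueError("index is undefined unless 0 < k < n_features")
    r = len(a & b)  # number of features both signatures share
    # Expected overlap of two random k-subsets of n features is k*k/n;
    # the index rescales (observed - expected) so that r == k maps to 1.
    return (r * n_features - k * k) / (k * (n_features - k))


def stability(signatures, n_features):
    """Average Kuncheva index over all pairs of signatures."""
    pairs = list(combinations(signatures, 2))
    if not pairs:
        raise ValueError("need at least two signatures")
    return sum(kuncheva_index(a, b, n_features) for a, b in pairs) / len(pairs)
```

The correction for chance overlap is what makes the index suitable for comparing small signatures drawn from thousands of genes. Raw overlap alone would not reveal how much of the agreement random selection would also produce.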
Contact: yvan.saeys@psb.ugent.be

Supplementary information: Supplementary data are available at Bioinformatics online.