Gene Selection for Cancer Classification using Support Vector Machines

doi:10.1023/A:1012487302797

Journal article (Open Access)

Isabelle Guyon and three co-authors

Machine Learning, Vol. 46, Iss. 1, pp. 389-422 (11 Mar 2002)

TLDR

The article proposes a Support Vector Machine (SVM) method based on Recursive Feature Elimination (RFE) that selects a small subset of genes from broad patterns of gene expression data recorded on DNA micro-arrays.
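The SVM-RFE loop summarized above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions. It assumes a linear SVM, uses the squared weight w_i^2 as each gene's ranking score, and removes one gene per iteration. The function names `train_linear_svm` and `svm_rfe` are hypothetical. The full-batch subgradient-descent solver is a stand-in for a standard SVM solver, and its hyperparameters (`lam`, `lr`, `epochs`) are arbitrary choices for the toy data.

```python
"""Minimal SVM-RFE sketch: linear SVM weights drive backward gene elimination."""


def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=500):
    """Fit w, b for a soft-margin linear SVM by full-batch subgradient descent.

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))).
    The bias b is not regularised. This solver is an assumption of the
    sketch, chosen only to avoid external dependencies.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, **svm_kwargs):
    """Return gene indices ordered from most to least important.

    Each iteration trains on the surviving genes, scores each by w_i^2,
    removes the lowest-scoring gene and prepends it to the ranking, so the
    gene eliminated last ends up first.
    """
    surviving = list(range(len(X[0])))
    ranking = []
    while surviving:
        X_sub = [[row[j] for j in surviving] for row in X]
        w, _ = train_linear_svm(X_sub, y, **svm_kwargs)
        worst = min(range(len(surviving)), key=lambda k: w[k] ** 2)
        ranking.insert(0, surviving.pop(worst))
    return ranking


# Toy data: gene 0 separates the classes; genes 1 and 2 take the same
# values in both classes, so they carry no class information.
X = [[2.0, 0.5, 1.0], [1.5, -0.5, -1.0], [1.8, 0.2, 0.0],
     [-2.0, 0.5, 1.0], [-1.5, -0.5, -1.0], [-1.8, 0.2, 0.0]]
y = [1, 1, 1, -1, -1, -1]
print(svm_rfe(X, y)[0])  # prints 0
```

Removing a single gene per iteration costs one SVM fit per gene. With thousands of genes, a common speed-up is to remove a chunk of the surviving genes (for example, half) per iteration and switch to single-gene removal near the end.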

Abstract:

DNA micro-arrays now permit scientists to screen thousands of genes simultaneously and determine whether those genes are active, hyperactive or silent in normal or cancerous tissue. Because these new micro-array devices generate bewildering amounts of raw data, new analytical methods must be developed to sort out whether cancer tissues have distinctive signatures of gene expression over normal tissues or other types of cancer tissues. In this paper, we address the problem of selection of a small subset of genes from broad patterns of gene expression data, recorded on DNA micro-arrays. Using available training examples from cancer and normal patients, we build a classifier suitable for genetic diagnosis, as well as drug discovery. Previous attempts to address this problem select genes with correlation techniques. We propose a new method of gene selection utilizing Support Vector Machine methods based on Recursive Feature Elimination (RFE). We demonstrate experimentally that the genes selected by our techniques yield better classification performance and are biologically relevant to cancer. In contrast with the baseline method, our method eliminates gene redundancy automatically and yields better and more compact gene subsets. In patients with leukemia our method discovered 2 genes that yield zero leave-one-out error, while 64 genes are necessary for the baseline method to get the best result (one leave-one-out error). In the colon cancer database, using only 4 genes our method is 98% accurate, while the baseline method is only 86% accurate.
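The following sketch illustrates the redundancy problem the abstract attributes to correlation-based selection. The abstract does not name the baseline criterion. This example assumes the signal-to-noise score (mu+ - mu-) / (sigma+ + sigma-) of Golub et al., computed with population standard deviations. For simplicity, it ranks genes by the absolute value of that score, rather than picking equal numbers of positively and negatively correlated genes. The function names and the `eps` guard are hypothetical. Because each gene is scored independently, an exact duplicate of an informative gene receives the same score and is selected alongside it.

```python
from statistics import mean, pstdev


def signal_to_noise(X, y, eps=1e-12):
    """Score each gene independently by (mu+ - mu-) / (sigma+ + sigma-).

    eps guards against division by zero when a gene has no within-class
    spread; it is an assumption of this sketch.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == -1]
        spread = pstdev(pos) + pstdev(neg) + eps
        scores.append((mean(pos) - mean(neg)) / spread)
    return scores


def rank_by_correlation(X, y):
    """Order genes by |score|, largest first; ties keep their original order."""
    scores = signal_to_noise(X, y)
    return sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)


# Toy data: gene 1 is an exact copy of gene 0; gene 2 is weakly informative.
X = [[2.0, 2.0, 1.0], [3.0, 3.0, -1.0], [2.5, 2.5, 0.5],
     [-2.0, -2.0, -1.0], [-3.0, -3.0, 1.0], [-2.5, -2.5, -0.5]]
y = [1, 1, 1, -1, -1, -1]
print(rank_by_correlation(X, y)[:2])  # prints [0, 1]: both copies are selected
```

A two-gene subset chosen this way spends its entire budget on one signal. A criterion that evaluates genes jointly, such as the SVM weights in RFE, can instead discard one copy once the other carries the information.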
