Feature selection and classification for microarray data analysis: evolutionary methods for identifying predictive genes.

doi:10.1186/1471-2105-6-148

Thanyaluk Jirapech-Umpai, +1 more

- 15 Jun 2005 -

BMC Bioinformatics

- Vol. 6, Iss: 1, pp 148-148

TLDR

An evolutionary algorithm is applied to identify the near-optimal set of predictive genes that classify the data, and a Z-score analysis of the most frequently selected genes identifies genes known to discriminate AML and Pre-T ALL leukemia.

Abstract:

In the clinical context, samples assayed by microarray are often classified by cell line or tumour type and it is of interest to discover a set of genes that can be used as class predictors. The leukemia dataset of Golub et al. [1] and the NCI60 dataset of Ross et al. [2] present multiclass classification problems where three tumour types and nine cell lines respectively must be identified. We apply an evolutionary algorithm to identify the near-optimal set of predictive genes that classify the data. We also examine the initial gene selection step whereby the most informative genes are selected from the genes assayed. In the absence of feature selection, classification accuracy on the training data is typically good, but not replicated on the testing data. Gene selection using the RankGene software [3] is shown to significantly improve performance on the testing data. Further, we show that the choice of feature selection criteria can have a significant effect on accuracy. The evolutionary algorithm is shown to perform stably across the space of possible parameter settings, indicating the robustness of the approach. We assess performance using a low variance estimation technique, and present an analysis of the genes most often selected as predictors. The computational methods we have developed perform robustly and accurately, and yield results in accord with clinical knowledge: a Z-score analysis of the genes most frequently selected identifies genes known to discriminate AML and Pre-T ALL leukemia. This study also confirms that significantly different sets of genes are found to be most discriminatory as the sample classes are refined.
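
The workflow the abstract describes (evolve small gene subsets, score each by classifier accuracy, then rank genes by how often they are selected) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 1-nearest-neighbour classifier, leave-one-out scoring, the truncation-plus-mutation scheme, the binomial null model behind the Z-score, and all function and parameter names.

```python
import math
import random
from collections import Counter


def loocv_accuracy(X, y, genes):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that
    only looks at the columns listed in `genes` (assumed classifier)."""
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if j == i:
                continue
            d = sum((X[i][g] - X[j][g]) ** 2 for g in genes)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def mutate(genes, n_genes, rng):
    """Replace one selected gene with a gene not currently selected.
    Requires n_genes > len(genes)."""
    child = list(genes)
    pool = [g for g in range(n_genes) if g not in genes]
    child[rng.randrange(len(child))] = rng.choice(pool)
    return tuple(sorted(child))


def evolve(X, y, subset_size=3, pop_size=20, generations=30, seed=0):
    """Hypothetical evolutionary search: keep the fitter half of the
    population each generation and refill it with mutated copies."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    pop = [tuple(sorted(rng.sample(range(n_genes), subset_size)))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: loocv_accuracy(X, y, s), reverse=True)
        survivors = pop[: pop_size // 2]
        children = [mutate(rng.choice(survivors), n_genes, rng)
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    best = max(pop, key=lambda s: loocv_accuracy(X, y, s))
    return best, loocv_accuracy(X, y, best)


def selection_z_scores(selected_sets, n_genes):
    """Z-score of each gene's selection count across runs, against an
    assumed null model in which every run picks its genes at random."""
    runs = len(selected_sets)
    p = len(selected_sets[0]) / n_genes      # chance a gene is picked in one run
    expected = runs * p                      # binomial mean
    sigma = math.sqrt(runs * p * (1 - p))    # binomial standard deviation
    counts = Counter(g for s in selected_sets for g in s)
    return {g: (counts[g] - expected) / sigma for g in range(n_genes)}
```

In use, `evolve` would be called repeatedly with different seeds on an expression matrix `X` (rows are samples, columns are genes) and labels `y`, and the winning subsets passed to `selection_z_scores`; genes with large positive scores are those chosen far more often than chance would predict.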

Citations

Journal Article

A review of feature selection techniques in bioinformatics

Yvan Saeys, +2 more

- 10 Sep 2007 -

Bioinformatics

TL;DR: A basic taxonomy of feature selection techniques is provided, and their use, variety and potential are discussed for a number of both common and upcoming bioinformatics applications.

Journal Article

Gene selection and classification of microarray data using random forest

Ramon Diaz-Uriarte, +1 more

- 06 Jan 2006 -

BMC Bioinformatics

TL;DR: It is shown that random forest has comparable performance to other classification methods, including DLDA, KNN, and SVM, and that the new gene selection procedure yields very small sets of genes (often smaller than alternative methods) while preserving predictive accuracy.

Journal Article

A Review of Feature Selection and Feature Extraction Methods Applied on Microarray Data.

Zena M. Hira, +1 more

- 11 Jun 2015 -

Advances in Bioinformatics

TL;DR: Various ways of performing dimensionality reduction on high-dimensional microarray data are summarised to provide a clearer idea of when to use each one of them for saving computational time and resources.

Journal Article

Characterization of MicroRNA Expression Levels and Their Biological Correlates in Human Cancer Cell Lines

Arti B. Gaur, +7 more

- 15 Mar 2007 -

Cancer Research

TL;DR: Evidence that microRNA expression patterns may mark specific biological characteristics of tumors and/or mediate biological activities important for the pathobiology of malignant tumors is provided.

Journal Article

A review of microarray datasets and applied feature selection methods

Verónica Bolón-Canedo, +5 more

- 01 Oct 2014 -

Information Sciences

TL;DR: An experimental evaluation on the most representative datasets using well-known feature selection methods is presented, bearing in mind that the aim is not to provide the best feature selection method, but to facilitate their comparative study by the research community.

References

Journal Article

Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.

Todd R. Golub, +12 more

- 15 Oct 1999 -

Science

TL;DR: A generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case and suggests a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.

Book

An Introduction to Genetic Algorithms

Melanie Mitchell

TL;DR: An Introduction to Genetic Algorithms focuses in depth on a small set of important and interesting topics -- particularly in machine learning, scientific modeling, and artificial life -- and reviews a broad span of research, including the work of Mitchell and her colleagues.

Journal Article

Comparison of discrimination methods for the classification of tumors using gene expression data

Sandrine Dudoit, +2 more

- 01 Mar 2002 -

Journal of the American Statistical Asso...

TL;DR: Discrimination methods for classifying tumors from gene expression data, including nearest-neighbor classifiers, linear discriminant analysis, and classification trees, are compared on datasets from three recently published cancer gene expression studies.

Journal Article

Systematic variation in gene expression patterns in human cancer cell lines.

Douglas T. Ross, +17 more

- 01 Mar 2000 -

Nature Genetics

TL;DR: Using cDNA microarrays to explore the variation in expression of approximately 8,000 unique genes among the 60 cell lines used in the National Cancer Institute's screen for anti-cancer drugs provided a novel molecular characterization of this important group of human cell lines and their relationships to tumours in vivo.

Journal Article

Tissue classification with gene expression profiles.

Amir Ben-Dor, +5 more

- 01 Jan 2000 -

Journal of Computational Biology

TL;DR: This work examines three sets of gene expression data measured across sets of tumor(s) and normal clinical samples, and presents results of performing leave-one-out cross validation (LOOCV) experiments on the three data sets, employing nearest neighbor classifier, SVM, AdaBoost and a novel clustering-based classification technique.
