Open Access, Journal Article

Feature selection and classification for microarray data analysis: evolutionary methods for identifying predictive genes.

Thanyaluk Jirapech-Umpai, +1 more
15 Jun 2005, Vol. 6, Iss. 1, p. 148
TLDR
An evolutionary algorithm is applied to identify a near-optimal set of predictive genes that classify the data, and a Z-score analysis of the most frequently selected genes identifies genes known to discriminate AML and Pre-T ALL leukemia.
Abstract
In the clinical context, samples assayed by microarray are often classified by cell line or tumour type, and it is of interest to discover a set of genes that can be used as class predictors. The leukemia dataset of Golub et al. [1] and the NCI60 dataset of Ross et al. [2] present multiclass classification problems where three tumour types and nine cell lines, respectively, must be identified. We apply an evolutionary algorithm to identify the near-optimal set of predictive genes that classify the data. We also examine the initial gene selection step, whereby the most informative genes are selected from the genes assayed. In the absence of feature selection, classification accuracy on the training data is typically good, but it is not replicated on the testing data. Gene selection using the RankGene software [3] is shown to significantly improve performance on the testing data. Further, we show that the choice of feature selection criterion can have a significant effect on accuracy. The evolutionary algorithm is shown to perform stably across the space of possible parameter settings, indicating the robustness of the approach. We assess performance using a low-variance estimation technique, and present an analysis of the genes most often selected as predictors. The computational methods we have developed perform robustly and accurately, and yield results in accord with clinical knowledge: a Z-score analysis of the genes most frequently selected identifies genes known to discriminate AML and Pre-T ALL leukemia. This study also confirms that significantly different sets of genes are found to be most discriminatory as the sample classes are refined.
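
The pipeline the abstract describes (evolve small gene subsets, score each subset by classification accuracy, then ask which genes are selected more often than chance) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration rather than the authors' implementation: the synthetic two-class data, the 1-nearest-neighbour classifier scored by leave-one-out cross-validation, the truncation selection with single-gene mutation, the parameter values, and the binomial null model used for the Z-score (each gene chosen with probability subset_size / n_genes per run). The function names are hypothetical.

```python
import math
import random
from collections import Counter


def make_toy_data(rng, n_per_class=10, n_genes=30, informative=(0, 1, 2)):
    """Synthetic expression matrix (assumption): two classes, a few shifted genes."""
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in informative:
                row[g] += 3.0 * label  # class 1 is shifted on the informative genes
            samples.append(row)
            labels.append(label)
    return samples, labels


def loocv_accuracy(samples, labels, genes):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier restricted to `genes`."""
    correct = 0
    for i, x in enumerate(samples):
        nearest = min(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: sum((x[g] - samples[j][g]) ** 2 for g in genes),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(samples)


def evolve_gene_subset(samples, labels, rng, subset_size=3, pop_size=20, generations=15):
    """Evolve fixed-size gene subsets: keep the fitter half, mutate one gene per child."""
    n_genes = len(samples[0])

    def fitness(subset):
        return loocv_accuracy(samples, labels, subset)

    population = [
        tuple(sorted(rng.sample(range(n_genes), subset_size))) for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # survivors are carried over unchanged
        children = []
        for parent in parents:
            child = list(parent)
            pos = rng.randrange(subset_size)
            # Swap one gene for a gene that is not already in the subset.
            child[pos] = rng.choice([g for g in range(n_genes) if g not in parent])
            children.append(tuple(sorted(child)))
        population = parents + children
    return max(population, key=fitness)


def selection_z_scores(counts, n_runs, subset_size, n_genes):
    """Z-score of each gene's selection count against a binomial chance model."""
    p = subset_size / n_genes
    expected = n_runs * p
    sd = math.sqrt(n_runs * p * (1 - p))
    return {g: (counts.get(g, 0) - expected) / sd for g in range(n_genes)}


def run_experiment(seed=0, n_runs=10, subset_size=3):
    """Repeat the evolutionary search and summarise how often each gene is chosen."""
    rng = random.Random(seed)
    samples, labels = make_toy_data(rng)
    counts = Counter()
    for _ in range(n_runs):
        counts.update(evolve_gene_subset(samples, labels, rng, subset_size=subset_size))
    return counts, selection_z_scores(counts, n_runs, subset_size, len(samples[0]))
```

Calling `run_experiment()` returns the per-gene selection counts and Z-scores; genes with large positive Z-scores are those picked far more often than the chance model predicts. In this sketch fitness is measured on the same samples used for the search, so a faithful evaluation would score the final subset on held-out test data, as the abstract's training/testing distinction implies.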


Citations
Journal Article

A review of feature selection techniques in bioinformatics

TL;DR: A basic taxonomy of feature selection techniques is provided, and their use, variety and potential in a number of both common and upcoming bioinformatics applications are discussed.
Journal Article

Gene selection and classification of microarray data using random forest

TL;DR: It is shown that random forest has comparable performance to other classification methods, including DLDA, KNN, and SVM, and that the new gene selection procedure yields very small sets of genes (often smaller than alternative methods) while preserving predictive accuracy.
Journal Article

A Review of Feature Selection and Feature Extraction Methods Applied on Microarray Data.

TL;DR: Various ways of performing dimensionality reduction on high-dimensional microarray data are summarised to provide a clearer idea of when to use each one of them for saving computational time and resources.
Journal Article

Characterization of MicroRNA Expression Levels and Their Biological Correlates in Human Cancer Cell Lines

TL;DR: Evidence that microRNA expression patterns may mark specific biological characteristics of tumors and/or mediate biological activities important for the pathobiology of malignant tumors is provided.
Journal Article

A review of microarray datasets and applied feature selection methods

TL;DR: An experimental evaluation on the most representative datasets using well-known feature selection methods is presented, bearing in mind that the aim is not to provide the best feature selection method, but to facilitate their comparative study by the research community.
References
Journal Article

Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.

TL;DR: A generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case and suggests a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.
Book

An Introduction to Genetic Algorithms

TL;DR: An Introduction to Genetic Algorithms focuses in depth on a small set of important and interesting topics -- particularly in machine learning, scientific modeling, and artificial life -- and reviews a broad span of research, including the work of Mitchell and her colleagues.
Journal Article

Comparison of discrimination methods for the classification of tumors using gene expression data

TL;DR: Different discrimination methods for the classification of tumors based on gene expression data include nearest-neighbor classifiers, linear discriminant analysis, and classification trees, which are applied to datasets from three recently published cancer gene expression studies.
Journal Article

Systematic variation in gene expression patterns in human cancer cell lines.

TL;DR: Using cDNA microarrays to explore the variation in expression of approximately 8,000 unique genes among the 60 cell lines used in the National Cancer Institute's screen for anti-cancer drugs provided a novel molecular characterization of this important group of human cell lines and their relationships to tumours in vivo.
Journal Article

Tissue classification with gene expression profiles.

TL;DR: This work examines three sets of gene expression data measured across sets of tumor(s) and normal clinical samples, and presents results of performing leave-one-out cross validation (LOOCV) experiments on the three data sets, employing nearest neighbor classifier, SVM, AdaBoost and a novel clustering-based classification technique.