Tumor classification by partial least squares using microarray gene expression data
Danh V. Nguyen, David M. Rocke
TL;DR: A novel analysis procedure for classifying (predicting) human tumor samples based on microarray gene expressions is proposed; under many circumstances PLS proves superior to the well-known dimension reduction method of Principal Components Analysis (PCA).
Abstract:
Motivation: One important application of gene expression microarray data is the classification of samples into categories, such as tumor type. Microarrays allow the expression of thousands of genes to be monitored simultaneously in each sample. This ability to measure gene expression en masse has resulted in data in which the number of variables p (genes) far exceeds the number of samples N. Standard statistical methodologies in classification and prediction do not work well, or do not work at all, when N < p. Modification of existing statistical methodologies, or development of new ones, is needed for the analysis of microarray data.
Results: We propose a novel analysis procedure for classifying (predicting) human tumor samples based on microarray gene expressions. This procedure involves dimension reduction using Partial Least Squares (PLS) and classification using Logistic Discrimination (LD) and Quadratic Discriminant Analysis (QDA). We compare PLS to the well-known dimension reduction method of Principal Components Analysis (PCA). Under many circumstances PLS proves superior; we illustrate a condition under which PCA particularly fails to predict well relative to PLS. The proposed methods were applied to five different microarray data sets involving various human tumor samples: (1) normal versus ovarian tumor; (2) Acute Myeloid Leukemia (AML) versus Acute Lymphoblastic Leukemia (ALL); (3) Diffuse Large B-cell Lymphoma (DLBCL) versus B-cell Chronic Lymphocytic Leukemia (BCLL); (4) normal versus colon tumor; and (5) Non-Small-Cell Lung Carcinoma (NSCLC) versus renal samples. The stability of the classification results and methods was further assessed by re-randomization studies.
Availability: The methodology can be implemented using a combination of standard statistical methods available, for example, in SAS. Illustrative SAS code is available from the first author.
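The two-stage procedure described in the abstract (PLS dimension reduction followed by logistic discrimination) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' SAS implementation: the data are simulated rather than taken from any of the five data sets, the PLS step is a plain single-response NIPALS-style deflation, the logistic model is fitted by simple gradient ascent, and all function names and parameter values (`simulate`, `pls_fit`, `pls_transform`, `logistic_fit`, 200 genes, 2 components) are hypothetical choices made for the example.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def center(rows, means=None):
    """Subtract column means (computed from rows unless supplied)."""
    if means is None:
        means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(r, means)] for r in rows], means

def pls_fit(X, y, n_comp):
    """Single-response PLS by successive deflation of X and y."""
    Xc, x_means = center(X)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    weights, loadings, scores = [], [], []
    for _ in range(n_comp):
        # Weight vector: covariance of each gene with the (residual) response.
        w = [dot(col, yc) for col in zip(*Xc)]
        norm = math.sqrt(dot(w, w))
        w = [v / norm for v in w]
        t = [dot(row, w) for row in Xc]          # component scores
        tt = dot(t, t)
        p = [dot(col, t) / tt for col in zip(*Xc)]
        q = dot(yc, t) / tt
        # Deflate: remove what this component explains.
        Xc = [[v - ti * pj for v, pj in zip(row, p)] for row, ti in zip(Xc, t)]
        yc = [v - q * ti for v, ti in zip(yc, t)]
        weights.append(w); loadings.append(p); scores.append(t)
    return x_means, weights, loadings, [list(s) for s in zip(*scores)]

def pls_transform(X, x_means, weights, loadings):
    """Project new samples onto the fitted components (same deflation)."""
    Xc, _ = center(X, x_means)
    out = []
    for row in Xc:
        res, ts = list(row), []
        for w, p in zip(weights, loadings):
            t = dot(res, w)
            ts.append(t)
            res = [v - t * pj for v, pj in zip(res, p)]
        out.append(ts)
    return out

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logistic_fit(T, y, lr=0.02, n_iter=500):
    """Logistic discrimination on component scores via gradient ascent."""
    beta = [0.0] * (len(T[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for t, yi in zip(T, y):
            err = yi - sigmoid(beta[0] + dot(beta[1:], t))
            grad[0] += err
            for j, tj in enumerate(t):
                grad[j + 1] += err * tj
        beta = [b + lr * g / len(T) for b, g in zip(beta, grad)]
    return beta

def predict(T, beta):
    return [1 if sigmoid(beta[0] + dot(beta[1:], t)) >= 0.5 else 0 for t in T]

def simulate(n_per_class, n_genes, n_signal, shift, rng):
    """Simulated expression data: the first n_signal genes differ by class."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for j in range(n_signal):
                row[j] += shift * label
            X.append(row); y.append(label)
    return X, y

def accuracy(pred, truth):
    return sum(a == b for a, b in zip(pred, truth)) / len(truth)

rng = random.Random(0)
X_train, y_train = simulate(10, 200, 20, 3.0, rng)   # N = 20 < p = 200
X_test, y_test = simulate(5, 200, 20, 3.0, rng)

x_means, W, P, T_train = pls_fit(X_train, y_train, n_comp=2)
T_test = pls_transform(X_test, x_means, W, P)
beta = logistic_fit(T_train, y_train)

train_acc = accuracy(predict(T_train, beta), y_train)
test_acc = accuracy(predict(T_test, beta), y_test)
print("train accuracy:", train_acc, "test accuracy:", test_acc)
```

The point of the design is that the classifier never sees the 200 genes directly: it is fitted on two PLS components, so the N < p problem is moved into the dimension reduction step. Unlike PCA, the PLS weight vector is built from the covariance between genes and the class label, which is the property the abstract credits for its better predictive performance. Swapping `logistic_fit` for a QDA rule on the same scores would give the second classifier mentioned in the abstract.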