Tumor classification by partial least squares using microarray gene expression data

doi:10.1093/BIOINFORMATICS/18.1.39

Danh V. Nguyen, +1 more

Bioinformatics, Vol. 18, Iss. 1, pp. 39-50, 01 Jan 2002

TLDR

A novel analysis procedure for classifying (predicting) human tumor samples from microarray gene expression data is proposed. Under many circumstances, PLS proves superior to Principal Components Analysis (PCA), a well-known dimension reduction method.

Abstract:

Motivation: One important application of gene expression microarray data is the classification of samples into categories, such as the type of tumor. Microarrays allow simultaneous monitoring of thousands of gene expressions per sample. This ability to measure gene expression en masse has resulted in data with the number of variables p (genes) far exceeding the number of samples N. Standard statistical methodologies in classification and prediction work poorly, or not at all, when N < p. Modification of existing statistical methodologies or development of new methodologies is needed for the analysis of microarray data.

Results: We propose a novel analysis procedure for classifying (predicting) human tumor samples based on microarray gene expressions. This procedure involves dimension reduction using Partial Least Squares (PLS) and classification using Logistic Discrimination (LD) and Quadratic Discriminant Analysis (QDA). We compare PLS to the well-known dimension reduction method of Principal Components Analysis (PCA). Under many circumstances PLS proves superior; we illustrate a condition when PCA particularly fails to predict well relative to PLS. The proposed methods were applied to five different microarray data sets involving various human tumor samples: (1) normal versus ovarian tumor; (2) Acute Myeloid Leukemia (AML) versus Acute Lymphoblastic Leukemia (ALL); (3) Diffuse Large B-cell Lymphoma (DLBCLL) versus B-cell Chronic Lymphocytic Leukemia (BCLL); (4) normal versus colon tumor; and (5) Non-Small-Cell Lung Carcinoma (NSCLC) versus renal samples. The stability of the classification results and methods was further assessed by re-randomization studies.

Availability: The methodology can be implemented using a combination of standard statistical methods, available, for example, in SAS. Illustrative SAS code is available from the first author.
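The contrast between PLS and PCA as dimension reduction steps can be illustrated with a small sketch. This is not the authors' SAS code, and it makes several assumptions: the data are synthetic (a weak class signal in 10 "genes" plus a strong nuisance factor in 30 "genes" that is unrelated to class), only the first component of each method is used, and a simple nearest-class-mean rule stands in for LD and QDA. All function and variable names are hypothetical.

```python
import numpy as np

def make_data(rng, n=60, p=100):
    """Synthetic N x p expression matrix with N < p (assumed toy design)."""
    y = np.repeat([0, 1], n // 2)            # two balanced classes
    z = np.tile([1.0, -1.0], n // 2)         # nuisance factor, balanced within each class
    X = rng.standard_normal((n, p))          # unit-variance noise for every gene
    X[:, :10] += 2.0 * y[:, None]            # class signal: 10 genes shifted in class 1
    X[:, 10:40] += 2.0 * z[:, None]          # high-variance nuisance: 30 genes, unrelated to class
    return X, y

def pls1_direction(Xc, yc):
    """First PLS weight vector: direction of maximal covariance with the response."""
    w = Xc.T @ yc
    return w / np.linalg.norm(w)

def pca_direction(Xc):
    """First principal axis: direction of maximal variance, ignoring the response."""
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]

def nearest_mean_accuracy(t_train, y_train, t_test, y_test):
    """Classify 1-D scores by the closer training class mean (stand-in for LD/QDA)."""
    m0 = t_train[y_train == 0].mean()
    m1 = t_train[y_train == 1].mean()
    pred = (np.abs(t_test - m1) < np.abs(t_test - m0)).astype(int)
    return float((pred == y_test).mean())

rng = np.random.default_rng(0)
X_train, y_train = make_data(rng)
X_test, y_test = make_data(rng)

# Center both sets with the training means only, so no test information leaks in.
mu = X_train.mean(axis=0)
Xc_train, Xc_test = X_train - mu, X_test - mu
yc = y_train - y_train.mean()

acc = {}
for name, w in [("PLS", pls1_direction(Xc_train, yc)),
                ("PCA", pca_direction(Xc_train))]:
    acc[name] = nearest_mean_accuracy(Xc_train @ w, y_train, Xc_test @ w, y_test)
    print(f"{name}: held-out accuracy = {acc[name]:.2f}")
```

In this toy design the largest source of variance is the nuisance factor, so the first principal component tracks it and carries almost no class information, whereas the first PLS component is built from the covariance with the class label and concentrates on the shifted genes. The sketch illustrates only this one condition; it says nothing about how the two methods compare on the five real data sets.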

Citations

Random forest: a classification and regression tool for compound classification and QSAR modeling.

Minimum redundancy feature selection from microarray gene expression data.

mixOmics: An R package for 'omics feature selection and multiple data integration

The pls Package: Principal Component and Partial Least Squares Regression in R

Related Papers (5)

Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.

Comparison of discrimination methods for the classification of tumors using gene expression data

Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays.

Gene Selection for Cancer Classification using Support Vector Machines

Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling