Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems
TLDR
A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework and has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets.Abstract:
Variable selection on high throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), becomes inevitable to select relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas Machine Learning wrapper approaches can be used for predictive purposes. In the case of multiple highly correlated variables, another option is to use multivariate exploratory approaches to give more insight into cell biology, biological pathways or complex traits. A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework. sPLS-DA has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of interpretability of the results via valuable graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets.read more
Citations
More filters
Journal ArticleDOI
MetaboAnalyst 4.0: towards more transparent and integrative metabolomics analysis.
Jasmine Chong,Othman Soufan,Carin Li,Iurie Caraus,Shuzhao Li,Guillaume Bourque,David S. Wishart,Jianguo Xia +7 more
TL;DR: The user interface of MetaboAnalyst 4.0 has been reengineered to provide a more modern look and feel, as well as to give more space and flexibility to introduce new functions.
Journal ArticleDOI
mixOmics: An R package for 'omics feature selection and multiple data integration
TL;DR: MixOmics is introduced, an R package dedicated to the multivariate analysis of biological data sets with a specific focus on data exploration, dimension reduction and visualisation and extends Projection to Latent Structure models for discriminant analysis.
Journal ArticleDOI
A review of variable selection methods in Partial Least Squares Regression
TL;DR: A review of available methods for variable selection within one of the many modeling approaches for high-throughput data, Partial Least Squares Regression, to get an understanding of the characteristics of the methods and to get a basis for selecting an appropriate method for own use.
Journal ArticleDOI
Biomarker development in the precision medicine era: lung cancer as a case study
TL;DR: Efforts at the national level of several countries to tie molecular measurement of samples to patient data via electronic medical records are the future of precision medicine research.
Journal ArticleDOI
Human–Agent Teaming for Multirobot Control: A Review of Human Factors Issues
TL;DR: The human factors literature on intelligent systems was reviewed, and two key human performance issues related to H-A teaming for multirobot control and some promising user interface design solutions to address these issues were discussed.
References
More filters
Journal ArticleDOI
Random Forests
TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
Journal ArticleDOI
Regression Shrinkage and Selection via the Lasso
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Book
The Nature of Statistical Learning Theory
TL;DR: Setting of the learning problem consistency of learning processes bounds on the rate of convergence ofLearning processes controlling the generalization ability of learning process constructing learning algorithms what is important in learning theory?
Journal ArticleDOI
Gene Ontology: tool for the unification of biology
M Ashburner,Catherine A. Ball,Judith A. Blake,David Botstein,Heather Butler,J. M. Cherry,Allan Peter Davis,Kara Dolinski,Selina S. Dwight,J.T. Eppig,Midori A. Harris,David P. Hill,Laurie Issel-Tarver,Andrew Kasarskis,Suzanna E. Lewis,John C. Matese,Joel E. Richardson,M. Ringwald,Gerald M. Rubin,Gavin Sherlock +19 more
TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Related Papers (5)
Controlling the false discovery rate: a practical and powerful approach to multiple testing
Yoav Benjamini,Yosef Hochberg +1 more