Open Access Journal Article

Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems

TLDR
A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework; on public microarray and SNP data sets, it achieves classification performance similar to that of other wrapper or sparse discriminant analysis approaches.
Abstract
Variable selection on high-throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), has become essential to extract relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas machine learning wrapper approaches can be used for predictive purposes. When many variables are highly correlated, another option is to use multivariate exploratory approaches, which give more insight into cell biology, biological pathways or complex traits. A simple extension of a sparse partial least squares (PLS) exploratory approach, sparse PLS discriminant analysis (sPLS-DA), is proposed to perform variable selection in a multiclass classification framework. On public microarray and SNP data sets, sPLS-DA has a classification performance similar to that of other wrapper or sparse discriminant analysis approaches. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of the interpretability of its results, thanks to its graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets.
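The abstract does not spell out the algorithm. As a rough illustration only, here is a minimal NumPy sketch of how a first sPLS-DA component could be computed. It assumes (these are assumptions, not details taken from this page) that the class labels are dummy-coded into an indicator matrix, that both blocks are centered and scaled, and that sparsity comes from soft-thresholding the X loading vector so that only `keep_x` variables stay non-zero. The functions `dummy_code`, `soft_threshold_keep` and `spls_da_component` are hypothetical names; this is not the mixOmics implementation.

```python
import numpy as np


def dummy_code(y):
    """Turn a vector of class labels into an n x K indicator (dummy) matrix."""
    classes = np.unique(y)
    Y = (np.asarray(y)[:, None] == classes[None, :]).astype(float)
    return Y, classes


def soft_threshold_keep(z, keep):
    """Soft-threshold z so that only its `keep` largest-magnitude entries stay non-zero."""
    if keep >= z.size:
        return z
    # Threshold at the (keep+1)-th largest absolute value (assumes no ties).
    lam = np.sort(np.abs(z))[::-1][keep]
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def spls_da_component(X, y, keep_x, max_iter=500, tol=1e-9):
    """Hypothetical first sPLS-DA component: sparse X loading u, Y loading v, scores t."""
    Y, classes = dummy_code(y)
    # Assumption: center and scale both blocks before computing the component.
    Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Yc = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    M = Xc.T @ Yc  # p x K cross-product matrix between variables and classes
    # Start from the leading singular vectors of M (ordinary PLS solution).
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    u, v = U[:, 0], Vt[0]
    for _ in range(max_iter):
        # Sparse update of the X loading, then ordinary update of the Y loading.
        u_new = soft_threshold_keep(M @ v, keep_x)
        u_new /= np.linalg.norm(u_new)
        v_new = M.T @ u_new
        v_new /= np.linalg.norm(v_new)
        converged = np.linalg.norm(u_new - u) < tol
        u, v = u_new, v_new
        if converged:
            break
    t = Xc @ u  # component scores for the n samples
    return u, v, t, classes


# Usage on simulated data: 3 classes, 200 variables, only the first 5 informative.
rng = np.random.default_rng(0)
n, p = 60, 200
y = np.repeat(["A", "B", "C"], n // 3)
X = rng.normal(size=(n, p))
shift = {"A": -2.0, "B": 0.0, "C": 2.0}
X[:, :5] += np.array([shift[c] for c in y])[:, None]

u, v, t, classes = spls_da_component(X, y, keep_x=10)
selected = np.flatnonzero(u)  # indices of the variables kept on this component
print(selected)
```

The selected variables are the non-zero entries of `u`; on this simulated data the five shifted variables are expected to be among them. In mixOmics itself, sPLS-DA is fitted with the R function `splsda`, where the number of variables retained per component is set through its `keepX` argument.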
