Open Access · Journal Article · DOI

Centering, scaling, and transformations: improving the biological information content of metabolomics data

TL;DR
Range scaling and autoscaling were able to remove the dependence of the rank of the metabolites on the average concentration and the magnitude of the fold changes and showed biologically sensible results after PCA (principal component analysis).
Abstract
Extracting relevant biological information from large data sets is a major challenge in functional genomics research. Different aspects of the data hamper their biological interpretation. For instance, 5000-fold differences in concentration between metabolites are present in a metabolomics data set, while these differences are not proportional to the biological relevance of the metabolites; data analysis methods, however, cannot make this distinction. Data pretreatment methods can correct for aspects that hinder the biological interpretation of metabolomics data sets by emphasizing the biological information in the data and thus improving its interpretability. Different data pretreatment methods, i.e. centering, autoscaling, pareto scaling, range scaling, vast scaling, log transformation, and power transformation, were tested on a real-life metabolomics data set. They were found to greatly affect the outcome of the data analysis and thus the ranking of the metabolites that are most important from a biological point of view. Furthermore, the stability of this ranking, the influence of technical errors on the data analysis, and the preference of data analysis methods for selecting highly abundant metabolites were all affected by the data pretreatment method applied prior to data analysis. Different pretreatment methods emphasize different aspects of the data, and each method has its own merits and drawbacks. The choice of pretreatment method depends on the biological question to be answered, the properties of the data set, and the data analysis method selected. For the explorative analysis of the validation data set used in this study, autoscaling and range scaling performed better than the other pretreatment methods: they removed the dependence of the metabolite ranking on the average concentration and the magnitude of the fold changes, and gave biologically sensible results after PCA (principal component analysis). In conclusion, selecting a proper data pretreatment method is an essential step in the analysis of metabolomics data and greatly affects which metabolites are identified as the most important.
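As an illustration of the pretreatment methods compared in the study, the sketch below applies centering, the scaling variants (autoscaling, pareto, range, and vast scaling), and the log and power transformations to a samples × metabolites matrix. This is a minimal NumPy sketch under the commonly used definitions of these operations, not code from the paper; the function name `pretreat` and the example matrix are illustrative.

```python
import numpy as np

def pretreat(X, method="autoscale"):
    """Column-wise pretreatment of a samples x metabolites matrix X.

    Centering removes the mean per metabolite; each scaling divides the
    centered values by a per-metabolite factor.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    centered = X - mean
    if method == "center":
        return centered
    if method == "autoscale":   # unit variance: every metabolite gets equal weight
        return centered / std
    if method == "pareto":      # sqrt(std): large fold changes dominate less
        return centered / np.sqrt(std)
    if method == "range":       # biological range (max - min) as scaling factor
        return centered / (X.max(axis=0) - X.min(axis=0))
    if method == "vast":        # autoscaling weighted by the inverse coefficient of variation
        return (centered / std) * (mean / std)
    if method == "log":         # log transformation; requires strictly positive values
        return np.log10(X)
    if method == "power":       # square-root (power) transformation
        return np.sqrt(X)
    raise ValueError(f"unknown pretreatment method: {method}")

# Example: autoscale a small matrix before feeding it to PCA
X = np.array([[1.0, 520.0, 3.1],
              [2.0, 480.0, 2.9],
              [1.5, 610.0, 3.4]])
X_auto = pretreat(X, "autoscale")
```

With autoscaling, each metabolite contributes equally to the subsequent analysis regardless of its average concentration, which corresponds to the behaviour the abstract attributes to autoscaling and range scaling.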



Citations
Journal Article · DOI

Principal component analysis

TL;DR: The paper focuses on the use of principal component analysis in typical chemometric areas but the results are generally applicable.
Journal Article · DOI

MetaboAnalyst: a web server for metabolomic data analysis and interpretation

TL;DR: A freely accessible, easy-to-use web server for metabolomic data analysis called MetaboAnalyst, which supports such techniques as fold change analysis, t-tests, PCA, PLS-DA, hierarchical clustering, and a number of more sophisticated statistical or machine learning methods.
Journal Article · DOI

Using MetaboAnalyst 4.0 for Comprehensive and Integrative Metabolomics Data Analysis.

TL;DR: An overview of the main functional modules and the general workflow of MetaboAnalyst 4.0 is provided, followed by 12 detailed protocols.
Journal Article · DOI

Using MetaboAnalyst 3.0 for Comprehensive Metabolomics Data Analysis.

TL;DR: This unit provides an overview of the main functional modules and the general workflow of the latest version of MetaboAnalyst (MetaboAnalyst 3.0), followed by eight detailed protocols.
Journal Article · DOI

MetaboAnalyst 2.0—a comprehensive server for metabolomic data analysis

TL;DR: MetaboAnalyst 2.0 now contains dozens of new features and functions including new procedures for data filtering, data editing and data normalization and it also supports multi-group data analysis, two-factor analysis as well as time-series data analysis.
References
Journal Article · DOI

KEGG: Kyoto Encyclopedia of Genes and Genomes

TL;DR: The Kyoto Encyclopedia of Genes and Genomes (KEGG) as discussed by the authors is a knowledge base for systematic analysis of gene functions in terms of the networks of genes and molecules.
Book

Principal Component Analysis

TL;DR: In this article, the authors present a graphical representation of data using Principal Component Analysis (PCA) for time series and other non-independent data, as well as a generalization and adaptation of principal component analysis.
Journal Article · DOI

Cluster analysis and display of genome-wide expression patterns

TL;DR: A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression, finding in the budding yeast Saccharomyces cerevisiae that clustering gene expression data efficiently groups together genes of known similar function.
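As a rough illustration of that approach (hierarchical clustering of expression patterns by similarity), the following SciPy sketch uses a correlation-based distance and average linkage; it is not the original software, and the random expression matrix is purely illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram

# genes x conditions expression matrix (random values, for illustration only)
expr = np.random.default_rng(0).normal(size=(20, 6))

# distance between genes based on similarity of their expression patterns
dist = pdist(expr, metric="correlation")

# average-linkage hierarchical clustering and the resulting gene ordering
Z = linkage(dist, method="average")
order = dendrogram(Z, no_plot=True)["leaves"]
```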
Reference Entry · DOI

Principal Component Analysis

TL;DR: Principal component analysis (PCA) as discussed by the authors replaces the p original variables by a smaller number, q, of derived variables, the principal components, which are linear combinations of the original variables.
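To make that description concrete, here is a minimal sketch of PCA via singular value decomposition, in which the q principal components are linear combinations of the p original variables. The variable names are illustrative, and the snippet assumes a pretreated (e.g. centered or autoscaled) samples × variables matrix.

```python
import numpy as np

def pca(X, n_components=2):
    """Return scores, loadings, and explained-variance ratios of X (samples x variables)."""
    Xc = X - X.mean(axis=0)                  # center; any other pretreatment can be applied first
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T           # p x q: weights of the original variables
    scores = Xc @ loadings                   # n x q: sample coordinates in the reduced space
    explained = (s ** 2) / np.sum(s ** 2)    # proportion of variance per component
    return scores, loadings, explained[:n_components]
```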
Journal ArticleDOI

An Analysis of Transformations

TL;DR: In this article, Box and Cox make the less restrictive assumption that a normal, homoscedastic, linear model is appropriate only after some suitable transformation has been applied to the y's.
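The transformation family analysed in that paper (the Box-Cox power transformation) can be sketched as follows. This is an illustrative example using SciPy's implementation, not code from the cited work, and it assumes strictly positive data.

```python
import numpy as np
from scipy import stats

y = np.array([1.2, 3.5, 0.8, 12.0, 4.4, 2.1])   # illustrative, strictly positive observations

# Estimate the transformation parameter lambda by maximum likelihood and
# transform y so that a normal, homoscedastic model becomes more appropriate.
y_transformed, lam = stats.boxcox(y)
```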