Open Access journal article

Understanding sequencing data as compositions: an outlook and review.

TLDR
The principles of compositional data analysis (CoDA) are summarized, evidence is provided for why sequencing data are compositional, methods available for analyzing sequencing data are discussed, and future directions with regard to this field of study are highlighted.
Abstract
Motivation: Although seldom acknowledged explicitly, count data generated by sequencing platforms exist as compositions for which the abundance of each component (e.g. gene or transcript) is only coherently interpretable relative to other components within that sample. This property arises from the assay technology itself, whereby the number of counts recorded for each sample is constrained by an arbitrary total sum (i.e. library size). Consequently, sequencing data, as compositional data, exist in a non-Euclidean space that, without normalization or transformation, renders invalid many conventional analyses, including distance measures, correlation coefficients and multivariate statistical models.

Results: The purpose of this review is to summarize the principles of compositional data analysis (CoDA), provide evidence for why sequencing data are compositional, discuss compositionally valid methods available for analyzing sequencing data, and highlight future directions with regard to this field of study.

Supplementary information: Supplementary data are available at Bioinformatics online.
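To make the closure argument in the Motivation concrete, here is a minimal Python sketch (standard library only) that simulates two hypothetical samples in which only one gene changes in absolute abundance, then "sequences" both to the same fixed library size. The gene abundances and the 10,000-read library size are invented for illustration, and counts are taken as their expected values with no sampling noise. The centred log-ratio (clr) transform and the Aitchison distance are standard CoDA tools, but the helper names `closure`, `clr` and `aitchison_distance` are our own and do not come from any package the review describes. Zeros, which are common in real count tables, are not handled here.

```python
import math


def closure(parts):
    """Rescale non-negative parts so they sum to 1 (the 'closure' operation)."""
    total = sum(parts)
    if total <= 0:
        raise ValueError("composition must have a positive total")
    return [p / total for p in parts]


def clr(parts):
    """Centred log-ratio: log of each part relative to the geometric mean of all parts.

    Zeros are not handled; real sequencing counts usually need a
    zero-replacement step (e.g. a pseudocount) before this transform.
    """
    if any(p <= 0 for p in parts):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr coordinates of two compositions."""
    return math.dist(clr(x), clr(y))


if __name__ == "__main__":
    # Hypothetical absolute abundances for four genes; only gene_4 changes (10-fold).
    absolute_a = [100, 200, 300, 400]
    absolute_b = [100, 200, 300, 4000]
    library_size = 10_000  # hypothetical total fixed by the sequencing run

    # Expected counts: the sequencer reports only relative abundance,
    # scaled to the (arbitrary) library size.
    counts_a = [library_size * p for p in closure(absolute_a)]
    counts_b = [library_size * p for p in closure(absolute_b)]

    for i, (a, b) in enumerate(zip(counts_a, counts_b), start=1):
        print(f"gene_{i}: {a:8.1f} -> {b:8.1f} reads, apparent fold change {b / a:.2f}")

    # Ratios between parts survive closure: gene_2 / gene_1 is 2 in both conditions.
    print("log(gene_2 / gene_1):",
          math.log(counts_a[1] / counts_a[0]),
          math.log(counts_b[1] / counts_b[0]))

    # clr coordinates, and hence Aitchison distances, ignore the total:
    # the same sample sequenced twice as deeply is at distance ~0.
    deeper_a = [2 * c for c in counts_a]
    print("Aitchison distance, same sample at 2x depth:", aitchison_distance(counts_a, deeper_a))
    print("Aitchison distance, condition A vs B:", aitchison_distance(counts_a, counts_b))
```

Although only gene_4 changes in absolute terms, genes 1 to 3 appear to fall about 4.6-fold once both samples are closed to the same total, which is exactly the kind of artefact that makes naive comparisons of raw or proportional counts misleading. Log-ratios between parts do not depend on the library size, which is why log-ratio transforms such as clr are a common starting point for compositionally valid analyses.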

