Waste not, want not: why rarefying microbiome data is inadmissible.
Paul J. McMurdie, Susan Holmes
TLDR
The authors advocate that investigators avoid rarefying altogether and summarize well-established statistical theory that simultaneously accounts for library size differences and biological variability using an appropriate mixture model.
Abstract:
Current practice in the normalization of microbiome count data is inefficient in the statistical sense. For apparently historical reasons, the common approach is either to use simple proportions (which does not address heteroscedasticity) or to rarefy counts, even though both of these approaches are inappropriate for detection of differentially abundant species. Well-established statistical theory is available that simultaneously accounts for library size differences and biological variability using an appropriate mixture model. Moreover, specific implementations for DNA sequencing read count data (based on a Negative Binomial model, for instance) are already available in RNA-Seq-focused R packages such as edgeR and DESeq. Here we summarize the supporting statistical theory and use simulations and empirical data to demonstrate substantial improvements provided by a relevant mixture model framework over simple proportions or rarefying. We show how both proportions and rarefied counts result in a high rate of false positives in tests for species that are differentially abundant across sample classes. Regarding microbiome sample-wise clustering, we also show that the rarefying procedure often discards samples that can be accurately clustered by alternative methods. We further compare different Negative Binomial methods with a recently described zero-inflated Gaussian mixture, implemented in a package called metagenomeSeq. We find that metagenomeSeq performs well when there is an adequate number of biological replicates, but it nevertheless tends toward a higher false positive rate. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package phyloseq.
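The two ideas contrasted in the abstract, rarefying versus a Negative Binomial mixture model, can be illustrated with a small sketch. It assumes Python with NumPy, which is not part of the original work (that work uses R packages such as phyloseq, edgeR and DESeq). `rarefy` is a hypothetical helper that subsamples each sample's reads without replacement to a common depth and drops samples below that depth. `simulate_nb_counts` is a hypothetical helper that draws counts from a Gamma-Poisson mixture, the standard construction of the Negative Binomial, whose variance is μ + φμ².

```python
import numpy as np


def rarefy(counts, depth, rng):
    """Hypothetical helper: rarefy a samples-by-taxa count matrix.

    Each sample (row) with at least `depth` reads is subsampled to exactly
    `depth` reads without replacement; samples below `depth` are dropped.
    Returns the rarefied matrix and the indices of the kept samples.
    """
    counts = np.asarray(counts, dtype=np.int64)
    kept = [i for i in range(counts.shape[0]) if counts[i].sum() >= depth]
    rarefied = np.zeros((len(kept), counts.shape[1]), dtype=np.int64)
    for out_row, i in enumerate(kept):
        # A multivariate hypergeometric draw is sampling reads without replacement.
        rarefied[out_row] = rng.multivariate_hypergeometric(counts[i], depth)
    return rarefied, kept


def simulate_nb_counts(mean, phi, size, rng):
    """Hypothetical helper: Negative Binomial counts as a Gamma-Poisson mixture.

    The Poisson rate varies across biological replicates as
    Gamma(shape=1/phi, scale=mean*phi), so the marginal counts have
    mean `mean` and variance mean + phi * mean**2 (overdispersion phi > 0).
    """
    rate = rng.gamma(shape=1.0 / phi, scale=mean * phi, size=size)
    return rng.poisson(rate)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three samples with identical composition but very different library sizes.
    counts = np.array([[500, 300, 200],
                       [50, 30, 20],
                       [5000, 3000, 2000]])
    rarefied, kept = rarefy(counts, depth=500, rng=rng)
    print("kept samples:", kept)  # kept samples: [0, 2]
    print("reads retained:", rarefied.sum(), "of", counts.sum())  # 1000 of 11100

    x = simulate_nb_counts(mean=10.0, phi=0.5, size=100_000, rng=rng)
    # Expected variance is 10 + 0.5 * 10**2 = 60, well above the Poisson value of 10.
    print("NB sample mean:", x.mean(), "sample variance:", x.var())
```

With a depth of 500, the 100-read sample is discarded entirely and only 1,000 of the 11,100 reads survive. This illustrates the data loss from rarefying, whereas a Negative Binomial model keeps every read and handles library size through the mean and biological variability through φ.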
Citations
Journal Article
Using “omics” and integrated multi-omics approaches to guide probiotic selection to mitigate chytridiomycosis and other emerging infectious diseases
Eria A. Rebollar, Rachael E. Antwis, Matthew H. Becker, Lisa K. Belden, Molly C. Bletz, Robert M. Brucker, Xavier A. Harrison, Myra C. Hughey, Jordan G. Kueneman, Andrew H. Loudon, Valerie J. McKenzie, Daniel Medina, Kevin P. C. Minbiole, Louise A. Rollins-Smith, Jenifer B. Walke, Sophie Weiss, Douglas C. Woodhams, Reid N. Harris
TL;DR: Using 16S rRNA gene amplicon sequencing and methods such as indicator species analysis, the Kolmogorov–Smirnov Measure, and co-occurrence networks to identify bacteria that are associated with pathogen resistance in field surveys and experimental trials is recommended.
Journal Article
Navigating the labyrinth: a guide to sequence-based, community ecology of arbuscular mycorrhizal fungi
Miranda M. Hart, Kristin Aleklett, Pierre-Luc Chagnon, Cameron Egan, Stefano Ghignone, Thorunn Helgason, Ylva Lekberg, Maarja Öpik, Brian J. Pickles, Lauren P. Waller
TL;DR: The authors aim to improve the quality and accessibility of NGS data for the AMF research community and demonstrate how different approaches can significantly alter analysis outcomes.
Journal Article
Protistan community analysis: key findings of a large-scale molecular sampling.
Lars Grossmann, Manfred Jensen, Dominik Heider, Steffen Jost, Edvard Glücksman, Hanna Hartikainen, Shazia Mahamdallie, Michelle Gardner, Daniel Hoffmann, David Bass, Jens Boenigk
TL;DR: Protistan community patterns are shown to be highly consistent within habitat types and geographic regions, provided that sample processing is standardised, and evidence is provided that distribution patterns do not result solely from random processes.
Journal Article
A single early-in-life macrolide course has lasting effects on murine microbial network topology and immunity.
Victoria E. Ruiz, Thomas Battaglia, Zachary D. Kurtz, Luc Bijnens, Amy Ou, Isak Engstrand, Xuhui Zheng, Tadasu Iizumi, Briana J. Mullins, Christian L. Müller, Ken Cadwell, Richard Bonneau, Guillermo I. Perez-Perez, Martin J. Blaser
TL;DR: It is demonstrated that a single pulsed macrolide antibiotic treatment (PAT) course early in life is sufficient to lead to durable alterations to the murine intestinal microbiota, ileal gene expression, specific intestinal T-cell populations, and secretory IgA expression.
Journal Article
Unravelling the collateral damage of antibiotics on gut bacteria
Lisa A. Maier, Camille V. Goemans, Jakob Wirbel, Michael Kuhn, Claudia Eberl, Mihaela Pruteanu, Patrick Müller, Sarela García-Santamarina, Elisabetta Cacace, Boyao Zhang, Cordula Gekeler, Tisya Banerjee, Exene Erin Anderson, Alessio Milanese, Ulrike Löber, Sofia K. Forslund, Kiran Raosaheb Patil, Michael B. Zimmermann, Bärbel Stecher, Georg Zeller, Peer Bork, Athanasios Typas
TL;DR: In this article, the authors characterized 144 antibiotics from a previous screen of more than 1,000 drugs on 38 representative human gut microbiome species, and identified drugs that mitigate the antibiotics' collateral damage on commensal bacteria without compromising their efficacy against pathogens.
References
Journal Article
R: A language and environment for statistical computing.
TL;DR: Copyright (©) 1999–2012 R Foundation for Statistical Computing; permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and permission notice are preserved on all copies.
Journal Article
Controlling the false discovery rate: a practical and powerful approach to multiple testing
Yoav Benjamini, Yosef Hochberg
TL;DR: In this paper, a different approach to problems of multiple significance testing is presented, which calls for controlling the expected proportion of falsely rejected hypotheses (the false discovery rate). This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise.
Book
ggplot2: Elegant Graphics for Data Analysis
TL;DR: This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data graphics.
Journal Article
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.
TL;DR: edgeR is a Bioconductor software package for examining differential expression of replicated count data. It uses an overdispersed Poisson model to account for both biological and technical variability, and empirical Bayes methods to moderate the degree of overdispersion across transcripts, improving the reliability of inference.
Journal Article
QIIME allows analysis of high-throughput community sequencing data.
J. Gregory Caporaso, Justin Kuczynski, Jesse Stombaugh, Kyle Bittinger, Frederic D. Bushman, Elizabeth K. Costello, Noah Fierer, Antonio Gonzalez Peña, Julia K. Goodrich, Jeffrey I. Gordon, Gavin A. Huttley, Scott T. Kelley, Dan Knights, Jeremy E. Koenig, Ruth E. Ley, Catherine A. Lozupone, Daniel McDonald, Brian D. Muegge, Meg Pirrung, Jens Reeder, Joel Sevinsky, Peter J. Turnbaugh, William A. Walters, Jeremy Widmann, Tanya Yatsunenko, Jesse R. Zaneveld, Rob Knight
TL;DR: An overview of the analysis pipeline and links to raw data and processed output from the runs with and without denoising are provided.