
Journal Article

phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data.

22 Apr 2013-PLOS ONE (Public Library of Science)-Vol. 8, Iss: 4

TL;DR: The phyloseq project for R is a new open-source software package dedicated to the object-oriented representation and analysis of microbiome census data in R, which supports importing data from a variety of common formats, as well as many analysis techniques.
Abstract: Background: The analysis of microbial communities through DNA sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are often difficult (or impossible) for peer researchers to independently reproduce. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high throughput microbiome census data. Results: Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics; all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, an example of best practices for reproducible research. Conclusions: The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.
Topics: Bioconductor (53%)
Citations

Journal Article
Evan Bolyen1, Jai Ram Rideout1, Matthew R. Dillon1, Nicholas A. Bokulich1 +120 more - Institutions (47)
TL;DR: QIIME 2 development was primarily funded by NSF Awards 1565100 to J.G.C. and 1565057 to R.K.; partial support was also provided by grants NIH U54CA143925 and U54MD012388, among others.
Abstract: QIIME 2 development was primarily funded by NSF Awards 1565100 to J.G.C. and 1565057 to R.K. Partial support was also provided by the following: grants NIH U54CA143925 (J.G.C. and T.P.) and U54MD012388 (J.G.C. and T.P.); grants from the Alfred P. Sloan Foundation (J.G.C. and R.K.); ERCSTG project MetaPG (N.S.); the Strategic Priority Research Program of the Chinese Academy of Sciences QYZDB-SSW-SMC021 (Y.B.); the Australian National Health and Medical Research Council APP1085372 (G.A.H., J.G.C., Von Bing Yap and R.K.); the Natural Sciences and Engineering Research Council (NSERC) to D.L.G.; and the State of Arizona Technology and Research Initiative Fund (TRIF), administered by the Arizona Board of Regents, through Northern Arizona University. All NCI coauthors were supported by the Intramural Research Program of the National Cancer Institute. S.M.G. and C. Diener were supported by the Washington Research Foundation Distinguished Investigator Award.

3,456 citations


Journal Article
Paul J. McMurdie1, Susan Holmes1 - Institutions (1)
TL;DR: The authors advocate that investigators avoid rarefying altogether, and summarize supporting statistical theory that simultaneously accounts for library size differences and biological variability using an appropriate mixture model.
Abstract: Current practice in the normalization of microbiome count data is inefficient in the statistical sense. For apparently historical reasons, the common approach is either to use simple proportions (which does not address heteroscedasticity) or to use rarefying of counts, even though both of these approaches are inappropriate for detection of differentially abundant species. Well-established statistical theory is available that simultaneously accounts for library size differences and biological variability using an appropriate mixture model. Moreover, specific implementations for DNA sequencing read count data (based on a Negative Binomial model for instance) are already available in RNA-Seq focused R packages such as edgeR and DESeq. Here we summarize the supporting statistical theory and use simulations and empirical data to demonstrate substantial improvements provided by a relevant mixture model framework over simple proportions or rarefying. We show how both proportions and rarefied counts result in a high rate of false positives in tests for species that are differentially abundant across sample classes. Regarding microbiome sample-wise clustering, we also show that the rarefying procedure often discards samples that can be accurately clustered by alternative methods. We further compare different Negative Binomial methods with a recently-described zero-inflated Gaussian mixture, implemented in a package called metagenomeSeq. We find that metagenomeSeq performs well when there is an adequate number of biological replicates, but it nevertheless tends toward a higher false positive rate. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package, phyloseq.
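
As a rough illustration of the two normalization approaches this abstract criticizes, the sketch below applies simple proportions and rarefying to a toy count table. It is a minimal sketch only: the sample data, the function names `to_proportions` and `rarefy`, and the fixed seed are assumptions made for illustration, not code from the paper or from phyloseq, and the mixture-model alternative the authors recommend is not shown.

```python
import random


def to_proportions(counts):
    """Divide each taxon count by the sample's library size (total reads)."""
    total = sum(counts)
    return [c / total for c in counts]


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a single sample."""
    if depth > sum(counts):
        # A sample smaller than the chosen depth cannot be rarefied.
        raise ValueError("depth exceeds library size")
    # Expand the counts into one taxon label per read, then draw a subset.
    reads = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    drawn = random.Random(seed).sample(reads, depth)
    rarefied = [0] * len(counts)
    for taxon in drawn:
        rarefied[taxon] += 1
    return rarefied


# Hypothetical toy data: same composition, very different library sizes.
sample_a = [50, 30, 20]        # 100 reads
sample_b = [5000, 3000, 2000]  # 10,000 reads

props_a = to_proportions(sample_a)
props_b = to_proportions(sample_b)
rarefied_b = rarefy(sample_b, depth=sum(sample_a))

print("proportions A:", props_a)
print("proportions B:", props_b)
print("rarefied B:   ", rarefied_b)
```

In this toy case the proportions of the two samples are identical, so the hundredfold difference in sequencing depth, and with it the difference in uncertainty, is no longer visible. Rarefying the larger sample to 100 reads discards 9,900 of its observations.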

1,746 citations


Cites methods from "phyloseq: an R package for reproduc..."

  • ...We have provided convenient wrappers for edgeR and DESeq that are tailored for microbiome count data, and these wrappers are included in the most recent release of the phyloseq package [53] with corresponding tutorials....

    [...]

  • ...These simulations, analyses, and graphics rely upon the cluster [58], foreach [59], ggplot2 [60], phyloseq [53], plyr [61], reshape2 [62], and ROCR [39] R packages; in addition to the DESeq [3], edgeR [2], and PoiClaClu [63] R packages for RNASeq data, and tools available in the standard R distribution [64]....

    [...]


Journal Article
Guangchuang Yu1, David K. Smith1, Hongbo Zhu1, Yi Guan1 +1 more - Institutions (1)
TL;DR: ggtree is an R package that provides programmable visualization and annotation of phylogenetic trees; it can read more tree file formats than other software and supports visualization of phylo, multiphylo, phylo4, phylo4d, obkdata and phyloseq tree objects defined in other R packages.
Abstract: Summary We present an r package, ggtree, which provides programmable visualization and annotation of phylogenetic trees. ggtree can read more tree file formats than other softwares, including newick, nexus, NHX, phylip and jplace formats, and support visualization of phylo, multiphylo, phylo4, phylo4d, obkdata and phyloseq tree objects defined in other r packages. It can also extract the tree/branch/node-specific and other data from the analysis outputs of beast, epa, hyphy, paml, phylodog, pplacer, r8s, raxml and revbayes software, and allows using these data to annotate the tree. The package allows colouring and annotation of a tree by numerical/categorical node attributes, manipulating a tree by rotating, collapsing and zooming out clades, highlighting user selected clades or operational taxonomic units and exploration of a large tree by zooming into a selected portion. A two-dimensional tree can be drawn by scaling the tree width based on an attribute of the nodes. A tree can be annotated with an associated numerical matrix (as a heat map), multiple sequence alignment, subplots or silhouette images. The package ggtree is released under the artistic-2.0 license. The source code and documents are freely available through bioconductor (http://www.bioconductor.org/packages/ggtree).

1,580 citations


Cites methods from "phyloseq: an R package for reproduc..."

  • ...Some packages, including APE (Paradis, Claude & Strimmer 2004) and PHYTOOLS (Revell 2012), which are capable of displaying and annotating trees, are developed using the base graphics system of R. OUTBREAKTOOLS (Jombart et al. 2014) and PHYLOSEQ (McMurdie & Holmes 2013) extended GGPLOT2 to draw phylogenetic trees....

    [...]

  • ...For example, if we have plotted a tree without taxa labels, OUTBREAKTOOLS and PHYLOSEQ provide no easy way for general R users, who have little knowledge about the infrastructures of these packages, to add a layer of taxa labels....

    [...]

  • ...Even though OUTBREAKTOOLS and PHYLOSEQ are developed based on GGPLOT2, the most valuable part of GGPLOT2 syntax – adding layers of annotations – is not supported in these packages....

    [...]



Journal Article
TL;DR: A first systematic analysis of microbiota changes in the ileum and colon using multiple diets and investigating both fecal and mucosal samples demonstrates correlations between the microbiota and dysfunctions of gut, adipose tissue, and liver, independent of a specific disease-inducing diet.
Abstract: Development of non-alcoholic fatty liver disease (NAFLD) is linked to obesity, adipose tissue inflammation, and gut dysfunction, all of which depend on diet. So far, studies have mainly focused on diet-related fecal microbiota changes, but other compartments may be more informative on host health. We present a first systematic analysis of microbiota changes in the ileum and colon using multiple diets and investigating both fecal and mucosal samples. Ldlr−/−.Leiden mice received one of three different energy-dense (ED)-diets (n = 15/group) for 15 weeks. All of the ED diets induced obesity and metabolic risk factors, altered short-chain fatty acids (SCFA), and increased gut permeability and NAFLD to various extents. ED diets reduced the diversity of high-abundant bacteria and increased the diversity of low-abundant bacteria in all of the gut compartments. The ED groups showed highly variable, partially overlapping microbiota compositions that differed significantly from chow. Correlation analyses demonstrated that (1) specific groups of bacteria correlate with metabolic risk factors, organ dysfunction, and NAFLD endpoints, (2) colon mucosa had greater predictive value than other compartments, (3) correlating bacteria differed per compartment, and (4) some bacteria correlated with plasma SCFA levels. In conclusion, this comprehensive microbiota analysis demonstrates correlations between the microbiota and dysfunctions of gut, adipose tissue, and liver, independent of a specific disease-inducing diet.

1,082 citations


Journal Article
21 Jul 2016-Nature
TL;DR: The study shows how the human gut microbiome impacts the serum metabolome and associates with insulin resistance in 277 non-diabetic Danish individuals, and suggests that microbial targets may have the potential to diminish insulin resistance and reduce the incidence of common metabolic and cardiovascular disorders.
Abstract: Insulin resistance is a forerunner state of ischaemic cardiovascular disease and type 2 diabetes. Here we show how the human gut microbiome impacts the serum metabolome and associates with insulin resistance in 277 non-diabetic Danish individuals. The serum metabolome of insulin-resistant individuals is characterized by increased levels of branched-chain amino acids (BCAAs), which correlate with a gut microbiome that has an enriched biosynthetic potential for BCAAs and is deprived of genes encoding bacterial inward transporters for these amino acids. Prevotella copri and Bacteroides vulgatus are identified as the main species driving the association between biosynthesis of BCAAs and insulin resistance, and in mice we demonstrate that P. copri can induce insulin resistance, aggravate glucose intolerance and augment circulating levels of BCAAs. Our findings suggest that microbial targets may have the potential to diminish insulin resistance and reduce the incidence of common metabolic and cardiovascular disorders.

937 citations


References

Journal Article
01 Jan 2014-MSOR connections
TL;DR: Copyright (©) 1999–2012 R Foundation for Statistical Computing; permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and permission notice are preserved on all copies.
Abstract: Copyright (©) 1999–2012 R Foundation for Statistical Computing. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Core Team.

229,202 citations


Journal Article
11 Apr 2010-Nature Methods
TL;DR: An overview of the analysis pipeline and links to raw data and processed output from the runs with and without denoising are provided.
Abstract: Supplementary Figure 1 Overview of the analysis pipeline. Supplementary Table 1 Details of conventionally raised and conventionalized mouse samples. Supplementary Discussion Expanded discussion of QIIME analyses presented in the main text; Sequencing of 16S rRNA gene amplicons; QIIME analysis notes; Expanded Figure 1 legend; Links to raw data and processed output from the runs with and without denoising.

24,116 citations


"phyloseq: an R package for reproduc..." refers background or methods in this paper

  • ...clustering output formats like QIIME [11], mothur [12], the RDP-...

    [...]

  • ...packages/pipelines, including QIIME [11], mothur [12], and...

    [...]

  • ...We would also like to thank the developers of the open source packages on which phyloseq depends, in particular Rob Knight and his lab for QIIME [11], Hadley Wickham for the ggplot2 [57], reshape [89], and plyr [90] packages, as well as the Bioconductor and R teams [24,34]....

    [...]

  • ...Virtual machine image and cloud-deployed ‘‘pipeline’’ analyses [11,15,19] can further increase accessibility of analyses...

    [...]

  • ...Instead, phyloseq provides tools to read the output files of the most common OTU-clustering applications [7,11,12,14], and represents this data in R as an instance of the main data class....

    [...]


Book
13 Aug 2009-
TL;DR: This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data graphics.
Abstract: This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data graphics. With ggplot2, it's easy to: produce handsome, publication-quality plots, with automatic legends created from the plot specification; superpose multiple layers (points, lines, maps, tiles, box plots to name a few) from different data sources, with automatically adjusted common scales; add customisable smoothers that use the powerful modelling capabilities of R, such as loess, linear models, generalised additive models and robust regression; save any ggplot2 plot (or part thereof) for later modification or reuse; create custom themes that capture in-house or journal style requirements, and that can easily be applied to multiple plots; and approach your graph from a visual perspective, thinking about how each component of the data is represented on the final plot. This book will be useful to everyone who has struggled with displaying their data in an informative and attractive way. You will need some basic knowledge of R (i.e. you should be able to get your data into R), but ggplot2 is a mini-language specifically tailored for producing graphics, and you'll learn everything you need in the book. After reading this book you'll be able to produce graphics customized precisely for your problems, and you'll find it easy to get graphics out of your head and on to the screen or page.

23,839 citations


01 Jan 2012-

15,925 citations


Journal Article
TL;DR: mothur is used in a case study to trim, screen, and align sequences; calculate distances; assign sequences to operational taxonomic units; and describe the α and β diversity of eight marine samples previously characterized by pyrosequencing of 16S rRNA gene fragments.
Abstract: mothur aims to be a comprehensive software package that allows users to use a single piece of software to analyze community sequence data. It builds upon previous tools to provide a flexible and powerful software package for analyzing sequencing data. As a case study, we used mothur to trim, screen, and align sequences; calculate distances; assign sequences to operational taxonomic units; and describe the alpha and beta diversity of eight marine samples previously characterized by pyrosequencing of 16S rRNA gene fragments. This analysis of more than 222,000 sequences was completed in less than 2 h with a laptop computer.

14,946 citations


"phyloseq: an R package for reproduc..." refers methods in this paper

  • ...Instead, phyloseq provides tools to read the output files of the most common OTU-clustering applications [7,11,12,14], and represents this data in R as an instance of the main data class....

    [...]

  • ...clustering output formats like QIIME [11], mothur [12], the RDP-...

    [...]

  • ...packages/pipelines, including QIIME [11], mothur [12], and...

    [...]

  • ...This PDF file contains a table summarizing a comparison of supported capabilities between phyloseq and QIIME [11], mothur [12], and the pair of packages OTUbase [35] and mcaGUI [88]....

    [...]


No. of citations received by the Paper in previous years
Year    Citations
2022    45
2021    2,236
2020    1,761
2019    1,204
2018    840
2017    537