Journal Article

phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data.

22 Apr 2013-PLOS ONE (Public Library of Science)-Vol. 8, Iss: 4
TL;DR: The phyloseq project for R is a new open-source software package dedicated to the object-oriented representation and analysis of microbiome census data in R, which supports importing data from a variety of common formats, as well as many analysis techniques.
Abstract:
Background: The analysis of microbial communities through DNA sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are often difficult (or impossible) for peer researchers to independently reproduce. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high throughput microbiome census data.
Results: Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics; all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, an example of best practices for reproducible research.
Conclusions: The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.


Citations
Journal Article
TL;DR: In this paper, the authors treated effluent from a paper mill in sequencing batch reactors (SBRs) and monitored the abundance and activity of NFB with a view to producing a sludge that could work as a bio-fertilizer.
Abstract: Nitrogen-fixing bacteria (NFB) can reduce nitrogen at ambient pressure and temperature. In this study, we treated effluent from a paper mill in sequencing batch reactors (SBRs) and monitored the abundance and activity of NFB with a view to producing a sludge that could work as a biofertilizer. Four reactors were inoculated with activated sludge enriched with NFB and fed with a high C/N waste (100:0.5) from a paper mill. Though the reactors were able to reduce the organic load of the wastewater by up to 89%, they did not have any nitrogen-fixing activity and showed a decrease in the putative number of NFB (quantified with qPCR). The most abundant species in the reactors treating high C/N paper mill wastewater was identified by Illumina MiSeq 16S rRNA gene amplicon sequencing as Methyloversatilis sp. (relative abundance of 4.4%). Nitrogen fixation was observed when the C/N ratio was increased by adding sucrose. We suspect that real-world biological nitrogen fixation (BNF) will only occur where there is a C/N ratio ≤100:0.07. Consequently, operators should actively avoid adding or allowing nitrogen in the waste streams if they wish to valorize their sludge and reduce running costs. PRACTITIONER POINTS: Efficient biological wastewater treatment of low nitrogen paper mill effluent was achieved without nutrient supplementation. The sludge was still capable of fixing nitrogen although this process was not observed in the wastewater treatment system. This high C/N wastewater treatment technology could be used with effluents from cassava flour, olive oil, wine and dairy industries.

3 citations

Journal Article
TL;DR: In this paper, the authors studied the ecological patterns (dispersal and oceanographic factors) underlying the microbial community distribution in a linear span of 450 km along the estuarine-influenced Chilean Patagonian fjords.
Abstract: Fjords are sensitive areas affected by climate change and can act as a natural laboratory to study microbial ecological processes. The Chilean Patagonian fjords (41°S – 56°S), belonging to the Subantarctic ecosystem (46°S – 60°S), make up one of the world’s largest fjord systems. In this region, estuarine water (EW) strongly influences oceanographic conditions, generating sharp gradients of oxygen, salinity and nutrients, the effects of which on the microbial community structure are poorly understood. During the spring of 2017 we studied the ecological patterns (dispersal and oceanographic factors) underlying the microbial community distribution in a linear span of 450 km along the estuarine-influenced Chilean Patagonian fjords. Our results show that widespread microbial dispersion existed along the fjords where bacterioplankton exhibited dependence on the eukaryotic phytoplankton community composition. This dependence was particularly observed under the low chlorophyll-a conditions of the Baker Channel area, in which a significant relationship was revealed between SAR11 Clade III and the eukaryotic families Pyrenomonadaceae (Cryptophyte) and Coccomyxaceae (Chlorophyta). Furthermore, dissolved oxygen and salinity were revealed as the main drivers influencing the surface marine microbial communities in these fjords. A strong salinity gradient resulted in the segregation of the Baker Channel prokaryotic communities from the rest of the Patagonian fjords. Likewise, Microbacteriaceae, Burkholderiaceae and SAR11 Clade III, commonly found in freshwater, were strongly associated with EW conditions in these fjords. The direct effect of EW on the microbial community structure and diversity of the fjords exemplifies the significance that climate change and, in particular, deglaciation have on this marine region and its productivity.

3 citations

Journal Article
TL;DR: In this article, the authors investigated the effects of temperature variations (12-80°C) on microbial communities and their capacity to mineralize acetate in aerobically incubated sediments sampled from a pristine aquifer.

3 citations

Journal Article
Lichao Lu, Dong Dong, Marvin Yeung, Zhuqiu Sun, Jinying Xi
TL;DR: In this paper, a lab-scale fluidized bed reactor (FBR) was set up treating gaseous toluene and compared with a packed bed reactor with the same bed height of 150 cm.

3 citations

References
Journal Article
TL;DR: Copyright (©) 1999–2012 R Foundation for Statistical Computing; permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and permission notice are preserved on all copies.
Abstract: Copyright (©) 1999–2012 R Foundation for Statistical Computing. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Core Team.

272,030 citations

Book
13 Aug 2009
TL;DR: This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data graphics.
Abstract: This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data graphics. With ggplot2, it's easy to: produce handsome, publication-quality plots, with automatic legends created from the plot specification; superpose multiple layers (points, lines, maps, tiles, box plots to name a few) from different data sources, with automatically adjusted common scales; add customisable smoothers that use the powerful modelling capabilities of R, such as loess, linear models, generalised additive models and robust regression; save any ggplot2 plot (or part thereof) for later modification or reuse; create custom themes that capture in-house or journal style requirements, and that can easily be applied to multiple plots; and approach your graph from a visual perspective, thinking about how each component of the data is represented on the final plot. This book will be useful to everyone who has struggled with displaying their data in an informative and attractive way. You will need some basic knowledge of R (i.e. you should be able to get your data into R), but ggplot2 is a mini-language specifically tailored for producing graphics, and you'll learn everything you need in the book. After reading this book you'll be able to produce graphics customized precisely for your problems, and you'll find it easy to get graphics out of your head and on to the screen or page.

29,504 citations

Journal Article
TL;DR: An overview of the analysis pipeline and links to raw data and processed output from the runs with and without denoising are provided.
Abstract: Supplementary Figure 1 Overview of the analysis pipeline. Supplementary Table 1 Details of conventionally raised and conventionalized mouse samples. Supplementary Discussion Expanded discussion of QIIME analyses presented in the main text; Sequencing of 16S rRNA gene amplicons; QIIME analysis notes; Expanded Figure 1 legend; Links to raw data and processed output from the runs with and without denoising.

28,911 citations


"phyloseq: an R package for reproduc..." refers background or methods in this paper

  • ...clustering output formats like QIIME [11], mothur [12], the RDP-...

    [...]

  • ...packages/pipelines, including QIIME [11], mothur [12], and...

    [...]

  • ...We would also like to thank the developers of the open source packages on which phyloseq depends, in particular Rob Knight and his lab for QIIME [11], Hadley Wickham for the ggplot2 [57], reshape [89], and plyr [90] packages, as well as the Bioconductor and R teams [24,34]....

    [...]

  • ...Virtual machine image and cloud-deployed ‘‘pipeline’’ analyses [11,15,19] can further increase accessibility of analyses...

    [...]

  • ...Instead, phyloseq provides tools to read the output files of the most common OTU-clustering applications [7,11,12,14], and represents this data in R as an instance of the main data class....

    [...]

Journal Article
TL;DR: mothur is used as a case study to trim, screen, and align sequences; calculate distances; assign sequences to operational taxonomic units; and describe the α and β diversity of eight marine samples previously characterized by pyrosequencing of 16S rRNA gene fragments.
Abstract: mothur aims to be a comprehensive software package that allows users to use a single piece of software to analyze community sequence data. It builds upon previous tools to provide a flexible and powerful software package for analyzing sequencing data. As a case study, we used mothur to trim, screen, and align sequences; calculate distances; assign sequences to operational taxonomic units; and describe the alpha and beta diversity of eight marine samples previously characterized by pyrosequencing of 16S rRNA gene fragments. This analysis of more than 222,000 sequences was completed in less than 2 h with a laptop computer.

17,350 citations


"phyloseq: an R package for reproduc..." refers methods in this paper

  • ...Instead, phyloseq provides tools to read the output files of the most common OTU-clustering applications [7,11,12,14], and represents this data in R as an instance of the main data class....

    [...]

  • ...clustering output formats like QIIME [11], mothur [12], the RDP-...

    [...]

  • ...packages/pipelines, including QIIME [11], mothur [12], and...

    [...]

  • ...This PDF file contains a table summarizing a comparison of supported capabilities between phyloseq and QIIME [11], mothur [12], and the pair of packages OTUbase [35] and mcaGUI [88]....

    [...]
