Journal Article

Fast UniFrac: facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data.

TL;DR: The potential of Fast UniFrac is shown using examples from three data types: Sanger-sequencing studies of diverse free-living and animal-associated bacterial assemblages and from the gut of obese humans as they diet, pyrosequencing data integrated from studies of the human hand and gut, and PhyloChip data from a study of citrus pathogens.
Abstract: Next-generation sequencing techniques, and PhyloChip, have made simultaneous phylogenetic analyses of hundreds of microbial communities possible. Insight into community structure has been limited by the inability to integrate and visualize such vast datasets. Fast UniFrac overcomes these issues, allowing integration of larger numbers of sequences and samples into a single analysis. Its new array-based implementation offers orders of magnitude improvements over the original version. New 3D visualization of principal coordinates analysis results, with the option to view multiple coordinate axes simultaneously, provides a powerful way to quickly identify patterns that relate vast numbers of microbial communities. We show the potential of Fast UniFrac using examples from three data types: Sanger-sequencing studies of diverse free-living and animal-associated bacterial assemblages and from the gut of obese humans as they diet, pyrosequencing data integrated from studies of the human hand and gut, and PhyloChip data from a study of citrus pathogens. We show that a Fast UniFrac analysis using a reference tree recaptures patterns that could not be detected without considering phylogenetic relationships and that Fast UniFrac, coupled with BLAST-based sequence assignment, can be used to quickly analyze pyrosequencing runs containing hundreds of thousands of sequences, showing patterns relating human and gut samples. Finally, we show that the application of Fast UniFrac to PhyloChip data could identify well-defined subcategories associated with infection. Together, these case studies point the way toward a broad range of applications and show some of the new features of Fast UniFrac.
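
To illustrate the kind of calculation the abstract refers to, here is a minimal Python sketch of unweighted UniFrac on a toy tree. It is not the Fast UniFrac implementation: the function name `unweighted_unifrac`, the parent-array tree encoding, and the four-tip example are all assumptions made for illustration. The distance between two samples is taken as the branch length leading to tips found in only one of the samples, divided by the branch length leading to tips found in either sample.

```python
def unweighted_unifrac(parent, length, tip_samples, n_samples):
    """Pairwise unweighted UniFrac on a tree stored as flat arrays.

    parent[i]   : index of node i's parent, -1 for the root. Nodes are assumed
                  to be ordered so that every child comes before its parent.
    length[i]   : length of the branch above node i (0 for the root).
    tip_samples : {tip index: set of sample indices in which the tip occurs}.
    """
    n_nodes = len(parent)
    # present[node][s] is True when sample s has a tip at or below this node.
    present = [[False] * n_samples for _ in range(n_nodes)]
    for tip, samples in tip_samples.items():
        for s in samples:
            present[tip][s] = True
    # One pass from tips to root: children precede parents, so each node's
    # row is complete by the time it is copied into its parent's row.
    for node in range(n_nodes):
        p = parent[node]
        if p >= 0:
            for s in range(n_samples):
                present[p][s] = present[p][s] or present[node][s]

    dist = [[0.0] * n_samples for _ in range(n_samples)]
    for a in range(n_samples):
        for b in range(a + 1, n_samples):
            total = 0.0   # branch length seen by either sample
            unique = 0.0  # branch length seen by exactly one sample
            for node in range(n_nodes):
                in_a, in_b = present[node][a], present[node][b]
                if in_a or in_b:
                    total += length[node]
                    if in_a != in_b:
                        unique += length[node]
            d = unique / total if total > 0 else 0.0
            dist[a][b] = dist[b][a] = d
    return dist


# Hypothetical toy tree ((t0:1,t1:1):1,(t2:1,t3:1):1);
# nodes 0-3 are tips, 4 and 5 are internal nodes, 6 is the root.
parent = [4, 4, 5, 5, 6, 6, -1]
length = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0]
# Sample 0 holds t0 and t1, sample 1 holds t2 and t3, sample 2 holds t0 and t2.
tip_samples = {0: {0, 2}, 1: {0}, 2: {1, 2}, 3: {1}}

dist = unweighted_unifrac(parent, length, tip_samples, n_samples=3)
for row in dist:
    print(row)
```

Storing the tree as flat arrays means presence is propagated once for all samples, after which each pairwise distance only needs a scan over the branches. A pairwise distance matrix of this kind is the usual input to a principal coordinates analysis.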
Citations
Journal Article
22 Apr 2013-PLOS ONE
TL;DR: The phyloseq project for R is a new open-source software package dedicated to the object-oriented representation and analysis of microbiome census data in R, which supports importing data from a variety of common formats, as well as many analysis techniques.
Abstract: Background: The analysis of microbial communities through DNA sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are often difficult (or impossible) for peer researchers to independently reproduce. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high throughput microbiome census data. Results: Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics; all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, an example of best practices for reproducible research. Conclusions: The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.

11,272 citations

Journal Article
TL;DR: This work sequences a diverse array of 25 environmental samples and three known “mock communities” at a depth averaging 3.1 million reads per sample to demonstrate excellent consistency in taxonomic recovery and recapture diversity patterns that were previously reported on the basis of metaanalysis of many studies from the literature.
Abstract: The ongoing revolution in high-throughput sequencing continues to democratize the ability of small groups of investigators to map the microbial component of the biosphere. In particular, the coevolution of new sequencing platforms and new software tools allows data acquisition and analysis on an unprecedented scale. Here we report the next stage in this coevolutionary arms race, using the Illumina GAIIx platform to sequence a diverse array of 25 environmental samples and three known “mock communities” at a depth averaging 3.1 million reads per sample. We demonstrate excellent consistency in taxonomic recovery and recapture diversity patterns that were previously reported on the basis of metaanalysis of many studies from the literature (notably, the saline/nonsaline split in environmental samples and the split between host-associated and free-living communities). We also demonstrate that 2,000 Illumina single-end reads are sufficient to recapture the same relationships among samples that we observe with the full dataset. The results thus open up the possibility of conducting large-scale studies analyzing thousands of samples simultaneously to survey microbial communities at an unprecedented spatial and temporal resolution.

6,767 citations


Cites methods from "Fast UniFrac: facilitating high-thr..."

  • ...We have previously shown, however, that patterns that were observed with de novo tree-making methods could be captured equally well using a “BLAST to reference tree” protocol (21) and then calculating community differences with UniFrac....

    [...]

Journal Article
TL;DR: Soils collected across a long-term liming experiment were used to investigate the direct influence of pH on the abundance and composition of the two major soil microbial taxa, fungi and bacteria, and both the relative abundance and diversity of bacteria were positively related to pH.
Abstract: Soils collected across a long-term liming experiment (pH 4.0-8.3), in which variation in factors other than pH have been minimized, were used to investigate the direct influence of pH on the abundance and composition of the two major soil microbial taxa, fungi and bacteria. We hypothesized that bacterial communities would be more strongly influenced by pH than fungal communities. To determine the relative abundance of bacteria and fungi, we used quantitative PCR (qPCR), and to analyze the composition and diversity of the bacterial and fungal communities, we used a bar-coded pyrosequencing technique. Both the relative abundance and diversity of bacteria were positively related to pH, the latter nearly doubling between pH 4 and 8. In contrast, the relative abundance of fungi was unaffected by pH and fungal diversity was only weakly related with pH. The composition of the bacterial communities was closely defined by soil pH; there was as much variability in bacterial community composition across the 180-m distance of this liming experiment as across soils collected from a wide range of biomes in North and South America, emphasizing the dominance of pH in structuring bacterial communities. The apparent direct influence of pH on bacterial community composition is probably due to the narrow pH ranges for optimal growth of bacteria. Fungal community composition was less strongly affected by pH, which is consistent with pure culture studies, demonstrating that fungi generally exhibit wider pH ranges for optimal growth.

2,966 citations


Cites methods from "Fast UniFrac: facilitating high-thr..."

  • ...UniFrac analysis was performed using the Fast UniFrac Web interface, the Silva reference tree and an environment file mapping pyrosequencing reads to Silva reference sequences for each sample as described previously (Hamady et al., 2010)....

    [...]

Journal Article
09 Aug 2012-Nature
TL;DR: The data support a relationship between diet, microbiota and health status, and indicate a role for diet-driven microbiota alterations in varying rates of health decline upon ageing.
Abstract: Alterations in intestinal microbiota composition are associated with several chronic conditions, including obesity and inflammatory diseases. The microbiota of older people displays greater inter-individual variation than that of younger adults. Here we show that the faecal microbiota composition from 178 elderly subjects formed groups, correlating with residence location in the community, day-hospital, rehabilitation or in long-term residential care. However, clustering of subjects by diet separated them by the same residence location and microbiota groupings. The separation of microbiota composition significantly correlated with measures of frailty, co-morbidity, nutritional status, markers of inflammation and with metabolites in faecal water. The individual microbiota of people in long-stay care was significantly less diverse than that of community dwellers. Loss of community-associated microbiota correlated with increased frailty. Collectively, the data support a relationship between diet, microbiota and health status, and indicate a role for diet-driven microbiota alterations in varying rates of health decline upon ageing.

2,622 citations

Journal Article
TL;DR: The authors advocate that investigators avoid rarefying altogether, pointing to well-established statistical theory that simultaneously accounts for library size differences and biological variability using an appropriate mixture model.
Abstract: Current practice in the normalization of microbiome count data is inefficient in the statistical sense. For apparently historical reasons, the common approach is either to use simple proportions (which does not address heteroscedasticity) or to use rarefying of counts, even though both of these approaches are inappropriate for detection of differentially abundant species. Well-established statistical theory is available that simultaneously accounts for library size differences and biological variability using an appropriate mixture model. Moreover, specific implementations for DNA sequencing read count data (based on a Negative Binomial model for instance) are already available in RNA-Seq focused R packages such as edgeR and DESeq. Here we summarize the supporting statistical theory and use simulations and empirical data to demonstrate substantial improvements provided by a relevant mixture model framework over simple proportions or rarefying. We show how both proportions and rarefied counts result in a high rate of false positives in tests for species that are differentially abundant across sample classes. Regarding microbiome sample-wise clustering, we also show that the rarefying procedure often discards samples that can be accurately clustered by alternative methods. We further compare different Negative Binomial methods with a recently-described zero-inflated Gaussian mixture, implemented in a package called metagenomeSeq. We find that metagenomeSeq performs well when there is an adequate number of biological replicates, but it nevertheless tends toward a higher false positive rate. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package, phyloseq.

2,184 citations


Cites background or methods from "Fast UniFrac: facilitating high-thr..."

  • ...Many early microbiome investigations are variants of Simulation A, and also used rarefying prior to calculating UniFrac distances [27]....

    [...]

  • ...Rarefying is now an exceedingly common precursor to microbiome multivariate workflows that seek to relate sample covariates to sample-wise distance matrices [19,27,28]; for example, integrated as a recommended option in QIIME’s [29] beta_diversity_through_plots....

    [...]
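
For readers unfamiliar with the two normalizations discussed in the entry above, here is a small Python sketch of simple proportions and of rarefying, meaning random subsampling of each sample's reads to a common depth. The function names, taxon labels, and counts are hypothetical, and the sketch is not taken from phyloseq or QIIME. It only shows what the operations do, including how a sample below the chosen depth is discarded.

```python
import random


def proportions(counts):
    """Convert raw counts to relative abundances that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement.

    Returns None when the sample holds fewer than `depth` reads, which is
    how rarefying ends up discarding shallow samples.
    """
    # Expand the count table into one entry per read.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    drawn = rng.sample(pool, depth)
    rarefied_counts = {taxon: 0 for taxon in counts}
    for taxon in drawn:
        rarefied_counts[taxon] += 1
    return rarefied_counts


rng = random.Random(0)  # fixed seed so the sketch is repeatable
deep = {"taxonA": 50, "taxonB": 30, "taxonC": 20}
shallow = {"taxonA": 3, "taxonB": 1, "taxonC": 0}

rarefied = rarefy(deep, 10, rng)    # 10 of the 100 reads are kept
dropped = rarefy(shallow, 10, rng)  # only 4 reads, so the sample is lost
print(proportions(deep))
print(rarefied, dropped)
```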

References
Journal Article
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations


"Fast UniFrac: facilitating high-thr..." refers methods in this paper

  • ...We show below that the analysis of such large sequence sets is possible by assigning them to their closest relative in a phylogeny of the Greengenes core set (DeSantis et al., 2006) using BLAST's megablast protocol (Altschul et al., 1990)....

    [...]

  • ...We demonstrate that using BLAST's (Altschul et al., 1990) megablast method to find the nearest neighbor of each short read in an existing library (in this case the Greengenes core set), recaptures the same patterns detected using the parsimony insertion method of ARB, and that these methods can be applied to pyrosequencing data with hundreds of thousands of sequences....

    [...]

  • ...We show that using BLAST’s (Altschul et al., 1990) megablast method to find the nearest neighbor of each short read in an existing library (in this case the Greengenes core set), recaptures the same patterns detected using the parsimony insertion method of ARB, and that these methods can be applied to pyrosequencing data with hundreds of thousands of sequences....

    [...]
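
The nearest-neighbor assignment described in the excerpts above can be sketched roughly as follows. This is an illustrative Python outline, not the authors' pipeline. It assumes the search results are available as 12-column tab-separated hits (the layout BLAST+ writes with `-outfmt 6`), that a read's sample can be recovered from a `sample_readnumber` naming convention, and that no identity or e-value cutoff is applied; the read and reference identifiers are made up. Each read is assigned to its highest-scoring reference sequence, and the assignments are tallied per reference and sample.

```python
from collections import defaultdict


def best_hits(tabular_lines):
    """Keep the highest bit-score hit per read.

    Expects 12 tab-separated columns per line: query id, subject id, then
    ten alignment statistics ending in the bit score (column 12).
    Returns {read_id: (reference_id, bitscore)}.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        read_id, ref_id, bitscore = fields[0], fields[1], float(fields[11])
        if read_id not in best or bitscore > best[read_id][1]:
            best[read_id] = (ref_id, bitscore)
    return best


def environment_counts(best, sample_of):
    """Count reads per (reference sequence, sample) pair."""
    table = defaultdict(int)
    for read_id, (ref_id, _score) in best.items():
        table[(ref_id, sample_of(read_id))] += 1
    return dict(table)


# Hypothetical hits: read handA_1 matches two references, the better one wins.
hits = [
    "handA_1\tref12\t98.5\t200\t3\t0\t1\t200\t10\t209\t1e-100\t350",
    "handA_1\tref7\t95.0\t200\t10\t0\t1\t200\t40\t239\t1e-90\t310",
    "handA_2\tref12\t99.0\t200\t2\t0\t1\t200\t10\t209\t1e-101\t355",
    "gutB_1\tref40\t97.0\t200\t6\t0\t1\t200\t5\t204\t1e-95\t330",
]

# Assumed convention: everything before the last underscore names the sample.
counts = environment_counts(best_hits(hits), lambda r: r.rsplit("_", 1)[0])
print(counts)
```

Because every read is mapped onto a tip of an existing reference tree, no new tree has to be built for the short reads. The resulting per-sample counts on reference tips are the kind of mapping that a UniFrac calculation on the reference tree takes as input.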

Journal Article
21 Dec 2006-Nature
TL;DR: It is demonstrated through metagenomic and biochemical analyses that changes in the relative abundance of the Bacteroidetes and Firmicutes affect the metabolic potential of the mouse gut microbiota and indicates that the obese microbiome has an increased capacity to harvest energy from the diet.
Abstract: The worldwide obesity epidemic is stimulating efforts to identify host and environmental factors that affect energy balance. Comparisons of the distal gut microbiota of genetically obese mice and their lean littermates, as well as those of obese and lean human volunteers have revealed that obesity is associated with changes in the relative abundance of the two dominant bacterial divisions, the Bacteroidetes and the Firmicutes. Here we demonstrate through metagenomic and biochemical analyses that these changes affect the metabolic potential of the mouse gut microbiota. Our results indicate that the obese microbiome has an increased capacity to harvest energy from the diet. Furthermore, this trait is transmissible: colonization of germ-free mice with an 'obese microbiota' results in a significantly greater increase in total body fat than colonization with a 'lean microbiota'. These results identify the gut microbiota as an additional contributing factor to the pathophysiology of obesity.

10,126 citations


"Fast UniFrac: facilitating high-thr..." refers background in this paper

  • ...…Marhaver et al., 2008) assemblages important for understanding human health and disease (Frank et al., 2007; Li et al., 2008; Osman et al., 2008; Turnbaugh et al., 2006; Wen et al., 2008), bioremediation (Hiibel et al., 2008), and basic ecology and evolution (Balakirev et al., 2008; Bryant et…...

    [...]

  • ..., 2008) assemblages important for understanding human health and disease (Turnbaugh et al., 2006; Frank et al., 2007; Li et al., 2008; Osman et al., 2008; Wen et al., 2008), bioremediation (Hiibel et al....

    [...]

Journal Article
TL;DR: A 16S rRNA gene database (http://greengenes.lbl.gov) addresses limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies.
Abstract: A 16S rRNA gene database (http://greengenes.lbl.gov) addresses limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies. It was found that there is incongruent taxonomic nomenclature among curators even at the phylum level. Putative chimeras were identified in 3% of environmental sequences and in 0.2% of records derived from isolates. Environmental sequences were classified into 100 phylum-level lineages in the Archaea and Bacteria.

9,593 citations

Journal Article
21 Dec 2006-Nature
TL;DR: It is shown that the relative proportion of Bacteroidetes is decreased in obese people by comparison with lean people, and that this proportion increases with weight loss on two types of low-calorie diet.
Abstract: Two groups of beneficial bacteria are dominant in the human gut, the Bacteroidetes and the Firmicutes. Here we show that the relative proportion of Bacteroidetes is decreased in obese people by comparison with lean people, and that this proportion increases with weight loss on two types of low-calorie diet. Our findings indicate that obesity has a microbial component, which might have potential therapeutic implications.

7,550 citations


"Fast UniFrac: facilitating high-thr..." refers background or methods in this paper

  • ..., 2008b); (2) an analysis of how gut bacterial populations change in obese humans on fat-restricted and carbohydrate-restricted diets (Ley et al., 2006); (3) pyrosequencing studies of the human hand (Fierer et al....

    [...]

  • ...We repeated the UniFrac analysis reported in Figure 1a of (Ley et al., 2006)....

    [...]

  • ...The Global Environment dataset (Ley et al., 2008b) (99,801 sequences), the human obesity dataset (Ley et al., 2006)(18,348 sequences), and all unique pyrosequences from studies of the human hand, and the fecal microbiota of lean and obese twins (Fierer et al., 2008; Turnbaugh et al., 2009) (232,165…...

    [...]

  • ...Hierarchical clustering based on UniFrac analysis of an ARB parsimony insertion tree showed that the bacterial lineages were remarkably constant within individuals over time, because samples from the same person generally clustered with each other rather than with samples from other people (Ley et al., 2006)....

    [...]

  • ..., 2008b)(99 801 sequences), the human obesity dataset (Ley et al., 2006)(18 348 sequences), and all unique pyrosequences from studies of the human hand, and the fecal microbiota of lean and obese twins (Fierer et al....

    [...]

Journal Article
22 Jan 2009-Nature
TL;DR: The faecal microbial communities of adult female monozygotic and dizygotic twin pairs concordant for leanness or obesity, and their mothers are characterized to address how host genotype, environmental exposure and host adiposity influence the gut microbiome.
Abstract: The human distal gut harbours a vast ensemble of microbes (the microbiota) that provide important metabolic capabilities, including the ability to extract energy from otherwise indigestible dietary polysaccharides. Studies of a few unrelated, healthy adults have revealed substantial diversity in their gut communities, as measured by sequencing 16S rRNA genes, yet how this diversity relates to function and to the rest of the genes in the collective genomes of the microbiota (the gut microbiome) remains obscure. Studies of lean and obese mice suggest that the gut microbiota affects energy balance by influencing the efficiency of calorie harvest from the diet, and how this harvested energy is used and stored. Here we characterize the faecal microbial communities of adult female monozygotic and dizygotic twin pairs concordant for leanness or obesity, and their mothers, to address how host genotype, environmental exposure and host adiposity influence the gut microbiome. Analysis of 154 individuals yielded 9,920 near full-length and 1,937,461 partial bacterial 16S rRNA sequences, plus 2.14 gigabases from their microbiomes. The results reveal that the human gut microbiome is shared among family members, but that each person's gut microbial community varies in the specific bacterial lineages present, with a comparable degree of co-variation between adult monozygotic and dizygotic twin pairs. However, there was a wide array of shared microbial genes among sampled individuals, comprising an extensive, identifiable 'core microbiome' at the gene, rather than at the organismal lineage, level. Obesity is associated with phylum-level changes in the microbiota, reduced bacterial diversity and altered representation of bacterial genes and metabolic pathways. These results demonstrate that a diversity of organismal assemblages can nonetheless yield a core microbiome at a functional level, and that deviations from this core are associated with different physiological states (obese compared with lean).

6,970 citations


"Fast UniFrac: facilitating high-thr..." refers background or methods in this paper

  • ...…sequences), and all unique pyrosequences from studies of the human hand, and the fecal microbiota of lean and obese twins (Fierer et al., 2008; Turnbaugh et al., 2009) (232,165 unique sequences from 680,000 initial reads) were then searched against the Greengenes core set using megablast....

    [...]

  • ..., 2006)(18 348 sequences), and all unique pyrosequences from studies of the human hand, and the fecal microbiota of lean and obese twins (Fierer et al., 2008; Turnbaugh et al., 2009) (232 165 unique sequences from 680 000 initial reads) were then searched against the Greengenes core set using megablast....

    [...]

  • ...…on fat and carbohydrate restricted diets (Ley et al., 2006) (3) pyrosequencing studies of the human hand (Fierer et al., 2008), and of fecal microbiota of lean and obese twin pairs and their mothers (Turnbaugh et al., 2009), and (4) a PhyloChip study of citrus pathogens (Sagaram et al., 2009)....

    [...]

  • ...…et al., 2007) and related efforts to study microbial communities occupying various human body habitats are revealing a surprising amount of diversity among individuals in skin (Fierer et al., 2008; Grice et al., 2008), gut (Turnbaugh et al., 2009), and mouth ecosystems (Nasidze et al., 2009)....

    [...]

  • ...The levels of intra- and interpersonal variability observed within and between human body habitats (Frank et al., 2007; Fierer et al., 2008; Ley et al., 2008a; Turnbaugh et al., 2009) suggest that large sample sizes, including time series analyses, will be especially critical for understanding whether or not observed community structures are significantly associated with physiologic or pathophysiologic states....

    [...]
