Structure, function and diversity of the healthy human microbiome

doi:10.1038/NATURE11234

Home
/
Papers
/
Structure, function and diversity of the healthy human microbiome

Journal Article•DOI•

Structure, function and diversity of the healthy human microbiome

Curtis Huttenhower¹, Curtis Huttenhower², Dirk Gevers², Rob Knight³ +250 more•Institutions (42)

14 Jun 2012-Nature (Nature Publishing Group)-Vol. 486, Iss: 7402, pp 207-214

TL;DR: The Human Microbiome Project Consortium reported the first results of their analysis of microbial communities from distinct, clinically relevant body habitats in a human cohort; the insights into the microbial communities of a healthy population lay foundations for future exploration of the epidemiology, ecology and translational applications of the human microbiome as discussed by the authors.

read less

Abstract: The Human Microbiome Project Consortium reports the first results of their analysis of microbial communities from distinct, clinically relevant body habitats in a human cohort; the insights into the microbial communities of a healthy population lay foundations for future exploration of the epidemiology, ecology and translational applications of the human microbiome.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Journal Article•DOI•

DADA2: High-resolution sample inference from Illumina amplicon data

[...]

Benjamin J. Callahan¹, Paul J. McMurdie, Michael J. Rosen¹, Andrew W. Han, Amy Jo A. Johnson, Susan Holmes¹ - Show less +2 more•Institutions (1)

Stanford University¹

01 Jul 2016-Nature Methods

TL;DR: The open-source software package DADA2 for modeling and correcting Illumina-sequenced amplicon errors is presented, revealing a diversity of previously undetected Lactobacillus crispatus variants.

...read moreread less

Abstract: We present the open-source software package DADA2 for modeling and correcting Illumina-sequenced amplicon errors (https://github.com/benjjneb/dada2). DADA2 infers sample sequences exactly and resolves differences of as little as 1 nucleotide. In several mock communities, DADA2 identified more real variants and output fewer spurious sequences than other methods. We applied DADA2 to vaginal samples from a cohort of pregnant women, revealing a diversity of previously undetected Lactobacillus crispatus variants.

...read moreread less

14,505 citations

Journal Article•DOI•

UPARSE: highly accurate OTU sequences from microbial amplicon reads

[...]

Robert C. Edgar

01 Oct 2013-Nature Methods

TL;DR: The UPARSE pipeline reports operational taxonomic unit (OTU) sequences with ≤1% incorrect bases in artificial microbial community tests, compared with >3% correct bases commonly reported by other methods.

...read moreread less

Abstract: Amplified marker-gene sequences can be used to understand microbial community structure, but they suffer from a high level of sequencing and amplification artifacts. The UPARSE pipeline reports operational taxonomic unit (OTU) sequences with ≤1% incorrect bases in artificial microbial community tests, compared with >3% incorrect bases commonly reported by other methods. The improved accuracy results in far fewer OTUs, consistently closer to the expected number of species in a community.

...read moreread less

11,329 citations

Journal Article•DOI•

phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data.

[...]

Paul J. McMurdie¹, Susan Holmes¹•Institutions (1)

Stanford University¹

22 Apr 2013-PLOS ONE

TL;DR: The phyloseq project for R is a new open-source software package dedicated to the object-oriented representation and analysis of microbiome census data in R, which supports importing data from a variety of common formats, as well as many analysis techniques.

...read moreread less

Abstract: Background The analysis of microbial communities through DNA sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are often difficult (or impossible) for peer researchers to independently reproduce. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high throughput microbiome census data. Results Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics; all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, an example of best practices for reproducible research. Conclusions The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.

...read moreread less

11,272 citations

Journal Article•DOI•

Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences

[...]

Morgan G. I. Langille¹, Jesse R. Zaneveld², J. Gregory Caporaso³, J. Gregory Caporaso⁴, Daniel McDonald⁵, Dan Knights⁶, Joshua A Reyes⁷, Jose C. Clemente⁵, Deron E. Burkepile⁸, Rebecca Vega Thurber², Rob Knight⁵, Rob Knight⁹, Robert G. Beiko¹, Curtis Huttenhower⁷, Curtis Huttenhower¹⁰ - Show less +11 more•Institutions (10)

Dalhousie University¹, Oregon State University², Argonne National Laboratory³, Northern Arizona University⁴, University of Colorado Boulder⁵, University of Minnesota⁶, Harvard University⁷, Florida International University⁸, Howard Hughes Medical Institute⁹, Broad Institute¹⁰

01 Sep 2013-Nature Biotechnology

TL;DR: The results demonstrate that phylogeny and function are sufficiently linked that this 'predictive metagenomic' approach should provide useful insights into the thousands of uncultivated microbial communities for which only marker gene surveys are currently available.

...read moreread less

Abstract: Profiling phylogenetic marker genes, such as the 16S rRNA gene, is a key tool for studies of microbial communities but does not provide direct evidence of a community's functional capabilities. Here we describe PICRUSt (phylogenetic investigation of communities by reconstruction of unobserved states), a computational approach to predict the functional composition of a metagenome using marker gene data and a database of reference genomes. PICRUSt uses an extended ancestral-state reconstruction algorithm to predict which gene families are present and then combines gene families to estimate the composite metagenome. Using 16S information, PICRUSt recaptures key findings from the Human Microbiome Project and accurately predicts the abundance of gene families in host-associated and environmental communities, with quantifiable uncertainty. Our results demonstrate that phylogeny and function are sufficiently linked that this 'predictive metagenomic' approach should provide useful insights into the thousands of uncultivated microbial communities for which only marker gene surveys are currently available.

...read moreread less

6,860 citations

Journal Article•DOI•

VSEARCH: a versatile open source tool for metagenomics

[...]

Torbjørn Rognes¹, Torbjørn Rognes², Tomas Flouri³, Tomas Flouri⁴, Ben Nichols⁵, Christopher Quince⁶, Christopher Quince⁵, Frédéric Mahé⁷ - Show less +4 more•Institutions (7)

Oslo University Hospital¹, University of Oslo², Karlsruhe Institute of Technology³, Heidelberg Institute for Theoretical Studies⁴, University of Glasgow⁵, University of Warwick⁶, Kaiserslautern University of Technology⁷

18 Oct 2016-PeerJ

TL;DR: VSEARCH is here shown to be more accurate than USEARCH when performing searching, clustering, chimera detection and subsampling, while on a par with US EARCH for paired-ends read merging and dereplication.

...read moreread less

Abstract: Background: VSEARCH is an open source and free of charge multithreaded 64-bit tool for processing and preparing metagenomics, genomics and population genomics nucleotide sequence data. It is designed as an alternative to the widely used USEARCH tool (Edgar, 2010) for which the source code is not publicly available, algorithm details are only rudimentarily described, and only a memory-confined 32-bit version is freely available for academic use. Methods: When searching nucleotide sequences, VSEARCH uses a fast heuristic based on words shared by the query and target sequences in order to quickly identify similar sequences, a similar strategy is probably used in USEARCH. VSEARCH then performs optimal global sequence alignment of the query against potential target sequences, using full dynamic programming instead of the seed-and-extend heuristic used by USEARCH. Pairwise alignments are computed in parallel using vectorisation and multiple threads. Results: VSEARCH includes most commands for analysing nucleotide sequences available in USEARCH version 7 and several of those available in USEARCH version 8, including searching (exact or based on global alignment), clustering by similarity (using length pre-sorting, abundance pre-sorting or a user-defined order), chimera detection (reference-based or de novo), dereplication (full length or prefix), pairwise alignment, reverse complementation, sorting, and subsampling. VSEARCH also includes commands for FASTQ file processing, i.e., format detection, filtering, read quality statistics, and merging of paired reads. Furthermore, VSEARCH extends functionality with several new commands and improvements, including shuffling, rereplication, masking of low-complexity sequences with the well-known DUST algorithm, a choice among different similarity definitions, and FASTQ file format conversion. VSEARCH is here shown to be more accurate than USEARCH when performing searching, clustering, chimera detection and subsampling, while on a par with USEARCH for paired-ends read merging. VSEARCH is slower than USEARCH when performing clustering and chimera detection, but significantly faster when performing paired-end reads merging and dereplication. VSEARCH is available at https://github.com/torognes/vsearch under either the BSD 2-clause license or the GNU General Public License version 3.0. Discussion: VSEARCH has been shown to be a fast, accurate and full-fledged alternative to USEARCH. A free and open-source versatile tool for sequence analysis is now available to the metagenomics community.

...read moreread less

5,850 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Journal Article•DOI•

PATRIC: the Comprehensive Bacterial Bioinformatics Resource with a Focus on Human Pathogenic Species

[...]

Joseph J. Gillespie¹, Joseph J. Gillespie², Alice R. Wattam¹, Stephen A. Cammer¹, Joseph L. Gabbard¹, Maulik Shukla¹, Oral Dalay¹, Timothy P. Driscoll¹, Deborah Hix¹, Shrinivasrao P. Mane¹, Chunhong Mao¹, Eric K. Nordberg¹, Mark Scott¹, Julie R. Schulman¹, Eric E. Snyder¹, Eric E. Snyder³, Daniel E. Sullivan¹, Chunxia Wang¹, Chunxia Wang⁴, Andrew S. Warren¹, Kelly P. Williams⁵, Kelly P. Williams¹, Tian Xue¹, Hyunseung Yoo¹, Chengdong Zhang¹, Yan Zhang¹, Rebecca Will¹, Ronald W. Kenyon¹, Bruno W. S. Sobral¹ - Show less +25 more•Institutions (5)

Virginia Bioinformatics Institute¹, University of Maryland, Baltimore², SRA International³, Novozymes⁴, Sandia National Laboratories⁵

01 Nov 2011-Infection and Immunity

TL;DR: The major features available at PATRIC are summarized, dividing the resources into two major categories: organisms, genomes, and comparative genomics and (ii) recurrent integration of community-derived associated data.

...read moreread less

Abstract: Funded by the National Institute of Allergy and Infectious Diseases, the Pathosystems Resource Integration Center (PATRIC) is a genomics-centric relational database and bioinformatics resource designed to assist scientists in infectious-disease research. Specifically, PATRIC provides scientists with (i) a comprehensive bacterial genomics database, (ii) a plethora of associated data relevant to genomic analysis, and (iii) an extensive suite of computational tools and platforms for bioinformatics analysis. While the primary aim of PATRIC is to advance the knowledge underlying the biology of human pathogens, all publicly available genome-scale data for bacteria are compiled and continually updated, thereby enabling comparative analyses to reveal the basis for differences between infectious free-living and commensal species. Herein we summarize the major features available at PATRIC, dividing the resources into two major categories: (i) organisms, genomes, and comparative genomics and (ii) recurrent integration of community-derived associated data. Additionally, we present two experimental designs typical of bacterial genomics research and report on the execution of both projects using only PATRIC data and tools. These applications encompass a broad range of the data and analysis tools available, illustrating practical uses of PATRIC for the biologist. Finally, a summary of PATRIC's outreach activities, collaborative endeavors, and future research directions is provided.

...read moreread less

264 citations

Journal Article•DOI•

Analyses of the Microbial Diversity across the Human Microbiome

[...]

Kelvin Li¹, Monika Bihan¹, Shibu Yooseph¹, Barbara A. Methé¹•Institutions (1)

J. Craig Venter Institute¹

13 Jun 2012-PLOS ONE

TL;DR: The results demonstrate the importance, and motivate the introduction, of several visualization and analysis methods tuned specifically for next-generation sequence data, further revealing that low abundant taxa serve as an important reservoir of genetic diversity in the human microbiome.

...read moreread less

Abstract: Analysis of human body microbial diversity is fundamental to understanding community structure, biology and ecology. The National Institutes of Health Human Microbiome Project (HMP) has provided an unprecedented opportunity to examine microbial diversity within and across body habitats and individuals through pyrosequencing-based profiling of 16 S rRNA gene sequences (16 S) from habits of the oral, skin, distal gut, and vaginal body regions from over 200 healthy individuals enabling the application of statistical techniques. In this study, two approaches were applied to elucidate the nature and extent of human microbiome diversity. First, bootstrap and parametric curve fitting techniques were evaluated to estimate the maximum number of unique taxa, Smax, and taxa discovery rate for habitats across individuals. Next, our results demonstrated that the variation of diversity within low abundant taxa across habitats and individuals was not sufficiently quantified with standard ecological diversity indices. This impact from low abundant taxa motivated us to introduce a novel rank-based diversity measure, the Tail statistic, (“τ”), based on the standard deviation of the rank abundance curve if made symmetric by reflection around the most abundant taxon. Due to τ’s greater sensitivity to low abundant taxa, its application to diversity estimation of taxonomic units using taxonomic dependent and independent methods revealed a greater range of values recovered between individuals versus body habitats, and different patterns of diversity within habitats. The greatest range of τ values within and across individuals was found in stool, which also exhibited the most undiscovered taxa. Oral and skin habitats revealed variable diversity patterns, while vaginal habitats were consistently the least diverse. Collectively, these results demonstrate the importance, and motivate the introduction, of several visualization and analysis methods tuned specifically for next-generation sequence data, further revealing that low abundant taxa serve as an important reservoir of genetic diversity in the human microbiome.

...read moreread less

230 citations

Journal Article•DOI•

Efficient and robust RNA-seq process for cultured bacteria and complex community transcriptomes

[...]

Georgia Giannoukos¹, Dawn Ciulla¹, Katherine H. Huang¹, Brian J. Haas¹, Jacques Izard², Jacques Izard³, Joshua Z. Levin¹, Jonathan Livny¹, Ashlee M. Earl¹, Dirk Gevers¹, Doyle V. Ward¹, Chad Nusbaum¹, Bruce W. Birren¹, Andreas Gnirke¹ - Show less +10 more•Institutions (3)

Broad Institute¹, The Forsyth Institute², Harvard University³

28 Mar 2012-Genome Biology

TL;DR: This work has developed a process for transcriptome analysis of bacterial communities that accommodates both intact and fragmented starting RNA and combines efficient rRNA removal with strand-specific RNA-seq and resulting expression profiles were highly reproducible.

...read moreread less

Abstract: We have developed a process for transcriptome analysis of bacterial communities that accommodates both intact and fragmented starting RNA and combines efficient rRNA removal with strand-specific RNA-seq. We applied this approach to an RNA mixture derived from three diverse cultured bacterial species and to RNA isolated from clinical stool samples. The resulting expression profiles were highly reproducible, enriched up to 40-fold for non-rRNA transcripts, and correlated well with profiles representing undepleted total RNA.

...read moreread less

228 citations

Journal Article•DOI•

PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data

[...]

Thomas J. Sharpton¹, Samantha J. Riesenfeld¹, Steven W. Kembel², Joshua Ladau¹, James P. O'Dwyer², James P. O'Dwyer³, Jessica L. Green², Jonathan A. Eisen⁴, Katherine S. Pollard¹ - Show less +5 more•Institutions (4)

University of California, San Francisco¹, University of Oregon², University of Leeds³, University of California, Davis⁴

20 Jan 2011-PLOS Computational Biology

TL;DR: Comparisons of PCR and shotgun sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs.

...read moreread less

Abstract: Microbial diversity is typically characterized by clustering ribosomal RNA (SSU-rRNA) sequences into operational taxonomic units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in priming and amplification. Analysis of shotgun sequenced environmental DNA, known as metagenomics, avoids amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles. Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons of PCR and shotgun sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In addition, we discover novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of part of the biosphere currently hidden from PCR-based surveys of diversity?

...read moreread less

84 citations