Journal ArticleDOI

Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies

01 Jan 2013-Nucleic Acids Research (Oxford University Press)-Vol. 41, Iss: 1, pp 1-11
TL;DR: The results of this study may be used as a guideline for selecting primer pairs with the best overall coverage and phylum spectrum for specific applications, therefore reducing the bias in PCR-based microbial diversity studies.
Abstract: 16S ribosomal RNA gene (rDNA) amplicon analysis remains the standard approach for the cultivation-independent investigation of microbial diversity. The accuracy of these analyses depends strongly on the choice of primers. The overall coverage and phylum spectrum of 175 primers and 512 primer pairs were evaluated in silico with respect to the SILVA 16S/18S rDNA non-redundant reference dataset (SSURef 108 NR). Based on this evaluation a selection of 'best available' primer pairs for Bacteria and Archaea for three amplicon size classes (100-400, 400-1000, ≥ 1000 bp) is provided. The most promising bacterial primer pair (S-D-Bact-0341-b-S-17/S-D-Bact-0785-a-A-21), with an amplicon size of 464 bp, was experimentally evaluated by comparing the taxonomic distribution of the 16S rDNA amplicons with 16S rDNA fragments from directly sequenced metagenomes. The results of this study may be used as a guideline for selecting primer pairs with the best overall coverage and phylum spectrum for specific applications, therefore reducing the bias in PCR-based microbial diversity studies.
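The in silico coverage evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes exact matching only (no mismatches allowed), expands IUPAC degenerate bases into regular-expression character classes, and counts a sequence as covered when the forward primer site is followed downstream by the binding site of the reverse primer. The function names and the toy sequences in the usage note are hypothetical; the primer sequences are those quoted for S-D-Bact-0341-b-S-17 and S-D-Bact-0785-a-A-21.

```python
import re

# IUPAC nucleotide codes mapped to the bases each code stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement table that also handles degenerate codes (e.g. R <-> Y, H <-> D).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(primer):
    """Return the reverse complement of a (possibly degenerate) primer."""
    return primer.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a primer into a regex with one character class per base."""
    return re.compile("".join(f"[{IUPAC[base]}]" for base in primer.upper()))


def pair_matches(seq, forward, reverse):
    """True if the forward site occurs and the reverse site lies downstream of it.

    Assumption: exact matches only; real evaluations often allow mismatches.
    """
    fwd = primer_regex(forward).search(seq)
    if fwd is None:
        return False
    rev_site = primer_regex(reverse_complement(reverse))
    return rev_site.search(seq, fwd.end()) is not None


def coverage(seqs, forward, reverse):
    """Fraction of sequences matched by both primers of the pair."""
    hits = sum(pair_matches(s.upper(), forward, reverse) for s in seqs)
    return hits / len(seqs)
```

For example, `coverage(reference_seqs, "CCTACGGGNGGCWGCAG", "GACTACHVGGGTATCTAATCC")` would report the fraction of sequences in a hypothetical list `reference_seqs` that both primers match perfectly. Allowing a fixed number of mismatches would require a different matching strategy than plain regular expressions.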
Citations
Journal ArticleDOI
TL;DR: It is demonstrated that contaminating DNA is ubiquitous in commonly used DNA extraction kits and other laboratory reagents, varies greatly in composition between different kits and kit batches, and that this contamination critically impacts results obtained from samples containing a low microbial biomass.
Abstract: The study of microbial communities has been revolutionised in recent years by the widespread adoption of culture independent analytical techniques such as 16S rRNA gene sequencing and metagenomics. One potential confounder of these sequence-based approaches is the presence of contamination in DNA extraction kits and other laboratory reagents. In this study we demonstrate that contaminating DNA is ubiquitous in commonly used DNA extraction kits and other laboratory reagents, varies greatly in composition between different kits and kit batches, and that this contamination critically impacts results obtained from samples containing a low microbial biomass. Contamination impacts both PCR-based 16S rRNA gene surveys and shotgun metagenomics. We provide an extensive list of potential contaminating genera, and guidelines on how to mitigate the effects of contamination. These results suggest that caution should be advised when applying sequence-based techniques to the study of microbiota present in low biomass environments. Concurrent sequencing of negative control samples is strongly advised.

2,459 citations


Cites methods from "Evaluation of general 16S ribosomal..."

  • ...Primers used were: S-D-Bact-0564-a-S-15, 5′AYTGGGYDTAAAGNG and S-D-Bact-0785-b-A-18, 5′TACNVGGGTATCTAATCC [65] generating a 253 bp amplicon....

    [...]

Journal ArticleDOI
TL;DR: It is shown that beyond in silico predictions, testing with mock communities and field samples is important in primer selection, and a single mismatch can strongly bias amplification, but even perfectly matched primers can exhibit preferential amplification.
Abstract: Microbial community analysis via high-throughput sequencing of amplified 16S rRNA genes is an essential microbiology tool. We found the popular primer pair 515F (515F-C) and 806R greatly underestimated (e.g. SAR11) or overestimated (e.g. Gammaproteobacteria) common marine taxa. We evaluated marine samples and mock communities (containing 11 or 27 marine 16S clones), showing alternative primers 515F-Y (5′-GTGYCAGCMGCCGCGGTAA) and 926R (5′-CCGYCAATTYMTTTRAGTTT) yield more accurate estimates of mock community abundances, produce longer amplicons that can differentiate taxa unresolvable with 515F-C/806R, and amplify eukaryotic 18S rRNA. Mock communities amplified with 515F-Y/926R yielded closer observed community composition versus expected (r² = 0.95) compared with 515F-Y/806R (r² ∼ 0.5). Unexpectedly, biases with 515F-Y/806R against SAR11 in field samples (∼4–10-fold) were stronger than in mock communities (∼2-fold). Correcting a mismatch to Thaumarchaea in the 515F-C increased their apparent abundance in field samples, but not as much as using 926R rather than 806R. With plankton samples rich in eukaryotic DNA (> 1 μm size fraction), 18S sequences averaged ∼17% of all sequences. A single mismatch can strongly bias amplification, but even perfectly matched primers can exhibit preferential amplification. We show that beyond in silico predictions, testing with mock communities and field samples is important in primer selection.

2,077 citations

Journal ArticleDOI
TL;DR: This article proposes rational taxonomic boundaries for high taxa of bacteria and archaea on the basis of 16S rRNA gene sequence identities and suggests a rationale for the circumscription of uncultured taxa that is compatible with the taxonomy of cultured bacteria and Archaea.
Abstract: Publicly available sequence databases of the small subunit ribosomal RNA gene, also known as 16S rRNA in bacteria and archaea, are growing rapidly, and the number of entries currently exceeds 4 million. However, a unified classification and nomenclature framework for all bacteria and archaea does not yet exist. In this Analysis article, we propose rational taxonomic boundaries for high taxa of bacteria and archaea on the basis of 16S rRNA gene sequence identities and suggest a rationale for the circumscription of uncultured taxa that is compatible with the taxonomy of cultured bacteria and archaea. Our analyses show that only nearly complete 16S rRNA sequences give accurate measures of taxonomic diversity. In addition, our analyses suggest that most of the 16S rRNA sequences of the high taxa will be discovered in environmental surveys by the end of the current decade.

1,755 citations

Journal ArticleDOI
TL;DR: New genomic data from over 1,000 uncultivated and little known organisms, together with published sequences, are used to infer a dramatically expanded version of the tree of life, with Bacteria, Archaea and Eukarya included.
Abstract: The tree of life is one of the most important organizing principles in biology. Gene surveys suggest the existence of an enormous number of branches, but even an approximation of the full scale of the tree has remained elusive. Recent depictions of the tree of life have focused either on the nature of deep evolutionary relationships or on the known, well-classified diversity of life with an emphasis on eukaryotes. These approaches overlook the dramatic change in our understanding of life's diversity resulting from genomic sampling of previously unexamined environments. New methods to generate genome sequences illuminate the identity of organisms and their metabolic capacities, placing them in community and ecosystem contexts. Here, we use new genomic data from over 1,000 uncultivated and little known organisms, together with published sequences, to infer a dramatically expanded version of the tree of life, with Bacteria, Archaea and Eukarya included. The depiction is both a global overview and a snapshot of the diversity within each major lineage. The results reveal the dominance of bacterial diversification and underline the importance of organisms lacking isolated representatives, with substantial evolution concentrated in a major radiation of such organisms. This tree highlights major lineages currently underrepresented in biogeochemical models and identifies radiations that are probably important for future evolutionary analyses. An update to the ‘tree of life’ has revealed a dominance of bacterial diversity in many ecosystems and extensive evolution in some branches of the tree. It also highlights how few organisms we have been able to cultivate for further investigation.

1,614 citations

Journal ArticleDOI
29 Apr 2016-Science
TL;DR: Stool consistency showed the largest effect size, whereas medication explained the largest total variance and interacted with other covariate-microbiota associations; proposed disease marker genera were found to be associated with host covariates.
Abstract: Fecal microbiome variation in the average, healthy population has remained under-investigated. Here, we analyzed two independent, extensively phenotyped cohorts: the Belgian Flemish Gut Flora Project (FGFP; discovery cohort; N = 1106) and the Dutch LifeLines-DEEP study (LLDeep; replication; N = 1135). Integration with global data sets (N combined = 3948) revealed a 14-genera core microbiota, but the 664 identified genera still underexplore total gut diversity. Sixty-nine clinical and questionnaire-based covariates were found to be associated with microbiota compositional variation with a 92% replication rate. Stool consistency showed the largest effect size, whereas medication explained the largest total variance and interacted with other covariate-microbiota associations. Early-life events such as birth mode were not reflected in adult microbiota composition. Finally, we found that proposed disease marker genera were associated with host covariates, urging inclusion of the latter in study design.

1,562 citations

References
Journal ArticleDOI
TL;DR: UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters and offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets.
Abstract: Motivation: Biological sequence data is accumulating rapidly, motivating the development of improved high-throughput methods for sequence classification. Results: UBLAST and USEARCH are new algorithms enabling sensitive local and global search of large sequence databases at exceptionally high speeds. They are often orders of magnitude faster than BLAST in practical applications, though sensitivity to distant protein relationships is lower. UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters. UCLUST offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets. Availability: Binaries are available at no charge for non-commercial use at http://www.drive5.com/usearch Supplementary information: Supplementary data are available at Bioinformatics online.

17,301 citations

Journal ArticleDOI
TL;DR: Analysis of the genomic DNA from a bacterial biofilm grown under aerobic conditions suggests that sulfate-reducing bacteria, despite their anaerobicity, were present in this environment.
Abstract: We describe a new molecular approach to analyzing the genetic diversity of complex microbial populations. This technique is based on the separation of polymerase chain reaction-amplified fragments of genes coding for 16S rRNA, all the same length, by denaturing gradient gel electrophoresis (DGGE). DGGE analysis of different microbial communities demonstrated the presence of up to 10 distinguishable bands in the separation pattern, which were most likely derived from as many different species constituting these populations, and thereby generated a DGGE profile of the populations. We showed that it is possible to identify constituents which represent only 1% of the total population. With an oligonucleotide probe specific for the V3 region of 16S rRNA of sulfate-reducing bacteria, particular DNA fragments from some of the microbial populations could be identified by hybridization analysis. Analysis of the genomic DNA from a bacterial biofilm grown under aerobic conditions suggests that sulfate-reducing bacteria, despite their anaerobicity, were present in this environment. The results we obtained demonstrate that this technique will contribute to our understanding of the genetic diversity of uncharacterized microbial populations.

11,380 citations


Additional excerpts

  • ...Primer pairs were: (i): S-D-Bact-0341-b-S-17, 5′-CCTACGGGNGGCWGCAG-3′ (32), and S-D-Bact-0785-a-A-21, 5′-GACTACHVGGGTATCTAATCC-3′ (32); and (ii): S-D-Bact-0008-a-S-16, 5′-AGAGTTTGATCMTGGC-3′ (33), and S-D-Bact-0907-a-A-20, 5′-CCGTCAATTCMTTTGAGTTT-3′ (34)....

    [...]

Journal ArticleDOI
TL;DR: A 16S rRNA gene database (http://greengenes.lbl.gov) addresses limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies.
Abstract: A 16S rRNA gene database (http://greengenes.lbl.gov) addresses limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies. It was found that there is incongruent taxonomic nomenclature among curators even at the phylum level. Putative chimeras were identified in 3% of environmental sequences and in 0.2% of records derived from isolates. Environmental sequences were classified into 100 phylum-level lineages in the Archaea and Bacteria.

9,593 citations

Journal ArticleDOI
15 Sep 2005-Nature
TL;DR: A scalable, highly parallel sequencing system with raw throughput significantly greater than that of state-of-the-art capillary electrophoresis instruments is described, and its utility is shown by shotgun sequencing and de novo assembly of the Mycoplasma genitalium genome with 96% coverage at 99.96% accuracy in one run of the machine.
Abstract: The proliferation of large-scale DNA-sequencing projects in recent years has driven a search for alternative methods to reduce time and cost. Here we describe a scalable, highly parallel sequencing system with raw throughput significantly greater than that of state-of-the-art capillary electrophoresis instruments. The apparatus uses a novel fibre-optic slide of individual wells and is able to sequence 25 million bases, at 99% or better accuracy, in one four-hour run. To achieve an approximately 100-fold increase in throughput over current Sanger sequencing technology, we have developed an emulsion method for DNA amplification and an instrument for sequencing by synthesis using a pyrosequencing protocol optimized for solid support and picolitre-scale volumes. Here we show the utility, throughput, accuracy and robustness of this system by shotgun sequencing and de novo assembly of the Mycoplasma genitalium genome with 96% coverage at 99.96% accuracy in one run of the machine.

8,434 citations


"Evaluation of general 16S ribosomal..." refers to background in this paper

  • ...For example, ‘0338’ stands for start position 338 in the Escherichia coli system of nomenclature (23); (5) A single lowercase letter indicating the version of the probe....

    [...]

  • ...In 2006, Roche’s 454 GS 20 pyrosequencing (5) became the first high-throughput sequencing technology to be successfully applied for large scale biodiversity analysis and was key to uncovering the ‘rare biosphere’ (6)....

    [...]
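The primer naming scheme touched on in the first excerpt above can be sketched as a small parser. The excerpt itself confirms only two fields: the four-digit start position in the Escherichia coli numbering system and the single lowercase version letter. The meanings given to the remaining fields (target gene, taxonomic level, taxon, strand, length) are an assumption based on the seven hyphen-separated parts of names such as S-D-Bact-0341-b-S-17, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PrimerName:
    target_gene: str   # assumed: e.g. "S" for the small subunit rRNA gene
    taxon_level: str   # assumed: e.g. "D" for domain
    taxon: str         # assumed: abbreviated target group, e.g. "Bact"
    position: int      # start position in E. coli numbering (from the excerpt)
    version: str       # single lowercase version letter (from the excerpt)
    strand: str        # assumed: "S" for sense, "A" for antisense
    length: int        # assumed: primer length in nucleotides


def parse_primer_name(name):
    """Split a primer name such as 'S-D-Bact-0341-b-S-17' into its fields."""
    parts = name.strip().split("-")
    if len(parts) != 7:
        raise ValueError(f"expected 7 hyphen-separated fields, got {len(parts)}: {name!r}")
    gene, level, taxon, position, version, strand, length = parts
    return PrimerName(gene, level, taxon, int(position), version, strand, int(length))
```

Under these assumptions, `parse_primer_name("S-D-Bact-0785-a-A-21")` yields position 785, version "a", strand "A" and length 21. Names with stray spaces or missing hyphens would need to be cleaned before parsing.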

Journal ArticleDOI
TL;DR: It is shown that the protocol developed for these instruments successfully recaptures known biological results, and additionally that biological conclusions are consistent across sequencing platforms (the HiSeq2000 versus the MiSeq) and across the sequenced regions of amplicons.
Abstract: DNA sequencing continues to decrease in cost with the Illumina HiSeq2000 generating up to 600 Gb of paired-end 100 base reads in a ten-day run. Here we present a protocol for community amplicon sequencing on the HiSeq2000 and MiSeq Illumina platforms, and apply that protocol to sequence 24 microbial communities from host-associated and free-living environments. A critical question as more sequencing platforms become available is whether biological conclusions derived on one platform are consistent with what would be derived on a different platform. We show that the protocol developed for these instruments successfully recaptures known biological results, and additionally that biological conclusions are consistent across sequencing platforms (the HiSeq2000 versus the MiSeq) and across the sequenced regions of amplicons.

6,840 citations


"Evaluation of general 16S ribosomal..." refers to background in this paper

  • ...The attractiveness of Illumina (8) lies in the reduced per base costs and comparatively high sequencing depth (9), despite having short read lengths....

    [...]
