Journal ArticleDOI

UPARSE: highly accurate OTU sequences from microbial amplicon reads

01 Oct 2013-Nature Methods (Nature Publishing Group)-Vol. 10, Iss: 10, pp 996-998
TL;DR: The UPARSE pipeline reports operational taxonomic unit (OTU) sequences with ≤1% incorrect bases in artificial microbial community tests, compared with >3% incorrect bases commonly reported by other methods.
Abstract: Amplified marker-gene sequences can be used to understand microbial community structure, but they suffer from a high level of sequencing and amplification artifacts. The UPARSE pipeline reports operational taxonomic unit (OTU) sequences with ≤1% incorrect bases in artificial microbial community tests, compared with >3% incorrect bases commonly reported by other methods. The improved accuracy results in far fewer OTUs, consistently closer to the expected number of species in a community.
Citations
Journal ArticleDOI
TL;DR: The open-source software package DADA2 for modeling and correcting Illumina-sequenced amplicon errors is presented, revealing a diversity of previously undetected Lactobacillus crispatus variants.
Abstract: We present the open-source software package DADA2 for modeling and correcting Illumina-sequenced amplicon errors (https://github.com/benjjneb/dada2). DADA2 infers sample sequences exactly and resolves differences of as little as 1 nucleotide. In several mock communities, DADA2 identified more real variants and output fewer spurious sequences than other methods. We applied DADA2 to vaginal samples from a cohort of pregnant women, revealing a diversity of previously undetected Lactobacillus crispatus variants.

14,505 citations

Journal ArticleDOI
18 Oct 2016-PeerJ
TL;DR: VSEARCH is here shown to be more accurate than USEARCH when performing searching, clustering, chimera detection and subsampling, while on a par with USEARCH for paired-ends read merging.
Abstract: Background: VSEARCH is an open source and free of charge multithreaded 64-bit tool for processing and preparing metagenomics, genomics and population genomics nucleotide sequence data. It is designed as an alternative to the widely used USEARCH tool (Edgar, 2010) for which the source code is not publicly available, algorithm details are only rudimentarily described, and only a memory-confined 32-bit version is freely available for academic use. Methods: When searching nucleotide sequences, VSEARCH uses a fast heuristic based on words shared by the query and target sequences in order to quickly identify similar sequences, a similar strategy is probably used in USEARCH. VSEARCH then performs optimal global sequence alignment of the query against potential target sequences, using full dynamic programming instead of the seed-and-extend heuristic used by USEARCH. Pairwise alignments are computed in parallel using vectorisation and multiple threads. Results: VSEARCH includes most commands for analysing nucleotide sequences available in USEARCH version 7 and several of those available in USEARCH version 8, including searching (exact or based on global alignment), clustering by similarity (using length pre-sorting, abundance pre-sorting or a user-defined order), chimera detection (reference-based or de novo), dereplication (full length or prefix), pairwise alignment, reverse complementation, sorting, and subsampling. VSEARCH also includes commands for FASTQ file processing, i.e., format detection, filtering, read quality statistics, and merging of paired reads. Furthermore, VSEARCH extends functionality with several new commands and improvements, including shuffling, rereplication, masking of low-complexity sequences with the well-known DUST algorithm, a choice among different similarity definitions, and FASTQ file format conversion. 
VSEARCH is here shown to be more accurate than USEARCH when performing searching, clustering, chimera detection and subsampling, while on a par with USEARCH for paired-ends read merging. VSEARCH is slower than USEARCH when performing clustering and chimera detection, but significantly faster when performing paired-end reads merging and dereplication. VSEARCH is available at https://github.com/torognes/vsearch under either the BSD 2-clause license or the GNU General Public License version 3.0. Discussion: VSEARCH has been shown to be a fast, accurate and full-fledged alternative to USEARCH. A free and open-source versatile tool for sequence analysis is now available to the metagenomics community.
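
The two-stage search described in the Methods above (a word-based prefilter followed by full dynamic-programming global alignment) can be illustrated with a minimal Python sketch. This is not VSEARCH's implementation: the function names, the word length `k`, the scoring values, and the `max_candidates` cutoff are all assumptions chosen for illustration, and the sketch is neither vectorised nor multithreaded.

```python
def shared_words(query: str, target: str, k: int = 4) -> int:
    """Count distinct k-mers ("words") present in both sequences."""
    q = {query[i:i + k] for i in range(len(query) - k + 1)}
    t = {target[i:i + k] for i in range(len(target) - k + 1)}
    return len(q & t)


def global_align(a: str, b: str, match: int = 2, mismatch: int = -4, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (optimal score, identity), where identity is the number of
    matching columns divided by the alignment length (illustrative definition).
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path to count matching columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return score[n][m], (matches / columns if columns else 1.0)


def search(query: str, targets: dict, k: int = 4, max_candidates: int = 2):
    """Rank targets by shared words, then globally align only the top few."""
    ranked = sorted(targets,
                    key=lambda name: shared_words(query, targets[name], k),
                    reverse=True)
    best = None
    for name in ranked[:max_candidates]:
        _, identity = global_align(query, targets[name])
        if best is None or identity > best[1]:
            best = (name, identity)
    return best
```

The prefilter keeps the expensive quadratic alignment away from targets that share few words with the query, which is the general idea the abstract describes; the hypothetical `max_candidates` cutoff trades sensitivity for speed.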

5,850 citations


Cites methods from "UPARSE: highly accurate OTU sequenc..."

  • ...QIIME and UPARSE are both based on USEARCH (Edgar, 2010), a set of tools designed and implemented by Robert C. Edgar, and available at http://drive5.com/usearch/....

    [...]

  • ...We have not prioritized commands related to amino acid sequences (findorfs), local alignment (allpairs_local, pairs_local, search_local, ublast ), brute-force search (search_global, pairs_global), UDB databases (makeudb_ublast, makeudb_usearch, udb2fasta, udbinfo, udbstats), and the UPARSE pipeline (cluster_otus, uparse_ref )....

    [...]

  • ...Several pipelines have been developed for microbiome analysis, among which mothur (Schloss et al., 2009), QIIME (Caporaso et al., 2010), and UPARSE (Edgar, 2013) are the most popular....

    [...]

Journal ArticleDOI
05 Jan 2018-Science
TL;DR: Examination of the oral and gut microbiome of melanoma patients undergoing anti-programmed cell death 1 protein (PD-1) immunotherapy suggested enhanced systemic and antitumor immunity in responding patients with a favorable gut microbiome as well as in germ-free mice receiving fecal transplants from responding patients.
Abstract: Preclinical mouse models suggest that the gut microbiome modulates tumor response to checkpoint blockade immunotherapy; however, this has not been well-characterized in human cancer patients. Here we examined the oral and gut microbiome of melanoma patients undergoing anti-programmed cell death 1 protein (PD-1) immunotherapy (n = 112). Significant differences were observed in the diversity and composition of the patient gut microbiome of responders versus nonresponders. Analysis of patient fecal microbiome samples (n = 43, 30 responders, 13 nonresponders) showed significantly higher alpha diversity (P < 0.01) and relative abundance of bacteria of the Ruminococcaceae family (P < 0.01) in responding patients. Metagenomic studies revealed functional differences in gut bacteria in responders, including enrichment of anabolic pathways. Immune profiling suggested enhanced systemic and antitumor immunity in responding patients with a favorable gut microbiome as well as in germ-free mice receiving fecal transplants from responding patients. Together, these data have important implications for the treatment of melanoma patients with immune checkpoint inhibitors.

2,791 citations

Journal ArticleDOI
09 Apr 2015-Cell
TL;DR: It is demonstrated that indigenous spore-forming bacteria from the mouse and human microbiota promote 5-HT biosynthesis from colonic enterochromaffin cells (ECs), which supply 5-HT to the mucosa, lumen, and circulating platelets, and that elevating luminal concentrations of particular microbial metabolites increases colonic and blood 5-HT in germ-free mice.

2,047 citations

Journal ArticleDOI
19 Nov 2015-Cell
TL;DR: A machine-learning algorithm is devised that integrates blood parameters, dietary habits, anthropometrics, physical activity, and gut microbiota measured in an 800-person cohort; it accurately predicts personalized postprandial glycemic response to real-life meals, and a blinded randomized controlled dietary intervention based on this algorithm resulted in significantly lower postprandial responses and consistent alterations to gut microbiota configuration.

1,748 citations


Cites background from "UPARSE: highly accurate OTU sequenc..."

  • ...0 (Edgar, 2013) to obtain RA from 16S rRNA reads....

    [...]

References
Journal ArticleDOI
TL;DR: An overview of the analysis pipeline and links to raw data and processed output from the runs with and without denoising are provided.
Abstract: Supplementary Figure 1 Overview of the analysis pipeline. Supplementary Table 1 Details of conventionally raised and conventionalized mouse samples. Supplementary Discussion Expanded discussion of QIIME analyses presented in the main text; Sequencing of 16S rRNA gene amplicons; QIIME analysis notes; Expanded Figure 1 legend; Links to raw data and processed output from the runs with and without denoising.

28,911 citations

Journal ArticleDOI
TL;DR: UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters and offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets.
Abstract: Motivation: Biological sequence data is accumulating rapidly, motivating the development of improved high-throughput methods for sequence classification. Results: UBLAST and USEARCH are new algorithms enabling sensitive local and global search of large sequence databases at exceptionally high speeds. They are often orders of magnitude faster than BLAST in practical applications, though sensitivity to distant protein relationships is lower. UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters. UCLUST offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets. Availability: Binaries are available at no charge for non-commercial use at http://www.drive5.com/usearch. Supplementary information: Supplementary data are available at Bioinformatics online.

17,301 citations

Journal ArticleDOI
TL;DR: UCHIME has better sensitivity than ChimeraSlayer (previously the most sensitive database method), especially with short, noisy sequences, and in testing on artificial bacterial communities with known composition, UCHIME de novo sensitivity is shown to be comparable to Perseus.
Abstract: Motivation: Chimeric DNA sequences often form during polymerase chain reaction amplification, especially when sequencing single regions (e.g. 16S rRNA or fungal Internal Transcribed Spacer) to assess diversity or compare populations. Undetected chimeras may be misinterpreted as novel species, causing inflated estimates of diversity and spurious inferences of differences between populations. Detection and removal of chimeras is therefore of critical importance in such experiments. Results: We describe UCHIME, a new program that detects chimeric sequences with two or more segments. UCHIME either uses a database of chimera-free sequences or detects chimeras de novo by exploiting abundance data. UCHIME has better sensitivity than ChimeraSlayer (previously the most sensitive database method), especially with short, noisy sequences. In testing on artificial bacterial communities with known composition, UCHIME de novo sensitivity is shown to be comparable to Perseus. UCHIME is >100× faster than Perseus and >1000× faster than ChimeraSlayer. Availability: Source, binaries and data: http://drive5.com/uchime. Supplementary information: Supplementary data are available at Bioinformatics online.

11,904 citations

Journal ArticleDOI
TL;DR: In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI’s website.
Abstract: In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's website. NCBI resources include Entrez, PubMed, PubMed Central, LocusLink, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SARS Coronavirus Resource, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD) and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.

9,604 citations

Journal ArticleDOI
TL;DR: A 16S rRNA gene database (http://greengenes.lbl.gov) provides chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies.
Abstract: A 16S rRNA gene database (http://greengenes.lbl.gov) addresses limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies. It was found that there is incongruent taxonomic nomenclature among curators even at the phylum level. Putative chimeras were identified in 3% of environmental sequences and in 0.2% of records derived from isolates. Environmental sequences were classified into 100 phylum-level lineages in the Archaea and Bacteria.

9,593 citations