A high-throughput DNA sequence aligner for microbial ecology studies

doi:10.1371/JOURNAL.PONE.0008230

Patrick D. Schloss

- 14 Dec 2009 -

PLOS ONE

- Vol. 4, Iss: 12

Abstract:

As the scope of microbial surveys expands with the parallel growth in sequencing capacity, a significant bottleneck in data analysis is the ability to generate a biologically meaningful multiple sequence alignment. The most commonly used aligners have varying alignment quality and speed, tend to depend on a specific reference alignment, or lack a complete description of the underlying algorithm. The purpose of this study was to create and validate an aligner that quickly generates a high-quality alignment and has the flexibility to use any reference alignment. Using the simple nearest alignment space termination (NAST) algorithm, the resulting aligner operates in linear time, requires a small memory footprint, and generates a high-quality alignment. In addition, the alignments generated for variable regions were of as high a quality as the alignments of full-length sequences. As implemented, the method was able to align 18 full-length 16S rRNA gene sequences and 58 V2 region sequences per second to the 50,000-column SILVA reference alignment. Most importantly, the resulting alignments were of a quality equal to SILVA-generated alignments. The aligner described in this study will enable scientists to rapidly generate robust multiple sequence alignments that are implicitly based upon the predicted secondary structure of the 16S rRNA molecule. Furthermore, because the implementation is not connected to a specific database, it is easy to generalize the method to reference alignments for any DNA sequence.
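To put the reported throughput in perspective, the short Python sketch below estimates how long an alignment run might take at the rates quoted in the abstract (18 full-length and 58 V2 region sequences per second). It assumes that run time scales linearly with the number of sequences, as the abstract states, and that the quoted rates carry over to other hardware and datasets, which may not hold in practice. The function names and the one-million-sequence dataset size are hypothetical and chosen only for illustration.

```python
# Rates reported in the abstract (sequences aligned per second against
# the 50,000-column SILVA reference alignment).
FULL_LENGTH_RATE = 18
V2_REGION_RATE = 58


def estimated_seconds(n_sequences: int, rate_per_second: float) -> float:
    """Estimate run time in seconds, assuming time grows linearly with n."""
    if rate_per_second <= 0:
        raise ValueError("rate_per_second must be positive")
    return n_sequences / rate_per_second


def format_duration(seconds: float) -> str:
    """Render a duration in seconds as 'Hh MMm SSs'."""
    hours, remainder = divmod(int(round(seconds)), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


if __name__ == "__main__":
    n = 1_000_000  # hypothetical dataset size, not taken from the paper
    for label, rate in (("full-length", FULL_LENGTH_RATE),
                        ("V2 region", V2_REGION_RATE)):
        duration = format_duration(estimated_seconds(n, rate))
        print(f"{n:,} {label} sequences: about {duration}")
```

Because the estimate is a simple division, doubling the number of sequences doubles the projected time; any constant start-up cost, such as loading the reference alignment, is ignored here.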

Citations

Metagenomic biomarker discovery and explanation

Nicola Segata, +7 more

- 24 Jun 2011 -

Genome Biology

TL;DR: A new method for metagenomic biomarker discovery is described and validated by way of class comparison, tests of biological consistency, and effect size estimation to address the challenge of finding organisms, genes, or pathways that consistently explain the differences between two or more microbial communities.

Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform.

James J. Kozich, +4 more

- 01 Sep 2013 -

Applied and Environmental Microbiology

TL;DR: This work presents an improved method for sequencing variable regions within the 16S rRNA gene using Illumina's MiSeq platform, which is currently capable of producing paired 250-nucleotide reads, and demonstrates that the method can provide data at least as good as those generated by the 454 platform while providing considerably higher sequencing coverage for a fraction of the cost.

SINA: accurate high throughput multiple sequence alignment of ribosomal RNA genes

Elmar Pruesse, +2 more

- 15 Jul 2012 -

Bioinformatics

TL;DR: The SILVA Incremental Aligner (SINA), used to align the rRNA gene databases provided by the SILVA ribosomal RNA project, was evaluated and achieved higher accuracy than PyNAST and mothur in all performed benchmarks.

Reducing the Effects of PCR Amplification and Sequencing Artifacts on 16S rRNA-Based Studies

Patrick D. Schloss, +2 more

- 14 Dec 2011 -

PLOS ONE

TL;DR: An improved quality-filtering pipeline was applied to several benchmarking studies; even with the stringent data curation pipeline, biases in the data generation pipeline and batch effects were observed that could potentially confound the interpretation of microbial community data.

Intestinal Domination and the Risk of Bacteremia in Patients Undergoing Allogeneic Hematopoietic Stem Cell Transplantation

Ying Taur, +13 more

- 01 Oct 2012 -

Clinical Infectious Diseases

TL;DR: During allo-HSCT, the diversity and stability of the intestinal flora are disrupted, resulting in domination by bacteria associated with subsequent bacteremia, and assessment of fecal microbiota identifies patients at highest risk for bloodstream infection during allo

References

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Stephen F. Altschul, +6 more

- 01 Sep 1997 -

Nucleic Acids Research

TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.

Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities

Patrick D. Schloss, +16 more

- 01 Dec 2009 -

Applied and Environmental Microbiology

TL;DR: mothur is used as a case study to trim, screen, and align sequences; calculate distances; assign sequences to operational taxonomic units; and describe the α and β diversity of eight marine samples previously characterized by pyrosequencing of 16S rRNA gene fragments.

A general method applicable to the search for similarities in the amino acid sequence of two proteins

Saul B. Needleman, +1 more

- 28 Mar 1970 -

Journal of Molecular Biology

TL;DR: A computer adaptable method for finding similarities in the amino acid sequences of two proteins has been developed and it is possible to determine whether significant homology exists between the proteins to trace their possible evolutionary development.

16S/23S rRNA sequencing

D. J. Lane

Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB

Todd Z. DeSantis, +9 more

- 01 Jul 2006 -

Applied and Environmental Microbiology

TL;DR: A 16S rRNA gene database (http://greengenes.lbl.gov) provides chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies.

Related Papers (5)

Naïve Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy

Qiong Wang, +3 more

- 15 Aug 2007 -

Applied and Environmental Microbiology

UCHIME improves sensitivity and speed of chimera detection

Robert C. Edgar, +4 more

- 01 Aug 2011 -

Bioinformatics

SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB

Elmar Pruesse, +6 more

- 01 Dec 2007 -

Nucleic Acids Research
