Open Access, Journal Article

ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time.

Yunpeng Cai, +1 more
- 01 Aug 2011 - 
- Vol. 39, Iss: 14
Abstract
Taxonomy-independent analysis plays an essential role in microbial community analysis. Hierarchical clustering is one of the most widely employed approaches to finding operational taxonomic units, the basis for many downstream analyses. Most existing algorithms have quadratic space and computational complexities, and thus can be used only for small or medium-scale problems. We propose a new online learning-based algorithm that simultaneously addresses the space and computational issues of prior work. The basic idea is to partition a sequence space into a set of subspaces using a partition tree constructed using a pseudometric, then recursively refine a clustering structure in these subspaces. The technique relies on new methods for fast closest-pair searching and efficient dynamic insertion and deletion of tree nodes. To avoid exhaustive computation of pairwise distances between clusters, we represent each cluster of sequences as a probabilistic sequence, and define a set of operations to align these probabilistic sequences and compute genetic distances between them. We present analyses of space and computational complexity, and demonstrate the effectiveness of our new algorithm using a human gut microbiota data set with over one million sequences. The new algorithm exhibits a quasilinear time and space complexity comparable to greedy heuristic clustering algorithms, while achieving a similar accuracy to the standard hierarchical clustering algorithm.
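The idea of summarizing a cluster as a probabilistic sequence can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the sequences in each cluster are already aligned to a common length, whereas the abstract says the paper defines its own operations for aligning probabilistic sequences. The function names `profile` and `expected_mismatch`, the five-symbol alphabet, and the mismatch-based distance are all illustrative assumptions.

```python
from collections import Counter

# Assumed alphabet: four nucleotides plus a gap symbol (illustrative choice).
ALPHABET = "ACGT-"


def profile(seqs):
    """Summarize pre-aligned, equal-length sequences as per-column symbol frequencies."""
    if not seqs:
        raise ValueError("need at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    if any(ch not in ALPHABET for s in seqs for ch in s):
        raise ValueError("unexpected symbol outside the assumed alphabet")
    n = len(seqs)
    columns = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        columns.append({a: counts.get(a, 0) / n for a in ALPHABET})
    return columns


def expected_mismatch(p, q):
    """Mean per-column probability that symbols drawn independently from p and q differ."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same number of columns")
    total = 0.0
    for col_p, col_q in zip(p, q):
        match = sum(col_p[a] * col_q[a] for a in ALPHABET)
        total += 1.0 - match
    return total / len(p)


# Usage: two small hypothetical clusters that differ only in the last column.
cluster_a = profile(["ACGT", "ACGT"])
cluster_b = profile(["ACGA", "ACGT"])
print(expected_mismatch(cluster_a, cluster_b))  # 0.125
```

Comparing two profiles costs time proportional to the sequence length, independent of how many sequences each cluster holds. That is the kind of saving the abstract points to when it speaks of avoiding exhaustive pairwise distance computation between clusters.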

