Open Access Journal Article

ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time.

Yunpeng Cai, +1 more
01 Aug 2011, Vol. 39, Iss. 14
TLDR
A new online learning-based algorithm that simultaneously addresses the space and computational issues of prior work and exhibits a quasilinear time and space complexity comparable to greedy heuristic clustering algorithms, while achieving a similar accuracy to the standard hierarchical clustering algorithm.
Abstract
Taxonomy-independent analysis plays an essential role in microbial community analysis. Hierarchical clustering is one of the most widely employed approaches to finding operational taxonomic units, the basis for many downstream analyses. Most existing algorithms have quadratic space and computational complexities, and thus can be used only for small or medium-scale problems. We propose a new online learning-based algorithm that simultaneously addresses the space and computational issues of prior work. The basic idea is to partition a sequence space into a set of subspaces using a partition tree constructed using a pseudometric, then recursively refine a clustering structure in these subspaces. The technique relies on new methods for fast closest-pair searching and efficient dynamic insertion and deletion of tree nodes. To avoid exhaustive computation of pairwise distances between clusters, we represent each cluster of sequences as a probabilistic sequence, and define a set of operations to align these probabilistic sequences and compute genetic distances between them. We present analyses of space and computational complexity, and demonstrate the effectiveness of our new algorithm using a human gut microbiota data set with over one million sequences. The new algorithm exhibits a quasilinear time and space complexity comparable to greedy heuristic clustering algorithms, while achieving a similar accuracy to the standard hierarchical clustering algorithm.
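The idea of summarizing a cluster as a probabilistic sequence can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the sequences are already aligned, gap-free, and of equal length, so it skips the alignment step the abstract mentions. The function names (`profile_from_sequence`, `merge_profiles`, `profile_distance`) and the expected-mismatch distance are hypothetical choices made only for illustration.

```python
ALPHABET = "ACGT"


def profile_from_sequence(seq):
    """Turn one sequence into a profile: one base->probability dict per position."""
    return [{b: 1.0 if b == c else 0.0 for b in ALPHABET} for c in seq]


def merge_profiles(p, n_p, q, n_q):
    """Merge two cluster profiles, weighting each by its cluster size."""
    if len(p) != len(q):
        raise ValueError("profiles must have equal length (pre-aligned input assumed)")
    total = n_p + n_q
    return [
        {b: (cp[b] * n_p + cq[b] * n_q) / total for b in ALPHABET}
        for cp, cq in zip(p, q)
    ]


def profile_distance(p, q):
    """Expected fraction of mismatching positions when one base is drawn
    independently from each profile at every position (illustrative metric)."""
    if len(p) != len(q):
        raise ValueError("profiles must have equal length (pre-aligned input assumed)")
    mismatch = 0.0
    for cp, cq in zip(p, q):
        match = sum(cp[b] * cq[b] for b in ALPHABET)
        mismatch += 1.0 - match
    return mismatch / len(p)
```

With this representation, comparing two clusters costs one pass over the profile length rather than one comparison per pair of member sequences, which is the kind of saving the abstract describes. For example, the profiles of `"ACGT"` and `"ACGA"` are at distance 0.25, and after merging them into one cluster the merged profile is at distance 0.125 from `"ACGT"`.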


Citations
Journal Article

A general species delimitation method with applications to phylogenetic placements

TL;DR: The Poisson tree processes (PTP) model is introduced to infer putative species boundaries on a given phylogenetic input tree and yields more accurate results than de novo species delimitation methods.
Journal Article

FROGS: Find, Rapidly, OTUs with Galaxy Solution.

TL;DR: This Galaxy‐supported pipeline, called FROGS, is designed to analyze large sets of amplicon sequences and produce abundance tables of Operational Taxonomic Units (OTUs) and their taxonomic affiliation, highlighting database conflicts and uncertainties.
Journal Article

Minimum entropy decomposition: unsupervised oligotyping for sensitive partitioning of high-throughput marker gene sequences.

TL;DR: Minimum Entropy Decomposition (MED) provides a computationally efficient means to partition marker gene datasets into ‘MED nodes’, which represent homogeneous operational taxonomic units and enables sensitive discrimination of closely related organisms in marker gene amplicon datasets without relying on extensive computational heuristics and user supervision.
Journal Article

Composition and Similarity of Bovine Rumen Microbiota across Individual Animals

TL;DR: Although the bacterial taxa may vary considerably between cow rumens, they appear to be phylogenetically related, which suggests that the functional requirement imposed by the rumen ecological niche selects taxa that potentially share similar genetic features.
Journal Article

Updating the 97% identity threshold for 16S ribosomal RNA OTUs.

Robert C. Edgar
15 Jul 2018
TL;DR: Using a large set of high‐quality 16S rRNA sequences from finished genomes, the correspondence of OTUs to species is assessed for five representative clustering algorithms using four accuracy metrics; all algorithms had comparable accuracy when tuned to a given metric.
References
Journal Article

Microbial diversity in the deep sea and the underexplored “rare biosphere”

TL;DR: It is shown that bacterial communities of deep water masses of the North Atlantic and diffuse flow hydrothermal vents are one to two orders of magnitude more complex than previously reported for any microbial environment.
Journal Article

Introducing DOTUR, a Computer Program for Defining Operational Taxonomic Units and Estimating Species Richness

TL;DR: A computer program, DOTUR, is developed to address the challenge of assigning sequences to operational taxonomic units (OTUs) based on the genetic distances between sequences; it assigns sequences to OTUs by using either the furthest, average, or nearest neighbor algorithm for each distance level.
Journal Article

The Pervasive Effects of an Antibiotic on the Human Gut Microbiota, as Revealed by Deep 16S rRNA Sequencing

TL;DR: Ciprofloxacin treatment influenced the abundance of about a third of the bacterial taxa in the gut, decreasing the taxonomic richness, diversity, and evenness of the community; the results support the hypothesis of functional redundancy in the human gut microbiota.
Book

Algorithm Design

Jon Kleinberg, +1 more
TL;DR: Algorithm Design introduces algorithms by looking at the real-world problems that motivate them and encourages an understanding of the algorithm design process and an appreciation of the role of algorithms in the broader field of computer science.