ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time.
Yunpeng Cai, Yijun Sun
TLDR
A new online learning-based algorithm that simultaneously addresses the space and computational issues of prior work and exhibits a quasilinear time and space complexity comparable to greedy heuristic clustering algorithms, while achieving a similar accuracy to the standard hierarchical clustering algorithm.
Abstract:
Taxonomy-independent analysis plays an essential role in microbial community analysis. Hierarchical clustering is one of the most widely employed approaches to finding operational taxonomic units, the basis for many downstream analyses. Most existing algorithms have quadratic space and computational complexities, and thus can be used only for small or medium-scale problems. We propose a new online learning-based algorithm that simultaneously addresses the space and computational issues of prior work. The basic idea is to partition a sequence space into a set of subspaces using a partition tree constructed using a pseudometric, then recursively refine a clustering structure in these subspaces. The technique relies on new methods for fast closest-pair searching and efficient dynamic insertion and deletion of tree nodes. To avoid exhaustive computation of pairwise distances between clusters, we represent each cluster of sequences as a probabilistic sequence, and define a set of operations to align these probabilistic sequences and compute genetic distances between them. We present analyses of space and computational complexity, and demonstrate the effectiveness of our new algorithm using a human gut microbiota data set with over one million sequences. The new algorithm exhibits a quasilinear time and space complexity comparable to greedy heuristic clustering algorithms, while achieving a similar accuracy to the standard hierarchical clustering algorithm.
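To make the idea of a "probabilistic sequence" concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the sequences in each cluster are already aligned to a common length (the paper instead defines operations to align the probabilistic sequences themselves), and it uses an expected per-column mismatch as a stand-in for the genetic distance. The names `profile`, `expected_mismatch`, and `ALPHABET` are hypothetical and chosen only for this illustration.

```python
from collections import Counter

# Assumed symbol set: four nucleotides plus a gap character.
# Symbols outside this set are ignored by this sketch.
ALPHABET = "ACGT-"


def profile(aligned_seqs):
    """Summarize a cluster as per-column symbol frequencies.

    Assumption: all sequences are pre-aligned and have equal length.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be pre-aligned to equal length")
    n = len(aligned_seqs)
    columns = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned_seqs)
        columns.append({b: counts.get(b, 0) / n for b in ALPHABET})
    return columns


def expected_mismatch(p, q):
    """Mean probability that symbols drawn independently from
    corresponding columns of two profiles differ."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same number of columns")
    total = 0.0
    for col_p, col_q in zip(p, q):
        match = sum(col_p[b] * col_q[b] for b in ALPHABET)
        total += 1.0 - match
    return total / len(p)


# Toy clusters (made-up data, not from the paper).
cluster_a = ["ACGT", "ACGT", "ACGA"]
cluster_b = ["ACGT", "TCGT"]
d = expected_mismatch(profile(cluster_a), profile(cluster_b))
print(round(d, 4))
```

Under these assumptions, the value computed from the two profiles equals the average normalized mismatch over all cross-cluster sequence pairs, yet it is obtained without enumerating those pairs. That is the kind of saving the abstract refers to when it speaks of avoiding exhaustive pairwise distance computation between clusters.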