ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time.
Yunpeng Cai, Yijun Sun +1 more
TLDR
A new online learning-based algorithm that simultaneously addresses the space and computational issues of prior work and exhibits a quasilinear time and space complexity comparable to greedy heuristic clustering algorithms, while achieving a similar accuracy to the standard hierarchical clustering algorithm.
Abstract:
Taxonomy-independent analysis plays an essential role in microbial community analysis. Hierarchical clustering is one of the most widely employed approaches to finding operational taxonomic units, the basis for many downstream analyses. Most existing algorithms have quadratic space and computational complexities, and thus can be used only for small or medium-scale problems. We propose a new online learning-based algorithm that simultaneously addresses the space and computational issues of prior work. The basic idea is to partition a sequence space into a set of subspaces using a partition tree constructed using a pseudometric, then recursively refine a clustering structure in these subspaces. The technique relies on new methods for fast closest-pair searching and efficient dynamic insertion and deletion of tree nodes. To avoid exhaustive computation of pairwise distances between clusters, we represent each cluster of sequences as a probabilistic sequence, and define a set of operations to align these probabilistic sequences and compute genetic distances between them. We present analyses of space and computational complexity, and demonstrate the effectiveness of our new algorithm using a human gut microbiota data set with over one million sequences. The new algorithm exhibits a quasilinear time and space complexity comparable to greedy heuristic clustering algorithms, while achieving a similar accuracy to the standard hierarchical clustering algorithm.
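The idea of summarizing a cluster as a probabilistic sequence can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes the sequences are already aligned to equal length (the abstract describes aligning the probabilistic sequences themselves, which is omitted here), and the names `profile`, `profile_distance`, and `merge`, as well as the expected-mismatch distance, are hypothetical choices made for illustration.

```python
from collections import Counter

ALPHABET = "ACGT-"  # assumed alphabet; symbols outside it are ignored


def profile(seqs):
    """Column-wise symbol frequencies for equal-length, pre-aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        columns.append({a: counts.get(a, 0) / len(seqs) for a in ALPHABET})
    return columns


def profile_distance(p, q):
    """Mean per-column probability that symbols drawn from p and q differ.

    This is an illustrative distance, not the genetic distance from the paper.
    """
    if len(p) != len(q):
        raise ValueError("profiles must have the same number of columns")
    total = 0.0
    for col_p, col_q in zip(p, q):
        match = sum(col_p[a] * col_q[a] for a in ALPHABET)
        total += 1.0 - match
    return total / len(p)


def merge(p, n_p, q, n_q):
    """Size-weighted merge of two cluster profiles; returns (profile, size)."""
    n = n_p + n_q
    merged = [
        {a: (col_p[a] * n_p + col_q[a] * n_q) / n for a in ALPHABET}
        for col_p, col_q in zip(p, q)
    ]
    return merged, n


if __name__ == "__main__":
    a = profile(["ACGT"])
    b = profile(["ACGA"])
    print(profile_distance(a, b))  # one mismatching column out of four
    ab, size = merge(a, 1, b, 1)
    print(size, profile_distance(ab, a))
```

Under these assumptions, comparing two clusters costs time proportional to the sequence length rather than to the product of the cluster sizes, which is the motivation the abstract gives for avoiding exhaustive pairwise distances. Note that this toy distance is not zero between a mixed profile and itself.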
Citations
Book Chapter
Metagenomics for Monitoring Environmental Biodiversity: Challenges, Progress, and Opportunities
Raghu Chandramohan, Cheng Yang, Yunpeng Cai, May D. Wang +6 more
TL;DR: This chapter provides an overview of metagenomics covering the major steps involved in data collection, processing, and analysis, and describes and discusses experiment design, sample processing and quality control, sequencing and assembly, annotation, and downstream analyses.
Journal Article
The crosstalk of the human microbiome in breast and colon cancer: A metabolomics analysis.
Anirban Goutam Mukherjee, Uddesh Ramesh Wanjari, Pragya Bradu, Reshma Murali, Sandra Kannampuzha, Tamizhini L, G. Priya Doss, Arun Prakash B P, Kaviyarasi Renu, Abhijit Dey, Balachandar Vellingiri, Abilash Valsala Gopalakrishnan +11 more
TL;DR: This article reviews how the human microbiome and metabolomics interact with breast and colon cancer, the influence of anti-tumor medications on the microbiota, and proactive measures that can be taken to treat cancer using a variety of therapies, including radiotherapy, chemotherapy, next-generation biotherapeutics, gene-based therapy, integrated omics technology, and machine learning.
Posted Content
OptiClust: Improved method for assigning amplicon-based sequence data to operational taxonomic units
TL;DR: A new OTU assignment algorithm that iteratively reassigns sequences to new OTUs to optimize the Matthews correlation coefficient (MCC), a measure of the quality of OTU assignments is developed, representing a significant advance that is likely to have numerous other applications.
Dissertation
Exploring research frontiers in aquatic ecosystems: role of hospital and urban effluents in the dissemination of antibiotic resistance and metals to fresh water ecosystems
TL;DR: In this article, emerging contaminants and the prevalence of antibiotic-resistant Pseudomonas spp. in sediments receiving partially treated or untreated wastewaters were explored, along with the effects of contamination on the composition and diversity of bacterial communities in the sediment.
Journal Article
Differences in microbial communities from Quaternary volcanic soils at different stages of development: Evidence from Late Pleistocene and Holocene volcanoes
Jin Chen, Yaxin Zheng, Yuqing Guo, Fansheng Li, Daolong Xu, Lumeng Chao, Hanting Qu, Baojie Wang, Xiaodan Ma, Siyu Wang, Yuying Bao +10 more
TL;DR: In this article, the differences in microbial communities from Quaternary volcanic soils at different stages of development in Inner Mongolia are investigated, with the objective of elucidating their differences.
References
Journal Article
Basic Local Alignment Search Tool
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.
Journal Article
MUSCLE: multiple sequence alignment with high accuracy and high throughput
TL;DR: MUSCLE is a new computer program for creating multiple alignments of protein sequences that includes fast distance estimation using kmer counting, progressive alignment using a new profile function the authors call the log-expectation score, and refinement using tree-dependent restricted partitioning.
Journal Article
QIIME allows analysis of high-throughput community sequencing data.
J. Gregory Caporaso, Justin Kuczynski, Jesse Stombaugh, Kyle Bittinger, Frederic D. Bushman, Elizabeth K. Costello, Noah Fierer, Antonio Gonzalez Peña, Julia K. Goodrich, Jeffrey I. Gordon, Gavin A. Huttley, Scott T. Kelley, Dan Knights, Jeremy E. Koenig, Ruth E. Ley, Catherine A. Lozupone, Daniel McDonald, Brian D. Muegge, Meg Pirrung, Jens Reeder, Joel Sevinsky, Peter J. Turnbaugh, William A. Walters, Jeremy Widmann, Tanya Yatsunenko, Jesse R. Zaneveld, Rob Knight +27 more
TL;DR: An overview of the analysis pipeline and links to raw data and processed output from the runs with and without denoising are provided.
Book
Introduction to Algorithms
TL;DR: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures and presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers.
Journal Article
Hierarchical Grouping to Optimize an Objective Function
TL;DR: In this paper, a procedure for forming hierarchical groups of mutually exclusive subsets, each of which has members that are maximally similar with respect to specified characteristics, is suggested for use in large-scale (n > 100) studies when a precise optimal solution for a specified number of groups is not practical.
Related Papers (5)
Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities
Patrick D. Schloss, Sarah L. Westcott, Thomas Ryabin, Justine R. Hall, Martin Hartmann, Emily B. Hollister, Ryan A. Lesniewski, Brian B. Oakley, Donovan H. Parks, Courtney J. Robinson, Jason W. Sahl, Blaz Stres, Gerhard G. Thallinger, David J. Van Horn, Carolyn F. Weber +16 more
QIIME allows analysis of high-throughput community sequencing data.
J. Gregory Caporaso, Justin Kuczynski, Jesse Stombaugh, Kyle Bittinger, Frederic D. Bushman, Elizabeth K. Costello, Noah Fierer, Antonio Gonzalez Peña, Julia K. Goodrich, Jeffrey I. Gordon, Gavin A. Huttley, Scott T. Kelley, Dan Knights, Jeremy E. Koenig, Ruth E. Ley, Catherine A. Lozupone, Daniel McDonald, Brian D. Muegge, Meg Pirrung, Jens Reeder, Joel Sevinsky, Peter J. Turnbaugh, William A. Walters, Jeremy Widmann, Tanya Yatsunenko, Jesse R. Zaneveld, Rob Knight +27 more