Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin
Nicholas A. Bokulich,Benjamin D. Kaehler,Jai Ram Rideout,Matthew R. Dillon,Evan Bolyen,Rob Knight,Gavin A. Huttley,J. Gregory Caporaso
TLDR
The results illustrate the importance of parameter tuning for optimizing classifier performance, and recommendations are made regarding parameter choices for these classifiers under a range of standard operating conditions.
Abstract:
Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated “novel” marker-gene sequences, are available in our extensible benchmarking framework, tax-credit (https://github.com/caporaso-lab/tax-credit-data). Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.
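The abstract names alignment-based taxonomy consensus methods but does not spell out how a consensus is formed. As a rough illustration only, the sketch below assumes that each query sequence has already been aligned to a reference database and that the lineages of its top hits are available as semicolon-delimited strings. The function name consensus_taxonomy, the min_consensus parameter and its default value, and the example lineages are all assumptions made for this sketch; it is not the plugin's actual implementation or API.

```python
from collections import Counter


def consensus_taxonomy(hit_taxonomies, min_consensus=0.51, unassigned="Unassigned"):
    """Return the deepest lineage shared by at least min_consensus of the hits.

    hit_taxonomies: list of semicolon-delimited lineage strings (hypothetical format).
    min_consensus: fraction of hits that must agree at a rank (assumed parameter).
    """
    if not 0.5 < min_consensus <= 1.0:
        raise ValueError("min_consensus must be in the interval (0.5, 1.0]")
    if not hit_taxonomies:
        return unassigned

    # Split each lineage into its ranks, e.g. ["k__Bacteria", "p__Firmicutes"].
    lineages = [[rank.strip() for rank in t.split(";")] for t in hit_taxonomies]
    total = len(lineages)
    consensus = []

    for depth in range(1, max(len(lin) for lin in lineages) + 1):
        # Count lineage prefixes of this depth; shorter lineages cannot vote.
        prefixes = Counter(tuple(lin[:depth]) for lin in lineages if len(lin) >= depth)
        prefix, count = prefixes.most_common(1)[0]
        if count / total < min_consensus:
            break  # agreement drops below the threshold, stop descending
        consensus = list(prefix)

    return "; ".join(consensus) if consensus else unassigned


# Made-up example hits for a single query sequence.
hits = [
    "k__Bacteria; p__Firmicutes; c__Bacilli",
    "k__Bacteria; p__Firmicutes; c__Clostridia",
    "k__Bacteria; p__Firmicutes; c__Bacilli",
]
print(consensus_taxonomy(hits))
print(consensus_taxonomy(hits, min_consensus=0.9))
```

Raising the threshold in this sketch trades depth for confidence: with the stricter setting the class-level disagreement among the example hits causes the assignment to stop at phylum. This is one concrete way a tunable parameter can change classifier output, which is the kind of effect the abstract's recommendations on parameter choices concern.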
Citations
Journal Article
Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2
Evan Bolyen,Jai Ram Rideout,Matthew R. Dillon,Nicholas A. Bokulich,Christian C. Abnet,Gabriel A. Al-Ghalith,Harriet Alexander,Eric J. Alm,Manimozhiyan Arumugam,Francesco Asnicar,Yang Bai,Jordan E. Bisanz,Kyle Bittinger,Asker Daniel Brejnrod,Colin J. Brislawn,C. Titus Brown,Benjamin J. Callahan,Andrés Mauricio Caraballo-Rodríguez,John Chase,Emily K. Cope,Ricardo Silva,Christian Diener,Pieter C. Dorrestein,Gavin M. Douglas,Daniel M. Durall,Claire Duvallet,Christian F. Edwardson,Madeleine Ernst,Mehrbod Estaki,Jennifer Fouquier,Julia M. Gauglitz,Sean M. Gibbons,Deanna L. Gibson,Antonio Gonzalez,Kestrel Gorlick,Jiarong Guo,Benjamin Hillmann,Susan Holmes,Hannes Holste,Curtis Huttenhower,Gavin A. Huttley,Stefan Janssen,Alan K. Jarmusch,Lingjing Jiang,Benjamin D. Kaehler,Kyo Bin Kang,Christopher R. Keefe,Paul Keim,Scott T. Kelley,Dan Knights,Irina Koester,Tomasz Kosciolek,Jorden Kreps,Morgan G. I. Langille,Joslynn S. Lee,Ruth E. Ley,Yong-Xin Liu,Erikka Loftfield,Catherine A. Lozupone,Massoud Maher,Clarisse Marotz,Bryan D Martin,Daniel McDonald,Lauren J. McIver,Alexey V. Melnik,Jessica L. Metcalf,Sydney C. Morgan,Jamie Morton,Ahmad Turan Naimey,Jose A. Navas-Molina,Louis-Félix Nothias,Stephanie B. Orchanian,Talima Pearson,Samuel L. Peoples,Daniel Petras,Mary L. Preuss,Elmar Pruesse,Lasse Buur Rasmussen,Adam R. Rivers,Michael S. Robeson,Patrick Rosenthal,Nicola Segata,Michael Shaffer,Arron Shiffer,Rashmi Sinha,Se Jin Song,John R. Spear,Austin D. Swafford,Luke R. Thompson,Pedro J. Torres,Pauline Trinh,Anupriya Tripathi,Peter J. Turnbaugh,Sabah Ul-Hasan,Justin J. J. van der Hooft,Fernando Vargas,Yoshiki Vázquez-Baeza,Emily Vogtmann,Max von Hippel,William A. Walters,Yunhu Wan,Mingxun Wang,Jonathan Warren,Kyle C. Weber,Charles H. D. Williamson,Amy D. Willis,Zhenjiang Zech Xu,Jesse R. Zaneveld,Yilong Zhang,Qiyun Zhu,Rob Knight,J. Gregory Caporaso
TL;DR: QIIME 2 development was primarily funded by NSF Awards 1565100 to J.G.C. and R.K.P.; partial support was also provided by NIH grants U54CA143925 and U54MD012388.
Journal Article
Human Gut Microbiota from Autism Spectrum Disorder Promote Behavioral Symptoms in Mice
Gil Sharon,Nikki Jamie Cruz,Dae Wook Kang,Michael J. Gandal,B. D. Wang,Young-Mo Kim,Erika M. Zink,Cameron P. Casey,Bryn C. Taylor,Christianne J. Lane,Lisa M. Bramer,Nancy G. Isern,David W. Hoyt,Cecilia Noecker,Michael J. Sweredoski,Annie Moradian,Elhanan Borenstein,Janet K. Jansson,Rob Knight,Thomas O. Metz,Carlos Lois,Daniel H. Geschwind,Rosa Krajmalnik-Brown,Sarkis K. Mazmanian
TL;DR: It is proposed that the gut microbiota regulates behaviors in mice via production of neuroactive metabolites, suggesting that gut-brain connections contribute to the pathophysiology of ASD.
Journal Article
Bile acid metabolites control TH17 and Treg cell differentiation
Saiyu Hang,Donggi Paik,Lina Yao,Eunha Kim,Jamma Trinath,Jingping Lu,Soyoung Ha,Brandon N. Nelson,Samantha P. Kelly,Lin Wu,Ye Zheng,Randy S. Longman,Fraydoon Rastinejad,A. Sloan Devlin,Michael R. Krout,Michael A. Fischbach,Dan R. Littman,Jun R. Huh
TL;DR: Two derivatives of lithocholic acid are revealed that act as regulators of T helper cells that express IL-17a and regulatory T cells, thus influencing host immune responses.
Evaluation of the taxonomic and functional diversity of the microbial community related to the nitrogen cycle in rice-crop soils under different straw management practices
Jibda del Pilar Carreño Carreño
TL;DR: The impact of rice-residue burning on the soil microorganisms involved in nutrient availability and cycling is poorly understood, which is why returning plant residues to the soil has been proposed as an efficient alternative for managing post-harvest residues.
Journal Article
Long-term benefit of Microbiota Transfer Therapy on autism symptoms and gut microbiota
Dae Wook Kang,James B. Adams,Devon M. Coleman,Elena L. Pollard,Juan Maldonado,Sharon McDonough-Means,J. Gregory Caporaso,Rosa Krajmalnik-Brown
TL;DR: The observations demonstrate the long-term safety and efficacy of MTT as a potential therapy to treat children with ASD who have GI problems, and warrant a double-blind, placebo-controlled trial in the future.
References
Journal Article
Basic Local Alignment Search Tool
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.
Journal Article
Scikit-learn: Machine Learning in Python
Fabian Pedregosa,Gaël Varoquaux,Alexandre Gramfort,Vincent Michel,Bertrand Thirion,Olivier Grisel,Mathieu Blondel,Peter Prettenhofer,Ron Weiss,Vincent Dubourg,Jake Vanderplas,Alexandre Passos,David Cournapeau,Matthieu Brucher,Matthieu Perrot,Edouard Duchesnay
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Journal Article
QIIME allows analysis of high-throughput community sequencing data.
J. Gregory Caporaso,Justin Kuczynski,Jesse Stombaugh,Kyle Bittinger,Frederic D. Bushman,Elizabeth K. Costello,Noah Fierer,Antonio Gonzalez Peña,Julia K. Goodrich,Jeffrey I. Gordon,Gavin A. Huttley,Scott T. Kelley,Dan Knights,Jeremy E. Koenig,Ruth E. Ley,Catherine A. Lozupone,Daniel McDonald,Brian D. Muegge,Meg Pirrung,Jens Reeder,Joel Sevinsky,Peter J. Turnbaugh,William A. Walters,Jeremy Widmann,Tanya Yatsunenko,Jesse R. Zaneveld,Rob Knight
TL;DR: An overview of the analysis pipeline and links to raw data and processed output from the runs with and without denoising are provided.
Posted Content
Scikit-learn: Machine Learning in Python
Fabian Pedregosa,Gaël Varoquaux,Alexandre Gramfort,Vincent Michel,Bertrand Thirion,Olivier Grisel,Mathieu Blondel,Andreas Müller,Joel Nothman,Gilles Louppe,Peter Prettenhofer,Ron Weiss,Vincent Dubourg,Jake Vanderplas,Alexandre Passos,David Cournapeau,Matthieu Brucher,Matthieu Perrot,Edouard Duchesnay
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.
Journal Article
Search and clustering orders of magnitude faster than BLAST
TL;DR: UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters and offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets.