Author

Arron Shiffer

Bio: Arron Shiffer is an academic researcher from Northern Arizona University. The author has contributed to research in topics: Microbiome & Raw data. The author has an h-index of 7 and has co-authored 12 publications receiving 4372 citations.

Papers
Journal ArticleDOI
Evan Bolyen1, Jai Ram Rideout1, Matthew R. Dillon1, Nicholas A. Bokulich1, Christian C. Abnet2, Gabriel A. Al-Ghalith3, Harriet Alexander4, Harriet Alexander5, Eric J. Alm6, Manimozhiyan Arumugam7, Francesco Asnicar8, Yang Bai9, Jordan E. Bisanz10, Kyle Bittinger11, Asker Daniel Brejnrod7, Colin J. Brislawn12, C. Titus Brown4, Benjamin J. Callahan13, Andrés Mauricio Caraballo-Rodríguez14, John Chase1, Emily K. Cope1, Ricardo Silva14, Christian Diener15, Pieter C. Dorrestein14, Gavin M. Douglas16, Daniel M. Durall17, Claire Duvallet6, Christian F. Edwardson, Madeleine Ernst18, Madeleine Ernst14, Mehrbod Estaki17, Jennifer Fouquier19, Julia M. Gauglitz14, Sean M. Gibbons15, Sean M. Gibbons20, Deanna L. Gibson17, Antonio Gonzalez14, Kestrel Gorlick1, Jiarong Guo21, Benjamin Hillmann3, Susan Holmes22, Hannes Holste14, Curtis Huttenhower23, Curtis Huttenhower24, Gavin A. Huttley25, Stefan Janssen26, Alan K. Jarmusch14, Lingjing Jiang14, Benjamin D. Kaehler27, Benjamin D. Kaehler25, Kyo Bin Kang28, Kyo Bin Kang14, Christopher R. Keefe1, Paul Keim1, Scott T. Kelley29, Dan Knights3, Irina Koester14, Tomasz Kosciolek14, Jorden Kreps1, Morgan G. I. Langille16, Joslynn S. Lee30, Ruth E. Ley31, Ruth E. Ley32, Yong-Xin Liu, Erikka Loftfield2, Catherine A. Lozupone19, Massoud Maher14, Clarisse Marotz14, Bryan D Martin20, Daniel McDonald14, Lauren J. McIver23, Lauren J. McIver24, Alexey V. Melnik14, Jessica L. Metcalf33, Sydney C. Morgan17, Jamie Morton14, Ahmad Turan Naimey1, Jose A. Navas-Molina14, Jose A. Navas-Molina34, Louis-Félix Nothias14, Stephanie B. Orchanian, Talima Pearson1, Samuel L. Peoples20, Samuel L. Peoples35, Daniel Petras14, Mary L. Preuss36, Elmar Pruesse19, Lasse Buur Rasmussen7, Adam R. Rivers37, Michael S. Robeson38, Patrick Rosenthal36, Nicola Segata8, Michael Shaffer19, Arron Shiffer1, Rashmi Sinha2, Se Jin Song14, John R. Spear39, Austin D. Swafford, Luke R. Thompson40, Luke R. Thompson41, Pedro J. Torres29, Pauline Trinh20, Anupriya Tripathi14, Peter J. Turnbaugh10, Sabah Ul-Hasan42, Justin J. J. van der Hooft43, Fernando Vargas, Yoshiki Vázquez-Baeza14, Emily Vogtmann2, Max von Hippel44, William A. Walters32, Yunhu Wan2, Mingxun Wang14, Jonathan Warren45, Kyle C. Weber37, Kyle C. Weber46, Charles H. D. Williamson1, Amy D. Willis20, Zhenjiang Zech Xu14, Jesse R. Zaneveld20, Yilong Zhang47, Qiyun Zhu14, Rob Knight14, J. Gregory Caporaso1
TL;DR: QIIME 2 development was primarily funded by NSF Awards 1565100 to J.G.C. and 1565057 to R.K.; partial support was also provided by grants NIH U54CA143925 and U54MD012388, among others.
Abstract: QIIME 2 development was primarily funded by NSF Awards 1565100 to J.G.C. and 1565057 to R.K. Partial support was also provided by the following: grants NIH U54CA143925 (J.G.C. and T.P.) and U54MD012388 (J.G.C. and T.P.); grants from the Alfred P. Sloan Foundation (J.G.C. and R.K.); ERCSTG project MetaPG (N.S.); the Strategic Priority Research Program of the Chinese Academy of Sciences QYZDB-SSW-SMC021 (Y.B.); the Australian National Health and Medical Research Council APP1085372 (G.A.H., J.G.C., Von Bing Yap and R.K.); the Natural Sciences and Engineering Research Council (NSERC) to D.L.G.; and the State of Arizona Technology and Research Initiative Fund (TRIF), administered by the Arizona Board of Regents, through Northern Arizona University. All NCI coauthors were supported by the Intramural Research Program of the National Cancer Institute. S.M.G. and C. Diener were supported by the Washington Research Foundation Distinguished Investigator Award.

8,821 citations

Posted ContentDOI
Evan Bolyen1, Jai Ram Rideout1, Matthew R. Dillon1, Nicholas A. Bokulich1, Christian C. Abnet, Gabriel A. Al-Ghalith2, Harriet Alexander3, Harriet Alexander4, Eric J. Alm5, Manimozhiyan Arumugam6, Francesco Asnicar7, Yang Bai8, Jordan E. Bisanz9, Kyle Bittinger10, Asker Daniel Brejnrod6, Colin J. Brislawn11, C. Titus Brown3, Benjamin J. Callahan12, Andrés Mauricio Caraballo-Rodríguez13, John Chase1, Emily K. Cope1, Ricardo Silva13, Pieter C. Dorrestein13, Gavin M. Douglas14, Daniel M. Durall15, Claire Duvallet5, Christian F. Edwardson16, Madeleine Ernst13, Mehrbod Estaki15, Jennifer Fouquier17, Julia M. Gauglitz13, Deanna L. Gibson15, Antonio Gonzalez18, Kestrel Gorlick1, Jiarong Guo19, Benjamin Hillmann2, Susan Holmes20, Hannes Holste18, Curtis Huttenhower21, Curtis Huttenhower22, Gavin A. Huttley23, Stefan Janssen24, Alan K. Jarmusch13, Lingjing Jiang18, Benjamin D. Kaehler23, Kyo Bin Kang25, Kyo Bin Kang13, Christopher R. Keefe1, Paul Keim1, Scott T. Kelley26, Dan Knights2, Irina Koester18, Irina Koester13, Tomasz Kosciolek18, Jorden Kreps1, Morgan G. I. Langille14, Joslynn S. Lee27, Ruth E. Ley28, Ruth E. Ley29, Yong-Xin Liu8, Erikka Loftfield, Catherine A. Lozupone17, Massoud Maher18, Clarisse Marotz18, Bryan D Martin30, Daniel McDonald18, Lauren J. McIver22, Lauren J. McIver21, Alexey V. Melnik13, Jessica L. Metcalf31, Sydney C. Morgan15, Jamie Morton18, Ahmad Turan Naimey1, Jose A. Navas-Molina18, Jose A. Navas-Molina32, Louis-Félix Nothias13, Stephanie B. Orchanian18, Talima Pearson1, Samuel L. Peoples33, Samuel L. Peoples30, Daniel Petras13, Mary L. Preuss34, Elmar Pruesse17, Lasse Buur Rasmussen6, Adam R. Rivers35, Ii Michael S Robeson36, Patrick Rosenthal34, Nicola Segata7, Michael Shaffer17, Arron Shiffer1, Rashmi Sinha, Se Jin Song18, John R. Spear37, Austin D. Swafford18, Luke R. Thompson38, Luke R. Thompson39, Pedro J. Torres26, Pauline Trinh30, Anupriya Tripathi18, Anupriya Tripathi13, Peter J. Turnbaugh9, Sabah Ul-Hasan40, Justin J. J. van der Hooft41, Fernando Vargas18, Yoshiki Vázquez-Baeza18, Emily Vogtmann, Max von Hippel42, William A. Walters29, Yunhu Wan, Mingxun Wang13, Jonathan Warren43, Kyle C. Weber35, Kyle C. Weber44, Chase Hd Williamson1, Amy D. Willis30, Zhenjiang Zech Xu18, Jesse R. Zaneveld30, Yilong Zhang45, Rob Knight18, J. Gregory Caporaso1
24 Oct 2018 - PeerJ
TL;DR: QIIME 2 provides new features that will drive the next generation of microbiome research, including interactive spatial and temporal analysis and visualization tools, support for metabolomics and shotgun metagenomics analysis, and automated data provenance tracking to ensure reproducible, transparent microbiome data science.
Abstract: We present QIIME 2, an open-source microbiome data science platform accessible to users spanning the microbiome research ecosystem, from scientists and engineers to clinicians and policy makers. QIIME 2 provides new features that will drive the next generation of microbiome research. These include interactive spatial and temporal analysis and visualization tools, support for metabolomics and shotgun metagenomics analysis, and automated data provenance tracking to ensure reproducible, transparent microbiome data science.

875 citations

Journal ArticleDOI
Evan Bolyen1, Jai Ram Rideout1, Matthew R. Dillon1, Nicholas A. Bokulich1, Christian C. Abnet2, Gabriel A. Al-Ghalith3, Harriet Alexander4, Harriet Alexander5, Eric J. Alm6, Manimozhiyan Arumugam7, Francesco Asnicar8, Yang Bai9, Jordan E. Bisanz10, Kyle Bittinger11, Asker Daniel Brejnrod7, Colin J. Brislawn12, C. Titus Brown4, Benjamin J. Callahan13, Andrés Mauricio Caraballo-Rodríguez14, John Chase1, Emily K. Cope1, Ricardo Silva14, Christian Diener15, Pieter C. Dorrestein14, Gavin M. Douglas16, Daniel M. Durall17, Claire Duvallet6, Christian F. Edwardson, Madeleine Ernst18, Madeleine Ernst14, Mehrbod Estaki17, Jennifer Fouquier19, Julia M. Gauglitz14, Sean M. Gibbons15, Sean M. Gibbons20, Deanna L. Gibson17, Antonio Gonzalez21, Kestrel Gorlick1, Jiarong Guo22, Benjamin Hillmann3, Susan Holmes23, Hannes Holste21, Curtis Huttenhower24, Curtis Huttenhower25, Gavin A. Huttley26, Stefan Janssen27, Alan K. Jarmusch14, Lingjing Jiang21, Benjamin D. Kaehler26, Benjamin D. Kaehler28, Kyo Bin Kang14, Kyo Bin Kang29, Christopher R. Keefe1, Paul Keim1, Scott T. Kelley30, Dan Knights3, Irina Koester14, Irina Koester21, Tomasz Kosciolek21, Jorden Kreps1, Morgan G. I. Langille16, Joslynn S. Lee31, Ruth E. Ley32, Ruth E. Ley33, Yong-Xin Liu, Erikka Loftfield2, Catherine A. Lozupone19, Massoud Maher21, Clarisse Marotz21, Bryan D Martin20, Daniel McDonald21, Lauren J. McIver24, Lauren J. McIver25, Alexey V. Melnik14, Jessica L. Metcalf34, Sydney C. Morgan17, Jamie Morton21, Ahmad Turan Naimey1, Jose A. Navas-Molina35, Jose A. Navas-Molina21, Louis-Félix Nothias14, Stephanie B. Orchanian, Talima Pearson1, Samuel L. Peoples36, Samuel L. Peoples20, Daniel Petras14, Mary L. Preuss37, Elmar Pruesse19, Lasse Buur Rasmussen7, Adam R. Rivers38, Michael S. Robeson39, Patrick Rosenthal37, Nicola Segata8, Michael Shaffer19, Arron Shiffer1, Rashmi Sinha2, Se Jin Song21, John R. Spear40, Austin D. Swafford, Luke R. Thompson41, Luke R. Thompson42, Pedro J. Torres30, Pauline Trinh20, Anupriya Tripathi21, Anupriya Tripathi14, Peter J. Turnbaugh10, Sabah Ul-Hasan43, Justin J. J. van der Hooft44, Fernando Vargas, Yoshiki Vázquez-Baeza21, Emily Vogtmann2, Max von Hippel45, William A. Walters33, Yunhu Wan2, Mingxun Wang14, Jonathan Warren46, Kyle C. Weber47, Kyle C. Weber38, Charles H. D. Williamson1, Amy D. Willis20, Zhenjiang Zech Xu21, Jesse R. Zaneveld20, Yilong Zhang48, Qiyun Zhu21, Rob Knight21, J. Gregory Caporaso1
TL;DR: An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Abstract: In the version of this article initially published, some reference citations were incorrect. The three references to Jupyter Notebooks should have cited Kluyver et al. instead of Gonzalez et al. The reference to Qiita should have cited Gonzalez et al. instead of Schloss et al. The reference to mothur should have cited Schloss et al. instead of McMurdie & Holmes. The reference to phyloseq should have cited McMurdie & Holmes instead of Huber et al. The reference to Bioconductor should have cited Huber et al. instead of Franzosa et al. And the reference to the biobakery suite should have cited Franzosa et al. instead of Kluyver et al. The errors have been corrected in the HTML and PDF versions of the article.

301 citations

Journal ArticleDOI
25 Oct 2016
TL;DR: This work presents mockrobiota, a public resource for sharing, validating, and documenting mock community data resources, and notes that the resource is intended to expand and evolve to meet the changing needs of the omics community.
Abstract: Mock communities are an important tool for validating, optimizing, and comparing bioinformatics methods for microbial community analysis. We present mockrobiota, a public resource for sharing, validating, and documenting mock community data resources, available at http://caporaso-lab.github.io/mockrobiota/. The materials contained in mockrobiota include data set and sample metadata, expected composition data (taxonomy or gene annotations or reference sequences for mock community members), and links to raw data (e.g., raw sequence data) for each mock community data set. mockrobiota does not supply physical sample materials directly, but the data set metadata included for each mock community indicate whether physical sample materials are available. At the time of this writing, mockrobiota contains 11 mock community data sets with known species compositions, including bacterial, archaeal, and eukaryotic mock communities, analyzed by high-throughput marker gene sequencing. IMPORTANCE The availability of standard and public mock community data will facilitate ongoing method optimizations, comparisons across studies that share source data, and greater transparency and access and eliminate redundancy. These are also valuable resources for bioinformatics teaching and training. This dynamic resource is intended to expand and evolve to meet the changing needs of the omics community.

83 citations

Journal ArticleDOI
TL;DR: The Silva/UNITE-based ghost tree presented here can be easily integrated into existing fungal analysis pipelines to enhance the resolution of fungal community differences and improve understanding of these communities in built environments.
Abstract: Fungi play critical roles in many ecosystems, cause serious diseases in plants and animals, and pose significant threats to human health and structural integrity problems in built environments. While most fungal diversity remains unknown, the development of PCR primers for the internal transcribed spacer (ITS) combined with next-generation sequencing has substantially improved our ability to profile fungal microbial diversity. Although the high sequence variability in the ITS region facilitates more accurate species identification, it also makes multiple sequence alignment and phylogenetic analysis unreliable across evolutionarily distant fungi because the sequences are hard to align accurately. To address this issue, we created ghost-tree, a bioinformatics tool that integrates sequence data from two genetic markers into a single phylogenetic tree that can be used for diversity analyses. Our approach starts with a “foundation” phylogeny based on one genetic marker whose sequences can be aligned across organisms spanning divergent taxonomic groups (e.g., fungal families). Then, “extension” phylogenies are built for more closely related organisms (e.g., fungal species or strains) using a second more rapidly evolving genetic marker. These smaller phylogenies are then grafted onto the foundation tree by mapping taxonomic names such that each corresponding foundation-tree tip would branch into its new “extension tree” child. We applied ghost-tree to graft fungal extension phylogenies derived from ITS sequences onto a foundation phylogeny derived from fungal 18S sequences. Our analysis of simulated and real fungal ITS data sets found that phylogenetic distances between fungal communities computed using ghost-tree phylogenies explained significantly more variance than non-phylogenetic distances. 
The phylogenetic metrics also improved our ability to distinguish small differences (effect sizes) between microbial communities, though results were similar to non-phylogenetic methods for larger effect sizes. The Silva/UNITE-based ghost tree presented here can be easily integrated into existing fungal analysis pipelines to enhance the resolution of fungal community differences and improve understanding of these communities in built environments. The ghost-tree software package can also be used to develop phylogenetic trees for other marker gene sets that afford different taxonomic resolution, or for bridging genome trees with amplicon trees. ghost-tree is pip-installable. All source code, documentation, and test code are available under the BSD license at https://github.com/JTFouquier/ghost-tree .
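The grafting step described in the ghost-tree abstract can be sketched in a few lines. This is a minimal illustration only, not the ghost-tree implementation: trees are represented as nested tuples (an assumption made here for brevity), the taxon names are hypothetical, branch lengths are ignored, and each matching foundation tip is simply replaced by its extension subtree rather than kept as a named parent node.

```python
# Minimal sketch of grafting "extension" trees onto a "foundation" tree.
# Assumption: a tree is a nested tuple and a tip is a string. This is an
# illustrative representation, not ghost-tree's own data structure.

def graft(foundation, extensions):
    """Return a copy of `foundation` in which every tip whose name is a key
    of `extensions` is replaced by the corresponding extension subtree."""
    if isinstance(foundation, str):
        # Tip: swap in the extension subtree mapped to this taxon name,
        # or keep the tip unchanged when no extension exists for it.
        return extensions.get(foundation, foundation)
    # Internal node: graft recursively into each child.
    return tuple(graft(child, extensions) for child in foundation)


def to_newick(tree):
    """Serialize a nested-tuple tree to a Newick string (no branch lengths)."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


# Hypothetical foundation tree (slowly evolving marker, e.g. 18S).
foundation = (("Aspergillus", "Penicillium"), "Candida")

# Hypothetical extension trees (rapidly evolving marker, e.g. ITS),
# keyed by the taxonomic name of the foundation tip they attach to.
extensions = {
    "Aspergillus": ("A_niger", ("A_flavus", "A_oryzae")),
    "Candida": ("C_albicans", "C_glabrata"),
}

ghost = graft(foundation, extensions)
print(to_newick(ghost) + ";")
```

Foundation tips without an extension tree (here the hypothetical "Penicillium") are left in place, so the result is still a single connected tree.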

52 citations


Cited by
Journal ArticleDOI
TL;DR: Some notable features of IQ-TREE version 2 are described and the key advantages over other software are highlighted.
Abstract: IQ-TREE (http://www.iqtree.org, last accessed February 6, 2020) is a user-friendly and widely used software package for phylogenetic inference using maximum likelihood. Since the release of version 1 in 2014, we have continuously expanded IQ-TREE to integrate a plethora of new models of sequence evolution and efficient computational approaches of phylogenetic inference to deal with genomic data. Here, we describe notable features of IQ-TREE version 2 and highlight the key advantages over other software.

4,337 citations

Journal ArticleDOI

3,734 citations

Journal ArticleDOI
TL;DR: The results illustrate the importance of parameter tuning for optimizing classifier performance, and the authors recommend parameter choices for these classifiers under a range of standard operating conditions.
Abstract: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. We present q2-feature-classifier ( https://github.com/qiime2/q2-feature-classifier ), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated “novel” marker-gene sequences, are available in our extensible benchmarking framework, tax-credit ( https://github.com/caporaso-lab/tax-credit-data ). Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.

2,475 citations

Journal Article
TL;DR: FastTree uses sequence profiles of internal nodes in the tree to implement neighbor-joining, uses heuristics to quickly identify candidate joins, and then uses nearest-neighbor interchanges to reduce the length of the tree.
Abstract: Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest-neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and an alphabet of size a (the number of different characters), a distance matrix requires O(N^2) space and O(N^2 L) time, but FastTree requires just O(NLa + N sqrt(N)) memory and O(N sqrt(N) log(N) L a) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 hours and 2.4 gigabytes of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 hours and 50 gigabytes of memory. In simulations, FastTree was slightly more accurate than neighbor joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.
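To make the space bounds quoted in the FastTree abstract concrete, the sketch below evaluates the two growth terms, N^2 for a distance matrix and NLa + N sqrt(N) for FastTree's profiles, at the abstract's N = 158,022. The alignment length L = 1,000 and alphabet size a = 4 are assumptions chosen for illustration, not values from the abstract, and because big-O terms omit constant factors the result shows relative growth only, not bytes of memory.

```python
import math


def distance_matrix_space(n):
    """Growth term for storing all pairwise distances: N^2."""
    return n ** 2


def fasttree_space(n, l, a):
    """Growth term quoted for FastTree's memory use: N*L*a + N*sqrt(N)."""
    return n * l * a + n * math.sqrt(n)


n = 158_022  # number of sequences, from the abstract's 16S example
l = 1_000    # alignment length in sites (assumed for illustration)
a = 4        # alphabet size (assumed: nucleotides)

matrix_term = distance_matrix_space(n)
profile_term = fasttree_space(n, l, a)

print(f"distance matrix term: {matrix_term:.3e}")
print(f"FastTree term:        {profile_term:.3e}")
print(f"ratio:                {matrix_term / profile_term:.1f}")
```

Because the N^2 term grows quadratically while the FastTree term grows roughly linearly in N for fixed L and a, the gap widens as more sequences are added.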

2,436 citations

25 Apr 2017
TL;DR: This presentation is a case study taken from the travel and holiday industry and describes the effectiveness of various techniques as well as the performance of Python-based libraries such as Python Data Analysis Library (Pandas), and Scikit-learn (built on NumPy, SciPy and matplotlib).
Abstract: This presentation is a case study taken from the travel and holiday industry. Paxport/Multicom, based in the UK and Sweden, have recently adopted a recommendation system for holiday accommodation bookings. Machine learning techniques such as Collaborative Filtering have been applied using Python (3.5.1), with Jupyter (4.0.6) as the main framework. Data scale and sparsity present significant challenges in the case study, and so the effectiveness of various techniques is described, as well as the performance of Python-based libraries such as Python Data Analysis Library (Pandas) and Scikit-learn (built on NumPy, SciPy and matplotlib). The presentation is suitable for all levels of programmers.

1,338 citations