Author
Arron Shiffer
Bio: Arron Shiffer is an academic researcher from Northern Arizona University. The author has contributed to research in topics: Microbiome & Raw data. The author has an hindex of 7, co-authored 12 publications receiving 4372 citations.
Topics: Microbiome, Raw data, UniFrac, Taxonomic rank, Phylogenetic tree
Papers
More filters
••
Northern Arizona University1, National Institutes of Health2, University of Minnesota3, University of California, Davis4, Woods Hole Oceanographic Institution5, Massachusetts Institute of Technology6, University of Copenhagen7, University of Trento8, Chinese Academy of Sciences9, University of California, San Francisco10, University of Pennsylvania11, Pacific Northwest National Laboratory12, North Carolina State University13, University of California, San Diego14, Institute for Systems Biology15, Dalhousie University16, University of British Columbia17, Statens Serum Institut18, Anschutz Medical Campus19, University of Washington20, Michigan State University21, Stanford University22, Broad Institute23, Harvard University24, Australian National University25, University of Düsseldorf26, University of New South Wales27, Sookmyung Women's University28, San Diego State University29, Howard Hughes Medical Institute30, Cornell University31, Max Planck Society32, Colorado State University33, Google34, Syracuse University35, Webster University36, United States Department of Agriculture37, University of Arkansas for Medical Sciences38, Colorado School of Mines39, University of Southern Mississippi40, National Oceanic and Atmospheric Administration41, University of California, Merced42, Wageningen University and Research Centre43, University of Arizona44, Environment Agency45, University of Florida46, Merck & Co.47
TL;DR: QIIME 2 development was primarily funded by NSF Awards 1565100 to J.G.C. and R.K.P. and partial support was also provided by the following: grants NIH U54CA143925 and U54MD012388.
Abstract: QIIME 2 development was primarily funded by NSF Awards 1565100 to J.G.C. and 1565057 to R.K. Partial support was also provided by the following: grants NIH U54CA143925 (J.G.C. and T.P.) and U54MD012388 (J.G.C. and T.P.); grants from the Alfred P. Sloan Foundation (J.G.C. and R.K.); ERCSTG project MetaPG (N.S.); the Strategic Priority Research Program of the Chinese Academy of Sciences QYZDB-SSW-SMC021 (Y.B.); the Australian National Health and Medical Research Council APP1085372 (G.A.H., J.G.C., Von Bing Yap and R.K.); the Natural Sciences and Engineering Research Council (NSERC) to D.L.G.; and the State of Arizona Technology and Research Initiative Fund (TRIF), administered by the Arizona Board of Regents, through Northern Arizona University. All NCI coauthors were supported by the Intramural Research Program of the National Cancer Institute. S.M.G. and C. Diener were supported by the Washington Research Foundation Distinguished Investigator Award.
8,821 citations
••
Northern Arizona University1, University of Minnesota2, University of California, Davis3, Woods Hole Oceanographic Institution4, Massachusetts Institute of Technology5, University of Copenhagen6, University of Trento7, Chinese Academy of Sciences8, University of California, San Francisco9, Children's Hospital of Philadelphia10, Pacific Northwest National Laboratory11, North Carolina State University12, University of Montana13, Dalhousie University14, University of British Columbia15, Shedd Aquarium16, University of Colorado Denver17, University of California, San Diego18, Michigan State University19, Stanford University20, Broad Institute21, Harvard University22, Australian National University23, University of Düsseldorf24, Sookmyung Women's University25, San Diego State University26, Howard Hughes Medical Institute27, Cornell University28, Max Planck Society29, University of Washington30, Colorado State University31, Google32, Syracuse University33, Webster University34, United States Department of Agriculture35, University of Arkansas for Medical Sciences36, Colorado School of Mines37, University of Southern Mississippi38, Atlantic Oceanographic and Meteorological Laboratory39, University of California, Merced40, Wageningen University and Research Centre41, University of Arizona42, Environment Agency43, University of Florida44, Merck & Co.45
TL;DR: QIIME 2 provides new features that will drive the next generation of microbiome research, including interactive spatial and temporal analysis and visualization tools, support for metabolomics and shotgun metagenomics analysis, and automated data provenance tracking to ensure reproducible, transparent microbiome data science.
Abstract: We present QIIME 2, an open-source microbiome data science platform accessible to users spanning the microbiome research ecosystem, from scientists and engineers to clinicians and policy makers. QIIME 2 provides new features that will drive the next generation of microbiome research. These include interactive spatial and temporal analysis and visualization tools, support for metabolomics and shotgun metagenomics analysis, and automated data provenance tracking to ensure reproducible, transparent microbiome data science.
875 citations
••
Northern Arizona University1, National Institutes of Health2, University of Minnesota3, University of California, Davis4, Woods Hole Oceanographic Institution5, Massachusetts Institute of Technology6, University of Copenhagen7, University of Trento8, Chinese Academy of Sciences9, University of California, San Francisco10, University of Pennsylvania11, Pacific Northwest National Laboratory12, North Carolina State University13, University of Montana14, Institute for Systems Biology15, Dalhousie University16, University of British Columbia17, Statens Serum Institut18, Anschutz Medical Campus19, University of Washington20, University of California, San Diego21, Michigan State University22, Stanford University23, Harvard University24, Broad Institute25, Australian National University26, University of Düsseldorf27, University of New South Wales28, Sookmyung Women's University29, San Diego State University30, Howard Hughes Medical Institute31, Cornell University32, Max Planck Society33, Colorado State University34, Google35, Syracuse University36, Webster University37, United States Department of Agriculture38, University of Arkansas for Medical Sciences39, Colorado School of Mines40, National Oceanic and Atmospheric Administration41, University of Southern Mississippi42, University of California, Merced43, Wageningen University and Research Centre44, University of Arizona45, Environment Agency46, University of Florida47, Merck & Co.48
TL;DR: An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Abstract: In the version of this article initially published, some reference citations were incorrect. The three references to Jupyter Notebooks should have cited Kluyver et al. instead of Gonzalez et al. The reference to Qiita should have cited Gonzalez et al. instead of Schloss et al. The reference to mothur should have cited Schloss et al. instead of McMurdie & Holmes. The reference to phyloseq should have cited McMurdie & Holmes instead of Huber et al. The reference to Bioconductor should have cited Huber et al. instead of Franzosa et al. And the reference to the biobakery suite should have cited Franzosa et al. instead of Kluyver et al. The errors have been corrected in the HTML and PDF versions of the article.
301 citations
••
25 Oct 2016
TL;DR: This work presents mockrobiota, a public resource for sharing, validating, and documenting mock community data resources, and outlines its intended expansion and evolve to meet the changing needs of the omics community.
Abstract: Mock communities are an important tool for validating, optimizing, and comparing bioinformatics methods for microbial community analysis. We present mockrobiota, a public resource for sharing, validating, and documenting mock community data resources, available at http://caporaso-lab.github.io/mockrobiota/. The materials contained in mockrobiota include data set and sample metadata, expected composition data (taxonomy or gene annotations or reference sequences for mock community members), and links to raw data (e.g., raw sequence data) for each mock community data set. mockrobiota does not supply physical sample materials directly, but the data set metadata included for each mock community indicate whether physical sample materials are available. At the time of this writing, mockrobiota contains 11 mock community data sets with known species compositions, including bacterial, archaeal, and eukaryotic mock communities, analyzed by high-throughput marker gene sequencing. IMPORTANCE The availability of standard and public mock community data will facilitate ongoing method optimizations, comparisons across studies that share source data, and greater transparency and access and eliminate redundancy. These are also valuable resources for bioinformatics teaching and training. This dynamic resource is intended to expand and evolve to meet the changing needs of the omics community.
83 citations
••
TL;DR: The Silva/UNITE-based ghost tree presented here can be easily integrated into existing fungal analysis pipelines to enhance the resolution of fungal community differences and improve understanding of these communities in built environments.
Abstract: Fungi play critical roles in many ecosystems, cause serious diseases in plants and animals, and pose significant threats to human health and structural integrity problems in built environments. While most fungal diversity remains unknown, the development of PCR primers for the internal transcribed spacer (ITS) combined with next-generation sequencing has substantially improved our ability to profile fungal microbial diversity. Although the high sequence variability in the ITS region facilitates more accurate species identification, it also makes multiple sequence alignment and phylogenetic analysis unreliable across evolutionarily distant fungi because the sequences are hard to align accurately. To address this issue, we created ghost-tree, a bioinformatics tool that integrates sequence data from two genetic markers into a single phylogenetic tree that can be used for diversity analyses. Our approach starts with a “foundation” phylogeny based on one genetic marker whose sequences can be aligned across organisms spanning divergent taxonomic groups (e.g., fungal families). Then, “extension” phylogenies are built for more closely related organisms (e.g., fungal species or strains) using a second more rapidly evolving genetic marker. These smaller phylogenies are then grafted onto the foundation tree by mapping taxonomic names such that each corresponding foundation-tree tip would branch into its new “extension tree” child. We applied ghost-tree to graft fungal extension phylogenies derived from ITS sequences onto a foundation phylogeny derived from fungal 18S sequences. Our analysis of simulated and real fungal ITS data sets found that phylogenetic distances between fungal communities computed using ghost-tree phylogenies explained significantly more variance than non-phylogenetic distances. The phylogenetic metrics also improved our ability to distinguish small differences (effect sizes) between microbial communities, though results were similar to non-phylogenetic methods for larger effect sizes. The Silva/UNITE-based ghost tree presented here can be easily integrated into existing fungal analysis pipelines to enhance the resolution of fungal community differences and improve understanding of these communities in built environments. The ghost-tree software package can also be used to develop phylogenetic trees for other marker gene sets that afford different taxonomic resolution, or for bridging genome trees with amplicon trees. ghost-tree is pip-installable. All source code, documentation, and test code are available under the BSD license at https://github.com/JTFouquier/ghost-tree
.
52 citations
Cited by
More filters
••
TL;DR: Some notable features of IQ-TREE version 2 are described and the key advantages over other software are highlighted.
Abstract: IQ-TREE (http://www.iqtree.org, last accessed February 6, 2020) is a user-friendly and widely used software package for phylogenetic inference using maximum likelihood. Since the release of version 1 in 2014, we have continuously expanded IQ-TREE to integrate a plethora of new models of sequence evolution and efficient computational approaches of phylogenetic inference to deal with genomic data. Here, we describe notable features of IQ-TREE version 2 and highlight the key advantages over other software.
4,337 citations
••
TL;DR: The results illustrate the importance of parameter tuning for optimizing classifier performance, and the recommendations regarding parameter choices for these classifiers under a range of standard operating conditions are made.
Abstract: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. We present q2-feature-classifier (
https://github.com/qiime2/q2-feature-classifier
), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated “novel” marker-gene sequences, are available in our extensible benchmarking framework, tax-credit (
https://github.com/caporaso-lab/tax-credit-data
). Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.
2,475 citations
•
TL;DR: FastTree as mentioned in this paper uses sequence profiles of internal nodes in the tree to implement neighbor-joining and uses heuristics to quickly identify candidate joins, then uses nearest-neighbor interchanges to reduce the length of the tree.
Abstract: Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest-neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different characters, a distance matrix requires O(N^2) space and O(N^2 L) time, but FastTree requires just O( NLa + N sqrt(N) ) memory and O( N sqrt(N) log(N) L a ) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 hours and 2.4 gigabytes of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 hours and 50 gigabytes of memory. In simulations, FastTree was slightly more accurate than neighbor joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.
2,436 citations
25 Apr 2017
TL;DR: This presentation is a case study taken from the travel and holiday industry and describes the effectiveness of various techniques as well as the performance of Python-based libraries such as Python Data Analysis Library (Pandas), and Scikit-learn (built on NumPy, SciPy and matplotlib).
Abstract: This presentation is a case study taken from the travel and holiday industry. Paxport/Multicom, based in UK and Sweden, have recently adopted a recommendation system for holiday accommodation bookings. Machine learning techniques such as Collaborative Filtering have been applied using Python (3.5.1), with Jupyter (4.0.6) as the main framework. Data scale and sparsity present significant challenges in the case study, and so the effectiveness of various techniques are described as well as the performance of Python-based libraries such as Python Data Analysis Library (Pandas), and Scikit-learn (built on NumPy, SciPy and matplotlib). The presentation is suitable for all levels of programmers.
1,338 citations