Author
Catherine A. Lozupone
Other affiliations: University of Pennsylvania, Colorado State University, Washington University in St. Louis ...read more
Bio: Catherine A. Lozupone is an academic researcher from Anschutz Medical Campus. The author has contributed to research in topics: Microbiome & Gut flora. The author has an hindex of 61, co-authored 129 publications receiving 91999 citations. Previous affiliations of Catherine A. Lozupone include University of Pennsylvania & Colorado State University.
Topics: Microbiome, Gut flora, Human microbiome, UniFrac, Population
Papers published on a yearly basis
Papers
More filters
••
TL;DR: An overview of the analysis pipeline and links to raw data and processed output from the runs with and without denoising are provided.
Abstract: Supplementary Figure 1 Overview of the analysis pipeline. Supplementary Table 1 Details of conventionally raised and conventionalized mouse samples. Supplementary Discussion Expanded discussion of QIIME analyses presented in the main text; Sequencing of 16S rRNA gene amplicons; QIIME analysis notes; Expanded Figure 1 legend; Links to raw data and processed output from the runs with and without denoising.
28,911 citations
••
Northern Arizona University1, National Institutes of Health2, University of Minnesota3, University of California, Davis4, Woods Hole Oceanographic Institution5, Massachusetts Institute of Technology6, University of Copenhagen7, University of Trento8, Chinese Academy of Sciences9, University of California, San Francisco10, University of Pennsylvania11, Pacific Northwest National Laboratory12, North Carolina State University13, University of California, San Diego14, Institute for Systems Biology15, Dalhousie University16, University of British Columbia17, Statens Serum Institut18, Anschutz Medical Campus19, University of Washington20, Michigan State University21, Stanford University22, Harvard University23, Broad Institute24, Australian National University25, University of Düsseldorf26, University of New South Wales27, Sookmyung Women's University28, San Diego State University29, Howard Hughes Medical Institute30, Cornell University31, Max Planck Society32, Colorado State University33, Google34, Syracuse University35, Webster University36, United States Department of Agriculture37, University of Arkansas for Medical Sciences38, Colorado School of Mines39, National Oceanic and Atmospheric Administration40, University of Southern Mississippi41, University of California, Merced42, Wageningen University and Research Centre43, University of Arizona44, Environment Agency45, University of Florida46, Merck & Co.47
TL;DR: QIIME 2 development was primarily funded by NSF Awards 1565100 to J.G.C. and R.K.P. and partial support was also provided by the following: grants NIH U54CA143925 and U54MD012388.
Abstract: QIIME 2 development was primarily funded by NSF Awards 1565100 to J.G.C. and 1565057 to R.K. Partial support was also provided by the following: grants NIH U54CA143925 (J.G.C. and T.P.) and U54MD012388 (J.G.C. and T.P.); grants from the Alfred P. Sloan Foundation (J.G.C. and R.K.); ERCSTG project MetaPG (N.S.); the Strategic Priority Research Program of the Chinese Academy of Sciences QYZDB-SSW-SMC021 (Y.B.); the Australian National Health and Medical Research Council APP1085372 (G.A.H., J.G.C., Von Bing Yap and R.K.); the Natural Sciences and Engineering Research Council (NSERC) to D.L.G.; and the State of Arizona Technology and Research Initiative Fund (TRIF), administered by the Arizona Board of Regents, through Northern Arizona University. All NCI coauthors were supported by the Intramural Research Program of the National Cancer Institute. S.M.G. and C. Diener were supported by the Washington Research Foundation Distinguished Investigator Award.
8,821 citations
••
TL;DR: The Human Microbiome Project Consortium reported the first results of their analysis of microbial communities from distinct, clinically relevant body habitats in a human cohort; the insights into the microbial communities of a healthy population lay foundations for future exploration of the epidemiology, ecology and translational applications of the human microbiome as discussed by the authors.
Abstract: The Human Microbiome Project Consortium reports the first results of their analysis of microbial communities from distinct, clinically relevant body habitats in a human cohort; the insights into the microbial communities of a healthy population lay foundations for future exploration of the epidemiology, ecology and translational applications of the human microbiome.
8,410 citations
••
TL;DR: This work sequences a diverse array of 25 environmental samples and three known “mock communities” at a depth averaging 3.1 million reads per sample to demonstrate excellent consistency in taxonomic recovery and recapture diversity patterns that were previously reported on the basis of metaanalysis of many studies from the literature.
Abstract: The ongoing revolution in high-throughput sequencing continues to democratize the ability of small groups of investigators to map the microbial component of the biosphere. In particular, the coevolution of new sequencing platforms and new software tools allows data acquisition and analysis on an unprecedented scale. Here we report the next stage in this coevolutionary arms race, using the Illumina GAIIx platform to sequence a diverse array of 25 environmental samples and three known “mock communities” at a depth averaging 3.1 million reads per sample. We demonstrate excellent consistency in taxonomic recovery and recapture diversity patterns that were previously reported on the basis of metaanalysis of many studies from the literature (notably, the saline/nonsaline split in environmental samples and the split between host-associated and free-living communities). We also demonstrate that 2,000 Illumina single-end reads are sufficient to recapture the same relationships among samples that we observe with the full dataset. The results thus open up the possibility of conducting large-scale studies analyzing thousands of samples simultaneously to survey microbial communities at an unprecedented spatial and temporal resolution.
6,767 citations
••
TL;DR: The results illustrate that UniFrac provides a new way of characterizing microbial communities, using the wealth of environmental rRNA sequences, and allows quantitative insight into the factors that underlie the distribution of lineages among environments.
Abstract: We introduce here a new method for computing differences between microbial communities based on phylogenetic information. This method, UniFrac, measures the phylogenetic distance between sets of taxa in a phylogenetic tree as the fraction of the branch length of the tree that leads to descendants from either one environment or the other, but not both. UniFrac can be used to determine whether communities are significantly different, to compare many communities simultaneously using clustering and ordination techniques, and to measure the relative contributions of different factors, such as chemistry and geography, to similarities between samples. We demonstrate the utility of UniFrac by applying it to published 16S rRNA gene libraries from cultured isolates and environmental clones of bacteria in marine sediment, water, and ice. Our results reveal that (i) cultured isolates from ice, water, and sediment resemble each other and environmental clone sequences from sea ice, but not environmental clone sequences from sediment and water; (ii) the geographical location does not correlate strongly with bacterial community differences in ice and sediment from the Arctic and Antarctic; and (iii) bacterial communities differ between terrestrially impacted seawater (whether polar or temperate) and warm oligotrophic seawater, whereas those in individual seawater samples are not more similar to each other than to those in sediment or ice samples. These results illustrate that UniFrac provides a new way of characterizing microbial communities, using the wealth of environmental rRNA sequences, and allows quantitative insight into the factors that underlie the distribution of lineages among environments.
6,679 citations
Cited by
More filters
••
TL;DR: An overview of the analysis pipeline and links to raw data and processed output from the runs with and without denoising are provided.
Abstract: Supplementary Figure 1 Overview of the analysis pipeline. Supplementary Table 1 Details of conventionally raised and conventionalized mouse samples. Supplementary Discussion Expanded discussion of QIIME analyses presented in the main text; Sequencing of 16S rRNA gene amplicons; QIIME analysis notes; Expanded Figure 1 legend; Links to raw data and processed output from the runs with and without denoising.
28,911 citations
••
TL;DR: The extensively curated SILVA taxonomy and the new non-redundant SILVA datasets provide an ideal reference for high-throughput classification of data from next-generation sequencing approaches.
Abstract: SILVA (from Latin silva, forest, http://www.arb-silva.de) is a comprehensive web resource for up to date, quality-controlled databases of aligned ribosomal RNA (rRNA) gene sequences from the Bacteria, Archaea and Eukaryota domains and supplementary online services. The referred database release 111 (July 2012) contains 3 194 778 small subunit and 288 717 large subunit rRNA gene sequences. Since the initial description of the project, substantial new features have been introduced, including advanced quality control procedures, an improved rRNA gene aligner, online tools for probe and primer evaluation and optimized browsing, searching and downloading on the website. Furthermore, the extensively curated SILVA taxonomy and the new non-redundant SILVA datasets provide an ideal reference for high-throughput classification of data from next-generation sequencing approaches.
18,256 citations
••
University of Michigan1, University of Massachusetts Amherst2, University of New Mexico3, University of British Columbia4, Texas A&M University5, University of Minnesota6, University of Warwick7, Dalhousie University8, Colorado School of Mines9, University of Ljubljana10, Graz University of Technology11, Louisiana State University12
TL;DR: M mothur is used as a case study to trim, screen, and align sequences; calculate distances; assign sequences to operational taxonomic units; and describe the α and β diversity of eight marine samples previously characterized by pyrosequencing of 16S rRNA gene fragments.
Abstract: mothur aims to be a comprehensive software package that allows users to use a single piece of software to analyze community sequence data. It builds upon previous tools to provide a flexible and powerful software package for analyzing sequencing data. As a case study, we used mothur to trim, screen, and align sequences; calculate distances; assign sequences to operational taxonomic units; and describe the alpha and beta diversity of eight marine samples previously characterized by pyrosequencing of 16S rRNA gene fragments. This analysis of more than 222,000 sequences was completed in less than 2 h with a laptop computer.
17,350 citations
••
TL;DR: The RDP Classifier can rapidly and accurately classify bacterial 16S rRNA sequences into the new higher-order taxonomy proposed in Bergey's Taxonomic Outline of the Prokaryotes, and the majority of the classification errors appear to be due to anomalies in the current taxonomies.
Abstract: The Ribosomal Database Project (RDP) Classifier, a naive Bayesian classifier, can rapidly and accurately classify bacterial 16S rRNA sequences into the new higher-order taxonomy proposed in Bergey's Taxonomic Outline of the Prokaryotes (2nd ed., release 5.0, Springer-Verlag, New York, NY, 2004). It provides taxonomic assignments from domain to genus, with confidence estimates for each assignment. The majority of classifications (98%) were of high estimated confidence (≥95%) and high accuracy (98%). In addition to being tested with the corpus of 5,014 type strain sequences from Bergey's outline, the RDP Classifier was tested with a corpus of 23,095 rRNA sequences as assigned by the NCBI into their alternative higher-order taxonomy. The results from leave-one-out testing on both corpora show that the overall accuracies at all levels of confidence for near-full-length and 400-base segments were 89% or above down to the genus level, and the majority of the classification errors appear to be due to anomalies in the current taxonomies. For shorter rRNA segments, such as those that might be generated by pyrosequencing, the error rate varied greatly over the length of the 16S rRNA gene, with segments around the V2 and V4 variable regions giving the lowest error rates. The RDP Classifier is suitable both for the analysis of single rRNA sequences and for the analysis of libraries of thousands of sequences. Another related tool, RDP Library Compare, was developed to facilitate microbial-community comparison based on 16S rRNA gene sequence libraries. It combines the RDP Classifier with a statistical test to flag taxa differentially represented between samples. The RDP Classifier and RDP Library Compare are available online at http://rdp.cme.msu.edu/.
16,048 citations
••
TL;DR: UNLABELLED RAxML-VI-HPC (randomized axelerated maximum likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with maximum likelihood (ML) that has been used to compute ML trees on two of the largest alignments to date.
Abstract: Summary: RAxML-VI-HPC (randomized axelerated maximum likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with maximum likelihood (ML). Low-level technical optimizations, a modification of the search algorithm, and the use of the GTR+CAT approximation as replacement for GTR+Γ yield a program that is between 2.7 and 52 times faster than the previous version of RAxML. A large-scale performance comparison with GARLI, PHYML, IQPNNI and MrBayes on real data containing 1000 up to 6722 taxa shows that RAxML requires at least 5.6 times less main memory and yields better trees in similar times than the best competing program (GARLI) on datasets up to 2500 taxa. On datasets ≥4000 taxa it also runs 2--3 times faster than GARLI. RAxML has been parallelized with MPI to conduct parallel multiple bootstraps and inferences on distinct starting trees. The program has been used to compute ML trees on two of the largest alignments to date containing 25 057 (1463 bp) and 2182 (51 089 bp) taxa, respectively.
Availability: icwww.epfl.ch/~stamatak
Contact: Alexandros.Stamatakis@epfl.ch
Supplementary information: Supplementary data are available at Bioinformatics online.
14,847 citations