Author

Rob Knight

Bio: Rob Knight is an academic researcher at the University of California, San Diego. He has contributed to research on topics including the microbiome and gut flora. He has an h-index of 201 and has co-authored 1,061 publications receiving 253,207 citations. His previous affiliations include Anschutz Medical Campus and the University of Sydney.
Topics: Microbiome, Gut flora, Medicine, Metagenomics, Biology


Papers
Journal Article
TL;DR: EMPeror is improved to create interactive animations that connect successive samples to highlight patterns over time, and to visualize ordinations derived from comparisons of these microbiome communities.

66 citations

Journal Article
TL;DR: A machine learning approach, MSHub, is engineered to enable auto-deconvolution of gas chromatography–mass spectrometry data, and workflows are designed to enable the community to store, process, share, annotate, compare and perform molecular networking of GC–MS data within the GNPS Molecular Networking analysis platform.
Abstract: We engineered a machine learning approach, MSHub, to enable auto-deconvolution of gas chromatography-mass spectrometry (GC-MS) data. We then designed workflows to enable the community to store, process, share, annotate, compare and perform molecular networking of GC-MS data within the Global Natural Product Social (GNPS) Molecular Networking analysis platform. MSHub/GNPS performs auto-deconvolution of compound fragmentation patterns via unsupervised non-negative matrix factorization and quantifies the reproducibility of fragmentation patterns across samples.

65 citations
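
The MSHub abstract above names unsupervised non-negative matrix factorization (NMF) as the deconvolution technique. The following is a minimal sketch of generic NMF with Lee-Seung multiplicative updates, not the MSHub implementation, whose details are not given here. The toy matrix (rows as scans, columns as m/z bins) and all function names are assumptions made for illustration.

```python
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def frobenius_error(V, W, H):
    """Euclidean distance between V and its reconstruction W @ H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5

def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Factor non-negative V (m x n) into W (m x rank) and H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero
        WtV = matmul(transpose(W), V)
        WtWH = matmul(matmul(transpose(W), W), H)
        H = [[H[k][j] * WtV[k][j] / (WtWH[k][j] + eps) for j in range(n)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), using the freshly updated H
        VHt = matmul(V, transpose(H))
        WHHt = matmul(W, matmul(H, transpose(H)))
        W = [[W[i][k] * VHt[i][k] / (WHHt[i][k] + eps) for k in range(rank)]
             for i in range(m)]
    return W, H

# Hypothetical toy data: two co-eluting compounds with distinct spectra.
SPECTRA = [[10.0, 0.0, 5.0, 0.0], [0.0, 8.0, 0.0, 4.0]]       # compound x m/z bin
ELUTION = [[1.0, 0.0], [2.0, 1.0], [1.0, 2.0], [0.0, 1.0]]    # scan x compound
V = matmul(ELUTION, SPECTRA)                                   # scan x m/z bin

W, H = nmf(V, rank=2)
print("reconstruction error:", round(frobenius_error(V, W, H), 4))
```

In this reading, the rows of `H` play the role of recovered fragmentation patterns and the columns of `W` the role of elution profiles. Real GC-MS data would also need binning, noise handling, and a way to choose the rank.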

Journal Article
TL;DR: The data suggest that even in arboreal primates that live in small social groups and spend a relatively low proportion of their time in physical contact, social interactions are associated with variation in gut microbiota composition, and demonstrate that within a given host species, subgroups of individuals may interact with the gut microbiota differently.
Abstract: Studies of human and domestic animal models indicate that related individuals and those that spend the most time in physical contact typically have more similar gut microbial communities. However, few studies have examined these factors in wild mammals where complex social dynamics and a variety of interacting environmental factors may impact the patterns observed in controlled systems. Here, we explore the effect of host kinship and time spent in social contact on the gut microbiota of wild, black howler monkeys (Alouatta pigra). Our results indicate that closely related individuals had less similar gut microbial communities than non-related individuals. However, the effect was small. In contrast, as previously reported in baboons and chimpanzees, individuals that spent more time in contact (0 m) and close proximity (0–1 m) had more similar gut microbial communities. This pattern was driven by adult female-adult female dyads, which generally spend more time in social contact than adult male-adult male dyads or adult male-adult female dyads. Relative abundances of individual microbial genera such as Bacteroides, Clostridium, and Streptococcus were also more similar in individuals that spent more time in contact or close proximity. Overall, our data suggest that even in arboreal primates that live in small social groups and spend a relatively low proportion of their time in physical contact, social interactions are associated with variation in gut microbiota composition. Additionally, these results demonstrate that within a given host species, subgroups of individuals may interact with the gut microbiota differently.

65 citations

Journal Article
TL;DR: Those who typically experience frequent gastrointestinal symptoms reported significantly less bowel discomfort or diarrhea, significantly less gas or bloating, more regular bowel movements, and better stool consistency when regularly consuming C. reinhardtii.

65 citations

Mingxun Wang, Jeremy Carver, Vanessa V. Phelan, Laura M. Sanchez, Neha Garg, Yao Peng, Don D. Nguyen, Jeramie D. Watrous, Clifford A. Kapono, Tal Luzzatto-Knaan, Carla Porto, Amina Bouslimani, Alexey V. Melnik, Michael J. Meehan, Wei-Ting Liu, Max Crüsemann, Paul D. Boudreau, Eduardo Esquenazi, Mario Sandoval-Calderón, Roland D. Kersten, Laura A. Pace, Robert A. Quinn, Katherine R. Duncan, Cheng-Chih Hsu, Dimitrios J. Floros, Ronnie G. Gavilan, Karin Kleigrewe, Trent R. Northen, Rachel J. Dutton, Delphine Parrot, Erin E. Carlson, Bertrand Aigle, Charlotte Frydenlund Michelsen, Lars Jelsbak, Christian Sohlenkamp, Pavel A. Pevzner, Anna Edlund, Jeffrey S. McLean, Jörn Piel, Brian T. Murphy, Lena Gerwick, Chih-Chuang Liaw, Yu-Liang Yang, Hans-Ulrich Humpf, Maria Maansson, Robert A. Keyzers, Amy C. Sims, Andrew R. Johnson, Ashley M. Sidebottom, Brian E. Sedio, Andreas Klitgaard, Charles B. Larson, Cristopher A. Boya P., Daniel Torres-Mendoza, David Gonzalez, Denise Brentan Silva, Lucas Miranda Marques, Daniel P. Demarque, Egle Pociute, Ellis C. O’Neill, Enora Briand, Eric J. N. Helfrich, Eve A. Granatosky, Evgenia Glukhov, Florian Ryffel, Hailey Houson, Hosein Mohimani, Jenan J. Kharbush, Yi Zeng, Julia A. Vorholt, Kenji L. Kurita, Pep Charusanti, Kerry L. McPhail, Kristian Fog Nielsen, Lisa Vuong, Maryam Elfeki, Matthew F. Traxler, Niclas Engene, Nobuhiro Koyama, Oliver B. Vining, Ralph S. Baric, Ricardo Pianta Rodrigues da Silva, Samantha J. Mascuch, Sophie Tomasi, Stefan Jenkins, Venkat R. Macherla, Thomas Hoffman, Vinayak Agarwal, Philip G. Williams, Jingqui Dai, Ram P. Neupane, Joshua R. Gurr, Andrés M. C. Rodríguez, Anne Lamsa, Chen Zhang, Kathleen Dorrestein, Brendan M. Duggan, Jehad Almaliti, Pierre-Marie Allard, Prasad Phapale, Louis-Félix Nothias, Theodore Alexandrov, Marc Litaudon, Jean-Luc Wolfender, Jennifer E. Kyle, Thomas O. Metz, Tyler Peryea, Dac-Trung Nguyen, Danielle VanLeer, Paul Shinn, Ajit Jadhav, Rolf Müller, Katrina M. Waters, Wenyuan Shi, Xueting Liu, Lixin Zhang, Rob Knight, Paul R. Jensen, Bernhard O. Palsson, Kit Pogliano, Roger G. Linington, Marcelino Gutiérrez, Norberto Peporine Lopes, William H. Gerwick, Bradley S. Moore, Pieter C. Dorrestein, Nuno Bandeira
01 Jan 2016
TL;DR: Global Natural Products Social Molecular Networking (GNPS) is an open-access knowledge base for community-wide organization and sharing of raw, processed, or identified tandem mass spectrometry (MS/MS) data.
Abstract: The potential of the diverse chemistries present in natural products (NP) for biotechnology and medicine remains untapped because NP databases are not searchable with raw data and the NP community has no way to share data other than in published papers. Although mass spectrometry techniques are well-suited to high-throughput characterization of natural products, there is a pressing need for an infrastructure to enable sharing and curation of data. We present Global Natural Products Social molecular networking (GNPS, http://gnps.ucsd.edu), an open-access knowledge base for community-wide organization and sharing of raw, processed or identified tandem mass (MS/MS) spectrometry data. In GNPS, crowdsourced curation of freely available community-wide reference MS libraries will underpin improved annotations. Data-driven social networking should facilitate identification of spectra and foster collaborations. We also introduce the concept of ‘living data’ through continuous reanalysis of deposited data.

Introduction
Natural products (NPs) from marine and terrestrial environments, including their inhabiting microorganisms, plants, animals, and humans, are routinely analyzed using mass spectrometry. However, a single mass spectrometry experiment can collect thousands of MS/MS spectra in minutes and individual projects can acquire millions of spectra. These datasets are too large for manual analysis. Further, comprehensive software and proper computational infrastructure are not readily available and only low-throughput sharing of either raw or annotated spectra is feasible, even among members of the same lab. The potentially useful information in MS/MS datasets can thus remain buried in papers, laboratory notebooks, and private databases, hindering retrieval, mining, and sharing of data and knowledge.
Although several NP databases, including Dictionary of Natural Products, AntiBase and MarinLit, assist in dereplication (identification of known compounds), these resources are not freely available and do not process mass spectrometry data. Conversely, mass spectrometry databases including MassBank, Metlin, mzCloud, and ReSpect host MS/MS spectra but limit data analyses to several individual spectra or a few LC-MS files. While Metlin and mzCloud provide a spectrum search function, their libraries are not freely available. Global genomics and proteomics research has been facilitated by the development of integral resources such as the National Center for Biotechnology Information (NCBI) and UniProt KnowledgeBase (UniProtKB), which provide robust platforms for data sharing and knowledge dissemination. Recognizing the need for an analogous community platform to effectively share and analyze natural products MS data, we present the Global Natural Products Social Molecular Networking (GNPS, available at gnps.ucsd.edu). GNPS is a data-driven platform for the storage, analysis, and knowledge dissemination of MS/MS spectra that enables community sharing of raw spectra, continuous annotation of deposited data, and collaborative curation of reference spectra (referred to as spectral libraries) and experimental data (organized as datasets). GNPS provides the ability to analyze a dataset and to compare it to all publicly available data. By building on the computational infrastructure of the University of California San Diego (UCSD) Center for Computational Mass Spectrometry (CCMS), GNPS provides public dataset deposition/retrieval through the Mass Spectrometry Interactive Virtual Environment (MassIVE) data repository. The GNPS analysis infrastructure further enables online dereplication, automated molecular networking analysis, and crowdsourced MS/MS spectrum curation.
Each dataset added to the GNPS repository is automatically reanalyzed in the next monthly cycle of continuous identification (see Living Data by Continuous Analysis below). Each of these tens of millions of spectra in GNPS datasets is matched to reference spectral libraries to annotate molecules and to discover putative analogs (Fig. 1a). From January 2014 to November 2015, GNPS has grown to serve 9,267 users from 100 countries (Fig. 1b), with 42,486 analysis sessions that have processed more than 93 million spectra as molecular networks from a quarter million LC-MS runs. Searches against a combined catalog of over 221,000 MS/MS reference library spectra from 18,163 compounds (Supplementary Table 1) are possible, and GNPS has matched almost one hundred million MS/MS spectra in all public and private search jobs using an estimated 84,000 compute hours.

GNPS Spectral Libraries
GNPS spectral libraries enable dereplication, variable dereplication (approximate matches to spectra of related molecules), and identification of spectra in molecular networks. GNPS has collected available MS/MS spectral libraries relevant to NPs (which also include other metabolites and molecules), including MassBank, ReSpect and NIST (Table 1, Fig. 2a, and Supplementary Table 1). Altogether, these third-party libraries total 212,230 MS/MS spectra representing 12,694 unique compounds (Fig. 2b). While this combined collection of reference spectra provides a starting point for dereplication, only 1.01% of all spectra in public GNPS datasets have been matched to this collection, indicating insufficient chemical space coverage. Although the NP community is working to populate this “missing” chemical space, there is no way to report discoveries of chemistries in an easily verifiable and reusable format.
To begin to address this pressing need, GNPS houses both newly-acquired reference spectra (GNPS-Collections) as well as a crowd-sourced library of community-contributed reference spectra (GNPS-Community). GNPS-Collections includes NPs and pharmacologically active compounds totaling 6,629 MS/MS spectra of 4,243 compounds (Fig 2b, Supplementary Table 1, Supplementary Note 1,2, and Supplementary Table 2). The GNPS-Community library has grown to include 2,224 MS/MS spectra of 1,325 compounds from 55 worldwide contributors. While the total number of MS/MS spectra in GNPS libraries is only 4% of the MS/MS spectra collected in third party libraries, GNPS libraries contribute matches of MS/MS spectra at a scale disproportionate to their size (Fig. 2c). The GNPS libraries account for 29% of unique compound matches and 59% of the MS/MS matches in public (88% of public+private) data. This indicates that the GNPS libraries contain compounds that are complementary to the chemical space represented in other libraries (Fig. 2c,d). Moreover, in contrast to third party libraries, spectra submitted to GNPS-Community libraries are immediately searchable by the whole community, such that submissions seamlessly transfer knowledge between laboratories (Fig. 1a) in a process that is akin to the addition of genome annotations to GenBank. In order to create a robust library, it is important for submissions to be peer-reviewed and, if necessary, annotations corrected or updated as appropriate. Reference spectra submitted to the GNPS-Community library are categorized by the estimated reliability of the proposed submissions. Gold reference spectra must be derived from structurally characterized synthetic or purified compounds and can only be submitted by approved users. Approval is given to contributors who have undergone training. Training is initiated by contacting the corresponding authors or CCMS administrators. 
Silver reference spectra need to be supported by an associated publication, while Bronze reference spectra are all remaining putative annotations (Supplementary Table 3). This type of division of spectra is reminiscent of RefSeq/TPA/GenBank (genomics) and Swiss-Prot/TrEMBL/UniProt (proteomics), allowing for varying tradeoffs between comprehensiveness and reliability of annotations defined as Gold, Silver, and Bronze (Fig. 2e). To enable refinements or corrections of annotations, GNPS allows for community-driven, iterative re-annotation of reference MS/MS spectra in a wiki-like fashion, to progressively improve the library and converge towards consensus annotation of all MS/MS spectra of interest. This is a process similar to the iterative annotation of the human genome (e.g., see series of papers on NCBI GenBank). To date, 563 annotation revisions have been made in GNPS (Supplementary Table 4), most of which added metadata to library spectra or refined compound names. The history of each annotation is retained so that users can discuss the proper annotation and address disagreements via comment threads.

Dereplication using GNPS
High-throughput dereplication of NP MS/MS data is implemented in GNPS by querying newly acquired MS/MS spectra against all the accumulated reference spectra in GNPS spectral libraries (Fig. 3a). To date, more than 93 million MS/MS spectra from various instruments (including Orbitrap, Ion Trap, qTof, and FT-ICR) have been searched at GNPS, yielding putative dereplication matches of 7.7 million spectra to 15,477 compounds. In the second stage of dereplication, GNPS goes beyond re-identification by utilizing variable dereplication, which is a modification-tolerant spectral library search that is mediated by a spectral alignment algorithm.
Variable dereplication enables the detection of significant matches to either putative analogs of known compounds (e.g., differing by one modification or substitution of a chemical group) or compounds belonging to the same general class of molecules (Fig. 3b). Variable dereplication is not available through any other computational platform. For example, GNPS variable dereplication has detected compounds with different levels of glycosylation on various substrates. As MS/MS fragmentation preferentially results in peaks from glycan fragments, it is possible to detect sets of compounds with related glycans even when the substrates to which the glycans are attached are themselves unrelated. To date, 3,891 putative analogs have been identified in publi

65 citations
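
The GNPS entry above describes dereplication as querying new MS/MS spectra against reference libraries, and variable dereplication as a modification-tolerant search. The sketch below shows one way such a search could work. It uses a greedy cosine-style score in which peaks may also match after shifting by the precursor mass difference. The scoring used by GNPS is not specified in the text above, so the functions, tolerances, thresholds, and toy library here are all hypothetical.

```python
import math

def normalize(peaks):
    """Scale intensities to unit Euclidean norm; peaks are (mz, intensity) pairs."""
    norm = math.sqrt(sum(inten ** 2 for _, inten in peaks))
    return [(mz, inten / norm) for mz, inten in peaks]

def spectral_score(query, ref, shift=0.0, tol=0.02):
    """Greedy cosine-style score; a peak pair matches directly or after `shift`."""
    q, r = normalize(query), normalize(ref)
    candidates = []
    for qi, (q_mz, q_int) in enumerate(q):
        for ri, (r_mz, r_int) in enumerate(r):
            direct = abs(q_mz - r_mz) <= tol
            shifted = abs(q_mz - (r_mz + shift)) <= tol
            if direct or shifted:
                candidates.append((q_int * r_int, qi, ri))
    # Take the highest-product pairs first, using each peak at most once.
    candidates.sort(reverse=True)
    used_q, used_r, score = set(), set(), 0.0
    for product, qi, ri in candidates:
        if qi not in used_q and ri not in used_r:
            used_q.add(qi)
            used_r.add(ri)
            score += product
    return score

def search_library(query, query_precursor, library, variable=False,
                   min_score=0.7, precursor_tol=0.05):
    """Return (name, score) hits sorted by score, best first."""
    hits = []
    for name, ref_precursor, ref_peaks in library:
        delta = query_precursor - ref_precursor
        if not variable and abs(delta) > precursor_tol:
            continue  # strict dereplication requires matching precursor masses
        score = spectral_score(query, ref_peaks, shift=delta if variable else 0.0)
        if score >= min_score:
            hits.append((name, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Hypothetical reference library: (name, precursor m/z, peaks).
LIBRARY = [
    ("compound_A", 300.0, [(100.0, 50.0), (150.0, 100.0), (200.0, 30.0)]),
    ("compound_B", 420.0, [(90.0, 80.0), (310.0, 100.0)]),
]
# Hypothetical analog of compound_A carrying a +14.016 modification.
ANALOG = [(100.0, 50.0), (164.016, 100.0), (214.016, 30.0)]

print(search_library(ANALOG, 314.016, LIBRARY))                 # strict search
print(search_library(ANALOG, 314.016, LIBRARY, variable=True))  # variable search
```

The strict search filters on precursor mass first, so it misses the analog. The variable search drops that filter and lets fragment peaks line up after the mass shift, which is the behaviour the text attributes to modification-tolerant matching.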


Cited by
Journal Article
TL;DR: An overview of the analysis pipeline and links to raw data and processed output from the runs with and without denoising are provided.
Abstract: Supplementary Figure 1 Overview of the analysis pipeline. Supplementary Table 1 Details of conventionally raised and conventionalized mouse samples. Supplementary Discussion Expanded discussion of QIIME analyses presented in the main text; Sequencing of 16S rRNA gene amplicons; QIIME analysis notes; Expanded Figure 1 legend; Links to raw data and processed output from the runs with and without denoising.

28,911 citations

28 Jul 2005
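TL;DR: PfEMP1 interacts with one or more receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.
Abstract: Antigenic variation allows many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), expressed on the surface of infected red blood cells, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. Each haploid genome carries a var gene family encoding about 60 members, and switching transcription among different var gene variants provides the molecular basis for antigenic variation.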

18,940 citations

Journal Article
TL;DR: The extensively curated SILVA taxonomy and the new non-redundant SILVA datasets provide an ideal reference for high-throughput classification of data from next-generation sequencing approaches.
Abstract: SILVA (from Latin silva, forest, http://www.arb-silva.de) is a comprehensive web resource for up to date, quality-controlled databases of aligned ribosomal RNA (rRNA) gene sequences from the Bacteria, Archaea and Eukaryota domains and supplementary online services. The referred database release 111 (July 2012) contains 3 194 778 small subunit and 288 717 large subunit rRNA gene sequences. Since the initial description of the project, substantial new features have been introduced, including advanced quality control procedures, an improved rRNA gene aligner, online tools for probe and primer evaluation and optimized browsing, searching and downloading on the website. Furthermore, the extensively curated SILVA taxonomy and the new non-redundant SILVA datasets provide an ideal reference for high-throughput classification of data from next-generation sequencing approaches.

18,256 citations

Journal Article
TL;DR: mothur is used as a case study to trim, screen, and align sequences; calculate distances; assign sequences to operational taxonomic units; and describe the α and β diversity of eight marine samples previously characterized by pyrosequencing of 16S rRNA gene fragments.
Abstract: mothur aims to be a comprehensive software package that allows users to use a single piece of software to analyze community sequence data. It builds upon previous tools to provide a flexible and powerful software package for analyzing sequencing data. As a case study, we used mothur to trim, screen, and align sequences; calculate distances; assign sequences to operational taxonomic units; and describe the alpha and beta diversity of eight marine samples previously characterized by pyrosequencing of 16S rRNA gene fragments. This analysis of more than 222,000 sequences was completed in less than 2 h with a laptop computer.

17,350 citations
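
The mothur abstract above mentions describing alpha and beta diversity from sequences assigned to operational taxonomic units (OTUs). As a minimal sketch, the Shannon index (one alpha-diversity measure) and Bray-Curtis dissimilarity (one beta-diversity measure) can be computed from per-sample OTU counts. These are common choices and not necessarily the calculators used in the case study. The count vectors are made up for illustration.

```python
import math

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples with aligned OTU counts."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator

# Hypothetical OTU count vectors for two samples (same OTU order in both).
sample_1 = [30, 10, 5, 0, 5]
sample_2 = [5, 10, 0, 25, 10]

print("alpha (Shannon), sample 1:", round(shannon(sample_1), 3))
print("alpha (Shannon), sample 2:", round(shannon(sample_2), 3))
print("beta (Bray-Curtis):", round(bray_curtis(sample_1, sample_2), 3))
```

Alpha diversity summarizes a single sample, while beta diversity compares two samples, which is why the first function takes one count vector and the second takes two.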

Journal Article
TL;DR: UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters and offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets.
Abstract: Motivation: Biological sequence data is accumulating rapidly, motivating the development of improved high-throughput methods for sequence classification. Results: UBLAST and USEARCH are new algorithms enabling sensitive local and global search of large sequence databases at exceptionally high speeds. They are often orders of magnitude faster than BLAST in practical applications, though sensitivity to distant protein relationships is lower. UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters. UCLUST offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets. Availability: Binaries are available at no charge for non-commercial use at http://www.drive5.com/usearch Supplementary information: Supplementary data are available at Bioinformatics online.

17,301 citations
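
The UCLUST abstract above describes assigning sequences to clusters by searching them against existing clusters. A simplified greedy centroid clustering loop illustrates the general idea. This is a sketch under stated assumptions and not the UCLUST algorithm: `difflib.SequenceMatcher` stands in for a real alignment-based identity, sequences are processed in input order, and the 0.85 threshold and example sequences are arbitrary.

```python
from difflib import SequenceMatcher

def identity(a, b):
    """Rough similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def greedy_cluster(seqs, threshold=0.85):
    """Assign each sequence to the first centroid it matches, else start a cluster."""
    clusters = {}  # centroid sequence -> list of member sequences
    for seq in seqs:
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            # No existing centroid is similar enough: seq becomes a new centroid.
            clusters[seq] = [seq]
    return clusters

# Hypothetical short sequences for illustration.
reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "TTTTGGGGCA"]
for centroid, members in greedy_cluster(reads).items():
    print(centroid, "->", members)
```

Each sequence is compared only against centroids rather than against every other sequence, which keeps the number of comparisons low. A production tool would replace the linear scan over centroids with a fast database search.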