Author

Rob Knight

Bio: Rob Knight is an academic researcher at the University of California, San Diego. The author has contributed to research on topics including the microbiome and gut flora. The author has an h-index of 201 and has co-authored 1061 publications receiving 253207 citations. Previous affiliations of Rob Knight include Anschutz Medical Campus and the University of Sydney.
Topics: Microbiome, Gut flora, Medicine, Metagenomics, Biology


Papers
Journal Article
16 Feb 2018-Genes
TL;DR: This work explores how to build the most robust Random Forest regression models for prediction of PMI by testing models built on different sample types, gene markers, and taxonomic levels, and whether particular suites of indicator microbes were informative across different datasets.
Abstract: Death investigations often include an effort to establish the postmortem interval (PMI) in cases in which the time of death is uncertain. The postmortem interval can lead to the identification of the deceased and the validation of witness statements and suspect alibis. Recent research has demonstrated that microbes provide an accurate clock that starts at death and relies on ecological change in the microbial communities that normally inhabit a body and its surrounding environment. Here, we explore how to build the most robust Random Forest regression models for prediction of PMI by testing models built on different sample types (gravesoil, skin of the torso, skin of the head), gene markers (16S ribosomal RNA (rRNA), 18S rRNA, internal transcribed spacer regions (ITS)), and taxonomic levels (sequence variants, species, genus, etc.). We also tested whether particular suites of indicator microbes were informative across different datasets. Generally, results indicate that the most accurate models for predicting PMI were built using gravesoil and skin data using the 16S rRNA genetic marker at the taxonomic level of phyla. Additionally, several phyla consistently contributed highly to model accuracy and may be candidate indicators of PMI.

69 citations

Journal Article
26 Dec 2017
TL;DR: This work shows by using a combination of simulations and reanalysis of nine real-world microbiome data sets that this new method outperforms existing methods at the differential abundance testing task, producing a false-discovery rate that is up to threefold more accurate, and halves the number of samples required to find a given difference.
Abstract: Differential abundance testing is a critical task in microbiome studies that is complicated by the sparsity of data matrices. Here we adapt for microbiome studies a solution from the field of gene expression analysis to produce a new method, discrete false-discovery rate (DS-FDR), that greatly improves the power to detect differential taxa by exploiting the discreteness of the data. Additionally, DS-FDR is relatively robust to the number of noninformative features, and thus removes the problem of filtering taxonomy tables by an arbitrary abundance threshold. We show by using a combination of simulations and reanalysis of nine real-world microbiome data sets that this new method outperforms existing methods at the differential abundance testing task, producing a false-discovery rate that is up to threefold more accurate, and halves the number of samples required to find a given difference (thus increasing the efficiency of microbiome experiments considerably). We therefore expect DS-FDR to be widely applied in microbiome studies. IMPORTANCE DS-FDR can achieve higher statistical power to detect significant findings in sparse and noisy microbiome data compared to the commonly used Benjamini-Hochberg procedure and other FDR-controlling procedures.

69 citations
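
The DS-FDR abstract above names the Benjamini-Hochberg procedure as the commonly used baseline it is compared against. For orientation, here is a minimal sketch of that baseline only; it is not DS-FDR, and the function name `benjamini_hochberg` and the default `alpha` are illustrative assumptions rather than anything taken from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Standard step-up Benjamini-Hochberg procedure (illustrative baseline,
    not the DS-FDR method described in the abstract).
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis whose rank is at most max_k (step-up rule).
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected when a larger p-value passes its threshold, as with `[0.02, 0.03, 0.035]` at `alpha=0.05`.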

Journal Article
TL;DR: Men with higher levels of 1,25(OH)2D and higher activation ratios, but not 25(OH)D itself, are more likely to possess butyrate producing bacteria that are associated with better gut microbial health.
Abstract: The vitamin D receptor is highly expressed in the gastrointestinal tract where it transacts gene expression. With current limited understanding of the interactions between the gut microbiome and vitamin D, we conduct a cross-sectional analysis of 567 older men quantifying serum vitamin D metabolites using LC-MSMS and defining stool sub-Operational Taxonomic Units from 16S ribosomal RNA gene sequencing data. Faith’s Phylogenetic Diversity and non-redundant covariate analyses reveal that the serum 1,25(OH)2D level explains 5% of variance in α-diversity. In β-diversity analyses using unweighted UniFrac, 1,25(OH)2D is the strongest factor assessed, explaining 2% of variance. Random forest analyses identify 12 taxa, 11 in the phylum Firmicutes, eight of which are positively associated with either 1,25(OH)2D and/or the hormone-to-prohormone [1,25(OH)2D/25(OH)D] “activation ratio.” Men with higher levels of 1,25(OH)2D and higher activation ratios, but not 25(OH)D itself, are more likely to possess butyrate producing bacteria that are associated with better gut microbial health. Here, the authors investigate associations of vitamin D metabolites with gut microbiome in a cross-sectional analysis of 567 elderly men enrolled in the Osteoporotic Fractures in Men (MrOS) Study and find larger alpha-diversity correlates with high 1,25(OH)2D and high 24,25(OH)2D and higher ratios of activation and catabolism.

68 citations

Journal Article
TL;DR: Estimates of technical reproducibility, stability at ambient temperature for 4 days, and accuracy comparing a “gold standard” for fecal samples in no solution, 95% ethanol, RNAlater, postdevelopment fecal occult blood test cards, and fecal immunochemical test tubes in a study conducted in Bangladesh are presented.
Abstract: To our knowledge, fecal microbiota collection methods have not been evaluated in low- and middle-income countries. Therefore, we evaluated five different fecal sample collection methods for technical reproducibility, stability, and accuracy within the Health Effects of Arsenic Longitudinal Study (HEALS) in Bangladesh. Fifty participants from the HEALS provided fecal samples in the clinic which were aliquoted into no solution, 95% ethanol, RNAlater, postdevelopment fecal occult blood test (FOBT) cards, and fecal immunochemical test (FIT) tubes. Half of the aliquots were frozen immediately at −80°C (day 0) and the remaining samples were left at ambient temperature for 96 h and then frozen (day 4). Intraclass correlation coefficients (ICC) were calculated for the relative abundances of the top three phyla, for two alpha diversity measures, and for four beta diversity measures. The duplicate samples had relatively high ICCs for technical reproducibility at day 0 and day 4 (range, 0.79 to 0.99). The FOBT card and samples preserved in RNAlater and 95% ethanol had the highest ICCs for stability over 4 days. The FIT tube had lower stability measures overall. In comparison to the “gold standard” method using immediately frozen fecal samples with no solution, the ICCs for many of the microbial metrics were low, but the rank order appeared to be preserved as seen by the Spearman correlation. The FOBT cards, 95% ethanol, and RNAlater were effective fecal preservatives. These fecal collection methods are optimal for future cohort studies, particularly in low- and middle-income countries.

68 citations
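
The fecal collection abstract above reports intraclass correlation coefficients (ICC) for duplicate samples but does not say which ICC form was used. The sketch below assumes the one-way random-effects form, ICC(1,1), computed from a one-way ANOVA decomposition; the function name `icc_oneway` and the input layout (one row per sample, one column per replicate) are illustrative assumptions.

```python
def icc_oneway(measurements):
    """One-way random-effects ICC(1,1).

    `measurements` is a list of rows, one row per subject/sample, each row
    holding the same number k >= 2 of replicate measurements.
    Assumption: the abstract does not state the ICC form; this is one
    common choice, shown for illustration only.
    """
    n = len(measurements)
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(measurements[0])
    if k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("each subject needs the same number (>= 2) of replicates")

    row_means = [sum(row) / k for row in measurements]
    grand_mean = sum(row_means) / n

    # Between-subject mean square.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-subject (replicate) mean square.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(measurements, row_means) for x in row
    ) / (n * (k - 1))

    denom = ms_between + (k - 1) * ms_within
    if denom == 0:
        raise ValueError("all measurements are identical; ICC is undefined")
    return (ms_between - ms_within) / denom
```

Values near 1 indicate that replicates of the same sample agree closely relative to the spread between samples, which is the sense in which the abstract describes ICCs of 0.79 to 0.99 as relatively high.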

Journal Article
01 Nov 2005-RNA
TL;DR: Estimated apparent initial abundances suggest that the simplest isoleucine motif was 20- to 40-fold more frequent in selection with 50- or 70-nucleotide randomized regions than with any other length, and support a significant but lesser role for primer sequences in the outcome of selections.
Abstract: Because the abundance of functional molecules in RNA sequence space has many unexplored aspects, we compared the outcome of 11 independent selections, performed using the same affinity selection protocol and contiguous randomized regions of 16, 22, 26, 50, 70, and 90 nucleotides. All affinity selections targeted the simplest isoleucine aptamer, an asymmetric internal loop. This loop should be abundant in all selections, so that it can be compared across all experiments. In some cases, two primer sets intended to favor selection of different structures have also been compared. The simplest isoleucine aptamer dominates all selections except with the shortest tract, 16 contiguous randomized nucleotides. Here the isoleucine aptamer cannot be accommodated and no other motif can be selected. Our results suggest an optimum length for selection; surprisingly, both the shortest and the longest randomized tracts make it more difficult to recover the motif. Estimated apparent initial abundances suggest that the simplest isoleucine motif was 20- to 40-fold more frequent in selection with 50- or 70-nucleotide randomized regions than with any other length. Considering primer sets, a pre-formed stable stem within fixed flanking sequences had a five- to 10-fold negative effect on apparent motif abundance at all lengths. Differing random tract lengths also determined the probable motif permutation and the most abundant helix lengths. These data support a significant but lesser role for primer sequences in the outcome of selections.

67 citations


Cited by
Journal Article
TL;DR: An overview of the analysis pipeline and links to raw data and processed output from the runs with and without denoising are provided.
Abstract: Supplementary Figure 1 Overview of the analysis pipeline. Supplementary Table 1 Details of conventionally raised and conventionalized mouse samples. Supplementary Discussion Expanded discussion of QIIME analyses presented in the main text; Sequencing of 16S rRNA gene amplicons; QIIME analysis notes; Expanded Figure 1 legend; Links to raw data and processed output from the runs with and without denoising.

28,911 citations

28 Jul 2005
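TL;DR: PfEMP1 interacts with one or more receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.
Abstract: Antigenic variation enables many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family of each haploid genome encodes about 60 members, and switching transcription among different var gene variants provides the molecular basis for antigenic variation.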

18,940 citations

Journal Article
TL;DR: The extensively curated SILVA taxonomy and the new non-redundant SILVA datasets provide an ideal reference for high-throughput classification of data from next-generation sequencing approaches.
Abstract: SILVA (from Latin silva, forest, http://www.arb-silva.de) is a comprehensive web resource for up to date, quality-controlled databases of aligned ribosomal RNA (rRNA) gene sequences from the Bacteria, Archaea and Eukaryota domains and supplementary online services. The referred database release 111 (July 2012) contains 3 194 778 small subunit and 288 717 large subunit rRNA gene sequences. Since the initial description of the project, substantial new features have been introduced, including advanced quality control procedures, an improved rRNA gene aligner, online tools for probe and primer evaluation and optimized browsing, searching and downloading on the website. Furthermore, the extensively curated SILVA taxonomy and the new non-redundant SILVA datasets provide an ideal reference for high-throughput classification of data from next-generation sequencing approaches.

18,256 citations

Journal Article
TL;DR: mothur is used as a case study to trim, screen, and align sequences; calculate distances; assign sequences to operational taxonomic units; and describe the α and β diversity of eight marine samples previously characterized by pyrosequencing of 16S rRNA gene fragments.
Abstract: mothur aims to be a comprehensive software package that allows users to use a single piece of software to analyze community sequence data. It builds upon previous tools to provide a flexible and powerful software package for analyzing sequencing data. As a case study, we used mothur to trim, screen, and align sequences; calculate distances; assign sequences to operational taxonomic units; and describe the alpha and beta diversity of eight marine samples previously characterized by pyrosequencing of 16S rRNA gene fragments. This analysis of more than 222,000 sequences was completed in less than 2 h with a laptop computer.

17,350 citations

Journal Article
TL;DR: UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters and offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets.
Abstract: Motivation: Biological sequence data is accumulating rapidly, motivating the development of improved high-throughput methods for sequence classification. Results: UBLAST and USEARCH are new algorithms enabling sensitive local and global search of large sequence databases at exceptionally high speeds. They are often orders of magnitude faster than BLAST in practical applications, though sensitivity to distant protein relationships is lower. UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters. UCLUST offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets. Availability: Binaries are available at no charge for non-commercial use at http://www.drive5.com/usearch Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

17,301 citations