Journal Article

Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer

TL;DR: A meta-analysis of eight geographically and technically diverse fecal shotgun metagenomic studies of colorectal cancer identified a core set of 29 species significantly enriched in CRC metagenomes, establishing globally generalizable, predictive taxonomic and functional microbiome CRC signatures as a basis for future diagnostics.
Abstract: Association studies have linked microbiome alterations with many human diseases. However, they have not always reported consistent results, thereby necessitating cross-study comparisons. Here, a meta-analysis of eight geographically and technically diverse fecal shotgun metagenomic studies of colorectal cancer (CRC, n = 768), which was controlled for several confounders, identified a core set of 29 species significantly enriched in CRC metagenomes (false discovery rate (FDR) < 1 × 10⁻⁵). CRC signatures derived from single studies maintained their accuracy in other studies. By training on multiple studies, we improved detection accuracy and disease specificity for CRC. Functional analysis of CRC metagenomes revealed enriched protein and mucin catabolism genes and depleted carbohydrate degradation genes. Moreover, we inferred elevated production of secondary bile acids from CRC metagenomes, suggesting a metabolic link between cancer-associated gut microbes and a fat- and meat-rich diet. Through extensive validations, this meta-analysis firmly establishes globally generalizable, predictive taxonomic and functional microbiome CRC signatures as a basis for future diagnostics. Cross-study analysis defines fecal microbial species associated with colorectal cancer.
Citations
Journal Article
TL;DR: Future studies will focus on understanding the mechanisms underlying the microbiota-gut-brain axis and attempt to elucidate microbial-based intervention and therapeutic strategies for neuropsychiatric disorders.
Abstract: The importance of the gut-brain axis in maintaining homeostasis has long been appreciated. However, the past 15 yr have seen the emergence of the microbiota (the trillions of microorganisms within ...

1,775 citations

Journal Article
TL;DR: This protocol details MicrobiomeAnalyst, a user-friendly, web-based platform for comprehensive statistical, functional, and meta-analysis of microbiome data, a one-stop shop that enables microbiome researchers to thoroughly explore their preprocessed microbiome data via intuitive web interfaces.
Abstract: MicrobiomeAnalyst is an easy-to-use, web-based platform for comprehensive analysis of common data outputs generated from current microbiome studies. It enables researchers and clinicians with little or no bioinformatics training to explore a wide variety of well-established methods for microbiome data processing, statistical analysis, functional profiling and comparison with public datasets or known microbial signatures. MicrobiomeAnalyst currently contains four modules: Marker-gene Data Profiling (MDP), Shotgun Data Profiling (SDP), Projection with Public Data (PPD), and Taxon Set Enrichment Analysis (TSEA). This protocol will first introduce the MDP module by providing a step-wise description of how to prepare, process and normalize data; perform community profiling; identify important features; and conduct correlation and classification analysis. We will then demonstrate how to perform predictive functional profiling and introduce several unique features of the SDP module for functional analysis. The last two sections will describe the key steps involved in using the PPD and TSEA modules for meta-analysis and visual exploration of the results. In summary, MicrobiomeAnalyst offers a one-stop shop that enables microbiome researchers to thoroughly explore their preprocessed microbiome data via intuitive web interfaces. The complete protocol can be executed in ~70 min. This protocol details MicrobiomeAnalyst, a user-friendly, web-based platform for comprehensive statistical, functional, and meta-analysis of microbiome data.

823 citations

Journal Article
TL;DR: The role of microorganisms in colorectal carcinogenesis is described, along with the potential clinical translation of the gut microbiota as a biomarker for CRC diagnosis and prognosis and as an approach to disease prevention and improved therapy.
Abstract: Colorectal cancer (CRC) accounts for about 10% of all new cancer cases globally. Located at close proximity to the colorectal epithelium, the gut microbiota comprises a large population of microorganisms that interact with host cells to regulate many physiological processes, such as energy harvest, metabolism and immune response. Sequencing studies have revealed microbial compositional and ecological changes in patients with CRC, whereas functional studies in animal models have pinpointed the roles of several bacteria in colorectal carcinogenesis, including Fusobacterium nucleatum and certain strains of Escherichia coli and Bacteroides fragilis. These findings give new opportunities to take advantage of our knowledge on the gut microbiota for clinical applications, such as gut microbiota analysis as screening, prognostic or predictive biomarkers, or modulating microorganisms to prevent cancer, augment therapies and reduce adverse effects of treatment. This Review aims to provide an overview and discussion of the gut microbiota in colorectal neoplasia, including relevant mechanisms in microbiota-related carcinogenesis, the potential of utilizing the microbiota as CRC biomarkers, and the prospect for modulating the microbiota for CRC prevention or treatment. These scientific findings will pave the way to clinically translate the use of gut microbiota for CRC in the near future.

549 citations

Journal Article
TL;DR: The role of diet quality, carbohydrate intake, fermentable FODMAPs, and prebiotic fiber in maintaining healthy gut flora is reviewed and the implications are discussed for various conditions including obesity, diabetes, irritable bowel syndrome, inflammatory bowel disease, depression, and cardiovascular disease.
Abstract: The gut microbiome plays an important role in human health and influences the development of chronic diseases ranging from metabolic disease to gastrointestinal disorders and colorectal cancer. Of increasing prevalence in Western societies, these conditions carry a high burden of care. Dietary patterns and environmental factors have a profound effect on shaping gut microbiota in real time. Diverse populations of intestinal bacteria mediate their beneficial effects through the fermentation of dietary fiber to produce short-chain fatty acids, endogenous signals with important roles in lipid homeostasis and reducing inflammation. Recent progress shows that an individual’s starting microbial profile is a key determinant in predicting their response to intervention with live probiotics. The gut microbiota is complex and challenging to characterize. Enterotypes have been proposed using metrics such as alpha species diversity, the ratio of Firmicutes to Bacteroidetes phyla, and the relative abundance of beneficial genera (e.g., Bifidobacterium, Akkermansia) versus facultative anaerobes (E. coli), pro-inflammatory Ruminococcus, or nonbacterial microbes. Microbiota composition and relative populations of bacterial species are linked to physiologic health along different axes. We review the role of diet quality, carbohydrate intake, fermentable FODMAPs, and prebiotic fiber in maintaining healthy gut flora. The implications are discussed for various conditions including obesity, diabetes, irritable bowel syndrome, inflammatory bowel disease, depression, and cardiovascular disease.

532 citations


Cites background from "Meta-analysis of fecal metagenomes ..."

  • ...Gut microbiota associated with colorectal cancer were recently shown to have an increase in genes associated with TMA lyase and protein catabolism, while microbe carbohydrate degradation pathways were depleted [156,157]....

Journal Article
27 Feb 2020, Nature
TL;DR: A distinct mutational signature in colorectal cancer is described and it is implied that the underlying mutational process results directly from past exposure to bacteria carrying the colibactin-producing pks pathogenicity island.
Abstract: Various species of the intestinal microbiota have been associated with the development of colorectal cancer [1,2], but it has not been demonstrated that bacteria have a direct role in the occurrence of oncogenic mutations. Escherichia coli can carry the pathogenicity island pks, which encodes a set of enzymes that synthesize colibactin [3]. This compound is believed to alkylate DNA on adenine residues [4,5] and induces double-strand breaks in cultured cells [3]. Here we expose human intestinal organoids to genotoxic pks+ E. coli by repeated luminal injection over five months. Whole-genome sequencing of clonal organoids before and after this exposure revealed a distinct mutational signature that was absent from organoids injected with isogenic pks-mutant bacteria. The same mutational signature was detected in a subset of 5,876 human cancer genomes from two independent cohorts, predominantly in colorectal cancer. Our study describes a distinct mutational signature in colorectal cancer and implies that the underlying mutational process results directly from past exposure to bacteria carrying the colibactin-producing pks pathogenicity island.

507 citations

References
Journal Article
TL;DR: In this paper, a different approach to problems of multiple significance testing is presented, which calls for controlling the expected proportion of falsely rejected hypotheses, the false discovery rate, which is equivalent to the FWER when all hypotheses are true but is smaller otherwise.
Abstract: SUMMARY The common approach to the multiplicity problem calls for controlling the familywise error rate (FWER). This approach, though, has faults, and we point out a few. A different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses, the false discovery rate. This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise. Therefore, in problems where the control of the false discovery rate rather than that of the FWER is desired, there is potential for a gain in power. A simple sequential Bonferroni-type procedure is proved to control the false discovery rate for independent test statistics, and a simulation study shows that the gain in power is substantial. The use of the new procedure and the appropriateness of the criterion are illustrated with examples.

83,420 citations


"Meta-analysis of fecal metagenomes ..." refers methods in this paper

  • ...To adjust for multiple hypothesis testing, P-values were adjusted using the false-discovery rate (FDR) method [70]....
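
The sequential Bonferroni-type procedure summarized in the abstract above can be sketched as follows. This is a minimal illustration of the standard step-up adjustment, not the code used in the meta-analysis; the function names `benjamini_hochberg` and `reject` are hypothetical, and the p-values in the usage note are made up.

```python
def benjamini_hochberg(p_values):
    """Return FDR-adjusted p-values (step-up procedure), in input order."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def reject(p_values, q=0.05):
    """Flag hypotheses whose adjusted p-value is at most the FDR level q."""
    return [adj <= q for adj in benjamini_hochberg(p_values)]
```

For example, `reject([0.01, 0.04, 0.03, 0.005], q=0.03)` flags only the first and last hypotheses, because the two larger p-values are both adjusted to about 0.04.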

Journal Article
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

70,111 citations

Journal Article
TL;DR: Burrows-Wheeler Alignment tool (BWA) is implemented, a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net Contact: [email protected]

43,862 citations
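
The backward search over a Burrows–Wheeler Transform mentioned in the abstract above can be illustrated with a small sketch. It handles exact matches only and builds the index naively by sorting suffixes, so it is a teaching aid under those assumptions rather than a description of how BWA is implemented (BWA additionally allows mismatches and gaps). The names `build_fm_index` and `backward_search` are hypothetical.

```python
def build_fm_index(text):
    """Build a suffix array, C table and Occ table for text (naive construction)."""
    text += "$"  # sentinel, assumed to sort before every other character
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT character = the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    # c_table[ch]: number of characters in text strictly smaller than ch.
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return sorted start positions of exact occurrences of pattern."""
    lo, hi = 0, len(sa)  # half-open interval of matching suffix-array rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

The pattern is consumed from its last character to its first, and each step narrows the interval of suffix-array rows whose suffixes start with the part of the pattern processed so far. The work per step does not depend on the length of the reference, which is what makes the approach attractive for large genomes.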

Journal Article
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Abstract: SUMMARY We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.

40,785 citations


"Meta-analysis of fecal metagenomes ..." refers methods in this paper

  • ...For each split, an L1-regularized (LASSO) logistic regression model [68] was trained on the training set, which was then used to predict the test set....

  • ...LASSO models were then built on log10-transformed abundances (pseudocount of 10 × 10⁻⁹, centered and scaled) of the sets of the 100 top genes returned by mRMR....

  • ...Instead, least absolute shrinkage and selection operator (LASSO) logistic regression classifiers were employed to select predictive microbial features and eliminate uninformative ones (see Methods)....

  • ...To ascertain the predictive power of a classifier based on the IGC gene abundances [30], we applied a series of filters to the abundance tables to reduce the number of genes that would be the input of LASSO modeling....

  • ...This importantly also pertained to feature selection, which was either done via the LASSO [68] or by nested cross-validation procedures to avoid overoptimistic performance assessment [69] (see below for details)....
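
The excerpts above describe log10-transforming abundances with a pseudocount, centering and scaling them, and then fitting L1-regularized logistic regression. A minimal pure-Python sketch of that idea follows. It is an illustration under stated assumptions, not the authors' pipeline: the solver is plain proximal gradient descent chosen for brevity, the function names and the `lam`, `step` and `n_iter` defaults are hypothetical, and the default pseudocount of 1e-8 simply restates the quoted 10 × 10⁻⁹.

```python
import math


def log_transform_and_standardize(rows, pseudocount=1e-8):
    """Apply log10(x + pseudocount), then center and scale each feature column."""
    logged = [[math.log10(x + pseudocount) for x in row] for row in rows]
    n, d = len(logged), len(logged[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [logged[i][j] for i in range(n)]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        for i in range(n):
            out[i][j] = (col[i] - mean) / sd if sd > 0 else 0.0
    return out


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, lam=0.1, step=0.5, n_iter=500):
    """Fit weights w and intercept b by proximal gradient descent.

    The L1 penalty (strength lam) is applied to w only, not to b.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # Gradient step on the logistic loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def predict_proba(X, w, b):
    """Predicted probability of the positive class for each row of X."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]
```

The soft-thresholding step is what sets the weights of uninformative features to exactly zero, which is why the LASSO doubles as a feature selector. In a cross-validation setting, the transformation parameters and the model would be fitted on the training split only and then applied to the held-out split.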

Journal Article
TL;DR: MUSCLE is a new computer program for creating multiple alignments of protein sequences that includes fast distance estimation using kmer counting, progressive alignment using a new profile function the authors call the log-expectation score, and refinement using tree-dependent restricted partitioning.
Abstract: We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using k-mer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.

37,524 citations
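
Distance estimation by k-mer counting, as mentioned in the abstract above, can be sketched in a few lines. The formula below (one minus the fraction of shared k-mers, relative to the shorter sequence) is a simplified stand-in chosen for illustration and should not be read as MUSCLE's exact definition. The names `kmer_counts` and `kmer_distance` and the default `k=3` are assumptions.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """Alignment-free distance in [0, 1]: 0 when all k-mers are shared."""
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        raise ValueError("both sequences must contain at least k characters")
    # Multiset intersection: each k-mer counts min(count in a, count in b).
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    return 1.0 - shared / denom
```

Because no alignment is computed, a distance of this kind is cheap enough to evaluate for every pair of sequences, which is the property a fast guide-tree construction relies on.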

Related Papers (5)