Journal ArticleDOI

A Global Coexpression Network Approach for Connecting Genes to Specialized Metabolic Pathways in Plants.

TL;DR: It is proposed that global gene coexpression is a rich, largely untapped resource for discovering the genetic basis and architecture of plant natural products and that BGCs are not a hallmark of plant specialized metabolism.
Abstract: Plants produce diverse specialized metabolites (SMs), but the genes responsible for their production and regulation remain largely unknown, hindering efforts to tap plant pharmacopeia. Given that genes comprising SM pathways exhibit environmentally dependent coregulation, we hypothesized that genes within a SM pathway would form tight associations (modules) with each other in coexpression networks, facilitating their identification. To evaluate this hypothesis, we used 10 global coexpression data sets, each a meta-analysis of hundreds to thousands of experiments, across eight plant species to identify hundreds of coexpressed gene modules per data set. In support of our hypothesis, 15.3 to 52.6% of modules contained two or more known SM biosynthetic genes, and module genes were enriched in SM functions. Moreover, modules recovered many experimentally validated SM pathways, including all six known to form biosynthetic gene clusters (BGCs). In contrast, bioinformatically predicted BGCs (i.e., those lacking an associated metabolite) were no more coexpressed than the null distribution for neighboring genes. These results suggest that most predicted plant BGCs are not genuine SM pathways and argue that BGCs are not a hallmark of plant specialized metabolism. We submit that global gene coexpression is a rich, largely untapped resource for discovering the genetic basis and architecture of plant natural products.
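The module idea in the abstract can be illustrated with a small sketch. The abstract does not state how the modules were computed, so everything here is an assumption made for illustration: the sketch uses Pearson correlation with a fixed threshold and treats connected components of the resulting graph as "modules", which is a stand-in and not the authors' method. The gene names, the expression values, the 0.9 threshold, and the helper names are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no coexpression signal
    return cov / (sd_x * sd_y)


def coexpression_modules(expression, threshold=0.9):
    """Group genes into modules (illustrative stand-in, see lead-in).

    An edge joins two genes whose profiles correlate at or above
    `threshold`; each connected component with at least two genes
    is reported as one module.
    """
    genes = sorted(expression)
    neighbors = {gene: set() for gene in genes}
    for g1, g2 in combinations(genes, 2):
        if pearson(expression[g1], expression[g2]) >= threshold:
            neighbors[g1].add(g2)
            neighbors[g2].add(g1)

    modules, seen = [], set()
    for gene in genes:
        if gene in seen:
            continue
        component, stack = set(), [gene]
        while stack:  # depth-first walk over the coexpression graph
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbors[current] - component)
        seen |= component
        if len(component) > 1:
            modules.append(sorted(component))
    return modules


def modules_with_known_sm_genes(modules, known_sm_genes, minimum=2):
    """Keep modules containing at least `minimum` known SM genes."""
    return [m for m in modules if len(set(m) & set(known_sm_genes)) >= minimum]


# Hypothetical toy data: five genes measured in five conditions.
expression = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10.5],
    "geneC": [1.1, 2.0, 3.2, 3.9, 5.1],
    "geneD": [5, 1, 4, 2, 3],
    "geneE": [3, 3, 3, 3, 3],
}
modules = coexpression_modules(expression)
hits = modules_with_known_sm_genes(modules, {"geneA", "geneC", "geneD"})
```

The last helper mirrors the kind of count reported in the abstract (modules containing two or more known SM biosynthetic genes), applied here to toy data only.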


Citations
Journal ArticleDOI
TL;DR: A redesigned and significantly enhanced MapMan4 framework is presented, together with a revised version of the associated online Mercator annotation tool, providing protein annotations for all embryophytes with a comparably high quality.

276 citations

Journal ArticleDOI
TL;DR: The development of ePlant is described and several examples illustrating its integrative features for hypothesis generation are presented, including the process of deploying ePlant as an “app” on Araport.
Abstract: A big challenge in current systems biology research arises when different types of data must be accessed from separate sources and visualized using separate tools. The high cognitive load required to navigate such a workflow is detrimental to hypothesis generation. Accordingly, there is a need for a robust research platform that incorporates all data and provides integrated search, analysis, and visualization features through a single portal. Here, we present ePlant (http://bar.utoronto.ca/eplant), a visual analytic tool for exploring multiple levels of Arabidopsis thaliana data through a zoomable user interface. ePlant connects to several publicly available web services to download genome, proteome, interactome, transcriptome, and 3D molecular structure data for one or more genes or gene products of interest. Data are displayed with a set of visualization tools that are presented using a conceptual hierarchy from big to small, and many of the tools combine information from more than one data type. We describe the development of ePlant in this article and present several examples illustrating its integrative features for hypothesis generation. We also describe the process of deploying ePlant as an “app” on Araport. Building on readily available web services, the code for ePlant is freely available for any other biological species research.

247 citations


Cites background from "A Global Coexpression Network Appro..."

  • ...Spatial relationships within the genome can sometimes indicate functional relationships (Chae et al., 2014; Wisecaver et al., 2017)....

    [...]

Journal ArticleDOI
TL;DR: Improved knowledge of the evolutionary life cycle of MGCs will advance the understanding of the ecology of specialized metabolism and of the interplay between the lifestyle of an organism and genome architecture.
Abstract: Fungi contain a remarkable diversity of both primary and secondary metabolic pathways involved in ecologically specialized or accessory functions. Genes in these pathways are frequently physically linked on fungal chromosomes, forming metabolic gene clusters (MGCs). In this Review, we describe the diversity in the structure and content of fungal MGCs, their population-level and species-level variation, the evolutionary mechanisms that underlie their formation, maintenance and decay, and their ecological and evolutionary impact on fungal populations. We also discuss MGCs from other eukaryotes and the reasons for their preponderance in fungi. Improved knowledge of the evolutionary life cycle of MGCs will advance our understanding of the ecology of specialized metabolism and of the interplay between the lifestyle of an organism and genome architecture.

148 citations

Journal ArticleDOI
TL;DR: Genome-wide analyses revealed that both polyploidy and tandem gene duplications modified various pathways involved in the biosynthesis of key phytonutrients in highbush blueberry.
Abstract: Background Highbush blueberry (Vaccinium corymbosum) has long been consumed for its unique flavor and composition of health-promoting phytonutrients. However, breeding efforts to improve fruit quality in blueberry have been greatly hampered by the lack of adequate genomic resources and a limited understanding of the underlying genetics encoding key traits. The genome of highbush blueberry has been particularly challenging to assemble due, in large part, to its polyploid nature and genome size. Findings Here, we present a chromosome-scale and haplotype-phased genome assembly of the cultivar "Draper," which has the highest antioxidant levels among a diversity panel of 71 cultivars and 13 wild Vaccinium species. We leveraged this genome, combined with gene expression and metabolite data measured across fruit development, to identify candidate genes involved in the biosynthesis of important phytonutrients among other metabolites associated with superior fruit quality. Genome-wide analyses revealed that both polyploidy and tandem gene duplications modified various pathways involved in the biosynthesis of key phytonutrients. Furthermore, gene expression analyses hint at the presence of a spatial-temporal specific dominantly expressed subgenome including during fruit development. Conclusions These findings and the reference genome will serve as a valuable resource to guide future genome-enabled breeding of important agronomic traits in highbush blueberry.

137 citations

Journal ArticleDOI
TL;DR: Investigation of natural variation in maize benzoxazinoid accumulation will have a major impact in this research area by leading to the discovery of previously unknown genes and functions of benzoxazinoid metabolism.
Abstract: Benzoxazinoids are a class of indole-derived plant metabolites that function in defense against numerous pests and pathogens. Due to their abundance in maize (Zea mays) and other important cereal crops, benzoxazinoids have been the subject of extensive research for >50 years. Whereas benzoxazinoids can account for 1% or more of the dry weight in young seedlings constitutively, their accumulation in older plants is induced locally by pest and pathogen attack. Although the biosynthetic pathways for most maize benzoxazinoids have been identified, unanswered questions remain about the developmental and defense-induced regulation of benzoxazinoid metabolism. Recent research shows that, in addition to their central role in the maize chemical defense repertoire, benzoxazinoids may have important functions in regulating other defense responses, flowering time, auxin metabolism, iron uptake and perhaps aluminum tolerance. Investigation of natural variation in maize benzoxazinoid accumulation, which is greatly facilitated by recent genomics advances, will have a major impact in this research area by leading to the discovery of previously unknown genes and functions of benzoxazinoid metabolism.

109 citations

References
Journal ArticleDOI
TL;DR: This version of MAFFT has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update.
Abstract: We report a major update of the MAFFT multiple sequence alignment program. This version has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update. This report shows actual examples to explain how these features work, alone and in combination. Some examples incorrectly aligned by MAFFT are also shown to clarify its limitations. We discuss how to avoid misalignments, and our ongoing efforts to overcome such limitations.

27,771 citations


"A Global Coexpression Network Appro..." refers methods in this paper

  • ...MAFFT multiple sequence alignment algorithm (Katoh and Standley, 2013);...

    [...]

  • ...Sequences were aligned and masked using the GUIDANCE2 server (Sela et al., 2015) using the codon setting and the MAFFT multiple sequence alignment algorithm (Katoh and Standley, 2013); residues with guidance scores <0.9 were masked....

    [...]
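The masking step quoted above (residues with guidance scores <0.9 were masked) can be sketched as follows. This is a minimal illustration under stated assumptions, not the GUIDANCE2 server's own procedure or file format: the function name, the per-position score list, and the choice of "N" as the mask character are hypothetical.

```python
def mask_low_confidence(aligned_seq, scores, cutoff=0.9, mask_char="N", gap_char="-"):
    """Mask aligned residues whose confidence score is below `cutoff`.

    `aligned_seq` is one row of an alignment and `scores` holds one
    (hypothetical) confidence value per column of that row. Gap
    characters are left untouched; residues scoring exactly at the
    cutoff are kept, matching the "<0.9" wording of the excerpt.
    """
    if len(aligned_seq) != len(scores):
        raise ValueError("sequence and score list must have the same length")
    masked = []
    for residue, score in zip(aligned_seq, scores):
        if residue != gap_char and score < cutoff:
            masked.append(mask_char)
        else:
            masked.append(residue)
    return "".join(masked)


# Hypothetical example row and scores.
row = "ATG-CA"
row_scores = [0.95, 0.5, 0.9, 0.1, 0.89, 1.0]
masked_row = mask_low_confidence(row, row_scores)
```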

Journal ArticleDOI
TL;DR: This work presents some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting post-analyses on sets of trees.
Abstract: Motivation: Phylogenies are increasingly used in all fields of medical and biological research. Moreover, because of the next-generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analyses of large datasets under maximum likelihood. Since the last RAxML paper in 2006, it has been continuously maintained and extended to accommodate the increasingly growing input datasets and to serve the needs of the user community. Results: I present some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting postanalyses on sets of trees. In addition, an up-to-date 50-page user manual covering all new RAxML options is available. Availability and implementation: The code is available under GNU

23,838 citations


"A Global Coexpression Network Appro..." refers methods in this paper

  • ...The gene phylogenies were inferred using maximum likelihood as implemented in RAxML version 8.0.25 (Stamatakis, 2014) using rapid bootstrapping (1000 replications) and a GTRGAMMAIX substitution model, which was the best model as indicated by the Bayesian Information Criterion in IQ-TREE version 1....

    [...]

  • ...25 (Stamatakis, 2014) using rapid bootstrapping (1000 replications) and a GTRGAMMAIX substitution model, which was the best model as indicated by the Bayesian Information Criterion in IQ-TREE version 1....

    [...]

Journal ArticleDOI
TL;DR: It is shown that a combination of hill-climbing approaches and a stochastic perturbation method can be time-efficiently implemented and found higher likelihoods between 62.2% and 87.1% of the studied alignments, thus efficiently exploring the tree-space.
Abstract: Large phylogenomics data sets require fast tree inference methods, especially for maximum-likelihood (ML) phylogenies. Fast programs exist, but due to inherent heuristics to find optimal trees, it is not clear whether the best tree is found. Thus, there is need for additional approaches that employ different search strategies to find ML trees and that are at the same time as fast as currently available ML programs. We show that a combination of hill-climbing approaches and a stochastic perturbation method can be time-efficiently implemented. If we allow the same CPU time as RAxML and PhyML, then our software IQ-TREE found higher likelihoods between 62.2% and 87.1% of the studied alignments, thus efficiently exploring the tree-space. If we use the IQ-TREE stopping rule, RAxML and PhyML are faster in 75.7% and 47.1% of the DNA alignments and 42.2% and 100% of the protein alignments, respectively. However, the range of obtaining higher likelihoods with IQ-TREE improves to 73.3-97.1%. IQ-TREE is freely available at http://www.cibiv.at/software/iqtree.

13,668 citations

Book
01 May 2015
TL;DR: An acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm, which computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment.
Abstract: Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the "multiple segment Viterbi" (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call "sparse rescaling". These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.

4,492 citations


"A Global Coexpression Network Appro..." refers background in this paper

  • ..., 2016) secondary biosynthesis pathways (hmmsearch using default inclusion thresholds; Eddy, 2011) (Supplemental Data Set 6)....

    [...]

  • ...…nonhomologous genes with a significant match to a curated list of Pfam domains present in experimentally verified (evidence = EV-EXP) genes assigned to MetaCyc (Caspi et al., 2016) secondary biosynthesis pathways (hmmsearch using default inclusion thresholds; Eddy, 2011) (Supplemental Data Set 6)....

    [...]

Journal ArticleDOI
TL;DR: The BioCyc PGDBs generated by SRI are offered for adoption by any interested party for the ongoing integration of metabolic and genome-related information about an organism.
Abstract: The MetaCyc database (MetaCyc.org) is a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. With more than 1400 pathways, MetaCyc is the largest collection of metabolic pathways currently available. Pathways reactions are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes, and literature citations. BioCyc (BioCyc.org) is a collection of more than 500 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs also contain additional features, such as predicted operons, transport systems, and pathway hole-fillers. The BioCyc Web site offers several tools for the analysis of the PGDBs, including Omics Viewers that enable visualization of omics datasets on two different genome-scale diagrams and tools for comparative analysis. The BioCyc PGDBs generated by SRI are offered for adoption by any party interested in curation of metabolic, regulatory, and genome-related information about an organism.

2,973 citations