Journal ArticleDOI

Genome-Wide Prediction of Metabolic Enzymes, Pathways, and Gene Clusters in Plants

TL;DR: A computational pipeline is presented to identify metabolic enzymes, pathways, and gene clusters from a sequenced genome; the resulting patterns of genome organization implicate local gene duplication and, to a lesser extent, single gene transposition in the evolution of plant metabolic gene clusters.
Abstract: Plant metabolism underpins many traits of ecological and agronomic importance. Plants produce numerous compounds to cope with their environments but the biosynthetic pathways for most of these compounds have not yet been elucidated. To engineer and improve metabolic traits, we need comprehensive and accurate knowledge of the organization and regulation of plant metabolism at the genome scale. Here, we present a computational pipeline to identify metabolic enzymes, pathways, and gene clusters from a sequenced genome. Using this pipeline, we generated metabolic pathway databases for 22 species and identified metabolic gene clusters from 18 species. This unified resource can be used to conduct a wide array of comparative studies of plant metabolism. Using the resource, we discovered a widespread occurrence of metabolic gene clusters in plants: 11,969 clusters from 18 species. The prevalence of metabolic gene clusters offers an intriguing possibility of an untapped source for uncovering new metabolite biosynthesis pathways. For example, more than 1,700 clusters contain enzymes that could generate a specialized metabolite scaffold (signature enzymes) and enzymes that modify the scaffold (tailoring enzymes). In four species with sufficient gene expression data, we identified 43 highly coexpressed clusters that contain signature and tailoring enzymes, of which eight were characterized previously to be functional pathways. Finally, we identified patterns of genome organization that implicate local gene duplication and, to a lesser extent, single gene transposition as having played roles in the evolution of plant metabolic gene clusters.
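
As a rough illustration of what identifying a "metabolic gene cluster" can mean computationally, the sketch below scans genes in chromosomal order and groups predicted enzymes that lie close together. The `Gene` record, the `max_gap` limit on intervening non-enzyme genes, and the `min_enzymes` cutoff are hypothetical choices made for this example; the abstract does not state the pipeline's actual criteria.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    gene_id: str
    is_enzyme: bool  # True if the gene is predicted to encode a metabolic enzyme


def find_clusters(genes: List[Gene], max_gap: int = 3, min_enzymes: int = 3) -> List[List[str]]:
    """Group enzymes separated by at most `max_gap` non-enzyme genes.

    `genes` must be ordered by position on a single chromosome. Both
    thresholds are illustrative assumptions, not values from the paper.
    """
    clusters: List[List[str]] = []
    current: List[str] = []
    gap = 0  # number of consecutive non-enzyme genes seen since the last enzyme
    for gene in genes:
        if gene.is_enzyme:
            current.append(gene.gene_id)
            gap = 0
        else:
            gap += 1
            if gap > max_gap:
                # The run of enzymes is broken; keep it only if it is large enough.
                if len(current) >= min_enzymes:
                    clusters.append(current)
                current = []
    if len(current) >= min_enzymes:
        clusters.append(current)
    return clusters
```

A real pipeline would also need per-chromosome handling and some statistical or functional filter (for example, the signature and tailoring enzyme distinction mentioned in the abstract); this sketch covers only the physical-proximity step.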
Citations
Journal ArticleDOI
TL;DR: In the past two years, PubChem made substantial improvements, including a data model change for the data objects used by these pages as well as by programmatic users, and several new services were introduced.
Abstract: PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves the scientific community as well as the general public, with millions of unique users per month. In the past two years, PubChem made substantial improvements. Data from more than 100 new data sources were added to PubChem, including chemical-literature links from Thieme Chemistry, chemical and physical property links from SpringerMaterials, and patent links from the World Intellectual Properties Organization (WIPO). PubChem's homepage and individual record pages were updated to help users find desired information faster. This update involved a data model change for the data objects used by these pages as well as by programmatic users. Several new services were introduced, including the PubChem Periodic Table and Element pages, Pathway pages, and Knowledge panels. Additionally, in response to the coronavirus disease 2019 (COVID-19) outbreak, PubChem created a special data collection that contains PubChem data related to COVID-19 and the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

1,791 citations

Journal ArticleDOI
TL;DR: This article provides an update on the developments in MetaCyc during the past two years, including the expansion of data and addition of new features.
Abstract: MetaCyc (https://MetaCyc.org) is a comprehensive reference database of metabolic pathways and enzymes from all domains of life. It contains more than 2570 pathways derived from >54 000 publications, making it the largest curated collection of metabolic pathways. The data in MetaCyc is strictly evidence-based and richly curated, resulting in an encyclopedic reference tool for metabolism. MetaCyc is also used as a knowledge base for generating thousands of organism-specific Pathway/Genome Databases (PGDBs), which are available in the BioCyc (https://BioCyc.org) and other PGDB collections. This article provides an update on the developments in MetaCyc during the past two years, including the expansion of data and addition of new features.

657 citations

Journal ArticleDOI
TL;DR: This article provides an update on the developments in MetaCyc during September 2017 to August 2019, up to version 23.1, which includes modifications to the GlycanBuilder software that enable displaying glycans using symbolic representation, improved graphics and fonts for web displays, and improvements in the PathoLogic component of Pathway Tools.
Abstract: MetaCyc (MetaCyc.org) is a comprehensive reference database of metabolic pathways and enzymes from all domains of life. It contains 2749 pathways derived from more than 60 000 publications, making it the largest curated collection of metabolic pathways. The data in MetaCyc are evidence-based and richly curated, resulting in an encyclopedic reference tool for metabolism. MetaCyc is also used as a knowledge base for generating thousands of organism-specific Pathway/Genome Databases (PGDBs), which are available in BioCyc.org and other genomic portals. This article provides an update on the developments in MetaCyc during September 2017 to August 2019, up to version 23.1. Some of the topics that received intensive curation during this period include cobamides biosynthesis, sterol metabolism, fatty acid biosynthesis, lipid metabolism, carotenoid metabolism, protein glycosylation, antibiotics and cytotoxins biosynthesis, siderophore biosynthesis, bioluminescence, vitamin K metabolism, brominated compound metabolism, plant secondary metabolism and human metabolism. Other additions include modifications to the GlycanBuilder software that enable displaying glycans using symbolic representation, improved graphics and fonts for web displays, improvements in the PathoLogic component of Pathway Tools, and the optional addition of regulatory information to pathway diagrams.

476 citations

Journal ArticleDOI
TL;DR: The authors envisage that this document will serve as a guide to metabolomics researchers and other members of the community wishing to perform multi-omics studies, and believe that these ideas may allow the full promise of integrated multi-omics research and, ultimately, of systems biology to be realized.
Abstract: The use of multiple omics techniques (i.e., genomics, transcriptomics, proteomics, and metabolomics) is becoming increasingly popular in all facets of life science. Omics techniques provide a more holistic molecular perspective of studied biological systems compared to traditional approaches. However, due to their inherent data differences, integrating multiple omics platforms remains an ongoing challenge for many researchers. As metabolites represent the downstream products of multiple interactions between genes, transcripts, and proteins, metabolomics, and the tools and approaches routinely used in this field, could assist with the integration of these complex multi-omics data sets. The question is, how? Here we provide some answers (in terms of methods, software tools and databases) along with a variety of recommendations and a list of continuing challenges as identified during a peer session on multi-omics integration that was held at the recent 'Australian and New Zealand Metabolomics Conference' (ANZMET 2018) in Auckland, New Zealand (Sept. 2018). We envisage that this document will serve as a guide to metabolomics researchers and other members of the community wishing to perform multi-omics studies. We also believe that these ideas may allow the full promise of integrated multi-omics research and, ultimately, of systems biology to be realized.

356 citations


Cites background or methods from "Genome-Wide Prediction of Metabolic..."

  • ...Name to ID conversion | Open | [41]
    BiofOmics | Transcriptomics, Proteomics, Metabolomics | Biofilm | Experiment library; Data depository | Open | [113]
    BioCyc/MetaCyc | Genomics, Proteomics, Metabolomics | Unspecified | Online encyclopedia of metabolism; Predicted metabolic pathways in sequenced genomes; Enzyme data set; Metabolite database | Open | [111]
    Cell Illustrator 5.0 | Genomic, Transcriptomics, Proteomics | Unspecified | Draw biological pathway models and simulations; Run biological cellular simulations and graphical display results | Licensed | [74]
    CellML (Open source XML language) | Transcriptomics, Proteomics, Metabolomics | Unspecified | Open source language for biological cellular models | Open | [73]
    COBRA | Transcriptomics, Proteomics, Metabolomics, Fluxomics | Unspecified | Genome scale integrated modeling of cell metabolism and macro-molecular expression | Open | [12,79]
    Metabolites 2019, 9, 76, Table 2....

    [...]

  • ...Plant Metabolic Network (PMN) | Genomics, Proteomics, Metabolomics | Plants | Plant-specific database containing pathways, enzymes, reactions, and compounds | Open | [100]...

    [...]

  • ...Software Tool | Omics Integrated | Domain | Functionality | Type of license | Reference
    MADMAX (Management and analysis database for multiple omics experiments) | Metagenomics, Transcriptomics, Metabolomics | Plants, Medical and Clinical | Integrates omics data; Statistical analysis and pathway mapping | Open | [123]
    MapMan | Metagenomics, Transcriptomics, Metabolomics | Plants (developed for use with Arabidopsis....

    [...]

  • ...Rapidly design new pathway maps based on user data and genome-scale models; Visualize data related to genes or proteins on the associated reactions and pathways; Identify trends in common genomic data types | Open (MIT license) | [116]
    Gaggle | Variety of omics platform bioinformatics solutions | Unspecified | Interoperability of the following tools: Bioinformatics resource manager, Cytoscape, DataMatrixViewer, KEGG, Genome Browser, MeV, PIPE, BioTapestry, N-Browse | Open | [117]...

    [...]

  • ...Software Tool | Omics Integrated | Domain | Functionality | Type of license | Reference
    GIM3E (Gene Inactivation Moderated by Metabolism, Metabolomics and Expression) | Transcriptomics, Metabolomics | Unspecified | Establishes metabolite use requirements with metabolomics data; Model-paired transcriptomics data to find experimentally supported solutions; Calculates the turnover (production/consumption) flux of metabolites | Open; Python based and requires COBRApy 0.2.x. | [118]
    INMEX (Integrative meta-analysis of expression data) | Transcriptomics, Metabolomics | Medical and Clinical | Meta and integrative analysis of data; Pathway analysis | Open | [119]
    IMPaLA (Integrated Molecular Pathway Level Analysis) | Transcriptomics, Proteomics, Metabolomics | Medical and clinical | Enrichment analysis; Pathway analysis | Academic only | [120]
    Ingenuity Pathway Analysis | Metagenomics, Transcriptomics, Proteomics, Metabolomics | Medical (human) and clinical....

    [...]

Journal ArticleDOI
TL;DR: This work reports the rose whole-genome sequencing and assembly, resequences major genotypes that contributed to rose domestication, and proposes a model of interconnected regulation of scent and flower color, providing a foundation for understanding the mechanisms governing rose traits.
Abstract: Roses have high cultural and economic importance as ornamental plants and in the perfume industry. We report the rose whole-genome sequencing and assembly and resequencing of major genotypes that contributed to rose domestication. We generated a homozygous genotype from a heterozygous diploid modern rose progenitor, Rosa chinensis ‘Old Blush’. Using single-molecule real-time sequencing and a meta-assembly approach, we obtained one of the most comprehensive plant genomes to date. Diversity analyses highlighted the mosaic origin of ‘La France’, one of the first hybrids combining the growth vigor of European species and the recurrent blooming of Chinese species. Genomic segments of Chinese ancestry identified new candidate genes for recurrent blooming. Reconstructing regulatory and secondary metabolism pathways allowed us to propose a model of interconnected regulation of scent and flower color. This genome provides a foundation for understanding the mechanisms governing rose traits and should accelerate improvement in roses, Rosaceae and ornamentals.

292 citations

References
Journal ArticleDOI
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations


"Genome-Wide Prediction of Metabolic..." refers methods in this paper

  • ...Then, protein sequences were BLASTed against the Phytozome protein sequences, all splicing variants included, using BLAST+ 2.2.28+ (Camacho et al., 2009)....

    [...]

  • ...Trends Plant Sci 19: 447–459 Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications....

    [...]

  • ...Pairwise sequence comparisons are performed using BLAST (Altschul et al., 1990; e value threshold ≤ 1e-2) against RPSD v3....

    [...]

  • ...Protein sequences from a genome are first subjected to an all-against-all BLAST followed by clustering using the Markov cluster algorithm (Enright et al., 2002) with the inflation value (cluster granularity parameter) set to 2....

    [...]

  • ...Please follow the restrictions of BLAST and PRIAM....

    [...]
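
The BLAST excerpts above mention an e-value threshold of 1e-2. A minimal sketch of applying such a cutoff is shown below, assuming the search results were written in BLAST+ tabular format (`-outfmt 6`, whose default 12 columns place the e-value in column 11). The choice of output format, the reading of the threshold as an inclusive upper bound, and the function name `filter_hits` are assumptions for illustration; the excerpts do not say how the authors post-processed their results.

```python
from typing import Iterable, List, Tuple

E_VALUE_THRESHOLD = 1e-2  # value quoted in the excerpt; treated here as an upper bound


def filter_hits(lines: Iterable[str], max_evalue: float = E_VALUE_THRESHOLD) -> List[Tuple[str, str, float]]:
    """Return (query, subject, evalue) for hits with evalue <= max_evalue.

    Expects the default 12-column layout of `-outfmt 6`:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (as produced by -outfmt 7)
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue <= max_evalue:
            hits.append((fields[0], fields[1], evalue))
    return hits
```

In practice the same cutoff can be applied at search time with the `-evalue` option of the BLAST+ programs, which avoids writing weak hits to disk at all.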

Journal ArticleDOI
TL;DR: The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences.
Abstract: Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications. We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site. The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.

13,223 citations


"Genome-Wide Prediction of Metabolic..." refers background or methods in this paper

  • ...…database by filtering for those sequences annotated with (1) a four-part EC number (IUBMB, 1992), (2) a MetaCyc reaction identifier in MetaCyc (Caspi et al., 2014) or PlantCyc (Zhang et al., 2010), or (3) a leaf-node Gene Ontology (Blake et al., 2013) term under catalytic activity…...

    [...]

  • ...To compile the list of signature enzyme reactions, we extracted MetaCyc (Caspi et al., 2014) reactions producing terpenes from these three building blocks....

    [...]

  • ...0, catalytic functions are defined as EF classes, which are based on either four-part EC numbers or MetaCyc reaction identifiers (Caspi et al., 2014)....

    [...]

  • ...Specialized metabolism was classified into subdomains based on the metabolites they produce or metabolize (Wink, 2010; Caspi et al., 2014)....

    [...]

  • ...1 was compiled from manually curated or experimentally supported data in SwissProt (UniProt Consortium, 2011; November 2014 release), BRENDA (Schomburg et al., 2013; November 2014 release), MetaCyc (Caspi et al., 2014; November 2014 release), and PlantCyc (Zhang et al., 2010; November 2014 release)....

    [...]
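
Several excerpts above rely on "four-part" EC numbers, meaning all four levels of the Enzyme Commission hierarchy are specified. The sketch below shows one way such a filter could be written. The regular expression and the function name `is_four_part_ec` are illustrative assumptions; in particular, this version rejects partial numbers such as `1.1.1.-` and also preliminary numbers such as `1.1.1.n1`, which may or may not match the authors' criteria.

```python
import re

# Four dot-separated integers, optionally prefixed by "EC", e.g. "1.1.1.1" or "EC 2.7.11.1".
FOUR_PART_EC = re.compile(r"^(?:EC\s*)?\d+\.\d+\.\d+\.\d+$")


def is_four_part_ec(ec: str) -> bool:
    """Return True if `ec` specifies all four EC levels as integers."""
    return bool(FOUR_PART_EC.match(ec.strip()))
```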

Journal ArticleDOI
14 Dec 2000-Nature
TL;DR: This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.
Abstract: The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans--the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.

8,742 citations

Journal ArticleDOI
TL;DR: The relationship between butterflies and their food plants is investigated as a model of coevolution, through the examination of patterns of interaction between two major groups of organisms with a close and evident ecological relationship, such as plants and herbivores.
Abstract: One of the least understood aspects of population biology is community evolution: the evolutionary interactions found among different kinds of organisms where exchange of genetic information among the kinds is assumed to be minimal or absent. Studies of community evolution have, in general, tended to be narrow in scope and to ignore the reciprocal aspects of these interactions. Indeed, one group of organisms is all too often viewed as a kind of physical constant. In an extreme example a parasitologist might not consider the evolutionary history and responses of hosts, while a specialist in vertebrates might assume species of vertebrate parasites to be invariate entities. This viewpoint is one factor in the general lack of progress toward the understanding of organic diversification. One approach to what we would like to call coevolution is the examination of patterns of interaction between two major groups of organisms with a close and evident ecological relationship, such as plants and herbivores. The considerable amount of information available about butterflies and their food plants makes them particularly suitable for these investigations. Further, recent detailed investigations have provided a relatively firm basis for statements about the phenetic relationships of the various higher groups of Papilionoidea (Ehrlich, 1958, and unpubl.). It should, however, be remembered that we are considering the butterflies as a model. They are only one of the many groups of herbivorous organisms coevolving with plants. In this paper, we shall investigate the relationship between butterflies and their food

3,932 citations


"Genome-Wide Prediction of Metabolic..." refers methods in this paper

  • ...Protein sequences from a genome are first subjected to an all-against-all BLAST followed by clustering using the Markov cluster algorithm (Enright et al., 2002) with the inflation value (cluster granularity parameter) set to 2....

    [...]
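
The clustering step quoted above (all-against-all BLAST followed by the Markov cluster algorithm with inflation 2) can be sketched as follows. The code turns pairwise hits into the label-based "abc" input that the `mcl` program accepts and builds the corresponding command line. Using -log10(e-value) as the edge weight, capping it at 200, and the helper names `to_abc_lines` and `mcl_command` are assumptions for illustration; the excerpt specifies only the inflation value.

```python
import math
from typing import Iterable, List, Tuple


def to_abc_lines(hits: Iterable[Tuple[str, str, float]], cap: float = 200.0) -> List[str]:
    """Convert (query, subject, evalue) hits to 'A<TAB>B<TAB>weight' lines.

    Weights are -log10(evalue), clamped to the range [0, cap] so that an
    e-value of 0 still gives a finite weight. Self-hits are skipped.
    """
    lines = []
    for query, subject, evalue in hits:
        if query == subject:
            continue
        if evalue <= 0:
            weight = cap
        else:
            weight = max(0.0, min(cap, -math.log10(evalue)))
        lines.append(f"{query}\t{subject}\t{weight:.2f}")
    return lines


def mcl_command(abc_path: str, out_path: str, inflation: float = 2.0) -> List[str]:
    """Build an mcl invocation; the default inflation of 2 follows the excerpt."""
    return ["mcl", abc_path, "--abc", "-I", str(inflation), "-o", out_path]
```

The command list can be passed to `subprocess.run` once the abc lines have been written to `abc_path`. Higher inflation values generally produce finer-grained clusters, which is why the excerpt describes it as the cluster granularity parameter.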

Journal ArticleDOI
TL;DR: Phytozome provides a view of the evolutionary history of every plant gene at the level of sequence, gene structure, gene family and genome organization, while at the same time providing access to the sequences and functional annotations of a growing number of complete plant genomes.
Abstract: The number of sequenced plant genomes and associated genomic resources is growing rapidly with the advent of both an increased focus on plant genomics from funding agencies, and the application of inexpensive next generation sequencing. To interact with this increasing body of data, we have developed Phytozome (http://www.phytozome.net), a comparative hub for plant genome and gene family data and analysis. Phytozome provides a view of the evolutionary history of every plant gene at the level of sequence, gene structure, gene family and genome organization, while at the same time providing access to the sequences and functional annotations of a growing number (currently 25) of complete plant genomes, including all the land plants and selected algae sequenced at the Joint Genome Institute, as well as selected species sequenced elsewhere. Through a comprehensive plant genome database and web portal, these data and analyses are available to the broader plant science research community, providing powerful comparative genomics tools that help to link model systems with other plants of economic and ecological importance.

3,728 citations