Open Access | Journal Article

Data integration for plant genomics—exemplars from the integration of Arabidopsis thaliana databases

TLDR
A graph-based integration method (Ondex) is used to integrate metabolic pathway and protein interaction data for Arabidopsis thaliana, and the utility of this approach is demonstrated for the analysis of groups of coexpressed genes from an individual microarray experiment in the context of pathway information, and for the combination of coexpression data with an integrated protein interaction network.
Abstract
The development of a systems-based approach to problems in plant sciences requires integration of existing information resources. However, the available information is often incomplete and dispersed across many sources, and the syntactic and semantic heterogeneity of the data is a challenge for integration. In this article, we discuss strategies for data integration and use a graph-based integration method (Ondex) to illustrate some of these challenges with reference to two example problems concerning integration of (i) metabolic pathway and (ii) protein interaction data for Arabidopsis thaliana. We quantify the degree of overlap for three commonly used pathway and protein interaction information sources. For pathways, we find that the AraCyc database contains the widest coverage of enzyme reactions, and for protein interactions we find that the IntAct database provides the largest unique contribution to the integrated dataset. For both examples, however, we observe a relatively small amount of data common to all three sources. Analysis and visual exploration of the integrated networks were used to identify a number of practical issues relating to the interpretation of these datasets. We demonstrate the utility of these approaches for the analysis of groups of coexpressed genes from an individual microarray experiment in the context of pathway information, and for the combination of coexpression data with an integrated protein interaction network.
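The overlap quantification mentioned in the abstract can be pictured as a set comparison across sources. The following is a minimal sketch only, not the Ondex method itself: it assumes each source has already been reduced to a set of comparable identifiers, and the source names (`source_a`, `source_b`, `source_c`), the identifiers (`R1` to `R6`), and the helper `overlap_summary` are hypothetical placeholders rather than data or code from the article.

```python
from itertools import combinations


def overlap_summary(sources):
    """Map each combination of source names to the items found in
    exactly those sources and in no others (the regions of a Venn diagram)."""
    names = sorted(sources)
    summary = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Items present in every source of this combination.
            inside = set.intersection(*(sources[n] for n in combo))
            # Items present in any source outside this combination.
            others = [sources[n] for n in names if n not in combo]
            outside = set().union(*others)
            summary[combo] = inside - outside
    return summary


# Hypothetical toy identifiers; real sources would first need their
# identifiers mapped onto a shared vocabulary before comparison.
sources = {
    "source_a": {"R1", "R2", "R3", "R4"},
    "source_b": {"R2", "R3", "R5"},
    "source_c": {"R3", "R4", "R6"},
}

summary = overlap_summary(sources)
for combo, items in summary.items():
    print(" & ".join(combo), "->", sorted(items))
```

In this sketch the entry for a single source is its unique contribution, and the entry for all three names is the portion common to every source. The hard part in practice is the step the sketch assumes away: reconciling syntactically and semantically different records so that equivalent entries compare as equal.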

Citations

Representing and querying disease networks using graph databases

TL;DR: This study suggests that graph databases provide a flexible solution for the integration of multiple types of biological data and facilitate exploratory data mining to support hypothesis generation.

Coexpression landscape in ATTED-II: usage of gene list and gene network for various types of pathways.

TL;DR: This review describes recent successful examples obtained using the gene coexpression database ATTED-II, including the identification of new genes such as the subunits of a complex protein, the enzymes in a metabolic pathway, and transporters.

EnzML: multi-label prediction of enzyme classes using InterPro signatures

TL;DR: InterPro signatures are a compact and powerful attribute space for the prediction of enzymatic function that makes multi-label machine learning feasible in reasonable time using the Mulan Binary Relevance Nearest Neighbours algorithm implementation (BR-kNN).

Mapping Plant Interactomes Using Literature Curated and Predicted Protein–Protein Interaction Data Sets

TL;DR: The plant science community is informed of the currently available sources of protein interaction data and how they can be useful to researchers, and efforts to add value to the interaction data are presented.

A Review of the “Omics” Approach to Biomarkers of Oxidative Stress in Oryza sativa

TL;DR: This review highlights recent breakthroughs in molecular strategies (comprising transcriptomics, proteomics, and metabolomics) for identifying oxidative stress biomarkers, and the opportunities and obstacles arising in research on biomarkers in rice.