Data integration for plant genomics—exemplars from the integration of Arabidopsis thaliana databases

doi:10.1093/BIB/BBP047

Open Access journal article

Artem Lysenko, +4 more

- 01 Nov 2009 -

Briefings in Bioinformatics

- Vol. 10, Iss: 6, pp 676-693

TLDR

A graph-based integration method (Ondex) is used, and the utility of the approach is demonstrated in the analysis of groups of coexpressed genes from an individual microarray experiment in the context of pathway information, and in the combination of coexpression data with an integrated protein interaction network.

Abstract:

The development of a systems-based approach to problems in plant sciences requires integration of existing information resources. However, the available information is often incomplete and dispersed across many sources, and the syntactic and semantic heterogeneity of the data is a challenge for integration. In this article, we discuss strategies for data integration and use a graph-based integration method (Ondex) to illustrate some of these challenges with reference to two example problems concerning integration of (i) metabolic pathway and (ii) protein interaction data for Arabidopsis thaliana. We quantify the degree of overlap for three commonly used pathway and protein interaction information sources. For pathways, we find that the AraCyc database contains the widest coverage of enzyme reactions, and for protein interactions we find that the IntAct database provides the largest unique contribution to the integrated dataset. For both examples, however, we observe a relatively small amount of data common to all three sources. Analysis and visual exploration of the integrated networks were used to identify a number of practical issues relating to the interpretation of these datasets. We demonstrate the utility of these approaches for the analysis of groups of coexpressed genes from an individual microarray experiment in the context of pathway information, and for the combination of coexpression data with an integrated protein interaction network.
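
The overlap quantification mentioned in the abstract can be illustrated with a minimal sketch. This is not the Ondex method or the authors' code: it assumes each source has already been reduced to a set of comparable identifiers, and the source names (`source_a`, `source_b`, `source_c`), the identifiers, and the helper `overlap_regions` are all hypothetical, chosen only to show how entries shared by all sources differ from entries unique to one.

```python
def overlap_regions(sources):
    """Group identifiers by the exact combination of sources that contain them.

    `sources` maps a source name to a set of identifiers. The result maps a
    frozenset of source names to the identifiers found in exactly those sources
    (the regions of a Venn diagram).
    """
    regions = {}
    all_ids = set().union(*sources.values())
    for identifier in all_ids:
        members = frozenset(
            name for name, ids in sources.items() if identifier in ids
        )
        regions.setdefault(members, set()).add(identifier)
    return regions


# Toy, made-up data: these are NOT values from the paper.
sources = {
    "source_a": {"P1", "P2", "P3", "P4"},
    "source_b": {"P3", "P4", "P5"},
    "source_c": {"P4", "P6"},
}

regions = overlap_regions(sources)

# Report each region, smallest source combinations first.
for members, ids in sorted(
    regions.items(), key=lambda item: (len(item[0]), sorted(item[0]))
):
    print(sorted(members), len(ids))
```

The region keyed by all three source names corresponds to data common to every source, while each single-name region is one source's unique contribution. In practice the hard part is the step assumed away here: mapping heterogeneous records onto shared identifiers before comparing them.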

Citations

Representing and querying disease networks using graph databases

Artem Lysenko, +5 more

- 25 Jul 2016 -

Biodata Mining

TL;DR: This study suggests that graph databases provide a flexible solution for the integration of multiple types of biological data and facilitate exploratory data mining to support hypothesis generation.

Coexpression landscape in ATTED-II: usage of gene list and gene network for various types of pathways.

Takeshi Obayashi, +1 more

- 10 Apr 2010 -

Journal of Plant Research

TL;DR: This review of recent successful examples obtained by using the gene coexpression database, ATTED-II, will describe the identification of new genes, such as the subunits of a complex protein, the enzymes in a metabolic pathway and transporters.

EnzML: multi-label prediction of enzyme classes using InterPro signatures

Luna De Ferrari, +4 more

- 25 Apr 2012 -

BMC Bioinformatics

TL;DR: InterPro signatures are a compact and powerful attribute space for the prediction of enzymatic function that makes multi-label machine learning feasible in reasonable time using the Mulan Binary Relevance Nearest Neighbours algorithm implementation (BR-kNN).

Mapping Plant Interactomes Using Literature Curated and Predicted Protein–Protein Interaction Data Sets

Ki-Young Lee, +5 more

- 01 Apr 2010 -

The Plant Cell

TL;DR: The plant science community is informed of the currently available sources of protein interaction data and how they can be useful to researchers and efforts to add value to the interaction data are presented.

A Review of the “Omics” Approach to Biomarkers of Oxidative Stress in Oryza sativa

Nyuk Ling Ma, +2 more

- 08 Apr 2013 -

International Journal of Molecular Scien...

TL;DR: This review highlights the recent breakthrough in molecular strategies (comprising transcriptomics, proteomics, and metabolomics) in identifying oxidative stress biomarkers and the arising opportunities and obstacles observed in research on biomarkers in rice.

References

Basic Local Alignment Search Tool

Stephen F. Altschul, +4 more

- 01 Oct 1990 -

Journal of Molecular Biology

TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks

Paul Shannon, +8 more

- 01 Nov 2003 -

Genome Research

TL;DR: Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.

Bioconductor: open software development for computational biology and bioinformatics

Robert Gentleman, +24 more

- 15 Sep 2004 -

Genome Biology

TL;DR: Details of the aims and methods of Bioconductor, the collaborative creation of extensible software for computational biology and bioinformatics, and current challenges are described.

Database resources of the National Center for Biotechnology Information

David L. Wheeler, +12 more

- 01 Jan 2004 -

Nucleic Acids Research

TL;DR: In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI’s website.

UniProt: the Universal Protein knowledgebase

Rolf Apweiler, +14 more

- 01 Jan 2004 -

Nucleic Acids Research

TL;DR: The Swiss-Prot, TrEMBL and PIR protein database activities have united to form the Universal Protein Knowledgebase (UniProt), which is to provide a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and query interfaces.

Related Papers (5)

Graph-based analysis and visualization of experimental results with ONDEX

Jacob Köhler, +8 more

- 01 Jun 2006 -

Bioinformatics

MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource for plant genomics

Heiko Schoof, +5 more

- 01 Jan 2004 -

Nucleic Acids Research

CORNET 2.0: integrating plant coexpression, protein–protein interactions, regulatory interactions, gene associations and functional annotations

Stefanie De Bodt, +4 more

- 01 Aug 2012 -

New Phytologist

Gene Ontology: tool for the unification of biology

The IntAct molecular interaction database in 2010