Data integration for plant genomics—exemplars from the integration of Arabidopsis thaliana databases
TLDR
A graph-based integration method (Ondex) is used, and its utility is demonstrated for the analysis of groups of coexpressed genes from an individual microarray experiment in the context of pathway information, and for the combination of coexpression data with an integrated protein interaction network.
Abstract:
The development of a systems-based approach to problems in plant sciences requires integration of existing information resources. However, the available information is often incomplete and dispersed across many sources, and the syntactic and semantic heterogeneity of the data is a challenge for integration. In this article, we discuss strategies for data integration and use a graph-based integration method (Ondex) to illustrate some of these challenges with reference to two example problems concerning integration of (i) metabolic pathway and (ii) protein interaction data for Arabidopsis thaliana. We quantify the degree of overlap for three commonly used pathway and protein interaction information sources. For pathways, we find that the AraCyc database contains the widest coverage of enzyme reactions, and for protein interactions we find that the IntAct database provides the largest unique contribution to the integrated dataset. For both examples, however, we observe a relatively small amount of data common to all three sources. Analysis and visual exploration of the integrated networks were used to identify a number of practical issues relating to the interpretation of these datasets. We demonstrate the utility of these approaches for the analysis of groups of coexpressed genes from an individual microarray experiment in the context of pathway information, and for the combination of coexpression data with an integrated protein interaction network.
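The overlap quantification mentioned in the abstract can be illustrated with a minimal sketch. It assumes each source has already been reduced to a set of comparable identifiers, which is the hard part of integration and is not shown here. The function name `overlap_regions`, the source names and the identifiers are all hypothetical and are not taken from the article or from Ondex.

```python
def overlap_regions(sources):
    """Group identifiers by the exact combination of sources containing them.

    `sources` maps a source name to a set of identifiers.
    Returns a dict keyed by a sorted tuple of source names.
    """
    regions = {}
    all_ids = set().union(*sources.values())
    for identifier in all_ids:
        # Names of every source in which this identifier occurs.
        members = tuple(
            sorted(name for name, ids in sources.items() if identifier in ids)
        )
        regions.setdefault(members, set()).add(identifier)
    return regions


# Toy identifiers invented for illustration only; not data from the article.
sources = {
    "source_a": {"r1", "r2", "r3", "r4"},
    "source_b": {"r2", "r3", "r5"},
    "source_c": {"r3", "r4", "r6"},
}

regions = overlap_regions(sources)
counts = {members: len(ids) for members, ids in regions.items()}

for members, n in sorted(counts.items()):
    print(" & ".join(members), n)
```

Each key corresponds to one region of a three-set Venn diagram, so the entry for all three source names gives the amount of data common to every source, and the single-name entries give each source's unique contribution.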