Journal Article

Enabling Massive XML-Based Biological Data Management in HBase

TL;DR: This study reports a novel platform to store and query massive XML-based biological data collections and proposes a formal approach to transform the XML query model into the MapReduce query model.
Abstract
Publishing biological data in XML formats is attractive for organizations that wish to provide their bioinformatics resources in an extensible and machine-readable format. In the era of big data, managing massive XML-based biological data has emerged as a challenging issue. As XML-based biological data sets continue to grow, traditional declarative query languages often fail to provide efficient query capabilities in terms of processing speed and scale. In this study, we report a novel platform to store and query massive XML-based biological data collections. We first develop a prototype tool for constructing HBase tables from XML-based biological data collections, and then propose a formal approach to transform the XML query model into the MapReduce query model. Finally, we evaluate the query performance of the proposed approach on existing XML-based biological databases, and the results demonstrate the performance advantages of the proposed solution. The source code of the massive XML-based biological data management platform is freely available at https://github.com/lyotvincent/X2H.
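The two steps named in the abstract (building HBase tables from XML, then answering XML path queries with MapReduce) can be illustrated with a small in-memory Python sketch. It is not the X2H implementation: the row layout (one row per record keyed by a hypothetical `accession` attribute, a single column family `d`, column qualifiers formed from slash-separated element paths), the helper names, and the sample records are all assumptions made for illustration, and the map and reduce phases run in-process rather than on Hadoop or a live HBase cluster.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout (an assumption, not X2H's schema): one row per record,
# one column family "d", column qualifiers built from slash-separated element paths.
Row = Dict[str, str]
Table = Dict[str, Row]


def _flatten(elem: ET.Element, path: str, row: Row) -> None:
    """Recursively copy attributes and leaf text of elem into row."""
    for name, value in elem.attrib.items():
        row[f"d:{path}/@{name}"] = value  # e.g. "d:entry/@accession"
    children = list(elem)
    text = (elem.text or "").strip()
    if text and not children:
        # Repeated leaf paths are joined with "|"; text in mixed content is ignored.
        col = f"d:{path}"
        row[col] = f"{row[col]}|{text}" if col in row else text
    for child in children:
        _flatten(child, f"{path}/{child.tag}", row)


def xml_to_rows(xml_text: str, record_tag: str, key_attr: str) -> Table:
    """Build an HBase-style table (row key -> {column: value}) from XML records."""
    root = ET.fromstring(xml_text.strip())
    table: Table = {}
    for record in root.iter(record_tag):
        row_key = record.get(key_attr)
        if row_key is None:
            continue  # records without a key cannot be addressed as rows
        row: Row = {}
        _flatten(record, record.tag, row)
        table[row_key] = row
    return table


def path_to_column(xpath: str) -> str:
    """Translate a simple absolute path such as /entry/organism/name into a column name."""
    return "d:" + xpath.strip("/")


def map_phase(table: Table, column: str) -> Iterable[Tuple[str, str]]:
    """Mapper: emit (value, row_key) for each value stored in the queried column."""
    for row_key, row in table.items():
        if column in row:
            for value in row[column].split("|"):
                yield value, row_key


def reduce_phase(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Shuffle and reduce: group row keys that share the same emitted value."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for value, row_key in pairs:
        groups[value].append(row_key)
    return {value: sorted(keys) for value, keys in groups.items()}


def run_query(table: Table, xpath: str) -> Dict[str, List[str]]:
    """Evaluate a simple path query as one map phase followed by one reduce phase."""
    return reduce_phase(map_phase(table, path_to_column(xpath)))


# Simplified, made-up records; not the real UniProt XML schema.
SAMPLE = """
<uniprot>
  <entry accession="P1"><name>ProtA</name><organism><name>Homo sapiens</name></organism></entry>
  <entry accession="P2"><name>ProtB</name><organism><name>Mus musculus</name></organism></entry>
  <entry accession="P3"><name>ProtC</name><organism><name>Homo sapiens</name></organism></entry>
</uniprot>
"""

if __name__ == "__main__":
    rows = xml_to_rows(SAMPLE, "entry", "accession")
    print(run_query(rows, "/entry/organism/name"))
    # prints {'Homo sapiens': ['P1', 'P3'], 'Mus musculus': ['P2']}
```

Running the script groups the sample entries by organism name. In an actual deployment the table would be stored in HBase and the mapper would scan it inside a MapReduce job; the in-process functions here only mirror that division of work between mapping rows to key/value pairs and reducing the grouped pairs.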

Citations

NeoPeptide: an immunoinformatic database of T-cell-defined neoantigens.

TL;DR: This work has developed a web-accessible database, called NeoPeptide, which contains most of the important characteristics of neoantigens derived from published literature and other immunological resources and provides links to resources for further characterization of the novel features of these neoantigens.

Predicting the Disease Genes of Multiple Sclerosis Based on Network Representation Learning.

TL;DR: The proposed framework contains three main steps: capturing the topological structure of the PPI network using NRL-based methods, encoding learned features into low-dimensional space using a stacked autoencoder, and training a support vector machine (SVM) classifier to predict disease-related genes.

An effective biomedical data migration tool from resource description framework to JSON

TL;DR: An effective mapping tool that allows data migration from RDF to JSON, supporting future massive data explosions and releases, is presented; the resulting user-friendly tool, called RDF2JSON, automates RDF data extraction and the corresponding JSON data generation.

Jointly Integrating VCF-Based Variants and OWL-Based Biomedical Ontologies in MongoDB

TL;DR: This paper proposes a series of rules for the mapping from VCF and OWL files to JSON files, presents rule-based algorithms for transforming VCF-based genetic variants and OWL-based biological ontologies into JSON objects, and introduces effective approaches for integrating the mapped JSON files in MongoDB.

Uncertainty Modeling of Object-Oriented Biomedical Information in HBase

TL;DR: This work proposes a formal approach for reengineering fuzzy object-oriented databases in HBase using the technique of rule-based schema mapping, and a formal approach to map the fuzzy object-oriented algebra into a fuzzy HBase algebra.