Journal Article
Enabling Massive XML-Based Biological Data Management in HBase
TL;DR: This study reports a novel platform for storing and querying massive XML-based biological data collections and proposes a formal approach to transform the XML query model into the MapReduce query model.
Abstract:
Publishing biological data in XML formats is attractive for organizations that would like to provide their bioinformatics resources in an extensible and machine-readable format. In the era of big data, managing massive XML-based biological data has emerged as a challenging issue. With the continuous growth of XML-based biological data sets, it is usually difficult for traditional declarative query languages to provide efficient query capabilities in terms of processing speed and scale. In this study, we report a novel platform to store and query massive XML-based biological data collections. A prototype tool for constructing HBase tables from XML-based biological data collections is first developed, and then a formal approach to transform the XML query model into the MapReduce query model is proposed. Finally, an evaluation of the query performance of the proposed approach on existing XML-based biological databases is presented, demonstrating the performance advantages of the proposed solution. The source code of the massive XML-based biological data management platform is freely available at https://github.com/lyotvincent/X2H.
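The two steps named in the abstract, building HBase tables from XML records and answering XML path queries as MapReduce jobs, can be illustrated with a minimal in-memory sketch. It is not the X2H implementation: the row-key choice (a record's accession element), the single column family `d`, the path-string column qualifiers, and the helper names `xml_to_rows`, `map_phase`, `reduce_phase` and `run_query` are all hypothetical, and plain Python dictionaries stand in for HBase tables and a Hadoop job.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical schema: one XML record -> one HBase-style row.
# Row key = text of a chosen key element (e.g. an accession number).
# Columns = "d:<element path>" in a single assumed column family "d".
# Plain dicts stand in for HBase; this is not the X2H table layout.


def _flatten(elem, path, cells):
    """Recursively copy element text and attributes into path-named cells."""
    text = (elem.text or "").strip()
    if text:
        col = f"d:{path}"
        # Repeated sibling paths are joined with "|" (an assumption for brevity).
        cells[col] = f"{cells[col]}|{text}" if col in cells else text
    for attr, value in elem.attrib.items():
        cells[f"d:{path}/@{attr}"] = value
    for child in elem:
        _flatten(child, f"{path}/{child.tag}", cells)


def xml_to_rows(xml_text, record_tag, key_tag):
    """Build {row_key: {column: value}} from every <record_tag> element."""
    root = ET.fromstring(xml_text)
    rows = {}
    for i, record in enumerate(root.iter(record_tag)):
        key_elem = record.find(key_tag)
        has_key = key_elem is not None and (key_elem.text or "").strip()
        row_key = key_elem.text.strip() if has_key else f"row{i}"
        cells = {}
        _flatten(record, record.tag, cells)
        rows[row_key] = cells
    return rows


def map_phase(row_key, cells, select_col, where_col, where_val):
    """Map: emit (row_key, selected value) for rows matching the predicate."""
    if cells.get(where_col) == where_val and select_col in cells:
        yield row_key, cells[select_col]


def reduce_phase(pairs):
    """Reduce: group emitted values by row key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def run_query(rows, select_col, where_col, where_val):
    """Sequential stand-in for a MapReduce job over the table's rows."""
    pairs = []
    for row_key, cells in rows.items():
        pairs.extend(map_phase(row_key, cells, select_col, where_col, where_val))
    return reduce_phase(pairs)


SAMPLE = """<entries>
  <entry><accession>P12345</accession><organism>Homo sapiens</organism>
    <name type="recommended">Protein A</name></entry>
  <entry><accession>Q67890</accession><organism>Mus musculus</organism>
    <name type="recommended">Protein B</name></entry>
</entries>"""

rows = xml_to_rows(SAMPLE, "entry", "accession")
# Rough analogue of the path query /entries/entry[organism="Homo sapiens"]/name
print(run_query(rows, "d:entry/name", "d:entry/organism", "Homo sapiens"))
# {'P12345': ['Protein A']}
```

Keying each row by a record identifier keeps all of a record's fields in one row, so the map step can evaluate the predicate without joins. A real deployment would read rows from HBase inside an actual MapReduce job and might push the predicate down into the table scan; this sketch does neither.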
Citations
Journal Article
NeoPeptide: an immunoinformatic database of T-cell-defined neoantigens.
Weijun Zhou, Zhi Qu, Chaoyang Song, Yang Sun, An-Li Lai, Ma-Yao Luo, Yu-Zhe Ying, Hu Meng, Zhao Liang, Yanjie He, Yuhua Li, Jian Liu
TL;DR: This work has developed a web-accessible database, called NeoPeptide, which contains most of the important characteristics of neoantigens derived from published literature and other immunological resources and provides links to resources for further characterization of the novel features of these neoantigens.
Journal Article
Predicting the Disease Genes of Multiple Sclerosis Based on Network Representation Learning.
Haijie Liu, Jiaojiao Guan, He Li, Zhijie Bao, Qingmei Wang, Xun Luo, Hansheng Xue
TL;DR: The proposed framework contains three main steps: capturing the topological structure of the PPI network using NRL-based methods, encoding learned features into low-dimensional space using a stacked autoencoder, and training a support vector machine (SVM) classifier to predict disease-related genes.
Journal Article
An effective biomedical data migration tool from resource description framework to JSON
TL;DR: An effective mapping tool that allows data migration from RDF to JSON, in support of future massive data explosions and releases, is presented; the effective and user-friendly tool, called RDF2JSON, automates RDF data extraction and the corresponding JSON data generation.
Journal Article
Jointly Integrating VCF-Based Variants and OWL-Based Biomedical Ontologies in MongoDB
TL;DR: This paper proposes a series of rules for mapping VCF and OWL files to JSON files, presents rule-based algorithms for transforming VCF-based genetic variants and OWL-based biological ontologies into JSON objects, and introduces effective approaches for integrating the mapped JSON files in MongoDB.
Journal Article
Uncertainty Modeling of Object-Oriented Biomedical Information in HBase
TL;DR: A formal approach for reengineering fuzzy object-oriented databases in HBase using rule-based schema mapping, and a formal approach to map the fuzzy object-oriented algebra into fuzzy HBase algebra, are proposed.
References
Journal Article
UniProt: the Universal Protein knowledgebase
Rolf Apweiler, Amos Marc Bairoch, Cathy H. Wu, Winona C. Barker, Brigitte Boeckmann, Serenella Ferro, Elisabeth Gasteiger, Hongzhan Huang, Rodrigo Lopez, Michele Magrane, Maria Jesus Martin, Darren A. Natale, Claire O'Donovan, Nicole Redaschi, Lai-Su L. Yeh
TL;DR: The Swiss-Prot, TrEMBL and PIR protein database activities have united to form the Universal Protein Knowledgebase (UniProt), which is to provide a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and query interfaces.
Journal Article
ClinVar: improving access to variant interpretations and supporting evidence.
Melissa J. Landrum, Jennifer M. Lee, Mark L. Benson, Garth Brown, Chen Chao, Shanmuga Chitipiralla, Baoshan Gu, Jennifer Hart, Douglas W. Hoffman, Wonhee Jang, Karen Karapetyan, Kenneth S. Katz, Chunlei Liu, Zenith Maddipatla, A. J. Malheiro, Kurt McDaniel, Michael Ovetsky, George R. Riley, George Zhou, J. Bradley Holmes, Brandi L. Kattman, Donna Maglott
TL;DR: ClinVar continues to make improvements to its search and retrieval functions.
Journal Article
The BioPAX community standard for pathway data sharing
Emek Demir, Michael P. Cary, Suzanne M. Paley, Ken Fukuda, Christian Lemer, Imre Vastrik, Guanming Wu, Peter D'Eustachio, Carl F. Schaefer, Joanne S. Luciano, Frank Schacherer, Irma Martínez-Flores, Zhenjun Hu, Verónica Jiménez-Jacinto, Geeta Joshi-Tope, Kumaran Kandasamy, Alejandra López-Fuentes, Huaiyu Mi, Elgar Pichler, Igor Rodchenkov, Andrea Splendiani, Sasha Tkachev, Jeremy Zucker, Gopal R. Gopinath, Harsha Rajasimha, Ranjani Ramakrishnan, Imran Shah, Mustafa H. Syed, Nadia Anwar, Özgün Babur, Michael L. Blinov, Erik Brauner, Dan Corwin, Sylva L. Donaldson, Frank Gibbons, Robert N. Goldberg, Peter Hornbeck, Augustin Luna, Peter Murray-Rust, Eric K. Neumann, Oliver Reubenacker, Matthias Samwald, Martijn P. van Iersel, Sarala M. Wimalaratne, Keith Allen, Burk Braun, Michelle Whirl-Carrillo, Kei-Hoi Cheung, Kam D. Dahlquist, Andrew Finney, Marc Gillespie, Elizabeth M. Glass, Li Gong, Robin Haw, Michael Honig, Olivier Hubaut, David W. Kane, Shiva Krupa, Martina Kutmon, Julie Leonard, Debbie Marks, David Merberg, Victoria Petri, Alexander R. Pico, Dean Ravenscroft, Liya Ren, Nigam H. Shah, Margot Sunshine, Rebecca Tang, Ryan Whaley, Stan Letovsky, Kenneth H. Buetow, Andrey Rzhetsky, Vincent Schächter, Bruno S. Sobral, Ugur Dogrusoz, Shannon K. McWeeney, Mirit I. Aladjem, Ewan Birney, Julio Collado-Vides, Susumu Goto, Michael Hucka, Nicolas Le Novère, Natalia Maltsev, Akhilesh Pandey, Paul Thomas, Edgar Wingender, Peter D. Karp, Chris Sander, Gary D. Bader
TL;DR: Thousands of interactions, organized into thousands of pathways, from many organisms are available from a growing number of databases, and this large amount of pathway data in a computable form will support visualization, analysis and biological discovery.
Journal Article
XRel: A path-based approach to storage and retrieval of XML documents using relational databases
TL;DR: XRel enables us to store XML documents using a fixed relational schema without any information about DTDs and also to utilize indices such as the B+-tree and the R-tree supported by database management systems.
Journal Article
Hadoop GIS: a high performance spatial data warehousing system over mapreduce
TL;DR: Hadoop-GIS, a scalable and high-performance spatial data warehousing system for running large-scale spatial queries on Hadoop, is presented; it is integrated into Hive to support declarative spatial queries with an integrated architecture.