Journal Article

MetaStore: an adaptive metadata management framework for heterogeneous metadata models

TL;DR: MetaStore is an adaptive metadata management framework, built on a NoSQL database and an RDF triple store, that automatically segregates the different categories of metadata into their corresponding data models to make the best use of the data models supported by NoSQL databases.
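The automatic segregation of metadata by category can be illustrated with a minimal routing sketch. It assumes, as the abstract states, that provenance and dynamic metadata belong in an RDF triple store and static metadata in a NoSQL document store. The category names, the `MetadataRouter` class, and the in-memory stand-ins for both stores are hypothetical illustrations, not MetaStore's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

# Hypothetical category labels; MetaStore's real classification scheme may differ.
RDF_CATEGORIES = {"provenance", "dynamic"}


@dataclass
class MetadataRouter:
    """Hypothetical router that sends each metadata record to one of two stores."""

    # In-memory stand-in for a NoSQL document collection (record id -> document).
    documents: dict[str, dict[str, Any]] = field(default_factory=dict)
    # In-memory stand-in for an RDF triple store (subject, predicate, object).
    triples: list[tuple[str, str, str]] = field(default_factory=list)

    def store(self, record_id: str, category: str, payload: dict[str, Any]) -> str:
        """Store a record and return the name of the backend that received it."""
        if category in RDF_CATEGORIES:
            # Graph-shaped metadata: flatten each key/value pair into a triple.
            for predicate, obj in payload.items():
                self.triples.append((record_id, predicate, str(obj)))
            return "rdf"
        # Static metadata: keep the record together as a single document.
        self.documents[record_id] = dict(payload)
        return "nosql"
```

Keeping each static record as one document suits lookups by identifier, while storing provenance as triples lets graph queries follow links such as `prov:used` across records. A real deployment would use full IRIs for subjects and predicates rather than bare identifiers.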
Abstract
In this paper, we present MetaStore, a metadata management framework for scientific data repositories. Scientific experiments are generating a deluge of data, and handling the associated metadata is critical because it enables the discovery, analysis, reuse, and sharing of scientific data. Moreover, the metadata produced by scientific experiments are heterogeneous and subject to frequent change, which demands a flexible data model. Existing metadata management systems provide a broad range of features for handling scientific metadata. However, their principal limitation is an architecture restricted to a single or, at most, a few standard metadata models. These systems cannot handle the different types of metadata models, i.e., administrative, descriptive, structural, and provenance metadata, nor community-specific metadata models. To address this challenge, we present MetaStore, an adaptive metadata management framework based on a NoSQL database and an RDF triple store. MetaStore provides a set of core functionalities that handle heterogeneous metadata models by automatically generating the necessary software code (services), extending the functionality of the framework on the fly. To handle dynamic metadata and to control metadata quality, MetaStore also provides an extended set of functionalities: annotation of images and text through integration of the Web Annotation Data Model, discipline-specific vocabularies that communities define using the Simple Knowledge Organization System (SKOS), and advanced search and analytical capabilities through integration of Elasticsearch. To maximize the utilization of the data models supported by NoSQL databases, MetaStore automatically segregates the different categories of metadata into their corresponding data models.
Complex provenance graphs and dynamic metadata are modeled and stored in an RDF triple store, whereas static metadata is stored in a NoSQL database. To enable large-scale harvesting (sharing) of metadata using the METS standard over the OAI-PMH protocol, MetaStore is designed to be OAI-compliant. Finally, to demonstrate the practical usability of the MetaStore framework and show that the requirements of the research communities have been met, we describe our experience adopting MetaStore in three communities.
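Harvesting over OAI-PMH is a paged exchange: a harvester issues `ListRecords` requests and follows `resumptionToken` values until the list is exhausted. The sketch below builds such requests and extracts record identifiers from one response page using only the Python standard library. The base URL and the `mets` metadata prefix are assumptions for illustration; a repository advertises its actual prefixes through the `ListMetadataFormats` verb, and this is not MetaStore's own harvesting code.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace used by OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str = "mets",
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL for a (hypothetical) OAI-PMH endpoint."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only "verb" may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        # "mets" is an assumed prefix; confirm it via ListMetadataFormats.
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers on one page and the token for the next page."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:ListRecords/oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken means there are no further pages.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token
```

A harvester would loop: request `list_records_url(base_url)`, pass the response body to `parse_page`, then request `list_records_url(base_url, resumption_token=token)` until the returned token is `None`.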
