Proceedings Article

Functional and architectural requirements for metadata: supporting discovery and management of scientific data

TL;DR: This paper explores the system requirements essential for metadata that supports the discovery and management of scientific data, and presents a base model built on three chief principles: least effort, infrastructure service, and portability.
Abstract: The tremendous growth in digital data has led to an increase in metadata initiatives for different types of scientific data, as evident in Ball's survey (2009). Although individual communities have specific needs, there are shared goals that need to be recognized if systems are to effectively support data sharing within and across all domains. This paper considers this need, and explores systems requirements that are essential for metadata supporting the discovery and management of scientific data. The paper begins with an introduction and a review of selected research specific to metadata modeling in the sciences. Next, the paper's goals are stated, followed by the presentation of valuable systems requirements. The results include a base-model with three chief principles: principle of least effort, infrastructure service, and portability. The principles are intended to support "data user" tasks. Results also include a set of defined user tasks and functions, and applications scenarios.


Citations
Journal ArticleDOI
16 Mar 2017-PLOS ONE
TL;DR: This paper provides a rich, qualitative description of research data curation and use practices in IRs, as well as their structures, roles played, skills needed, contradictions and problems present, solutions sought, and workarounds applied.
Abstract: The importance of managing research data has been emphasized by the government, funding agencies, and scholarly communities. Increased access to research data increases the impact and efficiency of scientific activities and funding. Thus, many research institutions have established or plan to establish research data curation services as part of their Institutional Repositories (IRs). However, in order to design effective research data curation services in IRs, and to build active research data providers and user communities around those IRs, it is essential to study current data curation practices and provide rich descriptions of the sociotechnical factors and relationships shaping those practices. Based on 13 interviews with 15 IR staff members from 13 large research universities in the United States, this paper provides a rich, qualitative description of research data curation and use practices in IRs. In particular, the paper identifies data curation and use activities in IRs, as well as their structures, roles played, skills needed, contradictions and problems present, solutions sought, and workarounds applied. The paper can inform the development of best practice guides, infrastructure and service templates, as well as education in research data curation in Library and Information Science (LIS) schools.

60 citations

Journal ArticleDOI
TL;DR: A model of CMP research project tasks consisting of 10 task constructs is defined, and a model of data quality perceptions by CMP scientists consisting of four data quality constructs is developed.
Abstract: To be effective and at the same time sustainable, a community data curation model needs to be aligned with the community's current data practices, including research project activities, data types, and perceptions of data quality. Based on a survey of members of the condensed matter physics (CMP) community gathered around the National High Magnetic Field Laboratory, a large national laboratory, this article defines a model of CMP research project tasks consisting of 10 task constructs. In addition, the study develops a model of data quality perceptions by CMP scientists consisting of four data quality constructs. The paper also discusses relationships among the data quality perceptions, project roles, and demographic characteristics of CMP scientists. The findings of the study can inform the design of a CMP data curation model that is aligned and harmonized with the community's research work structure and data practices.

28 citations


Additional excerpts

  • ...Qin et al. (2012) conceptualized the following 10 user tasks involving scientific data: discovery, identify, select, obtain, verify, analyze, manage, archive, publish, and cite....


Proceedings ArticleDOI
08 Sep 2014
TL;DR: This work presents an approach for creating lightweight ontologies to describe research data, and illustrates the process with two ontologies, and uses them as configuration parameters for Dendro, a software platform for research data management currently being developed at the University of Porto.
Abstract: The description of data is a central task in research data management. Describing datasets requires deep knowledge of both the data and the data creation process to ensure adequate capture of their meaning and context. Metadata schemas are usually followed in resource description to enforce comprehensiveness and interoperability, but they can be hard to understand and adopt by researchers. We propose to address data description using ontologies, which can evolve easily, express semantics at different granularity levels and be directly used in system development. Considering that existing ontologies are often hard to use in a cross-domain research data management environment, we present an approach for creating lightweight ontologies to describe research data. We illustrate our process with two ontologies, and then use them as configuration parameters for Dendro, a software platform for research data management currently being developed at the University of Porto.

15 citations


Cites methods from "Functional and architectural requir..."

  • ...After validating a first set of descriptors, the researchers were asked to think about which information would be necessary to provide enough scientific context to allow others to verify, replicate, and reproduce the experiments from which the datasets were gathered [16]....


Proceedings Article
Jian Qin, Kai Li
02 Sep 2013
TL;DR: A survey of metadata standards for scientific data found that element counts were highest in the descriptive category, that many of these elements overlapped with DC elements, and that large, complex standards and widely varied naming practices are the major hurdles for building a metadata infrastructure.
Abstract: The one-covers-all approach in current metadata standards for scientific data has serious limitations in keeping up with the ever-growing data. This paper reports the findings from a survey of metadata standards in the scientific data domain and argues for the need for a metadata infrastructure. The survey collected 4400+ unique elements from 16 standards and categorized these elements into 9 categories. The findings show that the highest element counts occurred in the descriptive category and that many of these elements overlapped with DC elements. This pattern was repeated among the elements that co-occurred in different standards. A small number of semantically general elements appeared across the largest numbers of standards, while the rest of the element co-occurrences formed a long tail with a wide range of specific semantics. The paper discusses the implications of the findings in the context of metadata portability and infrastructure, and points out that large, complex standards and widely varied naming practices are the major hurdles for building a metadata infrastructure.

15 citations

Journal ArticleDOI
TL;DR: The data management and related literatures are analyzed to develop a data identifier taxonomy that includes four categories (domain, entity types, activities, and quality dimensions).
Abstract: As research data management grows, so does the use of identity metadata for discovering, linking, and citing research data. To support awareness of different identifier systems, and the comparison and selection of an identifier for a particular data management environment, there is a need for a knowledge base. This article contributes to that goal and analyzes the data management and related literatures to develop a data identifier taxonomy. The taxonomy includes four categories (domain, entity types, activities, and quality dimensions). In addition, the article describes 14 identifiers referenced in the literature and analyzes them along the taxonomy.

11 citations

References
Journal ArticleDOI
TL;DR: The authors describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.
Abstract: The term “Linked Data” refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions: the Web of Data. In this article, the authors present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. They describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.

5,113 citations

Journal ArticleDOI
01 Feb 2013
TL;DR: This paper makes the case for a scientific data publication model on top of linked data and introduces the notion of Research Objects as first class citizens for sharing and publishing.
Abstract: Scientific data represents a significant portion of the linked open data cloud and scientists stand to benefit from the data fusion capability this will afford. Publishing linked data into the cloud, however, does not ensure the required reusability. Publishing has requirements of provenance, quality, credit, attribution and methods to provide the reproducibility that enables validation of results. In this paper we make the case for a scientific data publication model on top of linked data and introduce the notion of Research Objects as first class citizens for sharing and publishing. Highlights: We identify and characterise different aspects of reuse and reproducibility. We examine requirements for such reuse. We propose a scientific data publication model that layers on top of linked data publishing.

368 citations


"Functional and architectural requir..." refers background in this paper

  • ...Its potential for promoting interdisciplinary scientific discovery and data is still to be explored and deployed (Bechhofer et al., 2011)....


01 Jan 2013

301 citations


"Functional and architectural requir..." refers background in this paper

  • ...Typical entities include agent or person/corporate body, event, place, and object, while “is-a,” “is-part-of,” and “contains” are examples of general relations (Lagoze & Hunter, 2001; IFLA, 2009a; Rust & Bide, 2000)....


Proceedings Article
01 Jan 2001
TL;DR: This paper describes the latest version of the ABC metadata model, a metadata model with more logically grounded time and entity semantics that is able to build a metadata repository of RDF descriptions and a search interface which is capable of more sophisticated queries than less-expressive, object-centric metadata models will allow.
Abstract: This paper describes the latest version of the ABC metadata model. This model has been developed within the Harmony international digital library project to provide a common conceptual model to facilitate interoperability between metadata ontologies from different domains. This updated ABC model is the result of collaboration with the CIMI consortium whereby earlier versions of the ABC model were applied to metadata descriptions of complex objects provided by CIMI museums and libraries. The result is a metadata model with more logically grounded time and entity semantics. Based on this model we have been able to build a metadata repository of RDF descriptions and a search interface which is capable of more sophisticated queries than less-expressive, object-centric metadata models will allow.

222 citations
