Journal ArticleDOI

Metadata Quality in Digital Repositories: A Survey of the Current State of the Art

Jung-ran Park1
03 Apr 2009-Cataloging & Classification Quarterly (Taylor & Francis Group)-Vol. 47, pp 213-228
TL;DR: Results of the study indicate a pressing need for the building of a common data model that is interoperable across digital repositories.
Abstract: This study presents the current state of research and practice on metadata quality through a focus on the functional perspective on metadata quality, measurement, and evaluation criteria, coupled with mechanisms for improving metadata quality. Quality metadata reflect the degree to which the metadata in question perform the core bibliographic functions of discovery, use, provenance, currency, authentication, and administration. The functional perspective is closely tied to the criteria and measurements used for assessing metadata quality. Accuracy, completeness, and consistency are the most common criteria used in measuring metadata quality in the literature. Guidelines embedded within a Web form or template perform a valuable function in improving the quality of the metadata. Results of the study indicate a pressing need for the building of a common data model that is interoperable across digital repositories.
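The survey names accuracy, completeness, and consistency as the most common quality criteria. As a minimal sketch of how two of these might be operationalized for Dublin Core-style records (the required-field set and the date rule are illustrative assumptions, not criteria taken from the paper):

```python
import re

# Assumed core field set for illustration; a real profile would define its own.
REQUIRED_FIELDS = ["title", "creator", "date", "identifier"]
ISO_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")  # YYYY, YYYY-MM, or YYYY-MM-DD

def completeness(record):
    """Fraction of required fields that are present and non-empty."""
    filled = sum(1 for f in REQUIRED_FIELDS if record.get(f, "").strip())
    return filled / len(REQUIRED_FIELDS)

def date_consistent(record):
    """Proxy for consistency: does the date value follow one agreed pattern?"""
    return bool(ISO_DATE.match(record.get("date", "")))

record = {"title": "Metadata Quality", "creator": "J. Park",
          "date": "2009-04-03", "identifier": ""}
print(completeness(record))    # 0.75 -- one of four required fields is empty
print(date_consistent(record)) # True
```

Accuracy is deliberately left out: checking whether a value is *correct* (not merely present and well-formed) generally needs a human or an external authority file.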
Citations
Journal ArticleDOI
TL;DR: A set of metrics and a framework for assessing data standard quality are developed and shown to be useful and effective and to help improve the quality of financial data and the efficiency of the data supply chain in a networked business environment.
Abstract: The primary purpose of data standards is to improve the interoperability of data in an increasingly networked environment. Given the high cost of developing data standards, it is desirable to assess their quality. We develop a set of metrics and a framework for assessing data standard quality. The metrics include completeness, relevancy, and a combined measure. Standard quality can also be indirectly measured by assessing interoperability of data instances. We evaluate the framework on a data standard for financial reporting in the United States, the Generally Accepted Accounting Principles (GAAP) Taxonomy encoded in eXtensible Business Reporting Language (XBRL), and the financial statements created using the standard by public companies. The results show that the data standard quality framework is useful and effective. Our analysis also reveals quality issues of the US GAAP XBRL taxonomy and provides useful feedback to taxonomy users. The Securities and Exchange Commission has mandated that all publicly listed companies must submit their filings using XBRL. Our findings are timely and have practical implications that will ultimately help improve the quality of financial data and the efficiency of the data supply chain in a networked business environment.
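The metrics named here (completeness, relevancy, and a combined measure) can be sketched as simple set ratios. This is an assumption-laden illustration: the paper's exact formulas are not given in the abstract, and the harmonic-mean combination below is one plausible choice, not the authors' definition.

```python
def completeness(standard_elements, needed_concepts):
    """Share of the concepts users need that the standard covers."""
    return len(standard_elements & needed_concepts) / len(needed_concepts)

def relevancy(standard_elements, used_elements):
    """Share of the standard's elements actually used in filings."""
    return len(standard_elements & used_elements) / len(standard_elements)

def combined(c, r):
    """One way to combine the two (harmonic mean, F1-style)."""
    return 2 * c * r / (c + r) if (c + r) else 0.0

# Toy element sets, loosely in the spirit of a financial-reporting taxonomy.
std = {"Revenue", "NetIncome", "Assets", "Liabilities", "Goodwill"}
needed = {"Revenue", "NetIncome", "Assets", "Equity"}
used = {"Revenue", "NetIncome", "Assets"}
c = completeness(std, needed)       # 3/4
r = relevancy(std, used)            # 3/5
print(round(combined(c, r), 3))     # 0.667
```

A standard can score high on completeness yet low on relevancy (many defined-but-unused elements), which is exactly the kind of taxonomy bloat the paper reports for US GAAP XBRL.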

57 citations

Journal ArticleDOI
TL;DR: An overview of the frameworks developed to characterize such a multi-faceted concept is presented and the most common quality-related problems affecting metadata both during the creation and the aggregation phase are discussed.
Abstract: In this work, we elaborate on the meaning of metadata quality by surveying efforts and experiences matured in the digital library domain. In particular, an overview of the frameworks developed to characterize such a multi-faceted concept is presented. Moreover, the most common quality-related problems affecting metadata both during the creation and the aggregation phase are discussed together with the approaches, technologies and tools developed to mitigate them. This survey on digital library developments is expected to contribute to the ongoing discussion on data and metadata quality occurring in the emerging yet more general framework of data infrastructures.

48 citations

Journal ArticleDOI
TL;DR: Adopting an ontology-driven approach to case finding could improve the quality of disease registers and of research based on routine data and offer considerable advantages over using limited datasets to define cases.
Abstract: Background The burden of chronic disease is increasing, and research and quality improvement will be less effective if case finding strategies are suboptimal. Objective To describe an ontology-driven approach to case finding in chronic disease and how this approach can be used to create a data dictionary and make the codes used in case finding transparent. Method A five-step process: (1) identifying a reference coding system or terminology; (2) using an ontology-driven approach to identify cases; (3) developing metadata that can be used to identify the extracted data; (4) mapping the extracted data to the reference terminology; and (5) creating the data dictionary. Results Hypertension is presented as an exemplar. A patient with hypertension can be represented by a range of codes including diagnostic, history and administrative. Metadata can link the coding system and data extraction queries to the correct data mapping and translation tool, which then maps it to the equivalent code in the reference terminology. The code extracted, the term, its domain and subdomain, and the name of the data extraction query can then be automatically grouped and published online as a readily searchable data dictionary. An example is available online at www.clininf.eu/qickd-data-dictionary.html Conclusion Adopting an ontology-driven approach to case finding could improve the quality of disease registers and of research based on routine data. It would offer considerable advantages over using limited datasets to define cases. This approach should be considered by those involved in research and quality improvement projects which utilise routine data.
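Steps 3 to 5 of the process above amount to: link each extracted local code, via a mapping table, to its equivalent in the reference terminology, then group the results into searchable data-dictionary rows. A minimal sketch, with invented mapping entries (the codes shown are illustrative, not taken from the paper's dictionary):

```python
# Illustrative local-code -> (reference code, preferred term) mapping.
CODE_MAP = {
    "G20..": ("38341003", "Hypertensive disorder"),
    "662b.": ("38341003", "Hypertensive disorder"),
}

def dictionary_rows(extracted_codes, domain, query_name):
    """Translate extracted local codes and group them into dictionary rows."""
    rows = []
    for local_code in extracted_codes:
        ref_code, term = CODE_MAP[local_code]
        rows.append({"code": local_code, "maps_to": ref_code, "term": term,
                     "domain": domain, "query": query_name})
    return rows

rows = dictionary_rows(["G20..", "662b."], "Cardiovascular", "htn_case_query")
print(len(rows))  # 2
```

The point of the metadata layer is that the query name travels with every row, so a reader of the published dictionary can see exactly which extraction produced which codes.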

48 citations

Journal ArticleDOI
TL;DR: Overall, the metadata the authors analyzed reveal that there is a lack of principled mechanisms to enforce and validate metadata requirements, and significant aberrancies that are likely to impede search and secondary use of the associated datasets.
Abstract: We present an analytical study of the quality of metadata about samples used in biomedical experiments. The metadata under analysis are stored in two well-known databases: BioSample—a repository managed by the National Center for Biotechnology Information (NCBI), and BioSamples—a repository managed by the European Bioinformatics Institute (EBI). We tested whether 11.4 M sample metadata records in the two repositories are populated with values that fulfill the stated requirements for such values. Our study revealed multiple anomalies in the metadata. Most metadata field names and their values are not standardized or controlled. Even simple binary or numeric fields are often populated with inadequate values of different data types. By clustering metadata field names, we discovered there are often many distinct ways to represent the same aspect of a sample. Overall, the metadata we analyzed reveal that there is a lack of principled mechanisms to enforce and validate metadata requirements. The significant aberrancies that we found in the metadata are likely to impede search and secondary use of the associated datasets.
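The core check in this study, testing whether field values fulfill their stated requirements, can be sketched as a rule table applied per record. The field names and rules below are invented for illustration; they are not BioSample's or BioSamples' actual schemas.

```python
# Illustrative requirements: each field maps to a predicate its value must pass.
RULES = {
    "age": lambda v: v.isdigit() and 0 <= int(v) <= 150,   # numeric field
    "smoker": lambda v: v.lower() in {"yes", "no"},        # binary field
}

def anomalies(record):
    """Return the fields whose values violate their stated requirement."""
    return [f for f, ok in RULES.items() if f in record and not ok(record[f])]

# The kind of inadequate values the study reports in binary/numeric fields.
print(anomalies({"age": "not collected", "smoker": "N/A"}))  # ['age', 'smoker']
```

Run at scale over millions of records, checks like this surface exactly the pattern the authors describe: free-text placeholders sitting where a controlled or numeric value was required.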

44 citations


Cites background from "Metadata Quality in Digital Reposit..."

  • ...[2,3] specified several high-level principles for the creation of good-quality metadata....


References
Book Chapter
01 Jan 2004
TL;DR: These compounds possess prostaglandin-like activity and thus are useful in the treatment of mammals, where prostaglandins are indicated.
Abstract: 9α,15α-Dihydroxy-11α,12α-difluoromethylene-prosta-13-trans-enoic acid and 5-cis-13-trans-dienoic acid, 9-keto-11α,12α-difluoromethylene-15α-hydroxyprosta-13-trans-enoic acid and 5-cis-13-trans-dienoic acid, the 11β,12β-difluoromethylene-12-epi compounds, as well as the pharmaceutically acceptable, non-toxic esters and salts thereof, process for the production of same and intermediates obtained by this process. These compounds possess prostaglandin-like activity and thus are useful in the treatment of mammals where prostaglandins are indicated.

225 citations

ReportDOI
01 Jan 2000
TL;DR: The SHOE language is presented, which the authors feel has many of the features necessary to enable a semantic web, and an existing set of tools that make it easy to use the language are described.
Abstract: XML will have a profound impact on the way data is exchanged on the Internet. An important feature of this language is the separation of content from presentation, which makes it easier to select and/or reformat the data. However, due to the likelihood of numerous industry and domain specific DTDs, those who wish to integrate information will still be faced with the problem of semantic interoperability. In this paper we discuss why this problem is not solved by XML, and then discuss why the Resource Description Framework is only a partial solution. We then present the SHOE language, which we feel has many of the features necessary to enable a semantic web, and describe an existing set of tools that make it easy to use the language. Jeff Heflin, University of Maryland; James Hendler, University of Maryland

166 citations

Proceedings ArticleDOI
10 May 2005
TL;DR: The first step towards this framework for automatic metadata generation is the definition of an Application Programmer Interface (API), called the Simple Indexing Interface (SII); the second step is the definition of a framework for implementing the SII.
Abstract: In this paper, we focus on the development of a framework for automatic metadata generation. The first step towards this framework is the definition of an Application Programmer Interface (API), which we call the Simple Indexing Interface (SII). The second step is the definition of a framework for implementation of the SII. Both steps are presented in some detail in this paper. We also report on empirical evaluation of the metadata that the SII and supporting framework generated in a real-life context.
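As a rough illustration of what an indexing interface for automatic metadata generation might look like, here is a hypothetical sketch. The class and method names are invented; the actual SII defined in the paper is a different, richer API.

```python
class SimpleIndexer:
    """Hypothetical indexer: derives basic metadata from raw document text."""

    def index(self, document_text):
        """Return automatically generated metadata for one document."""
        lines = [line for line in document_text.splitlines() if line.strip()]
        return {
            "title": lines[0] if lines else "",          # naive: first non-empty line
            "word_count": len(document_text.split()),    # trivial derived field
        }

meta = SimpleIndexer().index("Automatic Metadata Generation\nBody text here.")
print(meta["title"])  # Automatic Metadata Generation
```

The value of standardizing even a small interface like this is that repositories can swap in better extraction back ends without changing the calling code, which is the separation the SII aims for.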

163 citations

Proceedings Article
28 Sep 2003
TL;DR: This paper challenges some of the assumptions underlying the metadata creation process in the context of two communities of practice, based around learning object repositories and open e-Print archives.
Abstract: This paper challenges some of the assumptions underlying the metadata creation process in the context of two communities of practice, based around learning object repositories and open e-Print archives. The importance of quality assurance for metadata creation is discussed and evidence from the literature, from the practical experiences of repositories and archives, and from related research and practices within other communities is presented. Issues for debate and further investigation are identified, formulated as a series of key research questions. Although there is much work to be done in the area of quality assurance for metadata creation, this paper represents an important first step towards a fuller understanding of the subject.

118 citations