Journal ArticleDOI

Metadata Quality in Digital Repositories: A Survey of the Current State of the Art

Jung-ran Park1
03 Apr 2009-Cataloging & Classification Quarterly (Taylor & Francis Group)-Vol. 47, pp 213-228
TL;DR: Results of the study indicate a pressing need for the building of a common data model that is interoperable across digital repositories.
Abstract: This study presents the current state of research and practice on metadata quality through a focus on the functional perspective on metadata quality, measurement, and evaluation criteria, coupled with mechanisms for improving metadata quality. Quality metadata reflect the degree to which the metadata in question perform the core bibliographic functions of discovery, use, provenance, currency, authentication, and administration. The functional perspective is closely tied to the criteria and measurements used for assessing metadata quality. Accuracy, completeness, and consistency are the most common criteria used in measuring metadata quality in the literature. Guidelines embedded within a Web form or template perform a valuable function in improving the quality of the metadata. Results of the study indicate a pressing need for the building of a common data model that is interoperable across digital repositories.
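The survey names accuracy, completeness, and consistency as the most common quality criteria. As a minimal sketch of how two of these might be operationalized for Dublin Core-style records (the required-field set and the date rule are illustrative assumptions, not criteria taken from the paper):

```python
import re

# Assumed core field set for illustration; a real profile would define its own.
REQUIRED_FIELDS = ["title", "creator", "date", "identifier"]
ISO_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")  # YYYY, YYYY-MM, or YYYY-MM-DD

def completeness(record):
    """Fraction of required fields that are present and non-empty."""
    filled = sum(1 for f in REQUIRED_FIELDS if record.get(f, "").strip())
    return filled / len(REQUIRED_FIELDS)

def date_consistent(record):
    """Proxy for consistency: does the date value follow one agreed pattern?"""
    return bool(ISO_DATE.match(record.get("date", "")))

record = {"title": "Metadata Quality", "creator": "J. Park",
          "date": "2009-04-03", "identifier": ""}
print(completeness(record))    # 0.75 -- one of four required fields is empty
print(date_consistent(record)) # True
```

Accuracy is deliberately left out: checking whether a value is *correct* (not merely present and well-formed) generally needs a human or an external authority file.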
Citations
Journal ArticleDOI
TL;DR: A set of metrics and a framework for assessing data standard quality are developed and shown to be useful and effective and to help improve the quality of financial data and the efficiency of the data supply chain in a networked business environment.
Abstract: The primary purpose of data standards is to improve the interoperability of data in an increasingly networked environment. Given the high cost of developing data standards, it is desirable to assess their quality. We develop a set of metrics and a framework for assessing data standard quality. The metrics include completeness, relevancy, and a combined measure. Standard quality can also be indirectly measured by assessing interoperability of data instances. We evaluate the framework on a data standard for financial reporting in the United States, the Generally Accepted Accounting Principles (GAAP) Taxonomy encoded in eXtensible Business Reporting Language (XBRL), and the financial statements created using the standard by public companies. The results show that the data standard quality framework is useful and effective. Our analysis also reveals quality issues of the US GAAP XBRL taxonomy and provides useful feedback to taxonomy users. The Securities and Exchange Commission has mandated that all publicly listed companies must submit their filings using XBRL. Our findings are timely and have practical implications that will ultimately help improve the quality of financial data and the efficiency of the data supply chain in a networked business environment.
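The metrics named here (completeness, relevancy, and a combined measure) can be sketched as simple set ratios. This is an assumption-laden illustration: the paper's exact formulas are not given in the abstract, and the harmonic-mean combination below is one plausible choice, not the authors' definition.

```python
def completeness(standard_elements, needed_concepts):
    """Share of the concepts users need that the standard covers."""
    return len(standard_elements & needed_concepts) / len(needed_concepts)

def relevancy(standard_elements, used_elements):
    """Share of the standard's elements actually used in filings."""
    return len(standard_elements & used_elements) / len(standard_elements)

def combined(c, r):
    """One way to combine the two (harmonic mean, F1-style)."""
    return 2 * c * r / (c + r) if (c + r) else 0.0

# Toy element sets, loosely in the spirit of a financial-reporting taxonomy.
std = {"Revenue", "NetIncome", "Assets", "Liabilities", "Goodwill"}
needed = {"Revenue", "NetIncome", "Assets", "Equity"}
used = {"Revenue", "NetIncome", "Assets"}
c = completeness(std, needed)       # 3/4
r = relevancy(std, used)            # 3/5
print(round(combined(c, r), 3))     # 0.667
```

A standard can score high on completeness yet low on relevancy (many defined-but-unused elements), which is exactly the kind of taxonomy bloat the paper reports for US GAAP XBRL.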

57 citations

Journal ArticleDOI
TL;DR: An overview of the frameworks developed to characterize such a multi-faceted concept is presented and the most common quality-related problems affecting metadata both during the creation and the aggregation phase are discussed.
Abstract: In this work, we elaborate on the meaning of metadata quality by surveying efforts and experiences matured in the digital library domain. In particular, an overview of the frameworks developed to characterize such a multi-faceted concept is presented. Moreover, the most common quality-related problems affecting metadata both during the creation and the aggregation phase are discussed together with the approaches, technologies and tools developed to mitigate them. This survey on digital library developments is expected to contribute to the ongoing discussion on data and metadata quality occurring in the emerging yet more general framework of data infrastructures.

48 citations

Journal ArticleDOI
TL;DR: Adopting an ontology-driven approach to case finding could improve the quality of disease registers and of research based on routine data and offer considerable advantages over using limited datasets to define cases.
Abstract: Background The burden of chronic disease is increasing, and research and quality improvement will be less effective if case finding strategies are suboptimal. Objective To describe an ontology-driven approach to case finding in chronic disease and how this approach can be used to create a data dictionary and make the codes used in case finding transparent. Method A five-step process: (1) identifying a reference coding system or terminology; (2) using an ontology-driven approach to identify cases; (3) developing metadata that can be used to identify the extracted data; (4) mapping the extracted data to the reference terminology; and (5) creating the data dictionary. Results Hypertension is presented as an exemplar. A patient with hypertension can be represented by a range of codes including diagnostic, history and administrative. Metadata can link the coding system and data extraction queries to the correct data mapping and translation tool, which then maps it to the equivalent code in the reference terminology. The code extracted, the term, its domain and subdomain, and the name of the data extraction query can then be automatically grouped and published online as a readily searchable data dictionary. An example is available online at www.clininf.eu/qickd-data-dictionary.html Conclusion Adopting an ontology-driven approach to case finding could improve the quality of disease registers and of research based on routine data. It would offer considerable advantages over using limited datasets to define cases. This approach should be considered by those involved in research and quality improvement projects which utilise routine data.
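Steps 3 to 5 of the process above amount to: link each extracted local code, via a mapping table, to its equivalent in the reference terminology, then group the results into searchable data-dictionary rows. A minimal sketch, with invented mapping entries (the codes shown are illustrative, not taken from the paper's dictionary):

```python
# Illustrative local-code -> (reference code, preferred term) mapping.
CODE_MAP = {
    "G20..": ("38341003", "Hypertensive disorder"),
    "662b.": ("38341003", "Hypertensive disorder"),
}

def dictionary_rows(extracted_codes, domain, query_name):
    """Translate extracted local codes and group them into dictionary rows."""
    rows = []
    for local_code in extracted_codes:
        ref_code, term = CODE_MAP[local_code]
        rows.append({"code": local_code, "maps_to": ref_code, "term": term,
                     "domain": domain, "query": query_name})
    return rows

rows = dictionary_rows(["G20..", "662b."], "Cardiovascular", "htn_case_query")
print(len(rows))  # 2
```

The point of the metadata layer is that the query name travels with every row, so a reader of the published dictionary can see exactly which extraction produced which codes.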

48 citations

Journal ArticleDOI
TL;DR: Overall, the metadata the authors analyzed reveal that there is a lack of principled mechanisms to enforce and validate metadata requirements, and significant aberrancies that are likely to impede search and secondary use of the associated datasets.
Abstract: We present an analytical study of the quality of metadata about samples used in biomedical experiments. The metadata under analysis are stored in two well-known databases: BioSample—a repository managed by the National Center for Biotechnology Information (NCBI), and BioSamples—a repository managed by the European Bioinformatics Institute (EBI). We tested whether 11.4 M sample metadata records in the two repositories are populated with values that fulfill the stated requirements for such values. Our study revealed multiple anomalies in the metadata. Most metadata field names and their values are not standardized or controlled. Even simple binary or numeric fields are often populated with inadequate values of different data types. By clustering metadata field names, we discovered there are often many distinct ways to represent the same aspect of a sample. Overall, the metadata we analyzed reveal that there is a lack of principled mechanisms to enforce and validate metadata requirements. The significant aberrancies that we found in the metadata are likely to impede search and secondary use of the associated datasets.
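The core check in this study, testing whether field values fulfill their stated requirements, can be sketched as a rule table applied per record. The field names and rules below are invented for illustration; they are not BioSample's or BioSamples' actual schemas.

```python
# Illustrative requirements: each field maps to a predicate its value must pass.
RULES = {
    "age": lambda v: v.isdigit() and 0 <= int(v) <= 150,   # numeric field
    "smoker": lambda v: v.lower() in {"yes", "no"},        # binary field
}

def anomalies(record):
    """Return the fields whose values violate their stated requirement."""
    return [f for f, ok in RULES.items() if f in record and not ok(record[f])]

# The kind of inadequate values the study reports in binary/numeric fields.
print(anomalies({"age": "not collected", "smoker": "N/A"}))  # ['age', 'smoker']
```

Run at scale over millions of records, checks like this surface exactly the pattern the authors describe: free-text placeholders sitting where a controlled or numeric value was required.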

44 citations


Cites background from "Metadata Quality in Digital Reposit..."

  • ...[2,3] specified several high-level principles for the creation of good-quality metadata....


References
Book Chapter
01 Jan 2004
TL;DR: These compounds possess prostaglandin-like activity and thus are useful in the treatment of mammals, where prostaglandins are indicated.
Abstract: 9α,15α-Dihydroxy-11α,12α-difluoromethylene-prosta-13-trans-enoic acid and 5-cis-13-trans-dienoic acid, 9-keto-11α,12α-difluoromethylene-15α-hydroxyprosta-13-trans-enoic acid and 5-cis-13-trans-dienoic acid, the 11β,12β-difluoromethylene-12-epi compounds, as well as the pharmaceutically acceptable, non-toxic esters and salts thereof, process for the production of same and intermediates obtained by this process. These compounds possess prostaglandin-like activity and thus are useful in the treatment of mammals where prostaglandins are indicated.

225 citations

ReportDOI
01 Jan 2000
TL;DR: The SHOE language is presented, which the authors feel has many of the features necessary to enable a semantic web, and an existing set of tools that make it easy to use the language are described.
Abstract: XML will have a profound impact on the way data is exchanged on the Internet. An important feature of this language is the separation of content from presentation, which makes it easier to select and/or reformat the data. However, due to the likelihood of numerous industry and domain specific DTDs, those who wish to integrate information will still be faced with the problem of semantic interoperability. In this paper we discuss why this problem is not solved by XML, and then discuss why the Resource Description Framework is only a partial solution. We then present the SHOE language, which we feel has many of the features necessary to enable a semantic web, and describe an existing set of tools that make it easy to use the language. Jeff Heflin, University of Maryland; James Hendler, University of Maryland

166 citations

Proceedings ArticleDOI
10 May 2005
TL;DR: The first step towards this framework for automatic metadata generation is the definition of an Application Programmer Interface (API), called the Simple Indexing Interface (SII); the second step is the definition of a framework for implementing the SII.
Abstract: In this paper, we focus on the development of a framework for automatic metadata generation. The first step towards this framework is the definition of an Application Programmer Interface (API), which we call the Simple Indexing Interface (SII). The second step is the definition of a framework for implementation of the SII. Both steps are presented in some detail in this paper. We also report on empirical evaluation of the metadata that the SII and supporting framework generated in a real-life context.
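As a rough illustration of what an indexing interface for automatic metadata generation might look like, here is a hypothetical sketch. The class and method names are invented; the actual SII defined in the paper is a different, richer API.

```python
class SimpleIndexer:
    """Hypothetical indexer: derives basic metadata from raw document text."""

    def index(self, document_text):
        """Return automatically generated metadata for one document."""
        lines = [line for line in document_text.splitlines() if line.strip()]
        return {
            "title": lines[0] if lines else "",          # naive: first non-empty line
            "word_count": len(document_text.split()),    # trivial derived field
        }

meta = SimpleIndexer().index("Automatic Metadata Generation\nBody text here.")
print(meta["title"])  # Automatic Metadata Generation
```

The value of standardizing even a small interface like this is that repositories can swap in better extraction back ends without changing the calling code, which is the separation the SII aims for.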

163 citations

Proceedings Article
28 Sep 2003
TL;DR: This paper challenges some of the assumptions underlying the metadata creation process in the context of two communities of practice, based around learning object repositories and open e-Print archives.
Abstract: This paper challenges some of the assumptions underlying the metadata creation process in the context of two communities of practice, based around learning object repositories and open e-Print archives. The importance of quality assurance for metadata creation is discussed and evidence from the literature, from the practical experiences of repositories and archives, and from related research and practices within other communities is presented. Issues for debate and further investigation are identified, formulated as a series of key research questions. Although there is much work to be done in the area of quality assurance for metadata creation, this paper represents an important first step towards a fuller understanding of the subject.

118 citations