
Showing papers by "Chris F. Taylor" published in 2012


Journal Article
TL;DR: The prerequisites for data commoning are described, and an established and growing ecosystem of solutions using the shared 'Investigation-Study-Assay' framework to support that vision is presented.
Abstract: To make full use of research data, the bioscience community needs to adopt technologies and reward mechanisms that support interoperability and promote the growth of an open 'data commoning' culture. Here we describe the prerequisites for data commoning and present an established and growing ecosystem of solutions using the shared 'Investigation-Study-Assay' framework to support that vision.

387 citations


Journal Article
TL;DR: The utility of MCI scores using metadata from the Genomes Online Database (GOLD), including records compliant with the ‘Minimum Information about a Genome Sequence’ (MIGS) standard developed by the Genomic Standards Consortium is demonstrated.
Abstract: Variability in the extent of the descriptions of data (‘metadata’) held in public repositories forces users to assess the quality of records individually, which rapidly becomes impractical. The scoring of records on the richness of their description provides a simple, objective proxy measure for quality that enables filtering that supports downstream analysis. Pivotally, such descriptions should spur on improvements. Here, we introduce such a measure - the ‘Metadata Coverage Index’ (MCI): the percentage of available fields actually filled in a record or description. MCI scores can be calculated across a database, for individual records or for their component parts (e.g., fields of interest). There are many potential uses for this simple metric, for example: to filter, rank or search for records; to assess the metadata availability of an ad hoc collection; to determine the frequency with which fields in a particular record type are filled, especially with respect to standards compliance; to assess the utility of specific tools and resources, and of data capture practice more generally; to prioritize records for further curation; to serve as performance metrics of funded projects; or to quantify the value added by curation. Here we demonstrate the utility of MCI scores using metadata from the Genomes Online Database (GOLD), including records compliant with the ‘Minimum Information about a Genome Sequence’ (MIGS) standard developed by the Genomic Standards Consortium. We discuss challenges and address the further application of MCI scores: to show improvements in annotation quality over time, to inform the work of standards bodies and repository providers on the usability and popularity of their products, and to assess and credit the work of curators. Such an index provides a step towards putting metadata capture practices and, in the future, standards compliance into a quantitative and objective framework.

12 citations
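
The Metadata Coverage Index described in the abstract above (the percentage of available fields actually filled) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the rule for deciding when a field counts as "filled" (not None and not a blank string), and the example field names are all assumptions made for illustration.

```python
from typing import Any, Iterable, Mapping


def is_filled(value: Any) -> bool:
    """Assumed rule: a field is filled if it is not None and not a blank string."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    return True


def mci(record: Mapping[str, Any], available_fields: Iterable[str]) -> float:
    """MCI of one record: percentage of available fields that are filled."""
    fields = list(available_fields)
    if not fields:
        raise ValueError("available_fields must not be empty")
    filled = sum(1 for name in fields if is_filled(record.get(name)))
    return 100.0 * filled / len(fields)


def database_mci(records: Iterable[Mapping[str, Any]],
                 available_fields: Iterable[str]) -> float:
    """MCI across a collection: filled cells over all available cells."""
    fields = list(available_fields)
    rows = list(records)
    if not fields or not rows:
        raise ValueError("need at least one record and one field")
    filled = sum(1 for row in rows for name in fields
                 if is_filled(row.get(name)))
    return 100.0 * filled / (len(fields) * len(rows))


def field_fill_rate(records: Iterable[Mapping[str, Any]], field: str) -> float:
    """Percentage of records in which a single field of interest is filled."""
    rows = list(records)
    if not rows:
        raise ValueError("need at least one record")
    return 100.0 * sum(1 for row in rows if is_filled(row.get(field))) / len(rows)


# Hypothetical field names, chosen only for this example.
FIELDS = ["organism", "isolation_site", "latitude", "sequencing_method"]
sparse = {"organism": "E. coli", "isolation_site": "", "latitude": None,
          "sequencing_method": "454"}
full = {"organism": "B. subtilis", "isolation_site": "soil", "latitude": 51.5,
        "sequencing_method": "Sanger"}

print(mci(sparse, FIELDS))
print(database_mci([sparse, full], FIELDS))
print(field_fill_rate([sparse, full], "latitude"))
```

The three functions correspond to the three levels the abstract mentions: individual records, a whole database, and component parts such as a single field of interest. A stricter "filled" rule (for example, treating placeholder strings such as "unknown" as empty) would lower the scores and is a design choice left open here.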


Book Chapter
01 Jan 2012
TL;DR: This chapter introduces the problems experimentalists in all sectors face in utilizing third party data sets given the unhelpful wealth of formats and terminologies and the consequent mountain of technical frameworks needed to achieve data interoperability.
Abstract: This chapter introduces the problems experimentalists in all sectors face in utilizing third party data sets given the unhelpful wealth of formats and terminologies and the consequent mountain of technical frameworks needed to achieve data interoperability. We argue for the importance of a complementary set of open standards, the challenges we must overcome and the role the BioSharing effort is set to play. As an example of progress, we present the open source ISA software solution in action during the curation of the InnoMed PredTox data set, along with its growing active developer and user community, spanning academia and industry, including The Novartis Institutes for BioMedical Research and Janssen Research & Development.

1 citation