
Showing papers by "Carole Goble" published in 2011


Journal ArticleDOI
TL;DR: RightField is an open source application that provides a mechanism for embedding ontology annotation support for Life Science data in Excel spreadsheets, enabling scientists to consistently annotate their data with 'semantic annotation by stealth'.
Abstract: Motivation: In the Life Sciences, guidelines, checklists and ontologies describing what metadata is required for the interpretation and reuse of experimental data are emerging. Data producers, however, may have little experience in the use of such standards and require tools to support this form of data annotation. Results: RightField is an open source application that provides a mechanism for embedding ontology annotation support for Life Science data in Excel spreadsheets. Individual cells, columns or rows can be restricted to particular ranges of allowed classes or instances from chosen ontologies. The RightField-enabled spreadsheet presents selected ontology terms to the users as a simple drop-down list, enabling scientists to consistently annotate their data. The result is ‘semantic annotation by stealth’, with an annotation process that is less error-prone, more efficient, and more consistent with community standards. Availability and implementation: RightField is open source under a BSD license and freely available from http://www.rightfield.org.uk
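To make the mechanism concrete: RightField itself is a Java application, but the core idea, restricting a spreadsheet cell range to a controlled list of ontology terms rendered as an in-cell drop-down, can be sketched in a few lines of Python with openpyxl. The ontology terms, column, and file name below are hypothetical examples, not part of RightField.

```python
# Minimal sketch of term-restricted cells in the spirit of RightField.
# Assumes openpyxl is installed; terms, ranges and file names are illustrative.
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation

# Hypothetical allowed terms, e.g. labels of classes drawn from an ontology.
allowed_terms = ["Escherichia coli", "Saccharomyces cerevisiae", "Bacillus subtilis"]

wb = Workbook()
ws = wb.active
ws["A1"] = "Sample"
ws["B1"] = "Organism (controlled term)"

# A list-type validation appears to the annotator as a simple drop-down,
# so data entry stays within the chosen vocabulary (the 'annotation by stealth' effect).
dv = DataValidation(
    type="list",
    formula1='"' + ",".join(allowed_terms) + '"',
    allow_blank=True,
)
dv.error = "Please choose a term from the controlled list."
ws.add_data_validation(dv)
dv.add("B2:B100")  # restrict the organism column to the allowed terms

wb.save("annotated_template.xlsx")
```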

97 citations


Journal ArticleDOI
01 Jan 2011 - Database
TL;DR: A community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore, is proposed to provide a general overview of the database landscape, to encourage consistency and interoperability between resources, and to promote the use of semantic and syntactic standards.
Abstract: The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources; and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.
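Purely as an illustration of what a uniform, machine-readable description of a database's core attributes could look like, here is a hypothetical record; the field names are placeholders chosen for this sketch and are not the published BioDBCore checklist.

```python
# Hypothetical core-attribute record for a biological database.
# Field names are illustrative placeholders, not the official BioDBCore attributes.
example_record = {
    "database_name": "ExampleProteinDB",
    "main_url": "https://example.org/proteindb",
    "contact": "curators@example.org",
    "scope": "curated protein interaction data",
    "data_formats": ["TSV", "XML"],
    "standards_used": ["community checklists", "controlled vocabularies"],
    "conditions_of_use": "CC-BY 4.0",
    "last_major_release": "2011-01",
}

# A uniform record makes resources easy to filter and compare programmatically,
# e.g. keeping only databases that offer a tab-separated download.
catalogue = [example_record]
tsv_databases = [r["database_name"] for r in catalogue if "TSV" in r["data_formats"]]
```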

84 citations


Journal ArticleDOI
TL;DR: A community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore, is proposed to provide a general overview of the database landscape, to encourage consistency and interoperability between resources, and to promote the use of semantic and syntactic standards.
Abstract: The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.

57 citations


01 Nov 2011
TL;DR: This work describes the preservation challenges of scientific workflows, suggests a framework for discussing the reproducibility of workflow results, and describes curation techniques that can be used to avoid the ‘workflow decay’ that occurs when steps of the workflow are vulnerable to external change.
Abstract: Some of the shared digital artefacts of digital research are executable in the sense that they describe an automated process which generates results. One example is the computational scientific workflow which is used to conduct automated data analysis, predictions and validations. We describe preservation challenges of scientific workflows, and suggest a framework to discuss the reproducibility of workflow results. We describe curation techniques that can be used to avoid the ‘workflow decay’ that occurs when steps of the workflow are vulnerable to external change. Our approach makes extensive use of provenance information and also considers aggregate structures called Research Objects as a means for promoting workflow preservation.
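As a hedged sketch of the curation idea (not the Research Object model itself), the snippet below bundles a workflow with its provenance and example data, and flags one common symptom of workflow decay: an external service referenced by the workflow that no longer responds. All names, endpoints, and file formats are hypothetical.

```python
# Illustrative aggregate of a workflow plus the material needed to keep it
# reproducible; this is a simplification, not the Research Object specification.
from dataclasses import dataclass, field
from urllib.request import urlopen
from urllib.error import URLError

@dataclass
class WorkflowBundle:
    workflow_file: str                    # e.g. a workflow definition file
    service_endpoints: list[str]          # external services the workflow calls
    example_inputs: dict[str, str] = field(default_factory=dict)
    provenance_trace: list[dict] = field(default_factory=list)

def find_decayed_services(bundle: WorkflowBundle, timeout: float = 5.0) -> list[str]:
    """Return endpoints that no longer respond, one symptom of workflow decay."""
    dead = []
    for url in bundle.service_endpoints:
        try:
            urlopen(url, timeout=timeout)
        except (URLError, ValueError):
            dead.append(url)
    return dead

# Hypothetical usage: curators run this periodically and repair or substitute
# the affected steps before the workflow becomes unrunnable.
bundle = WorkflowBundle(
    workflow_file="blast_annotation.t2flow",
    service_endpoints=["https://example.org/blast/wsdl"],
)
print(find_decayed_services(bundle))
```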

55 citations


Proceedings ArticleDOI
15 Jun 2011
TL;DR: A novel and experimental approach to assessment is detailed by modelling the causal relationships between quality, trust, and utility dimensions through the construction of decision networks informed by provenance graphs.
Abstract: In science, quality is paramount. As scientists increasingly look to the Web to share and discover scientific data, there is a growing need to support the scientist in assessing the quality of that data. However, quality is an ambiguous and overloaded term. In order to support the scientific user in discovering useful data, we have systematically examined the nature of "quality" by exploiting three prevalent properties of scientific data sets: (1) that data quality is commonly defined objectively; (2) that the provenance and lineage in its production has a well-understood role; and (3) that "fitness-for-use" is a definition of utility rather than quality or trust, where the quality and trustworthiness of the data and the entities that produced that data inform its utility. Our study is presented in two stages. First, we review existing information quality dimensions and detail an assessment-oriented classification. We introduce definitions for quality, trust and utility in terms of the entities required in their assessment: producer, provider, consumer, process, artifact and quality standard. Next, we detail a novel and experimental approach to assessment by modelling the causal relationships between quality, trust, and utility dimensions through the construction of decision networks informed by provenance graphs. To ground and motivate our discussion throughout, we draw on the European Bioinformatics Institute's Gene Ontology Annotations database. We present an initial demonstration of our approach with an example for ranking results from the Gene Ontology Annotation database using an emerging objective quality measure, the Gene Ontology Annotation Quality score.
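The decision networks themselves are not reproduced here; the sketch below only illustrates the separation the abstract draws between objective quality, trust in the producing entities, and fitness-for-use, by ranking annotations on a utility score that weights a hypothetical objective quality measure (in the spirit of a GAQ-like score) by a provenance-derived trust factor. All records, scores, and weights are invented.

```python
# Invented ranking sketch: utility as objective quality weighted by trust in the
# producer recorded in provenance. This does not reproduce the paper's decision
# networks or the real Gene Ontology Annotation Quality formula.
from dataclasses import dataclass

@dataclass
class Annotation:
    gene: str
    term: str
    quality_score: float   # stand-in for an objective quality measure
    producer: str          # annotating group, taken from the provenance graph

# Hypothetical trust weights derived from provenance about each producer.
producer_trust = {"curation_team_a": 0.9, "pipeline_x": 0.6}

def utility(a: Annotation) -> float:
    """Fitness-for-use: objective quality modulated by trust in the producer."""
    return a.quality_score * producer_trust.get(a.producer, 0.5)

annotations = [
    Annotation("geneA", "GO:0008150", quality_score=0.8, producer="pipeline_x"),
    Annotation("geneA", "GO:0003674", quality_score=0.7, producer="curation_team_a"),
]
for a in sorted(annotations, key=utility, reverse=True):
    print(a.gene, a.term, round(utility(a), 2))
```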

53 citations


Book ChapterDOI
TL;DR: The SEEK is promoted as a data and model management tool that can be adapted to the specific needs of a particular systems biology project, and the methods employed for lowering the barriers to adoption of standard formats are presented.
Abstract: Systems biology research is typically performed by multidisciplinary groups of scientists, often in large consortia and in distributed locations. The data generated in these projects tend to be heterogeneous and often involve high-throughput “omics” analyses. Models are developed iteratively from data generated in the projects and from the literature. Consequently, there is a growing requirement for exchanging experimental data, mathematical models, and scientific protocols between consortium members and a necessity to record and share the outcomes of experiments and the links between data and models. The overall output of a research consortium is also a valuable commodity in its own right. The research and associated data and models should eventually be available to the whole community for reuse and future analysis. The SEEK is an open-source, Web-based platform designed for the management and exchange of systems biology data and models. The SEEK was originally developed for the SysMO (systems biology of microorganisms) consortia, but the principles and objectives are applicable to any systems biology project. The SEEK provides an index of consortium resources and acts as a gateway to other tools and services commonly used in the community. For example, the model simulation tool, JWS Online, has been integrated into the SEEK, and a plug-in to PubMed allows publications to be linked to supporting data and author profiles in the SEEK. The SEEK is a pragmatic solution to data management which encourages, but does not force, researchers to share and disseminate their data in community standard formats. It provides tools to assist with management and annotation as well as incentives and added value for following these recommendations. Data exchange and reuse rely on sufficient annotation, consistent metadata descriptions, and the use of standard exchange formats for models, data, and the experiments they are derived from. In this chapter, we present the SEEK platform, its functionalities, and the methods employed for lowering the barriers to adoption of standard formats. As the production of biological data continues to grow, in systems biology and in the life sciences in general, the need to record, manage, and exploit this wealth of information in the future is increasing. We promote the SEEK as a data and model management tool that can be adapted to the specific needs of a particular systems biology project.
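To illustrate the kind of catalogue such a platform maintains, the sketch below links data, models, and protocols through minimal metadata; the asset types, fields, and identifiers are hypothetical simplifications and not SEEK's actual data model.

```python
# Hypothetical, heavily simplified index of consortium assets; not SEEK's schema.
assets = [
    {"id": "data-001", "type": "data", "format": "CSV",
     "title": "Glycolysis time course", "linked_model": "model-007"},
    {"id": "model-007", "type": "model", "format": "SBML",
     "title": "Glycolysis kinetic model", "simulator": "JWS Online"},
    {"id": "sop-003", "type": "protocol", "format": "PDF",
     "title": "Sampling protocol", "used_by": ["data-001"]},
]

# Cross-links between data, models, and protocols are what make later reuse possible.
def assets_linked_to(asset_id: str) -> list[dict]:
    return [a for a in assets
            if asset_id == a.get("linked_model") or asset_id in a.get("used_by", [])]

print([a["id"] for a in assets_linked_to("model-007")])  # ['data-001']
```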

52 citations


Proceedings ArticleDOI
04 Jul 2011
TL;DR: A follow-up to the authors' network analysis of myExperiment, an online scientific workflow repository, is presented, introducing Service Map, a network model established to study best practice in service use, and proposing two approaches over the Service Map: association rule mining and relation-aware, cross-workflow searching.
Abstract: The wide use of Web services and scientific workflows has enabled bioinformaticians to reuse experimental resources and streamline data processing. This paper presents a follow-up to our network analysis of myExperiment, an online scientific workflow repository. The motivation comes from two common questions raised by bio-scientists: 1) Given the services that I plan to use, what are other services usually used together with them? and 2) Given two or more services I plan to use together, can I find an operation chain to connect them based on others' past usage? Aiming to provide system-level, GPS-like support to answer the two questions, we present Service Map, a network model established to study the best practice of service use. Two approaches are proposed over the Service Map: association rule mining and relation-aware, cross-workflow searching. Both approaches were validated using real-life data obtained from the myExperiment repository.
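A minimal sketch of the first question ("what other services are usually used together with the ones I plan to use?") treated as association-rule style co-usage mining over per-workflow service sets. The workflows and service names below are invented, and this is not the authors' Service Map implementation.

```python
# Toy co-usage mining over per-workflow service sets: recommend services that
# frequently co-occur with a chosen set. Workflows and names are invented.
from collections import Counter
from itertools import chain

workflow_services = [
    {"fetch_sequence", "blast_search", "parse_blast"},
    {"fetch_sequence", "blast_search", "render_alignment"},
    {"fetch_sequence", "clustal_align", "render_alignment"},
    {"blast_search", "parse_blast"},
]

def co_used_with(chosen: set[str], min_confidence: float = 0.5) -> list[tuple[str, float]]:
    """Services appearing alongside all of `chosen`, ranked by rule confidence."""
    supporting = [s for s in workflow_services if chosen <= s]
    if not supporting:
        return []
    counts = Counter(chain.from_iterable(s - chosen for s in supporting))
    ranked = [(svc, n / len(supporting)) for svc, n in counts.items()]
    return sorted((r for r in ranked if r[1] >= min_confidence),
                  key=lambda r: r[1], reverse=True)

# For {"blast_search"} this suggests "fetch_sequence" and "parse_blast",
# each co-occurring in two of the three workflows that use blast_search.
print(co_used_with({"blast_search"}))
```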

29 citations



Journal ArticleDOI
TL;DR: An example workflow, together with a simple classification of user questions on the workflow's data products, is provided to show how contextual metadata can be combined and interchanged through a semantic data model and infrastructure that supports enhanced semantic provenance applications.
Abstract: In this article, the authors provide an example workflow, and a simple classification of user questions on the workflow's data products, to combine and interchange contextual metadata through a semantic data model and infrastructure. They also analyze their approach's potential to support enhanced semantic provenance applications.

22 citations


Book ChapterDOI
26 Oct 2011
TL;DR: The exchange of “Research Objects” rather than articles proposes a technical solution; however, the obstacles are mainly social ones that require the scientific community to rethink its current value systems for scholarship, data, methods and software.
Abstract: A “knowledge turn” is a cycle of a process by a professional, including the learning generated by the experience, deriving more good and leading to advance. The majority of scientific advances in the public domain result from collective efforts that depend on rapid exchange and effective reuse of results. We have powerful computational instruments, such as scientific workflows, coupled with widespread online information dissemination to accelerate knowledge cycles. However, turns between researchers continue to lag. In particular, method obfuscation obstructs reproducibility. The exchange of “Research Objects” rather than articles proposes a technical solution; however, the obstacles are mainly social ones that require the scientific community to rethink its current value systems for scholarship, data, methods and software.

22 citations


Journal ArticleDOI
TL;DR: This paper explores the notion of losslessness of OPM graphs relative to Taverna workflows, showing that Taverna is a suitable model for representing plausible OPM-generating processes and that augmenting OPM with two types of annotation makes it lossless with respect to Taverna.

Proceedings ArticleDOI
05 Dec 2011
TL;DR: This work proposes a heuristic for locating substitutes able to replace unavailable service operations within workflows; the heuristic exploits provenance traces collected from past executions of workflows to ensure that candidate substitutes perform tasks similar to those of the missing operations.
Abstract: Scientific workflows are increasingly gaining momentum as the new paradigm for modeling and enacting scientific experiments. The value of a workflow specification does not end once it is enacted. Indeed, workflow specifications encapsulate knowledge that documents scientific experiments, and are, therefore, worth preserving. Our experience suggests that workflow preservation is frequently hampered by the volatility of the constituent service operations when these operations are supplied by third-party providers. To deal with this issue, we propose a heuristic for locating substitutes that are able to replace unavailable service operations within workflows. The proposed method uses the data links connecting inputs and outputs of service operations in existing workflow specifications to locate operations with parameters compatible with those of the missing operations. Furthermore, it exploits provenance traces collected from past executions of workflows to ensure that candidate substitutes perform tasks similar to those of the missing operations. The effectiveness of the proposed method has been empirically assessed.
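A hedged sketch of that style of heuristic: candidates are filtered on whether their input/output parameters are compatible with the data links around the missing operation, then ranked by how similar their provenance-recorded outputs are to those of the missing operation. The operations, the similarity measure, and the compatibility rule below are invented simplifications, not the method evaluated in the paper.

```python
# Invented simplification of a substitute-finding heuristic: filter candidates by
# parameter compatibility with the workflow's data links, then rank them by the
# similarity of outputs recorded in provenance traces.
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class Operation:
    name: str
    input_types: frozenset            # parameter types the operation consumes
    output_types: frozenset           # parameter types it produces
    sample_output: str                # a representative value taken from provenance

def compatible(missing: Operation, candidate: Operation) -> bool:
    """Candidate needs no inputs the workflow does not already supply and
    produces everything the downstream steps consume."""
    return (candidate.input_types <= missing.input_types
            and missing.output_types <= candidate.output_types)

def provenance_similarity(missing: Operation, candidate: Operation) -> float:
    """Crude proxy for 'performs a similar task': compare recorded outputs."""
    return SequenceMatcher(None, missing.sample_output, candidate.sample_output).ratio()

def rank_substitutes(missing: Operation, candidates: list) -> list:
    scored = [(c.name, provenance_similarity(missing, c))
              for c in candidates if compatible(missing, c)]
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Hypothetical usage with invented operations.
gone = Operation("getProteinFasta", frozenset({"uniprot_id"}), frozenset({"fasta"}),
                 ">sp|P12345|EXAMPLE ...")
alt = Operation("fetchFastaByAccession", frozenset({"uniprot_id"}), frozenset({"fasta"}),
                ">sp|P12345|EXAMPLE ...")
print(rank_substitutes(gone, [alt]))
```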

Proceedings ArticleDOI
05 Sep 2011
TL;DR: This work explores the emergence of new digital objects shared as part of the conduct and discourse of science and discusses their evolution.
Abstract: Scientific research is increasingly conducted digitally and online, and consequently we are seeing the emergence of new digital objects shared as part of the conduct and discourse of science. These Scientific Social Objects are more than lumps of domain-specific data: they may comprise multiple components which can also be shared separately and independently, and some contain descriptions of scientific processes from which new objects will be generated. Using the myExperiment social website as a case study, we explore Scientific Social Objects and discuss their evolution.