
Showing papers by "Carole Goble published in 2014"


Journal ArticleDOI
TL;DR: The micropublications semantic model of scientific argument and evidence, as presented in this paper, can express a broad spectrum of representational complexity from minimal to maximal forms and is available as an OWL 2 vocabulary at http://purl.org/mp.
Abstract: Scientific publications are documentary representations of defeasible arguments, supported by data and repeatable methods. They are the essential mediating artifacts in the ecosystem of scientific communications. The institutional “goal” of science is publishing results. The linear document publication format, dating from 1665, has survived transition to the Web. Intractable publication volumes; the difficulty of verifying evidence; and observed problems in evidence and citation chains suggest a need for a web-friendly and machine-tractable model of scientific publications. This model should support: digital summarization, evidence examination, challenge, verification and remix, and incremental adoption. Such a model must be capable of expressing a broad spectrum of representational complexity, ranging from minimal to maximal forms. The micropublications semantic model of scientific argument and evidence provides these features. Micropublications support natural language statements; data; methods and materials specifications; discussion and commentary; challenge and disagreement; as well as allowing many kinds of statement formalization. The minimal form of a micropublication is a statement with its attribution. The maximal form is a statement with its complete supporting argument, consisting of all relevant evidence, interpretations, discussion and challenges brought forward in support of or opposition to it. Micropublications may be formalized and serialized in multiple ways, including in RDF. They may be added to publications as stand-off metadata. An OWL 2 vocabulary for micropublications is available at http://purl.org/mp. A discussion of this vocabulary, along with RDF examples from the case studies, appears as OWL Vocabulary and RDF Examples in Additional file 1. Micropublications, because they model evidence and allow qualified, nuanced assertions, can play essential roles in the scientific communications ecosystem in places where simpler, formalized and purely statement-based models, such as the nanopublications model, will not be sufficient. At the same time they will add significant value to, and are intentionally compatible with, statement-based formalizations. We suggest that micropublications, generated by useful software tools supporting such activities as writing, editing, reviewing, and discussion, will be of great value in improving the quality and tractability of biomedical communications.
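To make the minimal form concrete (a statement plus its attribution), here is a sketch built with rdflib. The mp: class and property names (Micropublication, Statement, argues) follow our reading of the model and should be checked against the published vocabulary at http://purl.org/mp; the attribution uses W3C PROV, and all identifiers are hypothetical.

```python
# Sketch: a minimal-form micropublication as RDF (statement + attribution).
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, PROV

MP = Namespace("http://purl.org/mp/")  # assumed namespace form for the mp terms

g = Graph()
g.bind("mp", MP)
g.bind("prov", PROV)

micropub = URIRef("http://example.org/mp/1")      # hypothetical identifiers
claim = URIRef("http://example.org/statement/1")

g.add((micropub, RDF.type, MP.Micropublication))
g.add((claim, RDF.type, MP.Statement))
g.add((micropub, MP.argues, claim))               # the micropublication argues its claim
g.add((claim, RDF.value, Literal("Drug X inhibits kinase Y at nanomolar concentrations.")))
g.add((claim, PROV.wasAttributedTo, URIRef("http://example.org/person/a-researcher")))

print(g.serialize(format="turtle"))
```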

109 citations


Journal ArticleDOI
TL;DR: A manual analysis performed over a set of real-world scientific workflows from Taverna, Wings, Galaxy and Vistrails has resulted in a set of scientific workflow motifs that are helpful to identify the functionality of the steps in a given workflow, to develop best practices for workflow design, and to develop approaches for automated generation of workflow abstractions.

79 citations


Journal ArticleDOI
TL;DR: This article raises a number of points concerning quality, code review, and openness; development practices and training in scientific computing; career recognition of research software engineers; and sustainability models for funding scientific software.
Abstract: Modern scientific research isn't possible without software. However, its vital role is often overlooked by funders, universities, assessment committees, and even the research community itself. This is a serious issue that needs urgent attention. This article raises a number of points concerning quality, code review, and openness; development practices and training in scientific computing; career recognition of research software engineers; and sustainability models for funding scientific software. We must get software recognized to be the first-class experimental scientific instrument that it is and get "better software for better research."

69 citations


Journal ArticleDOI
TL;DR: The OPS platform, as discussed by the authors, is a linked data platform for integrating multiple pharmacology datasets that form the basis for several drug discovery applications; its functionality was drawn from a collection of prioritised drug discovery business questions created as part of the Open PHACTS project.
Abstract: The discovery of new medicines requires pharmacologists to interact with a number of information sources ranging from tabular data to scientific papers, and other specialized formats. In this application report, we describe a linked data platform for integrating multiple pharmacology datasets that form the basis for several drug discovery applications. The functionality offered by the platform has been drawn from a collection of prioritised drug discovery business questions created as part of the Open PHACTS project, a collaboration of research institutions and major pharmaceutical companies. We describe the architecture of the platform, focusing on seven design decisions that drove its development with the aim of informing others developing similar software in this or other domains. The utility of the platform is demonstrated by the variety of drug discovery applications being built to access the integrated data. An alpha version of the OPS platform is currently available to the Open PHACTS consortium and a first public release will be made in late 2012; see http://www.openphacts.org/ for details.

63 citations


Journal ArticleDOI
TL;DR: The Open PHACTS Discovery Platform, as discussed by the authors, leverages Linked Data to provide integrated access to pharmacology databases; it has been accessed over 13.5 million times and has multiple applications that integrate with it.

51 citations


Journal ArticleDOI
TL;DR: This work designs and implements an approach to improving workflow structure by way of semantics-preserving rewriting, and introduces a distilling algorithm that takes a workflow as input and produces a distilled, semantically-equivalent workflow.
Abstract: Scientific workflow management systems are increasingly used to specify and manage bioinformatics experiments. Their programming model appeals to bioinformaticians, who can use them to easily specify complex data processing pipelines. Such a model is underpinned by a graph structure, where nodes represent bioinformatics tasks and links represent the dataflow. The complexity of such graph structures is increasing over time, with possible impacts on scientific workflow reuse. In this work, we propose effective methods for workflow design, with a focus on the Taverna model. We argue that one of the contributing factors for the difficulties in reuse is the presence of "anti-patterns", a term broadly used in program design to indicate the use of idiomatic forms that lead to over-complicated design. The main contribution of this work is a method for automatically detecting such anti-patterns and replacing them with different patterns which result in a reduction of the workflow's overall structural complexity. Rewriting workflows in this way will be beneficial both in terms of user experience (easier design and maintenance) and in terms of operational efficiency (easier to manage, and sometimes easier to exploit the latent parallelism amongst the tasks). We have conducted a thorough study of the workflow structures available in Taverna, with the aim of identifying workflow fragments whose structure could be made simpler without altering the workflow semantics. We provide four contributions. Firstly, we identify a set of anti-patterns that contribute to the structural workflow complexity. Secondly, we design a series of refactoring transformations to replace each anti-pattern by a new semantically-equivalent pattern with less redundancy and a simplified structure. Thirdly, we introduce a distilling algorithm that takes a workflow as input and produces a distilled, semantically-equivalent workflow. Lastly, we provide an implementation of our refactoring approach that we evaluate on both the public Taverna workflows and on a private collection of workflows from the BioVeL project. We have designed and implemented an approach to improving workflow structure by way of rewriting that preserves workflow semantics. Future work includes considering our refactoring approach during the phase of workflow design and proposing guidelines for designing distilled workflows.
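A minimal sketch of the distilling idea, under simplifying assumptions: the workflow is modelled as a plain dataflow DAG, and the only anti-pattern handled is the duplication of the same task over the same inputs, which can be merged without changing what the workflow computes. The paper's refactoring catalogue and Taverna's data structures are considerably richer than this.

```python
# Sketch: merge redundant nodes (same task, same inputs) in a dataflow DAG.
# This is one plausible semantics-preserving rewrite, not the paper's method.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    task: str            # identifier of the task/service invoked
    inputs: tuple = ()   # names of upstream nodes feeding this node

def distill(nodes: list[Node]) -> list[Node]:
    """Repeatedly merge redundant nodes until a fixed point is reached."""
    renamed: dict[str, str] = {}
    changed = True
    while changed:
        changed = False
        seen: dict[tuple, str] = {}   # (task, inputs) -> surviving node name
        kept = []
        for n in nodes:
            ins = tuple(renamed.get(i, i) for i in n.inputs)
            key = (n.task, ins)
            if key in seen:           # redundant duplicate: drop and rewire
                renamed[n.name] = seen[key]
                changed = True
            else:
                seen[key] = n.name
                kept.append(Node(n.name, n.task, ins))
        nodes = kept
    return nodes

# B and C both run "blast" on A's output, so one is removed and D is rewired.
wf = [Node("A", "fetch"), Node("B", "blast", ("A",)),
      Node("C", "blast", ("A",)), Node("D", "align", ("B", "C"))]
for node in distill(wf):
    print(node)
```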

46 citations


Journal ArticleDOI
TL;DR: A Taverna-based Data Refinement Workflow is designed and implemented which integrates taxonomic data retrieval, data cleaning, and data selection into a consistent, standards-based, and effective system hiding the complexity of underlying service infrastructures.
Abstract: The compilation and cleaning of data needed for analyses and prediction of species distributions is a time-consuming process requiring a solid understanding of data formats and service APIs provided by biodiversity informatics infrastructures. We designed and implemented a Taverna-based Data Refinement Workflow which integrates taxonomic data retrieval, data cleaning, and data selection into a consistent, standards-based, and effective system that hides the complexity of the underlying service infrastructures. The workflow can be freely used both locally and through a web-portal which does not require additional software installations by users.

39 citations


Journal ArticleDOI
TL;DR: In this paper, a workflow-centric Research Object (RO) model is proposed to aggregate and annotate the resources used in a bioinformatics experiment, allowing the conclusions of the experiment to be retrieved in the context of the driving hypothesis, the executed workflows and their input data.
Abstract: Background: One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide the necessary meta-data for a scientist to understand and recreate the results of an experiment. To support this, we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets, text, etc. We applied this model to a case study where we analysed human metabolite variation by workflows. Results: We present the application of the workflow-centric RO model for our bioinformatics case study. Three workflows were produced following recently defined Best Practices for workflow design. By modelling the experiment as an RO, we were able to automatically query the experiment and answer questions such as “which particular data was input to a particular workflow to test a particular hypothesis?”, and “which particular conclusions were drawn from a particular workflow?”. Conclusions: Applying a workflow-centric RO model to aggregate and annotate the resources used in a bioinformatics experiment allowed us to retrieve the conclusions of the experiment in the context of the driving hypothesis, the executed workflows and their input data. The RO model is an extendable reference model that can be used by other systems as well. Availability: The Research Object is available at http://www.myexperiment.org/packs/428 and the Wf4Ever Research Object Model is available at http://wf4ever.github.io/ro
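As a sketch of the kind of question-answering the RO model enables, the query below assumes the RO's annotations have been loaded into an RDF graph and that runs are described with the Wf4Ever wfprov vocabulary the model builds on; the exact graph shape (and the manifest file name) is our assumption, not an excerpt from the case-study RO.

```python
# Sketch: ask "which data was input to which workflow run?" over an RO graph.
from rdflib import Graph

g = Graph()
g.parse("manifest.rdf")  # hypothetical path to the RO's annotation graph

QUERY = """
PREFIX wfprov: <http://purl.org/wf4ever/wfprov#>
SELECT ?run ?input WHERE {
  ?run a wfprov:WorkflowRun ;
       wfprov:usedInput ?input .
}
"""
for run, data in g.query(QUERY):
    print(f"workflow run {run} used input {data}")
```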

29 citations


DatasetDOI
04 Dec 2014
TL;DR: This spreadsheet contains the anonymised data collected as part of a survey of UK researchers on their use of research software; the survey received 417 responses, a statistically significant number that can be used to represent the views of people in research-intensive universities in the UK.
Abstract: This spreadsheet contains the anonymised data collected as part of a survey of UK researchers on their use of research software. We asked people specifically about “research software”, which we defined as: “Software that is used to generate, process or analyse results that you intend to appear in a publication (either in a journal, conference paper, monograph, book or thesis). Research software can be anything from a few lines of code written by yourself, to a professionally developed software package. Software that does not generate, process or analyse results - such as word processing software, or the use of a web search - does not count as ‘research software’ for the purposes of this survey.” We contacted 1,000 randomly selected researchers at each of 15 Russell Group universities. From the 15,000 invitations to complete the survey, we received 417 responses – a rate of 3%, which is fairly normal for a blind survey. We used Google Forms to collect responses. The responses have good representation from across the disciplines, seniorities and genders. This is a statistically significant number of responses that can be used to represent the views of people in research-intensive universities in the UK. An overview of the data is available on the worksheet "Summary data". Responses to questions are ordered by unique respondent ID. Please read the "README" worksheet for additional information about the collection and processing of this data. This survey data is licensed under a Creative Commons Attribution licence. Copyright resides with The University of Edinburgh on behalf of the Software Sustainability Institute. Please cite as:
APA: Hettrick, S. J., et al. (2014). UK Research Software Survey 2014 [Data set]. doi:10.5281/zenodo.14809
Chicago: S. J. Hettrick et al., UK Research Software Survey 2014 (accessed December 4, 2014), 10.5281/zenodo.14809.
MLA: Hettrick, S. J., et al. “UK Research Software Survey 2014.” ZENODO, 2014. Web. 4 December 2014.

28 citations


Journal Article
TL;DR: Research Objects as discussed by the authors are portable units that enable the sharing, preservation, interpretation and reuse of research investigation results and provide a single entry point to access information about the hypothesis investigated, the datasets used, the experiments carried out, the results of the experiments, the people involved in the research, etc.

25 citations


Book ChapterDOI
09 Jun 2014
TL;DR: It is shown that by basic mark-up of the data processing within activities and the use of a set of domain-specific label generation functions, standard workflow provenance can be utilised as a platform for the labelling of data artefacts.
Abstract: Provenance traces captured by scientific workflows can be useful for designing, debugging and maintenance. However, our experience suggests that they are of limited use for reporting results, partly because traces do not comprise the domain-specific annotations needed for explaining results, and partly because of the black-box nature of some workflow activities. We show that by basic mark-up of the data processing within activities and the use of a set of domain-specific label generation functions, standard workflow provenance can be utilised as a platform for the labelling of data artefacts. These labels can in turn aid the selection of data subsets and serve as proxies for data descriptors for shared datasets.
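A minimal sketch of this labelling scheme, with invented activity names and a simplified trace format (triples of activity, input artefacts, output artefacts) rather than any real system's provenance model: each activity type is marked up with a label-generation function that derives labels for its outputs from the labels of its inputs.

```python
# Trace entries: (activity_name, input_ids, output_ids), in execution order.
trace = [
    ("fetch_records", [], ["d1"]),
    ("filter_records", ["d1"], ["d2"]),
]

# Hypothetical domain-specific label generators, one per activity type.
def label_fetch(input_labels):
    return {"source": "occurrence download"}

def label_filter(input_labels):
    merged = {k: v for lab in input_labels for k, v in lab.items()}
    merged["step"] = "filtered to records with coordinates"  # propagate + refine
    return merged

generators = {"fetch_records": label_fetch, "filter_records": label_filter}

labels = {}  # data artefact id -> label dict
for activity, ins, outs in trace:
    out_label = generators[activity]([labels.get(i, {}) for i in ins])
    for o in outs:
        labels[o] = out_label

print(labels["d2"])  # labels propagated through the provenance trace
```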

Book ChapterDOI
19 Oct 2014
TL;DR: This paper shows that multiple sets of links can be automatically generated according to different equivalence criteria and published with semantic descriptions capturing their context and interpretation, supporting multiple dynamic views over the Linked Data.
Abstract: When are two entries about a small molecule in different datasets the same? Is it when they have the same drug name, the same chemical structure, or when they meet some other criterion? The choice depends upon the application to which the data will be put. However, existing Linked Data approaches provide a single global view over the data, with no way of varying the notion of equivalence to be applied. In this paper, we present an approach that enables applications to choose the equivalence criteria to apply between datasets, thus supporting multiple dynamic views over the Linked Data. For chemical data, we show that multiple sets of links can be automatically generated according to different equivalence criteria and published with semantic descriptions capturing their context and interpretation. This approach has been applied within a large scale public-private data integration platform for drug discovery. To cater for different use cases, the platform allows the application of different lenses which vary the equivalence rules to be applied based on the context and interpretation of the links.
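A minimal sketch of the lens idea: whether two records link depends on which equivalence criterion the application selects. The record fields and lens names are illustrative assumptions, not the platform's schema; Gleevec is a brand name for imatinib, so a structure-based lens links the two records while a name-based lens does not.

```python
# Two dataset entries for (what is chemically) the same small molecule.
a = {"name": "Gleevec",  "inchikey": "KTUFNOKKBVMGRW-UHFFFAOYSA-N"}
b = {"name": "imatinib", "inchikey": "KTUFNOKKBVMGRW-UHFFFAOYSA-N"}

lenses = {
    # Strict lens: entries are the same only if the drug names match.
    "by_name": lambda x, y: x["name"].lower() == y["name"].lower(),
    # Structural lens: entries are the same if the InChIKeys match.
    "by_structure": lambda x, y: x["inchikey"] == y["inchikey"],
}

for name, same in lenses.items():
    print(name, "->", "link" if same(a, b) else "no link")
# by_name -> no link; by_structure -> link
```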

Posted Content
TL;DR: Research Objects as discussed by the authors are portable units that enable the sharing, preservation, interpretation and reuse of research investigation results and provide a single entry point to access information about the hypothesis investigated, the datasets used, the experiments carried out, the results of the experiments, the people involved in the research, etc.
Abstract: Research in life sciences is increasingly being conducted in a digital and online environment. In particular, life scientists have been pioneers in embracing new computational tools to conduct their investigations. To support the sharing of digital objects produced during such research investigations, we have witnessed in the last few years the emergence of specialized repositories, e.g., DataVerse and FigShare. Such repositories provide users with the means to share and publish datasets that were used or generated in research investigations. While these repositories have proven their usefulness, interpreting and reusing evidence for most research results is a challenging task. Additional contextual descriptions are needed to understand how those results were generated and/or the circumstances under which they were concluded. Because of this, scientists are calling for models that go beyond the publication of datasets to systematically capture the life cycle of scientific investigations and provide a single entry point to access the information about the hypothesis investigated, the datasets used, the experiments carried out, the results of the experiments, the people involved in the research, etc. In this paper we present the Research Object (RO) suite of ontologies, which provide a structured container to encapsulate research data and methods along with essential metadata descriptions. Research Objects are portable units that enable the sharing, preservation, interpretation and reuse of research investigation results. The ontologies we present have been designed in the light of requirements that we gathered from life scientists. They have been built upon existing popular vocabularies to facilitate interoperability. Furthermore, we have developed tools to support the creation and sharing of Research Objects, thereby promoting and facilitating their adoption.

BookDOI
01 Jan 2014
TL;DR: The Dutch Ships and Sailors Linked Data Cloud is a potential hub dataset for digital history research and a prime example of the benefits of Linked Data for this field.
Abstract: We present the Dutch Ships and Sailors Linked Data Cloud. This heterogeneous dataset brings together four curated datasets on Dutch maritime history as five-star linked data. The individual datasets use separate data models, designed in close collaboration with maritime historical researchers. The individual models are mapped to a common interoperability layer, allowing for analysis of the data on the general level. We present the datasets, modeling decisions, internal links and links to external data sources. We show ways of accessing the data and present a number of examples of how the dataset can be used for historical research. The Dutch Ships and Sailors Linked Data Cloud is a potential hub dataset for digital history research and a prime example of the benefits of Linked Data for this field.

Journal ArticleDOI
TL;DR: The Taverna Workbench is open source software providing the ability to combine various services within a workflow, and BIFI offers user interfaces that allow users to interactively construct workflow views and share them with the community, significantly increasing the usability of heterogeneous, distributed service consumption.
Abstract: Heterogeneity in the features, input-output behaviour and user interfaces of available bioinformatics tools and services is still a bottleneck for both expert and non-expert users. Advances in providing common interfaces over such tools and services are gaining interest among researchers. However, the lack of (meta-)information about input-output data and parameters prevents the provision of automated and standardized solutions that can assist users in setting the appropriate parameters. These limitations must be resolved, especially in workflow-based solutions, in order to ease the integration of software. We report a Taverna Workbench plugin, XworX BIFI (Beautiful Interfaces for Inputs), implemented as a solution to the aforementioned issues. BIFI provides a Graphical User Interface (GUI) definition language used to lay out the user interface and to define parameter options for Taverna workflows. BIFI is also able to submit GUI Definition Files (GDF) directly or discover appropriate instances from a configured repository. In the absence of a GDF, BIFI generates a default interface. The Taverna Workbench is open source software providing the ability to combine various services within a workflow. By default, however, users can supply input data to a workflow only via a simple user interface providing a text area to enter the input in text form. The workflow may contain meta-information in human-readable form, such as description text for a port or an example value, but not all workflow ports are documented so well or carry all the required information. BIFI uses custom user interface components for ports, which give users feedback on the parameter data type or structure to be used for service execution and enable client-side data validation. Moreover, BIFI offers user interfaces that allow users to interactively construct workflow views and share them with the community, significantly increasing the usability of heterogeneous, distributed service consumption.
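A sketch of the fallback behaviour described above, under assumed port metadata: in the absence of a GUI Definition File, a default widget is derived for each port from whatever description, type, and example values it carries. The port fields and widget vocabulary are invented for illustration; they are not BIFI's actual GDF language.

```python
# Hypothetical workflow input ports with partial metadata.
ports = [
    {"name": "sequence", "description": "FASTA protein sequence",
     "example": ">sp|P12345|..."},
    {"name": "evalue", "description": "BLAST e-value cutoff", "type": "double"},
]

def default_widget(port):
    """Pick a sensible default input component from the port's metadata."""
    if port.get("type") in {"int", "double"}:
        return {"port": port["name"], "widget": "number_field",
                "help": port.get("description", "")}
    return {"port": port["name"], "widget": "text_area",
            "help": port.get("description", ""),
            "placeholder": port.get("example", "")}

for p in ports:
    print(default_widget(p))
```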

Proceedings ArticleDOI
09 Jun 2014
TL;DR: This paper describes three distinct areas of concern which emerged from the Collaborations Workshop 2014: collaboration readiness, capability enhancement and advocacy.
Abstract: The Collaborations Workshop 2014 (CW14) brought together representatives from across the research community to discuss the issues around software's role in reproducible research. In this paper we summarise the themes, practices and ideas raised at the workshop. We also consider how the "unconference" format of the CW14 helps in eliciting information and forming future collaborations around aspects of reproducible research. In particular, we describe three distinct areas of concern which emerged from the event: collaboration readiness, capability enhancement and advocacy.

Journal ArticleDOI
TL;DR: The Open PHACTS Discovery Platform, as discussed by the authors, leverages Linked Data to provide integrated access to pharmacology databases; between its launch in April 2013 and March 2014 it was accessed over 13.5 million times, and multiple applications integrate with it.
Abstract: Data integration is a key challenge faced in pharmacology, where there are numerous heterogeneous databases spanning multiple domains (e.g. chemistry and biology). To address this challenge, the Open PHACTS consortium has developed the Open PHACTS Discovery Platform, which leverages Linked Data to provide integrated access to pharmacology databases. Between its launch in April 2013 and March 2014, the platform has been accessed over 13.5 million times and has multiple applications that integrate with it. In this work, we discuss how Application Programming Interfaces can extend the classical Linked Data Application Architecture to facilitate data integration. Additionally, we show how the Open PHACTS Discovery Platform implements this extended architecture.
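A sketch of the consumption pattern this extended architecture enables: an application calls a REST method and receives integrated JSON, without issuing SPARQL itself. The endpoint path, version, and parameter names below are assumptions for illustration; consult the Open PHACTS API documentation for the documented methods and credentials.

```python
# Sketch: fetch integrated data about a compound from an assumed REST endpoint.
import requests

BASE = "https://beta.openphacts.org/2.1"  # assumed base URL, not verified
params = {
    "uri": "http://example.org/compound/123",  # hypothetical compound URI
    "app_id": "YOUR_APP_ID",                   # platform API credentials
    "app_key": "YOUR_APP_KEY",
    "_format": "json",
}
resp = requests.get(f"{BASE}/compound", params=params, timeout=30)
resp.raise_for_status()
print(resp.json())  # one integrated record drawn from multiple datasets
```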

01 Jan 2014
TL;DR: This application report describes a linked data platform for integrating multiple pharmacology datasets that form the basis for several drug discovery applications, drawn from a collection of prioritised drug discovery business questions created as part of the Open PHACTS project.
Abstract: The discovery of new medicines requires pharmacologists to interact with a number of information sources ranging from tabular data to scientific papers, and other specialized formats. In this application report, we describe a linked data platform for integrating multiple pharmacology datasets that form the basis for several drug discovery applications. The functionality offered by the platform has been drawn from a collection of prioritised drug discovery business questions created as part of the Open PHACTS project, a collaboration of research institutions and major pharmaceutical companies. We describe the architecture of the platform, focusing on seven design decisions that drove its development with the aim of informing others developing similar software in this or other domains. The utility of the platform is demonstrated by the variety of drug discovery applications being built to access the integrated data. An alpha version of the OPS platform is currently available to the Open PHACTS consortium and a first public release will be made in late 2012; see http://www.openphacts.org/ for details.

Reference EntryDOI
07 Apr 2014

BookDOI
01 Jan 2014
TL;DR: Topics include data integration; search and query answering; ontology-based data access, query rewriting and reasoning; and natural language processing and information extraction.
Abstract: Linked data, its quality, link discovery and application in the life sciences.- Data integration.- Search and query answering.- SPARQL.- Ontology based data access and query rewriting and reasoning.- Natural language processing and information extraction.- User interaction and personalization, and social media.- Ontology alignment and modularization.- Sensors and streams.- Biomedicine and drug discovery.- Smart cities.- Sensor streams.- Multimedia.- Visualization.- Link generation.- Ontology development.- Linked stream data.- Federated query processing.- Tag recommendation.- Entity summarization.- Mobile semantic web.

DOI
14 Jul 2014
TL;DR: This work was enabled by BioVeL and ViBRANT, which received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration.
Abstract: This work was enabled by BioVeL (grant no. 283359) and ViBRANT (grant no. 261532), which received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration. www.biovel.eu | www.vbrant.eu
Taverna Player: www.taverna.org.uk | Source code: github.com/myGrid/taverna-player | Licence: BSD
Scratchpads: scratchpads.eu | Source code: git.scratchpads.eu/git/scratchpads-2.0.git | Licence: GPL2

Proceedings ArticleDOI
30 Jun 2014
TL;DR: DistillFlow is able to detect "anti-patterns" in the structure of workflows (idiomatic forms that lead to over-complicated design) and replace them with different patterns to reduce the workflow's overall structural complexity.
Abstract: Scientific workflow management systems are increasingly used by scientists to specify complex data processing pipelines. Workflows are represented using a graph structure, where nodes represent tasks and links represent the dataflow. However, the complexity of workflow structures is increasing over time, reducing the rate of scientific workflow reuse. Here, we introduce DistillFlow, a tool based on effective methods for workflow design, with a focus on the Taverna model. DistillFlow is able to detect "anti-patterns" in the structure of workflows (idiomatic forms that lead to over-complicated design) and replace them with different patterns to reduce the workflow's overall structural complexity. Rewriting workflows in this way is beneficial both in terms of user experience and workflow maintenance.

Proceedings Article
18 Jun 2014
TL;DR: This article proposes using W3C Open Annotation to produce multipolar argumentation networks as an overlay on Web documents, to assist in improving scientific reproducibility by integrating discussion and analysis of doubtful results.
Abstract: Too many scientific arguments in peer-reviewed articles are backed by flawed or inadequate data. Articles can now be annotated in pre- or post-publication review using models such as W3C Open Annotation, to produce multipolar argumentation networks as an overlay on Web documents. This would assist in improving scientific reproducibility by integrating discussion and analysis of doubtful results.

Journal ArticleDOI
TL;DR: This special issue encouraged researchers to submit and present original work related to the latest trends in parallel and distributed high-performance systems applied to life science problems.
Abstract: Computing systems are rapidly changing, with multicore, graphics processing units (GPUs), clusters, volunteer systems, clouds, and grids offering a confusing, dazzling array of opportunities. New programming paradigms such as Google MapReduce and many-task computing have joined the traditional repertoire of workflow and parallel computing for the highest-performance systems. Meanwhile, the life sciences continue to expand the data they generate, with continuing improvement in the instruments for high-throughput analysis. This ‘fourth paradigm’ (data-driven science) is joined by complex systems, or biocomplexity, which can build phenomenological models of biological systems and processes. This special issue for the Emerging Computational Methods for the Life Sciences Workshop (ECMLS2012) [1] juxtaposes these trends, seeking those computational methods that will enhance scientific discovery. Within this overall scope, this special issue encouraged researchers to submit and present original work related to the latest trends in parallel and distributed high-performance systems applied to life science problems. Important contributions have been provided by Weber et al. [2], Yang et al. [3], Ellingson et al. [4], Hamacher et al. [5], Cushing et al. [6], and Stanberry et al. [7].

Proceedings Article
21 Oct 2014
TL;DR: The Open PHACTS VoID Editor helps non-Semantic Web experts create machine-interpretable descriptions for their datasets, enabling their discovery and reuse.
Abstract: The Open PHACTS VoID Editor helps non-Semantic Web experts to create machine interpretable descriptions for their datasets. The web app guides the user, an expert in the domain of the data, through a series of questions to capture details of their dataset and then generates a VoID dataset description. The generated dataset description conforms to the Open PHACTS dataset description guidelines that ensure suitable provenance information is available about the dataset to enable its discovery and reuse. The VoID Editor is available at http://voideditor.cs.man.ac.uk. The source code can be found at https://github.com/openphacts/Void-Editor2.
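For context, a hand-built example of the sort of VoID dataset description the editor generates, using rdflib. Only core VoID and Dublin Core terms appear here; the Open PHACTS dataset description guidelines require richer provenance than this, so treat the snippet as an illustrative subset rather than the editor's actual output.

```python
# Sketch: a minimal VoID description for a hypothetical dataset.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF, XSD

VOID = Namespace("http://rdfs.org/ns/void#")

g = Graph()
g.bind("void", VOID)
g.bind("dcterms", DCTERMS)

ds = URIRef("http://example.org/dataset/my-assay-data")  # hypothetical URI
g.add((ds, RDF.type, VOID.Dataset))
g.add((ds, DCTERMS.title, Literal("My assay data", lang="en")))
g.add((ds, DCTERMS.description, Literal("Bioassay results for project X", lang="en")))
g.add((ds, DCTERMS.license, URIRef("http://creativecommons.org/licenses/by/4.0/")))
g.add((ds, VOID.dataDump, URIRef("http://example.org/dumps/my-assay-data.ttl")))
g.add((ds, VOID.triples, Literal(123456, datatype=XSD.integer)))

print(g.serialize(format="turtle"))
```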