
Showing papers by "Carole Goble" published in 2010


Journal ArticleDOI
TL;DR: myExperiment is an online research environment that supports the social sharing of bioinformatics workflows: procedures consisting of a series of computational tasks using web services, spanning data retrieval, integration and analysis through to visualisation of the results.
Abstract: myExperiment (http://www.myexperiment.org) is an online research environment that supports the social sharing of bioinformatics workflows. These workflows are procedures consisting of a series of computational tasks using web services, performed on data from its retrieval, integration and analysis through to the visualisation of the results. As a public repository of workflows, myExperiment allows anybody to discover those that are relevant to their research, which can then be reused and repurposed to their specific requirements. Conversely, developers can submit their workflows to myExperiment and enable them to be shared in a secure manner. Since its release in 2007, myExperiment has grown to over 3500 registered users and contains more than 900 workflows. The social aspect of the sharing of these workflows is facilitated by registered users forming virtual communities bound together by a common interest or research project. Contributors of workflows can build their reputation within these communities by receiving feedback and credit from individuals who reuse their work. Further documentation about myExperiment, including its REST web service, is available from http://wiki.myexperiment.org. Feedback and requests for support can be sent to bugs@myexperiment.org.
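The REST web service mentioned above can be exercised with a few lines of HTTP client code. The following is a minimal sketch only; the listing endpoint, the "num" paging parameter and the XML element names are assumptions about the API documented at http://wiki.myexperiment.org, not a verified specification.

```python
# Minimal sketch of querying the myExperiment REST interface (endpoint and
# parameters are assumptions; see http://wiki.myexperiment.org for the real API).
import requests
import xml.etree.ElementTree as ET

resp = requests.get(
    "http://www.myexperiment.org/workflows.xml",  # assumed listing endpoint
    params={"num": 5},                            # assumed paging parameter
    timeout=30,
)
resp.raise_for_status()

root = ET.fromstring(resp.content)
for wf in root.findall("workflow"):
    # Each element is expected to carry a resource URI and a title (assumption).
    print(wf.get("uri"), "-", (wf.text or "").strip())
```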

310 citations


Journal ArticleDOI
TL;DR: The use of Web Services to enable programmatic access to on-line bioinformatics is becoming increasingly important in the Life Sciences, but their number, distribution and the variable quality of their documentation can make their discovery and subsequent use difficult.
Abstract: The use of Web Services to enable programmatic access to on-line bioinformatics is becoming increasingly important in the Life Sciences. However, their number, distribution and the variable quality of their documentation can make their discovery and subsequent use difficult. A Web Services registry with information on available services will help to bring together service providers and their users. The BioCatalogue (http://www.biocatalogue.org/) provides a common interface for registering, browsing and annotating Web Services to the Life Science community. Services in the BioCatalogue can be described and searched in multiple ways based upon their technical types, bioinformatics categories, user tags, service providers or data inputs and outputs. They are also subject to constant monitoring, allowing the identification of service problems and changes and the filtering-out of unavailable or unreliable resources. The system is accessible via a human-readable 'Web 2.0'-style interface and a programmatic Web Service interface. The BioCatalogue follows a community approach in which all services can be registered, browsed and incrementally documented with annotations by any member of the scientific community.
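As an illustration of the programmatic Web Service interface described above, the sketch below performs a free-text search of the registry. The endpoint path, the "q" parameter and the XML structure are assumptions rather than a confirmed API contract; the BioCatalogue documentation remains the authoritative reference.

```python
# Hedged sketch: free-text search of the BioCatalogue registry over HTTP.
import requests
import xml.etree.ElementTree as ET

resp = requests.get(
    "http://www.biocatalogue.org/services.xml",  # assumed search endpoint
    params={"q": "blast"},                       # assumed free-text query parameter
    timeout=30,
)
resp.raise_for_status()

# Element and attribute names below are guesses at the response structure.
for svc in ET.fromstring(resp.content).iter("service"):
    print(svc.get("resource"), svc.findtext("name"))
```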

225 citations


Journal ArticleDOI
TL;DR: The notion of research objects, semantically rich aggregations of resources that possess some scientific intent or support some research objective, is discussed, and a number of principles that such objects and their associated services are expected to follow are presented.
Abstract: What will researchers be publishing in the future? Whilst there is little question that the Web will be the publication platform, as scholars move away from paper towards digital content, there is a need for mechanisms that support the production of self-contained units of knowledge and facilitate the publication, sharing and reuse of such entities. In this paper we discuss the notion of research objects, semantically rich aggregations of resources that possess some scientific intent or support some research objective. We present a number of principles that we expect such objects and their associated services to follow.

190 citations


30 Jun 2010
TL;DR: Describes how the recently overhauled technical architecture of Taverna addresses issues of efficiency, scalability, and extensibility, and presents performance results based on a collection of synthetic workflows, as well as a concrete case study involving a production workflow in the area of cancer research.
Abstract: The Taverna workflow management system is an open source project with a history of widespread adoption within multiple experimental science communities, and a long-term ambition of effectively supporting the evolving need of those communities for complex, data-intensive, service-based experimental pipelines. This short paper describes how the recently overhauled technical architecture of Taverna addresses issues of efficiency, scalability, and extensibility, and presents performance results based on a collection of synthetic workflows, as well as a concrete case study involving a production workflow in the area of cancer research.

168 citations


Proceedings ArticleDOI
07 Dec 2010
TL;DR: This paper makes the case for a scientific data publication model on top of linked data and introduces the notion of Research Objects as first class citizens for sharing and publishing.
Abstract: Scientific data stands to represent a significant portion of the linked open data cloud and science itself stands to benefit from the data fusion capability that this will afford. However, simply publishing linked data into the cloud does not necessarily meet the requirements of reuse. Publishing has requirements of provenance, quality, credit, attribution and methods in order to provide the reproducibility that allows validation of results. In this paper we make the case for a scientific data publication model on top of linked data and introduce the notion of Research Objects as first class citizens for sharing and publishing.

90 citations


Book ChapterDOI
15 Jun 2010
TL;DR: This paper proposes a model and architecture for semantic, domain-aware provenance, and demonstrates its usefulness in answering typical user queries, and discusses the additional benefits and the technical implications of publishing provenance graphs as a form of Linked Data.
Abstract: Data provenance graphs are a form of metadata that can be used to establish a variety of properties of data products that undergo sequences of transformations, typically specified as workflows. Their usefulness for answering user provenance queries is limited, however, unless the graphs are enhanced with domain-specific annotations. In this paper we propose a model and architecture for semantic, domain-aware provenance, and demonstrate its usefulness in answering typical user queries. Furthermore, we discuss the additional benefits and the technical implications of publishing provenance graphs as a form of Linked Data. A prototype implementation of the model is available for data produced by the Taverna workflow system.
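To make the Linked Data angle concrete, the following hedged sketch queries a provenance graph over SPARQL using the SPARQLWrapper library. The endpoint URL is a placeholder and the OPM-style vocabulary prefix is illustrative; graphs actually exported from Taverna may use different terms.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder endpoint; a real deployment would expose its own SPARQL service.
sparql = SPARQLWrapper("http://example.org/provenance/sparql")
sparql.setQuery("""
    PREFIX opmv: <http://purl.org/net/opmv/ns#>   # illustrative provenance vocabulary
    SELECT ?artifact ?process WHERE {
        ?artifact opmv:wasGeneratedBy ?process .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)

# Print each data artifact together with the process that generated it.
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["artifact"]["value"], "<-", row["process"]["value"])
```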

71 citations


Journal ArticleDOI
TL;DR: Debian Med provides a simple method for offering new releases of software and data resources, thus provisioning a local infrastructure for computational biology, and closes the gap between developers and users.
Abstract: Background: The Open Source movement and its technologies are popular in the bioinformatics community because they provide freely available tools and resources for research. In order to feed the steady demand for updates on software and associated data, a service infrastructure is required for sharing and providing these tools to heterogeneous computing environments.

70 citations


Journal ArticleDOI
TL;DR: This paper presents a formal semantics for the Taverna 2 scientific workflow system, which improves upon the existing model by adding support for data pipelining and providing new extensibility points that make it possible to add new operators to the workflow model.

69 citations


Journal ArticleDOI
TL;DR: The caGrid Workflow Toolkit is designed and implemented to ease building and running caGrid workflows and provides users with support for various phases in using workflows: service discovery, composition and orchestration, data access, and secure service invocation.
Abstract: Background: In the biological and medical domains, the use of web services has made data and computational functionality accessible in a unified manner, which has helped to automate data pipelines that were previously performed manually. Workflow technology is widely used in the orchestration of multiple services to facilitate in-silico research. The Cancer Biomedical Informatics Grid (caBIG) is an information network enabling the sharing of cancer research related resources, and caGrid is its underlying service-based computation infrastructure. caBIG requires that services be composed and orchestrated in a given sequence to realize data pipelines, which are often called scientific workflows.

53 citations


Journal ArticleDOI
TL;DR: This study chooses BPEL and Taverna as candidates, and compares their usability in the lifecycle of a scientific workflow, including workflow composition, execution, and result analysis, to show that BPEL as an imperative language offers a comprehensive set of modeling primitives for workflows of all flavors.
Abstract: With the emergence of ‘service-oriented science,’ the need arises to orchestrate multiple services to facilitate scientific investigation—that is, to create ‘science workflows.’ We present here our findings in providing a workflow solution for the caGrid service-based grid infrastructure. We choose BPEL and Taverna as candidates, and compare their usability in the lifecycle of a scientific workflow, including workflow composition, execution, and result analysis. Our experience shows that BPEL as an imperative language offers a comprehensive set of modeling primitives for workflows of all flavors; whereas Taverna offers a dataflow model and a more compact set of primitives that facilitates dataflow modeling and pipelined execution. We hope that this comparison study not only helps researchers to select a language or tool that meets their specific needs, but also offers some insight into how a workflow language and tool can fulfill the requirements of the scientific community. Copyright © 2009 John Wiley & Sons, Ltd.

46 citations


Proceedings ArticleDOI
17 Dec 2010
TL;DR: A model for provenance sharing that realizes this holistic view by overcoming the various interoperability problems that emerge from the heterogeneity of workflow systems, data formats, and provenance models is presented.
Abstract: Scientific collaboration increasingly involves data sharing between separate groups. We consider a scenario where data products of scientific workflows are published and then used by other researchers as inputs to their workflows. For proper interpretation, shared data must be complemented by descriptive metadata. We focus on provenance traces, a prime example of such metadata which describes the genesis and processing history of data products in terms of the computational workflow steps. Through the reuse of published data, virtual, implicitly collaborative experiments emerge, making it desirable to compose the independently generated traces into global ones that describe the combined executions as single, seamless experiments. We present a model for provenance sharing that realizes this holistic view by overcoming the various interoperability problems that emerge from the heterogeneity of workflow systems, data formats, and provenance models. At the heart lie (i) an abstract workflow and provenance model in which (ii) data sharing becomes itself part of the combined workflow. We then describe an implementation of our model that we developed in the context of the Data Observation Network for Earth (DataONE) project and that can “stitch together” traces from different Kepler and Taverna workflow runs. It provides a prototypical framework for seamless cross-system, collaborative provenance management and can be easily extended to include other systems. Our approach also opens the door to new ways of workflow interoperability not only through often elusive workflow standards but through shared provenance information from public repositories.
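The core idea of composing independently generated traces can be illustrated with a toy example: if each run records which step produced which data item from which inputs, traces from different systems can be joined on shared data identifiers. This is a deliberately simplified sketch, not the DataONE implementation; the record structure and identifiers are invented.

```python
# Illustrative sketch: stitching two provenance traces into one lineage by joining
# on shared data identifiers. Records are simplified to (output_id, step, input_ids).

def stitch(*traces):
    """Merge per-run derivation records into a single lineage mapping."""
    lineage = {}
    for trace in traces:
        for output_id, step, input_ids in trace:
            lineage[output_id] = (step, list(input_ids))
    return lineage

def ancestry(lineage, data_id, depth=0):
    """Walk the combined trace backwards from a data product."""
    if data_id not in lineage:
        print("  " * depth + data_id + "  (external input)")
        return
    step, inputs = lineage[data_id]
    print("  " * depth + f"{data_id}  <- {step}")
    for i in inputs:
        ancestry(lineage, i, depth + 1)

# Run 1 (e.g. a Kepler workflow) publishes d2; run 2 (e.g. a Taverna workflow) reuses it.
kepler_trace = [("d2", "align_sequences", ["d1"])]
taverna_trace = [("d3", "build_tree", ["d2"])]

ancestry(stitch(kepler_trace, taverna_trace), "d3")
```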

Journal ArticleDOI
TL;DR: Distributed information about metabolic reactions that have been described to MIRIAM standards enables the automated assembly of quantitative systems biology models of metabolic networks based on user-defined criteria.
Abstract: Background: The behaviour of biological systems can be deduced from their mathematical models. However, multiple sources of data in diverse forms are required in the construction of a model in order to define its components and their biochemical reactions, and corresponding parameters. Automating the assembly and use of systems biology models is dependent upon data integration processes involving the interoperation of data and analytical resources. Results: Taverna workflows have been developed for the automated assembly of quantitative parameterised metabolic networks in the Systems Biology Markup Language (SBML). An SBML model is built in a systematic fashion by the workflows, starting with the construction of a qualitative network using data from a MIRIAM-compliant genome-scale model of yeast metabolism. This is followed by parameterisation of the SBML model with experimental data from two repositories, the SABIO-RK enzyme kinetics database and a database of quantitative experimental results. The models are then calibrated and simulated in workflows that call out to COPASI WS, the web service interface to the COPASI software application for analysing biochemical networks. These systems biology workflows were evaluated for their ability to construct a parameterised model of yeast glycolysis. Conclusions: Distributed information about metabolic reactions that have been described to MIRIAM standards enables the automated assembly of quantitative systems biology models of metabolic networks based on user-defined criteria. Such data integration processes can be implemented as Taverna workflows to provide a rapid overview of the components and their relationships within a biochemical system.
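For readers unfamiliar with SBML, the fragment below shows in a minimal way what "assembling an SBML model programmatically" means, assuming the libsbml Python bindings. The workflows described above populate such models automatically from MIRIAM-compliant sources; the identifiers here are made up.

```python
# Minimal sketch of building a tiny SBML model with libsbml (illustrative only).
import libsbml

doc = libsbml.SBMLDocument(3, 1)          # SBML Level 3 Version 1
model = doc.createModel()
model.setId("toy_glycolysis_fragment")

comp = model.createCompartment()
comp.setId("cytosol")
comp.setConstant(True)

for sid in ("glucose", "g6p"):
    s = model.createSpecies()
    s.setId(sid)
    s.setCompartment("cytosol")
    s.setHasOnlySubstanceUnits(False)
    s.setBoundaryCondition(False)
    s.setConstant(False)

# Serialise the assembled model; a real workflow would also add reactions,
# kinetic laws and parameters harvested from SABIO-RK and similar sources.
print(libsbml.writeSBMLToString(doc))
```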

Journal IssueDOI
TL;DR: The notion of the Research Object is introduced—the work objects that are built, transformed and published in the course of scientific experiments—and it is suggested that by encapsulating methods with results the authors can achieve research that is more reusable and repeatable and hence rapid and robust.
Abstract: By making research content more reusable, and providing a social infrastructure that facilitates sharing, the human aspects of the scholarly knowledge cycle may be accelerated and ‘time-to-discovery’ reduced. We propose that the key to this is the sharing of methods and processes. We present myExperiment, a social web site for discovering, sharing and curating Scientific Workflows and experiment plans, and describe how myExperiment facilitates the management and sharing of research workflows, supports a social model for content curation tailored to the researcher and community, and supports Open Science by exposing content and functionality to the users' tools and applications. Based on this, we introduce the notion of the Research Object—the work objects that are built, transformed and published in the course of scientific experiments—and suggest that by encapsulating methods with results we can achieve research that is more reusable and repeatable and hence rapid and robust. Copyright © 2010 John Wiley & Sons, Ltd.
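A rough sketch of the "encapsulating methods with results" idea is to bundle a workflow together with its inputs, outputs and a small manifest into a single shareable archive. The manifest fields and file names below are purely illustrative and do not follow any particular Research Object specification.

```python
# Illustrative packaging of a workflow plus its data into one shareable archive.
import json
import zipfile

manifest = {
    "title": "Example research object",
    "creator": "A. Researcher",
    "aggregates": ["workflow.t2flow", "inputs/params.txt", "outputs/results.csv"],
}

with zipfile.ZipFile("research_object.zip", "w") as ro:
    ro.writestr("manifest.json", json.dumps(manifest, indent=2))
    for path in manifest["aggregates"]:
        ro.writestr(path, "placeholder content\n")  # real files in practice
```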

Proceedings ArticleDOI
17 May 2010
TL;DR: ERGOT is a system that combines DHTs and SONs to enable semantic-based service discovery in distributed infrastructures such as Grids and Clouds, and enables semantic-based service matchmaking using a novel similarity measure between service requests and descriptions.
Abstract: The increasing number of available online services demands distributed architectures to promote scalability as well as semantics to enable their precise and efficient retrieval. Two common approaches toward this goal are Semantic Overlay Networks (SONs) and Distributed Hash Tables (DHTs) with semantic extensions. This paper presents ERGOT, a system that combines DHTs and SONs to enable semantic-based service discovery in distributed infrastructures such as Grids and Clouds. ERGOT takes advantage of semantic annotations that enrich service specifications in two ways: (i) services are advertised in the DHT on the basis of their annotations, thus allowing a SON to be established among service providers; (ii) annotations enable semantic-based service matchmaking, using a novel similarity measure between service requests and descriptions. Experimental evaluations confirmed the efficiency of ERGOT in terms of accuracy of search and network traffic.
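To show where a request-to-description similarity measure sits in matchmaking, the toy function below scores services by the overlap of their ontology annotations with a request. This Jaccard overlap is only a stand-in; ERGOT's actual similarity measure is more sophisticated and is defined in the paper.

```python
# Toy annotation-overlap matchmaking; NOT the ERGOT similarity measure.

def annotation_similarity(request_terms, service_terms):
    request_terms, service_terms = set(request_terms), set(service_terms)
    if not request_terms and not service_terms:
        return 0.0
    return len(request_terms & service_terms) / len(request_terms | service_terms)

request = {"SequenceAlignment", "ProteinSequence"}
services = {
    "clustalw": {"SequenceAlignment", "ProteinSequence", "MultipleAlignment"},
    "blast":    {"SequenceSimilaritySearch", "ProteinSequence"},
}

# Rank candidate services by how well their annotations match the request.
for name, terms in sorted(services.items(),
                          key=lambda kv: annotation_similarity(request, kv[1]),
                          reverse=True):
    print(name, round(annotation_similarity(request, terms), 2))
```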

Journal ArticleDOI
TL;DR: XGAP is an extensible software model for the genotype and phenotype community, with simple load formats for any type of genotype, epigenotype, transcript, protein, metabolite or other phenotype data.
Abstract: We present an extensible software model for the genotype and phenotype community, XGAP. Readers can download a standard XGAP (http://www.xgap.org) or auto-generate a custom version using MOLGENIS with programming interfaces to R-software and web-services or user interfaces for biologists. XGAP has simple load formats for any type of genotype, epigenotype, transcript, protein, metabolite or other phenotype data. Current functionality includes tools ranging from eQTL analysis in mouse to genome-wide association studies in humans.

Book ChapterDOI
15 Jun 2010
TL;DR: This paper describes a new query model that captures implicit user collaborations and shows how this model maps to OPM and helps to answer collaborative queries, e.g., identifying combined workflows and contributions of users collaborating on a project based on the records of previous workflow executions.
Abstract: The provenance of a data product contains information about how the product was derived, and is crucial for enabling scientists to easily understand, reproduce, and verify scientific results. Currently, most provenance models are designed to capture the provenance related to a single run, and mostly executed by a single user. However, a scientific discovery is often the result of methodical execution of many scientific workflows with many datasets produced at different times by one or more users. Further, to promote and facilitate exchange of information between multiple workflow systems supporting provenance, the Open Provenance Model (OPM) has been proposed by the scientific workflow community. In this paper, we describe a new query model that captures implicit user collaborations. We show how this model maps to OPM and helps to answer collaborative queries, e.g., identifying combined workflows and contributions of users collaborating on a project based on the records of previous workflow executions. We also adopt and extend the high-level Query Language for Provenance (QLP) with additional constructs, and show how these extensions allow non-expert users to express collaborative provenance queries against this model easily and concisely. Furthermore, we adopt the Provenance Challenge 3 (PC3) workflows as a collaborative and interoperable use-case scenario, where different stages of the workflow are executed in three different workflow environments - Kepler, Taverna, and WSVLAM. Through this use case, we demonstrate how we can establish and understand collaborative studies through interoperable workflow provenance.
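The flavour of a collaborative provenance query can be conveyed with a toy model: if each derivation record carries the user who executed the step, walking a data product's history backwards yields the set of implicit collaborators. This is not the paper's OPM mapping or QLP syntax, just an illustration of the query's intent.

```python
# Toy collaborative-provenance query: which users contributed to a data product?

derivations = {
    # output_id: (user, workflow_step, input_ids)
    "d2": ("alice", "Kepler:load_data",   ["d1"]),
    "d3": ("bob",   "Taverna:normalise",  ["d2"]),
    "d4": ("carol", "WSVLAM:plot",        ["d3"]),
}

def contributors(data_id, seen=None):
    """Collect the users behind every step in a data product's history."""
    seen = set() if seen is None else seen
    if data_id not in derivations:
        return seen
    user, _, inputs = derivations[data_id]
    seen.add(user)
    for i in inputs:
        contributors(i, seen)
    return seen

print(sorted(contributors("d4")))   # ['alice', 'bob', 'carol']
```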

Journal ArticleDOI
TL;DR: Three applications of e-social science that promote social simulation modelling, data management and visualization are described and an example is outlined in which the three components are brought together in a transport planning context.
Abstract: Applications of simulation modelling in social science domains are varied and increasingly widespread. The effective deployment of simulation models depends on access to diverse datasets, the use of analysis capabilities, the ability to visualize model outcomes and to capture, share and re-use simulations as evidence in research and policy-making. We describe three applications of e-social science that promote social simulation modelling, data management and visualization. An example is outlined in which the three components are brought together in a transport planning context. We discuss opportunities and benefits for the combination of these and other components into an e-infrastructure for social simulation and review recent progress towards the establishment of such an infrastructure.

Proceedings ArticleDOI
07 Dec 2010
TL;DR: The myExperiment social website for sharing scientific workflows, designed according to Web 2.0 principles, has grown to be the largest public repository of its kind and now supports Linked Data.
Abstract: The myExperiment social website for sharing scientific workflows, designed according to Web 2.0 principles, has grown to be the largest public repository of its kind. It is distinctive for its focus on sharing methods, its researcher-centric design and its facility to aggregate content into sharable ‘research objects’. This evolution of myExperiment has occurred hand in hand with its users. myExperiment now supports Linked Data as a step toward our vision of the future research environment, which we categorise here as 3rd generation e-Research.

Proceedings Article
01 Mar 2010
TL;DR: This paper proposes that the "methods" by which results are obtained be first-class citizens in the Web of Data, so that they can be shared and discussed and so that results can be explained, interpreted and reused.
Abstract: Is the Linked Data Web ready for people to use open government data or scientific datasets to do reproducible research? For one thing, practice and support for versioning have not yet emerged. In this paper we propose that we also need the "methods" by which results are obtained to be first-class citizens in the Web of Data, so that they can be shared and discussed and so that results can be explained, interpreted and reused. We discuss our experience of the myExperiment.org Web site, a social network of people sharing reusable methods for processing research data, with mechanisms for discovering, sharing, enacting, versioning and curation.

Journal ArticleDOI
TL;DR: A semantic information service that aggregates metadata from a large number of information sources of a large-scale Grid infrastructure using an ontology-based information integration architecture suitable for the highly dynamic distributed information sources available in Grid systems is described.

Proceedings ArticleDOI
05 Jul 2010
TL;DR: In this paper, the authors introduce functional units (FU) as elementary units of information used to describe a service, and propose techniques for automating the service annotation process by analysing collections of workflows that use those services.
Abstract: Computational and data-intensive science increasingly depends on a large Web Service infrastructure, as services that provide a broad array of functionality can be composed into workflows to address complex research questions. In this context, the goal of service registries is to offer accurate search and discovery functions to scientists. Their effectiveness, however, depends not only on the model chosen to annotate the services, but also on the level of abstraction chosen for the annotations. The work presented in this paper stems from the observation that current annotation models force users to think in terms of service interfaces, rather than of high-level functionality, thus reducing their effectiveness. To alleviate this problem, we introduce Functional Units (FU) as the elementary units of information used to describe a service. Using popular examples of services for the Life Sciences, we define FUs as configurations and compositions of underlying service operations, and show how functional-style service annotations can be easily realised using the OWL semantic Web language. Finally, we suggest techniques for automating the service annotation process by analysing collections of workflows that use those services.
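A hedged sketch of what a functional-style annotation could look like in RDF/OWL follows, using the rdflib library. The namespace, class name and linking property are invented for illustration and are not the ontology used in the paper.

```python
# Illustrative RDF/OWL annotation of a service operation with a Functional Unit.
from rdflib import Graph, Literal, Namespace, RDF, RDFS, URIRef
from rdflib.namespace import OWL

FU = Namespace("http://example.org/functional-units#")  # invented namespace
g = Graph()
g.bind("fu", FU)

# Declare a high-level Functional Unit as an OWL class.
g.add((FU.GetProteinSequence, RDF.type, OWL.Class))
g.add((FU.GetProteinSequence, RDFS.label,
       Literal("Retrieve a protein sequence by identifier")))

# Link a concrete service operation to the Functional Unit it implements.
service_op = URIRef("http://example.org/services/uniprot#fetchEntry")
g.add((service_op, FU.implementsFunctionalUnit, FU.GetProteinSequence))

print(g.serialize(format="turtle"))
```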

Proceedings Article
01 Apr 2010
TL;DR: This work proposes the adoption of social trust techniques to share a new emerging class of scientific digital object - Research Objects - and suggests a mechanism for introducing social trust metrics into the distributed social web to facilitate access control to aggregations of linked data resources.
Abstract: The web of linked data is incompatible with the modern "selfish scientist". What is missing is a mechanism that supports both what scientists share, and how they share. Solutions must be informed by social, technical and cultural issues surrounding the sharing of scientific data in the web of linked data. We propose the adoption of social trust techniques to share a new emerging class of scientific digital object - Research Objects. We suggest a mechanism for introducing social trust metrics into the distributed social web to facilitate access control to aggregations of linked data resources. Through the application and analysis of two established trust metrics, we then present the grounding of the Colleague of a Colleague (Cocoa) trust metric suited to the sharing of scientific knowledge delivered as Research Objects.

01 Mar 2010
TL;DR: In this article, the authors describe a semantic information service that aggregates metadata from a large number of information sources of a large-scale Grid infrastructure using an ontology-based information integration architecture (ActOn).
Abstract: We describe a semantic information service that aggregates metadata from a large number of information sources of a large-scale Grid infrastructure. It uses an ontology-based information integration architecture (ActOn) suitable for the highly dynamic distributed information sources available in Grid systems, where information changes frequently and where the information of distributed sources has to be aggregated in order to solve complex queries. These two challenges are addressed by a Metadata Cache that works with an update-on-demand policy and by an information source selection module that selects the most suitable source at a given point in time. We have evaluated the quality of this information service, and compared it with other similar services from the EGEE production testbed, with promising results.
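The update-on-demand policy of the Metadata Cache can be pictured with a toy class: an entry is only refreshed from its information source when it is requested and found stale. This is purely illustrative and not the ActOn implementation.

```python
# Toy update-on-demand metadata cache in the spirit of the ActOn architecture.
import time

class MetadataCache:
    def __init__(self, fetchers, max_age_s=60):
        self.fetchers = fetchers          # source name -> callable returning fresh metadata
        self.max_age_s = max_age_s
        self._entries = {}                # source name -> (timestamp, value)

    def get(self, source):
        entry = self._entries.get(source)
        if entry is None or time.time() - entry[0] > self.max_age_s:
            value = self.fetchers[source]()       # refresh only on demand
            self._entries[source] = (time.time(), value)
        return self._entries[source][1]

# Hypothetical information source reporting the load of one Grid site.
cache = MetadataCache({"site_A_load": lambda: {"free_cpus": 42}}, max_age_s=30)
print(cache.get("site_A_load"))
```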

Proceedings ArticleDOI
06 Jun 2010
TL;DR: This position paper outlines a research agenda to build an infrastructure with a flexible design to work on an Internet-wide scale, incorporating workflow editing, sharing and enactment capabilities directly into the Internet and thus making distributed applications available and usable in a wide range of pervasive settings.
Abstract: While current Distributed Computation Platforms (DCPs) provide environments for composition and enactment of distributed applications, a comprehensive methodology and tool that facilitates open, generic and rapid development, sharing and utilization of distributed applications represented through the workflow methodology, is still missing. The goal is to incorporate workflow editing, sharing and enactment capabilities directly into the Internet, thus making distributed applications available and usable in a wide range of pervasive settings. In this position paper, we outline a research agenda to build such an infrastructure with a flexible design to work on an Internet-wide scale. Research and technology development activities are intended to address the following end-user needs: (1) abstracting individual DCPs so that end-users can use common interfaces, (2) allowing rapid customization of the distributed process, (3) sharing processes, lessons and features from distributed applications across the community and (4) incorporating security and provenance framework.

09 Nov 2010
TL;DR: RightField is an open source application that provides a mechanism for embedding ontology annotation support for Life Science data in Microsoft Excel spreadsheets, enabling scientists to consistently annotate their data without the need to understand the numerous metadata standards and ontologies available to them.
Abstract: RightField is an open source application that provides a mechanism for embedding ontology annotation support for Life Science data in Microsoft Excel spreadsheets. Individual cells, columns, or rows can be restricted to particular ranges of allowed classes or instances from chosen ontologies. Informaticians, with experience in ontologies and data annotation, prepare RightField-enabled spreadsheets with embedded ontology term selection for use by a wider community of laboratory scientists. The RightField-enabled spreadsheet presents selected ontology terms to the users as a simple drop-down list, enabling scientists to consistently annotate their data without the need to understand the numerous metadata standards and ontologies available to them. The spreadsheets are self-contained and remain "vanilla" Excel so that they can be readily exchanged, processed offline and are usable by regular Excel tooling. The result is semantic annotation by stealth, with an annotation process that is less error-prone, more efficient, and more consistent with community standards. RightField has been developed and deployed for a consortium of some 300 Systems Biologists. RightField is open source under a BSD license and freely available from http://www.sysmo-db.org/RightField.
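The spreadsheet-level effect of RightField, restricting cells to a fixed list of terms, can be approximated with the openpyxl library as sketched below. RightField itself is a Java application that embeds richer ontology information; this sketch only reproduces the drop-down restriction, and the term list is invented.

```python
# Minimal sketch of an ontology-term drop-down in Excel, assuming openpyxl.
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation

wb = Workbook()
ws = wb.active
ws["A1"] = "organism part"

terms = ["leaf", "root", "stem"]  # would come from a chosen ontology in practice
dv = DataValidation(type="list",
                    formula1='"' + ",".join(terms) + '"',
                    allow_blank=True)
ws.add_data_validation(dv)
dv.add("A2:A100")   # restrict this range to the listed terms

wb.save("annotated_template.xlsx")
```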


Book ChapterDOI
01 Jan 2010
TL;DR: This chapter focuses on the role of the ideas and technologies of the Semantic Web in providing and utilising the infrastructure for science.
Abstract: The capabilities of the Web have had a very significant impact in facilitating new practice in science: it supports wide-scale information discovery and sharing, facilitating collaboration and enabling widespread participation in digital science, and increasingly it also provides a platform for developing and delivering software and services to support science. In this chapter we focus on the role of the ideas and technologies of the Semantic Web in providing and utilising the infrastructure for science. Our emphasis on Semantic Web and on the joined-up infrastructure to support the increasing scale of data, computation, collaboration and automation as science and computing advance has led to this field being known as the “Semantic Grid”. Since its instigation in 2001 the Semantic Grid community has established a significant body of work, and the approach continues to underpin new scientific practice in multiple disciplines.


01 Jul 2010
TL;DR: This work presents the efforts of the Debian Linux community, an open society of enthusiasts around the globe who collaborate on packaging free software for the Linux and FreeBSD kernels, rendering it available from mobiles to supercomputers and on all common processors.
Abstract: Computational biology manifests itself in many flavours. It comprises the analysis and management of data on sequences, structures, observed and synthetic variants thereof, and static or dynamic interactions, and serves the modelling of biological processes in physiological and pathophysiological conditions. The field has gained enormous momentum over the past two decades. The information gathered today covers biological properties of many organisms and serves as a reference and general source for derived work, also for neighbouring disciplines. Biologists, physicians and chemists have all started using bioinformatics tools, data and models in their routine. The latest trend is to integrate the thinking of engineers and physicists, who construct compounds in silico to later prove the predicted function in the lab. The approach became known as synthetic biology and is perceived by many to allow a fluent transition towards nano-technologies. As research questions become increasingly complex, they demand the interaction of highly specialised disciplines. This leads to a steady increase in the number of non-redundant tools and databases that researchers need to interact with - both computational developers and biological users. The dependency of the biological research community on such services will increase over the coming years. The strong computational demands of the services and the sheer complexity of the research foster the collaboration of individuals from many sites, computationally in the form of grid and cloud computing, but also between computationally and biologically primed groups. Maintaining a consistent software installation is barely achievable for dedicated individuals; sharing this effort across various platforms and institutional boundaries is the driving force behind the work of the Debian Linux community presented here. Debian is an open society of enthusiasts around the globe who collaborate on packaging free software for the Linux and FreeBSD kernels. Packages are prepared by individuals and uploaded to the distribution's main servers for auto-building on today's most prominent platforms, thus rendering them available from mobiles to supercomputers and for all common processors. For complex suites, or as a matter of principle, packagers have the option to share their effort as part of a community. This process is aided by portals auto-prepared by the infrastructure of the Debian Blends. Packages invite feedback from users via the Bug Tracking System. Around 80,000 users have allowed their installed applications to be counted via Debian's Popularity-Contest initiative. Separately counted are installations of packages that are forwarded to derived distributions. The most prominent of these is Ubuntu, for which more than 1.3 million users are reporting. Packages are described verbosely, and these descriptions are translated into many languages. More formally, packages may be characterised by the manual assignment of terms from a controlled vocabulary. Technical constraints for the packaging are laid out in the Debian Policy document. Changes to it are discussed on the project's mailing lists and may be subject to voting by contributors to the distribution. The Ubuntu Linux distribution adopts the Debian packages for its own software "universe" and as such contributes considerably to the dissemination of these efforts. The computing world undergoes continuous transitions, e.g. these days from 32 to 64 bit.
Upcoming is an increased acceptance of energy-saving ARM- and MIPS-based systems from the mobile world and of some specialised, highly parallel systems. With Debian's packages being auto-built on all these different hardware platforms, one can expect continuity during such transitions, and similarly find consistent setups in the typical heterogeneous research infrastructures. This is of particular benefit for distributed computations and contributes to the strong adoption of Debian and Ubuntu for cloud computing. Packaging is most successful, i.e. up-to-date and tested, when it is derived from the packager's daily routine. For computational biology, the community now faces the challenge of scaling with the steady increase in complexity: the number of contributors to the packaging needs to match the number of programs that users expect to be available. The group maintenance of applications is one approach that seeks to lower the entry hurdle for packaging through mutual training and the distribution of work according to expertise and interests. It also helps the integration of the software developers themselves with the community, e.g. for AutoDock and BALLView: the software developers follow the distribution's bug reports directly, and may contribute a description of their package or are invited to upload their own packages directly to the distribution's servers rather than offering them on their respective home pages. With an increasing number of packages available, the interaction between those tools becomes more and more of a concern. This concerns the establishment of workflows comprising tools from many packages, but from the distribution's perspective it is also the challenge of working with the exact same versions of public databases. The sharing of input between multiple applications is ongoing work, for which many bioinformatics groups around the globe have provided solutions independently. Our impetus is to tap into that wealth of experience and use it to share the effort of maintaining the infrastructure. The distribution's software packages allow the tools included in those packages to be referenced and shared. The UseCase plugin, developed as part of the EU KnowARC project, extends the Taverna Workflow Workbench to take the description of such tools and include invocations of them within a Taverna workflow. The tools can be configured to run locally, or on a remote machine accessed via secure credentials such as ssh or grid certificates. Multiple invocations of a service can be achieved by calling the corresponding tool on a number of nodes at the same time, thus allowing faster running of the workflow over a distributed network of machines. So a workflow developer can write and test a workflow on small amounts of data locally and then, by a simple change of configuration, run the workflow on a grid or cloud on much larger data sets. Workflows can include not only tools within a packaged distribution, but also calls to other services such as WSDL operations, queries of a BioMart database or invocations of R scripts. The workflows can be uploaded to the myExperiment website and shared either publicly or with specific groups of people. The workflows can be downloaded and run, edited or included as part of a wider overall workflow. The development of workflows and the sharing of expertise via the myExperiment website, based upon the creation of packaged distributions of tools, allows the collaboration of the Linux and bioinformatics communities, with great future potential.
With Taverna as a workflow engine and as a data transporter, to work locally in the most efficient manner one also needs to have the data locally accessible - with the right indices and APIs and (especially) in the right version. For clouds, 'locally' may now mean remote from the user's location, and it allows for the sharing of the data. The Debian community has prepared a small utility, getData, that knows how to download the most recent versions of a series of common databases, checks for the availability of a series of bioinformatics tools, and performs the respective indexing. When collaborating in clouds, the users can also ensure that any manual updates of databases are performed only once, for the instant direct benefit of all other users. To conclude, the dynamics of all three contributors, i.e. Linux distribution, cloud infrastructure and workflow suite, form a symbiosis towards a readily usable infrastructure for performing and sharing biologically inspired research. The clouds bring considerable relief to smaller research groups, allowing them to think large, with the (optional) confidence gained through immediately available expert collaborators.
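Conceptually, a getData-style helper boils down to fetching the current release of a reference database and building the local index that downstream tools expect. The sketch below illustrates that shape only; the URL is a placeholder and the choice of makeblastdb as the indexer is an assumption, not the actual getData configuration.

```python
# Hedged sketch of a getData-like "fetch and index" step (placeholders throughout).
import subprocess
import urllib.request

DB_URL = "http://example.org/databases/reference_proteins.fasta"  # placeholder URL
LOCAL = "reference_proteins.fasta"

# Download the current release of the database.
urllib.request.urlretrieve(DB_URL, LOCAL)

# Build a local index for BLAST-style searches; swap in whichever indexer
# your tools actually expect.
subprocess.run(["makeblastdb", "-in", LOCAL, "-dbtype", "prot"], check=True)
```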