
Showing papers by "Carole Goble" published in 2010


Journal ArticleDOI
TL;DR: myExperiment is an online research environment that supports the social sharing of bioinformatics workflows: procedures consisting of a series of computational tasks using web services, spanning data retrieval, integration and analysis through to visualisation of the results.
Abstract: myExperiment (http://www.myexperiment.org) is an online research environment that supports the social sharing of bioinformatics workflows. These workflows are procedures consisting of a series of computational tasks using web services, performed on data from its retrieval, integration and analysis through to the visualisation of the results. As a public repository of workflows, myExperiment allows anybody to discover those that are relevant to their research, which can then be reused and repurposed to their specific requirements. Conversely, developers can submit their workflows to myExperiment and enable them to be shared in a secure manner. Since its release in 2007, myExperiment has grown to over 3500 registered users and contains more than 900 workflows. The social aspect of the sharing of these workflows is facilitated by registered users forming virtual communities bound together by a common interest or research project. Contributors of workflows can build their reputation within these communities by receiving feedback and credit from individuals who reuse their work. Further documentation about myExperiment, including its REST web service, is available from http://wiki.myexperiment.org. Feedback and requests for support can be sent to bugs@myexperiment.org.
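The REST web service mentioned above can be exercised with a few lines of HTTP client code. The following is a minimal sketch only; the listing endpoint, the "num" paging parameter and the XML element names are assumptions about the API documented at http://wiki.myexperiment.org, not a verified specification.

```python
# Minimal sketch of querying the myExperiment REST interface (endpoint and
# parameters are assumptions; see http://wiki.myexperiment.org for the real API).
import requests
import xml.etree.ElementTree as ET

resp = requests.get(
    "http://www.myexperiment.org/workflows.xml",  # assumed listing endpoint
    params={"num": 5},                            # assumed paging parameter
    timeout=30,
)
resp.raise_for_status()

root = ET.fromstring(resp.content)
for wf in root.findall("workflow"):
    # Each element is expected to carry a resource URI and a title (assumption).
    print(wf.get("uri"), "-", (wf.text or "").strip())
```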

310 citations


Journal ArticleDOI
TL;DR: The use of Web Services to enable programmatic access to on-line bioinformatics is becoming increasingly important in the Life Sciences, but their number, distribution and the variable quality of their documentation can make their discovery and subsequent use difficult.
Abstract: The use of Web Services to enable programmatic access to on-line bioinformatics is becoming increasingly important in the Life Sciences. However, their number, distribution and the variable quality of their documentation can make their discovery and subsequent use difficult. A Web Services registry with information on available services will help to bring together service providers and their users. The BioCatalogue (http://www.biocatalogue.org/) provides a common interface for registering, browsing and annotating Web Services to the Life Science community. Services in the BioCatalogue can be described and searched in multiple ways based upon their technical types, bioinformatics categories, user tags, service providers or data inputs and outputs. They are also subject to constant monitoring, allowing the identification of service problems and changes and the filtering-out of unavailable or unreliable resources. The system is accessible via a human-readable 'Web 2.0'-style interface and a programmatic Web Service interface. The BioCatalogue follows a community approach in which all services can be registered, browsed and incrementally documented with annotations by any member of the scientific community.
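As an illustration of the programmatic Web Service interface described above, the sketch below performs a free-text search of the registry. The endpoint path, the "q" parameter and the XML structure are assumptions rather than a confirmed API contract; the BioCatalogue documentation remains the authoritative reference.

```python
# Hedged sketch: free-text search of the BioCatalogue registry over HTTP.
import requests
import xml.etree.ElementTree as ET

resp = requests.get(
    "http://www.biocatalogue.org/services.xml",  # assumed search endpoint
    params={"q": "blast"},                       # assumed free-text query parameter
    timeout=30,
)
resp.raise_for_status()

# Element and attribute names below are guesses at the response structure.
for svc in ET.fromstring(resp.content).iter("service"):
    print(svc.get("resource"), svc.findtext("name"))
```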

225 citations


Journal ArticleDOI
TL;DR: The notion of research objects, semantically rich aggregations of resources that possess some scientific intent or support some research objective, is discussed, and a number of principles that such objects and their associated services are expected to follow are presented.
Abstract: What will researchers be publishing in the future? Whilst there is little question that the Web will be the publication platform, as scholars move away from paper towards digital content, there is a need for mechanisms that support the production of self-contained units of knowledge and facilitate the publication, sharing and reuse of such entities. In this paper we discuss the notion of research objects, semantically rich aggregations of resources that possess some scientific intent or support some research objective. We present a number of principles that we expect such objects and their associated services to follow.

190 citations


30 Jun 2010
TL;DR: Describes how the recently overhauled technical architecture of Taverna addresses issues of efficiency, scalability, and extensibility, and presents performance results based on a collection of synthetic workflows, as well as a concrete case study involving a production workflow in the area of cancer research.
Abstract: The Taverna workflow management system is an open source project with a history of widespread adoption within multiple experimental science communities, and a long-term ambition of effectively supporting the evolving need of those communities for complex, data-intensive, service-based experimental pipelines. This short paper describes how the recently overhauled technical architecture of Taverna addresses issues of efficiency, scalability, and extensibility, and presents performance results based on a collection of synthetic workflows, as well as a concrete case study involving a production workflow in the area of cancer research.

168 citations


Proceedings ArticleDOI
07 Dec 2010
TL;DR: This paper makes the case for a scientific data publication model on top of linked data and introduces the notion of Research Objects as first class citizens for sharing and publishing.
Abstract: Scientific data stands to represent a significant portion of the linked open data cloud and science itself stands to benefit from the data fusion capability that this will afford. However, simply publishing linked data into the cloud does not necessarily meet the requirements of reuse. Publishing has requirements of provenance, quality, credit, attribution and methods in order to provide the reproducibility that allows validation of results. In this paper we make the case for a scientific data publication model on top of linked data and introduce the notion of Research Objects as first class citizens for sharing and publishing.

90 citations


Book ChapterDOI
15 Jun 2010
TL;DR: This paper proposes a model and architecture for semantic, domain-aware provenance, and demonstrates its usefulness in answering typical user queries, and discusses the additional benefits and the technical implications of publishing provenance graphs as a form of Linked Data.
Abstract: Data provenance graphs are a form of metadata that can be used to establish a variety of properties of data products that undergo sequences of transformations, typically specified as workflows. Their usefulness for answering user provenance queries is limited, however, unless the graphs are enhanced with domain-specific annotations. In this paper we propose a model and architecture for semantic, domain-aware provenance, and demonstrate its usefulness in answering typical user queries. Furthermore, we discuss the additional benefits and the technical implications of publishing provenance graphs as a form of Linked Data. A prototype implementation of the model is available for data produced by the Taverna workflow system.
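To make the Linked Data angle concrete, the following hedged sketch queries a provenance graph over SPARQL using the SPARQLWrapper library. The endpoint URL is a placeholder and the OPM-style vocabulary prefix is illustrative; graphs actually exported from Taverna may use different terms.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder endpoint; a real deployment would expose its own SPARQL service.
sparql = SPARQLWrapper("http://example.org/provenance/sparql")
sparql.setQuery("""
    PREFIX opmv: <http://purl.org/net/opmv/ns#>   # illustrative provenance vocabulary
    SELECT ?artifact ?process WHERE {
        ?artifact opmv:wasGeneratedBy ?process .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)

# Print each data artifact together with the process that generated it.
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["artifact"]["value"], "<-", row["process"]["value"])
```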

71 citations


Journal ArticleDOI
TL;DR: Debian Med provides a simple method for offering new releases of software and data resources, thus provisioning a local infrastructure for computational biology, and closes the gap between developers and users.
Abstract: Background: The Open Source movement and its technologies are popular in the bioinformatics community because they provide freely available tools and resources for research. In order to feed the steady demand for updates on software and associated data, a service infrastructure is required for sharing and providing these tools to heterogeneous computing environments.

70 citations


Journal ArticleDOI
TL;DR: This paper presents a formal semantics for the Taverna 2 scientific workflow system, which improves upon the existing model by adding support for data pipelining and providing new extensibility points that make it possible to add new operators to the workflow model.

69 citations


Journal ArticleDOI
TL;DR: The caGrid Workflow Toolkit is designed and implemented to ease building and running caGrid workflows and provides users with support for various phases in using workflows: service discovery, composition and orchestration, data access, and secure service invocation.
Abstract: Background: In the biological and medical domains, the use of web services has made data and computational functionality accessible in a unified manner, which has helped to automate data pipelines that were previously performed manually. Workflow technology is widely used in the orchestration of multiple services to facilitate in-silico research. The Cancer Biomedical Informatics Grid (caBIG) is an information network enabling the sharing of cancer research related resources, and caGrid is its underlying service-based computation infrastructure. caBIG requires that services be composed and orchestrated in a given sequence to realize data pipelines, which are often called scientific workflows.

53 citations


Journal ArticleDOI
TL;DR: This study chooses BPEL and Taverna as candidates, and compares their usability in the lifecycle of a scientific workflow, including workflow composition, execution, and result analysis, to show that BPEL as an imperative language offers a comprehensive set of modeling primitives for workflows of all flavors.
Abstract: With the emergence of ‘service-oriented science,’ the need arises to orchestrate multiple services to facilitate scientific investigation—that is, to create ‘science workflows.’ We present here our findings in providing a workflow solution for the caGrid service-based grid infrastructure. We choose BPEL and Taverna as candidates, and compare their usability in the lifecycle of a scientific workflow, including workflow composition, execution, and result analysis. Our experience shows that BPEL as an imperative language offers a comprehensive set of modeling primitives for workflows of all flavors; whereas Taverna offers a dataflow model and a more compact set of primitives that facilitates dataflow modeling and pipelined execution. We hope that this comparison study not only helps researchers to select a language or tool that meets their specific needs, but also offers some insight into how a workflow language and tool can fulfill the requirements of the scientific community. Copyright © 2009 John Wiley & Sons, Ltd.

46 citations


Proceedings ArticleDOI
17 Dec 2010
TL;DR: A model for provenance sharing that realizes this holistic view by overcoming the various interoperability problems that emerge from the heterogeneity of workflow systems, data formats, and provenance models is presented.
Abstract: Scientific collaboration increasingly involves data sharing between separate groups. We consider a scenario where data products of scientific workflows are published and then used by other researchers as inputs to their workflows. For proper interpretation, shared data must be complemented by descriptive metadata. We focus on provenance traces, a prime example of such metadata which describes the genesis and processing history of data products in terms of the computational workflow steps. Through the reuse of published data, virtual, implicitly collaborative experiments emerge, making it desirable to compose the independently generated traces into global ones that describe the combined executions as single, seamless experiments. We present a model for provenance sharing that realizes this holistic view by overcoming the various interoperability problems that emerge from the heterogeneity of workflow systems, data formats, and provenance models. At the heart lie (i) an abstract workflow and provenance model in which (ii) data sharing becomes itself part of the combined workflow. We then describe an implementation of our model that we developed in the context of the Data Observation Network for Earth (DataONE) project and that can “stitch together” traces from different Kepler and Taverna workflow runs. It provides a prototypical framework for seamless cross-system, collaborative provenance management and can be easily extended to include other systems. Our approach also opens the door to new ways of workflow interoperability not only through often elusive workflow standards but through shared provenance information from public repositories.
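The core idea of composing independently generated traces can be illustrated with a toy example: if each run records which step produced which data item from which inputs, traces from different systems can be joined on shared data identifiers. This is a deliberately simplified sketch, not the DataONE implementation; the record structure and identifiers are invented.

```python
# Illustrative sketch: stitching two provenance traces into one lineage by joining
# on shared data identifiers. Records are simplified to (output_id, step, input_ids).

def stitch(*traces):
    """Merge per-run derivation records into a single lineage mapping."""
    lineage = {}
    for trace in traces:
        for output_id, step, input_ids in trace:
            lineage[output_id] = (step, list(input_ids))
    return lineage

def ancestry(lineage, data_id, depth=0):
    """Walk the combined trace backwards from a data product."""
    if data_id not in lineage:
        print("  " * depth + data_id + "  (external input)")
        return
    step, inputs = lineage[data_id]
    print("  " * depth + f"{data_id}  <- {step}")
    for i in inputs:
        ancestry(lineage, i, depth + 1)

# Run 1 (e.g. a Kepler workflow) publishes d2; run 2 (e.g. a Taverna workflow) reuses it.
kepler_trace = [("d2", "align_sequences", ["d1"])]
taverna_trace = [("d3", "build_tree", ["d2"])]

ancestry(stitch(kepler_trace, taverna_trace), "d3")
```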

Journal ArticleDOI
TL;DR: Distributed information about metabolic reactions that have been described to MIRIAM standards enables the automated assembly of quantitative systems biology models of metabolic networks based on user-defined criteria.
Abstract: Background: The behaviour of biological systems can be deduced from their mathematical models. However, multiple sources of data in diverse forms are required in the construction of a model in order to define its components and their biochemical reactions, and corresponding parameters. Automating the assembly and use of systems biology models is dependent upon data integration processes involving the interoperation of data and analytical resources. Results: Taverna workflows have been developed for the automated assembly of quantitative parameterised metabolic networks in the Systems Biology Markup Language (SBML). An SBML model is built in a systematic fashion by the workflows, starting with the construction of a qualitative network using data from a MIRIAM-compliant genome-scale model of yeast metabolism. This is followed by parameterisation of the SBML model with experimental data from two repositories, the SABIO-RK enzyme kinetics database and a database of quantitative experimental results. The models are then calibrated and simulated in workflows that call out to COPASI WS, the web service interface to the COPASI software application for analysing biochemical networks. These systems biology workflows were evaluated for their ability to construct a parameterised model of yeast glycolysis. Conclusions: Distributed information about metabolic reactions that have been described to MIRIAM standards enables the automated assembly of quantitative systems biology models of metabolic networks based on user-defined criteria. Such data integration processes can be implemented as Taverna workflows to provide a rapid overview of the components and their relationships within a biochemical system.
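For readers unfamiliar with SBML, the fragment below shows in a minimal way what "assembling an SBML model programmatically" means, assuming the libsbml Python bindings. The workflows described above populate such models automatically from MIRIAM-compliant sources; the identifiers here are made up.

```python
# Minimal sketch of building a tiny SBML model with libsbml (illustrative only).
import libsbml

doc = libsbml.SBMLDocument(3, 1)          # SBML Level 3 Version 1
model = doc.createModel()
model.setId("toy_glycolysis_fragment")

comp = model.createCompartment()
comp.setId("cytosol")
comp.setConstant(True)

for sid in ("glucose", "g6p"):
    s = model.createSpecies()
    s.setId(sid)
    s.setCompartment("cytosol")
    s.setHasOnlySubstanceUnits(False)
    s.setBoundaryCondition(False)
    s.setConstant(False)

# Serialise the assembled model; a real workflow would also add reactions,
# kinetic laws and parameters harvested from SABIO-RK and similar sources.
print(libsbml.writeSBMLToString(doc))
```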

Journal IssueDOI
TL;DR: The notion of the Research Object is introduced—the work objects that are built, transformed and published in the course of scientific experiments—and it is suggested that by encapsulating methods with results the authors can achieve research that is more reusable and repeatable and hence rapid and robust.
Abstract: By making research content more reusable, and providing a social infrastructure that facilitates sharing, the human aspects of the scholarly knowledge cycle may be accelerated and ‘time-to-discovery’ reduced. We propose that the key to this is the sharing of methods and processes. We present myExperiment, a social web site for discovering, sharing and curating Scientific Workflows and experiment plans, and describe how myExperiment facilitates the management and sharing of research workflows, supports a social model for content curation tailored to the researcher and community, and supports Open Science by exposing content and functionality to the users' tools and applications. Based on this, we introduce the notion of the Research Object—the work objects that are built, transformed and published in the course of scientific experiments—and suggest that by encapsulating methods with results we can achieve research that is more reusable and repeatable and hence rapid and robust. Copyright © 2010 John Wiley & Sons, Ltd.
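A rough sketch of the "encapsulating methods with results" idea is to bundle a workflow together with its inputs, outputs and a small manifest into a single shareable archive. The manifest fields and file names below are purely illustrative and do not follow any particular Research Object specification.

```python
# Illustrative packaging of a workflow plus its data into one shareable archive.
import json
import zipfile

manifest = {
    "title": "Example research object",
    "creator": "A. Researcher",
    "aggregates": ["workflow.t2flow", "inputs/params.txt", "outputs/results.csv"],
}

with zipfile.ZipFile("research_object.zip", "w") as ro:
    ro.writestr("manifest.json", json.dumps(manifest, indent=2))
    for path in manifest["aggregates"]:
        ro.writestr(path, "placeholder content\n")  # real files in practice
```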

Proceedings ArticleDOI
17 May 2010
TL;DR: ERGOT is a system that combines DHTs and SONs to enable semantic-based service discovery in distributed infrastructures such as Grids and Clouds, and enables semantic-based service matchmaking using a novel similarity measure between service requests and descriptions.
Abstract: The increasing number of available online services demands distributed architectures to promote scalability as well as semantics to enable their precise and efficient retrieval. Two common approaches toward this goal are Semantic Overlay Networks (SONs) and Distributed Hash Tables (DHTs) with semantic extensions. This paper presents ERGOT, a system that combines DHTs and SONs to enable semantic-based service discovery in distributed infrastructures such as Grids and Clouds. ERGOT takes advantage of semantic annotations that enrich service specifications in two ways: (i) services are advertised in the DHT on the basis of their annotations, thus allowing a SON to be established among service providers; (ii) annotations enable semantic-based service matchmaking, using a novel similarity measure between service requests and descriptions. Experimental evaluations confirmed the efficiency of ERGOT in terms of accuracy of search and network traffic.
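To show where a request-to-description similarity measure sits in matchmaking, the toy function below scores services by the overlap of their ontology annotations with a request. This Jaccard overlap is only a stand-in; ERGOT's actual similarity measure is more sophisticated and is defined in the paper.

```python
# Toy annotation-overlap matchmaking; NOT the ERGOT similarity measure.

def annotation_similarity(request_terms, service_terms):
    request_terms, service_terms = set(request_terms), set(service_terms)
    if not request_terms and not service_terms:
        return 0.0
    return len(request_terms & service_terms) / len(request_terms | service_terms)

request = {"SequenceAlignment", "ProteinSequence"}
services = {
    "clustalw": {"SequenceAlignment", "ProteinSequence", "MultipleAlignment"},
    "blast":    {"SequenceSimilaritySearch", "ProteinSequence"},
}

# Rank candidate services by how well their annotations match the request.
for name, terms in sorted(services.items(),
                          key=lambda kv: annotation_similarity(request, kv[1]),
                          reverse=True):
    print(name, round(annotation_similarity(request, terms), 2))
```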

Journal ArticleDOI
TL;DR: XGAP is an extensible software model for the genotype and phenotype community, with simple load formats for any type of genotype, epigenotype, transcript, protein, metabolite or other phenotype data.
Abstract: We present an extensible software model for the genotype and phenotype community, XGAP. Readers can download a standard XGAP (http://www.xgap.org) or auto-generate a custom version using MOLGENIS with programming interfaces to R-software and web-services or user interfaces for biologists. XGAP has simple load formats for any type of genotype, epigenotype, transcript, protein, metabolite or other phenotype data. Current functionality includes tools ranging from eQTL analysis in mouse to genome-wide association studies in humans.

Book ChapterDOI
15 Jun 2010
TL;DR: This paper describes a new query model that captures implicit user collaborations and shows how this model maps to OPM and helps to answer collaborative queries, e.g., identifying combined workflows and contributions of users collaborating on a project based on the records of previous workflow executions.
Abstract: The provenance of a data product contains information about how the product was derived, and is crucial for enabling scientists to easily understand, reproduce, and verify scientific results. Currently, most provenance models are designed to capture the provenance related to a single run, and mostly executed by a single user. However, a scientific discovery is often the result of methodical execution of many scientific workflows with many datasets produced at different times by one or more users. Further, to promote and facilitate exchange of information between multiple workflow systems supporting provenance, the Open Provenance Model (OPM) has been proposed by the scientific workflow community. In this paper, we describe a new query model that captures implicit user collaborations. We show how this model maps to OPM and helps to answer collaborative queries, e.g., identifying combined workflows and contributions of users collaborating on a project based on the records of previous workflow executions. We also adopt and extend the high-level Query Language for Provenance (QLP) with additional constructs, and show how these extensions allow non-expert users to express collaborative provenance queries against this model easily and concisely. Furthermore, we adopt the Provenance Challenge 3 (PC3) workflows as a collaborative and interoperable use-case scenario, where different stages of the workflow are executed in three different workflow environments - Kepler, Taverna, and WSVLAM. Through this use case, we demonstrate how we can establish and understand collaborative studies through interoperable workflow provenance.
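The flavour of a collaborative provenance query can be conveyed with a toy model: if each derivation record carries the user who executed the step, walking a data product's history backwards yields the set of implicit collaborators. This is not the paper's OPM mapping or QLP syntax, just an illustration of the query's intent.

```python
# Toy collaborative-provenance query: which users contributed to a data product?

derivations = {
    # output_id: (user, workflow_step, input_ids)
    "d2": ("alice", "Kepler:load_data",   ["d1"]),
    "d3": ("bob",   "Taverna:normalise",  ["d2"]),
    "d4": ("carol", "WSVLAM:plot",        ["d3"]),
}

def contributors(data_id, seen=None):
    """Collect the users behind every step in a data product's history."""
    seen = set() if seen is None else seen
    if data_id not in derivations:
        return seen
    user, _, inputs = derivations[data_id]
    seen.add(user)
    for i in inputs:
        contributors(i, seen)
    return seen

print(sorted(contributors("d4")))   # ['alice', 'bob', 'carol']
```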

Journal ArticleDOI
TL;DR: Three applications of e-social science that promote social simulation modelling, data management and visualization are described and an example is outlined in which the three components are brought together in a transport planning context.
Abstract: Applications of simulation modelling in social science domains are varied and increasingly widespread. The effective deployment of simulation models depends on access to diverse datasets, the use of analysis capabilities, the ability to visualize model outcomes and to capture, share and re-use simulations as evidence in research and policy-making. We describe three applications of e-social science that promote social simulation modelling, data management and visualization. An example is outlined in which the three components are brought together in a transport planning context. We discuss opportunities and benefits for the combination of these and other components into an e-infrastructure for social simulation and review recent progress towards the establishment of such an infrastructure.

Proceedings ArticleDOI
07 Dec 2010
TL;DR: The myExperiment social website for sharing scientific workflows, designed according to Web 2.0 principles, has grown to be the largest public repository of its kind and now supports Linked Data.
Abstract: The myExperiment social website for sharing scientific workflows, designed according to Web 2.0 principles, has grown to be the largest public repository of its kind. It is distinctive for its focus on sharing methods, its researcher-centric design and its facility to aggregate content into sharable ‘research objects’. This evolution of myExperiment has occurred hand in hand with its users. myExperiment now supports Linked Data as a step toward our vision of the future research environment, which we categorise here as 3rd generation e-Research.

Proceedings Article
01 Mar 2010
TL;DR: This paper proposes that the "methods" by which results are obtained be first-class citizens in the Web of Data, so that they can be shared and discussed and so that results can be explained, interpreted and reused.
Abstract: Is the Linked Data Web ready for people to use open government data or scientific datasets to do reproducible research? For one thing, practice and support for versioning have not yet emerged. In this paper we propose that we also need the "methods" by which results are obtained to be first-class citizens in the Web of Data, so that they can be shared and discussed and so that results can be explained, interpreted and reused. We discuss our experience of the myExperiment.org Web site, a social network of people sharing reusable methods for processing research data, with mechanisms for discovering, sharing, enacting, versioning and curation.

Journal ArticleDOI
TL;DR: A semantic information service that aggregates metadata from a large number of information sources of a large-scale Grid infrastructure using an ontology-based information integration architecture suitable for the highly dynamic distributed information sources available in Grid systems is described.

Proceedings ArticleDOI
05 Jul 2010
TL;DR: In this paper, the authors introduce functional units (FU) as elementary units of information used to describe a service, and propose techniques for automating the service annotation process by analysing collections of workflows that use those services.
Abstract: Computational and data-intensive science increasingly depends on a large Web Service infrastructure, as services that provide a broad array of functionality can be composed into workflows to address complex research questions. In this context, the goal of service registries is to offer accurate search and discovery functions to scientists. Their effectiveness, however, depends not only on the model chosen to annotate the services, but also on the level of abstraction chosen for the annotations. The work presented in this paper stems from the observation that current annotation models force users to think in terms of service interfaces, rather than of high-level functionality, thus reducing their effectiveness. To alleviate this problem, we introduce Functional Units (FU) as the elementary units of information used to describe a service. Using popular examples of services for the Life Sciences, we define FUs as configurations and compositions of underlying service operations, and show how functional-style service annotations can be easily realised using the OWL semantic Web language. Finally, we suggest techniques for automating the service annotation process by analysing collections of workflows that use those services.
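A hedged sketch of what a functional-style annotation could look like in RDF/OWL follows, using the rdflib library. The namespace, class name and linking property are invented for illustration and are not the ontology used in the paper.

```python
# Illustrative RDF/OWL annotation of a service operation with a Functional Unit.
from rdflib import Graph, Literal, Namespace, RDF, RDFS, URIRef
from rdflib.namespace import OWL

FU = Namespace("http://example.org/functional-units#")  # invented namespace
g = Graph()
g.bind("fu", FU)

# Declare a high-level Functional Unit as an OWL class.
g.add((FU.GetProteinSequence, RDF.type, OWL.Class))
g.add((FU.GetProteinSequence, RDFS.label,
       Literal("Retrieve a protein sequence by identifier")))

# Link a concrete service operation to the Functional Unit it implements.
service_op = URIRef("http://example.org/services/uniprot#fetchEntry")
g.add((service_op, FU.implementsFunctionalUnit, FU.GetProteinSequence))

print(g.serialize(format="turtle"))
```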

Proceedings Article
01 Apr 2010
TL;DR: This work proposes the adoption of social trust techniques to share a new emerging class of scientific digital object - Research Objects - and suggests a mechanism for introducing social trust metrics into the distributed social web to facilitate access control to aggregations of linked data resources.
Abstract: The web of linked data is incompatible with the modern "selfish scientist". What is missing is a mechanism that supports both what scientists share, and how they share. Solutions must be informed by social, technical and cultural issues surrounding the sharing of scientific data in the web of linked data. We propose the adoption of social trust techniques to share a new emerging class of scientific digital object - Research Objects. We suggest a mechanism for introducing social trust metrics into the distributed social web to facilitate access control to aggregations of linked data resources. Through the application and analysis of two established trust metrics, we then present the grounding of the Colleague of a Colleague (Cocoa) trust metric suited to the sharing of scientific knowledge delivered as Research Objects.

01 Mar 2010
TL;DR: In this article, the authors describe a semantic information service that aggregates metadata from a large number of information sources of a large-scale Grid infrastructure using an ontology-based information integration architecture (ActOn).
Abstract: We describe a semantic information service that aggregates metadata from a large number of information sources of a large-scale Grid infrastructure. It uses an ontology-based information integration architecture (ActOn) suitable for the highly dynamic distributed information sources available in Grid systems, where information changes frequently and where the information of distributed sources has to be aggregated in order to solve complex queries. These two challenges are addressed by a Metadata Cache that works with an update-on-demand policy and by an information source selection module that selects the most suitable source at a given point in time. We have evaluated the quality of this information service, and compared it with other similar services from the EGEE production testbed, with promising results.
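The update-on-demand policy of the Metadata Cache can be pictured with a toy class: an entry is only refreshed from its information source when it is requested and found stale. This is purely illustrative and not the ActOn implementation.

```python
# Toy update-on-demand metadata cache in the spirit of the ActOn architecture.
import time

class MetadataCache:
    def __init__(self, fetchers, max_age_s=60):
        self.fetchers = fetchers          # source name -> callable returning fresh metadata
        self.max_age_s = max_age_s
        self._entries = {}                # source name -> (timestamp, value)

    def get(self, source):
        entry = self._entries.get(source)
        if entry is None or time.time() - entry[0] > self.max_age_s:
            value = self.fetchers[source]()       # refresh only on demand
            self._entries[source] = (time.time(), value)
        return self._entries[source][1]

# Hypothetical information source reporting the load of one Grid site.
cache = MetadataCache({"site_A_load": lambda: {"free_cpus": 42}}, max_age_s=30)
print(cache.get("site_A_load"))
```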

Proceedings ArticleDOI
06 Jun 2010
TL;DR: This position paper outlines a research agenda to build an infrastructure with a flexible design to work on an Internet-wide scale, incorporating workflow editing, sharing and enactment capabilities directly into the Internet and thus making distributed applications available and usable in a wide range of pervasive settings.
Abstract: While current Distributed Computation Platforms (DCPs) provide environments for composition and enactment of distributed applications, a comprehensive methodology and tool that facilitates open, generic and rapid development, sharing and utilization of distributed applications represented through the workflow methodology, is still missing. The goal is to incorporate workflow editing, sharing and enactment capabilities directly into the Internet, thus making distributed applications available and usable in a wide range of pervasive settings. In this position paper, we outline a research agenda to build such an infrastructure with a flexible design to work on an Internet-wide scale. Research and technology development activities are intended to address the following end-user needs: (1) abstracting individual DCPs so that end-users can use common interfaces, (2) allowing rapid customization of the distributed process, (3) sharing processes, lessons and features from distributed applications across the community and (4) incorporating security and provenance framework.

09 Nov 2010
TL;DR: RightField is an open source application that provides a mechanism for embedding ontology annotation support for Life Science data in Microsoft Excel spreadsheets, enabling scientists to consistently annotate their data without the need to understand the numerous metadata standards and ontologies available to them.
Abstract: RightField is an open source application that provides a mechanism for embedding ontology annotation support for Life Science data in Microsoft Excel spreadsheets. Individual cells, columns, or rows can be restricted to particular ranges of allowed classes or instances from chosen ontologies. Informaticians, with experience in ontologies and data annotation, prepare RightField-enabled spreadsheets with embedded ontology term selection for use by a wider community of laboratory scientists. The RightField-enabled spreadsheet presents selected ontology terms to the users as a simple drop-down list, enabling scientists to consistently annotate their data without the need to understand the numerous metadata standards and ontologies available to them. The spreadsheets are self-contained and remain "vanilla" Excel so that they can be readily exchanged, processed offline and are usable by regular Excel tooling. The result is semantic annotation by stealth, with an annotation process that is less error-prone, more efficient, and more consistent with community standards. RightField has been developed and deployed for a consortium of some 300 Systems Biologists. RightField is open source under a BSD license and freely available from http://www.sysmo-db.org/RightField.
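The spreadsheet-level effect of RightField, restricting cells to a fixed list of terms, can be approximated with the openpyxl library as sketched below. RightField itself is a Java application that embeds richer ontology information; this sketch only reproduces the drop-down restriction, and the term list is invented.

```python
# Minimal sketch of an ontology-term drop-down in Excel, assuming openpyxl.
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation

wb = Workbook()
ws = wb.active
ws["A1"] = "organism part"

terms = ["leaf", "root", "stem"]  # would come from a chosen ontology in practice
dv = DataValidation(type="list",
                    formula1='"' + ",".join(terms) + '"',
                    allow_blank=True)
ws.add_data_validation(dv)
dv.add("A2:A100")   # restrict this range to the listed terms

wb.save("annotated_template.xlsx")
```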


Book ChapterDOI
01 Jan 2010
TL;DR: This chapter focuses on the role of the ideas and technologies of the Semantic Web in providing and utilising the infrastructure for science.
Abstract: The capabilities of the Web have had a very significant impact in facilitating new practice in science: it supports wide-scale information discovery and sharing, facilitating collaboration and enabling widespread participation in digital science, and increasingly it also provides a platform for developing and delivering software and services to support science. In this chapter we focus on the role of the ideas and technologies of the Semantic Web in providing and utilising the infrastructure for science. Our emphasis on Semantic Web and on the joined-up infrastructure to support the increasing scale of data, computation, collaboration and automation as science and computing advance has led to this field being known as the “Semantic Grid”. Since its instigation in 2001 the Semantic Grid community has established a significant body of work, and the approach continues to underpin new scientific practice in multiple disciplines.


01 Jul 2010
TL;DR: This work presents the efforts of the Debian Linux community, an open society of enthusiasts around the globe who collaborate on packaging free software for the Linux and FreeBSD kernels, rendering it available from mobiles to supercomputers and on all common processors.
Abstract: Computational biology manifests itself in many flavours. It comprises the analysis and management of data on sequences, structures, observed and synthetic variants thereof, and static or dynamic interactions, and serves the modelling of biological processes in physiological and pathophysiological conditions. The field has gained enormous momentum over the past two decades. The information gathered today covers biological properties of many organisms and serves as a reference and general source for derived work, also for neighbouring disciplines. Biologists, physicians and chemists have all started using bioinformatics tools, data and models in their routine. The latest trend is to integrate the thinking of engineers and physicists, who construct compounds in silico to later prove the predicted function in the lab. The approach became known as synthetic biology and is perceived by many to allow a fluent transition towards nano-technologies. As research questions become increasingly complex, they demand the interaction of highly specialised disciplines. This leads to a steady increase in the number of non-redundant tools and databases that researchers need to interact with - both computational developers and biological users. The dependency of the biological research community on such services will increase over the coming years. The strong computational demands of the services and the sheer complexity of the research foster the collaboration of individuals from many sites, computationally in the form of grid and cloud computing, but also between computationally and biologically primed groups. Maintaining a consistent software installation is barely achievable for dedicated individuals; sharing this effort across various platforms and institutional boundaries is the driving force behind the work of the Debian Linux community presented here. Debian is an open society of enthusiasts around the globe who collaborate on packaging free software for the Linux and FreeBSD kernels. Packages are prepared by individuals and uploaded to the distribution's main servers for auto-building on today's most prominent platforms, thus rendering them available from mobiles to supercomputers and for all common processors. For complex suites, or as a matter of principle, packagers have the option to share their effort as part of a community. This process is aided by portals auto-prepared by the infrastructure of the Debian Blends. Packages invite feedback from users via the Bug Tracking System. Around 80,000 users have allowed their installed applications to be counted via Debian's Popularity-Contest initiative. Separately counted are installations of packages that are forwarded to derived distributions. The most prominent of these is Ubuntu, for which more than 1.3 million users are reporting. Packages are described verbosely, and these descriptions are translated into many languages. More formally, packages may be characterised by the manual assignment of terms from a controlled vocabulary. Technical constraints for the packaging are laid out in the Debian Policy document. Changes to it are discussed on the project's mailing lists and may be subject to voting by contributors to the distribution. The Ubuntu Linux distribution adopts the Debian packages for its own software "universe" and as such contributes considerably to the dissemination of these efforts. The computing world undergoes continuous transitions, e.g. these days from 32 to 64 bit.
Upcoming is an increased acceptance of energy-saving ARM- and MIPS-based systems from the mobile world and of some specialised, highly parallel systems. With Debian's packages being auto-built on all these different hardware platforms, one can expect continuity during such transitions, and similarly find consistent setups in the typical heterogeneous research infrastructures. This is of particular benefit for distributed computations and contributes to the strong adoption of Debian and Ubuntu for cloud computing. Packaging is most successful, i.e. up-to-date and tested, when it is derived from the packager's daily routine. For computational biology, the community now faces the challenge of scaling with the steady increase in complexity: the number of contributors to the packaging needs to match the number of programs that users expect to be available. The group maintenance of applications is one approach that seeks to lower the entry hurdle for packaging through mutual training and the distribution of work according to expertise and interests. It also helps the integration of the software developers themselves with the community, e.g. for AutoDock and BALLView: the software developers follow the distribution's bug reports directly, and may contribute a description of their package or are invited to upload their own packages directly to the distribution's servers rather than offering them on their respective home pages. With an increasing number of packages available, the interaction between those tools becomes more and more of a concern. This concerns the establishment of workflows comprising tools from many packages, but from the distribution's perspective it is also the challenge of working with the exact same versions of public databases. The sharing of input between multiple applications is ongoing work, for which many bioinformatics groups around the globe have provided solutions independently. Our impetus is to tap into that wealth of experience and use it to share the effort of maintaining the infrastructure. The distribution's software packages allow the tools included in those packages to be referenced and shared. The UseCase plugin, developed as part of the EU KnowARC project, extends the Taverna Workflow Workbench to take the description of such tools and include invocations of them within a Taverna workflow. The tools can be configured to run locally, or on a remote machine accessed via secure credentials such as ssh or grid certificates. Multiple invocations of a service can be achieved by calling the corresponding tool on a number of nodes at the same time, thus allowing faster running of the workflow over a distributed network of machines. So a workflow developer can write and test a workflow on small amounts of data locally and then, by a simple change of configuration, run the workflow on a grid or cloud on much larger data sets. Workflows can include not only tools within a packaged distribution, but also calls to other services such as WSDL operations, queries of a BioMart database or invocations of R scripts. The workflows can be uploaded to the myExperiment website and shared either publicly or with specific groups of people. The workflows can be downloaded and run, edited or included as part of a wider overall workflow. The development of workflows and the sharing of expertise via the myExperiment website, based upon the creation of packaged distributions of tools, allows the collaboration of the Linux and bioinformatics communities, with great future potential.
With Taverna as a workflow engine and as a data transporter, to work locally in the most efficient manner one also needs to have the data locally accessible - with the right indices and APIs and (especially) in the right version. For clouds, 'locally' may now mean remote from the user's location, and it allows for the sharing of the data. The Debian community has prepared a small utility, getData, that knows how to download the most recent versions of a series of common databases, checks for the availability of a series of bioinformatics tools, and performs the respective indexing. When collaborating in clouds, the users can also ensure that any manual updates of databases are performed only once, for the instant direct benefit of all other users. To conclude, the dynamics of all three contributors, i.e. Linux distribution, cloud infrastructure and workflow suite, form a symbiosis towards a readily usable infrastructure for performing and sharing biologically inspired research. The clouds bring considerable relief to smaller research groups, allowing them to think large, with the (optional) confidence gained through immediately available expert collaborators.
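Conceptually, a getData-style helper boils down to fetching the current release of a reference database and building the local index that downstream tools expect. The sketch below illustrates that shape only; the URL is a placeholder and the choice of makeblastdb as the indexer is an assumption, not the actual getData configuration.

```python
# Hedged sketch of a getData-like "fetch and index" step (placeholders throughout).
import subprocess
import urllib.request

DB_URL = "http://example.org/databases/reference_proteins.fasta"  # placeholder URL
LOCAL = "reference_proteins.fasta"

# Download the current release of the database.
urllib.request.urlretrieve(DB_URL, LOCAL)

# Build a local index for BLAST-style searches; swap in whichever indexer
# your tools actually expect.
subprocess.run(["makeblastdb", "-in", LOCAL, "-dbtype", "prot"], check=True)
```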