scispace - formally typeset

Showing papers on "Scientific workflow system published in 2009"


Journal Article
TL;DR: The taxonomy gives end users a mechanism for assessing the suitability of workflow systems in general and for using the features it describes to make an informed choice of workflow system for their particular application.

903 citations


Journal Article
TL;DR: The RShell plugin enables Taverna users to employ R scripts within their workflows in a highly configurable way and is backwards compatible with workflows that use older versions of the RShell processor.
Abstract: Background: R is the statistical language commonly used by many life scientists in (omics) data analysis, and such analyses benefit from a workflow approach such as that of the open source workflow management system Taverna. However, Taverna had limited support for R, because it supported just a few data types and only a single output. Also, there was no support for graphical output or persistent sessions. Altogether, this made using R in Taverna impractical. Findings: We have developed an R plugin for Taverna, RShell, which provides R functionality within workflows designed in Taverna. To fully support the R language, our RShell plugin uses the R interpreter directly. The RShell plugin consists of a Taverna processor for R scripts and an RShell Session Manager that communicates with the R server. We made the RShell processor highly configurable, allowing the user to define multiple inputs and outputs. Various data types are also supported, such as strings, numeric data and images. To limit data transport between multiple RShell processors, the RShell plugin also supports persistent sessions. Here, we describe the architecture of RShell and the new features introduced in version 1.2: i) support for R up to and including version 2.9; ii) support for persistent sessions to limit data transfer; iii) support for vector graphics output through PDF; iv) syntax highlighting of the R code; v) improved usability through fewer port types. Our new RShell processor is backwards compatible with workflows that use older versions of the RShell processor. We demonstrate the value of the RShell processor with a use-case workflow that maps oligonucleotide probes designed with DNA sequence information from Vega onto the Ensembl genome assembly. Conclusion: Our RShell plugin enables Taverna users to employ R scripts within their workflows in a highly configurable way.
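Why persistent sessions limit data transfer can be illustrated with a small Python sketch. It is not RShell's implementation: the `Session` class below is a hypothetical, in-memory stand-in for a server-side interpreter session, and `transfers` simply counts values moved between client and server. When two processing steps share one session, the intermediate result stays on the server; when each step opens its own session, the intermediate result is downloaded and uploaded again.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Session:
    """Hypothetical stand-in for a persistent interpreter session on a server."""
    variables: Dict[str, Any] = field(default_factory=dict)
    transfers: int = 0  # values moved between client and server

    def upload(self, name: str, value: Any) -> None:
        self.transfers += 1
        self.variables[name] = value

    def download(self, name: str) -> Any:
        self.transfers += 1
        return self.variables[name]


def run_step(session: Session, func: Callable[..., Any],
             inputs: List[str], output: str) -> None:
    """Run one processing step inside the session; its result stays server-side."""
    args = [session.variables[n] for n in inputs]
    session.variables[output] = func(*args)


def double(xs: List[int]) -> List[int]:
    return [2 * x for x in xs]


def with_shared_session(data: List[int]) -> Tuple[int, int]:
    s = Session()
    s.upload("x", data)
    run_step(s, double, ["x"], "doubled")
    run_step(s, sum, ["doubled"], "total")
    return s.download("total"), s.transfers


def with_separate_sessions(data: List[int]) -> Tuple[int, int]:
    first = Session()
    first.upload("x", data)
    run_step(first, double, ["x"], "doubled")
    doubled = first.download("doubled")  # intermediate result leaves the server
    second = Session()
    second.upload("doubled", doubled)    # ...and is sent back again
    run_step(second, sum, ["doubled"], "total")
    return second.download("total"), first.transfers + second.transfers
```

For the input `[1, 2, 3]`, both pipelines compute 12, but the shared-session version moves two values instead of four.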

27 citations


Book Chapter
02 Jun 2009
TL;DR: An autonomous scientific workflow system that enables high-level, natural-language-based queries over low-level data sets without requiring a standardized format for storing all data sets or a federated, mediator-based querying framework.
Abstract: Technological success has ushered in massive amounts of data for scientific analysis. To enable all classes of users to make effective use of these data sets, intuitive data access and manipulation interfaces are crucial. This paper describes an autonomous scientific workflow system that enables high-level, natural-language-based queries over low-level data sets. Our technique combines natural language processing, metadata indexing, and a semantically aware workflow composition engine that dynamically constructs workflows for answering queries based on service and data availability. A specific contribution of this work is a metadata registration scheme that allows for a unified index of heterogeneous metadata formats and service annotations. Our approach thus avoids requiring either a standardized format for storing all data sets or the implementation of a federated, mediator-based querying framework. We have evaluated our system using a case study from the geospatial domain to show functional results. Our evaluation supports the potential benefits that our approach can offer to scientific workflow systems and other domain-specific, data-intensive applications.
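Composing a workflow from service and data availability can be pictured as a search over registered services. The Python sketch below is a simplified illustration under stated assumptions, not the authors' engine: each hypothetical service is annotated with only one input and one output data type, the service names are invented geospatial examples, and the natural language processing and metadata indexing stages are omitted. Breadth-first search returns the shortest service chain that turns an available data type into the requested one, or `None` when the current services and data cannot answer the query.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical service registry: name -> (input data type, output data type).
SERVICES: Dict[str, Tuple[str, str]] = {
    "extract_elevation": ("satellite_tile", "elevation_grid"),
    "compute_slope": ("elevation_grid", "slope_grid"),
    "classify_landcover": ("satellite_tile", "landcover_map"),
    "flood_risk": ("slope_grid", "risk_map"),
}


def compose(available: List[str], target: str,
            services: Dict[str, Tuple[str, str]] = SERVICES) -> Optional[List[str]]:
    """Breadth-first search for the shortest service chain producing `target`
    from any available data type; returns service names in execution order."""
    queue = deque((dtype, []) for dtype in available)
    seen = set(available)
    while queue:
        dtype, chain = queue.popleft()
        if dtype == target:
            return chain
        for name, (inp, out) in services.items():
            # Follow every service that consumes the current type and yields a new one.
            if inp == dtype and out not in seen:
                seen.add(out)
                queue.append((out, chain + [name]))
    return None  # no chain of available services reaches the target
```

`compose(["satellite_tile"], "risk_map")` yields `["extract_elevation", "compute_slope", "flood_risk"]`, while starting from `"landcover_map"` yields `None` because no registered service consumes that type.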

11 citations


Proceedings Article
08 Mar 2009
TL;DR: A case study based on Sunfall, a distributed, parallel scientific workflow system built for the Nearby Supernova Factory, the largest data-volume supernova search currently in existence, is presented.
Abstract: Observational astrophysics has recently become a data-intensive science after many decades of relative data poverty. As a result, many of the algorithms developed for processing astronomical data, although well established for low-volume data capture, do not scale well to today's high-volume sky surveys and transient searches. Specifically, problems may occur with data transfer, workflow management, efficient parallelization, and integration of legacy code. Observational astrophysics workflows present computational challenges unique in high performance computing, including 24/7 operations, time-critical processing, and very large numbers of relatively small data files which must all be processed and archived. We present a case study based on Sunfall, a distributed, parallel scientific workflow system we built for the Nearby Supernova Factory, the largest data-volume supernova search currently in existence. We describe innovative techniques for data transfer and workflow management, and discuss lessons learned in building a large-scale observational astrophysics workflow management system.

9 citations


Proceedings Article
28 Oct 2009
TL;DR: This paper presents C-SWF, a lightweight scientific workflow system designed specifically for astronomy, which offers simplicity, high performance and easy deployment to meet the requirements of astronomers.
Abstract: Driven by the urgent need for mass data processing in astronomy and enabled by the advent of network grid computing, scientific workflow techniques were introduced and quickly adopted for distributed astronomical data processing. However, existing scientific workflow systems are too complex and heavyweight to deploy and operate, and few people can master them in a short time. In this paper, we present a lightweight scientific workflow system, C-SWF, designed specifically for astronomy. All required fundamental functions, such as task customization, data movement, provenance and task re-run mechanisms, are fully implemented. Compared with existing scientific workflow systems, C-SWF is simple, offers high performance and is easy to deploy, meeting the requirements of astronomers.

5 citations


Journal Article
TL;DR: This paper provides a case study on structured provenance modeling and management problems in the neuroimaging domain by introducing the Bio-Swarm-Pipeline, a new model that systematically addresses the provenance scope, representation, granularity, and implementation issues related to the neuroimaging domain.
Abstract: A streamlined scientific workflow system that can track the details of the data processing history is critical for the efficient handling of fundamental routines used in scientific research. In the scientific workflow research community, the information that describes the details of data processing history is referred to as provenance, which plays an important role in most existing workflow management systems. Despite its importance, however, provenance modeling and management is still a relatively new area in the scientific workflow research community. The proper scope, representation, granularity and implementation of a provenance model can vary from domain to domain and pose a number of challenges for efficient pipeline design. This paper provides a case study on structured provenance modeling and management problems in the neuroimaging domain by introducing the Bio-Swarm-Pipeline (BSP). This new model, which is evaluated in the paper through real-world scenarios, systematically addresses the provenance scope, representation, granularity, and implementation issues related to the neuroimaging domain. Although this model stems from applications in neuroimaging, the system can potentially be adapted to a wide range of biomedical application scenarios.

4 citations


Book Chapter
13 Oct 2009
TL;DR: The authors extend their earlier work on collaborative workflow design by introducing a web-based scientific workflow system that enables easy-to-use semantic service composition with a domain-specific workflow notation.
Abstract: Web portals enable the sharing, execution and monitoring of scientific workflows, but they usually depend on external development systems whose notations strive to support general workflows yet are still too complex for everyday use by biologists. The split between web-based and non-web-based tools is likely to irritate users further. We extend our work on collaborative workflow design by introducing a web-based scientific workflow system that enables easy-to-use semantic service composition with a domain-specific workflow notation.

2 citations


Proceedings Article
21 Nov 2009
TL;DR: A lightweight scientific workflow system named C-SWF is presented to support complex data processing in astronomy; its features simplify the operation and deployment of the system.
Abstract: Network-grid computing is widely adopted for complex data computing and distributed data processing. To construct loosely coupled applications, scientific workflow techniques have been used to invoke distributed services and data. However, current scientific workflow systems have many drawbacks, such as complex operation and difficult deployment. In this paper, motivated by the urgent requirements of astronomical data processing, we study a lightweight scientific workflow system and discuss the critical techniques of its implementation. Finally, we present a lightweight scientific workflow system named C-SWF to support complex data processing in astronomy. C-SWF fully implements critical features such as task customization, job scheduling, data movement, data provenance and a task re-run mechanism. Compared with existing scientific workflow systems, C-SWF has many useful features that simplify operating and deploying the system.
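A task re-run mechanism of this kind can be sketched by recording which tasks have completed and skipping them on the next run. This Python example is an illustration under assumptions, not C-SWF's implementation: the dependency map, task names and the stop-at-first-failure policy are hypothetical, and it relies on `graphlib.TopologicalSorter` from the standard library (Python 3.9+) to order tasks.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_workflow(deps: Dict[str, Set[str]],
                 tasks: Dict[str, Callable[[], None]],
                 completed: Set[str]) -> Set[str]:
    """Run tasks in dependency order, skipping any already recorded as completed.

    `deps` maps each task name to the names of tasks it depends on. The run stops
    at the first failing task; the returned set records what has finished, so a
    later call can resume from the failure instead of starting over.
    """
    done = set(completed)
    for name in TopologicalSorter(deps).static_order():
        if name in done:
            continue  # succeeded in an earlier run; do not repeat it
        try:
            tasks[name]()
        except Exception:
            break  # leave this task and everything after it for the re-run
        done.add(name)
    return done
```

If a task fails on the first run, the returned set holds only the tasks that finished; passing it back in on the second run executes the failed task and its downstream tasks without repeating the earlier ones.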

2 citations


Book Chapter
Lei Li, Bin Gong, Yan Ma
10 Aug 2009
TL;DR: A grid-oriented scientific workflow management system is designed and integrated into GOS (Grid Operating System); its workflow engine uses lightweight threading techniques and an event-driven mechanism to instantiate and dispatch jobs.
Abstract: With the advent of grid and application technologies, domain scientists are building increasingly complex applications on distributed resources. To let scientists carry out application scenarios conveniently, scientific workflow is emerging as one of the most important and challenging classes of grid applications. GOS (Grid Operating System) is a novel grid middleware that can provide various scientific applications for scientists. In this paper, a grid-oriented scientific workflow management system is designed and integrated into GOS. The workflow engine uses lightweight threading techniques and an event-driven mechanism to instantiate and dispatch jobs. Several important components are discussed, and the main implementation of the system is introduced.
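Event-driven dispatch of workflow jobs can be sketched in Python as follows. This is an illustration under assumptions, not GOS code: a standard `ThreadPoolExecutor` stands in for the engine's lightweight threads, each job is a plain callable, the dependency graph is assumed acyclic, and worker threads post completion events to a queue. The main loop reacts to each event by decrementing the pending-predecessor count of the finished job's successors and submitting any that reach zero.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set


def run_event_driven(deps: Dict[str, Set[str]],
                     jobs: Dict[str, Callable[[], None]],
                     workers: int = 4) -> List[str]:
    """Dispatch each job once all of its predecessors have completed.

    `deps` maps a job name to the names of the jobs it waits for (acyclic).
    Returns job names in the order their completion events were handled.
    """
    pending = {name: len(deps.get(name, set())) for name in jobs}
    successors: Dict[str, List[str]] = {name: [] for name in jobs}
    for name, preds in deps.items():
        for pred in preds:
            successors[pred].append(name)

    events: queue.Queue = queue.Queue()  # items: (job name, exception or None)
    order: List[str] = []

    def execute(name: str) -> None:
        # Runs on a worker thread; reports completion (or failure) as an event.
        try:
            jobs[name]()
            events.put((name, None))
        except Exception as exc:
            events.put((name, exc))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for name, count in pending.items():
            if count == 0:
                pool.submit(execute, name)
        while len(order) < len(jobs):
            finished, error = events.get()  # block until the next event
            if error is not None:
                raise error
            order.append(finished)
            for succ in successors[finished]:
                pending[succ] -= 1
                if pending[succ] == 0:
                    pool.submit(execute, succ)
    return order
```

For a diamond-shaped workflow (`a` before `b` and `c`, both before `d`), `a` always completes first and `d` last, while `b` and `c` may finish in either order. A job that raises stops the loop and re-raises its exception.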

1 citation


Proceedings Article
18 Aug 2009
TL;DR: This paper provides a scientific workflow system called EPSWFlow that lets e-scientists in the climate domain compose services and orchestrate workflows, together with a service wrapping method and a unified interface through which workflow users access the services.
Abstract: The development of large-scale parallel scientific computing applications has created more urgent demands for powerful computing capacity and for technologies that manage complex processes. Meanwhile, scientific experiment processes have become increasingly complicated, making it hard for e-scientists to control experiment analysis processes by hand. In this paper, we provide a scientific workflow system called EPSWFlow for e-scientists in the climate domain, supporting service composition and workflow orchestration. To integrate the large number of existing legacy applications into the system, we provide a service wrapping method and a unified interface through which workflow users access the services. The workflow system can execute experiment processes dynamically and manage heterogeneous grid resources transparently.
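Service wrapping of a legacy application behind a unified interface might look like the Python sketch below. The `WrappedService` class, its `invoke` method and the parameter names are hypothetical illustrations rather than EPSWFlow's API; the idea is that every wrapped program, whatever its command-line conventions, is called the same way with named parameters and returns the same result structure. A Python one-liner stands in for the legacy program so the example runs anywhere.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WrappedService:
    """Hypothetical wrapper giving a legacy command-line program a uniform call:
    named string parameters in, a result dictionary out."""
    name: str
    command: List[str]      # base command that launches the legacy program
    param_order: List[str]  # named parameters, in the positional order it expects

    def invoke(self, params: Dict[str, str]) -> Dict[str, object]:
        args = self.command + [params[p] for p in self.param_order]
        proc = subprocess.run(args, capture_output=True, text=True)
        return {"service": self.name, "returncode": proc.returncode,
                "stdout": proc.stdout, "stderr": proc.stderr}


# Stand-in "legacy" program: a Python one-liner that echoes its arguments.
legacy_regrid = WrappedService(
    name="regrid",
    command=[sys.executable, "-c", "import sys; print(' '.join(sys.argv[1:]))"],
    param_order=["input", "resolution"],
)
result = legacy_regrid.invoke({"input": "tas_2009.nc", "resolution": "1deg"})
```

Because the workflow engine only ever calls `invoke`, swapping in a different legacy program means changing `command` and `param_order`, not the calling code.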

Proceedings Article
27 Aug 2009
TL;DR: A label-based provenance architecture is presented that classifies provenance information with labels and manages provenance information according to those labels.
Abstract: Scientific workflows have shown great benefit as a means of orchestrating complex distributed scientific computations over grids and speeding scientific progress. To address requirements of scientific experiments such as result reproducibility, sharing and validation, provenance support has become an essential component of scientific workflow systems. As the amount of provenance data grows, managing enormous sets of provenance data becomes a challenge. In this paper, we present a label-based provenance architecture that classifies provenance information with labels and manages provenance information according to those labels.
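One way to organize provenance information by labels is an index from each label to the records that carry it, so that a query touches only records with the requested labels instead of scanning the whole store. The Python sketch below follows that reading under stated assumptions; the record fields, label names and the `LabeledProvenanceStore` class are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class ProvenanceRecord:
    record_id: str
    workflow: str
    step: str
    labels: FrozenSet[str]  # hypothetical labels, e.g. {"process", "run-1"}


class LabeledProvenanceStore:
    """Hypothetical store that indexes provenance records by label."""

    def __init__(self) -> None:
        self._records: Dict[str, ProvenanceRecord] = {}
        self._by_label: DefaultDict[str, Set[str]] = defaultdict(set)

    def add(self, record: ProvenanceRecord) -> None:
        self._records[record.record_id] = record
        for label in record.labels:
            self._by_label[label].add(record.record_id)

    def query(self, *labels: str) -> List[ProvenanceRecord]:
        """Return records carrying every given label, ordered by id."""
        if not labels:
            ids = set(self._records)
        else:
            # Intersect the per-label index sets; unknown labels match nothing.
            ids = set.intersection(*(self._by_label.get(lab, set()) for lab in labels))
        return sorted((self._records[i] for i in ids), key=lambda r: r.record_id)
```

Querying with several labels returns only records that carry all of them, for example every process record from a single run.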

Proceedings Article
10 Oct 2009
TL;DR: The results show that a scientific workflow system for CO2 flux data processing can solve many problems in flux data processing, such as numerous and varied calculations, inconsistent development platforms and complicated procedures.
Abstract: Scientific workflow systems have become a necessary tool for many applications, enabling the composition and execution of complex analyses. CO2 flux data observed with the eddy covariance technique are large in volume and their processing procedure is complex, so scientific workflow techniques play a very important role in sharing, reusing and automating flux data processing methods. In this paper, we discuss the feasibility and validity of applying scientific workflow techniques to flux data processing and make a tentative attempt to construct a scientific workflow system for CO2 flux data processing, using the Kepler scientific workflow system as the development platform. CO2 flux data from Changbai Mountain in 2003 are used to verify the scientific workflow system. The results show that a scientific workflow system for CO2 flux data processing can solve many problems in flux data processing, such as numerous and varied calculations, inconsistent development platforms and complicated procedures. This approach indicates that a scientific workflow system applied to CO2 flux data processing can provide an automatic calculation platform for flux data processing and promote the international communication and sharing of flux data processing methods, making it easier for scientists to focus on their research rather than on computation management.

Proceedings Article
06 Jul 2009
TL;DR: As techniques continue to mature and standards emerge, scientific workflows will inevitably become a standard business tool for organizations that rely on complex calculations for their everyday work.
Abstract: Organizations that depend on complex calculations for their day-to-day business, such as science and engineering firms, need enterprise-level management systems for their calculations. These systems ideally allow subject-matter experts to automate calculations and disseminate them to other users in a controlled way that encourages standard practice and tracks results. Scientific workflow ideas have a large part to play; graphical workflow composition and provenance metadata are examples that directly apply. Scientific workflow systems can provide the leverage to promote business-critical calculations to first-class enterprise-level content. As techniques continue to mature and standards emerge, scientific workflows will inevitably become a standard business tool for organizations that rely on complex calculations for their everyday work.