
Showing papers on "Workflow published in 2011"


Journal ArticleDOI
TL;DR: Typical topics and problems encountered during data processing of diffraction experiments are discussed and the tools provided in the autoPROC software are described.
Abstract: A typical diffraction experiment will generate many images and data sets from different crystals in a very short time. This creates a challenge for the high-throughput operation of modern synchrotron beamlines as well as for the subsequent data processing. Novice users in particular may feel overwhelmed by the tables, plots and numbers that the different data-processing programs and software packages present to them. Here, some of the more common problems that a user has to deal with when processing a set of images that will finally make up a processed data set are shown, concentrating on difficulties that may often show up during the first steps along the path of turning the experiment (i.e. data collection) into a model (i.e. interpreted electron density). Difficulties such as unexpected crystal forms, issues in crystal handling and suboptimal choices of data-collection strategies can often be dealt with, or at least diagnosed, by analysing specific data characteristics during processing. In the end, one wants to distinguish problems over which one has no immediate control once the experiment is finished from problems that can be remedied a posteriori. A new software package, autoPROC, is also presented that combines third-party processing programs with new tools and an automated workflow script that is intended to provide users with both guidance and insight into the offline processing of data affected by the difficulties mentioned above, with particular emphasis on the automated treatment of multi-sweep data sets collected on multi-axis goniostats.

1,239 citations


Journal ArticleDOI
Li Da Xu
TL;DR: The state of the art in the area of enterprise systems as they relate to industrial informatics is surveyed, highlighting formal methods and systems methods crucial for modeling complex enterprise systems, which poses unique challenges.
Abstract: Rapid advances in industrial information integration methods have spurred tremendous growth in the use of enterprise systems. Consequently, a variety of techniques have been used for probing enterprise systems. These techniques include business process management, workflow management, Enterprise Application Integration (EAI), Service-Oriented Architecture (SOA), grid computing, and others. Many applications require a combination of these techniques, which is giving rise to the emergence of enterprise systems. Development of the techniques has originated from different disciplines and has the potential to significantly improve the performance of enterprise systems. However, the lack of powerful tools still poses a major hindrance to exploiting the full potential of enterprise systems. In particular, formal methods and systems methods are crucial for modeling complex enterprise systems, which poses unique challenges. In this paper, we briefly survey the state of the art in the area of enterprise systems as they relate to industrial informatics.

637 citations


Proceedings ArticleDOI
12 Nov 2011
TL;DR: This paper presents an approach whereby the basic computing elements are virtual machines (VMs) of various sizes/costs, jobs are specified as workflows, users specify performance requirements by assigning (soft) deadlines to jobs, and the goal is to ensure all jobs are finished within their deadlines at minimum financial cost.
Abstract: A goal in cloud computing is to allocate (and thus pay for) only those cloud resources that are truly needed. To date, cloud practitioners have pursued schedule-based (e.g., time-of-day) and rule-based mechanisms to attempt to automate this matching between computing requirements and computing resources. However, most of these "auto-scaling" mechanisms only support simple resource utilization indicators and do not specifically consider both user performance requirements and budget concerns. In this paper, we present an approach whereby the basic computing elements are virtual machines (VMs) of various sizes/costs, jobs are specified as workflows, users specify performance requirements by assigning (soft) deadlines to jobs, and the goal is to ensure all jobs are finished within their deadlines at minimum financial cost. We accomplish our goal by dynamically allocating/deallocating VMs and scheduling tasks on the most cost-efficient instances. We evaluate our approach in four representative cloud workload patterns and show cost savings from 9.8% to 40.4% compared to other approaches.
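The abstract does not spell out the allocation procedure, but the core trade-off it describes can be sketched in a few lines: for each task, pick the cheapest instance type whose runtime still fits the task's (soft) deadline. The VM catalogue, prices and speedups below are hypothetical, and the sketch ignores data transfer and scheduling interactions.

```python
# Minimal sketch of deadline-aware instance selection (illustrative only; not the
# authors' implementation). Each VM type has an hourly price and a relative speed.
import math

VM_TYPES = [                      # hypothetical catalogue
    {"name": "small",  "price": 0.10, "speedup": 1.0},
    {"name": "medium", "price": 0.20, "speedup": 1.8},
    {"name": "large",  "price": 0.40, "speedup": 3.2},
]

def cheapest_vm_meeting_deadline(task_hours_on_small, deadline_hours):
    """Return the cheapest VM type whose runtime fits the (soft) deadline."""
    best = None
    for vm in VM_TYPES:
        runtime = task_hours_on_small / vm["speedup"]
        cost = math.ceil(runtime) * vm["price"]      # clouds bill per started hour
        if runtime <= deadline_hours and (best is None or cost < best[1]):
            best = (vm["name"], cost)
    return best                                       # None if no type meets the deadline

print(cheapest_vm_meeting_deadline(task_hours_on_small=6.0, deadline_hours=4.0))
```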

556 citations


Journal ArticleDOI
TL;DR: It is shown that the eight soundness notions described in the literature are decidable for workflow nets; however, most extensions will make all of these notions undecidable.
Abstract: Workflow nets, a particular class of Petri nets, have become one of the standard ways to model and analyze workflows. Typically, they are used as an abstraction of the workflow that is used to check the so-called soundness property. This property guarantees the absence of livelocks, deadlocks, and other anomalies that can be detected without domain knowledge. Several authors have proposed alternative notions of soundness and have suggested to use more expressive languages, e.g., models with cancellations or priorities. This paper provides an overview of the different notions of soundness and investigates these in the presence of different extensions of workflow nets. We will show that the eight soundness notions described in the literature are decidable for workflow nets. However, most extensions will make all of these notions undecidable. These new results show the theoretical limits of workflow verification. Moreover, we discuss some of the analysis approaches described in the literature.
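For a bounded workflow net, classical soundness (option to complete, proper completion, no dead transitions) can be checked by exhaustively exploring the reachable markings. The sketch below is an illustrative brute-force check on a tiny two-transition net, not the decision procedures analysed in the paper.

```python
# Illustrative brute-force soundness check for a small, bounded workflow net.
# Net: place i -> t1 -> p -> t2 -> o   (transition: (input places, output places))
TRANSITIONS = {"t1": ({"i"}, {"p"}), "t2": ({"p"}, {"o"})}
INITIAL, FINAL = frozenset({("i", 1)}), frozenset({("o", 1)})

def fire(marking, pre, post):
    m = dict(marking)
    if any(m.get(p, 0) < 1 for p in pre):
        return None                       # transition not enabled
    for p in pre:  m[p] -= 1
    for p in post: m[p] = m.get(p, 0) + 1
    return frozenset((p, n) for p, n in m.items() if n > 0)

def reachable(start):
    seen, frontier = {start}, [start]
    while frontier:
        m = frontier.pop()
        for pre, post in TRANSITIONS.values():
            nxt = fire(m, pre, post)
            if nxt is not None and nxt not in seen:
                seen.add(nxt); frontier.append(nxt)
    return seen

markings = reachable(INITIAL)
option_to_complete = all(FINAL in reachable(m) for m in markings)
proper_completion = all(m == FINAL for m in markings if dict(m).get("o", 0) >= 1)
no_dead_transitions = all(
    any(fire(m, pre, post) is not None for m in markings)
    for pre, post in TRANSITIONS.values())
print(option_to_complete and proper_completion and no_dead_transitions)  # True
```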

335 citations


Journal ArticleDOI
TL;DR: The authors explore the tension between expressivity and structured clinical documentation, review methods for obtaining reusable data from clinical notes, and recommend that healthcare providers be able to choose how to document patient care based on workflow and note content needs.

326 citations


Journal ArticleDOI
TL;DR: It is demonstrated that hybrid operators, which combine two pure operators, reduce the number of duplicate structures in the search, which allows for better exploration of the potential energy surface of the system in question, while simultaneously zooming in on the most promising regions.

266 citations


Journal ArticleDOI
TL;DR: The types of software tools that are required at different stages of systems biology research and the current options that are available for systems biology researchers are described.
Abstract: Understanding complex biological systems requires extensive support from software tools. Such tools are needed at each step of a systems biology computational workflow, which typically consists of data handling, network inference, deep curation, dynamical simulation and model analysis. In addition, there are now efforts to develop integrated software platforms, so that tools that are used at different stages of the workflow and by different researchers can easily be used together. This Review describes the types of software tools that are required at different stages of systems biology research and the current options that are available for systems biology researchers. We also discuss the challenges and prospects for modelling the effects of genetic changes on physiology and the concept of an integrated platform.

262 citations


Posted Content
TL;DR: The authors integrated crowds into a declarative workflow engine called Qurk to reduce the burden on workflow designers and used humans to compare items for sorting and joining data, two of the most common operations in DBMSs.
Abstract: Crowdsourcing markets like Amazon's Mechanical Turk (MTurk) make it possible to task people with small jobs, such as labeling images or looking up phone numbers, via a programmatic interface. MTurk tasks for processing datasets with humans are currently designed with significant reimplementation of common workflows and ad-hoc selection of parameters such as price to pay per task. We describe how we have integrated crowds into a declarative workflow engine called Qurk to reduce the burden on workflow designers. In this paper, we focus on how to use humans to compare items for sorting and joining data, two of the most common operations in DBMSs. We describe our basic query interface and the user interface of the tasks we post to MTurk. We also propose a number of optimizations, including task batching, replacing pairwise comparisons with numerical ratings, and pre-filtering tables before joining them, which dramatically reduce the overall cost of running sorts and joins on the crowd. In an experiment joining two sets of images, we reduce the overall cost from $67 in a naive implementation to about $3, without substantially affecting accuracy or latency. In an end-to-end experiment, we reduced cost by a factor of 14.5.
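The rating optimization mentioned above replaces roughly n*(n-1)/2 pairwise-comparison HITs with n per-item rating HITs and then sorts by mean rating. The sketch below illustrates that trade-off with a simulated worker; it is not the Qurk query interface, and the rating function and vote counts are invented.

```python
# Minimal sketch of the rating-based sort optimization (illustrative only).
import random

def crowd_rating(item, scale=7):
    """Stand-in for a human rating HIT: a noisy score correlated with true quality."""
    return min(scale, max(1, item["true_quality"] + random.choice([-1, 0, 0, 1])))

def sort_with_ratings(items, votes_per_item=3):
    hits = 0
    for it in items:
        scores = []
        for _ in range(votes_per_item):
            scores.append(crowd_rating(it)); hits += 1
        it["rating"] = sum(scores) / len(scores)
    return sorted(items, key=lambda it: it["rating"], reverse=True), hits

items = [{"id": i, "true_quality": q} for i, q in enumerate([2, 5, 3, 7, 4])]
ranked, hits_used = sort_with_ratings(items)
pairwise_hits = len(items) * (len(items) - 1) // 2 * 3   # 3 assignments per pair
print([it["id"] for it in ranked], hits_used, "HITs vs", pairwise_hits, "for pairwise")
```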

262 citations


Journal ArticleDOI
TL;DR: This paper presents HCOC: The Hybrid Cloud Optimized Cost scheduling algorithm, which decides which resources should be leased from the public cloud and aggregated to the private cloud to provide sufficient processing power to execute a workflow within a given execution time.
Abstract: Workflows have been used to represent a variety of applications involving high processing and storage demands. As a solution to supply this necessity, the cloud computing paradigm has emerged as an on-demand resource provider. While public clouds charge users on a per-use basis, private clouds are owned by users and can be utilized with no charge. When a public cloud and a private cloud are merged, we have what we call a hybrid cloud. In a hybrid cloud, the user has elasticity provided by public cloud resources that can be aggregated to the private resources pool as necessary. One question faced by the users in such systems is: Which are the best resources to request from a public cloud based on the current demand and on resource costs? In this paper we deal with this problem, presenting HCOC: The Hybrid Cloud Optimized Cost scheduling algorithm. HCOC decides which resources should be leased from the public cloud and aggregated to the private cloud to provide sufficient processing power to execute a workflow within a given execution time. We present extensive experimental and simulation results which show that HCOC can reduce costs while achieving the established desired execution time.
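A heavily simplified view of the decision HCOC addresses: if the private cloud alone cannot meet the desired execution time, lease public VMs until the estimated makespan fits, paying only for what is leased. The instance type, prices and the perfectly parallel makespan model below are assumptions for illustration, not the published algorithm.

```python
# Rough sketch of the hybrid-cloud leasing decision (a greedy simplification).
import math

PRIVATE_CORES = 8                                # free, already owned
PUBLIC_VM = {"cores": 4, "hourly_cost": 0.34}    # hypothetical instance type

def plan(total_core_hours, deadline_hours):
    leased, cost = 0, 0.0
    def makespan(extra_vms):
        cores = PRIVATE_CORES + extra_vms * PUBLIC_VM["cores"]
        return total_core_hours / cores          # idealized, perfectly parallel work
    while makespan(leased) > deadline_hours:
        leased += 1
        cost += math.ceil(deadline_hours) * PUBLIC_VM["hourly_cost"]
    return leased, cost, makespan(leased)

print(plan(total_core_hours=200, deadline_hours=10))   # (VMs leased, cost, makespan)
```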

261 citations


Journal ArticleDOI
01 Sep 2011
TL;DR: This paper describes how MTurk tasks for processing datasets with humans are currently designed with significant reimplementation of common workflows and ad-hoc selection of parameters such as price to pay per task, and proposes a number of optimizations, including task batching, replacing pairwise comparisons with numerical ratings, and pre-filtering tables before joining them.
Abstract: Crowdsourcing markets like Amazon's Mechanical Turk (MTurk) make it possible to task people with small jobs, such as labeling images or looking up phone numbers, via a programmatic interface. MTurk tasks for processing datasets with humans are currently designed with significant reimplementation of common workflows and ad-hoc selection of parameters such as price to pay per task. We describe how we have integrated crowds into a declarative workflow engine called Qurk to reduce the burden on workflow designers. In this paper, we focus on how to use humans to compare items for sorting and joining data, two of the most common operations in DBMSs. We describe our basic query interface and the user interface of the tasks we post to MTurk. We also propose a number of optimizations, including task batching, replacing pairwise comparisons with numerical ratings, and pre-filtering tables before joining them, which dramatically reduce the overall cost of running sorts and joins on the crowd. In an experiment joining two sets of images, we reduce the overall cost from $67 in a naive implementation to about $3, without substantially affecting accuracy or latency. In an end-to-end experiment, we reduced cost by a factor of 14.5.

259 citations


Journal ArticleDOI
TL;DR: This paper suggests an architecture for the automatic execution of large-scale workflow-based applications on dynamically and elastically provisioned computing resources using the core algorithm named PBTS (Partitioned Balanced Time Scheduling), which estimates the minimum number of computing hosts required to execute a workflow within a user-specified finish time.

Book
28 Aug 2011
TL;DR: This approach is based on exploiting a generic and reusable body of knowledge concerning what kinds of exceptions can occur in collaborative work processes and how these exceptions can be handled.
Abstract: This paper describes a novel knowledge-based approach for helping workflow process designers and participants better manage the exceptions (deviations from an ideal collaborative work process caused by errors, failures, resource or requirements changes, etc.) that can occur during the enactment of a workflow. This approach is based on exploiting a generic and reusable body of knowledge concerning what kinds of exceptions can occur in collaborative work processes, and how these exceptions can be handled (detected, diagnosed and resolved). This work builds upon previous efforts from the MIT Process Handbook project and from research on conflict management in collaborative design.
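One way to picture such a reusable body of knowledge is as a taxonomy of exception types, each with detection, diagnosis and resolution knowledge attached. The entries and handler strings below are invented for illustration and are not taken from the MIT Process Handbook.

```python
# Toy sketch of a reusable exception-handling knowledge base (entries are made up).
EXCEPTION_KB = {
    "agent_unavailable": {
        "detect":   "task unassigned past its start time",
        "diagnose": "assigned participant is absent or overloaded",
        "resolve":  ["reassign task to peer", "escalate to supervisor"],
    },
    "requirements_change": {
        "detect":   "specification edited after task started",
        "diagnose": "upstream decision invalidated current work",
        "resolve":  ["notify affected tasks", "re-plan downstream steps"],
    },
}

def advise(exception_type):
    entry = EXCEPTION_KB.get(exception_type)
    if entry is None:
        return ["no generic knowledge; escalate to process designer"]
    return entry["resolve"]

print(advise("agent_unavailable"))
```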

Proceedings ArticleDOI
04 Jul 2011
TL;DR: The preliminary experiments show that SHEFT not only outperforms several representative workflow scheduling algorithms in optimizing workflow execution time, but also enables resources to scale elastically at runtime.
Abstract: Most existing workflow scheduling algorithms only consider a computing environment in which the number of compute resources is bounded. Compute resources in such an environment usually cannot be provisioned or released on demand of the size of a workflow, and these resources are not released to the environment until an execution of the workflow completes. To address the problem, we firstly formalize a model of a Cloud environment and a workflow graph representation for such an environment. Then, we propose the SHEFT workflow scheduling algorithm to schedule a workflow elastically on a Cloud computing environment. Our preliminary experiments show that SHEFT not only outperforms several representative workflow scheduling algorithms in optimizing workflow execution time, but also enables resources to scale elastically at runtime.
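The abstract does not detail SHEFT itself, but schedulers in this family typically start from a HEFT-style "upward rank" that orders tasks by the length of their longest downstream path. The sketch below computes that rank for a toy DAG; the task times and communication cost are assumptions.

```python
# Sketch of HEFT-style upward-rank task prioritization (background, not SHEFT itself).
from functools import lru_cache

# Task graph: task -> (mean execution time, successors)
DAG = {
    "A": (3.0, ["B", "C"]),
    "B": (4.0, ["D"]),
    "C": (2.0, ["D"]),
    "D": (1.0, []),
}
MEAN_COMM = 1.0          # assumed mean data-transfer cost between dependent tasks

@lru_cache(maxsize=None)
def upward_rank(task):
    cost, successors = DAG[task]
    return cost + max((MEAN_COMM + upward_rank(s) for s in successors), default=0.0)

priority = sorted(DAG, key=upward_rank, reverse=True)
print(priority)          # tasks in decreasing rank order, e.g. ['A', 'B', 'C', 'D']
```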

Journal ArticleDOI
TL;DR: This work covers the entire workflow of metabolomics studies, starting from experimental design and sample-size determination to tools that can aid in biological interpretation and discusses the problems that have to be dealt with in data analysis in metabolomics.
Abstract: Metabolomics studies aim at a better understanding of biochemical processes by studying relations between metabolites and between metabolites and other types of information (e.g., sensory and phenotypic features). The objectives of these studies are diverse, but the types of data generated and the methods for extracting information from the data and analysing the data are similar. Besides instrumental analysis tools, various data-analysis tools are needed to extract this relevant information. The entire data-processing workflow is complex and has many steps. For a comprehensive overview, we cover the entire workflow of metabolomics studies, starting from experimental design and sample-size determination to tools that can aid in biological interpretation. We include illustrative examples and discuss the problems that have to be dealt with in data analysis in metabolomics. We also discuss where the challenges are for developing new methods and tailor-made quantitative strategies.

Journal ArticleDOI
TL;DR: Describes the Wings intelligent workflow system that assists scientists with designing computational experiments by automatically tracking constraints and ruling out invalid designs, letting scientists focus on their experiments and goals.
Abstract: Describes the Wings intelligent workflow system that assists scientists with designing computational experiments by automatically tracking constraints and ruling out invalid designs, letting scientists focus on their experiments and goals.

Proceedings Article
22 May 2011
TL;DR: This paper proposes to deal with problems using appropriate initialization for the early stages as well as convergence speedups applied throughout the learning phases of reinforcement learning, and presents the first experimental results for these.
Abstract: Dynamic and appropriate resource dimensioning is a crucial issue in cloud computing. As applications go more and more 24/7, online policies must be sought to balance performance with the cost of allocated virtual machines. Most industrial approaches to date use ad hoc manual policies, such as threshold-based ones. Providing good thresholds proved to be tricky and hard to automatize to fit every application requirement. Research is being done to apply automatic decision-making approaches, such as reinforcement learning. Yet, they face a lot of problems to go to the field: having good policies in the early phases of learning, time for the learning to converge to an optimal policy and coping with changes in the application performance behavior over time. In this paper, we propose to deal with these problems using appropriate initialization for the early stages as well as convergence speedups applied throughout the learning phases, and we present our first experimental results for these. We also introduce a performance model change detection on which we are currently working to complete the learning process management. Even though some of these proposals were known in the reinforcement learning field, the key contribution of this paper is to integrate them in a real cloud controller and to program them as an automated workflow.
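A minimal sketch of the general idea, not the authors' controller: Q-learning over (load, number-of-VMs) states with scale-in/out actions, where the Q-table is seeded from a simple threshold policy so that early decisions are already sensible. The state space, reward shape and constants are all invented for illustration.

```python
# Sketch: threshold-seeded Q-learning for VM scaling decisions (illustrative only).
LOADS, VMS, ACTIONS = range(5), range(1, 6), (-1, 0, +1)
ALPHA, GAMMA = 0.1, 0.9

def threshold_policy(load, vms):
    if load >= 4 and vms < 5: return +1        # scale out under high load
    if load <= 1 and vms > 1: return -1        # scale in when mostly idle
    return 0

# Seed Q so the threshold action looks slightly better than the alternatives.
Q = {(l, v): {a: (1.0 if a == threshold_policy(l, v) else 0.0) for a in ACTIONS}
     for l in LOADS for v in VMS}

def reward(load, vms):
    sla_penalty = 5.0 if load >= 4 and vms < 3 else 0.0
    return -(0.2 * vms) - sla_penalty          # pay for VMs, pay more for SLA misses

def q_update(state, action, next_state):
    best_next = max(Q[next_state].values())
    Q[state][action] += ALPHA * (reward(*next_state) + GAMMA * best_next
                                 - Q[state][action])

# One illustrative interaction: high load with 2 VMs, the seeded policy adds a VM.
state = (4, 2)
action = max(Q[state], key=Q[state].get)       # greedy w.r.t. seeded table -> +1
q_update(state, action, next_state=(3, 3))
print(action, Q[state][action])
```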

Proceedings ArticleDOI
08 Jun 2011
TL;DR: This paper describes the experiences running a scientific workflow application developed to process astronomy data released by the Kepler project, a NASA mission to search for Earth-like planets orbiting other stars, and demonstrates how Pegasus was able to support sky computing by executing a single workflow across multiple cloud infrastructures simultaneously.
Abstract: Clouds are rapidly becoming an important platform for scientific applications. In this paper we describe our experiences running a scientific workflow application in the cloud. The application was developed to process astronomy data released by the Kepler project, a NASA mission to search for Earth-like planets orbiting other stars. This workflow was deployed across multiple clouds using the Pegasus Workflow Management System. The clouds used include several sites within the FutureGrid, NERSC's Magellan cloud, and Amazon EC2. We describe how the application was deployed, evaluate its performance executing in different clouds (based on Nimbus, Eucalyptus, and EC2), and discuss the challenges of deploying and executing workflows in a cloud environment. We also demonstrate how Pegasus was able to support sky computing by executing a single workflow across multiple cloud infrastructures simultaneously.

Journal ArticleDOI
01 Dec 2011
TL;DR: This work presents a novel provenance framework that marries database-style and workflow-style provenance, by using Pig Latin to expose the functionality of modules, thus capturing internal state and fine-grained dependencies.
Abstract: Workflow provenance typically assumes that each module is a "black-box", so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between repeated executions. In practice, however, an output may depend on only a small subset of the inputs (fine-grained dependencies) as well as on the internal state of the module. We present a novel provenance framework that marries database-style and workflow-style provenance, by using Pig Latin to expose the functionality of modules, thus capturing internal state and fine-grained dependencies. A critical ingredient in our solution is the use of a novel form of provenance graph that models module invocations and yields a compact representation of fine-grained workflow provenance. It also enables a number of novel graph transformation operations, allowing to choose the desired level of granularity in provenance querying (ZoomIn and ZoomOut), and supporting "what-if" workflow analytic queries. We implemented our approach in the Lipstick system and developed a benchmark in support of a systematic performance evaluation. Our results demonstrate the feasibility of tracking and querying fine-grained workflow provenance.
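The contrast between coarse- and fine-grained provenance can be illustrated with a toy dependency graph: collapsing a module's internal steps (a ZoomOut-like view) makes every output appear to depend on every input, which is exactly the precision the fine-grained graph preserves. The names and graph encoding below are ours, not the Lipstick API.

```python
# Toy illustration of fine- vs coarse-grained provenance (illustrative names only).
# Edges record which inputs each output of a module really used.
fine_edges = [
    ("in1", "moduleA.step1"), ("moduleA.step1", "out1"),   # out1 uses only in1
    ("in2", "moduleA.step2"), ("moduleA.step2", "out2"),   # out2 uses only in2
]

def zoom_out(edges, module):
    """Collapse a module's internal nodes, keeping only data-to-data dependencies."""
    coarse = set()
    for src, dst in edges:
        src_internal = src.startswith(module + ".")
        dst_internal = dst.startswith(module + ".")
        if src_internal and not dst_internal:              # module -> output edge
            for s2, d2 in edges:                           # connect every module input
                if d2.startswith(module + ".") and not s2.startswith(module + "."):
                    coarse.add((s2, dst))
        elif not src_internal and not dst_internal:
            coarse.add((src, dst))
    return coarse

print(zoom_out(fine_edges, "moduleA"))
# The coarse view loses precision: both inputs now appear to feed both outputs.
```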

Journal ArticleDOI
TL;DR: It is found that the development of systems supporting individual clinical decisions is evolving toward the implementation of adaptable care pathways on the semantic web, incorporating formal, clinical, and organizational ontologies, and the use of workflow management systems.

Proceedings ArticleDOI
18 Nov 2011
TL;DR: The architecture of Airavata and its modules are discussed, and how the software can be used as individual components or as an integrated solution to build science gateways or general-purpose distributed application and workflow management systems are illustrated.
Abstract: In this paper, we introduce Apache Airavata, a software framework to compose, manage, execute, and monitor distributed applications and workflows on computational resources ranging from local resources to computational grids and clouds. Airavata builds on general concepts of service-oriented computing, distributed messaging, and workflow composition and orchestration. This paper discusses the architecture of Airavata and its modules, and illustrates how the software can be used as individual components or as an integrated solution to build science gateways or general-purpose distributed application and workflow management systems.

Journal ArticleDOI
TL;DR: The experiments show that the RD reputation improves the reliability of an application with more accurate reputations, while the LAGA provides better solutions than existing list heuristics and evolves to better solutions more quickly than a traditional GA.

Journal ArticleDOI
Bradley Jones, John Sall
TL;DR: JMP, as described in this paper, is a statistical software environment that enables scientists, engineers, and business analysts to make discoveries through data exploration. It supports custom design, an innovative approach to the statistical design of experiments, and whether results come from designed experiments or from an observational study, it provides analytical tools that put graphs up front.
Abstract: JMP is a statistical software environment that enables scientists, engineers, and business analysts to make discoveries through data exploration. One powerful method for beginning the process of discovery employs statistically designed experiments. A well-designed experiment ensures that the resulting data have large information content. We support this method with custom design, an innovative approach to the statistical design of experiments. But whether your results come from designed experiments or from an observational study, we provide analytical tools that put graphs up front. JMP's graphical user interface (GUI) makes these plots interactive and dynamically linked to each other and to the data. Moreover, in the design of JMP's user interface the priority was to make a smooth and natural workflow for data analysis. Notice how the data exploration flows in the following case study that investigates the relationship between life expectancy and health-care spending for 166 countries. WIREs Comp Stat 2011 3 188–194 DOI: 10.1002/wics.162

Journal ArticleDOI
TL;DR: The detailed knowledge engineering workflow is shown to be useful for structuring a complex iterative BN development process, and methods for incorporating it into the knowledge engineering process, including visualisation and analysis of the learned networks, are presented.

Journal ArticleDOI
01 Aug 2011
TL;DR: This work proposes an algebraic approach (inspired by relational algebra) and a parallel execution model that enable automatic optimization of scientific workflows and demonstrates performance improvements of up to 226% compared to an ad-hoc workflow implementation.
Abstract: Scientific workflows have emerged as a basic abstraction for structuring and executing scientific experiments in computational environments. In many situations, these workflows are computationally and data intensive, thus requiring execution in large-scale parallel computers. However, parallelization of scientific workflows remains low-level, ad-hoc and labor-intensive, which makes it hard to exploit optimization opportunities. To address this problem, we propose an algebraic approach (inspired by relational algebra) and a parallel execution model that enable automatic optimization of scientific workflows. We conducted a thorough validation of our approach using both a real oil exploitation application and synthetic data scenarios. The experiments were run in Chiron, a data-centric scientific workflow engine implemented to support our algebraic approach. Our experiments demonstrate performance improvements of up to 226% compared to an ad-hoc workflow implementation.
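To convey the flavour of an algebraic workflow representation, the toy operators below (our own, not the paper's algebra) express a workflow as composable expressions that can be rewritten before execution, for example pushing a filter below a map so an expensive activity runs on fewer tuples.

```python
# Sketch of a relational-algebra-like workflow algebra with one rewrite rule.
class Map:     # apply an activity to every tuple
    def __init__(self, fn, child): self.fn, self.child = fn, child
    def run(self, rows): return [self.fn(r) for r in self.child.run(rows)]

class Filter:  # keep tuples satisfying a predicate
    def __init__(self, pred, child): self.pred, self.child = pred, child
    def run(self, rows): return [r for r in self.child.run(rows) if self.pred(r)]

class Scan:
    def run(self, rows): return rows

def push_filter_below_map(expr):
    """Filter(Map(x)) -> Map(Filter(x)), valid when the predicate only reads input fields."""
    if isinstance(expr, Filter) and isinstance(expr.child, Map):
        return Map(expr.child.fn, Filter(expr.pred, expr.child.child))
    return expr

rows = [{"depth": d} for d in (100, 2500, 4000)]
simulate = lambda r: {**r, "pressure": r["depth"] * 0.1}       # stand-in for a costly activity
plan = Filter(lambda r: r["depth"] > 2000, Map(simulate, Scan()))
optimized = push_filter_below_map(plan)                         # runs simulate on fewer tuples
print(optimized.run(rows))
```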

Patent
01 Sep 2011
TL;DR: In this article, a role-based access control (RBAC) system is presented which simulates the organizational structure and workflow of a typical IT department to enable workflow management via the GUI for any component or function of a customer's virtual data center.
Abstract: Embodiments provide techniques for customers to easily, quickly and remotely manage their virtual data centers. Using, for example, a "single pane of glass" GUI view which shows all of the components (including e.g., machines (cpu and RAM), network services (load balancers, firewalls, network address translation, IP management) and storage) of their virtual data centers, provides a complete overview and a starting point for system or component management. According to embodiments, a Roles Based Access Control (RBAC) system is provided which simulates the organizational structure and workflow of a typical IT department to enable workflow management via the GUI for any component or function of a customer's virtual data center.
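A minimal sketch of the kind of role-based check such a system performs, with roles, permissions and users invented for illustration.

```python
# Toy RBAC check over virtual data center components (names are illustrative).
ROLE_PERMISSIONS = {
    "network_admin": {"firewall:edit", "load_balancer:edit", "vm:view"},
    "vm_operator":   {"vm:view", "vm:restart"},
    "auditor":       {"vm:view", "firewall:view"},
}
USER_ROLES = {"alice": {"network_admin"}, "bob": {"vm_operator", "auditor"}}

def can(user, permission):
    """Return True if any of the user's roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS[r] for r in USER_ROLES.get(user, ()))

print(can("alice", "firewall:edit"))   # True
print(can("bob", "firewall:edit"))     # False
```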

Journal ArticleDOI
TL;DR: The SADI design patterns significantly improve the ability of software to automatically discover appropriate services based on user-needs, and automatically chain these into complex analytical workflows, thus facilitating the intersection of Web services and Semantic Web technologies.
Abstract: The complexity and inter-related nature of biological data poses a difficult challenge for data and tool integration. There has been a proliferation of interoperability standards and projects over the past decade, none of which has been widely adopted by the bioinformatics community. Recent attempts have focused on the use of semantics to assist integration, and Semantic Web technologies are being welcomed by this community. SADI - Semantic Automated Discovery and Integration - is a lightweight set of fully standards-compliant Semantic Web service design patterns that simplify the publication of services of the type commonly found in bioinformatics and other scientific domains. Using Semantic Web technologies at every level of the Web services "stack", SADI services consume and produce instances of OWL Classes following a small number of very straightforward best-practices. In addition, we provide codebases that support these best-practices, and plug-in tools to popular developer and client software that dramatically simplify deployment of services by providers, and the discovery and utilization of those services by their consumers. SADI Services are fully compliant with, and utilize only foundational Web standards; are simple to create and maintain for service providers; and can be discovered and utilized in a very intuitive way by biologist end-users. In addition, the SADI design patterns significantly improve the ability of software to automatically discover appropriate services based on user-needs, and automatically chain these into complex analytical workflows. We show that, when resources are exposed through SADI, data compliant with a given ontological model can be automatically gathered, or generated, from these distributed, non-coordinating resources - a behaviour we have not observed in any other Semantic system. Finally, we show that, using SADI, data dynamically generated from Web services can be explored in a manner very similar to data housed in static triple-stores, thus facilitating the intersection of Web services and Semantic Web technologies.
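At its simplest, class-based chaining means a client can assemble a pipeline by matching the OWL class one service produces to the class the next one consumes. The sketch below uses made-up service names and class identifiers and plain dictionaries instead of OWL reasoning.

```python
# Simplified sketch of chaining services by consumed/produced class (names invented).
SERVICES = [
    {"name": "getSequence", "consumes": "ex:ProteinRecord", "produces": "ex:Sequence"},
    {"name": "runBlast",    "consumes": "ex:Sequence",      "produces": "ex:BlastReport"},
    {"name": "rankHits",    "consumes": "ex:BlastReport",   "produces": "ex:RankedHits"},
]

def plan_chain(start_class, goal_class):
    chain, current = [], start_class
    while current != goal_class:
        nxt = next((s for s in SERVICES if s["consumes"] == current), None)
        if nxt is None:
            return None                      # no service advertises this input class
        chain.append(nxt["name"]); current = nxt["produces"]
    return chain

print(plan_chain("ex:ProteinRecord", "ex:RankedHits"))
# ['getSequence', 'runBlast', 'rankHits']
```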

Journal ArticleDOI
TL;DR: An open-source VM system based on the Google Maps engine, which allows students and faculty to collaboratively create content and annotate slides with markers and is enhanced with social networking features to give the community of learners more control over the system, has been an effective solution to the challenges facing traditional histopathology laboratories and the novel needs of the revised curriculum.
Abstract: Curricular reform efforts and a desire to use novel educational strategies that foster student collaboration are challenging the traditional microscope-based teaching of histology. Computer-based histology teaching tools and Virtual Microscopes (VM), computer-based digital slide viewers, have been shown to be effective and efficient educational strategies. We developed an open-source VM system based on the Google Maps engine to transform our histology education and introduce new teaching methods. This VM allows students and faculty to collaboratively create content and annotate slides with markers, and it is enhanced with social networking features to give the community of learners more control over the system. We currently have 1,037 slides in our VM system comprised of 39,386,941 individual JPEG files that take up 349 gigabytes of server storage space. Of those slides, 682 are for general teaching and available to our students and the public; the remaining 355 slides are used for practical exams and have restricted access. The system has seen extensive use, with 289,352 unique slide views to date. Students viewed an average of 563 slides per month during the histology course and accessed the system at all hours of the day. Of the 621 annotations added to 126 slides, 26.2% were added by faculty and 73.8% by students. The use of the VM system reduced the amount of time faculty spent administering the course by 210 hours, but did not reduce the number of laboratory sessions or the number of required faculty. Laboratory sessions were reduced from three hours to two hours each due to the efficiencies in the workflow of the VM system. Our virtual microscope system has been an effective solution to the challenges facing traditional histopathology laboratories and the novel needs of our revised curriculum. The web-based system allowed us to empower learners to have greater control over their content, as well as the ability to work together in collaborative groups. The VM system saved faculty time, and there was no significant difference in student performance on an identical practical exam before and after its adoption. We have made the source code of our VM freely available and encourage use of the publicly available slides on our website.
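A rough calculation shows why a tiled, Google Maps-style viewer generates so many JPEG files per slide: each zoom level of the image pyramid is cut into fixed-size tiles. The 256-pixel tile size and slide dimensions below are assumptions, not the authors' exact pipeline.

```python
# Back-of-the-envelope tile-pyramid count for one scanned slide (illustrative).
import math

def tile_count(width_px, height_px, tile=256):
    total, w, h = 0, width_px, height_px
    while True:
        total += math.ceil(w / tile) * math.ceil(h / tile)
        if w <= tile and h <= tile:
            break
        w, h = math.ceil(w / 2), math.ceil(h / 2)   # next (coarser) zoom level
    return total

# A hypothetical 80,000 x 60,000 px slide yields on the order of 10^5 tiles.
print(tile_count(80_000, 60_000))
```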

Proceedings ArticleDOI
07 May 2011
TL;DR: In this paper, the authors present a new method for automating task and workflow design for high-level, complex tasks, which is recursive, recruiting workers from the crowd to help plan out how problems can be solved most effectively.
Abstract: Completing complex tasks on crowdsourcing platforms like Mechanical Turk currently requires significant up-front investment into task decomposition and workflow design. We present a new method for automating task and workflow design for high-level, complex tasks. Unlike previous approaches, our strategy is recursive, recruiting workers from the crowd to help plan out how problems can be solved most effectively. Our initial experiments suggest that this strategy can successfully create workflows to solve tasks considered difficult from an AI perspective, although it is highly sensitive to the design choices made by workers.
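The recursive structure can be sketched as a solve-or-split loop in which, in the real system, each decision, split and solution would be a crowdsourcing HIT; here the worker calls are faked so the control flow is visible.

```python
# Sketch of recursive crowd task decomposition (worker behaviour is simulated).
def fake_worker_decides(task):
    """Pretend a worker judges whether the task is simple enough to solve directly."""
    return len(task.split()) <= 4

def fake_worker_solves(task):
    return f"<solution to: {task}>"

def fake_worker_splits(task):
    words = task.split()
    mid = len(words) // 2
    return [" ".join(words[:mid]), " ".join(words[mid:])]

def solve(task, depth=0):
    if fake_worker_decides(task):
        return fake_worker_solves(task)
    return [solve(sub, depth + 1) for sub in fake_worker_splits(task)]

print(solve("plan a three day trip to Tokyo including food and museums"))
```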

Patent
19 Apr 2011
TL;DR: In this paper, a system and technique for displaying a document's workflow history are disclosed, which includes a graphical user interface for displaying one or more graphical representations of events generated by an application configured to edit a document.
Abstract: A system and technique for displaying a document's workflow history are disclosed. The system includes a graphical user interface for displaying one or more graphical representations of events generated by an application configured to edit a document. Each of the events generated by the application may be stored in a data structure that is associated with one or more portions of the document. The data structure may also be associated with a digital image that reflects the state of the document at the time the event was generated and one or more frames of digital video captured substantially simultaneously with the generation of the event. The system may display the stored events via graphical representations in the graphical user interface that represent a portion of the total document workflow history. A user may navigate through the graphical events based on a hierarchical algorithm for clustering events.
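A sketch of the kind of event record and grouping the patent implies: each event carries a timestamp, the document region it touches and the captured media, and events are clustered into history segments when the gap between them is large. All field names and the gap heuristic are illustrative.

```python
# Toy edit-event structure and time-gap clustering for a workflow-history view.
from dataclasses import dataclass, field
from typing import List

@dataclass
class EditEvent:
    timestamp: float                    # seconds since session start
    region: str                         # portion of the document the event touches
    snapshot: str                       # path to the image captured with the event
    video_frames: List[str] = field(default_factory=list)

def cluster_by_gap(events, max_gap=60.0):
    events = sorted(events, key=lambda e: e.timestamp)
    clusters, current = [], []
    for ev in events:
        if current and ev.timestamp - current[-1].timestamp > max_gap:
            clusters.append(current); current = []
        current.append(ev)
    if current:
        clusters.append(current)
    return clusters

history = [EditEvent(t, "page1", f"snap_{i}.png") for i, t in enumerate([0, 20, 400, 430])]
print([len(c) for c in cluster_by_gap(history)])   # [2, 2]
```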

Journal ArticleDOI
TL;DR: The paper summarizes the most advanced features of P-GRADE, such as parameter sweep workflow execution, multi-grid workflow execution and integration with the DSpace workflow repository, as well as introducing the second generation P-GRADE portal called WS-PGRADE.
Abstract: P-GRADE portal is one of the most widely used general-purpose grid portals in Europe. The paper summarizes the most advanced features of P-GRADE, such as parameter sweep workflow execution, multi-grid workflow execution and integration with the DSpace workflow repository. It also shows the NGS P-GRADE portal that extends P-GRADE with the GEMLCA legacy code execution support in Grid systems, as well as with coarse-grain workflow interoperability services. Next, the paper introduces the second generation P-GRADE portal called WS-PGRADE that merges the advanced features of the first generation P-GRADE portals and extends them with new workflow and architecture concepts. Finally, the application-specific science gateway of the CancerGrid project is briefly described to demonstrate that application-specific portals can easily be developed on top of the general-purpose WS-PGRADE portal.