
Proceedings ArticleDOI

A Conceptual Framework Supporting Pattern Design Selection for Scientific Workflow Applications in Cloud Computing.

01 Jan 2020 - pp. 229-236 - DOI: 10.5220/0008916102290236

TL;DR: A conceptual framework is proposed that recommends a suitable pattern design based on the quality requirements given by cloud consumers and the capabilities advertised by cloud providers, together with guidelines to assist a smooth migration of SWFA from other computation paradigms to cloud computing.

Abstract: Scientific Workflow Applications (SWFA) play a vital role for both service consumers and service providers in designing and implementing large and complex scientific processes. Previously, researchers used parallel and distributed computing technologies, such as utility and grid computing, to execute SWFAs, but these technologies provide limited utilization of shared resources. In contrast, the scalability and flexibility challenges of SWFA are better handled by cloud computing, since it can provision the large amounts of storage space and computing resources necessary for processing large and complex SWFAs. Workflow pattern design provides the facility of reusing previously developed workflow solutions, enabling developers to adopt them for the SWFA under consideration. Inspired by this, researchers have adopted several design patterns to better design SWFAs. An effective pattern design can anticipate challenges that would otherwise become visible only in the implementation stage of a SWFA. However, selecting the most effective pattern design in accordance with the execution method, data size, and problem complexity of a SWFA remains a challenging task. Motivated by this, we propose a conceptual framework that recommends a suitable pattern design based on the quality requirements given by cloud consumers and the capabilities advertised by cloud providers. Finally, we provide guidelines to assist a smooth migration of SWFA from other computation paradigms to cloud computing.

Topics: Workflow management system (61%), Cloud computing (57%), Grid computing (56%), Workflow (55%), Software design pattern (54%)

Summary (2 min read)

1 INTRODUCTION

  • The workflow has been defined by the Workflow Management Coalition (WfMC) as “The automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules” (Hollingsworth and Hampshire, 1993).
  • Researchers aim to provide methods for the design and realisation of scientific workflows, to set up computational environments that provide significant storage space and powerful computational resources, and, most importantly, to map the designed workflow components to existing computation environments.
  • Thus, the interpretation stage simplifies these complexities between the tasks by selecting the appropriate services/scripts, which depend on the selection of actual data resources.
  • Furthermore, a classification of potential workflow pattern designs is proposed to guide the framework in selecting the most appropriate workflow pattern design for the considered SWFA.

2 THE PROPOSED CONCEPTUAL FRAMEWORK

  • The utilization of cloud computing resources depends on the respective application requirements and also relates to the resources being consumed by the WFMS.
  • The developers design a SWFA that simplifies the process for scientists to reuse the same workflows and that provides an easy-to-use environment to track and share output results virtually.
  • The use cases depict different scenarios that may occur while the developer interacts with the cloud resources.
  • Thus, the authors define workflow pattern design as a selection of the right use cases in order to instantiate a specific workflow management scenario.
  • The authors ensure that different kinds of use cases are covered and select them so as to support the whole management of a SWFA.

4.1 Demand Cloud Resource

  • Cloud actors can request and acquire cloud resources at any time according to their requirements to execute a SWFA.
  • This quality constraint provides flexibility in terms of problem size, budget, and makespan.
  • It improves resource utilization by providing instant responsiveness to cloud user requests; a minimal sketch of such an on-demand request follows this list.
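The on-demand interaction can be pictured as a single provisioning call that carries the consumer's quality constraints. The Python sketch below assumes a hypothetical REST endpoint (cloud.example.com) and hypothetical field names; it illustrates the idea, not the API of any specific provider:

```python
import requests  # third-party HTTP client, assumed available

# Hypothetical provider endpoint; real clouds expose comparable "run instance" APIs.
PROVISION_URL = "https://cloud.example.com/api/v1/instances"

def demand_resource(cpus: int, memory_gb: int, budget: float, deadline_s: int) -> str:
    """Request a compute resource on demand, carrying the consumer's quality
    constraints (problem size, budget, makespan) so the provider can match
    a suitable resource."""
    response = requests.post(
        PROVISION_URL,
        json={
            "cpus": cpus,                # problem size
            "memory_gb": memory_gb,
            "max_cost": budget,          # budget constraint
            "deadline_s": deadline_s,    # makespan constraint
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["instance_id"]  # handle used later for task placement
```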

4.3 Legacy Applications

  • The WFMS integrates these heterogeneous types of components into a single application.
  • Legacy code can easily be executed on clouds using virtualization technology.

4.6 Scalability

  • This is one of the most important quality constraints, especially when dealing with large-size data and resources.
  • To efficiently deal with data-intensive and compute-intensive workflows, a graphical environment is required to fully utilize the capabilities of cloud resources.



A Conceptual Framework Supporting Pattern Design Selection for Scientific Workflow Applications in Cloud Computing

Ehab Nabiel Alkhanak¹ (https://orcid.org/0000-0002-9880-6365), Saif Ur Rehman Khan² (https://orcid.org/0000-0002-9643-6858), Alexander Verbraeck³ (https://orcid.org/0000-0002-1572-0997) and Hans van Lint¹ (https://orcid.org/0000-0003-1493-6750)

¹ Transport and Planning Department, Faculty of Civil Engineering and Geosciences (CiTG), Delft University of Technology (TU Delft), Delft, The Netherlands
² Department of Computer Science, COMSATS University Islamabad (CUI), Islamabad, Pakistan
³ Department of Multi-actor Systems, Faculty of Technology, Policy and Management (TPM), Delft University of Technology (TU Delft), Delft, The Netherlands

Keywords: Scientific Workflow Application, Workflow Management System, Design Patterns, Cloud Computing Environment.
Abstract: Scientific Workflow Applications (SWFA) play a vital role for both service consumers and service providers in designing and implementing large and complex scientific processes. Previously, researchers used parallel and distributed computing technologies, such as utility and grid computing, to execute SWFAs, but these technologies provide limited utilization of shared resources. In contrast, the scalability and flexibility challenges of SWFA are better handled by cloud computing, since it can provision the large amounts of storage space and computing resources necessary for processing large and complex SWFAs. Workflow pattern design provides the facility of reusing previously developed workflow solutions, enabling developers to adopt them for the SWFA under consideration. Inspired by this, researchers have adopted several design patterns to better design SWFAs. An effective pattern design can anticipate challenges that would otherwise become visible only in the implementation stage of a SWFA. However, selecting the most effective pattern design in accordance with the execution method, data size, and problem complexity of a SWFA remains a challenging task. Motivated by this, we propose a conceptual framework that recommends a suitable pattern design based on the quality requirements given by cloud consumers and the capabilities advertised by cloud providers. Finally, we provide guidelines to assist a smooth migration of SWFA from other computation paradigms to cloud computing.
1 INTRODUCTION
The workflow has been defined by the Workflow Management Coalition (WfMC) as “The automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules” (Hollingsworth and Hampshire, 1993). In this research context, a workflow can be defined as a series of processes, where each process represents one or several computational tasks based on their dependencies. These computational tasks can be any executable instances (e.g., load sets, report sets, programs, and data) with different structures (e.g., jobs, pipeline, data distribution, data aggregation, and data redistribution). Generally speaking, a Workflow Application (WFA) acts like software that automatically handles scientific or business-related jobs by creating and initiating these instances, while the Workflow Management Service offers a certain API to facilitate the management of these processes. Historically, researchers have emphasized workflow applications for the business domain (e.g., banking systems, transaction systems). Developers need to consider large-scale, fault-tolerant, complex, and maintainable scientific processes when they deal with scientific workflows. Some of the main application areas of Scientific Workflow Applications (SWFA) are Bioinformatics, Geoinformatics, Cheminformatics, Biomedical Informatics, and Astrophysics (Pan et al., 2019).
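To make the notions of tasks, dependencies, and structures concrete, the sketch below models a workflow as a small task graph and derives a dependency-respecting execution order. This is a minimal illustration under our own naming; it is not taken from the paper or from any particular WFMS API:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """One computational task in a workflow (e.g., a program or data job)."""
    name: str
    depends_on: list["Task"] = field(default_factory=list)

def topological_order(tasks: list[Task]) -> list[Task]:
    """Return an execution order that respects task dependencies."""
    ordered, seen = [], set()
    def visit(t: Task) -> None:
        if t.name in seen:
            return
        seen.add(t.name)
        for dep in t.depends_on:
            visit(dep)
        ordered.append(t)
    for t in tasks:
        visit(t)
    return ordered

# A tiny pipeline: load -> process -> (report, archive), i.e. data distribution.
load = Task("load")
process = Task("process", depends_on=[load])
report = Task("report", depends_on=[process])
archive = Task("archive", depends_on=[process])
print([t.name for t in topological_order([report, archive])])
# ['load', 'process', 'report', 'archive']
```

The same structure extends to the pipeline, data aggregation, and data redistribution shapes mentioned above by changing how tasks fan in and out.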

Scientific Workflow Management Systems (SWFMS) like the Pegasus SWFMS (Deelman et al., 2015) comprise a set of components (including a Mapper, a Local execution engine, a Job scheduler, and a Monitoring component) that focus on automating the processing of large-size data. A SWFMS provides a graphical environment that supports the reuse and integration of domain-related workflows. Ultimately, SWFMSs facilitate scientists in performing their domain-specific workflows by analysing given dataset(s) and visualizing the results of processing the dataset(s).
Previously, researchers have used parallel and grid computing resources to execute SWFAs (Dong, 2009; Juve et al., 2013). Today's SWFAs generate a large amount of data, and due to this huge increase in size and complexity, parallel and grid computing technologies are unable to fulfil the associated computational demands (i.e. vast amounts of storage space, strict completion deadlines, etc.), because these technologies are not scalable and cannot fully utilize the available resources.
Consequently, researchers aim to provide methods for the design and realisation of scientific workflows, to set up computational environments that provide significant storage space and powerful computational resources, and, most importantly, to map the designed workflow components to existing computation environments. Cloud computing resources provide optimal resource utilization that best matches the demands of scientists by providing on-demand, scalable, and flexible solutions for the considered SWFA (Juve and Deelman, 2011).
The workflow pattern design in cloud computing environments provides the facility of reusing previously developed workflow templates, which consequently enables developers to use them to ensure a better design of the SWFA. In this manuscript, the authors focus on the type level of workflow patterns and not on complete workflows, which would include the mapping of their tasks/components to actual scripts and services.

Fig. 1 depicts the three main stages of the SWFA pattern design lifecycle: (i) workflow design, (ii) designer interpretation, and (iii) cloud execution environment. In the workflow design stage, scientists use previous workflow patterns and thus effectively and systematically reuse knowledge for controlled experimentation (Pan et al., 2019). After that, in the design interpretation stage, the parameters and data resources are determined based on the experimentation requirements. Tasks could be realised by different services/scripts, which could operate on data sets of equivalent type but of different format and structure. The interpretation stage therefore simplifies these complexities between the tasks by selecting the appropriate services/scripts, which depend on the selection of actual data resources. Notice that the determined selections will subsequently be used in the execution stage of the pattern design lifecycle. Finally, the main purpose of the cloud execution environment stage is first to deploy the cloud computing environment that is able to execute the workflow. Based on the complete specification of the workflow collected from the previous two stages (workflow pattern selection, workflow authoring, development/selection of workflow components, etc.), the cloud resources consume and create the data of the SWFA for the desired execution. Because the parameter dependencies determined in the designer interpretation stage affect task execution, not all tasks can be executed concurrently in the cloud execution environment. The runtime execution requires a monitoring component that continuously monitors and informs the scientists about the current status of the SWFA execution. This monitoring functionality plays a crucial role in the successful execution of a SWFA. Overall, the pattern design stages consist of design, deployment, and provisioning, where the latter includes execution, monitoring, and adaptation.
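Read as software, the lifecycle is a three-stage pipeline with a monitoring hook in the execution stage. The sketch below is our own rendering of that flow; the stage names follow Fig. 1, while the function signatures, the placeholder selection policy, and the status callback are assumptions:

```python
from typing import Callable

def design(pattern_library: list[str]) -> str:
    """Stage 1: pick a previously developed workflow pattern to reuse."""
    return pattern_library[0]  # placeholder selection policy

def interpret(pattern: str, experiment: dict) -> dict:
    """Stage 2: bind parameters, data resources, and services/scripts."""
    return {"pattern": pattern, "params": experiment["params"],
            "scripts": experiment["scripts"]}

def execute(spec: dict, on_status: Callable[[str], None]) -> None:
    """Stage 3: deploy and run in the cloud, reporting status continuously."""
    on_status(f"deploying environment for {spec['pattern']}")
    for script in spec["scripts"]:   # parameter dependencies forbid full concurrency
        on_status(f"running {script}")
    on_status("done")

spec = interpret(design(["pipeline"]),
                 {"params": {"n": 10}, "scripts": ["load.sh", "analyse.sh"]})
execute(spec, on_status=print)  # the monitoring hook keeps the scientist informed
```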
In the literature, a number of workflow pattern designs have been proposed to better design the SWFA (Ramakrishnan et al., 2011; Genez et al., 2012; Wu et al., 2013b). However, the selection of the most efficient workflow pattern design remains a challenging task, as it requires considering several aspects. The main aspects are the user requirements, which include the makespan, reliability, deadline, and budget, while the other aspects relate to the particular execution method, data size, and problem complexity of a SWFA. In this paper, we propose a conceptual framework that suggests suitable workflow pattern designs based on the functional and non-functional requirements of service consumers and the quality constraints of service providers. The migration of SWFA from other computation paradigms to cloud computing requires considering several aspects and constraints. Thus, based on the literature, we provide a set of guidelines to assist this transformation so as to fully utilize the strengths of cloud computing. Furthermore, a classification of potential workflow pattern designs is proposed to guide the framework in selecting the most appropriate workflow pattern design for the considered SWFA.
The remainder of this paper is organized as follows: Section 2 presents our proposed conceptual framework. Section 3 provides a classification of reported pattern designs for SWFAs. A set of guidelines to shift SWFA from different computing environments to cloud computing is provided in Section 4. Section 5 discusses previously conducted relevant work. Finally, Section 6 concludes this paper and outlines a number of future research directions.

Figure 1: The Pattern Design Stages of the Scientific Workflow Applications Lifecycle.
2 THE PROPOSED CONCEPTUAL FRAMEWORK
The utilization of cloud computing resources depends on the respective application requirements and also relates to the resources being consumed by the WFMS. Thus, there is a need for a lightweight WFMS that still offers the right functionality to its users, and this can be achieved by selecting an appropriate workflow pattern design. Motivated by this, we present our proposed conceptual framework for selecting a workflow pattern design for SWFA in cloud computing. Figure 2 depicts the high-level layered architecture of our proposed conceptual framework. The proposed framework consists of four main layers: cloud actor, cloud scientific workflow application, cloud use cases, and pattern design. The first layer represents the various cloud actors, which can be categorized based on the specified roles of the SWFA users, their tasks, and their access rights. The main actors are: (i) Scientist: the SWFA provides interactive tools to help scientists better execute their own workflows and visualise the results in real time. (ii) System developer (or system administrator): responsible for designing the WFMS based on the requirements of service consumers. (iii) Service vendor and (iv) Resource owner: these are service providers (i.e. cloud service providers) that offer the SWFMS virtualised computational resources (i.e. private, public, or hybrid).
There are two types of interactions handled in the cloud environment using Application Programming Interfaces (APIs): (i) inter-interaction (i.e. the interaction between a cloud actor and the cloud SWFA), and (ii) intra-interaction (i.e. the interaction between one cloud actor and another cloud actor). An example is the inter-interaction between scientists and developers: the developers design a SWFA that simplifies the process for scientists to reuse the same workflows and provides an easy-to-use environment to track and share output results virtually.
The second layer manages the cloud scientific workflow application, and we distinguish two main types: (i) scientific applications and (ii) management applications. The scientific application provides a user-friendly interface that enables cloud actors to interact with a scientific workflow. The management application handles all management-related issues that may occur while executing the SWFAs, e.g., one application for the design of a SWFA and another for its execution. Note that developers continuously monitor the performance of the management application and perform maintenance-related tasks to avoid any service degradation; the system could also perform some adaptations to the SWFAs. The third layer represents cloud use cases. The use cases depict different scenarios that may occur while the developer interacts with the cloud resources. Therefore, in the context of cloud use cases, cloud computing offers different types of application paradigms. In the literature, cloud use cases are categorized into four classes based on user profiles: application use cases, delivery of services use cases, deployment of application use cases, and use cases for the interaction between the application and cloud services (Petcu, 2010).
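The four use-case classes cited from (Petcu, 2010) can be written down as a simple enumeration; the identifier names below are our own shorthand, not terminology from the source:

```python
from enum import Enum

class CloudUseCase(Enum):
    """The four cloud use-case classes of (Petcu, 2010), under assumed names."""
    APPLICATION = "application use cases"
    SERVICE_DELIVERY = "delivery of services use cases"
    APPLICATION_DEPLOYMENT = "deployment of application use cases"
    APP_SERVICE_INTERACTION = "interaction between application and cloud services"
```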
The fourth layer then encompasses pattern design.

The circles denote the cloud use cases (with different colours, where each colour maps to a different kind of cloud use case) and the arrows represent their dependencies. Thus, we could define a workflow pattern design as a selection of the right use cases in order to instantiate a specific workflow management scenario. In other words, a workflow pattern design results in a cluster of use cases on which specific dependencies are introduced. We ensure that different kinds of use cases are covered and select them so as to support the whole management of a SWFA.
The following constraints need to be considered in order to determine a suitable workflow pattern design for a SWFA in a cloud computing environment (a minimal sketch of this selection flow follows the list):
  • Identify the interaction between cloud actors and the SWFA. This mapping is based on the requirements and preferences such an actor would have; these requirements and preferences are bound to the profile of that actor, which is maintained by the proposed framework.
  • Identify the real mapping between cloud actors and cloud use cases.
  • Identify the required computational requirements based on the nature and type of the SWFA.
  • Identify the values for the parameters and the controlling mechanism based on the type of scheduling or optimization method.
  • Select the most suitable workflow pattern design from the classification list presented in the next section.
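A hedged sketch of how these constraints could drive a recommendation is shown below. The Requirements dataclass, the candidate table, and the subset-and-capacity matching rule are our own illustrative assumptions layered on the constraint list above, not the authors' implementation:

```python
from dataclasses import dataclass

@dataclass
class Requirements:
    """Consumer requirements and provider constraints, per the list above."""
    use_cases: set[str]      # actor-to-use-case mapping (shorthand names)
    data_size_gb: float      # nature/type of the SWFA
    budget: float            # quality constraints
    deadline_s: int

# Hypothetical candidates from the Section 3 classification: the use cases each
# one covers and a rough data-size capacity. These values are placeholders.
CANDIDATES = {
    "control flow": ({"application"}, 100.0),
    "workflow data": ({"application", "app-service interaction"}, 1000.0),
    "workflow resource": ({"service delivery", "application deployment"}, 500.0),
}

def recommend_pattern(req: Requirements) -> str:
    """Return the first pattern whose use cases cover the request and whose
    capacity fits the data size -- a stand-in for the framework's richer
    matching of consumer requirements to provider capabilities."""
    for name, (covered, max_gb) in CANDIDATES.items():
        if req.use_cases <= covered and req.data_size_gb <= max_gb:
            return name
    return "other scientific patterns"   # fourth category as fallback

print(recommend_pattern(Requirements({"application"}, 50.0, 200.0, 3600)))
# -> control flow
```

In the full framework, the matching would weigh the consumer's quality requirements against the capabilities advertised by providers rather than against a fixed candidate table.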
3 CLASSIFICATION OF SCIENTIFIC WORKFLOW APPLICATIONS (SWFA) PATTERN DESIGN
The current state of the art lacks the use of different types of pattern designs for SWFA. The majority of the work focuses on control flow and workflow data pattern designs only. However, (Kiepuszewski et al., 2003) reports that both control flow and workflow data pattern designs are difficult to adopt and follow for SWFA due to implicitly defined control flow relations. In contrast, there are several useful but ignored pattern designs that are effective for modelling and implementing existing SWFAs. We have analysed and stated a possible classification of SWFA pattern designs based on the reviewed papers. Fig. 3 presents this categorization by presenting two levels of categories of SWFA design patterns. Our classification identifies four main categories: (i) control flow patterns, (ii) workflow data patterns, (iii) workflow resource patterns, and (iv) other scientific patterns. Each category is further divided into several classes based on computational nature and use cases. It can be clearly seen that the control flow pattern design has a greater number of sub-categories than any other category.

Figure 2: High-Level Layered Architecture of the Proposed Framework.
4 GUIDELINES TO ASSIST THE MIGRATION OF SWFA FROM OTHER COMPUTATION PARADIGMS TO CLOUD COMPUTING
Migration from one computational paradigm to another is never an easy task. In order to fully utilize the strengths of cloud computing, developers need to consider the following quality constraints (Figure 4).

Citations

Journal ArticleDOI
TL;DR: This work proposes a novel SWFS cost optimization model that is effective in determining an approximate (near-optimal) solution within polynomial computational time, and that achieves an optimal job completion time and total computational cost for small and large sizes of the considered dataset.
Abstract: Scientific Workflow Applications (SWFAs) can deliver collaborative tools useful to researchers in executing large and complex scientific processes. Particularly, Scientific Workflow Scheduling (SWFS) accelerates the computational procedures between the available computational resources and the dependent workflow jobs based on the researchers’ requirements. However, cost optimization is one of the SWFS challenges in handling massive and complicated tasks and requires determining an approximate (near-optimal) solution within polynomial computational time. Motivated by this, the current work proposes a novel SWFS cost optimization model effective in solving this challenge. The proposed model contains three main stages: (i) scientific workflow application, (ii) targeted computational environment, and (iii) cost optimization criteria. The model has been used to optimize the completion time (makespan) and overall computational cost of SWFS in cloud computing for all scenarios considered in this research context. This will ultimately reduce the cost for service consumers. At the same time, reducing the cost has a positive impact on the profitability of service providers towards utilizing all computational resources to achieve a competitive advantage over other cloud service providers. To evaluate the effectiveness of the proposed model, an empirical comparison was conducted by employing three core types of heuristic approaches: Single-based (i.e., Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Invasive Weed Optimization (IWO)), Hybrid-based (i.e., Hybrid-based Heuristics Algorithms (HIWO)), and Hyper-based (i.e., Dynamic Hyper-Heuristic Algorithm (DHHA)). Additionally, a simulation-based implementation was used for the SIPHT SWFA by considering three different sizes of datasets. The proposed model provides an efficient platform to optimally schedule workflow tasks by handling the data-intensiveness and computational-intensiveness of SWFAs. The results reveal that the proposed cost optimization model attained an optimal job completion time (makespan) and total computational cost for small and large sizes of the considered dataset. In contrast, hybrid- and hyper-based approaches consistently achieved better results for the medium-sized dataset.

4 citations


References

Proceedings ArticleDOI
20 Apr 2010
TL;DR: This paper presents a particle swarm optimization (PSO) based heuristic to schedule applications to cloud resources that takes into account both computation cost and data transmission cost, and shows that PSO can achieve as much as 3 times cost savings as compared to BRS.
Abstract: Cloud computing environments facilitate applications by providing virtualized resources that can be provisioned dynamically. However, users are charged on a pay-per-use basis. User applications may incur large data retrieval and execution costs when they are scheduled taking into account only the ‘execution time’. In addition to optimizing execution time, the cost arising from data transfers between resources as well as execution costs must also be taken into account. In this paper, we present a particle swarm optimization (PSO) based heuristic to schedule applications to cloud resources that takes into account both computation cost and data transmission cost. We experiment with a workflow application by varying its computation and communication costs. We compare the cost savings when using PSO and existing ‘Best Resource Selection’ (BRS) algorithm. Our results show that PSO can achieve: a) as much as 3 times cost savings as compared to BRS, and b) good distribution of workload onto resources.
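The cost model this heuristic optimizes is easy to sketch. In the toy code below, the fitness (computation cost plus the data-transmission cost paid when dependent tasks land on different resources) follows the abstract's description, but the search is a plain random-restart stand-in for PSO's velocity and position updates, and all numbers are invented:

```python
import random

# Toy inputs: COST[t][r] = execution cost of task t on resource r;
# TRANSFER[(t, u)] = transmission cost between tasks t and u, paid only
# when they are mapped to different resources. Values are made up.
COST = [[4.0, 6.0], [5.0, 3.0], [2.0, 7.0]]
TRANSFER = {(0, 1): 2.0, (1, 2): 1.5}

def total_cost(mapping: list[int]) -> float:
    """Computation cost plus data-transmission cost of a task->resource mapping."""
    compute = sum(COST[t][r] for t, r in enumerate(mapping))
    transfer = sum(v for (t, u), v in TRANSFER.items() if mapping[t] != mapping[u])
    return compute + transfer

# Random-restart search standing in for PSO's swarm update; it shows only
# what the fitness function rewards, not the actual PSO dynamics.
best = min((tuple(random.randrange(2) for _ in range(3)) for _ in range(200)),
           key=lambda m: total_cost(list(m)))
print(best, total_cost(list(best)))
```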

787 citations


"A Conceptual Framework Supporting P..." refers background in this paper

  • Table excerpt (Reference | Pattern Design | Cloud Actor):
    … 2009) | Control flow | Cloud/Grid provider
    (Saeid Abrishami, 2013) | Control flow | Cloud/Grid provider
    (Lin and Lu, 2011) | Control flow | Cloud/Grid provider
    (Pandey et al., 2010) | Control flow | Cloud/Grid provider
    (Tanaka and Tatebe, 2012) | Control flow | Cloud/Grid provider
    (Bittencourt and Madeira, 2011) | Control flow | Cloud/Grid provider …


Journal ArticleDOI
01 Sep 2005
TL;DR: A taxonomy is proposed that characterizes and classifies various approaches for building and executing workflows on Grids; it not only highlights the design and engineering similarities and differences of state-of-the-art Grid workflow systems, but also identifies the areas that need further research.
Abstract: With the advent of Grid and application technologies, scientists and engineers are building more and more complex applications to manage and process large data sets, and execute scientific experiments on distributed resources. Such application scenarios require means for composing and executing complex workflows. Therefore, many efforts have been made towards the development of workflow management systems for Grid computing. In this paper, we propose a taxonomy that characterizes and classifies various approaches for building and executing workflows on Grids. The taxonomy not only highlights the design and engineering similarities and differences of state-of-the-art in Grid workflow systems, but also identifies the areas that need further research.

576 citations


"A Conceptual Framework Supporting P..." refers background or methods in this paper

  • ...A decade ago, (Yu and Buyya, 2005) propose taxonomy by considering four aspects: (i) workflow design, (ii) workflow scheduling, (iii) fault tolerance, and (iv) data movement to evaluate scientific WFMSs in terms of Grid computing, however, this classification is not suitable for selecting a most…


Journal ArticleDOI
TL;DR: An integrated view of the Pegasus system is provided, showing its capabilities that have been developed over time in response to application needs and to the evolution of the scientific computing platforms.
Abstract: Modern science often requires the execution of large-scale, multi-stage simulation and data analysis pipelines to enable the study of complex systems. The amount of computation and data involved in these pipelines requires scalable workflow management systems that are able to reliably and efficiently coordinate and automate data movement and task execution on distributed computational resources: campus clusters, national cyberinfrastructures, and commercial and academic clouds. This paper describes the design, development and evolution of the Pegasus Workflow Management System, which maps abstract workflow descriptions onto distributed computing infrastructures. Pegasus has been used for more than twelve years by scientists in a wide variety of domains, including astronomy, seismology, bioinformatics, physics and others. This paper provides an integrated view of the Pegasus system, showing its capabilities that have been developed over time in response to application needs and to the evolution of the scientific computing platforms. The paper describes how Pegasus achieves reliable, scalable workflow execution across a wide variety of computing infrastructures. Highlights: comprehensive description of the Pegasus Workflow Management System; detailed explanation of Pegasus workflow transformations; data management in Pegasus; earthquake science application example.

556 citations


"A Conceptual Framework Supporting P..." refers background or methods in this paper

  • ...SWFMS provides a graphical environment that supports the reuse and integration of domain-related workflows....
  • ...Ultimately, SWFMS facilitate scientists in performing their domain-specific workflows by analysing given dataset(s) and visualizing the result of the processing of the dataset(s)....
  • ...On the other hand, a general interaction pattern for service providers is recommended by (Deelman et al., 2015)....
  • ...Scientific Workflow Management Systems (SWFMS) like Pegasus SWFMS (Deelman et al., 2015) comprise a set of components (including Mapper, Local execution engine, Job scheduler, Monitoring component) that focus on automating the processes of large-size data....


Journal ArticleDOI
TL;DR: Two workflow scheduling algorithms are proposed which aim to minimize the workflow execution cost while meeting a deadline, and which have a polynomial time complexity that makes them suitable options for scheduling large workflows in IaaS Clouds.
Abstract: The advent of Cloud computing as a new model of service provisioning in distributed systems encourages researchers to investigate its benefits and drawbacks on executing scientific applications such as workflows. One of the most challenging problems in Clouds is workflow scheduling, i.e., the problem of satisfying the QoS requirements of the user as well as minimizing the cost of workflow execution. We have previously designed and analyzed a two-phase scheduling algorithm for utility Grids, called Partial Critical Paths (PCP), which aims to minimize the cost of workflow execution while meeting a user-defined deadline. However, we believe Clouds are different from utility Grids in three ways: on-demand resource provisioning, homogeneous networks, and the pay-as-you-go pricing model. In this paper, we adapt the PCP algorithm for the Cloud environment and propose two workflow scheduling algorithms: a one-phase algorithm which is called IaaS Cloud Partial Critical Paths (IC-PCP), and a two-phase algorithm which is called IaaS Cloud Partial Critical Paths with Deadline Distribution (IC-PCPD2). Both algorithms have a polynomial time complexity, which makes them suitable options for scheduling large workflows. The simulation results show that both algorithms have a promising performance, with IC-PCP performing better than IC-PCPD2 in most cases. Highlights: We propose two workflow scheduling algorithms for IaaS Clouds. The algorithms aim to minimize the workflow execution cost while meeting a deadline. The pricing model of the Clouds is considered, which is based on a time interval. The algorithms are compared with a list heuristic through simulation. The experiments show the promising performance of both algorithms.

497 citations


Journal ArticleDOI
TL;DR: The hierarchical scheduling strategy is implemented in the SwinDeW-C cloud workflow system and demonstrates satisfactory performance; the experimental results show that the overall performance of the ACO-based scheduling algorithm is better than the others on three basic measurements: the optimisation rate on makespan, the optimisation rate on cost, and the CPU time.
Abstract: A cloud workflow system is a type of platform service which facilitates the automation of distributed applications based on the novel cloud infrastructure. One of the most important aspects which differentiate a cloud workflow system from its other counterparts is the market-oriented business model. This is a significant innovation which brings many challenges to conventional workflow scheduling strategies. To investigate such an issue, this paper proposes a market-oriented hierarchical scheduling strategy in cloud workflow systems. Specifically, the service-level scheduling deals with the Task-to-Service assignment where tasks of individual workflow instances are mapped to cloud services in the global cloud markets based on their functional and non-functional QoS requirements; the task-level scheduling deals with the optimisation of the Task-to-VM (virtual machine) assignment in local cloud data centres where the overall running cost of cloud workflow systems will be minimised given the satisfaction of QoS constraints for individual tasks. Based on our hierarchical scheduling strategy, a package based random scheduling algorithm is presented as the candidate service-level scheduling algorithm and three representative metaheuristic based scheduling algorithms including genetic algorithm (GA), ant colony optimisation (ACO), and particle swarm optimisation (PSO) are adapted, implemented and analysed as the candidate task-level scheduling algorithms. The hierarchical scheduling strategy is being implemented in our SwinDeW-C cloud workflow system and demonstrating satisfactory performance. Meanwhile, the experimental results show that the overall performance of ACO based scheduling algorithm is better than others on three basic measurements: the optimisation rate on makespan, the optimisation rate on cost and the CPU time.

261 citations


"A Conceptual Framework Supporting P..." refers background or methods in this paper

  • ...In the literature, a number of workflow pattern designs have been proposed to better design the SWFA (Ramakrishnan et al., 2011; Genez et al., 2012; Wu et al., 2013b)....
  • Table excerpt (Reference | Pattern Design | Cloud Actor):
    (Wu et al., 2013b) | Control flow | Service consumer
    (Abrishami and Naghibzadeh, 2012) | Control flow | Service consumer
    (Ramakrishnan et al., 2011) | Control flow | Cloud/Grid provider
    (Nargunam and Shajin, 2012) | Control flow | Cloud/Grid provider
    (Wu et al., 2013a) | Control flow | Cloud/Grid provider
    (Ostermann et al., 2010) | Control flow | Cloud/Grid provider
    (Liu et al., 2011) | Control flow | Cloud/Grid provider
    (Yang et al., 2008) | Control flow | Service consumer
    (Genez et al., 2012) | Control flow | Service consumer
    (Xu et al., 2009) | Control flow | Cloud/Grid provider
    (Saeid Abrishami, 2013) | Control flow | Cloud/Grid provider
    (Lin and Lu, 2011) | Control flow | Cloud/Grid provider
    (Pandey et al., 2010) | Control flow | Cloud/Grid provider
    (Tanaka and Tatebe, 2012) | Control flow | Cloud/Grid provider
    (Bittencourt and Madeira, 2011) | Control flow | Cloud/Grid provider


Frequently Asked Questions (2)
Q1. What are the contributions mentioned in the paper "A conceptual framework supporting pattern design selection for scientific workflow applications in cloud computing"?

Previously, researchers used parallel and distributed computing technologies, such as utility and grid computing, to execute SWFAs, but these technologies provide limited utilization of shared resources. Workflow pattern design provides the facility of reusing previously developed workflow solutions, enabling developers to adopt them for the SWFA under consideration. Motivated by this, the authors have proposed a conceptual framework that recommends a suitable pattern design based on the quality requirements given by cloud consumers and the capabilities advertised by cloud providers.

In the future, other types of workflow applications (e.g., business workflows) would be considered in a cloud computing environment. Furthermore, the authors plan to make a connection between their conceptual framework and a WFMS by identifying the traditional architecture of a WFMS and which conceptual elements are exploited by which WFMS components; such a showcase can be exploited across the whole scientific workflow (management) lifecycle.