
Showing papers on "Scientific workflow system" published in 2017


Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper presents MONAD, a self-adaptive micro-service infrastructure for heterogeneous scientific workflows that improves the flexibility of workflow composition and execution and enables fine-grained, task-level scheduling that accounts for task sharing across different workflows.
Abstract: Scientific workflows have become a popular computational model in a variety of application domains, such as astronomy, materials science, physics, and biology. As scientific applications move to the cloud to take advantage of the elasticity and service level agreements of cloud resources, there have been a number of recent research efforts on cloud-based workflow systems that support various types of performance guarantees under resource cost constraints. However, most of the related work requires advance knowledge of workflow structures to perform scheduling and resource optimization. In addition, existing workflow systems usually take a monolithic approach to workflow implementation and execution, which makes them inefficient in dealing with heterogeneous types of workflows. In this paper, we present MONAD, a self-adaptive micro-service infrastructure for heterogeneous scientific workflows. Specifically, our micro-service architecture helps improve the flexibility of workflow composition and execution, and enables fine-grained scheduling at the task level, considering task sharing across different workflows. In addition, we employ a feedback control approach with artificial neural network-based system identification to provide resource adaptation without any advance knowledge of workflow structures. Our evaluation on multiple realistic heterogeneous workflows demonstrates that our system is robust and efficient in dealing with dynamic scientific workloads.

24 citations


Journal ArticleDOI
TL;DR: It is argued that scientific applications traditionally considered as representing typical HPC workloads can be successfully and efficiently ported to a cloud infrastructure and constitute a valuable step toward a wider adoption of cloud infrastructures for computational science applications.

13 citations


Journal ArticleDOI
01 Jan 2017
TL;DR: This paper proposes a framework for facilitating the reproducibility of scientific workflows at the task level by giving scientists complete control over the execution environments of the tasks in their workflows and integrating execution environment specifications into scientific workflow systems.
Abstract: Scientific workflows are designed to solve complex scientific problems and accelerate scientific progress. Ideally, scientific workflows should improve the reproducibility of scientific applications by making it easier to share and reuse workflows between scientists. However, scientists often find it difficult to reuse others’ workflows, which is known as workflow decay. In this paper, we explore the challenges in reproducing scientific workflows, and propose a framework for facilitating the reproducibility of scientific workflows at the task level by giving scientists complete control over the execution environments of the tasks in their workflows and integrating execution environment specifications into scientific workflow systems. Our framework allows dependencies to be archived in basic units of OS image, software and data instead of gigantic all-in-one images. We implement a prototype of our framework by integrating Umbrella, an execution environment creator, into Makeflow, a scientific workflow system. To evaluate our framework, we use it to run two bioinformatics scientific workflows, BLAST and BWA. The execution environment of the tasks in each workflow is specified as an Umbrella specification file, and sent to execution nodes where Umbrella is used to create the specified environment for running the tasks. For each workflow we evaluate the size of the Umbrella specification file, the time and space overheads of creating execution environments using Umbrella, and the heterogeneity of execution nodes contributing to each workflow. The evaluation results show that our framework improves the utilization of heterogeneous computing resources, and improves the portability and reproducibility of scientific workflows.

13 citations


Journal ArticleDOI
01 Jan 2017-Dyna
TL;DR: This work presents a numerical approach for the automatic adjustment of parameters of the Mazars Damage Model, applied to thermo-hydro-mechanical modeling of concrete structures and based on a Scientific Workflow System that improves efficiency and makes the strategy easier than manual procedures.
Abstract: This work presents a numerical approach for the automatic adjustment of parameters of the Mazars Damage Model, applied to thermo-hydro-mechanical modeling of concrete structures. The procedure is based on a Scientific Workflow System (SWS) that addresses the combinatorial universe of adjustable parameters by minimizing the number of simulations required for optimized results. Not only does the SWS improve efficiency, but it also makes the strategy easier than manual procedures. The adopted algorithm is developed in an intuitive scripting language and employs a distributed computational environment. Comparison with experimental data indicates that the proposed methodology was efficient and effective in improving the analysis, minimizing errors and saving processing time.

2 citations


Journal ArticleDOI
TL;DR: This study shows the potential of RFlow to serve as the primary integration platform for legacy R scripts, with implications for other data- and compute-intensive agronomic projects.
Abstract: Reproducibility is a major feature of science. Even agronomic research of exemplary quality may have irreproducible empirical findings because of random or systematic error. The ability to reproduce agronomic experiments based on statistical data and legacy scripts is not easily achieved. We propose RFlow, a tool that helps researchers manage, share, and enact scientific experiments that encapsulate legacy R scripts. RFlow transparently captures the provenance of scripts and endows experiments with reproducibility. Unlike existing computational approaches, RFlow is non-intrusive: it does not require users to change the way they work, and it wraps agronomic experiments in a scientific workflow system. Our computational experiments show that the tool can collect different types of provenance metadata from real experiments and enrich agronomic data with provenance metadata. This study shows the potential of RFlow to serve as the primary integration platform for legacy R scripts, with implications for other data- and compute-intensive agronomic projects.

1 citation


Proceedings ArticleDOI
01 Nov 2017
TL;DR: The integrated VisTrails-MATLAB system supports reproducible computing with truly prospective and retrospective provenance at multiple granularity levels as scientists choose for their scripts, and at the same time, is very easy to use for big data computing and analytics.
Abstract: Reproducible computing and research are of great importance for scientific investigation in any discipline. This paper presents a general approach to provenance in the context of workflows for widely used script languages. Our solution is based on system integration, and is demonstrated by integrating MATLAB with VisTrails, an open source scientific workflow system. The integrated VisTrails-MATLAB system supports reproducible computing with truly prospective and retrospective provenance at multiple granularity levels as scientists choose for their scripts, and at the same time, is very easy to use for big data computing and analytics.

Journal ArticleDOI
TL;DR: This paper reviews the motivation for the active service approach, describes the architecture of the scientific workflow system, and discusses the technologies used in active service, which relies on digital archive technology to enable multi-service organizations to achieve scientific workflow goals.
Abstract: This paper describes the architecture of an active service model for a scientific workflow system. Digital archive technologies make available all the resources needed to construct services for a specific group. The paper reviews the motivation for the active service approach, describes the architecture of the scientific workflow system, and discusses the technologies used in active service, which relies on digital archive technology to enable multi-service organizations to achieve scientific workflow goals.