
Showing papers on "Scientific workflow system" published in 2013


01 Jan 2013
TL;DR: The Karabo software framework allows simple integration and adaptation to changing control requirements and the addition of new scientific analysis algorithms, making them automatically and immediately available to experimentalists.
Abstract: The expected very high data rates and volumes at the European XFEL [1] demand an efficient, concurrent approach to performing experiments. Data analysis must start whilst data is still being acquired, and initial analysis results must be immediately usable to re-adjust the current experiment setup. We have developed a software framework, called Karabo, which allows such a tight integration of these tasks (see Fig. 1). Karabo is in essence a pluggable, distributed application management system. All Karabo applications (called devices) have a standardized interface for self-description/configuration, program-flow organization (state machine), logging and communication. Central services exist for user management, access control, data logging, configuration management, etc. The design provides a very scalable yet maintainable system that can act both as a fully-fledged control system and as a highly parallel, distributed scientific workflow system. It allows simple integration and adaptation to changing control requirements and the addition of new scientific analysis algorithms, making them automatically and immediately available to experimentalists. Figure 1: A homogeneous software framework.
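
The abstract does not show Karabo's actual API; as a rough illustration of the device idea, here is a minimal Python sketch of a pluggable component with a self-description interface and a simple state machine (all class and method names are hypothetical, not Karabo's):

```python
from enum import Enum, auto


class State(Enum):
    """Simplified program-flow states, analogous to a device state machine."""
    INIT = auto()
    READY = auto()
    ACQUIRING = auto()


class Device:
    """Hypothetical base class: every device self-describes its
    configuration and exposes a uniform state machine."""

    def __init__(self, device_id: str):
        self.device_id = device_id
        self.state = State.INIT
        self.config = {}

    def describe(self) -> dict:
        """Self-description: report id, state, and current configuration."""
        return {"id": self.device_id,
                "state": self.state.name,
                "config": dict(self.config)}

    def configure(self, **params) -> None:
        self.config.update(params)
        self.state = State.READY


class CameraDevice(Device):
    """Plugging in a new device type only requires subclassing the base."""

    def acquire(self):
        assert self.state is State.READY, "configure() first"
        self.state = State.ACQUIRING
        # ... hardware readout would happen here ...
        self.state = State.READY


cam = CameraDevice("xfel/cam/1")
cam.configure(exposure_ms=5, gain=2)
print(cam.describe())
```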

35 citations


Journal ArticleDOI
01 Sep 2013
TL;DR: This paper presents the main ingredients of Model-as-you-go, shows how existing workflow concepts have to be extended to cover the requirements of scientists, discusses the application of the concepts to BPEL, and introduces the current prototype of the system.
Abstract: Most existing scientific workflow systems rely on proprietary concepts and workflow languages. We are convinced that the conventional workflow technology that has been established in business scenarios for years is also beneficial for scientists and scientific applications. We are therefore working on a scientific workflow system based on business workflow concepts and technologies. The system offers advanced flexibility features to scientists in order to support them in creating workflows in an explorative manner and to increase the robustness of scientific applications. We named the approach Model-as-you-go because it enables users to model and execute workflows in an iterative process that eventually results in a complete scientific workflow. In this paper, we present the main ingredients of Model-as-you-go, show how existing workflow concepts have to be extended in order to cover the requirements of scientists, discuss the application of the concepts to BPEL, and introduce the current prototype of the system.
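
To make the Model-as-you-go idea concrete, the following minimal Python sketch shows a workflow that can be extended and re-run iteratively, resuming from where it last stopped. It is a toy under stated assumptions, not the BPEL-based prototype the paper describes:

```python
class ExplorativeWorkflow:
    """Minimal sketch: a workflow whose model can be extended while
    execution is suspended, then resumed from the next unexecuted step."""

    def __init__(self):
        self.steps = []    # ordered activities, loosely like a BPEL <sequence>
        self.cursor = 0    # index of the next step to run

    def add_step(self, name, func):
        self.steps.append((name, func))

    def run(self):
        """Execute all steps added so far; may be called repeatedly."""
        while self.cursor < len(self.steps):
            name, func = self.steps[self.cursor]
            print(f"running {name}")
            func()
            self.cursor += 1


wf = ExplorativeWorkflow()
wf.add_step("prepare-input", lambda: None)
wf.run()                       # first iteration: one step

# After inspecting intermediate results, the scientist models on:
wf.add_step("run-simulation", lambda: None)
wf.add_step("visualize", lambda: None)
wf.run()                       # resumes with the newly added steps
```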

27 citations


Journal ArticleDOI
TL;DR: This paper presents a coupled modeling system that includes the proposed methodology to create self-describing models with common model component interfaces and shows that the coupled atmosphere-ocean model is able to reproduce the Mediterranean Sea surface temperature when compared with the CCSM3 initial and boundary conditions used.
Abstract: The complexity of Earth system models and their applications is increasing as a consequence of scientific advances, user demand, and the ongoing development of computing platforms, storage systems and distributed high-resolution observation networks. Multi-component Earth system models need to be redesigned to make interactions among model components and other applications external to the modeling system easier. To that end, the common component interfaces of Earth system models can be redesigned to increase interoperability between models and other applications such as various web services, data portals and science gateways. The models can be made self-describing so that the many configuration and build options and the inputs of a simulation can be recorded. In this paper, we present a coupled modeling system that includes the proposed methodology to create self-describing models with common model component interfaces. The designed coupled atmosphere-ocean modeling system is also integrated into a scientific workflow system to simplify routine modeling tasks and the relationships between these tasks, and to demonstrate the enhanced interoperability between different technologies and components. The work environment is then tested using a realistic Earth system modeling application. As can be seen through this example, a layered design for collecting provenance and metadata has the added benefit of documenting a run in far greater detail than before. In this way, it facilitates exploration and understanding of simulations and supports reproducibility. In addition to designing self-describing Earth system models, the regular modeling tasks are also simplified and automated by using a scientific workflow which provides meaningful abstractions for the model, computing environment and provenance/metadata collection mechanisms. Our aim here is to solve a specific instance of a complex model integration problem by using a framework and a scientific workflow approach together. The methods presented in this paper might also be generalized to other types of Earth system models, leading to improved ease of use and flexibility. The initial results also show that the coupled atmosphere-ocean model, which is controlled by the designed workflow environment, is able to reproduce the Mediterranean Sea surface temperature when compared with the CCSM3 initial and boundary conditions used.
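
As a hedged illustration of what "self-describing" could mean in practice, this Python sketch records a component's configuration, build options, and inputs for each run so the run is documented and potentially reproducible. The field names are invented for illustration and do not reflect the paper's actual interfaces:

```python
import json
import time


class SelfDescribingComponent:
    """Sketch of a model component that records its configuration,
    build options, and per-run inputs as provenance metadata."""

    def __init__(self, name, version, build_options):
        self.name = name
        self.version = version
        self.build_options = build_options
        self.runs = []  # provenance records, one per simulation

    def run(self, inputs: dict):
        record = {
            "component": self.name,
            "version": self.version,
            "build_options": self.build_options,
            "inputs": inputs,
            "started": time.strftime("%Y-%m-%dT%H:%M:%S"),
        }
        # ... the actual model time-stepping would execute here ...
        self.runs.append(record)
        return record


ocean = SelfDescribingComponent("ocean", "1.2",
                                {"resolution": "1/12deg", "mpi": True})
ocean.run({"initial_conditions": "CCSM3", "domain": "Mediterranean"})
print(json.dumps(ocean.runs, indent=2))
```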

26 citations


Posted Content
TL;DR: This paper describes how VisTrails has developed and how the project's efforts in structuring and advertising the system have contributed to its adoption in many domains.
Abstract: With the increasing amount of data and use of computation in science, software has become an important component in many different domains. Computing is now being used more often and in more aspects of scientific work, including data acquisition, simulation, analysis, and visualization. To ensure reproducibility, it is important to capture the different computational processes used as well as their executions. VisTrails is an open-source scientific workflow system for data analysis and visualization that seeks to address the problem of integrating varied tools as well as automatically documenting the methods and parameters employed. Growing from a specific project need to supporting a wide array of users required close collaboration, as well as new research ideas, to design a usable and efficient system. The VisTrails project now includes standard software processes like unit testing and developer documentation while serving as a base for further research. In this paper, we describe how VisTrails has developed and how our efforts in structuring and advertising the system have contributed to its adoption in many domains.
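
The kind of automatic documentation described above can be illustrated with a small Python sketch that logs each step's function name and parameters. This mimics the spirit of provenance capture only; it is not the actual VisTrails API:

```python
import functools
import json

PROVENANCE = []  # in-memory log of executed steps and their parameters


def documented(func):
    """Record each call's function name and parameters before running it,
    so the methods and parameters employed are documented automatically."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        PROVENANCE.append({"step": func.__name__,
                           "args": [repr(a) for a in args],
                           "kwargs": {k: repr(v) for k, v in kwargs.items()}})
        return func(*args, **kwargs)
    return wrapper


@documented
def smooth(data, window=3):
    """Toy analysis step: trailing moving average."""
    return [sum(data[max(0, i - window):i + 1]) /
            len(data[max(0, i - window):i + 1]) for i in range(len(data))]


smooth([1.0, 4.0, 2.0, 8.0], window=2)
print(json.dumps(PROVENANCE, indent=2))
```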

16 citations


Journal Article
TL;DR: This paper describes the computational challenges in species distribution modeling and solutions using scientific workflow systems, focusing on the Software for Assisted Species Modeling (SAHM), a package within VisTrails, an open-source scientific workflow system.
Abstract: An important component in the fields of ecology and conservation biology is understanding the environmental conditions and geographic areas that are suitable for a given species to inhabit. A common tool for determining such areas is species distribution modeling, which uses computer algorithms to determine the spatial distribution of organisms. Most commonly, the correlative relationships between the organism and environmental variables are the primary consideration. The data requirements for this type of modeling consist of known presence, and possibly absence, locations of the species as well as the values of environmental or climatic covariates thought to define the species' habitat suitability at these locations. These covariate data are generally extracted from remotely sensed imagery, interpolated/gridded historical climate data, or downscaled climate model output. Traditionally, ecologists and biologists have constructed species distribution models using workflows and data that reside primarily on their local workstations or networks. This workflow is becoming challenging as scientists increasingly try to use these modeling techniques to inform management decisions under different climate change scenarios. This challenge stems from the fact that remote sensing products, gridded historical climate data, and downscaled climate models are not only increasing in spatial and temporal resolution but proliferating as well. Any rigorous assessment of uncertainty requires a computationally intensive sensitivity analysis accounting for various sources of uncertainty. The scientists fitting these models generally do not have the background in computer science required to take advantage of recent advances in web-service-based data acquisition, remote high-powered data processing, or scientific workflow systems. Ecologists in the field of modeling need a tractable platform that abstracts the inherent computational complexity required to incorporate the burgeoning field of coupled climate and ecological response modeling. In this paper we describe the computational challenges in species distribution modeling and solutions using scientific workflow systems. We focus on the Software for Assisted Species Modeling (SAHM), a package within VisTrails, an open-source scientific workflow system.
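
To make the modeling setup concrete, here is a minimal Python sketch of a correlative species distribution model: presence/absence locations paired with environmental covariates, fitted with logistic regression via scikit-learn. The covariate values and the choice of classifier are illustrative assumptions, not SAHM's actual implementation:

```python
# Correlative SDM sketch: real workflows extract covariates from rasters
# (remote sensing, gridded climate); here the values are made up.
from sklearn.linear_model import LogisticRegression

# rows: [mean_annual_temp_C, annual_precip_mm]; labels: 1=presence, 0=absence
X = [[12.1, 800], [13.0, 950], [11.5, 700], [4.0, 300],
     [5.2, 250], [14.2, 1020], [3.1, 280], [12.8, 870]]
y = [1, 1, 1, 0, 0, 1, 0, 1]

model = LogisticRegression().fit(X, y)

# Habitat suitability (probability of presence) at a new location:
print(model.predict_proba([[10.0, 600]])[0][1])
```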

6 citations


01 Jan 2013
TL;DR: A sensitivity analysis of a river system model is performed using new and emergent technologies, and the merits of four methodologies are discussed; the cluster-based method was the most complex to configure but provided very fast runtimes and automated input and output marshalling and cluster job creation and submission.
Abstract: Recent water reforms in Australia and the release of the Murray-Darling Basin Plan have been supported by climate models and detailed hydrological modelling, including river system models. Sensitivity analysis of these river system models provides valuable insights into the often complex and non-linear relationships between uncertainty in input variables and parameters and model outputs. An understanding of these relationships is an important component of assessing the risks in the planning process. However, a comprehensive sensitivity analysis is computationally intensive, requiring many thousands of simulations to examine a few parameters, and may require months of computer time to complete. In this paper we consider a sensitivity analysis of a river system model using new and emergent technologies and discuss the merits of four methodologies for undertaking this analysis. In each case some new tools and techniques have been developed, and these are applicable to sensitivity, uncertainty and error analysis of other simulation models. The Murray-Darling Basin is represented by a range of regional river models that are connected together to describe the entire basin. CSIRO recently calibrated regional Source models that, when combined, describe all of the Murray-Darling Basin. The Murrumbidgee regional model was selected from this project and subsequently simplified to reduce the runtime while still being representative of the system's behaviour. As part of a risk assessment, the sensitivity of this model was explored. The sensitivity analysis examined uncertainty in inflows, rainfall, evaporation and groundwater/surface water interaction via 100,000 simulations; the results can be found in Peeters et al. (2013), submitted to this conference. The four methodologies considered to support this work are: 1. Running all 100,000 simulations on a single computer; 2. Running the simulations using several dedicated machines; 3. Running the simulations using ad-hoc computing resources; and 4. Multi-core execution, where runs are executed on a cluster. Method 1 was the simplest, but requires the most computer time. Method 2 improved total runtime, but required dedicated computer resources. Method 3 gave reasonable runtimes and did not require dedicated resources, but did require constant monitoring and input. Method 4 was the most complex to configure, but provided very fast runtimes and automated input and output marshalling and cluster job creation and submission. Methods 1 through 3 used Source's command line interface, while for Method 4 the Source model was imported as a workflow activity into Project Trident via the Hydrologists Workbench. Project Trident is a scientific workflow system developed by Microsoft Research, and the Hydrologists Workbench is a suite of add-on tools for Trident developed by CSIRO's Water for a Healthy Country Flagship. Using Trident and the Hydrologists Workbench for sensitivity analysis allows the modeller to easily leverage available resources without requiring extensive or complex coding.
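
The core computational pattern here, many independent model runs fanned out across available cores, can be sketched in Python as below. The run_model function is a stand-in for invoking the actual Source model, whose command-line interface is not detailed in the abstract; the perturbation ranges are illustrative assumptions:

```python
# Embarrassingly-parallel sensitivity analysis: each parameter sample is
# an independent simulation, so runs can be distributed across cores
# (or, as in Methods 2-4, across machines and cluster nodes).
import random
from multiprocessing import Pool


def run_model(sample):
    """Placeholder for one river system simulation with perturbed inputs."""
    inflow_scale, rain_scale, evap_scale = sample
    # ... real code would write inputs, call the model, parse outputs ...
    return inflow_scale * rain_scale / evap_scale  # dummy output


if __name__ == "__main__":
    random.seed(0)
    samples = [(random.uniform(0.8, 1.2),   # inflow perturbation
                random.uniform(0.9, 1.1),   # rainfall perturbation
                random.uniform(0.9, 1.1))   # evaporation perturbation
               for _ in range(1000)]        # 100,000 in the real study
    with Pool() as pool:
        outputs = pool.map(run_model, samples)
    print(min(outputs), max(outputs))
```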

2 citations


Proceedings ArticleDOI
22 Aug 2013
TL;DR: This paper presents a method to make recommendations for scientists based on trust, an extended provenance model that captures users' behavioral information during scientific workflow execution, and a prototype system that enhances the scientific workflow system's usability by providing scientific data recommendations.
Abstract: Comparison plays an important role in scientific research. Scientists often make discoveries by studying differences. In life science research in particular, sequence alignment is accomplished by searching for similar structures in reference data files. As the scale of scientific data grows, scientists have to spend much time selecting appropriate data files for experiments, a choice in which trust plays a critical role. This paper presents a method to make recommendations for scientists based on trust. We first propose an extended provenance model that captures users' behavioral information during scientific workflow execution. Such provenance information can be used to compute a user's trust in data and the degree of mutual trust between users. Then, based on predicted trust values, data files can be recommended to users. We also design and implement a prototype system that enhances the scientific workflow system's usability by providing scientific data recommendations. Our experiments show that the recommended data files help scientists execute workflows successfully.
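
A hedged sketch of the trust computation: derive a per-file trust score from provenance records of past runs (who used which file, and whether the run succeeded) and recommend the highest-scoring files. The smoothed-success-rate scoring rule and all file names are invented stand-ins for the paper's trust model:

```python
from collections import defaultdict

# (user, data_file, run_succeeded) tuples from an extended provenance log
provenance = [
    ("alice", "ref_genome_v2.fa", True),
    ("alice", "ref_genome_v1.fa", False),
    ("bob",   "ref_genome_v2.fa", True),
    ("bob",   "ref_genome_v3.fa", True),
    ("carol", "ref_genome_v3.fa", False),
]

successes = defaultdict(int)
uses = defaultdict(int)
for _, data_file, ok in provenance:
    uses[data_file] += 1
    successes[data_file] += ok

# Laplace-smoothed success rate as a simple trust value per file
trust = {f: (successes[f] + 1) / (uses[f] + 2) for f in uses}

# Recommend the two most trusted files
print("recommended:", sorted(trust, key=trust.get, reverse=True)[:2])
```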

2 citations


01 Jan 2013
TL;DR: The introduction and motivation for the active service approach are reviewed, the architecture of the scientific workflow system is described, and the technologies used in active service are discussed; the approach uses digital archives technology to enable multi-service organizations to achieve scientific workflow goals.
Abstract: This paper describes the architecture of the active service model of a scientific workflow system. Digital archives technologies make available all the resources needed to construct service-specific groups. The paper reviews the introduction and motivation for the active service approach, describes the architecture of the scientific workflow system, and discusses the technologies used in active service, which uses digital archives technology to enable multi-service organizations to achieve scientific workflow goals.

Proceedings ArticleDOI
23 Mar 2013
TL;DR: A cloud scientific workflow system, CSWf, based on Hadoop, NoSQL and Web Service technology, is proposed to achieve an effective integration of data and service resources in the loosely coupled cloud service environment.
Abstract: With the development of information services and the rapid expansion of cloud computing, scientific workflow systems face the challenges of growing volumes of heterogeneous data, the complexity of scientific computing, and the difficulty of task integration. In this paper, a cloud scientific workflow system, CSWf, based on Hadoop, NoSQL and Web Service technology, is proposed to achieve an effective integration of data and service resources in the loosely coupled cloud service environment. To implement CSWf, a simple but effective cloud service workflow modeling language is designed, a reliable workflow engine is developed to parse and schedule the workflow processes, and a distributed execution framework is built to encapsulate workflow jobs and execute workflows for CSWf in the cloud environment. CSWf takes advantage of the massive data storage capacity and distributed parallel computing power of cloud computing. It accommodates well the requirements of modeling, scheduling, coordinating and executing workflows on a distributed workflow system. At the end of the paper, an urban regional air pollution workflow is run on a cluster with input data of different sizes to measure the performance of CSWf. The results show that CSWf can significantly improve the efficiency of workflow execution.
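
To illustrate the engine's role of parsing and scheduling workflow processes, here is a minimal Python sketch that orders tasks by their declared dependencies and dispatches them in turn. A real engine such as CSWf would submit each task as a distributed cloud job; this toy runs them sequentially, and the task names are hypothetical:

```python
# Dependency-ordered dispatch using the standard library (Python 3.9+).
from graphlib import TopologicalSorter

# Each task maps to the tasks it depends on (its predecessors).
workflow = {
    "ingest_sensor_data":   [],
    "clean_data":           ["ingest_sensor_data"],
    "dispersion_model":     ["clean_data"],
    "render_pollution_map": ["dispersion_model"],
}


def execute(task):
    print(f"dispatching {task}")  # a real engine would wrap this as a cloud job


# static_order() yields tasks so every dependency runs before its dependents.
for task in TopologicalSorter(workflow).static_order():
    execute(task)
```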