
Showing papers on "Workflow" published in 2012


Journal ArticleDOI
TL;DR: Snakemake is a workflow engine that provides a readable Python-based workflow definition language and a powerful execution environment that scales from single-core workstations to compute clusters without modifying the workflow.
Abstract: Snakemake is a workflow engine that provides a readable Python-based workflow definition language and a powerful execution environment that scales from single-core workstations to compute clusters without modifying the workflow. It is the first system to support the use of automatically inferred multiple named wildcards (or variables) in input and output filenames.
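
As a rough illustration of the named-wildcard idea (not Snakemake's actual implementation, and using an invented file pattern), the sketch below shows how wildcard values can be inferred by matching a concrete output filename against a pattern containing {name} placeholders:

```python
import re

def infer_wildcards(pattern, filename):
    """Infer named wildcard values by matching a concrete filename against a
    pattern such as 'mapped/{sample}.{ext}'. Illustrative sketch only."""
    regex = ""
    for part in re.split(r"(\{\w+\})", pattern):
        if part.startswith("{") and part.endswith("}"):
            regex += "(?P<%s>.+)" % part[1:-1]   # placeholder -> named group
        else:
            regex += re.escape(part)             # literal text
    match = re.fullmatch(regex, filename)
    return match.groupdict() if match else None

print(infer_wildcards("mapped/{sample}.{ext}", "mapped/A123.bam"))
# -> {'sample': 'A123', 'ext': 'bam'}
```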

1,932 citations


Journal ArticleDOI
TL;DR: How SRM is applied in proteomics is described, recent advances are reviewed, present selected applications and a perspective on the future of this powerful technology is provided.
Abstract: Selected reaction monitoring (SRM) is a targeted mass spectrometry technique that is emerging in the field of proteomics as a complement to untargeted shotgun methods. SRM is particularly useful when predetermined sets of proteins, such as those constituting cellular networks or sets of candidate biomarkers, need to be measured across multiple samples in a consistent, reproducible and quantitatively precise manner. Here we describe how SRM is applied in proteomics, review recent advances, present selected applications and provide a perspective on the future of this powerful technology.

1,187 citations


Journal ArticleDOI
TL;DR: The Biological General Repository for Interaction Datasets (BioGRID) is an open access archive of genetic and protein interactions that are curated from the primary biomedical literature for all major model organism species.
Abstract: The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access archive of genetic and protein interactions that are curated from the primary biomedical literature for all major model organism species. As of September 2012, BioGRID houses more than 500 000 manually annotated interactions from more than 30 model organisms. BioGRID maintains complete curation coverage of the literature for the budding yeast Saccharomyces cerevisiae, the fission yeast Schizosaccharomyces pombe and the model plant Arabidopsis thaliana. A number of themed curation projects in areas of biomedical importance are also supported. BioGRID has established collaborations and/or shares data records for the annotation of interactions and phenotypes with most major model organism databases, including Saccharomyces Genome Database, PomBase, WormBase, FlyBase and The Arabidopsis Information Resource. BioGRID also actively engages with the text-mining community to benchmark and deploy automated tools to expedite curation workflows. BioGRID data are freely accessible through both a user-defined interactive interface and in batch downloads in a wide variety of formats, including PSI-MI2.5 and tab-delimited files. BioGRID records can also be interrogated and analyzed with a series of new bioinformatics tools, which include a post-translational modification viewer, a graphical viewer, a REST service and a Cytoscape plugin.

1,000 citations


Posted Content
TL;DR: In this paper, the authors outline a framework that will enable crowd work that is complex, collaborative, and sustainable, and lay out research challenges in twelve major areas: workflow, task assignment, hierarchy, real-time response, synchronous collaboration, quality control, crowds guiding AIs, AIs guiding crowds, platforms, job design, reputation, and motivation.
Abstract: Paid crowd work offers remarkable opportunities for improving productivity, social mobility, and the global economy by engaging a geographically distributed workforce to complete complex tasks on demand and at scale. But it is also possible that crowd work will fail to achieve its potential, focusing on assembly-line piecework. Can we foresee a future crowd workplace in which we would want our children to participate? This paper frames the major challenges that stand in the way of this goal. Drawing on theory from organizational behavior and distributed computing, as well as direct feedback from workers, we outline a framework that will enable crowd work that is complex, collaborative, and sustainable. The framework lays out research challenges in twelve major areas: workflow, task assignment, hierarchy, real-time response, synchronous collaboration, quality control, crowds guiding AIs, AIs guiding crowds, platforms, job design, reputation, and motivation.

803 citations


Journal ArticleDOI
TL;DR: This work introduces a methodology for the application of process mining techniques that leads to the identification of regular behavior, process variants, and exceptional medical cases in a case study conducted at a hospital emergency service.

452 citations


Journal ArticleDOI
TL;DR: This protocol describes three workflows based on the NetworkAnalyzer and RINalyzer plug-ins for Cytoscape, a popular software platform for networks, to perform a topological analysis of biological networks.
Abstract: Computational analysis and interactive visualization of biological networks and protein structures are common tasks for gaining insight into biological processes. This protocol describes three workflows based on the NetworkAnalyzer and RINalyzer plug-ins for Cytoscape, a popular software platform for networks. NetworkAnalyzer has become a standard Cytoscape tool for comprehensive network topology analysis. In addition, RINalyzer provides methods for exploring residue interaction networks derived from protein structures. The first workflow uses NetworkAnalyzer to perform a topological analysis of biological networks. The second workflow applies RINalyzer to study protein structure and function and to compute network centrality measures. The third workflow combines NetworkAnalyzer and RINalyzer to compare residue networks. The full protocol can be completed in ∼2 h.
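
NetworkAnalyzer and RINalyzer are Cytoscape plug-ins rather than a scripting API, so the following is only a hedged analogue, in Python/networkx, of the kind of topology metrics the first workflow computes (degree centrality, betweenness, clustering); the toy network and node names are invented:

```python
import networkx as nx

# A toy interaction network; nodes and edges are made up for illustration.
G = nx.Graph([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")])

metrics = {
    "degree_centrality": nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "clustering": nx.clustering(G),
}
for name, values in metrics.items():
    top = max(values, key=values.get)
    print(f"{name}: highest for node {top} ({values[top]:.3f})")
```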

424 citations


Proceedings ArticleDOI
08 Oct 2012
TL;DR: WorkflowSim as mentioned in this paper extends the existing CloudSim simulator by providing a higher layer of workflow management, which takes into consideration heterogeneous system overheads and failures, and it is shown that ignoring system overheads and failures when simulating scientific workflows can cause significant inaccuracies in the predicted workflow runtime.
Abstract: Simulation is one of the most popular evaluation methods in scientific workflow studies. However, existing workflow simulators fail to provide a framework that takes into consideration heterogeneous system overheads and failures. They also lack support for widely used workflow optimization techniques such as task clustering. In this paper, we introduce WorkflowSim, which extends the existing CloudSim simulator by providing a higher layer of workflow management. We also show that ignoring system overheads and failures when simulating scientific workflows can cause significant inaccuracies in the predicted workflow runtime. To further validate its value in promoting other research work, we introduce two promising research areas for which WorkflowSim provides a unique and effective evaluation platform.
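
WorkflowSim itself is a Java extension of CloudSim; the minimal Monte Carlo sketch below (with invented overhead and failure-rate values) only illustrates the paper's point that per-task overheads and retried failures inflate the makespan well beyond an idealized estimate:

```python
import random

def simulate_makespan(task_runtimes, overhead=5.0, failure_rate=0.05, trials=1000):
    """Average makespan of a serial task chain when each task pays a scheduling
    overhead and may fail and be retried. Values are illustrative only."""
    totals = []
    for _ in range(trials):
        total = 0.0
        for runtime in task_runtimes:
            total += overhead + runtime
            while random.random() < failure_rate:   # retry on failure
                total += overhead + runtime
        totals.append(total)
    return sum(totals) / trials

tasks = [30.0, 45.0, 20.0, 60.0]
print("ideal estimate :", sum(tasks))
print("with overheads :", round(simulate_makespan(tasks), 1))
```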

405 citations


Book ChapterDOI
01 Jan 2012
TL;DR: The NeOn Methodology suggests a variety of pathways for developing ontologies in commonly occurring situations, for example, when available ontologies need to be re-engineered, aligned, modularized, localized to support different languages and cultures, and integrated with ontology design patterns and non-ontological resources.
Abstract: In contrast to other approaches that provide methodological guidance for ontology engineering, the NeOn Methodology does not prescribe a rigid workflow, but instead it suggests a variety of pathways for developing ontologies. The nine scenarios proposed in the methodology cover commonly occurring situations, for example, when available ontologies need to be re-engineered, aligned, modularized, localized to support different languages and cultures, and integrated with ontology design patterns and non-ontological resources, such as folksonomies or thesauri. In addition, the NeOn Methodology framework provides (a) a glossary of processes and activities involved in the development of ontologies, (b) two ontology life cycle models, and (c) a set of methodological guidelines for different processes and activities, which are described (a) functionally, in terms of goals, inputs, outputs, and relevant constraints; (b) procedurally, by means of workflow specifications; and (c) empirically, through a set of illustrative examples.

361 citations


Proceedings ArticleDOI
10 Nov 2012
TL;DR: It is found that the key factor determining the performance of an algorithm is its ability to decide which workflows in an ensemble to admit or reject for execution, and an admission procedure based on workflow structure and estimates of task runtimes can significantly improve the quality of solutions.
Abstract: Large-scale applications expressed as scientific workflows are often grouped into ensembles of inter-related workflows. In this paper, we address a new and important problem concerning the efficient management of such ensembles under budget and deadline constraints on Infrastructure-as-a-Service (IaaS) clouds. We discuss, develop, and assess algorithms based on static and dynamic strategies for both task scheduling and resource provisioning. We perform the evaluation via simulation using a set of scientific workflow ensembles with a broad range of budget and deadline parameters, taking into account uncertainties in task runtime estimations, provisioning delays, and failures. We find that the key factor determining the performance of an algorithm is its ability to decide which workflows in an ensemble to admit or reject for execution. Our results show that an admission procedure based on workflow structure and estimates of task runtimes can significantly improve the quality of solutions.
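
A minimal sketch of the admission idea highlighted above, not the paper's actual algorithms: a workflow is admitted only if its estimated cost fits the remaining budget and its critical path (computed from task runtime estimates) fits the deadline. Task names, runtimes and prices are illustrative.

```python
def critical_path(tasks, deps):
    """Longest path through the task graph, given runtime estimates and
    dependency edges (pred, succ). Assumes the graph is acyclic."""
    memo = {}
    def finish(t):
        if t not in memo:
            preds = [p for p, s in deps if s == t]
            memo[t] = tasks[t] + max((finish(p) for p in preds), default=0.0)
        return memo[t]
    return max(finish(t) for t in tasks)

def admit(tasks, deps, price_per_second, budget_left, deadline):
    """Admit a workflow only if its estimated cost fits the remaining budget
    and its critical path fits the deadline (illustrative rule only)."""
    est_cost = sum(tasks.values()) * price_per_second
    return est_cost <= budget_left and critical_path(tasks, deps) <= deadline

tasks = {"t1": 100.0, "t2": 200.0, "t3": 150.0}
deps = [("t1", "t2"), ("t1", "t3")]
print(admit(tasks, deps, price_per_second=0.001, budget_left=1.0, deadline=400.0))
```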

312 citations


Proceedings ArticleDOI
11 Feb 2012
TL;DR: It is argued that Turkomatic's collaborative approach can be more successful than the conventional workflow design process and implications for the design of collaborative crowd planning systems are discussed.
Abstract: Preparing complex jobs for crowdsourcing marketplaces requires careful attention to workflow design, the process of decomposing jobs into multiple tasks, which are solved by multiple workers. Can the crowd help design such workflows? This paper presents Turkomatic, a tool that recruits crowd workers to aid requesters in planning and solving complex jobs. While workers decompose and solve tasks, requesters can view the status of worker-designed workflows in real time; intervene to change tasks and solutions; and request new solutions to subtasks from the crowd. These features lower the threshold for crowd employers to request complex work. During two evaluations, we found that allowing the crowd to plan without requester supervision is partially successful, but that requester intervention during workflow planning and execution improves quality substantially. We argue that Turkomatic's collaborative approach can be more successful than the conventional workflow design process and discuss implications for the design of collaborative crowd planning systems.

298 citations


Journal ArticleDOI
TL;DR: A new representation of interventions as multidimensional time series formed by synchronized signals acquired over time is proposed; the resulting workflow models combine low-level signals with high-level information such as predefined phases and can be used to detect actions and trigger events.

Patent
27 Jun 2012
TL;DR: A scalable data storage service may maintain tables in a non-relational data store on behalf of clients as discussed by the authors, where items stored in tables may be partitioned and indexed using a simple or composite primary key.
Abstract: A system that implements a scalable data storage service may maintain tables in a non-relational data store on behalf of clients. The system may provide a Web services interface through which service requests are received, and an API usable to request that a table be created, deleted, or described; that an item be stored, retrieved, deleted, or its attributes modified; or that a table be queried (or scanned) with filtered items and/or their attributes returned. An asynchronous workflow may be invoked to create or delete a table. Items stored in tables may be partitioned and indexed using a simple or composite primary key. The system may not impose pre-defined limits on table size, and may employ a flexible schema. The service may provide a best-effort or committed throughput model. The system may automatically scale and/or re-partition tables in response to detecting workload changes, node failures, or other conditions or anomalies.
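
As a generic illustration of partitioning items by a composite primary key (not the patented system's actual scheme), the sketch below hashes the hash-key attribute of each key to pick a partition; the attribute names and fixed partition count are invented:

```python
import hashlib

NUM_PARTITIONS = 8  # illustrative; a real service would re-partition dynamically

def partition_for(hash_key):
    """Map the hash-key attribute of a composite primary key to a partition."""
    digest = hashlib.md5(str(hash_key).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

# Items with a composite primary key (customer_id = hash key, order_ts = range key).
items = [("alice", 1001), ("alice", 1002), ("bob", 1001)]
for hash_key, range_key in items:
    print(hash_key, range_key, "-> partition", partition_for(hash_key))
```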

Patent
02 Mar 2012
TL;DR: In this article, a deployment system enables a developer to generate a deployment plan according to a logical, multi-tier application blueprint defined by application architects, which includes tasks to be executed for deploying application components on virtual computing resources provided in a cloud infrastructure.
Abstract: A deployment system enables a developer to generate a deployment plan according to a logical, multi-tier application blueprint defined by application architects. The deployment plan includes tasks to be executed for deploying application components on virtual computing resources provided in a cloud infrastructure. The deployment plan includes time dependencies that determine an execution order of the tasks according to dependencies between application components specified in the application blueprint. The deployment plan enables system administrators to view the application blueprint as an ordered workflow view that facilitates collaboration between system administrators and application architects.
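
A minimal sketch of deriving an execution order from component dependencies, assuming a hypothetical blueprint expressed as a task-to-prerequisites map; this illustrates the ordered-workflow idea, not the patented deployment system:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical blueprint: each deployment task lists the tasks it depends on.
blueprint = {
    "provision_vm":   [],
    "install_db":     ["provision_vm"],
    "install_appsrv": ["provision_vm"],
    "deploy_war":     ["install_appsrv", "install_db"],
    "configure_lb":   ["deploy_war"],
}

# Topological sort yields an execution order consistent with the dependencies.
plan = list(TopologicalSorter(blueprint).static_order())
print("execution order:", plan)
```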

Journal ArticleDOI
TL;DR: DataSpaces essentially implements a semantically specialized virtual shared space abstraction that can be associatively accessed by all components and services in the application workflow. It enables live data to be extracted from running simulation components, indexes this data online, and then allows it to be monitored, queried and accessed by other components and services via the space using semantically meaningful operators.
Abstract: Emerging high-performance distributed computing environments are enabling new end-to-end formulations in science and engineering that involve multiple interacting processes and data-intensive application workflows. For example, current fusion simulation efforts are exploring coupled models and codes that simultaneously simulate separate application processes, such as the core and the edge turbulence. These components run on different high performance computing resources, need to interact at runtime with each other and with services for data monitoring, data analysis and visualization, and data archiving. As a result, they require efficient and scalable support for dynamic and flexible couplings and interactions, which remains a challenge. This paper presents DataSpaces, a flexible interaction and coordination substrate that addresses this challenge. DataSpaces essentially implements a semantically specialized virtual shared space abstraction that can be associatively accessed by all components and services in the application workflow. It enables live data to be extracted from running simulation components, indexes this data online, and then allows it to be monitored, queried and accessed by other components and services via the space using semantically meaningful operators. The underlying data transport is asynchronous, low-overhead and largely memory-to-memory. The design, implementation, and experimental evaluation of DataSpaces using a coupled fusion simulation workflow are presented.
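
DataSpaces is an HPC substrate written for C/MPI codes; the toy class below (with invented method and variable names) only illustrates the shared-space abstraction described here: producers put named data tagged with a timestep, and consumers query and retrieve it associatively.

```python
class ToySpace:
    """In-memory stand-in for a shared space: objects are indexed by
    (variable name, timestep) and can be queried associatively."""
    def __init__(self):
        self._index = {}

    def put(self, var, timestep, data):
        self._index[(var, timestep)] = data

    def get(self, var, timestep):
        return self._index.get((var, timestep))

    def query(self, var):
        """All timesteps currently available for a variable."""
        return sorted(t for (v, t) in self._index if v == var)

space = ToySpace()
space.put("edge_turbulence", 10, [0.1, 0.2, 0.4])   # producer component
print(space.query("edge_turbulence"))                # consumer discovers data
print(space.get("edge_turbulence", 10))
```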

Journal ArticleDOI
TL;DR: The results of this study indicate that the HeuristicsMiner algorithm is especially suited to real-life settings, and it is shown that, particularly for highly complex event logs, knowledge discovery from such data sets can become a major problem for traditional process discovery techniques.
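
For readers unfamiliar with HeuristicsMiner, the sketch below computes direct-follows counts from a toy event log and the dependency measure commonly associated with the algorithm, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1); the event log is invented and the sketch is not the algorithm's full implementation.

```python
from collections import Counter

def dependency_measures(traces):
    """Direct-follows counts and a HeuristicsMiner-style dependency measure
    for every observed pair of directly-following activities."""
    follows = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            follows[(a, b)] += 1
    deps = {}
    for (a, b), n_ab in follows.items():
        n_ba = follows[(b, a)]
        deps[(a, b)] = (n_ab - n_ba) / (n_ab + n_ba + 1)
    return deps

log = [["register", "triage", "treat"], ["register", "treat", "triage"],
       ["register", "triage", "treat"]]
for pair, value in sorted(dependency_measures(log).items()):
    print(pair, round(value, 2))
```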

Journal ArticleDOI
TL;DR: This paper proposes a new QoS-based workflow scheduling algorithm based on a novel concept called Partial Critical Paths (PCP), which tries to minimize the cost of workflow execution while meeting a user-defined deadline.
Abstract: Recently, utility Grids have emerged as a new model of service provisioning in heterogeneous distributed systems. In this model, users negotiate with service providers on their required Quality of Service and on the corresponding price to reach a Service Level Agreement. One of the most challenging problems in utility Grids is workflow scheduling, i.e., the problem of satisfying the QoS of the users as well as minimizing the cost of workflow execution. In this paper, we propose a new QoS-based workflow scheduling algorithm based on a novel concept called Partial Critical Paths (PCP), which tries to minimize the cost of workflow execution while meeting a user-defined deadline. The PCP algorithm has two phases: in the deadline distribution phase it recursively assigns subdeadlines to the tasks on the partial critical paths ending at previously assigned tasks, and in the planning phase it assigns the cheapest service to each task while meeting its subdeadline. The simulation results show that the performance of the PCP algorithm is very promising.
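
A minimal sketch of the planning-phase rule described in the abstract: for a task with an assigned sub-deadline, choose the cheapest service offer whose estimated runtime still meets that sub-deadline. The service offers and numbers are illustrative, and the deadline-distribution phase is not shown.

```python
def plan_task(services, subdeadline):
    """Return the cheapest offer that finishes within the sub-deadline,
    or None if no offer qualifies. Offers are illustrative."""
    feasible = [s for s in services if s["runtime"] <= subdeadline]
    return min(feasible, key=lambda s: s["cost"]) if feasible else None

offers = [
    {"name": "slow-cheap",  "runtime": 120.0, "cost": 0.02},
    {"name": "medium",      "runtime": 60.0,  "cost": 0.05},
    {"name": "fast-pricey", "runtime": 20.0,  "cost": 0.20},
]
print(plan_task(offers, subdeadline=90.0))   # -> the 'medium' offer
```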

Book ChapterDOI
01 Jan 2012
TL;DR: An architecture that allows running workflow instances to be modified dynamically, based on an object-oriented approach, is introduced, and case handling is introduced as a technique for flexible process enactment based on data dependencies rather than process structures.
Abstract: BPM architectures are in the centre of Chapter 7, starting from the WfMC Architecture and proceeding towards service oriented architectures and architectures for flexible workflow management. In particular, an architecture that allows running workflow instances to be modified dynamically, based on an object-oriented approach, is introduced. Web services and their composition are sketched, describing the core concepts of the XML-based service composition language WS-BPEL. Advanced service composition based on semantic concepts is sketched, and case handling is introduced as a technique for flexible process enactment based on data dependencies rather than process structures.

Journal ArticleDOI
TL;DR: Bpipe is a simple, dedicated programming language for defining and executing bioinformatics pipelines that is fully self-contained and cross-platform, making it very easy to adopt and deploy into existing environments.
Abstract: Summary: Bpipe is a simple, dedicated programming language for defining and executing bioinformatics pipelines. It specializes in enabling users to turn existing pipelines based on shell scripts or command line tools into highly flexible, adaptable and maintainable workflows with a minimum of effort. Bpipe ensures that pipelines execute in a controlled and repeatable fashion and keeps audit trails and logs to ensure that experimental results are reproducible. Requiring only Java as a dependency, Bpipe is fully self-contained and cross-platform, making it very easy to adopt and deploy into existing environments. Availability and implementation: Bpipe is freely available from http://bpipe.org under a BSD License.

Journal ArticleDOI
TL;DR: This short article will concentrate only on cheminformatics applications and the workflow tools most commonly used in cheminformatics, namely Pipeline Pilot and KNIME.
Abstract: There are many examples of scientific workflow systems [1, 2]; in this short article I will concentrate only on cheminformatics applications and the workflow tools most commonly used in cheminformatics, namely Pipeline Pilot [3] and KNIME [4]. Workflow solutions have been used for years in bioinformatics and other sciences, and some also have applications in so-called “business intelligence” and “predictive analytics”. Readers can find details of Discovery Net, Galaxy, Kepler, Triana, SOMA, SMILA, VisTrails, and others on the Web. Kappler has compared Competitive Workflow, Taverna and Pipeline Pilot [5]. Taverna has been widely used in bioinformatics but is also used with the Chemistry Development Kit (CDK) [6, 7]. CDK-Taverna workflows are made freely available at myExperiment.org [8]. (myExperiment.org also includes KNIME workflows.) DiscoveryNet was one of the earliest examples of a scientific workflow system; its concepts were later commercialized in InforSense Knowledge Discovery Environment (KDE). My 2007 review [1] centered on Pipeline Pilot and InforSense KDE; KNIME was then a relative newcomer. In 2009 the loss-making InforSense organization was acquired by IDBS and KDE has made progress in translational medicine [9]. InforSense's ChemSense [10] used ChemAxon's JChem Cartridge, and ChemAxon chemical structure, property prediction, and enumeration tools. ChemSense's three major pharmaceutical customers have turned to other solutions. The InforSense Suite lives on but is not seen as a "personal productivity tool"; rather it is integrated into the IDBS ELN platform. KNIME and Pipeline Pilot are now the market leaders in personal productivity in cheminformatics.

Proceedings ArticleDOI
20 May 2012
TL;DR: This work introduces Makeflow, a simple system for expressing and running a data-intensive workflow across multiple execution engines without requiring changes to the application or workflow description, and introduces Workbench, a suite of benchmarks designed for analyzing common workflow patterns.
Abstract: In recent years, there has been a renewed interest in languages and systems for large scale distributed computing. Unfortunately, most systems available to the end user use a custom description language tightly coupled to a specific runtime implementation, making it difficult to transfer applications between systems. To address this problem, we introduce Makeflow, a simple system for expressing and running a data-intensive workflow across multiple execution engines without requiring changes to the application or workflow description. Makeflow allows any user familiar with basic Unix Make syntax to generate a workflow and run it on one of many supported execution systems. Furthermore, in order to assess the performance characteristics of the various execution engines available to users and assist them in selecting one for use, we introduce Workbench, a suite of benchmarks designed for analyzing common workflow patterns. We evaluate Workbench on two physical architectures -- the first a storage cluster with local disks and a slower network and the second a high performance computing cluster with a central parallel filesystem and fast network -- using a variety of execution engines. We conclude by demonstrating three applications that use Makeflow to execute data intensive applications consisting of thousands of jobs.
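
Makeflow files use Unix Make syntax and dispatch jobs to different execution engines; the local Python sketch below (with invented filenames and commands) only illustrates the underlying Make semantics of running a command when its target is missing or older than its sources:

```python
import os, subprocess

# Make-style rules: (target, sources, shell command). Filenames are illustrative,
# and the sketch assumes the initial input file data.txt already exists.
rules = [
    ("data.sorted", ["data.txt"],    "sort data.txt > data.sorted"),
    ("report.txt",  ["data.sorted"], "wc -l data.sorted > report.txt"),
]

def out_of_date(target, sources):
    if not os.path.exists(target):
        return True
    return any(os.path.getmtime(s) > os.path.getmtime(target) for s in sources)

for target, sources, command in rules:    # rules listed in dependency order
    if out_of_date(target, sources):
        print("running:", command)
        subprocess.run(command, shell=True, check=True)
    else:
        print("up to date:", target)
```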

Journal ArticleDOI
TL;DR: The Substitutable Medical Applications, Reusable Technologies (SMART) Platforms project as mentioned in this paper aims to develop a health information technology platform with substitutable applications (apps) constructed around core services.

Journal ArticleDOI
TL;DR: Tavaxy reduces the workflow development cycle by introducing the use of workflow patterns to simplify workflow creation and enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows.
Abstract: Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts. In this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on using an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: It allows the integration of existing Taverna and Galaxy workflows in a single environment, and supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible, where the users can either instantiate the whole system on the cloud, or delegate the execution of certain sub-workflows to the cloud infrastructure. Tavaxy reduces the workflow development cycle by introducing the use of workflow patterns to simplify workflow creation. It enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high performance cloud computing to cope with the increasing data size and complexity of analysis. The system can be accessed either through a cloud-enabled web-interface or downloaded and installed to run within the user's local environment. All resources related to Tavaxy are available at http://www.tavaxy.org .

Journal ArticleDOI
TL;DR: The scheduling problem in hybrid clouds is introduced, covering the main characteristics to be considered when scheduling workflows, along with a brief survey of some of the scheduling algorithms used in these systems.
Abstract: Schedulers for cloud computing determine on which processing resource jobs of a workflow should be allocated. In hybrid clouds, jobs can be allocated on either a private cloud or a public cloud on a pay-per-use basis. The capacity of the communication channels connecting these two types of resources impacts the makespan and the cost of workflow execution. This article introduces the scheduling problem in hybrid clouds, presenting the main characteristics to be considered when scheduling workflows, as well as a brief survey of some of the scheduling algorithms used in these systems. To assess the influence of communication channels on job allocation, we compare and evaluate the impact of the available bandwidth on the performance of some of the scheduling algorithms.
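
A hedged back-of-the-envelope sketch of the bandwidth effect the article evaluates: when a job runs on the public cloud, the time to move its input over the inter-cloud link is added to its runtime, so a slow link can erase the benefit of faster public resources. All numbers are illustrative.

```python
def estimated_finish(runtime_s, data_mb, bandwidth_mbps, speedup=1.0):
    """Runtime (possibly sped up on public resources) plus the time to push
    input data over the inter-cloud link; bandwidth is in megabits per second."""
    transfer_s = (data_mb * 8) / bandwidth_mbps
    return runtime_s / speedup + transfer_s

job_runtime, job_data = 600.0, 2000.0   # 10 min of work, 2 GB of input
print("private cloud   :", estimated_finish(job_runtime, 0.0, bandwidth_mbps=1000))
print("public, 100 Mbps:", estimated_finish(job_runtime, job_data, 100, speedup=2.0))
print("public, 10 Mbps :", estimated_finish(job_runtime, job_data, 10, speedup=2.0))
```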

Journal ArticleDOI
TL;DR: Understanding end users' perspectives towards HIE technology is crucial to the long-term success of HIE, and user- and role-specific customization to accommodate differences in workflow and information needs may increase the adoption and use of HIE.

Journal ArticleDOI
01 Feb 2012
TL;DR: ReStore is a system that manages the storage and reuse of intermediate results of whole MapReduce jobs that are part of a workflow, and it can also create additional reuse opportunities by materializing and storing the output of query execution operators that are executed within a MapReduce job.
Abstract: Analyzing large scale data has emerged as an important activity for many organizations in the past few years. This large scale data analysis is facilitated by the MapReduce programming and execution model and its implementations, most notably Hadoop. Users of MapReduce often have analysis tasks that are too complex to express as individual MapReduce jobs. Instead, they use high-level query languages such as Pig, Hive, or Jaql to express their complex tasks. The compilers of these languages translate queries into workflows of MapReduce jobs. Each job in these workflows reads its input from the distributed file system used by the MapReduce system and produces output that is stored in this distributed file system and read as input by the next job in the workflow. The current practice is to delete these intermediate results from the distributed file system at the end of executing the workflow. One way to improve the performance of workflows of MapReduce jobs is to keep these intermediate results and reuse them for future workflows submitted to the system. In this paper, we present ReStore, a system that manages the storage and reuse of such intermediate results. ReStore can reuse the output of whole MapReduce jobs that are part of a workflow, and it can also create additional reuse opportunities by materializing and storing the output of query execution operators that are executed within a MapReduce job. We have implemented ReStore as an extension to the Pig dataflow system on top of Hadoop, and we experimentally demonstrate significant speedups on queries from the PigMix benchmark.
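
ReStore is implemented inside the Pig/Hadoop stack; the sketch below is only a generic illustration of the reuse idea it describes, caching the output location of a job keyed by a fingerprint of its input and script and skipping re-execution on a hit. Function and path names are invented.

```python
import hashlib

store = {}   # fingerprint -> path of a previously materialized output

def fingerprint(input_path, script_text):
    return hashlib.sha1((input_path + "\n" + script_text).encode()).hexdigest()

def run_job(input_path, script_text, execute):
    """Reuse a stored intermediate result when the same job (same input and
    same script) was seen before; otherwise execute and remember the output."""
    key = fingerprint(input_path, script_text)
    if key in store:
        print("reusing", store[key])
        return store[key]
    output_path = execute(input_path, script_text)   # run the actual job
    store[key] = output_path
    return output_path

# Usage sketch: the second call is served from the store instead of re-running.
fake_exec = lambda inp, script: "/tmp/out-" + fingerprint(inp, script)[:8]
run_job("/data/clicks", "GROUP BY url", fake_exec)
run_job("/data/clicks", "GROUP BY url", fake_exec)
```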

Journal ArticleDOI
15 Feb 2012
TL;DR: The Yabi system reflects careful design of both execution and data models, abstracting technical details away from users who are not skilled in HPC and providing an intuitive, scalable, drag-and-drop web-based workflow environment in which the same tools can also be accessed via a command line.
Abstract: Background: There is a significant demand for creating pipelines or workflows in the life science discipline that chain a number of discrete compute and data intensive analysis tasks into sophisticated analysis procedures. This need has led to the development of general as well as domain-specific workflow environments that are either complex desktop applications or Internet-based applications. Complexities can arise when configuring these applications in heterogeneous compute and storage environments if the execution and data access models are not designed appropriately. These complexities manifest themselves through limited access to available HPC resources, significant overhead required to configure tools and the inability for users to simply manage files across heterogeneous HPC storage infrastructure.

Proceedings ArticleDOI
08 Oct 2012
TL;DR: This paper recommends a minimal set of auxiliary resources to be preserved together with the workflows as an aggregation object and provides a software tool for end-users to create such aggregations and to assess their completeness.
Abstract: Workflows provide a popular means for preserving scientific methods by explicitly encoding their process. However, some of them are subject to a decay in their ability to be re-executed or reproduce the same results over time, largely due to the volatility of the resources required for workflow executions. This paper provides an analysis of the root causes of workflow decay based on an empirical study of a collection of Taverna workflows from the myExperiment repository. Although our analysis was based on a specific type of workflow, the outcomes and methodology should be applicable to workflows from other systems, at least those whose executions also rely largely on accessing third-party resources. Based on our understanding of decay, we recommend a minimal set of auxiliary resources to be preserved together with the workflows as an aggregation object and provide a software tool for end-users to create such aggregations and to assess their completeness.
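
One crude, hedged signal of the decay described here is whether the third-party services a workflow calls are still reachable; the sketch below checks a list of placeholder endpoint URLs and is not the assessment tool the paper provides.

```python
from urllib.request import urlopen

# Placeholder endpoints standing in for a workflow's third-party dependencies.
dependencies = ["http://example.org/blast", "http://example.org/uniprot"]

def unreachable(urls, timeout=5):
    dead = []
    for url in urls:
        try:
            urlopen(url, timeout=timeout)     # 2xx/3xx responses count as alive
        except (OSError, ValueError):         # connection errors, HTTP errors, bad URLs
            dead.append(url)
    return dead

dead = unreachable(dependencies)
print(f"{len(dead)} of {len(dependencies)} dependencies unreachable:", dead)
```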

Posted Content
TL;DR: ReStore as discussed by the authors is an extension to the Pig dataflow system on top of Hadoop, and it can also create additional reuse opportunities by materializing and storing the output of query execution operators that are executed within a MapReduce job.
Abstract: Analyzing large scale data has emerged as an important activity for many organizations in the past few years. This large scale data analysis is facilitated by the MapReduce programming and execution model and its implementations, most notably Hadoop. Users of MapReduce often have analysis tasks that are too complex to express as individual MapReduce jobs. Instead, they use high-level query languages such as Pig, Hive, or Jaql to express their complex tasks. The compilers of these languages translate queries into workflows of MapReduce jobs. Each job in these workflows reads its input from the distributed file system used by the MapReduce system and produces output that is stored in this distributed file system and read as input by the next job in the workflow. The current practice is to delete these intermediate results from the distributed file system at the end of executing the workflow. One way to improve the performance of workflows of MapReduce jobs is to keep these intermediate results and reuse them for future workflows submitted to the system. In this paper, we present ReStore, a system that manages the storage and reuse of such intermediate results. ReStore can reuse the output of whole MapReduce jobs that are part of a workflow, and it can also create additional reuse opportunities by materializing and storing the output of query execution operators that are executed within a MapReduce job. We have implemented ReStore as an extension to the Pig dataflow system on top of Hadoop, and we experimentally demonstrate significant speedups on queries from the PigMix benchmark.

Journal ArticleDOI
TL;DR: It is demonstrated that logic-based workflow verification can be applied to SWSpec, which is capable of checking compliance and also detecting conflicts among the imposed requirements, and will support scalable service interoperation in the form of workflows in open environments.
Abstract: This paper presents a requirement-oriented automated framework for formal verification of service workflows. It is based on our previous work describing the requirement-oriented service workflow specification language called SWSpec. This language has been developed to enable the workflow composer, as well as arbitrary services willing to participate in a workflow, to formally and uniformly impose their own requirements. As such, SWSpec provides a formal way to regulate and control workflows. The key component of the proposed framework is a set of verification algorithms that rely on propositional logic. We demonstrate that logic-based workflow verification can be applied to SWSpec, which is capable of checking compliance and also detecting conflicts among the imposed requirements. By automating the compliance-checking process, this framework will support scalable service interoperation in the form of workflows in open environments.
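
The paper's SWSpec language and verification algorithms are not reproduced here; the toy sketch below only illustrates the propositional idea: requirements imposed by different parties are boolean formulas over workflow facts, checked for compliance against a concrete workflow and for joint satisfiability (conflict detection) by enumeration. All requirement names and atoms are invented.

```python
from itertools import product

# Requirements as propositional formulas over workflow facts (illustrative only).
ATOMS = ["encrypts_data", "logs_access", "uses_external_service"]
requirements = {
    "provider":  lambda f: f["encrypts_data"] or not f["uses_external_service"],
    "composer":  lambda f: f["uses_external_service"] and not f["encrypts_data"],
    "regulator": lambda f: f["logs_access"],
}

def complies(facts):
    """Which imposed requirements does a concrete workflow satisfy?"""
    return {name: req(facts) for name, req in requirements.items()}

def conflicting(reqs):
    """True if no truth assignment satisfies all requirements at once."""
    for values in product([False, True], repeat=len(ATOMS)):
        facts = dict(zip(ATOMS, values))
        if all(req(facts) for req in reqs.values()):
            return False
    return True

print(complies({"encrypts_data": True, "logs_access": True, "uses_external_service": True}))
print("requirements conflict:", conflicting(requirements))
```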

Journal ArticleDOI
TL;DR: A Service Workflow Specification language is proposed, called SWSpec, which allows arbitrary services in a workflow to formally and uniformly impose their requirements, and will provide a formal way to regulate and control workflows as well as enrich the proliferation of service provisions and consumptions in open environments.
Abstract: Advanced technologies have changed the nature of business processes in the form of services. In coordinating services to achieve a particular objective, a service workflow is used to control service composition, execution sequences and path selection. Since existing mechanisms are insufficient for addressing the diversity and dynamicity of the requirements in a large-scale distributed environment, developing a formal requirements specification is necessary. In this paper, we propose a Service Workflow Specification language, called SWSpec, which allows arbitrary services in a workflow to formally and uniformly impose their requirements. As such, the solution will provide a formal way to regulate and control workflows as well as enrich the proliferation of service provisions and consumptions in open environments.